
Fig. 4. Structure of SRRT-SEM.

mean absolute error (MAE). In this work, LS is used as the basis for measuring the improvement of a split in training each regression tree.

3.2.1.1. Prior knowledge-based regression tree generation. It is well known that one of the fundamental and critical factors for a good ensemble is diversity, which measures the difference between the base learners of the ensemble. If the base learners are very similar, combining them will not improve performance, so it is important to generate diverse base learners. In this work, semi-random feature selection and bootstrap sampling are used to select the features and samples that form the training subset of each regression tree, thereby increasing the diversity of the regression trees. Note that traditional feature selection aims to find the best feature subset from all features for learning [33]. From the ensemble perspective, however, feature selection searches for appropriate feature subsets for generating the base learners of an ensemble [29]. The random subspace method [17] is a popular feature selection method for ensembles, which selects a small number of dimensions from a given feature space based on a pseudorandom procedure. However, the random subspace method cannot guarantee that the base learners generated from the selected features perform well. To address this issue, a semi-random feature selection approach with the help of a priori knowledge is proposed on the basis of the

Table 1. Algorithm of regression tree construction and feature importance computation.

Construction process of regression tree

1. create tree node N based on D;

7. for each possible split point:

8. split the dataset into two parts;

9. calculate the error of the split: splitError;

10. if splitError is less than N.e then

16. if N.f cannot be split anymore then

23. normalize the importance of features;

Output: a regression tree; the importance of features.

random subspace method for generating diverse and accurate regression trees, which is called the semi-random regression tree generation method.

For a dataset with m features, 2^m − 1 non-empty feature subsets, also called feature subspaces, are available for generating base learners. Different feature subsets can produce diverse base learners, but not all feature combinations produce learners with good performance. Here we are mainly concerned with performance in terms of accuracy. If the feature selection is entirely arbitrary, there is a large risk of assembling poor base learners, from which no good ensemble can be built. Therefore, it is necessary to evaluate the selected features based on a priori knowledge. Although traditional feature selection methods cannot be used directly here, some a priori knowledge about the importance of the features can be acquired to guide the feature selection for ensemble construction. In the proposed semi-random feature selection method, a regression tree is employed to determine the importance of the features in tree generation. The total reduction in MSE that results from adding a feature to the model is used as the measure of that feature's importance. Table 1 presents the pseudocode of regression tree construction and feature importance evaluation.
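The importance measure described above can be illustrated with a minimal sketch. It is not the algorithm of Table 1, whose steps are only partly reproduced here; it assumes that each split's reduction in summed squared error is credited to the feature it splits on and that the totals are then normalized to sum to one. The function names (`sse`, `best_split`, `grow`, `feature_importance`), the midpoint thresholds, and the stopping rule are illustrative choices.

```python
from typing import List, Optional, Sequence, Tuple


def sse(y: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not y:
        return 0.0
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def best_split(X: Sequence[Sequence[float]],
               y: Sequence[float]) -> Optional[Tuple[float, int, float]]:
    """Return (error reduction, feature index, threshold) of the best split."""
    parent_error = sse(y)
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            gain = parent_error - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def grow(X, y, importance: List[float]) -> None:
    """Recursively split a node, crediting each split's gain to its feature."""
    if len(y) < 2:
        return
    split = best_split(X, y)
    if split is None or split[0] <= 1e-12:  # no split reduces the error
        return
    gain, f, t = split
    importance[f] += gain
    left = [(row, yi) for row, yi in zip(X, y) if row[f] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[f] > t]
    for part in (left, right):
        grow([r for r, _ in part], [v for _, v in part], importance)


def feature_importance(X, y) -> List[float]:
    """Normalized total error reduction per feature (all zeros if no split)."""
    importance = [0.0] * len(X[0])
    grow(X, y, importance)
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance


# Toy data: the target depends only on feature 0; feature 1 is noise.
X = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
y = [0, 0, 0, 10, 10, 10]
print(feature_importance(X, y))
```

On the toy data a single split on feature 0 removes all of the error, so that feature receives the entire importance.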

A large number of regression trees need to be generated to form a base learner pool in the proposed method. To generate one tree, a predefined number of feature subsets are first generated at random. The importance value of each feature subset is then computed by summing the importance values of the features in the subset. Finally, the feature subsets are ranked according to their importance and the one with the highest importance is chosen. Thus, each tree corresponds to a refined feature subspace, and it is fully grown on that subspace using the training subset obtained by bootstrap sampling. Although feature selection often degrades the performance of individual base learners, an ensemble consisting of a large number of base learners can usually achieve a very small generalization error. The semi-random feature selection ensures that a feature with a large importance value has a higher probability of being selected.
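The selection procedure in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not say how large each candidate subset is, so a fixed `subset_size` is assumed, and the names `semi_random_subset`, `bootstrap_indices`, and `selection_frequency` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def semi_random_subset(importance: Sequence[float], subset_size: int,
                       n_candidates: int, rng: random.Random) -> Tuple[int, ...]:
    """Draw n_candidates random feature subsets and keep the one whose
    summed feature importance is highest."""
    m = len(importance)
    candidates = [tuple(sorted(rng.sample(range(m), subset_size)))
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda s: sum(importance[f] for f in s))


def bootstrap_indices(n: int, rng: random.Random) -> List[int]:
    """Sample n row indices with replacement (bootstrap sampling)."""
    return [rng.randrange(n) for _ in range(n)]


def selection_frequency(importance: Sequence[float], subset_size: int,
                        n_candidates: int, n_trees: int,
                        seed: int = 0) -> List[float]:
    """Fraction of trees whose chosen subset contains each feature."""
    rng = random.Random(seed)
    counts = [0] * len(importance)
    for _ in range(n_trees):
        for f in semi_random_subset(importance, subset_size, n_candidates, rng):
            counts[f] += 1
    return [c / n_trees for c in counts]


# Hypothetical normalized importances for four features.
importance = [0.5, 0.3, 0.1, 0.1]
rng = random.Random(42)
features = semi_random_subset(importance, subset_size=2, n_candidates=3, rng=rng)
rows = bootstrap_indices(10, rng)  # training rows for this tree
print(features, rows)
print(selection_frequency(importance, 2, 3, n_trees=500))
```

With few candidates per tree the choice stays close to random, which preserves diversity; with many candidates the same high-importance subset is chosen almost every time. The number of candidates therefore acts as the trade-off between diversity and accuracy in this sketch.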