Table 10. F-ratios and p-values of the ANOVA tests. Columns: SRRT-SEM, GEFS, random subspace method, gradient boosting regression tree, random forest regressor, AdaBoost regression tree and regression tree, each with an F-ratio and a p-value. Rows: compared method and indicator (RMSE, MAE).

The 10-fold cross-validation results in Table 9 show that the proposed SRRT-SEM has the lowest average RMSE and MAE and the highest average R². The discrepancy between the predicted and the real survival time is small for the proposed regression method: the average gap between the predicted value and the true value is 9.1194 months. The R² values suggest that the SRRT-SEM model explains 38.93% of the variability in survival months, more than GEFS (23.35%), the random subspace method (16.39%), random forest (19.61%), the gradient boosting regression tree (19.45%), the AdaBoost regression tree (14.88%) and the regression tree (11.98%).
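As an illustration of how the three indicators can be averaged over folds, the following is a minimal sketch using the standard definitions of RMSE, MAE and R². The function names and the survival times are made up for the example; they are not the paper's code or data.

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error between true and predicted values.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    # Mean absolute error: the average absolute gap between prediction and truth.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def average_over_folds(folds):
    # folds: list of (y_true, y_pred) pairs, one pair per cross-validation fold.
    n = len(folds)
    return {
        "RMSE": sum(rmse(t, p) for t, p in folds) / n,
        "MAE": sum(mae(t, p) for t, p in folds) / n,
        "R2": sum(r2(t, p) for t, p in folds) / n,
    }

# Hypothetical survival times in months for two folds (illustrative only).
folds = [
    ([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]),
    ([5.0, 15.0, 25.0], [6.0, 15.0, 22.0]),
]
scores = average_over_folds(folds)
print(scores)
```

Averaging per-fold values (rather than pooling all predictions) is an assumption here, chosen to match the statement that each indicator value is an average over the 10 folds.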

To assess the efficiency of the proposed method in terms of the performance indicators, we carried out statistical significance tests using the commercial software SPSS (Version 19.0), comparing the proposed method with GEFS, the random subspace method, the gradient boosting regression tree, random forest, the AdaBoost regression tree and the regression tree. Analysis of variance (ANOVA) is used to analyze the RMSE, MAE and R² values obtained by the compared methods. A difference is considered statistically significant if the p-value is less than 0.05. Table 10 summarizes the resulting F-ratios and p-values.
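The F-ratio reported by such a test can be reproduced by hand. The sketch below assumes each comparison is a one-way ANOVA over the per-fold indicator values of two methods; the paper used SPSS, so this is an illustration of the statistic and not the authors' procedure. The function name and the per-fold values are hypothetical.

```python
def one_way_anova_f(groups):
    # groups: list of samples, e.g. per-fold RMSE values of each method.
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Made-up per-fold RMSE values for two methods (illustrative only).
method_a = [9.0, 9.5, 8.8, 9.2, 9.1]
method_b = [10.4, 10.9, 10.1, 10.6, 10.8]
f_ratio, df_b, df_w = one_way_anova_f([method_a, method_b])
print(f_ratio, df_b, df_w)
```

The p-value is the upper-tail probability of the F distribution with `df_between` and `df_within` degrees of freedom at the observed F-ratio. If SciPy is available, `scipy.stats.f_oneway` returns both the statistic and the p-value directly.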

Table 11. Performance comparison of two-stage model and one-stage regression model. Columns: SRRT-SEM, GEFS, random subspace method, random forest, gradient boosting regression tree, AdaBoost regression tree and regression tree. Rows: two-stage model, one-stage model and difference.

From Table 10, we can see that the p-values of the proposed method versus the compared methods are all smaller than 0.05 for the three indicators, indicating a statistically significant difference between the performance of the proposed method and that of each compared algorithm. Among the compared methods, GEFS is significantly better than the regression tree in terms of RMSE. In terms of MAE, GEFS, the gradient boosting regression tree and the random forest are significantly better than the AdaBoost regression tree. In terms of R², GEFS is significantly better than the random subspace method, the AdaBoost regression tree and the regression tree, while the gradient boosting regression tree and the random forest are significantly better than the regression tree. From Table 9, it is also worth noting that the real ensemble size of the random subspace method, GEFS, the random forest regressor, the gradient boosting regression tree and the AdaBoost regression tree is 100, while that of the SRRT-SEM regressor has a mean value of 21 thanks to the selective ensemble strategy. Although model selection gives SRRT-SEM a higher time complexity in training, SRRT-SEM takes less time in actual prediction than the compared ensemble methods because its final ensemble is smaller.

4.4.4. Comparison of the two-stage model and one-stage regression model

Cancer survival time prediction can also be realized by a one-stage regression model, in which survival time is predicted directly regardless of whether the case is a survivor (survived more than 5 years). The two-stage model and the one-stage regression model are compared on the seven regression methods above in terms of RMSE, MAE and R², and the results are summarized in Table 11. The difference is computed as the indicator value of the one-stage regression model minus that of the two-stage model, and every indicator value is the average over the 10 folds. All differences in RMSE and MAE are positive and all differences in R² are negative, which indicates that the two-stage model has smaller RMSE and MAE and higher R². The performance gain includes a reduction of more than 1.6 months in prediction error and an improvement of over 0.11% in model explanation ability, which verifies that the two-stage model is a good model for cancer survival time prediction.
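The sign convention of the difference can be made concrete with a short sketch. The function names and the indicator values below are hypothetical and serve only to show the arithmetic; they are not taken from Table 11.

```python
def stage_difference(one_stage, two_stage):
    # Difference = one-stage indicator value minus two-stage indicator value.
    return {name: one_stage[name] - two_stage[name] for name in one_stage}

def two_stage_is_better(diff):
    # Positive error differences and a negative R2 difference mean the
    # two-stage model has lower error and higher explained variance.
    return diff["RMSE"] > 0 and diff["MAE"] > 0 and diff["R2"] < 0

# Made-up fold-averaged indicator values (illustrative only).
one_stage = {"RMSE": 12.0, "MAE": 10.0, "R2": 0.25}
two_stage = {"RMSE": 10.0, "MAE": 8.0, "R2": 0.5}

diff = stage_difference(one_stage, two_stage)
print(diff, two_stage_is_better(diff))
```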