2021 Jan 22;4(1):e2031730. doi: 10.1001/jamanetworkopen.2020.31730

Table 2. Model Performance for Predicting Survival in the Electronic Health Record Cohort.

| Training cohort | Variables, No. | iAUC, mean (SD)ᵃ |
|---|---|---|
| RCT dataᵇ | 101 | 0.722 (0.118) |
| EHR data with all variables available | 84 | 0.762 (0.106) |
| EHR data with all independent variables (no covariates) | 60 | 0.775 (0.098) |
| EHR data with top 25 variables selected by RFEᶜ | 25 | 0.792 (0.097) |
| EHR data with top 15 variablesᶜ | 15 | 0.785 (0.098) |
| EHR data with top 10 variablesᶜ | 10 | 0.779 (0.099) |

Abbreviations: EHR, electronic health record; iAUC, integrated area under the curve; RCT, randomized clinical trial; RFE, recursive feature elimination.

ᵃ Mean (SD) iAUC from 10-fold cross-validation on 90% of the EHR data set.

ᵇ Among the 101 RCT variables, 17 were not available in the EHR data set because they are not routinely collected. Their missing values were imputed using a penalized Gaussian regression model based on the nonmissing variables. Other partially missing variables were also imputed.

ᶜ Top variables are those with the highest absolute correlation with the outcome.
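Footnote a reports iAUC as a mean (SD) over 10 folds drawn from 90% of the EHR data. The table does not define iAUC precisely, so the following is only a minimal sketch under stated assumptions: AUC(t) is taken as a naive cumulative/dynamic AUC (cases die by t, controls are still observed after t, subjects censored before t are dropped with no inverse-probability weighting), iAUC is the unweighted time average of AUC(t) over a grid between the 10th and 90th percentiles of each test fold's times, and the SD uses `ddof=1`. The names `auc_at`, `integrated_auc`, `cv_iauc`, and the `fit` callable are hypothetical.

```python
import numpy as np


def auc_at(time, event, risk, t):
    """Naive cumulative/dynamic AUC at horizon t (assumption: no IPCW weighting).

    Cases: events at or before t. Controls: subjects still observed after t.
    Subjects censored before t are simply dropped.
    """
    cases = (time <= t) & (event == 1)
    controls = time > t
    if not cases.any() or not controls.any():
        return np.nan
    rc = risk[cases][:, None]
    rk = risk[controls][None, :]
    # A case scored above a control counts 1; tied scores count 0.5.
    return float(np.mean((rc > rk) + 0.5 * (rc == rk)))


def integrated_auc(time, event, risk, grid):
    """Unweighted time average of AUC(t) over `grid` via the trapezoidal rule."""
    aucs = np.array([auc_at(time, event, risk, t) for t in grid])
    ok = ~np.isnan(aucs)
    g, a = np.asarray(grid, dtype=float)[ok], aucs[ok]
    if len(g) < 2:
        return float(a[0]) if len(g) else float("nan")
    area = np.sum(0.5 * (a[1:] + a[:-1]) * np.diff(g))
    return float(area / (g[-1] - g[0]))


def cv_iauc(X, time, event, fit, n_folds=10, dev_frac=0.9, n_grid=20, seed=0):
    """Mean and SD of iAUC over k folds drawn from a `dev_frac` share of the data.

    `fit(X, time, event)` is a hypothetical user-supplied callable that returns
    a function mapping a feature matrix to risk scores (higher = higher risk).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(time))
    dev = idx[: int(dev_frac * len(time))]  # remaining rows are held out, unused here
    folds = np.array_split(dev, n_folds)
    scores = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        predict = fit(X[train], time[train], event[train])
        lo, hi = np.quantile(time[test], [0.1, 0.9])
        grid = np.linspace(lo, hi, n_grid)
        scores.append(integrated_auc(time[test], event[test], predict(X[test]), grid))
    scores = np.array(scores)
    return float(scores.mean()), float(scores.std(ddof=1))
```

A time-weighted iAUC (for example, weighting AUC(t) by the event-time density) or IPCW-corrected AUC(t) would give different numbers; the unweighted average is used here only because it is the simplest to state.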
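Footnote b states that unavailable and partially missing variables were imputed with a penalized Gaussian regression model built on nonmissing variables. The exact penalty is not given; the sketch below assumes an L2 (ridge) penalty, fully observed predictor columns, and an unpenalized intercept. `ridge_fit`, `impute_ridge`, and the `source`/`dest` split (fit on rows where the variable is observed, fill the gaps in another or the same matrix) are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge (L2-penalized Gaussian) regression; intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def impute_ridge(source, dest, target, predictors, alpha=1.0):
    """Return a copy of `dest` with NaNs in column `target` filled by ridge predictions.

    The model is fitted on rows of `source` where `target` is observed.
    Assumes the `predictors` columns contain no NaNs in either matrix.
    """
    obs = ~np.isnan(source[:, target])
    b0, beta = ridge_fit(source[obs][:, predictors], source[obs, target], alpha)
    out = dest.copy()
    miss = np.isnan(out[:, target])
    out[miss, target] = b0 + out[miss][:, predictors] @ beta
    return out
```

Passing the same matrix as `source` and `dest` fills gaps within one data set; passing a reference data set where the variable is recorded as `source` would be one way to predict a variable that is never collected in `dest`, again under the assumptions above. Larger `alpha` shrinks imputed values toward the observed mean.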
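Footnote c ranks variables by their absolute correlation with the outcome. As a minimal sketch, assuming Pearson correlation and a numeric outcome vector (the table does not say whether the outcome is the event indicator or survival time), top-k selection could look like this; `top_k_by_abs_corr` is a hypothetical helper.

```python
import numpy as np


def top_k_by_abs_corr(X, y, names, k):
    """Rank columns of X by |Pearson r| with y and return the top-k names and all r.

    Constant columns give r = nan; np.argsort places nan last, so they are
    never selected ahead of columns with a finite correlation.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    order = np.argsort(-np.abs(r), kind="stable")  # largest |r| first
    return [names[i] for i in order[:k]], r


X = np.array([[1, 5, 1], [2, 4, 0], [3, 3, 1], [4, 1, 0], [5, 2, 1]], dtype=float)
y = np.array([1, 2, 3, 4, 5], dtype=float)
selected, r = top_k_by_abs_corr(X, y, ["a", "b", "c"], k=2)
print(selected)  # ['a', 'b']
```

Because the ranking uses |r|, a strongly negative correlation (column `b`, r = -0.9) is kept ahead of an uncorrelated column (`c`, r ≈ 0).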