Forest plot of the c-statistic for predicting the risk of the primary endpoint at 36 months across the three subgroups in the Phase 1 analysis, i.e. the analysis excluding cardiac magnetic resonance imaging parameters. ICD patients: patients with left ventricular ejection fraction ≤ 35% who had received a cardioverter-defibrillator implantation for primary prevention of sudden cardiac death; non-ICD patients ≤35%: patients who did not carry a cardioverter-defibrillator and had a left ventricular ejection fraction ≤ 35%; non-ICD patients >35%: patients who did not carry a cardioverter-defibrillator and had a left ventricular ejection fraction > 35%. In ICD patients, the endpoint was first appropriate therapy; in non-ICD patients ≤35% and non-ICD patients >35%, the endpoint was sudden cardiac death. In two data sets with non-ICD patients, the primary endpoint additionally included life-threatening ventricular arrhythmias (ventricular fibrillation or ventricular tachycardia). Leave-one-data-set-out cross-validation was applied: each time, one data set was left out, a model was built in all remaining data sets, and the model was then applied to the data set that had been left out. This cycle was repeated for every data set. The resulting estimates of predictive performance for the primary endpoint, one per data set, were then combined by random-effects meta-analysis, providing the overall estimate of the predictive performance of ejection fraction across all data sets as well as the associated prediction interval, which gives the expected performance in a new data set similar to those analysed. A wide prediction interval indicates limited generalizability to a new data set. As candidate predictors for the multivariable models, only predictors that were present in ≥75% of observations and recorded in the majority of data sets were considered. 
For the multivariable flexible parametric survival models, backward selection with a Bayesian information criterion stopping rule was applied. The data sets named on the y-axis are those left out of model development and then used to validate the resulting model, producing the corresponding performance estimates shown. For abbreviations of the individual data sets, please see the ‘Description of data sets’ in the Supplementary data online, Material
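
The validation scheme described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the caption does not state which between-study variance estimator or which scale was used for pooling, so the DerSimonian-Laird estimator, the function names `leave_one_dataset_out` and `random_effects_pool`, and the input values in the usage example are all assumptions made for illustration. The prediction interval follows the common form pooled ± t × sqrt(τ² + SE²), with the t critical value (k − 2 degrees of freedom) supplied by the caller.

```python
import math


def leave_one_dataset_out(names):
    """Yield (development_sets, held_out_set) pairs, one per data set."""
    for held_out in names:
        yield [n for n in names if n != held_out], held_out


def random_effects_pool(estimates, variances, z_crit=1.959964, t_crit=None):
    """Pool per-data-set estimates with a DerSimonian-Laird random-effects model.

    Assumption: estimates and variances are on whatever scale the analyst
    chooses (e.g. raw or logit c-statistic); this sketch does not transform them.
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-set variance
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    result = {
        "pooled": pooled,
        "se": se,
        "tau2": tau2,
        "ci": (pooled - z_crit * se, pooled + z_crit * se),
    }
    if t_crit is not None:
        # Prediction interval for a new, similar data set.
        half = t_crit * math.sqrt(tau2 + se ** 2)
        result["pi"] = (pooled - half, pooled + half)
    return result


if __name__ == "__main__":
    # Hypothetical inputs, not values from the study.
    names = ["A", "B", "C"]
    for development, held_out in leave_one_dataset_out(names):
        print("develop on", development, "validate on", held_out)
    # t critical value for 97.5% with k - 2 = 1 degree of freedom.
    print(random_effects_pool([0.5, 0.7, 0.9], [0.01, 0.01, 0.01], t_crit=12.706))
```

Because the prediction interval adds the between-data-set variance τ² to the uncertainty of the pooled estimate, it is at least as wide as the confidence interval, which is why a wide prediction interval signals limited generalizability.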