Abstract
It is unclear to what extent simulated versions of real data can be used to assess the potential value of new biomarkers added to prognostic risk models. Using data on 4522 women and 3969 men who contributed information to the Framingham CVD risk prediction tool, we develop a simulation model that allows assessment of the added contribution of new biomarkers. The simulated model closely matches the one obtained using real data: the discrimination area under the curve (AUC) on simulated vs actual data is 0.800 vs 0.799 in women and 0.778 vs 0.776 in men. Positive correlation with standard risk factors decreases the impact of new biomarkers (ΔAUC 0.002–0.024), whereas negative correlation leads to stronger effects (ΔAUC 0.026–0.101) than no correlation (ΔAUC 0.003–0.051). We suggest that researchers construct simulation models similar to the one proposed here before embarking on larger, expensive biomarker studies based on actual data.
Keywords: risk, correlation, performance, synthetic cohorts
INTRODUCTION
Identification of novel prognostic biomarkers remains an area of active research. The American Heart Association Expert Panel on Atherosclerotic Diseases and Emerging Risk Factors scientific statement outlines 6 phases of evaluation for new markers: (1) proof of concept, (2) prospective validation, (3) incremental value, (4) clinical utility, (5) clinical outcomes, and (6) cost-effectiveness.1 After prospective validation demonstrates that the marker predicts the development of the outcome of interest, but before its clinical utility can be assessed, the new marker must be shown to “add significant incremental prognostic information to a model that includes the established risk markers.”1 This phase is often costly, as the new biomarker needs to be measured in a sufficiently large sample with sufficiently long follow-up.
We use simulations based on the multivariate normal distribution to guide researchers planning to engage in biomarker discovery and evaluation. Our objective is related to other work aimed at the construction of “synthetic cohorts,” that is, simulated versions of real data that cannot be accessed due to privacy or other concerns.2 The approach is intended as a step preceding phase 3 above, but it can also guide earlier phases by suggesting which biomarker features are most likely to improve prognostic performance. The inputs are derived from public access databases used to develop an existing prognostic model. For example, the National Heart, Lung, and Blood Institute’s BioLINCC repository can be used to recreate the standard risk factor profile among individuals with and without new-onset cardiovascular disease (CVD) who contributed data to the Framingham general CVD risk model.3 We use these Framingham data to illustrate our method; in practical applications, however, researchers need to focus on the specific prognostic model they intend to improve with new biomarkers.
METHODS
Data for simulation
The 10-year general CVD risk functions are based on the derivation presented by D’Agostino et al.3 A cohort of 4522 Framingham Heart Study women and 3969 men, aged 30 to 74 years and free of broadly defined general CVD at baseline in the early 1970s or 1980s, was followed for up to 12 years for the development of incident general CVD. The sex-specific 10-year risk models were based on “standard” risk factors: age, treated and untreated systolic blood pressure (SBP), total and HDL cholesterol, smoking status, and diabetes.
Simulation model
The simulation model relies on the fact that metrics of prognostic model performance depend primarily on the distribution of risk factors among those with and without “events” and on the correlations among the risk factors themselves. Given access to individual-level data, we can estimate these distributions and correlations. The multivariate normal distribution serves as the basis for the simulations: we assume that continuous predictors are normally distributed. If this assumption is violated, the Box-Cox transformation4 can be used to achieve approximate normality. Binary predictors, such as sex or smoking, are created by dichotomizing simulated normal variables. Given the low rate of loss to follow-up in our sample (< 6%), we use logistic regression for a binary outcome, but the approach can be extended to time-to-event outcomes.
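The Box-Cox step can be sketched in Python with SciPy's `scipy.stats.boxcox`, which returns the transformed values together with the maximum-likelihood estimate of λ. The original analyses used SAS, so this is only an illustrative substitute. The skewed predictor below is simulated and hypothetical, not Framingham data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)

# Hypothetical right-skewed, strictly positive predictor (not Framingham data).
x = rng.lognormal(mean=5.0, sigma=0.4, size=2000)

# boxcox requires positive input; with lmbda left at its default (None),
# lambda is estimated by maximum likelihood and returned alongside the data.
x_bc, lam = stats.boxcox(x)

print(f"estimated lambda: {lam:.2f}")
print(f"skewness before: {stats.skew(x):.2f}, after: {stats.skew(x_bc):.2f}")
```

For log-normal input the estimated λ should lie close to 0, which is close to a log transformation. The means and SDs used in the simulation would then be taken on the transformed scale.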
Following the practice used to develop the general CVD functions,3 we simulate separate models for women and men. First, multivariate standard normal vectors, with dimension equal to the number of predictors, are simulated separately for those who do and do not develop CVD. The correlation structure conditional on CVD status is set to that observed in the real data.3 We use Pearson correlation among continuous variables (age, SBP, total and HDL cholesterol), tetrachoric correlation5 among variables to be transformed into binary variables (antihypertensive treatment, diabetes, smoking), and biserial correlation6 between continuous and binary predictors. Continuous predictors are obtained by multiplying the simulated standard normal variables by the standard deviations observed in the real data and adding the corresponding means. Binary predictors are obtained by dichotomizing the simulated standard normal variables at thresholds chosen so that the prevalence of each binary feature (e.g., smoking) among events and non-events matches that observed in the real data.
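One way to implement the steps above is sketched below in Python with NumPy and SciPy. The original analyses used SAS. The function `simulate_group` is hypothetical, and all means, SDs, prevalences, and correlations in the usage example are made-up placeholders rather than Framingham estimates. For brevity, the same latent correlation matrix is reused for both groups, whereas the method uses group-specific matrices. Entries involving binary predictors are assumed to be on the latent normal (tetrachoric or biserial) scale.

```python
import numpy as np
from scipy.stats import norm


def simulate_group(n, corr, means, sds, prevalences, rng):
    """Simulate one outcome group (events or non-events).

    corr        latent correlation matrix: continuous predictors first,
                then the latent normals behind the binary predictors
    means, sds  real-data means and SDs of the continuous predictors
    prevalences real-data prevalences of the binary predictors
    """
    corr = np.asarray(corr, dtype=float)
    k_cont = len(means)
    # Step 1: multivariate standard normal vectors with the target correlation.
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n)
    # Step 2: rescale the continuous columns to the observed means and SDs.
    cont = np.asarray(means) + z[:, :k_cont] * np.asarray(sds)
    # Step 3: dichotomize so that P(Z > threshold) equals the target prevalence.
    thresholds = norm.ppf(1.0 - np.asarray(prevalences))
    binary = (z[:, k_cont:] > thresholds).astype(int)
    return np.column_stack([cont, binary])


rng = np.random.default_rng(1)
# Hypothetical latent correlations among age, SBP, and smoking.
corr = [[1.0, 0.3, -0.1],
        [0.3, 1.0, 0.0],
        [-0.1, 0.0, 1.0]]
events = simulate_group(1000, corr, [58, 140], [9, 20], [0.45], rng)
nonevents = simulate_group(4000, corr, [48, 127], [10, 18], [0.33], rng)

X = np.vstack([events, nonevents])
y = np.r_[np.ones(len(events)), np.zeros(len(nonevents))]
print(X.shape, events[:, 2].mean().round(2), nonevents[:, 2].mean().round(2))
```

`X` and `y` can then be passed to any logistic regression routine to fit the "standard" model on the simulated cohort.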
Once a simulated analog of the observed data has been created, we can add theoretical biomarkers with the desired effect size and correlation structure. This is accomplished by increasing the dimension of the multivariate standard normal vectors simulated in the first step by the number of biomarkers we want to add. The only limitation is imposed by the correlation structure: the overall variance-covariance matrices have to be positive definite. The conditional (within events and non-events) means and standard deviations of the new biomarkers can then be imposed in the same way as for the standard continuous risk factors. These means, standard deviations, and correlations can be purely theoretical or can be derived from previous phases of biomarker assessment (e.g., from a cross-sectional assessment in a case-control or case-cohort study).
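The augmentation step and the positive-definiteness requirement can be sketched as follows. The helper names `add_biomarkers` and `is_positive_definite` are hypothetical. A Cholesky factorization is used as the check because it succeeds only for (numerically) positive definite matrices.

```python
import numpy as np


def add_biomarkers(corr, cross, among):
    """Augment a latent correlation matrix with m new biomarkers.

    corr   (p, p) correlations among the standard risk factors
    cross  (p, m) correlations of each biomarker with each standard factor
    among  (m, m) correlations among the biomarkers (unit diagonal)
    """
    corr, cross, among = (np.asarray(a, dtype=float) for a in (corr, cross, among))
    return np.block([[corr, cross], [cross.T, among]])


def is_positive_definite(m):
    """True if a Cholesky factorization exists, i.e. m is positive definite."""
    try:
        np.linalg.cholesky(m)
        return True
    except np.linalg.LinAlgError:
        return False


base = np.eye(3)  # hypothetical: 3 mutually uncorrelated standard factors
ok = add_biomarkers(base, np.full((3, 1), 0.2), np.eye(1))
too_strong = add_biomarkers(base, np.full((3, 1), 0.9), np.eye(1))
print(is_positive_definite(ok), is_positive_definite(too_strong))  # True False
```

With 3 uncorrelated standard factors, a single biomarker can share a common correlation r with all of them only if 3r² < 1 (the Schur complement 1 − 3r² must be positive). This is why the second matrix fails.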
Here, for exploratory purposes, we consider conditional correlations of the new biomarker with the standard risk factors equal to 0 (new predictor uncorrelated with standard risk factors within persons with and without CVD), 0.2 (a correlation that can realistically be encountered in practical applications), and −0.2 (to investigate trends in the correlation structure). The means and standard deviations of the new predictor are selected to correspond to effect sizes of 0.2, 0.5, and 0.8, which Cohen7 labeled small, medium, and large (referred to here as weak, medium, and strong). We also use the simulation model to investigate how many new biomarkers are needed to achieve an improvement in model performance similar in magnitude to that obtained by adding 1 predictor with a strong effect size of 0.8 that is uncorrelated (within event groups) with the standard risk factors.
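As a worked check on how effect sizes translate into odds ratios, assume (as a simplification) the following: the new biomarker has SD 1 in both groups, mean 0 in non-events and mean d in events, and is conditionally independent of the standard risk factors. Its contribution to the log odds of CVD is then linear, with coefficient (μ₁ − μ₀)/σ² = d, so the odds ratio per SD is exp(d). The helper names below are hypothetical.

```python
import math


def biomarker_group_params(d, sd=1.0):
    """Hypothetical parameterization: non-event mean 0, event mean d*sd, common SD."""
    return {"mean_nonevent": 0.0, "mean_event": d * sd, "sd": sd}


def implied_odds_ratio_per_sd(d, sd=1.0):
    """Odds ratio per SD implied by two normal groups with equal variance.

    The log-likelihood ratio of N(mu1, sd^2) vs N(mu0, sd^2) is linear in x
    with slope (mu1 - mu0) / sd^2, so one SD multiplies the odds by exp(d).
    """
    p = biomarker_group_params(d, sd)
    beta = (p["mean_event"] - p["mean_nonevent"]) / p["sd"] ** 2
    return math.exp(beta * p["sd"])


for d in (0.2, 0.5, 0.8):
    print(d, round(implied_odds_ratio_per_sd(d), 2))
# 0.2 1.22
# 0.5 1.65
# 0.8 2.23
```

These values agree with the women's odds ratios for the uncorrelated biomarker in Table 2 (1.22, 1.65, 2.23). With a nonzero conditional correlation, the adjusted odds ratio no longer equals exp(d), as the other columns of Table 2 show.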
Model performance metrics
To quantify the incremental value of new predictors, we use 2 common global metrics: the increase in the area under the receiver operating characteristic curve (ΔAUC)8 and the increase in the discrimination slope, also known as the integrated discrimination improvement (IDI),9 together with its relative form (relative IDI).10 We present medians across 399 simulated data sets. All analyses were performed using SAS version 9.2.
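The three metrics can be computed from predicted risks as in the minimal NumPy/SciPy sketch below. The function names are hypothetical, and the tiny data set is made up purely to illustrate the arithmetic. The AUC is computed as the Mann-Whitney probability, and the relative IDI is taken as the IDI divided by the discrimination slope of the baseline model.

```python
import numpy as np
from scipy.stats import rankdata


def auc(y, p):
    """AUC as the Mann-Whitney probability that an event has a higher
    predicted risk than a non-event (ties count one half)."""
    y = np.asarray(y, dtype=bool)
    ranks = rankdata(p)  # average ranks, so ties are handled correctly
    n1, n0 = y.sum(), (~y).sum()
    return (ranks[y].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)


def discrimination_slope(y, p):
    """Mean predicted risk in events minus mean predicted risk in non-events."""
    y, p = np.asarray(y, dtype=bool), np.asarray(p, dtype=float)
    return p[y].mean() - p[~y].mean()


def idi(y, p_old, p_new):
    """Return (IDI, relative IDI): the change in discrimination slope and
    that change divided by the slope of the baseline model."""
    s_old = discrimination_slope(y, p_old)
    s_new = discrimination_slope(y, p_new)
    return s_new - s_old, (s_new - s_old) / s_old


y = [1, 1, 0, 0, 0]
p_old = [0.6, 0.3, 0.4, 0.2, 0.1]
p_new = [0.7, 0.5, 0.3, 0.2, 0.1]
print(auc(y, p_new) - auc(y, p_old))  # delta AUC
print(idi(y, p_old, p_new))           # (IDI, relative IDI)
```

As a consistency check against the tables, consider a strong uncorrelated marker in women. Dividing its IDI (0.067, Table 2) by the simulated baseline discrimination slope (0.118, Table 1) gives about 0.57, in line with the reported relative IDI of 0.567.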
RESULTS
During follow-up, 7.8% of women and 15% of men experienced CVD. Table 1 presents model performance metrics for the actual and simulated data sets. The AUC values are nearly identical, and the discrimination slopes are reasonably close (slightly higher in the simulated data) for the 2 data sets. This observation suggests that our simulated data set is a good approximation to the actual data.
Table 1.
Performance of Framingham cardiovascular risk prediction functions developed on actual and simulated data
| | Women, actual | Women, simulated | Men, actual | Men, simulated |
|---|---|---|---|---|
| AUC | 0.799 | 0.800 | 0.776 | 0.778 |
| Discrimination slope | 0.107 | 0.118 | 0.132 | 0.138 |
Table 2 presents the impact of adding biomarkers of different strengths and correlations to the existing functions in women and men. Several observations emerge. First, a biomarker of at least medium strength (effect size of 0.5) that is conditionally uncorrelated with the risk factors already in the model appears necessary to appreciably increase the model performance metrics. In this case, the AUC increases by about 0.02, and the relative IDI exceeds 20%. This effect is greatly reduced when the new marker is positively correlated with the standard risk factors, even if the correlation is modest. When it equals 0.2 within the event and non-event subgroups, the change in AUC drops to 0.003 or less, and the relative IDI drops to about 5%. Perhaps the most promising observation applies to the situation in which the correlation of the new marker with standard risk factors is negative. In this case, the impact of the new marker is substantially amplified: the AUC increases by about 0.06, and the relative IDI exceeds 60%. A medium-strength biomarker with a modest negative correlation with the standard risk factors appears to have a greater impact on model performance than a strong biomarker that is conditionally uncorrelated.
Table 2.
Expected impact of adding new continuous predictor of varying strength to Framingham cardiovascular risk prediction models
| Correlation within event groups | | Uncorrelated | Uncorrelated | Uncorrelated | 0.2 | 0.2 | 0.2 | −0.2 | −0.2 | −0.2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Effect size | | 0.2 | 0.5 | 0.8 | 0.2 | 0.5 | 0.8 | 0.2 | 0.5 | 0.8 |
| Odds ratio | Women | 1.22 | 1.65 | 2.23 | 0.89 | 1.27 | 1.82 | 1.82 | 2.62 | 3.77 |
| | Men | 1.22 | 1.66 | 2.24 | 0.88 | 1.28 | 1.84 | 1.86 | 2.69 | 3.90 |
| ΔAUC | Women | 0.003 | 0.019 | 0.044 | 0.002 | 0.002 | 0.020 | 0.026 | 0.056 | 0.088 |
| | Men | 0.004 | 0.022 | 0.051 | 0.002 | 0.003 | 0.024 | 0.030 | 0.065 | 0.101 |
| Δslope (IDI) | Women | 0.004 | 0.026 | 0.067 | 0.001 | 0.006 | 0.033 | 0.028 | 0.075 | 0.146 |
| | Men | 0.005 | 0.034 | 0.084 | 0.001 | 0.007 | 0.042 | 0.039 | 0.099 | 0.180 |
| Relative IDI | Women | 0.034 | 0.221 | 0.567 | 0.005 | 0.051 | 0.281 | 0.237 | 0.649 | 1.248 |
| | Men | 0.040 | 0.241 | 0.604 | 0.008 | 0.053 | 0.303 | 0.277 | 0.708 | 1.293 |
In Figure 1, we plot the number of biomarkers needed to achieve the same improvement as 1 strong biomarker (effect size = 0.8) against their effect size. We note a strong inverse relationship between the number of markers needed and their effect size. If the required number exceeds 10, it might be difficult to find that many markers that are uncorrelated with each other and with the standard risk factors. And, as seen in Table 2, when positive correlation is present, the incremental value of each marker decreases rapidly.
Figure 1.
Number of biomarkers needed to increase prognostic performance by amount equal to one strong biomarker.
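A rough analytic counterpart to Figure 1 can be obtained under strong simplifying assumptions of our own, which are not part of the simulation. All predictors are multivariate normal with a common covariance matrix in events and non-events. The k new markers, each with effect size d, are mutually independent and independent of the standard risk factors within event groups. Under these assumptions, the squared Mahalanobis distance between events and non-events increases by k·d². The AUC of the optimal linear score, Φ(Δ/√2), depends only on the total distance Δ, so matching one marker with effect size 0.8 requires about k = 0.8²/d² markers. The function name is hypothetical, and the counts need not coincide with the simulated values in Figure 1.

```python
import math


def markers_needed(d, target_d=0.8, eps=1e-9):
    """Hypothetical back-of-envelope count of conditionally independent markers
    with effect size d whose combined squared Mahalanobis distance (k * d**2)
    reaches that of a single marker with effect size target_d."""
    # eps guards against floating-point noise when the ratio is an integer.
    return math.ceil(target_d ** 2 / d ** 2 - eps)


for d in (0.2, 0.3, 0.4, 0.5, 0.8):
    print(d, markers_needed(d))
# 0.2 16
# 0.3 8
# 0.4 4
# 0.5 3
# 0.8 1
```

Even in this idealized independent case, markers with an effect size of 0.2 would be needed in the teens. Any positive correlation among them would push the count higher, because 2 markers with effect size d and within-group correlation ρ contribute only 2d²/(1 + ρ) to the squared distance.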
DISCUSSION
Our main findings are fourfold. First, we suggest that the impact of new biomarkers on risk prediction models can be reliably assessed using an appropriate simulation model, which can serve as a screening tool before expensive prospective studies on actual data. We have shown that such a model can be constructed to closely match the performance observed in the real data. This finding is particularly important at present, when multitudes of biomarkers are readily or potentially available but research resources need to be used wisely. Our approach can help direct resources toward the most promising markers and thereby avoid many futile studies. We suggest that the planning of studies of new biomarkers include simulation results (similar to those presented here) as a key preliminary step, in addition to statistical power calculations for detecting simple associations.
Second, the proposed approach enables analysis of a “synthetic” cohort that does not require access to patient-level data and instead uses aggregate summaries to construct the cohort. This strategy helps address concerns about disclosing protected health information and makes the approach suitable for applications in genetics and genomics as well as for studies conducted within health systems. Differential privacy may be considered to further mitigate the privacy concerns involved in such a synthetic data generation process.11
Third, we observe that effect size and correlation play an important role in the impact of new biomarkers on risk prediction models. Given the same correlation, markers with larger effect sizes tend to perform better. However, among markers with the same strength and direction of effect, those that are conditionally (within events and non-events) negatively correlated with the standard predictors offer the largest improvement in model performance. This runs counter to the more intuitive notion that uncorrelatedness is the most desirable feature of new markers, and it has been justified theoretically by Demler et al.12 Still, conditional uncorrelatedness is preferable to positive correlation.
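A stylized two-variable calculation illustrates the direction of this effect. Summarize the existing risk factors by a single normal score with standardized separation δ₁ between events and non-events. Let the new marker have separation δ₂ and within-group correlation ρ with that score, with both variables having unit variance and a common covariance in the two groups. The squared Mahalanobis distance is then (δ₁² + δ₂² − 2ρδ₁δ₂)/(1 − ρ²), and the AUC of the best linear combination is Φ(Δ/√2). The values δ₁ = 1.2 (which gives a baseline AUC near 0.80) and δ₂ = 0.5 are hypothetical choices, and this simplification is not the simulation model used in this study.

```python
import math
from statistics import NormalDist


def mahalanobis_sq(delta_old, delta_new, rho):
    """Squared Mahalanobis distance for two unit-variance normal predictors
    with standardized mean differences delta_old, delta_new and
    within-group correlation rho (same covariance in both groups)."""
    return (delta_old**2 + delta_new**2 - 2 * rho * delta_old * delta_new) / (1 - rho**2)


def binormal_auc(d2):
    """AUC of the optimal linear score under equal-covariance normality."""
    return NormalDist().cdf(math.sqrt(d2 / 2))


delta_old, delta_new = 1.2, 0.5  # hypothetical values
base_auc = binormal_auc(delta_old**2)
gains = {rho: binormal_auc(mahalanobis_sq(delta_old, delta_new, rho)) - base_auc
         for rho in (-0.2, 0.0, 0.2)}
for rho, gain in gains.items():
    print(f"rho = {rho:+.1f}: delta AUC = {gain:.3f}")
```

The ordering of the gains (largest for ρ = −0.2, smallest for ρ = 0.2) mirrors the pattern in Table 2.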
Fourth, our simulations suggest that adding large numbers of weak markers is unlikely to meaningfully improve CVD risk prediction models. The more markers we consider, the more likely they are to be (positively) correlated, diminishing their impact. Combining correlated markers into marker scores can also work,13 as long as the direction of the correlations is the same.
Our results may shed new light on some controversies surrounding the usefulness of new biomarkers. For example, several reports suggested that it might be worthwhile to add C-reactive protein to CVD risk functions,14–16 whereas others claimed the contrary.17,18 Our results show that even if the independent effect of a biomarker is moderately large, a weak positive correlation with other risk predictors would be sufficient to greatly diminish its incremental prognostic value in a fully adjusted model.
Our study has several limitations. First, the performance of our simulation method was illustrated using models from the Framingham Heart Study. Whether it works equally well for other risk prediction models needs to be verified on a case-by-case basis. Fortunately, such verification should not be difficult if the model of interest and its component predictors are readily available. Second, our simulations assumed multivariate normality within event subgroups, which may not hold in all applications. In our experience, however, most continuous predictors can be transformed to be approximately normal, and the methods used are fairly robust to moderate violations of normality. Third, we used simple correlation structures, in which the correlations of the new marker with the standard predictors are fixed. It is conceivable that more complex correlation structures could yield results not covered in our work (see Bansal and Pepe19); however, our intention here was to be illustrative rather than exhaustive.
In conclusion, we propose that researchers involved in biomarker research use simulation models (similar to the one proposed here) before embarking on larger, expensive initiatives based on actual data. Our results suggest that the search for new biomarkers should be mindful of the interplay among their number, effect size, and correlation structure.
FUNDING
This work was supported by the National Heart, Lung, and Blood Institute’s Framingham Heart Study (contracts N01-HC-25195 and HHSN268201500001I). Dr Vasan is supported in part by the Evans Scholar award and Jay and Louise Coffman Endowment, Department of Medicine, Boston University School of Medicine.
Conflict of interest statement. Dr Michael Pencina declares grants from Sanofi/Regeneron to Duke. Other authors have no competing interests to declare.
CONTRIBUTORS
All authors made substantial contributions to the conception and design of the work; Drs. K. and M. Pencina analyzed and interpreted data and drafted the work. Drs. D’Agostino and Vasan revised it critically for important intellectual content; all authors gave final approval of the version to be published. Dr M. Pencina agrees to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
References
- 1. Hlatky MA, Greenland P, Arnett DK, Ballantyne CM, Criqui MH, Elkind MSV. Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association. Circulation 2009; 119(17): 2408–16.
- 2. Kinney SK, Reiter JP, Reznek AP, Miranda J, Jarmin RS, Abowd JM. Towards unrestricted public use business microdata: the synthetic Longitudinal Business Database. Int Stat Rev 2011; 79(3): 362–84.
- 3. D’Agostino RB, Vasan RS, Pencina MJ, et al. General cardiovascular risk profile for use in primary care. Circulation 2008; 117(6): 743–53.
- 4. Box GEP, Cox DR. An analysis of transformations. J R Stat Soc Ser B 1964; 26: 211–52.
- 5. Hershberger SL. Tetrachoric correlation. In: Balakrishnan N, Colton T, Everitt B, Piegorsch W, Ruggeri F, Teugels JL, eds. Wiley StatsRef: Statistics Reference Online. 2014. doi:10.1002/9781118445112.stat06172.
- 6. Kornbrot D. Point biserial correlation. In: Balakrishnan N, Colton T, Everitt B, Piegorsch W, Ruggeri F, Teugels JL, eds. Wiley StatsRef: Statistics Reference Online. 2014. doi:10.1002/9781118445112.stat06227.
- 7. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
- 8. Harrell FE, Lee KL, Mark DB. Tutorial in biostatistics: multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996; 15(4): 361–87.
- 9. Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008; 27(2): 157–72.
- 10. Pencina MJ, D’Agostino RB, Pencina KM, Janssens AC, Greenland P. Interpreting incremental value of markers added to risk prediction models. Am J Epidemiol 2012; 176(6): 473–81.
- 11. Dwork C, McSherry F, Nissim K, Smith A. Calibrating noise to sensitivity in private data analysis. In: Halevi S, Rabin T, eds. Proceedings of the Third Conference on Theory of Cryptography (TCC’06). Berlin: Springer; 2006: 265–84.
- 12. Demler OV, Pencina MJ, D’Agostino RB. Impact of correlation on predictive ability of biomarkers. Stat Med 2013; 32(24): 4196–210.
- 13. Ganz P, Heidecker B, Hveem K, et al. Development and validation of a protein-based risk score for cardiovascular outcomes among patients with stable coronary heart disease. JAMA 2016; 315(23): 2532–41.
- 14. Wilson PW, Pencina M, Jacques P, Selhub J, D’Agostino R Sr, O’Donnell CJ. C-reactive protein and reclassification of cardiovascular risk in the Framingham Heart Study. Circ Cardiovasc Qual Outcomes 2008; 1(2): 92–7.
- 15. Ridker PM, Buring JE, Rifai N, Cook NR. Development and validation of improved algorithms for the assessment of global cardiovascular risk in women: the Reynolds risk score. JAMA 2007; 297(6): 611–9.
- 16. Ridker PM, Paynter NP, Rifai N, Gaziano JM, Cook NR. The Reynolds risk score for men. Circulation 2008; 118(22): 2243–51.
- 17. The Emerging Risk Factors Collaboration. C-reactive protein, fibrinogen, and cardiovascular disease prediction. N Engl J Med 2012; 367(14): 1310–20.
- 18. Graham I, Atar D, Borch-Johnsen K, et al. European guidelines on cardiovascular disease prevention in clinical practice: executive summary. Eur Heart J 2007; 28(19): 2375–414.
- 19. Bansal A, Pepe MS. When does combining markers improve classification performance and what are implications for practice? Stat Med 2013; 32(11): 1877–92.

