Lokker et al presented an interesting model to predict citation counts for clinical articles.1 The topic is so important that the paper will probably attract many citations itself. We want to clarify some of the nomenclature of prediction model validation, to avoid confusion in future reporting.
After excluding outliers with >150 citations, the authors randomly divided 1274 articles into a derivation dataset of 757 articles for development of a prediction model and a validation dataset of 504 articles for testing. This procedure is an example of a split sample approach, but the authors refer to it as cross validation. Cross validation would mean developing a model in the first part of the data and testing it in the second, and then repeating the procedure with development in the second part and testing in the first.
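To make the distinction concrete, here is a minimal sketch in Python on simulated data; the sample sizes, predictors, and linear model are illustrative assumptions, not the authors' data or method:

```python
# Contrast a split sample analysis with two-fold cross validation on
# simulated data; predictors and outcome are purely illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                  # hypothetical predictors
y = X @ rng.normal(size=20) + rng.normal(size=1000)

idx = rng.permutation(len(y))
a, b = idx[:600], idx[600:]                      # development / validation parts

# Split sample: develop on part A, test once on part B, and stop there.
model_a = LinearRegression().fit(X[a], y[a])
print("split sample R2:", r2_score(y[b], model_a.predict(X[b])))

# Two-fold cross validation: additionally develop on part B and test on
# part A, then combine the two out-of-sample assessments.
model_b = LinearRegression().fit(X[b], y[b])
cv_r2 = (r2_score(y[b], model_a.predict(X[b])) +
         r2_score(y[a], model_b.predict(X[a]))) / 2
print("two-fold cross validated R2:", cv_r2)
```

In cross validation every article is used for testing exactly once; in a split sample each article serves only one purpose, either development or testing.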
The authors report that the explained variation (R²) decreased from 0.60 at development to 0.56 at validation, and refer to this decrease as shrinkage. Shrinkage is not an appropriate term for this decrease; a better label is optimism.2 3 Optimism refers to the tendency of prediction models to perform more poorly in new data than in the data in which they were developed; it occurs especially when many predictors are considered in relatively small datasets.4
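Optimism can be quantified at model development, for instance with the bootstrap procedure described by Harrell et al.2 3 A hedged sketch, again on simulated data with illustrative sizes:

```python
# Bootstrap estimate of optimism in R2, in the spirit of refs 2 and 3;
# the dataset and number of resamples are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
y = X @ rng.normal(size=20) + rng.normal(size=300)

apparent = r2_score(y, LinearRegression().fit(X, y).predict(X))

optimism, n_boot = 0.0, 200
for _ in range(n_boot):
    i = rng.integers(0, len(y), size=len(y))     # resample with replacement
    m = LinearRegression().fit(X[i], y[i])
    # Apparent performance in the bootstrap sample minus performance of the
    # same model in the original data estimates how inflated apparent R2 is.
    optimism += r2_score(y[i], m.predict(X[i])) - r2_score(y, m.predict(X))
optimism /= n_boot

print("apparent R2:", apparent)
print("optimism-corrected R2:", apparent - optimism)
```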
Ironically, the need for shrinkage is well illustrated in the authors' figure 2, where the residuals are generally positive for low predictions (which were thus often too low) and generally negative for high predictions (which were often too high).1 Shrinkage should be applied to the regression coefficients to obtain more reliable predictions.2 4 5
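One simple form of shrinkage, sketched below on simulated data, multiplies the coefficients by a calibration slope estimated from held-out predictions; this uniform shrinkage factor is only one of several approaches discussed in the cited work,2 4 5 and all names and numbers here are illustrative:

```python
# Uniform shrinkage of regression coefficients via a calibration slope
# (illustrative sketch; a real analysis might estimate the factor by
# cross validation or bootstrapping instead, refs 2, 4, 5).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))
y = X @ rng.normal(size=20) + rng.normal(size=300) * 3

half = len(y) // 2
model = LinearRegression().fit(X[:half], y[:half])
pred = model.predict(X[half:])

# Regress observed outcomes on held-out predictions; a slope below 1
# reflects the residual pattern the authors show in their figure 2.
slope = LinearRegression().fit(pred.reshape(-1, 1), y[half:]).coef_[0]

coef_shrunk = slope * model.coef_                # pull coefficients to zero
intercept_shrunk = y[:half].mean() - coef_shrunk @ X[:half].mean(axis=0)
pred_shrunk = X[half:] @ coef_shrunk + intercept_shrunk
print("calibration slope (shrinkage factor):", slope)
```

A slope below 1 pulls low predictions up and high predictions down, which is exactly the correction that the residual pattern in figure 2 calls for.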
How valid is this model to predict citations? Firstly, the authors did not shrink the regression coefficients, which implies that high predictions will be too high and low predictions too low for articles fulfilling the inclusion criteria. Secondly, for a future article we cannot know beforehand whether it will be an outlier, ie, attract >150 citations. Exclusion of outliers at validation is artificial and should not have been done; it has inflated the R² of the model. As always with prediction models, future validation is required and may reveal disappointing performance.
Competing interests: None declared.
References
1. Lokker C, McKibbon KA, McKinlay RJ, Wilczynski NL, Haynes RB. Prediction of citation counts for clinical articles at two years using data available within three weeks of publication: retrospective cohort study. BMJ 2008;336:655-7. (22 March.)
2. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361-87.
3. Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol 2001;54:774-81.
4. Steyerberg EW, Eijkemans MJ, Harrell FE Jr, Habbema JD. Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets. Med Decis Making 2001;21:45-56.
5. Copas JB. Regression, prediction and shrinkage. J R Stat Soc Ser B 1983;45:311-54.