Rücker et al. (1) responded to our study of selective cutoff reporting in studies of diagnostic test accuracy (2), agreeing that cutoff selection is indeed a problem in primary studies and meta-analyses of diagnostic test accuracy. They described a model that they have developed, the multiple-cutoffs model, which allows multiple cutoffs per study to be included in a single analysis of aggregate data by modeling the distribution function of test scores rather than each point of the receiver operating characteristic curve separately (3). Using their multiple-cutoffs model, they reanalyzed our data in 2 ways: first, they included only cutoffs that were published in the original primary studies; second, they included all data from our individual participant data (IPD) meta-analysis, using all cutoffs for all studies (1). They found that, based on their model, sensitivities and specificities were similar in both sets of analyses and closely approximated the results of our bivariate random-effects IPD meta-analyses, which likewise included all cutoffs for all studies (1). Rücker et al. also found that, using their model, confidence intervals for sensitivity estimates tended to be narrower than our confidence intervals, and they attributed this finding to the fact that their model uses data from all available studies and cutoffs simultaneously (1). They concluded that, by using their model, they were able to approximate our IPD results even when using diagnostic accuracy data only from published cutoffs (1).
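In broad strokes, and using our own illustrative notation rather than that of Steinhauser et al. (3), the model assumes that test scores among nondiseased (group 0) and diseased (group 1) participants in study $i$ follow parametric distributions, so that specificity and sensitivity at any cutoff $c$ are read off 2 distribution functions instead of being estimated separately at each reported point:

$$\mathrm{Sp}_i(c) = F\!\left(\frac{c - \mu_{0i}}{\sigma_{0i}}\right), \qquad \mathrm{Se}_i(c) = 1 - F\!\left(\frac{c - \mu_{1i}}{\sigma_{1i}}\right),$$

where $F$ is a standard distribution function (e.g., logistic or normal) and the study-specific location and scale parameters $\mu_{gi}$ and $\sigma_{gi}$ are given random effects, so that every reported cutoff from every study informs a single pooled pair of curves. This is only a sketch of the general idea; the model actually fitted is described in detail by Steinhauser et al. (3).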
It would be highly advantageous to be able to use a modeling approach to approximate the performance of diagnostic tests across thresholds when some primary studies do not report all relevant cutoff data. The degree to which this can be done accurately, however, depends on the validity of the model's assumptions. Most of the studies included in our IPD meta-analysis (11 of 13) published accuracy results for the standard Patient Health Questionnaire–9 cutoff score of 10, which was also the strongest-performing cutoff for maximizing combined sensitivity and specificity (2). Accuracy data from primary studies were missing symmetrically on either side of this cutoff. Thus, although we identified what appeared to be biased reporting of results from some cutoffs and not others, the pattern of missing accuracy data may not have been typical, both because of its symmetry and because the cutoff recognized as standard in the field also appears to be the best-performing cutoff. There are other examples from study-level meta-analyses of depression screening tools in which many included studies did not report data from the cutoff considered standard, presumably because that cutoff performed poorly. For instance, in the largest existing meta-analysis of the diagnostic accuracy of the Hospital Anxiety and Depression Scale for detecting major depressive disorder (4), the authors attempted to assess accuracy at the standard cutoff score of 8, but results for this cutoff were published for just over half of otherwise eligible studies.
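To make the concern concrete, consider a toy simulation of selective cutoff reporting. Everything here is a hypothetical illustration: the score distributions, study sizes, and the "publish only near-best cutoffs" rule are our assumptions, not features of our IPD data set, of reference 4, or of the multiple-cutoffs model.

```python
import numpy as np

rng = np.random.default_rng(42)

N_STUDIES = 13          # hypothetical; chosen only to mirror the scale of the IPD meta-analysis
CUTOFFS = range(8, 13)  # candidate cutoffs around a "standard" score of 10

def simulate_study(n_cases=50, n_controls=200):
    """Simulate questionnaire-like scores for one hypothetical primary study."""
    cases = rng.normal(14, 4, n_cases)       # assumed score distribution, disordered group
    controls = rng.normal(6, 4, n_controls)  # assumed score distribution, nondisordered group
    return cases, controls

def accuracy_at(cases, controls, cutoff):
    """Sensitivity and specificity, treating 'score >= cutoff' as a positive screen."""
    return np.mean(cases >= cutoff), np.mean(controls < cutoff)

# Accuracy at every cutoff for every simulated study: (study, cutoff, sens, spec).
all_results = []
for s in range(N_STUDIES):
    cases, controls = simulate_study()
    for c in CUTOFFS:
        sens, spec = accuracy_at(cases, controls, c)
        all_results.append((s, c, sens, spec))

# Crude selective-reporting rule (an assumption made purely for illustration):
# each study "publishes" a cutoff only if Youden's index (sens + spec - 1) there
# falls within 0.05 of that study's best cutoff.
published = []
for s in range(N_STUDIES):
    rows = [r for r in all_results if r[0] == s]
    best = max(r[2] + r[3] - 1 for r in rows)
    published += [r for r in rows if r[2] + r[3] - 1 >= best - 0.05]

# Naive pooled Youden's index per cutoff: all simulated data vs. published data only.
for c in CUTOFFS:
    full = np.mean([r[2] + r[3] - 1 for r in all_results if r[1] == c])
    pub = [r[2] + r[3] - 1 for r in published if r[1] == c]
    pooled_pub = np.mean(pub) if pub else float("nan")
    print(f"cutoff {c:2d}: Youden (all data) = {full:.2f}, (published only) = {pooled_pub:.2f}")
```

At cutoffs far from the best-performing score, only studies in which those cutoffs happened to look good contribute to the "published" pool, so naive pooled estimates there are inflated. This is the kind of skewed missingness pattern under which any aggregate-data model would need to be evaluated.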
The model developed by Rücker et al. is promising. However, it involves several unknowns, and it will be important to test how well it replicates the results of full IPD meta-analyses when accuracy data are published for only a limited set of cutoffs in the original primary studies. In particular, it will need to be shown to perform well in meta-analyses that are not anchored by robust data at the best-performing cutoff and in which included data sets have more skewed patterns of missing accuracy data.
Acknowledgments
B.L., A.B., and B.D.T. contributed to drafting the response to the letter to the editor.
B.L. was supported by a Canadian Institutes of Health Research (CIHR) Frederick Banting and Charles Best Canada Graduate Scholarship doctoral award. A.B. and B.D.T. were supported by Fonds de recherche du Québec – Santé (FRQS) researcher salary awards.
Conflict of interest: none declared.
References
- 1. Rücker G, Steinhauser S, Schumacher M. Re: "Selective cutoff reporting in studies of diagnostic test accuracy: a comparison of traditional and individual patient data meta-analyses of the Patient Health Questionnaire-9 depression screening tool." Am J Epidemiol. 2017;186(7):894–895.
- 2. Levis B, Benedetti A, Levis AW, et al. Selective cutoff reporting in studies of diagnostic test accuracy: a comparison of conventional and individual-patient-data meta-analyses of the Patient Health Questionnaire-9 depression screening tool. Am J Epidemiol. 2017;185(10):954–964.
- 3. Steinhauser S, Schumacher M, Rücker G. Modelling multiple thresholds in meta-analysis of diagnostic test accuracy studies. BMC Med Res Methodol. 2016;16(1):97.
- 4. Brennan C, Worrall-Davies A, McMillan D, et al. The Hospital Anxiety and Depression Scale: a diagnostic meta-analysis of case-finding ability. J Psychosom Res. 2010;69(4):371–378.