Abstract
An overview is presented of the rationale, design, and analysis plan for the WMH‐CIDI clinical calibration studies. As no clinical gold standard assessment is available for the DSM‐IV disorders assessed in the WMH‐CIDI, we adopted the goal of calibration rather than validation; that is, we asked whether WMH‐CIDI diagnoses are ‘consistent’ with diagnoses based on a state‐of‐the‐art clinical research diagnostic interview (SCID; Structured Clinical Interview for DSM‐IV) rather than whether they are ‘correct’. Consistency is evaluated both at the aggregate level (consistency of WMH‐CIDI and SCID prevalence estimates) and at the individual level (consistency of WMH‐CIDI and SCID diagnostic classifications). Although conventional statistics (sensitivity, specificity, Cohen's κ) are used to describe diagnostic consistency, an argument is made for considering the area under the receiver operating characteristic curve (AUC) to be a more useful general‐purpose measure of consistency. In addition, more detailed analyses are used to evaluate consistency on a substantive level. These analyses begin by estimating prediction equations in a clinical calibration subsample, with WMH‐CIDI symptom‐level data used to predict SCID diagnoses, and then use the coefficients from these equations to assign predicted probabilities of SCID diagnoses to each respondent in the remainder of the sample. Substantive analyses then investigate whether estimates of prevalence and associations based on WMH‐CIDI diagnoses are consistent with those based on predicted SCID diagnoses. Multiple imputation is used to adjust estimated standard errors for the imprecision introduced by SCID diagnoses being imputed under a model rather than measured directly. A brief illustration of this approach compares the precision of SCID and predicted SCID estimates of prevalence and correlates under varying sample designs. Copyright © 2004 Whurr Publishers Ltd.
Keywords: clinical calibration, concordance, epidemiologic research design, reliability, validity
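The concordance statistics and the multiple‐imputation adjustment named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names `concordance` and `pool_rubin` and their argument names are hypothetical, diagnoses are assumed to be coded 0/1, and the sketch ignores the sample weights and complex survey design that a real calibration study would need to handle. For a dichotomous classification the ROC curve has a single operating point, so the AUC reduces to the average of sensitivity and specificity. The pooling step uses the standard combining rules from Rubin (1987).

```python
from math import sqrt
from statistics import mean, variance


def concordance(cidi, scid):
    """Agreement between two lists of 0/1 diagnoses (hypothetical helper).

    `cidi` holds the survey-interview diagnoses and `scid` the clinical
    reappraisal diagnoses for the same respondents, in the same order.
    """
    pairs = list(zip(cidi, scid))
    n = len(pairs)
    tp = sum(1 for c, s in pairs if c == 1 and s == 1)
    fp = sum(1 for c, s in pairs if c == 1 and s == 0)
    fn = sum(1 for c, s in pairs if c == 0 and s == 1)
    tn = sum(1 for c, s in pairs if c == 0 and s == 0)

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    # AUC of a dichotomous classifier (single ROC operating point).
    auc = (sensitivity + specificity) / 2

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
        "auc": auc,
    }


def pool_rubin(estimates, variances):
    """Combine m imputed-data estimates with Rubin's rules.

    `estimates` are the point estimates (for example, prevalence) from each
    imputed data set; `variances` are their squared standard errors.
    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # variance across imputations (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)
```

Calling `concordance(cidi, scid)` on the calibration subsample would give the individual‐level statistics. `pool_rubin` would be applied to estimates computed from several data sets in which SCID diagnoses were imputed from the predicted probabilities, so that the reported standard error reflects the imputation uncertainty.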