Validation of Computerized Adaptive Testing in an Outpatient Non-academic Setting: the VOCATIONS Trial

Eric Daniel Achtyes; Scott Halstead; LeAnn Smart; Tara Moore; Ellen Frank; David J Kupfer; Robert Gibbons

doi:10.1176/appi.ps.201400390

Author manuscript; available in PMC: 2016 Jun 16.

Published in final edited form as: Psychiatr Serv. 2015 Jun 1;66(10):1091–1096. doi: 10.1176/appi.ps.201400390

PMCID: PMC4910384 NIHMSID: NIHMS695020 PMID: 26030317

Abstract

Objective

Computerized adaptive tests (CAT) provide an alternative to fixed-length assessments for diagnostic screening and severity measurement of psychiatric disorders. We sought to cross-sectionally validate a suite of computerized adaptive tests for mental health (CAT-MH) in a community psychiatric sample.

Methods

145 adult psychiatric outpatients and controls were prospectively evaluated with CAT for depression, mania and anxiety symptoms, compared to gold-standard psychiatric assessments including: Structured Clinical Interview for DSM IV-TR (SCID), Hamilton Rating Scale for Depression (HAM-D₂₅), Patient Health Questionnaire (PHQ-9), Center for Epidemiologic Studies Depression Scale (CES-D), and Global Assessment of Functioning (GAF).

Results

Sensitivity and specificity for the computerized adaptive diagnostic test for depression (CAD-MDD) were .96 and .64, respectively (.96 and 1.00 for major depression versus controls). CAT for depression severity (CAT-DI) correlated well to standard depression scales HAM-D₂₅ (r=.79), PHQ-9 (r=.90), CES-D (r=.90) and had OR=27.88 for current SCID major depressive disorder diagnosis across its range. CAT for anxiety severity (CAT-ANX) correlated to HAM-D₂₅ (r=.73), PHQ-9 (r=.78), CES-D (r=.81), and had OR=11.52 for current SCID generalized anxiety disorder diagnosis across its range. CAT for mania severity (CAT-MANIA) did not correlate well to HAM-D₂₅ (r=.31), PHQ-9 (r=.37), CES-D (r=.39), but had an OR=11.56 for a current SCID bipolar diagnosis across its range. Participants found the CAT-MH suite of tests acceptable and easy to use, averaging 51.7 items and 9.4 minutes to complete the full battery.

Conclusions

Compared to current gold-standard diagnostic and assessment measures, CAT-MH provides an effective, rapidly-administered assessment of psychiatric symptoms.

INTRODUCTION

With expansion of Medicaid eligibility and passage of the Affordable Care Act, there is additional pressure on the mental healthcare system to efficiently and effectively provide mental health assessment and treatment for millions of additional people seeking care. As measurement-based care becomes the standard for assessment of illness severity and improvement with treatment, well-validated, affordable and quick measures are needed to help busy clinicians treat patients rapidly and effectively.

Computerized adaptive diagnosis (CAD) and testing (CAT) have the potential to provide rapid, systematic testing on a population level.¹⁻² The paradigm shift between traditional fixed-length tests and adaptive tests is that traditional tests fix the items and allow the measurement precision to vary, whereas adaptive tests fix measurement precision and allow the items to vary. The net result is that it is possible to extract the relevant information contained in a bank of hundreds of symptom-questions using only a small number of optimal items for each person. Depending on the application, the degree of required precision can be selected a priori, so that national screening programs can use less precision than clinic screening, which in turn may require less precision than a randomized clinical trial.

Application of CAT differs from standard assessments of symptom severity in several important ways. First, traditional scales may be hampered by a ‘practice effect’ due to retaking the same measure repeatedly over time. Because CAT adapts to the current severity level of a patient, these practice effects are eliminated: patients receive different items each time the test is administered. Second, for repeated assessments, traditional tests make no use of the information contained in the preceding test administrations. By contrast, in CAT, the last CAT-based severity measure can be used to start the next CAT, selecting the next most informative item conditional on the previous session’s estimated severity level. Third, traditional measurement provides a score (typically the sum of the item scores) but no estimate of uncertainty in the score for a given patient. The standard approach of computing a total score also adds potential bias because items with different numbers of response categories (e.g., in the Hamilton Rating Scale for Depression) are weighted differently when computing a total score (i.e., an item with 2 categories receives less weight than an item with 5 categories). Because CAT is based on an underlying statistical model of measurement (item response theory – IRT), the number of categories no longer differentially weights the importance of an item in computing the severity score, and each estimated score has a corresponding uncertainty estimate. Item response theory produces the estimate of uncertainty, and CAT mandates that all patients are tested until they achieve the desired level of uncertainty; hence all subjects are tested with the same level of precision. Traditional tests lack this desirable statistical property. Please see the online appendix for additional explanation of CAT/IRT principles.
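The adaptive loop described above (select the most informative item at the current severity estimate, update the estimate, stop once the target precision is reached) can be sketched in a few lines. This is only an illustration under loud assumptions, not the CAT-MH algorithm: it uses a unidimensional two-parameter logistic (2PL) model with yes/no items, a randomly generated item bank, a standard normal prior, and a simulated respondent. The names `run_cat`, `se_target`, `max_items` and `start_theta` are hypothetical.

```python
import math
import random

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at severity theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at severity theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def posterior_mean_sd(grid, weights):
    """Mean and standard deviation of a discretized posterior."""
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.3, max_items=30, start_theta=0.0):
    """Ask items until the posterior SD is <= se_target or max_items is hit."""
    grid = [-4.0 + 0.1 * i for i in range(81)]
    # Assumed prior: unit-variance normal centred on start_theta
    # (for a repeat visit this could be the previous session's estimate).
    weights = [math.exp(-0.5 * (t - start_theta) ** 2) for t in grid]
    theta, se = posterior_mean_sd(grid, weights)
    remaining = list(range(len(bank)))
    asked = []
    while remaining and len(asked) < max_items and se > se_target:
        # Pick the unused item that is most informative at the current estimate.
        best = max(remaining, key=lambda i: item_information(theta, *bank[i]))
        remaining.remove(best)
        a, b = bank[best]
        response = answer(best)
        # Bayesian update of the posterior over the severity grid.
        for k, t in enumerate(grid):
            p = p_endorse(t, a, b)
            weights[k] *= p if response else (1.0 - p)
        theta, se = posterior_mean_sd(grid, weights)
        asked.append(best)
    return theta, se, asked

# Hypothetical item bank: (discrimination a, severity threshold b) pairs.
rng = random.Random(0)
bank = [(rng.uniform(1.0, 2.5), rng.uniform(-3.0, 3.0)) for _ in range(300)]
true_theta = 1.0  # simulated respondent's true severity

def simulated_patient(i):
    a, b = bank[i]
    return rng.random() < p_endorse(true_theta, a, b)

theta_hat, se, asked = run_cat(bank, simulated_patient)
print(f"estimate={theta_hat:.2f} sd={se:.2f} items={len(asked)}")
```

In this sketch the precision is fixed in advance and the number of items varies by respondent, which is the property the paragraph above contrasts with fixed-length scales.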

It is also important to note that severity measurement and diagnosis are two very different operations. In severity measurement, we seek to maximize information surrounding the symptom severity of the patient. In diagnosis, we seek to maximize information at the threshold above which the probability of the diagnosis exceeds 50%. Gibbons et al.³ have developed a computerized adaptive diagnostic screener for depression (CAD-MDD). They found that the CAD-MDD could ascertain a diagnosis of major depressive disorder with sensitivity of .95 and specificity of .87 using an average of 4 questions, taking less than one minute to administer (mean of 46 ± 29 sec), making it an exceedingly rapid and effective screener.

If shown to be valid across a wide variety of patient populations, these tools could fill a key void by allowing automated testing of millions of people with a quick, easily administered online tool. Standard scales like the PHQ-9 have been validated in a wide range of treatment settings. The CAT for depression severity (CAT-DI), the CAD-MDD (depression diagnostic screener) and the CAT-ANX (anxiety severity) have been validated previously in both an academic center and a non-psychiatric community hospital. To assess the validity and potential impact of these tests in general outpatient community psychiatric practice, and to provide initial validation of the CAT-MANIA (mania severity), we sought in this study to validate the CAT-MH (mental health) suite of tests in a non-academic, community sample of adult psychiatric outpatients.

METHODS

The item bank and original calibration sample

The original studies developed a 1008-item question bank consisting of 452 depression items, 467 anxiety items, and 89 bipolar items. Separate CATs were developed for each of these three primary domains. The items were selected based on a review of over 100 existing depression or depression-related rating scales, with most items modified to refer to the previous two-week time period and self-rated on a 5-point ordinal scale. These tools and methods have been described in detail elsewhere¹⁻¹¹ and have been previously validated in an academic center (University of Pittsburgh psychiatric clinics) and a non-psychiatric community general medical hospital (DuBois Regional Medical Center).

Validation sample

A prospective cross-sectional validation study of the CAT-MH suite of tests was conducted at the outpatient clinics of Pine Rest Christian Mental Health Services located in Grand Rapids, Michigan between April 18, 2012 and March 29, 2013. Pine Rest is a large, not-for-profit, free-standing psychiatric system with a spectrum of comprehensive psychiatric services ranging from inpatient to partial hospitalization, including a network of outpatient clinics in the surrounding community. Pine Rest outpatient clinics serve a population including: 63.60% of patients with a commercial insurance plan, 12.48% self pay, 12.03% with Medicare, and 11.89% from community mental health contracts (e.g. uninsured) or Medicaid. This study was conducted in compliance with the ethical principles of the Declaration of Helsinki, the U.S. Food and Drug Administration guidelines and the International Conference on Harmonisation Good Clinical Practices Guidelines. The human participants review board at Mercy Health Saint Mary’s approved the study, and individuals signed a written informed consent prior to initiation of any study procedures.

Participants were a convenience sample of women and men, aged 18–70, who presented to Pine Rest Christian Mental Health Services clinics seeking care, as well as healthy controls. Participants were recruited using IRB-approved advertisements in clinic waiting rooms as well as on the Pine Rest website. Patients had to be willing and able to sign informed consent in order to participate. Exclusion criteria were: schizophrenia, schizoaffective disorder or other psychotic disorder; organic mood disorder due to a general medical condition or substance use disorder; drug or alcohol dependence in the prior 3 months; illness severe enough to require inpatient hospitalization due to suicide risk or psychosis; and Alzheimer’s or Parkinson’s disease.

Upon signing informed consent, participants were administered the following assessments by trained raters blinded to the patients’ clinical diagnoses prior to evaluation: Structured Clinical Interview for DSM IV-TR (SCID),¹² the Hamilton Rating Scale for Depression (HAM-D₂₅),¹³ Patient Health Questionnaire (PHQ-9),¹⁴ Center for Epidemiologic Studies Depression Scale (CES-D),¹⁵ Global Assessment of Functioning (GAF)¹⁶ as well as a demographics questionnaire and a study participation evaluation. Participants also took the most recent version of the CAT-MH which contains the depression, anxiety, and mania/hypomania components of the entire 1008 item bank, including: the CAD-MDD for current depression diagnosis, CAT-DI for current depression severity, CAT-ANX for current anxiety severity, and CAT-MANIA for current manic/hypomanic symptom severity. CAT-MH depression, anxiety, and mania scores were correlated with SCID, HAM-D₂₅, CES-D, and PHQ-9 scores, and DSM IV-TR cases of depression, anxiety and bipolar disorders.

Statistical methods

Sample size computations were conducted to determine the ability to find significant differences in sensitivity and specificity between the original findings for the CAD-MDD and the results of this validation study. Assuming a Type I error rate of 5% and power of 80%, n=150 permits detection of approximately 10% differences in sensitivity (.95 vs .86) and specificity (.87 vs .75).

Data analysis was performed by the senior author (XXX) at the University of XXXXXXX who takes full responsibility for the accuracy of the analysis. The goal was to test the reproducibility of previous analyses of sensitivity, specificity, correlation with gold-standard symptom severity scales (HAM-D, CES-D, and PHQ-9) in this community sample. Logistic regression was used to relate severity scores to the presence or absence of DSM IV-TR diagnoses.
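To make the link between a logistic regression slope and a per-unit odds ratio concrete, the following sketch fits a one-predictor logistic model by Newton-Raphson and exponentiates the slope. It is an illustration only: the data are synthetic, the "true" coefficients are invented for the simulation, and the function name `fit_logistic` is hypothetical. The software actually used for the study's analysis is not described in the text.

```python
import math
import random

def fit_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient, intercept
            g1 += (yi - p) * xi     # gradient, slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic severity scores on a -2..2 metric and simulated diagnoses.
rng = random.Random(1)
scores = [rng.uniform(-2.0, 2.0) for _ in range(500)]
true_b0, true_b1 = -1.0, 1.0  # assumed values, used only to simulate data
diagnosis = [
    1 if rng.random() < 1.0 / (1.0 + math.exp(-(true_b0 + true_b1 * s))) else 0
    for s in scores
]

b0, b1 = fit_logistic(scores, diagnosis)
odds_ratio_per_unit = math.exp(b1)  # OR for a one-unit increase in score
print(f"slope={b1:.2f} OR per unit={odds_ratio_per_unit:.2f}")
```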

RESULTS

Participants

A total of 150 patients signed informed consent. Four did not meet inclusion criteria and 1 withdrew consent. One hundred forty-five patients completed all testing and were included in the analysis. See online eFigure 1. All participants received the CAT-MH suite of tests (CAD-MDD for current depression screening, and CAT for current depression, anxiety, and mania severity), a SCID for DSM IV-TR diagnostic interview, and the HAM-D₂₅, CES-D, PHQ-9, and GAF scales.

Patient demographics

Of the 145 adult patients in the sample 79% were female; 10% were Hispanic; 90% were white, 5% were black, 3% were Asian, and 3% indicated other; 58% were married, 24% were never married, 5% were living with a partner, and the remainder were divorced (10%), separated (2%) or widowed (<1%). In terms of education, 40% had a college degree or more, 42% had some college and 16% had graduated high school or had a GED. See Table 1.

Table 1.

Demographics of Adult Community Outpatient Psychiatric Sample (N=145)

Age Group (in years)	N	%
18–29	24	16
30–39	25	17
40–49	49	34
50–59	33	23
60–70	14	10
Gender	N	%
Male	31	21
Female	114	79
Race	N	%
Caucasian	130	90
African American	7	5
Asian	4	3
Other	4	3
Marital Status	N	%
Married	84	58
Never married	34	24
Living with partner	8	5
Divorced	15	10
Separated	3	2
Widowed	1	< 1
Educational Status	N	%
College degree or more	57	40
Some college	61	42
High school diploma/GED	24	16
Some high school	3	2

Diagnoses

In terms of current DSM IV-TR diagnoses, the sample consisted of 27 patients with major depressive disorder, 27 patients with generalized anxiety disorder, 13 patients with bipolar I disorder, 11 patients with bipolar II disorder, 15 patients with dysthymic disorder, 2 with minor depression (depression, not otherwise specified), 16 patients with panic disorder, 6 patients with agoraphobia, 13 patients with social phobia, 9 patients with specific phobia, 11 patients with obsessive compulsive disorder, 12 patients with post traumatic stress disorder, and 15 patients with anxiety, not otherwise specified. There were 19 healthy controls, i.e., participants with no current or past history of a DSM IV-TR diagnosis. There were considerable comorbidities in this patient population, which explains why the sum of diagnoses exceeds the sample size. See Table 2.

Table 2.

Diagnostic Prevalence Rates for Current, Partial and Full Remission of DSM IV-TR Diagnoses based on Structured Clinical Interview for DSM IV-TR (SCID) (N=145)

	Total Current and Lifetime SCID Diagnoses	Currently Meets Full Criteria (Symptomatic)	Currently in Partial Remission	Currently Asymptomatic (Full Remission)
Bipolar I Disorder	20	13	3	4
Bipolar II Disorder	20	11	7	2
Bipolar Disorder, Not Otherwise Specified	3	1	0	2
Major Depressive Disorder	62	27	15	20
Dysthymic Disorder	15	15	0	0
Depressive Disorder, Not Otherwise Specified	6	2	0	4
Generalized Anxiety Disorder	27	27	0	0
Panic Disorder	34	16	12	6
Agoraphobia	8	6	2	0
Social Phobia	17	13	4	0
Obsessive Compulsive Disorder	15	11	1	3
Specific Phobia	16	9	5	2
Post Traumatic Stress Disorder	33	12	17	4
Anxiety Disorder, Not Otherwise Specified	19	15	0	4

CAD-MDD: a diagnostic screen for major depression

Given the high degree of pathology and comorbidity in the sample, it was expected that the high sensitivity seen in other studies would be replicated, but with lower specificity. This was found in the overall sample, where sensitivity was .96 (.95 in the original CAD-MDD study) and specificity was .64 (.87 in the original CAD-MDD study, which included a much greater number and proportion of controls). However, when the sample was restricted to patients meeting DSM IV-TR criteria for major depressive disorder in the past month and healthy controls, sensitivity remained at .96, but specificity increased to 1.00 (i.e., there were no false positives and only one false negative out of a total of 46 participants). These results are consistent with what would be expected in a primary care setting where the majority of patients would not meet criteria for a DSM IV-TR major depressive disorder.¹⁷,¹⁸ Note that this was achieved using an average of 4.1 questions and 36.1 seconds to complete.
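The restricted-sample figures can be reproduced from the counts given in the text. The sketch below assumes the 46 participants are the 27 with current major depressive disorder plus the 19 healthy controls reported in the Diagnoses section (27 + 19 = 46), with one false negative and no false positives; the function name is illustrative.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)

# Assumed split: 27 current MDD cases (1 missed), 19 healthy controls (none flagged).
sens, spec = sensitivity_specificity(tp=26, fn=1, tn=19, fp=0)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
# prints "sensitivity=0.96 specificity=1.00"
```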

CAT-DI: depression severity measure

The dimensional measure of depressive severity demonstrated correlations with traditional scales such as the HAM-D₂₅ (r=.79), PHQ-9 (r=.90), CES-D (r=.90), and GAF (r=−.70). The CAT-DI correlated highly with the CAT-ANX (r=.82), but less so with the CAT-MANIA (r=.38). In terms of its relationship with current DSM IV-TR major depressive disorder diagnosis, the CAT-DI had an odds ratio (OR) = 6.97 (3.14, 15.51), p<.001. This scale took an average of 16.8 items and 3.4 minutes to complete. See Table 3. For every unit increase in CAT-DI score, the likelihood of a current DSM IV-TR major depressive disorder diagnosis increases 7-fold. Given that the range of scores on the CAT-DI is from −2 to 2, the actual span gives an OR of 27.88, a 28-fold increase in probability of major depressive disorder from the low to the high end of the CAT-DI scale.

Table 3.

Correlation of CAT-DI, CAT-ANX and CAT-MANIA to Each Other and to 4 Traditional Scales (HAM-D₂₅, PHQ-9, CES-D, GAF), Odds Ratios for Corresponding Diagnoses (Current Major Depressive Disorder, Generalized Anxiety Disorder or Bipolar) and Average Number of Questions and Time to Complete (N=145)¹

	Correlations to Traditional Scales and to Each Other							Odds Ratio (OR²) for Corresponding DSM Diagnosis				Testing Length/Time to Complete
	HAM-D₂₅	PHQ-9	CES-D	GAF	CAT-ANX	CAT-MANIA	CAT-DI	DSM Diagnosis	OR	95% CI	p	Average # of questions	Average # of minutes
CAT-DI	.79	.90	.90	−.70	.82	.38	NA	Major Depressive Disorder	6.97	3.14–15.51	<.001	16.8	3.4
CAT-ANX	.73	.78	.81	−.68	NA	.47	.82	Generalized Anxiety Disorder	2.88	1.72–4.83	<.001	12.9	2.0
CAT-MANIA	.31	.37	.39	−.29	.47	NA	.38	Bipolar	2.89	1.47–5.71	<.002	17.9	3.4

¹ CAT-DI = computerized adaptive test, depression inventory; CAT-ANX = computerized adaptive test, anxiety; CAT-MANIA = computerized adaptive test, mania; HAM-D₂₅ = 25-item Hamilton Rating Scale for Depression; PHQ-9 = 9-item Patient Health Questionnaire; CES-D = Center for Epidemiologic Studies Depression Scale; GAF = Global Assessment of Functioning

² The ORs are for a one-unit increase in the corresponding CAT, reflecting 25% of the total metric
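As a reading aid for the odds ratio columns of Table 3, the sketch below converts each reported OR and 95% CI to the log-odds scale. It assumes the intervals are symmetric Wald intervals on the log scale with z = 1.96, which the paper does not state; under that assumption the geometric mean of the two bounds should recover the point estimate, and the implied standard error of the slope can be read off. The helper name `log_odds_summary` is hypothetical.

```python
import math

# Reported per-unit odds ratios and 95% confidence bounds from Table 3.
reported = {
    "CAT-DI": (6.97, 3.14, 15.51),
    "CAT-ANX": (2.88, 1.72, 4.83),
    "CAT-MANIA": (2.89, 1.47, 5.71),
}

def log_odds_summary(odds_ratio, lower, upper, z=1.96):
    """Return (slope, implied SE, geometric mean of the CI bounds)."""
    slope = math.log(odds_ratio)
    # Assumption: CI = exp(slope +/- z * SE), i.e. a Wald interval.
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    midpoint = math.sqrt(lower * upper)
    return slope, se, midpoint

for name, (odds_ratio, lower, upper) in reported.items():
    slope, se, midpoint = log_odds_summary(odds_ratio, lower, upper)
    print(f"{name}: slope={slope:.2f} SE={se:.2f} CI midpoint={midpoint:.2f}")
```

For all three scales the geometric midpoint of the interval agrees with the reported OR to within rounding, which is what a log-scale interval would produce.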

CAT-ANX: anxiety severity measure

The dimensional measure of anxiety severity demonstrated correlations with traditional scales like the HAM-D₂₅ (r=.73), PHQ-9 (r=.78), CES-D (r=.81), and GAF (r=−.68). These results indicate that depression and anxiety have considerable overlap, which is known to be true neurobiologically and is also observed clinically.¹⁹ The CAT-ANX correlated highly with the CAT-DI (r=.82), but less so with the CAT-MANIA (r=.47). In terms of its relationship with current DSM IV-TR generalized anxiety disorder diagnosis, the CAT-ANX had an OR = 2.88 (1.72, 4.83), p<.001. This scale took an average of 12.9 items and 2.0 minutes to complete. See Table 3. Given that the range of scores on the CAT-ANX is from −2 to 2, the actual span gives an OR of 11.52, a 12-fold increase in probability of generalized anxiety disorder from the low to the high end of the scale.

CAT-MANIA: mania severity measure

The dimensional measure of the hypomania/mania spectrum (CAT-MANIA) demonstrated correlations with traditional scales that were relatively low, as expected: HAM-D₂₅ (r=.31), PHQ-9 (r=.37), CES-D (r=.39), and GAF (r=−.29). These results indicate that depression and mania have limited overlap, at least at a single point in time. This is consistent with clinical experience, as depressive and manic symptoms often co-occur, but true mixed states as defined by DSM IV-TR are uncommon.²⁰,²¹ The CAT-MANIA correlated minimally with the CAT-DI (r=.38) and with the CAT-ANX (r=.47). In terms of its relationship with current DSM IV-TR bipolar diagnoses (bipolar I disorder, bipolar II disorder, bipolar, not otherwise specified), the CAT-MANIA had an OR = 2.89 (1.47, 5.71), p<.002. This scale took an average of 17.9 items and 3.4 minutes to complete. See Table 3. Given that the range of scores is from −2 to 2, the actual span gives an OR of 11.56, a 12-fold increase in probability of a bipolar disorder diagnosis from the low to the high end of the CAT-MANIA scale. This was the first time the CAT-MANIA had been validated in a clinical sample.

Patient impressions of usability of the CAT-MH

Participants took, on average, 51.7 items and 9.4 minutes to complete the entire CAT-MH. As summarized in Table 4, patients found the computerized adaptive tests overall easy and acceptable to use, felt comfortable answering personal questions about themselves, answered them honestly, preferred them compared to a pencil and paper test, and felt the test accurately reflected their mood. There was some concern that older patients would not find the computerized test as easy to take. This was not found to be the case, as correlations to age ranged from .22 to .35.
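The full-battery totals quoted above follow from the per-test averages reported in the Results (CAD-MDD: 4.1 questions, 36.1 seconds; CAT-DI: 16.8 items, 3.4 minutes; CAT-ANX: 12.9 items, 2.0 minutes; CAT-MANIA: 17.9 items, 3.4 minutes). A short check, with dictionary names chosen only for illustration:

```python
# Average items and minutes per component, as reported in the Results.
items = {"CAD-MDD": 4.1, "CAT-DI": 16.8, "CAT-ANX": 12.9, "CAT-MANIA": 17.9}
minutes = {"CAD-MDD": 36.1 / 60, "CAT-DI": 3.4, "CAT-ANX": 2.0, "CAT-MANIA": 3.4}

total_items = sum(items.values())
total_minutes = sum(minutes.values())
print(f"{total_items:.1f} items, {total_minutes:.1f} minutes")
# prints "51.7 items, 9.4 minutes"
```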

Table 4.

Participant Ratings of CAT-MH Usability (N=145)

Overall Rating	N	%
Excellent	44	30
Very Good	61	42
Good	36	25
Fair	4	3
Poor	0	0
Ease of Use	N	%
Very Easy	28	19
Easy	71	49
Neutral	42	29
Difficult	4	3
Very Difficult	0	0
Comfortable Answering Personal Questions	N	%
Very comfortable	108	75
Comfortable	32	22
Neutral	1	0.7
Uncomfortable	1	0.7
Very uncomfortable	0	0
Missing data	3	2
Answered Questions Honestly	N	%
Strongly agreed	133	92
Agreed	9	6
About 50/50	0	0
Disagree	0	0
Strongly disagree	0	0
Missing data	3	2
Computer vs. Paper	N	%
Computer	125	86
Paper	14	10
Equivocal	3	2
Missing data	3	2
Questions Accurately Reflected Mood	N	%
A great deal	129	89
Very much	12	8
Somewhat	4	3
Not very much	0	0
Not at all	0	0

DISCUSSION

This was the first prospective, cross-sectional study to validate the CAT-MH suite of tests, including the CAT-MANIA scale, in a community, outpatient psychiatric setting against gold-standard diagnostic and severity measures including the SCID for DSM IV-TR, HAM-D₂₅, the CES-D, PHQ-9, and GAF. Considering the high rate of DSM IV-TR disorders in this clinic sample, the high rate of comorbidity, and the small number of healthy controls, the CAT-MH performed well. Sensitivity remained at high levels and specificity decreased as expected. However, when the sample was restricted to patients with confirmed major depressive disorder and healthy controls for the CAD-MDD, sensitivity was unchanged, but specificity increased to 1.00 (i.e. no false positives). Out of 46 participants, there was only one misclassification. This bodes well for applications in primary care, where the majority of patients (90% or more) will not have a current DSM IV-TR major depressive disorder.

Despite sampling a patient cohort with multiple diagnoses, the three severity tests also performed well. Significant relationships to DSM IV-TR diagnoses of major depressive disorder, generalized anxiety disorder, and current bipolar disorders were found for each of the three respective dimensional measures (CAT-DI, CAT-ANX and CAT-MANIA), and the CAT-DI was strongly related to traditional depression severity measures. In general, patients appeared to have a positive overall impression of the test, were comfortable answering questions using a computer interface, found it easy to use, reported answering honestly, and indicated that the questions accurately reflected their mood (89%). Interestingly, 86% indicated that they preferred the computer interface to a traditional paper and pencil test.

The strengths of this study include the prospective nature of the evaluations, the broad inclusion criteria that improve generalizability, and the use of gold-standard diagnostic and symptom severity comparators.

Limitations of the study include its cross-sectional design, which does not allow for test-retest and longitudinal assessment of improvement over time. We would expect, given the adaptive nature of the testing and the large question bank from which to draw unique questions, that these assessments would be superior to standard assessments for longitudinal follow-up and would avoid the potential bias of the practice effect, but this needs to be demonstrated in future studies.

A further limitation of these assessments is the inability to detect lifetime history of psychiatric disorders. For example, longitudinal data are required for the accurate diagnosis of bipolar disorder, while the CAT-MANIA scale is only useful in assessing current manic symptoms. Per the SCID for DSM IV-TR, there were 13 participants with full-criteria, current manic symptoms indicative of bipolar I disorder, 11 with hypomania indicative of bipolar II disorder, and 1 with current bipolar NOS. When lifetime episodes of mania or hypomania were taken into account by the SCID for DSM IV-TR assessment, there were a total of 20 patients with bipolar I disorder, 20 with bipolar II disorder, and 3 with bipolar NOS in this cohort. See Table 2. This is critical because if those patients (with mania currently in full remission (n=8) or partial remission (n=10)) were incorrectly diagnosed with unipolar depression, they might be inappropriately treated with antidepressants, which may be ineffective for the treatment of bipolar disorder, rather than with mood stabilizers.²²⁻²⁵

Finally, the population, which consisted of 79% women, 90% Caucasians, 40% with college degree and 42% some college, is not representative of other, more diverse patient populations. Future testing in these populations is required.

CONCLUSIONS

The results of this prospective, cross-sectional validation study suggest that the CAT-MH suite of tests provides a rapidly administered, accurate assessment of depression diagnosis and symptom severity across a broad range of mood and anxiety symptoms in an adult, community outpatient psychiatric population.

Supplementary Material

Data Supplement

NIHMS695020-supplement-Data_Supplement.docx (27.2 KB, docx)

Acknowledgments

We would like to acknowledge the contributions of Dr. H, MD who helped train our clinical staff in administering the diagnostic interviews as well as Mr. I, LMSW, CTS; Ms. J, LMSW; and Mr. K, LMSW, who interviewed participants and helped gather study data.

GRANT SUPPORT

This project was supported by grants from the Pine Rest Foundation – CAT-DI/SCID Assessment Tool (Dr. B and Dr. A), and the National Institute of Mental Health – MH66302 (Dr. G), and in-kind support of computer software from Michigan State University (Dr. A).

Footnotes

DISCLOSURES

Dr. A has received research grant support from Pfizer, Janssen, Otsuka, AssurEx, Pine Rest Foundation, NIMH, NIDA, ARRA, NIAAA, CMMS, Dartmouth College, the General Hospital Corporation, North Shore Long Island Jewish Health System and served as a consultant for Publicis Healthcare Communications Group. He also serves on an advisory board for Roche.

Dr. B has received research grant support from the Pine Rest Foundation.

Ms. C has no potential conflicts of interest to disclose related to this study.

Ms. D has no potential conflicts of interest to disclose related to this study.

Dr. E has the following disclosures: she received royalties from the American Psychological Association and Guilford Press; member of the Advisory Board of Servier International; Editorial Consultant for the American Psychiatric Press; and has received honoraria from Lundbeck; and she and her spouse, Dr. F have financial interests in Adaptive Testing Technologies (www.adaptivetestingtechnologies.com), through which the CAT-MH tests will be made available. Dr. F is also a consultant to the American Psychiatric Association (as Chair of the DSM-5 Task Force); holds joint ownership of copyright for the Pittsburgh Sleep Quality Index (PSQI); and he is a stockholder in AliphCom.

Dr. F has the following disclosures: Consultant to the American Psychiatric Association (as Chair of the DSM-5 Task Force); joint ownership of copyright for the Pittsburgh Sleep Quality Index (PSQI); received honorarium for manuscript submission to Medicographia (Servier); member of the Valdoxan Advisory Board of Servier International; a stockholder in AliphCom; and he and his spouse, Dr. E are stockholders in Psychiatric Assessments, Inc. Dr. E also has the following disclosures: received royalties from the American Psychological Association and Guilford Press; member of the Valdoxan Advisory Board of Servier International; Editorial Consultant for the American Psychiatric Press; and has received honoraria from Lundbeck.

Dr. G has been an expert witness for the US Department of Justice, Wyeth, Merck, and Pfizer. He has financial interests in Adaptive Testing Technologies (www.adaptivetestingtechnologies.com), through which the CAT-MH tests will be made available.

PREVIOUS PRESENTATION

Parts of these data were presented May 7, 2014 at the American Psychiatric Association annual meeting in New York, NY, the American Society of Clinical Psychopharmacology annual meeting in Hollywood, FL June 17, 2014 and at the Institute on Psychiatric Services meeting in San Francisco, CA on November 2, 2014.

Contributor Information

Eric Daniel Achtyes, Pine Rest Christian Mental Health Services, 300 68th Street SE, Grand Rapids, Michigan 49548; Michigan State University College of Human Medicine – Psychiatry and Behavioral Medicine, 15 Michigan Street NE, Grand Rapids, Michigan 49503.

Scott Halstead, Pine Rest Christian Mental Health Services – Psychology, Grand Rapids, Michigan.

LeAnn Smart, Pine Rest Christian Mental Health Services, Grand Rapids, Michigan.

Tara Moore, U of Pittsburgh – Psychiatry, Pittsburgh, Pennsylvania.

Ellen Frank, University of Pittsburgh – Department of Psychiatry, Western Psychiatric Institute & Clinic 3811 O’Hara Street, Pittsburgh, Pennsylvania 15213.

David J. Kupfer, University of Pittsburgh School of Medicine – Psychiatry, 3811 O’Hara Street Room 210, Pittsburgh, Pennsylvania 15213

Robert Gibbons, University of Chicago – Center for Health Statistics, 5841 S. Maryland Avenue MC 2007 office W260, Chicago, Illinois 60637.

References

1.Gibbons RD, Weiss DJ, Kupfer DJ, et al. Using computerized adaptive testing to reduce the burden of psychiatric assessment. Psychiatric Services. 2008;59:361–368. doi: 10.1176/appi.ps.59.4.361. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Gibbons RD, Weiss DJ, Pilkonis PA, et al. Development of the CAT-ANX: a computerized adaptive test for anxiety. American Journal of Psychiatry. 2013;171:187–194. doi: 10.1176/appi.ajp.2013.13020178. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Gibbons RD, Hooker G, Finkelman MD, et al. The computerized adaptive diagnostic test for major depressive disorder (CAD-MDD): a screening tool for depression. Journal of Clinical Psychiatry. 2013;74:669–674. doi: 10.4088/JCP.12m08338. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Gibbons RD, Weiss DJ, Pilkonis PA, et al. Development of a computerized adaptive test for depression. Archives of General Psychiatry. 2012;69:1104–1112. doi: 10.1001/archgenpsychiatry.2012.14. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Gibbons RD, Hedeker DR. Full-information item bifactor analysis. Psychometrika. 1992;57:423–436. [Google Scholar]
6.Breiman L, Friedman JH, Olshen RA, et al. Classification and regression trees. Monterey, CA: Wadsworth & Brooks; 1984. [Google Scholar]
7.Quinlan JR. C4.5: Programs for machine learning. Morgan Kaufmann; San Mateo, CA: 1993. [Google Scholar]
8.Breiman L. Bagging predictors. Machine Learning. 1996;24:123–140. [Google Scholar]
9.Freund Y, Schapire RE. Experiments with a new boosting algorithm. ICML. 1996;96:148–156. [Google Scholar]
10.Breiman L. Random forests. Machine Learning. 2001;45:5–32. [Google Scholar]
11.Friedman J. Greedy function approximation: a gradient boosting machine. Annals of Statistics. 2001;29:1180. [Google Scholar]
12.First MB, Spitzer RL, Gibbon M, et al. Structured Clinical Interview for the DSM-IV-TR Axis I Disorders, Non-patient Edition (SCID-I/NP, 1/2007 revision) New York Biometrics Research, New York State Psychiatric Institute; New York, NY: 2007. [Google Scholar]
13. Hamilton M. A rating scale for depression. Journal of Neurology, Neurosurgery & Psychiatry. 1960;23:56–62. doi: 10.1136/jnnp.23.1.56.
14. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: validity of a brief depression severity measure. Journal of General Internal Medicine. 2001;16:606–613. doi: 10.1046/j.1525-1497.2001.016009606.x.
15. Radloff LS. The CES-D scale: a self-report depression scale for research in the general population. Applied Psychological Measurement. 1977;1:385–401.
16. Endicott J, Spitzer RL, Fleiss JL, et al. The global assessment scale. A procedure for measuring overall severity of psychiatric disturbance. Archives of General Psychiatry. 1976;33:766–771. doi: 10.1001/archpsyc.1976.01770060086012.
17. Wittchen HU, Jacobi F, Rehm J, et al. The size and burden of mental disorders and other disorders of the brain in Europe 2010. European Neuropsychopharmacology. 2011;21:655–679. doi: 10.1016/j.euroneuro.2011.07.018.
18. Whooley M. Diagnosis and treatment of depression in adults with comorbid medical conditions. The Journal of the American Medical Association. 2012;307:1848–1857. doi: 10.1001/jama.2012.3466.
19. Gorman JM. Comorbid depression and anxiety spectrum disorders. Depression and Anxiety. 1997;4:160–168. doi: 10.1002/(SICI)1520-6394(1996)4:4<160::AID-DA2>3.0.CO;2-J.
20. Cassidy F, Yatham LN, Berk M, et al. Pure and mixed manic subtypes: a review of diagnostic classification and validation. Bipolar Disorders. 2008;10:131–143. doi: 10.1111/j.1399-5618.2007.00558.x.
21. Vieta E, Morralla C. Prevalence of mixed mania using 3 definitions. Journal of Affective Disorders. 2010;125:61–73. doi: 10.1016/j.jad.2009.12.019.
22. Baldessarini RJ, Leahy L, Arcona S, et al. Patterns of psychotropic drug prescriptions for U.S. patients with diagnoses of bipolar disorders. Psychiatric Services. 2007;58:85–91. doi: 10.1176/ps.2007.58.1.85.
23. Post RM, Altshuler LL, Leverich GS, et al. Mood switch in bipolar depression: comparison of adjunctive venlafaxine, bupropion and sertraline. British Journal of Psychiatry. 2006;189:124–131. doi: 10.1192/bjp.bp.105.013045.
24. Sachs GS, Nierenberg AA, Calabrese JR, et al. Effectiveness of adjunctive antidepressant treatment for bipolar depression. New England Journal of Medicine. 2007;356:1711–1722. doi: 10.1056/NEJMoa064135.
25. Ghaemi SN, Ostacher MM, El-Mallakh RS, et al. Antidepressant discontinuation in bipolar depression: A Systematic Treatment Enhancement Program for Bipolar Disorder (STEP-BD) randomized clinical trial of long-term effectiveness and safety. Journal of Clinical Psychiatry. 2010;71:372–380. doi: 10.4088/JCP.08m04909gre.


