Abstract
The present study examined the utility of the anhedonic depression scale from the Mood and Anxiety Symptoms Questionnaire (MASQ-AD) as a way to screen for depressive disorders. Using receiver-operator characteristic analysis, the sensitivity and specificity of the full 22-item MASQ-AD scale, as well as the 8- and 14-item subscales, were examined in relation to both current and lifetime DSM-IV depressive disorder diagnoses in two nonpatient samples. As a means of comparison, the sensitivity and specificity of a measure of a relevant personality dimension, neuroticism, were also examined. Results from both samples support the clinical utility of the MASQ-AD scale as a means of screening for depressive disorders. Findings were strongest for the MASQ-AD 8-item subscale and when predicting current depression status. Furthermore, the MASQ-AD 8-item subscale outperformed the neuroticism measure under certain conditions. The overall usefulness of the MASQ-AD scale as a screening device is discussed, as well as possible cutoff scores for use in research.
Keywords: depressive disorders, anhedonic depression, Mood and Anxiety Symptoms Questionnaire, receiver-operator characteristic analysis, screening
Introduction
There are a variety of strategies that clinical researchers can use to recruit individuals with specific forms of psychopathology. One strategy is to target individuals seeking treatment for the condition of interest. A key limitation of this strategy is that those individuals seeking treatment can be expected to be unrepresentative of individuals who suffer from that condition (du Fort, Newman, & Bland, 1993). An alternative approach is to use specific advertising techniques to target individuals who report suffering from these conditions, though again there is no way to ensure that those who respond are representative. A third approach is to screen, using diagnostic interviews, a very large number of individuals (with the number of individuals to be screened guided by base rates). This strategy can be very inefficient because of the relatively large amount of time that must be devoted to screening each participant.
A related recruitment approach involves screening a large number of participants with an instrument that can be administered quickly and easily, and then conducting follow-up assessments with a subset of these individuals using more extensive diagnostic procedures. This approach has two advantages: it is more efficient than conducting full assessments with a large number of participants, and it can identify non-treatment-seeking individuals with psychopathology. Of course, the feasibility of adopting this approach is premised on two conditions: (1) that sufficiently predictive instruments have been identified which can accurately distinguish individuals who are likely to meet diagnostic criteria from individuals who are likely not to meet diagnostic criteria; and (2) that information is available for determining an appropriate cut-off value that can be used for screening decisions. Both of these conditions can be addressed using receiver-operator characteristic (ROC) analysis (Green & Swets, 1966). In ROC analysis, one obtains a curve in which the sensitivity (i.e., the rate at which the instrument at a given value indicates the presence of a condition when the condition is actually present) is plotted against one minus the specificity (where specificity is the rate at which the instrument at a given value indicates the absence of a condition when the condition is not actually present) for the full range of scores on a given measure. The adequacy of a given measure as a screening tool can be determined by calculating the area under the ROC curve (AUC). AUC reflects the probability that a randomly selected “case” will score higher on the test or measure than a randomly selected “control” (Hanley & McNeil, 1983). Furthermore, sensitivity and specificity for specific scores on the measure can be examined to determine an appropriate clinical cut-off.
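The probabilistic reading of the AUC described above can be illustrated with a short sketch. The function name and the scores below are hypothetical and are not data from the present studies; ties are counted as one half, which is a common convention and an assumption here.

```python
from itertools import product


def auc_pairwise(case_scores, control_scores):
    """Estimate AUC as the share of case/control pairs in which the case scores higher.

    Ties contribute one half (an assumed convention).
    """
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical screening scores, not data from the study.
cases = [24, 21, 18]
controls = [10, 12, 15, 18, 22]
print(auc_pairwise(cases, controls))
```

An AUC of .50 under this definition corresponds to chance-level discrimination, and 1.0 to every case outscoring every control.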
ROC analysis is growing in popularity as a procedure for evaluating the utility of specific self-report instruments as screening tools for use in clinical research (e.g., Behar, Alcaine, Zuellig, & Borkovec, 2003; Chen, Faraone, Biederman, & Tsuang, 1994), in part because test results are robust even when the number of cases and controls is unequal in the sample (Rice & Harris, 1995).
Since the majority of individuals with depressive disorders do not seek professional treatment (Flament, Cohen, Choquet, Jeammet, & Ledoux, 2001; Kendler, 1995), recruiting research participants from clinical settings will likely exclude a very large proportion of the population with depressive disorders. Furthermore, motivational deficits, coupled with the stigma associated with mental disorders, may make depressed individuals less likely to respond to targeted advertisements. Finally, the base rates of depressive disorders, though higher than some other forms of psychopathology, are still low enough that conducting diagnostic procedures with an unselected sample of participants would not be very cost-effective. Thus, there is a clear need for self-report instruments that can be administered quickly and easily and can accurately identify individuals likely to have depressive disorders. Research involving ROC analysis has examined the utility of popular self-report measures of depression, such as the Beck Depression Inventory (BDI; Beck, Steer, & Brown, 1996; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961). Findings from these studies have generally been encouraging (e.g., Kumar, Steer, Teitelman, & Villacis, 2002; Lasa, Ayuso-Mateos, Vázquez-Barquero, Díez-Manrique, & Dowrick, 2000). Nevertheless, many of these popular measures have been criticized as having poor discriminant validity, since they primarily measure general distress or negative affect, which is not unique to depression (see Watson & Clark, 1984). One possible implication of this criticism is that these instruments are likely to have high sensitivity but low specificity (see Sloan et al., 2002). According to the tripartite model of depression and anxiety, low levels of positive affect (anhedonia) are unique to depressive disorders, whereas elevated levels of negative affect are shared by both depressive and anxiety disorders (Clark & Watson, 1991).
Self-report instruments have been developed to measure this unique component of depression, perhaps the most popular being the Mood and Anxiety Symptoms Questionnaire (MASQ; Watson, Clark, et al., 1995; Watson, Weber, et al., 1995). The MASQ includes an anhedonic depression scale, which was designed to measure the low levels of positive affect unique to depression (along with other symptoms that are thought to differentiate depressive disorders from anxiety disorders, such as lack of motivation).
Three studies have used ROC procedures to examine the utility of the MASQ anhedonic depression scale in clinical settings as a means of identifying individuals with depressive disorders (Boschen & Oei, 2007; Buckby, Yung, Cosgrave, & Cotton, 2007; Buckby, Yung, Cosgrave, & Killackey, 2007). Though all three studies showed that scores on the scale predict depressive disorder diagnoses, some disagreement still exists regarding the ultimate utility of this scale for clinical applications. Importantly, one of these studies (Buckby, Yung, Cosgrave, & Killackey, 2007) showed that the MASQ anhedonic depression scale outperformed a popular measure of depression (the Center for Epidemiologic Studies-Depression Scale) in predicting depressive disorder diagnoses. To date, no research has examined the utility of the MASQ anhedonic depression scale as a means for screening for depressive disorders in nonclinical settings. Such work is critical to exploring the potential utility of the MASQ anhedonic depression scale for research applications, or for initial screening for depressive disorders in primary health care settings.
Research has shown that items on the anhedonic depression scale of the MASQ load onto two separate factors, one of which consists of 8 items regarding depressed mood, lack of motivation, and other symptoms of depressive disorders (e.g., “felt really slowed down”), and another which consists of 14 reverse-scored items related to experiencing pleasant emotions (e.g., “felt like nothing was very enjoyable”; Nitschke, Heller, Imig, McDonald, & Miller, 2001; Watson, Clark, et al., 1995; Watson, Weber, et al., 1995). Existing ROC research examining the MASQ anhedonic depression scale has not examined these subscales separately to determine whether one of these subscales outperforms the other and/or the total scale.
The present project examined the utility of the MASQ anhedonic depression scale (MASQ-AD) as a screening tool for depressive disorders using ROC analysis. The utility of the MASQ-AD 22-item scale, as well as that of the 8- and 14-item subscales, was examined in a sample of college students in Study 1 and in a sample of community members in Study 2. The present study also went beyond past research by comparing the MASQ anhedonic depression scales with a measure of neuroticism, which is a personality trait shown to confer risk for a broad range of psychopathology, including but not limited to depression (Ormel et al., 2004).
Study 1
Method
Participants
Participants were 108 university students (60% female) ages 18–22 (M = 19.0; SD = 1.0) who were recruited to participate in a large-scale neuroimaging study. All participants passed exclusion criteria related to the neuroimaging study: left-handedness, history of serious brain injury, abnormal hearing or vision, metal in their body, pregnancy, or being a non-native English speaker. For reasons associated with the primary goals of the neuroimaging study, efforts were made to oversample individuals with symptoms of anxiety and/or depression. To achieve this goal, a large number of individuals (n = 2,637) were initially assessed using self-report measures of anhedonic depression, anxious arousal, and worry. This screening session occurred one to six months prior to the collection of the data reported in this paper. Questionnaire scores from this session were used to determine who would participate in the next stage of the research; they were not used in the analyses presented in this paper. Based on their scores on these questionnaires, five groups of participants were recruited. Specifically, three groups scored above the 80th percentile (percentile levels determined from previous testing; Nitschke, Heller, Imig, McDonald, & Miller, 2001) on either the 8-item MASQ anhedonic depression subscale (n = 17), the MASQ anxious arousal scale (n = 18), or the Penn State Worry Questionnaire (n = 14; Meyer et al., 1990), and below the 50th percentile on the other two scales. A fourth group scored above the 80th percentile on all three measures (n = 29), and the final group scored below the 50th percentile on all three measures (n = 29)¹. All participants received monetary compensation for participating in the study.
Self-Report Questionnaires
Anhedonic Depression
Participants completed the anhedonic depression scale from the Mood and Anxiety Symptoms Questionnaire a second time, after being recruited to participate in the neuroimaging study. Scores from this second administration were used in the analyses reported below. On the MASQ-AD, individuals indicate how frequently they have experienced a variety of different symptoms during the past week. This scale is composed of 22 items such as “felt like nothing was very enjoyable” and “felt really slowed down.” Research has indicated that this scale has good convergent and discriminant validity in undergraduate and community samples (Nitschke, Heller, Palmieri, & Miller, 1999; Nitschke et al., 2001; Watson, Clark, et al., 1995; Watson, Weber, et al., 1995). Since past research has shown that the items of the anhedonic depression scale of the MASQ load onto two separate factors (Nitschke et al., 2001; Watson, Clark, et al., 1995; Watson, Weber, et al., 1995), analyses were conducted with the full 22-item scale as well as the 8- and 14-item subscales. In Study 1, alphas for the 22-, 8-, and 14-item scales were .94, .94, and .86, respectively.
Neuroticism
Participants also completed the 60-item NEO-Five Factor Inventory (Costa & McCrae, 1992) after being recruited to participate in the study. The 12-item Neuroticism scale is composed of items such as “I often feel inferior to others” and “I often feel tense and jittery”. Participants rated how characteristic each statement is of them. Research has indicated that this scale has good reliability and convergent validity in a variety of samples (Costa & McCrae, 1992). In the present sample, alpha for the neuroticism scale was .93.
Diagnostic Interview
Within approximately two weeks of completing the questionnaires described above, each participant was interviewed by an advanced doctoral student in clinical psychology using the Structured Clinical Interview for DSM-IV Disorders, Nonpatient Edition (SCID-NP; First, Spitzer, Gibbon, & Williams, 2002) to assess for symptoms of Axis I pathology. All final diagnostic decisions were determined through consensus of the interviewers in consultation with one of the authors (GM), a licensed clinical psychologist who has supervised over 2000 SCID cases. Interviewers were blind to participants’ scores on the self-report questionnaires.
For the current study, we used information gathered during the SCID-NP to classify all participants on four variables related to current and lifetime depressive disorder diagnoses. The first variable was based on whether the participant met DSM-IV diagnostic criteria for a current Major Depressive Episode (MDE) at the time of the interview. The second variable was based on whether the participant met diagnostic criteria for any current DSM-IV depressive disorder at the time of the interview. This included individuals who met full diagnostic criteria for a current MDE, as well as individuals who met full diagnostic criteria for Dysthymic Disorder, Substance Induced Mood Disorder with Depressive Features, Mood Disorder due to a General Medical Condition with Depressive Features, or Depressive Disorder, Not Otherwise Specified at the time of the interview. The third variable was based on whether the participant met DSM-IV diagnostic criteria for lifetime Major Depressive Disorder (MDD). The fourth variable was based on whether the participant had ever met diagnostic criteria for any DSM-IV depressive disorder, or a bipolar disorder (Bipolar I, II, or Cyclothymic Disorder) with a history of clinically significant depressive symptoms². These diagnostic variables were not treated as mutually exclusive; thus, participants who qualified for current MDE also qualified for current depressive disorders, those who qualified for current MDE also qualified for lifetime MDD, and those who qualified for current depressive disorders also qualified for lifetime depressive disorders.
To examine interrater reliability, a secondary rater (blind to the original diagnoses) listened to audiotaped SCID-NP interviews of 20 participants (10 randomly selected cases who, according to the original diagnostician, met diagnostic criteria for lifetime MDD, and 10 randomly selected cases who did not) and provided ratings for the four diagnostic variables. Kappa was 1.00 for current MDE, 1.00 for current depressive disorders, .70 for lifetime MDD, and .90 for lifetime depressive disorders.
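For readers less familiar with kappa, the following is a minimal sketch of Cohen's kappa for two raters. The function name and the ratings are hypothetical; they only illustrate how a single disagreement across 20 interviews can produce a value near .90 and do not reconstruct the actual ratings from this study.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical ratings.

    Undefined (division by zero) when chance agreement equals 1,
    e.g. when both raters assign every interview to the same category.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal proportion per category.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for 20 interviews (1 = diagnosis present, 0 = absent).
original = [1] * 10 + [0] * 10
secondary = [1] * 9 + [0] * 11  # disagrees with the original rater on one interview
print(cohen_kappa(original, secondary))
```

Because kappa corrects for chance agreement, it is lower than the raw agreement rate (19 of 20 in this hypothetical example).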
Analyses
The four diagnostic variables (current MDE, current depressive disorders, lifetime MDD, and lifetime depressive disorders) were used as the criteria for evaluating the absolute and relative effectiveness of the self-report questionnaires as a means of screening for depressive disorders, using ROC procedures. In each of these analyses, all participants in the sample who did not qualify as positive cases for that variable were treated as negative cases. For example, for analyses conducted with the current MDE variable, all participants in the sample who did not meet full criteria for a current Major Depressive Episode (including remaining participants who qualified as positive cases for the other diagnostic variables, such as lifetime Major Depressive Disorder) were treated as negative cases. Areas under the ROC curves (AUCs) were calculated for each instrument to quantify the general utility of each scale as a means of screening for current and lifetime depressive disorder. Statistical significance of AUC estimates (i.e., whether these estimates are significantly above chance, which is .50) was determined using nonparametric tests (Hanley & McNeil, 1983). Since these tests are nonparametric, they do not require any statistical assumptions about the distributions of questionnaire scores and/or the base rates on the diagnostic variables. Although no specific guidelines for interpreting the size of AUC estimates are currently available, the following have been employed across a wide range of disciplines (e.g., Luna-Herrera et al., 2003; Starr et al., 2004; Thuiller et al., 2003): 0.50–0.60 = fail, 0.60–0.70 = poor, 0.70–0.80 = fair, 0.80–0.90 = good, 0.90–1.0 = excellent. All analyses were conducted using SPSS, Version 16.0.
To compare the relative effectiveness of each of the scales, AUCs for the different self-report scales were contrasted using the procedures for comparing correlated ROC curves described by DeLong and colleagues (DeLong, DeLong, & Clarke-Pearson, 1988). These analyses were conducted using locally written Matlab programs (Matlab R2007a, Natick, MA). Also, planned follow-up analyses for the MASQ-AD scales were conducted by calculating sensitivity, specificity, positive predictive power, and negative predictive power for specific scale values³, which in turn were used to explore optimal clinical cutoffs. In order to determine optimal cutoff scores, the Youden (1950) Index was computed, which has been shown to yield lower misclassification rates than other commonly used methods (Perkins & Schisterman, 2006).
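The cutoff search described above can be sketched as follows. The Youden Index is J = sensitivity + specificity − 1, and the cutoff with the largest J is selected. This is an illustrative sketch, not the SPSS procedure used in the study; the function names and the scores are hypothetical, and scores at or above the cutoff are treated as screen-positive.

```python
def sensitivity_specificity(scores, is_case, cutoff):
    """Sensitivity and specificity when scores >= cutoff count as screen-positive."""
    pairs = list(zip(scores, is_case))
    true_pos = sum(1 for s, c in pairs if s >= cutoff and c)
    false_neg = sum(1 for s, c in pairs if s < cutoff and c)
    true_neg = sum(1 for s, c in pairs if s < cutoff and not c)
    false_pos = sum(1 for s, c in pairs if s >= cutoff and not c)
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


def best_cutoff_by_youden(scores, is_case):
    """Return (cutoff, J, sensitivity, specificity) for the cutoff maximizing J.

    Ties in J are resolved in favor of the lowest cutoff (an arbitrary choice).
    """
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, is_case, cutoff)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical scale scores and diagnostic status, not data from the study.
scores = [9, 11, 12, 14, 15, 17, 20, 22, 23, 26]
is_case = [False, False, False, False, False, True, False, False, True, True]
print(best_cutoff_by_youden(scores, is_case))
```

Because J weights sensitivity and specificity equally, a different criterion may be preferable when missed cases and false alarms carry unequal costs.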
Results and Discussion
Table 1 provides means, standard deviations, and ranges for the self-report measures. The descriptive statistics for this sample closely resemble those reported from past studies involving unselected student samples (e.g., Watson, Clark, et al., 1995). Based on the SCID-NP, three participants (2.8%) met criteria for a current MDE and six participants (5.6%) met criteria for current depressive disorders. Seventeen participants (15.7%) met criteria for lifetime MDD, and 28 participants (25.9%) met criteria for lifetime depressive disorders. The rates in this sample, whose mean age was 19.0, are comparable to rates reported in epidemiological studies of depressive disorders in older adolescents (e.g., Lewinsohn, Rohde, & Seeley, 1998).
Table 1.
Self-Report Scale | Study 1 Mean | Study 1 SD | Study 1 Range | Study 2 Mean | Study 2 SD | Study 2 Range |
---|---|---|---|---|---|---|
MASQ-22-item scale | 55.4 | 13.9 | 26–101 | 54.4 | 13.0 | 30–92 |
MASQ-8-item subscale | 15.7 | 5.1 | 8–35 | 15.1 | 4.6 | 8–32 |
MASQ-14-item subscale | 39.7 | 10.8 | 14–66 | 39.2 | 10.1 | 14–67 |
NEO-FFI Neuroticism | 35.3 | 9.3 | 12–60 | 33.2 | 8.9 | 14–58 |
Note. In Study 2, n = 166 for the MASQ-AD 22-item scale and 14-item subscale, n = 167 for the MASQ-AD 8-item subscale, and n = 154 for the NEO-FFI Neuroticism scale.
Table 2 contains AUCs for the four criterion variables: current MDE, current depressive disorders, lifetime MDD, and lifetime depressive disorders. As can be seen in Table 2, the self-report scales effectively predicted depressive disorder diagnoses, particularly for current MDE and depressive disorders. Specifically, the MASQ-AD 8-item scale and the neuroticism scale reliably predicted whether participants met criteria for a current MDE. The AUCs for both scales were in the good range, with the neuroticism scale bordering on the excellent range. The MASQ-AD 8-item subscale and the neuroticism scale also predicted lifetime MDD, with AUCs in the fair range. In addition, the full 22-item MASQ-AD scale predicted lifetime MDD, although the AUC for this scale bordered on the poor range. All of the scales predicted current depressive disorders, with all of the AUCs for the MASQ-AD scales in the good range, and the AUC for the neuroticism scale bordering on the excellent range. In addition, all of the scales predicted lifetime depressive disorders, with the AUCs for the full 22-item MASQ-AD scale, the 8-item subscale, and the neuroticism scale in the fair range, and the AUC for the 14-item subscale in the poor range.
Table 2.
Self-Report Scale | Current MDE (Study 1) | Current MDE (Study 2) | Current Depressive Disorders (Study 1) | Current Depressive Disorders (Study 2) | Lifetime MDD (Study 1) | Lifetime MDD (Study 2) | Lifetime Depressive Disorders (Study 1) | Lifetime Depressive Disorders (Study 2) |
---|---|---|---|---|---|---|---|---|
MASQ-22 | .83 | .83** | .88** | .84** | .70* | .62* | .70** | .61* |
MASQ-8 | .87* | .85** | .88** | .88** | .72** | .59 | .72** | .61* |
MASQ-14 | .77 | .79** | .86** | .79** | .64 | .61* | .66* | .58 |
Neuroticism | .90* | .78* | .90** | .76** | .79** | .59 | .79** | .68** |
Note. Asymptotic significance: * p < .05, ** p < .01.
Given that the results of the ROC analyses largely supported the effectiveness of all of the self-report measures as a means of screening for depressive disorders, we proceeded to examine whether the four scales differed from one another by conducting pairwise comparisons of the AUCs. The results revealed one statistically significant difference: the neuroticism scale outperformed the MASQ-AD 14-item subscale as a predictor of lifetime depressive disorders (χ² = 4.29, p = .04). No other pairs of scales differed significantly from one another.
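For illustration, a pairwise comparison of two correlated AUCs in the spirit of DeLong et al. (1988) can be sketched as below. This is not the locally written Matlab code used in the study; the function names and the scores are hypothetical, and the sketch assumes that both scales were completed by the same cases and the same controls, listed in the same order, with at least two of each.

```python
import math


def _placements(cases, controls):
    """Placement values per case and per control (ties count one half)."""
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(x, y) for y in controls) / len(controls) for x in cases]
    v01 = [sum(psi(x, y) for x in cases) / len(cases) for y in controls]
    return v10, v01


def _cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(cases_a, controls_a, cases_b, controls_b):
    """Return (auc_a, auc_b, chi2, p) for two scales given to the same people."""
    v10_a, v01_a = _placements(cases_a, controls_a)
    v10_b, v01_b = _placements(cases_b, controls_b)
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    m, n = len(cases_a), len(controls_a)
    # Variance of the AUC difference from case and control components.
    var_diff = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n
    )
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    chi2 = (auc_a - auc_b) ** 2 / var_diff
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return auc_a, auc_b, chi2, p


# Hypothetical scores on two scales for 4 cases and 6 controls.
cases_a, controls_a = [20, 24, 17, 22], [10, 12, 18, 15, 21, 9]
cases_b, controls_b = [30, 41, 25, 28], [22, 35, 27, 20, 31, 18]
print(delong_test(cases_a, controls_a, cases_b, controls_b))
```

The statistic is referred to a chi-square distribution with one degree of freedom, matching the form in which the comparisons are reported here. With samples as small as this hypothetical one, the asymptotic p value should not be taken at face value.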
Given that one of the primary goals of the project was to explore possible cutoff scores on the MASQ-AD scales that could be used to screen for depressive disorders, sensitivity, specificity, positive predictive power, and negative predictive power were computed for specific scale values. This was done using the 8-item subscale to predict current MDE status⁴ because that subscale slightly outperformed the full 22-item scale and the 14-item subscale in the ROC analyses, and the results were stronger for current than for lifetime depressive disorders. The results are presented in Table 3. A cutoff score of 21 on the MASQ-AD 8-item subscale maximized the Youden Index, thus achieving a balance of sensitivity and specificity. At this cutoff, negative predictive power was excellent (1.0), though positive predictive power was fairly low (.13).
Table 3.
Cutoff Score | Sensitivity (Study 1) | Sensitivity (Study 2) | Specificity (Study 1) | Specificity (Study 2) | PPP (Study 1) | PPP (Study 2) | NPP (Study 1) | NPP (Study 2) |
---|---|---|---|---|---|---|---|---|
11 | 1.00 | 1.00 | .11 | .12 | .03 | .03 | 1.00 | 1.00 |
12 | 1.00 | 1.00 | .17 | .21 | .03 | .03 | 1.00 | 1.00 |
13 | 1.00 | 1.00 | .23 | .32 | .04 | .04 | 1.00 | 1.00 |
14 | 1.00 | .80 | .31 | .42 | .04 | .04 | 1.00 | .98 |
15 | 1.00 | .80 | .41 | .54 | .05 | .04 | 1.00 | .98 |
16 | 1.00 | .80 | .53 | .62 | .06 | .05 | 1.00 | .99 |
17 | 1.00 | .80 | .56 | .73 | .06 | .06 | 1.00 | .99 |
18 | 1.00 | .80 | .63 | .77 | .07 | .08 | 1.00 | .99 |
19 | 1.00 | .80 | .68 | .85 | .08 | .10 | 1.00 | .99 |
20 | 1.00 | .80 | .73 | .87 | .10 | .16 | 1.00 | .99 |
21 | 1.00 | .80 | .80 | .90 | .13 | .19 | 1.00 | .99 |
22 | .67 | .80 | .83 | .90 | .10 | .20 | .99 | .99 |
23 | .33 | .80 | .85 | .93 | .06 | .25 | .99 | .99 |
24 | .33 | .40 | .87 | .95 | .07 | .20 | .98 | .98 |
25 | .33 | .40 | .89 | .96 | .08 | .22 | .98 | .98 |
26 | .33 | .40 | .89 | .97 | .08 | .29 | .98 | .98 |
27 | .33 | .40 | .90 | .99 | .09 | .50 | .98 | .98 |
28 | .33 | .20 | .93 | .99 | .13 | .50 | .98 | .98 |
29 | .00 | .20 | .93 | .99 | .00 | .50 | .98 | .96 |
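The dependence of positive predictive power (PPP) and negative predictive power (NPP) on the base rate can be made explicit with a short sketch. The function name is hypothetical; the inputs are the Study 1 values reported for a cutoff of 21 (sensitivity 1.00, specificity .80, three of 108 participants with a current MDE).

```python
def predictive_power(sensitivity, specificity, base_rate):
    """Return (PPP, NPP) implied by sensitivity, specificity, and base rate."""
    true_pos = sensitivity * base_rate
    false_neg = (1 - sensitivity) * base_rate
    true_neg = specificity * (1 - base_rate)
    false_pos = (1 - specificity) * (1 - base_rate)
    ppp = true_pos / (true_pos + false_pos)
    npp = true_neg / (true_neg + false_neg)
    return ppp, npp


# Study 1, cutoff of 21 on the 8-item subscale, current MDE.
ppp, npp = predictive_power(sensitivity=1.00, specificity=0.80, base_rate=3 / 108)
print(ppp, npp)
```

With these inputs the PPP is 0.125 before rounding, in line with the tabled .13, and the NPP is 1.0. Holding sensitivity and specificity fixed, a hypothetical base rate of 10% would raise the PPP to about .36, which shows that the low PPP values in Table 3 largely reflect the low base rate of current MDE.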
Overall, the results from Study 1 support the utility of the MASQ-AD scale as a means of screening for depressive disorders, though the MASQ-AD did not significantly outperform the neuroticism scale in predicting current or lifetime diagnostic status. The results were stronger for all of the self-report measures when predicting current rather than lifetime diagnostic status.
Nevertheless, these findings are qualified by some important limitations. The sample for this study was a sample of convenience consisting of undergraduate participants who were preselected on the basis of their scores on several self-report scales, including one of the MASQ-AD subscales. Consequently, it is possible that certain portions of the population distributions for the self-report measures (in particular, the MASQ-AD scales) were unrepresented, or underrepresented, in this sample. In turn, this may have inflated their discriminative power. Furthermore, the fact that the sample consisted solely of college students raises questions about the generalizability of the results. To address these limitations, Study 2 sought to replicate the findings from Study 1 in a sample of unselected community participants.
Study 2
Method
Participants
Participants were 167 community members (65% female) ages 19–51 (M = 34.7; SD = 9.2). Participants were recruited through advertisements targeting adults interested in participating in a neuroimaging study and were screened for the same exclusion criteria as used in Study 1. All participants received monetary compensation for participating.
Self-Report Questionnaires
Anhedonic Depression
Participants completed the MASQ-AD. As in Study 1, analyses examined the 22-, 8-, and 14-item scales. In the present sample, alphas were .92, .94, and .80, respectively. One participant had missing data on the 22- and 14-item versions of the MASQ scale and was excluded from analyses involving these scales.
Neuroticism
Participants also completed the NEO-FFI. In the present sample, the alpha for the neuroticism scale was .88. Twelve participants had missing data on the NEO-FFI and were excluded from analyses involving this scale.
Diagnostic Interview
Within approximately two weeks of completing the questionnaires described above, each participant was interviewed by a clinical psychology graduate student using the SCID-NP, which was used to classify participants on the same four variables described in Study 1 (current MDE, current depressive disorders, lifetime MDD, and lifetime depressive disorders). As in Study 1, these diagnostic variables were not treated as mutually exclusive, interviewers were blind to participants’ scores on the self-report questionnaires, and all final diagnostic decisions were determined through consensus.
Again, a secondary rater listened to audiotaped SCID-NP interviews of 20 participants (10 randomly selected cases who, according to the original diagnostician, met diagnostic criteria for lifetime MDD, and 10 randomly selected cases who did not) and provided ratings for the four diagnostic variables. Kappa was 1.00 for current MDE, .77 for current depressive disorders, 1.00 for lifetime MDD, and .76 for lifetime depressive disorders.
Analyses
The same analyses were conducted as in Study 1. Again, all analyses were conducted using SPSS, Version 16.0, and locally written Matlab programs.
Results and Discussion
Table 1 contains means, standard deviations, and ranges for self-report measures. The descriptive statistics for this sample closely resemble those reported from past studies involving unselected adult samples (e.g., Watson, Clark, et al., 1995). Based on the SCID-NP, five participants (3.0%) met criteria for a current MDE, and 11 (6.6%) participants met criteria for current depressive disorders. Fifty-five participants (32.9%) met criteria for lifetime MDD and 81 (48.5%) participants met criteria for lifetime depressive disorders. The rates in this sample for current diagnoses were comparable to rates reported in epidemiological research, though the rates for the two lifetime variables were higher than estimates from past research (e.g., APA, 2000; Kessler et al., 2003).
As can be seen in Table 2, for the most part, the self-report scales effectively predicted current MDE and depressive disorders, with the 8-item MASQ-AD scale performing particularly well. Specifically, all four self-report scales predicted whether participants met criteria for a current MDE, with the AUCs for the full 22-item MASQ-AD scale and the 8-item subscale in the good range and the AUCs for the neuroticism scale and the 14-item MASQ-AD subscale in the fair range. Likewise, all four self-report scales predicted whether participants met criteria for a current depressive disorder, with the AUCs for each scale falling into these same ranges. Unlike Study 1, the self-report scales did not predict lifetime MDD and depressive disorders very well. Specifically, only the full 22-item MASQ-AD scale and the 14-item subscale significantly predicted lifetime MDD diagnosis, although the AUCs were both in the poor range. The full 22-item MASQ-AD scale, the 8-item subscale, and the neuroticism scale predicted lifetime depressive disorders, although again the AUCs for all of these scales were in the poor range.
Again, the results of comparisons of the AUCs for the self-report measures revealed very few significant differences. In this sample, the MASQ-AD 8-item subscale outperformed the neuroticism scale (χ² = 3.71, p = .05) and the MASQ-AD 22-item scale outperformed the 14-item subscale (χ² = 3.77, p = .05) as a means of predicting current depressive disorders. Also, the neuroticism scale outperformed the MASQ-AD 14-item subscale (χ² = 4.83, p = .03) as a means of predicting lifetime depressive disorders.
Table 3 presents sensitivity, specificity, positive predictive power, and negative predictive power for the 8-item scale when predicting current MDE status⁴. A cutoff score of 23 on the MASQ-AD 8-item subscale maximized the Youden Index, thus achieving a balance of sensitivity and specificity. At this cutoff, negative predictive power was once again excellent (.99). Positive predictive power, though low, was better than in Study 1 (.25).
The results of Study 2 provide additional support for the utility of the MASQ-AD scales as a means of screening for depressive disorders, thus replicating the main finding from Study 1 without its limitations. Furthermore, unlike in Study 1, the MASQ-AD 8-item subscale outperformed the neuroticism measure when predicting current depressive disorders. This is consistent with the notion that anhedonia is specific to depressive disorders (relative to anxiety disorders; Clark & Watson, 1991), and thus measures that were specifically developed to tap this dimension of depression are likely to have higher specificity.
The results for all of the self-report measures were again stronger when predicting current rather than lifetime clinical diagnostic status. Unlike Study 1, only the full 22-item MASQ-AD scale and the 14-item subscale predicted lifetime MDD at a level above chance, and the AUCs were not very strong in either case. Furthermore, although the full 22-item MASQ-AD scale, the 8-item subscale, and the neuroticism scale predicted lifetime depressive disorders at a level above chance, again none of the AUCs were very strong. This suggests that screening for lifetime history of depressive disorders is difficult in an unselected sample of participants with a broader age range.
General Discussion
In both evaluations of the MASQ-AD scale as a means of screening for depressive disorders, all of the self-report scales that were examined predicted current depressive disorders. In Study 1, all four self-report scales also predicted lifetime depressive disorders, and the MASQ-AD 8-item subscale and the neuroticism scale predicted current MDE and lifetime MDD. In Study 2, all four self-report scales also predicted current MDE. Furthermore, the 8-item scale significantly outperformed the neuroticism scale when predicting current depressive disorders.
Overall, these findings support the usefulness of the MASQ-AD scale as a means of screening for depressive disorders in nonclinical settings, and suggest that the 8-item subscale may be the best means of screening for depressive disorders (amongst the scales that were examined). Results were stronger for this subscale in several analyses compared to the 14-item subscale and the full 22-item scale. More importantly, this subscale requires less time to administer. The MASQ-AD scales appear to be more effective as a means of screening for current than for lifetime depressive disorders. Though the MASQ-AD scales predicted lifetime MDD and lifetime depressive disorders in Study 1, the results were much less impressive for these two variables in Study 2. This discrepancy could be due to systematic differences between the two samples (e.g., the larger age range of participants in Study 2 relative to Study 1). It is important to note that our results are applicable to categorically defined diagnostic entities (such as those in the DSM-IV) and not to dimensional definitions of psychopathology.
The AUCs obtained for the MASQ-AD scales predicting current MDE and depressive disorders were strong in both samples and were comparable to those reported for some common biomedical tests (Swets et al., 1979) as well as other self-report measures used to screen for psychopathology (e.g., the Penn State Worry Questionnaire as a means of screening for Generalized Anxiety Disorder; Fresco et al., 2003; the Beck Depression Inventory as a means of screening for Major Depressive Disorder; Kumar et al., 2002). Thus, the MASQ-AD appears to be a potentially useful tool for researchers who wish to screen for current depressive disorders. In recent years, increased attention has been devoted to identifying untreated depressed individuals in primary-care settings (e.g., Zich, Attkisson, & Greenfield, 1990). The results of the present study indicate that the MASQ-AD, and in particular the 8-item subscale, may also be quite useful for depression screening in primary-care settings.
In order to explore what might be an appropriate clinical cutoff score on the MASQ-AD when screening for depressive disorders, sensitivity, specificity, positive predictive power, and negative predictive power were examined for specific values in both samples. Results were consistently strongest for the MASQ-AD 8-item scale and when predicting current diagnostic status, so analyses focused on the 8-item scale predicting current MDE. Results from Study 1 showed that the optimal cutoff score (based on the Youden Index) was 21; in Study 2, the optimal cutoff score was 23. Of course, the most appropriate cutoff score to use for any specific application will depend on the nature of the sample, as well as the relative importance of sensitivity and specificity.
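As an illustration of how a Youden-based cutoff of this kind can be located, the following minimal sketch scans every observed score, computes sensitivity and specificity at each candidate cutoff, and keeps the one that maximizes J = sensitivity + specificity - 1. The function name `youden_cutoff`, the variable names, and the ten scores are hypothetical and invented for illustration; they are not data from either study, and this is not necessarily the software procedure used in the present analyses.

```python
def youden_cutoff(scores, has_dx):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    A participant screens positive when score >= cutoff.
    Ties in J are resolved in favor of the lowest cutoff.
    """
    n_pos = sum(1 for d in has_dx if d)
    n_neg = len(has_dx) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")

    best = None
    for cutoff in sorted(set(scores)):
        # True positives: diagnosed participants at or above the cutoff
        tp = sum(1 for s, d in zip(scores, has_dx) if d and s >= cutoff)
        # True negatives: undiagnosed participants below the cutoff
        tn = sum(1 for s, d in zip(scores, has_dx) if not d and s < cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Made-up screening scores and interview-based diagnoses (illustration only)
scores = [10, 12, 14, 15, 18, 20, 22, 24, 26, 30]
has_dx = [False, False, False, False, False, True, False, True, True, True]

cutoff, j, sens, spec = youden_cutoff(scores, has_dx)
print(cutoff, round(j, 3), round(sens, 3), round(spec, 3))
```

Because J weights sensitivity and specificity equally, a researcher who considers missed cases more costly than unnecessary interviews might reasonably prefer a lower cutoff than the one this procedure returns.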
To make the implications of the present findings for research on depressive disorders concrete, a table of results for a hypothetical sample of 500 unselected community participants was constructed, using data from Study 2 to estimate how participants would be classified as meeting criteria for a current Major Depressive Episode based on a cutoff score of 23 on the MASQ-AD 8-item subscale. Table 4 shows that 48 individuals would score at or above that cutoff, including 12 (71%) of the 17 individuals who would meet diagnostic criteria for a current MDE. The remaining 36 would be false positives. Perhaps more importantly, 447 individuals who would not qualify for a current MDE would be correctly screened out. Clearly, this approach would be much more efficient than conducting full diagnostic assessments with all 500 participants.
Table 4.
MASQ-AD 8-item score | Accurate | Inaccurate | Total |
---|---|---|---|
Below cutoff (<23) | 447 | 5 | 452 |
At or above cutoff (≥23) | 12 | 36 | 48 |
Note. Numbers were rounded up for inaccurate classifications and down for accurate classifications.
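The four counts in Table 4 are enough to recover the standard screening indices. The short sketch below works through that arithmetic; the variable names are ours, and the counts are the rounded values for the hypothetical sample of 500, so the resulting figures are approximations rather than the exact Study 2 estimates.

```python
# Counts taken from Table 4 (hypothetical sample of 500, cutoff of 23)
true_pos = 12    # at or above cutoff, current MDE (accurate)
false_pos = 36   # at or above cutoff, no current MDE (inaccurate)
true_neg = 447   # below cutoff, no current MDE (accurate)
false_neg = 5    # below cutoff, current MDE (inaccurate)

total = true_pos + false_pos + true_neg + false_neg

sensitivity = true_pos / (true_pos + false_neg)   # 12 / 17
specificity = true_neg / (true_neg + false_pos)   # 447 / 483
ppv = true_pos / (true_pos + false_pos)           # 12 / 48
npv = true_neg / (true_neg + false_neg)           # 447 / 452

# Share of the sample that would still need a full diagnostic interview
interview_fraction = (true_pos + false_pos) / total  # 48 / 500

print(f"sensitivity = {sensitivity:.3f}")   # 0.706
print(f"specificity = {specificity:.3f}")   # 0.925
print(f"PPV = {ppv:.3f}")                   # 0.250
print(f"NPV = {npv:.3f}")                   # 0.989
print(f"interviewed = {interview_fraction:.3f}")  # 0.096
```

Under these counts, fewer than one in ten participants would proceed to a diagnostic interview, at the cost of missing 5 of the 17 current cases.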
The participants in Study 1 were college students selected on the basis of their scores on several questionnaires, including the MASQ-AD scale. Though the means and standard deviations for the MASQ-AD from this sample were comparable to those from past research involving unselected student samples (e.g., Watson, Clark, et al., 1995), concerns could be raised about whether the results from Study 1 would generalize to an unselected sample. To address this concern, in Study 2 we replicated the findings using an unselected sample of community participants.
As expected, the rates of current depressive disorder diagnoses in both samples were comparable to estimates for the general population. Though the absolute rates of current depressive disorders were low (6.5% in Study 1 and 6.6% in Study 2), this sort of sample is appropriate for examining the utility of screening for depressive disorders in research and primary-care settings. The rates of lifetime depressive disorder diagnoses in Study 1 were comparable to the rates reported in previous research examining older adolescents, though the rates of lifetime depressive disorder diagnoses in Study 2 were higher than usually found in the general population. As previously noted, one of the major strengths of ROC methodology is that test results are robust even when the number of cases and controls is unequal in the sample (Rice & Harris, 1995). The fact that our results were very comparable to results from past studies on the MASQ-AD scale using ROC methods (Boschen & Oei, 2007; Buckby, Yung, Cosgrave, & Cotton, 2007; Buckby, Yung, Cosgrave, & Killackey, 2007) serves to bolster confidence in the applicability of our findings. Nonetheless, to be confident that the MASQ anhedonic depression scale can be used to successfully select individuals with depressive disorders, whether for research purposes or in primary-care settings, additional replication is needed.
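The robustness of ROC results to unequal group sizes can be illustrated with the interpretation of the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (Hanley & McNeil, 1982). The sketch below uses made-up scores and a hypothetical helper, `auc`, to show that replicating the control group tenfold, which lowers the base rate without changing either score distribution, leaves the AUC unchanged. It is a toy demonstration under those assumptions and says nothing about sampling variability in real data.

```python
def auc(case_scores, control_scores):
    """AUC as the proportion of case-control pairs in which the case
    scores higher than the control (ties count as one half)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up screening scores (illustration only)
cases = [20, 24, 26, 30]
controls = [10, 12, 14, 15, 18, 22]

auc_original = auc(cases, controls)             # 4 cases, 6 controls
auc_low_base_rate = auc(cases, controls * 10)   # 4 cases, 60 controls

print(round(auc_original, 3), round(auc_low_base_rate, 3))
```

Positive predictive power, by contrast, does depend on the base rate, which is one reason it is reported separately from sensitivity and specificity.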
The present comparison of the MASQ-AD scales to a self-report measure of neuroticism indicated that the MASQ-AD 8-item subscale outperformed the neuroticism scale under certain circumstances (i.e., when predicting current depressive disorders in an unselected sample). Thus, the MASQ-AD scale may be more appropriate as a means of screening for current depressive disorders than a measure of neuroticism in unselected participants from a wide range of age groups. Coupled with results from research showing that the MASQ-AD scale can outperform other popular measures of depression (Buckby, Yung, Cosgrave, & Killackey, 2007), these findings also suggest that the MASQ-AD scale may be more appropriate to use for such purposes than scales that primarily gauge high levels of general distress or negative affect. Future research should directly compare the effectiveness of MASQ-AD scales and other popular measures of depression (e.g., the Beck Depression Inventory) when screening for depressive disorders in nonclinical settings.
In summary, the findings from the present studies support the utility of the MASQ-AD scales as a means of screening for depressive disorders. Results were stronger for current than for lifetime depressive disorders, and suggest that the 8-item subscale offers the best discriminative power. Investigators may be able to maximize efficiency by utilizing this measure as an initial screening tool in clinical research.
Acknowledgments
This work was supported by the National Institute of Mental Health (R01 MH61358, T32 MH19554, T32 MH067533, P50 MH079485). The authors thank Adrienne Abramowitz, Joscelyn Fisher, Brenda Hernandez, and Angela Lawson for their assistance conducting structured clinical interviews.
Footnotes
One participant did not meet criteria for any of the five groups. Since group membership was not directly relevant to the present project, data from this individual were included in the analyses.
All analyses for the lifetime depressive disorders variable were rerun excluding individuals who qualified for bipolar disorders, and the results were virtually identical in both samples.
Positive and negative predictive power were included because some have argued that these indices are more clinically meaningful than sensitivity and specificity (Kessel & Zimmerman, 1993; Widiger et al., 1984).
Statistics for specific values on the other self-report scales that were examined, as well as for the other diagnostic variables, are available from the authors upon request.
Publisher's Disclaimer: The following manuscript is the final accepted manuscript. It has not been subjected to the final copyediting, fact-checking, and proofreading required for formal publication. It is not the definitive, publisher-authenticated version. The American Psychological Association and its Council of Editors disclaim any responsibility or liabilities for errors or omissions of this manuscript version, any version derived from this manuscript by NIH, or other third parties. The published version is available at www.apa.org/pubs/journals/PAS.
Contributor Information
Keith Bredemeier, Department of Psychology, University of Illinois at Urbana-Champaign.
Jeffrey M. Spielberg, Department of Psychology, University of Illinois at Urbana-Champaign.
Rebecca Levin Silton, Department of Psychology, University of Illinois at Urbana-Champaign.
Howard Berenbaum, Department of Psychology, University of Illinois at Urbana-Champaign.
Wendy Heller, Department of Psychology and Beckman Institute Biomedical Imaging Center, University of Illinois at Urbana-Champaign.
Gregory A. Miller, Departments of Psychology and Psychiatry and Beckman Institute Biomedical Imaging Center, University of Illinois at Urbana-Champaign.
References
- American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 4th ed. Washington, DC: American Psychiatric Association; 2000.
- Beck AT, Ward CH, Mendelson M, Mock J, Erbaugh J. An inventory for measuring depression. Archives of General Psychiatry. 1961;4:561–571. doi: 10.1001/archpsyc.1961.01710120031004.
- Beck AT, Steer RA, Brown GK. Manual for the Beck Depression Inventory-II. San Antonio, TX: Psychological Corporation; 1996.
- Behar E, Alcaine O, Zuellig AR, Borkovec TD. Screening for generalized anxiety disorder using the Penn State Worry Questionnaire: a receiver operating characteristic analysis. Journal of Behavior Therapy and Experimental Psychiatry. 2003;34:25–43. doi: 10.1016/s0005-7916(03)00004-1.
- Boschen MJ, Oei TPS. Discriminant validity of the MASQ in a clinical sample. Psychiatry Research. 2007;150:163–171. doi: 10.1016/j.psychres.2006.03.008.
- Buckby JA, Yung AR, Cosgrave EM, Cotton AM. Distinguishing between anxiety and depression using the Mood and Anxiety Symptoms Questionnaire (MASQ). British Journal of Clinical Psychology. 2007;46:235–239. doi: 10.1348/014466506X132912.
- Buckby JA, Yung AR, Cosgrave EM, Killackey EJ. Clinical utility of the Mood and Anxiety Symptoms Questionnaire (MASQ) in a sample of young help-seekers. BMC Psychiatry. 2007;7:50. doi: 10.1186/1471-244X-7-50.
- Costa PT, McCrae RR. Revised NEO Personality Inventory (NEOPI-R) and Five Factor Inventory (NEO-FFI) Professional Manual. Odessa, FL: Psychological Assessment Resources; 1992.
- Chen WJ, Faraone SV, Biederman J, Tsuang MT. Diagnostic accuracy of the Child Behavior Checklist scales for attention-deficit hyperactivity disorder: A receiver-operating characteristic analysis. Journal of Consulting and Clinical Psychology. 1994;62:1017–1025. doi: 10.1037/0022-006X.62.5.1017.
- Clark LA, Watson D. Tripartite model of anxiety and depression: psychometric evidence and taxonomic implications. Journal of Abnormal Psychology. 1991;100:316–336. doi: 10.1037//0021-843x.100.3.316.
- DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: A non-parametric approach. Biometrics. 1988;44:837–845.
- du Fort GG, Newman SC, Bland RC. Psychiatric comorbidity and treatment seeking: Sources of selection bias in the study of clinical populations. Journal of Nervous and Mental Disease. 1993;181:467–474.
- First MB, Spitzer RL, Gibbon M, Williams JBW. Structured Clinical Interview for DSM-IV Axis I Disorders, Research Version, Non-Patient Edition. Biometrics Research, New York State Psychiatric Institute; New York, NY: 2002.
- Flament MF, Cohen D, Choquet M, Jeammet P, Ledoux S. Phenomenology, psychosocial correlates, and treatment seeking in Major Depression and Dysthymia of adolescence. Journal of the American Academy of Child & Adolescent Psychiatry. 2001;40:1070–1078. doi: 10.1097/00004583-200109000-00016.
- Green DM, Swets JA. Signal detection theory and psychophysics. New York: Wiley; 1966.
- Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36. doi: 10.1148/radiology.143.1.7063747.
- Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology. 1983;148:839–843. doi: 10.1148/radiology.148.3.6878708.
- Kendler KS. Is seeking treatment for depression predicted by a history of depression in relatives? Implications for family studies of affective disorder. Psychological Medicine. 1995;25:807–814. doi: 10.1017/s0033291700035054.
- Kessel JB, Zimmerman M. Reporting errors in studies of the diagnostic performance of self-administered questionnaires: Extent of the problem, recommendations for standardized presentation of results, and implications for the peer review process. Psychological Assessment. 1993;5:395–399.
- Kessler RC, Berglund P, Demler O, Jin R, Koretz D, Merikangas KR, Rush AJ, Walters EE, Wang PS. The epidemiology of major depressive disorder: Results from the National Comorbidity Survey Replication (NCS-R). Journal of the American Medical Association. 2003;289:3095–3105. doi: 10.1001/jama.289.23.3095.
- Kumar G, Steer RA, Teitelman KB, Villacis L. Effectiveness of Beck Depression Inventory–II subscales in screening for Major Depressive Disorders in adolescent psychiatric inpatients. Assessment. 2002;9:164–170. doi: 10.1177/10791102009002007.
- Lasa L, Ayuso-Mateos JL, Vázquez-Barquero JL, Díez-Manrique FJ, Dowrick CF. The use of the Beck Depression Inventory to screen for depression in the general population: A preliminary analysis. Journal of Affective Disorders. 2000;57:261–265. doi: 10.1016/s0165-0327(99)00088-9.
- Lewinsohn PM, Rohde P, Seeley JR. Major depressive disorder in older adolescents: prevalence, risk factors, and clinical implications. Clinical Psychology Review. 1998;18:765–794. doi: 10.1016/s0272-7358(98)00010-5.
- Luna-Herrera J, Marinez-Cabrera G, Parra-Maldonado R, Enciso-Moreno JA, Torres-Lopez J, Quesada-Pasqual F, et al. Use of Receiver Operating Characteristic curves to assess the performance of a microdilution assay for determination of drug susceptibility of clinical isolates of Mycobacterium tuberculosis. European Journal of Clinical Microbiology & Infectious Diseases. 2003;22:21–27. doi: 10.1007/s10096-002-0855-5.
- Nitschke JB, Heller W, Palmieri PA, Miller GA. Contrasting patterns of brain activity in anxious apprehension and anxious arousal. Psychophysiology. 1999;36:628–637.
- Nitschke JB, Heller W, Imig JC, McDonald RP, Miller GA. Distinguishing dimensions of anxiety and depression. Cognitive Therapy and Research. 2001;25:1–22.
- Ormel J, Rosmalen J, Farmer A. Neuroticism: a non-informative marker of vulnerability to psychopathology. Social Psychiatry and Psychiatric Epidemiology. 2004;39:906–912. doi: 10.1007/s00127-004-0873-y.
- Perkins NJ, Schisterman EF. The inconsistency of “optimal” cutpoints obtained using two criteria based on the receiver operating characteristic curve. American Journal of Epidemiology. 2006;163:670–675. doi: 10.1093/aje/kwj063.
- Rice ME, Harris GT. Violent recidivism: Assessing predictive validity. Journal of Consulting and Clinical Psychology. 1995;63:737–748. doi: 10.1037//0022-006x.63.5.737.
- Sloan DM, Marx BP, Bradley MM, Strauss CC, Lang PJ, Cuthbert BC. Examining the high-end specificity of the Beck Depression Inventory using an anxiety sample. Cognitive Therapy and Research. 2002;26:719–727.
- Starr AJ, Smith WR, Frawley WH, Borer DS, Morgan SJ, Reinert CM, et al. Symptoms of Posttraumatic Stress Disorder after orthopedic trauma. Journal of Bone and Joint Surgery. 2004;86-A:1115–1121. doi: 10.2106/00004623-200406000-00001.
- Swets JA, Pickett RM, Whitehead SF, Getty DJ, Schnur JA, Swets JB, et al. Assessment of diagnostic technologies: advanced measurement methods are illustrated in a study of computed tomography of the brain. Science. 1979;205:753–759. doi: 10.1126/science.462188.
- Thuiller W, Vayreda J, Pino J, Sabate S, Lavorel S, Gracia C. Large-scale environmental correlates of forest tree distributions in Catalonia (NE Spain). Global Ecology & Biogeography. 2003;12:313–325.
- Watson D, Clark LA. Negative affectivity: The disposition to experience aversive emotional states. Psychological Bulletin. 1984;96:465–490.
- Watson D, Clark LA, Weber K, Assenheimer JS, Strauss ME, McCormick RA. Testing a tripartite model: II. Exploring the symptom structure of anxiety and depression in student, adult, and patient samples. Journal of Abnormal Psychology. 1995;104:15–25. doi: 10.1037//0021-843x.104.1.15.
- Watson D, Weber K, Assenheimer JS, Clark LA, Strauss ME, McCormick RA. Testing a tripartite model: I. Evaluating the convergent and discriminant validity of anxiety and depression symptom scales. Journal of Abnormal Psychology. 1995;104:3–14. doi: 10.1037//0021-843x.104.1.3.
- Widiger TA, Hurt SW, Frances A, Clarkin JF, Gilmore M. Diagnostic efficiency and DSM-III. Archives of General Psychiatry. 1984;41:1005–1012. doi: 10.1001/archpsyc.1984.01790210087011.
- Youden WJ. An index for rating diagnostic tests. Cancer. 1950;3:32–35. doi: 10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.
- Zich JM, Attkisson C, Greenfield TK. Screening for depression in primary care clinics: The CES-D and the BDI. International Journal of Psychiatry in Medicine. 1990;20:259–277. doi: 10.2190/LYKR-7VHP-YJEM-MKM2.