PLOS One. 2020 Oct 26;15(10):e0241370. doi: 10.1371/journal.pone.0241370

The bi-factor structure of the 17-item Hamilton Depression Rating Scale in persistent major depression; dimensional measurement of outcome

Neil Nixon 1,2, Boliang Guo 1,3, Anne Garland 2, Catherine Kaylor-Hughes 1,3, Elena Nixon 1,3, Richard Morriss 1,3,*
Editor: Thach Duc Tran 4
PMCID: PMC7588071  PMID: 33104761

Abstract

Background

The 17-item Hamilton Depression Rating Scale (HDRS17) is used world-wide as an observer-rated measure of depression in randomised controlled trials (RCTs) despite continued uncertainty regarding its factor structure. This study investigated the dimensionality of HDRS17 for patients undergoing treatment in UK mental health settings with moderate to severe persistent major depressive disorder (PMDD).

Methods

Exploratory Structural Equation Modelling (ESEM) was performed to examine the HDRS17 factor structure for adult PMDD patients with HDRS17 score ≥16. Participants (n = 187) were drawn from a multicentre RCT conducted in UK community mental health settings evaluating the outcomes of a depression service comprising CBT and psychopharmacology within a collaborative care model, against treatment as usual (TAU). The construct stability across a 12-month follow-up was examined through a measurement equivalence/invariance (ME/I) procedure via ESEM.

Results

ESEM showed HDRS17 had a bi-factor structure for PMDD patients (baseline mean (sd) HDRS17 22.6 (5.2); 87% PMDD >1 year) with an overall depression factor and two group factors: vegetative-worry and retardation-agitation, further complicated by negative item loading. This bi-factor structure was stable over 12 months follow up. Analysis of the HDRS6 showed it had a unidimensional structure, with positive item loading also stable over 12 months.

Conclusions

In this cohort of moderate-severe PMDD the HDRS17 had a bi-factor structure stable across 12 months with negative item loading on domain specific factors, indicating that it may be more appropriate for multidimensional assessment of settled clinical states, with shorter unidimensional subscales such as the HDRS6 used as measures of change.

Introduction

The 17-item Hamilton Depression Rating Scale (HDRS17), which was developed in the late 1950s, has been the most frequently used observer-rated measure of depression for research, including randomised controlled trials (RCTs) of treatments for depression [1–3]. The positive and negative features of the HDRS17 have been comprehensively reviewed [4–6]. One of its most serious problems, poor inter-rater and test-retest reliability, has been addressed with the development of the GRID-HDRS17 version [5]. Overall, despite its flaws, it continues to be recommended by licensing and treatment guideline bodies such as the Food and Drug Administration in the US [7] and the National Institute for Health and Care Excellence [8] because of its longitudinal continuity for historical comparison in more than 1500 randomised controlled trials, its widespread use for meta-analysis and the lack of a superior measure despite many attempts and considerable resources, including from the National Institute of Mental Health and the World Health Organisation [4–6].

However, concerns persist about the widespread use of the HDRS17 as a unidimensional measure of depression severity, given indications that it has a more complex factor structure that is not fully captured by a single, total score [9–13]. Evidence supporting a multidimensional structure has been reviewed by Fried et al [9] and demonstrated across different methodologies, including hierarchical confirmatory factor analysis (CFA) showing a general 2nd order depression factor [13] and exploratory factor analysis in ‘treatment naïve’ mainly non-persistent depression [14, 15]. Fried et al [9] went further to show that this multifactorial structure became more pronounced as depression severity increased, indicating the potential importance of assessing HDRS17 structure in clearly defined clinical groups identified by levels of persistence and severity.

In fact, since Hamilton’s original factor analysis [2], relatively little work has been done on the HDRS17 factor structure in patient groups with more severe, persistent major depressive disorder (PMDD) under treatment in mental health settings. Findings have instead emerged from a variety of other clinical settings [11] and populations, including people whose primary health problem was not depression [12]. Since the nature and complexity of depression has been shown to vary widely across these populations, including the degree of persistence, melancholia, anxiety and other associated co-morbidity [16–18], it follows from Fried et al [9] that these findings cannot be assumed to give a true impression of how the HDRS17 functions within more severe PMDD. Additionally, methods of statistical analysis have changed over the 60 years since Hamilton’s original work, and earlier reports often lacked the more robust analytical approach now available for establishing factor structure through Exploratory Structural Equation Modelling (ESEM) [19].

Current modelling via ESEM also allows assessment of the related issue of measurement invariance, assessed through the consistency of construct measurement across time. Whilst Fried et al [9] showed this was generally poor for a range of depression measures including the HDRS17, more recent work using ESEM has shown invariance over 12 months for a patient-completed outcome measure in more severe PMDD (the 9-item Personal Health Questionnaire; PHQ-9) [15, 20]. However, there has been no equivalent assessment of clinician outcome measures, such as the HDRS17, in this PMDD population.

Previous statistical approaches assessing the dimensionality of the HDRS17 have included both exploratory factor analysis (EFA) and CFA [13]. However, recent literature has shown that both EFA and CFA have methodological limitations [21, 22]. In EFA modelling it is impossible to incorporate latent EFA factors into subsequent analyses and it is not easy to test measurement invariance across groups and/or times [22]. In CFA modelling, each item is allowed to load on only one factor and all non-target loadings are constrained to zero. ESEM, the most recent factor analytic approach, integrates the best features of both: it applies EFA rigorously to specify the underlying factor structure more appropriately, together with the advanced statistical methods typically associated with CFA [22]. ESEM allows items to cross-load on different latent factors where this is coherent with the underlying theory and/or item content. It can thereby reduce the bias in parameter estimates that arises from the zero-loading restriction, which generally inflates CFA factor correlations because items are rarely perfect factor indicators and usually have some degree of irrelevant association with other constructs [22–24].

Following on from these findings, a bi-factor model exploring psychometric multidimensionality within ESEM is now recommended, rather than the traditional second order factor analytical model, in order to explore whether the HDRS17 has a general depression factor and additional domain specific factors [23, 25, 26]. Compared with the hierarchical factor analysis model (Fig 1), bi-factor models have statistical advantages such as fitting data better and allowing external prediction by group factors with or without overall factors [27]. Conceptually, as group factors in a bi-factor model are not subsumed by the overall factor [28], they represent factors explaining item variance that was not accounted for by the overall factor [27]. The group specific factors therefore have influence over and above the general factor, which might help explain the clinical heterogeneity observed among individual patients with depression [29], providing valuable clarity for future research and practice.

Fig 1. Schematic example of 2nd order factor and bi-factor model: G = general factor, F = group factor.
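The contrast drawn in Fig 1 can be written out as a pair of measurement equations. This is a generic sketch in assumed notation (the symbols λ, γ, ε, ζ and the index k(i) for the group factor of item i are not taken from the paper), and it omits the cross-loadings on non-target group factors that ESEM additionally estimates.

```latex
% Second-order model: the general factor G reaches item x_i only through
% the first-order group factor F_{k(i)} to which the item belongs.
x_i = \lambda_i F_{k(i)} + \varepsilon_i, \qquad
F_k = \gamma_k G + \zeta_k

% Bi-factor model: G and the group factor both load directly on the item,
% and the group factors are specified as orthogonal to G.
x_i = \lambda_i^{G} G + \lambda_i^{F} F_{k(i)} + \varepsilon_i, \qquad
\operatorname{Cov}(G, F_k) = 0
```

In the second form the group-factor loading captures only the item variance left over once the general factor has been accounted for, which is the sense in which group factors are "not subsumed" by the overall factor.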


There is therefore an opportunity and need to re-assess the factor structure and measurement invariance of the HDRS17 in more severe PMDD, made more pressing by the fact that treatment guidelines, including those currently in preparation [30], continue to use the HDRS17 as a single total score across a range of depression severity and persistence.

We address this issue here via ESEM bi-factor modelling in a well-defined patient population with moderate to severe PMDD, recruited from UK mental health care settings in a previously published RCT [18], assessing construct stability across 12-month follow-up using a measurement equivalence/invariance (ME/I) procedure. The chosen 12-month period is clinically relevant given the extended clinical treatment often necessary in patients with PMDD.

Materials and methods

Patients and instruments

Patients (N = 187) were drawn from a multicentre pragmatic randomised controlled trial (RCT) evaluating outcomes of a Special Depression Service (SDS; specialist pharmacotherapy and psychotherapy within a collaborative care model) against treatment as usual (TAU) within UK mental health services [18]. At the time of recruitment participants were all adults receiving community treatment for persistent depression in one of three UK mental health centres (Nottingham, Derby and Cambridge). Ethics approval was obtained from the Trent Research Ethics Service in Derby, England. Approval number 09/H0405/42. Oral and written informed consent was obtained from each participant.

Participants were eligible for the study if they: were thought by the referrer to have primary unipolar depression; were aged 18 years or over; were able and willing to give oral and written informed consent to participate in the study; had been offered or received direct and continuous care from one or more health professionals in the preceding 6 months and were currently under the care of a secondary care mental health team; had a diagnosis of major depressive disorder with a current major depressive episode according to the structured clinical interview for DSM-IV (SCID) [31]; met five of nine NICE criteria for symptoms of moderate depression; had a score of ≥16 on the 17-item GRID version of the Hamilton Depression Rating Scale (HDRS17) [5]; and had a Global Assessment of Functioning (GAF) [32] score ≤ 60. Referrals were excluded if they: were in receipt of emergency care for suicide risk; were at risk of severe neglect, or posed a homicide risk, unless that risk was adequately contained in their current care setting; were not fluent English speakers; were pregnant; or had unipolar depression secondary to a primary psychiatric or medical disorder. The exception was bipolar disorder identified by the research team after referral with unipolar depression, because an SDS would be expected to manage bipolar depression in clinical practice (n = 8, 4.3%).

The mean age of patients was 46.8 years (sd 11.4) and 61.1% (114 of total 187) were female. Following randomisation 93 (49.7%) patients were allocated to the SDS treatment arm and 94 (50.3%) to treatment as usual (TAU). In the treatment arm, participants received specialist pharmacological and cognitive behaviour therapy within a collaborative care model structured and planned over 12 months. TAU comprised multidisciplinary, community-based care delivered by general mental health services. The primary clinical outcome measure in this trial was the HDRS17 assessed at baseline, 6 and 12 month follow up time points [33]. One hundred and sixty-three (87%) participants entering the RCT had suffered depression for more than 1 year, with a median (interquartile range) duration of the current episode of 6.5 (2.6–16.0) years. The mean (sd) HDRS17 score at baseline was 22.6 (5.2). Melancholia was present in 105 (56.1%) participants and 146 (78.1%) also had a comorbid anxiety disorder. The study design, data collection procedures, treatment offered and trial results can be found in the published protocol [33] and trial report [18].

The HDRS17 evaluates depression severity through items on: 1) depressed mood, 2) guilt, 3) suicidal thought or action, 4) insomnia initial, 5) insomnia middle, 6) insomnia late, 7) work and interests (assessing pleasure and functioning), 8) motor retardation, 9) motor agitation, 10) psychic anxiety, 11) somatic anxiety, 12) appetite, 13) tiredness, 14) sexual interest, 15) hypochondriasis, 16) weight loss, 17) insight. Among these 17 items, 9 items are scored on a 5-point scale (0–4) and 8 items on a 3-point scale (0–2), with higher scores indicating greater depressive severity for all items. In keeping with current practice, the total item score was used to quantify the severity of depression and treatment effect estimates in the RCT [2].
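The total item score described above can be sketched as a simple validated sum. The text states only that 9 items are scored 0–4 and 8 items 0–2; which items fall in which group is an assumption here, following the commonly used layout of the scale, and the item keys and function names are hypothetical.

```python
# Maximum score per item. ASSUMPTION: the allocation of items to the 0-4 and
# 0-2 ranges follows the commonly used HDRS17 layout; the source gives only
# the counts (9 items scored 0-4, 8 items scored 0-2).
ITEM_MAX = {
    "depressed_mood": 4,
    "guilt": 4,
    "suicide": 4,
    "insomnia_initial": 2,
    "insomnia_middle": 2,
    "insomnia_late": 2,
    "work_and_interests": 4,
    "motor_retardation": 4,
    "motor_agitation": 4,
    "psychic_anxiety": 4,
    "somatic_anxiety": 4,
    "appetite": 2,
    "tiredness": 2,
    "sexual_interest": 2,
    "hypochondriasis": 4,
    "weight_loss": 2,
    "insight": 2,
}


def hdrs17_total(ratings):
    """Sum the 17 item ratings after checking each is present and in range."""
    missing = sorted(set(ITEM_MAX) - set(ratings))
    if missing:
        raise ValueError(f"missing items: {missing}")
    total = 0
    for item, max_score in ITEM_MAX.items():
        score = ratings[item]
        if not 0 <= score <= max_score:
            raise ValueError(f"{item} must be between 0 and {max_score}")
        total += score
    return total


def meets_entry_threshold(total, threshold=16):
    """Trial eligibility required an HDRS17 total of 16 or more."""
    return total >= threshold
```

Under these assumed item ranges the maximum possible total is 52, so the trial entry threshold of 16 sits a little below one third of the scale range.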

Statistics

We first examined the frequency of patients’ responses on each HDRS17 item across three time points (baseline, 6 and 12 months). ESEM was then used to explore the factor structure of the HDRS17 [22]. With reference to existing evidence on the factor structure of the HDRS17, we tested separately one to five first order factors and also bi-factor models with two or three domain specific factors for data measured at each time point. Data measured at each time point were stored in wide format for ESEM modelling, with like items measured at adjacent time points allowed to correlate to take into account the non-independence of data due to the longitudinal design [34]. Ordinal item scores were analysed with the WLSMV estimator using Delta parameterization; missing values were automatically accounted for using the full-information maximum likelihood approach built into Mplus [35, 36]. Measurement invariance across all follow-up time points for the best fitted factor structure was further tested using ESEM by comparing the fit of the configural invariance model and the scalar invariance (item factor loading and item threshold invariance) model [9, 34, 37]. All ESEM models were performed using Mplus 8 and, in keeping with standard practice, correlation between item residuals was set to 0 [37].
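The wide-format layout mentioned above, one row per patient with a separate column for each item at each time point, can be illustrated with a small reshaping sketch. The record layout, column naming scheme and function name are hypothetical and are not taken from the trial dataset or from Mplus.

```python
def long_to_wide(records, months=(0, 6, 12)):
    """Reshape (patient_id, month, item, score) records to one row per patient.

    Columns are named '<item>_m<month>' (a hypothetical naming scheme).
    Assessments that were not recorded are left as None so that missingness
    stays explicit for the downstream model.
    """
    records = list(records)
    items = sorted({item for _, _, item, _ in records})
    wide = {}
    for patient_id, month, item, score in records:
        if month not in months:
            raise ValueError(f"unexpected time point: {month}")
        # Create the patient's row with every item-by-month column on first sight.
        row = wide.setdefault(
            patient_id,
            {f"{i}_m{m}": None for i in items for m in months},
        )
        row[f"{item}_m{month}"] = score
    return wide
```

With this layout, the same item at adjacent time points (for example `guilt_m0` and `guilt_m6`) sits in separate columns of the same row, which is what allows their association to be modelled directly.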

Several fitting indices were used alongside the chi-square (χ2) test to judge model fit, as χ2 tests are sensitive to large sample sizes and non-normal data [38]. The criteria were a comparative fit index (CFI) and a non-normed fit index (NNFI) both > 0.90 and a Root Mean Square Error of Approximation (RMSEA) < 0.08 [39]. The factor loading and item-factor mapping pattern were additionally examined by two senior psychiatrists (RM, NN) to ensure the factor structure was clinically plausible and meaningful. Model comparisons were evaluated by reference to the χ2 change test, using the Mplus DIFFTEST function to conduct χ2 difference tests, as the WLSMV estimator was used to analyse ordinal item scores [37]. However, χ2 change tests are influenced by sample size and data non-normality [34, 40, 41], whereas the CFI change is independent of both model complexity and sample size and is not correlated with the overall fit measurements. A reduction of 0.01 or more in CFI suggests the null hypothesis of no difference should be rejected [41]. We therefore mainly judged model improvement on the CFI change [34, 41]. A number of specific modelling details are presented alongside the results.
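The two decision rules stated above, the absolute fit cut-offs and the CFI change criterion for invariance, can be expressed as a short sketch. The function names are hypothetical; the thresholds are those given in the text, and rounding to three decimals (the precision at which the indices are reported) is an added assumption to keep floating-point noise out of borderline comparisons.

```python
def acceptable_fit(cfi, nnfi, rmsea):
    """Cut-offs stated in the text: CFI and NNFI > 0.90, RMSEA < 0.08."""
    return cfi > 0.90 and nnfi > 0.90 and rmsea < 0.08


def invariance_retained(cfi_less_constrained, cfi_more_constrained, threshold=0.01):
    """Retain the more constrained model unless CFI falls by `threshold` or more.

    The drop is rounded to 3 decimals, matching the reported precision
    (an assumption of this sketch, not a step described in the source).
    """
    drop = round(cfi_less_constrained - cfi_more_constrained, 3)
    return drop < threshold
```

Applied to the values reported in the Results, the HDRS17 configural and scalar models (CFI .917 and .916) give a drop of .001, so invariance is retained. For the HDRS6, the fully constrained scalar model (CFI .736 against .934) is rejected, while the partially freed scalar model (CFI .925) is retained.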

Results

Frequency of item response

The frequency of each item response by arm across measurement time is presented as an appendix. There was an extreme response pattern for the item “insight loss”, for which all but one response was recorded as 0 across measurement time. With effectively zero variability, this item could not be included in ESEM modelling. Hence all ESEM models in this study were performed using 16 items.
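A screening step of this kind can be sketched as a check for items whose responses are almost all identical. The function name, data layout and the default tolerance of one deviating response are hypothetical choices for illustration, not the procedure used by the authors.

```python
from collections import Counter


def degenerate_items(responses, max_minority=1):
    """Flag items with (almost) no variability in their observed responses.

    `responses` maps an item name to a list of scores, with None marking a
    missing rating. An item is flagged when at most `max_minority` observed
    responses differ from the most frequent response (hypothetical cut-off).
    """
    flagged = []
    for item, scores in responses.items():
        observed = [s for s in scores if s is not None]
        if not observed:
            flagged.append(item)
            continue
        modal_count = Counter(observed).most_common(1)[0][1]
        if len(observed) - modal_count <= max_minority:
            flagged.append(item)
    return flagged
```

An item such as “insight loss”, where every response but one is 0, would be flagged under the default setting, whereas an item with responses spread across several categories would not.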

HDRS17 factor structure

Model fitting indices for structures with one to five first order factors, and for bi-factor models with two or three domain specific factors, are shown for measures at each time point in Table 1. Although model fit improved with an increased number of latent factors, the item-factor association mapping showed that the bi-factor model with two domain specific factors (bi-2factor) had the most meaningful factor structure in terms of model fitting and item-factor mapping pattern. A similar pattern was shown when all models in Table 1 were rerun with like item loadings set to be equal across measurement time (Table 2). The item-factor association mapping also showed that a bi-factor model with two domain specific factors (Table 4) had the most meaningful factor structure (Table 3). The factor loading pattern shown in Table 3 suggested that the HDRS17 measured a general depression factor for patients with moderate-severe PMDD, which comprised all items except “motor retardation”, together with two group factors. The first was a vegetative-worry factor comprising positively loading items “insomnia” (early, middle and late), “weight loss” and “appetite loss” and negatively loading items “psychic anxiety” and “hypochondriasis”. The second was a retardation-agitation factor comprising positively loading items “motor retardation”, “depressed mood”, diminished pleasure (“work and interests”) and “suicidal thoughts”, with a negative loading for “agitation”. Item factor loadings for all models shown in Table 2 are presented as supplementary material (appendix).

Table 1. Model fitting indices for models with different 1st order and bi-factor structures.

Model χ2(df),p = RMSEA CFI NNFI ΔCFI Δχ2(df),p =
1-factor 1375.249(1045), 0.000 .041 .812 .797
2-factor 1213.822(991), 0.000 .035 .873 .856 .61 159.470(54), 0.000
3-factor 1079.051(934), 0.001 .029 .917 .900 .44 142.193(57), 0.000
Bi-2factor# 1079.051(934), 0.001 .029 .917 .900 142.193(57), 0.000
4-factor 949.400(874), 0.038 .021 .957 .945 .40 139.142(60), 0.000
Bi-3factor# 949.400(874), 0.038 .021 .957 .945 139.142(60), 0.000*
5-factor 839.582(811), 0.236 .014 .984 .977 .27 124.888(63), 0.000

#Bi-2(3) factor model has same fitting indices as 3(4) factor model.

*Comparing with bi-2factor model.

Table 2. Model fitting indices for various models with equal loading across measurement time.

Model χ2(df),p = RMSEA CFI NNFI ΔCFI Δχ2(df),p =
1-factor 1372.823(1075), 0.000 .038 .831 .822
2-factor 1247.154(1047), 0.000 .032 .886 .877 111.539(28), 0.000
3-factor 1143.753(1012), 0.002 .026 .925 .916 .49 106.877(35), 0.000
Bi-2factor# 1143.753(1012), 0.002 .026 .925 .916 .49 106.877(35), 0.000
4-factor 1058.244(970), 0.025 .022 .950 .942 .25 90.859(42), 0.000
Bi-3factor# 1058.244(970), 0.025 .022 .950 .942 .25 90.859(42), 0.000*
5-factor 969.202(921), 0.131 .017 .973 .966 .17 103.371(49), 0.000

#Bi-2(3) factor model has same fitting indices as 3(4) factor model.

*Comparing with bi-2factor model.

Table 4. Fitting indices of ME/I across measurement time.

Model χ2(df),p = RMSEA CFI NNFI ΔCFI Δχ2(df),p =
Configural 1079.051(934), 0.001 .029 .917 .900
Scalar 1201.002(1053),0.001 .027 .916 .910 .001 147.674 (119), p = 0.038

Table 3. Factor loading of best fitted model.

Item Vegetative-worry General depression Retardation-agitation
depressed mood -0.072 0.440 0.342
guilt feeling -0.069 0.391 0.130
suicidal thoughts -0.006 0.381 0.260
insomnia initial 0.478 0.181 -0.018
insomnia middle 0.636 0.239 0.072
insomnia delayed 0.465 0.192 0.043
work & interests 0.111 0.380 0.322
motor retardation 0.097 0.054 0.601
Agitation -0.036 0.336 -0.366
psychic anxiety -0.302 0.546 -0.003
somatic anxiety -0.098 0.486 0.008
appetite decrease 0.281 0.399 -0.070
Tiredness 0.07 0.519 0.069
sexual interest -0.008 0.268 0.122
Hypochondriasis -0.259 0.328 -0.106
weight loss 0.386 0.352 -0.351

# estimate in bold statistically significant at p<0.05.

Stability of factor structure across measure time

The fitting indices of ME/I test models for configural and scalar invariance across measurement time are presented for comparison in Table 4. They indicate that the scalar invariant model should be retained, as the CFI drop is 0.001, with a χ2 increase of 147.674 (df = 119), p = 0.038. These results indicate that the bi-2factor structure is stable through follow up from baseline to 6 and 12 months.

In view of this stable but complex bi-2factor structure, including negative item loadings on both domain specific factors, we conducted a further post-hoc analysis of the most commonly used HDRS subscale, the HDRS6 in the same cohort to investigate its potential as an alternative change measure to the full HDRS17 in moderate-severe PMDD [42]. The HDRS6 comprises 6 items: depressed mood, work and interests (pleasure), general somatic (tiredness), psychic anxiety, guilt feelings and psychomotor retardation; and since it was not plausible to perform an exploratory analysis testing a model with 1 to 3 factors on a 6-item scale, we instead used a one factor model to test its unidimensional factor structure. Results given in Tables 5 and 6 show that all 6 items of the HDRS6 subscale loaded positively and significantly, with time invariance; supporting this as a stable, unidimensional outcome measure in moderate-severe PMDD, in contrast to the 17-item scale.

Table 5. Factor loading for HDRS6 subscale.

Item HDRS6
Depressed Mood .544*
Work and Interests .526*
General Somatic (Tiredness) .474*
Psychic Anxiety .417*
Guilt Feelings .396*
Psychomotor retardation .407*

* all loading estimates statistically significant at p<0.01.

Table 6. Fitting indices of ME/I across measurement time, HDRS6 subscale.

Model χ2(df),p = RMSEA CFI NNFI ΔCFI Δχ2(df),p =
Configural 193.266(120),0.000 .057 .934 .913
Scalar a 459.218(165),0.000 .098 .736 .755 259.790(45), p = 0.000
Scalar b* 229.625(146),0.000 .055 .925 .921 -.009 44.883(26), p = 0.012

* scalar b model freed 24 of 55 (43%) threshold parameters estimates.

Discussion

In light of findings that the HDRS17 is not a unidimensional measure of depression [9, 14, 43, 44], that the factor structure may differ between clinical populations [9] and may not be stable over time, we aimed to assess the HDRS17 in a well-defined group of patients with moderate to severe PMDD, using contemporary ESEM modelling. Consistent with much of this earlier work, our results in moderate-severe PMDD showed that the HDRS17 had a bi-factor, rather than unidimensional structure. We additionally showed that this structure was time-invariant through the full 12-month period of study. The bi-factor structure comprised a general depression factor and two domain specific factors, which we refer to as ‘vegetative-worry’ and ‘retardation-agitation’. The bi-factor structure was further complicated by the two domain specific factors including both positively and negatively loading items, problematising use of the HDRS17 as an outcome measure in moderate to severe PMDD. Even allowing for multiple domain scoring within a bi-factor structure, we are left with the problem of incorporating domain factor items with opposite directionality. This problem was previously encountered during development of the 6-item subscale (HDRS6), where agitation was excluded due to reciprocal interaction with the other items [45]. Opposite directionality is not surprising when applying the GRID-HDRS17 to severe PMDD, in which severe retardation is described as ‘all movements very slowed’ and severe agitation as ‘cannot sit still…pacing’ [5].

Given these findings on the complex multidimensional structure of the HDRS17 in moderate-severe PMDD and the associated question of its legitimacy as an outcome measure for this patient group, we ran a further post-hoc analysis of the most commonly used 6-item subscale to test its dimensionality and potential as an alternative measure of change to the 17-item scale [42]. The HDRS6 subscale was derived through item analysis of the HDRS17 against global assessment of depression by experienced psychiatrists and it has already demonstrated a unidimensional structure in some clinical populations [43, 45]. Our results confirm this unidimensionality in moderate-severe PMDD, additionally showing time-invariance over 12 months; supporting use of the HDRS6 as an appropriate outcome measure in this group. In contrast our findings on the HDRS17 do not support its use in this way.

What then for the 17-item scale? Firstly, it seems likely that this was initially conceived as a state measure, rather than a measure of change [2]. Its more complex structure, including concepts now understood as near polar opposites (e.g. agitation and retardation as operationalised in the GRID-HDRS17), may still be more relevant to the assessment of settled clinical states, where the domain factors we have identified may further clarify depression type, acting as predictor variables to assist development of treatment strategies [45]. A patient loading high on worry (psychic anxiety, hypochondriasis) rather than vegetative disturbance (sleep, appetite, weight) may, for example, benefit from more targeted initial clinical interventions reflecting this delineated state rather than non-specific depression treatments [46]. The HDRS17 might then be repeated later on for this individual, not as a measure of change, but to re-conceptualise a later settled state (such as a limited but stable treatment response) in order to develop next-step treatment strategies. In this model, outcome change would be assessed through more parsimonious, evidence-based item-sets, such as the HDRS6.

This approach seems in keeping with the initial history of the Hamilton scale. An awareness of the multidimensionality of the HDRS17 dates back 60 years to Hamilton’s original work, also based in observations on patients suffering severe depression within mental health treatment; identifying four hierarchical factors (“general”, “endogenous”, “anxious” and “insomnia”) that show parallels with the bi-factor model derived here; including a main “general depression” factor, a retarded-depressed factor, a broadly vegetative factor and a separate factor including psychic anxiety [2]. Subsequent use of the HDRS17 to report a single, total item score risks missing the potential richness and purpose of this scale; confirmed again by the ESEM structure presented here. Similarly, use of the HDRS17 to measure change seems both unintended and unsupported by the growing evidence base.

The strengths of our study include a well characterised sample; the systematic application of a standardised interview version of the HDRS17; the multicentre design; and assessment at three time points across 12 months with adequate retention. The systematic application of both psychiatric and psychological treatment over this time period in one group versus usual care provided both a test of the robustness of the factor structure of the HDRS17 and data from a broad group within UK mental health service care. Analysis included the first use of the most advanced ESEM modelling, which allows cross factor loading and bi-factor modelling to simultaneously explore the overall latent factor and specific sub-factors for PMDD patients’ HDRS17 measures, incorporating the ME/I test of invariance [22, 23, 40].

Our findings on the HDRS17 and HDRS6 are however limited to a single UK cohort of patients with moderate-severe PMDD. Given previous findings that factor structure may change with clinical characteristics of depression, such as severity [9], we do not presume that the same structure holds for other populations with less persistent, complex or severe depression. This caution fits with recognised features of PMDD, such as rumination/worry [47] and high comorbidity (e.g. 78.1% of the current cohort had a separate anxiety disorder), which may not be present in less severe, less persistent depression. Equally, psychomotor disturbance (through agitation or retardation) identified within our PMDD cohort may be much less prevalent in patients recruited from primary care or other general medical settings [1, 17]. It seems quite plausible in this regard that a different factor structure may emerge in these different clinical groups and, whilst our preliminary findings in PMDD remain important, they cannot be assumed to generalise. Rather, important differences between clinical groups may be reflected in real changes to the underlying factor structure of the measurement tool, accounting for some observed differences between this and earlier studies conducted in predominantly non-persistent depression [14]. Other important limitations include the lack of a specific power calculation for the purposes of the current analysis [18, 33], though the sample size was sufficient to perform factor analysis modelling based on previous work on the methodology used here [48]; and the 40 per cent attrition over 12 months of follow up, though again this left a sufficient sample for invariance analysis.

Data from the current study could be meta-analysed in future with other studies with similar designs and analysis methods to provide more robust results on the HDRS17 factor structure in patients with moderate-severe PMDD.

Conclusions

These preliminary findings in patients with moderate-severe PMDD indicate the HDRS17 has a bi-factorial structure characterised by a general depression factor with two additional factors, ‘vegetative-worry’ and ‘retardation-agitation’. This conceptual structure was found to be relatively stable across a 12-month follow up period but negative item loading on the HDRS17 domain specific factors does not support its use as an outcome measure in this clinical population. Instead, the HDRS17 may be more appropriate in the multidimensional assessment of settled clinical states, helping to guide targeted interventions; with shorter unidimensional subscales such as the HDRS6 used as measures of change.

Supporting information

S1 Data

(CSV)

S2 Data

(DOCX)

Acknowledgments

We wish to acknowledge the contribution of all the staff who obtained the data and participants in the original randomised controlled trial from which this analysis was conducted.

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

The data collected for this report was funded by 2 centre grants from the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Research and Care Nottinghamshire, Derbyshire and Lincolnshire (awarded to Manning N, Morriss R, Cooke M, Currie G, Hollis C, Kai J, Schneider J, Walker M), and the NIHR Collaboration for Leadership in Applied Research and Care East Midlands (awarded to Khunti K, Morriss R, Singh S, Gladman J, Waring J). Since these are centre grants they do not have grant numbers but are identified by the name of the grant. The grants were awarded by the National Institute for Health Research, whose URL is https://www.nihr.ac.uk. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Decision Letter 0

Thach Duc Tran

5 May 2020

PONE-D-20-01015

The bi-factor structure of the 17-item Hamilton Depression Rating Scale with persistent major depression disorder in specialist mental health services.

PLOS ONE

Dear Prof Morriss,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Jun 19 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Thach Duc Tran, M.Sc., Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I welcome the initiative of the authors to explore psychometric properties of the HDRS17 as, despite generally recognized shortcomings of this scale, it is the gold standard for measuring depression. The manuscript is well written and I agree with all of the strengths outlined on lines 295-304. As a manuscript about the dimensionality of the HDRS17 I have little to criticize. However, it is a grave scientific weakness of the paper that it makes wide claims about valid measurement, which implies much more than dimensionality. As this issue is so pivotal in this paper, and as it is bound to mislead the readership of the journal on a very central issue to psychometrics, I will focus on this issue in the following.

Throughout the paper, one finds an apparent conflation between the summary scale score (i.e. the official, conventional and very simple way of deriving a single total score for the whole HDRS17) and the highly elaborate, multidimensional person estimates derived from the structural model. We see a hint of this in the introduction, where the authors express concern that the total score of the HDRS17 may not measure the severity of a general depression factor. Yet they show no interest in testing a unidimensional measure. In the discussion (line 281), the authors state that "Such consistency in the factor structure of the HDRS17 [across two samples] would suggest that the HDRS17 is a valid measure of the general severity of depression in such settings". Here the authors again disregard the fact that the HDRS17 is conventionally scored based on the assumption of unidimensionality. However, the most manifest collision between the two confused concepts (the HDRS as a summary scale and the HDRS as it is modelled in ESEM) is found in the conclusions, line 319: "The total score of items on this measure [the HDRS17] are valid at each time point and over time". This is fundamentally wrong; a total score cannot be valid if it reflects more than one dimension. Also, the authors use a model without tau-equivalence, so their claim does not apply to the raw total score of the HDRS17, placing their validity claims far afield from the actual use of the HDRS17. Additionally, a valid total score builds on the assumptions of local independence and the absence of differential item functioning.

If the authors wish to say anything about the validity of the raw total score of the HDRS-17, they need to test (or at least evaluate) all the above assumptions. If, on the other hand, the authors wish to revolutionize the way the HDRS-17 is used, they need to explain to the reader how their model can be used for the purpose of valid measurement (i.e. entering every item score into a computer program which will then return three measures for each individual) and discuss benefits and limitations of this use over alternative uses (e.g. the conventional HDRS-17 summary score as well as shortened subscales with the potential for a statistically sufficient raw score, such as the HamiltonD6/melancholia subscale). Alternatively, if the authors are only concerned with the dimensionality of the scale, they should clearly state this as their aim and they should clarify to the reader that their results do not support most of the assumptions underlying the current use of the HDRS17.

Reviewer #2: 1) The authors identified individuals recruited in a specialist mental health setting. It would be beneficial to understand this population better. Prevalence and severity of depression would help to identify why other measures, such as the BDI-II or the PHQ-9, were not chosen.

2) The authors mention inconsistencies of the HDRS17 factor structures even when compared to psychiatric interviews. The study’s methodology included the Structured Clinical Interview (SCID) for DSM-IV. Analytical techniques using the SCID and the HDRS could have shown some consistency and validity of the data in terms of construct validity and reliability of the measure. Some processes known to be influential in this area of research seem to be missing. The study would benefit from analyses of reliability and validity across time points.

3) The authors evidenced fairly steady drops in the level of severity of depression. It would be beneficial to understand if treatment setting was moderating or mediating these values. Treatment with a combination of CBT and a psychopharmacological agent has shown the greatest reduction in depression severity. The authors' study showed a greater reduction in retardation-agitation factor scores than in worry-insomnia factor scores. An elaboration of these results would be beneficial, especially since factor loadings on retardation-agitation are significant for depressed mood, suicidal thoughts, and agitation. It would also have been helpful if the results were compared with other depression measures such as the PHQ-9 and BDI-II, to gain a better understanding of measurement precision and score comparability. The study is limited to the HDRS17, and it is difficult to really assess and control whether the HDRS is capturing the whole spectrum of depression; it would be beneficial to apply some classical item response analysis.

4) Based on the above responses, the novelty of the research study and the inconsistencies related to the HDRS, the title of this study would best present it as a preliminary study of results.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Erik Vindbjerg

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: HDRS17.SL.docx

PLoS One. 2020 Oct 26;15(10):e0241370. doi: 10.1371/journal.pone.0241370.r002

Author response to Decision Letter 0


18 Jun 2020

Response to PLOS ONE reviewers.

Reviewer comments in normal font and authors’ replies in bold and italics.

Reviewer #1: I welcome the initiative of the authors to explore psychometric properties of the HDRS17 as, despite generally recognized shortcomings of this scale, it is the gold standard for measuring depression. The manuscript is well written and I agree with all of the strengths outlined on lines 295-304. As a manuscript about the dimensionality of the HDRS17 I have little to criticize. However, it is a grave scientific weakness of the paper that it makes wide claims about valid measurement, which implies much more than dimensionality. As this issue is so pivotal in this paper, and as it is bound to mislead the readership of the journal on a very central issue to psychometrics, I will focus on this issue in the following.

Throughout the paper, one finds an apparent conflation between the summary scale score (i.e. the official, conventional and very simple way of deriving a single total score for the whole HDRS17) and the highly elaborate, multidimensional person estimates derived from the structural model. We see a hint of this in the introduction, where the authors express concern that the total score of the HDRS17 may not measure the severity of a general depression factor. Yet they show no interest in testing a unidimensional measure. In the discussion (line 281), the authors state that "Such consistency in the factor structure of the HDRS17 [across two samples] would suggest that the HDRS17 is a valid measure of the general severity of depression in such settings". Here the authors again disregard the fact that the HDRS17 is conventionally scored based on the assumption of unidimensionality. However, the most manifest collision between the two confused concepts (the HDRS as a summary scale and the HDRS as it is modelled in ESEM) is found in the conclusions, line 319: "The total score of items on this measure [the HDRS17] are valid at each time point and over time". This is fundamentally wrong; a total score cannot be valid if it reflects more than one dimension. Also, the authors use a model without tau-equivalence, so their claim does not apply to the raw total score of the HDRS17, placing their validity claims far afield from the actual use of the HDRS17. Additionally, a valid total score builds on the assumptions of local independence and the absence of differential item functioning.

If the authors wish to say anything about the validity of the raw total score of the HDRS-17, they need to test (or at least evaluate) all the above assumptions. If, on the other hand, the authors wish to revolutionize the way the HDRS-17 is used, they need to explain to the reader how their model can be used for the purpose of valid measurement (i.e. entering every item score into a computer program which will then return three measures for each individual) and discuss benefits and limitations of this use over alternative uses (e.g. the conventional HDRS-17 summary score as well as shortened subscales with the potential for a statistically sufficient raw score, such as the HamiltonD6/melancholia subscale). Alternatively, if the authors are only concerned with the dimensionality of the scale, they should clearly state this as their aim and they should clarify to the reader that their results do not support most of the assumptions underlying the current use of the HDRS17.

We are grateful for the comments of Reviewer 1, which stimulated considerable further thought. Following this we have substantially rewritten the introduction (paragraphs 2-5 and 7-9 of introduction) and discussion (paragraphs 1-6, 6 and conclusion) sections, focusing on the factor structure of the scale as a multidimensional measure. In keeping with this we have removed the previous references to ‘validity’ of the total score and have also removed the other specific lines identified.

We have presented our results as a dimensional approach to the measurement of outcome in one UK cohort of patients with moderate-severe persistent major depression. In the discussion we have noted the importance of replication within other similar cohorts, and that results cannot be assumed to generalise to other less severe or persistent populations in other settings (such as primary care), referencing further work on this.

Given the need for replication of the factor structure of the HDRS-17, we thought it important to also explicitly state that findings reported on change over 12 months are exploratory and have updated Table 5 to give further detail. We discuss some of the problems involved in operationalising this multidimensional model and the importance of replicating the bi-factor structure as a necessary prelude to this. With all of this in mind, as Reviewer 1 intimates, we are not in a position to revolutionize current use of the HDRS-17. We have instead made more modest calls for further work towards replication of our findings in similarly defined clinical groups (more severe persistent major depression). We have also discussed reasons why we cannot assume the observed factor structure to hold in other less severe, less persistent and likely less complex groups.

Holding to the limitations above, we have also given greater salience to comparisons with current 6-item scales (of Bech and Maier), including the different ways these scales were derived and some of the similarities in item content with the current model. Given the importance of replicating our current findings, we have not, though, gone any further to suggest that our bi-factor model be used in preference to these scales.

As a result of these comments, the paper has been substantially rewritten, particularly in the introduction and discussion sections. The line numbers have therefore changed, but all quotes referenced by Reviewer 1 have been removed or substantially amended. Table 5 in the results section has also been redrawn in keeping with the early stage of findings here and the more exploratory approach to what change over 12 months represents within this model. We think the manuscript is improved by these changes and are grateful to Reviewer 1.

Reviewer 2

Summary: The authors of this study explored the 17-item Hamilton Depression Rating Scale (HDRS-17) by examining the dimensionality of this measure with individuals diagnosed with moderately severe persistent major depressive disorder (PMDD). Results reported a bi-factor structure of two group factors (e.g., insomnia worry and retardation-agitation) that were stable over a 12-month period. The authors suggested that the HDRS-17 is a valid measure of depression for patients with moderately severe depression.

Below I summarize my comments and concerns:

1) The authors identified individuals recruited in a specialist mental health setting. It would be beneficial to understand this population better. Prevalence and severity of depression would help to identify why other measures, such as the BDI-II or the PHQ-9, were not chosen.

We are grateful to Reviewer 2 and will answer this in parts:

1. All subjects were recruited from community mental health settings in the UK, with only half of these receiving additional collaborative care and the others continuing to receive general community mental health care. We have clarified this in the text (method lines 121-123).

2. Inclusion criteria from the initial study stated depression of at least moderate clinical severity, operationalised as HDRS-17 of 16 or over. We chose a mixture of clinician-rated outcome measures (CROMs; HDRS-17) and patient-rated outcome measures (PROMs; QIDS, PHQ-9 and BDI-1), considering that whilst we might expect a moderate-strong correlation, they could not necessarily be relied on to measure the same construct. The study team was aware that the different types of measure (CROMs and PROMs) might hold different perspectives and different points of reference (e.g. a clinical range vs personal experience), and so we considered them complementary rather than replicative measures, both relevant to the study of Persistent Major Depression. The degree of correlation between CROMs and PROMs in this population is an interesting question, and we have answered it in more detail below, under point 3, comparing our findings with recently published material. We have chosen not to include these data in the current manuscript in order to maintain a clearer focus on the dimensionality of the HDRS17. We have, though, clarified some further detail from the initial study in the revised text (methods lines 125-136).

2) The authors mention inconsistencies of the HDRS17 factor structures even when compared to psychiatric interviews. The study’s methodology consisted of the Structured Clinical Interview (SCID) for DSM-IV. Analytical techniques using the SCID and the HDRS could have shown some consistency and validity of the data in terms of construct validity and reliability of the measure. There seem to be some missing processes known to be influential on the area of reported research. The study would benefit from analyses of reliability and validity across time stamps.

We have removed the reference to inconsistency with psychiatric interview, including any structured interview (e.g. SCID). We did not have strong enough data from the SCID to assess correlation but as shown in Table 1 (below), there was a moderate-strong correlation across the 3 time-points measured (Baseline, 6 and 12 months) between the HDRS-17 and three Patient Report Outcome Measures (PHQ-9, BDI, QIDS). The correlation was higher (within the moderate-strong range) for later timepoints, consistent with recent findings from Hershenberg et al. (2020) who showed greater correlation for patients under longer-term care (vs. patients at first contact). One might speculate on the reasons for greater agreement between clinician and patient outcome measures over time, such as developing trusting relationships, but the phenomenon seems consistent with other recent literature. Although we agree this is an important area, we would rather keep the current focus on factor structure and consider correlations in a later paper.

3) The authors evidenced fairly steady drops in the level of severity of depression. It would be beneficial to understand if treatment setting was moderating or mediating these values. Treatment with a combination of CBT and a psychopharmacological agent has shown the greatest reduction in depression severity. The authors' study showed a greater reduction in retardation-agitation factor scores than in worry-insomnia factor scores. An elaboration of these results would be beneficial, especially since factor loadings of retardation-agitation are significant for depressed mood, suicidal thoughts, and agitation. It would have also been helpful if the results were comparable to other depression measures such as the PHQ-9 and BDI-II, to gain a better understanding of measurement precision and score comparability. The study is limited to the HDRS17 and it is difficult to really assess and control that the HDRS is capturing the whole spectrum of depression and it would be beneficial to apply some classical item response analysis.

We are grateful to Reviewer 2 for this comment and will answer it in several parts, addressed separately below:

1. The specialist treatment (specialist CBT and psychopharmacology) was not dissociable from the treatment setting; indeed the integration of these approaches through ‘collaborative care’ was integral to the main study design. This intervention (collaborative care delivery of specialist CBT and psychopharmacology) was associated with measurable changes during treatment (not present at baseline) and we therefore consider treatment as a mediator (rather than moderator) of outcome. Although the suggestion of a mediator analysis is interesting, it is outside the scope of the current study on HDRS-17 dimensionality, and we are reluctant to change the focus of the current paper. However, we thank Reviewer 2 for stimulating thought on this and will consider a future path analysis, assessing correlated change of transdiagnostic measures with the separate HDRS factors shown here.

2. We agree the greater reduction in the retardation-agitation factor is of clinical relevance and have added to both the discussion (lines 250-275) and conclusion sections to reflect this. We have given further detail on this change in factor score over 12 months (Table 5) but have discussed it as an exploratory analysis, given the need to replicate the basic bi-factor structure before working towards operationalising the findings.

3. We have performed a correlation analysis, presented below in Table 1, which shows significant and moderate-strong correlation between the observer-rated HDRS-17 and all patient-completed outcome measures used in this study (PHQ-9, BDI-1 and QIDS, at each time point). The correlation shown in this analysis was similar to that reported in another recent publication in a Treatment Resistant Depression population (Hershenberg et al, 2020), particularly for the closest comparison (the baseline comparison, before any clinical improvement).

Table 1

Measure    HDRS-17, current study                  HDRS-17, reported*
           Baseline    6 months    12 months
BDI-1      0.56        0.75        0.78            0.65
QIDS-SR    0.55        0.69        0.76            0.57
PHQ-9      0.49        0.68        0.79            Not available

*From Hershenberg et al, Journal of Affective Disorders, April 2020

4. Our study aimed to assess the dimensionality of the HDRS-17 measure with preliminary evidence of a bi-factor structure fitting the data well. We did not hypothesise that the HDRS-17 measures a single concept but conducted an exploratory analysis that identified a multidimensional construct, with specific factor scores indicating the severity of relevant aspects of depression. In the initial RCT, we did not hypothesise that the HDRS-17 would capture the whole spectrum of depression and therefore included a number of scales (PHQ-9, BDI and QIDS-SR) as complementary, rather than strictly replicative, measures that might give a fuller understanding, both from different perspectives (patient vs clinician) and slightly different concepts of ‘depression’ (for example including appetite gain, as well as loss, in QIDS-SR). Although we accept the point made here, again we would rather keep the focus of the current paper on the important issue of the dimensionality of the HDRS-17.

4) Based on the above response, the novelty of the research study, and the inconsistencies related to the HDRS, the title of this study would be best framed as a preliminary study of results.

Thank you for this recommendation, which we have accepted in full.

The full title now reads “The bi-factor structure of the 17-item Hamilton Depression Rating Scale in moderate to severe persistent major depression; dimensional measurement of outcome”

We think this better represents the early findings reported here that need replication and we’re grateful to Reviewer 2 for the stimulating comments, which have hopefully improved the manuscript.

Attachment

Submitted filename: Response to reviewers bi-factor HDRS-17.docx

Decision Letter 1

Thach Duc Tran

15 Jul 2020

PONE-D-20-01015R1

The bi-factor structure of the 17-item Hamilton Depression Rating Scale in persistent major depression; dimensional measurement of outcome.

PLOS ONE

Dear Dr. Morriss,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 29 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Thach Duc Tran, M.Sc., Ph.D.

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have used the feedback to substantially rewrite parts of the introduction, discussion, and conclusion section, "focusing on the factor structure of the scale, as a multidimensional measure". In their response to my previous comments, they explain how they have now toned down the implications of their results, e.g. that they now refrain from referring to the validity of the total score, and they explicate that their results need replication before they can be generalized. In fact, the authors now say nothing about validity, and they focus on how the results should first be replicated before we look at the challenges of operationalising (and presumably validating) the model for measurement. Here, it would have been helpful if the authors had chosen more explicitly among the three routes outlined in my previous review: (a) to deal with the validity of a raw total score, (b) to revolutionize the way we use the HDRS-17, and (c) to deal exclusively with the dimensionality. While the authors refrain from a, they still stray between path b and c.

The authors include a comparison of their model with the six-item subscales defined by Bech and Maier, respectively. This is particularly welcome, as the Melancholia subscale has received support for construct validity, i.e. it appears that it can be utilized for measurement. This would be a good place to establish how the authors consider the potential validity of their bi-factor model of the HDRS. The authors promote their model, contingent on replication, by writing that "This multidimensional reporting might provide a more accurate guide to future research and treatment in moderate-severe PMDD, including outcome monitoring". As such, they call for a fundamental change of the way the HDRS is used (what I refer to as "path b" above). This raises a host of questions about how the scale could be used and how it would need to be validated for such use. Strictly speaking, acknowledging the multidimensionality of the HDRS may allow us to retrieve more information from the responses, but when dealing with an essentially psychometric manuscript, we would not expect the authors to dismiss or postpone the issue of validity. As an example, when the authors allow negative loadings on two factors, we need to clarify whether person estimates for these traits can be used for outcome measurement.

A second point to make about the Melancholia subscale is that it resembles the general depression factor of this study. The authors relate it to the retardation/agitation factor, while, in fact, five out of the six items of the Melancholia scale display their highest loading on the general depression factor. The authors need to address this discrepancy. It also puts them in a better position to compare the utility of the Melancholia subscale and their General depression factor used as a scale for outcome measurement.

As a separate, technical issue, I would like to ask the authors to clarify whether they included residual correlation parameters for the three insomnia-related items. Given the well-proven instability of the HDRS, and the authors' call for replication, it is important to have full transparency of the detection and modelling of any residual correlations, theorized ones as well as empirically discovered ones.

I will reserve a final comment for inspiration, which may guide the authors in their reflection of the issues pointed out above. Here, I will try and explicate my own understanding of the implications of suggesting the bifactor model for clinical use. First, the factor structure of the HDRS has proven highly unstable, and we need to put a lot of faith into the bifactor model to believe this is now going to change. But we cannot deny that it is possible, so I've raised no issue with this. Second, if we want to establish construct validity, we would likely need to do an elaborate analysis based on multidimensional item response theory, which would allow for testing of differential item functioning. Again, we cannot deny that this is possible despite the highly elaborate model, so I have raised no issue with this. Then, for clinical utility, we would need to enter each response into software that would return a person estimate for each factor based on the psychometric model. I wonder if the authors are aware of this. Based on the loadings in Table 3, we would expect to see the highest reliability on the General depression estimate, and as this factor has no negative loadings, it can potentially be used for outcome measurement. Arguments can be made for and against this estimate, relative to the simpler and well supported estimate of the Melancholia subscale, or even the 10-item Bech and Rafaelsen scale, which covers additional ICD-10 items of Major Depression (including suicidal ideation). The remaining two subscales cannot be used for outcome measurement, but as the authors suggest, may be used for outcome prediction for different types of intervention. Reliability may be limited, so for patients within a potentially wide confidence interval no clear guidance will be given. Given this fragmentation of the scale into an outcome scale and two predictor scales, we may ask ourselves why we are so keen to stick with the particular pool of items included in the HDRS-17.

Reviewer #2: The authors have addressed the concerns noted in the first round of reviewer comments. I would have liked to see more in-depth methodology consisting of analytical techniques using the SCID, though the authors noted that the data were limited. It would have been refreshing to see how associated or related the HDRS-17 would have been in addressing the criterion of persistent depression with other measures such as the PHQ-9 and BDI-II, as this in my opinion would have added more clarity and depth regarding the dimensionality of the HDRS-17. However, the authors were able to address my comments and concerns and I believe this is a publishable piece of work. I also appreciated the title change.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Erik Vindbjerg

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Oct 26;15(10):e0241370. doi: 10.1371/journal.pone.0241370.r004

Author response to Decision Letter 1


25 Aug 2020

Response to Reviewer 1

1. We were asked to deal with the potential validity of the bi-factor model in the context of the Melancholia (Bech, Maier) subscale - that this should not be postponed and the specific example of negative loadings needs to be addressed, including whether person estimates for these traits can be used for outcome measurement?

We re-considered the implications of the HDRS17 bi-factor structure shown in our analysis. We agree that this is not appropriate for use as an outcome measure in the moderate-severe PMDD group studied here, given that it is not a unidimensional measure and that domain factors include negatively loading items. In light of this and previous evidence on the HDRS6 (Bech), we conducted a further post-hoc analysis to assess the dimensionality of the HDRS6 subscale in this group. This analysis is given below and included as a post-hoc analysis in our updated results section. It shows a significant positive loading of all 6 items (0.396 – 0.544, p < 0.01) for this moderate-severe PMDD group, with time invariance. Considering this fuller evidence within the wider context advocated by Reviewer 1, we have revised our discussion, proposing the HDRS6 as an outcome measure in moderate-severe PMDD. The HDRS17 may retain a clearer role in psychiatric assessment where its more complex structure identifies types of PMDD, acting as a potential predictor of effective treatment strategies for settled states of depression (e.g. for agitated vs. retarded depression).

2. Address why we have related the melancholia subscale to the ‘retardation-agitation’ domain rather than the general factor, when 5 out of 6 items on the melancholia subscale (Bech) display highest loadings on the general factor?

The HDRS-6 comprises depressed mood, anhedonia (work and interests), guilt, fatigue, psychological anxiety and psychomotor retardation (Bech 1981); and as Reviewer 1 points out, 5 of these 6 items showed their highest loading on the general factor, rather than either of the domain factors. Prompted by this, we ran a further analysis investigating the structure of the HDRS6 in our PMDD cohort, specifically to investigate whether this showed a unidimensional structure and time-invariance. For this reason (and because it is not plausible to do an exploratory analysis testing models with 1 to 3 factors on a 6-item scale), we focused on testing the unidimensional factor structure only, i.e. we used a one-factor model with results given below:

Item                           HDRS6 (one-factor loading)
Depressed Mood                 .544*
Work and Interests             .526*
General Somatic (Tiredness)    .474*
Psychic Anxiety                .417*
Guilt Feelings                 .396*
Psychomotor Retardation        .407*

* all loading estimates are statistically significant at p<0.01

The HDRS6 also showed stability over time:

Table: Bech scale fit indices of ME/I across measurement time

Model        χ²(df), p              RMSEA   CFI    NNFI   ΔCFI    Δχ²(df), p
Configural   193.266(120), 0.000    .057    .934   .913
Scalar a     459.218(165), 0.000    .098    .736   .755           259.790(45), p=0.000
Scalar b     229.625(146), 0.000    .055    .925   .921   -.009   44.883(26), p=0.012

Note: scalar b model freed 24 of 55 (43%) threshold parameters
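As a reading aid for the ME/I table above, the following minimal Python sketch recomputes the change in CFI and in degrees of freedom for each scalar model relative to the configural model, using only the fit values reported in the table. The cutoff of -.01 for ΔCFI is a commonly used rule of thumb and is an assumption here; the response letter does not state which criterion was applied. The function and variable names are illustrative only, and the Δχ² values are left as reported rather than recomputed.

```python
# Fit indices copied from the ME/I table in the author response.
fits = {
    "configural": {"chi2": 193.266, "df": 120, "cfi": 0.934},
    "scalar_a": {"chi2": 459.218, "df": 165, "cfi": 0.736},
    "scalar_b": {"chi2": 229.625, "df": 146, "cfi": 0.925},
}

# Assumption: a CFI drop of no more than .01 is treated as acceptable.
# This is a widely used guideline, not a value stated in the letter.
DELTA_CFI_CUTOFF = -0.01


def delta_cfi(constrained, baseline="configural"):
    """Change in CFI when moving from the baseline to the constrained model."""
    return round(fits[constrained]["cfi"] - fits[baseline]["cfi"], 3)


def delta_df(constrained, baseline="configural"):
    """Difference in degrees of freedom between the two nested models."""
    return fits[constrained]["df"] - fits[baseline]["df"]


def invariance_supported(constrained, baseline="configural"):
    """True if the CFI drop stays within the assumed cutoff."""
    return delta_cfi(constrained, baseline) >= DELTA_CFI_CUTOFF


for model in ("scalar_a", "scalar_b"):
    print(model, delta_cfi(model), delta_df(model), invariance_supported(model))
```

Under this assumed criterion, the fully constrained scalar model (a) would be rejected, while the partially constrained model (b) stays within the cutoff, which matches the ΔCFI of -.009 shown in the table.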

The unidimensionality, positive item loading and stability of the HDRS6 indicate it may be used as an outcome measure in moderate-severe PMDD. Additionally, it is more parsimonious than the HDRS17, therefore quicker to administer and much more likely to be of clinical use as an outcome measure. We are grateful to Reviewer 1 for pointing us in this direction and have changed our discussion to indicate the potential use of the HDRS17 within assessment, where its more complex structure may help to predict effective treatment interventions, with evidence-based item sets such as the HDRS6 used for outcome measurement. This approach seems consistent with Hamilton’s (1960) own assessment of the multidimensionality of the HDRS17 and its use in measuring settled states, rather than treatment change.

3. Did we include residual correlation parameters for the three insomnia related items?

No. Correlating residuals is generally not allowed when running SEM models and would only have been justifiable for repeated items, or for MTMM items when items were analysed as a different set of factors.

4. Given the instability of the HDRS factor structure, will the bi-factor model change this?

No, we agree this won’t save the HDRS17 as a measure of outcome change. The bi-factor structure may remain useful to clinicians and researchers assessing settled states of depression (rather than assessing change in state) and we have added to the discussion of this.

5. Only the General factor can be used as an outcome measurement, with the subscales (containing negative loadings) only as predictor factors (e.g. for different types of interventions being more helpful). Given this, why even stick with the particular set of questions in the HDRS-17?

We have carefully considered this. Given our results on the factor structure of the HDRS-17 in this well-defined clinical population (moderate-severe PMDD), we consider a more evidence-based position would be to recommend the use of item sets such as the HDRS6 in outcome measurement, with the HDRS17 having a more restricted use, within assessment of settled states (where the more complex structure may be an advantage in developing treatment strategies). We are very grateful for the helpful comments of Reviewer 1 in reconceptualising these results.

Attachment

Submitted filename: Response to reviewers bi-factor HDRS-17.docx

Decision Letter 2

Thach Duc Tran

8 Sep 2020

PONE-D-20-01015R2

The bi-factor structure of the 17-item Hamilton Depression Rating Scale in persistent major depression; dimensional measurement of outcome.

PLOS ONE

Dear Dr. Morriss,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 23 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

We look forward to receiving your revised manuscript.

Kind regards,

Thach Duc Tran, M.Sc., Ph.D.

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

**********

6. Review Comments to the Author

Reviewer #1: I commend the authors on a well executed revision to their manuscript. My only additional request is that the authors add a brief clarification in their manuscript that they did not find any substantial residual correlations in the bi-factor model, as this would violate a basic assumption of the model. Also, optionally, I may recommend the authors to state that their results for the HDRS-6 do not establish if the total score features statistical sufficiency, and as such individual items may need to be weighted differently (e.g. higher weight to depressed mood, lower weight to psychomotor retardation, etc.). This is simply to underline that a scale is not simply valid or not, nor simply valid with a particular population or not, but that validity also depends on the use case, e.g. whether a raw total score is used or if individual item responses are weighted differently in the scoring, in accordance with their loading in the CFA.
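The distinction drawn here between a raw total score and loading-weighted scoring can be illustrated with a minimal Python sketch. It uses the six HDRS6 loadings reported in the earlier author response; everything else is an assumption made for illustration: the two patient rating profiles are hypothetical, and rescaling the weights to a mean of 1 is just one simple convention, not a factor-score estimate from the fitted model and not a scoring rule proposed by the reviewer or the authors.

```python
# One-factor HDRS6 loadings as reported in the author response.
HDRS6_LOADINGS = {
    "depressed_mood": 0.544,
    "work_and_interests": 0.526,
    "general_somatic": 0.474,
    "psychic_anxiety": 0.417,
    "guilt_feelings": 0.396,
    "psychomotor_retardation": 0.407,
}


def raw_total(ratings):
    """Unweighted sum of the six item ratings."""
    return sum(ratings[item] for item in HDRS6_LOADINGS)


def weighted_total(ratings):
    """Sum of item ratings weighted by loading, with weights rescaled to mean 1.

    Illustrative convention only: it keeps the weighted score on roughly
    the same scale as the raw total so the two can be compared directly.
    """
    mean_loading = sum(HDRS6_LOADINGS.values()) / len(HDRS6_LOADINGS)
    return sum(
        (loading / mean_loading) * ratings[item]
        for item, loading in HDRS6_LOADINGS.items()
    )


# Hypothetical profiles with the same raw total but different item patterns.
patient_a = dict.fromkeys(HDRS6_LOADINGS, 1) | {
    "depressed_mood": 3,
    "psychomotor_retardation": 0,
}
patient_b = dict.fromkeys(HDRS6_LOADINGS, 1) | {
    "depressed_mood": 0,
    "psychomotor_retardation": 3,
}

for name, ratings in (("A", patient_a), ("B", patient_b)):
    print(name, raw_total(ratings), round(weighted_total(ratings), 2))
```

Both hypothetical patients have the same raw total, but the weighted score is higher for the profile dominated by depressed mood, the item with the largest loading. This is the sense in which the adequacy of a raw total depends on whether equal item weights are defensible.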

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Erik Vindbjerg

PLoS One. 2020 Oct 26;15(10):e0241370. doi: 10.1371/journal.pone.0241370.r006

Author response to Decision Letter 2


12 Oct 2020

Response to reviewers.

We are grateful for the further comments, which are addressed specifically below. Additionally, we have included the data requested.

Review Comments to the Author

6. “I commend the authors on a well executed revision to their manuscript. My only additional request, is that the authors add a brief clarification in their manuscript, that they did not find any substantial residual correlations in the bi-factor model, as this would violate a basic assumption of the model”

We thank Reviewer 1 for the further comments. Correlating error terms is not recommended practice when performing CFA and SEM [1-3], and we therefore set the correlation between item residuals to 0, as per the Mplus default setting for ESEM. For complete clarity we have added to lines 174/5 in the text: "All ESEM models were performed using software Mplus 8 and in keeping with standard practice correlation between item residuals was set as 0."
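As an illustration of the reviewer's point only, and not part of the analysis reported here (which was run in Mplus 8), the sketch below shows one way residual correlations might be inspected once a factor solution is available. It assumes standardised items, orthogonal factors and uncorrelated item residuals, so that the model-implied correlation matrix is the loading matrix times its transpose plus a diagonal matrix of unique variances. The loadings, the injected dependence of 0.15 and the 0.10 flagging threshold are all hypothetical values chosen for the example.

```python
import numpy as np

def residual_correlations(observed_corr, loadings):
    """Observed minus model-implied correlations.

    Assumes standardised items, orthogonal factors and uncorrelated
    item residuals (a diagonal residual covariance matrix).
    """
    common = loadings @ loadings.T               # variance shared through the factors
    uniqueness = 1.0 - np.diag(common)           # unique variance of each item
    implied = common + np.diag(uniqueness)       # model-implied correlation matrix
    return observed_corr - implied

# Hypothetical 4-item bi-factor pattern: column 0 is the general factor,
# column 1 is a group factor loading on items 3 and 4 only.
loadings = np.array([
    [0.7, 0.0],
    [0.6, 0.0],
    [0.5, 0.4],
    [0.5, 0.5],
])

# Build an "observed" matrix that matches the model exactly ...
observed = loadings @ loadings.T
np.fill_diagonal(observed, 1.0)
# ... then inject local dependence between items 1 and 2 (hypothetical).
observed[0, 1] += 0.15
observed[1, 0] += 0.15

resid = residual_correlations(observed, loadings)

# Flag item pairs whose absolute residual exceeds the illustrative threshold.
threshold = 0.10
flagged = np.argwhere(np.triu(np.abs(resid) > threshold, k=1))
print(flagged)
```

In this toy case only the pair with the injected dependence is flagged; with real data, a flagged pair would suggest that the zero-residual-correlation assumption deserves a closer look for those items.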

“Also, optionally, I may recommend the authors to state that their results for the HDRS-6 do not establish if the total score features statistical sufficiency”

We discussed this option within the team and whilst tempting, we concluded that additional comment on a post-hoc analysis would take the discussion too far from the central findings. However, in writing this we agree that future work with a main focus on statistical sufficiency (or indeed alternative item inclusion) is warranted.

1. McDonald R, Ho M-H. Principles and practice in reporting Structural Equation Analyses. Psychological Methods. 2002;7:64-82. doi: 10.1037/1082-989X.7.1.64.

2. Schreiber JB, Nora A, Stage FK, Barlow EA, King J. Reporting Structural Equation Modeling and Confirmatory Factor Analysis results: A review. The Journal of Educational Research. 2006;99(6):323-38. doi: 10.3200/JOER.99.6.323-338.

3. Hermida R. The problem of allowing correlated errors in Structural Equation Modeling: Concerns and considerations. Computational Methods in Social Sciences. 2015;3:1-17.

Attachment

Submitted filename: Response to reviewers bi-factor HDRS-17Oct2020.docx

Decision Letter 3

Thach Duc Tran

14 Oct 2020

The bi-factor structure of the 17-item Hamilton Depression Rating Scale in persistent major depression; dimensional measurement of outcome.

PONE-D-20-01015R3

Dear Dr. Morriss,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Thach Duc Tran, M.Sc., Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Thach Duc Tran

16 Oct 2020

PONE-D-20-01015R3

The bi-factor structure of the 17-item Hamilton Depression Rating Scale in persistent major depression; dimensional measurement of outcome.

Dear Dr. Morriss:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Thach Duc Tran

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Data

    (CSV)

    S2 Data

    (DOCX)

    Attachment

    Submitted filename: HDRS17.SL.docx

    Attachment

    Submitted filename: Response to reviewers bi-factor HDRS-17.docx

    Attachment

    Submitted filename: Response to reviewers bi-factor HDRS-17.docx

    Attachment

    Submitted filename: Response to reviewers bi-factor HDRS-17Oct2020.docx

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files.


    Articles from PLoS ONE are provided here courtesy of PLOS
