PLOS One. 2021 Jul 1;16(7):e0253899. doi: 10.1371/journal.pone.0253899

Accuracy of the Geriatric Depression Scale (GDS)-4 and GDS-5 for the screening of depression among older adults: A systematic review and meta-analysis

Ana Brañez-Condorena 1,2, David R Soriano-Moreno 3, Alba Navarro-Flores 4, Blanca Solis-Chimoy 1,2, Mario E Diaz-Barrera 5,6, Alvaro Taype-Rondan 7,*
Editor: Eleanor Ochodo 8
PMCID: PMC8248624  PMID: 34197527

Abstract

Background

The Geriatric Depression Scale (GDS) is a widely used instrument to assess depression in older adults. The short GDS versions that have four (GDS-4) and five items (GDS-5) represent alternatives for depression screening in limited-resource settings. However, their accuracy remains uncertain.

Objective

To assess the accuracy of the GDS-4 and GDS-5 versions for depression screening in older adults.

Methods

We systematically searched PubMed, PsycINFO, Scopus, and Google Scholar up to May 2020 for studies that assessed the sensitivity and specificity of the GDS-4 and GDS-5 for depression screening in older adults. We conducted meta-analyses of the sensitivity and specificity of those studies that used the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases-10 (ICD-10) as the reference standard. Study quality was assessed with the QUADAS-2 tool. We performed bivariate random-effects meta-analyses to calculate the pooled sensitivity and specificity with their 95% confidence intervals (95% CI) at each commonly reported cut-off. For the overall meta-analyses, we evaluated each GDS-4 or GDS-5 version separately at each cut-off; to investigate heterogeneity, we pooled all GDS-4 versions together and all GDS-5 versions together at each cut-off. We also assessed the certainty of evidence using the GRADE methodology.

Results

Twenty-three studies were included and meta-analyzed, assessing eleven different GDS versions. The number of participants included was 5048. When including all versions together, at a cut-off of 2, GDS-4 had a pooled sensitivity of 0.77 (95% CI: 0.70–0.82) and a pooled specificity of 0.75 (0.68–0.81), while GDS-5 had a pooled sensitivity of 0.85 (0.80–0.90) and a pooled specificity of 0.75 (0.69–0.81). We found results for more than one GDS-4 version at cut-off points 1, 2, and 3, and for more than one GDS-5 version at cut-off points 1, 2, 3, and 4. Significant subgroup differences across versions were found at most test thresholds. The accuracy of the different GDS-4 and GDS-5 versions showed high heterogeneity. There was a high risk of bias in the index test domain. Also, the certainty of the evidence was low or very low for most of the GDS versions.

Conclusions

We found several GDS-4 and GDS-5 versions that showed great heterogeneity in estimates of sensitivity and specificity, mostly with a low or very low certainty of the evidence. Altogether, our results indicate the need for more well-designed studies that compare different GDS versions.

Introduction

Depression is a major global public health issue [1]. Older adults represent a vulnerable group, likely due to aging-related factors, such as loss of skills and decreased functional activity [2]. It is estimated that around 10% to 20% of older adults worldwide live with depression [3]. This condition increases the risk of suicide [4], the risk of complications of comorbidities [5], the use of health services and care costs, and overall mortality [4,6]. Hence, it represents a high burden, not only for patients but also for healthcare systems.

In older adults, the somatic symptoms of depression resemble those of other chronic health conditions [7], and mood changes are less prevalent and commonly replaced by physical discomfort [8,9], which makes diagnosis challenging and delays access to treatment. Thus, some structured depression screening scales that focus on the elderly population have been developed [10]. These include the Geriatric Depression Scale (GDS) [11], the Center for Epidemiologic Studies Depression Scale (CES-D), and others. The GDS is one of the most widely used to identify depression among older adults. Among its strengths, the simple yes-no format may make it easier to use in people with cognitive impairment, and it can be used in hospital and community settings [11].

The full version contains 30 questions (GDS-30) and requires substantial time for assessment. Therefore, shorter versions that select some of the GDS-30 items [12,13] have been proposed for rapid depression assessment in time-restricted scenarios, such as versions with four items (GDS-4) and versions with five items (GDS-5) [14–23].

The accuracy of these GDS-4 and GDS-5 versions remains unclear [24]. Although some previous systematic reviews have assessed this subject, they tend to pool different GDS-4 or GDS-5 versions in the same quantitative analysis, even though each version includes different questions [13,25–27]. Thus, we performed a systematic review to assess the accuracy of the GDS-4 and GDS-5 versions for depression screening in older adults.

Material and methods

We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines [28]. The study protocol is registered in PROSPERO (CRD42020170864).

Eligibility criteria

The inclusion criteria were as follows: 1) observational studies that reported the sensitivity and specificity of any of the GDS-4 and GDS-5 versions for the diagnosis of depression, using the DSM or ICD-10 diagnostic criteria as the reference standard, since these provide a commonly used and accepted framework for depression diagnosis in clinical practice [29]; 2) studies that were conducted in older adults (at least 2/3 of the study participants had to be ≥ 55 years old); 3) studies that specified the items of the GDS-4 and GDS-5 versions; and 4) studies that provided enough data to construct a 2x2 contingency table to assess sensitivity and specificity. No restrictions were applied on language, publication date, validation of the language translation of the short GDS versions, or the mode of test assessment.
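As an illustration of the fourth criterion, the sketch below shows how sensitivity and specificity follow from the four cells of a 2x2 contingency table (index test result against reference standard). The class name `TwoByTwo` and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cells of a 2x2 table: index test (short GDS) vs reference standard."""
    tp: int  # test positive, depressed according to the reference standard
    fp: int  # test positive, not depressed
    fn: int  # test negative, depressed
    tn: int  # test negative, not depressed

    def sensitivity(self) -> float:
        # Proportion of depressed participants that the test detects
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of non-depressed participants that the test clears
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only
example = TwoByTwo(tp=40, fp=30, fn=10, tn=120)
print(round(example.sensitivity(), 2), round(example.specificity(), 2))
```

A study that reports only percentages, without enough information to recover these four counts, could not be represented this way and would therefore not meet the criterion.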

Search strategy

We systematically searched the following databases and search engines: PubMed, PsycINFO, and Scopus until April 24, 2020. Additionally, we searched Google Scholar up to May 16, 2020, to identify grey literature. We examined only the first 100 records, as systematic reviews usually do [30–32], because Google Scholar is a large and unspecific source of grey literature that sorts results by relevance and match with the search terms. The search strategy is available in S1 Table of the Supplementary Material. We then complemented the search by manually reviewing the reference lists of all the studies included in the data selection process, the lists of articles that cited each of these included studies (through Google Scholar), and the lists of studies included in previous systematic or narrative reviews on the subject, until May 2020 [13,25–27,33–39].

Data selection and extraction

First, we removed all duplicate records using EndNote. Two authors (ANF and DRSM) independently screened all results for inclusion, first reviewing the titles and abstracts and then performing a full-text assessment, also in EndNote. Any disagreement during the selection process was discussed with a third author (ABC) and resolved by consensus.

Two authors (ANF and BSC) independently extracted the data from each included study using a standardized Microsoft Excel sheet. Differences were resolved by a third researcher (ABC). The following variables were extracted: first author, year of publication, country, population characteristics (number of participants, setting, sex, age), inclusion and exclusion criteria, prevalence of depression in the study according to the reference standard, funding, intervention (short GDS version, language of the test, mode of test assessment, GDS-4 or GDS-5 questions, cut-off used, number of true positives, false positives, true negatives, and false negatives), reference standard (International Classification of Diseases [ICD], the Diagnostic and Statistical Manual of Mental Disorders [DSM], structured interview, or others), type of depression evaluated, and numerical results of sensitivity and specificity. When the information reported in a study was unclear, we emailed the authors for clarification.

Risk of bias and certainty of the evidence

Two researchers (DRSM and MEDB) independently assessed the risk of bias of the included studies using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool [40]. This tool has four domains: patient selection, index test, reference standard, and flow and timing. The reference standards considered appropriate for this assessment were any version of the DSM or the ICD-10. In case of disagreement, a consensus was achieved with a third researcher (ATR).

Additionally, we used the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) methodology to report the certainty of the evidence [41,42]. Risk of bias, indirectness, inconsistency, imprecision, and publication bias were assessed. We downgraded the certainty of evidence when fewer than 70% of studies had at least 7 of 10 items at low risk according to QUADAS-2, when fewer than 70% of studies had components (population, index test, or reference standard) similar to the initial diagnostic question, when heterogeneity was moderate or high, when the width of the confidence interval was greater than or equal to 10%, and when fewer than 4 studies evaluated the outcome of interest.
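The downgrading rules above can be read as a simple checklist. The following sketch encodes them under stated assumptions: the function name, the argument names, and the labels attached to each rule are hypothetical; the confidence interval is expressed on the 0 to 1 scale, so that "10%" becomes a width of 0.10; and the sketch only lists which rules are triggered, since the text does not state how many levels each rule subtracts.

```python
from typing import List


def grade_downgrade_reasons(
    share_low_risk: float,   # share of studies with >= 7 of 10 QUADAS-2 items at low risk
    share_similar: float,    # share of studies with components similar to the question
    heterogeneity: str,      # "low", "moderate" or "high" (visual judgment)
    ci_lower: float,
    ci_upper: float,
    n_studies: int,
) -> List[str]:
    """Return the downgrading rules triggered by one pooled estimate."""
    reasons = []
    if share_low_risk < 0.70:
        reasons.append("risk of bias")
    if share_similar < 0.70:
        reasons.append("indirectness")
    if heterogeneity in ("moderate", "high"):
        reasons.append("inconsistency")
    if ci_upper - ci_lower >= 0.10:
        reasons.append("imprecision")
    if n_studies < 4:
        reasons.append("fewer than four studies")
    return reasons


# Hypothetical estimate, for illustration only
print(grade_downgrade_reasons(0.5, 0.9, "high", 0.70, 0.82, 3))
```

The final GRADE ratings in a review also involve reviewer judgment, so a checklist of this kind should not be expected to reproduce every rating mechanically.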

Statistical analyses

We conducted meta-analyses of the sensitivity and specificity of each GDS-4 and GDS-5 version whenever more than one study assessed the same version at the same cut-off point. We performed the meta-analyses of GDS-4 and GDS-5 separately.

When at least four studies were available for a meta-analysis, we used bivariate mixed-effects models with random effects, which account for the correlation between the sensitivity and specificity reported by each study, to obtain the pooled estimates [43]. When fewer than four studies were available, the mixed-effects model was not appropriate, so we performed meta-analyses of proportions using the exact binomial distribution. We calculated the pooled sensitivity and specificity with their 95% confidence intervals.
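To illustrate what an exact binomial interval looks like, the sketch below computes a Clopper-Pearson 95% confidence interval for a single proportion (for example, one study's sensitivity) using only the Python standard library. It is not the Stata routine used in this review, it does not perform any pooling, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(func, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    # func must be increasing on [lo, hi], negative at lo and positive at hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson interval for x successes out of n trials."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper limit: the p at which P(X <= x) equals alpha / 2
    upper = 1.0 if x == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


# Hypothetical study: 45 of 50 depressed participants screened positive
low, high = exact_ci(45, 50)
print(f"sensitivity = {45 / 50:.2f}, 95% CI {low:.3f} to {high:.3f}")
```

Unlike the normal approximation, this interval always stays within 0 and 1, which matters for small studies with sensitivity or specificity close to 1.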

In addition, we pooled the results of all included studies that assessed the same cut-off point of any GDS-4 version. Likewise, we pooled the results of all included studies that assessed the same cut-off point of any GDS-5 version.

Heterogeneity was assessed through visual inspection of forest plots. Subgroup differences across GDS versions were also evaluated through visual inspection of forest plots. All analyses were performed using Stata v14.0.

Results

Overall, 2,740 records were retrieved in the systematic database search. After removal of duplicates, 2,254 records were screened, and 71 records underwent full-text review. Of these, we excluded 52 records for not fulfilling the inclusion criteria. Reasons for exclusion are given in S2 Table. Nineteen records were included in this initial process.

Additionally, we identified seven records that met our inclusion criteria after searching the reference lists of all included studies, the reference lists of previous reviews, and the lists of articles that cited each of the included studies (through Google Scholar), for a total of 26 included records.

Of these, some presented results from the same study: Allgaier 2011 and Allgaier 2013, Castelo 2007 and Castelo 2010, and Cheng 2004 and Cheng 2005 [44–47]. Thus, 26 records representing 23 unique studies were included in the qualitative synthesis. Details of the selection process can be found in Fig 1.

Fig 1. Flow diagram (study selection).


Studies characteristics

The number of participants included was 5048. The number of participants in individual studies ranged from 60 to 586. Regarding the population, one study was performed only in people without dementia [48], and the rest of the studies included people both with and without dementia [15–19,21–23,44–47,49–61]. Regarding the gold standard used for depression, most studies used the DSM-IV [15,18,22,44–47,49,52,55,56,58,59]. Other standards used were the DSM-III [17,23], DSM-III-R [19,51,60], DSM-IV-TR [54], DSM-V [16,48], and ICD-10 [21,49,53,57,61]. One study did not specify which DSM version was used [50]. Nine studies additionally used a structured interview to conduct their assessment, such as the Structured Clinical Interview for DSM (SCID) or the Composite International Diagnostic Interview (CIDI) [15,23,44–47,49,51,54,56]. The characteristics of the 23 studies are summarized in Table 1 and detailed in S3 Table.

Table 1. Characteristics of the included studies.

Author (Country), Year | Settings | N | Index test | Reference standard
Van Marwijk (Netherlands), 1995 [23] | Clinic outpatients | 586 | GDS-4 by Van Marwijk | Major depression and dysthymia assessed by DIS based on DSM-III
Almeida (Brazil), 1999 [49] | Clinic outpatients | 64 | GDS-4 by Van Marwijk | Major depressive episode (F32) and dysthymia (F34.1) assessed by ICD-10 checklist of symptoms according to ICD-10, and DSM-IV
Hoyl (US), 1999 [15] | Clinic outpatients | 74 | GDS-4 by D’Ath, Van Marwijk; GDS-5 by Hoyl | Major depression and depression not otherwise specified assessed by PRIME-MD based on DSM-IV
Galaria (US), 2000 [19] | Clinic outpatients | 70 | GDS-4 by Galaria | Major depression assessed by DSM-III-R
Chattat (Italy), 2001 [50] | Clinic outpatients | 126 | GDS-5 by Hoyl | Clinical Diagnosis of Depression assessed by DSM (Not specified)
De Dios (Spain), 2001 [18] | Clinic outpatients | 155 | GDS-5 by De Dios or Ortega | Major depression, dysthymic disorder, an adaptative disorder with depressive mood and adaptative disorder mixed anxious-depressive assessed by DSM-IV
Pomeroy (UK), 2001 [57] | Clinic inpatients | 87 | GDS-4 by D’Ath | A depressive episode (F32) assessed by ICD-10
Rinaldi (Italy), 2003 [58] | Clinic outpatients and nursing home patients | 181 | GDS-5 by Hoyl | Major depression, dysthymia, bipolar depression, and depression not otherwise specified assessed by DSM-IV
Cheng (China), 2004 and Cheng, 2005 [17,60] | Clinic outpatients | 444/442 | GDS-4 by Cheng | Major depressive disorder, dysthymia, depressive disorder not otherwise specified, adjustment disorder with depressed mood, dementia with depression assessed by DSM-III/Major depression, dysthymia, depressive disorder not otherwise specified, adjustment disorder with depressed mood and dementia with depression assessed by DSM-III-R
Jongenelis (Netherlands), 2005 [56] | Nursing homes patients | 333 | GDS-4 by D’Ath, Van Marwijk; GDS-5 by Hoyl | Major depression and minor depression assessed by SCAN for DSM-IV
Martinez (Spain), 2005 [21] | Clinic outpatients | 249 | GDS-4 by Martinez; GDS-5 by Martinez | Clinic diagnosis of depression (Not specified) assessed by ICD-10
De la Torre (Peru), 2006 [52] | Clinic outpatients | 400 | GDS-4 by Galaria | Clinic diagnosis of depression assessed by DSM-IV
Castelo (Brazil), 2007 and Castelo, 2010 [46,47] | Clinic outpatients | 220 | GDS-4 by D’Ath, Van Marwijk | Major depressive episode assessed by SCID-I for DSM-IV
Izal (Spain), 2007 [54] | Community inhabitants and nursing home patients | 233 | GDS-5 by Hoyl | Major depression assessed by SCID-I for DSM-IV-TR
Ortega (Spain), 2007 [22] | Clinic outpatients | 301 | GDS-5 by De Dios or Ortega | Clinical diagnostic of mood disorder assessed by DSM-IV
Cheng (China), 2010 [61] | Clinic outpatients | 110 | GDS-4 by Cheng; GDS-5 by Cheng or Heisel | Major depression (F32), dysthymia (F34.1), bipolar disorder-depressive episode (F31.3), adjustment disorder with depressed reactions (F43.21), mixed anxiety and depressive disorder (F41.8), dementia with depressive symptoms (F06.31), assessed by ICD-10
Izal (Spain), 2010 [55] | Community inhabitants | 105 | GDS-5 by Hoyl | Major depression assessed by DSM-IV
Allgaier (Germany), 2011 and Allgaier, 2013 [44,45] | Nursing homes patients | 92 | GDS-4 by D’Ath | Major depression disorder/Major depression and minor depression assessed by SCID-I for DSM-IV
Chin (China), 2014 [51] | Community inhabitants | 388 | GDS-5 by Hoyl (cut-off point was not reported for other versions) | Major depressive disorder assessed by mPDA according to DSM-III-R
Apostolo (Portugal), 2018 [16] | Clinic outpatients and inpatients, community inhabitants, and nursing home patients | 139 | GDS-5 by Apostolo | Major depressive episode assessed by DSM-V
Dokuzlar (Turkey), 2018 [48] | Community inhabitants | 437 | GDS-4 by Van Marwijk; GDS-5 by Hoyl | Major depression assessed by DSM-V
Eriksen (Norway), 2019 [53] | Clinic outpatients, community inhabitants, and nursing patients | 194 | GDS-5 by Hoyl | A depressive episode (F32) assessed by ICD-10
Sacuiu (Sweden), 2019 [59] | Clinic outpatients and inpatients, community inhabitants and nursing home patients | 60 | GDS-4 by D’Ath, Van Marwijk; GDS-5 by Hoyl, Cheng or Heisel | Major Depressive Disorder assessed by DSM-IV

DSM: Diagnostic and Statistical Manual of Mental Disorders, ICD: International Classification of Diseases, PRIME-MD: Primary Care Evaluation of Mental Disorders, SCID-I: Structured Clinical Interview for DSM Axis I, DIS: Diagnostic Interview Schedule, SCAN: Schedule of Clinical Assessment in Neuropsychiatry, mPDA: Modified Psychiatrist Diagnostic Assessment.

GDS-4 and GDS-5 versions

Regarding the short GDS versions, 11 studies assessed only the GDS-4 [17,19,23,44–47,49,52,57,60], 8 studies assessed only the GDS-5 [16,18,22,50,53–55,58], and 7 studies evaluated both [15,21,48,51,56,59,61]. We found several GDS-4 and GDS-5 versions that included different items from the original GDS-30. The GDS-4 versions used in the published studies were: D’Ath (n = 7) [15,44,45,47,51,56,57,59], Van Marwijk (n = 8) [15,23,46,48,49,51,56,59], Cheng (n = 2) [17,60,61], Galaria (n = 2) [19,62], and Martinez (n = 1) [21]; two authors did not specify which version was used. For the GDS-5, the versions assessed were: Hoyl (n = 9) [15,48,50,51,53–56,58,59], De Dios or Ortega (n = 2) [18,22], Cheng or Heisel (n = 2) [59,61], Molloy (n = 1) [51], Martinez (n = 1) [21], and Apostolo (n = 1) [16].

Each of the GDS-4 and GDS-5 versions assessed a different combination of GDS-30 items. The items assessed by each version are listed in Table 2. The most frequently assessed questions were number 1 (satisfied with life) and number 3 (life is empty).

Table 2. List of items of each GDS-4 and GDS-5 version found in the included studies.

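Columns 2 to 6 are GDS-4 versions and columns 7 to 12 are GDS-5 versions; "x" marks an item included in the version and "-" an item not included.

GDS-30 items (over the past week) | GDS-4 D’Ath | GDS-4 Van Marwijk | GDS-4 Cheng | GDS-4 Galaria | GDS-4 Martinez | GDS-5 Hoyl | GDS-5 De Dios/Ortega | GDS-5 Cheng/Heisel | GDS-5 Molloy* | GDS-5 Martinez | GDS-5 Apostolo
1. Satisfied with your life | x | x | - | x | - | x | x | - | x | - | x
2. Dropped many of your activities and interests | - | x | - | x | - | - | - | - | - | - | -
3. Your life is empty | x | - | - | - | x | - | - | x | x | x | -
4. Often get bored | - | - | - | - | x | x | x | - | - | x | -
7. In good spirits most of the time | - | - | x | - | x | - | - | - | - | x | x
8. Afraid that something bad is going to happen to you | x | - | - | - | - | - | - | - | - | - | -
9. Happy most of the time | x | x | - | - | - | - | - | x | x | - | x
10. Often feel helpless | - | - | - | x | x | x | x | - | x | x | -
11. Often get restless and fidgety | - | - | x | - | - | - | - | - | - | - | -
12. Prefer to stay at home rather than go out and do things | - | x | - | - | - | x | x | - | - | - | -
14. Have more problems with memory than most | - | - | - | x | - | - | - | - | - | - | -
15. Wonderful to be alive now | - | - | x | - | - | - | - | x | - | - | x
16. Feel downhearted and blue | - | - | x | - | - | - | - | - | - | - | -
17. Feel worthless the way you are now | - | - | - | - | - | x | - | x | x | - | -
20. Hard to get started on new projects | - | - | - | - | - | - | x | - | - | - | -
21. Feel full of energy | - | - | - | - | - | - | - | - | - | x | -
22. Feel that your situation is hopeless | - | - | - | - | - | - | - | x | - | - | x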

* The assessment of depression is different. If there is a negative answer in item 16, then the evaluation of depression is carried out with the complete GDS-30. If there is a positive answer in item 16, then the evaluation of depression is carried out with the 4 remaining items.
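As a worked illustration of how a short version is applied, the sketch below scores the Hoyl GDS-5 and the D’Ath GDS-4 from yes/no answers and compares the total with a cut-off. The item numbers are those listed in Table 2. Two points are assumptions not stated in this article: the reverse-scored items follow the usual GDS-30 scoring key (for positively worded items such as item 1, a "no" answer scores one point), and "cut-off 2" is read as "a score of 2 or more screens positive". All function and variable names are hypothetical.

```python
from typing import Iterable, Mapping

# GDS-30 item numbers, as listed in Table 2
HOYL_GDS5 = (1, 4, 10, 12, 17)
DATH_GDS4 = (1, 3, 8, 9)

# Assumption: positively worded items, where answering "no" scores one point
# (restricted to the items that appear in Table 2)
REVERSE_SCORED = {1, 7, 9, 15, 21}


def short_gds_score(answers: Mapping[int, bool], items: Iterable[int]) -> int:
    """Count depressive answers; answers maps item number to True for "yes"."""
    score = 0
    for item in items:
        said_yes = answers[item]
        depressive = (not said_yes) if item in REVERSE_SCORED else said_yes
        score += int(depressive)
    return score


def screens_positive(score: int, cutoff: int) -> bool:
    # Assumption: "cut-off c" means a score of c or more is a positive screen
    return score >= cutoff


# Hypothetical respondent: satisfied with life, often bored and helpless
answers = {1: True, 4: True, 10: True, 12: False, 17: False}
score = short_gds_score(answers, HOYL_GDS5)
print(score, screens_positive(score, cutoff=2))
```

Because each version draws on different items, the same respondent can obtain different scores on different versions, which is one reason the versions are analyzed separately in this review.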

Risk of bias

Using the QUADAS-2 tool, we found a high risk of bias in most of the studies. There was high risk of bias in the index test domain. Specifically, the question about the lack of pre-specification of the cut-off points used was the most common flaw (Fig 2).

Fig 2. Quality assessment using the QUADAS-2 tool.


Diagnostic outcomes

As stated before, we assessed the sensitivity and specificity of studies that used the DSM or ICD-10 diagnosis criteria as a reference standard, for all GDS-4 and GDS-5 versions. Thus, 23 studies were included in these quantitative analyses.

GDS-4

For the GDS-4 assessment, 14 studies with a total of 3266 participants were included. We obtained eleven sensitivity and specificity estimates, which gave information regarding five versions of GDS-4 at different cut-offs: D’Ath at cut-offs 1 and 2; Van Marwijk at cut-offs 1, 2 and 3; Cheng at cut-offs 1, 2, 3 and 4; Martinez at cut-off 2; and Galaria at cut-off 2 (Table 3).

Table 3. Summary of diagnostic estimates.
Cut-off | GDS version | Studies (n) | Sensitivity (95% CI) | Quality of the evidence for sensitivity (GRADE) | Specificity (95% CI) | Quality of the evidence for specificity (GRADE)

GDS-4
1 | D’Ath | 5 (806) | 0.92 (0.84–0.96) | VERY LOW a,b,c | 0.61 (0.51–0.70) | VERY LOW b,c,d
1 | Van Marwijk | 5 (1 650) | 0.92 (0.76–0.97) | VERY LOW d,e,f | 0.51 (0.28–0.74) | VERY LOW d,e,f
1 | Cheng | 2 (594) | 0.88 (0.81–0.93) | LOW a,b | 0.46 (0.38–0.53) | MODERATE b
1 | Pooled results | 9 (2 423) | 0.89 (0.85–0.93) | MODERATE a | 0.53 (0.42–0.65) | VERY LOW d,f
2 | D’Ath | 4 (559) | 0.76 (0.61–0.86) | VERY LOW a,e,f,i | 0.81 (0.69–0.89) | VERY LOW b,d,e,i
2 | Van Marwijk | 5 (1 275) | 0.79 (0.62–0.90) | VERY LOW c,d,f,j | 0.72 (0.54–0.85) | VERY LOW c,d,f,j
2 | Cheng | 2 (594) | 0.73 (0.66–0.80) | LOW a,b | 0.63 (0.56–0.70) | MODERATE b
2 | Galaria | 2 (470) | 0.90 (0.84–0.96) | VERY LOW b,c,e | 0.80 (0.76–0.84) | LOW c,e
2 | Martinez | 1 (249) | 0.73 (0.64–0.82) | LOW b,h | 0.78 (0.72–0.84) | LOW b,h
2 | Pooled results | 11 (2 740) | 0.77 (0.70–0.82) | VERY LOW a,b,c,e | 0.75 (0.68–0.81) | VERY LOW b,c,d,e
3 | Van Marwijk | 1 (64) | 0.85 (0.73–0.97) | VERY LOW f,h | 0.67 (0.51–0.84) | VERY LOW f,h
3 | Cheng | 2 (594) | 0.59 (0.53–0.65) | MODERATE b | 0.79 (0.70–0.85) | LOW a,b
3 | Pooled results | 3 (658) | 0.63 (0.53–0.71) | LOW a,b | 0.78 (0.69–0.84) | LOW a,b
4 | Cheng | 1 (444) | 0.79 (0.73–0.85) | LOW b,h | 0.92 (0.88–0.95) | MODERATE h

GDS-5
1 | Cheng or Heisel | 1 (150) | 0.88 (0.82–0.95) | LOW b,h | 0.38 (0.26–0.51) | VERY LOW f,h
1 | Apostolo | 1 (139) | 0.91 (0.80–1.00) | VERY LOW b,g,h | 0.55 (0.46–0.64) | VERY LOW b,g,h
1 | Hoyl | 1 (333) | 0.93 (0.87–0.99) | LOW b,h | 0.29 (0.71–0.83) | LOW b,h
1 | Pooled results | 3 (622) | 0.89 (0.83–0.94) | LOW b,c | 0.41 (0.30–0.53) | VERY LOW c,d,f
2 | Cheng or Heisel | 1 (150) | 0.78 (0.70–0.86) | LOW b,h | 0.56 (0.43–0.69) | VERY LOW f,h
2 | De Dios or Ortega | 2 (456) | 0.98 (0.96–1.00) | HIGH | 0.83 (0.79–0.87) | HIGH
2 | Hoyl | 9 (2 071) | 0.85 (0.79–0.90) | VERY LOW a,b,e | 0.77 (0.69–0.83) | VERY LOW b,d,e
2 | Martinez | 1 (249) | 0.89 (0.73–0.89) | LOW b,h | 0.73 (0.66–0.80) | LOW b,h
2 | Pooled results | 14 (3 065) | 0.85 (0.80–0.90) | MODERATE a | 0.75 (0.69–0.81) | VERY LOW b,d
3 | Apostolo | 1 (139) | 0.78 (0.61–0.95) | VERY LOW f,g,h | 0.85 (0.79–0.92) | VERY LOW b,g,h
3 | Cheng or Heisel | 2 (210) | 0.64 (0.54–0.74) | VERY LOW b,c,e | 0.79 (0.71–0.87) | VERY LOW b,c,e
3 | Hoyl | 1 (333) | 0.77 (0.56–0.97) | VERY LOW e,f,g,h | 0.86 (0.76–0.96) | VERY LOW b,e,g,h
3 | Pooled results | 3 (622) | 0.60 (0.50–0.68) | VERY LOW b,e,i | 0.83 (0.74–0.89) | VERY LOW a,b,e,i
4 | Apostolo | 1 (139) | 0.39 (0.19–0.59) | VERY LOW f,g,h | 0.97 (0.95–1.00) | LOW g,h
4 | Cheng or Heisel | 1 (150) | 0.45 (0.35–0.55) | LOW b,h | 0.90 (0.82–0.97) | LOW b,h
4 | Pooled results | 2 (289) | 0.44 (0.35–0.53) | LOW b,c | 0.94 (0.88–1.00) | VERY LOW a,b,c
5 | Apostolo | 1 (139) | 0.13 (0.00–0.27) | VERY LOW f,g,h | 1.00 (1.00–1.00) | LOW g,h

n: Number of participants.

a The heterogeneity is moderate.

b Wide confidence intervals.

c Between 50% and 70% of the studies have similar components to the question.

d The heterogeneity is great.

e High risk of bias.

f Very wide confidence intervals.

g The study presents two components similar to the question.

h Only one study has been evaluated.

i Less than 50% of the studies have similar components (population, index test, or reference standard) to the question.

j Very high risk of bias.

When taken together, GDS-4 versions at cut-off 1 had a pooled sensitivity of 0.90 (95% CI: 0.85–0.93) and a pooled specificity of 0.57 (95% CI: 0.45–0.67), at cut-off 2 had a pooled sensitivity of 0.77 (95% CI: 0.70–0.82) and a pooled specificity of 0.75 (95% CI: 0.68–0.81), and at cut-off 3 had a pooled sensitivity of 0.63 (95% CI: 0.53–0.71) and a pooled specificity of 0.78 (95% CI: 0.69–0.84).

Among the GDS-4 versions, lower cut-off points tended to yield higher sensitivity and lower specificity. Considering the sensitivity and specificity estimates together, the Galaria version at cut-off 2 and the Cheng version at cut-off 4 had the best balance, the former favoring sensitivity and the latter specificity.

We assessed and found differences in sensitivity and specificity estimates for the different GDS-4 versions, at each cut-off point used.

GDS-5

For the GDS-5 assessment, 15 studies with a total of 3085 participants were included. We obtained thirteen sensitivity and specificity estimates, which gave information regarding five versions of GDS-5 at different cut-offs: De Dios or Ortega at cut-off 2, Hoyl at cut-off 1, 2 and 3, Martinez at cut-off 2, Apostolo at cut-off 1, 3, 4 and 5, and Heisel or Cheng at cut-offs 1, 2, 3 and 4 (Table 3).

When taken together, GDS-5 versions at cut-off 1 had a pooled sensitivity of 0.89 (95% CI 0.83–0.94) and a pooled specificity of 0.41 (95% CI 0.30–0.53), at cut-off 2 had a pooled sensitivity of 0.85 (95% CI: 0.80–0.90) and a pooled specificity of 0.75 (95% CI: 0.69–0.81), at cut-off 3 had a pooled sensitivity of 0.60 (95% CI 0.50–0.68) and a pooled specificity of 0.83 (95% CI 0.74–0.89), and at cut-off 4 had a pooled sensitivity of 0.44 (95% CI 0.35–0.53) and a pooled specificity of 0.94 (95% CI 0.88–1.00).
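To make these pooled figures more concrete, the sketch below translates a sensitivity and specificity pair into expected screening outcomes in a hypothetical cohort. The cohort size of 1000 and the prevalence of 15% are illustrative assumptions (the prevalence lies within the 10% to 20% range cited in the Introduction); the function name is hypothetical; and the pooled GDS-5 values at cut-off 2 are used as inputs.

```python
def expected_counts(sensitivity: float, specificity: float,
                    prevalence: float, n: int = 1000) -> dict:
    """Expected 2x2 cell counts when screening n people."""
    depressed = n * prevalence
    not_depressed = n - depressed
    tp = sensitivity * depressed        # depressed and screened positive
    fn = depressed - tp                 # depressed but missed
    tn = specificity * not_depressed    # not depressed and screened negative
    fp = not_depressed - tn             # not depressed but screened positive
    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp}


# Pooled GDS-5 estimates at cut-off 2; the prevalence is an assumption
counts = expected_counts(sensitivity=0.85, specificity=0.75, prevalence=0.15)
ppv = counts["tp"] / (counts["tp"] + counts["fp"])
print({k: round(v, 1) for k, v in counts.items()}, round(ppv, 3))
```

Under these assumptions, about 22 of the 150 depressed people would be missed and about 212 of the 850 non-depressed people would screen positive, so a positive screen would need confirmation with a diagnostic assessment.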

Among the GDS-5 versions, lower cut-off points tended to yield higher sensitivity and lower specificity. Considering the sensitivity and specificity estimates together, the De Dios or Ortega version at cut-off 2 had the best balance of sensitivity (0.98, 95% CI: 0.96–1.00) and specificity (0.83, 95% CI: 0.79–0.87).

We assessed and found differences in sensitivity and specificity estimates for the different GDS-5 versions, at each cut-off point used.

A summary of the sensitivity analysis and all the forest plots can be found in S1–S6 Figs.

Certainty of evidence

We used GRADE summary of findings (SoF) tables to report the certainty of evidence (Table 3). Overall, the certainty of the evidence was very low, mostly due to concerns about the indirectness of the evidence, inconsistency, and imprecision of the results. However, the De Dios or Ortega GDS-5 version obtained a high certainty of evidence.

Discussion

The first versions of the GDS-4 and GDS-5 were the D’Ath and Hoyl versions, respectively [14,15]. However, many other versions have been created in recent years, mostly by testing which combination of GDS-30 items performs better in terms of sensitivity and specificity [17–20,22,23]. In this systematic review, we found five different GDS-4 versions and six different GDS-5 versions.

Previous systematic reviews have assessed the accuracy of these short GDS versions [13,25–27]. These reviews included two to ten studies for the GDS-4 assessment and only one study for the GDS-5 assessment, whereas our systematic review included 23 studies: 15 that evaluated GDS-4 and 15 that evaluated GDS-5.

All previous meta-analyses pooled the results from studies using different GDS versions. However, our results suggest that different versions have different sensitivity and specificity estimates at the same cut-off point.

Among the assessed GDS-4 versions, the balance between sensitivity and specificity was greater for the Galaria version at cut-off 2 (pooled analysis of two studies, very low certainty of the accuracy evidence), and for the Cheng version at cut-off 4 (one study, low certainty of the accuracy evidence). Among the assessed GDS-5 versions, the balance between sensitivity and specificity was greater for the “De Dios or Ortega” version at cut-off 2 (pooled analysis of two studies with high certainty of the evidence). Although this suggests that the “De Dios or Ortega” version at cut-off 2 may be a balanced option, with a high certainty that allows a more confident estimation of underdiagnosis and overdiagnosis rates, decision-makers must also consider other factors such as applicability in their contexts or cultural variations in the manifestation of depression, before deciding which GDS version to use.

Subgroup analyses found that estimates differed across GDS-4 versions and across GDS-5 versions. While this suggests that some versions may perform better than others, the low certainty of these estimates prevents any firm conclusion. However, it seems sensible for future systematic reviews to evaluate each version separately.

Moreover, most of the meta-analyses for each version also had significant heterogeneity, which may be due to differences in risk of bias, population characteristics (such as dementia prevalence), study setting, or reference standard (DSM-III, DSM-IV, DSM-V, or ICD-10 criteria). In addition, cultural differences in the construct of depression may cause heterogeneous results across contexts [62]. Regrettably, the low number of studies per GDS version and their heterogeneous characteristics prevent us from identifying any predominant factor that could explain the heterogeneous results.

Limitations and strengths

Certain limitations must be considered when interpreting the results: 1) The certainty of the evidence was low or very low for most of the results, mainly due to heterogeneity and risk of bias. 2) Most studies had a high risk of bias, mainly due to the selective reporting of cut-off points (some studies seemed to report only the cut-off with the highest sensitivity and specificity) and to the assessment of GDS-4 or GDS-5 accuracy by extracting items from a full GDS-30 interview (since the GDS-30 is a much longer survey, answering it is expected to be more exhausting than answering the GDS-4 or GDS-5 alone). 3) Studies had heterogeneous settings, population characteristics, and definitions of depression.

However, to the best of our knowledge, this is the most comprehensive systematic review performed to date regarding the accuracy of GDS-4 and GDS-5, including 23 studies, and it is the first systematic review that provides pooled estimates for each GDS-4 and GDS-5 version. Thus, our results may help guide clinical practice and clinical guideline recommendations.

Conclusion

This study summarizes the sensitivity and specificity of GDS-4 and GDS-5 for depression screening in older adults. We found several GDS-4 and GDS-5 versions whose results showed great heterogeneity, which suggests that some versions may be more accurate than others. The certainty of the evidence was low or very low for almost all estimates. Altogether, our results indicate the need for more well-designed studies that compare different GDS versions.

Supporting information

S1 Checklist. PRISMA-DTA checklist item.

(DOC)

S1 Fig. D’Ath version.

(DOCX)

S2 Fig. Van Marwijk version.

(DOCX)

S3 Fig. Cheng version.

(DOCX)

S4 Fig. Galaria version.

(DOCX)

S5 Fig. Hoyl version.

(DOCX)

S6 Fig. Cheng or Heisel version and De Dios or Ortega version.

(DOCX)

S1 Table. Search strategy.

(DOCX)

S2 Table. Excluded studies.

(DOCX)

S3 Table. Characteristics of the included studies.

(DOCX)

S1 File

(XLSX)

S1 Database

(XLSX)

Acknowledgments

We would like to thank David Villarreal-Zegarra and Jessica Hanae Zafra-Tanaka for their valuable comments in the revision of the manuscript.

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. James SL, Abate D, Abate KH, Abay SM, Abbafati C, Abbasi N, et al. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1789–1858. doi: 10.1016/S0140-6736(18)32279-7
  • 2. Sözeri-Varma G. Depression in the elderly: clinical features and risk factors. Aging Dis. 2012;3(6):465–471.
  • 3. Barua A, Ghosh MK, Kar N, Basilio MA. Prevalence of depressive disorders in the elderly. Ann Saudi Med. 2011;31(6):620–624. doi: 10.4103/0256-4947.87100
  • 4. Aziz R, Steffens DC. What are the causes of late-life depression? Psychiatr Clin North Am. 2013;36(4):497–516. doi: 10.1016/j.psc.2013.08.001
  • 5. Drayer RA, Mulsant BH, Lenze EJ, Rollman BL, Dew MA, Kelleher K, et al. Somatic symptoms of depression in elderly patients with medical comorbidities. Int J Geriatr Psychiatry. 2005;20(10):973–82. doi: 10.1002/gps.1389
  • 6. Bunce D, Batterham PJ, Mackinnon AJ, Christensen H. Depression, anxiety and cognition in community-dwelling adults aged 70 years and over. J Psychiatr Res. 2012;46(12):1662–6. doi: 10.1016/j.jpsychires.2012.08.023
  • 7. Gureje O, Kola L, Afolabi E. Epidemiology of major depressive disorder in elderly Nigerians in the Ibadan Study of Ageing: a community-based survey. Lancet. 2007;370(9591):957–64. doi: 10.1016/S0140-6736(07)61446-9
  • 8. Licht-Strunk E, Beekman ATF, de Haan M, van Marwijk HWJ. The prognosis of undetected depression in older general practice patients. A one year follow-up study. J Affect Disord. 2009;114(1):310–5. doi: 10.1016/j.jad.2008.06.006
  • 9. Mitchell AJ, Vaze A, Rao S. Clinical diagnosis of depression in primary care: a meta-analysis. Lancet. 2009;374(9690):609–19. doi: 10.1016/S0140-6736(09)60879-5
  • 10. Vöhringer PA, Jimenez MI, Igor MA, Fores GA, Correa MO, Sullivan MC, et al. Detecting Mood Disorder in Resource-Limited Primary Care Settings: Comparison of a self-administered screening tool to general practitioner assessment. J Med Screen. 2013;20(3):118–24. doi: 10.1177/0969141313503954
  • 11. O’Connor E, Rossom RC, Henninger M, Groom HC, Burda BU, Henderson JT, et al. Screening for Depression in Adults: An Updated Systematic Evidence Review for the US Preventive Services Task Force [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2016. Available from: https://www.ncbi.nlm.nih.gov/books/NBK349027/
  • 12. Mitchell AJ, Bird V, Rizzo M, Meader N. Diagnostic validity and added value of the Geriatric Depression Scale for depression in primary care: a meta-analysis of GDS30 and GDS15. J Affect Disord. 2010;125(1–3):10–7. doi: 10.1016/j.jad.2009.08.019
  • 13. Mitchell AJ, Bird V, Rizzo M, Meader N. Which version of the geriatric depression scale is most useful in medical settings and nursing homes? Diagnostic validity meta-analysis. Am J Geriatr Psychiatry. 2010;18(12):1066–77. doi: 10.1097/jgp.0b013e3181f60f81
  • 14. D’Ath P, Katona P, Mullan E, Evans S, Katona C. Screening, detection and management of depression in elderly primary care attenders. I: The acceptability and performance of the 15 item Geriatric Depression Scale (GDS15) and the development of short versions. Fam Pract. 1994;11(3):260–6. doi: 10.1093/fampra/11.3.260
  • 15. Hoyl MT, Alessi CA, Harker JO, Josephson KR, Pietruszka FM, Koelfgen M, et al. Development and testing of a five-item version of the Geriatric Depression Scale. J Am Geriatr Soc. 1999;47(7):873–8. doi: 10.1111/j.1532-5415.1999.tb03848.x
  • 16. Apóstolo J, Campos E, Reis I, Justo-Henriques S, Correia C. Screening capacity of Geriatric Depression Scale with 10 and 5 items [Capacidade de rastreio da Escala de Depressão Geriátrica com 10 e 5 itens]. Revista de Enfermagem Referência. 2018;Serie IV:29–40. doi: 10.12707/RIV17062
  • 17. Cheng ST, Chan AC. A brief version of the geriatric depression scale for the chinese. Psychol Assess. 2004;16(2):182–6. doi: 10.1037/1040-3590.16.2.182 Erratum in: Psychol Assess. 2006;18(1):48.
  • 18. De Dios del Valle R, Hernández Sánchez AM, Rexach Cano LI, Cruz Jentoft AJ. Validación de una versión de cinco ítems de la Escala de Depresión Geriátrica de Yesavage en población española. Rev Esp Geriatr Gerontol. 2001;36(5):276–80. doi: 10.1016/S0211-139X(01)74736-1
  • 19. Galaria II, Casten RJ, Rovner BW. Development of a shorter version of the geriatric depression scale for visually impaired older patients. Int Psychogeriatr. 2000;12(4):435–43. doi: 10.1017/s1041610200006554
  • 20. Heisel MJ, Duberstein PR, Lyness JM, Feldman MD. Screening for suicide ideation among older primary care patients. J Am Board Fam Med. 2010;23(2):260–9. doi: 10.3122/jabfm.2010.02.080163
  • 21. Martínez de la Iglesia J, Onís Vilches MC, Dueñas Herrero R, Aguado Taberné C, Albert Colomer C, Arias Blanco MC. Abreviar lo breve. Aproximación a versiones ultracortas del cuestionario de Yesavage para el cribado de la depresión. Aten Primaria. 2005;35(1):14–21. doi: 10.1157/13071040
  • 22. Ortega Orcos R, Salinero Fort MA, Kazemzadeh Khajoui A, Vidal Aparicio S, de Dios del Valle R. Validación de la versión española de 5 y 15 ítems de la Escala de Depresión Geriátrica en personas mayores en Atención Primaria. Rev Clin Esp. 2007;207(11):559–62. doi: 10.1157/13111585
  • 23. van Marwijk HW, Wallace P, de Bock GH, Hermans J, Kaptein AA, Mulder JD. Evaluation of the feasibility, reliability and diagnostic value of shortened versions of the geriatric depression scale. Br J Gen Pract. 1995;45(393):195–9.
  • 24. Hammami S, Hajem S, Barhoumi A, Koubaa N, Gaha L, Laouani Kechrid C. [Screening for depression in an elderly population living at home. Interest of the Mini-Geriatric Depression Scale]. Rev Epidemiol Sante Publique. 2012;60(4):287–93. doi: 10.1016/j.respe.2012.02.004
  • 25. Pocklington C, Gilbody S, Manea L, McMillan D. The diagnostic accuracy of brief versions of the Geriatric Depression Scale: a systematic review and meta-analysis. Int J Geriatr Psychiatry. 2016;31(8):837–57. doi: 10.1002/gps.4407
  • 26. Krishnamoorthy Y, Rajaa S, Rehman T. Diagnostic accuracy of various forms of geriatric depression scale for screening of depression among older adults: Systematic review and meta-analysis. Arch Gerontol Geriatr. 2020;87:104002. doi: 10.1016/j.archger.2019.104002
  • 27. Tsoi KK, Chan JY, Hirai HW, Wong SY. Comparison of diagnostic performance of Two-Question Screen and 15 depression screening instruments for older adults: systematic review and meta-analysis. Br J Psychiatry. 2017;210(4):255–60. doi: 10.1192/bjp.bp.116.186932 Erratum in: Br J Psychiatry. 2017;211(2):120.
  • 28. McInnes MDF, Moher D, Thombs BD, Mc Grath TA, Bossuyt PM, Clifford T, et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement. JAMA. 2018;319(4):388–396. doi: 10.1001/jama.2017.19163
  • 29. Tyrer P. A comparison of DSM and ICD classifications of mental disorder. Adv Psychiatr Treat. 2018;20(4):280–5. doi: 10.1192/apt.bp.113.011296
  • 30. Abdullahi A, Candan SA, Abba MA, Bello AH, Alshehri MA, Afamefuna Victor E, et al. Neurological and Musculoskeletal Features of COVID-19: A Systematic Review and Meta-Analysis. Front Neurol. 2020;11:687. doi: 10.3389/fneur.2020.00687
  • 31. Joyeux L, De Bie F, Danzer E, Russo FM, Javaux A, Peralta CFA, et al. Learning curves of open and endoscopic fetal spina bifida closure: systematic review and meta-analysis. Ultrasound Obstet Gynecol. 2020;55(6):730–9. doi: 10.1002/uog.20389
  • 32. Arora M, Chugh A, Jain N, Mishu M, Boeckmann M, Dahanayake S, et al. Global impact of tobacco control policies on smokeless tobacco use: a systematic review protocol. BMJ Open. 2020;10(12):e042860. doi: 10.1136/bmjopen-2020-042860
  • 33. Edwards M. Assessing for depression and mood disturbance in later life. Br J Community Nurs. 2004;9(11):492–4. doi: 10.12968/bjcn.2004.9.11.16874
  • 34. Grossberg GT, Beck D, Zaidi SNY. Rapid Depression Assessment in Geriatric Patients. Clin Geriatr Med. 2017;33(3):383–91. doi: 10.1016/j.cger.2017.03.007
  • 35. Moyano Díaz E, Flores Moraga E, Soramaa H. Fiabilidad y validez de constructo del test MUNSH para medir felicidad, en población de adultos mayores chilenos. Univ Psychol. 2011;10:567–80.
  • 36. Nabbe P, Le Reste JY, Guillou-Landreat M, Munoz Perez MA, Argyriadou S, Claveria A, et al. Which DSM validated tools for diagnosing depression are usable in primary care research? A systematic literature review. Eur Psychiatry. 2017;39:99–105. doi: 10.1016/j.eurpsy.2016.08.004
  • 37. Scogin F, Shah A. Screening older adults for depression in primary care settings. Health Psychol. 2006;25:675–7. doi: 10.1037/0278-6133.25.6.675
  • 38. Watson LC, Pignone MP. Screening accuracy for late-life depression in primary care: a systematic review. J Fam Pract. 2003;52(12):956–64.
  • 39. Wu CM, Kelley LS. Choosing an appropriate depression assessment tool for chinese older adults: a review of 11 instruments. The best tools take into account cultural differences. J Gerontol Nurs. 2007;33(8):12–22. doi: 10.3928/00989134-20070801-04
  • 40. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36. doi: 10.7326/0003-4819-155-8-201110180-00009
  • 41. Schunemann HJ, Mustafa RA, Brozek J, Santesso N, Bossuyt PM, Steingart KR, et al. GRADE Guidelines: 22. The GRADE approach for tests and strategies—from test accuracy to patient important outcomes and recommendations. J Clin Epidemiol. 2019. doi: 10.1016/j.jclinepi.2019.02.003
  • 42. Schunemann HJ, Oxman AD, Brozek J, Glasziou P, Jaeschke R, Vist GE, et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ. 2008;336(7653):1106–10. doi: 10.1136/bmj.39500.677199.AE
  • 43. Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. 2005;58(10):982–90. doi: 10.1016/j.jclinepi.2005.02.022
  • 44. Allgaier AK, Kramer D, Mergl R, Fejtkova S, Hegerl U. [Validity of the geriatric depression scale in nursing home residents: comparison of GDS-15, GDS-8, and GDS-4]. Psychiatr Prax. 2011;38(6):280–6. doi: 10.1055/s-0030-1266105
  • 45. Allgaier AK, Kramer D, Saravo B, Mergl R, Fejtkova S, Hegerl U. Beside the Geriatric Depression Scale: the WHO-Five Well-being Index as a valid screening tool for depression in nursing homes. Int J Geriatr Psychiatry. 2013;28(11):1197–204. doi: 10.1002/gps.3944
  • 46. Castelo MS, Coelho-Filho JM, Carvalho AF, Lima JW, Noleto JC, Ribeiro KG, et al. Validity of the Brazilian version of the Geriatric Depression Scale (GDS) among primary care patients. Int Psychogeriatr. 2010;22(1):109–13. doi: 10.1017/S1041610209991219
  • 47. Castelo MS, Neto IS, Noleto JCS, Lima WdO. Escala de Depressão Geriátrica com quatro itens: um instrumento válido para rastrear depressão em idosos em nível primário de saúde. Geriatrics, Gerontology and Aging. 2007;1(1):26–31.
  • 48. Dokuzlar O, Soysal P, Usarel C, Isik AT. The evaluation and design of a short depression screening tool in Turkish older adults. Int Psychogeriatr. 2018;30(10):1541–8. doi: 10.1017/S1041610218000236
  • 49. Almeida OP, Almeida SA. Short versions of the geriatric depression scale: a study of their validity for the diagnosis of a major depressive episode according to ICD-10 and DSM-IV. Int J Geriatr Psychiatry. 1999;14(10):858–65.
  • 50. Chattat R, Ellena L, Cucinotta D, Savorani G, Mucciarelli G. A study on the validity of different short versions of the Geriatric Depression Scale. Arch Gerontol Geriatr Suppl. 2001;7:81–6. doi: 10.1016/s0167-4943(01)00124-8
  • 51. Chin W-C, Liu C-Y, Lee C-P, Chu C-L. Validation of five short versions of the Geriatric Depression Scale in the elder population in Taiwan. Taiwan J Psychiatr. 2014;28(3):156–63.
  • 52. De la Torre Maslucan J, Shimabukuro Maeki R, Varela Pinedo L, Krüger Malpartida H, Huayanay Falconí L, Cieza Zevallos J, et al. Validación de la versión reducida de la escala de depresión geriátrica en el consultorio externo de geriatría del Hospital Nacional Cayetano Heredia. Acta Médica Peruana. 2006;23:144–7.
  • 53. Eriksen S, Bjørkløf GH, Helvik AS, Larsen M, Engedal K. The validity of the hospital anxiety and depression scale and the geriatric depression scale-5 in home-dwelling old adults in Norway. J Affect Disord. 2019;256:380–5. doi: 10.1016/j.jad.2019.05.049
  • 54. Izal M, Montorio I, Nuevo R, Pérez-Rojo G. Comparación de la sensibilidad y la especificidad entre diferentes versiones de la Escala de Depresión Geriátrica. Rev Esp Geriatr Gerontol. 2007;42(4):227–32. doi: 10.1016/S0211-139X(07)73555-2
  • 55. Izal M, Montorio I, Nuevo R, Pérez-Rojo G, Cabrera I. Optimising the diagnostic performance of the Geriatric Depression Scale. Psychiatry Res. 2010;178(1):142–6. Epub 2010/05/11. doi: 10.1016/j.psychres.2009.02.018
  • 56. Jongenelis K, Pot AM, Eisses AM, Gerritsen DL, Derksen M, Beekman AT, et al. Diagnostic accuracy of the original 30-item and shortened versions of the Geriatric Depression Scale in nursing home patients. Int J Geriatr Psychiatry. 2005;20(11):1067–74. doi: 10.1002/gps.1398
  • 57. Pomeroy IM, Clark CR, Philp I. The effectiveness of very short scales for depression screening in elderly medical patients. Int J Geriatr Psychiatry. 2001;16(3):321–6. doi: 10.1002/gps.344
  • 58. Rinaldi P, Mecocci P, Benedetti C, Ercolani S, Bregnocchi M, Menculini G, et al. Validation of the five-item geriatric depression scale in elderly subjects in three different settings. J Am Geriatr Soc. 2003;51(5):694–8. doi: 10.1034/j.1600-0579.2003.00216.x
  • 59. Sacuiu S, Seidu NM, Sigström R, Rydberg Sterner T, Johansson L, Wiktorsson S, et al. Accuracy of 12 short versions of the Geriatric Depression Scale to detect depression in a prospective study of a high-risk population with different levels of cognition. Int Psychogeriatr. 2019:1–10. doi: 10.1017/s1041610219001650
  • 60. Cheng ST, Chan AC. Comparative performance of long and short forms of the Geriatric Depression Scale in mildly demented Chinese. Int J Geriatr Psychiatry. 2005;20(12):1131–7. doi: 10.1002/gps.1405
  • 61. Cheng ST, Yu EC, Lee SY, Wong JY, Lau KH, Chan LK, et al. The geriatric depression scale as a screening tool for depression and suicide ideation: a replication and extention. Am J Geriatr Psychiatry. 2010;18(3):256–65. doi: 10.1097/JGP.0b013e3181bf9edd
  • 62. Juhasz G, Eszlari N, Pap D, Gonda X. Cultural differences in the development and characteristics of depression. Neuropsychopharmacol Hung. 2012;14(4):259–65.

Decision Letter 0

Eleanor Ochodo

22 Mar 2021

PONE-D-21-03547

Accuracy of the Geriatric Depression Scale (GDS)-4 and GDS-5 for the screening of depression among older adults: a systematic review and meta-analysis

PLOS ONE

Dear Alvaro Taype-Rondan, 

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

This is an interesting review but the rationale and methodology (especially) warrant considerable revisions. Please refer to the references provided in the peer-review feedback for resources that can guide the correct methods for conducting and reporting systematic reviews and meta-analyses of diagnostic accuracy studies. One example is the PRISMA statement for diagnostic accuracy reviews, which was neither used nor referenced by the author team.

Please submit your revised manuscript by 30 April, 2021. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Eleanor Ochodo, M.D., PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include a copy of Table 4 which you refer to in your text on page 13.

3. Please include captions for *all* Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Overall, the authors have done rigorous work to examine the utility of the brief versions of GDS (GDS-5 and 4) in screening for geriatric depression. Despite the challenge of inconsistency in the items used and the cut-off value, they managed to segregate the various versions of GDS-4 and GDS-5 with the same cut-off point and performed meta-analysis whenever possible. The outcome is summarized as pooled sensitivity and specificity with the Youden index. Since a 2×2 contingency table is constructed, it would be more informative if the authors could report other measures of diagnostic accuracy, such as pooled estimates of the positive and negative likelihood ratios, the summary diagnostic odds ratio, and the area under the curve (AUC).

Major:

Methods: For the outcome measures, since a 2×2 contingency table is constructed, it would be better if the authors could include other measures of diagnostic accuracy, such as pooled estimates of the positive and negative likelihood ratios, the summary diagnostic odds ratio, and the area under the curve (AUC).
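As an illustration of the measures named in this comment, the following minimal Python sketch derives them from a single study's 2×2 table. It is not the authors' or the reviewer's method: the class and function names are hypothetical, the counts are invented so that sensitivity and specificity equal 0.77 and 0.75, and all four cells are assumed to be non-zero. Pooled estimates across studies would still require a meta-analytic model such as the bivariate random-effects model used in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one diagnostic accuracy study (hypothetical container)."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return common single-study accuracy measures.

    Assumes every cell is non-zero; otherwise some ratios are undefined.
    """
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (1 - spec),           # positive likelihood ratio
        "lr_neg": (1 - sens) / spec,           # negative likelihood ratio
        "dor": (t.tp * t.tn) / (t.fp * t.fn),  # diagnostic odds ratio
        "youden": sens + spec - 1,             # Youden index
    }


if __name__ == "__main__":
    # Invented counts for illustration only, not data from any included study.
    example = TwoByTwo(tp=77, fp=25, fn=23, tn=75)
    for name, value in accuracy_measures(example).items():
        print(f"{name}: {value:.2f}")
```

The diagnostic odds ratio is computed directly from the cell counts, which is algebraically the same as dividing the positive by the negative likelihood ratio.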

Furthermore, I wonder if Cronbach’s alpha of the items listed in Table 2 could be summarized as a recommendation for future studies.

Minor:

Line 60: it is combining? Or just selecting some items from the original 30-item version?

Lines 62, 63: the words “different” and “altogether” may not be needed.

Line 147: Regarding the population, …. The sentence needs reconstruction. Only one study excluded the sample with dementia.

Line 74-81: Authors have elaborated the various eligibility criteria for inclusion. Authors can further clarify:

1. Language of the tool. If translated, whether the translation was done appropriately.

2. I understood that studies were included irrespective of study design, setting, and the way the interview was conducted. It may be better to mention this.

Reviewer #2: This review aims to assess the diagnostic accuracy of the Geriatric Depression Scale (GDS)-4 and GDS-5 for screening of depression among older adults.

General comments

This is an interesting paper but the rationale needs to be strengthened and the methods revised considerably. The methodology presents some inaccuracies and confusion with methods used for meta-analyses of intervention reviews. I would recommend the following references to the authors:

• PRISMA for Diagnostic Test Accuracy Reviews (DTA): http://www.prisma-statement.org/Extensions/DTA

• Leeflang MM. Systematic reviews and meta-analyses of diagnostic test accuracy. Clin Microbiol Infect. 2014 Feb;20(2):105-13. doi: 10.1111/1469-0691.12474. PMID: 24274632 https://pubmed.ncbi.nlm.nih.gov/24274632/

• Leeflang MM, Deeks JJ, Gatsonis C, Bossuyt PM; Cochrane Diagnostic Test Accuracy Working Group. Systematic reviews of diagnostic test accuracy. Ann Intern Med. 2008 Dec 16;149(12):889-97. doi: 10.7326/0003-4819-149-12-200812160-00008. https://pubmed.ncbi.nlm.nih.gov/19075208/

• DTA meta-analyses methods: Chapter 10 (Analysis of results) of the Cochrane handbook for DTA reviews. https://methods.cochrane.org/sdt/handbook-dta-reviews

• GRADE: https://gradepro.org/ GRADEpro GDT is free software that enables authors to conduct GRADE assessments accurately and also generates clear summary of findings tables. (Table 3 needs to follow this format.)

Specific comments

ABSTRACT

Methods section

• This statement in the methods section is unclear: “We conducted sensitivity and specificity meta-analyses of those studies using the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases-10 (ICD-10).” Do the authors mean using the “Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases-10 (ICD-10)” as the reference standard? In its current form, the statement implies that these were the statistical methods used to pool the estimates of sensitivity and specificity.

• “Mostly, significant subgroup differences across versions were found”. It would be helpful to state which subgroups were measured. Do the authors mean subgroup differences at different test thresholds?

Conclusion

• This statement in the conclusion is not qualified in the results section. “Conclusions: The accuracy of the different GDS-4 and GDS-5 versions showed a high heterogeneity”. This statement is best placed in the results section.

MAIN TEXT

Introduction

• The rationale, “accuracy remains unclear”, is vague. Is the lack of clarity due to variation in the accuracy estimates of existing primary studies? Please qualify that rationale better. I also disagree that the existing published systematic reviews are all outdated. Some were published in 2017 (ref 27) and 2019 (ref 26). These are recent reviews. An outdated review is usually > 5 years old.

• It would be good to include a paragraph in the rationale about the best available test/reference test: what it is, its strengths and limitations, as well as the anticipated role of the index tests. Are the GDS scales being evaluated as replacement tests for the reference tests/existing tests?

Methods

• The reporting of this review would be greatly improved if the authors were guided by the PRISMA extension for Diagnostic Test Accuracy reviews and not the original PRISMA. Please re-write this review based on PRISMA-DTA. [McInnes MDF, Moher D, Thombs BD, McGrath TA, Bossuyt PM; and the PRISMA-DTA Group; Clifford T, Cohen JF, Deeks JJ, Gatsonis C, Hooft L, Hunt HA, Hyde CJ, Korevaar DA, Leeflang MMG, Macaskill P, Reitsma JB, Rodin R, Rutjes AWS, Salameh JP, Stevens A, Takwoingi Y, Tonelli M, Weeks L, Whiting P, Willis BH. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement. JAMA. 2018;319(4):388-396.]

Search strategy

• Please clarify why only the first 100 results yielded by Google Scholar were searched. Google Scholar generates lots of hits. How did the authors ensure that the first 100 were the most relevant to screen?

Data selection and extraction.

• The authors state that duplicates were removed using the EndNote reference tool and data extraction was done using an Excel sheet. Please clarify which platform/software specifically was used to screen titles, abstracts and full texts of the search yield.

Risk of bias and certainty of evidence

• Please add more detail about how GRADE was used to assess certainty of evidence. How was downgrading done?

Statistical analyses

• This section needs to be revised for clarity. Please refer to Chapter 10 (Analysis section) in the Cochrane handbook for DTA reviews. https://methods.cochrane.org/sdt/handbook-dta-reviews

• Please provide a reference to qualify the type of meta-analyses used (bivariate model). Also clarify if it was the bivariate random-effects method (which is commonly used) or bivariate mixed-effects models. By mixed effects do you mean random and fixed effects combined?

• Please state clearly at the beginning of this section that meta-analyses of GDS-4 and GDS-5 were done separately.

• Please provide a rationale for why the Youden index was calculated. This is a global measure of accuracy and, to my knowledge, rarely used nowadays because of its limitations.

• Please revisit how heterogeneity is measured in DTA reviews. I² is used to assess heterogeneity of intervention reviews and is not recommended for DTA reviews.

Results

• The results section about risk of bias is thin. QUADAS has four domains against which risk of bias results are reported. Please specify which domains were deemed to have risk of bias.

• Table 3. The reporting of GRADE results is incorrect. GRADE assessment is given for an overall summary of evidence and not for individual studies as presented. QUADAS is for individual studies, but GRADE summarises the overall certainty of evidence across the domains of quality/risk of bias, inconsistency, imprecision, indirectness and publication bias. For example, one would expect an overall certainty of evidence for pooled results at each cut-off but not for individual studies. Please refer to the GRADEpro software to help with the GRADE assessment as well as generation of an accurate summary of findings table (https://gradepro.org/).

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Roshana Shrestha

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jul 1;16(7):e0253899. doi: 10.1371/journal.pone.0253899.r002

Author response to Decision Letter 0


23 Apr 2021

Dear editor and reviewers,

Thank you for your kind consideration. In this letter, we proceed to respond to each of the comments made by the reviewers and the journal requirements.

Sincerely,

Alvaro Taype-Rondan, corresponding author

Journal Requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Answer. Thank you for your observation. We have checked the requirements and our manuscript meets them.

2. Please include a copy of Table 4 which you refer to in your text on page 13.

Answer. Thank you for your observation. In our manuscript, there are 3 tables in total. "Table 4" should not appear. On page 13, we have replaced “Table 4” with “Table 3”.

3. Please include captions for *all* Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Answer. Thank you for your observation. We added the captions for Supporting Information files at the end of the manuscript. Also, we have updated the in-text citations.

Reviewers' comments:

R1.1: Methods: For the outcome measures, since a 2-by-2 contingency table is constructed, it would be better if the authors could include other measures of diagnostic accuracy, such as pooled estimates of the positive and negative likelihood ratios, the summary diagnostic odds ratio, and the area under the curve (AUC).

Answer. Thank you for your recommendation. For decision making in clinical practice, sensitivity and specificity are used to identify false positives, false negatives, true positives, and true negatives, and thus to consider the desirable and undesirable consequences of diagnostic tests for patients (1). Although we could include the other diagnostic measures, we believe that it would enlarge the table, and make it less understandable. Thus, we chose to keep sensitivity and specificity as the most useful and sufficient indicators for decision-making.

1. Buehler AM, Ascef BO, Oliveira Júnior HA, Ferri CP, Fernandes JG. Rational use of diagnostic tests for clinical decision making. Rev Assoc Med Bras. 2019;65(3):452-459. doi: 10.1590/1806-9282.65.3.452.
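
For readers who want the additional measures the reviewer mentions, they follow directly from a 2x2 table. The sketch below is illustrative only and is not part of the authors' analysis: the `TwoByTwo` class, the `accuracy_measures` function, and the example counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one study (hypothetical container, not from the manuscript)."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative


def _ratio(num: float, den: float) -> float:
    # Return infinity instead of raising when the denominator is zero.
    return num / den if den else math.inf


def accuracy_measures(t: TwoByTwo) -> dict:
    """Compute per-study accuracy measures from a 2x2 table."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    lr_pos = _ratio(sens, 1 - spec)   # positive likelihood ratio
    lr_neg = _ratio(1 - sens, spec)   # negative likelihood ratio
    dor = _ratio(lr_pos, lr_neg)      # diagnostic odds ratio
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Hypothetical counts chosen only to give sensitivity 0.77 and specificity 0.75.
example = accuracy_measures(TwoByTwo(tp=77, fp=25, fn=23, tn=75))
print(example)
```

These are single-study quantities; pooled likelihood ratios or a summary diagnostic odds ratio would have to come from the meta-analytic model rather than from this arithmetic.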

R1.2: Furthermore, I wonder if the Cronbach’s alpha of the items listed in Table 2 could be summarized as a recommendation for future studies.

Answer. Thank you for your observation. The included studies provided data to construct the 2x2 table for each short GDS version but did not provide the variance data of each short GDS version to calculate Cronbach's alpha (1). The sensitivity and specificity data did not allow us to calculate the correlation between items, hence we were unable to compute Cronbach's alpha.

1. Cortina J. What is coefficient alpha: an examination of theory and applications. J Appl Psychol. 1993;78:98-104. doi: 10.1037/0021-9010.78.1.98
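
To illustrate why 2x2 data are not enough, the following minimal sketch computes Cronbach's alpha with the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It needs item-level responses for every participant, which the included studies did not report. The function name `cronbach_alpha` and the example responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha from item-level data.

    responses: one row per participant, one column per item
    (for a yes/no scale, 1 = depressive answer, 0 = otherwise).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across participants.
    item_var_sum = sum(variance(col) for col in zip(*responses))
    # Sample variance of the total score per participant.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical four-item responses from six participants.
data = [
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
]
print(round(cronbach_alpha(data), 3))
```

Sensitivity and specificity summarise only the total score against the reference standard, so none of the inputs above can be recovered from them.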

R1.3: Line 60: it is combining? Or just selecting some items from the original 30-item version?

Answer. Thank you for your observation. We have corrected the wording of that sentence. We have changed “Therefore, shorter GDS versions, combining some of the GDS-30 items” to “Therefore, shorter GDS versions, selecting some of the GDS-30 items”.

R1.4: Line 62,63: word different and altogether may not be needed.

Answer. Thank you for your observation. We have changed “such as different GDS versions with four items (called altogether GDS-4), and different GDS versions with five items (called altogether GDS-5)” to “such as GDS versions with four items (called GDS-4), and GDS versions with five items (called GDS-5)”.

R1.5: Line 147: Regarding the population, …. The sentence needs reconstruction. Only one study excluded the sample with dementia.

Answer. Thank you for your observation. We have rewritten that sentence now in Line 166: “Regarding the population, one study was performed only in people without dementia [44], and the rest of the studies were performed in both groups of patients”.

R1.6: Line 74-81: Authors have elaborated the various eligibility criteria for inclusion. Authors can further clarify:

1. Language of the tool. If translated, whether the translation was done appropriately

2. I understood that studies were included irrespective of study design, setting, and the way the interview was conducted. It may be better to mention this.

Answer. Thank you for your observation. The GDS-4 and GDS-5 versions of the included studies were translated into different languages and applied. Although it is assumed that these instruments were validated prior to their use for depression screening, there were studies that explicitly stated that they used the validated instrument, while others did not state their validation information. We included studies that evaluated the GDS-4 or GDS-5 versions independent of whether they detailed validation in the language of the study participants.

Regarding the study design, we have included observational studies, as those are adequately designed for the assessment of diagnostic tests. Also, we placed no restrictions on setting or mode of interview. We added the following lines in the “Eligibility criteria” section (Lines 87-89): “No restrictions on language, publication date, validation of language translation of the short GDS versions, or the mode of test assessment were applied”.

In S3 Table, we added the following data: “Language of the test” and “Mode of test assessment”.

R2.1: This is an interesting paper, but the rationale needs to be strengthened and the methods revised considerably. The methodology presents some inaccuracies and confusion with methods used for meta-analyses of intervention reviews. I would recommend the following references to the authors:

• PRISMA for Diagnostic Test Accuracy Reviews (DTA): http://www.prisma-statement.org/Extensions/DTA

• Leeflang MM. Systematic reviews and meta-analyses of diagnostic test accuracy. Clin Microbiol Infect. 2014 Feb;20(2):105-13. doi: 10.1111/1469-0691.12474. PMID: 24274632 https://pubmed.ncbi.nlm.nih.gov/24274632/

• Leeflang MM, Deeks JJ, Gatsonis C, Bossuyt PM; Cochrane Diagnostic Test Accuracy Working Group. Systematic reviews of diagnostic test accuracy. Ann Intern Med. 2008 Dec 16;149(12):889-97. doi: 10.7326/0003-4819-149-12-200812160-00008. https://pubmed.ncbi.nlm.nih.gov/19075208/

• DTA meta-analyses methods: Chapter 10 (Analysis of results) of the Cochrane handbook for DTA reviews. https://methods.cochrane.org/sdt/handbook-dta-reviews

• GRADE: https://gradepro.org/ GRADE Pro/GDT is free software that enables authors to conduct GRADE assessments accurately and also generates clear summary of findings tables. (Table 3 needs to follow this format)

Answer. Thank you for your observation. We have read the suggested literature on systematic reviews and meta-analyses of diagnostic test accuracy to improve the methodology in our manuscript. We have added the following lines in the Abstract as required by PRISMA-DTA: "Study quality was assessed with the QUADAS-2 tool”, “We performed a bivariate random-effects meta-analysis to calculate the pooled sensitivity and specificity with their 95% confidence intervals (95% CI)", "The number of participants included was 5048", and "There was high risk of bias in the index test domain". In addition, with the corrections made in the other reviewers' comments, we comply with the rest of the content indicated in the PRISMA-DTA. In the "Material and Methods" section, we have changed "We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines" to "We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines [28]".

28. McInnes MDF, Moher D, Thombs BD, McGrath TA, Bossuyt PM, Clifford T, et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement. JAMA. 2018;319:388-396. doi: 10.1001/jama.2017.19163.

Regarding Table 3, the Summary of Findings table format for GRADE Pro diagnostic questions does not seem to us to be the most convenient because the data on the prevalence of depression in the elderly can be very variable among different subsets of elderly people (1,2). Therefore, we opted to continue with the format of Table 3 where we place the certainty of evidence evaluated according to the GRADE methodology.

1. Djernes JK. Prevalence and predictors of depression in populations of elderly: a review. Acta Psychiatr Scand. 2006;113(5):372-87. doi: 10.1111/j.1600-0447.2006.00770.x. PMID: 16603029.

2. Blazer DG. Depression in late life: review and commentary. J Gerontol A Biol Sci Med Sci. 2003;58(3):249-65. doi: 10.1093/gerona/58.3.m249. PMID: 12634292.

R2.2: ABSTRACT

Methods section

• This statement in the methods section is unclear. “We conducted sensitivity and specificity meta-analyses of those studies using the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases-10 (ICD-10)”. Do the authors mean using “Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases-10 (ICD-10)” as the reference standard? In its current form, the statement implies that these were the statistical methods used to pool the estimates of sensitivity and specificity.

Answer. Thank you for your observation. We meant that DSM and ICD-10 were used as reference standard. In “Methods” section of the abstract, we have changed “We conducted sensitivity and specificity meta-analyses of those studies using the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases-10 (ICD-10)” to “We conducted sensitivity and specificity meta-analyses of those studies that used the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases-10 (ICD-10) as reference standard”.

R2.3: “Mostly, significant subgroup differences across versions were found”. It would be helpful to state which subgroups were measured. Do the authors mean subgroup differences at different test thresholds?

Answer. Thank you for your observation. We had specified which subgroups were analyzed, each subgroup is defined by a different GDS-4/-5 version and a threshold. We have added that explanation in the “Results” section of the abstract: “Mostly, significant subgroup differences at different test thresholds across versions were found”.

R2.4: Conclusion

• This statement in the conclusion is not qualified in the results section. “Conclusions: The accuracy of the different GDS-4 and GDS-5 versions showed a high heterogeneity”. This statement is best placed in the results section.

Answer. Thank you for your observation. We added the following line in “Results” section of the abstract: “The accuracy of the different GDS-4 and GDS-5 versions showed a high heterogeneity”. In “Conclusions” section of the abstract, we added the following lines: “We found several GDS-4 and GDS-5 versions that showed great heterogeneity, mostly with a low or very low certainty of the evidence”.

R2.5: MAIN TEXT

Introduction

• The rationale, “accuracy remains unclear”, is vague. Is the lack of clarity due to variation in accuracy estimates of existing primary studies? Please qualify that rationale better. I also disagree that the existing published systematic reviews are all outdated. Some were published in 2017 (ref 27) and 2019 (ref 26). These are recent reviews. An outdated review is usually > 5 years old.

Answer. Thank you for your observation. We have developed our rationale as follows: previous systematic reviews pooled the results of different versions of each scale (GDS-4/-5), even though these versions are different instruments because they use different questions from the GDS-30, and so evaluated together the diagnostic values obtained by different scales. This analysis was incorrect, as there are different versions of GDS-4 and GDS-5 and they should not be evaluated as one. Thus, although there are recent systematic reviews with meta-analyses [26,27], the actual accuracy of the GDS-4/GDS-5 scales remains unclear.

In “Introduction” section, we have changed “Although some previous systematic reviews have assessed this subject, these are outdated and tend to pool…” to “Although some previous systematic reviews have assessed this subject, these tend to pool…”

26. Krishnamoorthy Y, Rajaa S, Rehman T. Diagnostic accuracy of various forms of geriatric depression scale for screening of depression among older adults: Systematic review and meta-analysis. Archives of Gerontology and Geriatrics. 2020;87:104002. doi: https://doi.org/10.1016/j.archger.2019.104002.

27. Tsoi KK, Chan JY, Hirai HW, Wong SY. Comparison of diagnostic performance of Two-Question Screen and 15 depression screening instruments for older adults: systematic review and meta-analysis. The British journal of psychiatry : the journal of mental science. 2017;210(4):255-60. Epub 2017/02/18. doi: 10.1192/bjp.bp.116.186932. PubMed PMID: 28209592.

R2.6: It would be good to include a paragraph in the rationale about the best available test/reference test- what it is, its strengths and limitations as well as the anticipated role of the index tests. Are the GDS scales being evaluated as replacement tests for the reference tests/existing tests?

Answer. Thank you for your observation. We added the following lines in “Introduction” section: “There are several scales for screening for depression among older adults, such as the Geriatric Depression Scale (GDS) [11], the Center for Epidemiologic Studies Depression Scale (CES-D), and others. However, the GDS is one of the most used to identify depression among older adults [11]. Among the strengths of the GDS, its use may be easier in people with cognitive impairment because of the simple yes-no format, and it can be used in hospital and community settings”.

[11] O'Connor E, Rossom RC, Henninger M, Groom HC, Burda BU, Henderson JT, et al. U.S. Preventive Services Task Force Evidence Syntheses, formerly Systematic Evidence Reviews. Screening for Depression in Adults: An Updated Systematic Evidence Review for the US Preventive Services Task Force. Rockville (MD): Agency for Healthcare Research and Quality (US); 2016.

R2.7: Methods

• The reporting of this review would be greatly improved if the authors were guided by the PRISMA extension for Diagnostic Test Accuracy reviews and not the original PRISMA. Please re-write this review based on PRISMA DTA.[ McInnes MDF, Moher D, Thombs BD, McGrath TA, Bossuyt PM; and the PRISMA-DTA Group; Clifford T, Cohen JF, Deeks JJ, Gatsonis C, Hooft L, Hunt HA, Hyde CJ, Korevaar DA, Leeflang MMG, Macaskill P, Reitsma JB, Rodin R, Rutjes AWS, Salameh JP, Stevens A, Takwoingi Y, Tonelli M, Weeks L, Whiting P, Willis BH. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement. JAMA. 2018;319(4):388-396.]

Answer. Thank you for your observation. We have added the following lines in the Abstract as required by PRISMA-DTA: "Study quality was assessed with the QUADAS-2 tool”, “We performed a bivariate random-effects meta-analysis to calculate the pooled sensitivity and specificity with their 95% confidence intervals (95% CI)", "The number of participants included was 5048", and "There was high risk of bias in the index test domain". In addition, with the corrections made in the other reviewers' comments, we comply with the rest of the content indicated in the PRISMA-DTA. In the "Material and Methods" section, we have changed "We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines" to "We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines [28]".

28. McInnes MDF, Moher D, Thombs BD, McGrath TA, Bossuyt PM, Clifford T, et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement. JAMA. 2018;319:388-396. doi: 10.1001/jama.2017.19163.

R2.8: Search strategy

• Please clarify why only the first 100 results yielded in Google Scholar were searched. Google Scholar generates lots of hits. How did the authors ensure that the first 100 were the most relevant to screen?

Answer. Thank you for your observation. As Google Scholar is a large and unspecific source of grey literature that orders results by relevance, systematic reviews usually examine only the first results (usually the first 100 records) in Google Scholar (1-3). We searched Google Scholar to identify grey literature through the first 100 records. We used a private (incognito) browser window and cleared the cache (data previously stored on the computer) so that previous data would have no influence on the order of appearance of the search results. These results were ordered according to relevance and were not restricted by publication date. The first 100 records were evaluated. In the “Search strategy” section, we added the following lines (Lines 93-100): “Google Scholar was searched to identify grey literature through the first 100 records, as systematic reviews usually examine the first 100 records in Google Scholar because it is a large and unspecific source of grey literature, which sorts results by relevance and coincidence”.

1. Abdullahi A, Candan SA, Abba MA, Bello AH, Alshehri MA, Afamefuna Victor E, et al. Neurological and Musculoskeletal Features of COVID-19: A Systematic Review and Meta-Analysis. Front Neurol. 2020;11:687. doi: 10.3389/fneur.2020.00687

2. Joyeux L, De Bie F, Danzer E, Russo FM, Javaux A, Peralta CFA, et al. Learning curves of open and endoscopic fetal spina bifida closure: systematic review and meta-analysis. Ultrasound Obstet Gynecol. 2020;55(6):730–9. doi: 10.1002/uog.20389

3. Arora M, Chugh A, Jain N, Mishu M, Boeckmann M, Dahanayake S, et al. Global impact of tobacco control policies on smokeless tobacco use: a systematic review protocol. BMJ Open. 2020 Dec 24;10(12):e042860. doi: 10.1136/bmjopen-2020-042860

R2.9: Data selection and extraction.

• The authors state that duplicates were removed using endnote reference tool and data extraction was done using an excel sheet. Please clarify which platform/software specifically was used to screen titles, abstracts and full texts of the search yield?

Answer. Thank you for your observation. Screening of titles, abstracts, and full texts was performed manually in Endnote. After eliminating duplicates, two Endnote libraries were created with the records that were found so that two people could independently perform the screening, and compare their selection of studies. We added the following lines in “Data selection and extraction” section (lines 102-104): “Two independent authors (ANF and DRSM) independently screened all results for inclusion, first reviewing the titles and abstracts, and later performing a full-text assessment, through EndNote software”.

R2.10: Risk of bias and certainty of evidence

• Please add more detail about how GRADE was used to assess certainty of evidence? How was downgrading done?

Answer. Thank you for your observation. We added the following lines in “Risk of bias and certainty of evidence” section (Lines 125-131): “Risk of bias, indirect evidence, inconsistency, imprecision, and publication bias were assessed. We downgraded the certainty of evidence when fewer than 70% of studies had at least 7 of 10 items at low risk according to QUADAS-2, when fewer than 70% of studies had the components (population, index test, or reference standard) similar to the initial diagnostic question, when heterogeneity was moderate or high, when the confidence interval range was greater than or equal to 10%, and when fewer than 4 studies evaluated the outcome of interest”.
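
The downgrading rules quoted above can be read as a simple checklist. The sketch below is one possible encoding, under stated assumptions: the five thresholds are matched to the five GRADE domains in the order the sentence lists them, each failed criterion lowers certainty by one level starting from "high", and the result is floored at "very low". The class `EvidenceBody`, its field names, and the example values are hypothetical and are not taken from the manuscript.

```python
from dataclasses import dataclass

LEVELS = ["high", "moderate", "low", "very low"]


@dataclass
class EvidenceBody:
    """Summary of one GDS version at one cut-off (hypothetical structure)."""
    n_studies: int
    share_low_risk: float   # share of studies with >= 7 of 10 QUADAS-2 items at low risk
    share_direct: float     # share of studies matching the diagnostic question
    heterogeneity: str      # "low", "moderate" or "high" (visual judgement)
    ci_lower: float         # lower bound of the 95% CI, as a proportion
    ci_upper: float         # upper bound of the 95% CI, as a proportion


def downgrade_reasons(e: EvidenceBody) -> list:
    """Return the domains that fail the thresholds quoted in the response."""
    reasons = []
    if e.share_low_risk < 0.70:
        reasons.append("risk of bias")
    if e.share_direct < 0.70:
        reasons.append("indirectness")
    if e.heterogeneity in ("moderate", "high"):
        reasons.append("inconsistency")
    # Rounding avoids floating-point noise right at the 10% boundary.
    if round(e.ci_upper - e.ci_lower, 6) >= 0.10:
        reasons.append("imprecision")
    if e.n_studies < 4:
        reasons.append("publication bias")
    return reasons


def certainty(e: EvidenceBody) -> str:
    # Assumption: one level down per failed domain, floored at "very low".
    steps = min(len(downgrade_reasons(e)), len(LEVELS) - 1)
    return LEVELS[steps]


example = EvidenceBody(n_studies=3, share_low_risk=0.5, share_direct=0.9,
                       heterogeneity="high", ci_lower=0.70, ci_upper=0.82)
print(downgrade_reasons(example), certainty(example))
```

GRADE judgements are normally made by reviewers rather than by fixed rules, so this checklist only mirrors the thresholds stated in the response.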

R2.11: Statistical analyses

• This section needs to be revised for clarity. Please refer to Chapter 10 (Analysis section) in the Cochrane handbook for DTA reviews. https://methods.cochrane.org/sdt/handbook-dta-reviews

Answer. Thank you for your observation. The process of data analysis was performed according to the Cochrane handbook (allowing analysis replication), as follows:

First, we installed the midas and metaprop packages in Stata (“ssc install midas” and “ssc install metaprop”), which perform diagnostic and proportion meta-analysis, respectively. Then, we ran the command “midas tp fp fn tn id(author year) ms(0.75) ford fors bfor(dss)” to obtain the forest plots for sensitivity and specificity. When fewer than four studies were available for a meta-analysis, we performed meta-analyses of proportions instead, running the command “metaprop num denom, random”.

In “Statistical analyses” section, we added the following lines (Lines 137-147): “When there were at least four studies to include in the meta-analysis, we used bivariate mixed-effects models via random effects that consider the correlation between sensitivity and specificity by each study to provide estimates of effects [40]. When less than four studies were included for a meta-analysis, the mixed-effects model assessment was not appropriate, so we performed meta-analyses of proportions using the exact binomial distribution. We calculated the pooled sensitivity and specificity with their 95% confidence intervals” and “Heterogeneity was assessed through visual assessment of forest plots”
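
As an illustration of what an interval based on the exact binomial distribution looks like for a single proportion such as a sensitivity (true positives out of all reference-positive participants), the sketch below computes a Clopper-Pearson interval with the Python standard library. It is an assumption that this corresponds to the intervals in the authors' Stata output, and it does not reproduce the random-effects pooling step. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float) -> float:
    """Bisection on [0, 1] for a function g that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson interval for x events out of n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    # Lower bound: p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# Hypothetical study: 40 of 50 reference-positive participants screen positive.
print(exact_ci(40, 50))
```

The exact interval stays inside [0, 1] even when a study reports a sensitivity or specificity of 0 or 1, which is common in small studies.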

R2.12: Please provide a reference to qualify the type of meta-analyses used (bivariate model). Also clarify if it was the bivariate random effects method (which is commonly used) or bivariate mixed-effects models. By mixed effects do you mean random and fixed effects combined?

Answer. Thank you for your observation. We used a “bivariate mixed-effects regression framework focused on making inferences about average sensitivity and specificity” through midas command in Stata for meta-analysis for diagnostic test performance (1). This mixed model is part of the bivariate random effects method (2). We added the following line in “Statistical analyses” section (Lines 137-139): “When there were at least four studies to include in the meta-analysis, we used bivariate mixed-effects models via random effects that consider the correlation between sensitivity and specificity by each study to provide estimates of effects [43]”.

1. Dwamena BA, Sylvester R, Carlos RC. MIDAS: Stata module for meta-analytical integration of diagnostic test accuracy studies. Statistical Software Components [Internet]. 2009 [cited 2021 Apr 16] Available from: http://fmwww.bc.edu/repec/bocode/m/midas.pdf

2. Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. 2005;58(10):982-90. doi: 10.1016/j.jclinepi.2005.02.022. PMID: 16168343.

R2.13: Please state clearly at the beginning of this section that meta-analyses of GDS-4 and GDS-5 were done separately

Answer. Thank you for your observation. We added the following line in “Statistical analyses” section: “We performed the meta-analyses of GDS-4 and GDS-5 separately”.

R2.14: Please provide a rationale for why the Y index was calculated. This is a global measure of accuracy and, to my knowledge, rarely used nowadays because of its limitations.

Answer. Thank you for your observation. As you rightly pointed out, the Youden index is now rarely used because it has several limitations. We found that the Youden index added nothing new to the manuscript and that the results could be understood without it. We therefore chose to delete the Youden index from the manuscript. We have removed all content about the Youden index from the "Statistical analysis", "Results", and "Discussion" sections and from Table 3.

R2.15: Please revisit how heterogeneity is measured in DTA reviews. I2 is used to assess heterogeneity of intervention reviews and not recommended for DTA reviews.

Answer. Thank you for your observation. We agree with you. The Cochrane Handbook for DTA reviews (1) indicates that it is not appropriate to use I2 for sensitivity and specificity, as it would overestimate the degree of heterogeneity observed. It also indicates that heterogeneity is usually assessed visually, through forest plots and in ROC space. In forest plots, heterogeneity can be judged from the estimates and their confidence intervals: when the results of the studies differ, there is little overlap in the confidence intervals. In ROC space, the Handbook indicates that it is rarely possible to obtain confidence intervals, so it is impossible to assess whether the differences between studies are within the limits expected by chance or reflect real differences between studies. Therefore, we opted to evaluate heterogeneity through visual assessment of forest plots. We have removed the heterogeneity assessment with I2 from the manuscript and changed all content about I2 in the "Statistical analysis", "Results", and "Table 3" sections.

1. DTA meta-analyses methods: Chapter 11 (Interpreting results and drawing conclusions) of the Cochrane handbook for DTA reviews. https://methods.cochrane.org/sdt/handbook-dta-reviews
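
The idea that differing results show up as little overlap between confidence intervals can be made concrete with a small helper. The sketch below is a rough numerical companion to visual inspection of a forest plot, not a replacement for it and not a method used in the manuscript. The function `overlap_fraction` and the example intervals are hypothetical.

```python
from itertools import combinations


def overlap_fraction(intervals):
    """Share of study pairs whose confidence intervals overlap.

    intervals: list of (lower, upper) tuples, one per study.
    """
    pairs = list(combinations(intervals, 2))
    if not pairs:
        raise ValueError("need at least two studies")
    overlapping = sum(
        1
        for (a_lo, a_hi), (b_lo, b_hi) in pairs
        if a_lo <= b_hi and b_lo <= a_hi  # two intervals share at least one point
    )
    return overlapping / len(pairs)


# Hypothetical 95% CIs for sensitivity from three studies.
cis = [(0.60, 0.80), (0.70, 0.90), (0.10, 0.20)]
print(overlap_fraction(cis))
```

A value near 1 means most intervals overlap, and a value near 0 means the studies largely disagree. No cut-off for "moderate" or "high" heterogeneity is implied here.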

R2.16: Results

• The results section about risk of bias is thin. QUADAS has four domains against which risk of bias results are reported. Please specify which domains were deemed to have risk of bias.

Answer. Thank you for your observation. We added the following lines in “Results” section (Lines 199-201): “There was high risk of bias in the index test domain. Specifically, the question about the lack of pre-specification of the cut-off points was the most common flaw”.

R2.17: Table 3. The reporting of GRADE results is incorrect. GRADE assessment is given for an overall summary of evidence and not for individual studies as presented. QUADAS is for individual studies, but GRADE summarises the overall certainty of evidence across the domains quality/risk of bias, inconsistency, imprecision, indirectness and publication bias. For example, one would expect an overall certainty of evidence for pooled results at each cutoff but not for individual studies. Please refer to the GRADE pro software to help with the GRADE assessment as well as generation of an accurate summary of findings table (https://gradepro.org/).

Answer. Thank you for your observation. In Table 3, we evaluated the certainty of evidence through the GRADE methodology for each version of GDS-4 and GDS-5 for each cut-off point. As seen in the table, there are versions of GDS-4 or GDS-5 for each cut-point that have been evaluated by only one study. It is our understanding that the GRADE methodology can be performed on individual studies when only one study answers the question to be evaluated. This is indicated by GRADE in its article "GRADE guidelines: 12. Preparing Summary of Findings tables - binary outcomes" by giving examples of outcomes evaluated by only one randomized clinical trial in Table 2 and Table 3 (1).

1. Guyatt GH, Oxman AD, Santesso N, Helfand M, Vist G, Kunz R, Brozek J, Norris S, Meerpohl J, Djulbegovic B, Alonso-Coello P, Post PN, Busse JW, Glasziou P, Christensen R, Schünemann HJ. GRADE guidelines: 12. Preparing summary of findings tables-binary outcomes. J Clin Epidemiol. 2013;66(2):158-72. doi: 10.1016/j.jclinepi.2012.01.012.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Eleanor Ochodo

26 May 2021

PONE-D-21-03547R1

Accuracy of the Geriatric Depression Scale (GDS)-4 and GDS-5 for the screening of depression among older adults: a systematic review and meta-analysis

PLOS ONE

Dear Alvaro Taype-Rondan,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please respond to review comments about clarifying what constitutes the overall meta-analyses versus investigations of heterogeneity in the abstract and main text (any versions vs similar versions at different common cutoff points?). In addition, do ensure that the conclusions of the review are in line with the stated objectives in both the abstract and main text.

Please submit your revised manuscript by 26 June, 2021. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Eleanor Ochodo, M.D., PhD

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: The authors have responded satisfactorily to the previous comments except those about thresholds.

It is still unclear what the overall meta-analyses entail versus the investigations of heterogeneity: overall meta-analysis (any GDS-4 version or GDS-5 version separately at different common cutoffs?) vs heterogeneity (same GDS version at different common cutoffs). Since different cut-offs and versions have been used, it is important to be clear from the outset what constitutes the overall meta-analysis vs heterogeneity.

ABSTRACT:

Methods section:

" We conducted sensitivity and specificity meta-analyses of those studies that used.....". Please revise to we conducted meta-analyses of the sensitivity and specificity of those studies that used......

"We performed a bivariate random-effects meta-analysis to calculate the pooled sensitivity and specificity with their 95% confidence intervals (95% CI). " Please be very explicit what this overall meta-analyses included. For example one could rephrase as follows "we performed a bivariate random-effects meta-analysis to estimate the pooled sensitivity and specificity with their 95% confidence intervals (95% CI) at each reported common cutoff. For the overall meta-analyses, any GDS-4 version or GDS-5 version separately, by each cut-off and for investigations of heterogeneity, across similar GDS versions by each cutoff".

Results

Being very explicit about the cutoff and versions will help one understand the results better. For example, the first set of reported results is about all versions of GDS-4 and GDS-5 separately at cutoff>2. This implies the overall meta-analyses at cutoff 2?? and subsequent cutoffs??

Conclusion:

This conclusion does not accurately reflect the aim of the review which is to estimate accuracy. For example this could be reworded to " We found several GDS-4 and GDS-5 versions that showed great heterogeneity in estimates of sensitivity and specificity, mostly with a low or very low certainty of the evidence"...........

MAIN TEXT

Statistical analyses section page 7

The overall meta-analyses vs heterogeneity is unclear. For example please see comparisons below:

Lines 133-135

We conducted meta-analyses of the sensitivity and specificity of GDS-4 and GDS-5 versions whenever studies fulfilled the following condition: 1) There was more than one study that compared the same version of GDS-4 or GDS-5 and used the same cut-off point.

Lines 143-144

In addition, we meta-analyzed all the included studies that assessed any GDS-4 version, and all the studies that assessed any GDS-5 version, by each cut-off.

The first statements about meta-analyses (lines 133-135) seem similar to investigations of heterogeneity (lines 145-146).

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Roshana Shrestha

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jul 1;16(7):e0253899. doi: 10.1371/journal.pone.0253899.r004

Author response to Decision Letter 1


11 Jun 2021

Journal Requirements:

1. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Answer. Thank you for your observation. We have reviewed the list of references and made changes to the reference style to make it conform to the journal's requirements. We found that Cheng 2004 (1) and Tsoi 2017 (2) have been corrected. We added their “Erratum” information in the references (Lines 383-384 and Line 420).

The list of references is complete, and no retracted studies have been cited.

Following the corrected information from Cheng 2004 (1), we have changed the entry in the “Reference standard” column of Table 1 from “Major depressive disorders, dysthymia, adjustment disorder with depressed mood, dementia with depression, bipolar disorder, depressive episode assessed by DSM-III” to “Major depressive disorder, dysthymia, depressive disorder not otherwise specified, adjustment disorder with depressed mood, dementia with depression assessed by DSM-III”. This change was also made in the Supplementary File S3_Table. No other changes to the manuscript were required.

1. Cheng ST, Chan AC. A brief version of the geriatric depression scale for the Chinese. Psychol Assess. 2004 Jun;16(2):182-6. doi: 10.1037/1040-3590.16.2.182. Erratum in: Psychol Assess. 2006 Mar;18(1):48. PMID: 15222814.

2. Tsoi KK, Chan JY, Hirai HW, Wong SY. Comparison of diagnostic performance of Two-Question Screen and 15 depression screening instruments for older adults: systematic review and meta-analysis. Br J Psychiatry. 2017;210(4):255-60. doi: 10.1192/bjp.bp.116.186932. Erratum in: Br J Psychiatry. 2017;211(2):120. PMID: 28209592.

Reviewers' comments:

R2.1: ABSTRACT

Methods section:

" We conducted sensitivity and specificity meta-analyses of those studies that used.....". Please revise to we conducted meta-analyses of the sensitivity and specificity of those studies that used......

Answer. Thank you for your recommendation. In Lines 31-32, we have changed “We conducted sensitivity and specificity meta-analyses of those studies that used…” to “We conducted meta-analyses of the sensitivity and specificity of those studies that used…”.

R2.2: "We performed a bivariate random-effects meta-analysis to calculate the pooled sensitivity and specificity with their 95% confidence intervals (95% CI). " Please be very explicit what this overall meta-analyses included. For example one could rephrase as follows "we performed a bivariate random-effects meta-analysis to estimate the pooled sensitivity and specificity with their 95% confidence intervals (95% CI) at each reported common cutoff. For the overall meta-analyses, any GDS-4 version or GDS-5 version separately, by each cut-off and for investigations of heterogeneity, across similar GDS versions by each cutoff".

Answer. Thank you for your observation. In Lines 34-39, we have changed “We performed a bivariate random-effects meta-analysis to calculate the pooled sensitivity and specificity with their 95% confidence intervals (95% CI)” to “We performed bivariate random-effects meta-analyses to calculate the pooled sensitivity and specificity with their 95% confidence intervals (95% CI) at each reported common cut-off. For the overall meta-analyses, we evaluated each GDS-4 version or GDS-5 version separately by each cut-off, and for investigations of heterogeneity, we assessed altogether across similar GDS versions by each cut-off”.

R2.3: Results

Being very explicit about the cutoff and versions will help one understand the results better. For example, the first set of reported results is about all versions of GDS-4 and GDS-5 separately at cutoff>2. This implies the overall meta-analyses at cutoff 2?? and subsequent cutoffs??

Answer. Thank you for your observation. The sentence implies the overall meta-analyses at cut-off 2. It does not imply subsequent cut-offs. We have changed “at a cut-off ≥ 2” to “at a cut-off 2” for better understanding.

In addition, throughout the manuscript, the “≥” sign was eliminated when referring to cut-off points. For example, we have changed “cut-off ≥ 1” to “cut-off 1”.

R2.4: Conclusion:

This conclusion does not accurately reflect the aim of the review which is to estimate accuracy. For example this could be reworded to " We found several GDS-4 and GDS-5 versions that showed great heterogeneity in estimates of sensitivity and specificity, mostly with a low or very low certainty of the evidence"...........

Answer. Thank you for your observation. In Line 51, we have changed “We found several GDS-4 and GDS-5 versions that showed great heterogeneity, mostly with a low or very low certainty of the evidence” to “We found several GDS-4 and GDS-5 versions that showed great heterogeneity in estimates of sensitivity and specificity, mostly with a low or very low certainty of the evidence”.

R2.5: MAIN TEXT

Statistical analyses section page 7

The overall meta-analyses vs heterogeneity is unclear. For example please see comparisons below:

Lines 133-135

We conducted meta-analyses of the sensitivity and specificity of GDS-4 and GDS-5 versions whenever studies fulfilled the following condition: 1) There was more than one study that compared the same version of GDS-4 or GDS-5 and used the same cut-off point.

Lines 143-144

In addition, we meta-analyzed all the included studies that assessed any GDS-4 version, and all the studies that assessed any GDS-5 version, by each cut-off.

Answer. Thank you for your observation. For the overall meta-analyses, we evaluated each GDS-4 version or GDS-5 version separately at each cut-off. For the investigations of heterogeneity, we pooled the results across any GDS-4 version, and likewise across any GDS-5 version, at each cut-off.

We have changed Lines 133-135 to “We conducted meta-analyses of the sensitivity and specificity of each of the GDS-4 and GDS-5 versions whenever studies fulfilled the following condition: 1) There was more than one study that compared the same version of GDS-4 or GDS-5 at the same cut-off point”.

We have changed Lines 143-144 to “In addition, we meta-analyzed altogether the results of the included studies that assessed the same cut-off point of any GDS-4 version. Likewise, we meta-analyzed altogether the results of the included studies that assessed the same cut-off point of any GDS-5 version”.
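The two groupings described in this answer can be illustrated with a minimal Python sketch. It only shows how studies would be sorted into pooling groups; it does not fit the bivariate random-effects model. The record type, function names, study labels, and version labels (V1, V2, V3) are hypothetical placeholders, not data from the review, and applying the "more than one study" condition to the across-version grouping is an assumption made here for symmetry.

```python
from collections import defaultdict
from typing import NamedTuple


class StudyResult(NamedTuple):
    study: str    # hypothetical study label
    scale: str    # "GDS-4" or "GDS-5"
    version: str  # placeholder version label, e.g. "V1"
    cutoff: int   # cut-off point at which accuracy was reported


def group_studies(results, key, min_studies=2):
    """Group study labels by key; keep groups with at least min_studies."""
    groups = defaultdict(list)
    for r in results:
        groups[key(r)].append(r.study)
    return {k: v for k, v in groups.items() if len(v) >= min_studies}


def per_version_groups(results):
    # Same scale, same version, same cut-off point.
    return group_studies(results, lambda r: (r.scale, r.version, r.cutoff))


def across_version_groups(results):
    # Same scale and same cut-off point, any version.
    # Reusing the minimum of two studies here is an assumption.
    return group_studies(results, lambda r: (r.scale, r.cutoff))


# Invented example records, for illustration only.
EXAMPLE = [
    StudyResult("Study A", "GDS-4", "V1", 1),
    StudyResult("Study B", "GDS-4", "V1", 1),
    StudyResult("Study C", "GDS-4", "V2", 1),
    StudyResult("Study D", "GDS-5", "V3", 2),
    StudyResult("Study E", "GDS-5", "V3", 2),
    StudyResult("Study F", "GDS-5", "V3", 3),
]

if __name__ == "__main__":
    print(per_version_groups(EXAMPLE))
    print(across_version_groups(EXAMPLE))
```

In this invented example, Study C joins a pooling group only when versions are combined, and Study F is left out of both groupings because no other study shares its cut-off.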

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

Eleanor Ochodo

16 Jun 2021

Accuracy of the Geriatric Depression Scale (GDS)-4 and GDS-5 for the screening of depression among older adults: a systematic review and meta-analysis

PONE-D-21-03547R2

Dear Alvaro Taype-Rondan,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Eleanor Ochodo

Academic Editor

PLOS ONE

Acceptance letter

Eleanor Ochodo

21 Jun 2021

PONE-D-21-03547R2

Accuracy of the Geriatric Depression Scale (GDS)-4 and GDS-5 for the screening of depression among older adults: a systematic review and meta-analysis

Dear Dr. Taype-Rondan:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Eleanor Ochodo

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Checklist. PRISMA-DTA checklist item.

    (DOC)

    S1 Fig. D’Ath version.

    (DOCX)

    S2 Fig. Van Marwijk version.

    (DOCX)

    S3 Fig. Cheng version.

    (DOCX)

    S4 Fig. Galaria version.

    (DOCX)

    S5 Fig. Hoyl version.

    (DOCX)

    S6 Fig. Cheng or Heisel version and De Dios or Ortega version.

    (DOCX)

    S1 Table. Search strategy.

    (DOCX)

    S2 Table. Excluded studies.

    (DOCX)

    S3 Table. Characteristics of the included studies.

    (DOCX)

    S1 File

    (XLSX)

    S1 Database

    (XLSX)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files.


    Articles from PLoS ONE are provided here courtesy of PLOS
