Summary
Background
The high prevalence of depression in a growing aging population represents a critical public health issue. It is unclear how social, health, cognitive, and functional variables rank as risk/protective factors for depression among older adults and whether there are conspicuous differences between men and women.
Methods
We used random forest analysis (RFA), a machine learning method, to compare 56 risk/protective factors for depression in a large representative sample of European older adults (N = 67,603; ages 45-105y; 56.1% women; 18 countries) from the Survey of Health, Ageing and Retirement in Europe (SHARE Wave 6). Depressive symptoms were assessed using the EURO-D questionnaire: Scores ≥ 4 indicated depression. Predictors included a broad array of sociodemographic, relational, health, lifestyle, and cognitive variables.
Findings
Self-rated social isolation and self-rated poor health were the strongest risk factors, accounting for 22.0% (in men) and 22.3% (in women) of variability in depression. Odds ratios (OR) per +1SD in social isolation were 1.99x, 95% CI [1.90,2.08] in men; 1.93x, 95% CI [1.85,2.02] in women. OR for self-rated poor health were 1.93x, 95% CI [1.81,2.05] in men; 1.98x, 95% CI [1.87,2.10] in women. Difficulties in mobility (in both sexes), difficulties in instrumental activities of daily living (in men), and higher self-rated family burden (in women) accounted for an additional but small percentage of variance in depression risk (2.2% in men, 1.5% in women).
Interpretation
Among 56 predictors, self-perceived social isolation and self-rated poor health were the most salient risk factors for depression in middle-aged and older men and women. Difficulties in instrumental activities of daily living (in men) and increased family burden (in women) appear to differentially influence depression risk across sexes.
Funding
This study was internally funded by Colorado State University through research start-up monies provided to Stephen Aichele, Ph.D.
Keywords: Social isolation, Aging, Depression, Europe, Sex differences
Research in context.
Evidence before this study
We searched the PubMed, American Psychological Association PsycInfo, and PsycArticles databases (from inception to October 30, 2021) for studies published in English that investigated risk factors for depression in older adults. Search terms were the approved Medical Subject Headings “risk factors”, “depression”, “machine learning”, and “older adults”. We identified six studies that met these criteria. All but one had sample sizes of fewer than 25,000 participants, none included in-depth questions about social networks, and none examined differences by sex.
Added value of this study
To our knowledge, this is the first large, multi-country study to use machine learning to compare a broad range of social, health, functional, and cognitive variables as concurrent risk/protective factors for depression in middle-aged and older adults, with analyses conducted independently for women and men. Results concerning the importance of key risk/protective factors may be more reliable than those from studies using more conventional statistical techniques (e.g., because the current machine learning analyses make no distributional assumptions and can capture nonlinear effects and complex interactions among predictors without these having to be specified a priori, which often leads to higher prediction accuracy). Additionally, the current results may generalize better than those from studies based on more constrained samples (narrower age range, single country of origin, data aggregated across sexes).
Implications of all the available evidence
Middle-aged and older adults who report being socially isolated and/or in poor general health have nearly twofold higher odds of depression (per one standard deviation increase in either measure). From a public health perspective, these results provide evidence of the importance of screening for depression risk within this age demographic during routine medical visits (i.e., when assessing general health status). Furthermore, this screening process may be improved by inclusion of measures of perceived social isolation.
Introduction
The World Health Organization estimates that over 300 million people currently live with a depressive disorder (commonly referred to as depression), and middle-aged and older adults appear to be disproportionately affected.1 Depression among older adults is associated with elevated risks for mortality,2 incident dementia (often comorbid),3 and reduced quality of life.4 Depression is the leading cause of disability worldwide, and its economic burden is estimated at US$326 billion/year.5 These adverse personal and socioeconomic impacts will undoubtedly expand with increasing health care costs and as the percentage of the world population composed of individuals over age 60 years nearly doubles, from 12% to 22%, by 2050.6 The development of affordable and effective intervention strategies is therefore critical, and this in turn requires a rigorous understanding of the underlying risk factors for depression in later adulthood.
Meta-analyses have shown that chronic diseases, functional impairment, reduced social support, and stressful life events are most frequently associated with depression onset in this age demographic.7,8 However, meta-analytic approaches are vulnerable to bias related to methodological differences of the summarized studies (e.g., differences in data quality, heterogeneous selection criteria, dissimilarities across measurement instruments). Moreover, interactions among risk/protective factors are notoriously difficult to test in a comprehensive way using conventional parametric statistical models because the number of possible interactions increases exponentially with each additional predictor. This means that results from such studies (and meta-analyses based thereon) may underestimate the importance of risk factors that influence depression risk indirectly (e.g., by amplifying or attenuating the effects of other risk/protective factors). An advantage of machine learning methods is that they can capture nonlinear and interaction effects in an exploratory way, i.e., without having to include all possible curvilinear and interaction effects in the model specification, which becomes infeasible with increasing numbers of predictors in parametric statistical models.
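To make this combinatorial growth concrete, the following minimal sketch (Python, standard library only) counts the interaction terms that a fully specified parametric model would need. The function names are illustrative and not part of the study; p = 56 is used only because it matches the number of predictors compared here.

```python
from math import comb


def n_interactions(p: int, order: int) -> int:
    """Number of distinct interaction terms involving exactly `order` of p predictors."""
    return comb(p, order)


def n_all_interactions(p: int) -> int:
    """All interaction terms of order 2..p: every subset of predictors
    except the empty set and the p single-predictor main effects."""
    return 2 ** p - p - 1


if __name__ == "__main__":
    for p in (5, 10, 56):
        print(
            f"p={p}: two-way={n_interactions(p, 2)}, "
            f"three-way={n_interactions(p, 3)}, "
            f"all orders={n_all_interactions(p)}"
        )
```

Even when restricted to two-way terms, 56 predictors imply 1,540 candidate interactions, which illustrates why exhaustive specification quickly becomes impractical.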
Recently, some research teams have adopted a split-sample methodology, wherein half of the observations (typically from a large, population-representative sample of older adults) are analyzed using machine learning methods to determine the importance of a broad range of different risk/protective factors (e.g., for cognitive decline or for all-cause mortality).9,10 Importance rankings based on these methods often implicitly account for nonlinear effects and complex interactions among predictors that would be difficult or impractical to model using conventional techniques. Parametric methods (e.g., logistic regression, survival analysis) are then applied to a more tractable subset of predictors (identified as important in the machine learning stage) within the second sub-sample to estimate predictive associations based on known statistical distributions. Thus, this strategy leverages both cutting-edge and established statistical techniques to efficiently pinpoint the strongest of many potential influences on health outcomes.
Such an approach has rarely been applied for evaluating depression risk. Aichele et al.11 previously used random forest analysis (RFA), a machine learning approach, to compare 36 different risk/protective factors (including sociodemographic, lifestyle, health, and cognitive measures) for predicting depressive symptom trajectories in 6,203 middle-aged and older adults from the United Kingdom (UK). Results showed that symptoms of physical illness and lower fluid intelligence (problem solving ability) were most influential. Having fewer friends and increased difficulty in daily life functional activities were also implicated. Choi et al.12 used a similar approach to evaluate 106 predictors in a sample of 112,589 adults from the UK. Results showed that confiding in others, visits from family and friends, and being physically active reduced the odds of depression. Cognitive functioning and self-rated health were not examined. Generally, these studies indicate that lack of social support, poor health, and limited mobility are the most important risk factors for depression in later adulthood, at least in the UK. Further research is needed to determine whether these and additional risk/protective factors (e.g., cognitive and/or functional decline) are similarly important for predicting depression in other populations.
To our knowledge, no study has yet systematically and comprehensively compared socio-relational, health, cognitive, and functional variables as predictive of depression risk in older European adults and examined predictors across sexes. In the current study, we assessed the importance of 56 risk/protective factors for depression in a large, population-representative sample of middle-aged and older adults from Wave 6 of the Survey of Health, Ageing, and Retirement in Europe (SHARE; N = 67,603; ages 45-105y). By comparing such a broad range of predictors using a machine learning approach, we aim to further knowledge of the risk/protective factors for depression in later adulthood.
Methods
Study design and population
Data came from Wave 6 of SHARE, a multi-national longitudinal study of middle-aged and older adults from 20 European countries and Israel. Study design, sampling, and data resources for SHARE are described in detail in Börsch-Supan, 2013, 2019.13,14 Survey materials were administered as a Computer Assisted Personal Interview (CAPI), supplemented by a paper-and-pencil questionnaire. The survey questions spanned demographic, socio-relational, and health-related (including functional ability and mental health) measures. Interviews were conducted in respondents’ homes and took about 90 minutes.
For the current study, we used SHARE Wave 6 data (N = 67,603; ages 45-105y, 56.1% women). SHARE used probabilistic sampling based on household (and other) demographic information to ensure that participant selection was nationally representative. Across countries, household response rates for SHARE Wave 6 ranged from 30.3% (Luxembourg) to 63.5% (Greece).15 Identified households included at least one person age 50 years or older selected as a primary respondent. Partners of primary respondents were also selected to participate, regardless of age. For this study, we included respondents as young as 45 years to capture a broader age range of middle-aged persons – a demographic currently understudied in research on adult aging and development (0.8% of participants in the current sample were of ages between 45 and 50 years).
We focused on Wave 6 because numerous social network variables were assessed at that wave that were not available at earlier waves and because we knew from prior studies that social isolation is a key risk factor for depression in older adults (see above). Data collection was approved by the internal review board of the University of Mannheim, Germany (until 2011) and by the Ethics Council of the Max-Planck-Society for the Advancement of Science (2011 onward).
Measures
Depression
Depression was based on responses to the EURO-D scale,16 which was initially developed to compare depressive symptoms in older persons from 11 European countries. The scale consists of 12 dichotomous items corresponding to the following depressive symptoms: sadness, pessimism, suicidality, guilt, sleep problems, interest in things, irritability, poor appetite, fatigue, difficulty concentrating, enjoyment, and tearfulness. Items are scored 0 or 1 such that 1 always indicates negative valence (i.e., 1 = more depressed) and summed for a final score between 0 and 12, where a summary score ≥ 4 indicates clinically significant depression. The psychometric properties of the EURO-D scale have been extensively investigated, with moderately high reliability (average Cronbach's alpha = .694 across 14 European centers) and criterion validity established cross-culturally in European, Indian, and Latin-American populations.16,17
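The scoring rule described above can be sketched as follows. This is an illustrative Python sketch, not the study's code: the item keys are hypothetical labels rather than SHARE variable names, and the sketch assumes all 12 items are present and already coded so that 1 indicates the depressive direction.

```python
# Hypothetical item labels (not SHARE variable names); 1 = symptom present.
EURO_D_ITEMS = (
    "sadness", "pessimism", "suicidality", "guilt", "sleep_problems",
    "loss_of_interest", "irritability", "poor_appetite", "fatigue",
    "poor_concentration", "lack_of_enjoyment", "tearfulness",
)
CUTOFF = 4  # scores >= 4 are classified as depression, as stated in the text


def euro_d_score(responses: dict) -> int:
    """Sum the 12 dichotomous items into a 0-12 score."""
    missing = [item for item in EURO_D_ITEMS if item not in responses]
    if missing:
        raise ValueError(f"missing items: {missing}")
    for item in EURO_D_ITEMS:
        if responses[item] not in (0, 1):
            raise ValueError(f"item {item!r} must be coded 0 or 1")
    return sum(responses[item] for item in EURO_D_ITEMS)


def is_depressed(score: int, cutoff: int = CUTOFF) -> bool:
    """Apply the binary classification threshold."""
    return score >= cutoff


if __name__ == "__main__":
    example = dict.fromkeys(EURO_D_ITEMS, 0)
    example.update(sadness=1, fatigue=1, sleep_problems=1, tearfulness=1)
    score = euro_d_score(example)
    print(score, is_depressed(score))
```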
Risk/protective factors
Most of the data/variables for these analyses were obtained from easySHARE, a curated subset of SHARE variables that have been thoroughly screened. These measures were augmented with other Wave 6 variables related to behavioral risk factors, social network information, interpersonal transactions, and health. All of these variables have been carefully documented by SHARE online and with corresponding PDF codebooks (https://www.share-datadocutool.org/study_units/view/1).
Demographics
Analyses included seven sociodemographic variables: chronological age, country of residence (18 countries were represented), education level, employment status, marital status, number of people in the household, and household annual income. Note that analyses were carried out independently by sex, meaning that sex was not entered as a variable in the prediction models. Education level was based on the International Standard Classification of Education,18 a seven-point scale in ascending order of highest education level completed. Employment status was designated as one of five categories: retired, employed/self-employed (the comparison group), unemployed, permanently sick or disabled, and “homemaker.”
Family
Ten variables related to family configuration and responsibilities were included. Some variables related to family have been assigned to other categories (e.g., marital status, family members in social network, giving regular personal care to someone in the home). Family variables were live-in status and age difference of the spouse/partner; mortality statuses of spouse, parents, and siblings; numbers of children and grandchildren; proximity to children; and a question regarding whether family responsibilities prevented the respondent from pursuing personal interests (family burden).
Social network
There were 17 social network variables, which assessed network structure and quality of relations. Structure was indicated by the number of individuals in the network by relational type (e.g., family members, men, women), the proximity of individuals’ domiciles to the respondent's (e.g., < 1 km, < 5 km), and frequency of contact. Subjective measures of social network quality included variables related to emotional closeness and social connectedness. Additionally, we calculated a measure of social isolation as an average of four items from SHARE's mental health battery: three items that correspond to the UCLA-3 Loneliness Scale19 (“lack of companionship”, “feel left out”, “feel isolated from others”) plus the item “feel lonely”. This composite measure was harmonized with identical items from other well-known population-representative studies of older adults.20
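As a simple illustration of the composite, the sketch below averages the four items. The item keys and example values are hypothetical, and requiring all four items to be non-missing is an assumption of this sketch rather than a documented rule of the study.

```python
# Hypothetical item labels for the three UCLA-3 items plus "feel lonely".
ISOLATION_ITEMS = (
    "lack_companionship", "feel_left_out", "feel_isolated", "feel_lonely",
)


def social_isolation(responses: dict):
    """Mean of the four items; returns None if any item is missing
    (complete-item rule assumed here for illustration)."""
    values = [responses.get(item) for item in ISOLATION_ITEMS]
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


if __name__ == "__main__":
    print(social_isolation({
        "lack_companionship": 1, "feel_left_out": 2,
        "feel_isolated": 3, "feel_lonely": 2,
    }))
```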
Care-related personal transfers
Care-related transfers were assessed by questions as to whether respondents had given and/or received financial support (> 250 Euro in the past 12 months), provided and/or received care from others living outside of the home, provided regular personal care to someone inside the home, and/or looked after grandchildren in the absence of a child's parents.
Health and functional limitations
Nine general measures of health were evaluated as depression risk factors: number of chronic diseases, self-rated poor health, diagnoses of two specific conditions (hypertension, diabetes), body-mass index (BMI), physical inactivity, tobacco smoking (past or present), and weekly alcohol consumption. In calculating number of chronic diseases for the corresponding easySHARE variable, the following medical conditions were included: heart attack, hypertension, high blood cholesterol, stroke and/or cerebrovascular disease, diabetes, chronic lung disease, cancer, stomach or duodenal ulcer, Parkinson's disease, cataracts, hip and/or femoral fracture.
SHARE spans numerous measures of functional health. Many of these have been conveniently summarized in easySHARE as composite variables. We included four composite measures of functional health status. These were (a) difficulty in activities of daily living (ADL): difficulties dressing oneself, bathing/showering, eating/cutting up food, walking across a room, and getting out of bed; (b) instrumental activities of daily living (IADL): difficulties making telephone calls, taking medications, and managing money; (c) difficulties with mobility: difficulties walking 100m, walking across a room, climbing several flights of stairs, and climbing one flight of stairs; (d) fine motor problems: difficulties in picking up a small coin, eating/cutting up food, and dressing. Additionally, we included a measure of grip strength.
Cognition
Cognitive measures were numerical ability (the ability to solve subtraction problems), general orientation (to date, month, year, and day of week), and delayed verbal recall memory (ability to remember a 10-item word list after answering additional survey questions).
Data analysis
All analyses were conducted within the R statistical computing environment.21 The data were first divided by sex (women, men; summary statistics are provided in Table 1) and then further split into analysis groups (A1 = random forest analysis; A2 = logistic regression) using R's built-in random sampling function, resulting in four sub-samples: Men A1 (n = 14,823), Men A2 (n = 14,823), Women A1 (n = 18,979), and Women A2 (n = 18,978). To handle missingness, we used multiple imputation by chained equations22 to impute 20 complete data sets for each of these four subsamples, using all variables included in the analyses.
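The sex-stratified split into analysis halves can be sketched as below. The study used R's built-in random sampling; this Python sketch is only an analogue, and the record layout, seed, and the choice to give A1 the extra observation when a group has an odd count are assumptions made for illustration.

```python
import random


def split_half(ids, seed):
    """Shuffle ids and split them into two halves (A1 gets the extra one if odd)."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2
    return shuffled[:half], shuffled[half:]


def split_by_sex(records, seed=2021):
    """Group record ids by sex, then split each group into A1 (RFA) and A2 (regression)."""
    by_sex = {}
    for rec in records:
        by_sex.setdefault(rec["sex"], []).append(rec["id"])
    return {
        sex: dict(zip(("A1", "A2"), split_half(ids, seed)))
        for sex, ids in sorted(by_sex.items())
    }


if __name__ == "__main__":
    # Synthetic ids only; group sizes mirror the reported sample sizes.
    records = [{"id": i, "sex": "men"} for i in range(29646)]
    records += [{"id": 29646 + i, "sex": "women"} for i in range(37957)]
    for sex, parts in split_by_sex(records).items():
        print(sex, {name: len(ids) for name, ids in parts.items()})
```

With the reported group sizes, this reproduces the sub-sample counts given above (14,823 and 14,823 men; 18,979 and 18,978 women), although the specific individuals assigned to each half would differ from the study's.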
Table 1.
Variable | Men (N = 29,646), n (%) | Men, M (SD) | Women (N = 37,957), n (%) | Women, M (SD) |
---|---|---|---|---|
Demographics | n (%) | M (SD) | n (%) | M (SD) |
1. Age in years | 29,646 (100.0) | 68.0 (9.7) | 37,957 (100.0) | 67.7 (10.4) |
2. Country of residence | 18 countries represented, not shown due to space limitations | | | |
3. Education level | | | | |
None | 1,208 (4.1) | | 1,950 (5.1) | |
Primary | 4,916 (16.6) | | 7,668 (20.2) | |
Lower secondary | 4,410 (14.9) | | 6,560 (17.3) | |
Upper secondary | 10,317 (34.8) | | 11,994 (31.6) | |
Post-secondary | 1,294 (4.4) | | 1,688 (4.4) | |
First stage tertiary | 6,711 (22.6) | | 7,345 (19.4) | |
Second stage tertiary | 336 (1.1) | | 190 (0.5) | |
4. Employment status | | | | |
Retired | 19,242 (64.9) | | 20,104 (53.0) | |
Employed or self-employed | 7,786 (26.3) | | 8,536 (22.5) | |
Unemployed | 923 (3.1) | | 946 (2.5) | |
Permanently sick/disabled | 909 (3.1) | | 1,123 (3.0) | |
Homemaker | 74 (0.2) | | 5,752 (15.2) | |
5. Marital status | | | | |
Married & living w/spouse | 22,822 (77.0) | | 23,012 (60.6) | |
Registered partnership | 476 (1.6) | | 467 (1.2) | |
Separated | 378 (1.3) | | 401 (1.1) | |
Never married | 1,789 (6.0) | | 1,892 (5.0) | |
Divorced | 2,055 (6.9) | | 3,525 (9.3) | |
Widowed | 2,004 (6.8) | | 8,533 (22.5) | |
6. # People in household | 29,646 (100.0) | 2.3 (1.0) | 37,957 (100.0) | 2.1 (1.0) |
7. Household income in Euro (x 1k) | 29,646 (100.0) | 29.5 (34.0) | 37,957 (100.0) | 24.8 (29.1) |
Family | n (%) | M (SD) | n (%) | M (SD) |
8. Living with partner/spouse | 24,327 (82.1) | | 24,276 (64.0) | |
9. Age difference from partner | 24,314 (82.0) | 3.7 (4.7) | 24,263 (63.9) | -2.4 (4.5) |
10. # Children | 29,460 (99.4) | 2.1 (1.3) | 37,736 (99.4) | 2.1 (1.3) |
11. # Grandchildren | 26,306 (88.7) | 2.8 (3.3) | 34,227 (90.2) | 3.1 (3.5) |
12. ≥ 1 child in same household | 8,538 (28.8) | | 11,062 (29.1) | |
13. ≥ 1 child lives < 1km away | 11,672 (39.4) | | 15,509 (40.9) | |
14. Mother still alive | 5,789 (19.5) | | 8,091 (21.3) | |
15. Father still alive | 2,344 (7.9) | | 3,303 (8.7) | |
16. # siblings still alive | 23,312 (78.6) | 2.4 (1.8) | 29,714 (78.3) | 2.4 (1.8) |
17. Burden of family responsibilities¹ | 28,116 (94.8) | 3.1 (0.9) | 36,421 (96.0) | 3.0 (1.0) |
Social Network | n (%) | M (SD) | n (%) | M (SD) |
18. Size social network | 26,107 (88.1) | 2.4 (1.5) | 33,969 (89.5) | 2.8 (1.6) |
19. # SNM in daily contact | 25,241 (85.1) | 1.3 (0.9) | 33,073 (87.1) | 1.3 (1.0) |
20. # SNM in weekly contact | 25,241 (85.1) | 2.1 (1.3) | 33,073 (87.1) | 2.5 (1.4) |
21. # Family members in SNM | 25,301 (85.3) | 2.0 (1.3) | 33,139 (87.3) | 2.2 (1.3) |
22. # Men in SNM | 24,619 (83.0) | 1.0 (1.1) | 31,554 (83.1) | 1.2 (0.9) |
23. # Women in SNM | 24,619 (83.0) | 1.5 (0.9) | 31,554 (83.1) | 1.6 (1.3) |
24. Avg. proximity of SNM | 24,247 (81.8) | 2.8 (1.6) | 31,797 (83.8) | 3.4 (1.6) |
25. Proximity, closest SNM | 24,247 (81.8) | 1.6 (1.4) | 31,797 (83.8) | 2.1 (1.6) |
26. # SNM within 1km | 24,250 (81.8) | 1.3 (0.9) | 31,799 (83.8) | 1.3 (1.0) |
27. # SNM within 5km | 24,250 (81.8) | 1.6 (1.1) | 31,799 (83.8) | 1.7 (1.2) |
28. Avg. freq. of contact from SNM | 25,238 (85.1) | 1.7 (0.9) | 33,071 (87.1) | 1.8 (0.9) |
29. Freq. contact, closest SNM | 25,238 (85.1) | 1.2 (0.6) | 33,071 (87.1) | 1.3 (0.6) |
30. Avg. emotional closeness in SNM | 25,233 (85.1) | 3.3 (0.6) | 33,082 (87.2) | 3.3 (0.6) |
31. Emotional closeness, closest SNM | 25,233 (85.1) | 3.6 (0.6) | 33,082 (87.2) | 3.6 (0.6) |
32. # Very emotionally close SNM | 25,236 (85.1) | 2.1 (1.4) | 33,084 (87.2) | 2.5 (1.5) |
33. Social connectedness | 25,018 (84.4) | 1.9 (0.9) | 32,579 (85.8) | 2.1 (0.9) |
34. Self-rated social isolation¹ | 28,142 (94.9) | 1.3 (0.4) | 36,435 (96.0) | 1.4 (0.5) |
Interpersonal Transactions (past 12 months) | n (%) | M (SD) | n (%) | M (SD) |
35. Received support >250 Euro | 1,195 (4.0) | | 2,451 (6.5) | |
36. Received outside help | 5,551 (18.7) | | 9,570 (25.2) | |
37. Gave support >250 Euro | 5,691 (19.2) | | 6,776 (17.9) | |
38. Gave regular care in-home | 1,829 (6.2) | | 2,708 (7.1) | |
39. Gave help outside home | 8,014 (27.0) | | 10,095 (26.6) | |
40. Gave care for grandchildren | 4,970 (16.8) | | 8,982 (23.7) | |
Health and Functional Limitations | n (%) | M (SD) | n (%) | M (SD) |
41. # Chronic diseases | 29,580 (99.8) | 1.2 (1.3) | 37,870 (99.8) | 1.2 (1.2) |
42. Self-rated poor health | 29,594 (99.8) | 3.2 (1.1) | 37,874 (99.8) | 3.2 (1.1) |
43. Hypertension diagnosis | 12,209 (41.2) | | 15,686 (41.3) | |
44. Diabetes diagnosis | 4,517 (15.2) | | 4,775 (12.6) | |
45. Body mass index | 29,159 (98.4) | 27.3 (4.1) | 36,665 (96.6) | 26.8 (5.0) |
46. Lack of physical activity | 29,580 (99.8) | 2.5 (1.3) | 37,874 (99.8) | 2.7 (1.3) |
47. Ever smoked daily | 17,197 (58.0) | | 12,576 (33.1) | |
48. Alcohol consumption, units/week | 19,380 (65.4) | 8.8 (9.6) | 15,777 (41.6) | 4.6 (5.8) |
49. Maximum grip strength | 27,137 (91.5) | 42.2 (10.3) | 34,246 (90.2) | 26.0 (7.0) |
50–53. Difficulties in: | | | | |
Activities of daily living (ADL) | 29,584 (99.8) | 0.2 (0.8) | 37,870 (99.8) | 0.3 (0.9) |
Instrumental activities of daily living (IADL) | 29,584 (99.8) | 0.2 (0.8) | 37,870 (99.8) | 0.3 (0.9) |
Fine motor skills | 29,582 (99.8) | 0.1 (0.5) | 37,868 (99.8) | 0.2 (0.5) |
Mobility | 29,582 (99.8) | 0.5 (0.9) | 37,868 (99.8) | 0.7 (1.0) |
Mental Health and Cognition | n (%) | M (SD) | n (%) | M (SD) |
OC. Depression | 5,451 (18.4) | | 12,095 (31.9) | |
54. Numerical ability | 28,129 (94.9) | 4.3 (1.3) | 36,425 (96.0) | 4.0 (1.6) |
55. Orientation to date, day of week | 28,110 (94.8) | 3.8 (0.5) | 36,417 (95.9) | 3.8 (0.5) |
56. Delayed recall memory | 27,943 (94.3) | 3.7 (2.1) | 36,224 (95.4) | 4.1 (2.3) |
Note: For continuous variables, n (%) refers to total responses (and rate). For binary and categorical variables, n (%) refers to count (and percentage) of participants who gave a confirmatory “yes” response. M = mean. SD = standard deviation. # = number of. SNM = social network member(s). SHARE variable names are provided in parentheses next to each variable. Predictors are numbered. OC = “outcome”.
¹Items were reverse-coded from their original SHARE scaling for clarity of interpretation.
Random forest analysis (RFA)
RFA is a machine learning approach related to classification and regression trees.23 Regression trees recursively partition observations into sub-groups by predictor selection criteria that maximally discriminate differences in an outcome variable (e.g., depression risk). Thus, the “root node” of a regression tree represents the best predictor using all observations, whereas subsequent nodes represent the best predictors within nested, increasingly smaller subsamples of observations. As a result, trees are particularly well suited to representing interaction effects among several predictor variables, and they can also approximate linear as well as nonlinear effects by means of several splits on the same predictor. RFA extends this single-tree approach, providing built-in cross-validation by generating multiple trees, where each tree is derived from randomly sampled subsets of observations and predictors. The predictions in RFA are averaged over the individual trees, making them smoother than those of any single tree. This gives RFA distinct advantages over standard regression-based approaches for comparing risk factors: Variable importance measures implicitly capture linear, nonlinear, and higher-order interaction effects. Moreover, problems related to multicollinearity and spurious variable selection (model over-fit) are mitigated.
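To illustrate how a root node is chosen, the sketch below searches every predictor and threshold for the binary split that best separates a binary outcome. It uses the CART-style Gini impurity criterion purely for illustration; this is an assumption of the sketch, since conditional inference forests select splits using permutation-based significance tests instead. The toy data are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Return (predictor_index, threshold, weighted_impurity) for the best
    split of the form x <= threshold, searched over all predictors."""
    n = len(labels)
    best = None
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


if __name__ == "__main__":
    # Column 0 separates the outcome; column 1 is unrelated noise.
    rows = [(1.0, 5), (1.2, 3), (1.5, 4), (2.5, 5), (2.8, 3), (3.0, 4)]
    labels = [0, 0, 0, 1, 1, 1]
    print(best_split(rows, labels))
```

A full tree would apply the same search recursively within each resulting subgroup, and a forest would repeat this over many resampled data sets and random predictor subsets.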
RFA was applied to the A1 sub-samples (independently for women and men) using the function cforest() from the R package party with the “cforest_unbiased” option24,25 for predictors of mixed types and with variable importance based on permutation importance. We tuned cforest's parameters (ntree = number of trees, mtry = number of variables pre-sampled per node) independently for each of the imputed data sets for the RFA subsamples (Men A1, Women A1) by running RFAs with different parameter values (ntree = 100–1200; mtry = 7–15). We then used the function cforestStats() from the R package caret26 to calculate the predictive accuracy for each of these analyses using the out-of-bag observations (OOB; unsampled observations during tree/forest construction) to determine the parameter values at which accuracy was maximized. Tuning-related analyses are described at length in the Supplemental Materials. We then re-ran the RFAs using the tuned parameter values for each of the imputed data sets, and we again calculated accuracy for each of these analyses based on the OOB observations. Point estimates for variables’ importance (VIMP) and variables’ ranks were averaged across the RFAs of the 20 imputed data sets. RFA does not provide standard errors for VIMP estimates, but we checked the consistency of variables’ VIMP ranks across analyses of the imputed data sets (reported in the Supplemental Materials).
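The idea behind permutation importance can be sketched in a few lines: shuffle one predictor, re-score the model, and record the average drop in accuracy. The Python sketch below is a simplification for illustration only. It assumes an already fitted prediction function and a single evaluation set, whereas a random forest computes the drop tree by tree on out-of-bag observations; the toy classifier and data are invented.

```python
import random


def accuracy(predict, rows, labels):
    """Share of observations classified correctly."""
    return sum(predict(row) == y for row, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, column, n_repeats=20, seed=0):
    """Mean decrease in accuracy after randomly permuting one predictor column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        permuted = [
            row[:column] + (value,) + row[column + 1:]
            for row, value in zip(rows, shuffled)
        ]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


if __name__ == "__main__":
    rows = [(1.0, 5), (1.2, 3), (1.5, 4), (2.5, 5), (2.8, 3), (3.0, 4)]
    labels = [0, 0, 0, 1, 1, 1]

    def toy_classifier(row):
        # Stand-in for a fitted model; it only uses column 0.
        return int(row[0] > 2.0)

    for col in (0, 1):
        print(col, permutation_importance(toy_classifier, rows, labels, col))
```

A predictor that the model never uses has an importance of exactly zero, which is why importance values near zero indicate little predictive contribution.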
Logistic regression
We applied stepwise logistic regression to the A2 sub-samples (independently for women and men) to estimate explained variance in depression risk and odds ratios for the strongest risk/protective factors. Predictors were added sequentially in descending order of their ranked importance, as determined by the RFA results. Predicted variation in depression risk was evaluated using Tjur's coefficient of discrimination (R2Tjur), which has an upper bound of 1.00 and is closely related to R2 definitions for linear models.27 We conducted an a priori sensitivity analysis (α = .05, power [1 − β] = .80, N = 15,000), which showed that a stepwise change in R2 (ΔR2) = .0005 could be reliably detected. This ΔR2 was exceedingly small in terms of explained variation; however, we felt it important to take an inclusive approach. We therefore compromised by multiplying this value by an order of magnitude (x10) to arrive at a threshold of ΔR2 = .005; that is, a given predictor must contribute at least 0.5% to explained variation in depression risk to merit inclusion in the final model. Regression coefficients (point estimates and standard errors) were pooled across analyses of the imputed data sets using Rubin's rules as implemented in the R package mice.22 These were then exponentiated to obtain odds ratios and corresponding confidence intervals. Point estimates for ΔR2 and total R2 were obtained by averaging ΔR2 and total R2 across analyses of the imputed data sets.
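The pooling and exponentiation steps can be sketched as follows. This Python sketch applies Rubin's rules to per-imputation log-odds coefficients and then converts the pooled estimate into an odds ratio with a confidence interval. The use of a normal critical value (z = 1.96) is an assumption of the sketch; the mice package derives intervals from a t reference distribution with adjusted degrees of freedom, which is nearly identical at sample sizes like those used here. The input values in the demonstration are made up.

```python
from math import exp, sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool m per-imputation estimates and standard errors with Rubin's rules.

    Total variance = within-imputation variance
                     + (1 + 1/m) * between-imputation variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])
    between = variance(estimates) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


def odds_ratio_ci(log_odds, se, z=1.96):
    """Exponentiate a log-odds coefficient and its (normal-approximation) 95% CI."""
    return exp(log_odds), exp(log_odds - z * se), exp(log_odds + z * se)


if __name__ == "__main__":
    # Made-up coefficients from three hypothetical imputed data sets.
    betas = [0.66, 0.69, 0.71]
    ses = [0.023, 0.024, 0.023]
    beta, se = pool_rubin(betas, ses)
    print(odds_ratio_ci(beta, se))
```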
Role of the funding source: The funder had no role in the design or conduct of the study; the collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; or in the decision to submit the manuscript for publication.
Results
Summary statistics
Summary statistics, including response rates, for variables in the current analyses are provided in Table 1. Because analyses were disaggregated by sex, we report summary statistics by sex.
Random forest analysis
Optimized RFA tuning parameters are provided in the Supplemental Materials. Across the 20 imputed data sets, RFAs run with these parameter values gave average out-of-bag accuracy of M = 0.760 (range = .756–.764) for women and M = 0.824 (range = .820–.827) for men. RFA results for the ten predictors with the highest variable importance are presented in Table 2, with the full list presented as Supplemental Materials. The top predictors were highly similar for men and women. Self-rated social isolation was the strongest risk factor for depression, followed by self-rated poor health and difficulties with mobility. Other top predictors included additional health and functional measures, numerical ability (i.e., fluid intelligence), self-rated family burden, and country of residence.
Table 2.
Rank | Men: Predictor | Men: VIMP | Men: rpb | Women: Predictor | Women: VIMP | Women: rpb |
---|---|---|---|---|---|---|
1 | Social isolation | 0.026 | 0.379 | Social isolation | 0.044 | 0.394 |
2 | Poor health | 0.015 | 0.330 | Poor health | 0.030 | 0.372 |
3 | Diff mobility | 0.003 | 0.301 | Diff mobility | 0.007 | 0.308 |
4 | Diff IADL | 0.002 | 0.218 | Country | 0.003 | NA |
5 | Diff ADL | 0.002 | 0.231 | Family burden | 0.003 | 0.101 |
6 | Numerical ability | 0.002 | -0.163 | Diff ADL | 0.003 | 0.228 |
7 | Diff fine motor | 0.002 | 0.219 | # of chronic diseases | 0.002 | 0.219 |
8 | Family burden | 0.001 | 0.093 | Numerical ability | 0.002 | -0.193 |
9 | Country | 0.001 | NA | Diff fine motor | 0.002 | 0.206 |
10 | Lack of physical activity | 0.001 | 0.185 | Diff IADL | 0.002 | 0.210 |
Note: The ten top RFA predictors are shown in decreasing order of importance. VIMP = random forest raw variable importance. rpb = zero-order point-biserial correlations between each continuous predictor and the binary outcome, depression, calculated using all pairwise complete observations from the non-imputed data. Diff mobility = difficulties with mobility, Diff ADL = difficulties in activities of daily living, Diff IADL = difficulties in instrumental activities of daily living, Diff fine motor = difficulties with fine motor skills. NA = not applicable.
Logistic regression
Predictors for the logistic regression were entered stepwise in decreasing order of RFA importance, with predictor retention determined by ΔR2Tjur ≥ .005. Only five predictors met this threshold, the top three of which were consistent across men and women (Figure 1). Self-rated social isolation and self-rated poor health accounted for most of the explained variation in depression risk (22.0% for men, 22.3% for women). Difficulties in mobility accounted for an additional 1.6% (men) and 0.6% (women) of explained variation. In men, difficulties in instrumental activities of daily living (IADL) accounted for an additional 0.6% of explained variation. In women, self-rated family burden accounted for an additional 0.9% of explained variation.
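The retention rule can be sketched as below. Tjur's coefficient is the difference between the mean predicted probability among cases and among non-cases, and a predictor is retained only if it raises this value by at least .005. The Python sketch makes two assumptions that the text does not confirm: that a predictor failing the threshold is dropped before the next one is tried, and that a caller-supplied function (here the hypothetical `fit_r2`) refits the model and returns R2Tjur for a given predictor set. The predictor names and contributions in the demonstration are invented and are not study results.

```python
def tjur_r2(probs, outcomes):
    """Tjur's coefficient of discrimination: mean predicted probability
    among cases minus mean predicted probability among non-cases."""
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    noncases = [p for p, y in zip(probs, outcomes) if y == 0]
    if not cases or not noncases:
        raise ValueError("both outcome classes must be present")
    return sum(cases) / len(cases) - sum(noncases) / len(noncases)


def forward_select(ranked, fit_r2, threshold=0.005):
    """Add predictors in ranked order, keeping each only if it raises
    R2 by at least `threshold` over the current model."""
    kept, current = [], 0.0
    for name in ranked:
        candidate = fit_r2(kept + [name])
        if candidate - current >= threshold:
            kept.append(name)
            current = candidate
    return kept, current


if __name__ == "__main__":
    print(tjur_r2([0.8, 0.6, 0.2, 0.4], [1, 1, 0, 0]))

    # Invented additive contributions standing in for refitted models.
    contrib = {"x1": 0.15, "x2": 0.07, "x3": 0.002, "x4": 0.016}
    print(forward_select(["x1", "x2", "x3", "x4"],
                         lambda preds: sum(contrib[p] for p in preds)))
```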
Note that several of the top predictors as determined by RFA (which is robust to multicollinearity) were non-significant in the stepwise approach due to overlapping explained variation in depression risk. Measures of functional health were salient in this regard in that they offered little additional explanatory information beyond that of mobility difficulties. RFA results showed country of residence as a top predictor; however, increases in explained variation related to this variable in the logistic regression analyses were small (∼1%) given the large number of potential comparisons (with 18 countries represented). We therefore excluded country as a predictor in the final regression models.
Odds ratios for the top five risk/protective factors (standardized, for comparative purposes) are shown in Figure 2. Estimates in raw format are provided as Supplemental Materials. Results were as follows: +1SD social isolation increased odds in men by 1.99x, 95% CI [1.90,2.08] and in women by 1.93x, 95% CI [1.85,2.02]; +1SD poor health increased odds in men by 1.93x, 95% CI [1.81,2.05] and in women by 1.98x, 95% CI [1.87,2.10]; +1SD difficulties in mobility increased odds in men by 1.24x, 95% CI [1.17,1.31] and in women by 1.25x, 95% CI [1.15,1.36]. Additionally, in men, +1SD difficulties in IADL increased odds by 1.19x, 95% CI [1.11,1.28]. In women, +1SD higher self-rated family burden increased odds by 1.27x, 95% CI [1.22,1.32].
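Because Wald confidence intervals for odds ratios are symmetric on the log-odds scale, the reported intervals can be checked for internal consistency. The sketch below is an illustration rather than the authors' code. It assumes the intervals are Wald-type with z = 1.96, and the helper names are hypothetical. It recovers an approximate log-odds coefficient and standard error from a reported interval, then rebuilds the odds ratio and interval from them.

```python
import math

Z95 = 1.96  # normal quantile for a two-sided 95% interval (assumed Wald-type CI)


def log_scale_from_ci(lower, upper):
    """Recover the log-odds coefficient and its SE from a 95% CI on the OR scale."""
    beta = (math.log(lower) + math.log(upper)) / 2  # midpoint on the log scale
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return beta, se


def odds_ratio_with_ci(beta, se):
    """Odds ratio and 95% Wald CI for a +1SD change in a standardized predictor."""
    return math.exp(beta), math.exp(beta - Z95 * se), math.exp(beta + Z95 * se)


# Reported for social isolation in men: OR 1.99, 95% CI [1.90, 2.08]
beta, se = log_scale_from_ci(1.90, 2.08)
or_, lo, hi = odds_ratio_with_ci(beta, se)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

The odds ratio rebuilt from the interval midpoint rounds to 1.99, which agrees with the reported point estimate and is what one would expect if the interval was computed on the log-odds scale.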
Discussion
We used random forest machine learning to compare 56 variables for predicting depression in a large sample of middle-aged and older European adults. Results showed that self-reported social isolation and poor health were the top predictors, accounting for over 20% of variability in depression risk. Mobility problems, difficulties in instrumental activities of daily living (for men), and family burden (family responsibilities interfering with personal aims, for women) accounted for approximately 2% of additional explained variability in risk. This demonstrates that social isolation, poor health, and difficulties in daily functional activities are a potent combination of risk factors closely linked to depression in later life.
Our findings amplify prior work identifying social isolation as a key predictor for depression among older adults.28,29 In 2020, the National Academies of Sciences, Engineering, and Medicine published a report stating that social isolation and loneliness represent a “significant yet underappreciated public health risk”.30 This report points to a growing concern that over 25% of community-dwelling Americans aged 65 and older are socially isolated, with 43% reporting feeling lonely. The current results indicate that older Europeans are likely similarly affected by problems of social isolation and depression.
Depression manifests both objectively and subjectively. Accordingly, in our study, we included a variety of social, health, cognitive, and functional variables, measured both objectively and subjectively. Of the top predictors for depression, subjective measures were most strongly represented. Social isolation (a composite measure based on subjectively reported lack of companionship, feeling left out, feeling isolated from others, and feeling lonely) was a much stronger predictor of depression risk than were more than 30 objectively measured sociodemographic, family, transactional, and social network-related variables. Self-reported social isolation was most strongly correlated with objective measures of living with a partner/spouse (r = -.29), mobility difficulties (r = .28), and distance to one's “closest other” (r = .22). Self-rated poor health was most strongly correlated with objective measures of mobility difficulties (r = .49), number of chronic diseases (r = .42), and physical inactivity (r = .33). These self-report measures of social isolation and poor health likely reflect health, mobility, and relational proximity problems that influence depression risk both directly and through their (potentially complex) interactions. Compared with more narrowly defined risk factors, they may therefore be especially important for assessing risk outcomes in older adults.31
Beyond social isolation and general poor health, we found that problems with mobility (e.g., walking up stairs), difficulties in instrumental activities of daily living (in men), and self-reported burden of family responsibilities (in women) collectively accounted for an additional 2.2% (in men) and 1.5% (in women) of variation in depression risk. Difficulties with day-to-day function/personal care can contribute to loss of independence and lower self-efficacy, which in turn have been associated with depression onset.32 The interruption of personal goals due to increased burden of care and/or other stressful life events has been linked to higher depression and more emotional distress.33 Taken together, these associations indicate that challenges in navigating functional aspects of daily life likely contributed to depression risk.
Globally, the prevalence of depression in women has been estimated to be nearly twice that in men;1 however, factors predictive of depression in older adults have rarely been compared across sexes. We conducted analyses independently by sex, and results showed highly similar predictor patterns across women and men, despite large differences in later-life depression prevalence rates (18% in men vs. 32% in women) as measured by the EURO-D in this population-representative sample. The question then remains: what accounts for the overall difference in prevalence rates across sexes if the key risk factors and predictive effects are so similar? In addition to sex-related differences in IADL and burden of family care, a much larger percentage of women vs. men were widowed (22.5% vs. 6.8%), and widowhood was positively correlated with depression (φ = .10 in women; φ = .07 in men), which suggests that differences in bereavement may have also played a role. Notwithstanding, future studies of sex-related differences in risk factors for depression among older adults may benefit from closer examination of sex/gender-related social disparities and gender role socialization, which may influence the expression of depressive symptoms.34
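For readers unfamiliar with the φ coefficient reported for widowhood and depression, it is the Pearson correlation between two binary variables and can be computed directly from a 2×2 table of counts. The following is a minimal sketch. The function name is hypothetical and the counts are invented for illustration, so they are not SHARE data.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table [[a, b], [c, d]] of two binary variables.

    a: both present, b: first only, c: second only, d: neither.
    """
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Invented counts for illustration only (not SHARE data):
# rows = widowed yes/no, columns = depressed yes/no
phi = phi_coefficient(30, 20, 20, 30)
print(round(phi, 2))
```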
Limitations
Data for this study were cross-sectional (SHARE Wave 6), so we could not examine temporal associations between risk/protective factors and depression onset. The decision to analyze cross-sectional data was made on the basis that social network variables of key concern were only collected in SHARE Waves 4 and 6, with additional risk/protective factors of focal interest only collected at Wave 6. Additionally, although we found little evidence of an effect of chronological age on differences in depression risk, it has been suggested that stressful life experiences (SLE) more strongly influence depression risk in mid- vs. late-life.35 Further, depressive symptoms in women have been shown to be elevated during the menopause transition.36 Unfortunately, we could not include SLE and related antecedents for affective disorders in our analyses, nor were we able to account for menopausal status or potentially important neurophysiological measures (e.g., cerebral white matter lesion prevalence) as these variables were not available from SHARE.
Conclusion
Depression remains a leading cause of disability in middle age and later life, yet it is both preventable and possibly modifiable. The current results point to social connectedness, physical health, and mobility as key for maintaining emotional wellbeing and minimizing depression risk in later adulthood. Difficulties in instrumental activities of daily living (in men) and increased family burden (in women) appear to differentially influence depression risk across sexes. From a public health perspective, these results underscore the importance of screening for depression risk and perceived social isolation in this age group during routine medical visits, at which related health indicators (perceived health, mobility) should also be assessed regularly.
Contributors
EPH and SA contributed to the drafting, writing, and interpretation of the study results. SA performed all data analyses and formatted results (figures, tables). CS provided guidance and feedback on statistical methodology. YJ assisted with data preparation, summarization, and preliminary literature review. LF provided editing and interpretation of study results during manuscript revision. SA provided research oversight and supervision.
Declaration of interests
The authors have no conflicts of interest.
Footnotes
Supplementary material associated with this article can be found in the online version at doi:10.1016/j.lanepe.2022.100391.
Contributor Information
Elizabeth P. Handing, Email: ehanding@colostate.edu.
Stephen Aichele, Email: stephen.aichele@colostate.edu.
Appendix. Supplementary materials
References
- 1. The World Health Organization. Depression and Other Common Mental Disorders: Global Health Estimates. Geneva: World Health Organization; 2017.
- 2. Walker E.R., McGee R.E., Druss B.G. Mortality in mental disorders and global disease burden implications: a systematic review and meta-analysis. JAMA Psychiatry. 2015;72(4):334–341. doi: 10.1001/jamapsychiatry.2014.2502.
- 3. Dafsari F.S., Jessen F. Depression-an underrecognized target for prevention of dementia in Alzheimer's disease. Transl Psychiatry. 2020;10(1):160. doi: 10.1038/s41398-020-0839-1.
- 4. Sivertsen H., Bjorklof G.H., Engedal K., Selbaek G., Helvik A.S. Depression and quality of life in older persons: a review. Dement Geriatr Cogn Disord. 2015;40(5-6):311–339. doi: 10.1159/000437299.
- 5. Greenberg P.E., Fournier A.A., Sisitsky T., et al. The economic burden of adults with major depressive disorder in the United States (2010 and 2018). Pharmacoeconomics. 2021;39(6):653–665. doi: 10.1007/s40273-021-01019-4.
- 6. The World Health Organization. Fact Sheet: Ageing and Health. The World Health Organization; 2021.
- 7. Cole M.G., Dendukuri N. Risk factors for depression among elderly community subjects: a systematic review and meta-analysis. Am J Psychiatry. 2003;160(6):1147–1156. doi: 10.1176/appi.ajp.160.6.1147.
- 8. Vink D., Aartsen M.J., Schoevers R.A. Risk factors for anxiety and depression in the elderly: a review. J Affect Disord. 2008;106(1-2):29–44. doi: 10.1016/j.jad.2007.06.005.
- 9. Aichele S., Rabbitt P., Ghisletta P. Illness and intelligence are comparatively strong predictors of individual differences in depressive symptoms following middle age. Aging Ment Health. 2017;23(1):122–131. doi: 10.1080/13607863.2017.1394440.
- 10. Puterman E., Weiss J., Hives B.A., et al. Predicting mortality from 57 economic, behavioral, social, and psychological factors. Proc Natl Acad Sci USA. 2020;117(28):16273–16282. doi: 10.1073/pnas.1918455117.
- 11. Aichele S., Rabbitt P., Ghisletta P. Illness and intelligence are comparatively strong predictors of individual differences in depressive symptoms following middle age. Aging Ment Health. 2019;23(1):122–131. doi: 10.1080/13607863.2017.1394440.
- 12. Choi K.W., Stein M.B., Nishimi K.M., et al. An exposure-wide and mendelian randomization approach to identifying modifiable factors for the prevention of depression. Am J Psychiatry. 2020;177(10):944–954. doi: 10.1176/appi.ajp.2020.19111158.
- 13. Börsch-Supan A., Brandt M., Hunkler C., et al. Data resource profile: the survey of health, ageing and retirement in Europe (SHARE). Int J Epidemiol. 2013. doi: 10.1093/ije/dyt088.
- 14. Börsch-Supan A., Brandt M., Hunkler C., et al. Survey of health, ageing and retirement in Europe (SHARE) wave 6. Release version: 7.1.0. 2019. doi: 10.6103/SHARE.w6.710.
- 15. Bergmann M., Kneip T., De Luca G., Scherpenzeel A. Survey participation in the survey of health, ageing and retirement in Europe (SHARE), Wave 1-6. Based on Release 6.0.0. SHARE Working Paper Series 31-2017. Munich: Munich Center for the Economics of Aging (MEA); 2017.
- 16. Prince M.J., Reischies F., Beekman A.T., et al. Development of the EURO-D scale-a European Union initiative to compare symptoms of depression in 14 European centres. Br J Psychiatry. 1999;174:330–338. doi: 10.1192/bjp.174.4.330.
- 17. Castro-Costa E., Dewey M., Stewart R., et al. Prevalence of depressive symptoms and syndromes in later life in ten European countries: the SHARE study. Br J Psychiatry. 2007;191:393–401. doi: 10.1192/bjp.bp.107.036772.
- 18. UNESCO. International Standard Classification of Education: ISCED. UNESCO; 1997. 2006.
- 19. Hughes M.E., Waite L.J., Hawkley L.C., Cacioppo J.T. A short scale for measuring loneliness in large surveys: results from two population based studies. Res Aging. 2004;26(6):655–672. doi: 10.1177/0164027504268574.
- 20. Gruenewald T.L., Crosswell A.D.E., Mayer E.S., et al. Measures of stress in the health and retirement study and the HRS family of studies. User Guide. 2020.
- 21. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
- 22. Van Buuren S., Groothuis-Oudshoorn K. MICE: Multivariate imputation by chained equations in R. J Stat Softw. 2011;45(3):1–67. doi: 10.18637/jss.v045.i03.
- 23. Breiman L. Random forests. Mach Learn. 2001;45:5–32. doi: 10.1023/A:1010933404324.
- 24. Strobl C., Boulesteix A.L., Zeileis A., Hothorn T. Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinform. 2007;8:25. doi: 10.1186/1471-2105-8-25.
- 25. Strobl C., Malley J., Tutz G. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol Methods. 2009;14(4):323–348. doi: 10.1037/a0016973.
- 26. Kuhn M. Caret: classification and regression training. R package version 6.0-90. 2021; https://cran.r-project.org/web/packages/caret/caret.pdf
- 27. Tjur T. Coefficients of determination in logistic regression models-a new proposal: the coefficient of discrimination. Am Stat. 2009;63:366–372. doi: 10.1198/tast.2009.08210.
- 28. Santini Z.I., Jose P.E., York Cornwell E., et al. Social disconnectedness, perceived isolation, and symptoms of depression and anxiety among older Americans (NSHAP): a longitudinal mediation analysis. Lancet Public Health. 2020;5(1):e62–e70. doi: 10.1016/S2468-2667(19)30230-0.
- 29. Lee S.L., Pearce E., Ajnakina O., et al. The association between loneliness and depressive symptoms among adults aged 50 years and older: a 12-year population-based cohort study. Lancet Psychiatry. 2021;8:48–57. doi: 10.1016/S2215-0366(20)30383-7.
- 30. National Academies of Sciences, Engineering, and Medicine. Social Isolation and Loneliness in Older Adults: Opportunities for the Health Care System. Washington, DC: The National Academies Press; 2020.
- 31. Ocampo J.M. Self-rated health: importance of use in elderly adults. Colomb Méd. 2010;41:275–289. doi: 10.25100/cm.v41i3.715.
- 32. Yang Y. How does functional disability affect depressive symptoms in late life? The role of perceived social support and psychological resources. J Health Soc Behav. 2006;47(4):355–372. doi: 10.1177/002214650604700404.
- 33. Cohen S., Murphy M.L.M., Prather A.A. Ten surprising facts about stressful life events and disease risk. Annu Rev Psychol. 2019;70:577–597. doi: 10.1146/annurev-psych-010418-102857.
- 34. Cheung E., Mui A. Gender variation in late-life depression: findings from a national survey in the USA. Ageing Int. 2021:1–18. doi: 10.1007/s12126-021-09471-5.
- 35. Kessler R.C., Mickelson K.D., Walters E.E., et al. Age and depression in the MIDUS Survey. In: Brim O.G., Ryff C.D., Kessler R.C., eds. How Healthy Are We? A National Study of Well-Being at Midlife. The University of Chicago Press; 2004:227–251.
- 36. El Khoudary S.R., Greendale G., et al. The menopause transition and women's health at midlife: a progress report from the Study of Women's Health Across the Nation (SWAN). Menopause. 2019;26:1213–1227. doi: 10.1097/GME.0000000000001424.