SSM - Population Health. 2022 Nov 17;20:101290. doi: 10.1016/j.ssmph.2022.101290

Cross-sectional estimates of population health from the survey of health and retirement in Europe (SHARE) are biased due to health-related sample attrition

Magdalena Muszyńska-Spielauer, Martin Spielauer
PMCID: PMC9700319  PMID: 36444337

Abstract

Cross-sectional data from the Survey of Health, Ageing and Retirement in Europe (SHARE) are a common source of information in comparative studies of population health in Europe. For the most part, these data are based on longitudinal samples, which are subject to health-specific attrition. This implies that estimates of population health based on cross-sectional SHARE datasets are biased, as the data are selected on the outcome variable of interest.

We examine whether cross-sectional datasets are selected on health status. We compare estimates of the prevalence of full health, healthy life years at age 50 (HLY), and rankings of 18 European countries by HLY based on the observed cross-sectional SHARE wave 7 datasets and on full samples. The full samples consist of observed and attrited SHARE respondents, with the health trajectories of the attrited respondents imputed by microsimulation. Health status is operationalized with the Global Activity Limitation Indicator (GALI). HLY stands for life expectancy free of activity limitations.

Cross-sectional datasets are selected on health status, as health limitations increase the odds of attrition from the panel in older age groups and reduce them in younger ones. In older age groups, the prevalence of full health is higher in the observed cross-sectional data than in the full sample in most countries. In most countries, HLY is overestimated based on the cross-sectional data, although in some countries the opposite effect is observed. Although the confidence intervals are wide because of the small national sample sizes, the direction of the effect is consistent across countries. We also observe shifts in the ranking of countries by HLY based on the observed data compared with HLY based on the full sample.

We conclude that estimates of population health based on cross-sectional datasets from longitudinal, attrited SHARE samples are over-optimistic.

Highlights

  • Cross-sectional SHARE datasets consist of longitudinal samples selected based on health status.

  • In younger age groups, respondents in full health are more likely to drop out of the longitudinal samples.

  • In older age groups, respondents in decreased health are more likely to leave the longitudinal samples.

  • In cross-sectional SHARE data, the prevalence of full health is too high in older age groups and too low in younger ones.

  • Health expectancy is overestimated based on cross-sectional SHARE datasets.

1. Introduction

Cross-sectional datasets based on longitudinal samples are subject to attrition, that is, the dropout of participants from the sample between successive interviews. Attrition reduces the sample size and can lead to erroneous results and conclusions, because it changes the composition of the sample so that it is no longer representative of the study population. Examining the effect of attrition is particularly important when the sample is selected on the variable of interest or on other characteristics correlated with the dependent variable (Deng, Hillygus, Reiter, Si, & Zheng, 2013; Goodman & Blum, 1996). The effect of attrition on study results is especially apparent in health studies, where the outcome variable, health status, is a known determinant of attrition (Ahern & Le Brocque, 2005; Desmond, Bagiella, Moroney, & Stern, 1998; Graaf, Bijl, Smit, Ravelli, & Vollebergh, 2000; Hoeymans et al., 1998; Levin, Katzen, Klein, & Llabre, 2000; Michaud, Kapteyn, Smith, & Van Soest, 2011). Health status is a determinant of all sources of attrition: failure to locate, refusal to participate, morbidity, and mortality (Graaf et al., 2000). In addition, health-related demographic characteristics such as sex, old age, marital status, and educational attainment are known determinants of attrition (Ahern & Le Brocque, 2005; Desmond et al., 1998; Graaf et al., 2000; Hoeymans et al., 1998; Levin et al., 2000). Consequently, cross-sectional health studies based on attrited samples are likely to draw inferences from samples that are not only non-random but also selected on the outcome variable of interest (Ahern & Le Brocque, 2005; Graaf et al., 2000). While the problem of sample attrition and its potential to bias the measurement of phenomena and their relationships is commonly acknowledged in longitudinal studies, it is generally not mentioned in cross-sectional studies that use data derived from the longitudinal samples of panel studies.
The only exception, to our knowledge, is Michaud et al. (2011), who studied how bringing attrited individuals back into the survey affects cross-sectional estimates of health, socio-economic status, and family composition in the Health and Retirement Study.

Using the example of the Survey of Health, Ageing and Retirement in Europe (SHARE), this article aims to demonstrate that cross-sectional estimates of health based on longitudinal samples are biased by attrition. We assess to what extent attrition from the SHARE panel up to wave 7 is related to health status and whether health-related attrition bias is present in cross-sectional estimates of population health in European countries based on the wave 7 cross-sectional SHARE datasets. This matters because SHARE cross-sectional datasets are the most common source of information on the health of Europeans in comparative studies, so a large number of studies are potentially affected by health-related sample attrition.

As we hypothesize that attrition from the SHARE longitudinal sample is related to decreased health status, we expect the share of respondents in full health to be overestimated in SHARE cross-sectional datasets and, as a result, health expectancy in European countries to be overestimated in studies based on these datasets. Because cross-sectional estimates of health expectancy based on SHARE are often used to study differences in health expectancy between countries, we additionally hypothesize that the ranking of countries by healthy years lived is distorted when based on the non-representative samples. In this study, full samples representative of the European populations are created by adding back attrited individuals whose health trajectories are imputed by microsimulation based on transitions between health states (including death) observed in the longitudinal samples.

2. Data

This study is based on waves 1–2 and 4–7 of the Survey of Health, Ageing and Retirement in Europe (SHARE) (Börsch-Supan, 2020; Börsch-Supan et al., 2013). The cross-sectional data examined in this study come from wave 7, which was conducted in 2017, except for a small part of the sample in Portugal interviewed in 2018. Because a different questionnaire was used in wave 3, data from this wave are excluded from the analyses. We include the 18 countries that participated in wave 7 and at least one earlier wave and had non-missing information on NUTS-1 region of residence, which is required for calibrating the cross-sectional weights. The 2017 life tables required as additional input for calculating health expectancies with the Sullivan method (Sullivan, 1971) and the regional NUTS-1 data on population by sex and age used to calibrate the weights come from Eurostat (2022).

Table 1 provides descriptive statistics of the cross-sectional wave 7 datasets for each study country and of the corresponding panel development. The composition of the datasets by wave of origin differs across countries, because countries joined the survey at different waves and not all of them drew refreshment samples. The attrition rate in the table refers to the share of the panel sample missing from the survey at wave 7 for reasons other than death. Although one might expect the attrition rate to depend strongly on the time since the samples were originally drawn, this is not always the case in SHARE. The attrition rate of the panel samples varies between 13% in Croatia and 52% in Germany. In Croatia, the low attrition rate is related to the fact that the panel sample was drawn only in wave 6; the other countries, however, do not show a clear relationship between panel length and attrition rate. The potential bias caused by panel attrition in cross-sectional estimates of any statistic is reflected in the sample retention of the wave 7 cross-sectional samples (column “% All” in Table 1). Cross-sectional wave 7 SHARE datasets retain between 41% (France) and 85% (Croatia) of the respondents ever interviewed over the study years. In half of the countries, sample retention is 54% or less.

Table 1.

Description of the SHARE wave 7 cross-sectional sample development and share of respondents in wave 7 cross-sectional sample out of the entire panel sample by country.

Code  Country      Wave joined  Year joined  Refreshment  Panel  Died  Attrited  Attrition (%)  In wave 7  % All
AT    Austria      1            2004         4            6221   587   2463      40             3219       51
BE    Belgium      1            2004–06      2–6          9551   856   3860      40             4932       51
CH    Switzerland  1            2004         2, 4         4501   271   1848      41             2417       53
CZ    Czech Rep.   2            2006–07      4, 5         8429   748   3496      41             4264       50
DE    Germany      1            2004         2, 5         8615   350   4480      52             3865       44
DK    Denmark      1            2004         5, 6         5688   866   1620      28             3257       57
EE    Estonia      4            2010–11      none         7694   902   1734      23             5136       66
ES    Spain        1            2004         2, 4         8686   1184  2832      33             4732       54
FR    France       1            2004–05      2, 4, 6      8118   708   4124      51             3341       41
GR    Greece       1            2004–05      2, 6         6395   762   2601      41             3119       48
HR    Croatia      6            2015         none         2844   100   373       13             2788       85
HU    Hungary      4            2011         none         3050   296   1226      40             1595       51
IT    Italy        1            2004         2–6          8380   731   3140      37             4616       54
LU    Luxembourg   5            2013         6            2120   26    856       40             1315       60
PL    Poland       2            2006–07      7            6162   441   1097      18             7766       83
PT    Portugal     4            2011         none         2147   129   742       35             1290       60
SI    Slovenia     4            2011         5, 6         5393   247   1472      27             3834       69
SE    Sweden       1            2004–05      2, 5         6591   742   2669      40             3217       49

Notes: Number of respondents in panel = All respondents interviewed at least once in any wave prior to wave 7; Attrition rate = no. of respondents who attrited in relation to the panel sample size; % All = interviewed in wave 7 as percent of all respondents ever interviewed.

Source: Authors’ estimations based on Börsch-Supan (2020); information on start waves and refreshment samples from Bergmann, Kneip, De Luca, and Scherpenzeel (2019, pp. 10–11).
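The two derived columns of Table 1 can be recomputed from the counts. The minimal Python sketch below is one reading of the table notes under an assumption of ours, not stated in the source: every panel member who neither died nor attrited is re-interviewed in wave 7, so respondents first interviewed in wave 7 number the wave 7 total minus the retained panel members. Under this reading, % All simplifies to wave 7 respondents / (wave 7 respondents + deaths + attriters), which reproduces the published column to within rounding for the rows checked here.

```python
def table1_rates(panel: int, died: int, attrited: int, wave7: int) -> tuple:
    """Return (attrition rate in %, % All) for one country row of Table 1.

    Assumption (not stated in the source): panel members who neither died
    nor attrited are all re-interviewed in wave 7.
    """
    attrition_rate = 100 * attrited / panel
    retained = panel - died - attrited        # panel members still present in wave 7
    first_seen_in_wave7 = wave7 - retained    # e.g. entrants from a wave 7 refreshment sample
    ever_interviewed = panel + first_seen_in_wave7
    pct_all = 100 * wave7 / ever_interviewed
    return attrition_rate, pct_all


# Counts copied from Table 1: (panel, died, attrited, respondents in wave 7)
ROWS = {
    "AT": (6221, 587, 2463, 3219),
    "DE": (8615, 350, 4480, 3865),
    "FR": (8118, 708, 4124, 3341),
    "HR": (2844, 100, 373, 2788),
}

for code, counts in ROWS.items():
    rate, pct_all = table1_rates(*counts)
    print(f"{code}: attrition rate {rate:.0f}%, % All {pct_all:.0f}")
```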

In this study, health states are operationalized as limitations in activities of daily living, measured with the Global Activity Limitation Indicator (GALI), based on the question: “For at least the past six months, to what extent have you been limited because of a health problem in activities people usually do? Would you say you have been … ?” with the answer options “Severely limited”, “Limited but not severely” and “Not limited at all”. Proxy responses to the GALI question are allowed (Bergmann, Scherpenzeel, & Börsch-Supan, 2019). GALI has been systematically assessed as a comparable health measure across European countries (Berger et al., 2015; Jagger et al., 2010; Van Oyen, Van der Heyden, Perenboom, & Jagger, 2006).
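The analyses treat GALI as a binary state: full health (“Not limited at all”) versus limited (either of the other two answers). A minimal Python sketch of that recode and of a weighted prevalence of full health follows; the string answer codes and the choice to drop missing answers from both numerator and denominator are illustrative assumptions, not the coding used in the SHARE release files.

```python
from typing import Optional, Sequence

# The three GALI answer options as worded in the questionnaire item.
GALI_ANSWERS = ("Severely limited", "Limited but not severely", "Not limited at all")


def gali_full_health(answer: Optional[str]) -> Optional[bool]:
    """True = full health, False = limited (severely or not), None = missing."""
    if answer is None:
        return None
    if answer not in GALI_ANSWERS:
        raise ValueError(f"unexpected GALI answer: {answer!r}")
    return answer == "Not limited at all"


def weighted_full_health_prevalence(answers: Sequence[Optional[str]],
                                    weights: Sequence[float]) -> float:
    """Weighted share in full health among respondents with a valid GALI answer."""
    in_full_health = total = 0.0
    for answer, weight in zip(answers, weights):
        status = gali_full_health(answer)
        if status is None:
            continue  # missing answers are excluded from numerator and denominator
        total += weight
        if status:
            in_full_health += weight
    return in_full_health / total


# Hypothetical answers and cross-sectional weights for four respondents
print(weighted_full_health_prevalence(
    ["Not limited at all", "Severely limited", "Limited but not severely", None],
    [2.0, 1.0, 1.0, 5.0],
))
```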

3. Methods

3.1. Attrition in longitudinal samples

In the first part of the study, we assess to what extent the samples that form the cross-sectional wave 7 datasets are selected on health status. We study the pattern of wave-to-wave attrition by health and health-related characteristics in the panel that ultimately constitutes the wave 7 cross-sectional datasets. Attrition for reasons other than death, hereafter referred to as attrition, is studied separately from mortality. Attrition due to death is expected in panel samples. As long as it is not higher than officially registered mortality and is not selected on the characteristics of interest differently from the patterns observed in the general population, it is unlikely to bias the sample (Mihelic & Crimmins, 1997; Smith, Lynn, & Elliot, 2009). However, as shown by Friedel and Birkenbach (2020), the SHARE longitudinal panel undercounts deaths compared with registry data. We estimate multinomial logistic regression models in which the outcome is the interview status at the next wave: attrited or dead, compared with re-interviewed. The models are estimated separately by country and sex and include an interaction between health status and 10-year age group. Health status has two levels: full health versus limitation in activities according to GALI. In addition to the interaction, we include the main effect of age group to control for the large differences in the level of attrition between age groups. Previous studies have shown that older age (Chatfield, Brayne, & Matthews, 2005; Graaf et al., 2000; Mihelic & Crimmins, 1997), lower educational attainment (Banks, Muriel, & Smith, 2011; Mihelic & Crimmins, 1997; Sharma, Tobin, & Brant, 1986) and being single (Lillard & Panis, 1998; Mihelic & Crimmins, 1997) are associated with a higher risk of attrition.
Hence, we include educational attainment and marital status as control variables to rule out a spurious relationship between these variables, health status, and the risk of attrition. Since the focus of this article is the development of the panel samples, we estimate the models without the longitudinal weights, one of whose purposes is to adjust for attrition. The models are estimated using the nnet package in R (Ripley & Venables, 2016).
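For intuition about what the health-by-age interaction in this model measures, the sketch below computes an unadjusted analogue of Fig. 2 from raw counts in one country-sex-age cell. It is a simplification we swap in for illustration, not the authors' model: it omits the education and marital-status controls. Without those controls, a specification with a main age effect plus a health-by-age interaction is saturated in the age-by-health cells, so the fitted multinomial probabilities equal the observed outcome shares, and the log odds ratio of attrition versus re-interview for limited versus full health reduces to a crude ratio of counts; deaths form a separate outcome that does not change this ratio. The interval uses the Woolf (log) standard error at the 90% level of Fig. 2, and all counts are hypothetical.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # two-sided 90% interval, as in Fig. 2


def crude_log_or_attrition(limited: dict, full: dict) -> tuple:
    """Log odds ratio of attrition vs re-interview, limited vs full health.

    `limited` and `full` map outcome -> count for one age group, with keys
    "reinterviewed", "attrited" and (unused here) "died".
    Returns (log OR, lower 90% bound, upper 90% bound).
    """
    a, b = limited["attrited"], limited["reinterviewed"]
    c, d = full["attrited"], full["reinterviewed"]
    log_or = math.log((a / b) / (c / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    return log_or, log_or - Z90 * se, log_or + Z90 * se


# Hypothetical counts for one country, sex and 10-year age group
limited_80plus = {"reinterviewed": 80, "attrited": 20, "died": 30}
full_80plus = {"reinterviewed": 90, "attrited": 10, "died": 5}
print(crude_log_or_attrition(limited_80plus, full_80plus))
```

A positive log odds ratio, as in this made-up example, corresponds to the older-age pattern described in the Results: limited respondents are more likely to attrit than to be re-interviewed.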

Because in this part of the study we are interested only in the attrition process, and not in the specific pair of waves between which it occurred, the datasets are created by pooling observations across all pairs of successive waves. As pooled datasets include repeated measurements on the same individuals, the logistic model's assumption of independent error terms is likely violated, and confidence intervals for the model estimates are underestimated (Hanley, Negassa, Edwardes, & Forrester, 2003). In our case, respondents who have been re-interviewed once are more likely to remain in the longitudinal sample than those interviewed only once, as the highest attrition occurs after the first interview (results not shown in Tables). This suggests a first-order autoregressive correlation structure for the panel observations (Hardin & Hilbe, 2002) and hence a potential violation of the logistic model's assumption that error terms are independent across an individual's measurements. To study the sensitivity of the results to this assumption, we estimated alternative logistic regression models, for attrition only, with the generalized estimating equations method (Liang & Zeger, 1986). These models account for potential autocorrelation of the error terms across repeated observations, but not for the competing risk of death. They were estimated using the geepack package in R (Højsgaard, Halekoh, & Yan, 2005). Similar to the results of Mihelic and Crimmins (1997), we observed no differences in the sign or significance of the health-related coefficients between the multinomial regression models and those controlling for autocorrelation between panel observations (results not shown in Tables).
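The pooling step can be sketched as follows: each interview at a wave other than the last contributes one record whose outcome is the respondent's status at the next wave. The encoding of the interview history is hypothetical, and so is the rule that a respondent who skips a wave but returns later contributes no record for the skipped interval (chosen to mirror the treatment of returners described in Section 3.2). Because one person can contribute several records, the person identifier is the natural cluster variable for the GEE sensitivity analysis.

```python
from typing import Dict, List, Tuple

# SHARE waves used in the study; wave 3 is excluded (different questionnaire).
WAVES = [1, 2, 4, 5, 6, 7]


def pooled_wave_pairs(history: Dict[int, str]) -> List[Tuple[int, int, str]]:
    """Turn one respondent's history into pooled (wave, next wave, outcome) records.

    `history` maps wave -> "interviewed" or "dead" (hypothetical encoding);
    waves without an entry had no interview.
    """
    records = []
    for i, wave in enumerate(WAVES[:-1]):
        if history.get(wave) != "interviewed":
            continue  # only respondents interviewed at this wave are at risk
        next_wave = WAVES[i + 1]
        status = history.get(next_wave)
        if status == "interviewed":
            outcome = "reinterviewed"
        elif status == "dead":
            outcome = "died"
        elif any(history.get(w) == "interviewed" for w in WAVES[i + 2:]):
            continue  # missed a wave but returned later: not counted as attrition
        else:
            outcome = "attrited"
        records.append((wave, next_wave, outcome))
    return records


print(pooled_wave_pairs({1: "interviewed", 2: "interviewed", 4: "interviewed", 5: "dead"}))
```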

3.2. Cross-sectional estimates of population health

We assess the bias caused by attrition in cross-sectional estimates of health by comparing the prevalence of full health and health expectancy at age 50 in the observed cross-sectional wave 7 datasets with the corresponding values in the full samples. By full sample, we mean the entire panel sample of SHARE, in which the missing wave 7 health status of respondents who attrited is imputed by microsimulation.

Fig. 1 illustrates how the wave 7 observed dataset and the full sample are generated from SHARE panel data in a country that participated in all waves of the panel (1–2, 4–7). For simplicity, the illustration omits respondents who missed one or more interviews and were re-interviewed at a wave before wave 7. In the study, these respondents are not considered attrited, as their health status and other observed characteristics re-enter the panel sample at their return wave prior to wave 7. In the scheme, a new sample drawn at wave X is denoted SX. Respondents in dataset WX are either re-interviewed at the next wave (WX.[X + 1]), die before the next wave (DX), or attrit from the panel sample (AX). The observed dataset at the next wave X + 1, marked in the scheme by a rectangle, consists of the re-interviewed respondents from the previous wave (WX.[X + 1]) and the new sample drawn at this wave, S[X + 1]. The full sample, marked by a circle, consists of the observed sample and the respondents who attrited from the panel sample before wave X + 1, whose health status at wave X + 1 is simulated according to the procedure described below. The part of the full sample made up of respondents who attrited before wave X + 1 is marked by an orange circle. It is the sum of the respondents who attrited at each wave before X + 1 and are returned to the full sample, denoted AY.[X + 1], Y ∈ {1, 2, …, X}, where Y is the last wave in which the respondents participated in the survey.

Fig. 1.


Scheme of the generation of wave 7 observed data and full sample across the SHARE panel samples.

Notes: Full sample refers to the entire panel sample of SHARE, where the missing health status at wave 7 of respondents who attrited is imputed by microsimulation, as described in the Methods Section.

This scheme does not include respondents who missed an interview (or interviews) and were re-interviewed at any wave before wave 7.

Source: Authors’ own conceptualization

The health status of respondents missing from the panel is simulated under the assumption that their transition rates are identical to the observed transition rates. For example, we simulate the health state at wave 2 (A1.2 in Fig. 1) of those who attrited between waves 1 and 2 (A1) by applying the observed probabilities of being in a specific health state at wave 2, conditional on the health state at wave 1 and the other individual characteristics included in this study. These probabilities are estimated from the observed transitions between the health states of respondents in datasets W1 and W1.2. Unlike in the previous part of the study, the transition probabilities are estimated separately for each pair of successive waves. For each pair of waves, country, and sex, we estimate transition rates between (non-missing) health states in the panel samples with a multinomial logistic regression model whose outcome categories are all possible states across the study health dimensions, with death as an absorbing state. As explanatory variables, we include an interaction between the health state at the starting wave and age group (as in the attrition model), the main effect of age group, marital status, and educational attainment. Age is grouped into 5-year age groups between 50 and 90 years plus an open interval of 90+ years. Marital status has two categories: married or living with a partner in a consensual union, and single. Educational attainment has three categories according to the International Standard Classification of Education (ISCED): low (0–2), medium (3–4), and high (5–8). The transition rates between marital states between waves are estimated from panel data created by pooling all observations across waves.
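The authors estimate these transition probabilities with multinomial logistic regression. As a simpler stand-in that shows what the model's output looks like, the sketch below estimates them non-parametrically as observed frequencies within cells. The cell key (here 5-year age group and starting health state, optionally extended with marital status and education) and the state labels are illustrative assumptions. A frequency table like this quickly becomes sparse in SHARE's small country-sex samples, which is one reason to prefer a regression model that pools information across cells.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Health states at the next wave; death is absorbing.
STATES = ("full", "limited", "dead")


def estimate_transitions(
    pairs: Iterable[Tuple[tuple, str]]
) -> Dict[tuple, Dict[str, float]]:
    """Frequency estimates of P(state at next wave | cell at current wave).

    Each pair is (cell, end_state); a cell could be (age_group, start_health)
    or (age_group, start_health, married, education).
    """
    counts: Dict[tuple, Counter] = defaultdict(Counter)
    for cell, end_state in pairs:
        if end_state not in STATES:
            raise ValueError(f"unknown state: {end_state!r}")
        counts[cell][end_state] += 1
    return {
        cell: {s: c[s] / sum(c.values()) for s in STATES}
        for cell, c in counts.items()
    }


# Hypothetical observed transitions for respondents aged 70-74 in full health
observed = (
    [(("70-74", "full"), "full")] * 6
    + [(("70-74", "full"), "limited")] * 3
    + [(("70-74", "full"), "dead")] * 1
)
print(estimate_transitions(observed))
```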

As described above, the probabilities of transition between health states are applied to simulate the health status at wave 7 of the attrited individuals. In the simulation model, age, health, and marital status are time-varying characteristics, updated at each wave, while educational attainment is time-constant. As in Wolf (2001, pp. 313–339), microsimulation is used in this study as a multiple imputation method for responses missing due to attrition.
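A minimal sketch of the forward simulation for attrited respondents follows. It assumes transition probabilities keyed by (5-year age group, health state), such as those described in the previous paragraph, and hypothetical interval lengths between waves; for brevity it holds marital status fixed, whereas the authors also update it. Repeating the simulation m times and averaging gives the multiple-imputation point estimate; combining the between-draw variance (for example by Rubin's rules) is not shown.

```python
import random
from typing import Dict, List, Sequence, Tuple

STATES = ("full", "limited", "dead")
# (age group, current health state) -> probability of each state at the next wave
Probs = Dict[Tuple[str, str], Dict[str, float]]


def age_group_5y(age: float) -> str:
    """5-year age groups from 50 to 89 plus an open 90+ group."""
    if age >= 90:
        return "90+"
    lower = 50 + 5 * ((int(age) - 50) // 5)
    return f"{lower}-{lower + 4}"


def simulate_to_wave7(health: str, age: float, step_years: Sequence[float],
                      probs: Probs, rng: random.Random) -> str:
    """Draw one health trajectory from the last observed wave up to wave 7.

    step_years holds the (hypothetical) years between each remaining pair of
    waves; each transition uses the age group at the start of the interval.
    """
    for years in step_years:
        if health == "dead":
            break  # death is absorbing
        p = probs[(age_group_5y(age), health)]
        health = rng.choices(STATES, weights=[p[s] for s in STATES])[0]
        age += years
    return health


def imputed_full_health_share(attriters: List[Tuple[str, float, Sequence[float]]],
                              probs: Probs, m: int = 20, seed: int = 1) -> float:
    """Average share in full health among simulated wave 7 survivors over m draws."""
    rng = random.Random(seed)
    shares = []
    for _ in range(m):
        final = [simulate_to_wave7(h, a, steps, probs, rng) for h, a, steps in attriters]
        alive = [s for s in final if s != "dead"]
        if alive:
            shares.append(sum(s == "full" for s in alive) / len(alive))
    if not shares:
        raise ValueError("no simulated survivors in any draw")
    return sum(shares) / len(shares)


# Hypothetical transition probabilities, identical for every age group
GROUPS = [f"{lo}-{lo + 4}" for lo in range(50, 90, 5)] + ["90+"]
example_probs: Probs = {}
for g in GROUPS:
    example_probs[(g, "full")] = {"full": 0.90, "limited": 0.08, "dead": 0.02}
    example_probs[(g, "limited")] = {"full": 0.10, "limited": 0.70, "dead": 0.20}

# Two hypothetical attriters: (health at last interview, age then, years per remaining step)
attriters = [("full", 62, [2, 2]), ("limited", 81, [4, 2, 2])]
print(imputed_full_health_share(attriters, example_probs))
```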

Weights for the cross-sectional estimates are derived by raking, which calibrates the distribution of each sample, weighted by the design weights, by sex, age group (50–59, 60–69, 70–79, 80+ years), and NUTS-1 region to the marginal distributions of the registered population on the same characteristics. This procedure is identical to the original method used to derive cross-sectional individual weights in SHARE (De Luca & Claudio, 2019). To avoid discrepancies arising from differences between our raking procedure and the original SHARE procedure, we estimate these individual weights ourselves for both the cross-sectional wave 7 datasets and the full samples. Observations with missing NUTS-1 region of residence were assigned missing values, so their design weights were raked only to the remaining margins (sex and age group). For both the respondents observed in SHARE wave 7 and those whose health state at wave 7 was determined by microsimulation, we applied the design weights from their last observation (SHARE variable dwwx, x = 1, 2, 4–6). We used the function anesrake from the anesrake package in R (Pasek & Pasek, 2018).
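Raking is iterative proportional fitting: the weights are rescaled margin by margin until the weighted distributions match the population targets. The sketch below is a generic Python version under our own assumptions (targets given as population proportions, a convergence tolerance on the adjustment factors, every target category present in the sample, and respondents with a missing category skipped for that margin, as described for NUTS-1 above). It is not the anesrake implementation.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def rake(weights: Sequence[float],
         margins: Dict[str, Sequence[Optional[str]]],
         targets: Dict[str, Dict[str, float]],
         max_iter: int = 1000, tol: float = 1e-10) -> List[float]:
    """Iterative proportional fitting of design weights to marginal proportions.

    margins[name][i] is respondent i's category on that margin (None = missing);
    targets[name][category] is the population proportion of that category.
    Respondents missing on a margin keep their weight when that margin is fitted.
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_change = 0.0
        for name, cats in margins.items():
            totals: Dict[str, float] = defaultdict(float)
            for wi, cat in zip(w, cats):
                if cat is not None:
                    totals[cat] += wi
            observed_total = sum(totals.values())
            # Factor that brings each category to its target share of the non-missing total
            factors = {cat: targets[name][cat] * observed_total / tot
                       for cat, tot in totals.items()}
            for i, cat in enumerate(cats):
                if cat is not None:
                    w[i] *= factors[cat]
            largest_change = max(largest_change,
                                 max(abs(f - 1) for f in factors.values()))
        if largest_change < tol:
            break
    return w


# Hypothetical design weights and two margins (NUTS-1 would be a third margin)
raked = rake([1.0, 2.0, 3.0, 4.0],
             {"sex": ["m", "m", "f", "f"], "age": ["50-59", "70-79", "50-59", "70-79"]},
             {"sex": {"m": 0.5, "f": 0.5}, "age": {"50-59": 0.3, "70-79": 0.7}})
print(raked)
```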

First, we compare the prevalence of full health in the observed datasets with that in the full samples by computing their ratio in 10-year age groups. Confidence intervals (CIs) for the ratio are derived following Method C proposed by Katz, Baptista, Azen, and Pike (1978, pp. 469–474). Next, health expectancy at age 50 (HE) is estimated for the observed datasets and the full samples with the Sullivan method, which divides the years lived at each age in a life table into healthy and unhealthy parts according to the prevalence of health limitations. HE based on GALI is called Healthy Life Years (HLY), and the absence of limitations according to GALI is referred to as full health. CIs for the difference between the two HLYs are estimated following the guidelines of Jagger et al. (2010). To derive the prevalence of full health for the HLY estimates, we use the age groups of Eurostat's life tables (5-year age groups between ages 50 and 84 and an open interval of 85+ years).
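As a worked illustration of the Sullivan method, healthy life years at age 50 are HLY = (1/l_50) Σ_i π_i L_i, where l_50 is the number of life-table survivors at age 50, L_i the person-years lived in age interval i (for the open interval 85+, its total person-years), and π_i the prevalence of full health in that interval. The sketch also returns a standard error from the commonly used Sullivan variance approximation, Var(HLY) = (1/l_50²) Σ_i L_i² π_i(1 − π_i)/n_i, which treats the life table as fixed and assumes simple random sampling with n_i respondents per interval; whether this matches the authors' exact application of Jagger et al. (2010) is our assumption. The life-table values are invented for illustration.

```python
import math
from typing import Sequence, Tuple


def sullivan_hly(l_start: float, person_years: Sequence[float],
                 prevalence: Sequence[float], n: Sequence[int]) -> Tuple[float, float]:
    """Healthy life expectancy by the Sullivan method, with its standard error.

    l_start      : life-table survivors at the starting age (here 50)
    person_years : L_i for each age interval from the starting age onward;
                   the last entry is the open interval's total person-years
    prevalence   : proportion in full health (no GALI limitation) per interval
    n            : number of survey respondents per interval
    """
    if not (len(person_years) == len(prevalence) == len(n)):
        raise ValueError("inputs must have one entry per age interval")
    hly = sum(L * p for L, p in zip(person_years, prevalence)) / l_start
    # Sampling variance of the prevalences only; life-table mortality is taken as fixed.
    variance = sum(L ** 2 * p * (1 - p) / k
                   for L, p, k in zip(person_years, prevalence, n)) / l_start ** 2
    return hly, math.sqrt(variance)


# Hypothetical life table for 50-54, ..., 80-84 and 85+ (not Eurostat data)
l50 = 95_000
person_years = [470_000, 455_000, 435_000, 405_000, 365_000, 310_000, 235_000, 300_000]
full_health = [0.65, 0.60, 0.55, 0.50, 0.42, 0.35, 0.25, 0.15]
respondents = [400, 450, 420, 380, 300, 220, 150, 120]

hly, se = sullivan_hly(l50, person_years, full_health, respondents)
print(f"HLY at 50: {hly:.1f} years (SE {se:.2f})")
```

Because each prevalence is weighted by the person-years in its interval, the small older-age groups matter less for HLY than their prevalence gaps alone suggest, which is consistent with the mixed HLY results reported below.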

4. Results

The longitudinal samples that form the cross-sectional wave 7 SHARE datasets are selected on health status. This result, however, is not uniform across age groups, as the effect of health limitations on the odds of attriting from the longitudinal SHARE samples, compared with the odds of being re-interviewed, depends on the respondent's age (Fig. 2). For both sexes and in most countries, decreased health reduces the odds of attrition for respondents in younger age groups (below age 70) and increases them in older age groups (above age 70). Although this pattern is not universal across countries, and for many country-sex-age combinations the effect is not significant, its consistency across countries suggests that health limitations have opposite effects on leaving the panel sample for younger and older respondents.

Fig. 2.


Log odds ratio of attrition from the SHARE longitudinal sample due to decreased health, as compared to being reinterviewed, by sex and 10-year-age groups.

Notes: 90% confidence intervals;

We present log(odds ratio) because some of the confidence intervals for the odds ratios are very wide, which would make the figure hard to read. Derived from multinomial regression models with death as a competing risk, controlling for marital status, educational attainment and a proportional effect of 10-year age group.

Source: Authors’ own estimations based on Börsch-Supan (2020).

In Fig. 3, we present the ratio of the prevalence of full health in the observed cross-sectional datasets of SHARE wave 7 to that in the corresponding full samples, by country, sex and age. In many countries, particularly for men, this ratio increases with age. This implies that the higher the age, the more the prevalence of full health is overestimated in the observed dataset compared with the full sample. The outliers are men in the Czech Republic and Hungary, and women in Austria and Poland, for whom the prevalence of full health at age 80+ is significantly underestimated in the observed datasets. As the country-sex-age subgroups of the SHARE survey are small, the confidence intervals for both the odds ratios and the ratios of the two prevalences are wide. Hence, the ratios are significant only for some subgroups.
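The interval for a ratio of two prevalences can be sketched with the log-transformed (delta-method) interval usually attributed to Katz et al. (1978); we assume this is the variant labeled Method C in the Methods section, so treat the correspondence as unverified. In the form below, the formula treats the two prevalences as estimated from independent samples of sizes n1 and n2 and uses the 90% level of Fig. 3; the counts are hypothetical.

```python
import math
from statistics import NormalDist


def prevalence_ratio_ci(x1: int, n1: int, x2: int, n2: int, level: float = 0.90):
    """Ratio p1/p2 of two proportions with a log-scale (Katz-type) confidence interval.

    x1 of n1 respondents in full health in the observed dataset,
    x2 of n2 in the full sample.
    """
    p1, p2 = x1 / n1, x2 / n2
    ratio = p1 / p2
    # Standard error of log(ratio): sqrt(1/x1 - 1/n1 + 1/x2 - 1/n2)
    se_log = math.sqrt((1 - p1) / x1 + (1 - p2) / x2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


# Hypothetical counts for one age group: 60 of 200 vs 70 of 280 in full health
print(prevalence_ratio_ci(60, 200, 70, 280))
```

The interval is symmetric on the log scale, which is why ratios above 1 get longer upper than lower arms.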

Fig. 3.


Ratio of the prevalence of full health in cross-sectional wave 7 dataset to the prevalence in the full sample, in 10-year age groups, by country and sex.

Notes: 90% confidence intervals;

Full sample refers to the entire panel sample of SHARE, where the missing health status at wave 7 of respondents who attrited is imputed by microsimulation, as described in the Methods Section.

Source: Authors’ estimations based on Börsch-Supan (2020); Eurostat (2022).

In Table 2, healthy life years at age 50 (HLY) estimated from the observed datasets are compared with HLY estimated from the full samples. In many countries, HLY is higher in the observed than in the full samples; that is, because of selective sample attrition, HLY is overestimated when based on the observed wave 7 datasets. Based on the attrited cross-sectional datasets, HLY is overestimated by up to 9.1% for women in the Czech Republic and 6.4% for men in Portugal. HLY is overestimated by at least 1% in 8 out of 18 countries for women and in 11 for men. As with the prevalence ratios, the small SHARE samples lead to large standard errors, and hence wide confidence intervals, for the HLYs and for the difference between them. Countries with the largest and significant differences between HLYs (e.g., France) are those with a significantly higher ratio of the prevalence of full health in the observed datasets in older age groups, particularly in the age group 80+ (compare Fig. 3). In a small number of cases, however, the effect is the opposite of what was expected: HLY based on the observed data is underestimated by at least 1% in 5 out of 18 countries for women and in 3 for men. In these cases, the prevalence of full health in younger age groups is underestimated, and this is not counterbalanced by a higher prevalence of full health at older ages in the observed wave 7 datasets. For example, for men in Croatia, HLY based on the observed dataset is 0.4 years lower than HLY based on the full sample. The share of Croatian men in full health is underestimated in the observed dataset in all age groups except 80+ (compare Fig. 3), by up to 5.5% at age 60–69.
Although the share of those in full health at age 80+ is overestimated in the observed data by 6%, this difference is not large enough to offset, in the HLY estimates, the opposite pattern in the younger age groups.
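The relative gap and the significance stars in Table 2 follow from the HLY difference and its standard error. A minimal sketch, assuming a two-sided z-test (difference divided by its standard error) and the star thresholds in the table note; because the published values are rounded to one decimal, recomputing from the Table 2 entries will not reproduce every star or relative gap exactly.

```python
from statistics import NormalDist


def hly_gap(observed: float, full: float, se_diff: float):
    """Signed and relative HLY gap with a two-sided z-test p-value and stars."""
    diff = observed - full
    relative = 100 * abs(diff) / observed    # gap as % of HLY in the observed data
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    stars = "***" if p < 0.01 else "**" if p < 0.05 else "*" if p < 0.1 else ""
    return diff, relative, p, stars


# Rounded values from Table 2: women in the Czech Republic
print(hly_gap(13.4, 12.2, 0.5))
```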

Table 2.

Healthy life years (HLY) at age 50 based on the cross-sectional wave 7 datasets (Observed) and on the full panel samples in wave 7, with imputed health status for those who attrited before wave 7 (Full); difference between the HLY based on the observed sample and the full sample (Difference), its standard error (SE), and the gap relative to HLY based on the observed sample (Relative, %). By country and sex.

                   Women                                          Men
Code  Country      Observed  Full  Difference  SE   Relative (%)  Observed  Full  Difference  SE   Relative (%)
AT    Austria      15.7      15.7  0.0         0.7  0.1           15.4      15.3  0.1         0.5  0.6
BE    Belgium      15.8      16.1  −0.3        0.3  2.0           16.1      15.9  0.3         0.3  1.6
CH    Switzerland  21.7      21.8  −0.1        0.9  0.5           21.7      21.6  0.1         0.6  0.4
CZ    Czech Rep.   13.4      12.2  1.2**       0.5  9.1           14.0      13.3  0.7*        0.4  5.1
DE    Germany      13.1      12.9  0.3         0.3  2.0           13.4      12.9  0.5         0.3  3.6
DK    Denmark      18.2      18.2  0.0         0.4  0.1           18.6      18.9  −0.3        0.4  1.7
EE    Estonia      10.9      11.1  −0.1        0.3  1.2           13.1      13.0  0.1         0.3  0.5
ES    Spain        19.3      19.7  −0.4        0.5  2.0           19.9      19.9  0.0         0.4  0.1
FR    France       16.9      16.3  0.6         0.4  3.5           19.0      18.0  1.1***      0.4  5.6
GR    Greece       22.6      22.5  0.1         0.3  0.7           24.5      23.7  0.8**       0.3  3.2
HR    Croatia      12.7      13.0  −0.2        0.4  1.9           13.2      13.7  −0.4        0.4  3.3
HU    Hungary      14.3      14.0  0.4         0.4  2.5           16.1      15.6  0.5         0.4  3.2
IT    Italy        20.5      20.1  0.4         0.3  1.7           20.0      19.6  0.4         0.3  1.9
LU    Luxembourg   16.3      16.3  −0.0        0.6  0.2           16.5      16.6  −0.1        0.7  0.6
PL    Poland       11.1      10.8  0.2         0.3  2.2           12.6      12.8  −0.2        0.3  1.6
PT    Portugal     15.9      15.4  0.5         0.5  2.9           13.3      12.4  0.9         0.7  6.4
SE    Sweden       18.9      19.4  −0.5        0.5  2.7           18.1      17.1  1.0**       0.4  5.3
SI    Slovenia     14.5      14.3  0.3         0.4  1.9           16.2      15.9  0.3         0.4  1.9

***p < 0.01, **p < 0.05, *p < 0.1.

Source: Authors’ estimations based on Börsch-Supan (2020); Eurostat (2022).

In Table 3, we compare the ranking of the SHARE countries by HLY at age 50 estimated from the two datasets. As expected, the ranking based on the observed datasets differs from that based on the full samples. For men, we observe more changes in the rankings, reflecting the larger number of significant differences in HLY between the two samples than for women. Most of the changes occur in the lower positions of the ranking, i.e., at lower health expectancy values. The changes arise because countries whose HLY is overestimated in the observed data move down the ranking based on the full samples (compare Table 2). These countries are France, Portugal, the Czech Republic, and Poland for women, and Italy, France, the Czech Republic, Germany, and Portugal for men.
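Rank changes of this kind can be computed directly from the two sets of HLY values. The sketch below uses the men's HLY values from Table 2 for five countries only, so the positions are ranks within that subset, not the 18-country ranks of Table 3; breaking ties by input order is our simplification.

```python
from typing import Dict


def ranks(hly: Dict[str, float]) -> Dict[str, int]:
    """Rank countries by HLY, 1 = highest; ties keep their input order."""
    ordered = sorted(hly, key=hly.get, reverse=True)  # sorted() is stable even with reverse=True
    return {country: position for position, country in enumerate(ordered, start=1)}


def rank_shifts(observed: Dict[str, float], full: Dict[str, float]) -> Dict[str, int]:
    """Positive value = the country moves down when ranked on the full sample."""
    r_obs, r_full = ranks(observed), ranks(full)
    return {c: r_full[c] - r_obs[c] for c in observed}


# Men, HLY at age 50 from Table 2 (subset of countries)
observed_men = {"IT": 20.0, "ES": 19.9, "FR": 19.0, "DK": 18.6, "SE": 18.1}
full_men = {"IT": 19.6, "ES": 19.9, "FR": 18.0, "DK": 18.9, "SE": 17.1}
print(rank_shifts(observed_men, full_men))
```

Within this subset, Italy and France move down one position each, in line with the countries listed above for men.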

Table 3.

Ranking of the SHARE countries by Healthy Life Years at age 50 (HLY) based on the observed cross-sectional wave 7 datasets (Observed) and on the full panel samples in wave 7 with imputed health status for those who attrited before wave 7 (Full). By country and sex.

Source: Authors’ own estimations based on Börsch-Supan (2020); Eurostat (2022).


5. Summary and discussion

Cross-sectional datasets from the Survey of Health, Ageing and Retirement in Europe (SHARE) are the most common source of information in comparative studies of population health in European countries. Except for each country's first wave, and because refreshment samples are drawn irregularly, SHARE cross-sectional datasets consist largely of attrited longitudinal samples. As health status and socio-demographic characteristics such as age, sex, educational attainment, and marital status are well-recognized determinants of attrition, SHARE longitudinal samples, and hence the cross-sectional datasets based on them, are likely non-representative and selected on the variable of interest in population health studies. We studied the development of the panel samples according to the respondents' health status. We examined to what extent health-related attrition in SHARE influences cross-sectional estimates of population health and changes the ranking of countries in comparative studies. To do so, we compared the prevalence of full health and estimates of health expectancy in the wave 7 cross-sectional datasets with values estimated from simulated full samples, in which the health status of attrited respondents is imputed by microsimulation.

In most countries, cross-sectional datasets are selected on health status: we demonstrated that health status has a significant effect on attrition from the SHARE longitudinal samples. The direction of the effect, however, differs between younger and older age groups. In most countries, at younger ages (50–69), attrition is higher among those in full health. At older ages, those in full health are more likely to remain in the sample than respondents with activity limitations as measured by GALI. As the panel samples age, it is likely that in some countries the effect observed in younger age groups, where health limitations lower the odds of attrition, is offset at older ages and is hence no longer present in the cross-sectional data: the younger respondents of the early waves, among whom health limitations are over-represented, reach older ages, at which they are more likely to attrit or die. This explains why the gap in the prevalence of full health and in HLY between the two samples is small, or even negative, for some countries. The results of this study concerning the development of the panel samples are consistent with those of previous studies. For example, Di Gessa and Grundy (2014) demonstrated an increased risk of attrition between SHARE waves 1 and 2 among respondents with less than good self-rated health in Denmark, France, Italy, and England. Stolz, Mayerl, Rásky, and Freidl (2018) showed that, in nine European countries, those who attrited from the SHARE sample between waves 1 and 6 were considerably frailer than respondents who remained in the sample, and that the effect of frailty on attrition risk did not increase linearly with age.
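To make the age-dependent direction of the effect concrete, the sketch below computes a crude odds ratio of attrition for respondents with a GALI limitation versus those in full health within two age groups, with a Woolf (log-scale) confidence interval. The counts are invented for illustration, and this unadjusted 2×2 comparison is not the model used in the paper; it ignores covariates such as sex, education, and marital status.

```python
import math


def attrition_odds_ratio(lim_attrited, lim_retained, full_attrited, full_retained, z=1.96):
    """Odds of attrition for respondents with a GALI limitation relative to those in full
    health, with a Woolf (log-scale) 95% confidence interval. All four counts must be > 0."""
    odds_ratio = (lim_attrited * full_retained) / (lim_retained * full_attrited)
    se_log = math.sqrt(1 / lim_attrited + 1 / lim_retained + 1 / full_attrited + 1 / full_retained)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts (not SHARE data): attrited / retained between two waves, by health status.
young = attrition_odds_ratio(40, 160, 150, 350)  # age 50-69
old = attrition_odds_ratio(90, 110, 30, 70)      # age 80+
print(f"50-69: OR={young[0]:.2f}  80+: OR={old[0]:.2f}")  # 50-69: OR=0.58  80+: OR=1.91
```

An odds ratio below 1 in the younger group and above 1 in the oldest group reproduces, in miniature, the reversal described above.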

In the cross-sectional datasets, a persistent pattern was documented in many countries, particularly for men: the older the age group, the higher the ratio of the prevalence of full health in the observed cross-sectional datasets to that in the full samples. In many of the study countries, the observed data even underestimated the prevalence of full health in younger age groups. In older age groups, particularly at age 80+, the prevalence of full health is likely overestimated. The gap in HLY is consistently linked to this overestimation of the prevalence of full health in older age groups. We demonstrated that the attrited cross-sectional SHARE datasets of wave 7 overestimate the prevalence of full health at older ages, and hence HLY, in most countries. In a few countries, HLY based on the observed data is underestimated, because the lower prevalence of full health in the observed samples at younger ages is not counterbalanced by the opposite effect at older ages. For both sexes, the rankings of countries by HLY in the observed data and in the full samples differ in line with the observed differences in HLY between the two samples.
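A minimal sketch of why over- and underestimation at different ages can partly cancel in HLY, assuming HLY is computed with the Sullivan method (Sullivan, 1971, cited in the references): HLY at 50 is the sum over age groups of life-table person-years L_x weighted by the prevalence of full health π_x, divided by the survivors at age 50, HLY_50 = Σ_x L_x·π_x / l_50. The three broad age groups, the life table, and both prevalence profiles below are hypothetical.

```python
def sullivan_hly(person_years, prevalence_full_health, survivors_at_50):
    """Sullivan-method HLY at age 50.

    person_years[i]: life-table person-years lived in age group i (L_x; last group open-ended).
    prevalence_full_health[i]: share without a GALI limitation in age group i.
    survivors_at_50: life-table survivors at exact age 50 (l_50).
    """
    if len(person_years) != len(prevalence_full_health):
        raise ValueError("one prevalence value is needed per age group")
    healthy_years = sum(L * p for L, p in zip(person_years, prevalence_full_health))
    return healthy_years / survivors_at_50


# Hypothetical life table with age groups 50-64, 65-79, 80+ and l_50 = 100,000.
L_x = [1_400_000, 1_100_000, 500_000]
prev_full_sample = [0.60, 0.40, 0.20]
prev_observed = [0.58, 0.42, 0.26]  # too low at 50-64, too high at older ages

hly_from_full = sullivan_hly(L_x, prev_full_sample, 100_000)
hly_from_observed = sullivan_hly(L_x, prev_observed, 100_000)
print(f"{hly_from_observed:.2f} vs {hly_from_full:.2f}")  # 14.04 vs 13.80
```

In this hypothetical case the overstated prevalence at 65+ outweighs the understated prevalence at 50 to 64, so the observed data overstate HLY by about 0.24 years. Had the older-age gap been smaller, the younger-age underestimation would dominate and HLY would be underestimated, as in the few countries noted above.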

We also showed that the pattern of health-related attrition in the SHARE panel sample and its effect on cross-sectional estimates are not uniform across countries and differ between sexes. Bergmann, Scherpenzeel, and Börsch-Supan (2019) likewise found that attrition patterns between SHARE waves vary across countries and sexes. Differing attrition patterns across European countries have also been reported for other surveys: for the European Community Household Panel by Behr, Bellgardt, and Rendtel (2005), and for the European Union Statistics on Income and Living Conditions by Greulich and Dasré (2017).

In this study, the health status of respondents missing from the panel is simulated under the assumption that the transition rates of the attrited respondents between two waves are identical to those of the respondents observed in both waves. This assumption implies that the difference between the observed cross-sectional data and the full sample at wave 7 results only from compositional differences, at the last observation, in health status and other characteristics between attrited and observed respondents. The same assumption underlies inverse-probability-of-attrition weighting, the most common method for correcting attrition bias in longitudinal samples (Schmidt & Woll, 2017; Seaman & White, 2013; Weuve et al., 2012). Alternative assumptions would allow the transition rates between health states of those who attrited to differ from the observed rates. As demonstrated in the first part of this study, such a difference in transition rates between observed and attrited respondents is likely to depend on age and would hence require a series of additional assumptions.
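The core assumption, that attrited respondents follow the same wave-to-wave transition rates as respondents observed in both waves, can be sketched as a small microsimulation. In the Python sketch below, the transition table `TRANSITIONS`, its age groups, the fixed two-year interval between waves, and the respondents are all hypothetical; the paper's actual model (covariates, estimation method, and treatment of mortality) may differ. Each attrited respondent is carried forward from the last observed GALI state to wave 7 by drawing transitions from the observed-respondent rates.

```python
import random

# Hypothetical transition probabilities between consecutive waves, by age group.
# States follow GALI: "full" = no activity limitation, "limited" = limited; "dead" is absorbing.
# Under the assumption above, these would be estimated from respondents observed in both
# waves and applied unchanged to attrited respondents. The numbers are illustrative only.
TRANSITIONS = {
    "50-69": {"full": {"full": 0.86, "limited": 0.12, "dead": 0.02},
              "limited": {"full": 0.28, "limited": 0.67, "dead": 0.05}},
    "70-79": {"full": {"full": 0.76, "limited": 0.19, "dead": 0.05},
              "limited": {"full": 0.18, "limited": 0.72, "dead": 0.10}},
    "80+": {"full": {"full": 0.58, "limited": 0.28, "dead": 0.14},
            "limited": {"full": 0.10, "limited": 0.68, "dead": 0.22}},
}


def age_group(age):
    """Map an age in years to the hypothetical age groups used in TRANSITIONS."""
    if age < 70:
        return "50-69"
    if age < 80:
        return "70-79"
    return "80+"


def simulate_to_wave(state, age, last_wave, target_wave, rng, years_per_wave=2):
    """Project an attrited respondent from the last observed wave to target_wave.

    years_per_wave is an assumed fixed interval between waves.
    Returns the simulated (state, age) at target_wave.
    """
    for _ in range(target_wave - last_wave):
        if state == "dead":
            break  # death is absorbing; age is no longer advanced
        probs = TRANSITIONS[age_group(age)][state]
        state = rng.choices(list(probs), weights=list(probs.values()))[0]
        age += years_per_wave
    return state, age


def full_health_prevalence(states):
    """Share in full health among simulated survivors."""
    alive = [s for s in states if s != "dead"]
    return sum(s == "full" for s in alive) / len(alive) if alive else float("nan")


rng = random.Random(2022)  # fixed seed so repeated runs give the same draws
# Hypothetical attrited respondents: (last observed GALI state, age, last wave observed).
attrited = [("full", 58, 4), ("limited", 74, 5), ("full", 81, 6), ("limited", 66, 2)]
simulated = [simulate_to_wave(s, a, w, 7, rng) for s, a, w in attrited]
print(simulated, full_health_prevalence([s for s, _ in simulated]))
```

The simulated wave 7 states of attrited respondents would then be pooled with the observed wave 7 respondents to form the full sample; averaging over many runs with different seeds reduces Monte Carlo noise.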

As the main objective of the SHARE study is to compare population health across European countries, great effort is being made to harmonize the data across countries. As we demonstrated in this study, different levels and patterns of health-related attrition across countries result in biases of varying severity in cross-sectional estimates of population health across countries and sexes, potentially affecting many comparative studies. We have taken a first step towards solving the problem by documenting its existence. The simple simulation model applied in this study offers a straightforward way to correct this bias. However, it needs to be tailored to the specifics of each study, i.e., to the cross-sectional wave and the health dimension used. For example, as demonstrated by Beller, Geyer, and Epping (2022) for the German Aging Survey, the effect of decreased health on the risk of attrition differs between health dimensions. This calls for an R package or other software that follows the steps of this study and derives appropriate weights to correct for attrition bias in the panel sample for a specific health variable, wave, sex, and country of interest. As in this study, such a package would use microsimulation to derive the characteristics, at a selected wave, of respondents who attrited before that wave. The multiplier correcting the observed data for attrition bias can then be derived by comparing the distributions of the observed sample and the full dataset (both weighted by the design weight) over the characteristics of interest. In a final step, the reweighted sample should be calibrated to the marginal distributions of the same characteristics in the registered population. The development of this package is beyond the scope of this paper. In this article's Supplementary Material, we provide the correction multipliers derived for the specific attrition problem described in this paper: multipliers that adjust the design weights for panel sample attrition in the SHARE cross-sectional wave 7 datasets, by country, sex, 5-year age group, and health status as measured by GALI.
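The two weighting steps described above can be sketched as follows, under simplifying assumptions: cells are age group × GALI status within one country and sex, the multiplier is the ratio of the design-weighted cell share in the full sample to that in the observed sample, and calibration uses plain raking (iterative proportional fitting) to population totals by age group and sex. The R package anesrake in the reference list implements raking; the Python functions here (`weighted_shares`, `correction_multipliers`, `rake`) are hypothetical helpers, not the authors' code, and all counts and population totals are invented.

```python
from collections import defaultdict


def weighted_shares(records):
    """Design-weighted share of each cell; records are (cell, design_weight) pairs."""
    totals = defaultdict(float)
    for cell, weight in records:
        totals[cell] += weight
    grand_total = sum(totals.values())
    return {cell: total / grand_total for cell, total in totals.items()}


def correction_multipliers(observed, full):
    """Ratio of full-sample to observed-sample share per cell (e.g. age group x GALI status
    within one country and sex). Multiply a respondent's design weight by its cell's value."""
    obs_shares, full_shares = weighted_shares(observed), weighted_shares(full)
    return {cell: full_shares.get(cell, 0.0) / share for cell, share in obs_shares.items()}


def rake(units, margins, max_iter=100, tol=1e-10):
    """Iterative proportional fitting of unit weights ('w') to population margins.

    units: list of dicts holding a weight 'w' and one category per margin variable.
    margins: {variable: {category: population total}}; all margins should share one grand total.
    """
    for _ in range(max_iter):
        largest_change = 0.0
        for var, targets in margins.items():
            current = defaultdict(float)
            for unit in units:
                current[unit[var]] += unit["w"]
            for unit in units:
                factor = targets[unit[var]] / current[unit[var]]
                unit["w"] *= factor
                largest_change = max(largest_change, abs(factor - 1.0))
        if largest_change < tol:
            break
    return units


# Hypothetical design-weighted cells (age group, GALI status) for one country and sex.
observed = [(("50-64", "full"), 60.0), (("50-64", "limited"), 20.0),
            (("65+", "full"), 15.0), (("65+", "limited"), 5.0)]
full = [(("50-64", "full"), 60.0), (("50-64", "limited"), 20.0),
        (("65+", "full"), 12.0), (("65+", "limited"), 8.0)]
multipliers = correction_multipliers(observed, full)
print({cell: round(m, 2) for cell, m in multipliers.items()})

# Calibrate to hypothetical population totals by age group and sex (both margins sum to 1,000).
units = [{"age": "50-64", "sex": "f", "w": 30.0}, {"age": "50-64", "sex": "m", "w": 25.0},
         {"age": "65+", "sex": "f", "w": 25.0}, {"age": "65+", "sex": "m", "w": 20.0}]
rake(units, {"age": {"50-64": 600.0, "65+": 400.0}, "sex": {"f": 520.0, "m": 480.0}})
```

In this toy example the multipliers shrink the weight of older respondents in full health (0.8) and enlarge that of older respondents with limitations (1.6), mirroring the direction of bias reported above; raking then restores the population age and sex totals.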

Funding

MMS was supported by the European Research Council within the EU Framework Programme for Research and Innovation Horizon 2020, ERC Grant Agreement No. 725187 (LETHE).

Declaration of competing interest

None.

Acknowledgments

The authors would like to thank two anonymous reviewers, as this paper benefited enormously from their suggestions.

Footnotes

We confirm that this work is original, has not been published, and is not under consideration for publication in another journal; that its publication is approved by all authors and tacitly by the responsible authorities where the work was carried out; and that, if accepted, it will not be published elsewhere in the same form, in English or in any other language, including electronically, without the written consent of the copyright holder.

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ssmph.2022.101290.

Multimedia component 1
mmc1.csv (24.3KB, csv)

Data availability

The authors do not have permission to share data.

References

  1. Ahern K., Le Brocque R. Methodological issues in the effects of attrition: Simple solutions for social scientists. Field Methods. 2005;17(1):53–69.
  2. Banks J., Muriel A., Smith J.P. Attrition and health in ageing studies: Evidence from ELSA and HRS. Longitudinal and Life Course Studies. 2011;2(2). doi: 10.14301/llcs.v2i2.115.
  3. Behr A., Bellgardt E., Rendtel U. Extent and determinants of panel attrition in the European Community Household Panel. European Sociological Review. 2005;21(5):489–512.
  4. Beller J., Geyer S., Epping J. Health and study dropout: Health aspects differentially predict attrition. BMC Medical Research Methodology. 2022;22(1):1–10. doi: 10.1186/s12874-022-01508-w.
  5. Berger N., Van Oyen H., Cambois E., Fouweather T., Jagger C., Nusselder W., et al. Assessing the validity of the global activity limitation indicator in fourteen European countries. BMC Medical Research Methodology. 2015;15(1):1. doi: 10.1186/1471-2288-15-1.
  6. Bergmann M., Kneip T., De Luca G., Scherpenzeel A. Survey participation in the Survey of Health, Ageing and Retirement in Europe (SHARE), wave 1–7. Munich: Munich Center for the Economics of Aging; 2019.
  7. Börsch-Supan A. Survey of Health, Ageing and Retirement in Europe (SHARE) waves 1–7. Release version: 7.1.0. Data set. SHARE-ERIC; 2020.
  8. Börsch-Supan A., Brandt M., Hunkler C., Kneip T., Korbmacher J., Malter F., et al. Data resource profile: The Survey of Health, Ageing and Retirement in Europe (SHARE). International Journal of Epidemiology. 2013;42(4):992–1001. doi: 10.1093/ije/dyt088.
  9. Chatfield M.D., Brayne C.E., Matthews F.E. A systematic literature review of attrition between waves in longitudinal studies in the elderly shows a consistent pattern of dropout between differing studies. Journal of Clinical Epidemiology. 2005;58(1):13–19. doi: 10.1016/j.jclinepi.2004.05.006.
  10. De Luca G., Claudio R. Weights and imputations. SHARE release guide; 2019.
  11. Deng Y., Hillygus D.S., Reiter J.P., Si Y., Zheng S. Handling attrition in longitudinal studies: The case for refreshment samples. Statistical Science. 2013;28(2):238–256.
  12. Desmond D.W., Bagiella E., Moroney J.T., Stern Y. The effect of patient attrition on estimates of the frequency of dementia following stroke. Archives of Neurology. 1998;55(3):390–394. doi: 10.1001/archneur.55.3.390.
  13. Di Gessa G., Grundy E. The relationship between active ageing and health using longitudinal data from Denmark, France, Italy and England. Journal of Epidemiology & Community Health. 2014;68(3):261–267. doi: 10.1136/jech-2013-202820.
  14. Eurostat. Eurostat database. 2022. Available online at https://ec.europa.eu/eurostat/data/database.
  15. Friedel S., Birkenbach T. Evolution of the initially recruited SHARE panel sample over the first six waves. Journal of Official Statistics. 2020;36(3):507–527.
  16. Goodman J.S., Blum T.C. Assessing the non-random sampling effects of subject attrition in longitudinal research. Journal of Management. 1996;22(4):627–652.
  17. Graaf R.d., Bijl R.V., Smit F., Ravelli A., Vollebergh W.A. Psychiatric and sociodemographic predictors of attrition in a longitudinal study: The Netherlands Mental Health Survey and Incidence Study (NEMESIS). American Journal of Epidemiology. 2000;152(11):1039–1047. doi: 10.1093/aje/152.11.1039.
  18. Greulich A., Dasré A. The quality of periodic fertility measures in EU-SILC. Demographic Research. 2017;36:525–556.
  19. Hanley J.A., Negassa A., Edwardes M.D.d., Forrester J.E. Statistical analysis of correlated data using generalized estimating equations: An orientation. American Journal of Epidemiology. 2003;157(4):364–375. doi: 10.1093/aje/kwf215.
  20. Hardin J.W., Hilbe J.M. Generalized estimating equations. Chapman and Hall/CRC; 2002.
  21. Hoeymans N., Feskens E.J., Van Den Bos G.A., Kromhout D. Non-response bias in a study of cardiovascular diseases, functional status and self-rated health among elderly men. Age and Ageing. 1998;27(1):35–40. doi: 10.1093/ageing/27.1.35.
  22. Højsgaard S., Halekoh U., Yan J. The R package geepack for generalized estimating equations. Journal of Statistical Software. 2005;15(1):1–11.
  23. Jagger C., Gillies C., Cambois E., Van Oyen H., Nusselder W., Robine J.-M., et al. The Global Activity Limitation Index measured function and disability similarly across European countries. Journal of Clinical Epidemiology. 2010;63(8):892–899. doi: 10.1016/j.jclinepi.2009.11.002.
  24. Katz D., Baptista J., Azen S., Pike M. Obtaining confidence intervals for the risk ratio in cohort studies. Biometrics. 1978:469–474.
  25. Levin B.E., Katzen H.L., Klein B., Llabre M.L. Cognitive decline affects subject attrition in longitudinal research. Journal of Clinical and Experimental Neuropsychology. 2000;22(5):580–586. doi: 10.1076/1380-3395(200010)22:5;1-9;FT580.
  26. Liang K.-Y., Zeger S.L. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22.
  27. Lillard L.A., Panis C.W. Panel attrition from the Panel Study of Income Dynamics: Household income, marital status, and mortality. Journal of Human Resources. 1998:437–457.
  28. Michaud P.-C., Kapteyn A., Smith J.P., Van Soest A. Temporary and permanent unit non-response in follow-up interviews of the Health and Retirement Study. Longitudinal and Life Course Studies. 2011;2(2):145–169.
  29. Mihelic A.H., Crimmins E.M. Loss to follow-up in a sample of Americans 70 years of age and older: The LSOA 1984–1990. Journals of Gerontology Series B: Psychological Sciences and Social Sciences. 1997;52(1):S37–S48. doi: 10.1093/geronb/52b.1.s37.
  30. Pasek J., Pasek M.J. Package 'anesrake'. The Comprehensive R Archive Network; 2018.
  31. Ripley B., Venables W. Package 'nnet'. R package version 7.3-12; 2016.
  32. Schmidt S.C., Woll A. Longitudinal drop-out and weighting against its bias. BMC Medical Research Methodology. 2017;17(1):1–11. doi: 10.1186/s12874-017-0446-x.
  33. Seaman S.R., White I.R. Review of inverse probability weighting for dealing with missing data. Statistical Methods in Medical Research. 2013;22(3):278–295. doi: 10.1177/0962280210395740.
  34. Sharma S.K., Tobin J.D., Brant L.J. Factors affecting attrition in the Baltimore Longitudinal Study of Aging. Experimental Gerontology. 1986;21(4–5):329–340. doi: 10.1016/0531-5565(86)90040-9.
  35. Smith P., Lynn P., Elliot D. Sample design for longitudinal surveys. In: Methodology of longitudinal surveys. 2009:21–33.
  36. Stolz E., Mayerl H., Rásky É., Freidl W. Does sample attrition affect the assessment of frailty trajectories among older adults? A joint model approach. Gerontology. 2018;64:430–439. doi: 10.1159/000489335.
  37. Sullivan D.F. A single index of mortality and morbidity. HSMHA Health Reports. 1971;86(4):347.
  38. Van Oyen H., Van der Heyden J., Perenboom R., Jagger C. Monitoring population disability: Evaluation of a new global activity limitation indicator (GALI). Sozial- und Präventivmedizin. 2006;51(3):153–161. doi: 10.1007/s00038-006-0035-y.
  39. Weuve J., Tchetgen E.J.T., Glymour M.M., Beck T.L., Aggarwal N.T., Wilson R.S.…de Leon C.F.M. Accounting for bias due to selective attrition: The example of smoking and cognitive decline. Epidemiology. 2012;23(1):119. doi: 10.1097/EDE.0b013e318230e861.
  40. Wolf D.A. The role of microsimulation in longitudinal data analysis. Canadian Studies in Population. 2001:313–339.
  40. Wolf D.A. Canadian Studies in Population [ARCHIVES]; 2001. The role of microsimulation in longitudinal data analysis; pp. 313–339. [Google Scholar]
