Author manuscript; available in PMC: 2015 Mar 1.
Published in final edited form as: Health Place. 2013 Dec 10;26:31–38. doi: 10.1016/j.healthplace.2013.12.002

Self-rated health: Small area large area comparisons amongst older adults at the state, district and sub-district level in India

Siddhivinayak Hirve b,c,*, Penelope Vounatsou a, Sanjay Juvekar b, Yulia Blomstedt c, Stig Wall c, Somnath Chatterji d, Nawi Ng c
PMCID: PMC3944101  NIHMSID: NIHMS547872  PMID: 24361576

Abstract

Using four small area estimation methods, we derived indirect prevalence estimates of self-rated health (SRH) for the Vadu (small) area from the national Study on Global AGEing (SAGE) survey, and compared them with estimates derived directly from the Vadu SAGE survey.

The indirect synthetic estimate for Vadu was 24%, whereas the model-based estimates were 45.6% and 45.7%. The model-based estimates had smaller prediction errors and were comparable to the direct survey estimate of 50%.

The model-based techniques were better suited to estimating the prevalence of SRH than the indirect synthetic method. We conclude that a simplified mixed effects regression model can produce valid small area estimates of SRH.

Keywords: Small area estimation, Self-rated health, Empirical best linear unbiased prediction, Hierarchical Bayes estimation

1. Introduction

Nationally representative large area health surveys are valuable sources of information used by countries for health planning and evaluation. However, logistical and financial constraints limit the sample size of such surveys, so information aggregated at the national or sub-national/state (large area) level often cannot be carried down to smaller areas: the sample is usually too small to estimate health indicators with any useful precision at the district or sub-district (small area) level. A domain (area) is considered large if the domain-specific survey sample is large enough to yield ‘direct estimates’ of adequate precision, and ‘small’ otherwise. Large area surveys, though rich in detailed health information, are therefore of limited value to local health agencies at the district or sub-district level for formulating and evaluating policies and programs and for allocating resources (Douglas et al., 2001; Paul-Shaheen et al., 1987). At the same time, demand for small area statistics has greatly increased in recent years because of decentralized health micro-planning and decision-making in India and elsewhere. Small area estimation (SAE) methods are statistical procedures ranging from simple design-based direct estimates to complex model-based estimates. Model-based estimates ‘borrow strength’ by using information on the variable of interest from other similar small areas, or from data collected in the same area in the past, and thus increase the effective sample size of the small area. This information is combined in the estimation process through a model that links the related small areas via auxiliary information, most often census data available at the small area level (Ghosh and Rao, 1994).

Many SAE methods were pioneered in the USA (Ericksen, 1974; Fay and Herriot, 1979; Kalton et al., 1993; Levy, 1979; Platek and Singh, 1986) and more recently in the UK (Bajekal et al., 2004; Martuzzi and Elliott, 1996; Twigg et al., 2000). SAE methods are broadly classified according to the data source from which they borrow information: cross-sectional data from other similar areas, past data from the same area, or both. They are further classified by type of inference as ‘design-based’ or ‘model-based’, and by whether estimation follows a ‘frequentist’ or a Bayesian approach (Pfeffermann, 2002). Alternatively, these estimators are categorized into three groups: (i) direct estimates, derived from area-level data without any modeling; (ii) synthetic or indirect estimates, derived by combining survey and auxiliary data through some type of regression model; and (iii) composite estimates, derived by combining direct and indirect estimates (Raffle, 2008). A number of studies have recently applied SAE methods at the county or small area level to estimate disease counts and prevalence, including diabetes (Congdon, 2006a), heart disease and stroke (Schwartz et al., 2009), psychiatric illness (Congdon, 2006b; Hudson, 2009), asthma (Mendez-Luck et al., 2007), chlamydial infection (Thomas and Nandram, 2010), dental caries (Leroux et al., 1996), rare outcomes such as birth defects (Earnest et al., 2010), and disability (Jia et al., 2004). SAE methods have also been used to estimate risky health behaviors such as smoking (Cui et al., 2012; Li et al., 2009b) and alcohol use (Twigg and Moon, 2002), to estimate obesity (Li et al., 2009a; Xie et al., 2007; Zhang et al., 2011), and to prioritize communities with high under-five mortality rates (Asiimwe et al., 2011) and breast cancer (Knutson et al., 2008) for targeted public health interventions.
Other studies have applied SAE techniques to estimate the unmet need for contraception (Amoako Johnson et al., 2012) and institutional births (Amoako Johnson et al., 2010), and to monitor vaccination coverage (Eberth et al., 2013; Jia et al., 2006). SAE methods have also been used to study geographical disparities in disease (Hudson and Soskolne, 2012; Schneider et al., 2009), inequities in income and poverty (Elbers et al., 2003), the ecological relationship between inequity and illness (Curtis et al., 2006), health insurance coverage (Pickle and Su, 2002; Yu et al., 2007) and homelessness (Hudson and Vissing, 2010).
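The third group, composite estimates, can be made concrete with a short sketch. Assuming the direct estimate is unbiased with a known sampling variance, the synthetic estimate has a known mean square error (MSE), and the two errors are independent, the MSE-minimizing weight on the direct estimate is the synthetic MSE divided by the sum of the two. The function name and the numbers below are hypothetical illustrations, not the procedure of any study cited here.

```python
def composite_estimate(direct, var_direct, synthetic, mse_synthetic):
    """Hypothetical composite small area estimator.

    Weights the direct estimate by w = mse_synthetic / (var_direct + mse_synthetic),
    which minimises the composite MSE when the direct estimate is unbiased
    and the two errors are independent (assumptions of this sketch).
    """
    w = mse_synthetic / (var_direct + mse_synthetic)
    return w * direct + (1.0 - w) * synthetic, w


# Illustrative numbers only: a noisy direct estimate and a stable synthetic one.
estimate, weight = composite_estimate(direct=0.40, var_direct=0.01,
                                      synthetic=0.24, mse_synthetic=0.0025)
print(round(weight, 3), round(estimate, 3))  # 0.2 0.272
```

A noisy direct estimate (large variance) pushes the weight toward the synthetic estimate, which is the intuition behind all three families above.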

Self-rated health (SRH) is a widely used measure based on a person’s self-assessment of their health status in response to a global health question such as “In general, how would you rate your health today?” (Fayers and Sprangers, 2002; Lundberg and Manderbacka, 1996; World Health Organization, 1996). Although non-specific, SRH is a surprisingly reliable measure that captures a person’s perception of their own health and complements other, more specific measures of health. It has been used in surveys to assess the health status of populations and to predict health outcomes, survival, impending morbidity and death (Blazer, 2008; Hirve et al., 2012; Idler and Benyamini, 1997; Jylha, 2009). A strong association between poor SRH and the risk of all-cause and disease-specific mortality, independent of age, sex, income, education, social environment, health behaviors and chronic illness, has been consistently reported (Benjamins et al., 2004; Benyamini and Idler, 1999; Frankenberg and Jones, 2004; Heidrich et al., 2002; Ishizaki et al., 2006; Kaplan and Camacho, 1983; Mossey and Shapiro, 1982; Ng et al., 2012; Nielsen et al., 2008; Yu et al., 1998). An often overlooked problem with SRH is interpersonal incomparability. When an individual chooses a discrete response on an ordinal scale, the response is analyzed on the assumption that it reflects their true health on an underlying latent interval scale. However, different individuals use different thresholds to map their perceived health onto the response categories. Unless this reporting heterogeneity is recognized and corrected for, it can lead to misleading comparisons (Banks et al., 2007).
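The threshold argument can be illustrated with a toy example. Two respondents with the same latent health but different cut-points give different ordinal answers, and after the ‘good’ versus ‘poor’ dichotomization used in this paper they fall on opposite sides. The latent value and all cut-points below are invented for illustration.

```python
from bisect import bisect_right

# Five SRH response categories, from worst to best.
CATEGORIES = ["very poor", "poor", "fair", "good", "very good"]


def report_srh(latent_health, cutpoints):
    """Map a latent health value to an ordinal SRH answer.

    cutpoints: four increasing thresholds separating the five categories
    (hypothetical values; each respondent may use their own).
    """
    return CATEGORIES[bisect_right(cutpoints, latent_health)]


latent = 0.3                          # same true health for both respondents
lenient = [-1.5, -0.5, 0.0, 1.5]      # respondent A's (hypothetical) thresholds
strict = [-1.0, 0.0, 0.5, 2.0]        # respondent B's (hypothetical) thresholds
print(report_srh(latent, lenient))    # good
print(report_srh(latent, strict))     # fair
```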

The Study on global AGEing and adult health (SAGE) is a multi-country study, conducted under the aegis of the World Health Organization (WHO), that aims to improve understanding of the health and well-being of adults aged 50 years and older in low- and middle-income countries (Kowal et al., 2012). It includes a nationally representative survey implemented in India in 2007–2008. The SAGE survey was designed to provide estimates with adequate precision at the national and sub-national (state) level, but its design did not allow adequately precise estimation at the district or lower level. In addition to the national SAGE survey, an identical version was implemented in 2007 in a small rural population of about 100,000 under health and demographic surveillance (Vadu HDSS), spread over 22 villages in Pune district, India, as part of a collaboration between SAGE and the International Network for the Demographic Evaluation of Populations and their Health (INDEPTH Network). In this paper, we derive SRH estimates for Vadu (small area) from the national WHO-SAGE survey using different SAE methods, and validate these small area estimates against direct survey estimates from the INDEPTH-SAGE survey implemented in Vadu.

2. Methods

2.1. Ethics statement

The WHO SAGE survey was approved by the Ethics Review Committee of the WHO, Geneva, and by the respective Ethics Committees of KEM Hospital Research Center, Pune, which implemented the Vadu SAGE survey, and the International Institute of Population Sciences, Mumbai, which implemented the national SAGE survey. All individuals gave written informed consent before participating in the study.

2.2. Data sources

We used four sources of data for our analysis: (1) the WHO-SAGE survey dataset for India, (2) the INDEPTH-SAGE survey dataset for Vadu, (3) the Census of India 2011, and (4) the Vadu HDSS dataset for 2007. The WHO-SAGE survey in India used a multi-stage, stratified cluster sample design (He et al., 2012). The sample was drawn from 19 of the 28 states and 7 union territories. The 19 states were categorized into six groups based on four indicators: infant mortality, female literacy, per capita income and safe deliveries. One state was randomly selected from each group. The sample was further stratified by urban or rural locality, and individual weights were post-stratified according to 2006 projected population estimates. For our analysis, we included only the sample from the rural stratum of one of the six selected states, Maharashtra. Maharashtra has a population of about 9 million people aged 50 years and above living in about 44,000 villages in 35 districts. The WHO-SAGE survey was administered to a sample of 630 people from this population living in rural areas of 21 districts of the state; five individuals were excluded because their district of residence was missing, leaving 625. The Vadu demographic surveillance area, comprising 22 villages in Pune district, had a population of about 9800 aged 50 years and over. The INDEPTH-SAGE survey used simple random sampling to select 500 individuals from a list of 9801 individuals aged 50 years and older, generated from the most recent (2006) HDSS dataset. Each selected individual was assigned a sampling weight equal to the inverse of their probability of selection. Provisional population counts disaggregated to the district level for Maharashtra were downloaded from the Census of India 2011 website. We tabulated census population counts for each district for the age groups 50–59y, 60–69y, 70–79y and 80+y, separately for men and women living in rural areas of Maharashtra. We similarly tabulated population counts for Vadu from the HDSS dataset, stratified by age and sex.
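Under the simple random sample described above, every selected individual has the same selection probability (500/9801), so every sampling weight equals 9801/500 = 19.602 and the weighted prevalence coincides with the unweighted one. The sketch below shows the weighted direct estimator in its general form. The function names are hypothetical, and any non-response adjustment is deliberately left out because none is described here.

```python
def srs_weight(population_size, sample_size):
    """Inverse of the selection probability under simple random sampling."""
    return population_size / sample_size


def weighted_prevalence(outcomes, weights):
    """Weighted direct estimate sum(w * y) / sum(w) for binary outcomes y."""
    total_w = sum(weights)
    return sum(w * y for w, y in zip(weights, outcomes)) / total_w


w = srs_weight(9801, 500)             # same weight for every Vadu respondent
print(w)                              # 19.602
# Hypothetical good-SRH indicators (1 = good) for four respondents.
print(weighted_prevalence([1, 0, 1, 1], [w] * 4))
```

With unequal weights (for example the post-stratified WHO-SAGE weights) the same function gives a different answer from the simple proportion.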

The national and INDEPTH-SAGE survey datasets included variables on demographic and socio-economic characteristics, individual reports of health status and subjective well-being, and social networking. Wealth quintiles were derived from household ownership of assets, dwelling characteristics, and access to safe drinking water and sanitation. Responses of very poor, poor or fair on the original five-category Likert scale for SRH were recoded as ‘poor SRH’, and responses of good or very good as ‘good SRH’. We constructed the score for the shortened version of the WHO Disability Assessment Schedule (WHODAS-II) from twelve questions assessing six domains of functioning in daily life (understanding and communicating, getting around, self-care, getting along with others, life activities and participation in society) (World Health Organization, 2001). The weighted mean of responses to the twelve questions was transformed into a final score ranging from 0 (no disability) to 100 (extreme disability). The WHO quality of life (WHOQOL) score was the mean of eight responses to questions on satisfaction with health, living conditions and other aspects of life; the mean score ranged from 1 (‘best’) to 5 (‘worst’) quality of life. The district-level female literacy rate, proportion with house ownership, proportion with access to safe drinking water and sanitation, and socio-economic development index were taken from the Census data.
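A minimal sketch of these recoding steps follows. It assumes, for illustration only, that each WHODAS-II item is scored 1 to 5 and that the weighted mean is rescaled linearly to 0–100. The exact item coding and weights are not given here, so the function names, the 1–5 item range and the default equal weights are all hypothetical.

```python
GOOD_SRH = {"good", "very good"}


def dichotomize_srh(response):
    """Recode the five-category SRH answer into 'good SRH' or 'poor SRH'."""
    return "good SRH" if response.strip().lower() in GOOD_SRH else "poor SRH"


def whodas_score(item_responses, item_weights=None, item_min=1.0, item_max=5.0):
    """Hypothetical 0-100 disability score from twelve item responses.

    Assumes items coded item_min..item_max (item_min = no difficulty) and a
    linear rescaling of the weighted mean: 0 = no disability, 100 = extreme.
    """
    if item_weights is None:
        item_weights = [1.0] * len(item_responses)  # equal weights by default
    mean = sum(w * r for w, r in zip(item_weights, item_responses)) / sum(item_weights)
    return 100.0 * (mean - item_min) / (item_max - item_min)


print(dichotomize_srh("Fair"))       # poor SRH
print(dichotomize_srh("Very good"))  # good SRH
print(whodas_score([3] * 12))        # 50.0
```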

2.3. Data analysis

Our outcome measure is the prevalence of ‘good SRH’. We used four SAE approaches (a basic indirect synthetic estimate, two random effects regression estimates and a hierarchical Bayesian estimate) to estimate this prevalence at the district/sub-district level. We used the national SAGE survey and the Census of India dataset to generate an indirect synthetic estimate for each district, and the Vadu HDSS dataset to generate the corresponding estimate for the Vadu demographic surveillance area.

2.3.1. Basic indirect synthetic estimate

We grouped the SAGE survey participants into eight demographic groups (k = 1, …, 8) by sex and age. We computed the estimated prevalence for district j, $\hat{p}_j$, as:

$$\hat{p}_j = \sum_{k=1}^{8} \frac{n_{jk}}{n_j}\,\hat{p}_{\cdot k}$$

where $n_{jk}$ is the census count in demographic group k in district j, $n_j$ is the total census count in district j, and $\hat{p}_{\cdot k}$ is the estimated prevalence for demographic group k at the state level. A 95% credible interval was estimated via Markov chain Monte Carlo (MCMC) simulation.
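The synthetic estimator is a census-weighted average of state-level group prevalences. The sketch below computes it and, as a simpler stand-in for the MCMC step, draws each group prevalence from an independent Beta posterior under a uniform prior and propagates the draws through the same weighted sum. The Beta-posterior simulation, the function names and the example counts are assumptions made for illustration, and survey weights are ignored.

```python
import numpy as np


def synthetic_estimate(census_counts, group_prevalence):
    """p_hat_j = sum_k (n_jk / n_j) * p_hat_.k for one district j."""
    counts = np.asarray(census_counts, dtype=float)
    return float(np.dot(counts / counts.sum(), group_prevalence))


def synthetic_interval(census_counts, successes, trials, draws=20000, seed=0):
    """95% simulation interval for the synthetic estimate.

    Each state-level group prevalence is drawn from Beta(1 + y_k, 1 + n_k - y_k),
    its posterior under a uniform prior (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(successes, dtype=float)
    n = np.asarray(trials, dtype=float)
    counts = np.asarray(census_counts, dtype=float)
    group_draws = rng.beta(1 + y, 1 + n - y, size=(draws, len(y)))
    district_draws = group_draws @ (counts / counts.sum())
    return np.percentile(district_draws, [2.5, 97.5])


# Hypothetical district with two groups (the paper uses eight age-sex groups).
print(synthetic_estimate([100, 300], [0.2, 0.4]))   # about 0.35
print(synthetic_interval([100, 300], successes=[20, 80], trials=[100, 200]))
```

Because only the census shares differ between districts, the estimates barely vary across districts, which is the behavior reported for the synthetic method in the Results.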

2.3.2. Model based regression estimate

We used two routines in Stata v11, xtmelogit and generalized linear latent and mixed models (GLLAMM), to fit a random effects model:

$$\operatorname{logit}(p_{ij}) = X_{ij}B + u_j$$

where $X_{ij}$ is the vector of significant individual-level covariates (age, sex, disability, quality of life and social networking) for individual i in district j, B is the vector of corresponding fixed effects, and $u_j$ is the residual (random effect) for district j. From the fitted model we predicted $p_{ij}$ for each individual, computed the expected prevalence $\hat{p}_{jk}$ for each demographic group k in each district j, and from these the expected district-specific prevalence $\hat{p}_j$. The random effect term was included when fitting the model, but the individual values of the random effects were excluded when computing the expected prevalence estimates. After estimation, both routines provide empirical best linear unbiased predictions (EBLUPs) of the random effects, with their prediction standard errors, given the observed data.
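Once the fixed effects are estimated, the expected prevalence can be computed outside Stata. The sketch below predicts $p_{ij}$ from the fixed part of the model only (setting $u_j = 0$, mirroring the exclusion of individual random-effect values described above), averages predictions within each demographic group, and then weights the group means by census shares as in the synthetic estimator. That final census weighting, the function name and the toy inputs are assumptions; the text does not spell out how the group estimates were combined.

```python
import numpy as np


def expected_prevalence(X, beta, groups, census_counts, u_j=0.0):
    """Expected district prevalence from a random-intercept logistic model.

    X: (n, p) covariates of the district's respondents, first column = 1.
    beta: (p,) fixed-effect estimates B.
    groups: (n,) demographic group index k in 0..K-1 for each respondent.
    census_counts: (K,) census counts n_jk (every group needs >= 1 respondent).
    u_j: district random effect; 0 gives the fixed-part prediction.
    """
    eta = np.asarray(X, float) @ np.asarray(beta, float) + u_j
    p_ij = 1.0 / (1.0 + np.exp(-eta))            # inverse logit
    groups = np.asarray(groups)
    counts = np.asarray(census_counts, float)
    p_jk = np.array([p_ij[groups == k].mean() for k in range(len(counts))])
    return float(counts @ p_jk / counts.sum()), p_jk


# Toy example: intercept plus one binary covariate, two demographic groups.
X = np.array([[1, 0], [1, 0], [1, 1], [1, 1]])
beta = [0.0, np.log(3.0)]                        # p = 0.5 and about 0.75
p_j, p_jk = expected_prevalence(X, beta, groups=[0, 0, 1, 1], census_counts=[1, 3])
print(p_jk, p_j)                                 # group means, then about 0.6875
```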

2.3.3. Hierarchical Bayesian (HB) estimate (random effects logistic regression model)

We developed a random effects logistic regression model in WinBUGS software as follows:

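$$\begin{aligned} Y_{ij} &\sim \operatorname{Be}(p_{ij}) \\ \operatorname{logit}(p_{ij}) &= X_{ij}B + u_j \\ u_j &\sim N(0, \sigma_u^2) \end{aligned}$$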

We assumed a Bernoulli distribution with probability $p_{ij}$ for the outcome, and a logistic regression model with a random effect $u_j$ for each district, normally distributed with mean 0 and variance $\sigma_u^2$. We combined the likelihood with non-informative priors to estimate the posterior distributions of the fixed-effect coefficients (B), the random effects and their variance. We used the means of the posterior distributions of the parameters to compute the district-specific prevalence estimates in the same way as for the other approaches.
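WinBUGS fits this model by MCMC. For readers without WinBUGS, the sketch below is a minimal Metropolis-within-Gibbs sampler for the same Bernoulli-logit random-intercept model, written with NumPy. It is not WinBUGS's sampler, and several choices are assumptions: independent N(0, 100²) priors on the elements of B, an inverse-gamma(1, 0.1) prior on $\sigma_u^2$ (weakly informative, to reduce the risk of the variance draws collapsing toward zero), the proposal step size, the iteration counts and the simulated data.

```python
import numpy as np


def bernoulli_logit_loglik(y, eta):
    """Per-observation log-likelihood y*eta - log(1 + exp(eta))."""
    return y * eta - np.logaddexp(0.0, eta)


def hb_random_intercept_logit(y, X, district, n_iter=3000, burn=1000,
                              step=0.1, prior_sd=100.0, a0=1.0, b0=0.1, seed=1):
    """Metropolis-within-Gibbs for Y_ij ~ Be(p_ij), logit p_ij = X_ij B + u_j,
    u_j ~ N(0, sigma_u^2). Returns posterior draws kept after burn-in."""
    rng = np.random.default_rng(seed)
    p, J = X.shape[1], district.max() + 1
    beta, u, sigma2 = np.zeros(p), np.zeros(J), 1.0
    draws = {"beta": [], "u": [], "sigma2": []}
    for it in range(n_iter):
        # 1. Random-walk Metropolis update of B (joint proposal, normal prior).
        eta = X @ beta + u[district]
        prop = beta + step * rng.standard_normal(p)
        eta_prop = X @ prop + u[district]
        log_r = (bernoulli_logit_loglik(y, eta_prop).sum()
                 - bernoulli_logit_loglik(y, eta).sum()
                 - (prop @ prop - beta @ beta) / (2 * prior_sd ** 2))
        if np.log(rng.uniform()) < log_r:
            beta = prop
        # 2. Metropolis update of every u_j at once: given B and sigma_u^2 the
        #    u_j are conditionally independent, so each is accepted separately.
        eta = X @ beta + u[district]
        u_prop = u + step * rng.standard_normal(J)
        eta_prop = X @ beta + u_prop[district]
        ll = np.bincount(district, weights=bernoulli_logit_loglik(y, eta), minlength=J)
        ll_prop = np.bincount(district, weights=bernoulli_logit_loglik(y, eta_prop), minlength=J)
        log_r = ll_prop - ll - (u_prop ** 2 - u ** 2) / (2 * sigma2)
        u = np.where(np.log(rng.uniform(size=J)) < log_r, u_prop, u)
        # 3. Gibbs update: sigma_u^2 | u ~ Inverse-Gamma(a0 + J/2, b0 + sum(u^2)/2).
        sigma2 = 1.0 / rng.gamma(a0 + J / 2, 1.0 / (b0 + u @ u / 2))
        if it >= burn:
            draws["beta"].append(beta)
            draws["u"].append(u)
            draws["sigma2"].append(sigma2)
    return {k: np.array(v) for k, v in draws.items()}


# Simulated data (hypothetical): 10 districts, one standardized covariate.
rng = np.random.default_rng(0)
J, n = 10, 600
district = rng.integers(0, J, size=n)
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
true_u = rng.normal(0.0, 0.5, size=J)
eta = -0.5 + 2.0 * x + true_u[district]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-eta))).astype(float)
post = hb_random_intercept_logit(y, X, district)
print(post["beta"].mean(axis=0), post["sigma2"].mean())
```

Posterior means from such draws would then feed the same prevalence calculation as the other approaches; in practice one would also check convergence (for example with several chains) before using them.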

2.4. Validation

We validated the prevalence estimates generated by the different SAE approaches by comparing them with the prevalence estimated directly from the WHO-SAGE survey data for each district, and from the INDEPTH-SAGE survey data for the Vadu demographic surveillance area.
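The comparison can be summarized with simple agreement statistics across districts. The sketch below computes the Pearson correlation and the mean absolute difference between model-based and direct estimates. These two summaries are chosen here for illustration (the Results report correlations but name no other metric), and the example numbers are hypothetical.

```python
import numpy as np


def agreement(model_estimates, direct_estimates):
    """Pearson correlation and mean absolute difference between two sets of
    district-level prevalence estimates (same district order in both)."""
    m = np.asarray(model_estimates, dtype=float)
    d = np.asarray(direct_estimates, dtype=float)
    r = float(np.corrcoef(m, d)[0, 1])
    mad = float(np.mean(np.abs(m - d)))
    return {"pearson_r": r, "mean_abs_diff": mad}


# Hypothetical prevalence estimates (%) for four districts.
print(agreement([20.0, 25.0, 30.0, 18.0], [22.0, 27.0, 35.0, 15.0]))
```

A high correlation alone does not imply agreement: estimates that are uniformly lower than the direct ones can still correlate strongly, which is why a difference-based summary is useful alongside it.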

3. Results

The INDEPTH-SAGE survey was completed by 321 of the 500 individuals randomly selected from the HDSS dataset (response rate 64%); 54 (11%) refused and 125 (25%) had migrated or could not be traced. Non-responders did not differ significantly from responders in age, sex, education or socioeconomic status.

The age (mean 62 years) and sex (51% men) composition of SAGE participants from Vadu did not differ significantly from that for Maharashtra (Table 1). Individuals from Vadu were significantly better educated (20% had secondary education or more, versus 14% in the state) and more often currently working (68% versus 57%). A significantly higher proportion of individuals from the state (31%) than from Vadu (6%) reported disability as the reason for not currently working. Individuals from the state also reported significantly greater disability (DAS score 28.4 versus 20.8) and lower quality of life (QOL score 57.0 versus 72.2) than the Vadu group. A significantly higher proportion of individuals from Vadu (50%) than from the state (23%) reported good SRH.

Table 1.

Selected demographic and socio-economic characteristics of SAGE participants.

Characteristic | Census 2011 (Maharashtra) | WHO-SAGE (Maharashtra) | INDEPTH-SAGE (Vadu)
Population aged 50+ years (rural) | 9,170,271 | 9,055,965 | 9,801
No. of districts | 35 | 21 | 1
No. of villages | 43,943 | 46 | 22
Female literacy (rural), age 6+ years | 75% | 74% | 79%

SAGE respondent profile | WHO-SAGE Maharashtra (n = 625) | INDEPTH-SAGE Vadu (n = 319) | p-Value^a
Mean age (SD) | 62.0 (8.4) | 62.2 (8.5) | 0.672
Age | | | 0.305
  50–59y | 44% | 38% |
  60–69y | 37% | 41% |
  70–79y | 16% | 17% |
  80+y | 3% | 4% |
Males | 51% | 51% | 0.877
No spousal support | 22% | 24% | 0.541
Education | | | 0.001
  None | 54% | 60% |
  Primary | 32% | 20% |
  Secondary | 9% | 12% |
  Higher secondary/college | 5% | 8% |
Wealth quintiles | | | <0.001
  Poorest quintile | 25% | 16% |
  2nd | 21% | 24% |
  3rd | 16% | 27% |
  4th | 25% | 19% |
  Richest quintile | 13% | 15% |
Work status | | | 0.002
  Currently working | 57% | 68% |
  Not working, disabled | 31% | 6% |
  Not working, retired | 44% | 37% |
  Not working, other reasons | 25% | 57% |
Disability adjusted score^b (SD) | 28.4 (0.8) | 20.8 (0.8) | <0.001
Quality of life score^c (SD) | 57.0 (0.6) | 72.2 (0.9) | <0.001
Networking score^d (SD) | 2.3 (0.6) | 2.5 (0.6) | <0.001
Self-rated health | | | <0.001
  Very good | 1% | 6% |
  Good | 22% | 44% |
  Fair | 58% | 43% |
  Poor | 18% | 6% |
  Very poor | 1% | 1% |
a p-Value for difference between SAGE Maharashtra and SAGE Vadu.
b Range: 0–100; higher score indicates greater disability.
c Range: 0–100; higher score indicates better quality of life.
d Range: 1–5; higher score indicates higher level of social networking.

Only three of the 21 districts sampled in the WHO-SAGE survey had more than fifty individuals, which is generally considered the minimum sample required to estimate prevalence (median sample size 30 individuals per district, range 14 to 81). The regression coefficient estimates for the fixed and random effects parameters from the GLLAMM and xtmelogit routines were almost identical (Table 2). The corresponding parameters estimated by the Bayesian approach were similar to those from the two routines, but the posterior mean of the district-level intercept variance was smaller (.089, SD .039) than the estimate from the two routines (.214, SE .166), with a tighter interval. Table 3 compares the district (small area) level prevalence estimates for good SRH derived using the different SAE methods with the prevalence estimated directly from the survey. None of the district-level variables (female literacy rate, proportion with house ownership, access to safe drinking water and sanitation, socio-economic development index) were statistically significant, so they were excluded from the SAE models. The direct survey prevalence at the state level was 23.3% (95% credible interval 20.0–26.7%). At the district level, the direct estimate varied from 4.8% to 47.1%, with very wide 95% credible intervals reflecting the small sample size in each district. The indirect synthetic estimate for the state was 23.5%, similar to the direct estimate, but it varied minimally across districts, and its 95% credible intervals were narrower than those of the district-level direct estimates. The regression model-based prevalence estimates from the two routines (xtmelogit and GLLAMM) were similar both at the state level (18.7% and 19.0%) and at the district level.
The 95% confidence intervals from both routines were also very similar, and tighter than the 95% credible intervals from either the direct survey estimation or the indirect synthetic method. The district-level estimates from the hierarchical Bayesian model were closer to the district-level direct survey estimates, albeit with wider 95% credible intervals reflecting the uncertainty around the estimates. Fig. 1 ranks the districts by their district-specific random effect, i.e. by the amount by which the log odds of the mean district-specific prevalence differs from the log odds of the overall mean state-level prevalence. Districts with a positive random intercept had a higher likelihood of reporting good SRH than districts with a negative random intercept, after controlling for individual-level covariates. However, these differences in district-specific prevalence were not significant, as shown by the overlap of their 95% confidence intervals. Fig. 2a plots the HB model-based and indirect synthetic prevalence estimates against the direct survey estimate for each district. The indirect synthetic estimate varies minimally between districts compared with the between-district variation of the model-based estimate. For most districts, the model-based estimate was far lower than the estimate derived directly from the SAGE survey. Fig. 2b plots the HB and GLLAMM prevalence estimates against the direct survey estimate for each district; for most districts, the HB estimates approximate the direct survey estimates more closely than the GLLAMM estimates do. The correlation between the different model-based estimates (xtmelogit, GLLAMM and HB) was high (greater than 0.95).
The correlation with the direct survey estimates was 0.75 for the model-based estimates and −0.56 for the indirect synthetic estimates (detailed results not shown). Fig. 3 shows the geographical distribution of the prevalence of good SRH derived using the xtmelogit model-based SAE method. A lower prevalence of good SRH was seen in the economically better developed western region of Maharashtra (with the exception of the coastal district of Sindhudurg), compared with a higher prevalence in the economically backward Vidarbha region of Eastern Maharashtra. This difference, however, was not significant.

Table 2.

Multilevel random effects logistic regression model coefficient estimates. The HB coefficients are the posterior means of the coefficients (standard deviation in brackets) estimated by the HB model.

Parameter | Xtmelogit coefficient (SE) | GLLAMM coefficient (SE) | HB coefficient (SD)
Intercept | −5.47*** (.868) | −5.47*** (.868) | −5.01 (.947)
Age (centered on mean) | −.03* (.016) | −.03* (.016) | −.03 (.017)
Sex | −.33 (.246) | −.33 (.246) | −.34 (.249)
Disability score | −.03** (.009) | −.03** (.009) | −.03 (.009)
QOL score | .06*** (.011) | .06*** (.011) | .06 (.011)
Social networking score | .56** (.213) | .56** (.213) | .58 (.216)
Random effect parameters | | |
District level intercept variance | .214 (.166) | .214 (.166) | .089 (.039)

* Significant at p < .05.
** Significant at p < .01.
*** Significant at p < .001.

Table 3.

Comparing small area (district) level estimates for prevalence of good SRH based on direct survey, indirect synthetic approach, multilevel regression analysis, generalized linear latent and mixed model (GLLAMM), and hierarchical Bayesian model, Maharashtra State.

Each method column gives the estimate (%) followed by its 95% credible interval (%) in brackets.

District | N | Direct survey | Synthetic | Xtmelogit | GLLAMM | HB
AH | 19 | 33.2 (15.4–54.3) | 23.5 (20.3–26.8) | 31.8 (27.7–35.9) | 31.6 (28.2–35.1) | 32.4 (18.7–48.2)
AK | 39 | 29.4 (16.6–44.0) | 23.5 (20.3–26.8) | 21.5 (10.7–32.2) | 21.5 (10.9–32.1) | 27.0 (16.7–39.6)
AU | 30 | 21.9 (9.7–37.4) | 23.5 (20.4–26.8) | 17.8 (10.6–25.0) | 17.8 (10.6–25.0) | 19.7 (10.4–31.5)
BH | 13 | 39.9 (17.6–65.1) | 23.4 (20.3–26.7) | 16.8 (3.5–30.1) | 17.3 (3.4–31.2) | 23.8 (12.0–41.4)
BU | 66 | 14.7 (7.5–24.0) | 23.5 (20.3–26.8) | 11.8 (5.6–18.0) | 12.1 (6.0–18.2) | 17.4 (9.6–26.6)
DH | 23 | 24.0 (9.8–42.0) | 23.5 (20.3–26.8) | 19.0 (6.2–31.8) | 19.0 (6.4–31.6) | 20.1 (10.9–32.1)
JG | 31 | 27.3 (13.7–43.6) | 23.5 (20.3–26.8) | 19.3 (7.5–31.0) | 19.1 (7.4–30.9) | 23.9 (14.7–35.3)
KO | 33 | 34.3 (19.7–50.6) | 23.4 (20.3–26.8) | 24.5 (6.0–43.0) | 24.8 (6.4–43.1) | 38.8 (26.4–50.9)
NG | 13 | 20.1 (4.8–43.0) | 23.5 (20.3–26.8) | 8.5 (1.8–15.2) | 9.0 (2.1–15.9) | 11.8 (4.7–23.4)
ND | 35 | 24.4 (12.1–39.3) | 23.5 (20.3–26.8) | 16.2 (7.0–25.3) | 16.2 (7.1–25.3) | 20.3 (10.9–32.9)
OS | 21 | 21.7 (7.7–40.4) | 23.5 (20.4–26.8) | 13.5 (4.7–22.2) | 13.7 (5.4–22.1) | 19.1 (9.7–31.4)
PA | 25 | 25.8 (11.6–43.5) | 23.5 (20.3–26.8) | 18.2 (7.8–28.5) | 18.1 (7.9–28.4) | 24.2 (13.4–37.5)
PU | 70 | 22.3 (13.5–32.5) | 23.5 (20.3–26.8) | 16.4 (6.2–26.6) | 16.9 (7.0–26.7) | 22.5 (13.5–33.0)
RG | 43 | 20.1 (9.8–32.8) | 23.4 (20.3–26.7) | 16.9 (10.3–23.5) | 17.8 (11.2–24.4) | 23.4 (12.3–35.8)
RT | 21 | 39.2 (21.0–59.9) | 23.2 (20.0–26.5) | 19.7 (10.4–29.0) | 20.1 (12.2–28.1) | 23.7 (12.0–40.4)
SN | 15 | 41.0 (19.8–64.7) | 23.4 (20.3–26.8) | 29.3 (14.4–44.2) | 31.3 (16.2–46.5) | 34.1 (20.8–50.4)
ST | 33 | 31.5 (17.4–47.6) | 23.4 (20.2–26.7) | 19.4 (9.4–29.5) | 19.5 (10.3–28.8) | 24.2 (13.6–38.3)
SI | 15 | 47.1 (25.0–70.0) | 23.3 (20.2–26.6) | 36.5 (14.9–58.2) | 36.2 (14.6–57.8) | 49.1 (35.5–62.7)
SO | 14 | 31.2 (11.7–55.2) | 23.5 (20.4–26.8) | 15.2 (3.3–27.1) | 14.9 (3.5–26.3) | 17.7 (9.6–30.7)
TH | 47 | 14.3 (6.1–25.3) | 23.4 (20.3–26.8) | 13.8 (8.7–18.9) | 13.9 (8.6–19.3) | 18.0 (8.7–28.8)
YA | 19 | 4.8 (0.1–17.1) | 23.5 (20.3–26.8) | 7.8 (2.1–13.4) | 8.0 (2.4–13.6) | 10.2 (3.2–19.3)
Overall | 625 | 23.3 (20.0–26.7) | 23.5 (20.3–26.8) | 18.7 (16.1–21.3) | 19.0 (16.4–21.5) | 23.2 (13.3–35.3)

Fig. 1. Small area (district) level effects (with 95% confidence intervals) for prevalence of good SRH, Maharashtra (n=625).

Fig. 2. Comparison of HB and indirect synthetic estimates (panel A), and of HB and GLLAMM estimates (panel B), of the prevalence of good SRH with the direct survey estimate for districts in Maharashtra, India. Districts are labelled by their codes. The solid line indicates perfect agreement with the direct survey estimate.

Fig. 3. District (small area) level model-based estimate of prevalence of good SRH, Maharashtra (n=625).

The prevalence of good SRH in the Vadu demographic surveillance area was 50.1%, as estimated directly from the INDEPTH-SAGE survey. The model-based estimates for Vadu (xtmelogit 45.6%, GLLAMM 45.7%, HB 45.2%) were similar to one another, had tighter 95% intervals, and were not substantially different from the direct survey estimate. The indirect synthetic estimate for Vadu (24.2%) was a poor approximation to both the direct survey and the model-based estimates (Table 4).

Table 4.

Comparison of prevalence estimates for Vadu (small area) derived indirectly from WHO-SAGE (large area) survey in India using different small area estimation methods with direct survey estimates for Vadu from INDEPTH-SAGE survey (n=319).

Method | Good SRH prevalence estimate (%) | 95% confidence/credible interval
Direct survey (weighted) | 50.1 | 44.5%–55.6%
Indirect synthetic | 24.2 | 19.7%–29.2%
Xtmelogit | 45.6 | 42.8%–48.3%
GLLAMM | 45.7 | 43.0%–48.4%
Hierarchical Bayesian | 45.2 | 42.5%–48.0%

4. Discussion

Our paper describes district and sub-district level disparities in the prevalence of good SRH using different SAE methods. The direct survey estimate of the prevalence of good SRH in the Vadu area was about 50%. No clear or statistically significant pattern was seen between economic development and SRH prevalence at the district level. The lower prevalence of good SRH in some of the economically better-off districts of Western Maharashtra may reflect higher expectations among people living in better-developed areas. It may also reflect the inadequacy of district-level economic indices in capturing the economic reality of the rural areas of each district.

Our paper provides evidence that the random effects regression models were better suited to estimating the prevalence of SRH than the fixed effects model (indirect synthetic method). We included the indirect synthetic estimate because it has historically been the most commonly used SAE method in public health, owing to its intuitive approach and ease of implementation (Levy, 1979). However, the method can be biased because it assumes that differences in health outcomes between areas are due solely to differences in their demographic composition (Schaible, 1996). This assumption does not hold, as individual health outcomes are known to vary with contextual factors operating at the area level (Macintyre et al., 2002). In contrast, the multilevel random effects logistic regression model allows contextual factors to influence the estimates and yields more accurate standard errors. We used the GLLAMM and xtmelogit routines to perform comparative mixed model analyses for logistic response families. The models fitted by the two routines overlap, with some differences in syntax, data organization, output, computational speed, accuracy, predictions and post-estimation statistics. GLLAMM uses numerical integration to evaluate the marginal likelihood and numerical derivatives to maximize it, which can make estimation computationally intensive and slow as the number of latent variables, parameters, quadrature or free-mass points, and observations increases. Both routines use best linear unbiased prediction (BLUP) to estimate the random effects (Robinson, 1991). However, simply plugging the estimated parameters into the predictor does not account for the additional variability from estimating them, which leads to overly optimistic (too small) variances for the EBLUP. The EBLUP is relatively robust to variation in sample size across small areas, because estimates for areas with fewer observations are ‘shrunk’ towards the global mean of the data (Twigg and Moon, 2002).
BLUPs are a useful smoothing tool, and their shrinkage property keeps them from overfitting the data. The HB approach, on the other hand, treats both the fixed effects and random effects parameters as random and assumes a joint distribution for them. Modeling is carried out in several stages that are easy to understand, even if the model-fitting process is complicated. HB estimates have smaller mean square errors than the corresponding EBLUP estimates and account for the uncertainty in the prediction error. However, HB estimation is computationally complex, relies on MCMC simulation to approximate the posterior distributions of the parameters, and is sensitive to the specification of the priors.
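To make the numerical-integration step concrete: in a random-intercept logistic model, each district's contribution to the marginal likelihood integrates the Bernoulli likelihood over $u_j \sim N(0, \sigma_u^2)$. With Gauss-Hermite nodes $x_k$ and weights $w_k$, $\int f(u)\,N(u; 0, \sigma_u^2)\,du \approx \pi^{-1/2}\sum_k w_k f(\sqrt{2}\,\sigma_u x_k)$. The sketch below applies this ordinary (non-adaptive) rule using NumPy's `hermgauss`. It illustrates the idea only and is not the gllamm or xtmelogit implementation; the function name, node count and example data are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def district_marginal_loglik(y, eta_fixed, sigma_u, n_nodes=15):
    """log of the integral of prod_i Bernoulli(y_i | logit^-1(eta_i + u))
    over u ~ N(0, sigma_u^2), by Gauss-Hermite quadrature.

    y: (n_j,) 0/1 outcomes of one district; eta_fixed: (n_j,) values of X_ij B.
    """
    x, w = hermgauss(n_nodes)                  # nodes/weights for weight exp(-x^2)
    u = np.sqrt(2.0) * sigma_u * x             # change of variables to N(0, sigma_u^2)
    eta = eta_fixed[:, None] + u[None, :]      # linear predictor at each node
    ll = (y[:, None] * eta - np.logaddexp(0.0, eta)).sum(axis=0)
    top = ll.max()                              # log-sum-exp for numerical stability
    return float(top + np.log(np.sum(w * np.exp(ll - top))) - 0.5 * np.log(np.pi))


y = np.array([1.0, 0.0, 1.0])                   # hypothetical outcomes
eta_fixed = np.array([0.2, -0.4, 1.0])          # hypothetical fixed-part predictors
print(district_marginal_loglik(y, eta_fixed, sigma_u=0.5))
```

Summing this quantity over districts gives the marginal log-likelihood that is maximized over B and $\sigma_u$; the cost grows with the number of nodes and random effects, which is the slowdown described above.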

A major limitation of any model-based SAE method is that if substantively important individual or area level covariates are excluded due to their unavailability, then the estimates will be imprecise. Another limitation of this and any other model-based SAE approach is that it is restricted by the availability of covariates common to both the survey and the auxiliary data. Furthermore, to avoid computational complexity in our model, we did not allow the effect of covariates on the outcome to vary across districts (random slope).

There is no consensus on which SAE method provides the best estimates. The two main challenges of SAE are obtaining estimates with adequate precision given the small sample size in each small area, and estimating their standard or prediction errors. The general framework used for SAE includes pooling data over time or geographical space, modeling complex structural relationships between the outcome variable and domain-specific covariates, and/or exploiting spatial correlation by modeling a spatial component (Srebotnjak et al., 2010). SAE methods borrow strength from related small areas sampled in a survey to obtain more precise estimates for a given small area, or even for areas not sampled in the survey. This has led to the development of methods such as EBLUP, empirical Bayes (EB) and hierarchical Bayes (HB) estimation, which have a distinct edge over indirect synthetic methods in producing precise estimates (Ghosh and Rao, 1994). The EB approach is similar to EBLUP except that the prior distributions are estimated from the data and then combined with the likelihood to obtain the posterior distributions of the random parameters. The main shortcoming of EB estimates is that their variance does not account for the uncertainty in these prior parameters, which are derived from the data and then treated as known. The HB approach overcomes this shortcoming by using non-informative prior distributions for the parameters in conjunction with the likelihood to obtain the posterior distributions of the random parameters.
In recent years there has been increased interest in the application of conditional autoregressive (CAR) spatial models (which borrow strength from neighboring areas) and shared component spatial models (which model two outcomes jointly to reduce the mean square error of estimates based on sparse counts) to analyze area-level data, for mapping rare diseases or where many small areas have zero or very low expected disease counts (Earnest et al., 2010).
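
As an illustration of how a CAR model borrows strength from neighboring areas, the sketch below computes the full conditional distributions under an intrinsic CAR (ICAR) prior: given all other areas, each area's random effect is normal with mean equal to the average of its neighbors' effects and variance tau2 / n_i, where n_i is its number of neighbors. This is a generic textbook form, not a model fitted in this paper; the adjacency list, parameter names and function name are hypothetical. In a full Bayesian fit these conditionals would typically be sampled inside a Gibbs sampler or handled by dedicated Bayesian software.

```python
def icar_conditionals(u, neighbours, tau2):
    """Full conditionals of an intrinsic CAR prior (illustrative sketch).

    u: current area-level random effects
    neighbours: neighbours[i] lists the indices of areas adjacent to area i
    tau2: conditional variance parameter
    Returns a list of (mean, variance) for u_i given all other u_j.
    """
    out = []
    for i, nbrs in enumerate(neighbours):
        if not nbrs:
            # An isolated area has no neighbours to borrow strength from.
            raise ValueError(f"area {i} has no neighbours")
        n_i = len(nbrs)
        # Conditional mean: the plain average of the neighbours' effects.
        mean = sum(u[j] for j in nbrs) / n_i
        # Areas with more neighbours are pinned down more tightly.
        out.append((mean, tau2 / n_i))
    return out
```

For three areas in a chain (0-1-2) with effects 0, 1 and 4, the middle area's conditional mean is 2 with half the conditional variance of the two end areas.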

Our paper demonstrates that a simplified mixed effects regression model can produce valid small area estimates of SRH, together with their prediction errors. More generally, these methods can be applied to other health outcome measures, such as the prevalence of chronic conditions or disability, helping local health agencies with limited resources to plan and evaluate intervention strategies for healthy aging.
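
One common way to turn a fitted random-intercept model into an area-level prevalence is to predict the probability for each covariate cell (for example, age group by sex) and weight those predictions by the area's census population counts. The sketch below assumes a logit link, covariate cells built from auxiliary census data and already-estimated coefficients; the function names and data layout are hypothetical, and this is not necessarily the exact procedure behind the estimates reported in this paper. For an area with no survey respondents the predicted random intercept is its prior mean of zero, so the estimate reduces to a synthetic one.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def small_area_prevalence(cells, beta, u_area):
    """Population-weighted predicted prevalence for one area (sketch).

    cells: list of (population_count, covariate_vector) for the area;
           hypothetical layout, e.g. census counts by age group and sex
    beta: fixed-effect coefficients on the logit scale (first entry acts as
          the intercept when each covariate_vector starts with 1.0)
    u_area: predicted random intercept for the area (0.0 if unsampled)
    """
    total = sum(n for n, _ in cells)
    if total <= 0:
        raise ValueError("area has no population")
    weighted = 0.0
    for n, x in cells:
        # Linear predictor for this cell, shifted by the area's random effect.
        eta = sum(b * xk for b, xk in zip(beta, x)) + u_area
        weighted += n * expit(eta)
    return weighted / total
```

Averaging the cell-level probabilities, rather than applying the inverse logit to an averaged linear predictor, respects the non-linearity of the link.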

Acknowledgement

This paper uses data from the World Health Organization Study on Global AGEing (SAGE). SAGE is supported by the US National Institute on Aging through Interagency Agreements (OGHA 04034785; YA1323-08-CN-0020; Y1-AG-1005-01) and through a research grant (R01-AG034479). Health and Demographic Surveillance System, Vadu, is a member of the International Network for the Demographic Evaluation of Populations and Their Health (INDEPTH Network). The analyses and writing of this paper have been financed by the Umeå Centre for Global Health Research, Umeå University, with support from FAS, the Swedish Council for Working Life and Social Research (grant no. 2006-1512), through its PhD fellowship to the first author. We acknowledge the support of Kathy Kahn from the INDEPTH Network, and of Paul Kowal and Nirmala Naidoo from WHO, Geneva, for coordinating this multi-country study. Thanks are due to Pallavi Lele and the Vadu HDSS team for their quality work, and to the older adult population of the Vadu Demographic Surveillance Area for their willingness to contribute their knowledge to this study.

Contributor Information

Siddhivinayak Hirve, Email: siddhihirve@gmail.com.

Penelope Vounatsou, Email: penelope.vounatsou@unibas.ch.

Sanjay Juvekar, Email: sanjay.juvekar@gmail.com.

Yulia Blomstedt, Email: yulia.blomstedt@epiph.umu.se.

Stig Wall, Email: stig.wall@epiph.umu.se.

Somnath Chatterji, Email: chatterjis@who.int.

Nawi Ng, Email: Nawi.Ng@epiph.umu.se.

References

  1. Amoako Johnson F, Chandra H, Brown JJ, Padmadas SS. District-level estimates of Institutional Births in Ghana: application of small area estimation technique using census and DHS data. J. Off. Stat. 2010;26:341–359.
  2. Amoako Johnson F, Padmadas SS, Chandra H, Matthews Z, Madise NJ. Estimating unmet need for contraception by district within Ghana: an application of small-area estimation techniques. Popul. Stud. (Camb) 2012;66:105–122. doi: 10.1080/00324728.2012.678585.
  3. Asiimwe JB, Jehopio P, Atuhaire LK, Mbonye AK. Examining small area estimation techniques for public health intervention: lessons from application to under-5 mortality data in Uganda. J. Public Health Policy. 2011. doi: 10.1057/jphp.2010.46.
  4. Bajekal M, Scholes S, Pickering K, Purdon S. Synthetic Estimation of Healthy Lifestyles Indicators: Stage 1 Report. National Center for Social Research; Department of Health, UK; 2004. pp. 3–35.
  5. Banks J, Marmot M, Oldfield Z, Smith JP. The SES health gradient on both sides of the Atlantic. In: Wise DA, editor. Developments in the Economics of Aging. Chicago: University of Chicago Press; 2007. pp. 359–406.
  6. Benjamins MR, Hummer RA, Eberstein IW, Nam CB. Self-reported health and adult mortality risk: an analysis of cause-specific mortality. Soc. Sci. Med. 2004;59:1297–1306. doi: 10.1016/j.socscimed.2003.01.001.
  7. Benyamini Y, Idler E. Community studies reporting association between self-rated health and mortality: additional studies, 1995 to 1998. Res. Aging. 1999;21:392–401.
  8. Blazer DG. How do you feel about…? Health outcomes in late life and self-perceptions of health and well-being. Gerontologist. 2008;48:415–422. doi: 10.1093/geront/48.4.415.
  9. Congdon P. Estimating diabetes prevalence by small area in England. J. Public Health (Oxford) 2006a;28:71–81. doi: 10.1093/pubmed/fdi068.
  10. Congdon P. Estimating population prevalence of psychiatric conditions by small area with applications to analysing outcome and referral variations. Health Place. 2006b;12:465–478. doi: 10.1016/j.healthplace.2005.05.001.
  11. Cui Y, Baldwin SB, Lightstone AS, Shih M, Yu H, Teutsch S. Small area estimates reveal high cigarette smoking prevalence in low-income cities of Los Angeles county. J. Urban Health. 2012;89:397–406. doi: 10.1007/s11524-011-9615-0.
  12. Curtis S, Copeland A, Fagg J, Congdon P, Almog M, Fitzpatrick J. The ecological relationship between deprivation, social isolation and rates of hospital admission for acute psychiatric care: a comparison of London and New York City. Health Place. 2006;12:19–37. doi: 10.1016/j.healthplace.2004.07.002.
  13. Douglas MJ, Conway L, Gorman D, Gavin S, Hanlon P. Achieving better health through health impact assessment. Health Bull. (Edinburgh) 2001;59:300–305.
  14. Earnest A, Beard JR, Morgan G, Lincoln D, Summerhayes R, Donoghue D, Dunn T, Muscatello D, Mengersen K. Small area estimation of sparse disease counts using shared component models-application to birth defect registry data in New South Wales, Australia. Health Place. 2010;16:684–693. doi: 10.1016/j.healthplace.2010.02.006.
  15. Eberth JM, Hossain MM, Tiro JA, Zhang X, Holt JB, Vernon SW. Human papillomavirus vaccine coverage among females aged 11 to 17 in Texas Counties: an application of multilevel, small area estimation. Womens Health Issues. 2013;23:e131–141. doi: 10.1016/j.whi.2012.12.005.
  16. Elbers C, Lanjouw JO, Lanjouw P. Micro-level estimation of poverty and inequality. Econometrica. 2003;71:355–364.
  17. Ericksen EP. A regression method for estimating population changes of local areas. J. Am. Stat. Assoc. 1974;69:867–875.
  18. Fay RE, Herriot RA. Estimation of income from small places: an application of James-Stein procedures to Census data. J. Am. Stat. Assoc. 1979;74:269–297.
  19. Fayers PM, Sprangers MA. Understanding self-rated health. Lancet. 2002;359:187–188. doi: 10.1016/S0140-6736(02)07466-4.
  20. Frankenberg E, Jones NR. Self-rated health and mortality: does the relationship extend to a low income setting? J. Health Soc. Behav. 2004;45:441–452. doi: 10.1177/002214650404500406.
  21. Ghosh M, Rao JNK. Small area estimation: an appraisal. Stat. Sci. 1994;9:55–93.
  22. He W, Muenchrath MN, Kowal PR. Shades of Gray: A Cross-country Study of Health and Well-being of the Older Populations in SAGE Countries, 2007–2010. Washington DC: US Government Printing Office; 2012.
  23. Heidrich J, Liese AD, Lowel H, Keil U. Self-rated health and its relation to all-cause and cardiovascular mortality in southern Germany. Results from the MONICA Augsburg cohort study 1984–1995. Ann. Epidemiol. 2002;12:338–345. doi: 10.1016/s1047-2797(01)00300-3.
  24. Hirve S, Juvekar S, Sambhudas S, Lele P, Blomstedt Y, Wall S, Berkman L, Tollman S, Ng N. Does self-rated health predict death in adults aged 50 years and above in India? Evidence from a rural population under health and demographic surveillance. Int. J. Epidemiol. 2012;41:1719–1727. doi: 10.1093/ije/dys163.
  25. Hudson CG. Validation of a model for estimating state and local prevalence of serious mental illness. Int. J. Methods Psychiatr. Res. 2009;18:251–264. doi: 10.1002/mpr.294.
  26. Hudson CG, Soskolne V. Disparities in the geography of serious mental illness in Israel. Health Place. 2012;18:898–910. doi: 10.1016/j.healthplace.2012.02.008.
  27. Hudson CG, Vissing YM. The geography of adult homelessness in the US: validation of state and county estimates. Health Place. 2010;16:828–837. doi: 10.1016/j.healthplace.2010.04.008.
  28. Idler EL, Benyamini Y. Self-rated health and mortality: a review of twenty-seven community studies. J. Health Soc. Behav. 1997;38:21–37.
  29. Ishizaki T, Kai I, Imanaka Y. Self-rated health and social role as predictors for 6-year total mortality among a non-disabled older Japanese population. Arch. Gerontol. Geriatr. 2006;42:91–99. doi: 10.1016/j.archger.2005.05.002.
  30. Jia H, Link M, Holt J, Mokdad AH, Li L, Levy PS. Monitoring county-level vaccination coverage during the 2004–2005 influenza season. Am. J. Prev. Med. 2006;31:275–280. doi: 10.1016/j.amepre.2006.06.005.
  31. Jia H, Muennig P, Borawski E. Comparison of small-area analysis techniques for estimating county-level outcomes. Am. J. Prev. Med. 2004;26:453–460. doi: 10.1016/j.amepre.2004.02.004.
  32. Jylha M. What is self-rated health and why does it predict mortality? Towards a unified conceptual model. Soc. Sci. Med. 2009;69:307–316. doi: 10.1016/j.socscimed.2009.05.013.
  33. Kalton G, Kordos J, Platek R. Small Area Statistics and Survey Designs Vol. I: Invited Papers; Vol. II: Contributed Papers and Panel Discussion. Warsaw: Central Statistical Office; 1993.
  34. Kaplan GA, Camacho T. Perceived health and mortality: a nine-year follow-up of the human population laboratory cohort. Am. J. Epidemiol. 1983;117:292–304. doi: 10.1093/oxfordjournals.aje.a113541.
  35. Knutson K, Zhang W, Tabnak F. Applying the small-area estimation method to estimate a population eligible for breast cancer detection services. Prev. Chronic Dis. 2008;5.
  36. Kowal P, Chatterji S, Naidoo N, Biritwum R, Fan W, Lopez Ridaura R, Maximova T, Arokiasamy P, Phaswana-Mafuya N, Williams S, Snodgrass JJ, Minicuci N, D’Este C, Peltzer K, Boerma JT, SAGE Collaborators. Data resource profile: the World Health Organization Study on global AGEing and adult health (SAGE). Int. J. Epidemiol. 2012;41:1639–1649. doi: 10.1093/ije/dys210.
  37. Leroux BG, Maynard RJ, Domoto P, Zhu C, Milgrom P. The estimation of caries prevalence in small areas. J. Dent. Res. 1996;75:1947–1956. doi: 10.1177/00220345960750120601.
  38. Levy PS. Small area estimation—synthetic and other procedures, 1968–1978. NIDA Res. Monogr. 1979:4–19.
  39. Li W, Kelsey JL, Zhang Z, Lemon SC, Mezgebu S, Boddie-Willis C, Reed GW. Small-area estimation and prioritizing communities for obesity control in Massachusetts. Am. J. Public Health. 2009a;99:511–519. doi: 10.2105/AJPH.2008.137364.
  40. Li W, Land T, Zhang Z, Keithly L, Kelsey JL. Small-area estimation and prioritizing communities for tobacco control efforts in Massachusetts. Am. J. Public Health. 2009b;99:470–479. doi: 10.2105/AJPH.2007.130112.
  41. Lundberg O, Manderbacka K. Assessing reliability of a measure of self-rated health. Scand. J. Soc. Med. 1996;24:218–224. doi: 10.1177/140349489602400314.
  42. Macintyre S, Ellaway A, Cummins S. Place effects on health: how can we conceptualise, operationalise and measure them? Soc. Sci. Med. 2002;55:125–139. doi: 10.1016/s0277-9536(01)00214-3.
  43. Martuzzi M, Elliott P. Empirical Bayes estimation of small area prevalence of non-rare conditions. Stat. Med. 1996;15:1867–1873. doi: 10.1002/(SICI)1097-0258(19960915)15:17<1867::AID-SIM398>3.0.CO;2-2.
  44. Mendez-Luck CA, Yu H, Meng YY, Jhawar M, Wallace SP. Estimating health conditions for small areas: asthma symptom prevalence for state legislative districts. Health Serv. Res. 2007;42:2389–2409. doi: 10.1111/j.1475-6773.2007.00793.x.
  45. Mossey JM, Shapiro E. Self-rated health: a predictor of mortality among the elderly. Am. J. Public Health. 1982;72:800–808. doi: 10.2105/ajph.72.8.800.
  46. Ng N, Hakimi M, Santos AS, Byass P, Wilopo S, Wall S. Is self-rated health an independent index for mortality among older people in Indonesia? PLoS One. 2012:e35308. doi: 10.1371/journal.pone.0035308.
  47. Nielsen AB, Siersma V, Hiort LC, Drivsholm T, Kreiner S, Hollnagel H. Self-rated general health among 40-year-old Danes and its association with all-cause mortality at 10-, 20-, and 29 years’ follow-up. Scand. J. Public Health. 2008;36:3–11. doi: 10.1177/1403494807085242.
  48. Paul-Shaheen P, Clark JD, Williams D. Small area analysis: a review and analysis of the North American literature. J. Health Polit. Policy Law. 1987;12:741–809. doi: 10.1215/03616878-12-4-741.
  49. Pfeffermann D. Small area estimation—new developments and directions. Int. Stat. Rev. 2002;70:125–143.
  50. Pickle LW, Su Y. Within-state geographic patterns of health insurance coverage and health risk factors in the United States. Am. J. Prev. Med. 2002;22:75–83. doi: 10.1016/s0749-3797(01)00402-0.
  51. Platek R, Singh M. Small Area Statistics: An International Symposium’85 (Contributed Papers). Ottawa: Laboratory for Research in Statistics and Probability, Carleton University-University of Ottawa; 1986.
  52. Raffle H. Using Small-Area Estimation Techniques for County-level Estimates of Select Indicators from the Ohio Family Health Survey. White paper, submitted to The Office of Ohio Health Plans and the Ohio Department of Job and Family Services; 2008. https://hsldigital.osu.edu/sitetool/sites/omaspublic/documents/OFHS_Report_Raffle.pdf.
  53. Robinson GK. That BLUP is a good thing: the estimation of random effects. Stat. Sci. 1991;6:15–32.
  54. Schaible WL. Indirect Estimators in US Federal Programs. New York: Springer; 1996.
  55. Schneider KL, Lapane KL, Clark MA, Rakowski W. Using small-area estimation to describe county-level disparities in mammography. Prev. Chronic Dis. 2009;6.
  56. Schwartz F, Ruhil AV, Denham S, Shubrook J, Simpson C, Boyd SL. High self-reported prevalence of diabetes mellitus, heart disease, and stroke in 11 counties of rural Appalachian Ohio. J. Rural Health. 2009;25:226–230. doi: 10.1111/j.1748-0361.2009.00222.x.
  57. Srebotnjak T, Mokdad AH, Murray CJ. A novel framework for validating and applying standardized small area measurement strategies. Popul. Health Metrics. 2010;8:26. doi: 10.1186/1478-7954-8-26.
  58. Thomas TL, Nandram B. Predicting incidence and asymptomatic rates for chlamydia in small domains. J. Adv. Nurs. 2010;66:2650–2658. doi: 10.1111/j.1365-2648.2010.05430.x.
  59. Twigg L, Moon G. Predicting small area health-related behaviour: a comparison of multilevel synthetic estimation and local survey data. Soc. Sci. Med. 2002;54:931–937. doi: 10.1016/s0277-9536(01)00065-x.
  60. Twigg L, Moon G, Jones K. Predicting small-area health-related behaviour: a comparison of smoking and drinking indicators. Soc. Sci. Med. 2000;50:1109–1120. doi: 10.1016/s0277-9536(99)00359-7.
  61. World Health Organization. Health Interview Surveys, Towards International Harmonization of Methods and Instruments. Copenhagen: World Health Organization, Office for Europe; 1996.
  62. World Health Organization. International Classification of Functioning, Disability and Health (ICF). World Health Organization; 2001.
  63. Xie D, Raghunathan TE, Lepkowski JM. Estimation of the proportion of overweight individuals in small areas—a robust extension of the Fay-Herriot model. Stat. Med. 2007;26:2699–2715. doi: 10.1002/sim.2709.
  64. Yu ES, Kean YM, Slymen DJ, Liu WT, Zhang M, Katzman R. Self-perceived health and 5-year mortality risks among the elderly in Shanghai, China. Am. J. Epidemiol. 1998;147:880–890. doi: 10.1093/oxfordjournals.aje.a009542.
  65. Yu H, Meng YY, Mendez-Luck CA, Jhawar M, Wallace SP. Small-area estimation of health insurance coverage for California legislative districts. Am. J. Public Health. 2007;97:731–737. doi: 10.2105/AJPH.2005.077743.
  66. Zhang Z, Zhang L, Penman A, May W. Using small-area estimation method to calculate county-level prevalence of obesity in Mississippi, 2007–2009. Prev. Chronic Dis. 2011;8:A85.
