Abstract
Survey research designs that integrate contextual data have become more prevalent in recent decades, presumably to enable a more refined focus on the person as the unit of analysis and a greater emphasis on interindividual differences due to social forces and contextual conditions. This article reviews varied approaches to contextualizing survey data and examines the value of linking two data sources to respondent information: interviewer ratings and neighborhood information (measured via census tracts). The utility of an integrative approach is illustrated with data from the Health and Retirement Study. The results reveal modest gains by using a contextualized approach but also demonstrate that neglecting contextual factors may lead to misdirected substantive conclusions, especially for older racial and ethnic minorities. To enhance the ecological validity of survey data, investigators should select theoretically meaningful contextual data for specific research questions and consider cross-level interactions.
A major trend among social and behavioral scientists during recent decades involves some type of record linkage. Investigators contend that the value of data derived from various studies can be enhanced by linking the data to other sources. The purpose of record linkage is to gain a better understanding of phenomena and processes by examining respondents in their temporal, environmental, and social contexts. For social scientists using survey data, contextualizing data aids ecological validity—often referred to as the generalizability of observed phenomena in an environment to natural phenomena in the real world (Schmuckler, 2001). The types of record linkages are many, constrained only by the creativity of the investigator and the accessibility of the data.
The aims of this article are twofold. First, we review some of the record linkages common in human development research that are used ostensibly to enhance the value of survey data. The record linkages discussed illustrate ways of integrating temporal, environmental, and social contexts. Second, we ask what is gained by these record linkages by providing an illustrative example. From another vantage point, might our conclusions be misdirected by neglecting to integrate record linkages? For the second aim, we focus on one of the strongest approaches—integrating geocoded data—intended to contextualize survey data to real-world conditions. We also suggest that interviewer ratings of a survey respondent’s environmental context are a relatively underutilized resource for research on aging and human development.
To address these aims, contextualization of survey data is defined as the process of linking responses from one or more samples to the circumstances that surround the respondents. Contextualization addresses fundamental questions of meaning from survey data: When? Where? Who? Although information to address these questions may appear self-evident, probing it more fully often adds meaningful information to interpret findings.
Record Linkage
Survey data are highly valued in behavioral, social, and epidemiological research because some type of random sampling enables one to make a stronger case for the external validity of the study. Of course, sampling theory should be used to ensure an accurate representation of the population, but the generalizability of survey results is appealing to investigators. The value of survey data can be further enhanced by record linkages that capitalize on information about the respondent and his or her context.
Scholars of aging and human development have long used information from a sample on the date of a measurement occasion and the age of respondents to identify cohorts and describe the historical context of the data, even if no data are formally linked to a cross-sectional survey. By contrast, many studies actually link the data to bring context into the analysis—and there are many ways to do so.
We highlight four of the many types of such linkages that enable researchers to account for different types of context: temporal, environmental, and social. Among ecological theories of aging, scholars often draw attention to temporal (e.g., Wahl, Iwarsson, & Oswald, 2012) and environmental contexts (e.g., Lawton & Nahemow, 1973). We further differentiate the environmental context to include place/space (where) and person (who). Indeed, an individual is best understood in the context of his or her social environment, situated in a physical environment and in time, both historical time and the individual’s life course.
When? First, one could argue that longitudinal data are one of the most basic types of record linkage because they represent an opportunity to bring history and biographical context into the analysis. Contextualism is central to life-span development (Baltes, 1987), and longitudinal data enable systematic consideration of selection processes (Ferraro & Shippee, 2009).
The data from any cross-sectional study provide information on when the respondent was observed, but a longitudinal panel study provides information on intraindividual change in measures repeated over time. Observing such changes is invaluable for studies of aging—much more meaningful than inferring change from age differences at one point. Some have argued that two waves of data do not constitute a longitudinal design because one cannot assess the pace (rate) of change (Rogosa, 1988). Three or more waves of longitudinal data that link variables from the same persons provide the opportunity to distinguish intra- and interindividual change—and the pace of change—a practice that has become widespread via the use of growth curves or the multilevel model for change.
Where? Second, focusing on where the data were collected, the essence of this approach is to link survey responses to data from geocoded areas or to at least differentiate rural and urban areas (Phillipson & Scharf, 2005). Owing to the insights on context gleaned from such linking, this approach was referred to decades ago as contextual analysis (Boyd & Iversen, 1979). Examples of this type of record linkage involve matching survey respondents to areas such as countries, states, or provinces. Contextual data, however, also can be linked without geocoding through regional identifiers. To many scholars, this is the exemplar of contextualizing survey data because the linkage is to established sources, often government data such as vital statistics, crime rates, or census estimates of the percentage of households with income below the poverty line. The use of “official statistics” adds to the authenticity of the contextualization. The researcher’s ingenuity is to theorize and link the contextual data, not necessarily to collect it per se. One searches for data to link and faces several decisions about how to implement the linkage. The researcher may rely on various types of linkage information, such as Zip codes and county and state identifiers to match respondents. Linking contextual data to individual data, however, may pose certain challenges, including border changes in regional units over time and missing geocodes, which can lead to biased samples, and privacy concerns, especially when small ecological units are used.
A major consideration when linking contextual data is the unit of analysis. Very large ecological units, often defined by governments, help one undertake cross-national or regional comparisons, including consideration of how policy is related to outcomes of interest. For example, the Survey of Health, Ageing and Retirement in Europe (SHARE) is a widely used survey that has enabled comparisons of 11 European nations and Israel since 2004, and other nations have been added since then (e.g., Deindl, Brandt, & Hank, 2016). The SHARE also permits consideration of smaller units within nations via the Nomenclature of Territorial Units for Statistics (NUTS). There is variability across nations in how regions are defined, but NUTS enables investigators to examine the aging experience in these smaller ecological units.
For studies of adults within a nation, the investigator also often has choices regarding the ecological unit to be linked and may be able to test the influence at multiple levels. Researchers in the United States often seek data at the level of a census tract because it is a recognized government entity that is a reasonable approximation of neighborhood or area, with populations generally between 1,200 and 8,000 people (e.g., Clarke et al., 2014). Theoretically, many investigators also view the neighborhood as a meaningful context for vulnerability as older adults attempt to adapt to their environment (Glass & Balfour, 2003). More U.S. data are available at the county level, but the size and diversity of counties often render them less useful than census tracts (Krieger et al., 2003). Another option is to create neighborhood clusters by aggregating two or more contiguous census tracts into units that are smaller than counties (Lee & Ferraro, 2007). With phone and Internet surveys, one may not have a street address, so questions may be needed to provide some type of geocode. Zip codes are usually smaller units that can be aggregated up to counties, and Zip+4 codes provide greater ecological precision. Surveys outside the United States have also linked to such small-area information.
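As a minimal sketch of the tract-level linkage described above, the following Python example joins contextual data to respondent records by census tract code and derives a county identifier from the same code. The field names (`resp_id`, `tract`, `pct_poverty`) and all values are hypothetical. The only structural fact relied on is that a U.S. census tract identifier concatenates a 2-digit state code, a 3-digit county code, and a 6-digit tract code.

```python
# Hypothetical respondent records keyed by an 11-digit census tract code
# (2-digit state + 3-digit county + 6-digit tract). All values are made up.
respondents = [
    {"resp_id": 1, "tract": "18157000100"},
    {"resp_id": 2, "tract": "18157000100"},
    {"resp_id": 3, "tract": "18157000200"},
    {"resp_id": 4, "tract": None},  # missing geocode
]

# Hypothetical tract-level contextual data (e.g., percent below poverty line).
tract_context = {
    "18157000100": {"pct_poverty": 12.5},
    "18157000200": {"pct_poverty": 31.0},
}


def link_context(respondents, tract_context):
    """Attach tract-level context to each respondent; collect unmatched ids."""
    linked, unmatched = [], []
    for r in respondents:
        ctx = tract_context.get(r["tract"])
        if ctx is None:
            # Missing or unknown geocodes are set aside so that the
            # analyst can check whether their loss biases the sample.
            unmatched.append(r["resp_id"])
            continue
        row = dict(r)
        row.update(ctx)
        # The county identifier is the first five digits of the tract code.
        row["county"] = r["tract"][:5]
        linked.append(row)
    return linked, unmatched


linked, unmatched = link_context(respondents, tract_context)
print(len(linked), "linked;", "unmatched ids:", unmatched)
```

Returning the unmatched cases separately, rather than silently dropping them, makes it easier to examine whether missing geocodes are related to respondent characteristics.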
Some researchers may prefer units smaller than census tracts. For instance, King et al. (2011) analyzed block groups, which are clusters of blocks within a census tract, to examine how walkability of neighborhoods influenced physical activity in Baltimore, Maryland, and Seattle, Washington, neighborhoods. The point is that the choice of unit for record linkage is important and may vary by the research question and/or the outcome. Thus, one may be able to perform a linkage, but it may or may not be meaningful. Smaller areas are generally preferred because one can aggregate upward, but smaller areas typically come with more exacting user agreements to protect human participants.
Integration across when and where brings yet another approach to contextualization. Although relatively rare, some studies use a longitudinal design to examine change within persons as well as within an ecological unit. An exemplar is an analysis of changes in attributes of census tracts such as economic disadvantage and ethnic concentration along with changes in functional health and survival over 15 years (Clarke et al., 2014). Although there are many excellent cross-sectional comparisons of aging across nations (e.g., Crimmins, Kim, & Solé-Auró, 2010), there are relatively few record linkages that involve longitudinal analyses from more than one country (Mayer, 2015). Given that some of the cross-national comparisons are actually drawn from longitudinal studies such as SHARE, Health and Retirement Study (HRS), and the English Longitudinal Study of Ageing, we can expect a growing number of cross-national, longitudinal analyses during the next decade.
Who? Third, recognizing the importance of physical and social environments, investigators may seek to contextualize information at different levels from the respondent’s household, family members, or social network. Person–environment fit does not occur in a social vacuum. People’s lives are embedded in a network of meaningful others (e.g., convoys); aging unfolds via linked lives (Antonucci, 2001; Elder, 1998). These linked lives, further defined by the physical environment, help older persons interpret their experiences and shape their sense of belonging (Wahl et al., 2012). Most of us associate people with specific environments, and those people may act as resources for or risks to effective functioning (Deindl et al., 2016). However, not everyone may be equally affected—individual characteristics may interact with environmental factors and/or the physical environment may impede (or foster) social interaction.
Multistage survey sampling often means that considerable contextual information is used to collect respondent data. In many national samples, one selects households prior to selecting a respondent within the household. Some surveys have strategies to include the spouses of adult respondents, whereas others exclude more than one member from a given household. Twin studies are another example whereby investigators may capitalize not only on the differentiation of monozygotic and dizygotic pairs, but also on retrospective information from shared households during childhood. Other studies interview family members from multiple generations to study social connections (Fingerman, Pillemer, Silverstein, & Suitor, 2012).
Although we expect the answers of residents from given census tracts to be correlated, we expect stronger correlations among responses from household members, couples, mother/daughter pairs, or twins. To deal with the correlated data, one may simply adjust for the correlation in statistical analyses or capitalize on it by bringing multiple levels of context into the analysis. Either approach may be justified for a specific research question, provided that the analysis satisfactorily accounts for the number of units within each level.
Fourth, another way to contextualize survey data is to draw on unique sources of information, such as interviewer ratings of the respondent and his or her environment. Although distinct from family and friends, interviewers or observers may be able to report valuable information for contextualizing survey data. For instance, interviewer-rated health correlates well with self-reported health, and the former was a stronger predictor of mortality among Chinese older adults (Feng, Zhu, Zhen, & Gu, 2016). Many face-to-face national surveys ask the interviewer to rate attributes of the respondent (e.g., level of engagement with the interview) as well as elements of the environment such as type of housing, presence of graffiti, vacant housing units, and state of repair of the block or immediate area. Owing to perceptual differences in interviewer experiences, it is likely that more specific questions about visible signs of the environment (e.g., broken windows) are more useful than global ratings of the areas (e.g., safety). It is well known that unfamiliar areas are more likely to be viewed as unsafe; thus an interviewer’s first foray into a neighborhood may be quite different on global assessments of safety than would be the case if the interviewer knows the area well. Advantages of using interviewer ratings include, but are not limited to, the following: the data are typically bundled with the data from the respondent survey (manifest link), the rating was made at the time of the interview (instead of being linked to decennial census data), and the unit the interviewer is asked to evaluate is typically small (“on this block”).
Research Design to Assess the Value of Contextualized Survey Data
Although many scholars argue for contextualizing survey data, doing so is ultimately an empirical question. What value is added to survey data by integrating contextual data? Are the conclusions from contextualized survey data different than those without such record linkages? If yes, how different are the conclusions? Does failure to integrate contextual data lead to incorrect conclusions about aging and human development? The answer for some types of record linkage is fairly well known.
For instance, scores of studies have revealed the limitations of cross-sectional data analysis compared to longitudinal analysis for understanding aging and human development. Indeed, theories such as the life course perspective (Elder, 1998) and recent ecological theories of aging (Wahl et al., 2012) draw attention to the importance of considering historical time and place across the life course. Likewise, dozens of studies linking data from households or family members to survey responses reveal the influence of household context. There is ample evidence to support the assertion that multiple waves of data give a more complete picture of aging and human development than can be gleaned from data gathered at a single point in time, and that integrating household or family information into the analysis of data from a survey respondent is better than relying solely on the respondent. Theses such as press-competence (Lawton & Nahemow, 1973) and person–environment fit highlight the importance of the person interacting with the environment, which may be more clearly explicated in multilevel models that test interactions between individual- and contextual-level variables. There may be anomalies along the way, but the literature as a whole provides abundant evidence of the utility of integrating these types of contextual data.
By contrast, the evidence is less plentiful and more equivocal when considering the influence of interviewer ratings. If interviewer ratings are highly correlated with ratings from survey respondents, the value added may be modest or nonexistent. For other measures of context, however, the use of interviewer ratings may be more consequential. For instance, interviewers and residents may emphasize different attributes of the neighborhood, and some studies even define the neighborhood differently for the two ratings (e.g., interviewer rating: “describe the street [one block, both sides] where the respondent lives” versus respondent rating: “local area” as “everywhere within a 20-minute walk or within about a mile of your home”; Cornwell & Cagney, 2014, pp. S53, S55).
Somewhere along this continuum of utility associated with integrating contextual data lie the findings regarding the influence of geocoded areas. Some studies reveal strong ecological effects whereas others reveal quite modest ecological influence, above and beyond the respondent’s information. Some of this variability is likely due to the topic investigated; some outcome variables are more contingent on environmental influence than others. Moreover, the type of data linkage may tap distinct aspects or levels of the environment, which may be uniquely associated with the outcome. Also contributing to the ambiguity is the critical question of the unit used for geocoding. It may be that one can observe the relationship only when it is measured at a certain level. Some geographical units may simply be too crude; there is so much heterogeneity in counties, for instance. The unit of measurement may obscure relationships that might be observed using a smaller unit of analysis.
To address the limitations of, and inconsistencies within, the literature, we designed a study using survey data augmented with two sources of contextual data. Using the HRS, we integrate subjective and objective neighborhood data from respondent observations, interviewer ratings, and census data to compare and contrast findings. Wahl et al. (2012) note that “there has been a failure to clearly specify the objective and subjective characteristics of the environment” and that comparatively less attention has been paid to the environmental context (p. 308). We also examine the person–environment interaction, which is often ignored in aging research. In addition, we vary the size of the ecological unit, aggregating up from census tracts to counties and states to demonstrate the importance of a meaningful unit for geocoded record linkages.
The HRS is ideal for this study because it contains both respondent observations and interviewer ratings and can be linked to the RAND Neighborhood Socioeconomic Status (NSES) Index. Although numerous studies link respondent data with either interviewer ratings or census data, very few integrate across the three sources. An exemplar is a study of New York City families with children involved in a housing relocation experiment that linked respondent data, interviewer ratings, and neighborhood census data (Leventhal & Brooks-Gunn, 2003). We are unaware of any comparable studies using a nationally representative sample to examine the utility of contextualization for older adults. Given that local environments are critical to the effective functioning of people who are older (Clarke et al., 2014), this study provides an unparalleled examination of the potential of multiple sources of contextualized data for older adults.
METHOD
Sample
We use data from the HRS, a long-term, longitudinal study (1992 and ongoing) spanning more than two decades. The HRS is a nationally representative study of adults age 51 and older living in the United States, with oversamples of Black and Hispanic adults. We draw on a subsample of data collected between 2004 and 2010 to illustrate the value of using a contextualized approach in studying later-life outcomes.
The subsample of data comprises adults age 51 and older who completed the Psychosocial Leave-Behind Questionnaire in 2006 or 2008, in addition to core interviews in 2004 and 2010. The self-administered questionnaire, which includes neighborhood information, was integrated into the study in 2006 and given to a random one-half sample of respondents in alternating survey years. Combining responses to the Psychosocial Leave-Behind Questionnaires in 2006 and 2008 provides complete data on the HRS sample. We exclude proxies from the sample, and limit it to only those with valid responses on neighborhood measures, resulting in an analytic sample of 7,272 respondents.
Measures
Functional limitations
We measure functional limitations using 11 items designed to capture problems with mobility. Specifically, respondents were asked whether they had “difficulty” with each of the following activities: walking several blocks; walking one block; sitting for hours; getting up from a chair; climbing several flights of stairs; climbing one flight of stairs; stooping, kneeling, or crouching; reaching arms above shoulder level; pushing or pulling large objects; lifting or carrying weight over 10 pounds; and picking up a dime from a table. If respondents answered affirmatively, they were coded 1 (0, otherwise). We summed the items to create an index of functional limitations in 2010, ranging from 0 to 11.
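The index construction described above can be sketched in Python as follows. The item keys and the `"yes"` response code are hypothetical stand-ins for the survey's own variable names and codes. In line with the coding rule stated above, any response other than an affirmative one is coded 0.

```python
# Hypothetical item keys for the 11 mobility-related activities.
ITEMS = [
    "walk_several_blocks", "walk_one_block", "sit_for_hours",
    "get_up_from_chair", "climb_several_flights", "climb_one_flight",
    "stoop_kneel_crouch", "reach_above_shoulder", "push_pull_large_objects",
    "lift_carry_10_lbs", "pick_up_dime",
]


def functional_limitations(responses):
    """Sum 11 binary difficulty indicators (1 = difficulty, 0 = otherwise)."""
    return sum(1 if responses.get(item) == "yes" else 0 for item in ITEMS)


# Illustrative respondent reporting difficulty with three activities.
example = {
    "walk_several_blocks": "yes",
    "climb_several_flights": "yes",
    "stoop_kneel_crouch": "yes",
    "walk_one_block": "no",
}
print(functional_limitations(example))
```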
Neighborhood measures
We gather neighborhood data from three distinct sources to compare and contrast the contextual utility of each.
First, we use respondent observations to create a scale of neighborhood physical disorder. Drawing on data from the self-administered questionnaires in 2006 and 2008, we use four items to gauge how respondents feel about the area in which they live. On a scale from 1 to 7, respondents indicated how strongly they agreed with each of the following statements: (1) “vandalism and graffiti are a big problem in this area,” (2) “people would be afraid to walk alone in this area after dark,” (3) “this area is always full of rubbish and litter,” and (4) “there are many vacant or deserted houses or storefronts in this area.” We averaged the four items using the row mean to create a scale of physical disorder (α = .64 in 2006, α = .83 in 2008), with higher scores corresponding to greater disorder.
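To make the scale construction concrete, the sketch below computes a row mean and Cronbach's alpha in Python. It rests on two assumptions not stated in the text: the row mean averages over whichever items are non-missing, and alpha is computed on complete cases with sample variances. The response values are invented for illustration.

```python
from statistics import mean, variance


def row_mean(items):
    """Average the non-missing items for one respondent (None = missing)."""
    observed = [x for x in items if x is not None]
    return mean(observed) if observed else None


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows is a list of equal-length lists, one list of item scores
    per respondent.
    """
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Invented responses on four 7-point items for five respondents.
rows = [
    [1, 2, 1, 1],
    [3, 3, 2, 4],
    [5, 6, 5, 4],
    [7, 6, 7, 6],
    [2, 1, 3, 2],
]
print([row_mean(r) for r in rows])
print(round(cronbach_alpha(rows), 2))
```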
Second, interviewer ratings from face-to-face interviews in 2004 were compiled to determine perceived neighborhood conditions among those not living in the area. We draw on four measures comparable to the neighborhood information provided by respondents: (1) We create a binary variable to capture whether vacant buildings are in the neighborhood, coded 1 for vacant buildings and 0, otherwise. (2) Other neighborhood conditions, such as vandalism, litter, boarded houses, and factories or warehouses, were summed together to create an index of poor conditions, ranging from 0 to 12. (3) We use an ordinal measure to examine how well kept structures are in the neighborhood, with response categories ranging from 1 (very well) to 4 (very poorly—dilapidated). (4) We include an ordinal measure comparing maintenance of the respondent’s home to other structures in the neighborhood, with response categories ranging from 1 (better than others) to 3 (worse than others).
Third, we use geocodes at the census tract level to obtain objective information on neighborhoods using the RAND NSES Index. We link each respondent’s residence in 2004 or earlier to census tracts defined in the Census 2000. The RAND NSES is a normalized index drawing on six indicators of socioeconomic status (SES) from the Census 2000: (1) percent of adults older than 25 with less than a high school education, (2) percent of males unemployed, (3) percent of households living in poverty, (4) percent of households receiving public assistance, (5) percent of female-headed households with children, and (6) median household income. Higher scores on the RAND NSES relate to higher SES. Using county and state geographic identifiers, we aggregate the RAND NSES (hereafter, referred to as neighborhood SES) to create two additional indices of SES at the county and state levels.
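The aggregation step can be illustrated with a short Python sketch. The text does not state how tract values were combined, so the unweighted mean used here is an assumption (a population-weighted mean would be an equally plausible choice). The tract codes and index values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tract-level index values keyed by 11-digit tract code
# (2-digit state + 3-digit county + 6-digit tract).
tract_nses = {
    "18157000100": 0.40,
    "18157000200": -0.20,
    "18097000100": 1.00,
    "17031000100": -1.00,
}


def aggregate(tract_values, prefix_len):
    """Average tract values within the unit given by the code's leading digits.

    The unweighted mean is an assumption made for illustration only.
    """
    groups = defaultdict(list)
    for tract, value in tract_values.items():
        groups[tract[:prefix_len]].append(value)
    return {unit: mean(vals) for unit, vals in groups.items()}


county_nses = aggregate(tract_nses, 5)  # state + county digits
state_nses = aggregate(tract_nses, 2)   # state digits only
print(county_nses)
print(state_nses)
```

Averaging over larger units pulls dissimilar tracts toward a common value, which is one way a coarse unit can mask neighborhood-level variation.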
Covariates
In addition to the key independent and dependent variables, we include important demographic and socioeconomic characteristics in the analysis: age, sex, race and ethnicity (non-Hispanic Black, non-Hispanic other, and Hispanic vs. non-Hispanic White), education (years of schooling), and adult SES (household wealth).
Analysis
We address our second aim—what is gained by contextual data—in three analytic stages. We began by presenting a correlation matrix of neighborhood measures. Although most survey data provide neighborhood information from respondents only, we find it useful to first examine the correlations among neighborhood measures derived from three sources: respondents, interviewers, and census data. If the correlations are high, the potential for value among less utilized neighborhood information, such as interviewer ratings, may be limited.
In the second stage of the analysis, we investigated the utility of neighborhood data from geocoded linkages to census information, interviewer reports, and respondent reports to contextualize potential influences on a selected measure of health, including an examination of whether certain individuals may be more vulnerable to contextual influences than others. Because neighborhood information from respondents is an important predictor of health—and available in many surveys—we also examined whether and how much geocoded neighborhood SES and interviewer reports contextualize respondent reports. We used multilevel negative binomial and linear regression models for functional limitations and physical disorder, respectively. Our analytic sample resulted in 7,272 respondents living in 2,905 census tracts (mean per tract = 2.5). We performed the analyses using menbreg for the negative binomial response and mixed for the continuous response in Stata 14.2.
The composite negative binomial model for the functional limitations of person i (Level 1) in census tract j (Level 2) is specified as follows, with independent variables presented in blocks:
where, β0 is the overall intercept; β1–5 and β6–7 are the regression coefficients for the Level-1 demographic and socioeconomic covariates, respectively; β8 is the regression coefficient for the Level-1 respondent observation, physical disorder; β9–12 are the regression coefficients for the Level-1 interviewer ratings; β13 is the regression coefficient for the Level-2 neighborhood SES; and ζ0ij models overdispersion.
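One way to write out a model consistent with these definitions is given below. It is a reconstruction under stated assumptions, not the authors' notation: a log link for the expected count $\mu_{ij}$ is assumed, and the block labels $D$ (demographic covariates), $S$ (socioeconomic covariates), $PD$ (physical disorder), $IR$ (interviewer ratings), and $NSES$ (neighborhood SES) are illustrative. Only the terms named in the definitions above are included.

```latex
\ln(\mu_{ij}) = \beta_0
  + \sum_{k=1}^{5} \beta_k D_{kij}
  + \sum_{k=6}^{7} \beta_k S_{kij}
  + \beta_8 \, PD_{ij}
  + \sum_{k=9}^{12} \beta_k IR_{kij}
  + \beta_{13} \, NSES_{j}
  + \zeta_{0ij}
```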
Similarly, the composite linear model for physical disorder is specified as follows:
where, β0 is the overall intercept; β1–5 and β6–7 are the regression coefficients for the Level-1 demographic and socioeconomic covariates, respectively; β8–11 are the regression coefficients for the Level-1 interviewer ratings; β12 is the regression coefficient for the Level-2 neighborhood SES; and ζ0j and εij are the Level-2 and Level-1 residuals, respectively.
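A corresponding reconstruction for the linear model, again a sketch that uses the same illustrative block labels ($D$, $S$, $IR$, $NSES$) rather than the authors' notation, with $PD_{ij}$ denoting the physical disorder score of person $i$ in census tract $j$:

```latex
PD_{ij} = \beta_0
  + \sum_{k=1}^{5} \beta_k D_{kij}
  + \sum_{k=6}^{7} \beta_k S_{kij}
  + \sum_{k=8}^{11} \beta_k IR_{kij}
  + \beta_{12} \, NSES_{j}
  + \zeta_{0j} + \varepsilon_{ij}
```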
In the third stage of the analysis, we address the question of whether the unit of measurement for geocoded data is critical to the conclusions and compare across three levels: state, county, and census tract. We return to the negative binomial model for functional limitations but use a single-level model for parsimony.
RESULTS
Table 1 presents a correlation matrix, where weak to moderate correlations are shown among most variables. For instance, physical disorder, which taps respondents’ own observations, is moderately correlated with neighborhood SES (r = −0.35), derived from census tract geocodes, and some, but not all, interviewer ratings (r = 0.27 for upkeep of nearby structures). In addition, a comparison of interviewer and geocoded data reveals moderate correlations: neighborhood SES is correlated with poor neighborhood conditions and upkeep of nearby structures at −0.36 and −0.46, respectively. Interestingly, correlations among interviewers’ ratings are not especially strong (|r| ≤ 0.54 for all), indicating that each measure provides unique information not captured by other interviewer observations.
TABLE 1.
Correlations among neighborhood measures
| Measure | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1. Physical disorder^a | 1 | | | | | |
| 2. Vacant buildings^b | 0.0897 | 1 | | | | |
| 3. Poor conditions^b | 0.2007 | 0.4053 | 1 | | | |
| 4. Upkeep of nearby structures^b | 0.2670 | 0.2391 | 0.5436 | 1 | | |
| 5. Relative maintenance of home^b | 0.0027 | −0.0349 | 0.0068 | −0.0406 | 1 | |
| 6. Neighborhood SES^c | −0.3532 | −0.1788 | −0.3630 | −0.4604 | 0.0534 | 1 |

^a Respondent report; ^b Interviewer report; ^c RAND Neighborhood Socioeconomic Status Index from census tract data.
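For readers who wish to build a comparable matrix from linked data, a minimal Python sketch follows. It assumes Pearson product-moment correlations, which the text does not specify, and the variable names and values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict of pairwise correlations among named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}


# Invented values for three neighborhood measures on six respondents.
data = {
    "physical_disorder": [1.0, 2.5, 4.0, 3.0, 6.0, 5.5],
    "poor_conditions":   [0,   1,   2,   0,   5,   3],
    "neighborhood_ses":  [1.2, 0.4, -0.3, 0.8, -1.5, -0.9],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```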
Next, we assessed the contextualization of functional limitations among older adults, including whether it is sensitive to the source of neighborhood data. We began with a reduced model that includes demographic and socioeconomic covariates only. We then compared the results from the reduced model to models that control separately for physical disorder, interviewer ratings, and neighborhood SES before combining all covariates together in the full model.
Figure 1 presents the incidence-rate ratios from the estimated multilevel negative binomial models for functional limitations. The reduced model, with no contextualization, serves as a reference estimate of functional limitations after adjusting for the seven covariates only. The succeeding bars for each independent variable depict change in the effects of the covariates on functional limitations, compared to the reduced model, and after adjusting for the three sources of neighborhood information (i.e., respondent, interviewer, and census data).
FIGURE 1.
Functional limitations in the Health and Retirement Study (2004–2010): Change in incidence-rate ratios across sources of neighborhood data. Gray line represents an incidence-rate ratio equal to 1. Race/ethnicity of the respondent is measured using a series of binary variables; reference group = White. *p < .05. **p < .01. ***p < .001 (two-tailed tests).
In the reduced model, the incidence-rate ratios for age, female, and race were greater than 1, indicating that being older, female, and Black were all associated with more functional limitations. By comparison, higher educational attainment and wealth had incidence-rate ratios less than 1 and were thus related to fewer functional limitations. The incorporation of neighborhood data had the greatest influence on race and ethnicity, whereas the contextual influence was negligible for other covariates. Specifically, the effect of Black was rendered nonsignificant after adjusting for neighborhood conditions, regardless of the data source. In addition, the effect of Hispanic was nonsignificant in the reduced model, but became significant after adjusting for the census data.
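As a brief aid to interpretation, an incidence-rate ratio is the exponentiated coefficient from a count model with a log link. The Python sketch below shows the conversion. The coefficients are hypothetical values chosen for illustration and are not estimates from this study.

```python
import math


def incidence_rate_ratio(coef):
    """Exponentiate a log-link coefficient to obtain an incidence-rate ratio."""
    return math.exp(coef)


def percent_change(coef):
    """Percent change in the expected count per one-unit increase."""
    return (math.exp(coef) - 1) * 100


# Hypothetical coefficients, for illustration only.
for label, b in [("positive coefficient", 0.10), ("negative coefficient", -0.05)]:
    print(label,
          round(incidence_rate_ratio(b), 3),
          round(percent_change(b), 1))
```

A ratio above 1 corresponds to a higher expected count of functional limitations, and a ratio below 1 to a lower expected count, holding the other covariates constant.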
We also tested an interaction between race/ethnicity and neighborhood SES to examine whether some individuals may be more susceptible to functional limitations than others in a given context. Figure 2 displays the adjusted predictions of functional limitations by race/ethnicity and neighborhood SES. The results show that the differences in functional limitations are modest in the higher SES neighborhoods—good for all groups. By contrast, Figure 2 reveals that, net of wealth, White older adults experience more functional limitations than Black and Hispanic older adults in low-SES neighborhoods.
FIGURE 2.
Adjusted predictions of functional limitations by race/ethnicity and neighborhood socioeconomic status (SES) in the Health and Retirement Study (2004–2010). Middle 80% of cases on neighborhood SES are presented to exclude predictions based on nonobserved data.
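To make the logic of the cross-level interaction concrete, the following minimal sketch shows how, in a model with a log link, a group's incidence-rate ratio relative to the reference group changes with neighborhood SES: the main effect and the interaction add on the log scale and therefore multiply on the incidence-rate-ratio scale. The function name and the numeric values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def relative_irr(irr_group: float, irr_interaction: float, ses: float) -> float:
    """IRR for a group relative to the reference group at a given SES value.

    With a log link, the group main effect and the group-by-SES interaction
    add on the log scale, so they multiply on the IRR scale.
    """
    return math.exp(math.log(irr_group) + ses * math.log(irr_interaction))


# Hypothetical values for illustration only (not estimates from the study).
IRR_GROUP = 0.80        # group vs. reference where SES = 0
IRR_INTERACTION = 1.02  # multiplicative change in that ratio per SES unit

for ses in (0, 5, 10, 15):
    print(ses, round(relative_irr(IRR_GROUP, IRR_INTERACTION, ses), 3))
```

Under these assumed values, the ratio sits furthest below 1 at the lowest SES value and moves toward 1 as SES rises, which is the general pattern of a group difference that is pronounced in low-SES neighborhoods and modest in higher-SES neighborhoods.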
Given the centrality of the respondent’s assessment of neighborhood physical disorder in the aforementioned model, we also used multilevel models to examine physical disorder as an outcome, adding interviewer and census data in parallel fashion. We compared the results from the reduced model, featuring demographic and socioeconomic covariates only, to models that adjust separately for interviewer ratings and neighborhood SES before combining them in the full model.
Figure 3 presents the percent decrease in unstandardized regression coefficients after accounting for the different sources of neighborhood data. As Figure 3 shows, the geocoded data accounted for much of the covariate effects on respondents’ observations, whereas the impact of interviewer ratings was comparatively smaller. For instance, the inclusion of census tract data reduced the effect of Black by more than half (64% decrease), compared to a 26% decrease after adjusting for interviewer observations. Similarly, the effect of Hispanic was reduced by 103% after adjusting for the geocoded data, compared to an 18% decrease after incorporating interviewer ratings. One exception to the consistently strong effects of the geocoded data relates to age, where only interviewer ratings produced a reduction in the coefficient for age (59%).
FIGURE 3.
Physical disorder in the Health and Retirement Study (2004–2008): Percent decrease in unstandardized regression coefficients across sources of neighborhood data. Race/ethnicity of the respondent is measured using a series of binary variables; reference group = White.
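The percent-decrease metric can be sketched as follows, assuming the conventional definition in which the change in a coefficient is expressed relative to its reduced-model value. The function name and the coefficient values are hypothetical and serve only to illustrate the arithmetic, including why a decrease can exceed 100%.

```python
def percent_decrease(b_reduced: float, b_adjusted: float) -> float:
    """Percent decrease in a coefficient after adding contextual covariates."""
    if b_reduced == 0:
        raise ValueError("reduced-model coefficient must be nonzero")
    return 100.0 * (b_reduced - b_adjusted) / b_reduced


# Hypothetical coefficients, chosen only to illustrate the arithmetic.
attenuated = percent_decrease(0.50, 0.18)  # coefficient shrinks toward zero
sign_flip = percent_decrease(0.20, -0.01)  # coefficient crosses zero

print(round(attenuated, 1), round(sign_flip, 1))
```

Under this definition, a value above 100% indicates that the adjusted coefficient has changed sign relative to the reduced model rather than merely shrinking toward zero.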
The results presented thus far provide evidence that contextual variables are useful to enhance the utility of survey data. When studying functional limitations in later life, moreover, these results reveal that the addition of neighborhood data is consequential because it leads to more nuanced conclusions about race and Hispanic ethnicity and the role of environment in shaping health disparities. The results for the fully adjusted models displayed in Figures 1–3 are presented in Table 2.
TABLE 2.
Fully adjusted multilevel models of functional limitations and physical disorder
Variable | Functional Limitationsᵃ (n = 7,264) | Functional Limitationsᵃ (n = 7,264) | Physical Disorderᵇ (n = 7,266) |
---|---|---|---|
Individual-level variables | | | |
Age | 1.023*** [1.020, 1.025] | 1.023*** [1.020, 1.026] | −0.002 [−0.006, 0.001] |
Female | 1.300*** [1.238, 1.366] | 1.300*** [1.238, 1.365] | 0.003 [−0.054, 0.061] |
Blackᶜ | 0.938 [0.866, 1.017] | 0.243*** [0.121, 0.491] | 0.172** [0.065, 0.278] |
Otherᶜ | 0.970 [0.819, 1.148] | 0.414 [0.064, 2.682] | 0.161 [−0.043, 0.365] |
Hispanicᶜ | 0.898* [0.812, 0.992] | 0.303** [0.131, 0.699] | −0.004 [−0.133, 0.125] |
Education | 0.963*** [0.954, 0.971] | 0.963*** [0.955, 0.972] | −0.035*** [−0.046, −0.024] |
Wealth | 0.972*** [0.966, 0.977] | 0.972*** [0.966, 0.977] | −0.011** [−0.018, −0.004] |
Physical disorder | 1.053*** [1.034, 1.073] | 1.051*** [1.032, 1.071] | |
Vacant buildings | 1.026 [0.866, 1.215] | 1.032 [0.872, 1.222] | −0.013 [−0.223, 0.197] |
Poor conditions | 0.983 [0.953, 1.014] | 0.988 [0.957, 1.020] | 0.040 [−0.000, 0.080] |
Upkeep of nearby structures | 1.166*** [1.112, 1.222] | 1.161*** [1.108, 1.217] | 0.194*** [0.134, 0.254] |
Relative maintenance of home | 1.119*** [1.065, 1.175] | 1.120*** [1.066, 1.176] | 0.020 [−0.040, 0.081] |
Census-tract level variable | | | |
Neighborhood SES | 0.992** [0.988, 0.997] | 0.985*** [0.979, 0.991] | −0.047*** [−0.053, −0.042] |
Cross-level interactions | | | |
Neighborhood SES × Blackᶜ | 1.018*** [1.009, 1.028] | | |
Neighborhood SES × Otherᶜ | 1.011 [0.987, 1.035] | | |
Neighborhood SES × Hispanicᶜ | 1.014* [1.003, 1.026] | | |
Constant | 1.374 [0.877, 2.151] | 2.504** [1.462, 4.291] | 6.552*** [5.988, 7.116] |
−2 log likelihood | 32302.720 | 32286.402 | 23700.524 |
Note. ᵃ Column presents results for negative binomial multilevel model; incidence-rate ratios with 95% confidence intervals in brackets. ᵇ Column presents results for linear multilevel model; unstandardized regression coefficients with 95% confidence intervals in brackets. ᶜ Reference = White.
*p < .05. **p < .01. ***p < .001 (two-tailed tests).
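As a reading aid for the incidence-rate ratios in Table 2, the sketch below converts a ratio into the implied percent change in the expected count of functional limitations per one-unit increase in a predictor (or, for a binary variable such as Female, relative to the reference category). The ratios are copied from the first Functional Limitations column of Table 2; the function name is hypothetical, and the conversion is the standard interpretation of a ratio from a log-link count model rather than a computation reported by the authors.

```python
def irr_to_percent_change(irr: float) -> float:
    """Percent change in the expected count per one-unit increase in a predictor."""
    return round((irr - 1.0) * 100.0, 1)


# Incidence-rate ratios copied from the first Functional Limitations column of Table 2.
table2_irr = {"Age": 1.023, "Female": 1.300, "Education": 0.963, "Wealth": 0.972}

percent_change = {name: irr_to_percent_change(irr) for name, irr in table2_irr.items()}
for name, pct in percent_change.items():
    print(f"{name}: {pct:+.1f}% expected functional limitations per unit")
```

Read this way, a ratio of 1.300 for Female corresponds to an expected count about 30% higher than for men, and a ratio of 0.963 for Education corresponds to an expected count about 3.7% lower per additional unit of education, holding the other covariates constant.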
With evidence that geocoded data add value to investigations of widely used outcome variables, the third stage of our analysis zeroes in on the question of the size of the geocoded unit. Does the unit of measurement for geocoded data threaten the integrity of the conclusions raised earlier? To examine this question, we used a single-level regression model to compare the results from the reduced model to models that adjust for neighborhood SES summarized at three levels: state, county, and census tract. Given the salience of Black race and Hispanic ethnicity in the preceding analyses, we focus on these variables, examining the impact of contextual data on these covariates only. We began with a reduced model that included demographic and socioeconomic covariates only.
Figure 4 presents incidence-rate ratios from the estimated negative binomial regression models predicting functional limitations, enabling us to examine the relative influence of neighborhood SES summarized at three different levels (state, county, and census tract) on the race/ethnicity–health relationship. In the reduced model, being Black is associated with increased functional limitations. Once neighborhood SES is incorporated at the state, county, or census tract level, however, the effect of Black becomes nonsignificant. Interestingly, the effect of Hispanic is significant only after adjusting for neighborhood SES at the census tract level; otherwise, the effect is concealed. Thus, we observe some benefit from using geocoded data summarized at smaller geographic units, but the evidence also suggests that larger, more heterogeneous units are still helpful in obtaining reasonable approximations of relationships.
FIGURE 4.
Racial and ethnic differences in functional limitations in the Health and Retirement Study (2004–2010): Change in incidence-rate ratios across levels of neighborhood SES. Gray line represents an incidence-rate ratio equal to 1. Models adjust for demographic and socioeconomic covariates. *p < .05. **p < .01. ***p < .001 (two-tailed tests).
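One way to summarize a tract-level SES measure at coarser geographic units is sketched below. It relies on the structure of the 11-digit census tract GEOID (2-digit state FIPS code, 3-digit county FIPS code, 6-digit tract code) and takes an unweighted mean of tract scores within each unit. The SES scores, the function name, and the choice of an unweighted mean are assumptions made for illustration; the aggregation procedure actually used in the study may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tract-level SES scores keyed by 11-digit census tract GEOID
# (2-digit state FIPS + 3-digit county FIPS + 6-digit tract code).
tract_ses = {
    "18157000100": 42.0,
    "18157000200": 58.0,
    "18097350100": 35.0,
    "48309000100": 61.0,
}


def aggregate_ses(tract_scores: dict[str, float], prefix_len: int) -> dict[str, float]:
    """Average tract SES within the unit given by the first prefix_len GEOID digits."""
    groups = defaultdict(list)
    for geoid, score in tract_scores.items():
        groups[geoid[:prefix_len]].append(score)
    return {unit: mean(scores) for unit, scores in groups.items()}


county_ses = aggregate_ses(tract_ses, 5)  # state + county FIPS
state_ses = aggregate_ses(tract_ses, 2)   # state FIPS only

print(county_ses)
print(state_ses)
```

Each respondent would then be assigned the tract, county, or state value that matches the leading digits of his or her tract identifier, so that the same regression can be re-estimated with SES measured at each level.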
DISCUSSION
As innovative analytic methods, rich data, and cross-disciplinary perspectives of human development have intersected, researchers increasingly underscore the importance of context, emphasizing time and place. Elder, Shanahan, and Jennings (2015) argue that applying a life course perspective led to a “greater appreciation for the necessity of longitudinal and contextually rich data” (p. 11). However, Wahl et al. (2012) note that longitudinal analysis is not sufficient for understanding aging individuals; interactions between the person and environment should be considered along with potential variation between and within cohorts. Important features of understanding how individuals experience their environment include not only their perceptions of the environment—captured by survey questionnaires—but also the objective properties of the environment (Bronfenbrenner, 1977)—captured via data linkages. In this study, we reviewed multiple methods and sources of information for contextualizing survey data to assess what is gained by doing so.
Survey data are meant to capture real-world experiences within natural environments; however, individual responses to some inquiries may not accurately represent the environment. Although in some instances this may be preferred from a theoretical perspective, other times it is important to obtain an accurate representation of the ecological context. Moreover, it is likely that individuals are not aware of some environmental influences, especially if they are not derived from the immediate setting (Bronfenbrenner, 1977). For instance, our examination of the correlations among respondent, interviewer, and geocoded neighborhood measures reveals moderate correlations at best. Thus, incorporating outside evaluations, more objective measures, and/or accounting for between- and within-group differences may add information beyond the perceptions of the respondent. Perhaps most important, however, is to examine potential record linkages and include contextual elements at the unit of analysis most appropriate for the outcome and research question. Whereas some of these sources of contextual data are fairly common across fields of study, others such as the use of interviewer data, or a combination of sources, are less common.
Our demonstration offers three important insights into contextualizing survey data: (1) contextualizing survey data not only enables researchers to obtain more precise estimates, but also contributes to a more evidence-based understanding of health and aging; (2) researchers may draw on different sources for contextual information, but some may be more valuable than others; and (3) unit of analysis is important to consider when adding contextual variables, but coarse units are still helpful.
First, failing to account for context may lead to misguided conclusions related to intra- and interindividual variability. Although we found that one can obtain reasonable estimates without contextual data, one can obtain more precise estimates and learn important information by including them. Without accounting for ecological variables, researchers may inadvertently attribute differences to individual factors or speculate about “environmental influence.” Integrating ecological factors into analyses may actually show the importance of individual attributes and behavior, but the claim of influence due to individual factors is more compelling when accounting for ecological variables.
In more specific terms, we found that the addition of neighborhood measures clearly influenced some estimates (i.e., Black and Hispanic), whereas other estimates remained essentially unaffected (i.e., age, female). Cagney, Browning, and Wen (2005) report similar findings for self-rated health—after adding neighborhood-level SES measures to their models, the effect of Black on self-rated health was no longer significant. By including information about the environment in which people live, we were able to clarify race differences in our physical health outcome.
Although adding neighborhood SES may have rendered the Black difference nonsignificant (previously significant in reduced models), research on racial residential segregation suggests that neighborhood SES may not be an alternative explanation for health disparities, but rather another indicator for race or social inequality. The effect of neighborhood context on health is well established, with studies showing that exposure to poor neighborhood conditions, such as crime, air pollution, and vacant housing, negatively influences health. Such conditions may be the true engines of inequality because they create barriers to healthy behaviors and health services and lead to other risk factors such as physical inactivity (Williams & Collins, 2001). Segregation persists in the United States, and Black adults are more likely than White adults to live in disadvantaged neighborhoods. Williams and Collins (2001) thus argue that racial residential segregation causes race differences in SES, which, in turn, shape health disparities. In addition, Link and Phelan’s (1995) fundamental cause theory labels SES a “fundamental cause of disease,” but because resources are strongly patterned by race, the authors later argued that racism, too, may be a fundamental cause of health inequalities (Phelan & Link, 2015).
We also tested a person–environment interaction to examine whether, in a given context, some individuals are more vulnerable to functional limitations than others. We found that, net of wealth, White older adults in low-SES neighborhoods experienced more functional limitations than Black and Hispanic older adults in low-SES neighborhoods. Although minority adults have fewer resources, on average, prior research indicates that Black older adults perceive levels of financial strain that are comparable to White older adults (Kahn & Fazio, 2005). Drawing on the concept of relative deprivation, the authors suggest that, because minority adults have been disproportionately exposed to poverty, Black older adults may “consider themselves to be relatively better off, in spite of their own probable low financial status” (p. 79); the same could be suggested for Hispanic older adults. This is noteworthy given repeated evidence that perceptions matter more for adult health than objective circumstances (e.g., Singh-Manoux, Marmot, & Adler, 2005). Similar to our findings, Geronimus et al. (2015) showed that the negative effect of low SES on biological health was stronger for White adults than for Black adults, a finding the authors suggest may reflect a lack of “collective strategies for pooling risk that buffer the negative health effects of material deprivation and stigma for other low-income groups” (p. 16).
Second, some sources of data may be more appropriate than others, and thus may reveal more information. Perhaps most noteworthy in our results, the neighborhood measures did not always affect the estimates similarly. Compared to the subjective respondent and interviewer evaluations of neighborhood characteristics, objective census tract data are important for ecological validity because they led to disparate conclusions about Hispanic ethnicity. Differences in functional limitations by Hispanic ethnicity were found only in models with the geocoded neighborhood SES variable. Whereas any neighborhood information attenuated the effect of Black, neighborhood SES revealed differences in functional limitations by ethnicity. Moreover, these differences point to what researchers describe as the Hispanic paradox (Markides & Coreil, 1986): Hispanic adults were found to have fewer functional limitations than non-Hispanic adults after adjusting for neighborhood SES. Failing to include the objective measure of neighborhood SES would lead researchers to misguided conclusions about racial and ethnic disparities in functional limitations. This conclusion is likely specific to our analyses of health outcomes.
Deciding upon the best source of information to contextualize data depends upon the research question and guiding theory, and the value added by each likely depends upon the outcome of interest. For instance, neighborhood interviewer observations such as broken windows and graffiti as well as geocoded SES data likely tap different aspects of the social and environmental surroundings, which may be distinctly associated with certain outcomes. If researchers do not have access to data linkages such as interviewer observations and geocoded data, variables or groups of variables may be utilized to account for some contextual effects. It is incumbent on the researcher to explore and take advantage of available data.
Third, and similar to the previous comment, unit of analysis is important to consider when adding contextual variables. Although we show that coarse units are still helpful, some units may be more appropriate than others depending upon the research question and variables of interest. We aggregated the neighborhood SES index created with census data to compare across three geographic units of analysis: state, county, and census tract. As shown in Figure 4, neighborhood SES at the smallest geographic unit—census tract—had the greatest impact on results. Although the unit of analysis does not change the conclusions for the effect of Black, the effect of Hispanic is significant when using the census-tract level variable only. Thus, we learn more by using smaller units of analysis, which are often preferred over larger units for studies of health and place. If needed, one may consider “neighborhood clusters” that aggregate two or more contiguous census tracts, without aggregating to counties (Lee & Ferraro, 2007). Choosing the appropriate unit of analysis should also be guided by theory and/or the availability of data and methods.
We aimed to investigate the value associated with integrating contextual data and to illustrate it empirically, but the theory guiding a study is pivotal to determining the value of contextual data. For instance, integrating information on social context (e.g., household data) would likely be of great value to a study that draws on the principle of linked lives; incorporating environmental context (e.g., geocoded data) may add value to a study focused on person–environment fit; and temporal context (e.g., longitudinal data) is imperative for studies utilizing theories of accumulation. Moreover, a combination of linkages may offer the most value; however, the value added and the appropriate linkages depend on the underlying theory and research question. Stated differently, though some theories have begun to outline multiple dimensions of context (e.g., ecological theory of aging, Wahl et al., 2012; life course perspective, Elder, 1998), suggesting that all would add value, a researcher may be interested in a particular mechanism that another, more specific theory emphasizes (e.g., socio-emotional selectivity theory, cumulative inequality theory).
Although the analyses were meant to explore what, if any, additional information can be gathered by contextualizing data, the results must be interpreted with some limitations in mind. First, roughly 27% of cases were missing interviewer observations. We are unable to test whether these missing cases were systematic and thus a source of bias; interviewers may be less likely to rate certain neighborhoods altogether, such as those with atypical characteristics like a single building. Second, it may be argued that the neighborhood SES measure used in this study represents compositional effects (relating to the distribution of people with similar characteristics) and is not comparable to interviewer observations, which represent contextual effects (relating to the social and physical environment in which people live; Curtis & Jones, 1998). Distinguishing between ecological influence and ecological composition is challenging and likely requires temporal ordering of both survey and contextual data. Third, aggregating smaller units up to a larger unit of analysis may not offer the same scientific benefit as what is derived from analyses using the smaller units. For instance, our state-level neighborhood SES variable was based on samples from census tracts; a similar but distinct variable could be constructed by drawing the sample from the state population, which we expect would be even less useful.
Despite these limitations, this study is distinctive in examining the value of contextualizing survey data by linking respondent information to both interviewer and geocoded census data. Incorporating neighborhood data influenced the conclusions related to race and ethnicity, suggesting the importance of context in the daily lives of older racial and ethnic minorities. Of note, the other variables in both models remained fairly consistent, indicating that the contextual effects of neighborhood on health, no matter the source of the measure, are most strongly related to race and ethnicity.
There are many ways in which researchers can supplement data to account for historical and ecological context. For instance, Lê-Scherban et al. (2014) examined the relation between changes in neighborhood composition and body mass index over time using geocoded data and longitudinal analysis. Others suggest intriguing combinations of adding contextual data such as integrating ethnographic data and geographic information system (GIS) technologies to explore families in neighborhoods in time and space (Matthews, Detwiler, & Burton, 2005). With access to rich data and various record linkages, the horizon of possibilities is vast. Although not all linkages may be helpful and contextual effects may not be manifest in all results, failing to account for social, environmental, and historical context may lead to misguided conclusions. Theories of human development, aging, and health have long directed scholars to focus on contextual effects, and the data for capturing these influences continue to proliferate.
Acknowledgments
FUNDING
Support for this research was provided by a grant from the National Institute on Aging to K. Ferraro (AG043544).
Contributor Information
Lindsay R. Wilkinson, Baylor University
Kenneth F. Ferraro, Purdue University
Blakelee R. Kemp, Purdue University
References
- Antonucci TC. Social relations: An examination of social networks, social support, and sense of control. In: Birren JE, Schaie KW, editors. Handbook of the psychology of aging. 5. San Diego, CA: Academic Press; 2001. p. 427.
- Baltes PB. Theoretical propositions of life-span developmental psychology: On the dynamics between growth and decline. Developmental Psychology. 1987;23(5):611–626. doi: 10.1037/0012-1649.23.5.611.
- Boyd LH, Iversen GR. Contextual analysis: Concepts and statistical techniques. Belmont, CA: Wadsworth Publishing; 1979.
- Bronfenbrenner U. Toward an experimental ecology of human development. American Psychologist. 1977;32(7):513–531. doi: 10.1037/0003-066X.32.7.513.
- Cagney KA, Browning CR, Wen M. Racial disparities in self-rated health at older ages: What difference does the neighborhood make? Journals of Gerontology Series B: Psychological Sciences and Social Sciences. 2005;60(4):S181–S190. doi: 10.1093/geronb/60.4.S181.
- Clarke P, Morenoff J, Debbink M, Golberstein E, Elliott MR, Lantz PM. Cumulative exposure to neighborhood context: Consequences for health transitions over the adult life course. Research on Aging. 2014;36(1):115–142. doi: 10.1177/0164027512470702.
- Cornwell EY, Cagney KA. Assessment of neighborhood context in a nationally representative study. Journal of Gerontology: Social Sciences. 2014;69(Suppl 2):S51–S63. doi: 10.1093/geronb/gbu052.
- Crimmins EM, Kim JK, Solé-Auró A. Gender differences in health: Results from SHARE, ELSA and HRS. European Journal of Public Health. 2010;21(1):81–91. doi: 10.1093/eurpub/ckq022.
- Curtis S, Rees Jones I. Is there a place for geography in the analysis of health inequality? Sociology of Health & Illness. 1998;20(5):645–672. doi: 10.1111/1467-9566.00123.
- Deindl C, Brandt M, Hank K. Social networks, social cohesion, and later-life health. Social Indicators Research. 2016;126(3):1175–1187. doi: 10.1007/s11205-015-0926-5.
- Elder GH. The life course as developmental theory. Child Development. 1998;69:1–12. doi: 10.1111/j.1467-8624.1998.tb06128.x.
- Elder GH, Shanahan MJ, Jennings JA. Human development in time and place. In: Lerner RM, editor. Handbook of child psychology and developmental science. Vol. 4. Hoboken, NJ: Wiley; 2015. pp. 6–54.
- Feng Q, Zhu H, Zhen Z, Gu D. Self-rated health, interviewer-rated health, and their predictive powers on mortality in old age. Journal of Gerontology: Social Sciences. 2016;71(3):538–550. doi: 10.1093/geronb/gbu186.
- Ferraro KF, Shippee TP. Aging and cumulative inequality: How does inequality get under the skin? The Gerontologist. 2009;49:333–343. doi: 10.1093/geront/gnp034.
- Fingerman KL, Pillemer KA, Silverstein M, Suitor JJ. The baby boomers’ intergenerational relationships. The Gerontologist. 2012;52(2):199–209. doi: 10.1093/geront/gnr139.
- Geronimus AT, Pearson JA, Linnenbringer E, Schulz AJ, Reyes AG, Epel ES, … Blackburn EH. Race/ethnicity, poverty, urban stressors, and telomere length in a Detroit community-based sample. Journal of Health and Social Behavior. 2015;56(2):199–224. doi: 10.1177/0022146515582100.
- Glass TA, Balfour JL. Neighborhoods, aging, and functional limitations. In: Kawachi I, Berkman LF, editors. Neighborhoods and health. New York, NY: Oxford University Press; 2003. pp. 303–334.
- Kahn JR, Fazio EM. Economic status over the life course and racial disparities in health. Journal of Gerontology: Social Sciences. 2005;60(2):S76–S84. doi: 10.1093/geronb/60.Special_Issue_2.S76.
- King AC, Sallis JF, Frank LD, Saelens BE, Cain K, Conway TL, … Kerr J. Aging in neighborhoods differing in walkability and income: Associations with physical activity and obesity in older adults. Social Science & Medicine. 2011;73(10):1525–1533. doi: 10.1016/j.socscimed.2011.08.032.
- Krieger N, Zierler S, Hogan JW, Waterman P, Chen J, Lemieux K, Gjelsvik A. Geocoding and measurement of neighborhood socioeconomic position: A U.S. perspective. In: Kawachi I, Berkman LF, editors. Neighborhoods and health. New York, NY: Oxford University Press; 2003. pp. 147–178.
- Lawton MP, Nahemow L. Ecology and the aging process. In: Eisdorfer C, Lawton MP, editors. The psychology of adult development and aging. Washington, DC: American Psychological Association; 1973. pp. 619–674.
- Lee MA, Ferraro KF. Neighborhood residential segregation and physical health among Hispanic Americans: Good, bad, or benign? Journal of Health and Social Behavior. 2007;48(2):131–148. doi: 10.1177/002214650704800203.
- Lê-Scherban F, Albrecht SS, Osypuk TL, Sánchez BN, Diez Roux AV. Neighborhood ethnic composition, spatial assimilation, and change in body mass index over time among Hispanic and Chinese immigrants: Multi-ethnic study of atherosclerosis. American Journal of Public Health. 2014;104(11):2138–2146. doi: 10.2105/AJPH.2014.302154.
- Leventhal T, Brooks-Gunn J. Moving to opportunity: An experimental study of neighborhood effects on mental health. American Journal of Public Health. 2003;93(9):1576–1582. doi: 10.2105/AJPH.93.9.1576.
- Link BG, Phelan JC. Social conditions as fundamental causes of disease. Journal of Health and Social Behavior. 1995;35:80–94. doi: 10.2307/2626958.
- Markides KS, Coreil J. The health of Hispanics in the southwestern United States: An epidemiologic paradox. Public Health Reports. 1986;101(3):253–265.
- Matthews SA, Detwiler JE, Burton LM. Geo-ethnography: Coupling geographic information analysis techniques with ethnographic methods in urban research. Cartographica: the International Journal for Geographic Information and Geovisualization. 2005;40(4):75–90. doi: 10.3138/2288-1450-W061-R664.
- Mayer KU. An observatory for life courses: Populations, countries, institutions, and history. Research in Human Development. 2015;12(3/4):196–201. doi: 10.1080/15427609.2015.1068051.
- Phelan JC, Link BG. Is racism a fundamental cause of inequalities in health? Annual Review of Sociology. 2015;41:311–330. doi: 10.1146/annurev-soc-073014-112305.
- Phillipson C, Scharf T. Rural and urban perspectives on growing old: Developing a new research agenda. European Journal of Ageing. 2005;2(2):67–75. doi: 10.1007/s10433-005-0024-7.
- Rogosa D. Myths about longitudinal research. In: Schaie KW, Campbell RT, Meredith W, Rawlings SC, editors. Methodological issues in aging research. New York, NY: Springer Publishing; 1988. pp. 43–69.
- Schmuckler MA. What is ecological validity? A dimensional analysis. Infancy. 2001;2(4):419–436. doi: 10.1207/S15327078IN0204_02.
- Singh-Manoux A, Marmot MG, Adler NE. Does subjective social status predict health and change in health status better than objective status? Psychosomatic Medicine. 2005;67:855–861. doi: 10.1097/01.psy.0000188434.52941.a0.
- Wahl HW, Iwarsson S, Oswald F. Aging well and the environment: Toward an integrative model and research agenda for the future. The Gerontologist. 2012;52(3):306–316. doi: 10.1093/geront/gnr154.
- Williams DR, Collins C. Racial residential segregation: A fundamental cause of racial disparities in health. Public Health Reports. 2001;116(5):404–416. doi: 10.1016/S0033-3549(04)50068-7.