Abstract
Objectives
The National Social Life, Health, and Aging Project assessed functioning of all 5 senses using both self-report and objective measures. We evaluate the performance of the objective measures and model differences in sensory function by gender and age. In the process, we demonstrate how to use and interpret these measures.
Methods
Distance vision was assessed using a standard Sloan eye chart, and touch was measured using a stationary 2-point discrimination test applied to the index fingertip of the dominant hand. Olfactory function (both intensity detection and odor identification) was assessed using odorants administered via felt-tip pens. Gustatory function was measured via identification of four taste strips.
Results
The performance of the objective measures was similar to that reported for previous studies, as was the relationship between sensory function and both gender and age.
Discussion
Sensory function is important in studies of aging and health both because it is an important health outcome and also because a decline in functioning can be symptomatic of or predict other health conditions. Although the objective measures provide considerably more precision than the self-report items, the latter can be valuable for imputation of missing data and for understanding differences in how older adults perceive their own sensory ability.
As part of its attempt to obtain a comprehensive assessment of respondent health, the National Social Life, Health, and Aging Project (NSHAP) included measurements of each of the five senses. By taking advantage of several recent advances in the in-home collection of biomeasures (Lindau & McDade, 2007), NSHAP was able to obtain biomeasures of visual, tactile (touch), gustatory (taste), and olfactory (smell) function (auditory function was assessed via self-report only). The purposes of this article were to describe the methods by which these measurements were obtained, to report on the quality of the resulting data, and to illustrate how these data may be used analytically.
Sensory function is an important aspect of health, especially as people age. Declines in sensory function may be symptomatic of underlying disease and can affect personal safety (Anstey, Wood, Lord, & Walker, 2005), quality of life, and perceived health (Ostbye et al., 2006). In addition, a decline in sensory function may limit participation in intimate relationships and other types of social activities, which may in turn have additional negative consequences for health. Exploring this type of dynamic process in which health and social interaction are intertwined was one of NSHAP’s primary objectives (Lindau, Laumann, Levinson, & Waite, 2003).
Several population and clinical studies have documented age-related decline in sensory function. For example, studies in the United States (Murphy et al., 2002), Germany (Hummel, Kobal, Gudziol, & Mackay-Sim, 2007; Landis, Konnerth, & Hummel, 2004), and Sweden (Bramerson, Johansson, Ek, Nordin, & Bende, 2004) have found the likelihood of olfactory dysfunction to increase substantially after age 55, affecting a larger proportion of the population than previously thought (Landis & Hummel, 2006). Data for gustation are limited to much smaller clinical studies but also suggest a decline in gustatory function with age independent of the decline in olfaction (Fukunaga, Uematsu, & Sugimoto, 2005; Seiberling & Conley, 2004). National data on vision and hearing are available from the National Health and Nutrition Examination Survey and indicate that the presence of both visual and hearing impairments increases substantially with age (Li, Healy, Wanzer Drane, & Zhang, 2006; Vitale, Cotch, & Sperduto, 2006). Finally, a few small studies have found a decrease in hand sensibility with age (Desrosiers, Hebert, Bravo, & Dutil, 1996; Ranganathan, Siemionow, Sahgal, & Yue, 2001; Wickremaratchi & Llewelyn, 2006). Our objective was to confirm these results with the NSHAP data while investigating the psychometric properties of each measurement module.
ANALYTIC APPROACH
Multiple Measurements
The protocols for measuring olfactory, gustatory, and tactile functioning each involved administering several different stimuli in blinded fashion and asking respondents to identify each one from a list of possible alternatives. It is assumed that the ability to identify the stimulus correctly reflects the respondent’s underlying level of function in that particular sensory domain. A standard way to analyze such data is with an item-response model. Assuming yij is the response for respondent i to the jth item, the simplest such model is
$$g[E(y_{ij})] = \eta_{ij} = \alpha_i + \theta_j$$
where E(yij) is the mean of yij conditional on ηij, g() is known as the link function, and αi and θj are sets of respondent-specific and item-specific parameters, respectively. When yij takes the values 0 or 1, a common choice for g() is the logit function (Cox & Snell, 1989):
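$$\operatorname{logit}[\Pr(y_{ij} = 1)] = \log\left\{\frac{\Pr(y_{ij} = 1)}{1 - \Pr(y_{ij} = 1)}\right\} = \alpha_i + \theta_j \tag{1}$$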
Model 1 is the well-known Rasch model (Rasch, 1960) developed for evaluating and scoring educational tests composed of binary items and may be fit to NSHAP’s sensory data by coding each identification item as either correct (1) or incorrect (0). The parameter θj may then be thought of as the underlying difficulty of the jth identification task and the parameter αi as the underlying ability of the ith respondent to perform such tasks.
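As a minimal numerical sketch of Model 1, the snippet below evaluates the probability of a correct response under the logit link. It assumes the linear predictor is the sum of the respondent and item parameters, so that a larger θj corresponds to an item that is answered correctly more often (the sign pattern that the positive estimates in Table 1 suggest). The function name `rasch_prob` and the parameter values are illustrative only and are not NSHAP estimates.

```python
import math

def rasch_prob(alpha_i, theta_j):
    """Pr(y_ij = 1) under the logit link: inverse logit of alpha_i + theta_j."""
    eta = alpha_i + theta_j
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# Hypothetical values chosen for illustration, not taken from the NSHAP tables.
p_average = rasch_prob(0.0, 1.0)  # respondent at the mean of the ability scale
p_higher = rasch_prob(1.0, 1.0)   # respondent one logit higher in ability

# Under Model 1 a one-logit gain in ability multiplies the odds of a correct
# response by e for every item, because ability enters each item identically.
odds_ratio = odds(p_higher) / odds(p_average)
```

The constant odds ratio across items is exactly the restriction that the two-parameter extension relaxes.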
Model 1 has two important limitations (Skrondal & Rabe-Hesketh, 2004, pp. 292–298). First, it assumes that differences in ability affect performance on all items equally. This may be true of a well-constructed test battery or psychological scale but may not be true of NSHAP’s sensory function modules because they were not designed with this criterion in mind. To accommodate this, we may extend Equation 1 in the following way:
$$\operatorname{logit}[\Pr(y_{ij} = 1)] = \lambda_j \alpha_i + \theta_j \tag{2}$$
where the λj are analogous to factor loadings in a factor analytic model (note that, with J items, only J − 1 of the λj are identifiable). This is referred to as a two-parameter item-response model (Birnbaum, 1968) because each item is now represented by two parameters (θj and λj).
A second limitation evident from Equation 1 is that for any given item, the probability of a “correct” response approaches 0 as the respondent’s sensory ability decreases. This is clearly not realistic in cases where the response involves choosing from a fixed set of possibilities, as in the case of NSHAP’s sensory identification items (despite this, Equation 1 and Equation 2 are still often used to analyze data from multiple choice items). A more realistic model is
$$\Pr(y_{ij} = 1) = c + (1 - c)\, g^{-1}(\lambda_j \alpha_i + \theta_j) \tag{3}$$
where in the case of the logit link $g^{-1}(x) = e^{x}/(1 + e^{x})$. In this model, as ability approaches −∞, the probability of a correct response approaches c, which can therefore be interpreted as the probability of a pure guess (i.e., by someone with no ability) being correct. Although c is represented here as being constant across items, this can be relaxed (Birnbaum, 1968).
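The role of the guessing parameter can be checked numerically. The sketch below assumes a logit link and a linear predictor of the form λj·αi + θj. The default c = 0.25 is an assumption corresponding to blind guessing among four alternatives; it is not an estimate reported for NSHAP. All function names are hypothetical.

```python
import math

def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))

def three_param_prob(alpha_i, theta_j, lambda_j=1.0, c=0.25):
    """Pr(correct) = c + (1 - c) * inv_logit(lambda_j * alpha_i + theta_j)."""
    return c + (1.0 - c) * inv_logit(lambda_j * alpha_i + theta_j)

# Very low ability: the probability settles at the guessing floor c.
floor = three_param_prob(-50.0, 1.0)

# Very high ability: the probability approaches 1.
ceiling = three_param_prob(50.0, 1.0)

# Setting c = 0 recovers the two-parameter model, whose floor is 0.
floor_without_guessing = three_param_prob(-50.0, 1.0, c=0.0)
```

Comparing `floor` with `floor_without_guessing` shows the difference between Equation 3 and the earlier models for respondents with very poor function.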
In cases where sensory function is hypothesized to depend on certain observed covariates, this can be accomplished within the context of these models by specifying a structural model for αi:
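$$\alpha_i = \mathbf{x}_i'\boldsymbol{\beta} + \zeta_i$$
where xi is a vector of covariates for respondent i, β is the corresponding vector of coefficients, and ζi is a respondent-level disturbance (note that if xi includes a constant, one of the θj must be set to 0 for identification). The resulting model is referred to as the Multiple Indicator Multiple Causes model (Joreskog & Goldberger, 1975).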
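The identification constraint noted above can be illustrated with a small check: adding a constant to the structural intercept and subtracting the same constant from every item parameter leaves each linear predictor unchanged, so the data cannot separate the two. The sketch assumes unit loadings, evaluates ability at its structural mean (the disturbance is omitted), and uses made-up coefficients and covariate coding throughout.

```python
def linear_predictor(x_i, beta, theta_j, lambda_j=1.0):
    """lambda_j * (x_i . beta) + theta_j, with ability set to its structural mean."""
    alpha_mean = sum(x * b for x, b in zip(x_i, beta))
    return lambda_j * alpha_mean + theta_j

# Hypothetical respondent and parameters (not NSHAP estimates).
x_i = [1.0, 1.0, 0.0]      # constant, woman, aged 65-74 (illustrative coding)
beta = [2.0, 0.3, -0.5]    # intercept and two covariate effects
thetas = [0.0, -0.6, 0.4]  # first item parameter fixed at 0

# Shift the intercept up by k and every item parameter down by k.
k = 1.5
beta_shifted = [beta[0] + k] + beta[1:]
thetas_shifted = [t - k for t in thetas]

original = [linear_predictor(x_i, beta, t) for t in thetas]
shifted = [linear_predictor(x_i, beta_shifted, t) for t in thetas_shifted]
# The two lists agree, which is why one theta_j must be fixed when x_i has a constant.
```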
In addition to models in which sensory function is treated as an outcome, one may also wish to treat it as an explanatory variable in analyses of other outcomes, as, for example, in exploring the hypothesis noted above that poor sensory function may limit one’s ability to engage in satisfying intimate relationships (in reality, this relationship may be bidirectional). Such analyses may be conducted in two ways. First, one may estimate a full structural equation model (Bollen, 1989) combining one of the measurement models above with another model in which αi is also used as a predictor. A second, simpler approach is to use the fitted measurement model to compute empirical Bayes predictions of the αi and then simply use these estimates as covariates in another model. It is important to note that although the $\hat{\alpha}_i$ differ from the true αi, this approach can still yield consistent (though less efficient) estimates for the model of interest; however, the standard errors will be biased downward because the $\hat{\alpha}_i$ are being treated as nonrandom (Whittemore, 1989). A possible way to address this would be to bootstrap the entire process (i.e., both the estimation of $\hat{\alpha}_i$ and the model of interest; Efron & Tibshirani, 1993).
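The structure of such a bootstrap can be sketched as follows. The point is only that both stages are rerun inside every replicate, so that the uncertainty in the predicted abilities is carried into the final standard error. Everything here is a stand-in: `predict_ability` is a crude shrunken proportion correct rather than an empirical Bayes prediction from a fitted item-response model, the second stage is a simple least-squares slope, the shrinkage weight is arbitrary, and the data are simulated.

```python
import random
import statistics

def predict_ability(items):
    """Stage 1 stand-in: shrink each proportion correct toward the sample mean."""
    props = [sum(row) / len(row) for row in items]
    grand_mean = statistics.fmean(props)
    shrink = 0.7  # arbitrary illustrative weight, not estimated from data
    return [grand_mean + shrink * (p - grand_mean) for p in props]

def ols_slope(x, y):
    """Stage 2 stand-in: least-squares slope of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

def bootstrap_slope_se(items, outcome, n_boot=200, seed=1):
    """Resample respondents, rerun both stages, and return (SE, replicates used)."""
    rng = random.Random(seed)
    n = len(items)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores = predict_ability([items[i] for i in idx])  # stage 1 on the resample
        if max(scores) == min(scores):
            continue  # skip a degenerate resample with no variation in scores
        slopes.append(ols_slope(scores, [outcome[i] for i in idx]))  # stage 2
    return statistics.stdev(slopes), len(slopes)

# Simulated toy data: 40 respondents, 5 binary items, one continuous outcome.
toy_rng = random.Random(0)
toy_items = [[toy_rng.randint(0, 1) for _ in range(5)] for _ in range(40)]
toy_outcome = [sum(row) + toy_rng.gauss(0.0, 1.0) for row in toy_items]
se, n_used = bootstrap_slope_se(toy_items, toy_outcome)
```

In a real analysis the two stand-in functions would be replaced by the fitted measurement model and the substantive model, and with survey data the resampling would also need to respect the sample design.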
All the models described above are examples of Generalized Linear Latent and Mixed Models (GLLAMMs; Skrondal & Rabe-Hesketh, 2004) and may be fit in Stata (StataCorp, 2007) using the gllamm package (Zheng & Rabe-Hesketh, 2007). For convenience, we assume that the αi are distributed Gaussian with mean given by the corresponding structural model (if specified).
Objective Versus Subjective Measurement
In addition to biomeasures of sensory function, NSHAP also obtained self-reported measures of function for each of the five senses. Because self-reported measures of function can have poor reliability (e.g., Landis, Hummel, Hugentobler, Giger, & Lacroix, 2003) and/or be affected by reporting bias, such measures are typically of interest only in cases where biomeasures are unavailable. For example, because NSHAP’s visual acuity testing was performed on only half of the respondents, while the self-reported measure was obtained from all, one could use the half sample with both measures to develop and estimate a model for the self-report process and then use this model to impute visual acuity for those who were not tested. Similarly, one might use self-reports of olfactory, gustatory, and tactile function to impute missing values for the corresponding biomeasures due to nonresponse (note, however, that in these three cases, the corresponding self-reported measures were obtained from only half of the respondents).
In addition to obtaining objective measurements of sensory function as part of a holistic assessment of health, the NSHAP research team was also interested in the effects that a decline in sensory function may have on an older adult’s participation in intimate and other types of social activity. Because such effects may be due in part to self-limitation, the way in which respondents perceive their sensory function—as distinct from their actual level of function—becomes important. This leads to consideration of the distribution of self-reported function conditional on αi, and the way in which this distribution depends on various covariates. Such an analysis can be performed using the methods described earlier. Because vision was assessed using only a single measure of visual acuity, one can model self-reported vision using visual acuity directly as a covariate (e.g., see Globe, Wu, Azen, & Varma, 2004).
At the conclusion of each interview, NSHAP interviewers were asked to rate the respondent’s vision and hearing using a 5-point scale. The resulting data (not presented here) may be used in models for imputing missing values of visual acuity and/or self-reported vision and hearing.
OLFACTION
Olfactory function was assessed in all respondents using tests of both odor sensitivity (i.e., the lowest concentration at which an odor can be detected) and odor identification. Odorants were administered using commercially available felt-tip pens, each filled with an individual odorant at a specific concentration. This device is inexpensive, convenient, and ideally suited to delivering odorants at a constant concentration (Hummel, Sekinger, Wolf, Pauli, & Kobal, 1997). After we developed the NSHAP protocol, another short screening method using the pens was independently developed (Mueller & Renner, 2006).
The test of odor sensitivity involved presenting a series of five pens, the first containing only the diluent propylene glycol (1,2-propanediol) followed by steadily increasing concentrations of the odorant n-butanol (0.13%, 0.50%, 2.00%, and 8.00%). Each pen was held by the interviewer (who wore a cotton glove to eliminate residual odors) approximately half an inch from the respondent’s nostrils; respondents were then asked to inhale through the nose, during which time the pen was waved slowly back and forth for no more than 3–4 s. Following each pen, respondents used a visual analog scale ranging from 0 (labeled no smell at all) to 10 (labeled smells very strong) to record their perception of odor strength.
Although there is not sufficient space to present the sensitivity data here, we note that there was some variation among respondents in the way in which the visual analog scale was completed. Respondents were given a choice between recording their responses directly on the interviewer’s laptop computer (using the mouse to position a slider) or on a paper version of the scale. Nearly two thirds of those who completed the olfactory module chose the paper version (61%), and the likelihood of choosing paper was higher for women and increased with age. Despite instructions to mark an X on the line representing the scale, 17%–18% of those who recorded their answers on paper wrote an integer between 0 and 10 instead.
Following the sensitivity test, respondents were presented with a five-item identification test. For each item, a single odor was presented, and respondents were asked to identify it from a set of four alternatives (responses were recorded by the interviewer on the computer). The response sets were as follows, in order of administration: (a) chamomile, raspberry, rose, or cherry; (b) smoke, glue, leather, or grass; (c) orange, blueberry, strawberry, or onion; (d) bread, fish, cheese, or ham; and (e) chive, peppermint, pine, or onion. The true odorants, as listed in Table 1, were rose, leather, orange, fish, and peppermint, respectively. Following a forced-choice paradigm, respondents were not permitted to answer “don’t know”; however, for each test, 1%–3% of respondents refused to answer. Although these refusals are excluded from the analyses presented here, they may in many cases reflect uncertainty about the correct response, and therefore, other analysts may wish to handle them differently.
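The coding used for the identification items can be sketched as a small scoring routine. It takes the correct odorants from the odor names listed in Table 1 and represents a refusal as `None`, so that refusals can be excluded (as in the analyses here) or recoded by analysts who prefer to treat them as incorrect. The function name and data layout are assumptions made for illustration and do not describe the NSHAP data files.

```python
# Correct answers in order of administration, taken from the odors in Table 1.
CORRECT_ODORS = ["rose", "leather", "orange", "fish", "peppermint"]

def score_odor_items(responses):
    """Return 1 (correct), 0 (incorrect), or None (refused) for each of the five items."""
    scored = []
    for answer, truth in zip(responses, CORRECT_ODORS):
        if answer is None:
            scored.append(None)  # refusal: left missing rather than counted as wrong
        else:
            scored.append(1 if answer.strip().lower() == truth else 0)
    return scored

# Hypothetical respondent: misidentifies leather as smoke and refuses the third item.
example = score_odor_items(["rose", "smoke", None, "Fish", "peppermint"])
# example == [1, 0, None, 1, 1]
```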
Results for the identification tests are presented in Table 1. For this analysis, we have focused solely on whether the respondent was able to identify the odorant correctly; a more in-depth analysis might examine the distribution of responses among the various incorrect alternatives. Each of the five odorants was identified correctly by a majority of respondents, with peppermint identified correctly most often (92%) and leather identified correctly least often (71%). Item nonresponse (including item-specific refusal to give a response plus 61 respondents [2%] who declined the entire olfactory module, 1 respondent who broke off the interview at an earlier point, and one instance of an equipment problem) was highest for the first item (5%) and declined steadily thereafter. The increasing likelihood of a correct response coupled with decreasing item nonresponse is consistent with the possibility that some respondents became more adept at the task after the first couple of tries. However, because the order of the items was identical for all respondents, it is not possible to distinguish between such order effects and true item-specific differences in difficulty.
Table 1.
Item-Response Models Fit to Odor Identification Data (SEs)
n = 2,928

| Odor | Percent correct | Item nonresponse^a | Parameter | Model 1 | Model 2A | Model 2B |
|---|---|---|---|---|---|---|
| Item difficulty | | | | | | |
| Rose | 75.9 | 4.9 | θrose | 1.41 (0.06) | 1.33 (0.06) | −0.58 (0.21) |
| Leather | 70.5 | 4.6 | θleather | 1.08 (0.05) | 1.01 (0.05) | −0.77 (0.20) |
| Orange | 84.8 | 4.4 | θorange | 2.10 (0.06) | 2.24 (0.11) | −0.53 (0.27) |
| Fish | 87.0 | 3.7 | θfish | 2.31 (0.07) | 2.32 (0.10) | 0.00 |
| Peppermint | 91.6 | 3.3 | θpeppermint | 2.87 (0.08) | 3.45 (0.22) | −0.01 (0.30) |
| Item discrimination | | | | | | |
| | | | λrose | | 0.80 (0.12) | 0.83 (0.11) |
| | | | λleather | | 0.75 (0.11) | 0.77 (0.11) |
| | | | λorange | | 1.18 (0.18) | 1.20 (0.16) |
| | | | λfish | | 1.00 | 1.00 |
| | | | λpeppermint | | 1.54 (0.25) | 1.44 (0.20) |
| Structural model | | | | | | |
| | | | Constant | | | 2.65 (0.13) |
| | | | Gender (vs. men) | | | |
| | | | Women | | | 0.32 (0.07) |
| | | | Age (vs. 57–64 years) | | | |
| | | | 65–74 years | | | −0.47 (0.09) |
| | | | 75–85 years | | | −1.10 (0.13) |
| | | | Var(αi) | 1.27 (0.10) | 1.33 (0.28) | 1.09 (0.22) |
| | | | Log-likelihood | −6,282.5 | −6,268.0 | −6,168.6 |

Note: ^a Includes 65 respondents for whom entire smell module is missing (61 refusals, 1 equipment problem, 1 interview break-off, and 2 due to interviewer error) plus those who refused each specific item.
Estimates for Model 1 and two versions of Model 2 (see Multiple Measurements) are also presented in Table 1, obtained using all respondents for whom data from at least one item were available. Model 1 reflects the same ordering in item difficulty observed in the percent correct and estimates the variance of the αi to be 1.27 on the logit scale, indicating that a 1 SD increase in individual ability roughly triples the odds of correctly identifying a given odor. Model 2A permits the effect of a change in individual ability to vary across items; a likelihood ratio test of this model against Model 1 yields a p value of <.001, indicating that the items do differ in their ability to discriminate among individuals. Estimates of the discrimination parameters indicate that peppermint provided the best discrimination, whereas rose and leather provided the worst.
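The two calculations in this paragraph can be reproduced from the entries in Table 1. The sketch below converts the estimated variance of the αi into the odds multiplier for a 1 SD increase in ability and recomputes the likelihood ratio test from the reported log-likelihoods. The 4 degrees of freedom are an assumption based on the four freely estimated loadings in Model 2A (the fifth is fixed at 1); the tail probability uses the closed form of the chi-square survival function for 4 degrees of freedom.

```python
import math

# Variance of the respondent abilities under Model 1 (Table 1), on the logit scale.
var_alpha = 1.27
sd_alpha = math.sqrt(var_alpha)

# Odds multiplier for a 1 SD increase in ability: roughly 3, as stated in the text.
odds_factor = math.exp(sd_alpha)

# Likelihood ratio statistic comparing Model 2A with Model 1 (log-likelihoods from Table 1).
ll_model1 = -6282.5
ll_model2a = -6268.0
lr_stat = 2.0 * (ll_model2a - ll_model1)

# Assumed degrees of freedom: four free loadings in Model 2A.
df = 4

# For a chi-square variable with 4 df, Pr(X > x) = exp(-x / 2) * (1 + x / 2).
p_value = math.exp(-lr_stat / 2.0) * (1.0 + lr_stat / 2.0)
```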
Model 2B extends 2A by incorporating a structural model for the αi containing the covariates gender and age group. The estimated odds ratio for women (vs. men) is $e^{0.32} = 1.38$, with an approximate 95% confidence interval of 1.20–1.58. The effects of age appear roughly linear, with a 67% decrease in the odds of identifying an odor correctly from the youngest (57–64 years) to the oldest (75–85 years) age group (odds ratio 0.33 with a 95% confidence interval of 0.26–0.43). These results are consistent with previous studies of gender and age differences in olfactory function assessed by odor identification (Hummel et al., 2007).
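The odds ratios and confidence intervals quoted above follow from the Model 2B coefficients and standard errors in Table 1. The sketch below assumes a Wald interval with a normal critical value of 1.96, which reproduces the reported figures; the helper name is hypothetical.

```python
import math

def odds_ratio_ci(coef, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit), each rounded to 2 decimals."""
    return tuple(round(math.exp(v), 2) for v in (coef, coef - z * se, coef + z * se))

# Women vs. men: coefficient 0.32, SE 0.07 (Table 1, Model 2B).
women = odds_ratio_ci(0.32, 0.07)    # (1.38, 1.2, 1.58)

# Ages 75-85 vs. 57-64: coefficient -1.10, SE 0.13 (Table 1, Model 2B).
oldest = odds_ratio_ci(-1.10, 0.13)  # (0.33, 0.26, 0.43)

# Percent decrease in odds for the oldest group, about 67%.
pct_decrease = round((1.0 - math.exp(-1.10)) * 100.0)
```

These odds ratios apply to an item with a loading of 1 (fish in Model 2B); for other items the coefficient is multiplied by the item's loading before exponentiating.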
GUSTATION
Assessment of gustatory function was performed on all respondents using a series of taste-impregnated strips of filter paper (Mueller et al., 2003). Four strips were presented in the same order to each respondent: The first tasted sour, the second bitter, the third sweet, and the fourth salty. Before tasting each strip, respondents were asked to take a sip of water; they were then instructed to put the strip on their tongue and to describe the taste using one of the following descriptors: “salty,” “sweet,” “bitter,” or “sour.” In addition, they were asked to rate how certain they were that they had identified the taste correctly using a visual analog scale ranging from 0 (labeled very uncertain) to 10 (labeled very certain). To facilitate use of the scale, respondents were asked to record their answers directly on the laptop; those who were uncomfortable doing so were provided with a paper version of the items. As with the olfactory module, the majority of respondents who participated in the assessment (58%) chose the paper version.
Nonresponse was higher for this module than for the olfactory module. One hundred and thirty-seven respondents (5%) declined to participate, and equipment problems prevented administering the module in an additional 58 cases (2%). Although respondents who recorded their answers on the laptop were not given the option “don’t know,” they were permitted to indicate that they had tried and were unable to perform the task, at which point no further strips were administered. Fifty-nine respondents (2%) reported being unable to rate one of the four strips. In addition, between 5 and 21 respondents who recorded their answers on the computer refused to answer each identification item, and between 9 and 20 respondents who recorded their answers on paper wrote in “don’t know.” Finally, between 66 and 161 respondents (2%–5%) who recorded their answers on paper left each identification item blank. Although respondents who were truly unable to identify a particular taste are likely represented in each of these categories, only those recorded as “tried, unable to do” or who wrote in “don’t know” are counted as legitimate (incorrect) responses in the analysis presented here; all others are excluded.
Results for the identification items are presented in Table 2. The least-recognized taste was sour (39% correct), whereas the most recognized taste was sweet (86% correct). Because the four tastes were presented in the same order to all respondents, it is not possible to distinguish between item-specific differences and a possible learning effect, though the fact that only 67% identified the final taste (salty) correctly suggests that a learning effect cannot account for all the differences observed. Item nonresponse was highest for the bitter strip, reflecting a larger number of blank, “don’t know,” and refused responses.
Table 2.
Item-Response Models Fit to Taste Identification Data (SEs)
n = 2,765

| Taste | Percent correct^a | Item nonresponse^b | Parameter | Model 1 | Model 2A | Model 2B |
|---|---|---|---|---|---|---|
| Item difficulty | | | | | | |
| Sour | 39.3 | 10.0 | θsour | −0.56 (0.05) | −0.64 (0.07) | −1.85 (0.20) |
| Bitter | 69.5 | 14.3 | θbitter | 1.05 (0.06) | 0.97 (0.06) | 0.00 |
| Sweet | 86.3 | 11.9 | θsweet | 2.28 (0.07) | 2.12 (0.09) | 1.23 (0.10) |
| Salty | 67.2 | 11.6 | θsalty | 0.90 (0.05) | 0.94 (0.07) | −0.24 (0.14) |
| Item discrimination | | | | | | |
| | | | λsour | | 1.67 (0.27) | 1.24 (0.19) |
| | | | λbitter | | 1.00 | 1.00 |
| | | | λsweet | | 0.98 (0.14) | 0.93 (0.13) |
| | | | λsalty | | 1.38 (0.21) | 1.14 (0.18) |
| Structural model | | | | | | |
| | | | Constant | | | 0.80 (0.07) |
| | | | Gender (vs. men) | | | |
| | | | Women | | | 0.60 (0.09) |
| | | | Age (vs. 57–64 years) | | | |
| | | | 65–74 years | | | −0.08 (0.07) |
| | | | 75–85 years | | | −0.22 (0.08) |
| | | | Var(αi) | 1.48 (0.12) | 0.95 (0.19) | 1.18 (0.23) |
| | | | Log-likelihood | −5,919.3 | −5,912.6 | −5,865.7 |

Notes: ^a Among those providing a response; responses “don’t know” and “tried, unable to do” are counted as “incorrect.”
^b Includes 227 respondents for whom entire taste module is missing (137 refusals, 58 due to equipment problems, 1 interview break-off, and 31 paper-and-pencil response sheets lost in the field) plus those who refused (computer-assisted personal interview) or failed to mark (paper-and-pencil) each specific item.
Estimates for Model 1 mirror the item ordering observed in the percent correct, whereas those for Model 2A suggest that the sour and salty tastes load more heavily on the single dimension of ability being picked up here (a likelihood ratio test comparing Model 2A with the more restrictive Model 1 yields a p value of .004). Under 2A, the variance of the αi is estimated to be 0.95, indicating that a 1 SD increase in ability increases the odds of correctly identifying a taste with a loading of one by a factor of 2.65. Model 2B adds a structural model for the αi including gender and age group as covariates. The ability of women to identify tastes correctly is estimated to be 0.60 logits higher than that of men, whereas age is associated with a more modest decline of only 0.22 logits by the oldest age group relative to the youngest.
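Under the two-parameter model the effect of a 1 SD increase in ability differs by item, because the ability term is multiplied by the item's loading. The sketch below applies this to the Model 2A loadings and variance in Table 2. For bitter, whose loading is fixed at 1, it reproduces the factor of 2.65 quoted above; the values for the other tastes are derived here for illustration and are not reported in the text.

```python
import math

# Model 2A estimates from Table 2.
var_alpha = 0.95
sd_alpha = math.sqrt(var_alpha)
loadings = {"sour": 1.67, "bitter": 1.00, "sweet": 0.98, "salty": 1.38}

# Odds multiplier for a 1 SD increase in ability: exp(lambda_j * SD).
odds_factor = {
    taste: round(math.exp(lam * sd_alpha), 2) for taste, lam in loadings.items()
}

# Bitter has a loading of one, so its factor matches the 2.65 given in the text.
bitter_factor = odds_factor["bitter"]

# Items with larger loadings (sour, salty) separate respondents more sharply.
ranked = sorted(odds_factor, key=odds_factor.get, reverse=True)
```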
For comparison, Table 3 shows the results from logistic regression models fit separately to each of the taste identification items including both gender and age group as covariates. The estimated gender effects are similar in magnitude to the estimate from the structural model; however, age-related declines are observed for only the tastes bitter and salty. This indicates that the unidimensional model is inadequate for the purpose of describing the effects of age on the ability to identify these four tastes.
Table 3.
Logistic Models Fit to Data on Distance Vision, Hearing, and Touch (SEs)^a

| Item | Percent^a | Female (vs. male) | Age 65–74 years (vs. 57–64 years) | Age 75–85 years (vs. 57–64 years) |
|---|---|---|---|---|
| Distance vision (3 m)^b,c | | | | |
| Unable to do | 1.0 | | | |
| 20/200 (COHZV) | 0.2 | | | |
| 20/160 (SZNDC) | 0.5 | | | |
| 20/125 (VKCNR) | 0.5 | | | |
| 20/100 (KCRHN) | 1.7 | −0.46 (0.33) | −0.76 (0.56) | −1.87 (0.45) |
| 20/80 (ZKDVC) | 2.4 | −0.55 (0.22) | −0.50 (0.40) | −1.80 (0.37) |
| 20/63 (HVORK) | 4.9 | −0.38 (0.20) | −0.62 (0.33) | −1.56 (0.29) |
| 20/50 (RHSON) | 8.9 | −0.14 (0.17) | −0.50 (0.24) | −1.46 (0.21) |
| 20/40 (KSVRH) | 18.0 | −0.29 (0.14) | −0.51 (0.18) | −1.36 (0.17) |
| 20/32 (HNKCD) | 19.6 | −0.24 (0.14) | −0.72 (0.15) | −1.61 (0.15) |
| 20/25 (NDVKO) | 25.1 | −0.23 (0.13) | −0.68 (0.13) | −1.54 (0.19) |
| 20/20 (DHOSZ) | 10.0 | −0.52 (0.18) | −0.99 (0.18) | −1.73 (0.29) |
| 20/16 (VRNDO) | 6.4 | −0.60 (0.24) | −0.88 (0.22) | −2.08 (0.47) |
| 20/12.5 (CZHKS) | 0.9 | | | |
| 20/10 (ORZSK) | 0.1 | | | |
| Proportional odds model | | −0.31 (0.11) | −0.74 (0.11) | −1.59 (0.12) |
| Self-rated hearing^b | | | | |
| Poor | 3.6 | | | |
| Fair | 16.8 | 0.81 (0.21) | −0.16 (0.27) | −0.95 (0.27) |
| Good | 32.6 | 0.65 (0.11) | −0.29 (0.14) | −1.01 (0.15) |
| Very good | 29.3 | 0.57 (0.07) | −0.22 (0.10) | −0.74 (0.09) |
| Excellent | 17.7 | 0.45 (0.13) | −0.29 (0.13) | −0.75 (0.16) |
| Proportional odds model | | 0.57 (0.08) | −0.25 (0.09) | −0.84 (0.09) |
| Taste identification^d | | | | |
| Sour | 39.9 | 0.46 (0.10) | 0.04 (0.10) | 0.14 (0.12) |
| Bitter | 69.8 | 0.82 (0.08) | 0.02 (0.10) | −0.24 (0.11) |
| Sweet | 85.6 | 0.67 (0.13) | 0.09 (0.15) | 0.05 (0.13) |
| Salty | 67.0 | 0.56 (0.10) | −0.14 (0.13) | −0.34 (0.17) |
| 2-point discrimination^e | | | | |
| 12 mm | 83.5 | −0.30 (0.15) | −0.33 (0.15) | −0.77 (0.16) |
| 1 point only | 86.7 | 0.27 (0.15) | 0.26 (0.22) | 0.04 (0.27) |
| 8 mm | 81.8 | < −0.01 (0.13) | −0.03 (0.17) | −0.39 (0.23) |
| 4 mm | 40.9 | 0.12 (0.11) | −0.27 (0.18) | −0.57 (0.17) |

Notes: ^a Estimates weighted to account for differential probabilities of selection and differential nonresponse. Design-based standard errors obtained using the linearization method.
^b Unconstrained model (i.e., nonparallel regressions) in which the change in odds associated with a change in the value of the covariate(s) is permitted to vary across the different cutpoints; estimates represent the change in the log odds of being in or above the corresponding category.
^c Categories at each end have been combined due to the small number of observations.
^d Separate logistic regression models fit to the probability of a correct response; responses “don’t know” and “tried, unable to do” are counted as “incorrect.”
^e Separate logistic regression models fit to the probability of a correct response; responses “didn’t feel any points” and “tried, unable to do” are counted as “incorrect.”
VISION
Distant visual acuity was assessed in both eyes together at 3 m using a chart with Sloan optotypes manufactured by Precision Vision (catalog number 2104). Respondents who normally wear glasses or contact lenses for driving or distance vision were instructed to wear them during the test. Interviewers followed a detailed protocol to ensure consistent distance from the chart (using a premeasured string laid out on the floor), line of sight (respondent seated with interviewer holding chart at respondent’s eye level), and lighting (sufficient light for reading with low glare or strong backlighting). Respondents were asked to begin by reading the smallest discernible line and, depending on the outcome, were then successively directed up or down one line at a time until the smallest line that could be read accurately was determined.
Half of the 3,005 respondents were randomized to receive the vision assessment (1,506). Of these, 64 (4%) refused to participate, 1 broke off the interview at an earlier point, and in four cases, a problem with the equipment prevented conducting the test. Results for the remaining 1,437 respondents are shown in Table 3. Twenty-three respondents (1%) were unable to read the largest line at 3 m, indicating vision worse than 20/200. Using standard guidelines, 62% of the study population are estimated to have good vision (better than 20/40), 27% are estimated to have moderately decreased vision (between 20/40 and 20/60), and 11% are estimated to have poor vision (worse than 20/60). Seventy-three percent of the respondents wore glasses or contact lenses during the test, and 20 respondents who normally wear glasses or contacts did not wear them (these individuals are included in the analyses presented here).
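The three summary percentages can be recovered by summing the weighted line-by-line percentages in Table 3. The grouping below is an assumption about how the guidelines were applied: lines from 20/32 upward are treated as good, 20/40 and 20/50 as moderately decreased, and 20/63 or worse (including those unable to read the largest line) as poor. Under that grouping the totals match the 62%, 27%, and 11% reported above.

```python
# Weighted percentages by smallest line read, from Table 3.
percent = {
    "unable": 1.0, "20/200": 0.2, "20/160": 0.5, "20/125": 0.5, "20/100": 1.7,
    "20/80": 2.4, "20/63": 4.9, "20/50": 8.9, "20/40": 18.0, "20/32": 19.6,
    "20/25": 25.1, "20/20": 10.0, "20/16": 6.4, "20/12.5": 0.9, "20/10": 0.1,
}

# Assumed grouping of chart lines into the three categories.
good_lines = ["20/32", "20/25", "20/20", "20/16", "20/12.5", "20/10"]
moderate_lines = ["20/40", "20/50"]
poor_lines = [k for k in percent if k not in good_lines + moderate_lines]

good = sum(percent[k] for k in good_lines)          # about 62
moderate = sum(percent[k] for k in moderate_lines)  # about 27
poor = sum(percent[k] for k in poor_lines)          # about 11
```

The three totals sum to slightly more than 100 because the published percentages are rounded.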
In modeling these data, at least two approaches are possible. First, one might attempt to model the mean directly, either on the standard scale (i.e., 20/20, 20/25, etc.) or on some appropriate transformation thereof (e.g., the logarithm of the minimum angle of resolution, which linearizes the geometric sequence of the chart). A second approach is to model the cumulative probabilities of being at or above a given ability using ordinal regression (McCullagh & Nelder, 1989). The advantage of the latter is that it does not require advanced knowledge of the biometrics of visual acuity, and more importantly, it yields conclusions that are immediately interpretable in clinically relevant terms (e.g., the effect of a covariate on the log odds of having vision equal to or better than a given threshold).
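For readers who prefer the first approach, the transformation mentioned above is straightforward: the minimum angle of resolution for a Snellen fraction is its denominator divided by its numerator, and logMAR is the base-10 logarithm of that ratio. The sketch below applies it to the lines of the chart as labeled in Table 3; the function name is hypothetical.

```python
import math

def logmar(denominator, numerator=20.0):
    """logMAR for a Snellen fraction numerator/denominator, e.g. 20/40."""
    return math.log10(denominator / numerator)

# Snellen denominators for the chart lines listed in Table 3, largest print first.
lines = [200, 160, 125, 100, 80, 63, 50, 40, 32, 25, 20, 16, 12.5, 10]
values = [round(logmar(d), 1) for d in lines]

# 20/200 maps to 1.0, 20/20 maps to 0.0, and each line is one 0.1 step apart,
# so the geometric progression of letter sizes becomes a linear scale.
steps = [values[i] - values[i + 1] for i in range(len(values) - 1)]
```

Lower logMAR values indicate better acuity, so the sign of any covariate effect is reversed relative to the ordinal models described next.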
Table 3 shows estimates from a logistic model with the covariates gender (female vs. male) and age group (65–74 and 75–85 years, both relative to 57–64 years) in which the effects of the covariates on the odds of being at or above a given level of ability are allowed to differ at each level (this model may be fit using the gologit2 package for Stata [Williams, 2006]). The estimates for each covariate are roughly similar across the different cutpoints; likelihood ratio tests of equality across the cutpoints yield p values of .576, .398, and .210 for gender, age 65–74 years, and age 75–85 years, respectively. Estimates for the proportional odds model (in which the effects of the covariates are assumed equal across the cutpoints) are also provided and indicate that the odds of being at or above a given level of ability are estimated to be $(1 - e^{-0.31}) \times 100 = 26.7\%$ lower for women than for men. The effect of age appears roughly linear, with those aged 75–85 years having 79.6% lower odds of being above a given cutpoint than those aged 57–64 years. A likelihood ratio test of the interaction between gender and age (2 df) yields a p value of .847. Of course, although the proportional odds model does a good job of summarizing the effects of gender and age on visual acuity, there is no guarantee that this model will be appropriate for other covariates.
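The percentage changes quoted for the proportional odds model are simple transformations of the coefficients in Table 3. A minimal sketch, in which a negative result denotes lower odds and the helper name is hypothetical:

```python
import math

def pct_change_in_odds(coef):
    """Percent change in the odds implied by a log-odds coefficient."""
    return (math.exp(coef) - 1.0) * 100.0

# Distance vision, proportional odds model (Table 3).
women_vs_men = round(pct_change_in_odds(-0.31), 1)         # -26.7, i.e. 26.7% lower
age_75_85_vs_57_64 = round(pct_change_in_odds(-1.59), 1)   # -79.6, i.e. 79.6% lower
age_65_74_vs_57_64 = round(pct_change_in_odds(-0.74), 1)   # intermediate age group
```

Because the model assumes proportional odds, the same percentage applies at every cutpoint of the acuity scale.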
Note that because respondents were randomly selected for visual acuity assessment, analyses like those just described using only those individuals who were assessed will yield unbiased results. However, in situations where working with half of the sample is not adequate, one could use self-rated vision (asked of all respondents) together with other covariates to impute values of visual acuity for those who were not administered this module (Rubin, 1987).
HEARING
Hearing was the one sense that NSHAP did not measure objectively. This was not due to a lack of interest; rather, portable audiometers—the standard method for assessment in the field—were too costly, and technical problems precluded the possible integration of an audiometric application directly into the computer-assisted personal interview instrument prior to going into the field. Thus, the primary measure of hearing was the question “Is your hearing excellent, very good, good, fair, or poor?” This question was asked of all respondents; respondents who use a hearing aid were asked to describe their hearing while using it. Respondents were also asked the question “Do you feel you have a hearing loss?” to which 44% answered yes; this single question has been shown to have reasonable sensitivity and specificity for hearing impairment (Sindhusake et al., 2001). Finally, as noted above, the interviewer’s rating of the respondent’s hearing is also available. Because the interviewer had engaged in a 90-min or longer face-to-face conversation with the respondent, one might expect his or her rating to be fairly sensitive to deficits severe enough to affect conversation.
Results from the self-report question are shown in Table 3. Only 18% of respondents rated their hearing as excellent, whereas roughly one third each rated their hearing as either very good or good. Twenty percent rated their hearing as either fair or poor. Estimates for the unconstrained ordered logit model are similar to those for the proportional odds model, with likelihood ratio tests comparing the two yielding p values of .867, .993, and .331 for the gender, age 65–74 years, and age 75–85 years coefficients, respectively. Thus, we estimate that the odds of being above a given cutpoint are (e^(0.57) − 1) × 100 = 76.8% higher for women than for men and decrease with age at an increasing rate; for the 75- to 85-year-old age group, the change is (e^(−0.84) − 1) × 100 = −56.8%, that is, 56.8% lower odds.
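The percent changes in odds quoted for the vision and hearing models all come from the same transformation of a log-odds coefficient, (e^b − 1) × 100. The short sketch below reproduces that arithmetic for the three coefficients given in the text and inverts it for the reported 79.6% reduction; the function names are illustrative and are not part of any package used in the study.

```python
import math


def pct_change_in_odds(coef):
    """Percent change in odds implied by a log-odds (logit) coefficient."""
    return (math.exp(coef) - 1.0) * 100.0


def coef_from_pct_change(pct):
    """Inverse transformation: log-odds coefficient implied by a percent change in odds."""
    return math.log(1.0 + pct / 100.0)


# Coefficients quoted in the text for the proportional odds models.
quoted = [
    ("vision, women vs. men", -0.31),
    ("hearing, women vs. men", 0.57),
    ("hearing, 75-85 vs. 57-64 years", -0.84),
]
for label, b in quoted:
    print(f"{label}: {pct_change_in_odds(b):+.1f}% change in odds")

# The 79.6% lower odds reported for vision at ages 75-85 corresponds to this coefficient.
print(f"implied coefficient: {coef_from_pct_change(-79.6):.2f}")
```

A negative result means lower odds of being at or above a given level; the sign is dropped when the text says "lower".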
TOUCH
Tactile function was assessed via 2-point discrimination, a standard and reliable method for measuring the fingertip’s sensitivity to touch (Dellon & Keller, 1997; Dellon, Mackinnon, & Crosby, 1987; Finnell, Knopp, Johnson, Holland, & Schubert, 2004). These tests were performed using a multisided handheld discriminator with graded interprong distances developed specifically for NSHAP by a metallurgical engineer. Although more sensitive and accurate devices are available (e.g., Mayfield & Sugarman, 2000), their cost and the time required to administer them preclude their use in a large, multipurpose study conducted by survey interviewers in the home (the desire to obtain data comparable to existing studies was also a factor). Respondents were first asked to close their eyes, after which the interviewer touched the tip of the index finger of their dominant hand lightly with two small metal points located a fixed distance apart. Respondents were then asked whether they had felt one or two points; responses such as “three points” or “I feel something but I’m not sure how many points” were recorded by the interviewer as one point. Four tests were performed in succession: the first at 12 mm apart, the second consisting of only a single point, the third at 8 mm, and the fourth at 4 mm.
In order to reduce the average length of the interview, the assessment of tactile function was administered to the same random half sample who received the vision assessment. Of those 1,506 respondents, 28 (2%) declined to participate, an equipment problem was reported by the interviewer in three cases, and 1 respondent broke off the interview at an earlier point, leaving 1,474 respondents who completed at least part of the module. As with the taste identification module, after each stimulus, respondents were permitted to indicate that despite trying, they were unable to perform the task, at which point no further stimuli were administered. However, only 39 respondents (3% of those participating) indicated that they were unable to complete one of the tests; these responses—together with the response “I didn’t feel any points” (1%–4%)—are treated as incorrect in the analyses presented here.
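The scoring rules described for the 2-point discrimination module can be sketched as follows. This is a minimal illustration, not the study's processing code: the response labels ("one", "two", "none", "unable") and function names are hypothetical, and treating stimuli that were never administered as missing (`None`) is an assumption, since the text does not say how those items were coded.

```python
# Stimuli in the order administered, with the correct answer for each.
STIMULI = [("12 mm", "two"), ("1 point", "one"), ("8 mm", "two"), ("4 mm", "two")]


def record_response(raw):
    """Recode a raw answer the way the interviewer is described as recording it."""
    raw = raw.strip().lower()
    if raw in {"one", "two", "none", "unable"}:
        return raw
    # Answers such as "three points" or "not sure how many" were recorded as one point.
    return "one"


def score_items(raw_responses):
    """Return {stimulus: 1 correct, 0 incorrect, None not administered}."""
    scores = {label: None for label, _ in STIMULI}
    for (label, truth), raw in zip(STIMULI, raw_responses):
        recorded = record_response(raw)
        # "none" (felt no points) and "unable" never match the truth, so they score 0.
        scores[label] = 1 if recorded == truth else 0
        if recorded == "unable":
            break  # no further stimuli were administered after "unable to do"
    return scores


print(score_items(["two", "one", "three points", "one"]))
print(score_items(["two", "unable"]))
```

Analysts who prefer to treat unadministered items as incorrect rather than missing would only need to change the initial value in `scores`.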
Table 4 shows the percent correct and item nonresponse for each of the four stimuli. As expected, respondents found it substantially more difficult to distinguish between points that were 4 mm apart (only 41% correctly identified these as two distinct points). Interestingly, however, the same percentage of respondents (79%) were able to distinguish between points 8 mm apart as were able to distinguish between points 12 mm apart, suggesting that there is a plateau in the response function over this interval. Only slightly more (86%) correctly identified the single point.
Table 4.
Item-Response Models Fit to 2-Point Discrimination Data (SEs)
n = 1,474
Distance between points | Percent correct^a | Item nonresponse^b | Parameter | Model 1 | Model 2A | Model 2B |
Item difficulty | | | | | | |
12 mm | 79.4 | 2.1 | θ12 mm | 1.76 (0.09) | 2.26 (0.25) | 0.00 |
1 point only | 86.0 | 2.5 | θ1 point | 2.34 (0.10) | 2.25 (0.13) | 1.05 (0.13) |
8 mm | 79.4 | 2.5 | θ8 mm | 1.75 (0.09) | 2.44 (0.34) | 0.05 (0.27) |
4 mm | 40.6 | 2.9 | θ4 mm | −0.52 (0.07) | −0.41 (0.06) | −1.01 (0.14) |
Item discrimination | | | | | | |
 | | | λ12 mm | | 1.00 | 1.00 |
 | | | λ1 point | | 0.56 (0.10) | 0.49 (0.10) |
 | | | λ8 mm | | 1.14 (0.38) | 0.92 (0.29) |
 | | | λ4 mm | | 0.27 (0.07) | 0.25 (0.07) |
Structural model | | | | | | |
 | | | Constant | | | 2.94 (0.45) |
 | | | Gender (vs. men) | | | |
 | | | Women | | | −0.16 (0.18) |
 | | | Age (vs. 57–64 years) | | | |
 | | | 65–74 years | | | −0.33 (0.23) |
 | | | 75–85 years | | | −1.04 (0.30) |
 | | | Var(αi) | 1.72 (0.18) | 4.53 (1.53) | 5.50 (2.00) |
 | | | Log-likelihood | −2,945.9 | −2,904.6 | −2,892.2 |
Notes: ^a Responses “didn’t feel any points” and “tried, unable to do” are counted as incorrect.
^b Includes 32 respondents for whom the entire touch module is missing (28 refusals, 3 due to equipment problems, and 1 interview break-off).
The same item-response models used above were also fit to the 2-point discrimination data. Model 2A shows that the 4 mm and single-point tasks load substantially less on the individual factor being captured by the model. Regressing that factor on gender and age group shows no difference in sensory ability between men and women but a decrease with age, especially among the oldest group. For comparison, Table 3 shows results from individual logistic regressions fit to each of the four items. The decline with age is most evident for the 4 and 12 mm items and is not evident at all for the single-point item. Although women were slightly less likely to discriminate between the 12 mm points (this could, e.g., reflect a tendency on the part of the interviewers to touch the discriminator initially less heavily against some women’s fingers), none of the other items exhibited a gender difference.
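To help read the Model 1 column of Table 4, the sketch below computes the population-average probability of a correct response implied by an item parameter and the variance of the individual effect. It rests on an assumption that this section does not confirm: that each item parameter θ enters as an intercept on the logit scale, so that logit P(correct) = θ + λα with α normally distributed with mean 0 and variance Var(α), and λ = 1 in Model 1. The function and dictionary names are hypothetical, and simple trapezoid integration stands in for whatever quadrature the fitting software used.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_p_correct(intercept, loading, var_alpha, n_steps=4000, z_max=8.0):
    """Average expit(intercept + loading * alpha) over alpha ~ N(0, var_alpha).

    Uses the trapezoid rule over standardized values z in [-z_max, z_max].
    """
    sd = math.sqrt(var_alpha)
    h = 2.0 * z_max / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        z = -z_max + k * h
        weight = 0.5 if k in (0, n_steps) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * expit(intercept + loading * sd * z) * density
    return total * h


# Model 1 item parameters and Var(alpha) as printed in Table 4.
MODEL1 = {"12 mm": 1.76, "1 point": 2.34, "8 mm": 1.75, "4 mm": -0.52}
VAR_ALPHA_MODEL1 = 1.72

for item, theta in MODEL1.items():
    p = marginal_p_correct(theta, 1.0, VAR_ALPHA_MODEL1)
    print(f"{item}: implied percent correct = {100 * p:.1f}")
```

If the assumed parameterization is right, the implied percentages should fall close to the observed percent correct in Table 4; a large discrepancy would suggest that the parameters are defined differently (e.g., as difficulties with the opposite sign).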
For expository purposes, we have modeled the single-point item in the same manner as the other three. However, the primary function of the single-point item was to prevent respondents from simply guessing that all the stimuli involved two points. Given this, together with the fact that identifying a single point clearly represents a different task from discriminating between two points, one might argue that this item should instead be modeled differently. This is underscored by the fact that the likelihood of answering the single-point item correctly was not related to age.
CONCLUSIONS
The National Social Life, Health, and Aging Project is the first U.S. national study of older adults that has attempted to obtain a comprehensive assessment of sensory function. Consistent with previous clinical and population-based studies, the data show age-related declines in functioning across each of the five senses. Researchers may now use this data set to examine whether certain subgroups exhibit greater declines than others. In addition, researchers may now—for the first time—begin to explore among older adults the relationships between sensory function and both the level of social participation and the quality of intimate relationships. Future waves of NSHAP will offer the ability to study changes in sensory function over time, providing an opportunity to explore causal hypotheses involving sensory function and social interaction. Use of the self-report measures in conjunction with the objective assessments should also prove informative here because one’s perception of one’s abilities may serve to mediate the effects of actual changes in function on social interaction.
To the best of our knowledge, NSHAP is the first survey study to attempt objective measurements of olfactory, gustatory, or tactile function. Although item nonresponse for the gustation component ranged from 10% to 14% (the slightly higher rate was perhaps to be expected given the more invasive nature of putting an object in one’s mouth), it was less than 5% for olfaction and touch. Results for olfaction and taste identification are similar to those obtained from more in-depth studies using the same methodologies in more controlled settings, indicating that these protocols can be administered successfully by field interviewers with older adults in the home. The analyses presented here also show that the resulting data may be analyzed with standard item-response models. The 2-point discrimination data may prove to be an exception here because there is some indication that the three graded distances, when taken together, do not measure a single underlying dimension. More work is needed to determine whether and how these items should be used together.
The analysis of the multi-item measures (i.e., olfaction, gustation, touch) presented here is intended merely to illustrate how the items may be combined for the purpose of investigating the relationship between sensory function and other variables. However, scoring each item as either correct or incorrect as we have done here ignores the possibility that additional information may be recovered by distinguishing between the various incorrect alternatives (e.g., the difference between salty and bitter may not be as large as the difference between salty and sweet). Polytomous choice models similar to those described here may be used to address this issue. Similarly, researchers interested in hearing may wish to analyze the two self-report measures together with the interviewer rating in order to investigate possible biases in the self-reports and to achieve a more efficient analysis.
Finally, we note that a detailed analysis of a particular sensory function will likely require explicit consideration of several specific physiological factors known to affect that function. Because NSHAP included a broad assessment of health, many of these factors have been measured and are therefore available for analysis. For example, a history of either nasal surgery or head injury is relevant to olfactory function and was included in the assessment of physical health. Similarly, the presence of several comorbid conditions (e.g., high blood pressure, diabetes) and a complete log of current medications were also obtained (see corresponding article in this volume for more details). These greatly enhance the value of the sensory data.
AUTHOR CONTRIBUTIONS
M.M., S.W., S.L., J.L., T.H., and S.T.L. all participated in designing the sensory function protocols used in the study. L.P.S. performed the data analysis and drafted the manuscript. M.M., S.W., S.L., J.L., T.H., and S.T.L. participated in revising the manuscript for important intellectual content.
Acknowledgments
The authors would like to thank Dr. D. Friedman for help in designing NSHAP’s vision module and P. Rathouz for directing us to the paper by Whittemore (1989) regarding the use of empirical Bayes predictions. The authors would also like to thank R. Williams for designing and making the discriminators used in this study.
References
- Anstey KJ, Wood J, Lord S, Walker JG. Cognitive, sensory and physical factors enabling driving safety in older adults. Clinical Psychology Review. 2005;25:45–65. doi: 10.1016/j.cpr.2004.07.008.
- Birnbaum A. Some latent trait models and their use in inferring an examinee’s ability. In: Lord FM, Novick MR, editors. Statistical theories of mental test scores. Reading, MA: Addison-Wesley; 1968. pp. 396–479.
- Bollen KA. Structural equations with latent variables. New York: Wiley; 1989.
- Bramerson A, Johansson L, Ek L, Nordin S, Bende M. Prevalence of olfactory dysfunction: The Skovde Population-Based Study. Laryngoscope. 2004;114:733–737. doi: 10.1097/00005537-200404000-00026.
- Cox DR, Snell EJ. Analysis of binary data. 2nd ed. London: Chapman & Hall; 1989.
- Dellon AL, Keller KM. Computer-assisted quantitative sensorimotor testing in patients with carpal and cubital tunnel syndromes. Annals of Plastic Surgery. 1997;38:493–502. doi: 10.1097/00000637-199705000-00009.
- Dellon AL, Mackinnon SE, Crosby PM. Reliability of two-point discrimination measurements. Journal of Hand Surgery. 1987;12:693–696. doi: 10.1016/s0363-5023(87)80049-7.
- Desrosiers J, Hebert R, Bravo G, Dutil E. Hand sensibility of healthy older people. Journal of the American Geriatrics Society. 1996;44:974–978. doi: 10.1111/j.1532-5415.1996.tb01871.x.
- Efron B, Tibshirani R. An introduction to the bootstrap. Vol. 57. New York: Chapman & Hall; 1993.
- Finnell JT, Knopp R, Johnson P, Holland PC, Schubert W. A calibrated paper clip is a reliable measure of two-point discrimination. Academic Emergency Medicine. 2004;11:710–714.
- Fukunaga A, Uematsu H, Sugimoto K. Influences of aging on taste perception and oral somatic sensation. Journal of Gerontology: Biological Sciences and Medical Sciences. 2005;60:109–113. doi: 10.1093/gerona/60.1.109.
- Globe DR, Wu J, Azen SP, Varma R. The impact of visual impairment on self-reported visual functioning in Latinos: The Los Angeles Latino Eye Study. Ophthalmology. 2004;111:1141–1149. doi: 10.1016/j.ophtha.2004.02.003.
- Hummel T, Kobal G, Gudziol H, Mackay-Sim A. Normative data for the “Sniffin’ Sticks” including tests of odor identification, odor discrimination, and olfactory thresholds: An upgrade based on a group of more than 3,000 subjects. European Archives of Oto-Rhino-Laryngology. 2007;264:237–243. doi: 10.1007/s00405-006-0173-0.
- Hummel T, Sekinger B, Wolf SR, Pauli E, Kobal G. Sniffin’ Sticks: Olfactory performance assessed by the combined testing of odor identification, odor discrimination and olfactory threshold. Chemical Senses. 1997;22:39–52. doi: 10.1093/chemse/22.1.39.
- Joreskog KG, Goldberger AS. Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association. 1975;70:631–639.
- Landis BN, Hummel T. New evidence for high occurrence of olfactory dysfunctions within the population [Letter to the editor]. American Journal of Medicine. 2006;119:91–92. doi: 10.1016/j.amjmed.2005.07.039.
- Landis BN, Hummel T, Hugentobler M, Giger R, Lacroix JS. Ratings of overall olfactory function. Chemical Senses. 2003;28:691–694. doi: 10.1093/chemse/bjg061.
- Landis BN, Konnerth CG, Hummel T. A study on the frequency of olfactory dysfunction. Laryngoscope. 2004;114:1764–1769. doi: 10.1097/00005537-200410000-00017.
- Li Y, Healy EW, Wanzer Drane J, Zhang J. Comorbidity between and risk factors for severe hearing and memory impairment in older Americans. Preventive Medicine. 2006;43:416–421. doi: 10.1016/j.ypmed.2006.06.014.
- Lindau ST, Laumann EO, Levinson W, Waite LJ. Synthesis of scientific disciplines in pursuit of health: The Interactive Biopsychosocial Model. Perspectives in Biology and Medicine. 2003;46(3 Suppl.):S74–S86. doi: 10.1353/pbm.2003.0055.
- Lindau ST, McDade TW. Minimally invasive and innovative methods for biomeasure collection in population-based research. In: Weinstein M, Vaupel JW, Wachter KW, editors. Biosocial surveys (chap. 13). Washington, DC: The National Academies Press; 2007.
- Mayfield JA, Sugarman JR. The use of the Semmes-Weinstein monofilament and other threshold tests for preventing foot ulceration and amputation in persons with diabetes. Journal of Family Practice. 2000;49(11 Suppl.):S17–S29.
- McCullagh P, Nelder JA. Generalized linear models. 2nd ed. London: Chapman & Hall; 1989.
- Mueller C, Kallert S, Renner B, Stiassny K, Temmel AFP, Hummel T, Kobal G. Quantitative assessment of gustatory function in a clinical context using impregnated “taste strips”. Rhinology. 2003;41:2–6.
- Mueller C, Renner B. A new procedure for the short screening of olfactory function using five items from the “Sniffin’ Sticks” identification test kit. American Journal of Rhinology. 2006;20:113–116.
- Murphy C, Schubert CR, Cruickshanks KJ, Klein BEK, Klein R, Nondahl DM. Prevalence of olfactory impairment in older adults. Journal of the American Medical Association. 2002;288:2307–2312. doi: 10.1001/jama.288.18.2307.
- Ostbye T, Krause KM, Norton MC, Tschanz J, Sanders L, Hayden K, Pieper C, Welsh-Bohmer KA. Ten dimensions of health and their relationships with overall self-reported health and survival in a predominately religiously active elderly population: The Cache County memory study. Journal of the American Geriatrics Society. 2006;54:199–209. doi: 10.1111/j.1532-5415.2005.00583.x.
- Ranganathan VK, Siemionow V, Sahgal V, Yue GH. Effects of aging on hand function. Journal of the American Geriatrics Society. 2001;49:1478–1484. doi: 10.1046/j.1532-5415.2001.4911240.x.
- Rasch G. Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Nielsen and Lydiche; 1960.
- Rubin DB. Multiple imputation for nonresponse in surveys. New York: Wiley; 1987.
- Seiberling KA, Conley DB. Aging and olfactory and taste function. Otolaryngologic Clinics of North America. 2004;37:1209–1228. doi: 10.1016/j.otc.2004.06.006.
- Sindhusake D, Mitchell P, Smith W, Golding M, Newall P, Hartley D, Rubin G. Validation of self-reported hearing loss. The Blue Mountains Hearing Study. International Journal of Epidemiology. 2001;30:1371–1378. doi: 10.1093/ije/30.6.1371.
- Skrondal A, Rabe-Hesketh S. Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. Boca Raton, FL: Chapman & Hall/CRC; 2004.
- StataCorp. Stata statistical software: Release 10. College Station, TX: StataCorp LP; 2007.
- Vitale S, Cotch MF, Sperduto RD. Prevalence of visual impairment in the United States. Journal of the American Medical Association. 2006;295:2158–2163. doi: 10.1001/jama.295.18.2158.
- Whittemore AS. Errors-in-variables regression using Stein estimates. American Statistician. 1989;43:226–228.
- Wickremaratchi MM, Llewelyn JG. Effects of ageing on touch. Postgraduate Medical Journal. 2006;82:301–304. doi: 10.1136/pgmj.2005.039651.
- Williams R. Generalized ordered logit/partial proportional odds models for ordinal dependent variables. Stata Journal. 2006;6:58–82.
- Zheng X, Rabe-Hesketh S. Estimating parameters of dichotomous and ordinal item response models with gllamm. Stata Journal. 2007;7:313–333.