Abstract
Objectives.
The aim of this research was to test the invariance of the cognitive variables in the Health and Retirement Study/Asset Health Dynamics Among the Oldest Old studies (HRS/AHEAD) across ethnicity, gender, and time.
Method.
Analyses were conducted using a selected subsample of the HRS/AHEAD data set. The cognitive performance tests measuring episodic memory and mental status were used, and invariance of a two-factor structure was tested using confirmatory factor analyses and multilevel modeling for longitudinal data.
Results.
Results provided some support for “strict” factorial invariance of the episodic memory and mental status measures across ethnicity and gender. Further support of weak (“metric”) measurement invariance was found across time.
Discussion.
Results of the research further our understanding of invariance of the HRS/AHEAD cognitive ability measures. Further implications are discussed.
Key Words: Cognitive aging, Group differences, Latent variable modeling, Measurement invariance, Structural equation modeling.
Use of scales of measurement requires that they measure the same thing in different people or in the same person assessed at different times or in different circumstances. For example, it makes no sense to say that Whites have stronger cognitive skills than Blacks (e.g., Jones, 2003; Moody-Ayers, Mehta, Lindquist, Sands, & Covinsky, 2005) if the cognition scale does not measure the same attribute in Black and White people. Measurement invariance is thus a fundamental requirement in both applied and scientific use of measurement instruments. Yet the question “Does this measurement instrument provide invariant measurements?” has been rarely asked in behavioral science research, and the measurement invariance hypothesis has been even more rarely tested in research. Most commonly, it has just been assumed that invariance holds. Effort has been expended to write items that are unambiguous and clear, and item analyses have been carried out to select the best items; from then on, it has been assumed that the items will have the same connotations and meanings for all people and therefore the scale is invariant in comparisons of people of different ages and classifications (as in Horn & McArdle, 1992). But if this assumption of invariance is incorrect, then conclusions based on results obtained with the measurements are likely to be incorrect as well. Only rarely has the assumption been stated as an hypothesis and tested, with increasing attention being paid to this issue more recently (e.g., Bowden, Saklofske, & Weiss, 2011; Hertzog & Carter, 1982; Savage-McGlynn, 2012).
Important as it is, invariance has been more often neglected in behavioral science research than it has been evaluated. Increasingly, it is becoming known that invariance may not hold, that it cannot simply be assumed, and that it is a hypothesis that can be tested. Recently published textbooks and journal articles on measurement and data analysis, in contrast to older ones, contain methods for evaluating invariance (e.g., Bontempo, Grouzet, & Hofer, 2011; compare Mulaik, 1972 with McDonald, 1999; Mungas, Widaman, Reed, & Tomaszewski, 2011), and advances in computer technology make invariance analyses something that can readily be done.
At least four kinds of invariance have been identified: configural, weak, strong, and strict (Meredith, 1993; Widaman & Reise, 1997). The first two forms deal with observed covariances only. Configural invariance requires only that the same indicators load on the same factors across groups. Weak (or metric) invariance requires the factor loadings to be the same across groups. The last two forms deal with the observed means and covariances. Strong (or scalar) invariance requires the factor loadings and manifest variable intercepts to be invariant. Finally, strict factorial invariance requires that all the factor loadings, manifest variable intercepts, and manifest variable residuals be the same across groups (Meredith, 1993; Meredith & Teresi, 2006; Widaman & Reise, 1997). Strict factorial invariance has been argued to be indicative of “true” measurement invariance (Wicherts & Dolan, 2010). However, evidence of strong measurement invariance is all that is required to ensure meaningful comparisons of latent means across groups (Widaman & Reise, 1997). If only weak invariance holds, then meaningful comparisons across groups can be made of the variances and covariances among latent variables but not of the latent means or observed means, covariances, and variances (Bontempo & Hofer, 2007; Gregorich, 2006; McArdle, 2010; McArdle, Smith, & Willis, 2011; Widaman & Reise, 1997). If full invariance does not hold, then partial invariance can be used to determine whether higher forms of invariance might hold for some of the components in the measurement scale, if not all the components (Byrne, Shavelson, & Muthén, 1989).
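As a compact restatement, the four levels can be written as nested equality constraints on a linear common factor model. The notation below (intercepts ν, loadings Λ, residual covariance Θ, factor means κ, factor covariance Φ, group index g) is introduced here for illustration and is not taken from the cited sources; it is a sketch for continuous indicators only.

```latex
\begin{align*}
\mathbf{y}_{ig} &= \boldsymbol{\nu}_g + \boldsymbol{\Lambda}_g \boldsymbol{\eta}_{ig} + \boldsymbol{\varepsilon}_{ig},
  \qquad \boldsymbol{\eta}_{ig} \sim (\boldsymbol{\kappa}_g, \boldsymbol{\Phi}_g),
  \quad \boldsymbol{\varepsilon}_{ig} \sim (\mathbf{0}, \boldsymbol{\Theta}_g) \\
\boldsymbol{\mu}_g &= \boldsymbol{\nu}_g + \boldsymbol{\Lambda}_g \boldsymbol{\kappa}_g,
  \qquad \boldsymbol{\Sigma}_g = \boldsymbol{\Lambda}_g \boldsymbol{\Phi}_g \boldsymbol{\Lambda}_g^{\top} + \boldsymbol{\Theta}_g \\[4pt]
\text{Configural:}\quad & \text{same pattern of zero and nonzero elements in } \boldsymbol{\Lambda}_g \text{ for all } g \\
\text{Weak:}\quad & \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda} \\
\text{Strong:}\quad & \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda}, \;\; \boldsymbol{\nu}_g = \boldsymbol{\nu} \\
\text{Strict:}\quad & \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda}, \;\; \boldsymbol{\nu}_g = \boldsymbol{\nu}, \;\; \boldsymbol{\Theta}_g = \boldsymbol{\Theta}
\end{align*}
```

Under the strong constraints, the difference in observed means between two groups g and g′ reduces to Λ(κ_g − κ_g′), which is why strong invariance suffices for comparing latent means.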
The question of measurement invariance should be considered in all research in which analyses are directed at showing that measured attributes (states and traits) and the relationships among such attributes are different for different classifications of people or for the same people measured under different circumstances (times and places). The question thus should be considered in almost all behavioral science research. The focus of the present research was on invariance in tests of human cognitive abilities in older adults.
Ideally, invariance should hold for each and every entity (individual) measured with a measurement device, and hypotheses of invariance can be tested for different classifications of people. Three major classifications for which it is possible that invariance in tests of cognitive abilities may not hold are ethnicity, gender, and time. Measures of cognition are combinations (usually linear) of responses to different stimuli (items) and the stimuli can mean different things to different people and to the same person under different circumstances. The items of cognitive ability tests may be interpreted differently by White compared with Black test takers, or men as compared with women, or the item meaning may change over time; tests composed of such items will not measure the same ability in the same way in these different samples.
In most uses of cognitive tests at older ages, invariance across ethnicity, gender, and time has been assumed (usually implicitly), without test and sometimes without statement that an assumption is being made (e.g., Cagney & Lauderdale, 2002; Moody-Ayers et al., 2005; Zsembik & Peek, 2001). Using data from the 1993 Asset Health Dynamics Among the Oldest Old (AHEAD) study, results by Moody-Ayers et al. (2005) suggested differences in cognitive functioning across Black and White groups, as well as differences in the effect of cognitive functioning on functional decline. However, invariance was not tested; based on these results alone, it is unclear why there are differences in cognitive functioning or why there are differential effects on outcome variables across racial groups. Sloan and Wang (2005) examined ethnic/racial differences in performance on cognitive tasks using participants from four waves (1993, 1995, 1998, 2000) of the AHEAD and the Health and Retirement Study (HRS). Results suggested Black and White differences in initial levels, as well as in change over time; although Blacks performed more poorly than Whites, they showed less decline over time than Whites. Not addressed in the study was invariance for the cognitive tasks. If invariance does not hold, the interpretation of level and slope differences would be altered. Furthermore, the obtained results regarding growth and change over time may not be valid. If at least strong invariance does not hold across time, then growth models are not interpretable (Widaman, Ferrer, & Conger, 2010). Thus, further research on the cognitive variables themselves would prove useful in our understanding of any observed group differences or change over time.
One study that has addressed the question of measurement invariance in cognition in older samples was conducted by Jones (2003), in which multiple-group structural equation modeling was used to examine differential item functioning across Black and White groups using data from the 1993 AHEAD and the 1996 HRS projects. Results of the research suggested a lack of measurement invariance for intelligence as a single factor. Recent research (McArdle, Fisher, & Kadlec, 2007), however, has suggested that more than a single common factor is needed to account for the intercorrelations among the various cognitive ability variables of the HRS/AHEAD data. Specifically, the immediate and delayed free recall tasks measure an episodic memory factor, whereas a mental status factor is composed of the serial 7s, counting backward from 20, naming, and dates tasks. For the immediate and delayed recall tasks, respondents are asked to recall a list of nouns read by an interviewer immediately and after a 5-min delay (Ofstedal et al., 2005). The list of words was randomly assigned to participants within a time point of data collection and within participants across time. The serial 7s task requires respondents to subtract 7 from 100 across 5 trials. On the counting backward task, individuals count backward from 20 for 10 continuous numbers. For the names task, respondents state the U.S. president and vice president by last name and name two objects (scissors and cactus). Finally, for the dates task, respondents provide the current date (month, day, year, and day of week). It is by no means clear that these items mean the same thing in different people measured at different times. Although the question of invariance of these factors across age has been asked and answered in the affirmative (McArdle et al., 2007), the question of whether these factors are invariant across ethnicity, gender, and time has not been asked.
The issue of cognitive testing among older individuals is an important topic, particularly because individuals in the United States are continuing to live longer and longer. Furthermore, the U.S. ethnic demography is currently changing, where Blacks and Hispanics will outnumber Whites in the near future. This means that Black and Hispanic elderly participants will ultimately outnumber White elderly participants. Increased understanding of cognition across ethnicity, gender, and time in older populations is therefore especially relevant currently. The HRS/AHEAD studies are used extensively in research on cognitive aging, and a fundamental question remains: are the cognitive ability tests in the HRS/AHEAD studies measuring the same abilities in people of different ethnicities, gender, and across time? The aim of this study was to address this question.
Designs for tests of invariance can be understood in terms of two contrasts. First, there is a contrast between modeling one common factor (described in McDonald, 1999) and modeling multiple common factors (pedagogically described in Horn et al., 1983). Second is a contrast between multiple groups composed of separate samples of different subjects and multiple groups composed of repeated-measures samples of the same subjects. Thus, analyses can be conducted to test the invariance of a one-factor model across separate samples, a multiple-factor model across separate samples, a one-factor model across the same sample, or a multiple-factor model across the same sample.
A principal difference between the multiple-factor model and the single-factor model is that in the multiple-factor model, in notable contrast to the single-factor model, each common factor is estimated in the context of the other common factors. This means that the estimated factor loadings are partial coefficients representing the extent to which an item relates to a common factor independently of the extent to which that item relates to other common factors. In the single-factor model, the extent to which a common factor relates to other common factors is not considered. Invariance can hold under one of these conditions when it does not hold under the other condition.
As noted earlier, the cognitive tests in the HRS data have been found to measure two common factors, episodic memory (marked by two tests) and mental status (marked by four tests). It may be that invariance holds when we examine the factors jointly, but not when the factors are examined separately, or vice versa. The present research will add to the extant body of literature by adducing evidence on whether or not the two-factor structure and the mental status factor in the HRS data are invariant across ethnicity, gender, and time. Because the episodic memory factor contained only two tests, we were unable to assess the invariance of that factor separately.
The HRS data set is an excellent source for addressing the issue of invariance in cognitive abilities among older individuals. Briefly, the HRS and AHEAD studies began in 1992 and 1993, respectively, and in 1998 were combined into one study that attempts to be nationally representative of Americans more than 50 years of age. The studies use a panel design in which the same respondents are interviewed every 2 years, and new respondents are added to the sample every 6 years to replenish the sample to adjust for aging and attrition (see Heeringa & Connor, 1996; Leacock, 2006; http://hrsonline.isr.umich.edu). The sample size exceeds 17,000 families (30,000 individuals), and the sample is ethnically diverse, with the largest ethnic groups in the sample being Blacks and Whites. The power for examining ethnic, gender, and time differences using this sample is large. Thus, the HRS/AHEAD studies provide an excellent source of data for examining invariance in the measurement of cognitive ability in the elderly U.S. population.
Method
Participants in HRS/AHEAD
The HRS is a nationally representative longitudinal study sponsored by the National Institute on Aging and conducted by the University of Michigan. The data set is publicly available (http://hrsonline.isr.umich.edu/). The baseline sample comprised community-dwelling adults in the contiguous United States who were 51–61 years old in 1992. Blacks, Hispanics, and Florida residents were oversampled (for details, see Heeringa & Connor, 1996). In 1993 and 1995, the AHEAD study was conducted among a national sample of adults aged 70 or older. In 1998, the HRS and AHEAD studies merged, both assuming the name HRS, and two new cohorts were added. More information on the HRS sample and data collection is provided in McArdle et al. (2007). The present analyses used data from Waves 1 (1992) through 10 (2010).
Participants in Current Study
Analyses were conducted using the imputed cognitive data set created by Fisher, Hassan, Rodgers, and Weir (2012). From this initial data set, we selected participants in a manner similar to McArdle et al. (2007), with the difference that only those who self-reported as Black (n = 3061), White (n = 13308), or Hispanic (n = 1544) were selected. Specifically, we eliminated any person who (a) was not a primary respondent, (b) had no data on gender or age, (c) had a sampling weight of zero, or (d) was younger than 50 years. In cases in which there were two persons per family, we randomly picked only one of them. Descriptive information on the resulting subset of data is presented in Table 1 for the total sample. The demographic variables presented in this table include (a) chronological age at baseline testing, (b) years of formal education, (c) gender, (d) birth year (cohort), (e) number of waves of participation, (f) whether the participant was one member of a couple, and (g) whether the individual was still a participant in the most recent 2010 testing.
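The selection rules above can be sketched as a small filter. This is only an illustration: the record field names (`family_id`, `primary_respondent`, `gender`, `age`, `weight`, `ethnicity`), the seed, and the order in which the ethnicity restriction and the exclusions are applied are assumptions made here, not details taken from the HRS files or from the original analysis scripts.

```python
import random

# Hypothetical ethnicity labels; the real HRS coding differs.
ETHNIC_GROUPS = {"White", "Black", "Hispanic"}

def is_eligible(rec):
    """Apply exclusion rules (a) through (d) plus the ethnicity restriction."""
    return (
        rec["primary_respondent"]                       # (a) primary respondents only
        and rec["gender"] is not None                   # (b) gender must be present
        and rec["age"] is not None                      # (b) age must be present
        and rec["weight"] > 0                           # (c) nonzero sampling weight
        and rec["age"] >= 50                            # (d) at least 50 years old
        and rec["ethnicity"] in ETHNIC_GROUPS
    )

def select_sample(records, seed=0):
    """Keep eligible records, then draw one person at random per family."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_family = {}
    for rec in records:
        if is_eligible(rec):
            by_family.setdefault(rec["family_id"], []).append(rec)
    return [rng.choice(members) for members in by_family.values()]
```

Drawing one member per family removes within-household dependence, so the retained cases can be treated as independent observations in the single-level analyses.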
Table 1.
Statistic | Age in years | Years of education | Female (%) | Birth cohort | No. waves tested | Living as a couple (%) | Tested in 2010 (%) |
---|---|---|---|---|---|---|---|
M | 62.96 | 11.85 | 57 | 1931 | 5.51 | 57 | 44 |
SD | 11.25 | 3.46 | 0.49 | 12.62 | 3.04 | 0.50 | 0.50 |
Minimum | 50.00 | 0.00 | 0 = male | 1890 | 1.00 | 0 = living alone | 0 = not tested in 2010 |
Maximum | 103.00 | 17.00 | 1 = female | 1953 | 10.00 | 1 = living as a couple | 1 = tested in 2010 |
Note. SD = standard deviation.
Measures
The cognitive tests measuring episodic memory and mental status were used, and these are outlined by McArdle et al. (2007), with more detailed descriptions in Brandt, Spencer, and Folstein (1988), Ofstedal, Fisher, and Herzog (2005), and Fisher et al. (2012). Specifically, immediate and delayed free recall served as indicators of episodic memory, and serial 7s, counting backward from 20, naming the U.S. president and vice president by last name, naming two objects (scissors and cactus), and providing the date (month, day, year, and day of week) served as indicators of mental status. To provide comparability across all scales, and to simplify measurement for further statistical analysis, we scaled each variable into a percent correct score (i.e., based on division by the maximum score and multiplication by 100).
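The percent correct rescaling amounts to one line of arithmetic per variable. The sketch below is illustrative only: the variable names and the maximum scores in `MAX_SCORE` are assumptions made for the example and should be checked against the HRS documentation before any use.

```python
# Assumed maximum raw scores per test (illustrative, not taken from the source).
MAX_SCORE = {
    "immediate_recall": 10,
    "delayed_recall": 10,
    "serial_7s": 5,
    "backward_counting": 2,
    "dates": 4,
    "names": 4,
}

def percent_correct(raw, max_score):
    """Rescale a raw score to 0-100: divide by the maximum, multiply by 100."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0 <= raw <= max_score:
        raise ValueError("raw score outside the range 0..max_score")
    return 100.0 * raw / max_score

def rescale(raw_scores):
    """Convert a dict of raw test scores into percent correct scores."""
    return {test: percent_correct(raw, MAX_SCORE[test]) for test, raw in raw_scores.items()}
```

For instance, under these assumed maxima, `rescale({"serial_7s": 4, "dates": 3})` yields 80.0 and 75.0, which places tests with different numbers of items on a common 0 to 100 metric.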
Data Analyses
Analyses were conducted using SPSS version 20 and Mplus 6.0 (Muthén & Muthén, 1998–2010). Descriptive analyses included examination of means, variances, skewness, and kurtosis. Substantive analyses included (a) confirmatory factor analyses conducted to examine the fit of the two-factor model and the mental status factor to each ethnic and gender group and to test the measurement invariance of the two-factor structure and the mental status factor across ethnicity and gender and (b) multilevel models to evaluate longitudinal invariance of the two-factor structure and the mental status factor across time.
Confirmatory Factor Analyses
Single Group Analyses.
We fit the two-factor model and the mental status factor separately to each ethnic and gender subgroup. For the two-factor model, the first factor was marked by two continuous variables (immediate recall and delayed recall), whereas the second factor was marked by four categorical variables (serial 7s, backward counting, dates, and names). The analyses of the mental status factor focused only on the tests in Factor 2. Weighted least squares with mean and variance adjusted estimation was used to account for the skewed categorical variables. Delta parameterization was employed so that scale factors could be modeled for the categorical variables. Factors were identified by fixing the loadings for immediate recall and serial 7s at unity.
Multiple-Group Analyses.
These analyses were conducted separately for ethnicity and gender and for the two-factor model and the mental status factor. In invariance analyses of continuous data, parameters that can be estimated include factor loadings, manifest variable intercepts, and manifest variable residuals. For categorical variables, parameters include thresholds rather than manifest variable intercepts, and scale factors are used in the estimation of residuals (Sass, 2011). To test for configural invariance, the manifest variable intercepts/thresholds, residual variances, and factor loadings were freely estimated for all groups; factor means were fixed at zero; the scaling variable was fixed at unity for identification of the categorical variables (Muthén & Muthén, 1998–2012). To test for weak invariance, loadings were constrained to be equal across groups for all indicators, with all other aspects of the model specified as described for the configural model. For strong invariance, manifest variable intercepts/thresholds were constrained to be equal across groups; factor means were fixed at zero in the first group and freely estimated in the remaining groups; and for the categorical indicators, the scaling variable was fixed at unity in the first group and freely estimated in the remaining groups. Finally, for strict invariance, manifest variable residuals were constrained to be equal across groups and the scaling variable was fixed at unity for all groups for the continuous indicators.
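The sequence of models can be summarized as a set of cumulative equality constraints. The sketch below only encodes which parameter types are held equal across groups at each step; the labels are chosen here for illustration, and the identification details described above (factor means and scaling variables) are deliberately left out.

```python
# Parameter types constrained to equality across groups at each step.
# Labels are illustrative; identification constraints are not represented.
ORDER = ["configural", "weak", "strong", "strict"]

CONSTRAINED = {
    "configural": set(),
    "weak": {"loadings"},
    "strong": {"loadings", "intercepts_or_thresholds"},
    "strict": {"loadings", "intercepts_or_thresholds", "residuals"},
}

def newly_constrained(level):
    """Return the parameter types added at this step relative to the previous one."""
    i = ORDER.index(level)
    previous = CONSTRAINED[ORDER[i - 1]] if i > 0 else set()
    return CONSTRAINED[level] - previous

def is_nested():
    """Check that each model's constraints include those of the model before it."""
    return all(CONSTRAINED[a] <= CONSTRAINED[b] for a, b in zip(ORDER, ORDER[1:]))
```

Because each step adds constraints to the previous one, adjacent models are nested, which is what makes step-by-step comparisons of fit meaningful.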
Goodness-of-fit indices were used to make decisions about the accuracy of the models. More specifically, the overall χ² is routinely presented, and we also rely on the root mean square error of approximation (RMSEA; Browne & Cudeck, 1993; Steiger & Lind, 1980) and the Comparative Fit Index (CFI; Bentler, 1990) for the assessment of good fit. As a rule of thumb, RMSEA values smaller than 0.10 (Browne & Cudeck, 1993) and CFI values greater than 0.95 (Hu & Bentler, 1999) were considered favorable, although CFI values greater than 0.90 are tenable (Bentler, 1990), and 0.90 is still a widely used cutoff (Van Lieshout, Cleverley, Jenkins, & Georgiades, 2011). Delta CFI (Cheung & Rensvold, 2002), which has been found to be more powerful in detecting lack of invariance in large sample sizes (Meade, Johnson, & Braddy, 2008), was used to determine which level of measurement invariance was most tenable for the data. Delta CFI was computed between the two most proximal models (e.g., configural vs. weak; weak vs. strong). The suggested cutoff value of 0.002 was used for interpretation, with values greater than 0.002 indicating poorer fit.
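The delta CFI rule can be illustrated with a few lines of code. The CFI values used here are those reported for the two-factor model across ethnicity in Table 5; the function name and the rounding to three decimals (to match the reported precision and avoid floating-point noise at the cutoff) are choices made for this sketch.

```python
CUTOFF = 0.002  # delta CFI cutoff used in the text

# CFI values for the two-factor model across ethnicity (Table 5).
TWO_FACTOR_CFI = [
    ("configural", 0.999),
    ("weak", 0.997),
    ("strong", 0.984),
    ("strict", 0.977),
]

def delta_cfi(models, cutoff=CUTOFF, digits=3):
    """Compare each model with the next, more constrained one.

    Returns (comparison label, delta CFI, exceeds cutoff) for each adjacent pair.
    """
    results = []
    for (prev_name, prev_cfi), (name, cfi) in zip(models, models[1:]):
        delta = round(prev_cfi - cfi, digits)  # drop in CFI after adding constraints
        results.append((f"{prev_name} vs. {name}", delta, delta > cutoff))
    return results

for label, delta, worse in delta_cfi(TWO_FACTOR_CFI):
    print(f"{label}: delta CFI = {delta:.3f}, exceeds cutoff: {worse}")
```

With these inputs, the configural versus weak comparison sits exactly at the cutoff (0.002) and is not flagged, whereas the weak versus strong (0.013) and strong versus strict (0.007) comparisons are.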
Multilevel Models.
Invariance across time can be modeled using a multilevel framework (as in McArdle et al., 2007) or using cross-time invariance constraints placed within a single-level model (Meredith & Horn, 2001; Widaman, Ferrer, & Conger, 2010). The single-level cross-time model can be likened to a multiple-group approach in which invariance constraints are placed across groups; the difference is that constraints are placed across time. We attempted to model the data using the single-level framework, fitted to the cross-time data. However, results were inadmissible (i.e., estimated matrices were not positive definite) regardless of whether all the time points were included in the analyses or only a subset of time points was modeled. Specifically, estimated correlations between factors across time were greater than 1.0. Although the correlation of a factor with itself across occasions is an indicator of invariance (Meredith & Horn, 2001), the inadmissible results precluded further analyses using the single-level approach. Therefore, longitudinal factorial invariance across time (year of assessment) was tested using a multilevel modeling approach by fitting models to the between- and within-covariance matrices among the cognitive measures (McArdle et al., 2007). Under this approach, time is nested within individuals; time is at Level 1 and the individual is at Level 2. This approach is similar to modeling students within classrooms (Muthén, 1991). Under a multilevel approach, the parameters that can be estimated include factor loadings and manifest variable residuals at the within and between levels and manifest intercepts at the between-group level (Kim, Kwok, & Yoon, 2012). Different tests of invariance can be conducted by freeing or restricting these parameters. To test for configural invariance, the factor loadings, manifest variable residuals, and the manifest intercepts are all free to vary at the relevant levels. For weak invariance, the factor loadings are restricted.
Although strong invariance cannot be modeled because manifest intercepts at the within level cannot be estimated, a form of strict invariance can be tested by restricting the residual variances to be the same for the within and between matrices. Hence, three models were fit to the data: a configural model, a weak invariance model, and a model with loadings and residual variances the same for the within and between matrices (referred to here loosely as strict invariance).
Because of the size of these analyses, models in which category thresholds were estimated failed to converge. Therefore, we did not estimate category thresholds for the categorical variables, as was done in the confirmatory factor analyses. That is, all variables were treated as continuous in the analyses testing invariance across time, and maximum likelihood estimation with robust standard errors was used (Muthén & Muthén, 1998–2012).
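To make the between/within distinction concrete, the sketch below splits repeated measures into person means (the between part) and occasion-specific deviations from those means (the within part), and computes a pooled within-person covariance matrix. It is a descriptive illustration only: the data layout and function names are assumptions, and the multilevel models described above estimate the between and within structures jointly by maximum likelihood rather than from these raw quantities.

```python
def person_means(scores_by_person):
    """Mean of each variable over occasions, per person (the between part)."""
    means = {}
    for pid, rows in scores_by_person.items():
        n_vars = len(rows[0])
        means[pid] = [sum(row[j] for row in rows) / len(rows) for j in range(n_vars)]
    return means

def pooled_within_cov(scores_by_person):
    """Covariance of deviations from person means, pooled over persons.

    The divisor is (number of observations - number of persons), because one
    mean per variable is estimated for every person.
    """
    means = person_means(scores_by_person)
    n_obs = sum(len(rows) for rows in scores_by_person.values())
    n_persons = len(scores_by_person)
    denom = n_obs - n_persons
    if denom <= 0:
        raise ValueError("need at least one person with more than one occasion")
    n_vars = len(next(iter(means.values())))
    sscp = [[0.0] * n_vars for _ in range(n_vars)]  # sums of squares and cross-products
    for pid, rows in scores_by_person.items():
        for row in rows:
            dev = [row[j] - means[pid][j] for j in range(n_vars)]
            for j in range(n_vars):
                for k in range(n_vars):
                    sscp[j][k] += dev[j] * dev[k]
    return [[value / denom for value in line] for line in sscp]
```

A call such as `pooled_within_cov({1: [[1, 2], [3, 4]], 2: [[10, 10], [20, 30]]})` treats each inner list as one occasion with two test scores. Note that the covariance of the person means is not a pure estimate of the between-level structure, since it also carries within-person sampling variability.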
Results
Descriptive Analyses
Summary statistics on all cognitive data are presented in Table 2 for the total sample and by ethnicity and gender. Correlations among the cognitive variables are in Table 3. This information is based on the six cognitive variables at the first time of testing, or at the first occasion on which a participant had completed all six tests. Results demonstrate that the Black and Hispanic participants were younger and had fewer years of education in comparison with the White participants. On average, the Hispanic participants were the youngest and also had the fewest years of education. Female participants were older on average and had fewer years of education.
Table 2.
Group | Statistic | IR | DR | S7 | BC | DA | NA | Age | Educ |
---|---|---|---|---|---|---|---|---|---|
Total sample | M | 53.46 | 42.53 | 69.22 | 96.40 | 93.93 | 88.94 | 65.40 | 11.85 |
Total sample | SD | 18.48 | 21.29 | 32.65 | 17.82 | 14.09 | 18.10 | 11.05 | 3.46 |
Total sample | Skewness | −0.01 | 0.08 | −0.70 | −4.97 | −2.92 | −1.65 | 0.51 | −0.82 |
Total sample | Kurtosis | −0.26 | −0.27 | −0.85 | 23.35 | 10.66 | 2.25 | −0.82 | 1.00 |
White-not Hispanic | M | 55.27a | 44.72a | 75.17a | 97.82a | 94.73a | 92.32a | 65.98a | 12.49a |
White-not Hispanic | SD | 18.20 | 21.09 | 29.54 | 13.92 | 12.84 | 15.00 | 11.16 | 2.89 |
White-not Hispanic | Skewness | −0.07 | −0.01 | −0.98 | −6.53 | −3.05 | −2.10 | 0.43 | −0.53 |
White-not Hispanic | Kurtosis | −0.19 | −0.20 | −0.25 | 41.80 | 12.01 | 4.49 | −0.92 | 0.82 |
Black-not Hispanic | M | 48.27b | 35.35b | 49.59b | 91.16b | 91.58b | 78.63b | 64.23b | 10.79b |
Black-not Hispanic | SD | 18.48 | 21.00 | 34.76 | 27.23 | 16.94 | 23.52 | 10.70 | 3.66 |
Black-not Hispanic | Skewness | 0.11 | 0.36 | 0.11 | −2.90 | −2.47 | −0.82 | 0.69 | −0.61 |
Black-not Hispanic | Kurtosis | −0.34 | −0.19 | −1.34 | 6.65 | 7.10 | −0.27 | −0.49 | 0.25 |
Hispanic | M | 48.07b | 37.77c | 55.48c | 94.21c | 91.38b | 79.23b | 62.61c | 8.43c |
Hispanic | SD | 17.48 | 19.56 | 35.19 | 22.30 | 17.36 | 19.71 | 10.16 | 4.73 |
Hispanic | Skewness | 0.32 | 0.31 | −0.11 | −3.78 | −2.51 | −0.57 | 0.86 | −0.08 |
Hispanic | Kurtosis | 0.07 | 0.07 | −1.42 | 12.72 | 7.21 | −0.40 | −0.14 | −1.01 |
Men | M | 51.78a | 40.72a | 74.05a | 96.78a | 93.53a | 91.14a | 63.71a | 12.14a |
Men | SD | 17.94 | 20.12 | 31.01 | 17.04 | 14.47 | 16.68 | 10.53 | 3.62 |
Men | Skewness | 0.06 | 0.12 | −0.95 | −5.30 | −2.82 | −2.02 | 0.67 | −0.86 |
Men | Kurtosis | −0.19 | −0.16 | −0.35 | 26.62 | 9.93 | 3.92 | −0.61 | 0.80 |
Women | M | 54.71b | 43.89b | 65.69b | 96.12b | 94.21b | 87.35b | 66.66b | 11.64b |
Women | SD | 18.72 | 22.02 | 33.37 | 18.37 | 13.80 | 18.91 | 11.26 | 3.31 |
Women | Skewness | −0.08 | 0.02 | −0.54 | −4.76 | −3.00 | −1.44 | 0.38 | −0.83 |
Women | Kurtosis | −0.27 | −0.35 | −1.08 | 21.36 | 11.25 | 1.47 | −0.92 | 1.24 |
Notes. This subsample was selected so there would be only one person per family, and respondent-level sampling weights were used to adjust to a U.S. national norm. IR = immediate recall; DR = delayed recall; S7 = serial 7s; BC = backward counting; DA = dates; NA = names; Educ = years of education; SD = standard deviation. Different subscripts in each column indicate significant differences at p < .05, within gender and ethnicity.
Table 3.
Group | Variable | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
Total sample | 1. Immediate recall | — | | | | | |
Total sample | 2. Delayed recall | 0.80** | — | | | | |
Total sample | 3. Serial 7s | 0.36** | 0.35** | — | | | |
Total sample | 4. Backward counting | 0.21** | 0.20** | 0.22** | — | | |
Total sample | 5. Dates | 0.26** | 0.27** | 0.25** | 0.23** | — | |
Total sample | 6. Names | 0.35** | 0.35** | 0.37** | 0.24** | 0.29** | — |
White-not Hispanic | 1. Immediate recall | — | | | | | |
White-not Hispanic | 2. Delayed recall | 0.79** | — | | | | |
White-not Hispanic | 3. Serial 7s | 0.32** | 0.31** | — | | | |
White-not Hispanic | 4. Backward counting | 0.15** | 0.15** | 0.16** | — | | |
White-not Hispanic | 5. Dates | 0.23** | 0.24** | 0.21** | 0.18** | — | |
White-not Hispanic | 6. Names | 0.30** | 0.30** | 0.28** | 0.15** | 0.25** | — |
Black-not Hispanic | 1. Immediate recall | — | | | | | |
Black-not Hispanic | 2. Delayed recall | 0.78** | — | | | | |
Black-not Hispanic | 3. Serial 7s | 0.38** | 0.36** | — | | | |
Black-not Hispanic | 4. Backward counting | 0.30** | 0.26** | 0.24** | — | | |
Black-not Hispanic | 5. Dates | 0.29** | 0.31** | 0.26** | 0.27** | — | |
Black-not Hispanic | 6. Names | 0.41** | 0.36** | 0.36** | 0.30** | 0.34** | — |
Hispanic | 1. Immediate recall | — | | | | | |
Hispanic | 2. Delayed recall | 0.79** | — | | | | |
Hispanic | 3. Serial 7s | 0.32** | 0.31** | — | | | |
Hispanic | 4. Backward counting | 0.22** | 0.20** | 0.23** | — | | |
Hispanic | 5. Dates | 0.27** | 0.28** | 0.24** | 0.27** | — | |
Hispanic | 6. Names | 0.34** | 0.32** | 0.35** | 0.22** | 0.28** | — |
Men | 1. Immediate recall | — | | | | | |
Men | 2. Delayed recall | 0.79** | — | | | | |
Men | 3. Serial 7s | 0.35** | 0.36** | — | | | |
Men | 4. Backward counting | 0.18** | 0.18** | 0.22** | — | | |
Men | 5. Dates | 0.23** | 0.24** | 0.26** | 0.24** | — | |
Men | 6. Names | 0.35** | 0.35** | 0.35** | 0.23** | 0.29** | — |
Women | 1. Immediate recall | — | | | | | |
Women | 2. Delayed recall | 0.80** | — | | | | |
Women | 3. Serial 7s | 0.38** | 0.37** | — | | | |
Women | 4. Backward counting | 0.23** | 0.21** | 0.23** | — | | |
Women | 5. Dates | 0.28** | 0.29** | 0.24** | 0.22** | — | |
Women | 6. Names | 0.37** | 0.36** | 0.37** | 0.25** | 0.30** | — |
**p < .01.
Results also show that Immediate Recall had averages near 50% for each ethnic group and gender and that, not surprisingly, the Delayed Recall scale was somewhat harder across all groups. The Serial 7s test appeared easier for the White participants than for the Black and Hispanic participants, and the other three subscales, Backward Counting, Dates, and Names, had correct response rates of nearly 80%–90% for each ethnic group. In general, the pattern of results for the White sample is similar to the total sample results reported by McArdle et al. (2007). However, the present results also show that means for the Black and Hispanic groups were significantly lower than those of the White group for all of the cognitive tests. Finally, female participants scored significantly higher on the tests of memory and the Dates test compared with the male participants. Nevertheless, these mean differences are meaningless without evidence that we are measuring the same thing across the different groups.
Confirmatory Factor Analyses
Single Group Analyses.
First, we tested whether the two-factor structure identified by McArdle et al. (2007) fits each ethnic and gender subsample. The model fit all of the subgroups well, with RMSEAs of 0.000–0.019 and CFIs of 0.998–1.000 (see Table 4). When examined separately, the mental status factor also fit each subgroup well (see Table 4).
Table 4.
Model | Group | Chi-square | df | RMSEA | 95% CI | CFI |
---|---|---|---|---|---|---|
Two-factor model | White | 32.276 | 8 | 0.016 | 0.011; 0.022 | 0.998 |
Two-factor model | Black | 7.050 | 8 | 0.000 | 0.000; 0.020 | 1.000 |
Two-factor model | Hispanic | 11.862 | 8 | 0.018 | 0.000; 0.038 | 0.998 |
Two-factor model | Men | 29.111 | 8 | 0.019 | 0.012; 0.026 | 0.998 |
Two-factor model | Women | 37.888 | 8 | 0.019 | 0.013; 0.025 | 0.998 |
Mental status factor | White | 12.393 | 2 | 0.020 | 0.010; 0.032 | 0.995 |
Mental status factor | Black | 0.568 | 2 | 0.000 | 0.000; 0.025 | 1.000 |
Mental status factor | Hispanic | 5.587 | 2 | 0.035 | 0.000; 0.072 | 0.992 |
Mental status factor | Men | 6.204 | 2 | 0.017 | 0.002; 0.033 | 0.997 |
Mental status factor | Women | 12.641 | 2 | 0.023 | 0.012; 0.036 | 0.997 |
Notes. RMSEA = root mean square error of approximation; CI = confidence interval; CFI = Comparative Fit Index.
Multiple-Group Analyses.
Invariance Across Ethnicity. Four models were tested separately for the two-factor model and for the mental status factor: configural, weak, strong, and strict invariance models. Given the large sample size for this study, the fit indices together with theoretical considerations were used to determine model fit (Widaman & Reise, 1997). Results of the model fits are shown in Table 5. For the two-factor model, delta CFI between the configural and weak models was at the prescribed level, indicating that invariance of factor loadings across ethnic groups is tenable. However, the CFI differences between the weak and strong models and between the strong and strict models exceeded the cutoff. These results suggest greater lack of fit when constraining the intercepts and manifest variable residuals to be invariant across ethnic groups (Meade et al., 2008). Nevertheless, the practical fit indices indicate good fit of the strong and strict invariance models to the data. Standardized parameter estimates for the configural and strict invariance models are presented in Table 6.
Table 5.
Model | Chi-square | df | RMSEA | 95% CI | CFI | ∆CFI |
---|---|---|---|---|---|---|
Two-factor model | ||||||
Configural | 47.809 | 24 | 0.013 | 0.007; 0.018 | 0.999 | — |
Weak | 85.732 | 32 | 0.017 | 0.013; 0.021 | 0.997 | 0.002 |
Strong | 305.296 | 54 | 0.028 | 0.025; 0.031 | 0.984 | 0.013 |
Strict | 432.554 | 66 | 0.031 | 0.028; 0.033 | 0.977 | 0.007 |
Mental status factor | ||||||
Configural | 18.682 | 6 | 0.019 | 0.010; 0.030 | 0.996 | — |
Weak | 22.397 | 12 | 0.012 | 0.003; 0.020 | 0.997 | −0.001 |
Strong | 230.640 | 32 | 0.033 | 0.029; 0.037 | 0.945 | 0.052 |
Strict | 307.965 | 40 | 0.034 | 0.031; 0.038 | 0.926 | 0.019 |
Notes. RMSEA = root mean square error of approximation; CI = confidence interval; CFI = Comparative Fit Index.
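The ∆CFI column in Table 5 is the drop in CFI from each model to the next, more constrained one. A minimal sketch of that arithmetic and of the decision rule follows, using the CFI values reported in Table 5. The cutoff of 0.002 is an assumption here (the value associated with Meade et al., 2008); this section refers only to "the prescribed level". The function and variable names are illustrative.

```python
LEVELS = ["configural", "weak", "strong", "strict"]

# CFI values copied from Table 5 (invariance across ethnicity).
TWO_FACTOR_CFI = [0.999, 0.997, 0.984, 0.977]
MENTAL_STATUS_CFI = [0.996, 0.997, 0.945, 0.926]

# Assumed cutoff; not stated explicitly in this section.
CUTOFF = 0.002


def delta_cfi(cfis):
    """CFI drop between successive nested models, rounded to 3 decimals."""
    # Rounding removes floating-point noise such as 0.0020000000000000018.
    return [round(prev - cur, 3) for prev, cur in zip(cfis, cfis[1:])]


def flag_losses(cfis, cutoff=CUTOFF):
    """Map each constrained level to True when its CFI drop exceeds the cutoff."""
    return {level: d > cutoff for level, d in zip(LEVELS[1:], delta_cfi(cfis))}


print(delta_cfi(TWO_FACTOR_CFI))      # [0.002, 0.013, 0.007]
print(flag_losses(TWO_FACTOR_CFI))    # weak not flagged; strong and strict flagged
print(delta_cfi(MENTAL_STATUS_CFI))   # [-0.001, 0.052, 0.019]
```

Under this assumed cutoff the pattern matches the text: the step to weak invariance is not flagged for either model, whereas the steps to strong and strict invariance are.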
Table 6.
Variable | White λ | White SE | White Ψ | Black λ | Black SE | Black Ψ | Hispanic λ | Hispanic SE | Hispanic Ψ |
---|---|---|---|---|---|---|---|---|---|
Configural invariance | |||||||||
Memory factor | |||||||||
Immediate recall | 0.89 | 0.01 | 0.22 | 0.90 | 0.01 | 0.19 | 0.88 | 0.02 | 0.22 |
Delayed recall | 0.88 | 0.01 | 0.23 | 0.85 | 0.01 | 0.27 | 0.89 | 0.02 | 0.22 |
Mental status factorᵃ | |||||||||
Serial 7s | 0.57 | 0.01 | 0.67 | 0.60 | 0.01 | 0.64 | 0.56 | 0.03 | 0.69 |
Backward counting | 0.58 | 0.03 | 0.66 | 0.70 | 0.02 | 0.51 | 0.77 | 0.05 | 0.41 |
Dates | 0.51 | 0.02 | 0.74 | 0.58 | 0.02 | 0.66 | 0.57 | 0.04 | 0.67 |
Names | 0.64 | 0.01 | 0.60 | 0.70 | 0.02 | 0.51 | 0.63 | 0.03 | 0.61 |
Strict invariance | |||||||||
Memory factor | |||||||||
Immediate recall | 0.88 | 0.01 | 0.23 | 0.87 | 0.01 | 0.24 | 0.86 | 0.01 | 0.26 |
Delayed recall | 0.90 | 0.01 | 0.20 | 0.89 | 0.01 | 0.20 | 0.88 | 0.01 | 0.22 |
Mental status factorᵃ | |||||||||
Serial 7s | 0.59 | 0.01 | 0.65 | 0.66 | 0.01 | 0.57 | 0.63 | 0.02 | 0.60 |
Backward counting | 0.59 | 0.02 | 0.66 | 0.65 | 0.02 | 0.58 | 0.63 | 0.03 | 0.61 |
Dates | 0.47 | 0.01 | 0.78 | 0.52 | 0.02 | 0.73 | 0.50 | 0.02 | 0.75 |
Names | 0.64 | 0.01 | 0.59 | 0.71 | 0.02 | 0.49 | 0.67 | 0.02 | 0.53 |
Note. SE = standard error.
ᵃ Uniqueness parameters estimated from thresholds.
Because loss in fit was noted in moving from the weak to the strong invariance model based on the delta CFI test, we conducted follow-up tests of partial invariance (Byrne et al., 1989) to determine whether some tests met strong invariance. Specifically, we imposed equality constraints on the manifest intercept for each individual test and examined the change in CFI when that intercept was forced to be invariant across groups. Tests that did not lead to a significant change in CFI compared with the weak invariance model were subsequently constrained. Tests that led to a significant change in fit remained free to vary. We repeated this procedure until every test had been examined. Change in CFI was significant for all tests. However, strong invariance was most plausible for the backward count and dates tests, with CFIs of 0.992 and 0.994, respectively, and least plausible for the serial 7s test, with a CFI of 0.88.
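The sequential procedure described above can be sketched as a simple loop. This is only an illustration of the logic, not the software used in the study: `fit_cfi` stands in for whatever routine refits the model with a given set of intercepts constrained, the toy penalties are invented numbers, and the 0.002 cutoff is an assumption.

```python
def sequential_partial_invariance(tests, baseline_cfi, fit_cfi, cutoff=0.002):
    """Constrain tests one at a time, keeping those that do not hurt CFI.

    tests        -- names of the manifest tests, in the order to be tried
    baseline_cfi -- CFI of the less constrained comparison model
    fit_cfi      -- callable returning the CFI with the listed tests constrained
    cutoff       -- largest CFI drop treated as non-significant (assumed value)
    """
    constrained, freed = [], []
    for test in tests:
        # Refit with all previously retained constraints plus the candidate.
        cfi = fit_cfi(constrained + [test])
        if round(baseline_cfi - cfi, 3) <= cutoff:
            constrained.append(test)   # constraint retained
        else:
            freed.append(test)         # parameter stays free across groups
    return constrained, freed


# Toy stand-in for model fitting: hypothetical CFI penalties per test.
TOY_PENALTY = {"test_a": 0.001, "test_b": 0.020, "test_c": 0.000}
TOY_BASELINE = 0.997


def toy_fit_cfi(constrained_tests):
    """Pretend refit: subtract each constrained test's invented penalty."""
    return TOY_BASELINE - sum(TOY_PENALTY[t] for t in constrained_tests)


constrained, freed = sequential_partial_invariance(
    list(TOY_PENALTY), TOY_BASELINE, toy_fit_cfi
)
print(constrained, freed)
```

In practice the callable would wrap a full multiple-group model fit; the loop itself only records which constraints survive the comparison with the baseline model.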
Results for the mental status factor were consistent with those for the two-factor model (see Table 5). The weak invariance model was most tenable based on the delta CFI test, but strict invariance was plausible based on the practical fit indices. Follow-up tests of partial invariance indicated strong invariance was more plausible for the backward count and dates tests, with CFIs of 0.988 and 0.989, respectively, but least plausible for serial 7s and names, with CFIs of 0.537 and 0.792, respectively.
Invariance Across Gender
Configural, weak, strong, and strict invariance models were fitted to the two-factor model and the mental status factor. For the two-factor model, although strong invariance may be most reasonable based on the delta CFI test (as can be seen in Table 7), practical fit indices indicate good fit of the highest form of invariance, strict invariance, to the data. Standardized parameter estimates for the configural and strict invariance models are presented in Table 8.
Table 7.
Model | Chi-square | df | RMSEA | 95% CI | CFI | ∆CFI |
---|---|---|---|---|---|---|
Two-factor model | ||||||
Configural | 52.095 | 16 | 0.016 | 0.011; 0.021 | 0.998 | — |
Weak | 80.781 | 20 | 0.018 | 0.014; 0.023 | 0.997 | 0.001 |
Strong | 132.410 | 31 | 0.019 | 0.016; 0.023 | 0.996 | 0.001 |
Strict | 373.403 | 37 | 0.032 | 0.029; 0.035 | 0.985 | 0.011 |
Mental status factor | ||||||
Configural | 19.202 | 4 | 0.021 | 0.012; 0.031 | 0.997 | — |
Weak | 15.726 | 7 | 0.012 | 0.004; 0.020 | 0.997 | −0.001 |
Strong | 103.836 | 17 | 0.025 | 0.020; 0.029 | 0.981 | 0.016 |
Strict | 284.157 | 21 | 0.038 | 0.034; 0.042 | 0.944 | 0.037 |
Notes. RMSEA = root mean square error of approximation; CI = confidence interval; CFI = Comparative Fit Index.
Table 8.
Variable | Male participants λ | Male participants SE | Male participants Ψ | Female participants λ | Female participants SE | Female participants Ψ |
---|---|---|---|---|---|---|
Configural invariance | ||||||
Memory factor | ||||||
Immediate recall | 0.89 | 0.01 | 0.22 | 0.89 | 0.01 | 0.21 |
Delayed recall | 0.88 | 0.01 | 0.22 | 0.88 | 0.01 | 0.19 |
Mental status factorᵃ | ||||||
Serial 7s | 0.62 | 0.01 | 0.61 | 0.62 | 0.01 | 0.62 |
Backward counting | 0.62 | 0.03 | 0.61 | 0.68 | 0.03 | 0.54 |
Dates | 0.49 | 0.02 | 0.76 | 0.56 | 0.02 | 0.68 |
Names | 0.72 | 0.02 | 0.49 | 0.68 | 0.01 | 0.54 |
Strict invariance | ||||||
Memory factor | ||||||
Immediate recall | 0.87 | 0.01 | 0.24 | 0.89 | 0.01 | 0.21 |
Delayed recall | 0.88 | 0.01 | 0.20 | 0.90 | 0.01 | 0.19 |
Mental status factorᵃ | ||||||
Serial 7s | 0.63 | 0.01 | 0.61 | 0.64 | 0.01 | 0.60 |
Backward counting | 0.54 | 0.02 | 0.59 | 0.65 | 0.02 | 0.57 |
Dates | 0.51 | 0.01 | 0.74 | 0.52 | 0.01 | 0.73 |
Names | 0.69 | 0.01 | 0.53 | 0.70 | 0.01 | 0.51 |
Note. SE = standard error.
ᵃ Uniqueness parameters estimated from thresholds.
Because loss in fit was noted in moving from the strong to the strict invariance model based on the delta CFI test, we conducted follow-up tests of partial invariance (Byrne et al., 1989) to determine whether some tests met strict invariance. We imposed equality constraints on the manifest residuals for each individual test and examined the change in CFI when the item residuals were forced to be invariant across groups. Tests that did not lead to a significant change in CFI compared with the strong invariance model were subsequently constrained. Tests that led to a significant change in fit remained free to vary. We repeated this procedure until every test had been examined. Results indicated partial strict invariance for the backward count and the names tests, with CFIs of 0.995 and 0.994, respectively. The lowest obtained CFI was 0.987 (dates test).
When the mental status factor was considered separately, results indicated weak invariance based on the delta CFI test although practical indices (RMSEA) again supported strict invariance (see Table 7). Partial strong invariance held for the backward count and the names tests, with CFIs of 0.998 and 0.997, respectively. The serial 7s test had the lowest CFI (0.899).
Multilevel Models
Invariance Across Time.
Results are shown in Table 9. For the two-factor model, configural invariance fit better than weak or strict invariance. However, the practical fit indices indicate that weak invariance across time is tenable for these data, but the additional requirements of strict invariance are not reasonable. These results are consistent for the mental status factor (see Table 9).
Table 9.
Model | Chi-square | df | RMSEA | CFI | ∆CFI |
---|---|---|---|---|---|
Two-factor model | |||||
Configural | 33.435 | 16 | 0.014 | 0.996 | — |
Weak | 1767.653 | 20 | 0.030 | 0.979 | 0.008 |
Strict | 14288.298 | 26 | 0.074 | 0.829 | 0.150 |
Mental status factor | |||||
Configural | 68.309 | 4 | 0.014 | 0.995 | — |
Weak | 1125.128 | 7 | 0.043 | 0.914 | 0.081 |
Strict | 6477.053 | 11 | 0.083 | 0.501 | 0.413 |
Notes. RMSEA = root mean square error of approximation; CFI = Comparative Fit Index.
For the two-factor model, partial weak invariance held for the delayed recall and backward count tests, with CFIs of 0.996 and 0.995, respectively. CFIs for all tests were greater than 0.90. For the mental status factor, partial weak invariance did not hold for any of the tests based on the delta CFI test; however, CFIs were greater than 0.90 for all tests.
Discussion
Measurement invariance provides an indication of the extent to which we can say that we are measuring the same thing across different groups. With recent advances particularly in computer technology, hypotheses of invariance can readily be tested. The issue of invariance is especially relevant in cognitive aging research, given analyses of group differences that are often done across gender and ethnicity and, even more pertinently, research on changes in cognition over time. The purpose of the present research was thus to test the invariance of major cognitive ability factors across ethnicity, gender, and time, using data from the HRS/AHEAD studies, which is ideal given extensive use of these data in aging research.
Results of the research indicated that the weak forms of invariance across ethnicity may be more plausible than the stronger forms of invariance based on the more stringent delta CFI tests. However, based on more practical fit indices, a strict form of invariance across ethnicity is reasonable for the two-factor model and the mental status factor. Hence, when conducting analyses across ethnicity, researchers can examine the two factors jointly, or consider the mental status factor separately, without loss of invariance information. These results also indicate that comparisons across ethnicity can be made of the variances and covariances among the latent variables (Bontempo & Hofer, 2007; Gregorich, 2006; McArdle, 2010; McArdle et al., 2011; Widaman & Reise, 1997), whereas any such comparisons of the latent means or observed means, covariances, and variances should be done cautiously. Mean differences that were observed in the present research indicate that White participants score higher on all of the cognitive tests in comparison with Black and Hispanic participants, consistent with findings in previous investigations. The present results indicate that these mean differences are for variables that are measured in the same way across groups.
With regard to gender invariance, results differed for the two-factor model versus the mental status factor model. Specifically, for the two-factor model, strong invariance held based on all measures of fit, whereas strict invariance was tenable based on the practical indices. In the face of this evidence, the conclusions one can draw regarding mean differences in the manifest variables across gender for these data are more robust than those one can draw regarding mean differences across ethnicity. However, when we examined the mental status factor separately, only weak invariance was found to hold across gender, suggesting that when considering gender differences, it may be best to model both factors jointly rather than separately to achieve a higher form of invariance for tests of subsequent hypotheses. Finally, when invariance across time was tested using a multilevel approach, results supported only configural invariance. At best, only weak invariance across time was found, indicating that comparisons are meaningful only for factor variances and covariances, but not latent means or observed variances, covariances, and means (Gregorich, 2006; Widaman & Reise, 1997). However, using a single-level cross-time approach, estimated correlations between factors across time approached unity, suggesting some consistency in measurement across time (Meredith & Horn, 2001). These results appear contradictory, and further research is needed to determine how best to model the HRS data for longitudinal analyses. Such research could come in the form of simulation studies for which the level of invariance in the population is known.
It has been argued that perhaps strong invariance is all that is necessary for meaningful comparisons in latent means across groups (Widaman & Reise, 1997). Moreover, if full invariance, that is, invariance for all components across groups, does not hold, then tests of partial invariance can be employed to build a set of measures for which invariance does hold (Byrne et al., 1989). Therefore, in addition to testing for full invariance in this study, we also tested partial invariance, that is, invariance of some components but not others. Results consistently revealed the backward count test as meeting the requirements for weak invariance (time invariance) or strong invariance (ethnicity and gender invariance) based on the delta CFI test. Thus, of the six cognitive ability measures examined in this study, the finding of weak invariance for the backward count test across time indicates that individuals tended to interpret this test in the same way across time. The finding of strong invariance across ethnicity and gender indicates that meaningful comparisons can be made across latent mean structures across ethnicity and gender for this test.
There are several reasons why higher levels of full invariance may have failed to hold in this study. Foremost, it may be that there are indeed true differences between the groups or across time. A second reason that invariance may fail to hold is misspecification of the model (Meredith, 1964). As there are only two indicators for the episodic memory factor and four indicators for the mental status factor, these may not be enough to fully define each factor. As illustrated in the analyses of gender invariance, a certain level of invariance may hold when some items are included in an analysis but not when other items are included. The current set of cognitive ability tests in the HRS data set may be an inherent weakness, and it may be advisable in future assessments to include additional tests to assess the full breadth of the memory and mental status factors. Current work is being done to address this issue (e.g., Cognition and Aging in the USA Study, McArdle, PI).
Finally, there is the issue of practical versus statistical significance, which often arises in psychology research. An analysis of data can result in a finding that is significant by statistical standards but not by practical standards. This was illustrated in the present analyses with the results of invariance across ethnicity and gender. Based on the delta CFI test, strict invariance is not statistically tenable. However, the remaining fit indices indicate that strict invariance across these groupings is tenable. According to McDonald (1999), the focus should be on understanding the amount of the difference between groups, rather than the technical significance (p. 333). For the present analyses, the difference in CFI may not be large enough to discount strict invariance across gender and ethnicity.
However, there are several restrictions of this study that should be noted. First, only two aspects of cognition were examined, namely, memory and mental status, using a specific set of cognitive indicators. Evidence of invariance (or lack of invariance) says nothing about whether we are measuring the full range of cognition in the different groups. Future research should consider invariance across ethnicity, gender, and time in older samples for a broader range of cognitive ability variables. In particular, it could be the case that there are aspects of cognition in older ages for which strict invariance across time is indeed plausible. An additional restriction of this research is that we were unable to model category thresholds for the time invariance analyses. As such, the results of the analyses of invariance across time are not directly comparable with those testing invariance across ethnicity and gender. Additionally, because we used a multilevel approach to the analyses of invariance across time, tests of strong invariance were not conducted, which seemingly limits conclusions that can be drawn regarding time invariance of these data. However, that weak invariance did not hold across time may render this potential limitation a moot point; strong invariance may not have held had we been able to test the strong invariance model. Nevertheless, the results of this research indicate that further research is needed particularly to examine the use of the HRS cognitive data set across time.
Despite these limitations, the present research is a step toward not only furthering our understanding of invariance across time but also enhancing our understanding of ethnic and gender similarities and differences in cognition among older adults. Little attention has been paid to issues of measurement invariance in aging research. This paper thus has direct relevance to an increasingly aging U.S. population and can serve as an impetus for further research in this area.
Funding
National Institute on Aging MERIT award to J. J. McArdle (AG-007137-21).
Acknowledgments
We thank our colleague John L. Horn (University of Southern California) for the free use of his ideas. We also thank the New York University Faculty Resource Network for provision of research resources to the first author.
References
- Bentler P. M. (1990). Comparative fit indices in structural equation models. Psychological Bulletin, 107, 238–246. http://dx.doi.org/10.1037/0033-2909.107.2.238
- Bontempo D. E., Hofer S. M. (2007). Assessing factorial invariance in cross-sectional and longitudinal studies. In Ong A. D., van Dulmen M. (Eds.), Handbook of methods in positive psychology (pp. 153–175). New York, NY: Oxford University Press.
- Bontempo D. E., Grouzet F. M. E., Hofer S. M. (2011). Measurement issues in the analysis of within-person change. In Newsom J. T., Jones R. N., Hofer S. M. (Eds.), Longitudinal data analysis: A practical guide for researchers in aging, health, and social sciences. New York: Routledge.
- Bowden S. C., Saklofske D. H., Weiss L. G. (2011). Invariance of the measurement model underlying the Wechsler Adult Intelligence Scale-IV in the United States and Canada. Educational and Psychological Measurement, 71, 186–199. http://dx.doi.org/10.1177/0013164410387382
- Brandt J., Spencer M., Folstein M. (1988). The telephone interview for cognitive status. Neuropsychiatry, Neuropsychology, and Behavioral Neurology, 1, 111–117.
- Browne M. A., Cudeck R. (1993). Alternative ways of assessing model fit. In Bollen K. A., Long J. S. (Eds.), Testing structural equation models (Chapter 6, pp. 136–162). Newbury Park, CA: Sage Publications.
- Byrne B. M., Shavelson R. J., Muthén B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105, 456–466. http://dx.doi.org/10.1037/0033-2909.105.3.456
- Cagney K. A., Lauderdale D. S. (2002). Education, wealth, and cognitive function in later life. Journal of Gerontology: Psychological Sciences, 57B, P163–P172. http://dx.doi.org/10.1093/geronb/57.2.P163
- Cheung G. W., Rensvold R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255. http://dx.doi.org/10.1207/S15328007SEM0902_5
- Fisher G. G., Hassan H., Rodgers W. L., Weir D. R. (2012). Health and retirement study imputation of cognitive functioning measures: 1992–2010 early release. Ann Arbor, MI: University of Michigan.
- Gregorich S. E. (2006). Do self-reported instruments allow meaningful comparisons across diverse population groups? Testing measurement invariance using the confirmatory factor analysis framework. Medical Care, 44, S78–S94. http://dx.doi.org/10.1097/01.mlr.0000245454.12228.8f
- Heeringa S. G., Connor J. H. (1996). Technical description of the Health and Retirement Study sample design (Institute for Social Research Pub. DR-002). Ann Arbor: University of Michigan.
- Hertzog C., Carter L. (1982). Sex difference in the structure of intelligence: A confirmatory factor analysis. Intelligence, 6, 287–303. http://dx.doi.org/10.1016/0160-2896(82)90005-8
- Horn J. L., McArdle J. J. (1992). A practical and theoretical guide to measurement invariance in aging research. Experimental Aging Research Special Issue: Quantitative Topics in Research on Aging, 18, 117–144. http://dx.doi.org/10.1080/03610739208253916
- Horn J. L., McArdle J. J., Mason R. (1983). When is invariance not invariant: A practical scientist’s look at the ethereal concept of factor invariance. Southern Psychologist, 1, 179–188.
- Hu L. T., Bentler P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55. http://dx.doi.org/10.1080/10705519909540118
- Jones R. N. (2003). Racial bias in the assessment of cognitive functioning of older adults. Aging & Mental Health, 7, 83–102. http://dx.doi.org/10.1080/1360786031000045872
- Kim E. S., Kwok O., Yoon M. (2012). Testing factorial invariance in multilevel data: A Monte Carlo study. Structural Equation Modeling: A Multidisciplinary Journal, 19, 250–267. http://dx.doi.org/10.1080/10705511.2012.659623
- Leacock C. (Ed.). (2006). Getting started with the Health and Retirement Study, Version 1.0. Ann Arbor: Survey Research Center, Institute of Social Research, University of Michigan.
- McArdle J. J. (2010). Contemporary Challenges of Longitudinal Measurement Using HRS Data. In Walford G., Tucker E., Viswanathan M. (Eds.), The SAGE Handbook of Measurement (pp. 509–536). London: SAGE Press.
- McArdle J. J., Fisher G. G., Kadlec K. M. (2007). Latent variable analyses of age trends of cognition in the Health and Retirement Study, 1992–2004. Psychology & Aging, 22, 525–545. http://dx.doi.org/10.1037/0882-7974.22.3.525
- McArdle J. J., Smith J. P., Willis R. J. (2011). Cognition and economic outcomes in the Health and Retirement Study. In Wise D. A. (Ed.), Explorations in the Economics of Aging (pp. 209–236). Cambridge, MA: National Bureau of Economic Research.
- McDonald R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum Associates.
- Meade A. W., Johnson E. C., Braddy P. W. (2008). Power and sensitivity of alternative fit indices in tests of measurement invariance. Journal of Applied Psychology, 93, 568–592. http://dx.doi.org/10.1037/0021-9010.93.3.568
- Meredith W. (1964). Rotation to achieve factorial invariance. Psychometrika, 29, 187–206.
- Meredith W. (1993). Measurement invariance, factor analysis, and factorial invariance. Psychometrika, 58, 525–543. http://dx.doi.org/10.1007/BF02294825
- Meredith W., Horn J. (2001). The role of factorial invariance in modeling growth and change. In Collins L. M., Sayer A. G. (Eds.), New Methods for the Analysis of Change (Chapter 7, pp. 203–240). Washington, DC: American Psychological Association.
- Meredith W., Teresi J. A. (2006). An essay on measurement and factorial invariance. Medical Care, 44, S69–S77. http://dx.doi.org/10.1007/BF02294825
- Moody-Ayers S., Mehta K., Lindquist K., Sands L., Covinsky K. (2005). Black-white disparities in functional decline in older persons: The role of cognitive function. Journal of Gerontology: Medical Sciences, 60A, 933–939. http://dx.doi.org/10.1093/gerona/60.7.933
- Mungas D., Widaman K. F., Reed B. R., Tomaszewski F. S. (2011). Measurement invariance of neuropsychological tests in diverse older persons. Neuropsychology, 25, 260–269. http://dx.doi.org/10.1037/a0021090
- Muthén B. O. (1991). Multilevel factor analysis of class and student achievement components. Journal of Educational Measurement, 28, 338–354. http://dx.doi.org/10.1111/j.1745-3984.1991.tb00363.x
- Muthén L. K., Muthén B. O. (1998–2010). Mplus user’s guide. Version 5. Los Angeles, CA: Muthén & Muthén.
- Muthén L. K., Muthén B. O. (1998–2012). Mplus user’s guide. Version 7. Los Angeles, CA: Muthén & Muthén.
- Mulaik S. A. (1972). The foundations of factor analysis. New York: McGraw-Hill.
- Ofstedal M. B., Fisher G. G., Herzog A. R. (2005). Documentation of cognitive functioning measures in the Health and Retirement Study (HRS/AHEAD Documentation Report DR-006). Ann Arbor: University of Michigan.
- Savage-McGlynn E. (2012). Sex differences in intelligence in younger and older participants of the Raven’s Standard Progressive Matrices Plus. Personality and Individual Differences, 53, 137–141. http://dx.doi.org/10.1016/j.paid.2011.06.013
- Sass D. A. (2011). Testing measurement invariance and comparing latent factor means within a confirmatory factor analysis framework. Journal of Psychoeducational Assessment, 29, 347–363. http://dx.doi.org/10.1177/0734282911406661
- Sloan F. A., Wang J. (2005). Disparities among older adults in measures of cognitive function by race or ethnicity. Journal of Gerontology: Psychological Sciences, 60B, P242–250. http://dx.doi.org/10.1093/geronb/60.5.P242
- Steiger J. H., Lind J. C. (1980). Statistically-based tests for the number of common factors. Paper presented at the Annual Spring Meeting of the Psychometric Society, Iowa City, IA.
- Van Lieshout R. J., Cleverley K., Jenkins J. M., Georgiades K. (2011). Assessing the measurement invariance of the Center for Epidemiologic Studies Depression Scale across immigrant and non-immigrant women in the postpartum period. Archives of Women’s Mental Health, 14, 413–423. http://dx.doi.org/10.1007/s00737-011-0236-0
- Widaman K. F., Ferrer E., Conger R. D. (2010). Factorial invariance within longitudinal structural equation models: Measuring the same construct across time. Child Development Perspectives, 4, 10–18. http://dx.doi.org/10.1111/j.1750-8606.2009.00110.x
- Widaman K. F., Reise S. P. (1997). Exploring the measurement invariance of psychological instruments: Applications in the substance use domain. In Bryant K. J., Windle M., West S. G. (Eds.), The science of prevention: Methodological advances from alcohol and substance abuse research (pp. 281–324). Washington, DC: APA.
- Wicherts J. M., Dolan C. V. (2010). Measurement invariance in confirmatory factor analysis: An illustration using IQ test performance of minorities. Educational Measurement: Issues and Practice, 29, 39–47. http://dx.doi.org/10.1111/j.1745-3992.2010.00182.x
- Zsembik B. A., Peek M. K. (2001). Race differences in cognitive functioning among older adults. Journals of Gerontology Series B: Psychological Sciences and Social Sciences, 56, S266–S274. http://dx.doi.org/10.1093/geronb/56.5.S266