Author manuscript; available in PMC: 2009 Jun 1.
Published in final edited form as: Psychol Aging. 2008 Jun;23(2):366–376. doi: 10.1037/0882-7974.23.2.366

Vocabulary test format and differential relations to age

Ryan P Bowles 1, Timothy A Salthouse 2
PMCID: PMC2518066  NIHMSID: NIHMS59910  PMID: 18573010

Abstract

Although vocabulary tests are generally considered interchangeable, regardless of format, different tests can have different relations to age and to other cognitive abilities. In this study, four vocabulary test formats are examined: multiple-choice synonyms, multiple-choice antonyms, produce-the-definition, and picture identification. Results indicate that, although they form a single coherent vocabulary knowledge factor, the formats have different relations to age. In earlier adulthood, picture identification had the strongest growth and produce-the-definition the weakest. In later adulthood, picture identification had the strongest decline and multiple-choice synonyms the least. The formats differ in their relation to other cognitive variables, including reasoning, spatial visualization, memory, and speed. After accounting for the differential relations to other cognitive variables, differences in relations to age were eliminated with the exception of the picture identification test. No theory of the aging of vocabulary knowledge fully explains these findings. These results suggest that using a single indicator of vocabulary may yield incomplete and somewhat misleading results about the aging of vocabulary knowledge.

Keywords: vocabulary knowledge, aging, measurement


Regardless of format, vocabulary tests are generally considered interchangeable indicators of vocabulary knowledge, a principle Spearman (1927) coined “indifference of the indicator.” In his comprehensive analyses of the factor structure of human abilities, Carroll (1993) concluded, “The precise format by which vocabulary knowledge is measured generally makes little difference in the factorial composition of the variables, to the extent that the underlying trait being measured is range of native-language vocabulary knowledge” (p. 158). Intelligence test batteries almost always contain at least one test described as a test of vocabulary, but they can differ markedly in the format used, such as definition production on the WAIS-III (Wechsler, 1997a), picture and word identification on the Woodcock-Johnson-R (Woodcock & Johnson, 1990), and multiple choice synonyms on the ETS Kit (Ekstrom, French, Harman, & Dermen, 1976). Some test batteries even include more than one type of vocabulary test, but they are invariably treated as measures of the same vocabulary ability construct (e.g., Munoz-Sandoval, Cummins, Alvarado, & Ruef, 1998; Woodcock, 1987; Woodcock & Johnson, 1990).

Furthermore, the general shape of the curve relating age to vocabulary knowledge seems to be independent of the particular task used. Studies using various formats, including multiple choice tests (Alwin & McCammon, 2001; Schaie, 1996), identification tests (McGrew & Woodcock, 2001), and production tests (McArdle, Grimm, Hamagami, Bowles, & Meredith, 2005), indicate that vocabulary knowledge increases throughout early adulthood, flattens out in middle age (40–60), then holds steady or declines gradually in late adulthood (Singer, Verhaeghen, Ghisletta, Lindenberger, & Baltes, 2003). This contrasts sharply with other intellectual abilities, for which the peak ability in cross-sectional studies occurs around age 20 (e.g., Schaie, 1996). The size of the increase in vocabulary ability between early and middle or late adulthood is substantial; in a meta-analysis of 324 studies, Verhaeghen (2003) found that older adults (mean age = 70.4) score approximately 0.8 standard deviations above younger adults (mean age = 21.4).

Despite the treatment of the relation between age and vocabulary knowledge as independent of the vocabulary test format, some research indicates that scores from different types of vocabulary knowledge tests have different relations to age (Sorenson, 1938; Verhaeghen, 2003). Although Verhaeghen (2003) in particular speculated about some theoretical processing differences accounting for certain differential age relations, no research has systematically addressed this issue. Furthermore, studies addressing differences across vocabulary format have examined only two types of vocabulary tests (e.g., synonyms vs. antonyms, Sorenson, 1938; produce-the-definition vs. multiple choice, Verhaeghen, 2003).

The goal of this study is to systematically examine differences in the relation between age and vocabulary knowledge across four different test formats. Identifying and understanding the differential age relations is important for aging researchers for both practical and theoretical reasons:

  1. Identifying the sources of differences among vocabulary test formats may inform understanding of the construct of vocabulary knowledge in general, as well as theories about the aging of vocabulary.

  2. Differences among vocabulary test formats may highlight differences in the specific processes required to solve individual vocabulary items and the relation of these processes to age.

  3. Most studies incorporating vocabulary knowledge employ only a single vocabulary test format. Results based on a single format may be misleading because that format reflects not just what is common about vocabulary knowledge, but also what is unique to the specific format. Understanding the differences among formats may provide researchers with better tools for interpreting and critiquing results based on a single format.

  4. It may be possible to identify the vocabulary test format that most closely matches the average or typical vocabulary test. Researchers may then be more confident that results based on that single indicator are accurate representations of the role of vocabulary knowledge.

In this study, we address each of these issues using four types of vocabulary tests: a locally developed multiple-choice synonyms test (Salthouse, 1993), a locally developed multiple-choice antonyms test (Salthouse, 1993), the WAIS-III produce-the-definition Vocabulary test (Wechsler, 1997a), and the Woodcock-Johnson Revised (WJ-R) picture identification Picture Vocabulary test (Woodcock & Johnson, 1990). We first examine the magnitude of differences in the formats’ relations to age. We then attempt to identify sources of the differential age relations by examining relations to other cognitive abilities. The cognitive variables form the basis of a mediational approach, in which we examine whether the differential relations to other cognitive abilities account for the differential age relations. Finally, we offer interpretations of these findings in terms of the four issues described above.

Method

Participants

The data were obtained from 3512 persons who participated in one of eighteen previously published studies by Salthouse and colleagues in which at least two vocabulary tests were administered (Hambrick, Salthouse, & Meinz, 1999, Studies 1, 2, 3 and 4; Meinz & Salthouse, 1998; Salthouse, 1996; Salthouse, 2001a, Studies 1 and 2; Salthouse, 2001b; Salthouse, Atkinson, & Berish, 2003; Salthouse & Ferrer-Caja, 2003; Salthouse, Fristoe, McGuthry, & Hambrick, 1998, Study 2; Salthouse, Hambrick, Lukas, & Dell, 1996, Study 2; Salthouse, Hancock, Meinz, & Hambrick, 1996, Study 3; Salthouse, McGuthry, & Hambrick, 1999; Salthouse et al., 2000, Study 2; Salthouse, Toth, Hancock, & Woodard, 1997; Siedlecki, Salthouse, & Berish, 2005). Participants ranged in age from 18 to 98 (M = 49.5, SD = 17.2). Health and education levels of the participants are presented in Table 1.

Table 1.

Demographic characteristics of participants

Note. Education refers to the number of years of formal education completed except for those marked with an a, which were classified as 1 for less than 12 years, 2 for high school graduation, 3 for 13–15 years of education, 4 for college graduate, and 5 for 17 or more years of formal education. Health is on a 5-point scale ranging from 1 (excellent) to 5 (poor).

Procedures

In each study, participants were administered at least two vocabulary tests, as well as a number of other cognitive tasks that varied across studies. A selection of these cognitive tasks was made, with a goal of having several tasks in a number of broad cognitive abilities. A task was selected only if it was used in at least two studies. Fourteen tasks met the selection criteria, resulting in four broad cognitive abilities: Reasoning, Spatial Visualization, Memory, and Speed.1 A short description of each task is given in Table 2. We describe the broad cognitive abilities in terms of Carroll’s (1993) taxonomy: Reasoning consists of five inductive reasoning tasks; Spatial Visualization consists of three general visualization tasks; Memory consists of two free recall tasks and one associative memory task; and Speed consists of three perceptual speed tasks.

Table 2.

Description of cognitive tasks

| Broad cognitive ability | Test | n | Description | Source |
|---|---|---|---|---|
| Reasoning | Matrix Reasoning | 1756 | The participant selects the best alternative to complete the missing cell in a matrix. | Odd-numbered items from Raven (1962) |
| Reasoning | Cattell’s Matrices | 420 | The participant selects the best of six alternatives to complete the missing cell of a 2×2 or 3×3 matrix. | Institute for Personality and Ability Testing (1973) |
| Reasoning | Figure Classification | 459 | Two or three groups of figures are presented at the top of the page, with figures within each group related in some way. Rows of figures are presented, and the participant marks which group the figures belong to. | Ekstrom, French, Harman, & Dermen (1976) |
| Reasoning | Shipley Abstraction Test | 420 | The participant is given a series and responds with the number, letters, or word that completes the series. | Zachary (1986) |
| Reasoning | Letter Sets | 796 | The participant selects which groups of letters do not belong in each of 20 sets of letters. | Ekstrom et al. (1976) |
| Spatial Visualization | Spatial Relations | 1154 | The participant mentally assembles an unfolded piece of paper and then determines which of four three-dimensional structures it most closely resembles. | Bennett, Seashore, & Wesman (1997) |
| Spatial Visualization | Paper Folding | 944 | A piece of folded paper with a hole punched through the folded surface is presented, and the participant identifies the pattern of holes that would result when the paper is unfolded. | Ekstrom et al. (1976) |
| Spatial Visualization | Form Boards | 467 | The participant selects the set of pieces which can be assembled to form a specified shape. | Ekstrom et al. (1976) |
| Memory | WMS-III Free Recall | 411 | A list of 12 words is presented orally four times, with the participant recalling as many words as possible following each presentation. A second list of words is then presented and recalled, followed by an attempt to recall as many of the words from the original list. | Wechsler (1997b) |
| Memory | Rey Auditory Learning Test | 586 | Fifteen words are read to the participant, followed immediately by a recall attempt. Five trials are given with the same list, with each trial consisting of a presentation and recall attempt. | Schmidt (1996) |
| Memory | Paired Associates | 822 | A set of six word pairs is presented orally. The participant receives a page containing the first member of each pair, and responds with the second member. A second trial is then given with a new set of word pairs. | Salthouse, Fristoe, et al. (1996) |
| Speed | Letter Comparison | 3182 | The participant makes same-or-different judgments for pairs of letter strings as quickly as possible. Two pages of letter string pairs are presented, with 30 seconds allowed for each page. | Salthouse & Babcock (1991) |
| Speed | Pattern Comparison | 3182 | Similar to Letter Comparison, except instead of letter strings, the participant is presented pairs of patterns composed of line segments. | Salthouse & Babcock (1991) |
| Speed | WAIS-III Digit-Symbol Substitution | 411 | Two minutes were allowed for the participant to write symbols below digits according to a code table displayed at the top of the page. | Wechsler (1997a) |

Vocabulary tests

Four vocabulary tests, each with different formats, were used in this study. The Synonyms Vocabulary Test (abbreviated Synonyms; Salthouse, 1993) consists of 10 multiple-choice items, with five response alternatives for each item. Participants are instructed to circle the word that is most nearly the same in meaning to the target word, and the score is the total number of items answered correctly. The Antonyms Vocabulary Test (abbreviated Antonyms; Salthouse, 1993) is identical, except that participants are instructed to circle the word most nearly opposite in meaning to the target word. For both tests, content was selected to be broadly representative of a number of sources, such as practice items on the SAT, with no idiosyncratic selection mechanism.

The WAIS-III Vocabulary test (abbreviated WAIS Voc; Wechsler, 1997a) consists of 33 produce-the-definition items. Participants are given a target word and asked to define it. Complete definitions are given an item score of 2, while incomplete definitions are given a partial credit item score of 1. Total score is the sum of the item scores. Scores on WAIS Voc were divided by 2 to maintain consistency with the other tests.

The WJ-R Picture Vocabulary (abbreviated WJ Pic Voc) test consists of 58 items on which the participant is presented a picture and responds with the name of the object depicted. To minimize testing time, only the final 30 items were administered. Responses are scored either correct (score = 1) or incorrect (score = 0). Total score is the sum of the item scores.

Results

For all results described in this article, alpha was set to .01. All structural equation models were estimated in AMOS (Arbuckle, 2006) using full information maximum likelihood estimation (Wothke, 2000), which allows for the analysis of incomplete data that is missing at random (Little & Rubin, 2002; McArdle, 1994). In order to maintain consistency in the vocabulary test scoring without affecting correlations among the variables, all vocabulary scores were converted to z-scores based on the mean and standard deviation for the 739 persons who had complete data on all four tests. Fit of the four vocabulary tests to a single vocabulary factor was good (RMSEA = .037), and all standardized factor loadings were high (Synonyms: .94; Antonyms: .93; WAIS Voc: .87; WJ Pic Voc: .81).2
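The standardization step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the toy scores, and the use of the sample standard deviation (`ddof=1`) are assumptions, since the text does not say which SD estimator was used.

```python
import numpy as np

def standardize_to_reference(scores, reference_mask):
    """Convert each column to z-scores using the mean and SD of the
    reference rows only (here: persons with complete data on every test)."""
    scores = np.asarray(scores, dtype=float)
    reference = scores[reference_mask]
    mean = reference.mean(axis=0)
    sd = reference.std(axis=0, ddof=1)  # assumption: sample SD
    return (scores - mean) / sd

# Hypothetical toy data: rows are persons, columns are tests,
# NaN marks a test that was not administered to that person.
raw = np.array([
    [6.0, 5.0],
    [8.0, 7.0],
    [10.0, 9.0],
    [7.0, np.nan],
])
complete = ~np.isnan(raw).any(axis=1)  # complete-case reference group
z = standardize_to_reference(raw, complete)
```

Because every person's score is rescaled with the same reference mean and SD, correlations among the tests are unchanged, while persons with incomplete data are still placed on the common metric.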

Age relations

Age was positively related to scores on all four vocabulary tests, although the magnitude varied (all correlations equated: χ² = 124, df = 3, p < .01). Synonyms (r = .27) and WJ Pic Voc (r = .26) were not significantly different in their correlation with age (χ² = 0.7, df = 1, p = .40). WJ Pic Voc and Synonyms had stronger relations with age than Antonyms (r = .18; constrained equal to WJ Pic Voc and Synonyms: Δχ² = 70, Δdf = 1, p < .01), which in turn was more strongly related to age than WAIS Voc (r = .09; constrained equal to Antonyms: Δχ² = 17, Δdf = 1, p < .01). However, as shown in Figure 1, the age relations of all four tests were nonlinear and similarly shaped.
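The p values attached to these chi-square difference tests can be recovered from the reported statistic and degrees of freedom. The sketch below uses only the Python standard library; the function name is hypothetical, and the closed forms cover only df = 1 and even df, which is a simplification for illustration.

```python
import math

def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-square variate.
    Sketch only: supports df = 1 or any even df via closed forms."""
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df >= 1")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("odd df > 1 is not covered by this sketch")

# Synonyms vs. WJ Pic Voc correlations with age constrained equal
p_equal_correlations = chi2_upper_tail(0.7, 1)
```

With alpha set to .01, a constraint is rejected only when this tail probability falls below .01, which is why the Synonyms and WJ Pic Voc correlations are treated as equal.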

Figure 1.

Relation between age and z-scores on the vocabulary tests.

We fit a linear-linear spline model with fixed knot point to each of the vocabulary tests (Cudeck & Klebe, 2002). The linear-linear spline model consists of two linear trends, one before the knot point (linear growth), and one after (linear decline). We initially estimated the knot point separately for each vocabulary test, but to maximize comparability, we fixed the knot point at age = 58, which was the approximate mean value and within the 95% confidence interval for all four tests. We also considered a quadratic growth curve, but opted for the linear-linear spline because it has easily interpreted parameters (growth and decline rates) that can be compared separately across the formats, and because it fit at least marginally better than a quadratic model for all four tests using the same number of parameters.
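A linear-linear spline with a fixed knot can be written as a regression on two piecewise age terms. The sketch below fits it by ordinary least squares on noise-free synthetic data; this is an assumption made for illustration, since the article estimated the model in AMOS with full information maximum likelihood. The function names and the toy intercept of 0.5 are hypothetical.

```python
import numpy as np

KNOT = 58.0  # fixed knot point reported in the text

def spline_basis(age, knot=KNOT):
    """Design matrix: intercept (level at the knot), years relative to the
    knot before it (zero afterwards), and years past the knot (zero before)."""
    age = np.asarray(age, dtype=float)
    before = np.minimum(age - knot, 0.0)
    after = np.maximum(age - knot, 0.0)
    return np.column_stack([np.ones_like(age), before, after])

def fit_linear_linear_spline(age, score):
    """Return (level at knot, slope before knot, slope after knot)."""
    X = spline_basis(age)
    coef, *_ = np.linalg.lstsq(X, np.asarray(score, dtype=float), rcond=None)
    return tuple(coef)

# Synthetic, noise-free scores generated from known slopes
age = np.arange(18, 99, dtype=float)
score = 0.5 + 0.036 * np.minimum(age - KNOT, 0.0) - 0.033 * np.maximum(age - KNOT, 0.0)
level, growth, decline = fit_linear_linear_spline(age, score)
```

The two slopes are directly interpretable as SDs gained per year before the knot and SDs lost per year after it, which is the property the authors cite for preferring this model over a quadratic curve.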

WJ Pic Voc had the strongest growth rate (.036 SDs per year before the knot point), followed by Synonyms (.030), Antonyms (.025) and WAIS Voc (.019). WJ Pic Voc also had the strongest decline rate (−.033), followed by WAIS Voc (−.025), Antonyms (−.020), and Synonyms (−.015). All age trends were different; equating the curves yielded a significant loss in fit compared to allowing all to be free (Δχ² = 147, Δdf = 6, p < .01). Furthermore, equating any two curves yielded a significant loss in fit (smallest Δχ² = 18, Δdf = 2, p < .01, Antonyms and WAIS Voc; largest Δχ² = 69, Δdf = 2, p < .01, Synonyms and Antonyms). These are not trivial differences; these results suggest that during the 40 years of adulthood before the knot point, the average age-related increase in WJ Pic Voc scores is 1.44 SD, while WAIS Voc is expected to increase only .76 SD, an effect size difference of .68 SD. At the other end of the life course, WJ Pic Voc is expected to decrease 1.32 SD over the approximately 40 years spanned by our study after the knot point, while Synonyms is predicted to decrease only .60 SD, an effect size difference of .72 SD.
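The cumulative effect sizes in this paragraph follow from multiplying each per-year rate by the number of years on either side of the knot. A short check, using the rates and the age range (18 to 98, knot at 58) reported in the text; the variable names are illustrative:

```python
# Per-year rates in SD units: (growth before the knot, decline after the knot)
rates = {
    "WJ Pic Voc": (0.036, -0.033),
    "Synonyms": (0.030, -0.015),
    "Antonyms": (0.025, -0.020),
    "WAIS Voc": (0.019, -0.025),
}
YEARS_BEFORE_KNOT = 58 - 18  # youngest participant to the knot
YEARS_AFTER_KNOT = 98 - 58   # knot to the oldest participant

cumulative = {
    test: (round(growth * YEARS_BEFORE_KNOT, 2), round(decline * YEARS_AFTER_KNOT, 2))
    for test, (growth, decline) in rates.items()
}

# Largest contrasts discussed in the text
growth_gap = round(cumulative["WJ Pic Voc"][0] - cumulative["WAIS Voc"][0], 2)
decline_gap = round(abs(cumulative["WJ Pic Voc"][1]) - abs(cumulative["Synonyms"][1]), 2)
```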

Artifactual causes of differential age relations

Three potentially artifactual sources of the differential age relations were examined before exploring relations to other cognitive variables. First, if scores on one test contain more measurement error than other tests, then the test scores will be less related to age even if the underlying latent trait has the same relation to age and to other cognitive abilities. However, there was no evidence of varying levels of measurement error; coefficient alpha was .85 for the Synonyms test (n = 2432), .86 for the Antonyms test, .89 for the WAIS Voc (based on these data; .93 as reported by Wechsler, 1997a), and .88 for the WJ Pic Voc (based on these data; .88 as reported by McGrew, Werder, & Woodcock, 1991).3
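Coefficient alpha compares the sum of the item variances with the variance of the total score. A minimal sketch on a hypothetical four-person, three-item response matrix (the data and function name are invented for illustration):

```python
import numpy as np

def coefficient_alpha(item_scores):
    """Cronbach's coefficient alpha for a persons-by-items score matrix."""
    item_scores = np.asarray(item_scores, dtype=float)
    n_items = item_scores.shape[1]
    sum_item_variances = item_scores.var(axis=0, ddof=1).sum()
    total_score_variance = item_scores.sum(axis=1).var(ddof=1)
    return n_items / (n_items - 1) * (1.0 - sum_item_variances / total_score_variance)

# Hypothetical responses: rows are persons, columns are items (1 = correct)
responses = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
])
alpha = coefficient_alpha(responses)
```

Similar alpha values across tests, as reported above, indicate that unequal reliability is unlikely to explain the unequal age correlations.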

A second possibility is that the tests vary in average difficulty, and they differ in their age relations because of difficulty variations rather than differences between the formats (Bowles, Grimm, & McArdle, 2005). If more difficult items, regardless of the type of test, are more negatively (or positively) related to age, then more difficult tests may be less (or more) strongly related to age. If this is the case, then the point-biserial correlation between age and item responses should be systematically related to item difficulty within each test. However, as displayed in Figure 2, the age point-biserial correlations were consistent across item difficulty within each test format. Furthermore, a regression of the point-biserial on item difficulty was not significant for any test.
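The item-level check described here can be sketched in three steps: compute each item's difficulty, compute its point-biserial correlation with age, and regress the correlations on difficulty. In this illustration, difficulty is taken to be the proportion of correct responses, which is an assumption (the article does not define its difficulty index), and the ages and responses are invented.

```python
import numpy as np

def item_age_diagnostics(age, responses):
    """Return per-item difficulty, per-item point-biserial correlation with
    age, and the slope from regressing those correlations on difficulty."""
    age = np.asarray(age, dtype=float)
    responses = np.asarray(responses, dtype=float)
    # Assumption: difficulty indexed by proportion correct (higher = easier)
    difficulty = responses.mean(axis=0)
    # A point-biserial is a Pearson correlation with a 0/1 variable
    r_pb = np.array([
        np.corrcoef(age, responses[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])
    slope, _intercept = np.polyfit(difficulty, r_pb, 1)
    return difficulty, r_pb, slope

# Hypothetical data: six persons, three dichotomously scored items
ages = [20, 30, 40, 50, 60, 70]
item_responses = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 1],
]
difficulty, r_pb, slope = item_age_diagnostics(ages, item_responses)
```

A slope near zero within a test is the pattern the authors report; the significance test for that slope is omitted from this sketch.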

Figure 2.

Relation between item difficulty and point-biserial correlations between age and items scores. Each point represents a single item.

A third possibility is order effects. Age relations may be, for example, stronger for later presented tests due to such irrelevant factors as age differences in rates of test fatigue. There was no consistent order of tests across studies with one exception: Synonyms was administered immediately before Antonyms. Thus, the age differences are unlikely to result from order effects.4

Relations to other cognitive abilities

The baseline model we employ is displayed in Figure 3. In order to assess relations of the vocabulary tests to other cognitive variables, we developed a factor analytic measurement model with five broad cognitive abilities, Vocabulary, Reasoning, Spatial Visualization, Speed, and Memory, and one higher order factor loading on Reasoning, Spatial Visualization, Speed, and Memory, and correlating with Vocabulary. Global fit statistics for this model were not available because of the complex pattern of missing data, which did not allow for the estimation of a fully saturated model because some pairs of tests were never administered at the same time. However, all factor loadings on the broad cognitive abilities were positive and large, ranging from .74 to .90. The higher order factor was indistinguishable from Reasoning; the standardized factor loading was greater than one, and so we constrained the standardized factor loading to 1. For convenience, we name the higher order factor General Fluid Abilities (GFA), although we make no claim that it is identical to Horn and Cattell’s Gf. Factor loadings for the other cognitive abilities on GFA were: .89 for Spatial Visualization; .80 for Memory; and .79 for Speed; the correlation with Vocabulary was .44.

Figure 3.

Structural equation measurement model. Indicators on the broad cognitive ability factors are suppressed.

We then examined relations between the individual vocabulary tests and the broad cognitive abilities by adding paths from, in turn, GFA, the Spatial Visualization residual, the Memory residual, and the Speed residual to each of the four vocabulary tests. The paths from GFA test whether the vocabulary tests have different relations to GFA. The paths from the residuals test whether the vocabulary tests have different relations to other cognitive abilities, independent of or controlling for GFA. GFA was differentially related to the four vocabulary formats. The standardized regression coefficients were: .36 for Synonyms; .44 for Antonyms; .52 for WAIS Voc; and .39 for WJ Pic Voc.5 All coefficients were significantly different except for Synonyms and WJ Pic Voc (Δχ² = 1, Δdf = 1, p = .32). The Spatial Visualization residual was more strongly related to WJ Pic Voc (.40) than the other three vocabulary tests (.26), which were not significantly different from each other (Δχ² = 4, Δdf = 2, p = .14). The Memory residual was more strongly related to WAIS Voc (.26) than the other three vocabulary tests (.16), which were not significantly different from each other (Δχ² = 7, Δdf = 2, p = .03). The Speed residual was more strongly related to Antonyms (.19) than Synonyms or WAIS Voc (.09), which were not significantly different from each other (Δχ² < 1, Δdf = 1, p = .38), and least strongly related to WJ Pic Voc (−.06, which was not significantly different from 0, p = .15). These findings are summarized in Table 3 and illustrated in Figure 4.

Figure 4.

Abbreviated path models of relation between cognitive abilities and vocabulary formats. Numbers and the thickness of the lines connecting to vocabulary tests represent standardized regression coefficients.

We then developed a mediation model to examine whether the differential age relations can be accounted for by the differential relations to other cognitive variables. For example, does WAIS Voc have a different relation to age than the other three vocabulary tests because it is more strongly related to both GFA and Memory? If so, after including indirect effects of age through GFA and Memory, the direct effects of age should be identical to the direct effects of age for the other three vocabulary formats. We added to the measurement model in Figure 3 the two age variables (i.e., linear growth and linear decline with an age 58 knot point), with paths from the age variables to GFA, Spatial Visualization, Memory, Speed, and the four vocabulary tests. For statistical convenience and because it had the least complex pattern of relations to other broad cognitive variables, we set Synonyms as our reference task, and as such included paths from cognitive variables when the previous analyses indicated that the relations were significantly different from those for Synonyms. These included: from GFA to Antonyms and WAIS Voc; from Spatial Visualization to WJ Pic Voc; from Memory to WAIS Voc; and from Speed to Antonyms and WJ Pic Voc.

Results indicated that all age differences between Synonyms, Antonyms, and WAIS Voc were accounted for by differential relations to other cognitive variables. Constraining the path coefficients from the age variables to these three vocabulary tests to be equal did not yield significantly more misfit (Δχ² = 13, Δdf = 4, p = .02). Confirming this result are the similar values for the unconstrained direct (unmediated) effects of age (Synonyms: .030 growth, −.014 decline; Antonyms: .030, −.008; WAIS Voc: .025, −.010). WJ Pic Voc, on the other hand, still had a different age trend (.036 growth, −.032 decline; Δχ² = 27, Δdf = 2, p < .01).
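The logic of separating direct from indirect age effects can be illustrated with a single observed mediator and ordinary least squares. This is a deliberately simplified sketch, not the latent-variable model estimated in the article: the data are simulated, the generating coefficients are arbitrary, and all names are hypothetical.

```python
import numpy as np

def ols_slopes(predictors, outcome):
    """Least-squares slopes with an intercept included; returns slopes only."""
    X = np.column_stack([np.ones(len(outcome)), predictors])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1:]

# Simulated data with arbitrary generating values (illustration only)
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(18, 98, n)
mediator = -0.02 * age + rng.normal(0.0, 1.0, n)            # e.g., a fluid ability
vocab = 0.01 * age + 0.4 * mediator + rng.normal(0.0, 1.0, n)

(a,) = ols_slopes(age, mediator)                             # age -> mediator
direct, b = ols_slopes(np.column_stack([age, mediator]), vocab)
(total,) = ols_slopes(age, vocab)                            # age -> vocab, unadjusted
indirect = a * b                                             # effect carried by the mediator
```

In a linear model of this kind the total age effect equals the direct effect plus the indirect effect, so two tests with different total age trends can share the same direct effect if they differ only in how strongly they depend on the mediator.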

Discussion

Although different formats of vocabulary tests are generally considered interchangeable, they can have different relations to age and other cognitive abilities. Correlations with age differed substantially, ranging from .09 for the WAIS-III produce-the-definition test to .27 for the multiple-choice synonyms test. The age differences were still apparent when considering the age trends as non-linear. In earlier adulthood (before the age 58 knot point), WJ-R picture identification had the strongest growth (.036 SDs per year; 1.44 SDs over 40 years) and WAIS-III produce-the-definition the weakest (.019 SDs per year; 0.76 SDs over 40 years). In later adulthood (after age 58), WJ-R picture identification had the strongest decline (−.033 SDs per year; −1.32 SDs over 40 years) and multiple-choice synonyms the least (−.015 SDs per year; −0.60 SDs over 40 years).

As summarized in Table 3, the tests had different relations to other cognitive variables. Compared to the multiple-choice synonyms test, WJ-R Picture Vocabulary was more strongly related to Spatial Visualization and negatively related to Speed, the multiple-choice antonyms test was more strongly related to Speed, and WAIS-III produce-the-definition was more strongly related to Reasoning and Memory. For the most part, these relations to other broad cognitive abilities accounted for the differential age relations. However, WJ Pic Voc still had stronger positive age-related growth and stronger negative age-related decline than the other vocabulary tests.

Table 3.

Relation of vocabulary tests to broad cognitive abilities

| Broad Cognitive Ability | Antonyms | WAIS Voc | WJ-R Pic Voc |
|---|---|---|---|
| GFA/Reasoning | + | ++ | |
| Spatial Visualization | | | + |
| Memory | | + | |
| Speed | + | | − |

Note: Relations are relative to relations with Synonyms. + indicates a stronger relation, − a weaker relation, and ++ a stronger relation than a single +.

Theories of the Aging of Vocabulary Knowledge

To account for these findings, a theory of the aging of vocabulary knowledge must include one or more cognitive processes that differ across format and are related to age. To our knowledge, only two theories do this: the dual representation theory (McGinnis & Zelinski, 2000) and the spreading activation Transmission Deficit Hypothesis (TDH; James & Burke, 2000; MacKay & Abrams, 1998; MacKay & Burke, 1990). Under dual representation theory, there are two cognitive representations of vocabulary knowledge, a detailed exact definition and a general gist, similar to the gist-verbatim distinction in memory (Brainerd & Reyna, 1992). Alternatively, there may be a continuum of specificity in multiple representations (McGinnis & Zelinski, 2003). Older adults are less able to generate and access the detailed definition (McGinnis & Zelinski, 2000), and compensate by relying more on the general representation (Botwinick & Storandt, 1974; Tun, Wingfield, Rosen, & Blanchard, 1998). Thus, there may be different age relations for different types of vocabulary tasks if the tasks differ in the sufficiency of the general representation for correct responses. While this has been proposed as a theoretical explanation for differences in scores on the WAIS Vocabulary Test (Botwinick & Storandt, 1974), there do not appear to be adequately detailed theoretical expectations of differences in the sufficiency of the general representation across tasks.

The spreading activation Transmission Deficit Hypothesis (TDH; James & Burke, 2000; MacKay & Abrams, 1998; MacKay & Burke, 1990) suggests that the links between representations of a word and its definition or semantic meaning become weaker or less efficient with age (Burke, MacKay, & James, 2000; MacKay & Abrams, 1998). For production tasks, such as a produce-the-definition test, activation of the correct response (definition or target word) comes only from the stem of the vocabulary item (word or picture). In multiple choice tasks, however, activation is passed not just from the target word, which weakens with age, but also from the multiple choice options (Burke, MacKay, & James, 2000). Thus, the age-related degradation of the efficiency of the connections between nodes is more detrimental to production tasks than to multiple choice tasks, a prediction confirmed in some research studies (Verhaeghen, 2003), and echoed in our results showing stronger later-life declines on picture identification and produce-the-definition tasks than on multiple-choice antonym and synonym tasks.

When coupled with the WordNet theory (Gross, Fischer, & Miller, 1989; Gross & Miller, 1990), TDH also predicts stronger declines for antonym knowledge than synonym knowledge. According to WordNet theory, identifying an antonymous relationship between two words (e.g., hot and cool) involves identifying the direct or exact antonym of the first word (hot to cold) followed by recognizing a synonymous relationship between the second word and the direct antonym of the first (cold and cool). Thus, identifying antonyms requires traversing more links between nodes than identifying synonyms (Charles, Reed, & Derryberry, 1994; Gross et al., 1989), and therefore, under TDH, antonym knowledge should be more susceptible to aging than synonym knowledge. This matches our finding that antonyms are more strongly negatively related to age in later adulthood.

Our findings suggest a third possible direction for theoretical development. The differences in age-related trends among multiple-choice synonyms, multiple-choice antonyms, and produce-the-definition (but not picture identification) were primarily accounted for by differential relations to reasoning or general fluid abilities. Reasoning declines after a peak age of approximately 20, so a vocabulary format more strongly related to reasoning would be expected to grow less during early adulthood and decline more in later adulthood. Our empirical results confirm this expectation: produce-the-definition had weaker early adulthood growth and stronger later adulthood decline, and was most strongly related to reasoning, whereas multiple-choice synonyms grew more rapidly during early adulthood, declined least in later adulthood, and was least strongly related to reasoning. Thus, differences among formats in the necessity or usefulness of reasoning may account for the differential age trends. However, theoretical explanations for why certain formats require or allow for more reasoning remain to be developed. It should be noted that reasoning as a theoretical explanation does not preclude TDH or dual representation, which are at different levels of cognitive representation.

Processing differences

Processing differences among formats have been a key aspect of cognitive aging research. By developing task manipulations designed to tap different cognitive processes, researchers are able to isolate the hypothesized processing differences. For example, a great deal of research has been dedicated to understanding differences between recognition and recall in episodic memory (Craik & McDowd, 1987), leading to theories involving processes that are differentially related to age (e.g., Craik, 1983). Limited research has addressed such processing differences in vocabulary knowledge (e.g., Botwinick & Storandt, 1974; Burke, MacKay, & James, 2000; Verhaeghen, 2003). This study suggests some further directions that may be fruitful, by highlighting differences among test formats in their relations with broad cognitive abilities. These differences suggest that some of the formats require specific processes that are shared with other cognitive constructs. For example, the finding that antonyms knowledge shares more variance with speed than the other formats suggests that antonyms knowledge may require a process that is speed-intensive. A systematic treatment of the processing differences is beyond the scope of this study, but we hope that it will provide an impetus for research addressing a more thorough understanding of the nature of vocabulary knowledge.

Interpretational challenges

Although we interpret our findings as differences between formats, vocabulary test format is confounded with item content. Tests differed not only in format, but also in the particular target words and, for Synonyms and Antonyms, the response options. Our findings may therefore be a result of idiosyncratic characteristics of the item content (Verhaeghen, 2003). It is not possible to remove this confound, as content necessarily varies with format. For example, the same response options cannot be used for both Synonyms and Antonyms, even if the target words were identical. Item content differences are unlikely to be a major confound, however, as no item on any of the four tests was selected for content. Instead, the items were intended to be an essentially random selection from the large pool of potential vocabulary test items.

A second concern in the interpretation of our findings is the relation between reasoning and the other broad cognitive variables. The higher order factor, which we called General Fluid Ability, was indistinguishable from the Reasoning factor, consistent with Gustafsson’s (1984) assertion that Reasoning or Gf is identical to a general abilities factor like Spearman’s g. As a result, all variance shared among Reasoning, Spatial Visualization, Memory, and Speed was assigned to the GFA/Reasoning factor. This may be caused by our selection of cognitive tasks, which consisted of a majority of reasoning and very closely related spatial visualization tasks, and therefore emphasized variance associated with reasoning. The differential relations of the vocabulary tests to other cognitive variables hold regardless of the exact identity of the higher order factor, although the interpretation of the differential relations to GFA/Reasoning remains somewhat uncertain.

A third concern is that we used two locally developed vocabulary tests: multiple-choice synonyms and multiple-choice antonyms. These tests may have distinctive characteristics that limit the generalizability of our results. The multiple-choice synonyms test was identical in format to such well-used multiple-choice vocabulary tests as the Shipley Vocabulary test (Shipley, 1946) or the Thorndike-Gallup test of verbal knowledge (Thorndike & Gallup, 1944). To our knowledge, no standard cognitive test battery contains a test of antonyms knowledge; the format, however, was identical to the synonyms test except for the request for a word opposite in meaning instead of identical. Content of the items on both tests was adapted from a number of sources with no idiosyncratic selection mechanism. Furthermore, dividing the tests into two subtests (odds and evens) and analyzing each separately yielded the same statistical results. Therefore, we do not expect that our results are specific to these particular multiple-choice tests.
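The odd-even division mentioned above can be sketched in a few lines. This is only an illustration of the split itself, assuming dichotomously scored items stored as one row per person; the function name `split_odd_even` and the response data are hypothetical, and the sketch does not reproduce the structural analyses that were then run on each half.

```python
from typing import List, Sequence, Tuple


def split_odd_even(item_scores: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Return per-person total scores on odd- and even-numbered items.

    item_scores[p][i] is 1 if person p answered item i correctly, else 0.
    Items are numbered from 1, so index 0 is an odd-numbered item.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    return odd_totals, even_totals


# Hypothetical responses: 4 people x 6 items (illustrative only).
responses = [
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
odd_half, even_half = split_odd_even(responses)
```

Each half can then be treated as a separate test score and entered into the same analysis as the full test, which is the sense in which the two subtests were compared.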

A final interpretational concern is that we used standard scoring procedures for all test formats, and considered effects only at the test score level. There is evidence that, at least on one vocabulary test not included in this study, there may be differential relations with age depending on the particular items considered (Bowles, Grimm, & McArdle, 2005). We did not find systematic differences across items in age point-biserials within any tests, suggesting that the test score is an appropriate level at which to consider differential age relations. Nonetheless, we consider this a topic for future research into the generalizability of both our findings and those of Bowles et al.
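An age point-biserial is the correlation between a dichotomous item score (correct or incorrect) and age. The following is a minimal sketch of that statistic, not the analysis code used in this study; the function name `point_biserial` and the example data are hypothetical.

```python
from math import sqrt
from typing import Sequence


def point_biserial(correct: Sequence[int], age: Sequence[float]) -> float:
    """Point-biserial correlation between a 0/1 item score and age.

    Computed as (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean
    ages of people who passed and failed the item, s is the population SD
    of age, and p and q are the proportions passing and failing.
    """
    n = len(age)
    passed = [a for c, a in zip(correct, age) if c == 1]
    failed = [a for c, a in zip(correct, age) if c == 0]
    if not passed or not failed:
        raise ValueError("item must have both correct and incorrect responses")
    mean_all = sum(age) / n
    sd = sqrt(sum((a - mean_all) ** 2 for a in age) / n)
    if sd == 0:
        raise ValueError("age has no variance")
    p = len(passed) / n
    q = 1 - p
    m1 = sum(passed) / len(passed)
    m0 = sum(failed) / len(failed)
    return (m1 - m0) / sd * sqrt(p * q)


# Hypothetical item: the two older respondents pass, the two younger fail.
r_age = point_biserial([0, 0, 1, 1], [20, 30, 60, 70])
```

Computing this value for every item in a test and inspecting how it varies across items is one way to check whether age relations depend on the particular items considered. A positive value means that older respondents were more likely to answer the item correctly.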

Measurement issues

No instrument is either an exclusive or an exhaustive representation of a latent construct. Rather, there are many ways to measure a construct, and no particular way can completely reflect the construct of interest (Hand, 2004). Recognizing this conceptual distinction highlights two critical aspects of measurement. First, it is important to identify how each instrument measures the construct and the manner in which each instrument measures the construct differently from other instruments measuring the same construct. Each instrument may require different cognitive processes, and identification of these processes may inform understanding of the construct. These processes can only be identified and understood in the context of other variables, through evidence of convergence and divergence with other instruments measuring the same construct, and through evidence of convergence and divergence with other constructs.4 The current study provides an example of this process.

Second, as this study highlights, it is important to have multiple indicators of the same construct in order to assess the breadth of the construct. The call for multiple indicators is certainly not new (Little, Lindenberger, & Nesselroade, 1999), but these results illustrate one reason why having only a single indicator could lead to misleading results. A single indicator may involve processes that are not involved in other instruments measuring the same construct. Findings about relations with other variables or constructs could therefore reflect those processes unique to the single indicator instead of the common processes that define the construct. For example, using the WJ-R Picture Vocabulary test as the only indicator of vocabulary knowledge may overestimate the age-related growth and decline of vocabulary knowledge, while using only the WAIS-III Vocabulary test may overestimate the relation between vocabulary knowledge and reasoning.

Practical implications

Despite the importance of employing multiple indicators of vocabulary knowledge, it is often impractical to use more than one. We argue that, although there can never be a pure measure of vocabulary knowledge, multiple-choice synonyms offers the closest approximation. Synonyms has the highest loading on the vocabulary factor (.94). Synonyms is also the least different from the other tests: it has the fewest significant differences from the other vocabulary tests in its relations to other cognitive abilities. Therefore, it is closest to an ‘average’ test. We also suggest that WJ Pic Voc may not be a good choice for a single vocabulary indicator, because it has the lowest factor loading (.81), differential relations to other cognitive abilities, and differential age relations whose source is unclear.

Summary

Despite the principle of “indifference of the indicator” (Spearman, 1927) and Carroll’s (1993) conclusion that test format makes little difference for vocabulary tests, different formats can have different relations to age and to other cognitive abilities. Of the four formats we examined, multiple-choice synonyms, multiple-choice antonyms, and produce-the-definition had the same age trends after accounting for differential relations to other cognitive variables, primarily reasoning. A theoretical explanation for the differences may therefore include differences in the role of reasoning. Picture identification, however, has a different age trend that is not accounted for by other variables, and may not be a good choice of format when a single indicator of vocabulary is used. Until greater understanding of vocabulary knowledge is gained, we strongly suggest that researchers include multiple indicators of vocabulary knowledge, especially when vocabulary knowledge is an important focus of the research.

Acknowledgments

The authors wish to acknowledge the help provided by John J. McArdle in analyzing the data. The authors gratefully acknowledge the support provided by grants T32 AG20500-01 and R01 AG019627 from the National Institute on Aging in the preparation of this article.

Footnotes

Ryan P. Bowles, Department of Psychology, Michigan State University and Timothy A. Salthouse, Department of Psychology, University of Virginia.

1. WMS-III Logical Memory (Wechsler, 1997b) also met selection criteria, but as it had different relations to age and other cognitive variables than the other memory variables, it was excluded from the analyses.

2. A model with correlated residuals for Synonyms and Antonyms (i.e., multiple-choice method factor) yielded very similar results. A factor model with the formats residualized on the age functions (age before and after 58) described later was also not appreciably different.

3. Item level data for the calculation of coefficient alpha was available only for approximately two-thirds of the sample that took Synonyms (n = 2432) and Antonyms (n = 2408).

4. There was also no evidence of order effects within tests. Age point-biserials did not differ systematically within test for any format.

5. For analyses addressing differential relations to GFA, the correlation between the Vocabulary factor and GFA was not included in the model.

Contributor Information

Ryan P. Bowles, Michigan State University

Timothy A. Salthouse, University of Virginia

References

* indicates studies included in the data

  1. Alwin DF, McCammon RJ. Aging, cohorts, and verbal ability. Journal of Gerontology: Social Sciences. 2001;56B:S151–S161. doi: 10.1093/geronb/56.3.s151.
  2. Arbuckle JL. Amos (Version 7) (computer software) Spring House, PA: Amos Development Corporation; 2006.
  3. Bennett GK, Seashore HG, Wesman AG. Differential Aptitude Test. San Antonio, TX: The Psychological Corporation; 1997.
  4. Botwinick J, Storandt M. Vocabulary ability in later life. Journal of Genetic Psychology. 1974;125:303–308.
  5. Bowles RP, Grimm KJ, McArdle JJ. A structural factor analysis of vocabulary knowledge and relations to age. Journal of Gerontology: Psychological Sciences. 2005;60:P234–P241. doi: 10.1093/geronb/60.5.p234.
  6. Brainerd CJ, Reyna VF. Explaining “memory free” reasoning. Psychological Science. 1992;3:332–339. doi: 10.1111/j.1467-9280.2007.01919.x.
  7. Burke DM, MacKay DG, James LE. Theoretical approaches to language and aging. In: Perfect TJ, Maylor EA, editors. Models of cognitive aging. Oxford: Oxford University Press; 2000. pp. 204–237.
  8. Carroll JB. Human cognitive abilities: A survey of factor-analytic studies. Cambridge: Cambridge University Press; 1993.
  9. Charles WG, Reed MA, Derryberry D. Conceptual and associative processing in antonymy and synonymy. Applied Psycholinguistics. 1994;15:329–354.
  10. Craik FIM. On the transfer of information from temporary to permanent memory. Philosophical Transactions of the Royal Society, London, Series B. 1983;302:341–359.
  11. Craik FIM, McDowd JM. Age differences in recall and recognition. Journal of Experimental Psychology: Learning, Memory and Cognition. 1987;13:474–479.
  12. Cudeck R, Klebe KJ. Multiphase mixed-effects models for repeated measures data. Psychological Methods. 2002;7:41–63. doi: 10.1037/1082-989x.7.1.41.
  13. Ekstrom RB, French JW, Harman HH, Derman D. Kit of factor-referenced cognitive tests. Princeton, NJ: Educational Testing Service; 1976.
  14. Gross D, Fischer U, Miller G. Antonymy and the representation of adjectival meanings. Journal of Memory and Language. 1989;28:93–106.
  15. Gross D, Miller KJ. Adjectives in WordNet. International Journal of Lexicography. 1990;3:265–277.
  16. Gustafsson JE. A unifying model for the structure of intellectual abilities. Intelligence. 1984;8:179–203.
  17. *Hambrick DZ, Salthouse TA, Meinz EJ. Predictors of crossword puzzle proficiency and moderators of age-cognition relations. Journal of Experimental Psychology: General. 1999;128:131–164. doi: 10.1037//0096-3445.128.2.131.
  18. Hand DJ. Measurement theory and practice: The world through quantification. London: Arnold; 2004.
  19. Institute for Personality and Ability Testing. Measuring intelligence with the culture fair tests. Champaign, IL: Author; 1973.
  20. James LE, Burke DM. Phonological priming effects on word retrieval and tip-of-the-tongue experiences in young and older adults. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2000;26:1378–1391. doi: 10.1037//0278-7393.26.6.1378.
  21. Little RJA, Rubin DB. Statistical analysis with missing data. 2nd ed. New York: Wiley; 2002.
  22. Little T, Lindenberger U, Nesselroade JR. On selecting indicators for multivariate measurement and modeling with latent variables. Psychological Methods. 1999;4:192–211.
  23. MacKay DG, Abrams L. Age-linked declines in retrieving orthographic knowledge: Empirical, practical, and theoretical implications. Psychology and Aging. 1998;13:647–662. doi: 10.1037//0882-7974.13.4.647.
  24. MacKay DG, Burke DM. Cognition and aging: A theory of new learning and the use of old connections. In: Hess TM, editor. Aging and cognition: Knowledge organization and utilization. New York: Elsevier; 1990.
  25. McArdle JJ. Structural factor analysis experiments with incomplete data. Multivariate Behavioral Research. 1994;29:409–454. doi: 10.1207/s15327906mbr2904_5.
  26. McArdle JJ, Grimm KJ, Hamagami F, Bowles RP, Meredith W. Modeling latent growth curves using longitudinal data with non-repeated measurements. 2007 doi: 10.1037/a0015857. Manuscript submitted for publication.
  27. McGinnis D, Zelinski EM. Understanding unfamiliar words: The influence of processing resources, vocabulary knowledge, and age. Psychology and Aging. 2000;15:235–250. doi: 10.1037//0882-7974.15.2.335.
  28. McGinnis D, Zelinski EM. Understanding unfamiliar words in young, young-old, and old-old adults: Inferential processing and abstraction deficits. Psychology and Aging. 2003;18:497–509. doi: 10.1037/0882-7974.18.3.497.
  29. McGrew KS, Werder JK, Woodcock RW. WJ-R technical manual. Allen, TX: DLM; 1991.
  30. McGrew KS, Woodcock RW. Technical Manual, Woodcock-Johnson III. Itasca, IL: Riverside; 2001.
  31. *Meinz EJ, Salthouse TA. The effects of age and experience on memory for visually presented music. Journal of Gerontology: Psychological Sciences. 1998;53B:P60–P69. doi: 10.1093/geronb/53b.1.p60.
  32. Munoz-Sandoval AF, Cummins J, Alvarado CG, Ruef ML. Bilingual verbal abilities test: Comprehensive manual. Itasca, IL: Riverside; 1998.
  33. Raven J. Advanced Progressive Matrices. London: H. K. Lewis; 1962.
  34. Salthouse TA. Speed and knowledge as determinants of adult age differences in verbal tasks. Journal of Gerontology: Psychological Sciences. 1993;48:P29–P36. doi: 10.1093/geronj/48.1.p29.
  35. *Salthouse TA. General and specific speed mediation of adult age differences in memory. Journal of Gerontology: Psychological Sciences. 1996;51B:P30–P42. doi: 10.1093/geronb/51b.1.p30.
  36. *Salthouse TA. Attempted decomposition of age-related influences on two tests of reasoning. Psychology and Aging. 2001a;16:251–263. doi: 10.1037//0882-7974.16.2.251.
  37. *Salthouse TA. Structural models of the relations between age and measures of cognitive functioning. Intelligence. 2001b;29:93–115.
  38. *Salthouse TA, Atkinson TM, Berish DE. Executive functioning as a potential mediator of age-related cognitive decline in normal adults. Journal of Experimental Psychology: General. 2003;132:566–594. doi: 10.1037/0096-3445.132.4.566.
  39. Salthouse TA, Babcock RL. Decomposing adult age differences in working memory. Developmental Psychology. 1991;27:763–776.
  40. *Salthouse TA, Ferrer-Caja E. What needs to be explained to account for age-related effects on multiple cognitive variables. Psychology and Aging. 2003;18:91–110. doi: 10.1037/0882-7974.18.1.91.
  41. *Salthouse TA, Fristoe N, McGuthry KE, Hambrick DZ. Relation of task switching to speed, age, and fluid intelligence. Psychology and Aging. 1998;13:445–461. doi: 10.1037/0882-7974.13.3.445.
  42. *Salthouse TA, Hambrick DZ, Lukas KE, Dell TC. Determinants of adult age differences on synthetic work performance. Journal of Experimental Psychology: Applied. 1996;2:305–329.
  43. *Salthouse TA, Hancock HE, Meinz EJ, Hambrick DZ. Interrelations of age, visual acuity, and cognitive functioning. Journal of Gerontology: Psychological Sciences. 1996;51B:P317–P330. doi: 10.1093/geronb/51b.6.p317.
  44. *Salthouse TA, McGuthry KE, Hambrick DZ. A framework for analyzing and interpreting differential aging patterns: Application to three measures of implicit learning. Aging, Neuropsychology, and Cognition. 1999;6:1–18.
  45. *Salthouse TA, Toth J, Daniels K, Parks C, Pak R, Wolbrette M, Hocking KJ. Effects of aging on efficiency of task switching in a variant of the trail making test. Neuropsychology. 2000;14:102–111.
  46. *Salthouse TA, Toth J, Hancock HE, Woodard JL. Controlled and automatic forms of memory and attention: Process purity and the uniqueness of age-related influences. Journal of Gerontology: Psychological Sciences. 1997;52B:P216–P228. doi: 10.1093/geronb/52b.5.p216.
  47. Schaie KW. Intellectual development in adulthood: The Seattle longitudinal study. Cambridge: Cambridge University Press; 1996.
  48. Schmidt M. Rey Auditory Verbal Learning Test: A Handbook. Los Angeles: Western Psychological Services; 1996.
  49. Shipley WC. Institute of Living Scale. Los Angeles: Western Psychological Services; 1946.
  50. *Siedlecki KL, Salthouse TA, Berish DE. Is there anything special about the aging of source memory? Psychology and Aging. 2005;20:19–32. doi: 10.1037/0882-7974.20.1.19.
  51. Singer T, Verhaeghen P, Ghisletta P, Lindenberger U, Baltes PB. The fate of cognition in very old age: Six-year longitudinal findings in the Berlin Aging Study (BASE) Psychology and Aging. 2003;18:318–331. doi: 10.1037/0882-7974.18.2.318.
  52. Sorenson H. Adult abilities. Minneapolis, MN: University of Minnesota Press; 1938.
  53. Spearman C. The abilities of man: Their nature and measurement. New York: Macmillan; 1927.
  54. Thorndike RL, Gallup GH. Verbal intelligence in the American adult. Journal of General Psychology. 1944;30:75–85.
  55. Tun PA, Wingfield A, Rosen MJ, Blanchard L. Response latencies for false memories: Gist-based processes in normal aging. Psychology and Aging. 1998;13:230–241. doi: 10.1037//0882-7974.13.2.230.
  56. Verhaeghen P. Aging and vocabulary score: A meta-analysis. Psychology and Aging. 2003;18:332–339. doi: 10.1037/0882-7974.18.2.332.
  57. Wechsler D. Wechsler Adult Intelligence Scale. 3rd ed. San Antonio: Harcourt Assessment; 1997. (WAIS-III)
  58. Woodcock RW. Woodcock Reading Mastery Test. Circle Pines, MN: American Guidance Service; 1987.
  59. Woodcock RW, Johnson MB. Woodcock-Johnson Psycho-educational Battery-Revised. Chicago, IL: Riverside; 1990.
  60. Wothke W. Longitudinal and multigroup modeling with missing data. In: Little TD, Schnabel KU, Baumert J, editors. Modeling longitudinal and multilevel data. Mahwah, NJ: Erlbaum; 2000. pp. 219–240.
  61. Zachary RA. Shipley Institute of Living Scale: Revised manual. Los Angeles: Western Psychological Services; 1986.
