Abstract
Background:
Previous studies have assessed the construct validity of individual subtests in the NIH Toolbox Cognition Battery (NIHTB-CB), but none have examined the construct validity of its cognitive domains. Importantly, the original NIHTB-CB validation studies used a desktop computer administration, whereas the NIHTB-CB is now administered solely via an iPad. We examined the construct validity of each cognitive domain assessed by the NIHTB-CB iPad application, including a motor dexterity domain, against a neuropsychological battery in a sample of healthy adults.
Methods:
Eighty-three adults ages 20–66 years (M = 44.35±13.41 years) completed the NIHTB-CB and a comprehensive neuropsychological assessment. Domain scores for each of six cognitive domains (attention and executive function, episodic memory, working memory, processing speed, language, and motor dexterity) and the fluid composite were computed for both batteries. We then assessed the construct validity using Pearson correlations and intraclass correlation coefficients (ICCs) for both demographically-corrected and uncorrected domains.
Results:
We found the attention and executive function, episodic memory, and processing speed domains had poor-to-adequate construct validity (ICC consistency = −0.029 to 0.517), the working memory and motor dexterity domains and the fluid composite had poor-to-good construct validity (ICC consistency = 0.215 to 0.801), and the language domain had adequate-to-good construct validity (ICC consistency = 0.408 to 0.829).
Conclusion:
The NIHTB-CB cognitive domains have poor-to-good construct validity; thus, researchers should be aware that some tests representing cognitive constructs may not fully reflect the cognitive domain of interest. Future investigation of the construct validity and reliability of the iPad-administered NIHTB-CB is recommended.
Keywords: NIH Toolbox Cognition Battery, Neuropsychological Assessment, Construct Validity, Cognitive Domains, iPad
One of the main ways to measure cognitive functioning is through a comprehensive neuropsychological assessment administered by a clinical neuropsychologist or other highly trained examiner. Notably, these tests are rigorously vetted to ensure they are reliable and valid and that the tests and their respective norms are suitable for their target populations (Rabin et al., 2005). These tests are then used in evaluations that inform diagnoses and treatment plans (Braun et al., 2011). While neuropsychological testing yields rich information for clinical and research purposes, there are several challenges to using these measures, including the need for oversight and training by a clinical neuropsychologist to administer and score the tests, limited access to expensive testing materials and norms, and the substantial time required to select appropriate measures, administer the assessments, and score and interpret the results (Nelson et al., 2015; White & Spooner, 2016). Overall, comprehensively assessing cognitive function via traditional methods can be difficult for clinical researchers. In particular, with the push to harmonize neuropsychological data across a vast array of study samples and population-based neuroimaging studies, there is an urgent need for a more easily accessible neuropsychological assessment tool.
To address these issues, the National Institutes of Health (NIH) developed the Blueprint for Neuroscience Research to harmonize neuroscience research efforts and accelerate discoveries relating to cognitive and brain function across the lifespan in health and disease (Baughman et al., 2006). As part of this initiative, the NIH Toolbox Cognition Battery (NIHTB-CB) was developed to comprehensively and briefly assess neurological and behavioral function (Weintraub, Dikmen, et al., 2013). The NIHTB-CB was designed to assess cognition across the lifespan, uniquely encompassing tasks that can be used for individuals ages 3 to 85 (Weintraub, Dikmen, et al., 2013). This design allows for continuity in the scoring model, which eases explorations of cognitive development and aging in longitudinal studies (Dunn et al., 2015; Gershon et al., 2010; Weintraub et al., 2014). From its inception, the NIHTB-CB had six goals: design a concise cognitive battery that (1) lasts approximately 30 minutes; (2) can be used for individuals between the ages of 3 to 85; (3) has limited ceiling and floor effects to measure a full spectrum of cognitive functioning; (4) can comprehensively assess mental function across multiple subdomains of cognition; (5) integrates the newest technology for assessment; and (6) uses minimal, low cost, and easily accessible equipment (Weintraub et al., 2014). Assessments that met the goals of the NIH Blueprint’s mission and targeted the major cognitive domains (i.e., attention and executive function, episodic memory, language, working memory, and processing speed) were created (Gershon et al., 2010; Weintraub, Dikmen, et al., 2013). Since its development, the NIHTB-CB has been widely used as a research test battery and has more recently been used to measure cognition in clinical populations (Baum et al., 2017; Carlozzi et al., 2017; Hackett et al., 2018; Tulsky et al., 2017; Tulsky & Heinemann, 2017).
Overall, a key goal in developing the NIHTB-CB was for all neurological research studies to use the same cognitive outcome measures, thereby improving the replicability and generalizability of research.
A number of published reports describe the psychometric properties of the NIHTB-CB among children (Akshoomoff et al., 2013; Bauer et al., 2013; Bauer & Zelazo, 2013; Carlozzi et al., 2013; Gershon et al., 2013; Mungas et al., 2013; D. S. Tulsky et al., 2013; Weintraub, Bauer, et al., 2013; Zelazo et al., 2013) and adults (Carlozzi et al., 2014; Dikmen et al., 2014; Gernsbacher & Kaschak, 2003; Heaton et al., 2014; Mungas et al., 2014; D. S. Tulsky et al., 2014; Weintraub et al., 2014; Zelazo et al., 2014) separately and together (Weintraub, Dikmen, et al., 2013). Studies of construct validity pitted each individual subtest of the NIHTB-CB against appropriate gold-standard neuropsychological instruments. Importantly, these validation studies were performed early in the adoption of the NIHTB-CB, when administration was done exclusively via a web-based instrument on a desktop computer rather than the iPad app, which is now the sole method used to administer the battery (NIH, 2017). Overall, the initial series of validation studies mostly observed good short-term test-retest reliability and moderate-to-good construct validity for each of the subtests (Carlozzi et al., 2014; Dikmen et al., 2014; Gernsbacher & Kaschak, 2003; Heaton et al., 2014; Mungas et al., 2014; D. S. Tulsky et al., 2014; Weintraub et al., 2014; Zelazo et al., 2014). However, in addition to individual subtests, it is essential to consider the overarching cognitive domain each subtest represents. Ideally, a single subtest and a combination of tests should yield similar outcomes, as both aim to accurately represent the same cognitive construct. These domains of cognitive function are complex, with many factors influencing their development and maintenance (Harvey, 2019). Additionally, it is challenging to assign some individual tests to one cognitive domain due to overlapping components across domains (Harvey, 2019).
When the NIHTB-CB was originally validated, the gold standard comparison tests assessed multiple domains, including language, working memory, executive functioning, and fluid reasoning (D. S. Tulsky et al., 2013, 2014). It is therefore essential to test the construct validity not only of each subtest but also of the overarching domains.
In the present study, we examined how the five cognitive domains in the NIHTB-CB compared to a battery of traditional neuropsychological tests. For completeness, we also examined the motor dexterity domain, which is a component of the NIHTB Motor Battery. In contrast to the original validation studies, we assessed the construct validity of the cognitive domains using the NIHTB-CB administered via iPad compared to a comprehensive, traditional neuropsychological battery. We hypothesized that the iPad-administered NIHTB-CB would demonstrate adequate-to-good domain-level construct validity as determined by intraclass correlation coefficient (ICC) values and their 95% confidence intervals (CIs; Koo & Li, 2016), with poor construct validity defined as below 0.40, adequate as 0.40 to 0.60, and good as above 0.60 (Scott et al., 2019; Weintraub et al., 2014).
Methods
Participants
A total of 95 healthy adults between the ages of 20 and 66 were enrolled as demographically matched controls in a larger study (R01 MH118013) examining the Research Domain Criteria (RDoC) framework in people with and without HIV. Controls were enrolled in numbers representative of the demographics of the HIV-infected population in the Omaha metropolitan area. Exclusion criteria included a history of neurological or psychiatric diagnoses, history of head trauma, current substance use disorder, and use of medications that may interfere with neural functioning. All demographic data were obtained via participant self-report during the intake process. The University of Nebraska Medical Center’s Institutional Review Board reviewed and approved this investigation. Each participant provided written informed consent following a detailed description of the study.
NIH Toolbox Cognition Battery
Participants completed the NIHTB-CB and the Dexterity Test during their second visit with an examiner using the NIHTB application on a fifth-generation 32 GB WiFi iPad (model A1822). All examiners were trained through the online test administration curriculum and in accordance with the manual. It took approximately 45 minutes to complete all NIHTB-CB tests, which covered the following cognitive domains (per the NIHTB-CB framework): attention and executive function (Flanker Inhibitory Control and Attention Test, Dimensional Change Card Sort Test); episodic memory (Picture Sequence Memory Test); working memory (List Sorting Working Memory Test); processing speed (Pattern Comparison Test); language (Oral Reading Recognition Test, Picture Vocabulary Test); and motor dexterity (9-Hole Pegboard Dexterity Test, Dominant and Non-Dominant Hands). Each test has been described in detail in prior work (Heaton et al., 2014; Mungas et al., 2013, 2014; Weintraub, Bauer, et al., 2013; Weintraub, Dikmen, et al., 2013). The age-corrected and uncorrected standard scores were calculated in the NIH Toolbox software, and the resultant standardized scores were converted to z-scores. The z-scores were then averaged across the NIHTB-CB subtests comprising each respective cognitive domain to create the domain scores of interest in this investigation.
Neuropsychological Battery
Participants also completed a comprehensive neuropsychological battery assessing the same cognitive domains as the NIHTB-CB during their first study visit. The examiners were trained by a board-certified clinical neuropsychologist who oversaw all neuropsychological test administration and scoring procedures. Additionally, each assessment was scored twice (i.e., first by the original examiner, then by another examiner), and the neuropsychologist completed random quality control checks to ensure test administration and scoring remained consistent across all examiners for the duration of the study. The neuropsychological test battery assessed the following cognitive domains: attention and executive function (Comalli Stroop Interference Test [Comalli et al., 1962], Trail Making Test Part B [Heaton, 2004]), episodic memory (Wechsler Memory Scale - Third Edition [WMS-III] Logical Memory I and II, immediate and delayed recall [Wechsler, 1997]), working memory (Wechsler Adult Intelligence Scale - Third Edition [WAIS-III] Letter-Number Sequencing [Wechsler, 1997], WAIS-III Digit Span, Forwards and Backwards [Wechsler, 1997]), processing speed (WAIS-III Digit Symbol Coding [Wechsler, 1997], Trail Making Test Part A [Heaton, 2004]), language (Wide Range Achievement Test - Fourth Edition [WRAT-4] Word Reading [Wilkinson & Robertson, 2006]), and motor dexterity (Grooved Pegboard, Dominant and Non-Dominant Hands [Heaton, 2004; Kløve, 1963]). We did a secondary analysis of the episodic memory domain using only the immediate recall portion of the WMS-III Logical Memory I test, as it is more comparable to the NIHTB-CB Picture Sequence Memory Test but does not comprehensively encompass the episodic memory domain. Finally, we conducted analyses to assess the construct validity of the NIHTB-CB fluid composite and of individual tests within the processing speed, attention and executive function, and working memory domains to broaden the applicability of our results. 
See Table 1 for a comparison of the tests included in each cognitive domain for the NIH Toolbox and neuropsychological battery.
Table 1.
Tests Included in Each Cognitive Domain.
Cognitive Domain | NIH Toolbox Tests | Neuropsychological Tests |
---|---|---|
Attention & Executive Function | Flanker Inhibitory Control and Attention; Dimensional Change Card Sort | Comalli Stroop Interference; Trail Making Test Part B |
Episodic Memory | Picture Sequence Memory | WMS-III^a Logical Memory I; WMS-III^a Logical Memory II, Delayed Recall |
Working Memory | List Sorting Working Memory | WAIS-III^b Letter-Number Sequencing; WAIS-III^b Digit Span, Forwards and Backwards |
Processing Speed | Pattern Comparison | WAIS-III^b Digit Symbol Coding; Trail Making Test Part A |
Language | Oral Reading Recognition; Picture Vocabulary | WRAT-4^c Word Reading |
Motor Dexterity | 9-Hole Pegboard Dexterity | Grooved Pegboard, Dominant and Non-Dominant Hands |
Fluid Composite | Flanker Inhibitory Control and Attention; Dimensional Change Card Sort; Picture Sequence Memory; List Sorting Working Memory; Pattern Comparison | Comalli Stroop Interference; Trail Making Test Part A; Trail Making Test Part B; WMS-III Logical Memory I and II; WAIS-III Letter-Number Sequencing; WAIS-III Digit Span, Forwards and Backwards; WAIS-III Digit Symbol Coding |

^a Wechsler Memory Scale, Third Edition.
^b Wechsler Adult Intelligence Scale, Third Edition.
^c Wide-Range Achievement Test, Fourth Edition.
Performance validity was assessed with embedded measures: Reliable Digit Span (RDS) from WAIS-III Digit Span and time to complete Trail Making Test Part B. Participants were excluded from analyses if they failed both performance validity measures (i.e., scored < 8 on RDS [Jasinski et al., 2011] and completed Trail Making Test Part B in 120 seconds or more [Busse & Whiteside, 2012]).
Demographically corrected scores for the neuropsychological tests were obtained using published normative data (Comalli et al., 1962; Heaton, 2004; Kløve, 1963; Wechsler, 1997; Wilkinson & Robertson, 2006). To place the demographically normed scores for each neuropsychological test on a common metric, scores of each type (e.g., scaled, T, standard) were transformed to z-scores, which were then averaged across tests within each respective cognitive domain. Uncorrected z-scores were obtained by calculating z-scores across the sample distribution of the raw scores for each test, and the z-scores of tests within each cognitive domain were then averaged together. For tests in which a higher raw score indicated worse performance (e.g., Trail Making Test Parts A and B; Grooved Pegboard, Dominant and Non-Dominant Hands; Comalli Stroop Interference Test), the raw scores were multiplied by −1 before z-score transformation. Both demographically corrected and uncorrected domain z-scores were examined to assess whether demographic correction impacted statistical relationships.
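The uncorrected z-scoring procedure described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' analysis code (analyses were run in SPSS); the `domain_z_scores` helper and the test names in the example are hypothetical.

```python
import numpy as np

def domain_z_scores(raw, reverse_keyed=()):
    """Sample-based (uncorrected) z-scores per test, averaged into a domain score.

    raw: dict mapping test name -> iterable of raw scores across the sample.
    reverse_keyed: tests where a higher raw score indicates worse performance
    (e.g., timed tests); these are multiplied by -1 before standardizing.
    """
    z = {}
    for name, scores in raw.items():
        s = np.asarray(scores, dtype=float)
        if name in reverse_keyed:
            s = -s  # flip sign so a higher z-score always reflects better performance
        z[name] = (s - s.mean()) / s.std(ddof=1)  # standardize on the sample distribution
    # Domain score = mean of the per-test z-scores for each participant
    domain = np.mean(list(z.values()), axis=0)
    return z, domain

# Hypothetical example: one reverse-keyed timed test and one accuracy-style test
raw = {"trails_a_seconds": [30.0, 45.0, 60.0],
       "digit_symbol_items": [80.0, 70.0, 60.0]}
z, processing_speed = domain_z_scores(raw, reverse_keyed=("trails_a_seconds",))
```

The same averaging step applies to the demographically corrected scores, except that each test's published normative score is converted to a z-score before averaging instead of standardizing on the sample.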
Fluid Composite Scores
Fluid composite scores from the neuropsychological assessment were calculated by averaging the following comparable tests to those included in the NIH Toolbox Cognitive Battery’s (NIHTB-CB) fluid composite score: Stroop Interference, Trail Making Test Part A, Trail Making Test Part B, Wechsler Memory Scale, Third Edition (WMS-III) Logical Memory I and II, Wechsler Adult Intelligence Scale, Third Edition (WAIS-III) Letter-Number Sequencing, WAIS-III Digit Span (Forwards and Backwards), and WAIS-III Digit Symbol Coding. Generating demographically corrected and uncorrected scores for the neuropsychological tests followed the same methods as described for the other cognitive domains. Uncorrected, age corrected, and fully corrected scores were used to assess the construct validity of the NIHTB-CB’s fluid composite.
Statistical Analyses
To assess the construct validity of the NIHTB-CB cognitive domains and fluid composite relative to those tested in the neuropsychological battery, we used Pearson correlation coefficients and intraclass correlation coefficients (ICC(3,1)). For the ICCs, we calculated two-way mixed effects, single-measures estimates of consistency and of absolute agreement for both the demographically corrected and uncorrected domain z-scores (Streiner et al., 2015). The quality of construct validity was evaluated using the 95% confidence interval (CI) of each ICC together with the Pearson correlation coefficient. Based on previous literature, poor construct validity was indicated by an ICC 95% CI below 0.40, adequate construct validity by 0.40 to 0.60, and good construct validity by above 0.60 (Koo & Li, 2016; Scott et al., 2019; Weintraub et al., 2014). Of note, our interpretations are based on the full range of the CIs. Additionally, we assessed whether correlations among tests within each battery (i.e., the neuropsychological tests or the NIHTB-CB) were stronger than correlations between batteries. Statistical significance of these comparisons was determined using Fisher’s r-to-z transformations.
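A minimal sketch of these estimators is shown below. This is not the authors' SPSS analysis: the ICC function implements the standard McGraw and Wong two-way mixed, single-measures formulas, and the Fisher comparison shown is the simple independent-samples form (the exact variant the authors used for overlapping correlations is not specified in the text).

```python
import numpy as np

def icc_3_1(x):
    """Two-way mixed effects, single-measures ICCs for an (n subjects x k measures)
    array, e.g., paired NIHTB-CB and neuropsychological domain z-scores (k = 2).
    Returns (consistency, absolute_agreement) per the McGraw & Wong formulas."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-subject means
    col_means = x.mean(axis=0)  # per-measure means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # between-subjects mean square
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # between-measures mean square
    mse = (np.sum((x - row_means[:, None] - col_means[None, :] + grand) ** 2)
           / ((n - 1) * (k - 1)))                          # residual mean square
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    agreement = (msr - mse) / (msr + (k - 1) * mse + k / n * (msc - mse))
    return consistency, agreement

def fisher_z_stat(r1, n1, r2, n2):
    """z-statistic comparing two correlations from independent samples,
    using Fisher's r-to-z (arctanh) transformation."""
    return ((np.arctanh(r1) - np.arctanh(r2))
            / np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3)))
```

Note that consistency ignores systematic mean differences between the two batteries, whereas absolute agreement penalizes them via the between-measures term; this is why the two estimates can diverge when one battery scores uniformly higher.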
Transparency and Openness
The authors attest that we have accurately reported how we determined our sample size, all data exclusions, and all measures used in the study, and that we followed the Journal Article Reporting Standards (Kazak, 2018). Requests for data can be fulfilled via the corresponding author. The data used in this article will be made publicly available through the Collaborative Informatics and Neuroimaging Suite (COINS; https://coins.trendscenter.org/) framework upon completion of the full study. Data were analyzed using IBM SPSS, version 25. We did not pre-register this study’s design or analysis.
Results
Participant Demographics
Of the 95 participants enrolled, 5 were excluded for substance use or other exclusionary criteria, 4 withdrew from the study, and 3 did not complete the NIHTB-CB due to technical errors. The final sample included 83 participants between the ages of 20 and 66 years (M = 44.35, SD = 13.41) who successfully completed all assessments of interest. No participant in this sample failed both embedded performance validity measures. The mean education completed was 16.36 years (SD = 2.09, range = 12–20 years), equivalent to a bachelor’s degree. Of these participants, 77.1% were male. Furthermore, 94% reported English as their first language, and the remaining 6% were bilingual and proficient in English. For a more detailed breakdown of demographic data for the final sample, refer to Table 2.
Table 2.
Participant Demographic Distributions for the Total Sample
Characteristic | Level | n | % |
---|---|---|---|
Total Sample | | 83 | |
Age | 20–29 years | 17 | 20.5% |
 | 30–39 years | 14 | 16.9% |
 | 40–49 years | 17 | 20.5% |
 | 50–59 years | 24 | 28.9% |
 | 60–69 years | 11 | 13.3% |
Sex | Male | 64 | 77.1% |
 | Female | 19 | 22.9% |
Handedness | Right | 77 | 92.8% |
 | Left | 6 | 7.2% |
Race | White | 62 | 74.7% |
 | Black or African American | 8 | 9.6% |
 | Asian | 8 | 9.6% |
 | American Indian / Alaska Native | 0 | 0.0% |
 | More Than One Race | 4 | 4.8% |
Ethnicity | Hispanic or Latino | 5 | 6.0% |
 | Not Hispanic or Latino | 77 | 92.8% |
First Language | English | 78 | 94.0% |
 | Spanish | 1 | 1.2% |
 | Other | 4 | 4.8% |
Second Language | English | 5 | 6.0% |
 | Spanish | 4 | 4.8% |
 | Other | 9 | 10.8% |
 | None | 65 | 78.3% |
Education | Less Than High School | 0 | 0.0% |
 | High School Graduate | 4 | 4.8% |
 | Partial College | 16 | 19.3% |
 | College Graduate | 31 | 37.3% |
 | Some Graduate | 2 | 2.4% |
 | Master’s Degree | 21 | 25.3% |
 | Doctorate Degree | 9 | 10.8% |
NIH Toolbox and Neuropsychological Battery
The mean corrected and uncorrected z-scores for all tests and cognitive domains across the NIHTB-CB and neuropsychological batteries were near or above 0, aside from the Flanker Inhibitory Control and Attention Test and the 9-Hole Pegboard using the dominant hand (Table 3). Of note, the uncorrected z-scores for the NIHTB-CB tended to be numerically, though not statistically significantly, higher than the uncorrected neuropsychological battery z-scores (Table 3); this pattern arose because we normed these data based on the sample distribution rather than at the population level.
Table 3.
Means and Standard Deviations for the Corrected and Uncorrected Z-Scores for Each Test in the NIH Toolbox Cognition Battery and the Neuropsychological Battery
Cognitive Domain | NIH Toolbox Test | Corrected Mean (SD) | Uncorrected Mean (SD) | NP Test | Corrected Mean (SD) | Uncorrected Mean (SD) |
---|---|---|---|---|---|---|
**Attention & Executive Function** | | **0.024 (0.78)** | **−0.029 (0.47)** | | **0.400 (1.03)** | **0.080 (0.85)** |
 | Flanker Inhibitory Control | −0.562 (0.67) | −0.007 (0.36) | Stroop Interference^a | 0.158 (1.20) | 0.080 (0.99) |
 | Card Sort | 0.611 (1.06) | 0.505 (0.43) | Trail Making Part B | 0.654 (1.31) | 0.081 (0.95) |
**Episodic Memory** | | **0.508 (1.02)** | **0.505 (0.94)** | | **0.584 (0.88)** | **0.000 (0.97)** |
 | Picture Sequence Memory | 0.508 (1.02) | 0.505 (0.94) | Logical Memory I | 0.518 (0.88) | 0.000 (1.00) |
 | | | | Logical Memory II, Delayed Recall | 0.651 (0.95) | 0.000 (1.00) |
**Working Memory** | | **0.369 (0.93)** | **0.429 (0.70)** | | **0.608 (0.90)** | **−0.022 (0.86)** |
 | List Sorting | 0.369 (0.93) | 0.429 (0.70) | Letter-Number Sequencing | 0.735 (1.03) | −0.003 (1.02) |
 | | | | Digit Span, Forwards & Backwards | 0.482 (0.94) | −0.042 (0.87) |
**Processing Speed** | | **0.432 (1.18)** | **0.497 (1.04)** | | **0.573 (0.83)** | **0.050 (0.81)** |
 | Pattern Comparison | 0.432 (1.18) | 0.497 (1.04) | Digit Symbol Coding | 0.679 (0.98) | 0.094 (0.98) |
 | | | | Trail Making Part A | 0.469 (0.99) | 0.006 (1.00) |
**Language** | | **0.602 (0.82)** | **0.631 (0.47)** | | **0.422 (0.83)** | **0.002 (1.04)** |
 | Oral Reading Recognition | 0.708 (0.92) | 0.632 (0.43) | WRAT-4 Word Reading | 0.422 (0.83) | 0.002 (1.04) |
 | Picture Vocabulary | 0.495 (0.92) | 0.630 (0.64) | | | |
**Motor Dexterity** | | **−0.126 (0.72)** | **0.221 (0.34)** | | **0.081 (0.89)** | **0.000 (0.92)** |
 | 9-Hole Pegboard, Dominant | −0.162 (0.92) | 0.188 (0.47) | Grooved Pegboard, Dominant | 0.165 (0.96) | 0.000 (1.00) |
 | 9-Hole Pegboard, Non-Dominant | −0.090 (0.74) | 0.253 (0.32) | Grooved Pegboard, Non-Dominant | −0.004 (1.02) | 0.000 (1.00) |

Note. The means and standard deviations for each domain z-score are displayed in bold; the means and standard deviations for the individual tests are shown below each respective domain row. NP – Neuropsychological, SD – Standard deviation.
^a Three participants could not complete the Stroop Interference Test due to color blindness.
Construct Validity
The ICC agreement, ICC consistency, and Pearson correlation coefficients among the NIHTB-CB and the neuropsychological battery were widely distributed, spanning 0.187 to 0.801 (Tables 4 and 5). All domains evidenced poor-to-good construct validity based on the a priori defined criteria of 95% CI. There was not a consistent pattern of better agreement between the corrected or uncorrected domain scores. Overall, the attention and executive function, episodic memory, and processing speed domains evidenced poor-to-adequate construct validity, though the correlations and ICCs of the individual tests in the processing speed and attention and executive function domains were only marginally better (Tables 6 and 7). The working memory, language, and motor domains and the fluid composite (Tables 4 and 5, Figure 2) evidenced poor-to-good construct validity. Additionally, we assessed the episodic memory neuropsychological domain without the WMS-III Logical Memory II delayed recall subtest, and found poor-to-good construct validity for both the demographically corrected (ICCAbsolute = 0.315 [95% CI: 0.109, 0.495]; ICCConsistency = 0.317 [95% CI: 0.109, 0.497]; Pearson r = 0.328, p < 0.01), and uncorrected scores (ICCAbsolute = 0.398 [95% CI: 0.170, 0.578]; ICCConsistency = 0.447 [95% CI: 0.257, 0.604]; Pearson r = 0.448, p < 0.001). The results were similar when participants who reported English as a second language were removed from the analyses. See Figure 1 for scatterplots of the Pearson correlations of the NIHTB-CB domains and the neuropsychological battery domains using the corrected and uncorrected domain z-scores.
Table 4.
Intraclass Correlation Coefficients (ICC) and Pearson Correlation Coefficients for the NIH Toolbox Cognition Battery and the Neuropsychological Battery
Cognitive Domain | Corrected ICC Absolute (95% CI) | Corrected ICC Consistency (95% CI) | Corrected Pearson r | Uncorrected ICC Absolute (95% CI) | Uncorrected ICC Consistency (95% CI) | Uncorrected Pearson r |
---|---|---|---|---|---|---|
Attention & Executive Function | 0.314 (0.108, 0.495) | 0.336 (0.127, 0.517) | 0.349** | 0.260 (0.045, 0.452) | 0.260 (0.044, 0.453) | 0.307* |
Episodic Memory | 0.189 (−0.028, 0.388) | 0.187 (−0.028, 0.386) | 0.190 | 0.213 (0.013, 0.401) | 0.241 (0.028, 0.434) | 0.242 |
Working Memory | 0.515 (0.337, 0.658) | 0.530 (0.356, 0.669) | 0.530** | 0.520 (0.198, 0.712) | 0.608 (0.453, 0.728) | 0.618** |
Processing Speed | 0.187 (−0.028, 0.386) | 0.187 (−0.029, 0.386) | 0.199 | 0.298 (0.091, 0.480) | 0.329 (0.124, 0.508) | 0.339* |
Language | 0.785 (0.668, 0.861) | 0.801 (0.708, 0.829) | 0.801** | 0.441 (0.048, 0.677) | 0.573 (0.408, 0.701) | 0.758** |
Motor Dexterity | 0.483 (0.300, 0.631) | 0.496 (0.315, 0.642) | 0.507** | 0.397 (0.201, 0.563) | 0.412 (0.215, 0.576) | 0.486** |
Note. ICC – Intraclass correlation coefficient, CI – Confidence interval.
* P < 0.01.
** P < 0.001. The results were similar when participants who reported English as a second language were removed from the analyses.
Table 5.
Intraclass Correlation Coefficients (ICC) and Pearson Correlation Coefficients for the NIH Toolbox Cognition Battery and the Neuropsychological Battery Fluid Composite Score.
 | Uncorrected ICC Absolute (95% CI) | Uncorrected ICC Consistency (95% CI) | Uncorrected Pearson r | Age-Corrected ICC Absolute (95% CI) | Age-Corrected ICC Consistency (95% CI) | Age-Corrected Pearson r | Fully Corrected ICC Absolute (95% CI) | Fully Corrected ICC Consistency (95% CI) | Fully Corrected Pearson r |
---|---|---|---|---|---|---|---|---|---|
Fluid Composite | 0.555 (0.384, 0.690) | 0.556 (0.384, 0.691) | 0.559* | 0.615 (0.459, 0.735) | 0.621 (0.465, 0.739) | 0.673* | 0.557 (0.364, 0.699) | 0.587 (0.423, 0.714) | 0.652* |
Note. ICC – Intraclass correlation coefficient, CI – Confidence interval.
* P < 0.001. The results were similar when participants who reported English as a second language were removed from the analyses.
Table 6.
Intraclass Correlation Coefficients (ICC) and Pearson Correlation Coefficients for the NIH Toolbox Cognition Battery and the Neuropsychological Battery Processing Speed, Attention and Executive Function, and Working Memory Tests
Cognitive Domain | Tests | Corrected ICC Absolute (95% CI) | Corrected ICC Consistency (95% CI) | Corrected Pearson r | Uncorrected ICC Absolute (95% CI) | Uncorrected ICC Consistency (95% CI) | Uncorrected Pearson r |
---|---|---|---|---|---|---|---|
Processing Speed | Pattern Comparison & Trails A | 0.081 (−0.138, 0.292) | 0.081 (−0.136, 0.290) | 0.082 | 0.139 (−0.057, 0.330) | 0.153 (−0.064, 0.356) | 0.153 |
 | Pattern Comparison & Digit Symbol Coding | 0.244 (0.036, 0.434) | 0.248 (0.036, 0.439) | 0.253* | 0.378 (0.175, 0.549) | 0.405 (0.208, 0.570) | 0.406** |
 | Trails A & Digit Symbol Coding | 0.399 (0.205, 0.564) | 0.406 (0.209, 0.571) | 0.406** | 0.344 (0.140, 0.521) | 0.343 (0.139, 0.519) | 0.343** |
Attention & Executive Function | Flanker & Trails B | 0.144 (−0.072, 0.348) | 0.086 (−0.066, 0.252) | 0.177 | 0.182 (−0.035, 0.382) | 0.181 (−0.035, 0.381) | 0.274* |
 | Flanker & Stroop Interference | 0.152 (−0.039, 0.342) | 0.192 (−0.028, 0.394) | 0.224* | 0.181 (−0.040, 0.385) | 0.180 (−0.040, 0.383) | 0.285* |
 | DCCS & Trails B | 0.318 (0.110, 0.499) | 0.316 (0.109, 0.497) | 0.323** | 0.288 (0.132, 0.515) | 0.377 (0.132, 0.515) | 0.449** |
 | DCCS & Stroop Interference | 0.196 (−0.009, 0.389) | 0.211 (−0.008, 0.411) | 0.212 | 0.253 (0.041, 0.444) | 0.294 (0.080, 0.481) | 0.405** |
 | Trails B & Stroop Interference | 0.308 (0.324, 0.654) | 0.325 (0.114, 0.507) | 0.326** | 0.508 (0.324, 0.654) | 0.505 (0.321, 0.651) | 0.505** |
 | Flanker & DCCS | 0.283 (−0.095, 0.583) | 0.529 (0.354, 0.668) | 0.584** | 0.328 (−0.099, 0.642) | 0.601 (0.444, 0.723) | 0.611** |
Working Memory | List Sorting Working Memory & Letter-Number Sequencing | 0.407 (0.208, 0.573) | 0.433 (0.241, 0.592) | 0.435** | 0.406 (0.182, 0.583) | 0.452 (0.263, 0.608) | 0.484** |
 | List Sorting Working Memory & Digit Span | 0.529 (0.356, 0.668) | 0.530 (0.356, 0.669) | 0.530** | 0.513 (0.261, 0.684) | 0.574 (0.409, 0.702) | 0.610** |
 | Letter-Number Sequencing & Digit Span | 0.622 (0.462, 0.741) | 0.640 (0.493, 0.751) | 0.642** | 0.682 (0.547, 0.782) | 0.679 (0.544, 0.780) | 0.679** |

Note. ICC – Intraclass correlation coefficient, CI – Confidence interval. Flanker – Flanker Inhibitory Control and Attention, DCCS – Dimensional Change Card Sort.
* P < 0.05.
** P < 0.01.
Table 7.
Fisher’s-r-to-z Transformations of Pearson Correlation Coefficients Within and Between Individual Tests in the NIH Toolbox Cognition Battery and the Neuropsychological Battery
Cognitive Domain | NIHTB-CB Test | NP Test 1 | NP Test 2 | Corrected r: NP 1 & NP 2 | Corrected r: NP 1 & NIHTB-CB | Fisher Z | Corrected r: NP 2 & NIHTB-CB | Fisher Z | Uncorrected r: NP 1 & NP 2 | Uncorrected r: NP 1 & NIHTB-CB | Fisher Z | Uncorrected r: NP 2 & NIHTB-CB | Fisher Z |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Processing Speed | Pattern Comparison | Trails A | Digit Symbol Coding | 0.406** | 0.082 | 2.20* | 0.253* | 1.09 | 0.343** | 0.153 | 1.29■ | 0.406** | −0.46 |
Attention & Executive Function | DCCS | Trails B | Stroop Interference | 0.326** | 0.584** | −2.07* | 0.212 | 0.76 | 0.505** | 0.449** | 0.45 | 0.405** | 0.78 |
 | Flanker | Trails B | Stroop Interference | 0.326** | 0.177 | 1.00 | 0.224* | 0.69 | 0.505** | 0.274* | 1.72■ | 0.285* | 1.63 |
Working Memory | List Sorting | Letter-Number Sequencing | Digit Span | 0.642** | 0.435** | 1.87■ | 0.530** | 1.08 | 0.679** | 0.484** | 1.89■ | 0.610** | 0.75 |
Note. NIHTB-CB – NIH Toolbox Cognition Battery, NP – conventional neuropsychological test.
* P < 0.05.
** P < 0.01.
■ reflects P = 0.05–0.07.
Figure 1.
Scatterplots depicting the associations between the neuropsychological domain z-scores (x-axis) and the NIH Toolbox Cognition Battery domain z-scores (y-axis). (A) Demographically corrected domain z-scores. (B) Uncorrected domain z-scores. *P < 0.01, **P < 0.001.
Figure 2.
Scatterplots depicting the associations between the neuropsychological battery fluid composite T scores (x-axis) and the NIH Toolbox Cognition Battery fluid composite T scores (y-axis). **P < 0.001.
Processing Speed, Attention and Executive Function, and Working Memory Subanalyses
Individual test scores from the processing speed, attention and executive function, and working memory domains were further assessed to ascertain whether the poor-to-good construct validity of these domains was unique to the NIHTB-CB or reflects a more general problem with tests of higher-order cognitive functioning, as these tests tend to capture multiple constructs of cognition rather than one single construct. To do this, we converted the corrected and uncorrected standard scores for each test within the processing speed, attention and executive function, and working memory domains to z scores, as described previously.
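The concordance metrics reported throughout (Pearson r and two-way single-measure ICCs for consistency and absolute agreement) can be sketched from the standard ANOVA decomposition. This is an illustrative implementation under assumed conventions, not the study's actual analysis code; the simulated z-scores and variable names are hypothetical.

```python
import numpy as np


def icc_two_way(x, y):
    """Two-way, single-measure ICCs for two paired score columns
    (e.g., an NP-battery domain z-score and the matching NIHTB-CB
    domain z-score). Returns (consistency, absolute agreement)."""
    data = np.column_stack([x, y]).astype(float)
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)   # per-participant means
    col_means = data.mean(axis=0)   # per-battery means
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_error = np.sum((data - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    icc_consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    icc_agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return icc_consistency, icc_agreement


# Hypothetical z-scores for illustration only (not study data):
rng = np.random.default_rng(0)
np_domain = rng.normal(0, 1, 83)
toolbox_domain = 0.6 * np_domain + rng.normal(0, 0.8, 83)
print(icc_two_way(np_domain, toolbox_domain))
```

Note that consistency ignores a constant offset between batteries (a shift in one column leaves it unchanged), whereas absolute agreement penalizes it, which is why the paper reports both.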
Construct Validity of Individual Processing Speed, Attention and Executive Function, and Working Memory Tests
The NIHTB-CB Pattern Comparison Test evidenced poor construct validity with the Trail Making Test Part A (Table 6), and poor-to-adequate construct validity with the WAIS-III Digit Symbol Coding Test. However, the construct validity between the Trail Making Test Part A and WAIS-III Digit Symbol Coding was better, and the correlation within the neuropsychological processing speed domain differed significantly from the correlation between the Trail Making Test Part A and the NIHTB-CB Pattern Comparison Test.
We found similar results in the attention and executive function domain. The Flanker Inhibitory Control and Attention Test had poor construct validity with the Trail Making Test Part B, the Stroop Interference Test, and the DCCS. The DCCS also evidenced poor-to-good construct validity with the Trail Making Test Part B, the Stroop Interference Test, and the Flanker. The Trail Making Test Part B and Stroop Interference Test likewise evidenced poor-to-good construct validity with one another. The Flanker and DCCS yielded more consistent results than the Trail Making Test Part B and Stroop Interference Test, but these differences were not statistically significant. Interestingly, the Trail Making Test Part B was more highly correlated with the NIHTB-CB DCCS Test than with the Stroop Interference Test.
In the working memory domain, the List Sorting Working Memory Test evidenced poor-to-good construct validity with both the WAIS-III Letter-Number Sequencing Test and the WAIS-III Digit Span Test. However, the construct validity between the WAIS-III Letter-Number Sequencing and WAIS-III Digit Span tests was adequate-to-good. The construct validity of the working memory tests within and between the neuropsychological battery and the NIHTB-CB was thus more comparable.
Discussion
In the current study, we examined the construct validity of the five domains in the NIHTB-CB, along with a motor dexterity domain and the fluid composite, relative to a comprehensive neuropsychological test battery. ICC absolute agreement, ICC consistency, and Pearson correlation coefficients all suggested relatively poor-to-good concordance between the NIHTB-CB and the neuropsychological battery. According to the ICC 95% CI standards defined a priori, the attention and executive function, episodic memory, and processing speed domains had poor-to-adequate construct validity, and the working memory, language, and motor dexterity domains and the fluid composite had poor-to-good construct validity.
There are many elements that could have influenced the construct validity of each of the domains. Most notably, the episodic memory domain in the NIHTB-CB is assessed solely through an immediate recall task (Bauer et al., 2013; Dikmen et al., 2014) and neglects the assessment of delayed recall, warranting further investigation into whether episodic memory is being assessed accurately. The traditional neuropsychological test (WMS-III Logical Memory I) assesses immediate recall and has a second component (WMS-III Logical Memory II) comprising a 20–30-minute delayed recall trial and a recognition trial (Wechsler, 1997). We assessed the construct validity of the episodic memory domain with and without the delayed recall trial. The construct validity improved when the Picture Sequence Memory test from the NIHTB-CB was compared to WMS-III Logical Memory I immediate recall alone, rather than to the more holistic episodic memory domain encompassing both WMS-III Logical Memory I immediate recall and WMS-III Logical Memory II delayed recall.
As alluded to previously, conducting neuropsychological assessments on a computerized device brings into question the temporal resolution of the device being used. Most of the NIHTB-CB was validated on a desktop computer and not the iPad interface that is used today (Carlozzi et al., 2014; Dikmen et al., 2014; Gernsbacher & Kaschak, 2003; Heaton et al., 2014; Mungas et al., 2014; D. S. Tulsky et al., 2014; Weintraub et al., 2014; Zelazo et al., 2014). The switch from the web-based NIHTB-CB to the iPad design altered not only the validation of the tests but also the way they were scored and normed. In late 2016, after the switch to the iPad design, an email was sent to NIHTB-CB users to inform them about inconsistencies between normed scores produced by the iPad and web-based designs (Brearly et al., 2019; Gershon & Diaz, personal communication, October 7, 2016). In response, the NIHTB team developed a new scoring system and updated earlier assessments to reflect the new method of scoring (Brearly et al., 2019; Casaletto et al., 2015; National Institutes of Health and Northwestern University, 2017).
While the NIHTB-CB norms have been automatically updated to reflect this change to the iPad administration platform (Casaletto et al., 2015), no studies to our knowledge have investigated the construct validity of the individual tests and cognitive domains under the iPad administration and with the updated norms. First, the iPad version introduced new instructions for the Flanker Inhibitory Control and Attention, Pattern Comparison, and Dimensional Change Card Sort tests (Brearly et al., 2019). In the desktop administration design, the test administrator provided the instructions, while the iPad design has standardized audio and text instructions (Brearly et al., 2019). There are pros and cons to both designs: the iPad provides an equally standardized instructional period for each participant, reducing administrator bias such as tone and prosody, among many other factors. However, it loses the adaptive component that a test administrator can provide (e.g., knowing if a participant was distracted while the instructions were given, or providing more practice or clarity to a person who needs help understanding). For the List Sorting Working Memory test, the food and animal items are presented by audio from the iPad, and then a reminder is given to sort the items from smallest to largest starting with food and then continuing with animals (NIH Toolbox Administrator’s Manual and eLearning Course, 2020; D. S. Tulsky et al., 2013, 2014), which may interfere with the maintenance and recall components of the test. Though this test is very similar to the WAIS-III Letter-Number Sequencing and Digit Span tests, the working memory domain had lower construct validity than we originally expected. Additionally, the original desktop design of the NIHTB-CB required participants to use directional keys on the keyboard, while the new iPad design introduces ‘home-base’.
‘Home-base’ is a pre-measured marker in front of the iPad on which the participant is instructed to place their finger between trials (Brearly et al., 2019; National Institutes of Health and Northwestern University, 2017), and it is used during the Dimensional Change Card Sort and Flanker Inhibitory Control and Attention Tests (Akshoomoff et al., 2014; National Institutes of Health and Northwestern University, 2017). By design, ‘home-base’ should standardize the amount of movement each participant executes on each trial, accounting for variability that could influence reaction time (Akshoomoff et al., 2014; Foy & Foy, 2020). However, this positioning is difficult to keep consistent throughout the task, and it may obscure some important behavioral findings that are key to these tests, such as the flanker effect (the difference in reaction time between the incongruent and congruent trials; Eriksen, 1995).
Regarding the NIHTB-CB’s attention and executive function, processing speed, and motor dexterity domains, there is a strong reaction time component incorporated in the scoring of the tests comprising each of these domains, and this component is limited by the temporal precision of the iPad. Specifically, reaction time is incorporated when accuracy is greater than or equal to 80% on the first 20 of 25 trials for the Dimensional Change Card Sort and Flanker Inhibitory Control and Attention; the Pattern Comparison Processing Speed Test score is the number of correct items completed in 90 seconds; and the 9-Hole Pegboard Dexterity Test score is the time to completion, based on the examiner using the timer within the NIH Toolbox app on the iPad (Carlozzi et al., 2014; Zelazo et al., 2013, 2014). Brearly and colleagues (2019) produced the first study assessing the comparability of the iPad and desktop-based NIHTB-CB designs in a group of veterans. They found that List Sorting Working Memory, Flanker Inhibitory Control and Attention, and Dimensional Change Card Sort scores were moderately correlated between the iPad and desktop-based administrations, while Pattern Comparison scores were poorly correlated (Brearly et al., 2019). Additionally, they found Flanker scores to be significantly different across administration conditions, suggesting there are substantial differences between the desktop-based and iPad designs (Brearly et al., 2019), which could in part explain why the domains that exhibited the most changes when switching to the iPad design had poor-to-adequate validity scores in our sample.
One cognitive domain in the NIHTB-CB that does not have a timing component is the language domain, which includes an oral reading test that is similar to the WRAT-4 Word Reading test, along with a word comprehension test, for which we did not have a comparable test in our neuropsychological battery. The absence of a timing component may be why the construct validity of the language domain was better than that of the other cognitive domains in the NIHTB-CB (Gershon et al., 2014).
Before closing, it is important to note the limitations of the present study. First, our findings revealed lower construct validity for the attention and executive function, episodic memory, and processing speed NIHTB-CB domains relative to previous studies using the original desktop-based version of the NIHTB-CB and gold-standard neuropsychological tests (Heaton et al., 2014; Mungas et al., 2014). We cannot rule out that our finding of relatively low construct validity is secondary to using different neuropsychological tests. Additionally, there are more liberal methods for establishing construct validity (e.g., using point ICC estimates rather than lower-bound 95% CIs; Anokhin et al., 2022), along with examining associations with clinically relevant biomarkers and with developmental and aging trajectories, that may lead to different conclusions (Dikmen et al., 2014; Snitz et al., 2020).
Further, the sample used in this study was predominantly white and male, and it is possible that our findings may not generalize to other populations. Along the same lines, our sample comprised healthy adults (20–66 years), while the NIHTB-CB was designed to be used across the lifespan (3–85 years). Thus, future research should investigate the construct validity of the iPad administration of the NIHTB-CB in younger and older age groups. In addition, the uncorrected scores from the NIHTB-CB were based on population data, while the uncorrected scores for the traditional neuropsychological tests were based on the distribution of scores in our sample. Given the aforementioned limited diversity of the local sample relative to population-based demographics, this could have biased the direct comparison and may be an additional contributor to the relatively low ICC values.
Additionally, though our methods for training NIHTB-CB examiners followed the recommendations outlined in the NIH Toolbox App Administrator’s Manual, these methods were not as rigorous as our training for examiners of the neuropsychological battery. However, the examiners for this study were cross-trained in the administration of both the NIHTB-CB and the neuropsychological battery, and were therefore well-prepared to address any issues that occurred during testing of either battery. Along these lines, we administered the two batteries in a fixed order on separate study visits rather than in a counterbalanced order due to constraints of the study. Thus, we cannot rule out the possibility of extraneous factors influencing test results, though we think this is unlikely as there is little concern regarding practice effects between the tests in the two batteries.
In summary, our study found poor-to-good construct validity of the domains assessed by the NIHTB-CB and the fluid composite. Overall, the attention and executive function, episodic memory, and processing speed domains evidenced poor-to-adequate construct validity, and the working memory, language, and motor dexterity domains and the fluid composite evidenced poor-to-good construct validity. There should be agreement between the cognitive domains assessing the same cognitive constructs. However, most neuropsychological measures of executive function and processing speed tend to represent several aspects of higher-order cognition and thus do not correlate as strongly as other cognitive constructs do. These results suggest there is a more general problem of poor construct validity among processing speed and executive function tests rather than an isolated problem with the NIHTB-CB processing speed and attention and executive function domains. Overall, we suggest that the NIHTB-CB administered via iPad should undergo further validation, as done in the original validation studies conducted early on in the NIHTB-CB’s development. This will help establish confidence in the research community that the new administration methods and updates to the NIHTB-CB iPad version have been held to the same standards as the original desktop administration. We suggest researchers be aware that some tests representing cognitive constructs may not fully reflect the cognitive domain of interest, with particular care regarding the cognitive tests in which reaction time is a substantial component of the test’s score. With the growing popularity of the NIHTB-CB in research settings, we recommend further investigation of the construct validity and reliability of the NIHTB-CB administered using the iPad.
Key Points.
Question:
Do the National Institutes of Health Toolbox Cognition Battery (NIHTB-CB) cognitive domains administered via iPad have adequate construct validity when compared to a comprehensive neuropsychological assessment?
Findings:
The NIHTB-CB cognitive domains have poor-to-good construct validity when administered via iPad.
Importance:
The NIHTB-CB is widely utilized in research, but the construct validity of the battery has not been tested since the change to iPad-based administration.
Next Steps:
More rigorous testing of the validity and reliability of the individual tests and cognitive domains assessed in the NIHTB-CB using the iPad administration is recommended.
Acknowledgements.
The authors thank the participants for graciously volunteering their time to participate in this research study.
Funding Statement.
This work was supported by the National Institutes of Health [grant numbers R01-MH116782, R01-MH118013, and R01-DA047828 to Tony W. Wilson].
Footnotes
Conflicts of Interest. None.
Ethical Standards. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.
References
- Akshoomoff N, Beaumont JL, Bauer PJ, Dikmen SS, Gershon RC, Mungas D, Slotkin J, Tulsky D, Weintraub S, Zelazo PD, & Heaton RK (2013). National Institutes of Health Toolbox Cognition Battery (NIH Toolbox CB): Validation for children between 3 and 15 years: VIII. NIH Toolbox Cognition Battery (CB): Composite scores of crystallized, fluid, and overall cognition. Monographs of the Society for Research in Child Development, 78(4), 119–132. 10.1111/mono.12038 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Akshoomoff N, Newman E, Thompson WK, McCabe C, Bloss CS, Chang L, Amaral DG, Casey BJ, Ernst TM, Frazier JA, Gruen JR, Kaufmann WE, Kenet T, Kennedy DN, Libiger O, Mostofsky S, Murray SS, Sowell ER, Schork N, … Jernigan TL (2014). The NIH Toolbox Cognition Battery: Results from a Large Normative Developmental Sample (PING). Neuropsychology, 28(1), 1–10. 10.1037/neu0000001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Anokhin AP, Luciana M, Banich M, Barch D, Bjork JM, Gonzalez MR, Gonzalez R, Haist F, Jacobus J, & Lisdahl K. (2022). Age-related Changes and Longitudinal Stability of Individual Differences in ABCD Neurocognition Measures. Developmental Cognitive Neuroscience, 101078. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bauer PJ, Dikmen SS, Heaton RK, Mungas D, Slotkin J, & Beaumont JL (2013). III. NIH Toolbox Cognition Battery (CB): Measuring episodic memory. Monographs of the Society for Research in Child Development, 78(4), 34–48. 10.1111/mono.12033 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bauer PJ, & Zelazo PD (2013). IX. NIH Toolbox Cognition Battery (CB): Summary, conclusions, and implications for cognitive development. Monographs of the Society for Research in Child Development, 78(4), 133–146. 10.1111/mono.12039 [DOI] [PubMed] [Google Scholar]
- Baughman RW, Farkas R, Guzman M, & Huerta MF (2006). The National Institutes of Health Blueprint for Neuroscience Research. The Journal of Neuroscience, 26(41), 10329. 10.1523/JNEUROSCI.3979-06.2006 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baum CM, Wolf TJ, Wong AWK, Chen CH, Walker K, Young AC, Carlozzi NE, Tulsky DS, Heaton RK, & Heinemann AW (2017). Validation and clinical utility of the executive function performance test in persons with traumatic brain injury. Neuropsychological Rehabilitation, 27(5), 603–617. 10.1080/09602011.2016.1176934 [DOI] [PubMed] [Google Scholar]
- Braun M, Tupper D, Kaufmann P, McCrea M, Postal K, Westerveld M, Wills K, & Deer T. (2011). Neuropsychological Assessment: A Valuable Tool in the Diagnosis and Management of Neurological, Neurodevelopmental, Medical, and Psychiatric Disorders. Cognitive and Behavioral Neurology, 24(3), 107–114. 10.1097/WNN.0b013e3182351289 [DOI] [PubMed] [Google Scholar]
- Brearly TW, Rowland JA, Martindale SL, Shura RD, Curry D, & Taber KH (2019). Comparability of iPad and Web-Based NIH Toolbox Cognitive Battery Administration in Veterans. Archives of Clinical Neuropsychology, 34(4), 524–530. 10.1093/arclin/acy070 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Busse M, & Whiteside D. (2012). Detecting Suboptimal Cognitive Effort: Classification Accuracy of the Conner’s Continuous Performance Test-II, Brief Test of Attention, and Trail Making Test. The Clinical Neuropsychologist, 26(4), 675–687. 10.1080/13854046.2012.679623 [DOI] [PubMed] [Google Scholar]
- Carlozzi NE, Tulsky DS, Chiaravalloti ND, Beaumont JL, Weintraub S, Conway K, & Gershon RC (2014). NIH Toolbox Cognitive Battery (NIHTB-CB): The NIHTB Pattern Comparison Processing Speed Test. Journal of the International Neuropsychological Society: JINS, 20(6), 630–641. 10.1017/S1355617714000319 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Carlozzi NE, Tulsky DS, Kail RV, & Beaumont JL (2013). VI. NIH Toolbox Cognition Battery (CB): Measuring processing speed. Monographs of the Society for Research in Child Development, 78(4), 88–102. 10.1111/mono.12036 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Carlozzi NE, Tulsky DS, Wolf TJ, Goodnight S, Heaton RK, Casaletto KB, Wong AWK, Baum CM, Gershon RC, & Heinemann AW (2017). Construct validity of the NIH Toolbox Cognition Battery in individuals with stroke. Rehabilitation Psychology, 62(4), 443–454. 10.1037/rep0000195 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Casaletto KB, Umlauf A, Beaumont J, Gershon R, Slotkin J, Akshoomoff N, & Heaton RK (2015). Demographically Corrected Normative Standards for the English Version of the NIH Toolbox Cognition Battery. Journal of the International Neuropsychological Society: JINS, 21(5), 378–391. 10.1017/S1355617715000351 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Comalli PE, Wapner S, & Werner H. (1962). Interference Effects of Stroop Color-Word Test in Childhood, Adulthood, and Aging. The Journal of Genetic Psychology, 100(1), 47–53. 10.1080/00221325.1962.10533572 [DOI] [PubMed] [Google Scholar]
- Dikmen SS, Bauer PJ, Weintraub S, Mungas D, Slotkin J, Beaumont JL, Gershon R, Temkin NR, & Heaton RK (2014). Measuring Episodic Memory Across the Lifespan: NIH Toolbox Picture Sequence Memory Test. Journal of the International Neuropsychological Society : JINS, 20(6), 611–619. 10.1017/S1355617714000460 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dunn W, Griffith JW, Sabata D, Morrison MT, MacDermid JC, Darragh A, Schaaf R, Dudgeon B, Connor LT, Carey L, & Tanquary J. (2015). Measuring Change in Somatosensation Across the Lifespan. The American Journal of Occupational Therapy, 69(3), 6903290020p1–6903290020p9. 10.5014/ajot.2015.014845 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eriksen CW (1995). The flankers task and response competition: A useful tool for investigating a variety of cognitive problems. [Google Scholar]
- Foy JG, & Foy MR (2020). Dynamic Changes in EEG Power Spectral Densities During NIH-Toolbox Flanker, Dimensional Change Card Sort Test and Episodic Memory Tests in Young Adults. Frontiers in Human Neuroscience, 14, 158. 10.3389/fnhum.2020.00158 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gernsbacher MA, & Kaschak MP (2003). Neuroimaging studies of language production and comprehension. Annual Review of Psychology, 54, 91–114. 10.1146/annurev.psych.54.101601.145128 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gershon RC, Cella D, Fox NA, Havlik RJ, Hendrie HC, & Wagster MV (2010). Assessment of neurological and behavioural function: The NIH Toolbox. The Lancet. Neurology, 9(2), 138–139. 10.1016/S1474-4422(09)70335-7 [DOI] [PubMed] [Google Scholar]
- Gershon RC, Cook KF, Mungas D, Manly JJ, Slotkin J, Beaumont JL, & Weintraub S. (2014). Language measures of the NIH Toolbox Cognition Battery. Journal of the International Neuropsychological Society: JINS, 20(6), 642–651. 10.1017/S1355617714000411 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gershon RC, Slotkin J, Manly JJ, Blitz DL, Beaumont JL, Schnipke D, Wallner-Allen K, Golinkoff RM, Gleason JB, Hirsh-Pasek K, Adams MJ, & Weintraub S. (2013). IV. NIH TOOLBOX COGNITION BATTERY (CB): MEASURING LANGUAGE (VOCABULARY COMPREHENSION AND READING DECODING): NIH TOOLBOX COGNITION BATTERY (CB). Monographs of the Society for Research in Child Development, 78(4), 49–69. 10.1111/mono.12034 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hackett K, Krikorian R, Giovannetti T, Melendez-Cabrero J, Rahman A, Caesar EE, Chen JL, Hristov H, Seifan A, Mosconi L, & Isaacson RS (2018). Utility of the NIH Toolbox for assessment of prodromal Alzheimer’s disease and dementia. Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring, 10, 764–772. 10.1016/j.dadm.2018.10.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Harvey PD (2019). Domains of cognition and their assessment. Dialogues in Clinical Neuroscience, 21(3), 227–237. 10.31887/DCNS.2019.21.3/pharvey [DOI] [PMC free article] [PubMed] [Google Scholar]
- Heaton RK (2004). Revised comprehensive norms for an expanded Halstead-Reitan Battery: Demographically adjusted neuropsychological norms for African American and Caucasian adults, professional manual. Psychological Assessment Resources. [Google Scholar]
- Heaton RK, Akshoomoff N, Tulsky D, Mungas D, Weintraub S, Dikmen S, Beaumont J, Casaletto KB, Conway K, Slotkin J, & Gershon R. (2014). Reliability and validity of composite scores from the NIH Toolbox Cognition Battery in adults. Journal of the International Neuropsychological Society: JINS, 20(6), 588–598. 10.1017/S1355617714000241 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jasinski LJ, Berry DTR, Shandera AL, & Clark JA (2011). Use of the Wechsler Adult Intelligence Scale Digit Span subtest for malingering detection: A meta-analytic review. Journal of Clinical and Experimental Neuropsychology, 33(3), 300–314. 10.1080/13803395.2010.516743 [DOI] [PubMed] [Google Scholar]
- Kazak AE (2018). Editorial: Journal article reporting standards. American Psychologist, 73(1), 1–2. 10.1037/amp0000263 [DOI] [PubMed] [Google Scholar]
- Kløve H. (1963). Grooved pegboard. Lafayette, IN: Lafayette Instruments. [Google Scholar]
- Koo TK, & Li MY (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mungas D, Heaton R, Tulsky D, Zelazo PD, Slotkin J, Blitz D, Lai J-S, & Gershon R. (2014). Factor structure, convergent validity, and discriminant validity of the NIH Toolbox Cognitive Health Battery (NIHTB-CHB) in adults. Journal of the International Neuropsychological Society: JINS, 20(6), 579–587. 10.1017/S1355617714000307 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mungas D, Widaman K, Zelazo PD, Tulsky D, Heaton RK, Slotkin J, Blitz DL, & Gershon RC (2013). VII. NIH TOOLBOX COGNITION BATTERY (CB): FACTOR STRUCTURE FOR 3 TO 15 YEAR OLDS. Monographs of the Society for Research in Child Development, 78(4), 103–118. 10.1111/mono.12037 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nelson AP, Roper BL, Slomine BS, Morrison C, Greher MR, Janusz J, Larson JC, Meadows M-E, Ready RE, Rivera Mindt M, Whiteside DM, Willment K, & Wodushek TR (2015). Official Position of the American Academy of Clinical Neuropsychology (AACN): Guidelines for Practicum Training in Clinical Neuropsychology. The Clinical Neuropsychologist, 29(7), 879–904. 10.1080/13854046.2015.1117658 [DOI] [PubMed] [Google Scholar]
- National Institutes of Health and Northwestern University (2017). NIH Toolbox administrator’s manual and eLearning course. Retrieved from https://nihtoolbox.force.com/s/article/nih-toolbox-administrators-manual-and-elearning-course [Google Scholar]
- Rabin LA, Barr WB, & Burton LA (2005). Assessment practices of clinical neuropsychologists in the United States and Canada: A survey of INS, NAN, and APA Division 40 members. Archives of Clinical Neuropsychology: The Official Journal of the National Academy of Neuropsychologists, 20(1), 33–65. 10.1016/j.acn.2004.02.005 [DOI] [PubMed] [Google Scholar]
- Scott EP, Sorrell A, & Benitez A. (2019). Psychometric properties of the NIH toolbox cognition battery in healthy older adults: Reliability, validity, and agreement with standard neuropsychological tests. Journal of the International Neuropsychological Society, 25(8), 857–867. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Snitz BE, Tudorascu DL, Yu Z, Campbell E, Lopresti BJ, Laymon CM, Minhas DS, Nadkarni NK, Aizenstein HJ, & Klunk WE (2020). Associations between NIH Toolbox Cognition Battery and in vivo brain amyloid and tau pathology in non‐demented older adults. Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring, 12(1), e12018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Streiner DL, Norman GR, & Cairney J. (2015). Health measurement scales: A practical guide to their development and use. Oxford University Press, USA. [Google Scholar]
- Tulsky D, Carlozzi N, Holdnack J, Heaton R, Wong A, Goldsmith A, & Heinemann A. (2017). Using the NIH Toolbox Cognition Battery (NIHTB-CB) in Individuals with Traumatic Brain Injury. Rehabilitation Psychology, 62(4), 413–424. 10.1037/rep0000174 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tulsky DS (2017). The clinical utility and construct validity of the NIH Toolbox Cognition Battery (NIHTB-CB) in individuals with disabilities. Rehabilitation Psychology, 62(4), 409. 10.1037/rep0000201 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tulsky DS, Carlozzi N, Chiaravalloti ND, Beaumont JL, Kisala PA, Mungas D, Conway K, & Gershon R. (2014). NIH Toolbox Cognition Battery (NIHTB-CB): The List Sorting Test to Measure Working Memory. Journal of the International Neuropsychological Society : JINS, 20(6), 599–610. 10.1017/S135561771400040X [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tulsky DS, Carlozzi NE, Chevalier N, Espy KA, Beaumont JL, & Mungas D. (2013). V. NIH Toolbox Cognition Battery (CB): Measuring working memory. Monographs of the Society for Research in Child Development, 78(4), 70–87. 10.1111/mono.12035 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wechsler D. (1997). Wechsler Memory Scale Third Edition edn. San Antonio: The Psychological Corporation. [Google Scholar]
- Weintraub S, Bauer PJ, Zelazo PD, Wallner-Allen K, Dikmen SS, Heaton RK, Tulsky DS, Slotkin J, Blitz DL, Carlozzi NE, Havlik RJ, Beaumont JL, Mungas D, Manly JJ, Borosh BG, Nowinski CJ, & Gershon RC (2013). I. NIH Toolbox Cognition Battery (CB): Introduction and pediatric data. Monographs of the Society for Research in Child Development, 78(4), 1–15. 10.1111/mono.12031 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weintraub S, Dikmen SS, Heaton RK, Tulsky DS, Zelazo PD, Bauer PJ, Carlozzi NE, Slotkin J, Blitz D, Wallner-Allen K, Fox NA, Beaumont JL, Mungas D, Nowinski CJ, Richler J, Deocampo JA, Anderson JE, Manly JJ, Borosh B, … Gershon RC (2013). Cognition assessment using the NIH Toolbox. Neurology, 80(11 Suppl 3), S54–64. 10.1212/WNL.0b013e3182872ded [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weintraub S, Dikmen SS, Heaton RK, Tulsky DS, Zelazo PD, Slotkin J, Carlozzi NE, Bauer PJ, Wallner-Allen K, Fox N, Havlik R, Beaumont JL, Mungas D, Manly JJ, Moy C, Conway K, Edwards E, Nowinski CJ, & Gershon R. (2014). The cognition battery of the NIH toolbox for assessment of neurological and behavioral function: Validation in an adult sample. Journal of the International Neuropsychological Society: JINS, 20(6), 567–578. 10.1017/S1355617714000320 [DOI] [PMC free article] [PubMed] [Google Scholar]
- White MH, & Spooner DM (2016). Does Size Really Matter? Contributions to the Debate on Short Versus Long Neuropsychology Assessments. Australian Psychologist, 51(5), 352–359. 10.1111/ap.12206 [DOI] [Google Scholar]
- Wilkinson GS, & Robertson GJ (2006). WRAT 4: Wide range achievement test. Psychological Assessment Resources Lutz, FL. [Google Scholar]
- Zelazo PD, Anderson JE, Richler J, Wallner-Allen K, Beaumont JL, Conway KP, Gershon R, & Weintraub S. (2014). NIH Toolbox Cognition Battery (CB): Validation of executive function measures in adults. Journal of the International Neuropsychological Society: JINS, 20(6), 620–629. 10.1017/S1355617714000472 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zelazo PD, Anderson JE, Richler J, Wallner-Allen K, Beaumont JL, & Weintraub S. (2013). II. NIH Toolbox Cognition Battery (CB): Measuring executive function and attention. Monographs of the Society for Research in Child Development, 78(4), 16–33. 10.1111/mono.12032 [DOI] [PubMed] [Google Scholar]