Abstract
Human immunodeficiency virus (HIV) infection is prevalent among children and adolescents in Botswana, but standardized neurocognitive testing is limited. The Penn Computerized Neurocognitive Battery (PennCNB) aims to streamline evaluation of neurocognitive functioning and has been culturally adapted for use among youth in this high-burden, low-resource setting. However, its reliability across measurements (i.e., test–retest reliability) is unknown. This study examined the test–retest reliability of the culturally adapted PennCNB in 65 school-age children (ages 7–17) living with HIV in Botswana. Intraclass correlation coefficients (ICCs) for PennCNB summary scores (ICCs ≥ 0.80) and domain scores (ICCs = 0.67–0.88) were higher than those for individual tests, which exhibited more variability (ICCs = 0.53–0.83), with the lowest reliability on memory tests. Practice effects were apparent on some measures, especially within the memory and complex cognition domains. Taken together, the adapted PennCNB exhibited adequate test–retest reliability at the domain level but variable reliability for individual tests. These differences in reliability should be considered when implementing the tests.
Keywords: Neuropsychological assessment, Reliability, Cognitive development, Youth, HIV, Africa
Introduction
The introduction of combination antiretroviral therapy (ART) for human immunodeficiency virus (HIV) has reduced the incidence and severity of HIV-related encephalopathy in children living with HIV (Strehlau, Kuhn, Abrams, & Coovadia, 2016). Nonetheless, neurodevelopmental delays and cognitive impairment remain common in this population (e.g., Smith et al., 2012). Unfortunately, in resource-limited settings, where HIV affects millions of children, cognitive and neurodevelopmental disorders commonly go undetected because of a lack of appropriate assessment instruments and local expertise.
The evaluation of neurocognitive deficits secondary to HIV requires validated assessment of cognitive status, typically accomplished through specialized batteries of neuropsychological tests that objectively capture the type and severity of deficits experienced by an individual. However, commonly used neuropsychological tests have limited utility in resource-limited settings because of their high cost, the specialized training they require, and the time needed to administer, score, and interpret them. In Botswana, a resource-limited setting with a high prevalence of HIV, comprehensive neurocognitive assessment is relatively inaccessible (Mbakile-Mahlanza, Manderson, & Ponsford, 2015). As an alternative, computer-based neurocognitive testing holds promise in addressing these limitations: such measures offer increased ease of administration, automated scoring of response speed and accuracy, and efficient data generation and aggregation. Computerized tests also facilitate implementation in the growing number of large-scale international studies.
As described previously (Scott et al., 2020), there are ongoing efforts to culturally adapt, translate, and validate the Penn Computerized Neurocognitive Battery (PennCNB) for use with children and adolescents in Botswana. The PennCNB (Gur et al., 2010) is a collection of “neurobehavioral probes” designed to target not only theoretical psychological constructs (e.g., episodic memory) but also their associated neuroanatomical networks. The PennCNB is well tolerated by school-age children and has been applied in several pediatric populations, including children with 22q11.2 deletion syndrome (Goldenberg et al., 2012) and a community-based sample of nearly 10,000 youth aged 8–21 (Calkins et al., 2015). Administration is automated and takes approximately 60–90 min, making the battery a reasonable choice for many settings. Moreover, it has been translated and adapted for multiple languages and cultures, including several resource-limited settings (e.g., Ibrahim et al., 2015; Scott et al., 2021a).
For a neurocognitive test to be useful in research and clinical practice, it needs strong psychometric properties, including reliability across measurements (i.e., test–retest reliability): the test should consistently indicate how well an examinee performs relative to others across repeated administrations. This is especially important when a measure is used to track change in neurocognitive functioning, such as decline due to disease progression or improvement due to intervention. Clinicians and researchers must be able to judge whether an observed change reflects chance variation or clinically significant change in performance, and measures with low reliability undermine confidence in that judgment.
This study reports the test–retest reliability and estimated practice effects of the PennCNB in a sample of children and adolescents living with HIV in Botswana as part of a larger study examining the feasibility, reliability, and validity of the PennCNB in Botswana.
Methods
Participants
We evaluated the test–retest reliability of the PennCNB among school-age children (ages 7–17 years) in care at the Botswana-Baylor Children’s Clinical Centre of Excellence (COE), a clinic specializing in the care and treatment of children living with HIV in Gaborone, Botswana. School-age children were selected for the parent study because some PennCNB subtests are not suitable for younger children. Eligible participants were required to be proficient in English or Setswana and to have no severe physical impairments or developmental delays that would interfere with completion of assessments. Informed consent and assent were obtained prior to data collection. Demographic and health information was reported by participants’ caregivers. All participants were compensated for participation (40 pula, approximately 4 USD) based on recommendations from the local Institutional Review Board (IRB). All procedures were approved by the Health Research and Development Committee within the Ministry of Health and Wellness of Botswana and by the IRBs of the Botswana-Baylor COE and the University of Pennsylvania.
Neurocognitive Assessment
For the parent study, 13 tests were selected from the core pediatric version of the PennCNB (Scott et al., 2020). The tests were selected for inclusion based on their: (1) ability to assess neurocognitive domains hypothesized to be impacted in pediatric and in utero HIV or ART exposure; (2) suitability for large-scale administration; (3) appropriateness for resource-limited settings; (4) suitability for translation and adaptation across cultures (e.g., low language demands, limited unfamiliar stimuli); and (5) demonstrated sensitivity to individual differences and mild impairments.
The development of the PennCNB for use in Botswana and the protocol for the parent project have been described previously (Scott et al., 2020). Briefly, the PennCNB underwent a rigorous cultural and language adaptation process informed by state-of-the-art guidelines (Hambleton & Zenisky, 2010) that involved translation (English to Setswana) and back-translation (Setswana to English) of the tests and instructions, review by local experts for content and face validity, piloting of tests, discussion of challenging concepts, modifications to select the most linguistically and conceptually appropriate Setswana terminology, and harmonization with the English versions. Table 1 lists the tests administered and their corresponding cognitive domains; previous work provides detailed descriptions of each test (Gur et al., 2010; Scott et al., 2020). Prior work using the parent sample shows that these tests separate into four primary neurocognitive factors: sensorimotor/processing speed, episodic learning/memory, complex cognition, and executive functioning (Van Pelt et al., 2021).
Table 1.
Test–retest reliability for PennCNB tests by neurocognitive domain
| Cognitive domain (test name) | Time 1 M (SD) | Time 2 M (SD) | Cohen’s d | r | ICC (95% CI) |
|---|---|---|---|---|---|
| Summary scores | |||||
| PennCNB accuracy | –0.26 (1.04) | 0.06 (1.14) | 0.29 | 0.90 | 0.87 (0.66–0.94) |
| PennCNB speed | –0.34 (0.94) | 0.12 (0.99) | 0.48 | 0.89 | 0.80 (0.23–0.93) |
| PennCNB efficiency | –0.33 (0.98) | 0.09 (1.05) | 0.41 | 0.94 | 0.87 (0.23–0.96) |
| Executive functioning | –0.25 (0.91) | 0.03 (0.90) | 0.28 | 0.88 | 0.85 (0.67–0.92) |
| Penn Continuous Performance Test | –0.33 (1.03) | 0.03 (1.01) | 0.35 | 0.88 | 0.83 (0.55–0.92) |
| Penn Go/No-Go | –0.13 (1.03) | –0.04 (1.01) | 0.09 | 0.82 | 0.82 (0.73–0.89) |
| Penn Trailmaking Test, Part B | –0.21 (0.88) | 0.02 (1.23) | 0.22 | 0.79 | 0.68 (0.55–0.86) |
| Fractal N-Back Test | –0.23 (1.09) | 0.08 (0.90) | 0.31 | 0.67 | 0.64 (0.45–0.77) |
| Episodic learning/memory | –0.21 (0.98) | 0.38 (0.97) | 0.58 | 0.79 | 0.67 (0.15–0.86) |
| Penn Face Memory Test | –0.31 (1.04) | 0.28 (1.17) | 0.53 | 0.64 | 0.57 (0.26–0.75) |
| Visual Object Learning Test | –0.07 (0.86) | 0.37 (1.04) | 0.46 | 0.58 | 0.53 (0.26–0.71) |
| Digit Symbol Substitution Test Recall | –0.20 (0.93) | 0.13 (0.99) | 0.34 | 0.58 | 0.56 (0.35–0.71) |
| Complex cognition | –0.31 (1.06) | 0.33 (0.99) | 0.64 | 0.91 | 0.77 (0.03–0.93) |
| Penn Conditional Exclusion Test | –0.41 (0.88) | 0.27 (1.12) | 0.68 | 0.78 | 0.62 (0.08–0.83) |
| Penn Line Orientation Test | –0.22 (0.95) | 0.17 (0.99) | 0.40 | 0.84 | 0.78 (0.45–0.89) |
| Penn Matrix Reasoning Test | –0.20 (1.01) | 0.22 (1.01) | 0.42 | 0.67 | 0.63 (0.37–0.78) |
| Sensorimotor/processing speed | –0.30 (1.01) | 0.01 (0.95) | 0.31 | 0.92 | 0.88 (0.58–0.95) |
| Digit Symbol Substitution Test | –0.25 (0.95) | 0.20 (0.96) | 0.47 | 0.89 | 0.80 (0.26–0.92) |
| Penn Trailmaking Test, Part A | –0.15 (0.92) | –0.01 (1.16) | 0.13 | 0.81 | 0.76 (0.52–0.87) |
| Motor Praxis Test | –0.44 (1.02) | –0.07 (0.81) | 0.40 | 0.75 | 0.68 (0.43–0.82) |
| Finger Tapping Test | –0.03 (0.89) | –0.22 (1.20) | –0.18 | 0.75 | 0.72 (0.58–0.82) |
Note. Time 1 and Time 2 values are means (SDs) of z scores derived from the overall sample of the larger study (adjusted for age). Domain scores are mean scores across all tests within that domain. CI = confidence interval; ICC = intraclass correlation coefficient; PennCNB = Penn Computerized Neurocognitive Battery.
The PennCNB was administered on a laptop by trained research staff to all participants in a quiet, private space in the COE. Standard operating procedures for administration were followed, including reading PennCNB instructions aloud and providing practice PennCNB modules before each test to ensure comprehension. The tests were administered in a fixed order. The PennCNB was administered at baseline and repeated after 6–14 weeks.
Administrators documented factors that may have impacted validity (e.g., behavior, technical problems). In addition, algorithmic validity rules specific to each test detect participant-related problems (e.g., extended periods of inattention, random or habitual responding), including impossible response times (e.g., <200 ms), excessive outliers, and unusual response patterns (e.g., choosing the same option several times in a row); see Scott et al. (2021b) for greater detail on these rules. Experienced CNB data validators supervised by neuropsychologists made the ultimate decision to designate a test performance as invalid by integrating the algorithmic rules, assessor comments, and visual data inspection. Individual test data determined to be invalid (1.9% of test data) were excluded from these analyses.
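To make the flavor of such rules concrete, the sketch below flags trial-level data for impossibly fast responses and long streaks of identical responses. It is a minimal illustration in R; the function name, thresholds, and flagging logic are hypothetical and are not the PennCNB’s actual criteria.

```r
# Illustrative validity flags for one test's trial-level data.
# `rt` = response times in ms; `resp` = response choice on each trial.
# Thresholds are examples only, not the PennCNB's actual criteria.
flag_validity <- function(rt, resp, min_rt = 200, max_run = 8) {
  prop_too_fast <- mean(rt < min_rt)                    # proportion of impossibly fast responses
  longest_run <- max(rle(as.character(resp))$lengths)   # longest streak of identical responses
  list(prop_too_fast = prop_too_fast,
       longest_run = longest_run,
       flagged = prop_too_fast > 0.10 || longest_run >= max_run)
}

# Example: a test with a 9-trial streak of the same response gets flagged
flag_validity(rt = c(450, 380, 150, 620, 500, rep(410, 9)),
              resp = c("A", "B", "A", "C", "B", rep("D", 9)))
```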
Analyses
Each PennCNB test yields measures of both accuracy (number of correct responses) and speed (mean response time). Efficiency scores are also calculated as standardized sums of z-standardized accuracy and speed (with response time multiplied by –1 so that faster performance is represented by higher scores). Speed scores served as the efficiency measure for four tests because they yielded only speed values (Motor Praxis Test and Finger Tapping Test) or had too few levels of accuracy data to justifiably be treated as continuous (Penn Trailmaking Test, Parts A & B). Because of strong and non-linear effects of age on neurocognitive functioning (e.g., Gur et al., 2012), all CNB scores were age-standardized by regressing out age, age-squared, and age-cubed and retaining the residuals for analysis. These residuals can be interpreted as the number of SDs above or below the expected score given the examinee’s age.
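A minimal sketch of these derivations in R, assuming a data frame `d` with columns `accuracy`, `speed`, and `age` for one test; the variable names are illustrative rather than taken from the study’s code, and the exact standardization steps are an assumption consistent with the description above.

```r
# z-standardize a vector
z <- function(x) as.numeric(scale(x))

# Efficiency: standardized sum of z-accuracy and z-speed, with response time
# sign-flipped so that faster performance yields higher scores.
d$efficiency <- z(z(d$accuracy) + z(-d$speed))

# Age-standardization: regress out age, age^2, and age^3, then keep the
# residuals, interpretable as SD units above/below the age-expected score.
fit <- lm(efficiency ~ poly(age, 3, raw = TRUE), data = d)
d$efficiency_adj <- z(resid(fit))
```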
As our primary measure of test–retest reliability, we calculated intraclass correlation coefficients (ICCs) from one-way random-effects models using R version 4.0.2, as these account for absolute standing in addition to relative standing (Aldridge, Dovey, & Wade, 2017). Pearson’s product-moment correlations (r) were also calculated as indicators of consistency; these can be interpreted as an “upper bound” on reliability because they do not account for absolute standing. ICCs and r values were calculated for individual test efficiency scores, for mean efficiency z scores within each neurocognitive domain (e.g., executive functioning), and for overall battery means (i.e., summary scores) for accuracy, speed, and efficiency. We also generated Bland–Altman plots (Bland & Altman, 1986) to examine whether reliability differed by level of performance. Practice effects were quantified as Cohen’s d for the difference between Time 1 and Time 2 measurements.
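For illustration, the following R sketch computes these statistics for a single measure, using the irr package’s icc() for the one-way random-effects ICC. The toy data, the pooled-SD formula for Cohen’s d, and the plotting details are assumptions for demonstration, not the study’s analysis script.

```r
library(irr)  # provides icc()

# Toy data standing in for age-adjusted efficiency z scores at both visits
set.seed(1)
t1 <- rnorm(65)
t2 <- 0.8 * t1 + rnorm(65, mean = 0.3, sd = 0.5)

# One-way random-effects ICC (single measurement), sensitive to absolute agreement
icc(cbind(t1, t2), model = "oneway", unit = "single")

# Pearson's r: consistency only, an "upper bound" on reliability
cor(t1, t2)

# Practice effect: Cohen's d for Time 2 minus Time 1 (pooled-SD denominator
# is an assumption; the exact formula is not specified in the text)
(mean(t2) - mean(t1)) / sqrt((var(t1) + var(t2)) / 2)

# Bland-Altman-style plot: difference vs. baseline, as in the study's figures
ba <- t2 - t1
plot(t1, ba, xlab = "Time 1 score", ylab = "Time 2 - Time 1")
abline(h = mean(ba) + c(-1.96, 0, 1.96) * sd(ba), lty = c(2, 1, 2))
```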
Results
Participant Characteristics
Of the 71 participants who completed the retest assessment, 65 were retested within the 6–14-week window and were included in analyses. The median test–retest interval was 11 weeks (interquartile range [IQR] = 9–13), and participants had a median age of 10 years (IQR = 8–12). There was a slight predominance of female participants (36/65; 55.4%), and all participants identified as Black African.
Test–Retest Reliability
ICCs and Pearson’s r values are reported in Table 1 by cognitive domain. ICCs for the summary PennCNB accuracy, speed, and efficiency scores (≥0.80) and for all domain scores (ICCs = 0.77–0.88) except learning/memory (ICC = 0.67) were within ranges traditionally considered adequate to high (Strauss, Sherman, & Spreen, 2006). As expected, summary and domain scores generally had higher reliabilities than the individual tests within each domain. Scatterplots of baseline and repeat domain scores are presented in Figure 1. Bland–Altman plots of difference scores versus baseline for average PennCNB domain efficiency scores are shown in Supplementary Figure 1; the magnitude of the test–retest difference did not systematically differ by level of PennCNB performance.
Fig. 1.
(A) Scatterplots of Time 1 and Time 2 scores for PennCNB domain efficiency scores (adjusted for age). (B) Bland–Altman plots showing the difference versus Time 1 for PennCNB domain efficiency scores, with positive differences indicating better scores at the repeat assessment.
ICCs for individual tests varied by domain, and r values were generally higher than ICCs. Sensorimotor/processing speed tests had ICCs ranging from 0.68 (Motor Praxis Test) to 0.80 (Digit Symbol Substitution Test); executive functioning tests, from 0.64 (Fractal N-Back Test) to 0.83 (Penn Continuous Performance Test); episodic learning/memory tests, from 0.53 (Visual Object Learning Test) to 0.57 (Penn Face Memory Test); and complex cognition tests, from 0.62 (Penn Conditional Exclusion Test) to 0.78 (Penn Line Orientation Test).
As shown in Table 1, practice effects were generally small-to-medium in magnitude (d = –0.18 to 0.68), with the largest practice effects on the Penn Conditional Exclusion Test.
Discussion
We evaluated the test–retest reliability of 13 neurocognitive tests selected from the pediatric version of the PennCNB among school-age children living with HIV in Botswana. These tests were selected for their utility in assessing executive functions, episodic memory, and sensorimotor/processing speed, given that neurocognitive deficits are most apparent in these domains in children living with HIV (e.g., Smith et al., 2012). Reliability was adequate to high at the summary and domain levels (except learning/memory) and ranged from marginal to high across individual PennCNB tests. Small-to-medium practice effects were apparent on several tests.
The literature reveals considerable variability in the qualitative interpretation of ICCs used to evaluate the reliability of neuropsychological instruments. Some investigators recommend ICC values of at least 0.7 for minimal acceptability (Cicchetti, 2001), while others have proposed lower cut-offs of 0.4–0.6 (Weintraub et al., 2014). Despite differences in approach, there is some consensus that a reliability estimate greater than 0.7 indicates acceptable test–retest reliability for research use (Strauss et al., 2006). The lowest test–retest reliability was found among PennCNB tests targeting episodic learning/memory, with the Visual Object Learning Test exhibiting the lowest reliability. Consistent with this finding, other investigators have reported poorer test–retest reliabilities for memory measures relative to other cognitive tests (e.g., Calamia, Markon, & Tranel, 2013), including a study using the PennCNB in Turkish adults (Izgi et al., 2021), potentially because of restricted test score ranges (Strauss et al., 2006). Lower reliability within this domain may also reflect the use of visual rather than verbal stimuli, as verbal memory measures often show higher reliability (e.g., Dikmen, Heaton, Grant, & Temkin, 1999). Lower reliability estimates for memory measures may additionally be explained by the inherently variable nature of this cognitive ability (Alioto et al., 2017), as it is unlikely that clinically meaningful memory changes occurred during our test–retest interval. Finally, cultural factors may have contributed to the reliability of the Penn Face Memory Test in this study, such as reduced familiarity with faces of other races in our sample (cf. Pinkham et al., 2008), although this adaptation includes a greater proportion of face stimuli from Black individuals than U.S./Western versions of the test.
Of note, there were practice effects of a small-to-medium magnitude for several tests within this battery, especially in memory and complex cognition, similar to prior PennCNB research in healthy adults (Lee et al., 2020). Effects of practice can be significant when participants remember stimuli or apply test-taking strategies acquired on an earlier visit (Alioto et al., 2017). Practice effects can also interact with age-related increases in cognitive abilities (e.g., Swagerman et al., 2016) and result in lower reliability. However, there may be important cultural factors to consider in interpreting these practice effects. For example, if participants are less familiar with the testing context, baseline test results could partially reflect an adaptation to the testing context, potentially diluting measurement of the underlying construct. Thus, more stable and valid measurements of these constructs could occur at repeat testing. Though we do not have data on familiarity with testing contexts in this sample, such issues should be investigated in future research.
The modest test–retest reliability of some individual PennCNB tests may also be explained by other methodological factors. Clinical groups, such as those with HIV, tend to show lower reliability estimates than non-clinical, healthy samples (Calamia et al., 2013). For example, a sample of children and young adults with 22q11.2 deletion syndrome showed an ICC of 0.72 for overall PennCNB accuracy over a mean interval of 1.6 years (Gur et al., 2021). Future work should examine whether healthy children in Botswana show higher test–retest reliability than observed here. Given rapid neurodevelopmental changes in children and adolescents, which could produce true cognitive change and confound estimates over longer intervals, we selected a test–retest interval approximating the typical follow-up timeframe in HIV clinics in Botswana. However, many test–retest reliability studies use shorter intervals, which could account for some of the lower correlations in our sample. Finally, some observed changes may reflect statistical regression to the mean.
Our study has limitations to note. The sample size was relatively modest, although it was similar to the mean sample size of studies of neuropsychological test–retest reliability (Calamia et al., 2013), and power analyses indicated adequate power (>80%) to detect ICCs across ranges conventionally characterized as poor to excellent. The external validity of our study may be influenced by demographic characteristics, including the age distribution of our sample and recruitment from an urban HIV clinic in Botswana; reliability estimates derived from this clinical pediatric sample may not generalize to healthy, non-clinical populations in Botswana.
These limitations notwithstanding, our results provide an important extension of prior data on the psychometric characteristics of the PennCNB in international settings (Ibrahim et al., 2015; Swagerman et al., 2016; Van Pelt et al., 2021). Overall, the PennCNB exhibited higher test–retest reliability (within ranges generally considered adequate) at the summary and domain levels than for individual tests, which showed more variable reliability. Decisions based on individual PennCNB tests (as opposed to whole-battery or domain-level summary scores) should therefore be made with caution. Together with prior research, these results suggest that the PennCNB is a promising suite of neurocognitive tests for addressing the unique neurocognitive assessment challenges prevailing in resource-limited settings such as Botswana.
Funding
This work was supported by Eunice Kennedy Shriver National Institute of Child Health and Human Development grants R01 HD095278 and F31 HD101346. This publication was made possible through core services and support from the Penn Center for AIDS Research (CFAR), a National Institutes of Health-funded program (P30 AI045008), and the Penn Mental Health AIDS Research Center (P30 MH097488).
Conflict of Interest
The authors have no conflicts of interest to disclose.
Contributor Information
Billy M Tsima, Department of Family Medicine and Public Health, University of Botswana, Gaborone, Botswana.
Elizabeth D Lowenthal, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Children’s Hospital of Philadelphia, Global Health Center, Philadelphia, PA, USA.
Amelia E Van Pelt, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Children’s Hospital of Philadelphia, Global Health Center, Philadelphia, PA, USA; Leonard Davis Institute of Health Economics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA.
Tyler M Moore, Brain Behavior Laboratory, Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Mogomotsi Matshaba, Botswana-Baylor Children’s Clinical Centre of Excellence, Gaborone, Botswana.
Ruben C Gur, Brain Behavior Laboratory, Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Ontibile Tshume, Botswana-Baylor Children’s Clinical Centre of Excellence, Gaborone, Botswana.
Boitumelo Thuto, Botswana-Baylor Children’s Clinical Centre of Excellence, Gaborone, Botswana.
J Cobb Scott, Brain Behavior Laboratory, Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; VISN4 Mental Illness Research, Education, and Clinical Center at the Philadelphia VA Medical Center, Philadelphia, PA, USA.
References
- Aldridge, V. K., Dovey, T. M., & Wade, A. (2017). Assessing test-retest reliability of psychological measures: Persistent methodological problems. European Psychologist, 22(4), 207–218. 10.1027/1016-9040/a000298
- Alioto, A. G., Kramer, J. H., Borish, S., Neuhaus, J., Saloner, R., Wynn, M., et al. (2017). Long-term test-retest reliability of the California Verbal Learning Test – second edition. The Clinical Neuropsychologist, 31(8), 1449–1458. 10.1080/13854046.2017.1310300
- Bland, J. M., & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 327(8476), 307–310. 10.1016/S0140-6736(86)90837-8
- Calamia, M., Markon, K., & Tranel, D. (2013). The robust reliability of neuropsychological measures: Meta-analyses of test-retest correlations. The Clinical Neuropsychologist, 27(7), 1077–1105. 10.1080/13854046.2013.809795
- Calkins, M. E., Merikangas, K. R., Moore, T. M., Burstein, M., Behr, M. A., Satterthwaite, T. D., et al. (2015). The Philadelphia Neurodevelopmental Cohort: Constructing a deep phenotyping collaborative. Journal of Child Psychology and Psychiatry, 56(12), 1356–1369. 10.1111/jcpp.12416
- Cicchetti, D. V. (2001). The precision of reliability and validity estimates re-visited: Distinguishing between clinical and statistical significance of sample size requirements. Journal of Clinical and Experimental Neuropsychology, 23(5), 695–700. 10.1076/jcen.23.5.695.1249
- Dikmen, S. S., Heaton, R. K., Grant, I., & Temkin, N. R. (1999). Test-retest reliability and practice effects of expanded Halstead-Reitan Neuropsychological Test Battery. Journal of the International Neuropsychological Society, 5(4), 346–356. 10.1017/S1355617799544056
- Goldenberg, P. C., Calkins, M. E., Richard, J., McDonald-McGinn, D., Zackai, E., Mitra, N., et al. (2012). Computerized neurocognitive profile in young people with 22q11.2 deletion syndrome compared to youths with schizophrenia and at-risk for psychosis. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, 159B(1), 87–93.
- Gur, R. C., Moore, T. M., Weinberger, R., Mekori-Domachevsky, E., Gross, R., Emanuel, B. S., et al. (2021). Relationship between intelligence quotient measures and computerized neurocognitive performance in 22q11.2 deletion syndrome. Brain and Behavior, 11, e2221. 10.1002/brb3.2221
- Gur, R. C., Richard, J., Calkins, M. E., Chiavacci, R., Hansen, J. A., Bilker, W. B., et al. (2012). Age group and sex differences in performance on a computerized neurocognitive battery in children age 8–21. Neuropsychology, 26(2), 251–265. 10.1037/a0026712
- Gur, R. C., Richard, J., Hughett, P., Calkins, M. E., Macy, L., Bilker, W. B., et al. (2010). A cognitive neuroscience-based computerized battery for efficient measurement of individual differences: Standardization and initial construct validation. Journal of Neuroscience Methods, 187(2), 254–262. 10.1016/j.jneumeth.2009.11.017
- Hambleton, R. K., & Zenisky, A. L. (2010). Translating and adapting tests for cross-cultural assessments. In D. Matsumoto & F. J. R. van de Vijver (Eds.), Cross-cultural research methods in psychology (pp. 46–70). Cambridge University Press. 10.1017/CBO9780511779381.004
- Ibrahim, I., Tobar, S., Elassy, M., Mansour, H., Chen, K., Wood, J., et al. (2015). Practice effects distort translational validity estimates for a neurocognitive battery. Journal of Clinical and Experimental Neuropsychology, 37(5), 530–537. 10.1080/13803395.2015.1037253
- Izgi, B., Moore, T. M., Yalcinay-Inan, M., Port, A. M., Kuscu, K., Gur, R. C., et al. (2021). Test–retest reliability of the Turkish translation of the Penn Computerized Neurocognitive Battery. Applied Neuropsychology: Adult, 1–10.
- Lee, G., Moore, T. M., Basner, M., Nasrini, J., Roalf, D. R., Ruparel, K., et al. (2020). Age, sex, and repeated measures effects on NASA’s “Cognition” Test Battery in STEM educated adults. Aerospace Medicine and Human Performance, 91(1), 18–25. 10.3357/AMHP.5485.2020
- Mbakile-Mahlanza, L., Manderson, L., & Ponsford, J. (2015). The experience of traumatic brain injury in Botswana. Neuropsychological Rehabilitation, 25(6), 936–958. 10.1080/09602011.2014.999000
- Pinkham, A. E., Sasson, N. J., Calkins, M. E., Richard, J., Hughett, P., Gur, R. E., et al. (2008). The other-race effect in face processing among African American and Caucasian individuals with schizophrenia. The American Journal of Psychiatry, 165(5), 639–645. 10.1176/appi.ajp.2007.07101604
- Scott, J. C., Moore, T. M., Stein, D. J., Pretorius, A., Zingela, Z., Nagdee, M., et al. (2021a). Adaptation and validation of a computerized neurocognitive battery in the Xhosa of South Africa. Neuropsychology, 35(6), 581–594. 10.1037/neu0000742
- Scott, J. C., Moore, T. M., Roalf, D. R., Satterthwaite, T. D., Wolf, D. H., Port, A. M., et al. (2021b). Development and application of novel performance validity metrics for computerized neurocognitive batteries. PsyArXiv. 10.31234/osf.io/64ucw
- Scott, J. C., Van Pelt, A. E., Port, A. M., Njokweni, L., Gur, R. C., Moore, T. M., et al. (2020). Development of a computerised neurocognitive battery for children and adolescents with HIV in Botswana: Study design and protocol for the Ntemoga study. BMJ Open, 10(8), e041099. 10.1136/bmjopen-2020-041099
- Smith, R., Chernoff, M., Williams, P. L., Malee, K. M., Sirois, P. A., Kammerer, B., et al. (2012). Impact of HIV severity on cognitive and adaptive functioning during childhood and adolescence. The Pediatric Infectious Disease Journal, 31(6), 592–598. 10.1097/INF.0b013e318253844b
- Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary (3rd ed.). Oxford University Press.
- Strehlau, R., Kuhn, L., Abrams, E. J., & Coovadia, A. (2016). HIV-associated neurodevelopmental delay: Prevalence, predictors and persistence in relation to antiretroviral therapy initiation and viral suppression. Child: Care, Health and Development, 42(6), 881–889. 10.1111/cch.12399
- Swagerman, S. C., de Geus, E. J. C., Kan, K.-J., van Bergen, E., Nieuwboer, H. A., Koenis, M. M. G., et al. (2016). The computerized neurocognitive battery: Validation, aging effects, and heritability across cognitive domains. Neuropsychology, 30(1), 53–64. 10.1037/neu0000248
- Van Pelt, A. E., Scott, J. C., Morales, K. H., Matshaba, M., Gur, R. C., Tshume, O., et al. (2021). Structural validity of a computerized neurocognitive battery for youth affected by human immunodeficiency virus in Botswana. Psychological Assessment, 34(2), 139–146. 10.1037/pas0001066
- Weintraub, S., Dikmen, S. S., Heaton, R. K., Tulsky, D. S., Zelazo, P. D., Slotkin, J., et al. (2014). The cognition battery of the NIH toolbox for assessment of neurological and behavioral function: Validation in an adult sample. Journal of the International Neuropsychological Society, 20(6), 567–578. 10.1017/S1355617714000320