Author manuscript; available in PMC: 2023 Nov 1.
Published in final edited form as: Rehabil Psychol. 2022 Sep 1;67(4):587–596. doi: 10.1037/rep0000464

Usability and Validity of a VR Cognitive Assessment Tool for Pediatric TBI

Jiabin Shen 1, Christine Koterba 2,3, Julie Samora 2,3, Jeffery Leonard 2,3, Rui Li 4, Junxin Shi 5, Keith Owen Yeates 6, Henry Xiang 2,5, Gerry Taylor 2,4
PMCID: PMC10165731  NIHMSID: NIHMS1889179  PMID: 36048061

Abstract

Objective.

Deficits in executive functions are prevalent among children with traumatic brain injury (TBI). Assessing cognitive impairment is critical for evaluating and monitoring recovery. The present paper reports a pilot study to evaluate the preliminary usability and validity of a virtual reality cognitive assessment tool (VR-CAT) specifically designed for children with TBI.

Method.

A total of 54 children, 24 with TBI and 30 with orthopedic injury (OI), participated in a cross-sectional cohort study at a Level-1 trauma center. The VR-CAT was evaluated in terms of user experience as well as preliminary psychometric properties including test-retest reliability, face validity, concurrent validity with two standard EF assessment tools, and utility in distinguishing the TBI and OI groups.

Results.

Children in both groups reported high levels of usability (i.e., enjoyment and motivation). The VR-CAT composite and scores on tests of inhibitory control and working memory demonstrated modest test-retest reliability across two independent assessment visits, as well as acceptable face validity, modest concurrent validity, and clinical utility.

Conclusion.

The present study is among the first to evaluate the applicability of an immersive VR cognitive assessment tool in children with TBI. The findings support high usability, adequate psychometric properties, and satisfactory clinical utility of VR-CAT, suggesting it is a promising tool for assessing executive functions in this vulnerable population.

Keywords: virtual reality, executive function, TBI, psychometrics, children


According to the Centers for Disease Control and Prevention (CDC), pediatric traumatic brain injury (TBI) is the leading cause of death and disability in children (Centers for Disease Control Prevention, 2015; Faul et al., 2010; Hiu Lam & Mackersie, 1999; Keenan & Bratton, 2006). Pediatric TBI often results in significant impairment in cognitive ability (Faul et al., 2010), particularly in executive functions (EFs) (Leblanc et al., 2005; Scheibel & Levin, 1997; Scott & Schoenberg, 2011), defined as cognitive capacities for self-controlled discipline, creativity, and flexibility (Diamond, 2013; Diamond & Lee, 2011). EF deficits have profound implications for children’s daily behaviors (Diamond, 2013) and quality of life (Mangeot et al., 2002), including increased attention problems (Willcutt et al., 2005), substandard academic performance (Diamantopoulou et al., 2007), and poorer psychosocial adjustment (Riggs et al., 2004). Therefore, reliable and valid EF assessment for pediatric TBI is critical to help clinicians develop precise treatment and rehabilitation plans.

To assess cognitive problems post-TBI (Carruthers et al., 2014; Diamond & Lee, 2011), a variety of brief cognitive assessments have been developed specifically for children. For example, the Cognitive and Linguistic Scale (CALS) is a brief measure that was developed to assess functional changes during a child’s recovery and can be used with pediatric patients with significant cognitive impairment after brain injury (Slomine et al., 2008; Vasa et al., 2015; Watson et al., 2021). The CALS is most often used serially to quantify a child’s recovery over time in the acute phase of injury recovery and measures functioning in several cognitive domains. Similarly, the Lebby-Asbell Neurocognitive Screening Examination for Children (LANSE-C) was developed as a brief assessment tool for children (Lebby & Asbell, 2007). It also assesses several cognitive domains and can be used as a screening tool to determine whether additional evaluations are needed (Lebby et al., 2015; Vercellini et al., 2016). Additionally, recent measures of executive functions have been developed to provide greater ecological validity, such as the Jansari Test of Executive Functions for Children (Jansari et al., 2014). This measure is an innovative and sophisticated multi-component computer task that has proved applicable to children and adolescents with acquired brain injuries (Gilboa et al., 2019). However, evidence-based EF assessment tools specifically designed for pediatric TBI are lacking. Virtual reality (VR) has rapidly emerged as a promising platform to address the limitations of traditional assessment tools, including compliance, engagement, and ecological richness (Jansari et al., 2014). For example, VR can offer a versatile and safe environment to evaluate children’s cognition, offering potentially high transferability of test results to real-life functions. Furthermore, as a novel gaming platform, VR is especially appealing to children with TBI (Shen et al., 2020). Higher levels of engagement could increase motivation for children to participate in a VR assessment, potentially improving patient compliance.

A recent systematic review found that VR-based cognitive assessment has become increasingly popular in neuropsychology (Neguț et al., 2016). However, only five studies were identified using VR in patients with brain injuries, mostly focusing on adults (Kang et al., 2008; Pugnetti et al., 1998; Rand et al., 2007; Rand et al., 2009) and stroke patients (Kang et al., 2008; Rand et al., 2007; Rand et al., 2009). Given the significant developmental changes from childhood to adulthood, VR systems developed for adults are likely not appropriate for children. One study examined the usability of a virtual supermarket to measure EFs in children but failed to evaluate its psychometric properties (Erez et al., 2013).

The present study aimed to address this research gap by validating a novel VR-based Cognitive Assessment Tool (VR-CAT) for assessing three EFs in children with TBI, including inhibitory control, working memory, and cognitive flexibility, based on Diamond’s Framework of Executive Functions (Diamond, 2013). Using a cross-sectional cohort design, this study tested the hypotheses that the VR-CAT would demonstrate adequate test-retest reliability, concurrent validity, and clinical utility.

METHODS

Transparency and Openness

This report includes information regarding determination of sample size, data exclusions (for missing data), and study design, procedures, and measures according to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Guidelines (STROBE checklist included as supplementary materials). De-identified data, analysis code, and research materials are available upon request by emailing the corresponding author. Data were analyzed using SAS 9.4. This study’s design and hypotheses were not pre-registered.

Participants and Recruitment

A total of 54 children were recruited from a Level I Pediatric Trauma Center, including 24 children with TBI and 30 children with OI. Due to the preliminary nature of this pilot study, the sample size was determined primarily based on available patient volume at the recruitment site without an a priori power analysis. Participants were identified through daily review of electronic medical records, monthly trauma registry, and clinician/self-referral at the hospital. Inclusion criteria for the TBI group were: (1) age 7 to 17 years inclusive; (2) TBI diagnosis within the past year with a Glasgow Coma Scale (GCS) score ≥ 13 plus presence of a depressed skull fracture or trauma-related intracranial abnormality (complicated mild TBI), GCS = 9–12 (moderate TBI), or GCS = 3–8 (severe TBI); and (3) a score ≤28 on the Agitated Behavior Scale, indicating no to mild agitation. Participants in the OI group had sustained an OI without loss of consciousness or indicators of possible brain injury such as facial injuries and were selected to match the TBI group as closely as possible on age and sex. Exclusion criteria for both groups were: (1) a severe pre-injury impairment preventing administration of study measures (per medical records); (2) placement by medical providers on droplet/contact isolation; (3) imposition of video game restrictions; (4) abuse as the cause of injury; (5) receiving narcotics for pain; and (6) primary language spoken other than English.

All procedures were approved by the institutional review board.

The VR-based Cognitive Assessment Tool (VR-CAT)

Details of the VR-CAT system have been described elsewhere (Shen et al., 2020). As illustrated in Figure 1, the VR-CAT consists of a fully immersive HTC© VIVE VR viewer powered by a high-performance laptop on which a Windows-based VR application invites children to rescue an animated character named “Lubdub” from a castle. The program consists of three child-friendly assessment tasks that correspond to the three core EFs: (1) direct sentinels away from the castle gates (VR Inhibitory Control) (Zelazo et al., 2013), (2) open a series of castle gates by replicating the cryptography sequence of items surrounding each gate in forward/backward order (VR Working Memory) (Zelazo et al., 2013), and (3) rescue Lubdub by matching patterns between Lubdub and the surrounding guards (VR Cognitive Flexibility) (Chelune & Baer, 1986). The assessment consists of 30 trials per task with approximately 30 minutes required to complete all tasks. Scoring methods are as follows:

  • For VR Inhibitory Control, response time and percent of correct responses were recorded for each trial, with the means and standard deviations of both variables computed across all trials. A VR Inhibitory Control Composite Score was computed by dividing the percentage of correct responses by response time, with higher scores indicating both faster and more accurate responses. A standardized VR Inhibitory Control Score was then derived by converting the composite to a standard score (M = 100, SD = 15), with higher scores indicating higher levels of inhibitory control.

  • For VR Working Memory, response time and maximal number of items recalled were recorded for each trial. Similar to Task 1, the means and standard deviations of both variables were computed across all trials, and a VR Working Memory Composite Score was computed by dividing the maximum number of recalled items across trials by response time. A standardized VR Working Memory Score was then derived by converting the composite to a standard score (M = 100, SD = 15), with higher scores indicating higher levels of working memory.

  • For VR Cognitive Flexibility, the number of errors after each rule change was recorded across trials. A standardized VR Cognitive Flexibility Score was calculated by computing a standard score (M = 100, SD = 15) using the mean number of error trials after rule change, with higher scores indicating lower levels of cognitive flexibility.

  • A VR Composite Score was also computed by taking the average of all three tasks’ standard scores and by standardizing these means. This composite score served as an overall indicator of children’s EF performance as measured within the virtual environment, with higher scores indicating better EF performance.
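
The scoring steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' SAS code: the function names and all data values are hypothetical, and the standard scores are assumed to be z-scores based on the sample's own mean and SD, because the text does not state the reference distribution. Cognitive flexibility standard scores are taken as given inputs.

```python
from statistics import mean, stdev

def standard_scores(raw, target_mean=100.0, target_sd=15.0):
    """Rescale raw values to a standard-score metric (M = 100, SD = 15).

    Assumption: z-scores use the sample's own mean and SD; the text does not
    state which reference distribution was used.
    """
    m, s = mean(raw), stdev(raw)
    return [target_mean + target_sd * (x - m) / s for x in raw]

def efficiency(accuracy, response_time):
    """Composite for the first two tasks: accuracy divided by response time."""
    return accuracy / response_time

# Hypothetical data for four children (not study data).
# Inhibitory control: (percent correct, mean response time in seconds)
ic_trials = [(90.0, 1.2), (75.0, 1.5), (95.0, 1.0), (60.0, 2.0)]
# Working memory: (maximum items recalled, mean response time in seconds)
wm_trials = [(6, 3.0), (4, 4.0), (7, 2.8), (3, 5.0)]
# Cognitive flexibility standard scores, taken as given for this sketch.
cf_std = [105.0, 92.0, 118.0, 85.0]

ic_std = standard_scores([efficiency(a, rt) for a, rt in ic_trials])
wm_std = standard_scores([efficiency(n, rt) for n, rt in wm_trials])

# VR Composite: average the three standard scores per child, then re-standardize.
vr_composite = standard_scores([mean(t) for t in zip(ic_std, wm_std, cf_std)])
```

Dividing accuracy by response time rewards children who are both fast and accurate, and re-standardizing the averaged scores puts the composite back on the same M = 100, SD = 15 metric as the individual tasks.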

Figure 1.

The VR-CAT Program

Other Measures

Simulator sickness was assessed using the Simulator Sickness Questionnaire, or SSQ (Kennedy et al., 1993). The SSQ consists of 15 items (0–3 scale) related to potential side effects common to playing in virtual environments, such as “general discomfort” and “headache”. The total score was the average rating, with higher scores indicating higher levels of simulator sickness.

Perceived exertion was assessed using the Borg Rating of Perceived Exertion Scale (Williams, 2017), which asks children a single question about their subjective feeling of exertion after playing the VR games. The scale ranges from 6 (no exertion at all) to 20 (maximal exertion).

VR satisfaction was assessed on a 5-point Likert scale with questions about the children’s overall VR experience, such as levels of perceived fun, enjoyment, and motivation to complete the VR-CAT. Scores were averaged, with higher scores indicating greater satisfaction levels.

Simulator realism was measured by the participants’ reported perception of realism within the virtual world using a single 5-point Likert scale question: “Did you feel like you were inside the game?” Higher scores indicate higher levels of simulator realism.

Additional tasks included the NIH Toolbox Cognition Battery (Zelazo et al., 2013) and the parental Behavior Rating Inventory of Executive Function-2nd Edition (BRIEF-2) (Gioia et al., 2015). NIH Toolbox tasks administered for this study were the Flanker Inhibitory Control and Attention Test (inhibitory control), List Sorting Working Memory Test (working memory), and Dimensional Change Card Sort Test (cognitive flexibility). Age-corrected standard scores were reported and their average was labeled as NIH Toolbox Composite Score in this study, with higher scores indicating better EF. The BRIEF-2 T-scores (mean=50, SD=10, higher scores indicate worse EF) of interest for this study were those for the subscales Inhibit (inhibitory control), Working Memory (working memory), and Shift (cognitive flexibility), as well as the Global Executive Composite (GEC). Data from BRIEF-2 were excluded for three OI participants with “questionable” ratings (score of 3) on the Negativity, Inconsistency, or Infrequency validity scales.

Procedures

Assessment #1:

Both the VR-CAT and NIH Toolbox tasks. Test order within the Assessment #1 was randomized for each participant to minimize potential order effects. One parent (or legal guardian) completed the BRIEF-2. Participants’ VR experience, including simulator sickness, perceived exertion, satisfaction, and simulator realism, was also assessed.

Assessment #2:

VR-CAT only, conducted approximately three weeks (Table 1) after Assessment #1 to evaluate test-retest reliability.

Table 1.

Description of Study Sample, by Injury Type (N=54)

Demographics                                    OI (N=30)   TBI (N=24)   p
Sex (n, %)                                                               0.56
 Male                                           21 (70)     15 (63)
 Female                                         9 (30)      9 (38)
Age (median, IQR)                               11 (5)      14 (6)       0.23
Time between assessments, days (median, IQR)    21 (22)     7 (12)       0.01
Race (n, %)                                                              0.06
 White/Non-Hispanic                             15 (50)     20 (83)
 Black/Non-Hispanic                             10 (33)     3 (13)
 Hispanic                                       1 (3)       0 (0)
 Other                                          4 (13)      1 (4)
Days since injury (median, IQR)                 37 (30)     26 (15)      0.02
TBI-Only Characteristics
GCS (median, IQR)                               --          10 (11)
TBI Severity (n, %)
 Complicated Mild                               --          8 (35)
 Moderate                                       --          4 (17)
 Severe                                         --          11 (48)

Note. The OI (orthopedic injury) and TBI (traumatic brain injury) groups did not differ significantly in age, post injury duration (Wilcoxon rank sum test), sex (Chi-square test), or race (Fisher’s Exact Test). Missing: 5 missing (1 in TBI, 4 in OI) for Time between assessments, 4 missing (2 in TBI, 2 in OI) for Days since injury. For TBI group only, 1 missing for TBI severity, 1 missing for GCS

Data Analysis Plan

Step 1. Demographics and usability.

Descriptive information was provided for group characteristics. VR-CAT usability was evaluated by median and interquartile range (IQR) of simulator sickness scores, Borg perceived exertion scores, and VR satisfaction scores. Categorical data are presented as frequencies (percentages) and were compared using the Chi-squared test, or Fisher’s exact test if there were ≤ 5 cases per group; continuous variables are presented as median (IQR), with differences assessed using the Wilcoxon rank-sum test.

Step 2. Psychometric properties of the VR-CAT.

Test-retest reliability was evaluated by computing the Intraclass Correlation Coefficient (ICC) between participants’ performance on the two VR-CAT assessments. The ICC accounts for both consistency of performances between re-tests and change in subjects’ performance as a group over time, which is ideal for evaluating test-retest reliability (Vaz et al., 2013). Reliability was interpreted using the following guidelines: low (ICC: <0.5), moderate (ICC: 0.5~0.75), and high (ICC: 0.75~1.0) (Koo & Li, 2016). Face validity was defined as users’ self-reported perception of simulation realism in the VR environment (Schwebel et al., 2008) as evaluated by the single-item scale of simulator realism. A Wilcoxon rank-sum test was conducted for group comparisons. Concurrent validity was assessed by examining the correlation of VR-CAT scores and NIH Toolbox scores/parent BRIEF-2 scores.
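
As an illustration of the reliability analysis, the sketch below computes a two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1). The choice of this ICC form is an assumption; the text does not state which form was used, and the study's own analysis was run in SAS. The function names and the example scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: one row per participant, one column per assessment visit.
    Assumption: this ICC form is illustrative; the study does not name its form.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between visits
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def classify_reliability(icc):
    """Apply the interpretation guidelines cited in the text (Koo & Li, 2016)."""
    if icc < 0.5:
        return "low"
    if icc < 0.75:
        return "moderate"
    return "high"

# Hypothetical standard scores at Assessment #1 and Assessment #2 (not study data).
visits = [[98, 101], [110, 108], [85, 90], [120, 117], [102, 99]]
label = classify_reliability(icc_2_1(visits))
```

Because this form measures absolute agreement, a uniform shift between visits lowers the coefficient even when the rank order is preserved: for the toy input `[[1, 2], [2, 3], [3, 4]]` the value is 2/3 rather than 1.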

Step 3. Clinical utility.

Clinical utility was assessed by examining group differences on VR-CAT, NIH Toolbox, and BRIEF-2. Hedges’ g was calculated as an unbiased measure of effect size (Lakens, 2013), with higher values indicating larger group differences, using the following guidelines: small (Hedges’ g: 0.15~0.4), medium (Hedges’ g: 0.4~0.75), and large (Hedges’ g: >0.75) (Brydges, 2019). Additional analyses were conducted to examine clinical utility using area under the curve (AUC) and likelihood ratio (LR) statistics, with results from these analyses presented as supplementary materials.
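
For readers who want to check the effect sizes from summary statistics, the sketch below computes Hedges’ g as Cohen's d with a pooled standard deviation multiplied by the common small-sample correction factor. The exact formula used in the study is an assumption here, and the function names are hypothetical; applied to the group means, SDs, and sample sizes reported in Table 6, this form gives values close to the reported 0.725 (VR Inhibitory Control) and 0.964 (VR Working Memory).

```python
from math import sqrt

def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Hedges' g from group summary statistics.

    Assumption: pooled-SD Cohen's d times the approximate small-sample
    correction J = 1 - 3 / (4 * df - 1), with df = n_1 + n_2 - 2.
    """
    df = n_1 + n_2 - 2
    pooled_sd = sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df)
    cohens_d = (mean_1 - mean_2) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)
    return correction * cohens_d

def classify_effect(g):
    """Guidelines cited in the text (Brydges, 2019); 'negligible' below 0.15 is an assumed label."""
    size = abs(g)
    if size < 0.15:
        return "negligible"
    if size < 0.4:
        return "small"
    if size <= 0.75:
        return "medium"
    return "large"

# Group summary statistics as reported in Table 6 (OI first, then TBI).
g_inhibitory = hedges_g(104.25, 12.67, 30, 93.31, 17.27, 24)
g_working_memory = hedges_g(101.55, 13.97, 30, 88.82, 11.70, 24)
```

The sign of g depends on which group is entered first; with the OI group first, positive values mean the OI group scored higher.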

RESULTS

Sample Characteristics

Table 1 summarizes the basic demographic and clinical characteristics of the study sample. Wilcoxon rank sum test, Fisher’s exact test, and Chi-squared tests revealed no significant differences between the TBI and OI groups for all applicable variables except time between assessments and days since injury. Most children in the TBI group had moderate to severe injuries, but we were unable to compare injury severity between the TBI and OI groups due to lack of comparable injury severity measures applicable to both groups. We did not have data to directly compare children who were eligible and consented to participate to those who were ineligible or did not agree to participate.

Usability of VR-CAT

Table 2 presents data on the usability of VR-CAT. Both groups reported favorable levels of usability after using the VR-CAT, including low levels of simulator sickness symptoms and of perceived physical exertion, and high levels of fun playing the VR-CAT games, liking the VR-CAT games, motivation to use VR-CAT again in the future, desire to see more VR-based sessions in the hospital, and expected compliance with hypothetical future gaming sessions. Sex and age differences were found on select usability variables. Specifically, Wilcoxon rank sum test revealed that female participants reported more simulator sickness symptoms than male participants (M Female=0.13, M Male=0.06; p = .005), and correlations indicated that older age was associated with fewer reported simulator sickness symptoms (p = .02), less reported fun during the VR games (p = .001), and less motivation to return to clinical appointments (p = .01).

Table 2.

Usability: Means and Standard Deviations of Patients’ Subjective Experiences with VR-CAT

                                                     Groups
Variables (score range)                              Overall (N=54)   OI (N=30)     TBI (N=24)    p
Simulator Sickness (0–3)                             0.06 (0.13)      0.06 (0.13)   0.00 (0.09)   0.14
Physical Exertion (6–20)                             7.5 (4)          7.5 (3)       7.5 (5)       0.96
Had Fun (1–5)                                        4 (1)            4 (1)         4 (1)         0.84
Liked the VR Games (1–5)                             4 (1)            4.5 (1)       4 (1)         0.60
Wanted to Play Again in Future (1–5)                 4 (2)            4 (2)         4 (2)         0.81
Wanted to See VR in Hospitals (1–5)                  5 (1)            5 (1)         5 (2)         0.75
Motivated to Return to Clinical Appointments (1–5)   5 (1)            5 (1)         4.5 (2)       0.28

Missing: 1 missing for Liked the VR Games score.

Psychometric Properties of VR-CAT

Participants reported adequate levels of face validity (i.e., realism) for the VR-CAT in both the TBI group (M=3.21, SD=1.41) and the OI group (M=3.73, SD=1.11), with no significant group differences (p=0.1252). Table 3 presents the test-retest reliability of the VR-CAT. Overall, the VR Inhibitory Control task demonstrated the best test-retest reliability, followed by the VR Working Memory task and the VR Composite Score. The VR Cognitive Flexibility task showed the poorest test-retest reliability, a pattern also seen in subsequent psychometric analyses.

Table 3.

Test-Retest Reliability for VR-CAT

Variable                Overall (N=54)         OI (N=30)              TBI (N=24)
                        ICC    95% CI          ICC    95% CI          ICC    95% CI
Inhibitory Control      0.63   (0.46, 0.78)    0.65   (0.42, 0.83)    0.55   (0.28, 0.79)
Working Memory          0.51   (0.31, 0.71)    0.74   (0.54, 0.88)    0.22   (0.02, 0.77)
Cognitive Flexibility   0.04   (0.00, 0.99)    0.04   (0.00, 1.00)    0.02   (0.00, 1.00)
VR Composite            0.48   (0.28, 0.69)    0.56   (0.31, 0.78)    0.29   (0.06, 0.72)

ICC= intraclass correlation coefficient; CI=confidence interval (Koo & Li, 2016)

Table 4 presents the concurrent validity of the VR-CAT with the NIH Toolbox. Results indicated moderate albeit statistically significant correlations between the VR and NIH Toolbox tasks for the inhibitory control and working memory tasks, with small-to-medium effect sizes. Correlations of the VR Cognitive Flexibility task with the corresponding Toolbox task were not significant, with negligible effects. The VR Composite Score was significantly correlated with NIH Toolbox scores across the sample and subgroups with small-to-medium effects.

Table 4.

Concurrent Validity: Correlation Between VR-CAT and NIH Toolboxᵃ

Variable                Overall (N=54)    OI (N=30)        TBI (N=24)
                        r      p          r       p        r      p
Inhibitory Control      0.49   <0.01      0.41    0.03     0.47   0.02
Working Memory          0.49   <0.01      0.44    0.02     0.50   0.02
Cognitive Flexibility   0.02   0.87       −0.05   0.79     0.12   0.57
VR Composite            0.53   <0.01      0.40    0.03     0.57   <0.01

Note. ᵃ Correlations were calculated between specific corresponding tasks in VR and NIH Toolbox.

Table 5 presents the concurrent validity of the VR-CAT with BRIEF-2. Overall, no significant correlations were found between VR-CAT tasks and BRIEF-2 subscales and GEC scores, although the generally negative associations were as expected given that higher BRIEF-2 scores indicate worse EF skills. The only significant correlations were found within the TBI group, in which the VR Working Memory task was positively correlated with the BRIEF-2 Behavioral Regulation Index, while the VR Cognitive Flexibility task was negatively correlated with the BRIEF-2 Emotion Regulation Index and Global Executive Composite scores.

Table 5.

Concurrent Validity: Correlation Between VR-CAT and Parent-Reported BRIEF-2 Scores

          Overall (N=51)                          OI (N=27)ᵃ                              TBI (N=24)
BRIEF-2   VR_IC   VR_WM   VR_CF   VR Composite    VR_IC   VR_WM   VR_CF   VR Composite    VR_IC   VR_WM   VR_CF    VR Composite
BRI       −0.10   −0.02   −0.02   −0.08           −0.19   −0.14   0.26    −0.01           0.18    0.53*   −0.30    0.15
ERI       −0.05   −0.02   −0.25   −0.16           −0.25   −0.17   0.04    −0.13           0.25    0.37    −0.50*   0.02
CRI       −0.04   −0.06   −0.06   −0.08           −0.19   −0.08   0.29    0.05            0.26    0.26    −0.38    0.05
GEC       −0.06   −0.04   −0.13   −0.12           −0.21   −0.13   0.24    −0.01           0.24    0.39    −0.45*   0.05

* p<0.05; ** p<0.01

BRI=Behavioral Regulation Index; ERI=Emotion Regulation Index; CRI=Cognitive Regulation Index; GEC=Global Executive Composite; IC=inhibitory control; WM=working memory; CF=cognitive flexibility; VR=virtual reality

ᵃ BRIEF-2 data from three OI participants were excluded from analysis due to validity issues.

Clinical Utility

Table 6 presents the clinical utility of the VR-CAT in distinguishing between children with and without TBI. VR-CAT demonstrated a small effect size for the VR Composite Score (Hedges’ g = 0.312), smaller than the NIH Toolbox Composite Score (Hedges’ g = 0.756) and the BRIEF-2 Global Executive Composite score (Hedges’ g = 0.633). Subtask analysis indicated that the smaller overall effect size of VR-CAT was mostly due to a near-zero effect size for VR Cognitive Flexibility (Hedges’ g = 0.053), in contrast to large effect sizes for VR Inhibitory Control (Hedges’ g = 0.725) and VR Working Memory (Hedges’ g = 0.964). Findings from additional analyses of test utility based on AUC and LR were generally consistent with the primary findings (see Supplementary Table 1).

Table 6.

Clinical Utility: Distinguishing Between TBI and OI Based on Scores from VR-CAT, NIH Toolbox and BRIEF-2

Instruments                                       OI (N=30)        TBI (N=24)       Hedges’ g   95% CI of Hedges’ g
VR-CAT Scores
 VR Inhibitory Control                            104.25 (12.67)   93.31 (17.27)    0.725       (−3.245, 4.694)
 VR Working Memory                                101.55 (13.97)   88.82 (11.70)    0.964       (−2.507, 4.435)
 VR Cognitive Flexibility                         96.84 (15.24)    97.66 (15.15)    0.053       (−4.001, 4.107)
 VR Composite Score                               100.67 (10.32)   97.66 (8.40)     0.312       (−2.850, 2.227)
NIH Toolbox Age-Corrected Scores
 Flanker Inhibitory Control and Attention Test    92.47 (14.89)    84.21 (16.67)    0.518       (−3.670, 4.706)
 List Sorting Working Memory Test                 113.77 (18.57)   99.96 (17.48)    0.752       (−4.074, 5.579)
 Dimensional Change Card Sort Test                94.47 (13.56)    87.67 (17.02)    0.441       (−3.610, 4.492)
 NIH Toolbox Composite Score                      100.23 (11.12)   90.61 (14.13)    0.756       (−2.589, 4.101)
BRIEF-2 T Scoresᵃ
 Behavioral Regulation Index (BRI)                47.56 (9.89)     54.33 (10.78)    0.646       (−2.168, 3.478)
 Emotional Regulation Index (ERI)                 48.19 (7.70)     52.96 (10.32)    0.520       (−1.957, 2.997)
 Cognitive Regulation Index (CRI)                 48.33 (9.59)     54.42 (11.59)    0.567       (−2.336, 3.470)
 Global Executive Composite (GEC)                 48.04 (9.37)     54.79 (11.65)    0.633       (−2.249, 3.515)

ᵃ BRIEF-2 data from three OI participants were excluded from analysis due to validity issues.

DISCUSSION

The present study found that the VR-CAT had adequate usability and face validity (i.e., VR realism), modest test-retest reliability and concurrent validity for inhibitory control and working memory, and acceptable clinical utility. The generally poorer performance of children with TBI on EF tasks compared to children with OI is consistent with existing literature (Keenan et al., 2018; Krasny-Pacini et al., 2017; Shen et al., 2020).

Cognitive assessment is integral to monitoring a child’s recovery from TBI and should be completed at regular intervals after injury. However, numerous factors contribute to whether follow-up evaluations happen after a child with TBI leaves the hospital. One factor that might be relevant is motivation, which could be fostered by the adoption of innovative assessment modalities such as virtual reality video games (Winnick et al., 2005). This is consistent with the study finding that children from both the TBI and OI groups reported high levels of motivation to return for future appointments. But a causal relationship between the use of VR and level of motivation for follow-up appointments cannot be asserted from the present study because we did not ask children about the reasons behind their elevated motivation. Children may have reported high levels of motivation to attend follow-up appointments not because of the VR, but for other factors such as enjoying the attention they received from the researchers.

In analyses examining sex and age differences in the VR experience, female participants reported more simulator sickness than male participants, and older participants reported less simulator sickness, less fun, and less motivation for follow-up appointments. We suspect that these sex- and age-based differences were at least partly due to a third variable: experience with VR/general video gaming. For example, older participants may be more likely to have prior experience with VR or general video games, resulting in higher tolerance of the VR-CAT (fewer symptoms), but also higher expectations for a fun experience to motivate them to return for follow-up appointments. Unfortunately, this possibility cannot be confirmed as data on prior VR/video game experiences were not collected. Future research is needed to determine if age and sex differences can be replicated in a larger sample and with control of potential confounds such as prior gaming experience.

Psychometric properties are critical if a tool like the VR-CAT is to be considered as a supplement to existing neuropsychological measures (Roberts & Priest, 2006). The findings provided preliminary but encouraging evidence for the reliability and validity for VR-CAT, particularly for its inhibitory control and working memory tasks. Notably, reliability in the OI group appeared generally higher than that in the TBI group. Although this finding requires replication, it is possible that brain injury contributed to less stable levels of engagement in these tasks or that more variable performance in the TBI group reflected recovery-related changes in brain functioning between the two assessments compared to the relatively stable functioning of the children in the OI group (Lindsey et al., 2019).

The study found, in general, little correlation between BRIEF-2 scores and VR-CAT performance. For example, although the BRIEF-2 GEC score showed a non-significant but negative correlation with VR-CAT tasks for the overall sample, the only significant correlation we found for GEC score was for the VR Cognitive Flexibility Task. However, caution is required in interpreting this finding given the low reliability and concurrent validity of the VR Cognitive Flexibility task. We suspect the most probable explanation for the non-significant correlation between VR-CAT and BRIEF-2 might be that objective and subjective EF measures tend not to correlate with each other, as reflected in existing neuropsychological literature (Gilboa et al., 2019; Soto et al., 2020). This is not surprising given the fact that objective and subjective EF measures collect data from different information sources (child vs. caregiver) across the two clinical groups. For example, caregivers of children with TBI might be more attentive to changes in their children’s EF-related behaviors than caregivers of children with OI, because EF impairment is a more common phenomenon in children suffering from a brain injury than an orthopedic injury. Such differences in attention might lead to distinctive perceived frequencies or different scoring thresholds when reporting their children’s daily behavior frequencies on BRIEF-2, leading to a possible divergence between VR and BRIEF-2 scores. Another possibility could be data outliers. However, upon examination of the distribution of BRIEF-2 scores, we found no evidence that extreme values were responsible for this unexpected association. Regardless of possible mechanisms, caution is needed when interpreting the results until further research can replicate the findings.

Finally, the study found acceptable clinical utility of the VR-CAT. Children in the OI group showed higher performance than children with TBI on two VR tasks and the overall VR Composite Score, supporting the capability of the VR-CAT to distinguish between children with and without TBI. The effect sizes for group differences on VR Inhibitory Control and VR Working Memory were even slightly higher than effect sizes for the more established EF measures like the NIH Toolbox and BRIEF-2. Further development of the VR-CAT is needed to strengthen its clinical utility and better establish the construct validity of the tasks used to assess the three components of EF.

Limitations and Future Directions

The findings should be interpreted with caution. First, the current study used a cross-sectional design with the assessment delivered only via VR. To better understand the usability of various delivery modes and improve follow-up assessments for children with TBI, future research should compare VR with other delivery modes, such as tablets or paper and pencil. Second, this study is limited by its small sample size, which reduces the generalizability of the findings and precludes control for important socio-demographic covariates such as race/ethnicity or SES. For example, due to the small cell sizes in several subcategories of the race/ethnicity variable (as small as 0), we had insufficient statistical power to conduct inferential analyses with race/ethnicity as a covariate, limiting our understanding of how such socio-demographic factors might be related to usability and EF performance outcomes. A third limitation is that only two visits were available to assess test-retest reliability, with varied gaps between visits for the TBI and OI groups due to scheduling accommodations for families. Fourth, concurrent validity was assessed only in relation to the NIH Toolbox and BRIEF-2. Fifth, although the VR-CAT was installed on a mobile station that could be utilized anywhere in a clinical setting, it is not accessible in other settings such as homes. Future research should adapt the system to online and/or mobile computing platforms. This option may be particularly useful not only during a global pandemic, but also at other times for increased accessibility to long-term follow-up services. Finally, the gaming content of the current iteration of the VR-CAT is more fantasy-based than reality-based, which may limit generalization to real-world scenarios. Given the immersive realism provided by VR technology and the positive usability data from the present study, an important direction for future research is to develop more realistic tasks within the VR platform, with the long-term goal of achieving higher ecological validity of VR-based EF assessment among pediatric patients with TBI.
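To make the test-retest limitation concrete: with only two visits, reliability is estimated from a single pair of scores per child. The sketch below computes one common form of the intraclass correlation coefficient, a two-way, consistency, single-measurement ICC (often written ICC(3,1)). This choice of form is an assumption made for illustration; the form used in this study is specified in the Method and may differ. The function name `icc_consistency` and the scores are hypothetical and are not study data.

```python
from statistics import mean


def icc_consistency(scores):
    """ICC(3,1): two-way, consistency, single measurement.

    scores: list of rows, one row per child, one column per visit.
    """
    n = len(scores)        # number of children
    k = len(scores[0])     # number of visits
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between children
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between visits
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical visit 1 / visit 2 scores for three children (not study data).
print(icc_consistency([[1, 2], [2, 1], [3, 3]]))
```

With two visits and a small sample, the error term rests on very few degrees of freedom, so the estimate is imprecise. Unequal gaps between visits add further variability that this simple model does not separate from measurement error.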

CONCLUSIONS

The present study is among the first to rigorously evaluate the usability, reliability, validity, and clinical utility of a VR-based EF assessment tool designed for children with TBI. The study found modest test-retest reliability, concurrent validity, and clinical utility in assessing inhibitory control and working memory. Future research should include more realistic VR tasks, a larger sample size, more comprehensive outcome measures, and greater accessibility via online/mobile platforms.

Supplementary Material

Supplemental Material 1
Supplemental Material 2

Impact:

  • VR-CAT is among the first virtual reality assessment tools for measuring executive functions specifically designed for children with TBI

  • The study employed an age- and sex-matched control group of non-TBI patients for rigorous evaluation of VR-CAT

  • The study found high usability and promising psychometric properties of VR-CAT

  • Future research should continue refining VR-CAT as a promising tool for early detection of EF impairment among pediatric brain injuries.

Acknowledgments:

Research reported in this publication was supported by the Ohio Department of Public Safety (ODPS) Emergency Medicine Service (EMS) Grant Program and the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health under award numbers K99HD093814 and R00HD093814. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Thanks to Kimberly Lever, Deborah Grayson, and Megan Ramsden for assistance with participant recruitment and data collection. Special thanks to Tyler Busch for assisting with quality control of study data.

Footnotes

Disclosures:

Authors declare no conflict of interest.

REFERENCES

  1. Brydges CR (2019). Effect size guidelines, sample size calculations, and statistical power in gerontology. Innovation in aging, 3(4), igz036.
  2. Carruthers K, Zampieri C, & Damiano D (2014). Relating motor and cognitive interventions in animals and humans. Translational Neuroscience, 5(4), 227–238.
  3. Centers for Disease Control and Prevention. (2015). Report to congress on traumatic brain injury in the United States: epidemiology and rehabilitation. National Center for Injury Prevention and Control, 1–72.
  4. Chelune GJ, & Baer RA (1986). Developmental norms for the Wisconsin Card Sorting test. Journal of clinical and experimental neuropsychology, 8(3), 219–228.
  5. Diamantopoulou S, Rydell A-M, Thorell LB, & Bohlin G (2007). Impact of executive functioning and symptoms of attention deficit hyperactivity disorder on children’s peer relations and school performance. Developmental neuropsychology, 32(1), 521–542.
  6. Diamond A (2013). Executive functions. Annual review of psychology, 64, 135–168.
  7. Diamond A, & Lee K (2011). Interventions shown to aid executive function development in children 4 to 12 years old. Science, 333(6045), 959–964.
  8. Erez N, Weiss PL, Kizony R, & Rand D (2013). Comparing performance within a virtual supermarket of children with traumatic brain injury to typically developing children: A pilot study. OTJR: occupation, participation and health, 33(4), 218–227.
  9. Gilboa Y, Jansari A, Kerrouche B, Uçak E, Tiberghien A, Benkhaled O, Aligon D, Mariller A, Verdier V, & Mintegui A (2019). Assessment of executive functions in children and adolescents with acquired brain injury (ABI) using a novel complex multi-tasking computerised task: The Jansari assessment of Executive Functions for Children (JEF-C©). Neuropsychological rehabilitation, 29(9), 1359–1382.
  10. Gioia GA, Isquith PK, Guy SC, & Kenworthy L (2015). BRIEF-2: Behavior rating inventory of executive function. Psychological Assessment Resources, Lutz, FL.
  11. Hiu Lam W, & Mackersie A (1999). Paediatric head injury: incidence, aetiology and management. Pediatric Anesthesia, 9(5), 377–385.
  12. Jansari AS, Devlin A, Agnew R, Akesson K, Murphy L, & Leadbetter T (2014). Ecological assessment of executive functions: a new virtual reality paradigm. Brain Impairment, 15(2), 71–87.
  13. Kang YJ, Ku J, Han K, Kim SI, Yu TW, Lee JH, & Park CI (2008). Development and clinical trial of virtual reality-based cognitive assessment in people with stroke: preliminary study. CyberPsychology & Behavior, 11(3), 329–339.
  14. Keenan HT, & Bratton SL (2006). Epidemiology and outcomes of pediatric traumatic brain injury. Developmental neuroscience, 28(4–5), 256–263.
  15. Keenan HT, Clark AE, Holubkov R, Cox CS, & Ewing-Cobbs L (2018). Psychosocial and executive function recovery trajectories one year after pediatric traumatic brain injury: the influence of age and injury severity. Journal of neurotrauma, 35(2), 286–296.
  16. Kennedy RS, Lane NE, Berbaum KS, & Lilienthal MG (1993). Simulator sickness questionnaire: An enhanced method for quantifying simulator sickness. The international journal of aviation psychology, 3(3), 203–220.
  17. Koo TK, & Li MY (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine, 15(2), 155–163.
  18. Krasny-Pacini A, Chevignard M, Lancien S, Escolano S, Laurent-Vannier A, De Agostini M, & Meyer P (2017). Executive function after severe childhood traumatic brain injury–Age-at-injury vulnerability periods: The TGE prospective longitudinal study. Annals of physical and rehabilitation medicine, 60(2), 74–82.
  19. Lakens D (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in psychology, 4, 863.
  20. Lebby PC, & Asbell SJ (2007). The Source for TBI: Children & Adolescents. LinguiSystems.
  21. Lebby PC, Pollock M, Mouanoutoua A, & Lewey JH (2015). Performance of children and adolescents with brain injury on the Lebby-Asbell neurocognitive screening examination for children and adolescents. Journal of child neurology, 30(10), 1255–1262.
  22. Leblanc N, Chen S, Swank PR, Ewing-Cobbs L, Barnes M, Dennis M, Max J, Levin H, & Schachar R (2005). Response inhibition after traumatic brain injury (TBI) in children: Impairment and recovery. Developmental neuropsychology, 28(3), 829–848.
  23. Lindsey HM, Wilde EA, Caeyenberghs K, & Dennis EL (2019). Longitudinal neuroimaging in pediatric traumatic brain injury: current state and consideration of factors that influence recovery. Frontiers in neurology, 10, 1296.
  24. Mangeot S, Armstrong K, Colvin AN, Yeates KO, & Taylor HG (2002). Long-term executive function deficits in children with traumatic brain injuries: Assessment using the Behavior Rating Inventory of Executive Function (BRIEF). Child Neuropsychology, 8(4), 271–284.
  25. Neguț A, Matu S-A, Sava FA, & David D (2016). Virtual reality measures in neuropsychological assessment: a meta-analytic review. The Clinical Neuropsychologist, 30(2), 165–184.
  26. Pugnetti L, Mendozzi L, Attree EA, Barbieri E, Brooks BM, Cazzullo CL, Motta A, & Rose FD (1998). Probing memory and executive functions with virtual reality: Past and present studies. CyberPsychology & Behavior, 1(2), 151–161.
  27. Rand D, Katz N, & Weiss PL (2007). Evaluation of virtual shopping in the VMall: Comparison of post-stroke participants to healthy control groups. Disability and rehabilitation, 29(22), 1710–1719.
  28. Rand D, Rukan SB-A, Weiss PL, & Katz N (2009). Validation of the Virtual MET as an assessment tool for executive functions. Neuropsychological rehabilitation, 19(4), 583–602.
  29. Riggs NR, Blair CB, & Greenberg MT (2004). Concurrent and 2-year longitudinal relations between executive function and the behavior of 1st and 2nd grade children. Child Neuropsychology, 9(4), 267–276.
  30. Roberts P, & Priest H (2006). Reliability and validity in research. Nursing standard, 20(44), 41–46.
  31. Scheibel RS, & Levin HS (1997). Frontal lobe dysfunction following closed head injury in children: Findings from neuropsychology and brain imaging.
  32. Schwebel DC, Gaines J, & Severson J (2008). Validation of virtual reality as a tool to understand and prevent child pedestrian injury. Accident Analysis & Prevention, 40(4), 1394–1400.
  33. Scott JG, & Schoenberg MR (2011). Frontal lobe/executive functioning. In The little black book of neuropsychology (pp. 219–248). Springer.
  34. Shen J, Xiang H, Luna J, Grishchenko A, Patterson J, Strouse RV, Roland M, Lundine JP, Koterba CH, & Lever K (2020). Virtual Reality–Based Executive Function Rehabilitation System for Children With Traumatic Brain Injury: Design and Usability Study. JMIR serious games, 8(3), e16947.
  35. Slomine B, Eikenberg J, Salorio C, Suskauer S, Trovato M, & Christensen J (2008). Preliminary evaluation of the Cognitive and Linguistic Scale: A measure to assess recovery in inpatient rehabilitation following pediatric brain injury. The Journal of Head Trauma Rehabilitation, 23(5), 286–293.
  36. Soto EF, Kofler MJ, Singh LJ, Wells EL, Irwin LN, Groves NB, & Miller CE (2020). Executive functioning rating scales: Ecologically valid or construct invalid? Neuropsychology, 34(6), 605.
  37. Vasa RA, Suskauer SJ, Thorn JM, Kalb L, Grados MA, Slomine BS, Salorio CF, & Gerring JP (2015). Prevalence and predictors of affective lability after paediatric traumatic brain injury. Brain injury, 29(7–8), 921–928.
  38. Vaz S, Falkmer T, Passmore AE, Parsons R, & Andreou P (2013). The case for using the repeatability coefficient when calculating test–retest reliability. PloS one, 8(9), e73990.
  39. Vercellini D, Cunningham B, Lebby P, & Canfield M (2016). Psychometric Test Development and Assessment-2 Structural Analysis of the Lebby-Asbell Neurocognitive Screening Examination: Bridging the Gap Between Mental Status Examinations and Neuropsychological Batteries. Archives of Clinical Neuropsychology, 31(6), 576–576.
  40. Watson WD, Suskauer SJ, Askin G, Nowak S, Baum KT, Gerber LM, Blackwell LS, Koterba CH, Hoskinson KR, & Kurowski BG (2021). Cognitive recovery during inpatient rehabilitation following pediatric traumatic brain injury: a pediatric brain injury consortium study. Journal of head trauma rehabilitation, 36(4), 253–263.
  41. Willcutt EG, Doyle AE, Nigg JT, Faraone SV, & Pennington BF (2005). Validity of the executive function theory of attention-deficit/hyperactivity disorder: a meta-analytic review. Biological psychiatry, 57(11), 1336–1346.
  42. Williams N (2017). The Borg rating of perceived exertion (RPE) scale. Occupational Medicine, 67(5), 404–405.
  43. Winnick S, Lucas DO, Hartman AL, & Toll D (2005). How do you improve compliance? Pediatrics, 115(6), e718–e724.
  44. Zelazo PD, Anderson JE, Richler J, Wallner-Allen K, Beaumont JL, & Weintraub S (2013). II. NIH Toolbox Cognition Battery (CB): Measuring executive function and attention. Monographs of the Society for Research in Child Development, 78(4), 16–33.
