Abstract
Background
Executive function is commonly impaired in individuals with mild cognitive impairment (MCI). Traditional tools like the Stroop test are widely used to evaluate this domain but lack ecological validity. Virtual reality (VR)-based cognitive assessments, grounded in embodied cognition, may offer a more immersive and sensitive approach to detecting subtle executive dysfunction.
Methods
This study developed and validated a novel VR-based Stroop Test (VRST) that simulates a real-life clothing-sorting task involving incongruent word-color stimuli. A total of 413 older adults (224 healthy controls and 189 with MCI) completed the VRST using a hand-held controller. Behavioral metrics including task completion time, three-dimensional (3D) trajectory length, and hesitation latency were collected. Participants also underwent traditional assessments: the Korean version of the Montreal Cognitive Assessment (MoCA-K), the paper-based Stroop test, and the Corsi Block Test (CBT). Receiver operating characteristic curves and Spearman correlations were used to analyze discriminant power and construct validity.
Results
All VR-derived behavioral markers effectively differentiated older adults with MCI from healthy controls (HCs), with 3D trajectory length showing the highest area under the curve (0.981), followed by hesitation latency (0.967). These surpassed the MoCA-K (0.962). Significant correlations were observed between VRST outcomes and global cognition (MoCA-K), inhibition (Stroop), and working memory (CBT), supporting convergent validity. Importantly, baseline motor abilities did not significantly differ between groups, suggesting that executive function could have contributed to performance differences.
Conclusions
The VRST provides a valid, reliable, and scalable means of detecting MCI-related executive dysfunction through embodied cognitive-motor interaction. Its ability to capture subtle behavioral changes in a realistic context suggests strong potential for use in both clinical and community-based cognitive screening settings.
Trial registration: This study was retrospectively registered in the Thai Clinical Trial Registration with identifier TCTR 20250625011.
Keywords: Mild cognitive impairment, Virtual reality, Embodied cognition, Inhibitory control, Executive function, Digital biomarker, Stroop test
Introduction
Mild cognitive impairment (MCI) represents a prodromal phase of Alzheimer’s disease (AD), characterized by noticeable declines in memory and executive functioning that exceed typical age-related changes [1]. The growing emphasis on early intervention underscores the importance of regularly monitoring individuals with MCI to detect potential progression to AD [2]. While neuropsychological screening tools remain the primary method for identifying MCI in clinical practice, their limitations are well-documented [3–5]. Specifically, the Montreal Cognitive Assessment (MoCA), widely used to discriminate MCI, has shown relatively low sensitivity, reducing its utility in accurately distinguishing early MCI from normal cognitive aging [4–6]. Other screening tools such as the Clock Drawing Test and the Mini-Cog have also been used to distinguish MCI. However, similar to the MoCA, these tools exhibit limited sensitivity and specificity. Consequently, existing screening tools remain insufficient for reliably detecting MCI [7–9].
In addition to cognitive impairments, individuals with MCI have also been reported to exhibit cognitive-related motor dysfunction, particularly in upper-extremity functions [10]. A previous study demonstrated that smartphone-derived keystroke dynamics, which reflect fine motor control during spontaneous cognitive-motor interaction, could discriminate individuals with MCI from cognitively healthy older adults with a sensitivity of 97.9% and specificity of 96.9%—both of which markedly exceed those of conventional screening tools [11]. Growing interest has emerged in digital biomarkers, especially those generated through mobile and wearable technologies, as they offer promising avenues for MCI detection given their sensor sensitivity and practical applicability [11, 12]. From this perspective, behavioral metrics collected during immersive virtual reality (VR)-based cognitive-motor tasks—such as hand movement trajectory and response hesitation—have shown measurable differences between individuals with MCI and cognitively healthy older adults, highlighting their potential as digital biomarkers for early screening [13, 14]. While smartphone-based methods such as keystroke analysis offer high diagnostic performance and practical advantages in real-world environments, they are best suited for passive or continuous monitoring. In contrast, VR-based assessments allow for structured, ecologically immersive task environments in clinical settings. This enables precise manipulation of stimuli and collection of embodied behavioral responses under standardized conditions.
Accordingly, VR–based behavioral features have recently gained attention as promising digital biomarkers for the early detection of MCI. Patients with MCI exhibited slower and more erratic hand movements, more scattered gaze patterns, longer task completion times, and higher error rates in a virtual kiosk environment compared to healthy controls, indicating the diagnostic value of immersive VR behavioral markers [13]. These indicators, derived from ecologically valid scenarios simulating instrumental activities of daily living, were significantly correlated with neuropsychological measures such as executive function, attention, and visuospatial abilities. In addition, another study using a VR-based multi-domain cognitive screening test showed that variables such as response latency, hand movement efficiency, and error frequency could effectively differentiate MCI from normal aging with high diagnostic sensitivity and user acceptability [14]. These findings highlight the growing evidence that VR-enabled cognitive assessments—particularly those capturing motor and behavioral responses during complex cognitive tasks—can offer a powerful supplement to conventional screening tools in the early identification of MCI.
While prior research utilizing VR-derived behavioral markers has offered promising directions for early MCI detection, several limitations restrict their clinical applicability. Many studies have assessed performance in virtual simulations of daily tasks—such as kiosk use—extracting behavioral features to differentiate MCI from healthy aging [13, 14]. However, these immersive environments often lack clear cognitive targeting, especially of domains like inhibitory control, making it difficult to attribute observed behaviors to specific dysfunctions. Moreover, the simultaneous engagement of memory, attention, and planning complicates the interpretation of cognitive load sources, reducing construct specificity [15, 16]. Another key issue is the inadequate control of baseline motor function. While motor metrics like hand movements are often used as cognitive proxies, their validity is limited if baseline motor function differs between groups. Demonstrating no initial motor differences is crucial to attribute later motor discrepancies to cognitive impairment [13, 14]. Finally, most VR-based studies have not benchmarked their markers against standardized tools like the Korean version of MoCA (MoCA-K), leaving their diagnostic utility and clinical added value uncertain without direct comparative analyses [14, 17].
Therefore, this study aimed to examine the efficacy of behavioral digital markers derived from the Virtual Reality Stroop Test (VRST) in distinguishing individuals with MCI from cognitively healthy older adults. By leveraging a task specifically designed to engage inhibitory control within an ecologically valid virtual environment, this study further sought to assess whether the classification performance of VR-based behavioral markers could be enhanced through comparison with a conventional screening tool.
Methods
Participants
Participants were recruited from senior and daycare centers in Seoul and Asan, South Korea, and were divided into two groups: 224 healthy controls (HCs) and 189 individuals with MCI. Based on criteria from a previous study [1], inclusion criteria for MCI involved: (1) subjective complaints of memory decline; (2) objective memory impairment compared to age- and education-matched HCs, defined by performance at least 1.5 standard deviations below the mean on a neuropsychological test battery (The Seoul Verbal Learning Test for verbal memory; The Rey Complex Figure Test for visual memory); (3) preserved general cognitive abilities as assessed by the Cognitive Impairment Screening Test; (4) independence in daily living activities; and (5) no prior experience using VR. Exclusion criteria included: (1) a clinical diagnosis of dementia, (2) the presence of neurological or psychiatric conditions such as stroke or depression, and (3) significant visual or auditory impairments. These criteria are in line with the original Petersen criteria, which define MCI primarily as memory-specific (amnestic MCI).
The study was approved by the Institutional Review Board of Soonchunhyang University and conducted in accordance with established ethical guidelines for research involving human participants. All participants gave written informed consent after receiving comprehensive explanations regarding the study’s aims, procedures, and their rights, including the freedom to withdraw at any point without any negative consequences.
Procedure
All participants performed the MoCA-K [18], the conventional Stroop test [19], and the computerized Corsi Block Test (CBT) [20, 21] to assess baseline cognitive performance. Subsequently, the relationship between these assessments and VR-based behavioral markers was examined. In addition, to rule out the influence of initial motor function differences, baseline upper extremity function was evaluated using the Box and Block Test [22] and the Grooved Pegboard Test [23].
All participants then completed the VRST developed for this study. Prior to VRST administration, all participants were guided through a tutorial session of at least 10 min to familiarize them with the VR interface, including using the hand controller. All participants were instructed to use their dominant hand for controlling the device. After completing the tutorial, participants were given a 10-minute break and were then seated with both feet flat on the floor. The VRST was conducted in a quiet, private room, with the examiner standing nearby to ensure safety without interfering with the task. Each session of the VRST lasted approximately 2 min and was repeated three times, with a 30-second break between trials. A floor space of 3 × 3 m was cleared in advance to allow unobstructed controller movement.
Outcome measures
The VRST was intentionally developed to simulate an ecologically valid scenario while engaging inhibitory control through a reverse Stroop paradigm. In this task, participants were required to ignore a salient but task-irrelevant perceptual feature—namely, the color of the object—and instead categorize virtual items (e.g., shirts, pants, socks, or shoes) based on their semantic identity. For instance, a yellow shirt would appear in the virtual space, and participants had to grasp and move it into a storage box labeled “shirts,” which might itself be a different color (e.g., red) (Fig. 1). Although the VRST included only incongruent stimuli, this design was intentional to induce continuous inhibitory demands while minimizing task duration and cognitive fatigue, particularly for older adult participants. By requiring suppression of a salient but task-irrelevant visual feature (color) in every trial, the task was structured to elicit sustained inhibitory control, consistent with the core cognitive principle of the traditional Stroop paradigm [24].
Fig. 1.
The screenshot of the VRST: participants used embodied interaction to sort incongruently labeled clothing items into matching category boxes
The task was implemented in Unity (Unity Technologies, San Francisco, CA, USA) and presented on a 23-inch LCD monitor (1920 × 1080 resolution, 60 Hz refresh rate) without the use of a head-mounted display. To enhance accessibility and minimize cybersickness, participants interacted with the virtual environment on a desktop setup using the HTC Vive Controller (HTC Corp., Taiwan). All behavioral responses were captured at a sampling rate of 90 Hz through Unity’s XR Interaction Toolkit and exported as CSV files for subsequent analysis.
The VRST developed for this study produced three behavioral outcome metrics. First, the total completion time was recorded, representing the duration required to correctly sort all 20 stimuli. Second, the 3D trajectory length of the controller was measured across the x, y, and z axes to reflect the physical effort exerted during task performance. Third, hesitation latency was captured, defined as the time interval between pressing the controller button and the initiation of object movement, indicating motor hesitation or cognitive processing delay. These three measures were not arbitrarily selected but instead represent the full set of behavioral data that could be technically captured and analyzed based on the design of the VR system. All metrics were extracted using Unity’s Interaction Toolkit and the HTC Vive controller’s output, which defined the system’s measurable behavioral parameters. Although their selection was based on technical feasibility, each metric aligns with established cognitive theories. Completion time is associated with general processing speed and attentional control, reflecting cognitive efficiency as conceptualized in Sternberg’s memory scanning paradigm [25]. Hesitation latency may indicate deficits in response initiation and controlled processing, consistent with Schneider and Shiffrin’s dual-process theory [26, 27]. The 3D trajectory length reflects inefficiencies in motor planning and spatial control, which may stem from impaired working memory or inhibitory regulation—functions often disrupted in early stages of MCI, and modeled in part by motor-cognitive integration frameworks such as the diffusion decision model [28]. To ensure that the VRST measured inhibitory control rather than being confounded by motor variability, several design constraints were implemented. 
The virtual environment remained static with no moving distractors, and participants performed the task while seated, using only their dominant hand within a defined interaction space. Each trial presented a single virtual item, which participants were instructed to grasp and move into a category-specific target box using a handheld controller.
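To make the definitions of 3D trajectory length and hesitation latency concrete, the following is a minimal Python sketch and not the study's Unity implementation. It assumes that each logged sample is a tuple `(t, x, y, z)` with time in seconds and position in centimeters, and that movement onset is detected when the controller has moved more than a displacement threshold from its position at the button press. The function names, the sample layout, and the 1 cm `onset_threshold` are hypothetical choices made for illustration; the source does not state how movement initiation was detected.

```python
import math

def trajectory_length(samples):
    """Sum of Euclidean distances between consecutive (t, x, y, z) samples."""
    total = 0.0
    for (_, x0, y0, z0), (_, x1, y1, z1) in zip(samples, samples[1:]):
        total += math.dist((x0, y0, z0), (x1, y1, z1))
    return total

def hesitation_latency(samples, press_time, onset_threshold=1.0):
    """Time from button press to movement onset.

    Onset is ASSUMED to be the first sample whose distance from the
    position at the press exceeds onset_threshold (same unit as positions).
    Returns None if no such sample exists.
    """
    after_press = [s for s in samples if s[0] >= press_time]
    if not after_press:
        return None
    _, px, py, pz = after_press[0]
    for t, x, y, z in after_press:
        if math.dist((px, py, pz), (x, y, z)) > onset_threshold:
            return t - press_time
    return None

# Synthetic 90 Hz recording: 0.5 s at rest, then 10 steps of 0.5 cm along x.
RATE_HZ = 90
samples = [(i / RATE_HZ, 0.0, 0.0, 0.0) for i in range(45)]
samples += [((45 + k) / RATE_HZ, 0.5 * (k + 1), 0.0, 0.0) for k in range(10)]

length = trajectory_length(samples)                       # path length in cm
latency = hesitation_latency(samples, press_time=0.0)     # seconds
```

Summing segment lengths between consecutive samples means that detours and corrective movements lengthen the measured path even when the start and end points are identical, which is why this quantity can reflect movement inefficiency rather than reach distance alone.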
The original MoCA was developed to discriminate MCI from normal aging, and its domains consist of visuospatial/executive function, attention, memory, language, abstraction, and orientation [29]. Its score ranges from 0 to 30 points, where higher scores indicate better global cognitive function. The MoCA-K was adapted based on the original MoCA [18]. The cutoff score of the MoCA-K for MCI was 23, and 1 point could be added for participants with fewer than 6 years of education. In a previous study with a cutoff score of 23, its sensitivity and specificity were 94.2% and 40.5%, respectively [18].
The traditional Stroop test employed the word–color interference condition using the Korean version of the Stroop task. Participants were instructed to name the ink color of 112 color-word stimuli printed on an A4 sheet, in which the word meaning and the ink color were always incongruent. Testing continued until all items were completed or 2 min had elapsed. The completion time and the number of correct/error responses were recorded by an examiner using a stopwatch [19]. In the case of the traditional Stroop test, only the total completion time was used for analysis in this study. This is distinct from the VRST, for which all three behavioral markers—completion time, hesitation latency, and 3D trajectory length—were included in the analysis.
The computerized CBT test was employed to evaluate participants’ working memory capacity. Given that individuals with MCI often experience impairments in both memory and executive functioning [1], a working memory test incorporating both memory and executive control was selected [30]. During the test, nine white squares were randomly arranged on a tablet screen. A subset of these squares sequentially changed color from white to red, and participants were instructed to remember both the locations and the order of the changes. Subsequently, they were required to recall the sequence by touching the corresponding squares in the same forward order [20, 21]. Each trial involved changing five squares, with fifteen trials in total. In this study, accuracy was analyzed as an outcome.
The Box and Block Test is a standardized measure of gross manual dexterity. During the task, participants were instructed to move as many 2.5 cm wooden blocks as possible from one compartment of a box to the other using one hand, within a time limit of 60 s. The total number of blocks successfully transferred was recorded. This test is widely used in both clinical and research settings to assess upper limb function and has demonstrated strong reliability and validity for evaluating motor performance in older adults and neurological populations [22]. In the present study, only the performance of the dominant hand was used for analysis.
The Grooved Pegboard Test (Lafayette instruments # 32025) is designed to evaluate fine motor coordination, manual dexterity, and psychomotor speed. The test requires participants to insert grooved metal pegs into matching keyhole-shaped slots on a 5 × 5 matrix board as quickly as possible. The pegs must be properly rotated to align with the slot orientation, introducing a visuomotor challenge. Completion time and number of drops or errors are typically recorded. This test is sensitive to changes in neurological function and is commonly used to assess motor processing deficits in aging and cognitive impairment contexts [23]. In the present study, only the dominant hand completion time was used for analysis.
Statistical analysis
SPSS for Windows (version 22.0; IBM Corp) was used to analyze the data. The demographic characteristics of the participants were analyzed using descriptive statistics. Independent 2-tailed t tests and chi-square tests were used to compare the two groups. A receiver operating characteristic (ROC) curve analysis was used to confirm sensitivity and specificity, and a cutoff score for MCI was determined according to the highest Youden index (sensitivity + specificity − 1), a commonly used criterion for choosing an optimal cutoff score. Spearman correlations were computed to examine the relationship between VR-based behavioral markers and global cognitive function, inhibition, and working memory. Statistical significance was set at p < 0.05.
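The cutoff selection described above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the SPSS procedure used in the study. It assumes that candidate cutoffs are the midpoints between consecutive observed values and that the direction of the test is specified explicitly (higher scores indicate MCI for the VRST metrics, lower scores for the MoCA-K). The function name `youden_cutoff` and the example values are hypothetical.

```python
def youden_cutoff(scores, labels, higher_is_positive=True):
    """Return (J, cutoff, sensitivity, specificity) maximizing Youden's J.

    labels: 1 for MCI, 0 for healthy control.
    Candidate cutoffs are midpoints between sorted unique scores (assumption).
    """
    values = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for c in candidates:
        if higher_is_positive:
            predicted = [s > c for s in scores]
        else:
            predicted = [s < c for s in scores]
        true_pos = sum(p and y == 1 for p, y in zip(predicted, labels))
        true_neg = sum((not p) and y == 0 for p, y in zip(predicted, labels))
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        j = sensitivity + specificity - 1
        # Keep the first cutoff that reaches the maximum J.
        if best is None or j > best[0]:
            best = (j, c, sensitivity, specificity)
    return best

# Hypothetical trajectory lengths (cm): five controls, four participants with MCI.
scores = [21.0, 22.0, 23.0, 24.0, 25.0, 24.5, 26.0, 27.0, 28.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
j, cutoff, sens, spec = youden_cutoff(scores, labels)
```

For a measure on which lower values indicate impairment, such as a MoCA-K total score, the same function can be called with `higher_is_positive=False`.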
Results
General and clinical characteristics in both groups
There were no statistically significant differences in demographic characteristics or baseline motor function between the two groups (all p > 0.05), whereas the MoCA-K, Stroop test, CBT, and VRST-derived metrics differed significantly (p < 0.001). This finding indicated that HCs were age- and education-matched to patients with MCI and differed only in cognitive function (Table 1).
Table 1.
General and clinical characteristics of participants (N = 413)
| Variable | HCs (n = 224) | MCI (n = 189) | t / χ² | p | Effect size |
|---|---|---|---|---|---|
| Age (year) | 71.51 (2.89) | 71.65 (2.76) | 0.472 | 0.637 | d = 0.047 |
| Sex (male/female) | 103/121 | 80/109 | 0.555 | 0.456 | V = 0.037 |
| Education period (year) | 5.94 (2.19) | 5.97 (2.31) | 0.142 | 0.887 | d = 0.014 |
| Dominant hand (left/right) | 1/223 | 3/186 | 1.391 | 0.238 | V = 0.058 |
| BBT (count) | 58.62 (1.99) | 58.87 (2.16) | 1.208 | 0.228 | d = 0.119 |
| Grooved pegboard test (second) | 73.39 (14.16) | 72.95 (15.41) | 0.299 | 0.765 | d = 0.029 |
| MoCA-K (point) | 24.56 (1.51) | 20.48 (1.55) | 26.919 | < 0.001 | d = 2.656 |
| Stroop test (second) | 71.67 (6.03) | 84.73 (5.64) | 22.551 | < 0.001 | d = 2.225 |
| CBT (accuracy, %) | 0.77 (0.06) | 0.71 (0.06) | 9.294 | < 0.001 | d = 0.917 |
| VRST_completion time (second) | 72.65 (3.15) | 80.00 (3.07) | 23.849 | < 0.001 | d = 2.361 |
| VRST_3D trajectory length (cm) | 22.40 (1.33) | 27.86 (1.54) | 38.528 | < 0.001 | d = 3.318 |
| VRST_hesitation latency (second) | 5.33 (0.70) | 7.29 (0.70) | 26.024 | < 0.001 | d = 2.800 |
Shown are mean values (standard deviation). BBT: Box and Block Test, CBT: Corsi Block Test, MoCA-K: Korean version of the Montreal Cognitive Assessment; VRST: Virtual reality-based Stroop test
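For readers who wish to relate the effect sizes in Table 1 to the reported means and standard deviations, the sketch below computes Cohen's d from summary statistics using the pooled standard deviation. The pooled-SD formula is an assumption, since the source does not state how d was calculated, and values recomputed from rounded table entries are not expected to match the tabulated effect sizes exactly. The function name `cohens_d` is hypothetical.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Absolute Cohen's d from group summaries, using the pooled SD (assumption)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return abs(mean1 - mean2) / math.sqrt(pooled_var)

# MoCA-K row of Table 1: HCs 24.56 (1.51), n = 224; MCI 20.48 (1.55), n = 189.
d_moca = cohens_d(24.56, 1.51, 224, 20.48, 1.55, 189)
```

With these rounded inputs the result is approximately 2.67, close to the tabulated value of 2.656.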
Sensitivity, specificity, and discriminant power
For discriminating MCI from the matched HCs, 3D trajectory length and hesitation latency in the VRST showed a higher area under the curve (AUC) than the MoCA-K (3D trajectory length: 0.981; hesitation latency: 0.967; MoCA-K: 0.962), suggesting that VR-derived behavioral markers can discriminate MCI better than the conventional MCI screening tool (Table 2). Specifically, 3D trajectory length yielded the highest sensitivity (96.8%) and hesitation latency achieved the highest specificity (100.0%) (Table 2; Fig. 2).
Table 2.
Sensitivity and specificity of MCI detection (N = 413)
| Variable | Sensitivity | Specificity | Youden index | Cut-off | AUC (95% CI) |
|---|---|---|---|---|---|
| MoCA-K (point) | 0.920 | 0.894 | 0.813 | 22.50 | 0.962* (0.947-0.978) |
| Completion time (second) | 0.873 | 0.915 | 0.788 | 76.27 | 0.950* (0.929-0.971) |
| 3D trajectory length (cm) | 0.968 | 0.973 | 0.941 | 24.52 | 0.981* (0.966-0.995) |
| Hesitation latency (second) | 0.926 | 1.000 | 0.925 | 6.15 | 0.967* (0.949-0.985) |
*p < 0.001. AUC: Area under the curve, MoCA-K: Korean version of the Montreal Cognitive Assessment, VRST: Virtual reality-based Stroop test
Fig. 2.
ROC curves of the four predictors. Greater AUC values indicate higher power in discriminating older adults with MCI from healthy controls. AUC: Area under the curve, MoCA-K: Korean version of the Montreal Cognitive Assessment, VRST: Virtual reality-based Stroop test
Correlation between VR-derived behavioral markers and cognitive function
VR-derived behavioral markers were found to be significantly correlated with the MoCA-K (completion time: r = −0.598, p < 0.01; 3D trajectory length: r = −0.683, p < 0.001; hesitation latency: r = −0.671, p < 0.01), the Stroop test (completion time: r = 0.583, p < 0.01; 3D trajectory length: r = 0.627, p < 0.01; hesitation latency: r = 0.611, p < 0.01), and the CBT (completion time: r = −0.309, p < 0.01; 3D trajectory length: r = −0.381, p < 0.01; hesitation latency: r = −0.353, p < 0.01) (Table 3). These findings suggested that VR-derived behavioral markers are associated with global cognitive function, inhibitory control, and working memory.
Table 3.
Correlation of cognitive function with VR-derived behavioral markers (N = 413)
| Variable | MoCA-K | Stroop test | CBT | VRST completion time | VRST 3D trajectory length | VRST hesitation latency |
|---|---|---|---|---|---|---|
| MoCA-K (point) | - | −0.614* | 0.380* | −0.598* | −0.683* | −0.671* |
| Stroop test (second) | −0.614* | - | −0.296* | 0.583* | 0.627* | 0.611* |
| CBT (accuracy, %) | 0.380* | −0.296* | - | −0.309* | −0.381* | −0.353* |
| VRST completion time (second) | −0.598* | 0.583* | −0.309* | - | 0.598* | 0.638* |
| VRST 3D trajectory length (cm) | −0.683* | 0.627* | −0.381* | 0.598* | - | 0.652* |
| VRST hesitation latency (second) | −0.671* | 0.611* | −0.353* | 0.638* | 0.652* | - |
*p < 0.01. CBT: Corsi Block Test, MoCA-K: Korean version of the Montreal Cognitive Assessment, VRST: Virtual reality-based Stroop test
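The coefficients in Table 3 are Spearman rank correlations. As a minimal sketch of how such a coefficient is obtained, the Python code below ranks each variable (assigning average ranks to ties) and then computes the Pearson correlation of the ranks. It is not the SPSS routine used in the study, the helper names are hypothetical, and the six paired values are invented for illustration only.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented example: MoCA-K scores and 3D trajectory lengths (cm) for six people.
moca = [26, 25, 24, 22, 21, 20]
trajectory = [21.5, 23.1, 22.0, 26.0, 28.4, 27.2]
rho = spearman(moca, trajectory)
```

Because the coefficient depends only on rank order, it captures monotonic associations such as "lower MoCA-K, longer trajectory" without assuming a linear relationship between the raw scores.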
Discussion
This study aimed to evaluate behavioral markers derived from the VRST as potential markers for distinguishing MCI from cognitively healthy older adults. For this purpose, VR-derived behavioral markers were investigated in 224 HCs and 189 older adults with MCI, alongside the conventional screening tool. By measuring VR-derived behavioral markers through an ecologically valid sorting activity with Stroop interference, this study aimed to identify distinct patterns that differentiate between HCs and older adults with MCI.
The findings revealed that VR-derived behavioral markers, namely completion time, 3D trajectory length, and hesitation latency, were effective in distinguishing between HCs and older adults with MCI. Notably, the sensitivity and specificity of 3D trajectory length and hesitation latency surpassed those of the conventional screening tool. Specifically, 3D trajectory length showed the highest sensitivity and hesitation latency achieved the highest specificity, suggesting that cognitive declines associated with MCI can be captured through embodied VR interactions, highlighting their potential as digital biomarkers.
Motor dysfunction, alongside cognitive dysfunction, is increasingly recognized as a hallmark of MCI, with prior research suggesting that the extent of motor decline may serve as a distinguishing feature between MCI and normal aging [31]. In particular, bradykinesia in the dominant upper limb has been proposed as a sensitive indicator for identifying MCI [13, 14]. The present study’s observation of prolonged and exaggerated VR-based behavioral responses in older adults with MCI aligns with these findings, reinforcing the association between motor impairment and MCI-related pathology [32]. More specifically, recent research suggests that individuals with MCI exhibit not only impairments in fine motor skills but also measurable deficits in gross motor abilities, particularly in tasks involving coordinated upper limb movements such as reaching or object manipulation [33]. The VR controller-based task used in this study primarily engages gross motor function, and the observed decline in movement efficiency and task performance among MCI participants supports the growing recognition that gross motor dysfunction is also an important marker of early cognitive impairment [34].
Declines in motor function may be influenced by impairments in working memory, which plays a crucial role in maintaining motor chunk length during complex motor sequence execution [35, 36]. Individuals with MCI, even amnestic subtypes, commonly show working memory deficits, often linked to volume reductions in the prefrontal cortex [37]. This is consistent with the present study’s findings, in which older adults with MCI demonstrated lower performance on the CBT task compared to HCs. Interestingly, no statistically significant differences were observed in pure gross or fine motor function between the groups, as measured by the Box and Block test and the Grooved Pegboard Test. These results suggest that delayed behavioral features of the VRST in older adults with MCI are more likely attributable to deficits in working memory than to motor dysfunction. Thus, working memory impairments may negatively impact behavioral performance in cognitively demanding VR environments.
Previous studies have demonstrated the promise of VR-derived behavioral markers, such as hand trajectory and eye movement, for detecting MCI [13, 14]. However, many of these studies used generalized daily task simulations (e.g., kiosk use, virtual shopping) that lacked specificity for targeting distinct cognitive domains like inhibitory control [13, 15, 16]. As these tasks engaged multiple cognitive processes simultaneously, it became difficult to attribute observed behaviors to specific dysfunctions. In contrast, this study employed a VR-based Stroop task explicitly designed to assess inhibitory control, thereby improving cognitive domain specificity within an ecologically valid environment. In the VRST, participants were required to categorize objects based on semantic identity while ignoring an automatically salient but irrelevant perceptual feature, namely color. This design aligns with a reverse Stroop paradigm, which, although often considered weaker than the standard version, still engages prepotent response inhibition [24, 38]. Furthermore, neuroimaging evidence indicates that both classic and reverse Stroop paradigms recruit overlapping prefrontal networks, including the dorsolateral prefrontal cortex (DLPFC), highlighting the shared inhibitory mechanisms between them [39]. Thus, despite the reversed structure, the VRST maintains theoretical validity as a measure of inhibition.
Earlier studies often overlooked baseline motor function, despite using motor-based metrics like movement time as cognitive proxies [13, 14]. This study addressed that gap by incorporating standardized motor assessments—the Box and Block Test and Grooved Pegboard Test—to account for motor variability and clarify the cognitive basis of VR task performance. Finally, few prior studies directly compared VR markers to established cognitive tools like the MoCA-K, limiting clinical relevance [14, 17]. This study conducted such a comparison and found that VR-derived features showed higher sensitivity and specificity, supporting their potential as effective and complementary tools in early MCI detection.
Notably, 3D trajectory length exhibited the highest AUC of all outcome measures, suggesting that it is more effective in distinguishing MCI than the other VRST-derived metrics. This finding is consistent with a previous study reporting that a hand movement-related feature in a VR kiosk task demonstrated greater discriminant power than task completion time in identifying MCI [14]. From a cognitive-motor control perspective, increased movement trajectory may reflect greater motor inefficiency and disorganized planning during goal-directed actions, which are commonly linked to impairments in executive function. These kinematic alterations can be understood through the framework of internal model theory, which posits that the brain constructs forward and inverse models to predict sensory consequences and generate motor commands necessary for efficient movement execution [28]. When these internal models deteriorate, as may occur in MCI, the resulting movements tend to become more spatially variable, inefficient, and less optimized. Such disruptions are closely associated with fronto-striatal network dysfunction, particularly involving the DLPFC and supplementary motor area (SMA). These brain regions are known to play critical roles in inhibitory control, spatial working memory, and high-level motor planning, and are among the earliest to show degeneration in individuals with MCI [33]. Furthermore, the DLPFC maintains strong connections with SMA, premotor cortex, basal ganglia, and cerebellar structures, supporting its role as a supervisory controller in motor organization and adjustment [40]. Unlike task duration, which may be influenced by overall processing speed or task strategy, the spatial complexity and irregularity of hand movement trajectories may more directly capture subtle deficits in motor control, attention shifting, and inhibitory regulation, all of which are frequently impaired in the early stages of cognitive decline [41].
In other words, an increased 3D trajectory length can be used as a good marker of MCI, which indicates that a 3D movement monitoring system could provide a cost-effective and scalable solution for the detection of MCI.
On the other hand, a previous study enhanced the discriminant power of VR-derived features by integrating them with magnetic resonance imaging (MRI) data through a multimodal analysis approach [13]. This reflects the growing interest in developing robust multimodal biomarkers by combining diverse data sources. However, the improvement in classification accuracy was only marginal—approximately a 5% increase compared to VR-derived behavioral markers alone [13]. Given the limited accessibility and high cost associated with MRI, as well as the practical challenges of collecting MRI data longitudinally alongside VR-based behavioral data, alternative multimodal approaches should be considered.
The primary strength of this study lies in its integration of embodied cognition into a clinically applicable, immersive assessment. The VRST enables the extraction of high-resolution motor-cognitive markers, such as 3D movement and hesitation latency, during task performance, offering more sensitive indicators of executive dysfunction than traditional paper-based tools. By targeting inhibitory control within a realistic virtual environment, the VRST can detect subtle cognitive impairments that may not yet manifest as overt errors, enhancing early identification of MCI-related decline [42–44]. Additionally, unlike prior multimodal approaches that depend on costly neuroimaging, the VRST uses readily available commercial VR equipment, improving accessibility and scalability for clinical and community-based applications.
Although mobile-based tools offer the advantage of high accessibility in everyday environments, VR-based assessments like the VRST provide structured tasks that can be administered in clinical contexts. The VRST enables direct measurement of goal-directed behavior under standardized inhibitory demands, offering behavioral metrics that are not easily obtainable through passive mobile data. Despite increasing evidence for the diagnostic value of VR-based tools, their clinical adoption remains limited due to high equipment costs, lack of platform standardization, and usability issues such as cybersickness or user unfamiliarity [45, 46]. To address these barriers, the VRST was designed without head-mounted displays, used in a seated and static setting with a single controller, and fully automated to reduce examiner involvement. For broader adoption, future versions should incorporate age-friendly interfaces and simplified, self-guided protocols. Further usability testing is needed to support implementation by non-specialists and enable scalable use in real-world cognitive screening.
Despite the promising findings and strengths of this study, some limitations should be considered. First, the findings cannot be generalized to all MCI subtypes because the sample was limited to amnestic MCI. Nevertheless, considering that individuals with amnestic MCI show less heterogeneity in cognitive function than those with non-amnestic or multi-domain MCI [1, 5], the findings of this study have clinical implications. Second, although working memory decline in participants with MCI was assessed with the CBT, the prefrontal cortex, which underlies working memory, was not directly observed. Thus, it is difficult to confirm whether the VR-derived features in older adults with MCI are due to working memory deficits caused by neurodegeneration in the prefrontal cortex. A neuroimaging study would provide more objective evidence of changes in VR-derived behavioral markers in MCI and allow sub-group analysis based on neurodegeneration severity. Third, the task required some familiarity with VR technology, such as operating a hand-held controller. While all participants received training before the task and performed it seated within a constrained workspace, individual differences in technological experience could have introduced variability in task performance, particularly in older adults. Finally, the task included only incongruent trials without a congruent or neutral control condition. While this limits the ability to compute traditional interference scores or contrast inhibitory with non-inhibitory processing, the VRST was intentionally designed to impose sustained inhibitory demands by repeatedly presenting salient but task-irrelevant features. This approach aimed to reduce task duration and fatigue in older adults while preserving the core mechanism of the Stroop effect.
Although prior research has demonstrated that incongruent-only designs can still elicit reliable measures of inhibitory control, such findings are largely based on conventional, well-established paradigms rather than newly developed, ecologically embedded tasks like the VRST [47]. Therefore, while the existing literature supports the feasibility of using incongruent-only formats, further validation is needed to generalize this approach to novel digital assessments such as the VRST.
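For reference, a traditional Stroop interference score is commonly computed as the difference between mean response time on incongruent trials and mean response time on congruent trials. The sketch below illustrates that calculation only; the function name `interference_score` and the response times are hypothetical, and the current VRST cannot produce this score because it contains no congruent trials.

```python
from statistics import mean
from typing import Sequence


def interference_score(
    incongruent_rt: Sequence[float], congruent_rt: Sequence[float]
) -> float:
    """Mean incongruent response time minus mean congruent response time."""
    return mean(incongruent_rt) - mean(congruent_rt)


# Made-up response times in seconds (hypothetical, not study data).
incongruent = [1.5, 1.25, 1.75]
congruent = [1.0, 0.75, 1.25]

print(interference_score(incongruent, congruent))
```

A larger positive score indicates a greater cost of suppressing the task-irrelevant feature, which is the contrast that adding a congruent or neutral condition would make available.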
Future studies should include a broader range of MCI subtypes to improve the generalizability of VR-derived behavioral markers. Incorporating neuroimaging data, particularly from the prefrontal cortex, could help clarify whether observed VR performance differences are linked to underlying neural degeneration. In addition, future versions of the VRST may benefit from including congruent or neutral control conditions to enhance construct validity and enable more fine-grained comparisons between inhibitory and non-inhibitory processing. Finally, simplifying the VR interface or providing adaptive guidance may enhance usability for older adults unfamiliar with VR technology, supporting wider adoption in real-world screening contexts.
Conclusions
VR-derived behavioral markers reflecting both motor and cognitive dysfunction—particularly inhibitory control—were shown to be more clinically informative for identifying MCI than a conventional screening tool. This study highlights the potential of using 3D trajectory length and hesitation latency, derived from the VRST, as sensitive and scalable digital biomarkers. These findings suggest that immersive VR assessments can serve as ecologically valid alternatives to traditional neuropsychological tools, offering a feasible and engaging method for early MCI detection in both clinical and community-based settings.
Acknowledgements
The author would like to acknowledge the participants who took part in the study for their cooperation in this work.
Abbreviations
- AD: Alzheimer’s disease
- AUC: Area under the curve
- CBT: Corsi block test
- MCI: Mild cognitive impairment
- MoCA-K: Korean version of the Montreal Cognitive Assessment
- MMSE-K: Korean version of the Mini-Mental Status Examination
- ROC: Receiver operating characteristic
- VRST: Virtual reality-based Stroop test
Author contributions
The author confirms sole responsibility for the following: study conception and design, data collection, analysis and interpretation of results, and manuscript preparation.
Funding
This work was supported by the Soonchunhyang University Research Fund and the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. 2021R1I1A3041487).
Data availability
The data sets generated during and/or analyzed during this study are available from the corresponding author on reasonable request.
Declarations
Ethics approval and consent to participate
The study was approved by the Institutional Review Board of Soonchunhyang University (IRB No. 202306-SB-070). All participants provided signed informed consent before the start of the study.
Consent for publication
Not applicable.
Competing interests
The author declares no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Petersen RC. Mild cognitive impairment as a diagnostic entity. J Intern Med. 2004;256(3):183–94. 10.1111/j.1365-2796.2004.01388.x.
- 2. Chang CY, Silverman DH. Accuracy of early diagnosis and its impact on the management and course of Alzheimer’s disease. Expert Rev Mol Diagn. 2004;4(1):63–9. 10.1586/14737159.4.1.63.
- 3. Waldemar G, Dubois B, Emre M, Georges J, McKeith IG, Rossor M, et al. EFNS. Recommendations for the diagnosis and management of Alzheimer’s disease and other disorders associated with dementia: EFNS guideline. Eur J Neurol. 2007;14(1):e1–26. 10.1111/j.1468-1331.2006.01605.x.
- 4. Park JH, Park JH. A systematic review of computerized cognitive function tests for the screening of mild cognitive impairment. Korea J Occup Ther. 2016;24(2):19–31. 10.14519/jksot.2016.24.2.02.
- 5. Park JH. Machine-learning algorithms based on screening tests for mild cognitive impairment. Am J Alzheimers Dis Other Demen. 2020;35:1533317520927163. 10.1177/1533317520927163.
- 6. Park JH, Jung M, Kim J, Park HY, Kim JR, Park JH. Validity of a novel computerized screening test system for mild cognitive impairment. Int Psychogeriatr. 2018;30(10):1455–63. 10.1017/S1041610218000923.
- 7. Tran J, Nimojan T, Saripella A, Tang-Wai DF, Butris N, Kapoor P, et al. Rapid cognitive assessment tools for screening of mild cognitive impairment in the preoperative setting: a systematic review and meta-analysis. J Clin Anesth. 2022;78:110682. 10.1016/j.jclinane.2022.110682.
- 8. Ghose S, Das S, Poria S, Das T. Short test of mental status in the detection of mild cognitive impairment in India. Indian J Psychiatry. 2019;61(2):184–91. 10.4103/psychiatry.IndianJPsychiatry_145_18.
- 9. Rami L, Molinuevo JL, Sanchez-Valle R, Bosch B, Villar A. Screening for amnestic mild cognitive impairment and early Alzheimer’s disease with M@T (Memory alteration Test) in the primary care population. Int J Geriatr Psychiatry. 2007;22(4):294–304. 10.1002/gps.1672.
- 10. Rycroft SS, Quach LT, Ward RE, Pedersen MM, Grande L, Bean JF. The relationship between cognitive impairment and upper extremity function in older primary care patients. J Gerontol Biol Sci Med Sci. 2019;74(4):568–74. 10.1093/gerona/gly246.
- 11. Park JH. Discriminant power of smartphone-derived keystroke dynamics for mild cognitive impairment compared to a neuropsychological screening test: cross-sectional study. J Med Internet Res. 2024;26:e59247. 10.2196/59247.
- 12. Rabinowitz I, Lavner Y. Association between finger tapping, attention, memory, and cognitive diagnosis in elderly patients. Percept Mot Skills. 2014;119(1):259–78. 10.2466/10.22.PMS.119c12z3.
- 13. Park B, Kim Y, Park J, Choi H, Kim SE, Ryu H, Seo K. Integrating biomarkers from virtual reality and magnetic resonance imaging for the early detection of mild cognitive impairment using a multimodal learning approach: validation study. J Med Internet Res. 2024;26:e54538. 10.2196/54538.
- 14. Kim SY, Park J, Choi H, Loeser M, Ryu H, Seo K. Digital marker for early screening of mild cognitive impairment through hand and eye movement analysis in virtual reality using machine learning: first validation study. J Med Internet Res. 2023;25:e48093. 10.2196/48093.
- 15. Garcia-Betances RI, Jiménez-Mixco V, Arredondo MT, Cabrera-Umpiérrez MF. Using virtual reality for cognitive training of the elderly. Am J Alzheimers Dis Other Demen. 2015;30(1):49–54. 10.1177/1533317514545866.
- 16. Yan M, Yin H, Meng Q, Wang S, Ding Y, Li G, et al. A virtual supermarket program for the screening of mild cognitive impairment in older adults: diagnostic accuracy study. JMIR Serious Games. 2021;9(4):e30919. 10.2196/30919.
- 17. Mizukami K, Taguchi M, Kouketsu T, Sato N, Tanaka Y, Iwakiri M, et al. A cognitive function test utilizing eye-tracking technology in virtual reality is useful to distinguish between normal cognition, MCI and mild dementia. Arch Gerontol Geriatr Plus. 2024;1(4):100070. 10.1016/j.aggp.2024.100070.
- 18. Lee JY, Lee DW, Cho SJ, Na DL, Jeon HJ, Kim SK, et al. Brief screening for mild cognitive impairment in elderly outpatient clinic: validation of the Korean version of the Montreal cognitive assessment. J Geriatr Psychiatry Neurol. 2008;21(2):104–10. 10.1177/0891988708316855.
- 19. Kim TY, Kim S, Sohn JE, Lee EA, Yoo BG, Lee SC, et al. Development of the Korean Stroop test and study of the validity and the reliability. J Korean Geriatr Soc. 2004;8(4):233–40.
- 20. Monaco M, Costa A, Caltagirone C, Carlesimo GA. Forward and backward span for verbal and visuo-spatial data: standardization and normative data from an Italian adult population. Neurol Sci. 2013;34(5):749–54. 10.1007/s10072-012-1130-x.
- 21. Kessels RP, van Zandvoort MJ, Postma A, Kappelle LJ, de Haan EH. The corsi block-tapping task: standardization and normative data. Appl Neuropsychol. 2000;7(4):252–8. 10.1207/S15324826AN0704_8.
- 22. Mathiowetz V, Volland G, Kashman N, Weber K. Adult norms for the box and block test of manual dexterity. Am J Occup Ther. 1985;39(6):386–91. 10.5014/ajot.39.6.386.
- 23. Lee TY. Normative values for the grooved pegboard test in adult. Phys Ther Korea. 2001;8(2):87–94.
- 24. Durgin FH. The reverse Stroop effect. Psychon Bull Rev. 2000;7(1):121–5. 10.3758/bf03210730.
- 25. Sternberg S. Memory-scanning: mental processes revealed by reaction-time experiments. Am Sci. 1969;57(4):421–57. 10.2307/27828738.
- 26. Shiffrin RM, Schneider W. Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychol Rev. 1977;84(2):127–90. 10.1037/0033-295X.84.2.127.
- 27. Ratcliff R, Smith PL, Brown SD, McKoon G. Diffusion decision model: current issues and history. Trends Cogn Sci. 2016;20(4):260–81. 10.1016/j.tics.2016.01.007.
- 28. Kawato M. Internal models for motor control and trajectory planning. Curr Opin Neurobiol. 1999;9(6):718–27. 10.1016/s0959-4388(99)00028-8.
- 29. Nasreddine ZS, Phillips NA, Bedirian V, Charbonneau S, Whitehead V, Collin I, et al. The Montreal cognitive assessment, MoCA: a brief screening tool for mild cognitive impairment. J Am Geriatr Soc. 2005;53(4):695–9. 10.1111/j.1532-5415.2005.53221.x.
- 30. Baddeley A. Working memory and language: an overview. J Commun Disord. 2003;36(3):189–208. 10.1016/s0021-9924(03)00019-4.
- 31. Aggarwal NT, Wilson RS, Beck TL, Bienias JL, Bennett DA. Motor dysfunction in mild cognitive impairment and the risk of incident Alzheimer disease. Arch Neurol. 2006;63(12):1763–9. 10.1001/archneur.63.12.1763.
- 32. Louis E, Schupf N, Manly J, Marder K, Tang MX, Mayeux R. Association between mild parkinsonian signs and mild cognitive impairment in a community. Neurology. 2005;64(7):1157–61. 10.1212/01.WNL.0000156157.97411.5E.
- 33. Montero-Odasso M, Speechley M, Muir‐Hunter SW, Sarquis‐Adamson Y, Sposato LA, Hachinski V, et al. Motor and cognitive trajectories before dementia: results from gait and brain study. J Am Geriatr Soc. 2018;66(9):1676–83. 10.1111/jgs.15341.
- 34. Toosizadeh N, Najafi B, Reiman EM, Mager RM, Veldhuizen JK, O’Connor K, et al. Upper-extremity dual-task function: an innovative method to assess cognitive impairment in older adults. Front Aging Neurosci. 2016;8:167. 10.3389/fnagi.2016.00167.
- 35. Bangert AS, Balota DA. Keep up the pace: declines in simple repetitive timing differentiate healthy aging from the earliest stages of Alzheimer’s disease. J Int Neuropsychol Soc. 2012;18(6):1052–63. 10.1017/S1355617712000860.
- 36. Anguera JA, Reuter-Lorenz PA, Willingham DT, Seidler RD. Failure to engage spatial working memory contributes to age-related declines in visuomotor learning. J Cogn Neurosci. 2011;23(1):11–25. 10.1162/jocn.2010.21451.
- 37. Saunders NL, Summers MJ. Attention and working memory deficits in mild cognitive impairment. J Clin Exp Neuropsychol. 2010;32(4):350–7. 10.1080/13803390903042379.
- 38. Friedman NP, Miyake A. The relations among inhibition and interference control functions: a latent-variable analysis. J Exp Psychol Gen. 2004;133(1):101–35. 10.1037/0096-3445.133.1.101.
- 39. Song Y, Hakoda Y. An fMRI study of the functional mechanisms of Stroop/reverse-Stroop effects. Behav Brain Res. 2015;290:187–96. 10.1016/j.bbr.2015.04.047.
- 40. Nachev P, Kennard C, Husain M. Functional role of the supplementary and pre-supplementary motor areas. Nat Rev Neurosci. 2008;9(11):856–69. 10.1038/nrn2478.
- 41. Colella D, Guerra A, Paparella G, Cioffi E, Di Vita A, Trebbastoni A, et al. Motor dysfunction in mild cognitive impairment as tested by kinematic analysis and transcranial magnetic stimulation. Clin Neurophysiol. 2021;132(2):315–22. 10.1016/j.clinph.2020.10.028.
- 42. Rabi R, Vasquez BP, Alain C, Hasher L, Belleville S, Anderson ND. Inhibitory control deficits in individuals with amnestic mild cognitive impairment: a meta-analysis. Neuropsychol Rev. 2020;30(1):97–125. 10.1007/s11065-020-09438-0.
- 43. Puente AN, Faraco C, Terry DP, Brown C, Miller LS. Minimal functional brain differences between older adults with and without mild cognitive impairment during the Stroop. Aging Neuropsychol Cogn. 2014;21(3):346–69. 10.1007/s11065-020-09428-6.
- 44. Mudar RA, Chiang HS, Eroh J, Nguyen LT, Maguire MJ, Spence JS, et al. The effects of amnestic mild cognitive impairment on go/nogo semantic categorization task performance and event-related potentials. J Alzheimers Dis. 2016;50(2):577–90. 10.3233/JAD-150586.
- 45. Rizzo AS, Koenig ST. Is clinical virtual reality ready for primetime? Neuropsychology. 2017;31(8):877–99. 10.1037/neu0000405.
- 46. Realdon O, Rossetto F, Nalin M, Baroni I, Cabinio M, Fioravanti R, et al. Technology-enhanced multi-domain at home continuum of care program with respect to usual care for people with cognitive impairment: the Ability-TelerehABILITation study protocol for a randomized controlled trial. BMC Psychiatry. 2016;16(1):425. 10.1186/s12888-016-1132-y.
- 47. Pinho JP, Azevedo APS, Serrão JC, Forner-Cordero A, Amadio AC, Mezêncio B. Aging effects of haptic input on postural control under a dual-task paradigm. Exp Gerontol. 2022;168:111928. 10.1016/j.exger.2022.111928.