Abstract
Background
COVID-19 has highlighted the need for remote cognitive testing. Virtual testing may lessen burden and can reach a larger patient population. The reliability and validity of virtual cognitive testing in Parkinson disease (PD) is unknown.
Objectives
To validate neuropsychological tests for virtual administration in PD.
Methods
Participants enrolled in an observational, cognition-focused study completed a rater-administered cognitive battery in-person and via video conference 3–7 days apart. Order of administration was counterbalanced. Analyses to compare performance by type of administration (virtual versus in-person) included paired t-test, intraclass correlation (ICC) and linear mixed-effects models.
Results
Data for 35 (62.9% male) PD participants (65.7% normal cognition, 34.3% cognitive impairment) were analyzed. Only the semantic verbal fluency test demonstrated a difference in score by administration type, with a significantly better score when administered virtually (paired t-test p = 0.011 and linear mixed-effects model p = 0.012). Only the Dementia Rating Scale-2, Trails A and phonemic verbal fluency demonstrated good reliability (ICC value 0.75–0.90) for virtual versus in-person administration, and values for visit 1 versus visit 2 were similarly low overall. Trails B was successfully administered virtually to only 19 (54.3%) participants due to technical issues.
Conclusions
Virtual cognitive testing overall is feasible in PD, and virtual and in-person cognitive testing generate similar scores at the group level, but reliability is poor or moderate for most tests. Given that mode of test administration, learning effects and technical difficulties explained relatively little of the low test-retest reliability observed, there may be significant short-term variability in cognitive performance in PD in general, which has important implications for clinical care and research.
Keywords: Parkinson disease, cognition, virtual testing
Introduction
There is increasing interest in virtual administration of motor and non-motor assessments in Parkinson disease (PD), partially to allow more frequent and informative testing, as well as to minimize patient or participant burden. This trend has been greatly accelerated by the ongoing COVID-19 pandemic.
Cognitive assessments are a key component of clinical care and many clinical research projects, including randomized controlled trials (RCTs), yet there is little data reporting on the validity of virtual non-motor assessments in PD. If it can be determined that virtual administration of cognitive assessments is reliable and valid, the results could have a significant impact on how PD clinical care is delivered and clinical research is conducted in the future.
A systematic review indicated good reliability of virtual assessments compared with in-person assessments to diagnose dementia in general(1). There has been one study demonstrating good retest reliability for in-person versus virtual administration of the Montreal Cognitive Assessment (MoCA) in a cohort of non-PD elderly individuals with and without cognitive impairment(2). A meta-analysis of cognitive testing via virtual conferencing in geriatric populations also indicated a high potential for virtual administration as a substitute for in-person testing(3).
Importantly, a study measuring computer literacy and its effect on both online and in-person cognitive testing found that older populations demonstrate worse computer literacy and perform worse in both settings. It indicated a need to correct for computer literacy when examining online cognitive test scores, specifically for tests that require motor coordination and processing speed(4).
As an alternative to traditional, in-person, paper-and-pencil cognitive testing, remote computerized cognitive testing in various iterations is becoming increasingly common(5). This includes unsupervised, self-completed cognitive testing, with some batteries already piloted in PD (Cogstate Brief Battery)(6), others with previous supervised versions extensively used in PD (CANTAB Connect)(7), and other new batteries not used yet in populations with demonstrated cognitive impairment (Amsterdam Cognition Scan)(8). However, for the time being, supervised cognitive testing, whether traditional paper-and-pencil tests or computerized testing, remains most commonly used in clinical care and clinical research.
The objective of this study was to determine the reliability of virtual versus in-person administration of commonly used cognitive assessments in PD. We hypothesized that virtual administration of cognitive assessments would have high agreement with in-person administration, which would support virtual administration of standard cognitive assessments in the context of both clinical care and clinical research.
Methods
Participants
Thirty-five Parkinson’s disease patients with a range of cognitive abilities (65.7% normal cognition, 28.6% MCI and 5.7% mild dementia, based on consensus diagnosis as previously outlined(9)) were recruited from the NIA U19 Clinical Core at the University of Pennsylvania (U19 AG062418). Subjects were required to have a MoCA score ≥ 20 and a reliable internet connection to participate. Subjects were asked to complete the virtual portion of the assessment on a laptop, desktop, or tablet, although two participants completed it on a smartphone due to technical issues.
Assessments
Neuropsychological testing
Three trained raters administered a comprehensive neuropsychological battery assessing global cognition and the five major cognitive domains. Tests administered were the MoCA (version 7.1)(10), Mattis Dementia Rating Scale 2 (DRS-2)(11), phonemic (FAS) and semantic (animals) verbal fluency tests(12), Hopkins Verbal Learning Test-Revised (HVLT-R)(13), Letter-Number Sequencing (LNS)(14), Symbol Digit Modalities Test (SDMT)(15), Clock Drawing Test(16), Trail Making Test A and B(17), Judgment of Line Orientation (JLO)(18) and Boston Naming Test (BNT)(19). For the purposes of retest reliability analyses, follow-up testing was performed within 3–7 days (mean 5.4 ± 1.7 days), and order of administration type (virtual or in-person) was randomized and counterbalanced. To address possible practice effects, Form 1 and Form 4 of the HVLT-R were utilized, and administered in a randomized fashion as well. A randomization schedule for administration order and HVLT-R version was created and adhered to as closely as patient scheduling allowed. The same assessor completed both the virtual and in-person visit for each participant to minimize the impact of inter-rater variability.
Virtual testing
For virtual testing, participants were asked to meet via the BlueJeans or Zoom video conferencing applications. Prior to virtual testing, participants were mailed a “virtual test packet” that included blank paper for drawing, as well as testing templates for some written tests (i.e., Trails A and B, SDMT, Clock Draw and MoCA). A PowerPoint displayed relevant images and instructions that would otherwise be shown using a stimulus booklet or template to be used in conjunction with the virtual test packet (supplementary material). Some images on the PowerPoint were used for instructional purposes (e.g., SDMT, and Trails A and B), some were presented for participants to draw in their own packets (e.g., DRS-2 and MoCA), and other tests required the participant to describe what they saw on-screen (e.g., JLO, BNT and MoCA). Participants were asked to use either a laptop (N = 19), desktop (N = 6) or tablet (N = 8) to adequately view images presented to them on screen; however, two patients completed the testing via the BlueJeans or Zoom mobile app on their smartphone due to technical difficulties and reported no issues. Raters used either a desktop or laptop to administer and supervise tests.
Participants seen in-person first were provided with a stamped and addressed envelope and asked to mail back the completed test packet once their virtual test was complete. Those seen virtually first returned their mailed packet at their follow-up in-person visit. All virtual test packets were returned.
Other assessments
Other clinical assessments included the Unified Parkinson’s Disease Rating Scale (UPDRS) Part III motor score(20), Geriatric Depression Scale–15 Item (GDS-15)(21), Hoehn and Yahr stage(20) and total levodopa equivalent daily dose (LEDD)(22). These data were collected at the first visit regardless of administration type, with exception of the UPDRS III motor score, which was obtained at the in-person visit.
Functional assessments
Functional assessments were administered to assess daily functioning and to assist in the consensus cognitive diagnosis process. These tests included the Penn Parkinson’s Daily Activity Questionnaire-15 (PDAQ-15)(23) (completed by both patient and knowledgeable informant, if available), the Activities of Daily Living Inventory (ADLI) Questionnaire(24) (completed by either a knowledgeable informant (preferred) or the patient), UPDRS Part II score(20), and Schwab and England score(20). The ADLI was completed by a knowledgeable informant for 26 participants and by the patient for 7; the PDAQ-15 was completed by a knowledgeable informant for 24 participants and by the patient for 33. We utilized the knowledgeable-informant ADLI and the patient PDAQ-15 for analyses. These informants did not assist in completing any of the cognitive testing. These assessments were administered by the rater at the first visit, with the exception of questionnaires completed by knowledgeable informants, which were self-completed and returned via mail.
Consensus cognitive diagnosis process
Each participant received a consensus cognitive diagnosis (normal cognition, mild cognitive impairment or dementia) by a trained panel of raters, as previously described(9).
Statistical Analyses
Recruitment began in response to and soon after the onset of the COVID-19 pandemic, and continued until routine in-person visits resumed (November 2020 – August 2022). Descriptive statistics (percentages, means, and standard deviations) were utilized for key demographics, cognitive tests, functional assessments, and other non-motor assessments. Paired t-tests were used to determine the difference in average performance between in-person and virtual tests, as well as between visit one and visit two. Raw scores were used for all analyses.
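As a sketch of this comparison, a paired t-test on in-person versus virtual scores for a single cognitive test can be run as follows (the score arrays are illustrative, not study data):

```python
import numpy as np
from scipy import stats

# Hypothetical paired scores for one test: one value per participant
# for each administration type (same participants in both arrays).
in_person = np.array([18, 21, 15, 24, 19, 17, 22, 20])
virtual = np.array([20, 22, 16, 25, 21, 18, 23, 22])

# Paired (dependent-samples) t-test on the within-participant differences.
t_stat, p_value = stats.ttest_rel(in_person, virtual)

# Average change reported as in-person minus virtual, as in Table 2.
mean_change = (in_person - virtual).mean()
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, mean change = {mean_change:.2f}")
```

A negative mean change under this convention indicates better performance at the virtual visit.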
Intraclass correlation coefficients (ICC) were run to assess the reliability of tests at each visit type. These were two-way mixed absolute agreement correlations, with cutoffs ≥ 0.90 being excellent, 0.75–0.9 good, 0.5–0.75 moderate, and < 0.5 poor reliability. Retest reliability based on visit number (i.e., visit 1 versus visit 2) was also examined. Finally, linear mixed-effects models (LMM) were performed to assess the effect of both administration type and visit order number on cognitive test scores. Fixed effects included administration type, visit order number, age, PD duration, education, and sex. A random intercept term was included in the mixed-effects model to account for the correlations of the cognitive scores.
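The two-way mixed, absolute-agreement ICC described above can be computed from the two-way ANOVA mean squares (McGraw and Wong's ICC(A,1) for a single measurement); a minimal NumPy sketch with hypothetical ratings follows:

```python
import numpy as np

def icc_absolute_agreement(ratings):
    """ICC(A,1): two-way model, absolute agreement, single measurement.
    ratings: (n_subjects, k_conditions) array; columns could be
    in-person vs. virtual, or visit 1 vs. visit 2."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ssr = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between-subjects
    ssc = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between-conditions
    sse = ((x - grand) ** 2).sum() - ssr - ssc       # residual
    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def interpret(icc):
    # Cutoffs used in the text: >= 0.90 excellent, 0.75-0.90 good,
    # 0.50-0.75 moderate, < 0.50 poor.
    if icc >= 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"

# Hypothetical data: rows = participants; columns = in-person, virtual.
scores = np.array([[26, 27], [24, 23], [28, 28], [22, 24], [27, 26], [25, 25]])
icc = icc_absolute_agreement(scores)
print(icc, interpret(icc))
```

Identical ratings across conditions yield an ICC of exactly 1; systematic shifts between conditions lower the absolute-agreement ICC even when rank order is preserved.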
Given that this is a pilot study, an uncorrected p value < 0.05 was considered significant. All statistical tests are two-sided. Statistical analyses were done using SPSS (version 28).
Results
Participant characteristics
Descriptive information for the cohort is in Table 1. Of the 35 participants, 62.9% were male, and all were white. The mean (SD) age was 69.11 (7.79), education 16.66 (2.09) years, and disease duration 10.46 (5.26) years. Regarding consensus cognitive diagnosis, 23 (65.7%) had normal cognition, 10 (28.6%) mild cognitive impairment (MCI), and 2 (5.7%) dementia.
Table 1.
Participant characteristics
| Characteristic | N | Value (Mean (SD)) or Percentage | |
|---|---|---|---|
| Demographics | |||
| Sex (% Male) | 35 | 62.9 | |
| Age (years) | 35 | 69.1 (7.8) | |
| Education (years) | 35 | 16.7 (2.1) | |
| Race (% White) | 35 | 100 | |
| PD duration (years) | 35 | 10.5 (5.3) | |
| Clinical Assessments | |||
| Total LEDD1 (mg/day) | 35 | 906.5 (570.8) | |
| UPDRS2 III | 35 | 23.3 (11.7) | |
| Geriatric Depression Scale – 15 Item | 35 | 2.6 (2.3) | |
| Hoehn & Yahr stage | 35 | 2.4 (0.7) | |
| Cognitive Assessments (in-person results) | |||
| DRS-23 | 35 | 136.1 (5.9) | |
| MoCA4 | 35 | 26.7 (2.6) | |
| Clock Draw Test | 35 | 5.2 (1.1) | |
| Boston Naming Test | 35 | 57.5 (3.2) | |
| Judgment of Line Orientation | 35 | 23.3 (4.3) | |
| Trails A | 35 | 42.6 (17.0) | |
| Trails B | 35 | 99.5 (57.8) | |
| Symbol Digit Modalities Test | 35 | 33.9 (12.0) | |
| HVLT-R5 Total Immediate | 35 | 22.2 (5.4) | |
| HVLT-R Delayed | 35 | 6.7 (3.7) | |
| HVLT-R Recognition Discrimination | 35 | 9.4 (2.3) | |
| Letter-Number Sequencing | 35 | 9.5 (2.1) | |
| Verbal Fluency – FAS | 35 | 45.7 (13.0) | |
| Verbal Fluency – Animals | 35 | 18.7 (5.8) | |
| Consensus Cognitive Diagnosis | | | |
| Normal cognition (%) | 35 | 65.7 | |
| Cognitive impairment (%) | 35 | 34.3 | |
| Functional Assessments | |||
| Penn Daily Activities Questionnaire – 15 | 35 | 45.9 (10.7) | |
| Activities of Daily Living Inventory | 33 | 70.6 (10.2) | |
1 Levodopa Equivalent Daily Dose
2 Unified Parkinson’s Disease Rating Scale
3 Dementia Rating Scale-2
4 Montreal Cognitive Assessment
5 Hopkins Verbal Learning Test-Revised
Virtual versus in-person cognitive performance
Mean scores and t-tests
Average scores for each administration type were similar (Table 2). A paired t-test of in-person versus virtual scores did not find statistically significant differences for mode of administration for any of the cognitive tests except the semantic verbal fluency test (p = 0.01) (Table 2). Virtual testing on average took slightly longer (mean 66.1 ± 10.1 minutes) to complete than in-person testing (mean 56.0 ± 8.4 minutes).
Table 2.
Paired t-test of in-person and virtual cognitive test scores
| Test | N | Average change (SD)* | t | df | Two-sided p value |
|---|---|---|---|---|---|
| DRS-2 | 34 | −0.88 (3.013) | −1.708 | 33 | 0.10 |
| MoCA | 35 | −0.23 (2.250) | −0.601 | 34 | 0.55 |
| Clock Draw Test | 35 | −0.37 (1.352) | −1.625 | 34 | 0.11 |
| Boston Naming Test | 35 | −0.69 (2.207) | −1.838 | 34 | 0.08 |
| Judgment of Line Orientation | 35 | −0.97 (5.623) | −1.022 | 34 | 0.31 |
| Trails A | 32 | −3.69 (15.249) | −1.368 | 31 | 0.18 |
| Trails B | 19 | 2.58 (32.016) | 0.351 | 18 | 0.73 |
| Symbol Digit Modalities Test | 35 | −1.46 (13.086) | −0.660 | 34 | 0.51 |
| HVLT-R Total Immediate | 35 | 0.71 (5.050) | 0.837 | 34 | 0.41 |
| HVLT-R Delayed | 35 | −0.43 (3.822) | −0.663 | 34 | 0.51 |
| HVLT-R Recognition Discrimination | 35 | 0.23 (2.263) | 0.597 | 34 | 0.55 |
| Letter-Number Sequencing | 35 | 0.31 (2.285) | 0.814 | 34 | 0.42 |
| Verbal Fluency – FAS | 35 | −2.26 (8.511) | −1.569 | 34 | 0.13 |
| Verbal Fluency – Animals | 35 | −1.77 (3.904) | −2.684 | 34 | 0.01 |
* Change is in-person minus virtual test performance
Not all assessments could be completed successfully at virtual visits. Only 32 (91.4%) and 19 (54.3%) participants were able to successfully complete written Trails A and B, respectively. Because administrators were unable to correct participants as per test instructions at the time of test administration, those who could not complete either of the written Trail making tests were marked as having an “administration error.”
Reliability
Intraclass correlations for virtual versus in-person testing demonstrated good reliability only for the DRS-2 (0.849), Trails A (0.754), and phonemic verbal fluency (0.815) (Table 3). The remaining 11 test scores showed poor or moderate reliability.
Table 3.
Intraclass correlations of in-person and virtual test scores
| Test | N | In-person Mean (SD) | Virtual Mean (SD) | Correlation Coefficient | Interpretation of Reliability | 95% CI |
|---|---|---|---|---|---|---|
| DRS-2 | 34 | 136.2 (5.9) | 137.1 (5.3) | 0.849 | Good | 0.718–0.922 |
| MoCA | 35 | 26.7 (2.6) | 27.0 (2.5) | 0.606 | Moderate | 0.345–0.779 |
| Clock Draw Test | 35 | 5.2 (1.1) | 5.5 (1.0) | 0.150 | Poor | −0.170–0.449 |
| Boston Naming Test | 35 | 57.5 (3.1) | 58.1 (2.5) | 0.683 | Moderate | 0.457–0.826 |
| Judgment of Line Orientation | 35 | 23.3 (4.31) | 24.2 (5.26) | 0.317 | Poor | −0.011–0.584 |
| Trails A | 32 | 42.6 (17.4) | 46.3 (25.7) | 0.754 | Good | 0.557–0.871 |
| Trails B | 19 | 79.3 (45.7) | 76.7 (32.5) | 0.685 | Moderate | 0.341–0.866 |
| Symbol Digit Modalities Test | 35 | 33.9 (12.0) | 35.3 (15.5) | 0.559 | Moderate | 0.281–0.750 |
| HVLT-R Total Immediate | 35 | 22.2 (5.4) | 21.5 (4.9) | 0.516 | Moderate | 0.226–0.722 |
| HVLT-R Delayed | 35 | 6.7 (3.7) | 7.1 (3.4) | 0.416 | Poor | 0.099–0.656 |
| HVLT-R Recognition Discrimination | 35 | 9.4 (2.3) | 9.2 (1.8) | 0.403 | Poor | 0.082–0.647 |
| Letter-Number Sequencing | 35 | 9.5 (2.1) | 9.2 (2.1) | 0.414 | Poor | 0.098–0.654 |
| Verbal Fluency – FAS | 35 | 45.7 (13.0) | 47.9 (15.4) | 0.815 | Good | 0.665–0.902 |
| Verbal Fluency – Animals | 35 | 18.7 (5.8) | 20.4 (5.8) | 0.745 | Moderate | 0.518–0.686 |
Linear mixed-effects models
To further explore the impact of administration type on cognitive test performance, and to control for important covariates, we ran linear mixed-effects models. Fixed effects were administration type (in-person versus virtual), visit order number (first versus second visit), sex, age at test, PD duration and years of education. Only the semantic verbal fluency test (p = 0.01) proved to be significantly impacted by mode of administration, with significantly better scores for virtual versus in-person administration (Table 4).
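A random-intercept model of this form can be sketched with statsmodels' MixedLM, here on simulated long-format data (two rows per participant; all variable names and values are illustrative, not the study's):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 35  # participants, each tested twice

# Simulated long-format data: one row per participant per visit.
subj = np.repeat(np.arange(n), 2)
admin = np.tile([0, 1], n)  # 0 = in-person, 1 = virtual
visit = np.array([rng.permutation([1, 2]) for _ in range(n)]).ravel()  # counterbalanced
age = np.repeat(rng.normal(69, 8, n), 2)
subj_effect = np.repeat(rng.normal(0, 3, n), 2)  # per-participant random intercept
score = 20 + 1.5 * admin + 0.05 * (age - 69) + subj_effect + rng.normal(0, 2, 2 * n)

df = pd.DataFrame(
    {"subject": subj, "admin": admin, "visit": visit, "age": age, "score": score}
)

# Fixed effects for administration type, visit order and age;
# the random intercept per subject accounts for the paired scores.
model = smf.mixedlm("score ~ admin + visit + age", df, groups=df["subject"])
result = model.fit()
print(result.summary())
```

The actual analysis also included sex, PD duration and education as fixed effects; they are omitted here only to keep the sketch short.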
Table 4.
Linear mixed-effects models of in-person versus virtual test scores
| Test | Regression Coefficient | F Statistic | Numerator df | Denominator df | p | Direction1 |
|---|---|---|---|---|---|---|
| DRS-2 | −0.891 | 3.007 | 1 | 31.76 | 0.09 | Virtual |
| MoCA | −0.212 | 0.323 | 1 | 33.00 | 0.57 | Virtual |
| Clock Draw Test | −0.356 | 2.803 | 1 | 33.00 | 0.10 | Virtual |
| Boston Naming Test | −0.654 | 4.064 | 1 | 33.02 | 0.05 | Virtual |
| Judgment of Line Orientation | −0.905 | 1.066 | 1 | 33.00 | 0.31 | Virtual |
| Trails A | −3.942 | 2.092 | 1 | 30.83 | 0.16 | In Person |
| Trails B | 6.674 | 0.739 | 1 | 15.85 | 0.40 | Virtual |
| Symbol Digit Modalities Test | −1.511 | 0.464 | 1 | 33.00 | 0.50 | Virtual |
| HVLT-R Total Immediate | 0.629 | 0.478 | 1 | 32.00 | 0.49 | In Person |
| HVLT-R Delayed | −0.549 | 0.677 | 1 | 32.00 | 0.42 | Virtual |
| HVLT-R Recognition Discrimination | 0.193 | 0.236 | 1 | 32.00 | 0.63 | In Person |
| Letter-Number Sequencing | 0.327 | 0.721 | 1 | 33.00 | 0.42 | In Person |
| Verbal Fluency – FAS | −2.229 | 2.361 | 1 | 33.00 | 0.13 | Virtual |
| Verbal Fluency – Animals | −1.776 | 7.040 | 1 | 33.01 | 0.01 | Virtual |
1 Direction indicates which administration type performed better
Overall retest reliability
We also assessed retest reliability for visit 1 versus visit 2 scores. ICCs found that, once again, only the DRS-2, Trails A, and phonemic verbal fluency tests demonstrated good retest reliability when administered 3–7 days apart, regardless of the mode of administration (Table 5). All other tests showed poor or moderate reliability.
Table 5.
Intraclass correlations of test scores at visit 1 and visit 2
| Test | N | Visit 1 Mean (SD) | Visit 2 Mean (SD) | Correlation Coefficient | Interpretation of Reliability | 95% CI |
|---|---|---|---|---|---|---|
| DRS-2 | 34 | 136.3 (5.5) | 137.0 (5.7) | 0.849 | Good | 0.720–0.837 |
| MoCA | 35 | 26.6 (2.7) | 27.1 (2.3) | 0.610 | Moderate | 0.357–0.781 |
| Clock Draw Test | 35 | 5.1 (1.2) | 5.6 (0.8) | 0.179 | Poor | −0.117–0.462 |
| Boston Naming Test | 35 | 57.2 (3.0) | 58.4(2.5) | 0.691 | Moderate | 0.389–0.845 |
| Judgment of Line Orientation | 35 | 22.6 (5.5) | 24.9(3.6) | 0.350 | Poor | 0.044–0.602 |
| Trails A | 32 | 44.3 (19.3) | 44.5 (24.5) | 0.752 | Good | 0.549–0.871 |
| Trails B | 19 | 77.2 (34.2) | 78.8 (44.5) | 0.684 | Moderate | 0.339–0.866 |
| Symbol Digit Modalities | 35 | 35.5(14.9) | 33.7 (12.7) | 0.560 | Moderate | 0.284–0.750 |
| HVLT-R Total Immediate | 35 | 21.9 (4.6) | 21.8 (5.6) | 0.513 | Moderate | 0.217–0.722 |
| HVLT-R Delayed | 35 | 7.3(3.2) | 6.5(3.8) | 0.424 | Poor | 0.118–0.658 |
| HVLT-R Recognition Discrimination | 35 | 9.5 (1.8) | 9.0 (2.2) | 0.410 | Poor | 0.102–0.649 |
| Letter-Number Sequencing | 35 | 9.1 (2.2) | 9.5 (2.0) | 0.417 | Poor | 0.106–0.655 |
| Verbal Fluency – FAS | 35 | 46.3 (13.1) | 47.3 (15.3) | 0.814 | Good | 0.664–0.902 |
| Verbal Fluency – Animals | 35 | 19.6(5.5) | 19.5 (6.3) | 0.739 | Moderate | 0.540–0.859 |
Examining the impact of visit 1 versus visit 2 on cognitive test performance in linear mixed-effects models while accounting for administration type, Clock Draw (p = 0.02), BNT (p = 0.001), and JLO (p = 0.01) performance were significantly better at the second visit compared with visit one.
Discussion
As telemedicine becomes more widely used, both clinically and in clinical research, the need to administer cognitive testing virtually has grown. Our results show that in PD, overall cognitive test performance is similar when administered virtually versus in-person, but that there is significant variability in test performance over the short-term regardless of the mode of administration.
Average cognitive test scores for virtual testing were similar to those for in-person testing, and in the linear mixed-effects models, mode of administration did not predict test performance, except for better semantic verbal fluency performance with virtual administration. However, retest reliability for virtual versus in-person testing was poor to moderate for most tests, which prompted us to examine overall retest reliability (i.e., visit 1 versus visit 2). The findings were similar, with most tests showing poor or moderate retest reliability from visit 1 to visit 2, separated by just 3–7 days.
The results suggest that there are significant short-term fluctuations in cognitive performance in PD patients, which has implications for interpreting a single or one-time test score in the context of clinical care and clinical research. Cognitive fluctuations can occur in PD patients who are being treated with levodopa to manage their motor symptoms, as part of non-motor fluctuations(25). In addition, fluctuations in cognition, attention, and arousal are a core clinical feature for the diagnosis of dementia with Lewy bodies(26), a disorder related to PD. On a clinical questionnaire about PD features, 15 (42.9%) of our participants self-reported experiencing cognitive fluctuations that could explain, in part, the variability in test scores we found in such a short period of time. The reasons for low retest reliability could differ between those participants with and without a diagnosis of cognitive impairment, but our small sample size prevented such secondary analyses. Future testing with larger cohorts and a wide range of cognitive abilities will help determine which PD patients are most appropriate for virtual cognitive testing. Alternatively, some of the low retest reliability that we found may be inherent to the tests themselves.
None of the variables included in the linear mixed-effects models (i.e., administration type, visit number, sex, age at test, PD duration, and education) had a significant effect, except visit number on the BNT, JLO, and Clock Draw, which may have been due to the practice (i.e., learning) effects, as the same test version was administered at each visit. To our knowledge, parallel versions of these tests are not available. Alternate versions of the MoCA were considered but were not utilized as at the time of study initiation they were not confirmed to be interchangeable with the original version.
Virtual administration of testing introduced novel limitations not normally seen with in-person administration. Unavoidable technological issues, such as limited internet connectivity and audiovisual problems, were problematic for some participants. Screen size was limited by the participants’ choice of device, reflecting the variability of devices used in the general population, and smaller screens may have influenced performance on tests that rely largely on visual stimuli (e.g., JLO, BNT and MoCA). Larger samples using a range of devices and screen sizes, with results examined by device type, are needed to evaluate this effect. Older participants may also have cognitive impairment compounded by comorbidities that makes understanding and troubleshooting computers or similar technology difficult, sometimes requiring patience and thorough instructions. Additionally, older populations tend to have more difficulty with hearing, which is exacerbated by the slightly muffled audio quality of video chats despite increased volume settings on their end. Finally, there were three instances of suspected “cheating” at virtual visits: although unconfirmed, one patient may have completed the SDMT in their virtual testing packet before or after the test, and two others may have written down the words on the HVLT-R (verbal memory test). The Trail Making Test proved incompatible with virtual administration for some participants, as raters were unable to directly observe participants completing the tests and therefore could not correct them as required by the test instructions. This occurred in 3 (8.6%) participants for Trails A and 16 (45.7%) participants for Trails B.
Virtual administration of cognitive tests is limited to those who have reliable internet access and technology that can support video conferencing. Also, we did not attempt virtual testing with patients with moderate–severe dementia, as we did not think it would be feasible to assess these participants effectively. Additionally, a certain degree of computer literacy is required, which is especially a problem in older, cognitively impaired cohorts. Our cohort was highly educated and likely had a level of computer literacy, and access to high-quality devices and internet connectivity, that is not generalizable to PD patients in general. Despite this, a virtual option makes cognitive testing much more accessible for non-local participants, and especially so for PD patients with advanced motor disability. Raters were limited in their ability to accurately or immediately score some tests, such as the Clock Draw Test and Trail Making Test, relying on screenshots taken during testing until the test packet was returned by participants via mail. Thus, obtaining accurate data depended on both the patient and the mail service to return the packets, though this did not prove to be an issue in our cohort. Finally, scheduling constraints prevented the two visits from being conducted at the same time of day for each participant, although all participants were evaluated in an “on” state by self-report.
This study provides preliminary evidence that virtual administration of cognitive testing is feasible in PD and produces results similar to traditional in-person testing for numerous global and domain-specific cognitive tests at the group level. In a somewhat unexpected additional finding, there was significant short-term variability in cognitive test performance overall, regardless of the mode of administration, which has implications for interpreting cognitive test results from a single session administered as part of clinical care or clinical research. Future studies with larger sample sizes are needed to further evaluate virtual testing as a possible substitute for traditional in-person testing, and to explain variability in performance. Despite these limitations, in a typically older population for which in-person clinical or clinical research visits can be a challenge, virtual cognitive testing in PD is feasible and convenient.
Acknowledgments:
This study was funded by the NIA U19-AG062418 (Center on Alpha-synuclein Strains in Alzheimer Disease & Related Dementias). The funder played no role in study design, data collection, analysis and interpretation of data, or the writing of this manuscript.
Competing Interests:
DW: In the past year Dr. Weintraub has received research funding or support from the Michael J. Fox Foundation for Parkinson’s Research, Alzheimer’s Therapeutic Research Initiative (ATRI), Alzheimer’s Disease Cooperative Study (ADCS), International Parkinson and Movement Disorder Society (IPMDS), National Institutes of Health (NIH), Parkinson’s Foundation, U.S. Department of Veterans Affairs and Acadia Pharmaceuticals; honoraria for consultancy from Acadia Pharmaceuticals, Alkahest, Aptinyx, Cerevel Therapeutics, CHDI Foundation, Clintrex LLC (Otsuka), EcoR1 Capital, Eisai, Ferring, Gray Matter Technologies, Great Lakes NeuroTechnologies, Intra-Cellular Therapies, Janssen, Merck, Sage, Scion and Signant Health; and license fee payments from the University of Pennsylvania for the QUIP and QUIP-RS.
SX: Dr. Xie receives research funding from the U.S. National Institutes of Health and serves as a consultant to Roche on a cancer clinical trial.
Authors JG, EM, SS, and BT declare no financial or non-financial competing interests.
Footnotes
Supplementary Files
Ethical Compliance Statement:
This study was approved by the University of Pennsylvania IRB. All subjects provided written consent to participate in this study, which was scanned and uploaded to the Penn Integrated Neurodegenerative Disease Database. We confirm that we have read the Journal’s position on issues involved in ethical publication and affirm that this work is consistent with those guidelines.
Contributor Information
Daniel Weintraub, University of Pennsylvania Perelman School of Medicine.
Julia Gallagher, University of Pennsylvania Perelman School of Medicine.
Eugenia Mamikonyan, University of Pennsylvania.
Sharon Xie, University of Pennsylvania Perelman School of Medicine.
Baochan Tran, University of Pennsylvania Perelman School of Medicine.
Sarah Shaw, University of Pennsylvania Perelman School of Medicine.
Data Availability:
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
- 1. Watt JA, Lane NE, Veroniki AA, Vyas MV, Williams C, Ramkissoon N, et al. Diagnostic accuracy of virtual cognitive assessment and testing: systematic review and meta-analysis. J Am Geriatr Soc. 2021;69(6):1429–40.
- 2. Iiboshi K, Yoshida K, Yamaoka Y, Eguchi Y, Sato D, Kishimoto M, et al. A validation study of the remotely administered Montreal Cognitive Assessment tool in the elderly Japanese population. Telemed J E Health. 2020;26(7):920–8.
- 3. Gentry MT, Lapid MI, Rummans TA. Geriatric telepsychiatry: systematic review and policy considerations. Am J Geriatr Psychiatry. 2019;27(2):109–27.
- 4. Lee Meeuw Kjoe PR, Agelink van Rentergem JA, Vermeulen IE, Schagen SB. How to correct for computer experience in online cognitive testing? Assessment. 2021;28(5):1247–55.
- 5. Owens AP, Ballard C, Beigi M, Kalafatis C, Brooker H, Lavelle G, et al. Implementing remote memory clinics to enhance clinical care during and after COVID-19. Front Psychiatry. 2020;11:579934.
- 6. Dobbs B, Pawlak N, Biagioni M, Agarwal S, Shaw M, Pilloni G, et al. Generalizing remotely supervised transcranial direct current stimulation (tDCS): feasibility and benefit in Parkinson's disease. J Neuroeng Rehabil. 2018;15(1):114.
- 7. Yarnall A, Breen D, Duncan G, Khoo T, Coleman S, Firbank M, et al. Characterizing mild cognitive impairment in incident Parkinson disease: the ICICLE-PD study. Neurology. 2014;82:308–16.
- 8. Feenstra HEM, Murre JMJ, Vermeulen IE, Kieffer JM, Schagen SB. Reliability and validity of a self-administered tool for online neuropsychological testing: the Amsterdam Cognition Scan. J Clin Exp Neuropsychol. 2018;40(3):253–73.
- 9. Pigott K, Rick J, Xie S, Hurtig H, Chen-Plotkin A, Duda J, et al. Longitudinal study of normal cognition in Parkinson disease. Neurology. 2015;85(15):1276–82.
- 10. Nasreddine Z, Phillips N, Bédirian V, Charbonneau S, Whitehead V, Collin I, et al. The Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment. J Am Geriatr Soc. 2005;53:695–9.
- 11. Mattis S. Dementia Rating Scale-2. Lutz, FL: Psychological Assessment Resources; 2001.
- 12. Gladsjo J, Shuman C, Evans J, Peavy G, Miller S, Heaton R. Norms for letter and category fluency: demographic corrections for age, education, and ethnicity. Assessment. 1999;6:147–78.
- 13. Brandt J, Benedict R. The Hopkins Verbal Learning Test-Revised. Odessa, FL: Psychological Assessment Resources; 2001.
- 14. Wechsler D. WMS-III Administration and Scoring Manual. San Antonio, TX: The Psychological Corporation; 1997.
- 15. Smith A. Symbol Digit Modalities Test: Manual. Los Angeles: Western Psychological Services; 1982.
- 16. Hubbard E, Santini V, Blankevoort C, Volkers K, Barrup M, Byerly L, et al. Clock drawing performance in cognitively normal elderly. Arch Clin Neuropsychol. 2008;23:295–327.
- 17. Reitan R. Validity of the Trail Making Test as an indicator of organic brain damage. Percept Mot Skills. 1958;8:271–6.
- 18. Benton A, Varney N, Hamsher K. Visuospatial judgment: a clinical test. Arch Neurol. 1978;35:364–7.
- 19. Kaplan E, Goodglass H, Weintraub S. Boston Naming Test. Philadelphia: Lea and Febiger; 1983.
- 20. Fahn S, Elton R, Members of the UPDRS Development Committee. Unified Parkinson's Disease Rating Scale. In: Fahn S, Marsden C, Goldstein M, Calne D, editors. Recent Developments in Parkinson's Disease. Florham Park, NJ: Macmillan Healthcare Information; 1987. p. 153–63.
- 21. Sheikh J, Yesavage J. Geriatric Depression Scale (GDS): recent evidence and development of a shorter version. In: Brink TL, editor. Clinical Gerontology: A Guide to Assessment and Intervention. New York: The Haworth Press; 1986. p. 165–73.
- 22. Tomlinson C, Stowe R, Patel S, Rick C, Gray R, Clarke C. Systematic review of levodopa dose equivalency reporting in Parkinson's disease. Mov Disord. 2010;25:2649–53.
- 23. Brennan L, Siderowf A, Rubright JD, Rick J, Dahodwala N, Duda JE, et al. The Penn Parkinson's Daily Activities Questionnaire-15: psychometric properties of a brief assessment of cognitive instrumental activities of daily living in Parkinson's disease. Parkinsonism Relat Disord. 2016;25:21–6.
- 24. Galasko D, Bennett D, Sano M, Ernesto C, Thomas R, Grundman M, et al. An inventory to assess activities of daily living for clinical trials in Alzheimer's disease. Alzheimer Dis Assoc Disord. 1997;11(2 Suppl):33S–9S.
- 25. Ray Chaudhuri K, Poewe W, Brooks D. Motor and nonmotor complications of levodopa: phenomenology, risk factors, and imaging features. Mov Disord. 2018;33(6):909–19.
- 26. McKeith IG, Boeve BF, Dickson DW, Halliday G, Taylor JP, Weintraub D, et al. Diagnosis and management of dementia with Lewy bodies: fourth consensus report of the DLB Consortium. Neurology. 2017;89(1):88–100.
