Abstract
Introduction
Artificial intelligence (AI) systems leveraging speech and language changes could support timely detection of Alzheimer's disease (AD).
Methods
The AMYPRED study (NCT04828122) recruited 133 subjects with an established amyloid beta (Aβ) biomarker (66 Aβ+, 67 Aβ–) and clinical status (71 cognitively unimpaired [CU], 62 mild cognitive impairment [MCI] or mild AD). Daily story recall tasks were administered via smartphones and analyzed with an AI system to predict MCI/mild AD and Aβ positivity.
Results
Eighty‐six percent of participants (115/133) completed remote assessments. The AI system predicted MCI/mild AD (area under the curve [AUC] = 0.85 ± 0.07) but not Aβ (AUC = 0.62 ± 0.11) in the full sample, and predicted Aβ in clinical subsamples (MCI/mild AD: AUC = 0.78 ± 0.14; CU: AUC = 0.74 ± 0.13) on short story variants (immediate recall). Long stories and delayed retellings delivered broadly similar results.
Discussion
Speech‐based testing offers simple and accessible screening for early‐stage AD.
Keywords: Alzheimer's disease, artificial intelligence, clinical assessment, clinical screening, deep learning, diagnostics, digital health, episodic memory, language, machine learning, mild cognitive impairment, remote, speech
1. BACKGROUND
Pathological changes in Alzheimer's disease (AD) begin years before symptoms of dementia or early clinical stages of mild cognitive impairment (MCI), and up to decades before diagnosis. 1 Clinical trials targeting the earliest stages of AD typically rely on measuring amyloid beta (Aβ) biomarkers using positron emission tomography (PET) or in cerebrospinal fluid (CSF) obtained from a lumbar puncture. The high cost and/or invasive nature of these procedures restricts use in standard clinical care and broader population screening. Blood plasma biomarkers hold promise for reducing screening costs but remain invasive and do not differentiate clinical stages of the disease. 2
More importantly, cognitively unimpaired individuals with biomarker evidence of amyloid beta pathology will not always develop clinical manifestations in their lifetime. Individuals with Aβ positivity should therefore only be considered at risk for progression to AD, and the diagnosis of AD should be reserved for those with additional evidence of an AD cognitive phenotype according to clinical diagnostic standards. 3 Cognitive testing is thus crucial for an early diagnosis. Cognitive tests have been supported for use as endpoints of treatment efficacy early in the AD continuum by regulatory bodies. 4 , 5 However, traditional cognitive tests typically require significant qualified staff time to administer and score. In the case of amyloid positive asymptomatic subjects, only subtle reductions in cognitive function or longitudinal change are observable. 6 , 7
Cognitive test results often reflect simple indices of response accuracy or recall, ignoring differences in the content, structure, and delivery of patients’ responses to tasks. For episodic memory tests, such as tests of story recall, test performance does not typically differ between clinically unimpaired Aβ+ and Aβ– individuals, but differences can be seen in the recall of proper nouns, 8 and the serial position of elements recalled. 9 Later in the disease course, differences are seen in rates of verbatim or paraphrased recall, 10 language density, and pauses. 11
There is a need for cognitive screening tools allowing fast and frequent assessment of the at‐risk population. Speech data collected on ubiquitous digital devices represents an excellent candidate for this goal. Verbal memory tasks can be scored automatically using natural language processing technologies, 12 and augmented with acoustic and linguistic measures to further improve detection. 11 Recent methods in artificial intelligence (AI) enable extraction of more information‐dense patterns from text and audio data. 13 , 14 We hypothesize that these could form the basis of speech biomarkers sensitive to earlier disease stages, possibly before overt cognitive decline (asymptomatic at‐risk individuals).
Using speech elicited from a remotely self‐administered task, the Automatic Story Recall Task (ASRT), we aim to develop an AI‐based system to (1) differentiate Aβ+ and Aβ– subjects; and (2) differentiate those with and without MCI. The potential use case for the ASRT system includes initial clinical screening to detect MCI and subtle signs of cognitive decline in amyloid‐confirmed asymptomatic subjects (preclinical AD). We examine the performance of the ASRT system index test compared to current standard of care in primary care referrals for MCI using a simulation approach.
RESEARCH IN CONTEXT
Systematic Review: The authors reviewed the literature using traditional (e.g., PubMed) sources, meeting abstracts, and presentations. Speech and language changes are reported in Alzheimer's disease (AD) dementia, typically documented in subjects in the (more progressed) dementia stages and without biomarker confirmation of AD.
Interpretation: Our study uses an artificial intelligence (AI) system designed to evaluate paraphrase, applied to recorded speech collected from a story recall task administered remotely via participants’ mobile phones. The AI system differentiates amyloid beta positive and amyloid beta negative subjects, and subjects with mild cognitive impairment or mild AD from cognitively healthy subjects. Transcription and analyses are automated.
Future Directions: Our study builds on this early work to show the clinical utility and feasibility of speech‐based AI systems for the detection of AD in its earliest stages. Results demonstrate the use case and promise of speech‐based AI systems for clinical practice.
2. METHODS
2.1. Study design
The AMYPRED study (NCT04828122) is a prospective study with data collection planned before the index test was performed. The “index test” refers to the test under investigation, in this case, the ASRT AI‐based system. The study uses a 2 × 2 cross‐sectional design, combining amyloid status (Aβ+ and Aβ–) and clinical status (cognitively unimpaired [CU] and MCI/mild AD). The study design is summarized in Figure 1, showing how the design allows for the investigation of amyloid and MCI in the full sample, and amyloid positivity within MCI and CU subsamples. Reference standards for Aβ positivity and clinical status were established prior to recruitment into the study. The ASRT system index test results were therefore not available to the assessors of the reference standard. Primary outcomes were assessed using tournament leave‐pair‐out cross‐validation analysis. 15 For each iteration, reference standards (MCI and Aβ labels) were available for training but not for test data.
FIGURE 1.

Study design. 2 × 2 cross‐sectional design, combining amyloid status (Aβ+ and Aβ–) and clinical status (CU and MCI/mild AD). Planned comparisons presented in italics: design allows for the investigation of amyloid positivity in the full sample, MCI in the full sample, and amyloid positivity within MCI and CU subsamples. Aβ+, amyloid beta positive; Aβ–, amyloid beta negative; AD, Alzheimer's disease; CU, cognitively unimpaired; MCI, mild cognitive impairment
2.2. Participants
Participants were a convenience sample recruited from trial participant registries in three UK sites (London/Guildford, Plymouth, and Birmingham) between November 2020 and July 2021. Subjects were approached if they had undergone a prior Aβ PET scan or CSF test (confirmed Aβ– within 30 months or Aβ+ within 60 months) and were CU or diagnosed with MCI or mild AD in the previous 5 years. MCI due to AD and mild AD diagnoses were made following the National Institute on Aging–Alzheimer's Association core clinical criteria. 16 Participants were recruited from registries for recent trials, and as such references for diagnostic and amyloid measures (and thresholds adopted) varied as a function of differences between prior trials from which participants were recruited.
Potential participants were screened via video conferencing, during which the Mini‐Mental State Examination (MMSE) 17 was administered. Inclusion criteria comprised: age 50 to 85; MMSE raw score of 23 to 30 for participants with MCI/mild AD, 26 to 30 for CU; clinical diagnosis made in previous 5 years for participants with MCI/mild AD; English as a first language; availability of a caregiver or close associate to support completing the Clinical Dementia Rating scale semi‐structured interview; 18 ability to use and access to a smartphone (Android 7 or above or iOS 11 or above); and access to the internet on a personal computer, notebook, or tablet (supported operating systems and internet browser software documented in supporting information).
Exclusions comprised: current diagnosis of general anxiety disorder or major depressive disorder; recent (6‐month) history of unstable psychiatric illness; history of stroke within the past 2 years or transient ischemic attack or unexplained loss of consciousness in the last 12 months. Participants taking medications for AD symptoms were required to be on a stable dose for at least 8 weeks.
2.3. Study assessments
Participants underwent a clinical assessment via video call, followed by optional remote assessments daily using their personal digital devices for 7 to 8 days. In response to participant feedback about high burden, the remote assessment schedule was changed partway through the study: briefer assessments were favored, and the remote assessment period was extended from 7 to 8 days to spread out assessments. Full details are provided in Skirrow et al. 19 In the current study, the index test (ASRT system) is derived from data collected during remote, smartphone‐based assessments.
2.3.1. Telemedicine assessments
Cognitive tests from the Preclinical Alzheimer's Cognitive Composite with semantic processing (PACC5) were administered and the mean z‐score was calculated as previously described. 6 The composite includes summary scores from five measures: (1) the MMSE, 17 a global cognitive screening test; (2) the Logical Memory Delayed Recall, 20 a story recall test after a 30‐minute delay from initial presentation; (3) Digit Symbol Coding, 21 a symbol substitution test; (4) the sum of free + total recall items from the Free and Cued Selective Reminding Test, 22 a multimodal associative memory test; and (5) Category Fluency (animals, vegetables, fruits), a semantic memory test.
The Clinical Dementia Rating scale, 18 assessing the severity of cognitive symptoms of dementia, was completed by experienced research staff and scored to deliver a global score (CDR‐G). Modifications to enable remote assessment during the SARS‐CoV‐2 pandemic are detailed in Table S1 in supporting information.
2.3.2. Remote assessments
During telemedicine assessments, participants were supported to install the Novoic mobile application on their own smartphones. They were encouraged to complete optional unsupervised self‐assessments daily for the following 7 to 8 days. Remote self‐assessments included ASRTs. ASRTs have 18 short and 18 long story variants (mean of 119 and 224 words per story, and stimulus durations of approximately 1 minute and 1 minute 40 seconds, respectively). A more detailed breakdown of story structure and balancing is provided in Skirrow et al. 19 ASRTs were administered in triplets (three stories administered consecutively each day). The self‐assessment schedule is provided in Table S2 in supporting information.
Participants were instructed to listen to pre‐recorded ASRTs and retell the stories in as much detail as they could remember, immediately after the presentation of each story and after a delay. Task responses were recorded on the app and automatically uploaded to a secure server. In response to feedback from the first participants taking part in the study, most participants were given a handout of the assessment schedule with written instructions specifying that they should not take notes or accept help from anyone else when completing the task.
2.4. Sample size determination
Power calculations completed using the pROC package in R, with power specified at 80% and significance set at 0.05, indicated that an area under the curve (AUC) as low as 0.67 would be detectable with N = 40 participants in each group. To detect a minimally clinically useful AUC of 0.75, 23 N = 40 participants in each group would provide 99% power, while N = 20 participants in each group would provide 82% power.
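The power calculation described above can be approximated in a short script. The sketch below is illustrative rather than a reproduction of the pROC method: it uses the Hanley and McNeil (1982) variance approximation for an AUC and a two‐sided z‐test against AUC = 0.5, which comes close to the power figures reported above.

```python
import math

def auc_variance(auc, n_pos, n_neg):
    # Hanley & McNeil (1982) variance approximation for an estimated AUC
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    return (auc * (1 - auc)
            + (n_pos - 1) * (q1 - auc ** 2)
            + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)

def normal_cdf(z):
    # Standard normal cumulative distribution function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def auc_power(auc_alt, n_per_group, z_crit=1.959964):
    # Power of a two-sided z-test of H0: AUC = 0.5 vs. H1: AUC = auc_alt,
    # with equal group sizes; z_crit corresponds to alpha = .05
    se0 = math.sqrt(auc_variance(0.5, n_per_group, n_per_group))
    se1 = math.sqrt(auc_variance(auc_alt, n_per_group, n_per_group))
    return normal_cdf((auc_alt - 0.5 - z_crit * se0) / se1)
```

Under these assumptions, `auc_power(0.75, 40)` evaluates to roughly 0.99 and `auc_power(0.75, 20)` to roughly 0.82, in line with the figures quoted in the text.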
2.5. Outcome measures
Key outcome measures included the ASRT system index test result, identifying: (1) Aβ positivity in the full sample, (2) MCI in the full sample, (3) Aβ positivity in MCI/mild AD, (4) Aβ positivity in the CU subsample. Diagnostic accuracy was established through comparison to PET or CSF Aβ status and clinical diagnosis established in prior trials. Furthermore, an AI‐based continuous measure predicting PACC5 scores from ASRTs was derived.
Short ASRT triplets (immediate recall, automatically transcribed) were primary measures of interest. These were experienced as a lower burden by participants, yielding higher compliance and a greater number of data points for model training and analysis. Long ASRT stories and delayed recall were also examined.
2.6. Ethics statement
This study was approved by institutional review at the West Midlands Health Research Authority (UK REC reference: 20/WM/0116). Informed consent was taken electronically in accordance with Health Research Authority guidelines.
2.7. Overview of the ASRT system
The ASRT system was based on the “edit encoder” of the ParaBLEU model, 24 the state of the art for paraphrase evaluation. Given two input texts, the edit encoder outputs a vector‐based representation of the abstract, generalized patterns that differ between them. On an established paraphrase quality benchmark, models using ParaBLEU numerical representations correlate more strongly with human judgements than other existing metrics. 24 Differing from the standard ParaBLEU setup, the model was pretrained with longer paraphrase examples to mirror the length of source‐retelling pairs, and without the entailment component of the loss function as entailment labels were unavailable for the updated pretraining dataset. The base model of the edit encoder used a pretrained Longformer model, 25 to accommodate longer texts.
2.8. Statistical analysis
2.8.1. AI system application
Although adherence varied across participants, ASRTs have high parallel forms reliability, and only modest improvement on repeated administration of the ASRT task, indicating that different story triplets can be substituted for one another. 19
Responses were transcribed using Google's Speech‐to‐Text 26 automatic speech recognition system, as well as manually following a standardized procedure, which specified verbatim transcription of commentary, filled pauses, and partial words. Analyses were completed in Python, using a proprietary framework built using PyTorch. Analyses were completed only with the text from the transcribed retellings. Vocal and acoustic data were not examined.
The word error rate (WER) of the automatic transcript was calculated using the HuggingFace package 27 as the average number of errors per word in the manual transcript. This was calculated after removing punctuation, setting all text characters to lowercase, and removing filled pauses and partial words from transcripts prior to comparison.
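As an illustration of this metric, WER can be computed with a word‐level Levenshtein distance. The snippet below is a minimal stand‐in for the HuggingFace implementation, with a simplified `normalize` helper; filled‐pause and partial‐word removal, which would require a word list, is omitted.

```python
import string

def normalize(text):
    # Mirror the preprocessing described above: strip punctuation, lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.lower()

def word_error_rate(reference, hypothesis):
    # Levenshtein distance over words (substitutions + insertions + deletions),
    # divided by the number of words in the reference (manual) transcript
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate(normalize("The cat sat."), normalize("the cat sat"))` is 0 after normalization, because only punctuation and casing differ.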
For each retelling, two output vectors were derived, based on non‐redundant differences between the target (story text) and retelling (target→retelling and retelling→target) as represented by the ParaBLEU model. Classifiers were trained using logistic regression models with the sklearn package in Python to predict pairs of labels (MCI/mild AD or CU; Aβ+ or Aβ–) with tournament leave‐pair‐out cross‐validation analysis (TLPO), a form of cross‐validation used to estimate the model performance on unseen data. Research has shown that leave‐pair‐out cross‐validation has robust performance relative to other cross‐validation approaches, and limited bias. 15 In TLPO, every possible pair of data points is held out in turn while the model is trained using all other data points. The AUC estimate is calculated by ranking the data points according to the model's predictions.
Cross‐validation was performed at the participant level, such that the train and test sets for each fold had no overlapping participants. Predictions in the test fold were performed on one ASRT triplet per participant (comprising six retellings: three immediate and three delayed), selected at random from the set of ASRT triplets completed over the remote assessment period. In each fold, models were trained for each of the two derived vectors and the predictions were ensembled by simple averaging. The training set for each test fold comprised all ASRTs from all other participants. Long and short triplets, immediate and delayed recall, and automatic and manual transcription were examined separately.
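A simplified sketch of this cross‐validation scheme is shown below. It implements plain leave‐pair‐out at the participant level, holding out one positive and one negative participant per fold and counting correctly ranked pairs, rather than the full tournament ranking procedure, and it averages per‐recording probabilities into a participant‐level score; function and variable names are illustrative.

```python
import numpy as np
from itertools import product
from sklearn.linear_model import LogisticRegression

def leave_pair_out_auc(X, y, groups):
    # For every (positive, negative) participant pair, train on all other
    # participants and check whether the positive participant's ensembled
    # score exceeds the negative participant's (pairwise AUC estimate).
    X, y, groups = np.asarray(X), np.asarray(y), np.asarray(groups)
    pos = [g for g in np.unique(groups) if y[groups == g][0] == 1]
    neg = [g for g in np.unique(groups) if y[groups == g][0] == 0]
    wins, ties, total = 0, 0, 0
    for gp, gn in product(pos, neg):
        train = ~np.isin(groups, [gp, gn])  # no participant overlap with test
        clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        # average recording-level probabilities into a participant-level score
        sp = clf.predict_proba(X[groups == gp])[:, 1].mean()
        sn = clf.predict_proba(X[groups == gn])[:, 1].mean()
        wins += sp > sn
        ties += sp == sn
        total += 1
    return (wins + 0.5 * ties) / total
```

Because splits are made on participant identifiers, no participant's recordings appear in both the train and test sides of any fold.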
2.8.2. Clinical and biomarker discrimination of models
Story‐level predictions were ensembled for each participant across the three stories to create participant‐level predictions. These were used to create a ranking for receiver operating characteristic (ROC) curve analysis. The ASRT system was compared to (1) a demographic comparison (comprising age, sex, and years of education) and (2) the PACC5 z‐score. For one CU participant, missing data for years in education was replaced with the group median. The demographic comparison model was trained using the participant information as input to a logistic regression model using an identical setup to the models trained on top of the ParaBLEU output vectors. PACC5, for which the input was a single score, was analyzed within the TLPO framework but using the score directly.
Predictions were assessed by the AUC; and sensitivity, specificity, and Cohen's kappa at Youden's index for the test result compared to reference standards. Statistical significance of differences between AUCs and 95% confidence intervals for AUCs were computed using DeLong's method. 28
2.8.3. Correlation with demographic variables
The relationships between demographic variables (age, sex, years of education) and ASRT system predictions with automatic transcription were evaluated. Non‐parametric statistics were adopted due to the non‐normal distribution of ASRT system predictions.
2.8.4. PACC5 prediction
PACC5 z‐scores were predicted from ASRT speech samples, trained with leave‐one‐out cross‐validation using ridge regression models with polynomial kernels. The Pearson correlation coefficient between predicted and actual PACC5 z‐scores was computed.
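A sketch of this regression setup, assuming scikit‐learn and illustrative hyperparameters (the kernel degree and regularization strength used in the study are not reported):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def predict_composite_scores(X, y, degree=2, alpha=1.0):
    # Ridge regression with a polynomial kernel; leave-one-out
    # cross-validation yields one out-of-sample prediction per participant
    model = KernelRidge(kernel="poly", degree=degree, alpha=alpha)
    preds = cross_val_predict(model, X, y, cv=LeaveOneOut())
    r, _ = pearsonr(preds, y)  # correlation of predicted vs. actual scores
    return preds, r
```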
2.8.5. Screening simulation
Screening for MCI was simulated in a hypothetical age 65+ sample (N = 1000) with proportional representation of each age group representative of the US population, 29 and MCI prevalence estimates by age from prior meta‐analysis. 30 The ASRT system's (short stories, immediate recall, automatic transcription) sensitivity and specificity within the sample was determined at Youden's index, and compared to the reported sensitivity (Sn = 50.0%) and specificity (Sp = 66.0%) of physician subjective judgment 31 and the pooled sensitivity (Sn = 62.7%) and specificity (Sp = 63.3%) of the MMSE for detecting MCI reported in prior meta‐analysis. 32 Methods are described further in supporting information.
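The referral arithmetic behind this comparison can be sketched as follows, using a single pooled prevalence for illustration rather than the age‐stratified weighting described above; the 15.4% prevalence and operating points are taken from the text, while function names are illustrative.

```python
def referral_counts(prevalence, sensitivity, specificity, n=1000):
    # Expected referrals per n screened individuals: correct referrals are
    # true positives, incorrect referrals are false positives
    n_mci = prevalence * n
    correct = sensitivity * n_mci
    incorrect = (1 - specificity) * (n - n_mci)
    return correct, incorrect

# Operating points for the two comparators cited above
physician = referral_counts(0.154, 0.500, 0.660)
mmse = referral_counts(0.154, 0.627, 0.633)
```

With these inputs, unassisted physician judgment yields 77 correct and about 288 incorrect referrals per 1000 screened; comparing any two tools' counts gives percentage changes of the kind reported in the Results.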
3. RESULTS
3.1. Participants
One hundred and thirty‐three participants were recruited and completed study visits via video call; Aβ status was confirmed by PET scan for 86.5% (115/133), via CSF for 10.5% (14/133), and Aβ biomarker source data were missing for 3% (4/133). The MCI/mild AD participant group comprised primarily MCI participants, with 10 individuals (20.4%) having a diagnosis of mild AD.
At least one complete optional remote self‐assessment was completed by 86.5% (115/133, Figure 2). For those who engaged in at least one optional remote self‐assessment, overall engagement with daily testing was high (adherence to daily tests 78% in CU and 65% in MCI/mild AD 19 ). No adverse events were reported during self‐assessments. At least one full short ASRT triplet was completed by 105 participants, and 98 participants completed at least one full long ASRT triplet (immediate and delayed recall).
FIGURE 2.

Patient and reference standard selection. A, Participant inclusion criteria: participants were included based on prior amyloid status and clinical diagnosis confirmation. B, Participant flow diagram, documenting exclusions and dropouts during study recruitment. AD, Alzheimer's disease; CSF, cerebrospinal fluid; CU, cognitively unimpaired; MCI, mild cognitive impairment; MMSE, Mini‐Mental State Examination; N, number; PET, positron emission tomography
Those who did not complete remote assessments were more commonly diagnosed with MCI/mild AD (χ2 = 5.49, P = .02) and had higher CDR‐G scores (r = –0.20, P = .04), indicating that participants who did not engage in remote assessments were more cognitively impaired than those who completed remote assessments. However, they did not differ in age (r = –0.15, P = .12), education level (r = –0.005, P = .96), male/female ratio (χ2 = 0.004, P = .95), Aβ+/Aβ– ratio (χ2 = 0.96, P = .33), or MMSE score (r = –0.15, P = .11).
Demographics in the remote assessment sample (for subgroup and full sample analyses) are shown in Table 1. This shows no clear differences between research subgroups, and clinical and biomarker groups, indicating that the groups are adequately matched. Demographics for the entire sample and by short and long ASRT training sets are given in Tables S3‐S5 in supporting information.
TABLE 1.
Participant demographic and clinical characteristics
| | Group 1 (N = 22) | Group 2 (N = 27) | Group 3 (N = 34) | Group 4 (N = 32) | P‐value | CU (N = 66) | MCI/mild AD (N = 49) | P‐value | Aβ– (N = 59) | Aβ+ (N = 56) | P‐value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Amyloid status (N) | Positive | Negative | Positive | Negative | – | 34 Aβ+/32 Aβ– | 22 Aβ+/27 Aβ– | .48 | Negative | Positive | – |
| Clinical status (N) | MCI/mild AD | MCI/mild AD | CU | CU | – | CU | MCI/mild AD | – | 27 MCI/32 CU | 22 MCI/34 CU | .48 |
| Female/male (N) | 7/15 | 16/11 | 21/13 | 19/13 | .12 | 40/26 | 23/26 | .15 | 35/24 | 28/28 | .32 |
| Years of education, mean (SD) | 15.05 (3.32) | 15.08 (2.92) | 14.97 (3.77) | 15.41 (3.35) | .98 | 15.18 (3.55) | 15.06 (3.08) | .78 | 15.26 (3.14) | 15.00 (3.57) | .78 |
| Age, mean (SD) | 71.00 (5.83) | 67.22 (7.95) | 70.44 (4.18) | 69.84 (3.78) | .38 | 70.15 (3.97) | 68.92 (7.26) | .74 | 68.64 (6.14) | 70.66 (4.85) | .13 |
| MMSE, mean (SD) | 27.24 A (1.64) | 27.41 B (2.02) | 29.24 C (1.05) | 28.81 D (1.11) | <.001 AC, AD, BC, BD | 29.03 (1.09) | 27.33 (1.85) | <.001 | 28.16 (1.74) | 28.47 (1.62) | .25 |
| CDR‐G, mean (SD) | 0.52 A (0.11) | 0.50 B (0.14) | 0.08 C (0.18) | 0.10 D (0.20) | <.001 AC, AD, BC, BD | 0.09 (0.19) | 0.51 (0.12) | <.001 | 0.28 (0.27) | 0.25 (0.27) | .54 |
Notes: Demographic and clinical characteristics shown by research groupings 1–4, and summary statistics for participants characterized by clinical diagnostic or biomarker profiles. Group 1, amyloid beta positive MCI/mild AD; Group 2, amyloid beta negative MCI/mild AD; Group 3, amyloid beta positive cognitively unimpaired; Group 4, amyloid beta negative cognitively unimpaired.
Abbreviations: AD, Alzheimer's disease; CDR‐G, Global Clinical Dementia Rating; CU, cognitively unimpaired; MCI, mild cognitive impairment; MMSE, Mini‐Mental State Examination; N, number; SD, standard deviation.
3.2. ASRT system application
Short story triplets at immediate recall yielded on average 2.7 minutes of speech data per participant for model training and development. For short stories at immediate recall, Aβ classification in the full sample was no better than chance across the ASRT system and the two comparison analyses (Figure 3A). Within the MCI (Figure 3C) and CU (Figure 3D) subsamples, the ASRT system AUC for Aβ detection was 0.78 and 0.74, respectively, indicating a stronger Aβ signal within more homogeneous groups, with confidence intervals not crossing 0.5 (better‐than‐random performance). For Aβ detection in the CU group, the ASRT system was the only predictor to perform above chance level (Figure 3D). MCI classification using the ASRT system in the full sample yielded an AUC of 0.85 (Figure 3B). Results show that the ASRT system produces good classifications (above or nearing the AUC = 0.75 clinically useful range 23 ) for MCI, and for amyloid in the CU and MCI subgroups.
FIGURE 3.

ROC curves for the ASRT system and comparison models (short ASRTs, immediate recall). AUCs for the classifiers predicting: (A) amyloid positivity, (B) MCI/mild AD in the full sample. Subsample comparisons of classifier performance predicting (C) amyloid positivity within the MCI/mild AD subsample; and (D) amyloid positivity in the CU sample. The table below each figure provides sensitivity (Sn) and specificity (Sp) at Youden's index, and Cohen's kappa (Cohen's K) measures. The reference test was biomarker confirmation on PET or CSF for (A), (C), and (D). Reference test was clinical diagnosis for (B). The demographic comparison includes age, sex, and education level. AD, Alzheimer's disease; ASRT, Automatic Story Recall Task; AUC, area under the curve; CSF, cerebrospinal fluid; CU, cognitively unimpaired; MCI, mild cognitive impairment; PACC5, Preclinical Alzheimer's Cognitive Composite with semantic processing; PET, positron emission tomography; ROC, receiver operating characteristic.
Average WER across participant recordings for automatic transcripts compared to manual transcripts was 0.16. Overall, ASRT system performance differed little between automatic and manual transcription (Table 2, Figures S1‐S4 in supporting information). The only significant contrasts were seen in analyses of the full sample in immediate recall of short ASRTs, for which the AUC for Aβ in the whole sample was modestly higher in the automatically transcribed data (z = –2.07, P = .04), and in long ASRTs at delayed recall, for which the AUC for MCI was modestly higher for manually transcribed data (z = 2.08, P = .04).
TABLE 2.
Area under the curve (AUC ± 95% confidence intervals) for remote ASRT system variants and demographic comparison
| Task type | Delay type | Model | (A) Full sample: amyloid beta | (B) Full sample: MCI/mild AD | (C) Amyloid beta in MCI subsample | (D) Amyloid beta in CU subsample |
|---|---|---|---|---|---|---|
| Short ASRT | – | (Sample size) | (N = 105) | (N = 105) | (N = 46) | (N = 59) |
| | – | Demographic comparison | 0.58 ± 0.11 | 0.55 ± 0.11 | 0.66 ± 0.17 | 0.38 ± 0.15 |
| | Immediate | ASRT system (ASR) | 0.62 ± 0.11 A | 0.85 ± 0.07*** | 0.78 ± 0.14 | 0.74 ± 0.13** |
| | Immediate | ASRT system (manual) | 0.55 ± 0.11 A | 0.85 ± 0.08*** | 0.70 ± 0.15 | 0.67 ± 0.14** |
| | Delayed | ASRT system (ASR) | 0.60 ± 0.11 | 0.83 ± 0.08*** | 0.65 ± 0.17 | 0.71 ± 0.13** |
| | Delayed | ASRT system (manual) | 0.63 ± 0.11 | 0.85 ± 0.08*** | 0.65 ± 0.16 | 0.75 ± 0.13** |
| Long ASRT | – | (Sample size) | (N = 98) | (N = 98) | (N = 40) | (N = 58) |
| | – | Demographic comparison | 0.58 ± 0.11 | 0.57 ± 0.12 | 0.67 ± 0.18 | 0.37 ± 0.15 |
| | Immediate | ASRT system (ASR) | 0.60 ± 0.11 | 0.85 ± 0.07*** | 0.79 ± 0.14 | 0.43 ± 0.15 |
| | Immediate | ASRT system (manual) | 0.59 ± 0.11 | 0.84 ± 0.08*** | 0.84 ± 0.12 | 0.46 ± 0.15 |
| | Delayed | ASRT system (ASR) | 0.61 ± 0.11 | 0.84 ± 0.08*** A | 0.88 ± 0.11 | 0.55 ± 0.15 |
| | Delayed | ASRT system (manual) | 0.65 ± 0.11 | 0.88 ± 0.07*** A | 0.92 ± 0.09* | 0.55 ± 0.15 |
Notes: Comparison of performance of the ASRT system for classifying (A) amyloid beta positivity in the full sample, (B) MCI in the full sample, (C) amyloid beta positivity in the MCI/mild AD subsample, and (D) amyloid beta positivity in the CU subsample, using immediate and delayed recalls of short and long ASRT triplets as input. Difference between ASRT system and demographic comparison: *P < .05, **P < .01, ***P < .0001. A Difference between AUCs for manual and automatic transcripts at P < .05.
Abbreviations: AD, Alzheimer's disease; ASR, Automatic speech recognition—automatically transcribed; ASRT, Automatic Story Recall Task; CU, cognitively unimpaired; Manual, manually transcribed; MCI, mild cognitive impairment; N, number.
Results were broadly consistent for long ASRT stories and delayed recall (Table 2, Figures S1‐S4), with the exception of detection of Aβ in CU participants, which was better than random for the ASRT system for short stories only.
The demographic comparison performed consistently worse than the ASRT system (Figure 3, Table 2), with confidence intervals including chance level. Compared to the ASRT system, the PACC5 delivered slightly but non‐significantly higher AUCs for detecting MCI in the full sample (z = –1.27, P = .21), similar performance for Aβ status in MCI (z = 1.11, P = .27), and poorer performance for Aβ status in CU (Figure 3; z = 2.61, P = .009). Results show that the demographic comparison does not exceed chance‐level performance, while the ASRT system performs similarly to, or significantly better than (in the case of Aβ status in CU for short ASRT stories), a lengthy supervised test battery developed to detect cognitive changes in preclinical AD (PACC5).
3.3. Relationship of ASRT system prediction outputs with demographic variables
ASRT system results (automatic transcription) were evaluated in relation to demographic variables (age, years of education, and sex). Statistical associations were evaluated for all ASRT system prediction variants: amyloid prediction, MCI prediction, short and long stories, immediate and delayed recall. Modest or negligible correlation coefficients were seen, significant only for age and amyloid prediction from short immediate ASRT recall (rho = 0.22, P = .02), and years in education and MCI prediction using long delayed ASRT recall (rho = –0.19, P = .04). Sex differences were seen for MCI predictions from immediate recall of short (Wilcoxon test, r = –0.21, P = .02) and long ASRTs (r = –0.22, P = .02) only.
3.4. PACC5 prediction
The Pearson correlation between AI model predictions (short ASRT stories, immediate recall, automatic transcription) and actual PACC5 z‐scores was 0.74. This shows that the ASRT system can be used to derive continuous scores that correlate with a well‐established cognitive composite test.
3.5. Simulation of MCI screening in primary care
Screening for MCI in primary care was simulated in a hypothetical age 65+ sample (MCI prevalence 15.4%). We report on the simulated rate of referrals for further investigation for MCI (correct referrals: the proportion of referred individuals who do indeed have MCI; and incorrect referrals: the proportion of individuals referred who do not have MCI). Compared to unassisted physician judgement, 31 routine screening using the ASRT system (short stories, immediate recall) would increase correct referral rates in primary care by 56.0%, and reduce incorrect referrals by 26.5%; compared to screening via the MMSE 32 the ASRT system would increase correct referral from primary care by 24.4% while reducing incorrect referrals by 31.9%.
4. DISCUSSION
Recent systematic reviews describe changes in vocal and linguistic speech patterns in AD, primarily in cohorts with more progressed AD and without biomarker confirmation. 13 , 33 The current findings show changes in speech occurring earlier in the disease process, and modest but detectable differences in speech relating to changes associated with Aβ positivity.
For the lowest burden assessments (short stories, immediate recall, automatic transcription, and analysis pipeline), and testing on an average of only 2.7 minutes of speech, the ASRT system predicted MCI (AUC = 0.85) and Aβ positivity in MCI and CU participant groups (AUC = 0.78 and 0.74, respectively). Aβ predictions in the full sample were no better than random. This could be due to more subtle impairments associated with Aβ positivity, which may be obscured by broader changes seen accompanying MCI. The ASRT system consistently performed better than random, with MCI prediction consistently and significantly above the demographic comparison. The ASRT system also performed as well as, and in limited cases better than, a lengthy supervised test battery developed to detect cognitive changes in preclinical AD (PACC5).
ASRT system results were broadly consistent across manual and automatic transcription. MCI prediction was consistent for long and short ASRTs and for immediate and delayed recall. Aβ status prediction was less consistent across task variants and groups: predictions of Aβ status in MCI were above random for all task variants, but in the presence of a modestly elevated demographic comparison; in CU participants the ASRT system performed well for short, but not long, ASRTs. The reason for the differing results across task variants is unclear; it may indicate differential task-difficulty effects interacting with demographic and clinical-biomarker groupings. Further research is needed to test whether these discrepancies replicate in larger samples and to identify their cause.
ASRT system results are derived only from transcribed speech data and the original story source text. ParaBLEU is a deep learning model that outputs a vector‐based representation of the differences between two texts. Given the nature of the model it is difficult to establish specific quantitative and qualitative changes that the model is identifying because these are developed within neural networks and numerically abstracted and vectorized. However, it is possible to discuss the existing literature within which these findings arise and the likely sensitivity of the model to expected changes in speech.
Prior results show that task performance on the ASRT test, as measured by automated assessment of paraphrase-enabled recall, correlates moderately with performance on the Logical Memory Delayed Recall test, which is commonly used to evaluate episodic memory function. 19 Similarly, the current MCI results likely incorporate aspects of episodic memory (how much of the source text was retained and recollected), which is commonly impaired in MCI. However, the multifactorial output of the model allows it to simultaneously pick up other differences in the text pairs. These may include dysfluencies, pauses, repetitions, lexical and semantic content, and other linguistic changes noted as discrepant between individuals with MCI and healthy control individuals. 10 , 11 , 34 , 35 The model may also be sensitive to the recall of proper nouns, 8 the use of specific word classes, 36 and the serial position of elements recalled, 9 all of which have been found to be affected in individuals with amyloid positivity. During model training, the weights of these vectors (representing aspects of speech and language, and interactions between them) are tuned to optimally predict the outcome measure of interest (MCI or amyloid positivity).
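The final step described above, tuning weights over a vector representation to predict a binary outcome, can be illustrated with a simple logistic-regression head trained by gradient descent. Everything below is synthetic and schematic: the feature matrix stands in for ParaBLEU-style difference embeddings (the real embeddings, dimensionality, and training procedure are not specified here), and the labels stand in for clinical status.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for text-difference embeddings and binary labels.
n, d = 200, 16
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + rng.normal(scale=0.5, size=n) > 0).astype(float)

# Logistic-regression head: weights over embedding dimensions are tuned by
# gradient descent to predict the outcome (e.g., MCI vs. CU).
w = np.zeros(d)
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
    w -= lr * X.T @ (p - y) / n          # gradient of the logistic loss

accuracy = float(((1.0 / (1.0 + np.exp(-(X @ w))) > 0.5) == y).mean())
```

The point of the sketch is only the mechanism: a learned weighting of abstract speech-and-language features, rather than any single hand-crafted score, drives the prediction.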
In the context of potential improvements in outcomes through lifestyle and medical interventions, 37 and the availability of new amyloid-targeting medications, early detection of AD and a clear disease indication matter. However, in clinical practice, AD is not routinely screened for 38 and is underdiagnosed even at the dementia stage. 39 Compared to standard-of-care assessments for MCI, routine screening using the ASRT system could increase correct referrals by up to 56.0% and reduce incorrect referrals by up to 31.9%.
Speech assessments were unsupervised, self-administered, and analyzed with an automated pipeline. Remote, unsupervised testing can improve inclusivity, increase standardization, and provide access to more advanced testing without the staff time and expertise required for a full neuropsychological workup. Furthermore, speech-based AI models present a potentially attractive low-cost and low-burden screen for Aβ positivity, which could be used for recruitment into trials targeting amyloid in CU individuals. Combining the algorithm with other risk factors (e.g., age, apolipoprotein E genotype) could further increase discriminative power.
The prediction and simulation results indicate that this test could have a use case for clinical screening in primary care, before participants are referred for more in-depth neuropsychological and clinical review. Our results indicate that this remotely self-administered, AI-enhanced cognitive test predicts MCI better than a commonly used cognitive screening test (the MMSE) and physician subjective judgment, two common methods used to identify participants suitable for onward referral.
4.1. Limitations
We recruited participants with prior amyloid PET and CSF amyloid test results and clinical diagnoses. Because Aβ positivity increases with age, 40 some participants may have converted in the interim period. CSF and PET Aβ positivity are differentially associated with cognitive decline, suggesting that they may be optimally sensitive at different disease stages. 41 Similarly, variation in diagnostic criteria for MCI/mild AD between the trials from which participants were recruited is likely to have introduced variability in our diagnostic reference standards. False labels can impact the training of AI systems; improvements in model performance could be expected with concurrent and consistent reference standards.
Further evaluation of the ASRT system would now be of interest in relation to other disease biomarkers based on tau and neurodegeneration/neuronal injury in line with the A/T/N framework for AD. 42
Although uptake of optional remote assessment was high, non-completion was associated with greater clinical impairment as indexed by the CDR-G and was more common in MCI/mild AD participants. Usability data on the self-completed ASRT assessments are reported in Skirrow et al., 19 showing that participants encountered few technical problems and generally found the application easy to use and interesting. Nonetheless, remote, unsupervised cognitive assessments may be challenging for individuals with more progressed cognitive impairment. Additionally, unsupervised remote assessment leaves uncertainty about whether additional aids (taking notes, support from caregivers) were used during testing. In the current study participants were instructed not to use any memory aids, but for some individuals, supervised testing in clinic or via telemedicine may be more appropriate.
Test engagement varied from day to day and, as a result, our analyses included test results from different ASRT stimuli and testing days across different participants. Variability introduced by differences in the story stimuli themselves, and practice effects, may have affected sensitivity of the ASRT system. However, this approach allowed us to maximize the sample available, and enabled us to develop AI models which can be applied across a class of stimuli.
Our study limited speech evaluation to the textual analysis of transcribed retellings. However, speech is highly multidimensional, incorporating not only linguistic but also acoustic and temporal features, which could confer additional sensitivity to cognitive impairment and amyloid positivity. 33 Prior research indicates that approaches using acoustic speech input are also sensitive to dementia, albeit usually to a lesser extent than approaches using textual speech input. 43 Combining these different information modalities in speech could help to further augment classification sensitivity, 11 , 44 although this improvement is not always shown in practice. 45 Potential directions for future research in the AMYPRED sample include linguistic, temporal, and acoustic aspects of speech.
The ASRT system was developed and tested within a British English–speaking sample, selected to exclude concurrent neurological and mental health conditions. Validation is now needed in more clinically heterogeneous samples and across different accents and languages. Larger scale studies are needed to confirm and refine our results. We expect significant performance improvements in the ASRT system with a larger training dataset, increasing power to detect more subtle changes in speech patterns.
CONFLICTS OF INTEREST
Emil Fristed, Jack Weston, Marton Meszaros, Caroline Skirrow, Raphael Lenain, and Udeepa Meepegama are employees of Novoic Ltd. Emil Fristed, Jack Weston, Marton Meszaros, and Raphael Lenain are shareholders and Marton Meszaros, Udeepa Meepegama, and Caroline Skirrow are option holders in the company. Emil Fristed and Jack Weston are directors on the board of Novoic. Stefano Cappa has received speaker's fees from Roche and Biogen. Dag Aarsland has received research support and/or honoraria from Astra‐Zeneca, Lundbeck, Novartis Pharmaceuticals, Evonik, Roche Diagnostics, and GE Health, and served as paid consultant for H. Lundbeck, Eisai, Heptares, Mentis Cura, Eli Lilly, Cognetivity, Enterin, Acadia, and Biogen. Author disclosures are available in the supporting information.
Supporting information
Supporting Information
ACKNOWLEDGMENTS
We are extremely grateful to our participants who took part in the study and their families/caregivers who supported their participation. We also thank the study sites and their scientific and research team for recruitment, study coordination, conducting interviews, and data collection efforts. The study was funded by Novoic, a clinical late‐stage digital medtech company developing AI‐based speech biomarkers. The funder of the study provided financial support toward collection and analysis of the data and was involved in study design, data interpretation, and writing of the report.
Fristed E, Skirrow C, Meszaros M, et al. A remote speech‐based AI system to screen for early Alzheimer's disease via smartphones. Alzheimer's Dement. 2022;14:e12366. 10.1002/dad2.12366
REFERENCES
- 1. Villemagne VL, Burnham S, Bourgeat P, et al. Amyloid β deposition, neurodegeneration, and cognitive decline in sporadic Alzheimer's disease: a prospective cohort study. Lancet Neurol. 2013;12(4):357-367. doi:10.1016/S1474-4422(13)70044-9
- 2. Tosun D, Veitch D, Aisen P, et al. Detection of β-amyloid positivity in Alzheimer's disease neuroimaging initiative participants with demographics, cognition, MRI and plasma biomarkers. Brain Commun. 2021;3(2):fcab008. doi:10.1093/braincomms/fcab008
- 3. Dubois B, Villain N, Frisoni GB, et al. Clinical diagnosis of Alzheimer's disease: recommendations of the International Working Group. Lancet Neurol. 2021;20(4):484-496. doi:10.1016/S1474-4422(21)00066-1
- 4. Kozauer N, Katz R. Regulatory innovation and drug development for early-stage Alzheimer's disease. N Engl J Med. 2013;368(13):1169-1171. doi:10.1056/NEJMp1302513
- 5. European Medicines Agency. Guideline on the clinical investigation of medicines for the treatment of Alzheimer's disease. 2018. https://www.ema.europa.eu/en/documents/scientific‐guideline/guideline‐clinical‐investigation‐medicines‐treatment‐alzheimers‐disease‐revision‐2_en.pdf (accessed April 12, 2021)
- 6. Papp KV, Rentz DM, Orlovsky I, Sperling RA, Mormino EC. Optimizing the preclinical Alzheimer's cognitive composite with semantic processing: the PACC5. Alzheimers Dement (N Y). 2017;3(4):668-677. doi:10.1016/j.trci.2017.10.004
- 7. Baker JE, Lim YY, Pietrzak RH, et al. Cognitive impairment and decline in cognitively normal older adults with high amyloid-β: a meta-analysis. Alzheimers Dement (Amst). 2017;6:108-121. doi:10.1016/j.dadm.2016.09.002
- 8. Mueller KD, Koscik RL, Du L, et al. Proper names from story recall are associated with beta-amyloid in cognitively unimpaired adults at risk for Alzheimer's disease. Cortex. 2020;131:137-150. doi:10.1016/j.cortex.2020.07.008
- 9. Bruno D, Mueller KD, Betthauser T, et al. Serial position effects in the logical memory test: loss of primacy predicts amyloid positivity. J Neuropsychol. 2021;15(3):448-461. doi:10.1111/jnp.12235
- 10. Foldi NS. Getting the hang of it: preferential gist over verbatim story recall and the roles of attentional capacity and the episodic buffer in Alzheimer disease. J Int Neuropsychol Soc. 2011;17:69-79. doi:10.1017/S1355617710001165
- 11. Roark B, Mitchell M, Hosom JP, Hollingshead K, Kaye J. Spoken language derived measures for detecting mild cognitive impairment. IEEE Trans Audio Speech Lang Process. 2011;19(7):2081-2090. doi:10.1109/TASL.2011.2112351
- 12. Lehr M, Prud'hommeaux E, Shafran I, Roark B. Fully automated neuropsychological assessment for detecting mild cognitive impairment. INTERSPEECH. 2012.
- 13. de la Fuente Garcia S, Ritchie CW, Luz S. Artificial intelligence, speech, and language processing approaches to monitoring Alzheimer's disease: a systematic review. J Alzheimers Dis. 2020;78(4):1547-1574. doi:10.3233/JAD-200888
- 14. Weston J, Lenain R, Meepagama U, Fristed E. Learning de-identified representations of prosody from raw audio. Proc Mach Learn Res. 2021;139:11134-11145.
- 15. Airola A, Pahikkala T, Waegeman W, De Baets B, Salakoski T. An experimental comparison of cross-validation techniques for estimating the area under the ROC curve. Comput Stat Data Anal. 2011;55(4):1828-1844. doi:10.1016/j.csda.2010.11.018
- 16. Albert MS, DeKosky ST, Dickson D, et al. The diagnosis of mild cognitive impairment due to Alzheimer's disease: recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimers Dement. 2011;7(3):270-279. doi:10.1016/j.jalz.2011.03.008
- 17. Folstein MF, Folstein SE, McHugh PR. "Mini-mental state": a practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975;12(3):189-198.
- 18. Morris JC. The clinical dementia rating (CDR): current version and scoring rules. Neurology. 1993;43(11):2412-2414. doi:10.1212/wnl.43.11.2412-a
- 19. Skirrow C, Meszaros M, Meepegama U, et al. Validation of a remote and fully automated story recall task to assess for early cognitive impairment in older adults: a longitudinal case-control observational study. JMIR Aging. 2022;5(3):e37090. doi:10.2196/37090
- 20. Wechsler D, Stone CP. Wechsler Memory Scale-Revised. Psychological Corporation; 1987.
- 21. Wechsler D. WAIS-R Manual: Wechsler Adult Intelligence Scale-Revised. Psychological Corporation; 1981.
- 22. Grober E, Ocepek-Welikson K, Teresi J. The free and cued selective reminding test: evidence of psychometric adequacy. Psychol Sci Q. 2009;51(3):266-282.
- 23. Fan J, Upadhye S, Worster A. Understanding receiver operating characteristic (ROC) curves. CJEM. 2006;8(1):19-20. doi:10.1017/s1481803500013336
- 24. Weston J, Lenain R, Meepegama U, Fristed E. Generative pretraining for paraphrase evaluation. ACL. 2022;1:4052-4073.
- 25. Beltagy I, Peters ME, Cohan A. Longformer: the long-document transformer. arXiv preprint. 2020;arXiv:2004.05150.
- 26. Google Cloud. Speech-to-Text. https://cloud.google.com/speech‐to‐text (accessed June 7, 2022)
- 27. Lhoest Q, del Moral AV, Jernite Y, et al. Datasets: a community library for natural language processing. arXiv preprint. 2021;arXiv:2109.02846.
- 28. Sun X, Xu W. Fast implementation of DeLong's algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Process Lett. 2014;21(11):1389-1393. doi:10.1109/LSP.2014.2337313
- 29. Statista. Resident population of the United States by sex and age as of July 1, 2020 (in millions). 2021. https://www.statista.com/statistics/241488/population‐of‐the‐us‐by‐sex‐and‐age/ (accessed September 27, 2021)
- 30. Petersen RC, Lopez O, Armstrong MJ, et al. Practice guideline update summary: mild cognitive impairment: report of the guideline development, dissemination, and implementation subcommittee of the American Academy of Neurology. Neurology. 2018;90(3):126-135. doi:10.1212/WNL.0000000000004826
- 31. Tong T, Thokala P, McMillan B, Ghosh R, Brazier J. Cost effectiveness of using cognitive screening tests for detecting dementia and mild cognitive impairment in primary care. Int J Geriatr Psychiatry. 2017;32(12):1392-1400. doi:10.1002/gps.4626
- 32. Mitchell AJ. A meta-analysis of the accuracy of the mini-mental state examination in the detection of dementia and mild cognitive impairment. J Psychiatr Res. 2009;43(4):411-431. doi:10.1016/j.jpsychires.2008.04.014
- 33. Martínez-Nicolás I, Llorente TE, Martínez-Sánchez F, Meilán JJG. Ten years of research on automatic voice and speech analysis of people with Alzheimer's disease and mild cognitive impairment: a systematic review article. Front Psychol. 2021;12:620251. doi:10.3389/fpsyg.2021.620251
- 34. Chapin K, Clarke N, Garrard P, Hinzen W. A finer-grained linguistic profile of Alzheimer's disease and mild cognitive impairment. J Neurolinguistics. 2022;63:101069. doi:10.1016/j.jneuroling.2022.101069
- 35. Ahmed S, Haigh AM, de Jager CA, Garrard P. Connected speech as a marker of disease progression in autopsy-proven Alzheimer's disease. Brain. 2013;136(Pt 12):3727-3737. doi:10.1093/brain/awt269
- 36. Mueller KD, Van Hulle CA, Koscik RL, et al. Amyloid beta associations with connected speech in cognitively unimpaired adults. Alzheimers Dement (Amst). 2021;13(1):e12203. doi:10.1002/dad2.12203
- 37. Livingston G, Huntley J, Sommerlad A, et al. Dementia prevention, intervention, and care: 2020 report of the Lancet Commission. Lancet. 2020;396(10248):413-446. doi:10.1016/S0140-6736(20)30367-6
- 38. Alzheimer's Association. 2019 Alzheimer's facts and figures. https://www.alz.org/media/documents/alzheimers‐facts‐and‐figures‐2019‐r.pdf (accessed July 22, 2021)
- 39. Connolly A, Gaehl E, Martin H, Morris J, Purandare N. Underdiagnosis of dementia in primary care: variations in the observed prevalence and comparisons to the expected prevalence. Aging Ment Health. 2011;15(8):978-984. doi:10.1080/13607863.2011.596805
- 40. Jansen WJ, Ossenkoppele R, Knol DL, et al. Prevalence of cerebral amyloid pathology in persons without dementia: a meta-analysis. JAMA. 2015;313(19):1924-1938. doi:10.1001/jama.2015.4668
- 41. Guo T, Shaw LM, Trojanowski JQ, Jagust WJ, Landau SM, Alzheimer's Disease Neuroimaging Initiative. Association of CSF Aβ, amyloid PET, and cognition in cognitively unimpaired elderly adults. Neurology. 2020;95(15):e2075-e2085. doi:10.1212/WNL.0000000000010596
- 42. Jack CR, Bennett DA, Blennow K, et al. A/T/N: an unbiased descriptive classification scheme for Alzheimer's disease biomarkers. Neurology. 2016;87(5):539-547. doi:10.1212/WNL.0000000000002923
- 43. Ilias L, Askounis D. Multimodal deep learning models for detecting dementia from speech and transcripts. Front Aging Neurosci. 2022;14:830943. doi:10.3389/fnagi.2022.830943
- 44. Martinc M, Haider F, Pollak S, Luz S. Temporal integration of text transcripts and acoustic features for Alzheimer's diagnosis based on spontaneous speech. Front Aging Neurosci. 2021;13:642647. doi:10.3389/fnagi.2021.642647
- 45. Fraser KC, Lundholm Fors K, Eckerström M, Öhman F, Kokkinakis D. Predicting MCI status from multimodal language data using cascaded classifiers. Front Aging Neurosci. 2019;11:205. doi:10.3389/fnagi.2019.00205
