Abstract
Background:
The age distribution and diversity of the VA Million Veteran Program (MVP) cohort make it a valuable resource for studying the genetics of Alzheimer’s disease (AD) and related dementias (ADRD).
Objective:
We present and evaluate the performance of several ICD code-based classification algorithms for AD, ADRD, and dementia for use in MVP genetic studies and other studies using VA electronic medical record (EMR) data. These were benchmarked relative to existing ICD algorithms and AD-medication-identified cases.
Methods:
We used chart review of n=103 MVP participants to evaluate diagnostic utility of the algorithms. Suitability for genetic studies was examined by assessing association with APOE-ε4, the strongest genetic AD risk factor, in a large MVP cohort (n=286K).
Results:
The newly developed MVP-ADRD algorithm performed well, comparable to the existing PheCode dementia algorithm (Phe-Dementia) in terms of sensitivity (0.95 for both) and specificity (0.70 and 0.65, respectively). The strongest APOE-ε4 associations were observed in cases identified using MVP-ADRD and Phe-Dementia augmented with medication-identified cases (MVP-ADRD or medication, p=3.6×10⁻²⁹⁰; Phe-Dementia or medication, p=1.4×10⁻²⁹⁰). Performance improved when cases were restricted to those with onset age ≥60.
Conclusions:
We found that our MVP-developed ICD-based algorithms had good performance in chart review and generated strong genetic signals, especially after inclusion of medication-identified cases. Ultimately, our MVP-derived algorithms are likely to perform well in the broader VA, and they may also be suitable for use in other large-scale EMR-based biobanks in the absence of definitive biomarkers such as amyloid-PET and CSF biomarkers.
Keywords: Alzheimer’s disease, ADRD, military Veterans, phenotyping, algorithm development
INTRODUCTION
A major public health challenge is the rapidly increasing prevalence of Alzheimer’s disease (AD) and AD-related dementias (ADRD). In the United States, one current estimate of AD prevalence for individuals 65 and older is 6.7 million.1 With adults living longer and the average age in the U.S. increasing,2 the prevalence of AD and ADRD is expected to grow. This necessitates the curation of large, real-world datasets that will allow for comprehensive examinations of the various risk and protective factors for dementia. One source of such data is large-scale enterprise-level electronic medical record (EMR) systems, which can be utilized for epidemiological investigations. Examples of EMR systems include the Centers for Medicare & Medicaid Services’ (CMS) Chronic Conditions Warehouse (CCW), the EMR database incorporated into the National Institutes of Health All of Us Research Program, and the US Department of Veterans Affairs (VA) Corporate Data Warehouse (CDW).
Studies of EMR data often utilize International Classification of Diseases (ICD) codes for dementia case-control ascertainment.3–5 Of note, a variety of ICD code-only algorithms for dementia have already been developed that can be used in epidemiological research. For example, the CMS CCW has an established set of algorithms used to identify AD cases in Medicare and Medicaid data. PheCodes, or phenotype codes, are rules-based algorithms that combine similar ICD codes into a single, clinically meaningful phenotype.6 Additionally, a group published an algorithm that built on the PheCode algorithms and incorporated natural language processing (i.e., a technique that involves searching for and extracting information from the EMR, including key words and/or strings of text) to identify cases and controls for the purposes of large-scale phenome-wide association studies.7 This algorithm, referred to as the multimodal automated phenotyping (MAP) algorithm, produces a likelihood of being a case, and an optimal cutoff is determined based on the distribution of these likelihood estimates.7
In general, ICD code-based dementia classifiers have been shown to have reasonable performance in clinical settings, with positive predictive values (PPVs) that can vary widely depending on the data source (33–100%) but are often able to achieve PPVs of >75%.8 Requiring the presence of more than one ICD code for cases has been shown to improve the PPV.9 Furthermore, ICD code-based classifications are often used as a first step in training machine learning algorithms to identify important features in the EMR.3 Algorithms can also combine ICD codes with additional features from the EMR, including pharmacy data (e.g., whether the patient was prescribed AD medications).
The VA Million Veteran Program (MVP) is one of the largest and most genetically diverse biobanks in the world. Launched in 2011, MVP has recruited more than one million Veterans as of November 2023, ranging in age from 18 to over 100, with recruitment and enrollment ongoing.10 The mission of MVP is to examine how genes and environmental factors confer risk for medical illnesses and influence health outcomes in the Veteran population, with the ultimate goal of bringing personalized medicine to the forefront of VA healthcare.10,11 Importantly, nearly half of the MVP cohort is age 65 or older and hence at risk for AD and related forms of dementia. Indeed, the VA EMR has previously been used for epidemiological research on dementia (e.g.,3–5). However, there are some known limitations that must be addressed when using EMR data and ICD codes to study AD/ADRD, although these problems are not unique to VA. First, diagnoses of AD/ADRD are often entered into the EMR by clinicians who do not specialize in the assessment of dementia;12,13 thus, the VA EMR includes wide usage of non-specific dementia codes such as ICD9:294.21 (i.e., dementia, unspecified, with behavioral disturbance) in patients with AD, even in specialty clinics.12,13 Because of this issue, the inclusion of other AD-related traits and dementias can increase the ability to detect individuals with an AD diagnosis who would be missed if only AD-specific ICD codes were examined.14 Second, Veterans with AD have high rates of mixed dementias (e.g., vascular), which can make determining the primary etiology difficult. Third, only a small portion of clinically diagnosed AD cases have neuropathological confirmation or AD-biomarker data available in the EMR. Similarly, comprehensive neuropsychological assessment data are available for only a fraction of AD cases.
Fourth, earlier onset (< 60 years old) case identification is important but complicated by the fact that false positive rates are very high when dementia ICD code classifications are applied to younger patients in the VA EMR4 and other health systems.15 Given these known challenges, we sought to develop novel ICD code algorithms, tailored to MVP studies of AD and dementia. These algorithms are similar to the PheCode and CCW algorithms, but adapted for use in VA, and MVP in particular.
In this study, we present a chart review-based validation of a series of AD and dementia-related algorithms developed in MVP, and include comparisons to PheCode, CCW, and MAP algorithms for a spectrum of phenotypes from AD to all-cause dementia. Additionally, because genome-wide genotype data are available for MVP participants, MVP affords a unique opportunity to compare our algorithms to a strong genetic predictor of AD. Specifically, we evaluated the suitability of our algorithms for genetic studies by testing associations with the apolipoprotein E (APOE) ε4 isoform. APOE ε4 is the largest genetic determinant of the common “late onset” form of AD,16 although the effect of APOE ε4 varies substantially by ancestry.17 We hypothesized that our ADRD classification would be the best suited for AD-related dementia studies and for inclusion in large-scale epidemiological and genetically informative projects.
MATERIALS & METHODS
Cohort
The VA EMR, dating back to 1997,18 is the primary data source of phenotype data for MVP. MVP studies are provisioned tables including all ICD-9/10 codes, treatment codes, and medication/pharmacy data for all MVP participants. This study is based on the MVP v18_2 phenotype release which includes MVP participants enrolled prior to January 9, 2019 (n=698,352). All Veterans participating in MVP sign an informed consent form upon study enrollment. Two cohorts of MVP participants were examined as part of this study: the “Chart Review Cohort” and the “Genetic Cohort.” Details for each cohort are included below. The overarching MVP project was approved by the VA Central Institutional Review Board in 2010.
Chart Review Cohort.
Complete charts minus identifiers for 103 Veterans were provided to this study by the MVP Data Core in two batches. The first batch (n=20) was a random selection of participants from three MAP algorithm AD probability bins: low AD probability (≤10%, n=7), medium AD probability (50–60%, n=7), and high AD probability (~90%, n=6). The second batch (n=83) was a random selection of participants chosen to ensure that they had at least some level of evidence for mild cognitive impairment (MCI), AD, or dementia by requiring that the subject have at least 1 ICD code for AD (n=29) regardless of the other types of codes, at least 1 ICD code for MCI (n=26), or at least 1 ICD code for any form of dementia (n=28).
Genetic Cohort.
As the APOE effect varies by ancestry,17 we restricted these analyses to MVP participants with genetic data of European ancestry as determined using the harmonized ancestry and race/ethnicity (HARE) method.19 The control cohort was selected to be age 65+ with no dementia-related ICD codes, MCI ICD codes, or AD prescription medication (n=258,257). AD cases were identified from the remaining MVP cohort according to the different algorithms and/or medication usage, and n’s vary by algorithm (largest n=27,839 cases; see Table 4 for details). We did not initially filter cases by age of onset. However, given the unreliability previously noted in dementia ICD codes assigned to younger Veterans,4 we examined the possibility that the association between ε4 and case status would increase after restricting our cases to individuals who received their first dementia code at or after a given age. The association between case status and APOE ε4 was therefore re-evaluated after excluding subjects who had their first dementia ICD code before either of two thresholds: age 60 or age 65.
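The onset-age restriction described above can be sketched in a few lines. This is a minimal Python illustration, not the study's code: the function name `restrict_by_onset_age`, the participant IDs, and the ages are all hypothetical.

```python
from typing import Dict, List, Optional


def restrict_by_onset_age(
    first_code_age: Dict[str, float],
    min_onset_age: Optional[float] = None,
) -> List[str]:
    """Return case IDs whose first dementia ICD code was at or after min_onset_age.

    first_code_age maps a (hypothetical) participant ID to the age at the
    first dementia ICD code. With min_onset_age=None no filter is applied,
    mirroring the initial unfiltered analysis.
    """
    if min_onset_age is None:
        return sorted(first_code_age)
    return sorted(
        pid for pid, age in first_code_age.items() if age >= min_onset_age
    )


# Toy data invented for illustration only.
cases = {"P1": 58.2, "P2": 61.0, "P3": 66.5, "P4": 72.3}
unfiltered = restrict_by_onset_age(cases)
onset_60 = restrict_by_onset_age(cases, 60)
onset_65 = restrict_by_onset_age(cases, 65)
```

Controls are unaffected by this filter; only the case set shrinks as the threshold rises.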
Table 1.
ICD codes included in the different classification algorithms, including (1A) ICD-9 Codes and (1B) ICD-10 Codes. Blue represents AD-specific algorithms; green represents AD+ (an intermediate step between AD and ADRD) algorithms; yellow represents ADRD algorithms; and red represents dementia algorithms.
| Diagnostic Category | AD | Non-Specific Dementia | Related Dementia | Other Dementias | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (1A) ICD-9 Codes | 331.0 Alzheimer Disease | 294.20 Unspecified dementia without behavioral disturbance | 294.21 Unspecified dementia with behavioral disturbance | 294.8 Other persistent mental disorders | 290.0 Senile dementia uncomplicated | 290.20 Senile dementia with delusional features | 290.21 Senile dementia with depressive features | 290.3 Senile dementia with delirium | 331.2 Senile degeneration of the brain | 331.7 Cerebral degeneration | 290.40 Vascular dementia | 290.41 Vascular dementia | 290.42 Vascular dementia | 290.43 Vascular dementia | 331.82 Lewy Body Dementia | 331.1 Frontotemporal dementia | 331.19 Other Frontotemporal dementia | 290.10 Presenile Dementia | 290.11 Presenile dementia w/delirium | 290.12 Presenile delusion | 290.13 Presenile dementia w/depressive features | 331.5 Idiopathic normal pressure Hydrocephalus | 333.4 Huntington Disease | 332 Parkinson Disease | 331.11 Pick Disease of the Brain | 294.10 Dementia in conditions w/behavioral disturbance | 294.11 Dementia in conditions w/o behavioral disturbance | 797 Senility without mention of psychosis |
| MVP-AD | ||||||||||||||||||||||||||||
| CCW-AD | ||||||||||||||||||||||||||||
| Phe-AD | ||||||||||||||||||||||||||||
| MAP-AD | ||||||||||||||||||||||||||||
| MVP-AD+ | ||||||||||||||||||||||||||||
| MVP-ADRD | ||||||||||||||||||||||||||||
| CCW-ADRD | ||||||||||||||||||||||||||||
| MVP-Dementia | ||||||||||||||||||||||||||||
| Phe-Dementia | ||||||||||||||||||||||||||||
| MAP-Dementia | ||||||||||||||||||||||||||||
| Diagnostic Category | AD | Non-Spec Dem | Related Dementia | Other Dementias | Other Age-Related Codes | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (1B) ICD-10 Codes | G30.0 Alzheimer Disease Early Onset | G30.1 Alzheimer Disease | G30.8 Alzheimer Disease | G30.9 Alzheimer Disease | F03.90 Unspecified dementia w/o behavioral disturbance | F03.91 Unspecified dementia w/behavioral disturbance | F01.50 Vascular dementia | F01.51 Vascular dementia | G31.83 Lewy Body Dementia | G31.0 Frontotemporal dementia | G31.09 Other Frontotemporal dementia | G91.2 Idiopathic Normal Pressure Hydrocephalus | G10 Huntington Disease | G20 Parkinson Disease | A81.00 Creutzfeldt-Jakob Disease | G31.01 Picks Disease of the Brain | F10.96 Korsakoff Syndrome | F02.80 Dementia in other diseases w/behavioral disturbance | F02.81 Dementia in other diseases w/o behavioral disturbance | G31.1 Senile degeneration of the brain | F04 Amnestic disorder | F05 Delirium | F06.1 Catatonic disorder | F06.8 Other specified disorders due to known physiological | G13.8 Systemic atrophy | G31.2 Degeneration of nervous system due to alcohol | G94 Other disorders of brain | R41.81 Age-related cognitive decline | R54 Age-related physical debility |
| MVP-AD | |||||||||||||||||||||||||||||
| CCW-AD | |||||||||||||||||||||||||||||
| Phe-AD | |||||||||||||||||||||||||||||
| MAP-AD | |||||||||||||||||||||||||||||
| MVP-AD+ | |||||||||||||||||||||||||||||
| MVP-ADRD | |||||||||||||||||||||||||||||
| CCW-ADRD | |||||||||||||||||||||||||||||
| MVP-Dementia | |||||||||||||||||||||||||||||
| Phe-Dementia | |||||||||||||||||||||||||||||
| MAP-Dementia | |||||||||||||||||||||||||||||
MVP AD-Related Diagnostic Algorithms
As part of ongoing MVP projects (e.g.20–22), we developed four phenotypes related to AD and dementia based on ICD codes, adapted to the difficulties of identifying AD cases given the limitations of EMR data in general and the VA EMR in particular. From narrowest to broadest, our ICD code-based phenotypes were: (1) late-onset “AD,” restricted to cases age 65+; (2) “AD+,” which includes AD or other non-specific dementia codes (e.g., dementia without behavioral disturbance); (3) Alzheimer’s Disease and Related Dementias (“ADRD”); and (4) “Dementia.” To qualify as a case at any of these levels, we required the presence of two or more qualifying ICD codes on different dates. See Table 1 for a listing of the ICD codes included in each definition and the relationship to the other ICD code-based classifiers as described below. Our four diagnosis levels are nested by design, so that anyone who is classified as an AD case also meets criteria for AD+, ADRD, and Dementia. In the MVP-AD+, MVP-ADRD, and MVP-Dementia algorithms, we included the ICD code 294.8 (“Other persistent mental disorders/Other specified organic brain syndromes [chronic]”), as this has been used as a dementia diagnosis code within the VA system.13
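The two-code rule and the nesting of the definitions can be illustrated with a short Python sketch. It is not the MVP implementation: the function `is_case` is hypothetical, the code sets below are deliberately incomplete subsets chosen for illustration (Table 1 is the authoritative listing), and the age-65+ requirement of the AD definition is omitted for brevity.

```python
from datetime import date
from typing import Iterable, Set, Tuple


def is_case(
    records: Iterable[Tuple[str, date]],
    qualifying_codes: Set[str],
    min_distinct_dates: int = 2,
) -> bool:
    """True if qualifying ICD codes were recorded on enough distinct dates."""
    dates = {d for code, d in records if code in qualifying_codes}
    return len(dates) >= min_distinct_dates


# Illustrative, incomplete code sets (assumption); see Table 1 for the real ones.
AD_CODES = {"331.0", "G30.1", "G30.8", "G30.9"}
AD_PLUS_CODES = AD_CODES | {"294.20", "294.21", "294.8", "F03.90", "F03.91"}

# Toy patient: one AD code and two non-specific dementia codes on two dates.
patient = [
    ("331.0", date(2015, 3, 1)),
    ("294.20", date(2015, 3, 1)),
    ("294.21", date(2016, 7, 9)),
]
ad = is_case(patient, AD_CODES)            # AD codes on only one date
ad_plus = is_case(patient, AD_PLUS_CODES)  # qualifying codes on two dates
```

Because each broader code set is a superset of the narrower one, any record that satisfies the narrower definition automatically satisfies the broader ones, which is what makes the levels nested.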
Other Algorithms
For comparison, we evaluated the performance of AD and dementia-related algorithms from CCW (n=2); PheCode (n=2); and MAP (n=2). Additionally, we included another algorithm based on the presence of prescriptions for FDA-approved medications for the treatment of AD from the VA EMR. Implementation of these algorithms in the MVP cohort is described below.
Chronic Conditions Warehouse (CCW) Algorithms
In this study, we evaluated two existing dementia-related algorithms from CCW: (1) Alzheimer’s Disease (“CCW-AD,” revised 02/2022) and (2) Alzheimer’s Disease and Related Disorders or Senile Dementia (“CCW-ADRD,” revised 02/2022). Based on our observation that identification of CCW-AD and CCW-ADRD cases in other studies only requires one or more ICD codes, we also required only one ICD code (inpatient or outpatient) to be considered a case in the CCW algorithms.
PheCode Algorithms
We examined PheCode 290.11 (Alzheimer’s disease; referred to as “Phe-AD” hereafter) and PheCode 290.1 (Dementias; referred to as “Phe-Dementia” hereafter). See Table 1 for the ICD-9/10 codes required to be a Phe-AD and Phe-Dementia case. As is typical for many studies using PheCode diagnoses,23 we required two or more qualifying ICD codes on different dates to be identified as a Phe-AD or Phe-Dementia case.
Multimodal Automated Phenotyping (MAP) Algorithms
We examined MAP 290.11 (“MAP-AD”) and 290.1 (“MAP-Dementia”). Note that the ICD codes used for MAP-AD and MAP-Dementia are the same as those included in the Phe-AD and Phe-Dementia algorithms. We used the probability thresholds for case identification determined to be optimal by the MAP algorithm, specifically AD likelihood >0.22 and Dementia likelihood >0.24.
AD-Medication
We classified MVP participants as “medication” cases if the VA EMR included a prescription for a cholinesterase inhibitor (e.g., donepezil, galantamine, rivastigmine) or memantine, rather than on the basis of ICD codes (although these individuals were not excluded from the ICD code algorithms). We additionally evaluated the performance of case sets identified using MVP-ADRD and Phe-Dementia augmented with medication-identified cases (i.e., MVP-ADRD or medication; Phe-Dementia or medication).
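The “algorithm or medication” case sets amount to a logical OR of the two indicators. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, the drug list contains only the medications named in the text (the source introduces the cholinesterase inhibitors with “e.g.,” so the study's list may be longer), and prescriptions are assumed to be already normalized to generic drug names.

```python
from typing import Iterable

# Only the drugs named in the text; the study's full list may differ (assumption).
AD_MEDICATIONS = {"donepezil", "galantamine", "rivastigmine", "memantine"}


def medication_case(prescriptions: Iterable[str]) -> bool:
    """True if any prescription is one of the listed AD medications."""
    return any(rx.strip().lower() in AD_MEDICATIONS for rx in prescriptions)


def augmented_case(icd_case: bool, prescriptions: Iterable[str]) -> bool:
    """Case if the ICD algorithm flags the participant OR an AD drug was prescribed."""
    return icd_case or medication_case(prescriptions)


# Toy examples: an ICD-negative participant on donepezil becomes a case.
by_medication = augmented_case(False, ["Donepezil", "lisinopril"])
by_icd = augmented_case(True, [])
neither = augmented_case(False, ["lisinopril"])
```

Real pharmacy records would need brand-name and formulation normalization before a lookup like this is meaningful.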
Chart Review Procedures
The notes from the EMR were scrubbed of patient identifiers (names, birth dates, addresses) and provisioned to an MVP workspace. To detect relevant notes, only those with specific keywords related to AD and dementia were provisioned. See Supplementary Table 1 for a full list of the keywords. A Microsoft Access template was used to access the files and to record relevant information for the review (see “Review Template” in the Supplementary Materials). The template allowed the noting of specific evidence used to make the diagnostic determinations, including the presence of a neuropsychological assessment, presence of cognitive screening scores (e.g., Mini-Mental State Exam [MMSE], Montreal Cognitive Assessment [MoCA], St. Louis University Mental Status [SLUMS]), the results of brain neuroimaging scans, AD medications, and autopsy results.
Given the nature of the VA EMR, the evidence needed to determine whether a patient had AD or dementia with a high degree of certainty was not always available (e.g., AD-related biomarkers, neuropsychological testing, neuroimaging, post-mortem data). As such, we utilized a “silver standard” classification system where each subject was classified as “Likely”, “Possible”, or “Not Likely” AD cases and all-cause dementia cases. A detailed description of the guidelines and criteria used by the reviewers is provided in the Supplementary Materials (see “Review Guide”). While the raters were encouraged to use their clinical experience and judgment, they were instructed to forego requiring the highest level of evidence they would utilize in their direct evaluations of patients to establish a clinical diagnosis of AD or dementia. Instead, they were instructed to quantify and rate the strength of the evidence present in the EMR for the purposes of algorithm evaluation, although they were not informed of the algorithm classifications.
Each chart was randomly assigned to two reviewers. The reviewer pool included three neuropsychologists and one psychiatrist with specialty training in the assessment of dementia in older adults. At weekly consensus meetings, each reviewer’s diagnostic classifications were recorded. Initial disagreements were noted for the purposes of determining inter-rater reliability (IRR), after which the reviewers were asked to reach a consensus.
IRR was calculated as Cohen’s kappa using the R irr package (https://cran.r-project.org/web/packages/irr/irr.pdf). The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated using the epi.tests function in the R epiR package (https://cran.r-project.org/web/packages/epiR/epiR.pdf). Our primary analysis contrasted those with “Not Likely” with “Possible” and “Likely” AD or dementia. Since it was not reasonable to evaluate the performance of an AD algorithm relative to a chart review for a broad dementia classification and vice versa, we compared the AD-centric algorithms (i.e., MVP-AD, CCW-AD, Phe-AD, MAP-AD) to the chart review-based diagnosis of AD, and the dementia-centric algorithms (i.e., MVP-Dementia, Phe-Dementia, MAP-Dementia) to the chart review-based diagnosis of dementia. AD medication usage and the AD+ and ADRD algorithm classifications (MVP-AD+, MVP-ADRD, CCW-ADRD) were compared to both AD and dementia chart review classifications.
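For readers unfamiliar with these statistics, the following Python sketch shows how the four performance measures follow from a 2×2 table and how an unweighted Cohen's kappa is computed from two raters' labels. It is an illustration only: the study used the R packages named above, the function names here are hypothetical, the counts and labels are invented, and confidence intervals are not computed.

```python
from collections import Counter
from typing import Dict, Sequence


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented counts: algorithm result (rows) vs. chart review (columns).
metrics = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)

# Invented ratings for six charts.
rater_a = ["Likely", "Likely", "Possible", "Not Likely", "Not Likely", "Likely"]
rater_b = ["Likely", "Possible", "Possible", "Not Likely", "Likely", "Likely"]
kappa = cohens_kappa(rater_a, rater_b)
```

In the primary analysis the chart review labels would first be collapsed to “Not Likely” versus “Possible/Likely” before building the 2×2 table.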
APOE Genotyping and Analysis
Genotype data processing and cleaning were performed by the MVP Bioinformatics Core. The chip design and genotype cleaning pipeline have been described elsewhere.24 APOE genotypes were generated based on the imputed genotypes for the two SNPs used to determine the isoform: rs7412 and rs429358. SNP data were extracted from the Phase 4 MVP genotype release, which includes data for approximately 650,000 MVP participants. Imputation was performed using the NHLBI Trans-Omics for Precision Medicine (TOPMed) reference panel.25 Both rs7412 and rs429358 are well imputed in MVP participants of European ancestry (imputation quality r²>0.9). The “best-guess” genotypes for these SNPs were generated from the imputed data using a certainty threshold of 90%. From the APOE isoform genotypes, the number of ε4 alleles (0, 1, or 2) was coded for analysis. APOE ε4 dosage was tested for association with case/control status in a logistic model that included APOE ε4 allele dosage as a continuous predictor and age as a covariate. We additionally examined the association between the individual APOE genotypes (ε2ε2, ε2ε3, ε3ε4, ε4ε4) vs the most common ε3ε3 genotype in logistic models. We did not analyze the ε2ε4 genotype due to the heterogeneous effect of the protective ε2 allele and ε4 risk allele. Finally, to ensure that the genetic association is not solely limited to the APOE locus, we examined associations between case/control status and an AD polygenic risk score (PRS) calculated based on the Kunkle et al. AD GWAS,26,27 as we have done elsewhere.20 This PRS excluded the APOE region on chromosome 19 and is, therefore, an index of AD genetic risk summarizing the contribution from other known risk loci throughout the genome.
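The derivation of APOE genotypes from the two SNPs can be sketched as follows. This Python example is illustrative rather than the MVP pipeline: the function names are hypothetical, genotype probabilities are assumed to be ordered as (0, 1, 2 copies) of the rs429358 C allele and the rs7412 T allele, and the 90% certainty threshold is assumed to apply to the largest genotype probability at each SNP. The lookup table reflects the standard definitions (ε2: T at both SNPs; ε3: T at rs429358 and C at rs7412; ε4: C at both).

```python
from typing import Optional, Sequence, Tuple

# Keys: (copies of rs429358 C allele, copies of rs7412 T allele).
APOE_GENOTYPE = {
    (0, 0): "e3e3",
    (0, 1): "e2e3",
    (0, 2): "e2e2",
    (1, 0): "e3e4",
    (1, 1): "e2e4",  # conventional call for the unphased double heterozygote
    (2, 0): "e4e4",
}


def best_guess(probs: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Return the allele count (0, 1, 2) whose probability meets the threshold."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def apoe_call(
    rs429358_probs: Sequence[float],
    rs7412_probs: Sequence[float],
    threshold: float = 0.9,
) -> Optional[Tuple[str, int]]:
    """Return (APOE genotype, number of e4 alleles), or None if uncallable."""
    c_count = best_guess(rs429358_probs, threshold)
    t_count = best_guess(rs7412_probs, threshold)
    if c_count is None or t_count is None:
        return None  # at least one SNP fell below the certainty threshold
    genotype = APOE_GENOTYPE.get((c_count, t_count))
    if genotype is None:
        return None  # combination would imply the rare e1 haplotype
    return genotype, genotype.count("e4")


# Invented imputation probabilities for illustration.
confident = apoe_call((0.02, 0.97, 0.01), (0.99, 0.01, 0.00))
uncertain = apoe_call((0.50, 0.50, 0.00), (1.00, 0.00, 0.00))
```

The returned ε4 count (0, 1, or 2) corresponds to the allele dosage that enters the logistic model as a continuous predictor.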
RESULTS
ICD Code Comparison
Table 1 presents the ICD-9 and ICD-10 codes included in each of the algorithms examined. The set of ICD codes for the four AD algorithms is identical, with the exception of MVP-AD, which purposefully did not include the ICD-10 code for Alzheimer’s Disease Early Onset (ICD-10:G30.0). The MVP, CCW, PheCode, and MAP algorithms for ADRD and dementia are broadly similar in that they all include the AD codes (including early onset AD), various non-specific dementia codes, and a core set of related dementias including Vascular Dementia (ICD-9:290.4 and ICD-10:F01.5) and Frontotemporal Dementia (ICD-9:331.1 and ICD-10:G31.0). However, the CCW-ADRD algorithm only includes Other Frontotemporal Dementia codes (ICD-9:331.19 and ICD-10:G31.09) and does not include the Frontotemporal Dementia code itself.
Overall, the CCW-ADRD algorithm incorporates the largest number of codes from the broadest set of diagnostic categories (i.e., AD, non-specific dementia, related dementia, other dementias, and other age-related codes), many of which are not included in the other ADRD and dementia algorithms (e.g., Cerebral Degeneration [ICD-9:331.7], Delirium [ICD-10:F05], Age-Related Physical Debility [ICD-10:R54]). The CCW-ADRD algorithm also differs from the other ADRD and dementia algorithms in that it excludes the codes for Lewy Body Dementia (ICD-9:331.82 and ICD-10:G31.83), even though this is considered a core ADRD component elsewhere.
The MVP-ADRD and Phe-Dementia algorithms are generally similar but differ on a few points. Specifically, MVP-ADRD includes Other Persistent Mental Disorders (ICD-9:294.8), which, as we noted, has previously been used as a dementia code in the VA, and Phe-Dementia includes Pick’s Disease (ICD-9:331.11 and ICD-10:G31.01) and several “in other condition” codes (e.g., Dementia in other diseases with behavioral disturbance [ICD-10:F02.80]), which are only included in the MVP coding scheme at the MVP-Dementia level. Finally, the MVP-Dementia algorithm considers a range of dementia and cognitive disorders that are not included in the other ADRD and dementia algorithms (e.g., Idiopathic Normal Pressure Hydrocephalus [ICD-9:331.5 and ICD-10:G91.2] and Parkinson’s Disease [ICD-9:332 and ICD-10:G20]).
Chart Review Results
Supplementary Table 2 presents the demographic information for MVP participants selected for chart reviews (n=103). The mean age at last recorded VA visit of those provisioned was approximately 82 years in both batches. The IRR was high for both the classification of AD (kappa=0.714) and dementia (kappa=0.842). Based on the charts that were reviewed, most had “Likely Dementia” (70% in Batch 1 and 64% in Batch 2). In contrast, substantially fewer subjects were classified as “Likely AD” (45% in Batch 1 and 29% in Batch 2).
The frequencies of the ICD code classifications were examined within the chart review subjects and in MVP as a whole (Table 2). The largest number of AD and dementia cases were identified using the CCW algorithms. In fact, because of the requirements for Batch 1, all included charts would necessarily qualify as CCW-AD and CCW-ADRD cases. The MVP-ADRD, MVP-Dementia, and Phe-Dementia classifications identified fewer cases than the CCW-ADRD classification, and the MAP-AD and MAP-Dementia algorithms identified the fewest.
Table 2.
Prevalence of AD/Dementia according to the different algorithms compared to chart review diagnoses, in MVP as a whole, and in MVP participants age 65 and over.
| Algorithm | Chart Review Batch 1 (n=20) | Chart Review Batch 2 (n=83) | MVP v18_2 Full Cohort (n=698,352) | MVP v18_2 Age ≥ 65 (n=405,540) |
|---|---|---|---|---|
| | N (%) | N (%) | N (%) | N (%) |
| MVP-AD | 12 (60.00) | 35 (42.17) | 4,931 (0.71) | 4,791 (1.18) |
| CCW-AD | 20 (100.00) | 63 (75.90) | 8,169 (1.17) | 7,745 (1.91) |
| Phe-AD | 12 (60.00) | 37 (44.58) | 5,034 (0.72) | 4,877 (1.20) |
| MAP-AD | 13 (65.00) | 35 (42.17) | 2,326 (0.33) | 2,240 (0.55) |
| MVP-AD+ | 17 (85.00) | 58 (69.88) | 20,478 (2.93) | 18,425 (4.54) |
| MVP-ADRD | 18 (90.00) | 64 (77.11) | 23,900 (3.42) | 21,397 (5.28) |
| CCW-ADRD | 20 (100.00) | 78 (93.98) | 42,790 (6.13) | 35,312 (8.71) |
| MVP-Dementia | 18 (90.00) | 67 (80.72) | 33,401 (4.78) | 29,451 (7.26) |
| Phe-Dementia | 18 (90.00) | 66 (79.52) | 22,055 (3.16) | 20,013 (4.93) |
| MAP-Dementia | 15 (75.00) | 48 (57.83) | 10,995 (1.57) | 10,195 (2.51) |
| AD Medication | 17 (85.00) | 47 (56.63) | 18,941 (2.71) | 17,040 (4.20) |
Abbreviations: AD = Alzheimer’s disease; MVP = Million Veteran Program; CCW = Chronic Conditions Warehouse; ADRD = Alzheimer’s disease and related dementias; MAP = Multimodal Automated Phenotyping.
The sensitivity and specificity of the different AD and ADRD algorithms relative to chart review classifications of “Not Likely AD” vs. “Possible/Likely AD” are presented in Figure 1 and Table 3A. The corresponding comparison of the ADRD and Dementia algorithms relative to the classifications of “Not Likely Dementia” vs. “Possible/Likely Dementia” are presented in Figure 2 and Table 3B. The classic trade-off of sensitivity vs. specificity is readily apparent in the results. The CCW algorithms, which identified most chart review subjects as cases, had sensitivity and NPV of 1 for both AD and dementia, but lower specificity and PPV for AD (0.48 and 0.73, respectively) and dementia (0.22 and 0.82). When compared with the chart review classifications of AD, the MVP, PheCode, and MAP AD algorithms had very similar performance, with sensitivity and specificity of 0.69–0.72 and 0.86–0.88, respectively. When compared with the chart review classifications of dementia, the MVP-ADRD, MVP-Dementia, and Phe-Dementia classifications had very similar performance, with sensitivity estimates of 0.95–0.96 and specificity estimates of 0.65–0.70. The MAP-Dementia algorithm was more conservative than the MVP-Dementia and Phe-Dementia algorithms, with a significantly lower sensitivity (based on the non-overlapping CIs) and a higher specificity estimate of 0.87, but one that was not significantly higher.
Figure 1.

The sensitivity and specificity of AD and expanded AD algorithms as well as AD-medication-identified cases when evaluated in a chart review of 103 MVP participants selected to have some evidence of dementia or cognitive decline.
Table 3.
Performance of (3A) AD and ADRD algorithms relative to the chart review outcomes of “Not Likely AD” vs. “Possible/Likely AD” and (3B) ADRD and Dementia algorithms relative to the chart review outcomes of “Not Likely Dementia” vs. “Possible/Likely Dementia.”
| Comparison | Classification Algorithm | Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value |
|---|---|---|---|---|---|
| A) Compared to Chart Review AD | MVP-AD | 0.70 (0.57, 0.81) | 0.88 (0.74, 0.96) | 0.90 (0.77, 0.97) | 0.67 (0.53, 0.79) |
| | CCW-AD | 1.00 (0.94, 1.00) | 0.48 (0.32, 0.64) | 0.73 (0.63, 0.83) | 1.00 (0.83, 1.00) |
| | Phe-AD | 0.72 (0.59, 0.83) | 0.88 (0.74, 0.96) | 0.90 (0.78, 0.97) | 0.69 (0.54, 0.80) |
| | MAP-AD | 0.69 (0.56, 0.80) | 0.86 (0.71, 0.95) | 0.88 (0.75, 0.95) | 0.65 (0.51, 0.78) |
| | MVP-AD+ | 0.95 (0.86, 0.99) | 0.60 (0.43, 0.74) | 0.77 (0.66, 0.86) | 0.89 (0.72, 0.98) |
| | MVP-ADRD | 0.98 (0.91, 1.00) | 0.45 (0.30, 0.61) | 0.72 (0.61, 0.82) | 0.95 (0.75, 1.00) |
| | AD Medication | 0.75 (0.63, 0.86) | 0.57 (0.41, 0.72) | 0.72 (0.59, 0.82) | 0.62 (0.45, 0.77) |
| B) Compared to Chart Review Dementia | MVP-AD+ | 0.88 (0.78, 0.94) | 0.78 (0.56, 0.93) | 0.93 (0.85, 0.98) | 0.64 (0.44, 0.81) |
| | MVP-ADRD | 0.95 (0.88, 0.99) | 0.70 (0.47, 0.87) | 0.92 (0.83, 0.97) | 0.80 (0.56, 0.94) |
| | CCW-ADRD | 1.00 (0.95, 1.00) | 0.22 (0.07, 0.44) | 0.82 (0.73, 0.89) | 1.00 (0.48, 1.00) |
| | MVP-Dementia | 0.96 (0.89, 0.99) | 0.65 (0.43, 0.84) | 0.91 (0.82, 0.96) | 0.83 (0.59, 0.96) |
| | Phe-Dementia | 0.95 (0.88, 0.99) | 0.65 (0.43, 0.84) | 0.90 (0.82, 0.96) | 0.79 (0.54, 0.94) |
| | MAP-Dementia | 0.75 (0.64, 0.84) | 0.87 (0.66, 0.97) | 0.95 (0.87, 0.99) | 0.50 (0.34, 0.66) |
| | AD Medication | 0.70 (0.59, 0.80) | 0.65 (0.43, 0.84) | 0.88 (0.77, 0.94) | 0.38 (0.23, 0.55) |
Abbreviations: AD = Alzheimer’s disease; ADRD = Alzheimer’s disease and related dementias; MVP = Million Veteran Program; CCW = Chronic Conditions Warehouse; MAP = Multimodal Automated Phenotyping.
Figure 2.

The sensitivity and specificity of expanded AD and Dementia algorithms as well as AD-medication-identified cases when evaluated in a chart review of 103 MVP participants selected to have some evidence of dementia or cognitive decline.
The MVP-AD+ algorithm, which is an intermediate step between the MVP-AD and the MVP-ADRD algorithms, did increase sensitivity relative to the AD algorithm (0.95 for MVP-AD+ vs. 0.70 for MVP-AD); however, this was accompanied by a commensurate drop in specificity (0.60 for MVP-AD+ vs. 0.88 for MVP-AD). Unsurprisingly, given prior reports of non-selective prescription of AD medication in practice,28,29 AD medication had low specificity for AD based on chart review (0.57), although slightly above the CCW-AD estimate. The sensitivity for AD medication (0.75) was slightly higher than that of the MVP-AD, Phe-AD, and MAP-AD algorithms, although not significantly so. When compared to the chart review-based classification of dementia, AD medication had a specificity identical to that of the MVP-Dementia and Phe-Dementia algorithms (0.65) but a significantly lower sensitivity (0.70), similar to the MAP-Dementia algorithm. The PPV and NPV estimates mirrored these trends, but with a narrower range, so most comparisons were not significant based on the CIs (Table 3A and 3B).
APOE Results
Descriptive statistics for the MVP participants in the APOE analysis are presented in Supplementary Table 3. The AD/dementia cases identified by the different algorithms and controls were largely male (96.6% to 97.3%). The mean age for controls (75.7 years) was less than the mean age across the different case definitions, which were similar (79.7 to 83.0 years). The proportion (24%) of APOE ε4 carriers (i.e., those with one or two ε4 alleles) in the identified controls (those with no MCI or dementia-related ICD codes or AD medication) was lower than the APOE ε4 rates under any of the ICD-code-based AD and dementia algorithms (31% to 46%).
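As a rough plausibility check on the carrier proportions above, a crude (unadjusted) odds ratio can be computed directly from the ε4 carrier rate in cases and in controls. This is a sketch under stated assumptions: the function names are hypothetical, the inputs are the rounded percentages quoted in the text, and no covariate adjustment is applied, so the results need not match the ORs from the paper's own association models.

```python
def odds(p: float) -> float:
    """Convert a proportion to odds; p must lie strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return p / (1.0 - p)


def crude_odds_ratio(p_cases: float, p_controls: float) -> float:
    """Unadjusted OR comparing carrier odds in cases with carrier odds in controls."""
    return odds(p_cases) / odds(p_controls)


control_rate = 0.24                 # epsilon-4 carrier proportion in controls (from the text)
case_rate_low, case_rate_high = 0.31, 0.46  # range across case definitions (from the text)

low = crude_odds_ratio(case_rate_low, control_rate)
high = crude_odds_ratio(case_rate_high, control_rate)
print(f"crude OR range: {low:.2f} to {high:.2f}")
```

With these rounded inputs the crude ORs fall at roughly 1.4 and 2.7, which brackets a moderate-to-strong carrier enrichment in every case definition.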
The significance of the associations between APOE ε4 and the different diagnostic algorithms is presented in Table 4A. All algorithms produced case definitions that were strongly associated with APOE ε4 (p<10−100) and age (p<10−200). Consistent with the higher specificity and lower sensitivity observed for the AD-specific algorithms in the chart reviews, the estimated ORs for the AD-specific algorithms were high (>2), with the MAP-AD algorithm yielding the highest OR (2.65) but with the lowest number of AD cases (n=1,632) and lowest significance (highest p-value) of any of the algorithms (p=3.03×10−128). Within the AD algorithms, the CCW-AD algorithm produced the lowest OR (2.06) and the most significant association (smallest p-value, p=2.14×10−211). The sole difference between the MVP-AD and Phe-AD algorithms is the inclusion of early-onset AD codes in the Phe-AD algorithm, which provides a useful opportunity for comparison. Because of this inclusion, the Phe-AD case cohort has 333 more cases than MVP-AD. Notably, these extra cases increased the significance of the association with APOE ε4 (p=2.75×10−183 vs. 9.37×10−164), even though APOE ε4 is not associated with the early-onset form of AD, suggesting that these codes are being given to patients with late-onset AD. In all cases, the ADRD and dementia algorithms identified more cases and were more associated with APOE ε4 than the AD algorithms. We observed similar associations when directly comparing the ε3ε4 and ε4ε4 vs. ε3ε3 individual APOE genotypes (see Supplementary Table 4), but only observed nominal associations in the comparisons between specific ε2 carrier genotypes and algorithm combinations (ε2ε2 vs. ε3ε3: CCW-ADRD p=0.009; ε2ε3 vs. ε3ε3: MAP-AD p=0.025, MVP-ADRD p=0.024, Phe-Dementia p=0.015; all other ε2ε2 and ε2ε3 comparisons: p>0.05; Supplementary Table 4).
All case classifications were also associated with the AD PRS, which excluded the APOE region (all p<1×10−17), and again, the most significant associations were observed with the MVP-ADRD and Phe-Dementia classifications (p=1.63×10−32 and 1.73×10−34, respectively; see Supplementary Table 5).
Table 4.
Algorithm performance in MVP participants of European ancestry, including (4A) comparison of the algorithms in terms of association with the APOE ε4 locus and (4B) comparison of algorithm performance after restricting case diagnosis age to 60+ and 65+.
(4A) Association with the APOE ε4 locus

| Algorithm | # of Cases | APOE ε4 OR (CI) | APOE ε4 p-value |
|---|---|---|---|
| MVP-AD | 3,500 | 2.25 (2.12, 2.38) | 5.02e-174 |
| CCW-AD | 5,676 | 2.06 (1.97, 2.15) | 2.14e-211 |
| Phe-AD | 3,569 | 2.27 (2.15, 2.40) | 2.75e-183 |
| MAP-AD | 1,632 | 2.65 (2.45, 2.87) | 3.03e-128 |
| MVP-AD+ | 13,666 | 1.72 (1.67, 1.78) | 6.62e-253 |
| MVP-ADRD | 15,770 | 1.68 (1.63, 1.73) | 4.38e-257 |
| CCW-ADRD | 27,839 | 1.50 (1.46, 1.53) | 1.90e-247 |
| MVP-Dementia | 22,780 | 1.47 (1.43, 1.51) | 3.11e-188 |
| Phe-Dementia | 14,597 | 1.72 (1.67, 1.77) | 4.71e-266 |
| MAP-Dementia | 7,204 | 1.94 (1.86, 2.02) | 2.36e-218 |
| AD Medication | 12,979 | 1.78 (1.73, 1.84) | 2.31e-277 |
| AD Medication or MVP-ADRD | 20,417 | 1.63 (1.59, 1.68) | 3.57e-290 |
| AD Medication or Phe-Dementia | 19,644 | 1.65 (1.60, 1.69) | 1.40e-290 |
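
(4B) Association with the APOE ε4 locus after restricting case diagnosis age to 60+ and 65+

| Algorithm | # of Cases | APOE ε4 OR (CI) | APOE ε4 p-value |
|---|---|---|---|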
| MVP-AD | 3,500 | 2.25 (2.12, 2.38) | 5.02e-174 |
| MVP-AD60 | 3,417 | 2.26 (2.14, 2.40) | 2.83e-173 |
| MVP-AD65 | 3,236 | 2.26 (2.13, 2.40) | 9.37e-164 |
| MAP-AD | 1,632 | 2.65 (2.45, 2.87) | 3.03e-128 |
| MAP-AD60 | 1,583 | 2.68 (2.47, 2.90) | 2.65e-127 |
| MAP-AD65 | 1,481 | 2.65 (2.44, 2.89) | 7.00e-117 |
| MVP-AD+ | 13,666 | 1.72 (1.67, 1.78) | 6.62e-253 |
| MVP-AD+60 | 12,266 | 1.77 (1.71, 1.83) | 1.36e-254 |
| MVP-AD+65 | 10,984 | 1.79 (1.73, 1.85) | 1.49e-240 |
| MVP-ADRD | 15,770 | 1.68 (1.63, 1.73) | 4.38e-257 |
| MVP-ADRD60 | 14,095 | 1.72 (1.67, 1.78) | 1.74e-261 |
| MVP-ADRD65 | 12,542 | 1.75 (1.69, 1.81) | 8.54e-249 |
| CCW-ADRD | 27,839 | 1.50 (1.46, 1.53) | 1.90e-247 |
| CCW-ADRD60 | 23,293 | 1.56 (1.52, 1.60) | 4.60e-264 |
| CCW-ADRD65 | 20,376 | 1.59 (1.55, 1.63) | 1.17e-253 |
| Phe-Dementia | 14,597 | 1.72 (1.67, 1.77) | 4.71e-266 |
| Phe-Dementia60 | 13,508 | 1.76 (1.71, 1.82) | 3.07e-275 |
| Phe-Dementia65 | 12,375 | 1.78 (1.73, 1.84) | 9.06e-265 |
| MAP-Dementia | 7,204 | 1.94 (1.86, 2.02) | 2.36e-218 |
| MAP-Dementia60 | 6,742 | 1.98 (1.90, 2.07) | 3.88e-220 |
| MAP-Dementia65 | 6,173 | 2.00 (1.91, 2.09) | 2.08e-208 |
Abbreviations: MVP = Million Veteran Program; APOE = apolipoprotein E; AD = Alzheimer’s disease; CCW = Chronic Conditions Warehouse; MAP = Multimodal Automated Phenotype; ADRD = Alzheimer’s disease and related dementias.
AD medication usage was strongly associated with APOE ε4. It had a higher OR than either the MVP-ADRD or the Phe-Dementia cases, and its association was more significant than either of theirs (p=2.31×10−277). We then examined combining the medication information with the cases identified by the MVP-ADRD or the Phe-Dementia algorithms. This produced even more significant associations with APOE ε4, with very similar ORs and p-values for the two ICD-code-based algorithms: ORs=1.63 and 1.65 and p=3.57×10−290 and 1.40×10−290, respectively. When we examined the association between the AD PRS and AD medication, the pattern was very similar to the observed associations with APOE ε4. AD medication alone was slightly more significantly associated with the PRS (p=1.11×10−34) than the two ICD code algorithms (p=1.63×10−32 for MVP-ADRD and p=1.73×10−34 for Phe-Dementia), but the combined sets of AD medication and ICD code algorithms were the most significant (p=4.19×10−36 and p=2.60×10−37 for medication with MVP-ADRD and Phe-Dementia, respectively; see Supplementary Table 5). When examining the individual APOE genotype comparisons, the results were more ambiguous (Supplementary Table 4). AD medication was not associated with either the ε2ε2 or the ε2ε3 genotype, either alone or in conjunction with MVP-ADRD or Phe-Dementia. However, the associations with the combined "AD medication or MVP-ADRD" and "AD medication or Phe-Dementia" classifications were more significant than those with MVP-ADRD or Phe-Dementia alone in both the ε3ε4 and ε4ε4 analyses. In contrast, in the ε4ε4 analyses, the combined medication-ICD classifications were less significant than the analysis of AD medication alone.
APOE ε4 and Age Cutoffs
The association results after restricting cases to those with a diagnosis age of ≥60 or ≥65 are presented in Table 4B. Imposing an age restriction reduced the number of cases in every instance and generally increased the estimated OR (the MAP-AD65 estimate equaled the unrestricted MAP-AD estimate), consistent with a reduction of the false positive rate. For the broader MVP-AD+, ADRD, and dementia algorithms, the association was most significant when the ≥60 cutoff was used; for the AD-specific MVP-AD and MAP-AD algorithms, the ≥60 p-values were within an order of magnitude of the unrestricted ones.
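The link drawn above between a higher OR and fewer false positives can be illustrated with a simple mixture calculation. The sketch below is a toy model rather than the authors' analysis: it assumes that a fraction `ppv` of algorithm-identified cases are true cases carrying ε4 at one rate, while the remaining misclassified cases carry ε4 at the control rate. The function names and the 46% true-case carrier rate are hypothetical choices for illustration; the 24% control rate is the value reported for controls.

```python
def odds(p: float) -> float:
    """Convert a proportion to odds; p must lie strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return p / (1.0 - p)


def observed_odds_ratio(ppv: float, p_true_cases: float, p_controls: float) -> float:
    """Crude OR seen when only a fraction `ppv` of labelled cases are true cases.

    Assumption: misclassified cases carry the risk allele at the control rate.
    """
    if not 0.0 <= ppv <= 1.0:
        raise ValueError("ppv must be between 0 and 1")
    # Carrier rate in the labelled case set is a mixture of true and false cases.
    p_observed = ppv * p_true_cases + (1.0 - ppv) * p_controls
    return odds(p_observed) / odds(p_controls)


P_TRUE_CASES = 0.46  # hypothetical carrier rate among true cases
P_CONTROLS = 0.24    # carrier rate reported for controls

attenuated = {
    ppv: observed_odds_ratio(ppv, P_TRUE_CASES, P_CONTROLS)
    for ppv in (1.0, 0.8, 0.6)
}
for ppv, or_value in attenuated.items():
    print(f"PPV={ppv:.1f} -> observed OR={or_value:.2f}")
```

Under these assumptions the observed OR shrinks toward 1 as the share of false positives grows, so a filter that removes mostly false positives would be expected to raise the OR even while it lowers the case count.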
DISCUSSION
We performed a chart review and genetic validation of several different algorithms for identifying AD, ADRD, and dementia cases using VA EMR data for MVP participants, for use in genetic studies and epidemiological studies of VA data. Our “in-house” algorithms had comparable or better performance relative to previously derived AD and dementia algorithms when evaluated by chart review. With regard to AD-specific classifications, the MVP-AD, Phe-AD, and MAP-AD algorithms had similar performance to one another, with comparable sensitivity, specificity, PPV, and NPV (see Table 3), and were relatively conservative. As for the broader ADRD/dementia classification, our MVP-ADRD algorithm performance was equivalent to the Phe-Dementia algorithm in terms of sensitivity and specificity. Not surprisingly, the CCW-ADRD algorithm had high sensitivity but low specificity, likely because it requires only one ICD code to identify a case and includes a broad class of ICD codes. Finally, AD medication had considerably lower specificity and decreased sensitivity compared to other algorithms.
When comparing the frequency of AD cases to the frequency of dementia cases observed in the present study, our results further support that AD is under-coded15 relative to non-specific and other forms of dementia. Therefore, using expanded phenotypes like ADRD to extract cases from the EMR can increase sensitivity for identifying AD.15 This conclusion is apparent from our chart review, in which the diagnostic sensitivity increased from 0.70 for MVP-AD to 0.95 for MVP-ADRD. Furthermore, the number of identified AD cases relative to the number of identified dementia cases (42%–60% vs. 77%–90%) further indicates the poor ability of AD-specific algorithms to detect AD. AD is the most common form of dementia, with population rates indicating that 60%–80% of dementia cases typically have AD.30
In addition to completing a chart review to determine the performance of various AD and dementia phenotyping algorithms, we also evaluated the performance of the algorithms in terms of their ability to detect genetic associations. Of note, all case definitions were strongly associated with APOE genotype. In general, APOE analyses showed that the high sensitivity/low specificity algorithms outperformed the more conservative AD-specific algorithms and the MAP algorithm. Weighting sensitivity over specificity yielded a trade-off of smaller p-values but lower OR estimates. Stated differently, the AD-specific algorithms and MAP algorithms produced larger ORs but were likely to have less power relative to the high sensitivity/low specificity algorithms. This finding suggests that the AD-specific algorithms are too conservative. In contrast, AD medication was surprisingly the single case definition most significantly associated with APOE ε4, and adding medication-identified cases to the MVP-ADRD algorithm strengthened that algorithm's association. Finally, when examining adjustments to the filter for age of onset, we found that the association with APOE ε4 was strongest when the age cutoff was set to ≥60.
While there are many strengths of this study, it is not without limitations. To begin, we focused on a class of ICD-code-based algorithms, as well as AD medication data, to identify AD/dementia cases and controls. We did not examine other algorithms to identify AD cases based on a more complete extraction of data from the EMR using NLP machine learning (e.g.31) or algorithms to identify individuals at risk for developing dementia or to characterize disease progression (e.g.32–34), which are all important areas of investigation. However, given that ICD-code-based algorithms are widely used in epidemiological studies (e.g.3–5), phenome-wide association studies23, monitoring of EMR data (e.g.31), and even for training machine-learning-based models (e.g.31), we believe that investigating their performance is also important. In addition, all analyses were conducted within the context of MVP, which is not a random sample of VA patients. Several limitations also relate to the data available for chart review. While our chart reviews considered a wide range of AD-relevant factors, including the presence of neuropsychological assessment, cognitive screening scores, results of brain neuroimaging scans, AD medications, and autopsy results, these data were not consistently available for all Veterans. Given these limitations (which are not unique to the VA EMR), our chart review does not match the quality of the careful phenotyping done with many AD case/control cohorts (e.g.35), where neuropsychological test data are available for all cases and controls and AD biomarkers and MRI may be assessed for many or all participants. Additionally, many Veterans receive care outside of the VA; therefore, incorporating ICD codes covering treatment from additional providers (e.g., from the CMS database) would likely increase the sensitivity of the evaluated algorithms. The accuracy of our chart reviews would also likely be enhanced if treatment notes from external providers were available for review.
Another limitation relates to the use of genetic association to gauge algorithm performance, especially as it relates to use of AD medication to identify cases. Our APOE genotype and PRS analyses suggest that broader ADRD phenotypes and AD medication prescription could identify a larger case set and improve power to detect associations. However, use of medication as a case identifier may not perform similarly in all cohorts and over time. It is quite possible that the additional significance of the genetic association with the medication+MVP-ADRD algorithm relative to the MVP-ADRD-alone analysis might be eliminated if more complete ICD code data were available. We also note that it is possible that some of the association with APOE ε4 is driven by a Veteran’s prior knowledge of their APOE ε4 carrier status. That is, patients may more aggressively advocate for AD medication if they know they are at increased genetic risk. However, the association observed with the AD PRS makes this unlikely to be the sole driver of the AD medication/genetic link. Nevertheless, because of the noted limitations, we emphasize that caution should be taken when incorporating prescription information into these algorithms moving forward.
Finally, we note that the performance of these algorithms and data available to assess them is likely to change in the future. For example, the Food and Drug Administration recently approved infusion with lecanemab for treatment of early AD with confirmed amyloid pathology. However, lecanemab treatment is contraindicated for APOE ε4 carriers due to increased risk of hemorrhage, and APOE genotyping is recommended prior to treatment36. Hence, APOE genetic association may not be a valid measure for assessing algorithm performance in the future, especially if lecanemab treatment becomes more prevalent. Moreover, other loci throughout the genome have been identified that have a differential effect based on APOE genotype (e.g.37–39), and genetic studies defining cases as those having undergone lecanemab treatment may be biased to detect risk loci that affect ε4 non-carriers. The rising use of CSF and peripheral AD biomarkers as well as amyloid PET at VA and other clinical centers is also likely to improve the assessment and characterization of dementia heterogeneity, identify cases in an earlier stage of disease progression, and also to provide critical data which will improve the accuracy of disease classification algorithms. Periodic reassessment of algorithm performance will be necessary going forward, as clinical practice evolves.
CONCLUSION
We found that our MVP-ADRD phenotype is likely to yield good performance in genetic studies of AD and dementia, comparable to the Phe-Dementia algorithm, in studies of VA data of older Veterans, especially when these PheCode-based algorithms are implemented with an age-of-onset restriction and supplemented with additional cases identified through AD medication use. Our results bolster previous research showing AD is underdiagnosed and under-coded in the VA EMR, which may be true for other healthcare systems. By comparing the results across different algorithms, we have identified several commonalities that appear to be associated with good performance for genetic studies. Specifically, our findings support using broad classes of AD and related traits in the algorithm, requiring more than a single ICD code to qualify as a case, the use of AD medication to identify additional cases, and the restriction of analyses to subjects who received their first dementia codes over age 60. While we have evaluated these algorithms in VA data, we note that these general lessons are likely to hold up in other EMR-based cohorts, though a similar evaluation of the algorithm performance across multiple systems would be useful in the future.
Ultimately, the use of algorithms to identify disease cases in the EMR can be valuable for both epidemiological and genetic research. While our MVP-developed ICD-based algorithms had good performance in chart review and generated strong genetic signals, especially after inclusion of medication-identified cases, future research should validate these algorithms using neuropsychological assessment data and biomarkers of disease. Furthermore, given that there is a continuum of dementia, there will be an ongoing need for the development of more precise AD/dementia assessment tools—from the earliest stages of mild cognitive impairment to the latest stages of progressive cognitive decline and dementia. Onset of diagnosis and rate of disease progression may be additional factors to consider when refining AD/dementia algorithms. Finally, examining the prescription patterns of AD medication within the VA and how these patterns influence the algorithms will be important directions for future research.
ACKNOWLEDGMENTS
This research is based on data from the Million Veteran Program, Office of Research and Development, Veterans Health Administration. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the US government.
The MVP Cognitive Decline and Dementia During Aging Working Group members are:
Richard Sherva1,2, Rui Zhang1, Tori Anglin3, Catherine Chanfreau3, Kelly Cho4,5, Jennifer R. Fonda6-8, J. Michael Gaziano4,5, Kelly M. Harrington4,6, Yuk-Lam Ho4, William Kremen9,10, Elizabeth Litkowski11,12, Julie Lynch3, Zoe Neale1,6, Panos Roussos13-15, David Marra4,6, Jesse Mez2,16,17, Mark W. Miller1,6, David H. Salat18, Debby Tsuang19, Erika Wolf1,6, Qing Zeng20, Matthew S. Panizzon9,10,21, Victoria C. Merritt9,21,22, Lindsay A. Farrer2,16,17,23-25, and co-chairs Richard L. Hauger*9,10,22 and Mark W. Logue*1,2,6,23.
1 National Center for PTSD, Behavioral Sciences Division, VA Boston Healthcare System, Boston, MA, 02130, USA,
2 Boston University School of Medicine, Biomedical Genetics, Boston, MA, 02118, USA,
3 VA Informatics and Computing Infrastructure (VINCI), Salt Lake City, USA,
4 Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, 02130,
5 Division of Aging, Brigham & Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA,
6 Department of Psychiatry, Boston University School of Medicine, Boston, MA, 02118 USA,
7 Translational Research Center for TBI and Stress Disorders (TRACTS) and Geriatric Research, Educational and Clinical Center (GRECC), VA Boston Healthcare System, Boston, MA, 02130,
8 Department of Psychiatry, Harvard Medical School, Boston, MA 02215,
9 Department of Psychiatry, School of Medicine, University of California, San Diego, La Jolla, CA, USA,
10 Center for Behavior Genetics of Aging, University of California, San Diego, La Jolla, CA, USA,
11 Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA,
12 VA Eastern Colorado Healthcare System, Aurora, CO, USA,
13 Center for Disease Neurogenomics, Departments of Psychiatry and Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA,
14 Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY 10962, USA,
15 Mental Illness Research Education and Clinical Center (MIRECC), James J. Peters VA Medical Center, Bronx, New York, 10468, USA,
16 Department of Neurology, Boston University School of Medicine, Boston, MA, USA,
17 Boston University Alzheimer’s Disease Research Center, Boston University School of Medicine, Boston, MA, USA,
18 Neuroimaging Research for Veterans Center, VA Boston Healthcare System, Boston, MA, USA,
19 Geriatric Research, Education, and Clinical Center, VA Puget Sound Health Care System, Seattle, WA, USA,
20 VA Washington DC Healthcare System, Washington, DC, USA,
21 Center of Excellence for Stress and Mental Health, VA San Diego Healthcare System, San Diego, CA, 92161,
22 VA San Diego Healthcare System, 3350 La Jolla Village Dr, San Diego, CA, 92161,
23 Department of Biostatistics, Boston University School of Public Health, Boston, MA, 02118, USA,
24 Department of Ophthalmology, Boston University School of Medicine, Boston, MA, USA,
25 Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA.
FUNDING
This research was supported by VA BLR&D BX004192 (MVP015, Logue PI), VA BLR&D BX005749 (MVP040, Logue PI), VA CSR&D IK2 CX001952 (MVP026, Merritt PI), and VA CSR&D CX001727 (MVP022, Hauger PI); the VISN-22 VA Center of Excellence for Stress and Mental Health (CESAMH); and National Institute on Aging R01 grants AG050595, AG05064, and AG065385.
Footnotes
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
DATA AVAILABILITY STATEMENT
The data and code used to generate MVP results are accessible to researchers with MVP data access. Due to VA policy, MVP is currently only accessible to researchers with a funded MVP project (e.g., VA Merit Award, Career Development Award, NIH R01).
See https://genhub.va.gov/file/view/897656 for additional information.
REFERENCES
- 1. Alzheimer’s Association. 2023 Alzheimer’s disease facts and figures. Alzheimer’s & Dementia. 2023;19:1598–1695.
- 2. U.S. Census Bureau. America is Getting Older: New Population Estimates Highlight Increase in National Median Age. Updated 06/23/2023. Accessed 04/09/2024. https://www.census.gov/newsroom/press-releases/2023/population-estimates-characteristics.html#:~:text=JUNE%2022%2C%202023%20%E2%80%94%20The%20nation’s,of%20the%20population%20is%20younger.
- 3. Shao Y, Zeng QT, Chen KK, Shutes-David A, Thielke SM, Tsuang DA-O. Detection of probable dementia cases in undiagnosed patients using structured and unstructured electronic health records. BMC Med Inform Decis Mak. 2019;19(1):128.
- 4. Marceaux JC, Soble JR, O’Rourke JJF, et al. Validity of early-onset dementia diagnoses in VA electronic medical record administrative data. Clin Neuropsychol. Aug 2020;34(6):1175–1189. doi: 10.1080/13854046.2019.1679889
- 5. Barnes DE, Byers AL, Gardner RC, Seal KH, Boscardin WJ, Yaffe K. Association of mild traumatic brain injury with and without loss of consciousness with dementia in US military veterans. JAMA Neurology. 2018;75(9):1055–1061.
- 6. Denny JC, Ritchie MD, Basford MA, et al. PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations. Bioinformatics. 2010;26(9):1205–1210.
- 7. Liao KP, Sun J, Cai TA, et al. High-throughput multimodal automated phenotyping (MAP) with application to PheWAS. J Am Med Inform Assoc. 2019;26(11):1255–1262. doi: 10.1093/jamia/ocz066
- 8. Wilkinson T, Ly A, Schnier C, et al. Identifying dementia cases with routinely collected health data: A systematic review. Alzheimers Dement. 2018;14(8):1038–1051.
- 9. Brown A, Kirichek O, Balkwill A, et al. Comparison of dementia recorded in routinely collected hospital admission data in England with dementia recorded in primary care. Emerg Themes Epidemiol. 2016;13:11.
- 10. VA Office of Research & Development. Million Veteran Program (MVP). U.S. Department of Veterans Affairs. Accessed August 28, 2023, http://www.research.va.gov/mvp
- 11. Gaziano JM, Concato J, Brophy M, et al. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. Journal of Clinical Epidemiology. 2016;70:214–223.
- 12. Cho K, Gagnon DR, Driver JA, et al. Dementia Coding, Workup, and Treatment in the VA New England Healthcare System. Int J Alzheimers Dis. 2014;2014:821894. doi: 10.1155/2014/821894
- 13. Butler D, Kowall NW, Lawler E, Michael Gaziano J, Driver JA. Underuse of diagnostic codes for specific dementias in the Veterans Affairs New England healthcare system. J Am Geriatr Soc. May 2012;60(5):910–5. doi: 10.1111/j.1532-5415.2012.03933.x
- 14. Taylor DH Jr., Fillenbaum GG, Ezell ME. The accuracy of medicare claims data in identifying Alzheimer’s disease. J Clin Epidemiol. Sep 2002;55(9):929–37. doi: 10.1016/s0895-4356(02)00452-3
- 15. Salem LC, Andersen BB, Nielsen TR, Stokholm J, Jorgensen MB, Waldemar G. Inadequate diagnostic evaluation in young patients registered with a diagnosis of dementia: a nationwide register-based study. Dement Geriatr Cogn Dis Extra. Jan 2014;4(1):31–44. doi: 10.1159/000358050
- 16. Saunders AM, Strittmatter WJ, Schmechel D, et al. Association of apolipoprotein E allele ϵ4 with late-onset familial and sporadic Alzheimer’s disease. Neurology. 1993;43(8):1467–1467.
- 17. Belloy ME, Andrews SJ, Le Guen Y, et al. APOE Genotype and Alzheimer Disease Risk Across Age, Sex, and Population Ancestry. JAMA Neurol. Dec 1 2023;80(12):1284–1294. doi: 10.1001/jamaneurol.2023.3599
- 18. Fihn SD, Francis J, Clancy C, et al. Insights from advanced analytics at the Veterans Health Administration. Health Affairs. 2014;33(7):1203–1211.
- 19. Fang H, Hui Q, Lynch J, et al. Harmonizing Genetic Ancestry and Self-identified Race/Ethnicity in Genome-wide Association Studies. Am J Hum Genet. Oct 3 2019;105(4):763–772. doi: 10.1016/j.ajhg.2019.08.012
- 20. Logue MW, Miller MW, Sherva R, et al. Alzheimer’s Disease and Related Dementias among Aging Veterans: Examining Gene-by-Environment Interactions with Posttraumatic Stress Disorder and Traumatic Brain Injury. Alzheimer’s & Dementia. 2022;IN PRESS
- 21. Sherva R, Zhang R, Sahelijo N, et al. African Ancestry GWAS of Dementia in a Large Military Cohort Identifies Significant Risk Loci. medRxiv. 2022. doi: 10.1101/2022.05.25.22275553
- 22. Neale ZE, Fonda JR, Miller MW, et al. Subjective cognitive concerns, APOE epsilon4, PTSD symptoms, and risk for dementia among older veterans. Alzheimers Res Ther. Jun 29 2024;16(1):143. doi: 10.1186/s13195-024-01512-w
- 23. Bastarache L. Using Phecodes for Research with the Electronic Health Record: From PheWAS to PheRS. Annu Rev Biomed Data Sci. Jul 20 2021;4:1–19. doi: 10.1146/annurev-biodatasci-122320-112352
- 24. Hunter-Zinck H, Shi Y, Li M, et al. Genotyping Array Design and Data Quality Control in the Million Veteran Program. Am J Hum Genet. 2020;106(4):535–548.
- 25. Taliun D, Harris DN, Kessler MD, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature. 2021;590(7845):290–299. doi: 10.1038/s41586-021-03205-y
- 26. Kunkle BW, Grenier-Boley B, Sims R, et al. Author Correction: Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Abeta, tau, immunity and lipid processing. Nat Genet. Sep 2019;51(9):1423–1424. doi: 10.1038/s41588-019-0495-7
- 27. Kunkle BW, Grenier-Boley B, Sims R, et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Abeta, tau, immunity and lipid processing. Nat Genet. Mar 2019;51(3):414–430. doi: 10.1038/s41588-019-0358-2
- 28. Ponjoan A, Garre-Olmo J, Blanch J, et al. How well can electronic health records from primary care identify Alzheimer’s disease cases? Clin Epidemiol. 2019;11:509–518. doi: 10.2147/CLEP.S206770
- 29. Reuben DB, Hackbarth AS, Wenger NS, Tan ZS, Jennings LA. An Automated Approach to Identifying Patients with Dementia Using Electronic Medical Records. J Am Geriatr Soc. Mar 2017;65(3):658–659. doi: 10.1111/jgs.14744
- 30. 2020 Alzheimer’s disease facts and figures. Alzheimers Dement. Mar 10 2020. doi: 10.1002/alz.12068
- 31. Haye S, Thunell J, Joyce G, et al. Estimates of diagnosed dementia prevalence and incidence among diverse beneficiaries in traditional Medicare and Medicare Advantage. Alzheimers Dement (Amst). Jul-Sep 2023;15(3):e12472. doi: 10.1002/dad2.12472
- 32. Qorri B, Tsay M, Agrawal A, Au R, Geraci J. Using machine intelligence to uncover Alzheimer’s disease progression heterogeneity. Exploration of Medicine. 2020;1(6):377–395. doi: 10.37349/emed.2020.00026
- 33. Avelar-Pereira B, Belloy ME, O’Hara R, Hosseini SMH, Alzheimer’s Disease Neuroimaging I. Decoding the heterogeneity of Alzheimer’s disease diagnosis and progression using multilayer networks. Mol Psychiatry. Jun 2023;28(6):2423–2432. doi: 10.1038/s41380-022-01886-z
- 34. Park JH, Cho HE, Kim JH, et al. Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data. NPJ Digit Med. 2020;3:46. doi: 10.1038/s41746-020-0256-0
- 35. Birkenbihl C, Salimi Y, Domingo-Fernandez D, et al. Evaluating the Alzheimer’s disease data landscape. Alzheimers Dement (N Y). 2020;6(1):e12102. doi: 10.1002/trc2.12102
- 36. Cummings J, Apostolova L, Rabinovici GD, et al. Lecanemab: Appropriate Use Recommendations. J Prev Alzheimers Dis. 2023;10(3):362–377. doi: 10.14283/jpad.2023.30
- 37. Jun G, Ibrahim-Verbaas CA, Vronskaya M, et al. A novel Alzheimer disease locus located near the gene encoding tau protein. Mol Psychiatry. Jan 2016;21(1):108–17. doi: 10.1038/mp.2015.23
- 38. Jun GR, You Y, Zhu C, et al. Protein phosphatase 2A and complement component 4 are linked to the protective effect of APOE varepsilon2 for Alzheimer’s disease. Alzheimers Dement. Nov 2022;18(11):2042–2054. doi: 10.1002/alz.12607
- 39. Ma Y, Jun GR, Zhang X, et al. Analysis of Whole-Exome Sequencing Data for Alzheimer Disease Stratified by APOE Genotype. JAMA Neurol. Sep 1 2019;76(9):1099–1108. doi: 10.1001/jamaneurol.2019.1456
