Abstract
Introduction
Studies investigating the relationship between blood pressure (BP) measurements from electronic health records (EHRs) and Alzheimer's disease (AD) rely on summary statistics, like BP variability, and have only been validated at a single institution. We hypothesize that leveraging BP trajectories can accurately estimate AD risk across different populations.
Methods
In a retrospective cohort study, EHR data from Veterans Affairs (VA) patients were used to train and internally validate a machine learning model to predict AD onset within 5 years. External validation was conducted on patients from Michigan Medicine (MM).
Results
The VA and MM cohorts included 6860 and 1201 patients, respectively. Model performance using BP trajectories was modest but comparable across cohorts (area under the receiver operating characteristic curve [AUROC] = 0.64 [95% confidence interval (CI) = 0.54–0.73] for VA vs. AUROC = 0.66 [95% CI = 0.55–0.76] for MM).
Conclusion
Approaches that directly leverage BP trajectories from EHR data could aid in AD risk stratification across institutions.
Keywords: Alzheimer's disease, blood pressure trajectory, electronic health record, machine learning, risk prediction
1. INTRODUCTION
Cardiovascular risk factors are associated with increased risk of Alzheimer's disease (AD) 1 and could be exploited to predict AD risk years before clinical diagnosis. Current work investigating relationships between AD risk and blood pressure (BP) primarily focuses on prospectively collected clinical trial data. 2 , 3 , 4 , 5 , 6 , 7 , 8 These trials are often limited in the amount of longitudinal data collected because sample sizes are frequently small (e.g., <1000 individuals), 2 , 3 the follow‐up period is short (e.g., <3 years), 4 , 5 , 6 or the measurements are sparse (e.g., once every 5 years). 7 , 8 To address these limitations, recent work used electronic health record (EHR) data from a single institution, which contain decades of longitudinal data for thousands of patients, to study the relationship between AD and BP. 9 However, that work made assumptions about which aspects of the longitudinal measurements were important (e.g., BP variability). Moreover, these associations have only been validated in a single health‐care system. We expand on prior work by (1) using machine learning approaches to directly leverage longitudinal measurement trajectories without prespecifying which summary statistics matter and (2) validating on an external cohort. We hypothesize that (1) EHR‐based BP trajectories can help predict AD onset and (2) their performance will be on par with that of summary statistics.
RESEARCH IN CONTEXT
Systematic Review: We searched the literature for reports investigating the relationship between blood pressure (BP) and Alzheimer's disease (AD). Previous research using electronic health record (EHR) data focused on a limited set of summary statistics rather than the time‐series trajectory and provided validation at only one institution.
Interpretation: We developed an EHR‐based model to predict AD onset using BP trajectories. The model performed similarly to a model using summary statistics, suggesting that the approach could generalize to new biomarkers whose predictive summary statistics are not known in advance. We also validated the model on an external cohort, showing the potential to generalize to different populations. Overall, this model could be used to uncover new patterns between AD and BP for future investigation and to recruit high‐risk individuals to clinical studies like the Alzheimer's Disease Neuroimaging Initiative (ADNI).
Future Directions: Model performance could be improved with additional longitudinal data. This approach could be applied to newly discovered biomarkers.
2. METHODS
We describe the inclusion/exclusion criteria applied to two populations to obtain our study cohorts. This study was approved by the institutional review boards at the University of Michigan and Phoenix Veterans Affairs.
2.1. Study cohorts
2.1.1. Development and internal validation cohort
We trained on patients from the five hospitals of the Veterans Affairs (VA) Veterans Integrated Service Network (VISN, formerly VISN18) region's Cerner EHR instance (Cerner Corporation). 10 Patient timelines were aligned at the first available encounter between ages 68 and 72 years (i.e., we predicted AD onset for each patient at their first encounter in that age range). We aligned on age to control for its effect and because AD incidence rises at 75 years. 11 We excluded patients labeled with AD at alignment; patients with <5 years of follow‐up and no AD label; and, as in previous work, 9 patients with <3 measurements before alignment. To control for the effect of hypertension on AD risk, we included only patients with hypertension, identified by ICD (International Classification of Diseases) codes 12 recorded ≤2 years before alignment.
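The inclusion and exclusion rules above can be expressed as a single filtering function. The sketch below is a hypothetical illustration, not the study's pipeline: the `Patient` record and its fields are invented, ages are taken as completed whole years, 5 and 2 years are approximated as 1826 and 730 days, and "<3 measurements" is assumed to count BP measurements. It also computes the 5‐year outcome label, because the follow‐up exclusion depends on it.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

FIVE_YEARS = timedelta(days=1826)  # approximation of 5 calendar years
TWO_YEARS = timedelta(days=730)    # approximation of 2 calendar years

@dataclass
class Patient:  # hypothetical record layout, for illustration only
    birth_date: date
    encounter_dates: List[date]
    bp_dates: List[date] = field(default_factory=list)        # BP measurement dates
    htn_code_dates: List[date] = field(default_factory=list)  # hypertension ICD code dates
    ad_label_date: Optional[date] = None                      # first AD label, if any
    last_followup: Optional[date] = None                      # last recorded encounter

def age_years(birth: date, on: date) -> int:
    """Completed years of age on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))

def alignment_date(p: Patient) -> Optional[date]:
    """First encounter at which the patient is aged 68-72 (inclusive)."""
    for d in sorted(p.encounter_dates):
        if 68 <= age_years(p.birth_date, d) <= 72:
            return d
    return None

def select_and_label(p: Patient) -> Optional[int]:
    """Return the 5-year AD label (0/1), or None if the patient is excluded."""
    t0 = alignment_date(p)
    if t0 is None:
        return None
    # Exclude patients already labeled with AD at alignment.
    if p.ad_label_date is not None and p.ad_label_date <= t0:
        return None
    horizon = t0 + FIVE_YEARS
    label = int(p.ad_label_date is not None and p.ad_label_date <= horizon)
    # Exclude patients with no AD label and <5 years of follow-up.
    if label == 0 and (p.last_followup is None or p.last_followup < horizon):
        return None
    # Assumption: "<3 measurements" refers to BP measurements before alignment.
    if sum(d < t0 for d in p.bp_dates) < 3:
        return None
    # Keep only patients with a hypertension code within 2 years before alignment.
    if not any(t0 - TWO_YEARS <= d <= t0 for d in p.htn_code_dates):
        return None
    return label
```

Returning `None` for excluded patients keeps the cohort filter and the label in one pass, so a negative label is never assigned to a patient whose follow‐up is too short to observe the outcome.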
2.1.2. External validation cohort
The external validation cohort included patients from Michigan Medicine's (MM) Epic EHR instance (Epic Systems Corporation) aligned between ages 68 and 72. Because VA patients generally had more measurements than MM patients, we controlled for data availability by including only MM patients with ≥35 BP measurements. The threshold of 35 was chosen to match the average number of measurements over a patient's entire history before alignment. Cohort characteristics were compared between populations using χ² tests of statistical significance.
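A sketch of the MM data‐availability filter and of a χ² comparison of a binary characteristic (e.g., sex) between two cohorts follows. The text does not state which χ² variant was used; this assumes Pearson's test on a 2×2 table without continuity correction (the same statistic as `scipy.stats.chi2_contingency(table, correction=False)`), and the counts in the usage line are illustrative, not study data.

```python
import math
from typing import Sequence, Tuple

MIN_BP_MEASUREMENTS = 35  # MM inclusion threshold from the text

def meets_bp_threshold(n_bp_measurements: int) -> bool:
    """MM inclusion rule: keep patients with >= 35 BP measurements."""
    return n_bp_measurements >= MIN_BP_MEASUREMENTS

def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Rows are cohorts; columns are counts with / without a characteristic.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Illustrative counts only: statistic is about 12.79, p < 0.001.
stat, p = chi_square_2x2([[30, 70], [55, 45]])
```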
2.1.3. Outcome
The model was trained to predict AD onset within 5 years of alignment. AD onset was labeled using a cohort discovery tool based on ICD codes for AD. 13
2.2. Model development and evaluation
2.2.1. Data preprocessing
We focused on features that were easy to collect or routinely recorded, retrospectively extracting only those listed in Table SA3 in supporting information for each cohort. Starting from alignment and going backward in time at 6‐month intervals through 5 years of historical data, we recorded patient demographics (e.g., race) and the most recent vital sign measurements (e.g., the latest systolic BP measurement). For any measurement missing from a 6‐month interval, the previous value was carried forward, and a binary indicator denoting imputed values was set to 1. We also included the number of measurements taken within each 6‐month interval. The 5‐year lookback was chosen based on data availability.
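The binning and imputation steps can be sketched for a single vital sign as below. Assumptions not stated in the text: measurements arrive as (days before alignment, value) pairs, each 6‐month bin is exactly 365.25/2 days, bins are emitted oldest first, and bins with no earlier value to carry forward are left as NaN (the handling of such leading gaps is not described).

```python
import math
from typing import List, Tuple

BIN_DAYS = 365.25 / 2  # assumed length of one 6-month interval, in days
N_BINS = 10            # 5 years of lookback / 6-month intervals

def bin_vital(measurements: List[Tuple[float, float]]) -> List[Tuple[float, int, int]]:
    """Summarize one vital sign into 6-month bins ending at alignment.

    measurements: (days_before_alignment, value) pairs.
    Returns one (value, imputed_flag, count) tuple per bin, oldest bin first.
    """
    latest = [None] * N_BINS  # most recent (days_before, value) in each bin
    counts = [0] * N_BINS
    for days_before, value in measurements:
        if days_before < 0:
            continue  # after alignment: unavailable at prediction time
        k = int(days_before // BIN_DAYS)  # bin 0 ends at alignment
        if k >= N_BINS:
            continue  # outside the 5-year lookback
        counts[k] += 1
        if latest[k] is None or days_before < latest[k][0]:
            latest[k] = (days_before, value)

    rows = []
    carried = math.nan  # assumption: nothing to carry before the first reading
    for k in reversed(range(N_BINS)):  # walk forward in time, oldest bin first
        if latest[k] is not None:
            carried = latest[k][1]
            rows.append((carried, 0, counts[k]))
        else:
            rows.append((carried, 1, counts[k]))  # last value carried forward
    return rows

# rows[-1] == (140, 0, 2): the bin ending at alignment keeps its latest value
rows = bin_vital([(10, 140), (100, 150), (400, 130)])
```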
2.2.2. Model training
Our model, “BP Trajectories,” was a long short‐term memory (LSTM) 14 recurrent neural network trained on the development cohort using the “General Information” and “Trajectories” features (Table SA3). We also trained two baseline LSTMs. The first, “BP Stats,” used all features from Table SA3 except BP trajectories. The second, “No BP,” excluded both BP trajectories and BP summary statistics 4 , 5 , 9 , 15 , 16 (Appendix SA4 in supporting information). Network parameters were optimized using Adam. We used early stopping and random search over the hyperparameter space for model selection (Appendix SA5 in supporting information).
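For readers unfamiliar with LSTMs, the forward pass of a single‐layer LSTM that maps a sequence of per‐interval feature vectors to a risk probability is sketched below in NumPy. Layer sizes, initialization, and the logistic output head are assumptions; the study's architecture and hyperparameters are in Appendix SA5 and are not reproduced here. Training (Adam, early stopping, random search) is omitted; in practice a framework such as PyTorch (`torch.nn.LSTM`, `torch.optim.Adam`) would typically be used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyLSTMClassifier:
    """Single-layer LSTM followed by a logistic output (forward pass only)."""

    def __init__(self, n_features: int, n_hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_hidden)
        # Rows are stacked as [input gate; forget gate; candidate; output gate].
        self.W = rng.uniform(-s, s, (4 * n_hidden, n_features))  # input weights
        self.U = rng.uniform(-s, s, (4 * n_hidden, n_hidden))    # recurrent weights
        self.b = np.zeros(4 * n_hidden)
        self.w_out = rng.uniform(-s, s, n_hidden)
        self.b_out = 0.0
        self.n_hidden = n_hidden

    def predict_proba(self, x: np.ndarray) -> float:
        """x: (T, n_features) sequence of 6-month feature vectors, oldest first."""
        H = self.n_hidden
        h = np.zeros(H)  # hidden state
        c = np.zeros(H)  # cell state
        for x_t in x:
            z = self.W @ x_t + self.U @ h + self.b
            i = sigmoid(z[:H])            # input gate
            f = sigmoid(z[H:2 * H])       # forget gate
            g = np.tanh(z[2 * H:3 * H])   # candidate cell update
            o = sigmoid(z[3 * H:])        # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        # After training, this would be read as the predicted risk of AD onset.
        return float(sigmoid(self.w_out @ h + self.b_out))

# 10 six-month steps with 4 features each; untrained, so the value is arbitrary.
risk = TinyLSTMClassifier(n_features=4, n_hidden=8).predict_proba(np.ones((10, 4)))
```

Under this framing, “BP Trajectories,” “BP Stats,” and “No BP” share the same network and differ only in which feature columns are supplied at each time step.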
2.2.3. Internal and external validation
On the held‐out VA validation and external MM cohorts, we measured the AUROC and AUPR (area under the receiver operating characteristic and precision‐recall curves, respectively), reporting empirical 95% confidence intervals (CIs) from 1000 bootstrapped samples. Statistical significance was tested using a resampling test. 17
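The evaluation metrics and bootstrap CIs can be sketched in plain Python. This assumes AUPR is estimated as average precision (one common estimator, also used by scikit‐learn's `average_precision_score`) and that bootstrap resamples lacking either class are redrawn; the resampling significance test is not sketched.

```python
import random
from typing import Callable, Sequence, Tuple

def auroc(y: Sequence[int], s: Sequence[float]) -> float:
    """Probability a random positive outscores a random negative (ties count 1/2)."""
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y: Sequence[int], s: Sequence[float]) -> float:
    """AUPR estimated as average precision over distinct score thresholds."""
    n_pos = sum(y)
    pairs = sorted(zip(s, y), key=lambda t: -t[0])
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all examples tied at this threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap

def bootstrap_ci(metric: Callable, y: Sequence[int], s: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Empirical (percentile) CI from bootstrap resamples of patients."""
    n = len(y)
    if not 0 < sum(y) < n:
        raise ValueError("need both positive and negative examples")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = rng.choices(range(n), k=n)
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:  # redraw resamples that lack a class
            stats.append(metric(yb, [s[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` is 0.75 and `average_precision` on the same inputs is 5/6.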
2.2.4. Model interpretation
To visualize which trajectories the “BP Trajectories” model found important, we plotted the median and interquartile range (IQR) of trajectories among predicted high‐risk (≥90th risk percentile) and low‐risk (≤10th risk percentile) patients in the internal and external validation cohorts. Given the expected population differences between the internal and external cohorts (e.g., fraction female), we conducted a permutation importance analysis on MM to measure how much the model relied on each feature, reporting 95% CIs (Appendix SA6 in supporting information).
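Both interpretation analyses can be sketched with NumPy. The first function splits patients at the 90th and 10th risk percentiles and summarizes one trajectory per 6‐month bin by its median and IQR; the second estimates permutation importance for a feature group as the drop in a metric when that group's features are shuffled across patients, keeping each patient's time series intact. The shuffling scheme, number of repeats, and percentile interval are assumptions; the study's procedure is in Appendix SA6.

```python
import numpy as np

def auroc(y, s) -> float:
    """Rank-based AUROC; ties count 1/2."""
    y, s = np.asarray(y), np.asarray(s)
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def risk_group_trajectories(scores, traj):
    """scores: (N,) predicted risks; traj: (N, T) one vital per 6-month bin.
    Returns {group: (q25, median, q75)}, each an array of shape (T,)."""
    hi_cut, lo_cut = np.percentile(scores, [90, 10])
    groups = {"high": traj[scores >= hi_cut], "low": traj[scores <= lo_cut]}
    return {g: tuple(np.nanpercentile(x, [25, 50, 75], axis=0))
            for g, x in groups.items()}

def permutation_importance(predict, metric, X, y, feature_idx,
                           n_repeats=100, seed=0):
    """Drop in `metric` when features `feature_idx` are shuffled across patients.
    X: (N, T, F). Returns (mean drop, (2.5th, 97.5th percentile of drops))."""
    rng = np.random.default_rng(seed)
    base = metric(y, predict(X))
    drops = []
    for _ in range(n_repeats):
        Xp = X.copy()
        perm = rng.permutation(len(X))
        # Swap whole time series of the chosen features between patients.
        Xp[:, :, feature_idx] = X[perm][:, :, feature_idx]
        drops.append(base - metric(y, predict(Xp)))
    drops = np.asarray(drops)
    return float(drops.mean()), tuple(np.percentile(drops, [2.5, 97.5]))
```

Shuffling entire time series, rather than individual time steps, keeps each permuted trajectory realistic while breaking its link to the outcome.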
2.3. Results
2.3.1. Cohort characteristics
The development and internal validation cohorts included 5488 and 1372 patients, respectively. Across these two VA cohorts, 2.4% of patients experienced the outcome. The external MM cohort included 1201 patients, 2.5% of whom experienced the outcome (Figure SA1 in supporting information). The internal and external validation cohorts differed in several respects (Appendix SA2 in supporting information), including the proportion female (internal = 2.0%; external = 55.4%), the proportion with dyslipidemia (internal = 30.8%; external = 69.7%), and median diastolic BP (internal = 79 mmHg [IQR = 71–86]; external = 72 mmHg [IQR = 67–78]). However, median systolic BPs were similar (internal = 137 mmHg [IQR = 125–150]; external = 135 mmHg [IQR = 126–143]).
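As a quick arithmetic check on these counts (assuming the pooled 2.4% rate also holds within the validation split), the VA cohorts correspond to an 80/20 development/validation split, and each validation cohort contains only about 30 positive cases:

```latex
\begin{aligned}
5488 + 1372 &= 6860, & \tfrac{1372}{6860} &= 0.20,\\
0.024 \times 6860 &\approx 165 \ (\text{all VA}), & 0.024 \times 1372 &\approx 33 \ (\text{VA validation}),\\
0.025 \times 1201 &\approx 30 \ (\text{MM}). & &
\end{aligned}
```

Such small numbers of positive cases are consistent with the wide CIs reported for both validation cohorts.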
2.3.2. Internal and external validation
On the VA validation cohort, “BP Trajectories” achieved AUROC = 0.64 (95% CI = 0.54–0.73) and AUPR = 0.04 (95% CI = 0.02–0.07). On MM, “BP Trajectories” performed similarly (AUROC = 0.66, 95% CI = 0.55–0.76; AUPR = 0.06, 95% CI = 0.03–0.13; Figure 1). Its performance was comparable to that of “BP Stats” (Figure 1).
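Because the outcome is rare, AUPR is best read against the AUPR of a non‐informative classifier, which is approximately the outcome prevalence π. Taking π from the cohort rates reported for the VA and MM cohorts (an approximation for the VA validation split):

```latex
\mathrm{AUPR}_{\text{random}} \approx \pi, \qquad
\frac{0.04}{0.024} \approx 1.7 \ (\text{VA}), \qquad
\frac{0.06}{0.025} = 2.4 \ (\text{MM})
```

That is, the point estimates of AUPR are roughly 1.7 to 2.4 times the chance level, although the CIs are wide.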
2.3.3. Model interpretation
On the VA validation cohort, systolic and diastolic BP were consistently higher in the high‐risk group than in the low‐risk group (Figure 2). MM showed similar patterns for systolic BP. Permutation importance indicated that MM predictions relied mostly on vital signs: permuting them reduced AUROC (95% CI = 0.043–0.188) and AUPR (95% CI = 0.009–0.036) (Table SA4).
3. DISCUSSION
We developed a model using EHR‐based BP trajectories to predict AD onset. The model was developed and internally validated using VA data and externally validated using MM data. It had modest discriminative performance. Despite differences in health systems, EHR platforms, and patient populations, the patterns in discriminative performance (e.g., “BP Trajectories” was comparable to “BP Stats”) and the observed high‐/low‐risk BP trajectory patterns were consistent across cohorts, demonstrating the potential to generalize.
Our results highlight the potential for model interoperability across institutions. Interoperability is a known challenge in health care due to differences in patient populations and medical/coding practices, and few studies have addressed it. 18 As hospitals collect more data, 19 addressing interoperability will be crucial to improving health‐care practices.
Consistent with previous work, 3 , 4 , 5 , 6 , 7 , 8 , 9 high‐risk patients generally had higher systolic and diastolic BP and greater BP variability. Leveraging trajectories provides additional information by highlighting when these differences matter most, which summary statistics do not readily capture.
While others used time‐series trajectories to predict AD onset using datasets like the Alzheimer's Disease Neuroimaging Initiative (ADNI), 20 we used EHR data. Electronic health records contain longitudinal data from routine clinical care (e.g., vitals). This allows us to potentially develop screening tools for the general population to identify high‐risk individuals in any health‐care system without requiring invasive tests. Such individuals could be recruited to clinical trials like ADNI for biological validation, providing more high‐risk individuals for enrollment.
Although we focused on BP, this approach could potentially be used to identify meaningful longitudinal relationships for other features (e.g., image‐based biomarkers). For BP, summary statistics that are important for predicting AD onset (e.g., BP variability) have already been established. 4 , 5 , 9 , 15 , 16 For new features, such statistics may not be known. Because using trajectories performed comparably to using summary statistics for BP, trajectories could be beneficial when the longitudinal relationship between a feature and the risk of AD onset is unknown, stimulating hypothesis generation.
Our study has several limitations. The cohort discovery tool used to identify AD patients had a sensitivity of 0.70. 13 Because we excluded MM patients with <35 BP measurements, the approach may not generalize to individuals with fewer measurements. Finally, the amount of BP data available was limited in terms of lookback period and measurement frequency, with missing rates between 10% and 70% (Table SA2). Although high, these rates reflect clinical practice, in which measurements may not be collected routinely. We hypothesize that longer lookbacks and more routinely collected measurements could improve performance. However, we are encouraged that, despite high rates of missingness, the model captured a predictive signal.
We demonstrated the potential of using EHR‐based BP trajectories to predict AD onset, and our results were consistent across two EHRs. Leveraging EHR trajectories could help uncover the relationship between BP and AD by discovering unrecognized temporal patterns. Such analyses could apply to other features/diseases without knowing which summary statistics are predictive.
CONFLICTS OF INTEREST
The authors have no conflicts of interest to report.
Supporting information
ACKNOWLEDGMENTS
This research program is supported by the NIH/NIA–funded Michigan Alzheimer's Disease Center (5P30AG053760) and the National Science Foundation (IIS 2124127). The views and conclusions in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the NSF, the NIH, the VA, or the US Government.
Tjandra D, Migrino RQ, Giordani B, et al. Use of blood pressure measurements extracted from the electronic health record in predicting Alzheimer's disease: A retrospective cohort study at two medical centers. Alzheimer's Dement. 2022;18:2368–2372. 10.1002/alz.12676
REFERENCES
- 1. Lennon MJ, Makkar SR, Crawford JD, Sachdev PS. Midlife hypertension and Alzheimer's disease: a systematic review and meta‐analysis. J Alzheimer's Dis. 2019;71(1):307–316.
- 2. Hanon O, Latour F, Seux ML, et al. Evolution of blood pressure in patients with Alzheimer's disease: a one year survey of a French Cohort (REAL.FR). J Nutr Health Aging. 2005;9(2):106.
- 3. Ninomiya T, Ohara T, Hirakawa Y, et al. Midlife and late‐life blood pressure and dementia in Japanese elderly: the Hisayama study. Hypertension. 2011;58(1):22–28.
- 4. Lattanzi S, Luzzi S, Provinciali L, Silvestrini M. Blood pressure variability predicts cognitive decline in Alzheimer's disease patients. Neurobiol Aging. 2014;35(10):2282–2287.
- 5. Wijsman LW, De Craen AJ, Muller M, et al. Blood pressure lowering medication, visit‐to‐visit blood pressure variability, and cognitive function in old age. Am J Hypertens. 2016;29(3):311–318.
- 6. de Heus RA, Olde Rikkert MG, Tully PJ, Lawlor BA, Claassen JA, NILVAD Study Group. Blood pressure variability and progression of clinical Alzheimer disease. Hypertension. 2019;74(5):1172–1180.
- 7. Kivipelto M, Helkala EL, Laakso MP, et al. Midlife vascular risk factors and Alzheimer's disease in later life: longitudinal, population based study. BMJ. 2001;322(7300):1447–1451.
- 8. Abell JG, Kivimäki M, Dugravot A, et al. Association between systolic blood pressure and dementia in the Whitehall II cohort study: role of age, duration, and threshold used to define hypertension. Eur Heart J. 2018;39(33):3119–3125.
- 9. Yoo JE, Shin DW, Han K, et al. Blood pressure variability and the risk of dementia: a nationwide cohort study. Hypertension. 2020;75(4):982–990.
- 10. VA Informatics and Computing Infrastructure (VINCI), VA HSR RES 13‐457, U.S. Department of Veterans Affairs. (2008). Retrieved July 15, 2021, from https://vaww.VINCI.med.va.gov
- 11. Alzheimer's Association. 2021 Alzheimer's disease facts and figures. Alzheimers Dement. 2021;17(3):327–406.
- 12. Pace R, Peters T, Rahme E, Dasgupta K. Validity of health administrative database definitions for hypertension: a systematic review. Can J Cardiol. 2017;33(8):1052–1059.
- 13. Tjandra D, Migrino RQ, Giordani B, Wiens J. Cohort discovery and risk stratification for Alzheimer's disease: an electronic health record‐based approach. Alzheimer's Dement: Transl Res Clin Interv. 2020;6(1):e12035.
- 14. Hochreiter S, Schmidhuber J. Long short‐term memory. Neural Comput. 1997;9(8):1735–1780.
- 15. Nwabuo CC, Yano Y, Moreira HT, et al. Association between visit‐to‐visit blood pressure variability in early adulthood and myocardial structure and function in later life. JAMA Cardiol. 2020;5(7):795–801. 10.1001/jamacardio.2020.0799
- 16. Yano Y. Visit‐to‐visit blood pressure variability: what is the current challenge? Am J Hypertens. 2017;30(2):112–114.
- 17. Dixon PM. Bootstrap resampling. In: Encyclopedia of Environmetrics. Wiley; 2006:212–220.
- 18. Naveed A, Hu YF, Sigwele T, Mohi‐Ud‐Din G, Susanto M. Similarity analyzer for semantic interoperability of electronic health records using artificial intelligence (AI). J Sci Eng. 2019;1(2):53–58.
- 19. Topol EJ. High‐performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56.
- 20. Moore PJ, Lyons TJ, Gallacher J, Alzheimer's Disease Neuroimaging Initiative. Random forest prediction of Alzheimer's disease using pairwise selection from time series data. PLoS One. 2019;14(2):e0211558.