AMIA Summits on Translational Science Proceedings. 2025 Jun 10;2025:385–394.

Determining the Importance of Clinical Modalities for NeuroDegenerative Disorders and Risk of Patient Injury Using Machine Learning and Survival Analysis

Kazi Noshin 1,*, Mary Regina Boland 3,*, Bojian Hou 4, Weiqing He 4, Victoria Lu 1, Carol Manning 2, Li Shen 4, Aidong Zhang 1
PMCID: PMC12150751  PMID: 40502273

Abstract

Falls among the elderly, and especially among those with NeuroDegenerative Disorders (NDD), reduce life expectancy. The purpose of this study is to explore the role of Machine Learning on Electronic Health Records (EHR) data for time-to-event survival analysis prediction of injuries, and the role of sensitive attributes, e.g., Race, Ethnicity, Sex, in these models. We used multiple survival analysis methods on a cohort of 29,045 patients 65 years and older treated at PennMedicine for either NDD, Mild Cognitive Impairment (MCI), or another disease. We compare the algorithms and explore the role of multiple modalities, specifically medications and laboratory tests, in improving prediction of injuries among NDD patients. Overall, we found that medication features resulted in either increased Hazard Ratios (HR) or reduced HR depending on the NDD type. We found that being of Black race significantly increased the risk of fall/injury in the models that included only medication and sensitive attribute features. The combined model that used both modalities (medications and laboratory information) removed this relationship between being of Black race and increases in fall/injury. Therefore, we found that combining modalities in these survival models in the prediction of fall/injury risk among NDD and MCI individuals results in findings that are robust to different Racial and Ethnic groups, with no biases apparent in our final combined modality results. Furthermore, combining modalities (both medications and laboratory values) improved performance across multiple survival analysis methods, as measured by the C-index.

1 Introduction

1.1 NeuroDegenerative Disorders, Alzheimer’s Disease and Related Dementias

Alzheimer’s Disease and Related Dementias (ADRD) afflict an estimated 6.9 million people in the United States of America (USA), according to statistics current as of July 2024 [1]. NeuroDegenerative Disorders (NDD) comprise a larger set of disorders that includes ADRD and additional movement disorders such as Parkinson’s Disease (PD), Amyotrophic Lateral Sclerosis (ALS) and other motor neuron diseases. In the USA, almost 7 million individuals are reported to be afflicted with some type of NDD and that number may be higher. However, despite how common these diseases are among the elderly population, especially among those 65 and older, not much is known about risk factors among these patients in community-based settings. There also remains a paucity of research among diverse populations, including research investigating sex disparities [2] and racial disparities [3] in outcomes.

1.2 Relation between NDD and Injury

One challenge for NDD patients is progression of the disease, which can be measured through an increase in the number of injuries/falls. Injuries and falls are common among NDD patients [4]. One study captured the number of falls across various NDD types and found that Degenerative Ataxia had the lowest rate of falls with 9 falls during the study period, followed by PD with a rate of 14 falls per period, and Progressive Supranuclear Palsy (PSP) with a rate of 29 falls during the study period [4]. Injuries due to falls are also related to underlying gait disorders, which are common among those with an NDD. Among those 60 years and younger, gait disorders are rare, with only 15% of that group reported as having a gait disorder [5, 6]. However, by 85 years of age, gait abnormalities rise to 82% of the population, indicating that gait abnormalities are a common affliction among the elderly population [5, 6].

Regarding falls specifically, 30% of the 65+ population in the USA fall each year [5, 7]. Among those older than 80 years (which is a high-risk age range for NDD), 50% fall each year [5, 7]. The mortality rate is also high for those who are older and who fall. Studies show that 25% of deaths related to falls, and their subsequent injuries, occur in 13% of people older than 65 years (this also indicates that those who fall often have repeat falls) [5, 8].

1.3 Electronic Health Records (EHRs), NDDs, and Falls/Injuries

Electronic Health Records (EHRs) enable the extraction of large cohorts of community-based individuals with a variety of health conditions. EHRs often contain millions of patients and include patient history information in both structured and unstructured data elements that can be utilized for diverse analyses [9, 10, 11, 12, 13, 14]. This includes information about diagnoses, medications, laboratory test results, procedures, demographics and clinical notes. These modalities are utilized by a variety of studies on ADRD, NDD and falls/injuries. One example study extracted patients with agitation, both with and without Alzheimer’s Disease (AD), and found that those with agitation were at risk of falling, with rates above 20% in their EHR-derived cohort [15].

One study constructed a model that utilized EHR data from Maine for those 65 and older to predict who would fall within the first 30 days and within 30-60 days of the following year [16]. Their model achieved a C-statistic of 0.807 and was able to capture over 50% of the falls within the next few days of the subsequent year [16]. Another EHR study predicted fall risk, achieving an Area Under the Curve of 0.79 and finding that many known risk factors for falls were significant in their EHR cohort [17].

Likewise, EHRs have been used to study NDD, and even ADRD. Unsupervised learning methods were employed by Xu et al. [18] to reveal four subphenotypes of AD. Their subphenotypes were correlated with common comorbidities of ADRD, including mental health diseases and cardiovascular disease [18]. We also have experience utilizing EHR data for NDD and ADRD subtypes [19]. Many of these prior studies, both on NDD and fall/injury risk, do not incorporate socioeconomic or racial/ethnic disparities into their algorithm development. This is important as not properly capturing these features can lead to biased research results [20, 21].

1.4 Importance of Survival Analysis

Survival analysis is a powerful statistical method for handling time-to-event data, which is prevalent in medical studies [22]. It allows researchers to analyze the duration until an event of interest occurs. This type of analysis also accommodates censored data, a key feature that ensures participants lost to follow-up or those who do not experience the event within the study period are included in the analysis [23]. This maximizes the use of available data and reduces potential bias. Moreover, survival analysis offers a robust framework for evaluating the efficacy of treatments by capturing both short-term and long-term outcomes, which is particularly valuable in chronic disease management [22]. Survival models also support the creation of risk prediction tools, enhancing patient counseling and treatment planning [22, 23]. Furthermore, in health economics, it is used to estimate life expectancy and quality-adjusted life years (QALYs), key metrics in cost-effectiveness analyses of medical interventions [24].
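To make the role of censoring concrete, the following minimal sketch computes a Kaplan-Meier survival curve in plain Python. It is an illustration only and is not one of the models used in this study; the function name `kaplan_meier` and the toy follow-up data are assumptions introduced here. Censored patients contribute to the risk set until they leave it, but never count as events.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times  -- follow-up time for each patient
    events -- 1 if the event was observed at that time, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: only observed events reduce survival.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Censored patients drop out of the risk set without an event.
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data: years of follow-up and fall/injury indicator.
times = [2, 3, 3, 5, 8]
events = [1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

In this toy example the patient censored at year 5 still enlarges the risk set at years 2 and 3, which is how censored follow-up contributes information without being treated as an event.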

1.5 Limitations of Existing Methods Due to Large Number of Features

The application of traditional survival analysis methods faces significant limitations when confronted with high dimensional data, which is increasingly common in the era of big data research [25, 26]. A key issue is the “curse of dimensionality,” where the number of features vastly exceeds the number of observations, leading to overfitting and poor model generalization. In such cases, the necessity for feature selection or dimensionality reduction arises. The computational burden of fitting high-dimensional models is also substantial, often resulting in prolonged processing times and increased memory requirements that can make analyses infeasible. Furthermore, as the number of features increases, the interpretability of the results becomes more challenging, making it difficult for researchers and clinicians to draw actionable insights. Traditional methods are often unable to capture complex, non-linear interactions between features, potentially overlooking critical prognostic factors [26].

1.6 Research Motivation and Purpose of Our Study

The purpose of our study is to investigate NDD progression over time by studying injuries/falls among NDD patients. We utilize survival analysis methods with a large number of potential EHR features derived from multiple modalities (multi-modal features) along with patient demographic information. We use demographics to explore whether our ML survival analysis models are biased in any way by demographics (e.g., Race, Ethnicity).

2 Methodology

2.1 Cohort Selection

We used Electronic Health Records (EHR) data collected during routine clinical care at the University of Pennsylvania, including both in-patient and out-patient data. This study only utilizes de-identified health data collected during routine clinical care and was approved by the University of Pennsylvania’s Institutional Review Board (IRB) with approval id: 851588. The process of developing our final, included survival cohort is shown in Figure 1. We started with a cohort of ADRD and dementia patients aged 65 and older (N=70,420). Because we are interested in utilizing survival analysis techniques to study the risk of falls/injuries among our NDD cohort, we first needed to identify patients that had a fall or injury. We did this using a set of PheCodes for injuries that would likely be involved in a patient fall [27]. After all patients were annotated as having or not having a fall, we took the first diagnosis date for their fall or injury. Next, we excluded individuals where the event (i.e., the fall/injury) occurred before the NDD diagnosis (excluding N=4,839). We then included only those patients that visited Penn more than once, because we cannot study the time to a second event on a separate date for patients with a single visit. This resulted in the exclusion of 10,500 patients.

Figure 1: Flowchart of Inclusion into Survival Cohort

We defined our start date to be the date of first diagnosis of any NDD included in our study or of Mild Cognitive Impairment (MCI). The NDDs included in our study were AD, PD, Vascular Dementia (VD), Other Dementia (OD), and Lewy Body (LB) Dementia. Therefore, whichever of the 6 diagnoses occurred first would be the start date for the study for that patient. Patients that fell/were injured on their start date were excluded from the study (N=6,724 excluded) because we cannot ascertain which occurred first for those patients (Figure 1). The final restriction limited the cohort to a 15 year time frame because some patients had diagnoses dated in the 1940s due to congenital conditions being treated at Penn in their older age. These patients might skew the survival curves, and so we excluded them (N=1,810). Some of the remaining 46,547 patients were missing information pertaining to medication, lab data or other demographics/sensitive attributes. We excluded 1,166 patients due to a lack of medication entries and 7,341 patients for missing vital or lab entries. Subsequently, patients with no age entry were excluded. Our final cohort that we used for the subsequent survival analysis experiments contained N=29,045 patients who have both medication and laboratory data.
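The exclusion counts reported so far can be checked with simple arithmetic. The short sketch below only replays the numbers stated in the text; the variable names are illustrative, and the later exclusions for missing data are not replayed because the count of patients without an age entry is not reported.

```python
# Exclusion counts exactly as reported in Section 2.1.
starting_cohort = 70_420
exclusions = [
    ("fall/injury before NDD diagnosis", 4_839),
    ("only a single visit", 10_500),
    ("fall/injury on the start date", 6_724),
    ("outside the 15 year time frame", 1_810),
]

remaining = starting_cohort
for reason, n_excluded in exclusions:
    remaining -= n_excluded
    print(f"after excluding {n_excluded:>6,} ({reason}): {remaining:,}")
```

The running total ends at 46,547, which matches the number of patients reported before the missing-data exclusions.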

We identified patients as having one of five NDD types: AD, PD, VD, OD and LB using a combination of PheCodes and direct extraction of relevant International Classification of Diseases (ICD) version 9 (ICD-9) and 10 (ICD-10) codes. We also extracted Mild Cognitive Impairment (MCI) using well known and established ICD-9 and ICD-10 codes [28]. The diagnoses were not mutually exclusive: patients could have multiple NDDs and MCI. As stated earlier, the start date of the study was the first date of diagnosis of any of the 5 NDDs or MCI.

2.2 Feature Preprocessing

We included patient demographics (Race, Ethnicity, Sex, Age, MCI, and NDD) in our models. For the Race variable, we used one-hot encoding to represent each as a binary feature. We also included two different EHR feature modalities: medications and laboratory test information described in detail below. We included demographic factors, medications, and laboratory tests due to their established associations with health disparities, disease risk, comorbidities, and critical biomarkers linked to NDD and NDD progression and monitoring.

Medications. Distinct medication features were extracted from our UPenn de-identified EHR data. These medications had previously been linked to the Observational Health Data Sciences & Informatics (OHDSI) Common Data Model framework [29], which harmonizes all medications to distinct concept codes. These medication concept codes were used as binary features. Patients were only assigned a 1 for a specific medication concept code if that patient was on that medication at baseline or before the start date of the study. Otherwise that patient would be labeled with a 0.
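The binary medication encoding described above can be sketched as follows. This is a minimal, dependency-free illustration and not the study's actual pipeline: the function name, the record layout `(patient_id, concept_code, exposure_date)`, and the codes `MED_A` and `MED_B` are hypothetical placeholders rather than real OHDSI concept codes.

```python
from datetime import date


def baseline_medication_features(records, start_dates, concept_codes):
    """Return {patient_id: {concept_code: 0 or 1}}.

    A feature is 1 only if the patient has a record for that concept
    on or before the patient's own study start date.
    """
    features = {pid: {code: 0 for code in concept_codes} for pid in start_dates}
    for pid, code, exposure_date in records:
        known = pid in features and code in features[pid]
        if known and exposure_date <= start_dates[pid]:
            features[pid][code] = 1
    return features


# Hypothetical toy data.
start_dates = {"p1": date(2015, 3, 1), "p2": date(2018, 7, 15)}
records = [
    ("p1", "MED_A", date(2014, 11, 2)),  # before start date -> 1
    ("p1", "MED_B", date(2016, 1, 9)),   # after start date  -> stays 0
    ("p2", "MED_B", date(2018, 7, 15)),  # on the start date -> 1
]
features = baseline_medication_features(records, start_dates, ["MED_A", "MED_B"])
print(features)
```

Treating a record dated exactly on the start date as a baseline exposure is an assumption of this sketch, based on the phrase "at baseline or before the start date".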

Laboratory Tests. The vital sign and laboratory result information was also provided in accordance with the OHDSI Common Data Model framework [29]. Therefore, laboratory values and vital signs were provided in the form of LOINC (Logical Observation Identifiers Names and Codes) codes [30] in our EHR data. There are some discrepancies in the dataset; for example, some patients have different results for a particular lab on a specific date. In such cases, we aggregated the vital sign and laboratory result information by computing the mean value for each patient, date, and LOINC code, which allowed us to reduce variability and focus on central trends. The data were then transformed into a pivot table format, where each row represented a unique patient-date combination and columns represented different vital signs and laboratory results (LOINC codes) with their corresponding mean values. The laboratory result codes were used as binary features. The binary columns for vital and laboratory codes indicate whether data for the code was recorded on or before the study’s start date. If data for a specific code exists, a binary value of 1 is assigned; otherwise, 0 is assigned. If a patient did not have any vital or lab record, we placed NaN in all the columns for that patient. The processed dataset represents the presence or absence of vital signs and laboratory tests at baseline for each patient.
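The three steps described above (mean aggregation, pivoting, and binary presence flags) can be sketched in plain Python. This is an illustrative reconstruction under stated assumptions, not the study's code: the function names, the row layout `(patient_id, date, code, value)`, and the codes `LAB_A` and `VITAL_B` are hypothetical, and `None` stands in for NaN.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_by_patient_date_code(rows):
    """Average duplicate results for the same patient, date, and code."""
    grouped = defaultdict(list)
    for pid, day, code, value in rows:
        grouped[(pid, day, code)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def pivot(aggregated, codes):
    """One row per (patient, date); one column per code holding the mean value."""
    table = {}
    for (pid, day, code), value in aggregated.items():
        row = table.setdefault((pid, day), {c: None for c in codes})
        row[code] = value
    return table


def baseline_presence(table, start_dates, codes):
    """1 if a code was recorded on or before the start date, else 0.

    Patients with no vital or lab record at all get None in every column.
    """
    has_record = {pid for pid, _ in table}
    flags = {}
    for pid in start_dates:
        fill = 0 if pid in has_record else None
        flags[pid] = {code: fill for code in codes}
    for (pid, day), row in table.items():
        if pid in flags and day <= start_dates[pid]:
            for code, value in row.items():
                if value is not None:
                    flags[pid][code] = 1
    return flags


# Hypothetical toy data.
codes = ["LAB_A", "VITAL_B"]
start_dates = {"p1": date(2015, 3, 1), "p2": date(2018, 7, 15), "p3": date(2020, 1, 1)}
rows = [
    ("p1", date(2015, 2, 1), "LAB_A", 5.0),
    ("p1", date(2015, 2, 1), "LAB_A", 7.0),      # conflicting same-day result
    ("p1", date(2016, 1, 1), "VITAL_B", 120.0),  # after the start date
    ("p2", date(2018, 7, 1), "VITAL_B", 130.0),
]
aggregated = mean_by_patient_date_code(rows)
table = pivot(aggregated, codes)
flags = baseline_presence(table, start_dates, codes)
print(flags)
```

In practice a dataframe library would typically handle the grouping and pivoting; the explicit dictionaries are used here only to keep the sketch self-contained.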

2.3 Feature Selection

To identify important features, we trained a deep cox [31] model and then extracted and evaluated feature importance based on the model’s learned weights. In this step, we used 33,792, 29,094, and 29,045 patients for the medication, lab, and combined modalities respectively. The importance score for each feature was computed by multiplying the weight matrices from all sequential layers. The absolute values of the elements were taken to quantify the importance of each feature. Then the top 100 features were selected based on their importance. Additionally, the demographic columns (white, black, asian, hispanic, other, and male), the diagnosis columns (AD, PD, VD, OD, LB, and MCI), age, and length of time treated were retained from the original dataset. As a result, the medication and lab datasets contained 107 and 103 columns respectively. It is important to note that the importance of selected features may vary across different feature selection methods; however, the most significant variables are likely to consistently receive high importance rankings regardless of the method used [32, 33].
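A minimal sketch of the weight-product importance score is given below. It is an illustration under assumptions, not the study's implementation: weight matrices are taken to have shape (outputs, inputs) with the input layer first, biases and activation functions are ignored, and summing absolute values over output rows is a choice made here for networks whose last layer has more than one output. The function names and toy weights are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def feature_importance(weights):
    """Multiply layer weights (input layer first) and take absolute values.

    weights -- list of matrices shaped (out_features, in_features)
    Returns one non-negative score per input feature.
    """
    product = weights[0]
    for w in weights[1:]:
        product = matmul(w, product)
    n_features = len(product[0])
    return [sum(abs(row[j]) for row in product) for j in range(n_features)]


def top_k(importance, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(importance)), key=lambda j: -importance[j])[:k]


# Hypothetical toy network: 3 input features -> 2 hidden units -> 1 output.
w1 = [[1.0, 0.0, -2.0],
      [0.0, 3.0, 1.0]]
w2 = [[2.0, -1.0]]
importance = feature_importance([w1, w2])
selected = top_k(importance, 2)
print(importance, selected)
```

Because the product collapses the network into a single linear map, the score reflects only the linear path through the weights; it is a heuristic ranking rather than a measure of each feature's effect on the hazard.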

2.4 Survival Analysis

To perform survival analysis in this study, we used three baseline models, the Cox Proportional Hazards model (CoxPH) [34], DeepSurv (deep cox) [31], and Deep Survival Machines (DSM) [35], to compare with Deep Clustering Survival Machines (DCSM) [36]. ‘CoxPHSurvivalAnalysis’ (see footnote 1) from the ‘sksurv’ package was used for the CoxPH method. We used the “concordance index” (C-index) to evaluate the time-to-event prediction performance of all the methods. We performed this study based on three modalities: medication, laboratory results and combined (both medication and laboratory). We have three datasets, one per modality, containing information from the same patients. The entire data set was split into a training set and a held-out testing set with a ratio of 7:3. We used ‘Optuna’ [37] on the training set to optimize hyperparameters for DeepSurv, DSM, and DCSM. The learning rate (step size) was chosen from [1e-5, 1e-2]. The layer setting of the multilayer perceptron was chosen from [[50], [100], [50, 50]] where “50” and “100” are the number of neurons in each layer. The number of distributions was chosen from [1, 5].

2.5 Plotting Hazard Ratios for NDD Type and Sensitive Attributes

In this analysis, we aimed to evaluate the performance of all the models mentioned in section 2.4 across different subgroups by calculating the concordance index (C-index) for each subgroup. We extracted the column indices of different NDD types, including AD, PD, VD, OD, and LB. Once the model was trained, we applied the prediction function to obtain the survival outputs for the held-out test set. We then used the previously identified subgroup indices to extract the corresponding predicted values and true labels (event and time) and computed the C-index for each subgroup. This same approach was followed for demographic subgroups, including gender and racial groups (e.g., male, asian, white, black, hispanic). We included these demographic features and other ‘sensitive attributes’ to study the various models’ performance in these sensitive groups. We did this knowing that ML EHR models can be biased by demographics [38].
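The subgroup evaluation described above can be illustrated with a small, self-contained implementation of Harrell's concordance index. This sketch is not the code used in the study: the function names and toy data are hypothetical, pairs with tied event times are simply not counted, and tied risk scores receive half credit.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. It is concordant when
    the model assigns patient i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def subgroup_c_index(times, events, risk_scores, member):
    """C-index restricted to patients whose membership flag is truthy."""
    idx = [k for k, flag in enumerate(member) if flag]
    return concordance_index(
        [times[k] for k in idx],
        [events[k] for k in idx],
        [risk_scores[k] for k in idx],
    )


# Hypothetical held-out test set: time, event indicator, predicted risk.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.5, 0.3, 0.1]
in_subgroup = [1, 0, 1, 1, 1]  # e.g., a binary diagnosis or demographic column

overall = concordance_index(times, events, risk)
subgroup = subgroup_c_index(times, events, risk, in_subgroup)
print(overall, subgroup)
```

A small subgroup yields few comparable pairs, so its C-index is estimated with more uncertainty than the overall value; this is one reason to report confidence intervals alongside subgroup results.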

Then we evaluated the hazard ratios and their associated confidence intervals for the NDD types and the sensitive variables using the CoxPH model. We extracted the trained model coefficients and exponentiated to obtain the hazard ratios, reflecting the relative risk of the covariates on the outcome. We applied bootstrapping to generate 95% confidence intervals around the coefficients by resampling the test data.
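The exponentiation and bootstrap steps can be sketched as follows. To stay self-contained, this example does not fit a Cox model. Instead it uses the log ratio of event rates between two groups as a stand-in coefficient, which is an assumption of the sketch (it equals the log hazard ratio only under constant hazards). The function names, the percentile method for the interval, and the toy data are likewise illustrative choices rather than the study's procedure.

```python
import math
import random


def log_rate_ratio(rows):
    """Stand-in for a fitted coefficient. rows: (exposed, time, event)."""
    deaths = [0, 0]
    person_time = [0.0, 0.0]
    for exposed, time, event in rows:
        deaths[exposed] += event
        person_time[exposed] += time
    if min(deaths) == 0 or min(person_time) == 0:
        return None  # undefined when a group has no events or no follow-up
    return math.log((deaths[1] / person_time[1]) / (deaths[0] / person_time[0]))


def bootstrap_hr_ci(rows, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Return (HR, lower, upper) using a percentile bootstrap on the coefficient."""
    rng = random.Random(seed)
    point = estimator(rows)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        est = estimator(sample)
        if est is not None:
            estimates.append(est)
    if point is None or not estimates:
        raise ValueError("coefficient could not be estimated")
    estimates.sort()
    lower = estimates[int((alpha / 2) * (len(estimates) - 1))]
    upper = estimates[int((1 - alpha / 2) * (len(estimates) - 1))]
    # Exponentiate the coefficient and its bounds to obtain the hazard ratio scale.
    return math.exp(point), math.exp(lower), math.exp(upper)


# Hypothetical toy data: (exposed flag, follow-up years, event indicator).
rows = (
    [(1, 2.0, 1)] * 6 + [(1, 4.0, 0)] * 4
    + [(0, 5.0, 1)] * 3 + [(0, 6.0, 0)] * 7
)
hr, lo, hi = bootstrap_hr_ci(rows, log_rate_ratio)
print(f"HR={hr:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

An interval that excludes 1 on the hazard ratio scale corresponds to a coefficient interval that excludes 0, which is the usual reading of a statistically significant increase or decrease in risk.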

3 Results

3.1 Brief Description of Cohort

Our process resulted in a cohort of N=29,045 patients after those with missing data were excluded (Figure 1). The majority of the missing data occurred because of the age variable, which we wanted to include as a covariate in our analyses and which was missing for a large share of patients. Not all of the 29,045 patients in our cohort had one of the 5 NDDs or MCI. The number of patients included by NDD type or MCI is given in Table 1.

Table 1:

Number of Diagnoses in Current Cohort

                               AD     PD     VD     OD     LB     MCI
Num. of Patients               1952   5403   711    4656   424    7635
Patients with Fall/Injury      478    722    227    1130   149    1982
% Fall/Injury                  24.49  13.36  31.93  24.27  35.14  25.96
Mean Age (at Start of Study)   74.21  71.07  73.37  73.46  72.43  67.48

The largest sample size was for MCI patients, followed by PD then OD. Diagnoses of LB were the least frequent, although 424 patients were included in our final cohort. The fall or injury rate varied by disease, with the highest reported among LB patients at 35.14%, followed by VD at 31.93% (Table 1). The group that had the lowest fall/injury rate was PD at 13.36% (Table 1). As expected, the youngest group of patients were those with MCI, with a mean age of 67 years, while the oldest patients were those with AD diagnoses at 74 years of age (Table 1). All 5 of the NDDs had a mean age at start of study greater than 71 years, indicating that these patients are in fact somewhat older.

3.2 Implementation of Survival Models on Different Modalities

Figure 2 shows the C-index, with 95% confidence intervals, for three different data types: Medication, Laboratory result, and Combined (Medication and Laboratory result). The C-index allows the performance of the four different models (CoxPH, Deep Cox, DSM, and DCSM) to be compared for predicting survival outcomes. DCSM has the lowest performance, around 0.6, in the medication and laboratory categories. DCSM performs better when both modalities are combined than it does on either single modality, but it still slightly underperforms the other models.

Figure 2: Model Performance: Test C-index

Investigating the Role of Sensitive Attributes (e.g., Race) on Model Performance

Figure 3A compares the C-index across four racial groups (White, Black, Asian, and Hispanic) for three data categories: medication, laboratory, and combined. For White people, medication data shows more variability, with DCSM performing worse than the others. Laboratory data performs significantly better for Asian people, especially for the DSM model. Figure 3B shows the C-index across all five NDDs for the medication, laboratory results, and combined (both medication and laboratory) modalities. In the case of AD, DCSM performs the best, with a C-index close to 0.6 for medication. All models perform better for laboratory results of PD, with a C-index close to 0.65. For combined data of PD, the performance is also similar across all models. When it comes to VD, medication data shows variability, with the deep cox model performing lower than the others. Performance across medication and combined data is similar for OD, with the DSM and DCSM models performing slightly better. For LB, the combination of laboratory data and the CoxPH model performs best, while medication data shows lower values across all models except DSM.

Figure 3: Model C-index Performance by Race and NDD Type

Hazard Ratio Plots of Features and Variation by Sensitive Attributes and NDD Type

Figure 4 compares the hazard ratios (HR) with their corresponding 95% confidence intervals (CI) from three different modalities for various sensitive variables. We explored the relationship between the different modalities (medications, laboratory values and combined) and both sensitive attributes and the 5 NDD diagnoses. The effect of including medication features on the HRs varied by NDD type, whereas for laboratory values there did not appear to be any significant difference in the HRs across NDD types and sensitive attributes (Figure 4). For medication features, the HRs indicated increased risk of falls/injuries for OD and PD and reduced risk of falls/injuries for AD and VD (see Figure 4). This indicates that medication features are important for these different NDD types in different ways and highlights the importance of including medication features in models of NDD.

Figure 4: Hazard Ratios of NDD types and Sensitive Variables

4 Discussion

Overall, we found that combining modalities (medications and laboratory information) improved performance of the time-to-event survival models predicting fall/injury risk among NDD and MCI patients. We found that medication information resulted in either increases or decreases in the HRs for select sensitive attributes, indicating that medications may be more biased. This effect disappeared in the combined modality (Figure 4).

4.1 Sensitive Attributes (Race) and Risk of Fall in Different Survival Models

Being of Black race significantly increased the risk of fall/injury in the models that included only medication + sensitive attribute features. The combined model removed this relationship between being of Black race and increases in fall/injury. We also found that OD and PD patients had increased risk of fall/injury in the model using medication features + sensitive attribute features. In contrast, LB and AD patients had reduced risk of fall/injury in that model. Interestingly, the combined model did not show this relationship, as the laboratory-related features did not appear to have statistically significant differences by sensitive attributes or NDD type (Figure 4). Looking at the models’ performances further by Race, overall performance was worse for Hispanic patients. Laboratory data provided the highest C-index values, particularly for White and Asian populations. DSM and Deep Cox models consistently performed well across most racial groups and data categories, while DCSM showed varied performance, excelling in some groups but underperforming in others.

4.2 Importance of Integrating Multiple EHR Modalities in Outcomes with Disparities

Our findings highlight the importance of integrating multiple data modalities from the EHR to provide a more complete picture of what is occurring among aging patients. Combined data generally led to the best performance, which supports integrating multiple data modalities to improve model accuracy. The deep cox and DSM models generally performed better overall, particularly for PD. However, CoxPH shows more variability, performing worse in some cases (e.g., OD) but well in others (e.g., VD). For AD, OD, and LB, the models’ performances did not vary by the use of combined features. In contrast, for PD, laboratory features were very helpful in improving the analysis, indicating the importance of laboratory-based data for this disease. Similarly, for VD, combined features improved the analysis, highlighting the benefit of using multiple data sources for accurate predictions.

4.3 Limitations and Future Work

Our study has several limitations. First, we only utilize two EHR modalities (medication and laboratory results) and do not explore other modalities (e.g., procedures, diagnoses). Therefore, while our results are robust for the explored modalities, the behavior of these models on other EHR modalities may differ. Another limitation pertains to the EHR data itself. EHR data are known to be subject to many biases that can result in various disparities [38]. Therefore, replication in a separate EHR, and one that utilizes a different EHR vendor, may increase the robustness of our findings in future work (note that UPenn utilizes the Epic EHR vendor). Future work will involve exploration of additional EHR modalities (e.g., diagnoses, imaging) to determine whether the results hold and what benefit those modalities add for modeling NDD progression. We also hope to further explore the role of Social Determinants of Health [33] and disparities on NDD, towards developing fairness-aware methods.

5 Conclusion

In conclusion, we found that combining multiple modalities from the EHR appeared to improve performance when using the C-index across survival analysis methods. This indicates that combining modalities, namely using both medications and laboratory value features, appears to improve the performance modestly. When we explored the relationship between features and sensitive attributes, we found that the HR for medication features varied with increases in HR observed for OD and PD and reduced HR observed for AD and VD (see Figure 4). This likely relates to the specific medications that are important in treating those NDD types and how those might alter the risk of injury or hospitalization following a fall or other adverse event. We found that being of Black race significantly increased the risk of fall/injury in the models that included only medication + sensitive attribute features. The combined model that used both modalities (medications and laboratory information) removed this relationship between being of Black race and increases in fall/injury. Therefore, we found that combining modalities in these survival models in the prediction of fall/injury risk among NDD and MCI individuals results in findings that are robust to different Racial and Ethnic groups with no biases apparent in our final combined modality results.

Acknowledgments

Research reported in this publication/presentation was supported by the National Institute on Aging of the National Institutes of Health under Award Numbers P30 AG073105, U01 AG066833 and R01 AG071470. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

1

https://scikit-survival.readthedocs.io/en/v0.23.0/api/generated/sksurv.linear_model.CoxPHSurvivalAnalysis.html

References

  • [1] ALZ.org. Alzheimer’s disease facts and figures. Accessed in July 2024, https://www.alz.org/alzheimers-dementia/facts-figures, 2024.
  • [2] Tang Alice S, Oskotsky Tomiko, Havaldar Shreyas, Mantyh William G, Bicak Mesude, Solsberg Caroline Warly, Woldemariam Sarah, Zeng Billy, Hu Zicheng, Oskotsky Boris, et al. Deep phenotyping of alzheimer’s disease leveraging electronic medical records identifies sex-specific clinical associations. Nature communications. 2022;13(1). doi: 10.1038/s41467-022-28273-0.
  • [3] Babulal Ganesh M, Quiroz Yakeel T, Albensi Benedict C, Arenaza-Urquijo Eider, Astell Arlene J, Babiloni Claudio, Bahar-Fuchs Alex, Bell Joanne, Bowman Gene L, Brickman Adam M, et al. Perspectives on ethnic and racial disparities in alzheimer’s disease and related dementias: update and areas of immediate need. Alzheimer’s & Dementia. 2019;15(2):292–312. doi: 10.1016/j.jalz.2018.09.009.
  • [4] Srulijes Karin, Klenk Jochen, Schwenk Michael, Schatton Cornelia, Schwickert Lars, Teubner-Liepert Kristin, Meyer Miriam, Srijana KC, Maetzler Walter, Becker Clemens, et al. Fall risk in relation to individual physical activity exposure in patients with different neurodegenerative diseases: a pilot study. The Cerebellum. 2019;18:340–348. doi: 10.1007/s12311-018-1002-x.
  • [5] Axer Hubertus, Axer Martina, Sauer Heinrich, Witte Otto W, Hagemann Georg. Falls and gait disorders in geriatric neurology. Clinical neurology and neurosurgery. 2010;112(4):265–274. doi: 10.1016/j.clineuro.2009.12.015.
  • [6] Bloem Bastiaan R, Haan Joost, Lagaay Anne M, van Beek Wim, Wintzen Axel R, Roos Raymund AC. Investigation of gait in elderly subjects over 88 years of age. Journal of geriatric psychiatry and neurology. 1992;5(2):78–84. doi: 10.1177/002383099200500204.
  • [7] Tinetti Mary E, Williams Christianna S. Falls, injuries due to falls, and the risk of admission to a nursing home. New England journal of medicine. 1997;337(18):1279–1284. doi: 10.1056/NEJM199710303371806.
  • [8] Rubenstein Laurence Z. Falls in older people: epidemiology, risk factors and strategies for prevention. Age and ageing. 2006;35(suppl 2):ii37–ii41. doi: 10.1093/ageing/afl084.
  • [9] Boland Mary Regina, Kraus Marc S, Dziuk Eddie, Gelzer Anna R. Cardiovascular disease risk varies by birth month in canines. Scientific Reports. 2018;8(1):1–11. doi: 10.1038/s41598-018-25199-w.
  • [10] Boland Mary Regina, Parhi Pradipta, Li Li, Miotto Riccardo, Carroll Robert, Iqbal Usman, Nguyen Phung-Anh, Schuemie Martijn, You Seng Chan, Smith Donahue, et al. Uncovering exposures responsible for birth season–disease effects: a global study. Journal of the American Medical Informatics Association. 2018;25(3):275–288. doi: 10.1093/jamia/ocx105.
  • [11] Boland Mary Regina, Fieder Martin, John Luis H, Rijnbeek Peter R, Huber Susanne. Female reproductive performance and maternal birth month: a comprehensive meta-analysis exploring multiple seasonal mechanisms. Scientific Reports. 2020;10(1):555. doi: 10.1038/s41598-019-57377-9.
  • [12] Li Li, Boland Mary Regina, Miotto Riccardo, Tatonetti Nicholas P, Dudley Joel T. Replicating cardiovascular condition-birth month associations. Scientific reports. 2016;6(1):33166. doi: 10.1038/srep33166.
  • [13] Boland Mary Regina, Shahn Zachary, Madigan David, Hripcsak George, Tatonetti Nicholas P. Birth month affects lifetime disease risk: a phenome-wide method. Journal of the American Medical Informatics Association. 2015;22(5):1042–1053. doi: 10.1093/jamia/ocv046.
  • [14] Boland Mary Regina, Alur-Gupta Snigdha, Levine Lisa, Gabriel Peter, Gonzalez-Hernandez Graciela. Disease associations depend on visit type: results from a visit-wide association study. BioData Mining. 2019;12:1–10. doi: 10.1186/s13040-019-0203-2.
  • [15] Halpern Rachel, Seare Jerald, Tong Junliang, Hartry Ann, Olaoye Anthony, Aigbogun Myrlene Sanon. Using electronic health records to estimate the prevalence of agitation in alzheimer disease/dementia. International journal of geriatric psychiatry. 2019;34(3):420–431. doi: 10.1002/gps.5030.
  • [16] Ye Chengyin, Li Jinmei, Hao Shiying, Liu Modi, Jin Hua, Zheng Le, Xia Minjie, Jin Bo, Zhu Chunqing, Alfreds Shaun T, et al. Identification of elders at higher risk for fall with statewide electronic health records and a machine learning algorithm. International journal of medical informatics. 2020;137:104105. doi: 10.1016/j.ijmedinf.2020.104105.
  • [17] Baus Adam, Coben Jeffrey, Zullig Keith, Pollard Cecil, Mullett Charles, Taylor Henry, Cochran Jill, Jarrett Traci, Long Dustin. An electronic health record data-driven model for identifying older adults at risk of unintentional falls. Perspectives in health information management. 2017;14(Fall).
  • [18].Xu Jie, Wang Fei, Xu Zhenxing, Adekkanattu Prakash, Brandt Pascal, Jiang Guoqian, Kiefer Richard C, Luo Yuan, Mao Chengsheng, Pacheco Jennifer A, et al. Data-driven discovery of probable alzheimer’s disease and related dementia subphenotypes using electronic health records. Learning Health Systems. 2020;4(4):e10246. doi: 10.1002/lrh2.10246. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Noshin Kazi, Boland Mary Regina, Hou Bojian, Lu Victoria, Manning Carol, Shen Li, Zhang Aidong. Uncovering important diagnostic features for alzheimer’s, parkinson’s and other dementias using interpretable association mining methods. Pacific Symposium of Biocomputing. 2025 doi: 10.1142/9789819807024_0045. [DOI] [PubMed] [Google Scholar]
  • [20].Char Danton S, Shah Nigam H, Magnus David. Implementing machine learning in health care—addressing ethical challenges. New England Journal of Medicine. 2018;378(11):981–983. doi: 10.1056/NEJMp1714229. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [21].Chen Irene Y, Pierson Emma, Rose Sherri, Joshi Shalmali, Ferryman Kadija, Ghassemi Marzyeh. Ethical machine learning in healthcare. Annual review of biomedical data science. 2021;4(1):123–144. doi: 10.1146/annurev-biodatasci-092820-114757. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Clark T G, Bradburn M J, Love S B, Altman D G. Survival analysis part i: Basic concepts and first analyses. British Journal of Cancer. 07 2003;89:232–238. doi: 10.1038/sj.bjc.6601118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [23].Rai Sushmita, Mishra Prabhakar, Ghoshal Uday C. Survival analysis: A primer for the clinician scientists. Indian Journal of Gastroenterology. 10 2021;40:541–549. doi: 10.1007/s12664-021-01232-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].Williams Claire, Lewsey James D, Mackay Daniel F., Briggs Andrew H. Estimation of survival probabilities for use in cost-effectiveness analyses: A comparison of a multi-state modeling survival analysis approach with partitioned survival and markov decision-analytic modeling. Medical Decision Making. 10 2016;37:427–439. doi: 10.1177/0272989X16670617. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Türkiş Fulden Cantaş, Omurlu İmran Kurt, Türe Mevlüt. Survival prediction with extreme learning machine, supervised principal components and regularized cox models in high-dimensional survival data by simulation. GAZI UNIVERSITY JOURNAL OF SCIENCE. 06 2024;37:1004–1020. [Google Scholar]
  • [26].Zhong Qixian, Mueller Jonas W, Wang Jane-Ling. Deep extended hazard models for survival analysis. In: Ranzato M, Beygelzimer A., Dauphin Y., Liang P. S., Wortman Vaughan J., editors. Advances in Neural Information Processing Systems. volume 34. Curran Associates, Inc.; 2021. pp. 15111–15124. [Google Scholar]
  • [27].Boland Mary Regina. Boland lab github: Alzheimer’s disease and related dementias (adrd) project. 2024 Accessed in September 2024, https://github.com/bolandlab/AlzheimersDiseaseandRelatedDementias . [Google Scholar]
  • [28].Mao Chengsheng, Xu Jie, Rasmussen Luke, Li Yikuan, Adekkanattu Prakash, Pacheco Jennifer, Bonakdarpour Borna, Vassar Robert, Shen Li, Jiang Guoqian, et al. Ad-bert: Using pre-trained language model to predict the progression from mild cognitive impairment to alzheimer’s disease. Journal of Biomedical Informatics. 2023;144:104442. doi: 10.1016/j.jbi.2023.104442. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [29].Reich Christian, Ostropolets Anna, Ryan Patrick, Rijnbeek Peter, Schuemie Martijn, Davydov Alexander, Dymshyts Dmitry, Hripcsak George. Ohdsi standardized vocabularies—a large-scale centralized reference ontology for international data harmonization. Journal of the American Medical Informatics Association. 2024;31(3):583–590. doi: 10.1093/jamia/ocad247. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].McDonald Clement J, Huff Stanley M, Suico Jeffrey G, Hill Gilbert, Leavelle Dennis, Aller Raymond, Forrey Arden, Mercer Kathy, DeMoor Georges, Hook John, et al. Loinc, a universal standard for identifying laboratory observations: a 5-year update. Clinical chemistry. 2003;49(4):624–633. doi: 10.1373/49.4.624. [DOI] [PubMed] [Google Scholar]
  • [31].Katzman Jared L., Shaham Uri, Cloninger Alexander, Bates Jonathan, Jiang Tingting, Kluger Yuval. Deep-surv: personalized treatment recommender system using a cox proportional hazards deep neural network. BMC Medical Research Methodology. 02 2018;18 doi: 10.1186/s12874-018-0482-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [32].Noshin K, Boland MR, Hou B, Lu V, Manning C, Shen L, Zhang A. Uncovering important diagnostic features for alzheimer’s, parkinson’s and other dementias using interpretable association mining methods. Pacific Symposium on Biocomputing. 2025;30:631–646. [PMC free article] [PubMed] [Google Scholar]
  • [33].Noshin Kazi, Boland Mary Regina, Hou Bojian, Lu Victoria, Manning Carol, Shen Li, Zhang Aidong. Integrating social determinants of health in a multi-modal deep clustering survival model for injury-risk in alzheimer’s and related dementia patients. AAAI. 2025;In press [Google Scholar]
  • [34].Cox D. R. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 01 1972;34:187–202. [Google Scholar]
  • [35].Nagpal Chirag, Li Xinyu, Dubrawski Artur. Deep survival machines: Fully parametric survival regression and representation learning for censored data with competing risks. IEEE Journal of Biomedical and Health Informatics. 08 2021;25:3163–3175. doi: 10.1109/JBHI.2021.3052441. [DOI] [PubMed] [Google Scholar]
  • [36].Hou Bojian, Wen Zixuan, Bao Jingxuan, Zhang Richard, Tong Boning, Yang Shu, Wen Junhao, Cui Yuhan, Moore Jason H, Saykin Andrew J, Huang Heng, Thompson Paul M, Ritchie Marylyn D, Davatzikos Christos, Shen Li. Interpretable deep clustering survival machines for alzheimer’s disease subtype discovery. Medical Image Analysis. 2024;97:103231. doi: 10.1016/j.media.2024.103231. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [37].Akiba Takuya, Sano Shotaro, Yanase Toshihiko, Ohta Takeru, Koyama Masanori. Optuna: A next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2019 [Google Scholar]
  • [38].Boland Mary Regina, Elhadad Noémie, Pratt Wanda. Informatics for sex-and gender-related health: understanding the problems, developing new methods, and designing new solutions. Journal of the American Medical Informatics Association. 2022;29(2):225–229. doi: 10.1093/jamia/ocab287. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from AMIA Summits on Translational Science Proceedings are provided here courtesy of American Medical Informatics Association
