Abstract
Purpose:
To predict the need for surgical intervention in patients with primary open-angle glaucoma (POAG) using systemic data in electronic health records (EHR).
Design:
Development and evaluation of machine learning models.
Methods:
Structured EHR data for 385 POAG patients from a single academic institution were incorporated into models using multivariable logistic regression, random forests, and artificial neural networks. Leave-one-out cross-validation was performed. Mean area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, and Youden Index were calculated for each model to evaluate performance. Systemic variables driving predictions were identified and interpreted.
Results:
Multivariable logistic regression was most effective at discriminating patients with progressive disease requiring surgery, with an AUC of 0.67. Higher mean systolic blood pressure was associated with significantly increased odds of needing glaucoma surgery (odds ratio [OR] 1.09, P<0.001). Ophthalmic medications (OR 0.28, P<0.001), non-opioid analgesic medications (OR 0.21, P=0.002), anti-hyperlipidemic medications (OR 0.39, P=0.004), macrolide antibiotics (OR 0.40, P=0.03), and calcium blockers (OR 0.43, P=0.03) were associated with decreased odds of needing glaucoma surgery.
Conclusions:
Existing systemic data in the EHR has some predictive value in identifying POAG patients at risk of progression to surgical intervention, even in the absence of eye-specific data. Blood pressure-related metrics and certain medication classes emerged as predictors of glaucoma progression. This approach provides an opportunity for future development of automated risk prediction within the EHR based on systemic data to assist with clinical decision-making.
Table of Contents Statement
The relationship between systemic conditions and medications and progression of primary open-angle glaucoma (POAG) is complex and not well-characterized. Baxter et al. leverage the vast quantity of systemic data in electronic health records to generate models predicting which patients are at risk of progressive POAG requiring surgical intervention using a machine learning approach. Using multivariable logistic regression, random forests, and artificial neural networks, systemic data were found to have predictive value in classifying at-risk patients.
INTRODUCTION
Glaucoma is a progressive optic neuropathy and the world’s leading cause of irreversible blindness.1 Intraocular pressure (IOP) is the only documented modifiable risk factor and lowering IOP is the current mainstay of glaucoma therapy. However, not all patients with glaucoma have high IOP, and many patients progress to significant visual impairment despite IOP lowering. In addition, even though IOP-lowering has demonstrated effectiveness in delaying disease progression, prior large clinical studies have shown that disease progression is still inevitable.2–4 Thus, there has been increasing interest in identifying other therapeutic targets besides IOP.
Vascular conditions such as hypertension, diabetes, and coronary artery disease have been hypothesized to have a role in glaucoma development and progression.5 The relationship between systemic hypertension and primary open-angle glaucoma (POAG) is of particular interest, as both are age-related chronic diseases that are increasing in prevalence. Several population-based cross-sectional studies, such as the Rotterdam Eye Study6 and the Egna-Neumarkt Glaucoma Study,7 have demonstrated an association between elevated blood pressure (BP), elevated IOP, and glaucoma. The Blue Mountains Eye Study8 also demonstrated that systemic hypertension is related to an increased risk of glaucoma, and this elevated risk was independent of the effect of elevated BP on raising IOP. However, the relationship between BP and glaucoma is multifaceted, as the Barbados Eye Studies showed that lower systolic BP was also associated with risk of developing glaucoma.9 Several subsequent studies found hypotension is a risk factor for glaucoma, and specifically reduction of BP at night, known as nocturnal dipping, appears to make the optic nerve more susceptible to damage.10–15 However, many of these prior analyses did not account for co-existing vascular conditions, such as diabetes mellitus, which could also potentially influence the perfusion of the optic nerve. Moreover, the medical treatment of systemic hypertension, which may have a major confounding effect, was not always rigorously examined in the population-based studies. Some of the clinical studies were limited by small sample size. Finally, these studies utilized an expert-driven approach to create models incorporating only a modest number of risk variables and thus were not able to perform a comprehensive analysis of systemic risk factors in relation to glaucoma progression.
With the wide adoption of electronic health records (EHR), vast quantities of systemic data are readily available that can potentially be leveraged to better understand the relationship between systemic conditions and POAG. Since surgical intervention is a discrete event that is clearly defined and captured in the EHR, we used glaucoma surgery in this study as a surrogate for progressive disease. We hypothesized that machine learning models trained with systemic data in the EHR may offer predictive value in classifying patients at high risk of glaucoma progression, as represented by need for glaucoma surgery within six months. This could potentially enhance the ability to practice precision medicine in the management of glaucoma patients. Furthermore, by identifying clinical features that are associated with risk of progression, these models may help us better understand glaucoma pathophysiology and identify novel therapeutic targets for future investigation.
METHODS
Study Population and Data Source
This study entailed development and evaluation of machine learning models based on retrospective data. We obtained EHR data from patients with glaucoma from the University of California San Diego (UCSD) Clinical Data Warehouse with clinical encounters during a five-year period from September 2013 to September 2018. The EHR used in both inpatient and ambulatory settings was Epic (EpicCare, Verona, WI). Institutional Review Board (IRB)/Ethics Committee approval was obtained at UCSD before the study began, and waiver of informed consent was also granted by the IRB. The study adhered to the tenets of the Declaration of Helsinki and was compliant with the Health Insurance Portability and Accountability Act (HIPAA) and all federal and state laws.
Inclusion criteria consisted of the following: diagnosis of POAG (International Classification of Disease [ICD]-9 code of 365.11 or ICD-10 code of H40.11), age greater than or equal to 18 years, diagnosis date between September 1, 2013 and September 1, 2018, and presence of systemic data in the UCSD EHR. Patients were excluded if the timespan of systemic data in the UCSD EHR was less than 6 months’ duration. With this exclusion criterion, patients who were seen by ophthalmology and sent to a primary care provider for pre-operative clearance shortly before surgery (such as those with advanced glaucoma referred to UCSD specifically for glaucoma surgical intervention) but lacked any other systemic data in the EHR were excluded. This helped ensure that the two groups (patients with surgery and those without surgery) would be similarly derived and helped mitigate potential bias from our institution serving as a tertiary referral center. By excluding these patients specifically referred for surgery, the training data for our models consisted of only patients who had undergone routine monitoring and had systemic data within our health system for at least six months. The final cohort that met these inclusion and exclusion criteria consisted of 385 patients, 174 of whom underwent glaucoma surgery within 6 months (cases) and 211 who did not (controls). All patients who underwent surgery did so at our institution’s ambulatory/outpatient surgical center. None of the patients required hospitalization for their glaucoma surgery.
Outcome Definition
The primary outcome of interest was defined as need for any type of glaucoma-related surgical intervention within 6 months of presentation, with initial presentation defined as date of first encounter within the EHR. Similar to a recent study by Zheng et al.,1 the following Current Procedural Terminology (CPT) codes were used to classify incisional and laser glaucoma surgery: 66160, 66170, 66172, 66174, 66175, 66179, 66180, 66183, 66184, 66185, 66710, 66711, and 65855. In addition to these, 65850, 65820, 0191T, and 0449T were also included as qualifying codes in order to represent minimally invasive glaucoma surgeries.
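As a rough illustration of this outcome definition, the sketch below flags a patient as a case when any qualifying CPT code falls within six months of the first encounter. It is written in Python rather than the R used for the study, and it is not the authors' code. The function and variable names are hypothetical, and approximating six months as 183 days is an assumption that the text does not state.

```python
from datetime import date, timedelta

# Qualifying CPT codes as listed in the text (incisional, laser, and
# minimally invasive glaucoma surgery).
GLAUCOMA_CPT_CODES = {
    "66160", "66170", "66172", "66174", "66175", "66179", "66180",
    "66183", "66184", "66185", "66710", "66711", "65855",
    "65850", "65820", "0191T", "0449T",
}

# Assumption: "six months" is approximated as 183 days.
SIX_MONTHS = timedelta(days=183)


def needs_surgery(first_encounter, procedures):
    """Return True if any qualifying procedure falls within the window.

    procedures is an iterable of (cpt_code, procedure_date) pairs.
    """
    window_end = first_encounter + SIX_MONTHS
    return any(
        code in GLAUCOMA_CPT_CODES and first_encounter <= when <= window_end
        for code, when in procedures
    )
```

For example, `needs_surgery(date(2015, 1, 1), [("66170", date(2015, 3, 1))])` would label the patient as a case, whereas the same procedure performed a year later would not.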
Data Processing
Figure 1 depicts the overall workflow for data processing and model construction and evaluation. First, data were extracted from the UCSD EHR Clinical Data Warehouse for the defined patient cohort. These included structured data pertaining to patient demographics, medications, information about admissions/hospitalizations, social history, vital signs, laboratory results, disease diagnoses, and procedures/surgeries. All records in the data sources were indexed by a unique patient identifier and timestamp. Free text clinical narrative notes and radiology images and reports were not included. A summary of the source data is provided in Supplemental Table 1 (available at AJO.com).
The raw data were abstracted from the clinical data warehouse into encrypted, password-protected Microsoft Excel files, which were placed on a secure HIPAA-compliant server.13 The data were exported to R (version 3.5.1, R Core Team, www.r-project.org) for processing and analysis on the secure server. The following libraries were used: tidyverse, icd, varhandle, tableone, PerformanceAnalytics, ROCR, randomForest, nnet, cutpointr, and psych. All code for data cleaning, processing, and analysis has been released on GitHub (https://github.com/cmarkymark/Baxter_Marks_Kuo_Ohno-Machado_Weinreb/tree/1.0.0).17
To decrease the risk of overfitting, we processed the data to reduce feature dimensionality. For medications, individual medications were coded into pharmacologic classes based on RxNorm ontologies,18 and any medication used for less than 2 weeks was excluded. For hospitalizations, a binary variable was created to characterize any history of hospitalization as well as a continuous variable to define total number of days hospitalized during the observed study period. Vital signs were processed to define features such as the maximum, minimum, and mean of systolic blood pressure, diastolic blood pressure, and heart rate. Body mass index was calculated based on height and weight information. Mapping of individual ICD-9 and ICD-10 codes into disease categories was performed using the icd package in R.19 We excluded categorical variables with too few unique responses to include in the cross-validation procedure described below. At the conclusion of data cleaning and processing, we reduced the features to a total of 48 predictor variables (11 continuous and 37 categorical) for training the subsequent predictive models.
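The vital-sign step described above can be pictured with a minimal Python sketch that collapses repeated readings into one minimum, maximum, and mean per patient. This is an illustration only, not the authors' R pipeline; the function name, patient identifiers, and readings are all hypothetical.

```python
from statistics import mean


def summarize_vitals(readings):
    """Collapse (patient_id, value) readings into per-patient min, max, mean."""
    by_patient = {}
    for patient_id, value in readings:
        by_patient.setdefault(patient_id, []).append(value)
    return {
        pid: {"min": min(vals), "max": max(vals), "mean": mean(vals)}
        for pid, vals in by_patient.items()
    }


# Hypothetical systolic blood pressure readings (mmHg) for two patients.
systolic = [("A", 120), ("A", 140), ("A", 130), ("B", 150), ("B", 110)]
features = summarize_vitals(systolic)
```

Each patient then contributes a single row of summary features regardless of how many encounters were recorded, which is what keeps the feature count fixed.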
Statistical Analysis and Predictive Modeling Methods
We generated summary statistics to describe both cases and controls. Univariate analyses were performed to investigate potential associations of (a) demographic characteristics and (b) clinical features, each considered individually, with the primary outcome. For predictive modeling, we used the following three binary classification methods:
1). Multivariable Logistic Regression.
Logistic regression is a classification model widely used for predictive modeling in the medical literature that learns a direct map from the input data to the response labels and predicts risk using a monotonically increasing or decreasing function.20 We initially adopted a bidirectional step-wise variable selection using the step function in R based on the Akaike information criterion (AIC) before training a multivariable logistic regression model. This function utilizes the AIC value to choose whether to add or remove variables from the model starting from the null model; its first direction is always forward. Details of the methodology underlying bidirectional step-wise variable selection have been previously described by Hastie and Pregibon.21 We chose this method over modern tree-based methods because classic regression-based variable selection methods, such as those based on AIC, have been demonstrated to achieve better parsimony in clinical prediction problems in relatively smaller datasets comparable in size to our cohort.22 We also trained a full model with all predictor variables to evaluate its performance without stepwise regression.
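For reference, the criterion that guides the stepwise search can be written in its standard form. The symbols $k$ and $\hat{L}$ are notation introduced here for illustration and do not appear in the original text.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

Here $k$ is the number of estimated parameters and $\hat{L}$ is the maximized likelihood of the fitted model. At each step, a candidate variable is added or removed when doing so lowers the AIC, so the $2k$ term penalizes extra variables that do not improve the fit enough to justify their inclusion.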
2). Random Forests.
The random forests method classifies data based on an ensemble of many binary decision trees, which are trained by splitting the dataset into subsets on a value at a node and repeating this process on each subset.23–25 A forest reduces the risk of overfitting by averaging over multiple decision trees.23 Here we used the randomForest package26 in R to build a random forest model.
3). Artificial Neural Networks.
Artificial Neural Networks (ANNs) were implemented utilizing the nnet package in R.27 We compared a variety of neural network architectures (e.g., one hidden layer versus two hidden layers, different numbers of nodes within each layer) using a grid search method. We used a gradient descent learning algorithm with an exponential learning rate decay starting at 1. Complete batches were used. For each neural network, the maximum number of iterations was set to 1,000 epochs. Training stopped when any of the following occurred: 1,000 iterations were reached, the maximum conditional likelihood fell below 0.0001, or the change in the optimizer (Broyden-Fletcher-Goldfarb-Shanno algorithm28) fell below 1×10^−9.
Evaluation and Settings for Predictive Models
For model evaluation, we used a leave-one-out cross-validation (LOOCV) approach, also known as the jackknife method,29 in which the model is trained on all observations except one, which then serves as the test set. Accordingly, in our dataset we predicted the outcome of each patient based on a model trained on the remaining 384 patients. That is, the following process was repeated 385 times, once for each of the 385 patients: the patient was removed, the model was trained on the remaining 384 patients, and the model was then applied to the held-out patient to collect the prediction score for that patient. In LOOCV, the overall predictive performance consists of summative measures (i.e., computed using the prediction scores collected from the test cases).30 We used five evaluation metrics of predictive performance: area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, and the Youden Index. Advantages of the LOOCV approach include its direct assessment of predictive ability, its intuitiveness, the lack of randomness in the training/validation set splits, and its generality, which makes it compatible with any kind of predictive modeling.29,30 Although LOOCV is computationally expensive in general,30,31 this method was feasible for our study given the manageable sample size.
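The LOOCV procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the study's R implementation: the nearest-centroid scorer is a toy stand-in for the three models, and the data and all function names are hypothetical. The AUC is computed from the pooled held-out scores as the probability that a randomly chosen case outscores a randomly chosen control.

```python
def loocv_scores(X, y, fit, predict):
    """Collect one held-out prediction score per observation."""
    scores = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]      # every observation except the i-th
        train_y = y[:i] + y[i + 1:]
        model = fit(train_X, train_y)
        scores.append(predict(model, X[i]))
    return scores


def auc(scores, labels):
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy stand-in model (not one of the study's models): score a point by how
# much closer it lies to the mean of the cases than to the mean of the controls.
def fit_centroids(X, y):
    case_vals = [x for x, l in zip(X, y) if l == 1]
    control_vals = [x for x, l in zip(X, y) if l == 0]
    return sum(case_vals) / len(case_vals), sum(control_vals) / len(control_vals)


def predict_centroids(model, x):
    case_mean, control_mean = model
    return abs(x - control_mean) - abs(x - case_mean)


X = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]   # hypothetical single feature
y = [0, 0, 0, 1, 1, 1]               # 1 = case, 0 = control
scores = loocv_scores(X, y, fit_centroids, predict_centroids)
loocv_auc = auc(scores, y)
```

Because each score comes from a model that never saw the corresponding observation, the pooled AUC reflects out-of-sample discrimination. The toy data are perfectly separable, so the AUC here is 1.0, which real data would not be expected to reach.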
For the Multivariable Logistic Regression model, we additionally developed a model using the entire dataset to examine the relative contribution of various predictor variables. For the Random Forest model, we computed the Mean Decrease in Accuracy (MDA, or Permutation Importance) and the Mean Decrease in Impurity (MDI, or Gini Importance) for all variables using the entire dataset, to determine important variables for predicting need for glaucoma surgery. For the Artificial Neural Network, there were four hyper-parameters: the number of layers, the nodes of the output layer, the nodes of the hidden layer (if utilized), and the number of epochs for training. The last hyper-parameter (i.e., epochs) was determined by evaluating for occurrence of overfitting based on the AUC, with a pre-defined maximum of 1,000 epochs.
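The idea behind Mean Decrease in Accuracy can be illustrated with a simplified permutation-importance sketch in Python. Note the assumptions: the randomForest package computes this measure per tree on out-of-bag samples, whereas the sketch below shuffles one column of a fixed dataset and compares accuracy for a single hypothetical fitted model (`toy_predict`). All names and data are illustrative.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == l for r, l in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, column, seed=0):
    """Accuracy lost when the values of one feature column are shuffled."""
    rng = random.Random(seed)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return accuracy(predict, rows, labels) - accuracy(predict, permuted, labels)


# Hypothetical fitted model that only looks at feature 0.
def toy_predict(row):
    return 1 if row[0] > 0.5 else 0


rows = [[0.1, 5.0], [0.9, 3.0], [0.2, 4.0], [0.8, 1.0], [0.3, 2.0], [0.7, 6.0]]
labels = [0, 1, 0, 1, 0, 1]
drop_used = permutation_importance(toy_predict, rows, labels, column=0)
drop_unused = permutation_importance(toy_predict, rows, labels, column=1)
```

Shuffling a feature the model ignores leaves accuracy unchanged, so its importance is zero, while shuffling a feature the model relies on can only lower accuracy in this example.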
RESULTS
We identified 385 adult patients with primary open-angle glaucoma in our clinical data warehouse with clinical encounters between 2013 and 2018 and at least 6 months of longitudinal systemic data captured in the EHR. Of these, 174 had undergone surgical intervention for glaucoma within 6 months of presentation (cases), and 211 had not undergone surgical intervention (controls). Surgical intervention included any type of glaucoma-related procedural intervention, including incisional surgery, minimally invasive glaucoma surgery, and laser surgery.
Table 1 depicts baseline characteristics of both cases and controls. There were no statistically significant differences in demographic characteristics between cases and controls. Mean age in both groups was around 73 years old. Patients undergoing surgery were approximately equally split between males and females. There was a slight female predominance (53.6%) among those without any surgical intervention but this was not statistically significant (p=0.414). The majority of patients self-identified as white (52.9% of cases, 57.8% of controls), with “Other Race or Mixed Race” being the next most highly represented racial category (19.0% of cases, 17.1% of controls). About 15% of patients in both groups self-identified as Hispanic.
Table 1.
Characteristic | Patients without any glaucoma surgery (n=211) | Patients undergoing glaucoma surgery within 6 months of presentation (n=174) | P-value^a |
---|---|---|---|
Age (Mean, SD) | 73.24 (11.88) | 73.09 (12.60) | 0.905 |
Male Gender (n, %) | 98 (46.4) | 89 (51.1) | 0.414 |
Self-Reported Race (n, %) | | | 0.228 |
American Indian or Alaska Native | 0 (0.0) | 1 (0.6) | |
Asian | 29 (13.7) | 20 (11.5) | |
Black or African American | 7 (3.3) | 16 (9.2) | |
Native Hawaiian or Other Pacific Islander | 0 (0.0) | 0 (0.0) | |
Other Race or Mixed Race | 36 (17.1) | 33 (19.0) | |
Unknown (Patient cannot or refuses to declare race) | 17 (8.1) | 12 (6.9) | |
White | 122 (57.8) | 92 (52.9) | |
Self-Reported Ethnicity (n, %) | | | 0.695 |
African American | 0 (0.0) | 1 (0.6) | |
Asian/Pacific Islander | 0 (0.0) | 1 (0.6) | |
Caucasian | 1 (0.5) | 0 (0.0) | |
Hispanic | 32 (15.2) | 25 (14.4) | |
Multi-Racial | 1 (0.5) | 2 (1.1) | |
Non-Hispanic | 156 (73.9) | 128 (73.6) | |
Unknown (Patient cannot or refuses to declare ethnicity) | 20 (9.5) | 15 (8.6) |
n=number, SD=Standard Deviation.
^a The threshold for statistical significance was p<0.05.
Univariate analyses of potential predictor variables with the outcome of glaucoma surgical intervention showed that cases and controls in this cohort were similar with respect to age, gender, BMI, smoking status, pulse, blood pressure, a range of co-morbidities (history of myocardial infarction, congestive heart failure, peripheral vascular disease, stroke, dementia, pulmonary disease, rheumatic disease, peptic ulcer disease, liver disease, diabetes, renal diseases, cancer, metastatic disease, HIV), most medication classes, and laboratory values (Table 2). Several factors were associated with need for glaucoma surgical intervention based on the univariate analyses. These included fewer days hospitalized, and not having been prescribed ophthalmic medications, non-opioid analgesics, anti-viral agents, or antidepressant medications during the study period (Table 2).
Table 2.
Characteristic | Patients without any glaucoma surgery (n=211) | Patients undergoing glaucoma surgery within 6 months of presentation (n=174) | P-value^a |
---|---|---|---|
Vital Signs (Mean, SD) | |||
Systolic Blood Pressure (mmHg) | |||
Minimum recorded value | 118.75 (22.66) | 115.02 (17.12) | 0.074 |
Maximum recorded value | 152.14 (21.76) | 153.99 (19.71) | 0.385 |
Mean of recorded values | 134.47 (17.46) | 133.65 (14.58) | 0.624 |
SD of recorded values | 11.73 (5.36) | 12.92 (6.24) | 0.059 |
Diastolic Blood Pressure (mmHg) | |||
Minimum recorded value | 62.95 (11.96) | 61.99 (10.24) | 0.405 |
Maximum recorded value | 83.23 (12.65) | 84.08 (9.80) | 0.468 |
Mean of recorded values | 73.03 (9.41) | 73.10 (7.54) | 0.936 |
SD of recorded values | 7.06 (2.88) | 7.11 (2.87) | 0.861 |
Heart Rate (beats/minute) | |||
Minimum recorded value | 63.07 (10.91) | 63.04 (9.43) | 0.977 |
Maximum recorded value | 84.86 (16.30) | 86.37 (15.89) | 0.360 |
Mean of recorded values | 72.43 (9.81) | 73.47 (9.83) | 0.304 |
Body Mass Index (Mean, SD) | 27.86 (6.19) | 27.92 (5.71) | 0.922 |
Smoking Status (n, %) | | | 0.530 |
Current | 69 (32.7) | 51 (29.5) | |
Former | 15 (7.1) | 9 (5.2) | |
Never | 127 (60.2) | 113 (65.3) | |
Co-morbid Diagnoses (n, %) | |||
Pulmonary disease | 44 (20.9) | 30 (17.2) | 0.444 |
Cancer | 25 (11.8) | 22 (12.6) | 0.936 |
Renal disease | 24 (11.4) | 18 (10.3) | 0.874 |
Peripheral vascular disease | 20 (9.5) | 8 (4.6) | 0.101 |
Congestive heart failure | 19 (9.0) | 11 (6.3) | 0.432 |
Diabetes mellitus | 18 (8.5) | 15 (8.6) | 1.000 |
Stroke | 17 (8.1) | 12 (6.9) | 0.814 |
Hospitalization Status | |||
Ever hospitalized (n, %) | 184 (87.2) | 149 (85.6) | 0.765 |
Number of days hospitalized (mean, SD) | 11.46 (22.9) | 7.11 (10.5) | 0.021 |
Prescribed Medications^b (n, %) | |||
Ophthalmic | 128 (60.7) | 58 (33.3) | <0.001 |
Non-opioid analgesics | 35 (16.6) | 8 (4.6) | <0.001 |
Anti-viral | 21 (10.0) | 5 (2.9) | 0.011 |
Antidepressants | 38 (18.0) | 17 (9.8) | 0.031 |
Anti-hyperlipidemic | 61 (28.9) | 35 (20.1) | 0.062 |
Anti-hypertensive | 52 (24.6) | 34 (19.5) | 0.283 |
Dermatological | 52 (24.6) | 33 (19.0) | 0.225 |
Opioid analgesics | 41 (19.4) | 33 (19.0) | 1.000 |
Ulcer drugs | 46 (21.8) | 29 (16.7) | 0.256 |
Laxatives | 39 (18.5) | 28 (16.1) | 0.631 |
Beta blockers | 38 (18.0) | 26 (14.9) | 0.505 |
Diuretics | 32 (15.2) | 26 (14.9) | 1.000 |
Calcium blockers | 41 (19.4) | 24 (13.8) | 0.182 |
Anticonvulsants | 24 (11.4) | 21 (12.1) | 0.959 |
Anti-asthmatic | 28 (13.3) | 20 (11.5) | 0.711 |
Fluoroquinolones | 22 (10.4) | 19 (10.9) | 1.000 |
Anti-diabetic | 19 (9.0) | 18 (10.3) | 0.787 |
Corticosteroids | 21 (10.0) | 18 (10.3) | 1.000 |
Decongestants | 34 (16.1) | 16 (9.2) | 0.063 |
Anti-rheumatic | 29 (13.7) | 15 (8.6) | 0.158 |
Cold/cough | 20 (9.5) | 15 (8.6) | 0.910 |
Laboratory values (mean, SD) | |||
Sodium (mEq/L) | 139.40 (2.76) | 139.20 (2.51) | 0.565 |
Anion gap | 13.40 (1.73) | 13.54 (2.12) | 0.599 |
Creatinine (mg/dL) | 1.07 (0.80) | 1.10 (0.97) | 0.805 |
Hemoglobin (g/dL) | 13.07 (1.78) | 13.22 (1.70) | 0.513 |
HDL Cholesterol (mg/dL) | 59.56 (18.24) | 59.03 (20.03) | 0.850 |
Non-HDL Cholesterol (mg/dL) | 119.30 (31.50) | 129.69 (43.56) | 0.067 |
Triglycerides (mg/dL) | 121.48 (78.84) | 130.67 (72.11) | 0.410 |
A1c (%) | 6.00 (1.05) | 6.25 (1.36) | 0.172 |
Erythrocyte sedimentation rate (mm/hr) | 20.89 (22.27) | 21.84 (19.69) | 0.845 |
Lactate (mg/dL) | 4.17 (6.70) | 4.49 (5.83) | 0.890 |
n=number, SD=Standard Deviation, HDL=high density lipoprotein.
^a The threshold for statistical significance was p<0.05.
^b Medication categories were based on mapping individual medication orders from the clinical data warehouse to RxNorm ontologies for pharmacologic classes.
Predictive Modeling
The overall performance of the three predictive models based on leave-one-out cross-validation is shown in Table 3, and the receiver operating characteristic (ROC) curves are depicted in Figure 2. Results for logistic regression are reported for the full model incorporating all predictor variables. The logistic regression model had the highest mean AUC at 0.67, followed closely by random forests and ANNs at 0.65. Logistic regression and ANNs were more sensitive than the random forests. However, random forests demonstrated better specificity than the other two models. All three methods had comparable accuracy, ranging from 0.60 (ANNs) to 0.62 (logistic regression and random forests). The logistic regression model had the highest Youden Index at 0.26 (Table 3). Additional results are described below.
Table 3.
Predictive Model | AUC | Sensitivity | Specificity | Accuracy | Youden Index |
---|---|---|---|---|---|
Multivariable Logistic Regression | 0.67 | 0.75 | 0.50 | 0.62 | 0.26 |
Random Forests | 0.65 | 0.55 | 0.68 | 0.62 | 0.24 |
Artificial Neural Networks | 0.65 | 0.71 | 0.51 | 0.60 | 0.22 |
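To show how the metrics in Table 3 relate to one another, the following Python sketch recomputes the Youden Index and accuracy from the rounded sensitivity and specificity values, using the cohort sizes of 174 cases and 211 controls. The function names are hypothetical, and this is an arithmetic check rather than the authors' evaluation code.

```python
def youden_index(sensitivity, specificity):
    """Youden Index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def accuracy_from_rates(sensitivity, specificity, n_cases, n_controls):
    """Accuracy as the case/control-weighted average of the two rates."""
    correct = sensitivity * n_cases + specificity * n_controls
    return correct / (n_cases + n_controls)


# Rounded values from Table 3: (sensitivity, specificity) per model.
table3 = {
    "logistic regression": (0.75, 0.50),
    "random forests": (0.55, 0.68),
    "neural networks": (0.71, 0.51),
}
for name, (sens, spec) in table3.items():
    j = youden_index(sens, spec)
    acc = accuracy_from_rates(sens, spec, 174, 211)
    print(f"{name}: J = {j:.2f}, accuracy = {acc:.2f}")
```

Because the inputs are already rounded to two decimals, the recomputed values can differ from the reported ones by 0.01 in the last digit (for example, 0.25 rather than 0.26 for the logistic regression Youden Index).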
1). Results of Multivariable Logistic Regression
For relative contribution of the various predictor variables, Table 4 lists the important coefficients in the model. Factors that were significantly protective against needing glaucoma surgery were greater number of days hospitalized (odds ratio [OR] 0.97, P=0.006), higher values for minimum systolic blood pressure (OR 0.92, P<0.001), and being prescribed ophthalmic medication (OR 0.28, P<0.001), non-opioid analgesic medication (OR 0.21, P=0.002), anti-hyperlipidemic medication (OR 0.39, P=0.004), macrolide antibiotics (OR 0.40, P=0.034), or calcium blockers (OR 0.43, P=0.025). Factors associated with significantly increased risk of glaucoma surgery were higher mean systolic blood pressure (OR 1.09, P<0.001) and use of anticoagulant medication (OR 2.75, P=0.042).
Table 4.
Variable | Adjusted Odds Ratio (95% Confidence Interval) | P-Value^a |
---|---|---|
Ophthalmic medication | 0.28 (0.17, 0.46) | <0.001 |
Minimum systolic blood pressure | 0.92 (0.89, 0.95) | <0.001 |
Mean systolic blood pressure | 1.09 (1.06, 1.13) | <0.001 |
Non-opioid analgesic medication | 0.21 (0.07, 0.52) | 0.002 |
Anti-hyperlipidemic medication | 0.39 (0.21, 0.73) | 0.004 |
Number of days hospitalized | 0.97 (0.94, 0.99) | 0.006 |
Calcium blocker medication | 0.43 (0.21, 0.89) | 0.025 |
Macrolide antibiotic medication | 0.40 (0.17, 0.93) | 0.034 |
Anticoagulant medication | 2.75 (1.05, 7.46) | 0.042 |
Male gender | 1.52 (0.94, 2.47) | 0.089 |
Cold/cough medication | 2.22 (0.83, 6.06) | 0.115 |
Minimum diastolic blood pressure | 0.98 (0.95, 1.01) | 0.117 |
Dementia | 0.26 (0.04, 1.38) | 0.141 |
Antidepressant medication | 0.56 (0.50, 1.21) | 0.143 |
Metastatic disease | 0.31 (0.06, 1.43) | 0.149 |
^a The threshold for statistical significance was p<0.05.
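As an aid to interpreting the per-unit odds ratios for continuous predictors in Table 4, the worked relation below rescales an odds ratio to a larger increment. It assumes that blood pressure entered the model as a continuous variable in mmHg, so that the reported OR applies per 1 mmHg with other covariates held fixed. The symbols $\beta$ and $\Delta$ are notation introduced here, and the figures are approximate because the published odds ratios are rounded.

```latex
\mathrm{OR}_{\Delta} = e^{\beta\,\Delta} = \left(\mathrm{OR}_{1}\right)^{\Delta},
\qquad
\mathrm{OR}_{10}^{\text{mean SBP}} \approx 1.09^{10} \approx 2.37,
\qquad
\mathrm{OR}_{10}^{\text{min SBP}} \approx 0.92^{10} \approx 0.43
```

Under these assumptions, a 10 mmHg higher mean systolic blood pressure would correspond to roughly 2.4 times the odds of surgery, and a 10 mmHg higher minimum systolic blood pressure to roughly 0.43 times the odds.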
2). Results of Random Forests
The top variables of importance for predicting need for glaucoma surgery, determined using MDA and MDI, are shown in Figures 3A and 3B, respectively. The five features with the greatest MDA were ophthalmic medications, minimum systolic blood pressure, number of days hospitalized, maximum diastolic blood pressure, and non-opioid analgesic medication use (Figure 3A). There were twelve features associated with relatively larger MDI: systolic blood pressure (minimum, maximum, mean), diastolic blood pressure (minimum, maximum, mean), heart rate (minimum, maximum, mean), age, number of days hospitalized, and ophthalmic medications (Figure 3B).
3). Results of Artificial Neural Networks
We found the best-performing model to be a two-layer neural network with one hidden layer of 5 nodes and an output layer with 1 node. Although more sensitive than random forests, for all other measures of predictive performance the neural networks did not yield superior results compared with logistic regression or random forests.
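To give a sense of model size relative to the cohort, the Python sketch below counts the weights and biases in a fully connected network with one hidden layer of 5 nodes and a single output node. The input width of 48 is an assumption: it takes one input column per predictor variable, whereas categorical variables with more than two levels would add columns. The function name is hypothetical.

```python
def count_weights(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases in a fully connected net with one hidden layer."""
    hidden = (n_inputs + 1) * n_hidden     # +1 for each hidden node's bias
    output = (n_hidden + 1) * n_outputs    # +1 for each output node's bias
    return hidden + output


# Assumption: 48 input columns, one per predictor variable.
n_weights = count_weights(n_inputs=48, n_hidden=5)
```

Under this assumption the network has 251 parameters to estimate from 384 training observations in each leave-one-out fold.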
DISCUSSION
In this study, we developed and compared machine learning models to predict the need for glaucoma surgical intervention within six months for patients with POAG based on their existing systemic data in the EHR. The rationale for this was rooted in increasing evidence that systemic conditions and medications have a role in glaucoma pathophysiology.5,7,16,32 This may be important in understanding why some patients experience glaucoma progression leading to debilitating visual impairment, despite seemingly adequate control of IOP.
We generated predictive models using statistical and machine learning methods trained with systemic data from the EHR in the absence of eye-specific data such as visual acuity, IOP, and structural and functional testing such as optical coherence tomography images or visual field test results, which are the conventional data points ophthalmologists use to determine whether a patient will need surgery. By doing so, we used a data-driven approach that could encompass a wide range of potential predictor variables and additionally quantify the impact of systemic factors on glaucoma progression. The performance of our models supports the hypothesis that systemic data captured in the EHR during routine clinical care has some predictive value.
The statistical and machine learning approaches employed here demonstrated comparable predictive performance overall; neural networks did not yield superior results. Furthermore, both logistic regression and random forests offered interpretable results. The logistic regression equation clearly delineated coefficients and odds ratios that described the relative weights of various clinical features in making the prediction, while we used MDA and MDI to assess variable importance in random forests. This interpretability allowed identification of features with the most predictive value and that may represent future areas of investigation to better understand pathophysiology or to develop new treatments. In this way, predictive modeling may provide a tool to explore data captured in the EHR to generate hypotheses for future studies. Although artificial neural networks have been previously employed to perform highly accurate clinical predictions based on EHR data,33 one of their key limitations is that the underlying process driving their predictions is opaque.34 Hence, their “black box” nature may limit their usefulness in driving future clinical research, although various approaches to overcome this limitation exist. In this study, the artificial neural networks did not demonstrate superior performance. This may be due to the relatively small number of samples and high number of parameters that needed to be estimated.
Our models provided further support for some of the findings from prior studies examining the relationship between hypertension and glaucoma. In our logistic regression model, higher mean systolic BPs (e.g. chronic hypertension) were associated with increased risk of needing glaucoma surgery. Interestingly, our logistic regression model revealed that lower values for the minimum recorded systolic BP (e.g. episodes of relative hypotension) also were associated with increased risk of needing surgery. This supports previously reported studies that have associated hypotension with glaucoma progression.10–12,14 Measures related to systolic BPs (e.g. minimum, maximum, mean) also emerged as important variables in driving the random forests classification. Diastolic BP has also been linked with IOP6 and is a critical component of the calculation of mean ocular perfusion pressure,35 but its significance in glaucoma progression is not well understood. Similarly, diastolic BP did not have any significant predictive value in our logistic regression model, but measures related to diastolic BP did emerge as variables of importance in the random forest classification.
Several classes of medications emerged as having predictive value in our models. In our logistic regression model, patients who had been prescribed ophthalmic medications, non-opioid analgesics, anti-hyperlipidemic medications, macrolide antibiotics, and calcium blockers were significantly less likely to need glaucoma surgery. The random forests classification also identified ophthalmic medications and non-opioid analgesics as variables of importance. The fact that use of ophthalmic medications would have predictive value is unsurprising. However, the finding that these other classes of medications (non-opioid analgesics, anti-hyperlipidemic medications, macrolide antibiotics, and calcium blockers) helped successfully predict patients who did not need glaucoma surgery could support further investigation of possible new therapeutic targets in these drug classes. Of note, the non-opioid analgesics category included a variety of formulations of acetaminophen and aspirin. Outside of aspirin, the RxNorm codes categorized other non-steroidal anti-inflammatory drugs (NSAIDs) as “anti-rheumatic” medications, which did not exhibit significant predictive value in our models.
The potential effects of some of these medication classes on glaucoma progression have been explored in prior studies, with mixed results. For non-opioid analgesics, aspirin has been hypothesized to have potential applications in glaucoma by increasing blood flow to the optic nerve, acting as a neuroprotective agent to prevent retinal ganglion cell death, and regulating intraocular pressure via upregulating prostaglandin receptors.36 However, a retrospective analysis showed that aspirin use had no association with progression of optic nerve parameters in POAG suspects,37 and another analysis showed that aspirin use was actually associated with optic disc hemorrhages,38 which have been linked with glaucoma progression.39–41 The case for anti-hyperlipidemic medications such as statin therapies is more compelling, with several retrospective studies demonstrating an association between statin use and decreased risk of developing POAG and decreased risk of POAG progression.42–45 The finding of macrolide antibiotics having a potentially protective effect may provide some support for the hypothesis that the bacterium Helicobacter pylori (H. pylori) plays a causative role in glaucoma.46–48 While some studies have suggested that eradication of H. pylori with antibiotic therapy may positively influence glaucoma parameters,49 others have not demonstrated a clear beneficial effect.46,50,51 Our findings support the need for further study of possible therapeutic effects of systemic medications in glaucoma. Generating definitive evidence will require large, well-powered prospective clinical trials in the future.
Interestingly, calcium blockers were associated with a protective effect in our analysis but with increased risk of POAG in a different study using claims data.16 These conflicting results may stem in part from one of the key limitations of both EHR and claims data, which is that neither captures real-world medication use. A medication order in the EHR indicates that the treating physician recommended the medication but does not show whether the patient filled the prescription. While claims data may capture whether the pharmacy dispensed the medication, they still do not show whether the patient subsequently took the medication as prescribed at home. Efforts to effectively link EHR data with claims data regarding medication use are ongoing.52 Given that medication adherence in glaucoma remains a key challenge,53 incorporating data that more directly represent patients’ medication use, such as data from personal health records or even medication adherence sensors, will likely strengthen predictive models in the future.
We used glaucoma surgery as a proxy for glaucoma progression because it was a clearly defined event in the EHR, thereby reducing the risk of misclassification. In addition, the exact definition of glaucoma progression (and even of glaucoma itself) is not necessarily uniform across clinical research studies.53 However, our analysis demonstrated that need for glaucoma surgery was not a perfect surrogate for disease progression. For example, more days hospitalized were found to be protective against needing surgery. This was likely due to the fact that patients with illnesses requiring prolonged hospitalizations are usually not ideal candidates for elective surgery, rather than hospitalization itself actually being protective against glaucoma progression.
Despite the limitations of working with data obtained from real-world clinical practice, our analysis demonstrated the potential value of machine learning models using clinical features captured in the EHR to identify patients at greater risk of glaucoma progression. Systemic data in the EHR are likely an underutilized resource, as ophthalmologists tend to be high-volume providers who spend little time on each individual patient encounter.54,55 Based on national data collected in the Signal efficiency portal by the Epic UserWeb, as of December 2018, the median time spent in clinical chart review per appointment among ophthalmologists nationwide using Epic was 0.8 minutes (25th percentile 0.5 minutes, 75th percentile 1.1 minutes).56 Given the complexity of the EHR and this kind of time pressure, ophthalmologists are unlikely to perform thorough and detailed chart review of systemic data. Therefore, another potential application of EHR-based predictive models would be to extract relevant systemic data and assist clinicians in real time to efficiently estimate the risk of disease progression in individual patients. This may help ophthalmologists advise patients and their families regarding prognosis and choose appropriate follow-up intervals so that patients are re-evaluated before progression occurs. With automated risk prediction within the EHR incorporating each individual patient’s systemic data, ophthalmologists would be better equipped to deliver precision management and reduce patients’ risk of irreversible blindness from glaucoma.
In conclusion, systemic data in the EHR offer some predictive value in classifying patients at risk of glaucoma progression (indicated by need for glaucoma surgery within six months), even in the absence of eye-specific endpoints. These predictive models are hypothesis-generating in that they identify conditions and medications that may point to novel therapeutic targets for future investigation. Although real-world data present some limitations, this type of predictive modeling has the potential to facilitate automated risk prediction within the EHR, which would help ophthalmologists review systemic data more efficiently and incorporate this information into their clinical decision-making for glaucoma patients.
ACKNOWLEDGMENTS/DISCLOSURES
a. Funding/Support: This project was supported by the National Institutes of Health (NIH) grants T15LM011271, K99/R00HG009680, P30EY022589, and R01 EY029058, the Heed Ophthalmic Foundation Fellowship (San Francisco, CA), and an unrestricted grant from Research to Prevent Blindness (New York, NY). The project was supported in part by the NIH (Bethesda, MD) grant U54HL108460, and by Grant UL RR031980 (for years 1 & 2 of CTSA funding) and/or UL1TR000100 (during year 3 and beyond of CTSA funding).
b. Financial Disclosures: RNW is a consultant for Aerie Pharmaceuticals, Allergan, Eyenovia, and Implantdata; has funding from Heidelberg Engineering, Carl Zeiss Meditec, Genentech, Konan, Optovue, Topcon, Optos, Centervue, and Bausch & Lomb; and receives patent royalty payments from Toromedes and Meditec-Zeiss.
Footnotes
Supplemental Material available at AJO.com.
REFERENCES
- 1. Tham Y-C, Li X, Wong TY, Quigley HA, Aung T, Cheng C-Y. Global prevalence of glaucoma and projections of glaucoma burden through 2040: a systematic review and meta-analysis. Ophthalmology. 2014;121(11):2081–2090.
- 2. Kass MA, Heuer DK, Higginbotham EJ, et al. The Ocular Hypertension Treatment Study: A Randomized Trial Determines That Topical Ocular Hypotensive Medication Delays or Prevents the Onset of Primary Open-Angle Glaucoma. Arch Ophthalmol. 2002;120(6):701–713.
- 3. Heijl A, Leske MC, Bengtsson B, et al. Reduction of intraocular pressure and glaucoma progression: results from the Early Manifest Glaucoma Trial. Arch Ophthalmol. 2002;120(10):1268–1279.
- 4. Comparison of glaucomatous progression between untreated patients with normal-tension glaucoma and patients with therapeutically reduced intraocular pressures. Collaborative Normal-Tension Glaucoma Study Group. Am J Ophthalmol. 1998;126(4):487–497.
- 5. Deokule S, Weinreb RN. Relationships among systemic blood pressure, intraocular pressure, and open-angle glaucoma. Canadian Journal of Ophthalmology. 2008;43(3):302–307.
- 6. Dielemans I, Vingerling JR, Algra D, Hofman A, Grobbee DE, de Jong PTVM. Primary Open-angle Glaucoma, Intraocular Pressure, and Systemic Blood Pressure in the General Elderly Population. Ophthalmology. 1995;102(1):54–60.
- 7. Bonomi L. Vascular risk factors for primary open angle glaucoma: the Egna-Neumarkt Study. Ophthalmology. 2000;107(7):1287–1293.
- 8. Mitchell P, Lee AJ, Rochtchina E, Wang JJ. Open-angle glaucoma and systemic hypertension: the blue mountains eye study. J Glaucoma. 2004;13(4):319–326.
- 9. Leske MC, Wu S-Y, Hennis A, Honkanen R, Nemesure B, BESs Study Group. Risk factors for incident open-angle glaucoma: the Barbados Eye Studies. Ophthalmology. 2008;115(1):85–93.
- 10. Krasińska B, Karolczak-Kulesza M, Krasiński Z, et al. A marked fall in nocturnal blood pressure is associated with the stage of primary open-angle glaucoma in patients with arterial hypertension. Blood Pressure. 2011;20(3):171–181.
- 11. Hayreh SS, Zimmerman MB, Podhajsky P, Alward WLM. Nocturnal Arterial Hypotension and Its Role in Optic Nerve Head and Ocular Ischemic Disorders. American Journal of Ophthalmology. 1994;117(5):603–624.
- 12. Bechetoille A, Bresson-Dumont H. Diurnal and nocturnal blood pressure drops in patients with focal ischemic glaucoma. Graefes Arch Clin Exp Ophthalmol. 1994;232(11):675–679.
- 13. Graham SL, Drance SM. Nocturnal Hypotension: Role in Glaucoma Progression. Surv Ophthalmol. 1999;43 Suppl 1:S10–16.
- 14. Melgarejo JD, Lee JH, Petitto M, et al. Glaucomatous Optic Neuropathy Associated with Nocturnal Dip in Blood Pressure: Findings from the Maracaibo Aging Study. Ophthalmology. 2018;125(6):807–814.
- 15. Kaiser HJ, Flammer J, Graf T, Stümpfig D. Systemic blood pressure in glaucoma patients. Graefe’s Arch Clin Exp Ophthalmol. 1993;231(12):677–680.
- 16. Zheng W, Dryja TP, Wei Z, et al. Systemic Medication Associations with Presumed Advanced or Uncontrolled Primary Open-Angle Glaucoma. Ophthalmology. 2018;125(7):984–993.
- 17. Cmarkymark. cmarkymark/Baxter_Marks_Kuo_Ohno-Machado_Weinreb: Manuscript Markdown. February 2019. doi: 10.5281/zenodo.2573348
- 18. RxNorm. https://www.nlm.nih.gov/research/umls/rxnorm/. Accessed December 11, 2018.
- 19. Wasey JO, et al. icd: Comorbidity Calculations and Tools for ICD-9 and ICD-10 Codes. 2018. https://CRAN.R-project.org/package=icd. Accessed December 11, 2018.
- 20. Ng A, Jordan MI. On discriminative versus generative classifiers: a comparison of logistic regression and naive Bayes. In: Advances in Neural Information Processing Systems. Vol 14. Cambridge, MA: MIT Press; 2001:841–848.
- 21. Hastie T, Pregibon D. Generalized linear models. In: Statistical Models. Wadsworth & Brooks/Cole; 1992.
- 22. Sanchez-Pinto LN, Venable LR, Fahrenbach J, Churpek MM. Comparison of variable selection methods for clinical predictive modeling. Int J Med Inform. 2018;116:10–17.
- 23. Breiman L. Random Forests. Machine Learning. 2001;45:5–32.
- 24. Dankowski T, Ziegler A. Calibrating random forests for probability estimation. Stat Med. 2016;35(22):3949–3960.
- 25. Nembrini S, König IR, Wright MN. The revival of the Gini importance? Bioinformatics. 2018;34(21):3711–3718.
- 26. Liaw A, Wiener M. Classification and Regression by randomForest. R News. 2002;2(3):18–22.
- 27. nnet package | R Documentation. https://www.rdocumentation.org/packages/nnet/versions/7.3-12. Accessed June 14, 2019.
- 28. Nawi N, Ransing M, Ransing R. An Improved Learning Algorithm Based on The Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method For Back Propagation Neural Networks. In: Sixth International Conference on Intelligent Systems Design and Applications. Vol 1. Jinan, China: IEEE; 2006:152–157.
- 29. Miller RG. The jackknife-a review. Biometrika. 1974;61(1):1–15.
- 30. Gronau QF, Wagenmakers E-J. Limitations of Bayesian Leave-One-Out Cross-Validation for Model Selection. Comput Brain Behav. 2019;2(1):1–11.
- 31. James G, Witten D, Hastie T, Tibshirani R, eds. An Introduction to Statistical Learning: With Applications in R. New York: Springer; 2013.
- 32. De Moraes CG, Cioffi GA, Weinreb RN, Liebmann JM. New Recommendations for the Treatment of Systemic Hypertension and their Potential Implications for Glaucoma Management. J Glaucoma. 2018;27(7):567–571.
- 33. Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records. npj Digital Medicine. 2018;1(1):18.
- 34. Sussillo D, Barak O. Opening the black box: low-dimensional dynamics in high-dimensional recurrent neural networks. Neural Comput. 2013;25(3):626–649.
- 35. Keer KV, Breda JB, Pinto LA, Stalmans I, Vandewalle E. Estimating Mean Ocular Perfusion Pressure Using Mean Arterial Pressure and Intraocular Pressure. Invest Ophthalmol Vis Sci. 2016;57(4):2260–2260.
- 36. Attarzadeh A, Hosseini H, Nowroozizadeh S. Therapeutic potentials of aspirin in glaucomatous optic neuropathy. Medical Hypotheses. 2006;67(2):375–377.
- 37. Castro DKD, Punjabi OS, Bostrom AG, et al. Effect of statin drugs and aspirin on progression in open-angle glaucoma suspects using confocal scanning laser ophthalmoscopy. Clinical & Experimental Ophthalmology. 2007;35(6):506–513.
- 38. Soares AS, Artes PH, Andreou P, Leblanc RP, Chauhan BC, Nicolela MT. Factors associated with optic disc hemorrhages in glaucoma. Ophthalmology. 2004;111(9):1653–1657.
- 39. Drance SM. Disc hemorrhages in the glaucomas. Survey of Ophthalmology. 1989;33(5):331–337.
- 40. Rasker MT. Deterioration of Visual Fields in Patients With Glaucoma With and Without Optic Disc Hemorrhages. Arch Ophthalmol. 1997;115(10):1257.
- 41. Leske MC. Factors for Glaucoma Progression and the Effect of Treatment: The Early Manifest Glaucoma Trial. Arch Ophthalmol. 2003;121(1):48.
- 42. Kang JH, Boumenna T, Stein JD, et al. Association of Statin Use and High Serum Cholesterol Levels With Risk of Primary Open-Angle Glaucoma. JAMA Ophthalmol. May 2019. [Retracted]
- 43. Whigham B, Oddone EZ, Woolson S, et al. The influence of oral statin medications on progression of glaucomatous visual field loss: A propensity score analysis. Ophthalmic Epidemiol. 2018;25(3):207–214.
- 44. Talwar N, Musch DC, Stein JD. Association of Daily Dosage and Type of Statin Agent With Risk of Open-Angle Glaucoma. JAMA Ophthalmol. 2017;135(3):263–267.
- 45. Stein JD, Newman-Casey PA, Talwar N, Nan B, Richards JE, Musch DC. The Relationship Between Statin Use and Open-Angle Glaucoma. Ophthalmology. 2012;119(10):2074–2081.
- 46. Zeng J, Liu H, Liu X, Ding C. The Relationship Between Helicobacter pylori Infection and Open-Angle Glaucoma: A Meta-Analysis. Invest Ophthalmol Vis Sci. 2015;56(9):5238–5245.
- 47. Kountouras J, Zavos C, Zeglinas C, Polyzos SA, Katsinelos P. Helicobacter pylori-Related Impact on Glaucoma Pathophysiology. Invest Ophthalmol Vis Sci. 2015;56(13):8029–8030.
- 48. Tsolaki F, Kountouras J, Topouzis F, Tsolaki M. Helicobacter pylori infection, dementia and primary open-angle glaucoma: are they connected? BMC Ophthalmol. 2015;15.
- 49. Kountouras J, Mylopoulos N, Chatzopoulos D, et al. Eradication of Helicobacter pylori May Be Beneficial in the Management of Chronic Open-Angle Glaucoma. Arch Intern Med. 2002;162(11):1237–1244.
- 50. Chen H-Y, Lin C-L, Chen W-C, Kao C-H. Does Helicobacter pylori Eradication Reduce the Risk of Open Angle Glaucoma in Patients With Peptic Ulcer Disease? Medicine (Baltimore). 2015;94(39).
- 51. Noche CD, Njajou O, Etoa FX. No Association between CagA- and VacA-Positive Strains of Helicobacter pylori and Primary Open-Angle Glaucoma: A Case-Control Study. Ophthalmol Eye Dis. 2016;8:1–4.
- 52. Hoopes M, Angier H, Raynor LA, et al. Development of an algorithm to link electronic health record prescriptions with pharmacy dispense claims. J Am Med Inform Assoc. 2018;25(10):1322–1330.
- 53. Quigley HA. 21st century glaucoma care. Eye (Lond). October 2018.
- 54. Chiang MF, Boland MV, Brewer A, et al. Special requirements for electronic health record systems in ophthalmology. Ophthalmology. 2011;118(8):1681–1687.
- 55. Read-Brown S, Hribar MR, Reznick LG, et al. Time Requirements for Electronic Health Record Use in an Academic Ophthalmology Center. JAMA Ophthalmol. 2017;135(11):1250–1257.
- 56. Signal - Efficiency Portal. https://signal.epic.com/?targetID=3216275&navMode=1&metricID=18&subMetricID=0&pageType=1. Accessed November 4, 2018.