Abstract
Background
Patients with atrial fibrillation (AF) usually have a heterogeneous co‐morbid history, with dynamic changes in risk factors impacting on multiple adverse outcomes. We investigated a large prospective cohort of patients with multimorbidity, using a machine‐learning approach, accounting for the dynamic nature of comorbidity risks and incident AF.
Methods
Using machine‐learning, we studied a prospective US cohort using medical/pharmacy databases of 1 091 911 patients, with an incident AF cohort of 14 078 and non‐AF cohort of 1 077 833 enrolled in the 4‐year study. Five incident clinical outcomes (heart failure, stroke, myocardial infarction, major bleeding, and cognitive impairment) were examined in relationship to AF status (AF vs non‐AF), diverse multi‐morbid (conditions and medications) history, and demographic parameters (age and gender), with supervised machine‐learning techniques.
Results
Complex inter‐relationships of various comorbidities were uncovered for AF cases, leading to 6‐fold higher risk of heart failure relative to the non‐AF cohort (OR 6.02, 95% CI 5.72‐6.33), followed by myocardial infarction (OR 2.68), stroke (OR 2.19), and major bleeding (OR 1.36). Supervised machine learning algorithms on the original populations yielded comparable results for both neural network and logistic regression algorithms in terms of discriminant validity, with c‐indexes for incident adverse outcomes: heart failure (0.924, 95% CI 0.923‐0.925), stroke (0.871, 95% CI 0.869‐0.873), myocardial infarction (0.901, 95% CI 0.899‐0.903), major bleeding (0.700, 95% CI 0.697‐0.703), and cognitive impairment (0.919, 95% CI 0.917‐0.921). External calibration of all models demonstrated a good fit between the predicted probabilities and observed events. Decision curve analysis demonstrated that the obtained models were much more clinically useful than the “treat all” strategy.
Conclusions
Complex multimorbidity relationships uncovered using a machine learning approach for incident AF cases have major consequences for integrated care management, with implications for risk stratification and adverse clinical outcomes. This approach may facilitate automated approaches in the presence of multimorbidity, potentially helping decision making.
Keywords: atrial fibrillation, cognitive impairment, congestive heart failure, machine learning, major bleeding, myocardial infarction, stroke
External calibration for five predictive models (a – heart failure; b – stroke; c – myocardial infarction; d – major bleeding; e – cognitive impairment).
1. INTRODUCTION
Atrial fibrillation (AF) is associated with a heterogeneous set of medical conditions and multiple medications. Risks for AF‐related cardiovascular and non‐cardiovascular events and mortality are not static, but change with aging, incident risk factors, and different medications. Indeed, AF management can be regarded as a truly complex system with non‐linear inter‐relationships among its inputs and outputs over time, and understanding these relationships is essential to optimize AF care.
AF has a significant effect on adverse clinical outcomes (stroke and others) with reference to non‐AF cohorts, and the published literature points in this direction for certain outcomes such as heart failure or myocardial infarction. 1 , 2 , 3 Currently, AF complexity is examined in unidimensional planes such as the effects of a linear set of risk factors on stroke or bleeding. 4 , 5 , 6 , 7 AF heterogeneity has previously been examined with a finite static number of clusters on the input side with respect to their effects on clinical outcomes in primarily AF cohorts. 8 , 9 To our knowledge, no comprehensive study has examined the impact of AF on incident adverse clinical outcomes in the presence of a heterogeneous co‐morbid history, using a machine learning approach to account for the dynamic changes in risk factors and the impact on multiple outcomes.
In this study, we hypothesized that while complications of AF may have common risk factors, there could be different risk factors common to ≥2 complications, as well as unique factors to each type of complication. Second, we hypothesized that machine learning may facilitate dynamic risk stratification, where multimorbidity is present and decision making on integrated or holistic management would be needed.
To test these hypotheses, we first used main effect models to compare an incident AF cohort with a non‐AF cohort in a large contemporary multimorbid patient population, accounting for the dynamic nature of comorbidity risks and focusing on five incident clinical outcomes: heart failure, stroke, myocardial infarction, major bleeding, and cognitive impairment. Second, we examined the complex clustering relationships among prior co‐morbid condition/medication history and demographic variables in an incident AF cohort relative to a non‐AF cohort, and the five clinical outcomes above, using supervised machine learning algorithms (ie, neural network and logistic regression) to account for the dynamic and non‐linear changes in risk profile and aging.
2. METHODS
The study population represented two health plans: Commercial (working population <65 years and their families) based on private healthcare insurance and Medicare (elderly population ≥65 years and individuals with disability including those below 65 years) financed by the US government and managed by an independent healthcare organization. The Medicare health plan consisted of Medicare Advantage and Medicare/Medicaid Plan participants. The AF cohort consisted of 14 078 persons, with mean age of 68.1 (SD 14.4) years and 52.9% males. The non‐AF cohort included 1 077 833 persons, with mean age of 48.3 (SD 15.8) and 46.7% males.
The derivation of the AF and non‐AF cohorts in part 1 of the study is depicted in Figure 1 for the two health plans. Each participant was enrolled in both medical and pharmacy benefits over a 4‐year period (1/1/2016‐12/31/2019). An incident AF case was defined as a patient with a clean history in the first 24 months of enrollment, as defined by Piccini et al, 11 that is, no AF ICD 10 code (I480, I481, I4811, I4819, I482, I4820, I4821, I483, I484, I489, I4891, I4892) in the medical database, together with no history of anticoagulants (ie, warfarin and direct oral anticoagulants) and/or rhythm control medications in the pharmacy database (see Table S1 for NDC codes). 10 The exceptions for using anticoagulant/rhythm control medications as a proxy for AF (Table S2) were applied as reported by Tu et al. 10 The index date was set based on the AF ICD 10 diagnosis in the last 24 months of enrollment.
2.1. Parameter identification
The date of the first medical claim qualifying a patient as an incident AF case (as explained above) was marked as the index date. An incident adverse clinical outcome (ie, heart failure, stroke, myocardial infarction, major bleeding or cognitive impairment) was identified as the first occurrence at least 30 days after the index date and before the end of the study period (Dec 31, 2019) (see Table S3 for the definition of outcomes). Patients were censored for each of the five adverse clinical outcomes.
The list of comorbid conditions was identified starting from January 1, 2016, to the date prior to the index date. The clinical outcomes and baseline comorbid conditions were identified from medical claims using primary and/or secondary diagnoses (see Table S3 for ICD 10 codes). The prior history of co‐morbid conditions included heart failure, hypertension, diabetes, ischemic stroke, transient ischemic attack, thromboembolic events, myocardial infarction, peripheral artery disease, valvular disease, coronary artery disease, chronic sleep apnea, chronic kidney disease, chronic obstructive pulmonary disease/bronchiectasis, major bleeding (eg, intracranial and gastro‐intestinal), cognitive impairment, liver disease, anemia, depression, lipid disorder, spondylosis/intervertebral discs, and osteoarthritis. The comorbid history/outcomes are defined in terms of the ICD10 codes in supplemental Table S3 using a combination of mostly medical claims as well as pharmacy claims as defined in the Methods section (see also Tables S1 and S2).
2.2. Statistical analyses
In the exploratory aspect of the study (first aim), relationships were assessed for each of the five clinical outcomes using the main effects of AF status (present vs absent) and prior history of co‐morbid conditions using logistic regression analyses. The outcome and input variables had binary representations. The stepwise procedure of SAS software 12 was used to establish the final models with significant terms. The odds ratios with the 95% CI were reported for the model main effects for each outcome variable, together with the significance level.
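As an illustration of how an odds ratio and its 95% CI follow from a fitted logistic regression coefficient, the sketch below exponentiates a coefficient and its Wald limits. The function name `odds_ratio_with_ci` and the input values are hypothetical, and a Wald interval is assumed here; the study's SAS output may have been produced with different settings.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient.

    beta: estimated log-odds coefficient for a binary input (1 vs 0)
    se:   standard error of beta
    z:    normal quantile for the interval (1.96 gives a 95% Wald interval)
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical values for illustration only; they are not taken from the study.
or_est, or_lo, or_hi = odds_ratio_with_ci(beta=0.50, se=0.10)
print(f"OR {or_est:.2f} (95% CI {or_lo:.2f}-{or_hi:.2f})")
```

Under this assumption the interval is symmetric on the log scale, so a reported odds ratio should lie close to the geometric mean of its confidence limits, which offers a quick plausibility check when reading the tables.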
A logistic regression analysis was conducted with AF group status as an outcome and the prior history of co‐morbid conditions and demographic variables as the potential input variables. This was performed to examine the heterogeneity of conditions feeding into the incident AF and non‐AF cohorts. The odds ratios and 95% CI were reported for each significant main effect together with the significance level. The clinical outcomes were analyzed in terms of incidence rates in new cases/100 person‐years as a function of age group (18‐44, 45‐54, 55‐64, 65‐74, ≥75 years), gender (male, female), and overall population. This was performed for the incident AF and non‐AF cohorts.
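The incidence rate calculation can be sketched as follows. The helper `incidence_rate_per_100py` and the counts are hypothetical, and the interval uses a normal approximation to the Poisson count, which is an assumption of this sketch and not necessarily the method used in the study.

```python
import math

def incidence_rate_per_100py(events, person_years, z=1.96):
    """Return (rate, lower, upper) in new cases per 100 person-years.

    Uses a normal approximation to the Poisson count, which is an assumption
    of this sketch and may differ from the interval method used in the study.
    """
    rate = 100.0 * events / person_years
    se = 100.0 * math.sqrt(events) / person_years
    return rate, rate - z * se, rate + z * se

# Hypothetical counts for illustration only.
rate, lo, hi = incidence_rate_per_100py(events=50, person_years=1000.0)
print(f"{rate:.2f} (95% CI {lo:.2f}-{hi:.2f}) new cases/100 person-years")
```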
Second, the complex relationships were examined in the AF cohort between each of the five clinical outcomes and the multitude of input parameters including prior clinical history at baseline (both medications and co‐morbid conditions), demographic variables, and AF group status. Two supervised machine learning algorithms were used to examine these relationships including the parametric methods of neural network and logistic regression.
The machine learning‐based logistic regression algorithm included main effects, interaction terms, and polynomial effects, with the model selection based on the stepwise method. Only quadratic terms were included in the polynomial formulation to ensure timely convergence of the optimization algorithm for logistic regression from a numerical analysis perspective, given the sheer sample size, the large number of multi‐morbid conditions, and polypharmacy, as well as the consideration of different types of interactions and polynomial terms. The neural network used a multilayer perceptron, which consists of a feed‐forward multilayer network architecture composed of several layers of neurons: an input layer, an output layer, and five hidden layers. The details of the network properties are outlined in Table S4. For each outcome variable, the presence of AF as a disease was modeled as a status variable that interacts with the history of comorbid conditions and medications in a dynamic fashion. That is, the presence of AF (once confirmed) had a major prognostic impact, interacting with the prior co‐morbid history to induce, as theorized, significantly higher adverse outcomes relative to non‐AF. Age was simultaneously analyzed in two ways, both categorical (>75 years, 65‐74 years, 55‐64 years, 45‐54 years, and 18‐44 years) and continuous.
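To make the model structure concrete, the sketch below expands one patient record into main effects, pairwise interaction terms, and a quadratic age term, which is the kind of candidate feature set a stepwise procedure could then select from. The function `expand_features` and the field names are hypothetical, and the exact set of interactions considered in the study (for example, those involving medications) is not reproduced here.

```python
from itertools import combinations

def expand_features(row, binary_names, age_name="age"):
    """Build main effects, pairwise interactions, and a quadratic age term."""
    feats = {name: float(row[name]) for name in binary_names}
    age = float(row[age_name])
    feats[age_name] = age
    feats[age_name + "^2"] = age * age          # the only polynomial term
    for a, b in combinations(binary_names, 2):  # pairwise interactions
        feats[a + "*" + b] = feats[a] * feats[b]
    return feats

# Hypothetical patient record; the field names are illustrative only.
patient = {"af": 1, "hypertension": 1, "diabetes": 0, "age": 70}
features = expand_features(patient, ["af", "hypertension", "diabetes"])
print(features)
```

With binary inputs, an interaction term such as `af*hypertension` equals 1 only when both conditions are present, which is how the simultaneous presence of AF and a prior condition can carry its own coefficient.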
Each model was trained on 67% of the data, with the remaining 33% data used for external validation. In this respect, the development and validation samples were extracted at random with respect to each outcome variable, that is, the randomized samples were different for each outcome variable. Discriminant validity was assessed using C‐indexes (area under the curve) for both the development and validation samples, separately. Model calibration was evaluated to determine if there is a good agreement between the observed values and predicted probabilities. In addition, the clinical utility of each of the five models was assessed using decision curve analysis. The latter allows one to assess the net true positive detected by the model after accounting for the false positives of the prediction model. Moreover, it makes it possible to assess the model performance relative to the treatment of all patients (ie, both true and false positives).
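The random 67%/33% split and the C‐index can be sketched in a few lines. The names `split_indices` and `c_index`, the seed, and the toy data are assumptions made for illustration; the pairwise definition below is the standard concordance interpretation of the area under the ROC curve and is quadratic in sample size, so it is meant for exposition rather than for a cohort of this scale.

```python
import random

def split_indices(n, train_fraction=0.67, seed=0):
    """Randomly split range(n) into development and validation index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]

def c_index(y_true, y_score):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted score; tied scores count as half a concordant pair."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

# Toy data for illustration only.
train_idx, valid_idx = split_indices(100)
auc = c_index([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(len(train_idx), len(valid_idx), auc)
```

Drawing a fresh split for each outcome, as described above, would correspond to calling `split_indices` with a different seed per outcome variable.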
3. RESULTS
We studied a large prospective US cohort from both medical and pharmacy databases of 1 091 911 patients, comprising an incident AF cohort of 14 078 and a non‐AF cohort of 1 077 833, contributing 4 367 644 person‐years of enrollment in the 4‐year study. Baseline characteristics are shown in Table 1. The average age of the AF cohort (68.1 years) was approximately 20 years higher than that of the non‐AF cohort (48.3 years) (P <.0001). The proportion of male patients in the AF cohort (52.9%) was higher than in the non‐AF cohort (46.7%). An analysis of the heterogeneous prior co‐morbid history and demographic variables between the AF and non‐AF cohorts is presented in Table 2. As expected, the AF cohort was older, more commonly male, and had a greater prevalence of comorbidities than the non‐AF cohort.
TABLE 1.
Baseline characteristic | AF | non‐AF |
---|---|---|
Age group (years) | ||
<45 | 926 (6.6) | 447 092 (41.5) |
45‐54 | 1873 (13.3) | 289 799 (26.9) |
55‐64 | 2154 (15.3) | 166 108 (15.4) |
65‐74 | 3976 (28.2) | 109 334 (10.1) |
>75 | 5149 (36.6) | 65 500 (6.1) |
Age mean (SD) | 68.1 (14.4) | 48.3 (15.8) |
Gender | ||
Male | 7449 (52.9) | 503 787 (46.7) |
Female | 6629 (47.1) | 574 046 (53.3) |
Overall cohort | 14 078 (100.0) | 1 077 833 (100.0) |
Co‐morbid history | ||
Heart failure | 1655 (11.8) | 15 539 (1.4) |
Hypertension | 10 083 (71.6) | 327 318 (30.4) |
Diabetes mellitus | 2732 (19.4) | 67 365 (6.3) |
Ischemic stroke | 636 (4.5) | 9581 (0.9) |
Transient ischemic attack | 531 (3.8) | 8851 (0.8) |
Thrombo‐embolic events | 100 (0.7) | 1663 (0.2) |
Myocardial infarction | 845 (6.0) | 12 348 (1.1) |
Peripheral artery disease | 1498 (10.6) | 22 505 (2.1) |
Valvular disease | 2323 (16.5) | 36 606 (3.4) |
Coronary artery disease | 3440 (24.4) | 49 248 (4.6) |
Obstructive sleep apnea | 421 (3.0) | 16 870 (1.6) |
Chronic kidney disease | 1918 (13.6) | 30 149 (2.8) |
Chronic obstructive pulmonary disease & bronchiectasis | 2774 (19.7) | 72 693 (6.7) |
Major bleeding | 1237 (8.8) | 43 586 (4.0) |
Cognitive impairment | 543 (3.9) | 8480 (0.8) |
Liver disease | 1355 (9.6) | 60 235 (5.6) |
Anemia | 2978 (21.2) | 88 556 (8.2) |
Lipid disorders | 8971 (63.7) | 343 046 (31.8) |
Depression | 1572 (11.2) | 90 325 (8.4) |
Spondylosis and intervertebral discs | 5252 (37.3) | 283 732 (26.3) |
Osteoarthritis | 4269 (30.3) | 128 219 (11.9) |
TABLE 2.
Baseline characteristic | Odds ratio (95% CI) | Significance level |
---|---|---|
Age groups | ||
>75 years vs <45 years | 19.37 (17.88‐20.98) | <.0001 |
65‐74 years vs <45 years | 10.28 (9.50‐11.12) | <.0001 |
55‐64 years vs <45 years | 4.69 (4.32‐5.07) | <.0001 |
45‐54 years vs <45 years | 2.68 (2.47‐2.90) | <.0001 |
Gender (1 vs 0) | 0.65 (0.63‐0.67) | <.0001 |
Co‐morbid history | ||
Heart failure (1 vs 0) | 1.75 (1.65‐1.87) | <.0001 |
Hypertension (1 vs 0) | 1.64 (1.57‐1.72) | <.0001 |
Diabetes mellitus (1 vs 0) | 1.16 (1.11‐1.22) | <.0001 |
Ischemic stroke (1 vs 0) | 1.21 (1.11‐1.33) | <.0001 |
Transient ischemic attack (1 vs 0) | 1.07 (0.97‐1.18) | .1821 |
Thrombo‐embolic events (1 vs 0) | 1.08 (0.88‐1.34) | .4632 |
Myocardial infarction (1 vs 0) | 1.04 (0.96‐1.13) | .3601 |
Peripheral artery disease (1 vs 0) | 1.12 (1.05‐1.18) | <.0001 |
Valvular disease (1 vs 0) | 1.48 (1.41‐1.56) | <.0001 |
Coronary artery disease (1 vs 0) | 1.42 (1.35‐1.49) | <.0001 |
Chronic sleep apnea (1 vs 0) | 1.22 (1.10‐1.35) | .0002 |
Chronic kidney disease (1 vs 0) | 1.17 (1.11‐1.24) | <.0001 |
Chronic obstructive pulmonary disease/bronchiectasis (1 vs 0) | 1.28 (1.22‐1.34) | <.0001 |
Major bleeding (1 vs 0) | 1.06 (1.00‐1.13) | .072 |
Cognitive impairment (1 vs 0) | 1.02 (0.93‐1.12) | .7185 |
Liver disease (1 vs 0) | 1.09 (1.02‐1.15) | .0071 |
Anemia (1 vs 0) | 1.09 (1.04‐1.14) | .0004 |
Lipid disorders (1 vs 0) | 0.89 (0.86‐0.93) | <.0001 |
Depression (1 vs 0) | 1.09 (1.03‐1.15) | .0029 |
Spondylosis and intervertebral discs (1 vs 0) | 1.07 (1.03‐1.11) | .0005 |
Osteoarthritis (1 vs 0) | 1.16 (1.12‐1.21) | <.0001 |
C‐index | 0.838 |
1‐Presence of condition or female for gender.
0‐Absence of condition or male for gender.
Results for main effect model.
As depicted in Figure 2 (see also Table S4), heart failure had the highest overall incidence rates (new cases/100 person‐years) from among the five clinical outcomes and was significantly higher in the AF cohort (13.38, 95% CI 12.95‐13.80) relative to the non‐AF cohort (0.97, 95% CI 0.96‐0.99). The overall incidence rate of stroke for the AF cohort (6.06, 95% CI 5.77‐6.34) was the second highest adverse clinical outcome and significantly higher than in the non‐AF cohort (0.93, 95% CI 0.92‐0.94). Myocardial infarction and major bleeding had incidence rates of 5.5 (95% CI 5.22‐5.77) and 5.33 (95% CI 5.06‐5.60), respectively, which were significantly higher than the corresponding rates for the non‐AF cohorts (myocardial infarction: 0.69, 95% CI 0.68‐0.70; major bleeding: 2.10, 95% CI 2.08‐2.12). Cognitive impairment had the lowest incidence rates, with the overall rates for the AF cohort (2.3, 95% CI 2.12‐2.47) significantly higher than the non‐AF cohort (0.55, 95% CI 0.54‐0.56).
Overall, the incidence of all clinical outcomes increased with age group in both the AF and non‐AF cohorts (Figure 2; Table S5). For cognitive impairment, there was an age group and AF status interaction: from 45 to 74 years, the AF cohort had higher cognitive impairment than the non‐AF cohort; at age ≥75 years, there was no difference between the two groups. Females tended to have incrementally higher incidence rates on average for heart failure, stroke, major bleeding, and cognitive impairment than males, whereas males had a higher incidence of myocardial infarction than females (Table S5).
3.1. Clinical outcomes and AF status/prior co‐morbid history
Table 3 presents the statistical relationships for each clinical outcome and the independent effects of AF group status and the heterogeneous prior multi‐morbid condition history. As an outcome, heart failure recorded the highest relative risk between the AF and non‐AF cohorts. Patients in the AF cohort were at 6‐fold higher risk of heart failure than those in the non‐AF cohort (OR 6.02, 95% CI 5.72‐6.33). This was followed by myocardial infarction (OR 2.68, 95% CI 2.51‐2.87), stroke (OR 2.19, 95% CI 2.05‐2.30), and major bleeding (OR 1.36, 95% CI 1.28‐1.44). There was no statistical difference between the AF and non‐AF cohorts in terms of cognitive impairment for the main effect model due to the non‐inclusion of interaction terms such as those with age groups and gender (as shown in Figure 2). Hypertension, diabetes mellitus, depression, COPD/bronchiectasis, spondylosis/intervertebral discs, and osteoarthritis were the only common co‐morbid factors statistically significant with all clinical outcomes, as was the age group.
TABLE 3.
Baseline characteristic | Comparison | Heart failure | Stroke | Myocardial infarction | Major bleeding | Cognitive impairment |
---|---|---|---|---|---|---|
AF status | AF vs Non‐AF | 6.02 (5.72‐6.33) | 2.17 (2.05‐2.30) | 2.68 (2.51‐2.87) | 1.36 (1.28‐1.44) | |
Heart failure | (1 vs 0) | 19.08 (18.32‐19.88) | 1.19 (1.12‐1.26) | |||
Hypertension | (1 vs 0) | 2.16 (2.07‐2.25) | 1.57 (1.51‐1.63) | 1.68 (1.60‐1.76) | 1.18 (1.15‐1.20) | 1.12 (1.06‐1.17) |
Diabetes mellitus | (1 vs 0) | 1.68 (1.62‐1.74) | 1.34 (1.29‐1.39) | 1.30 (1.24‐1.36) | 1.10 (1.07‐1.14) | 1.17 (1.11‐1.23) |
Ischemic stroke | (1 vs 0) | 10.43 (9.92‐10.97) | 1.27 (1.19‐1.36) | 1.24 (1.14‐1.35) | ||
Transient ischemic attack | (1 vs 0) | 4.22 (3.99‐4.48) | 1.15 (1.07‐1.24) | |||
Thrombo‐embolic events | (1 vs 0) | 5.45 (4.82‐6.16) | 1.17 (1.00‐1.36) | |||
Myocardial infarction | (1 vs 0) | 1.26 (1.19‐1.35) | 11.99 (11.42‐12.59) | |||
Peripheral artery disease | (1 vs 0) | 1.21 (1.15‐1.27) | 1.49 (1.42‐1.57) | 1.16 (1.11‐1.21) | 1.14 (1.07‐1.22) | |
Valvular disease | (1 vs 0) | 1.50 (1.44‐1.57) | 1.14 (1.09‐1.19) | 1.13 (1.09‐1.18) | ||
Coronary artery disease | (1 vs 0) | 1.65 (1.59‐1.72) | 1.16 (1.11‐1.20) | 4.26 (4.07‐4.45) | 1.08 (1.04‐1.12) | |
Chronic sleep apnea | (1 vs 0) | 1.35 (1.25‐1.47) | 1.10 (1.01‐1.21) | 1.17 (1.10‐1.24) | 1.28 (1.13‐1.44) | |
Chronic kidney disease | (1 vs 0) | 1.55 (1.48‐1.62) | 1.09 (1.04‐1.50) | 1.13 (1.08‐1.17) | ||
Chronic obstructive pulmonary disease/bronchiectasis | (1 vs 0) | 1.66 (1.60‐1.72) | 1.20 (1.15‐1.25) | 1.25 (1.20‐1.31) | 1.23 (1.19‐1.27) | 1.07 (1.01‐1.12) |
Major bleeding | (1 vs 0) | 1.25 (1.18‐1.31) | 4.22 (4.10‐4.33) | 1.19 (1.11‐1.27) | ||
Cognitive impairment | (1 vs 0) | 1.14 (1.06‐1.23) | 17.38 (16.48‐18.33) | |||
Liver disease | (1 vs 0) | 1.11 (1.06‐1.17) | 1.09 (1.04‐1.15) | 1.35 (1.31‐1.39) | 1.13 (1.06‐1.21) | |
Anemia | (1 vs 0) | 1.19 (1.14‐1.23) | 1.10 (1.06‐1.15) | 1.35 (1.31‐1.39) | 1.18 (1.12‐1.23) | |
Lipid disorders | (1 vs 0) | 1.22 (1.18‐1.26) | 1.24 (1.18‐1.30) | 1.16 (1.13‐1.19) | 1.10 (1.05‐1.16) | |
Depression | (1 vs 0) | 1.23 (1.17‐1.28) | 1.31 (1.25‐1.37) | 1.18 (1.12‐1.25) | 1.25 (1.22‐1.29) | 2.14 (2.03‐2.25) |
Spondylosis/intervertebral discs | (1 vs 0) | 1.10 (1.07‐1.14) | 1.21 (1.18‐1.25) | 1.10 (1.06‐1.14) | 1.33 (1.30‐1.36) | 1.26 (1.21‐1.31) |
Osteoarthritis | (1 vs 0) | 1.14 (1.10‐1.18) | 1.08 (1.05‐1.12) | 1.06 (1.01‐1.10) | 1.15 (1.12‐1.18) | 1.11 (1.06‐1.16) |
Gender | Female vs male | 0.91 (0.88‐0.94) | 0.96 (0.93‐0.99) | 0.69 (0.66‐0.71) | 1.04 (1.00‐1.09) | |
Age group | > 75 years vs 18‐44 years | 7.45 (6.97‐7.97) | 8.67 (8.13‐9.25) | 5.01 (4.63‐5.42) | 1.86 (1.79‐1.93) | 40.80 (36.85‐45.16) |
Age group | 65‐74 years vs 18‐44 years | 5.08 (4.76‐5.43) | 6.32 (5.94‐6.72) | 4.42 (4.10‐4.77) | 1.57 (1.52‐1.63) | 17.84 (16.11‐19.75) |
Age group | 55‐64 years vs 18‐44 years | 3.23 (3.03‐3.45) | 3.39 (3.18‐3.60) | 3.46 (3.21‐3.73) | 1.26 (1.23‐1.30) | 5.10 (4.58‐5.68) |
Age group | 45‐54 years vs 18‐44 years | 2.14 (2.00‐2.29) | 2.31 (2.17‐2.45) | 2.69 (2.51‐2.89) | 1.17 (1.13‐1.20) | 2.22 (1.98‐2.48) |
C‐index | | 0.917 | 0.860 | 0.890 | 0.696 | 0.909 |
1‐Presence of condition or female for gender.
0‐Absence of condition or male for gender.
Results shown for main effect model.
3.2. Age‐stratified analysis
Because there were significant differences between the AF and non‐AF cohorts in terms of mean age, age‐stratified sampling was performed for the non‐AF cohort to mirror the age distribution of the AF cohort. Relative to the size of the age ≥75 year group (group 4) in the AF cohort, the sizes of groups 0 (ie, 18‐44 years), 1 (ie, 45‐54 years), 2 (ie, 55‐64 years), and 3 (ie, 65‐74 years) were 18%, 36%, 42%, and 77%, respectively. Random samples were obtained from each of those age group strata for the non‐AF cohort with the same proportions. The following sizes of random samples were obtained for these groups using the SAS software: n = 11 780 (group 0), n = 23 826 (group 1), n = 27 401 (group 2), n = 50 578 (group 3) relative to the n = 65 500 for group 4 in the non‐AF cohort. The average age (67.1 years, SD 14.7) for the age‐stratified non‐AF cohort was not significantly different from that for the AF cohort (average 68.1 years, SD 14.4).
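The arithmetic behind the stratum sizes can be retraced from the counts in Table 1. The sketch below is only a reconstruction of the proportions and target sample sizes; the variable names are hypothetical, and the actual random draws were made in SAS and are not reproduced here.

```python
# AF cohort counts per age group (Table 1) and the non-AF reference stratum size.
af_counts = {"18-44": 926, "45-54": 1873, "55-64": 2154, "65-74": 3976, ">=75": 5149}
reference_group = ">=75"
non_af_reference_size = 65500  # non-AF patients in the reference age group (Table 1)

# Size of each AF age group relative to the reference group.
ratios = {g: n / af_counts[reference_group] for g, n in af_counts.items()}

# Apply the same ratios to the non-AF reference stratum to get target sample sizes.
targets = {g: round(r * non_af_reference_size) for g, r in ratios.items()}

for g in af_counts:
    print(f"{g}: {100 * ratios[g]:.0f}% of reference, target n = {targets[g]}")
```

Each stratum of the non‐AF cohort would then be sampled at random without replacement down to its target size, leaving the reference stratum intact.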
The results for the main effect model were similarly obtained when the age‐stratified non‐AF cohort was added to the AF cohort (see Table S6). Thus, the remainder of the analyses are made with reference to the original population.
3.3. Machine learning algorithms
As shown in Figure 3, the AF cohort is associated with significant multimorbidity and polypharmacy, and machine learning was used for the dynamic modeling of the presence of AF and its interactions with multimorbidity and polypharmacy, and interactions of different components of prior clinical history, uncovering non‐linear relationships.
Both neural network and logistic regression algorithms yielded comparable results. Therefore, the results of logistic regression algorithms were used herein due to their explicit mathematical formulations (see Table S7 for model outputs and its glossary in Tables S8 through S10). For the training samples used in the development of prediction models, the following C‐indexes were obtained for the five incident clinical outcomes: heart failure (0.924, 95% CI 0.923‐0.925), stroke (0.871, 95% CI 0.869‐0.873), myocardial infarction (0.901, 95% CI 0.899‐0.903), major bleeding (0.700, 95% CI 0.697‐0.703), and cognitive impairment (0.919, 95% CI 0.917‐0.921). In the external validation of developed models, similar c‐indexes were evident (heart failure: 0.925, 95% CI 0.923‐0.927; stroke: 0.866, 95% CI 0.863‐0.869; myocardial infarction: 0.897, 95% CI 0.894‐0.900; major bleeding: 0.702, 95% CI 0.698‐0.706; cognitive impairment: 0.917, 95% CI 0.915‐0.919). The external calibration of all models demonstrated a good fit between the predicted probabilities and observed events (Figure 4).
Decision curve analysis for the validation samples demonstrated good results for the five clinical outcomes in terms of the predictive models in comparison to the “treat all” option (Figure 5). There was a slow decline in net benefit per 100 patients for a given prediction model with an increase in the probability threshold, in comparison to the steep decline for the “treat all” option, which classifies all patients as positive (ie, all true and false positives). Overall, all developed models showed better utility in terms of true positives adjusted for false positives in comparison to the “treat all” approach. At the 5% probability threshold, for example, the net benefit values for the machine learning prediction models were 1.6 true positives/100 patients for heart failure, 1.1 true positives/100 patients for stroke, 0.9 true positives/100 patients for myocardial infarction, 2.2 true positives/100 patients for major bleeding, and 0.6 true positives/100 patients for cognitive impairment. In comparison to the “treat all” option, the net benefits in net true positives/100 patients (after adjusting for the false positives) were 4.5 for congestive heart failure, 4.3 for stroke, 4.6 for myocardial infarction, 2.9 for major bleeding, and 4.7 for cognitive impairment.
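For readers unfamiliar with decision curve analysis, the net benefit at a given probability threshold can be sketched as below, using the standard definition in which false positives are weighted by the odds of the threshold. The function names and the toy data are hypothetical and do not reproduce the study's models or results.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every patient is classified as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy outcomes and predicted probabilities for illustration only.
y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
p = [0.90, 0.20, 0.01, 0.01, 0.60, 0.01, 0.01, 0.01, 0.01, 0.01]
nb_model = net_benefit(y, p, threshold=0.05)
nb_all = net_benefit_treat_all(y, threshold=0.05)
print(f"model: {100 * nb_model:.1f}, treat all: {100 * nb_all:.1f} net true positives/100 patients")
```

Evaluating both functions over a grid of thresholds yields the two curves that a decision curve plot compares.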
The non‐linear relationships were derived mainly from the interaction terms. The only polynomial effect was due to the quadratic effect (eg, X²) of age as a continuous variable, highlighting the crucial dynamic effects of the aging process (with the exception of cognitive impairment). In addition to the non‐linear effects, linear effects were mostly present for each clinical outcome due to the independent effects of age group and prior history of clinical outcome.
4. DISCUSSION
In this study, we uncovered complex inter‐relationships of various comorbidities using a machine learning approach for AF cases. First, as expected, congestive heart failure recorded the highest relative risk between the AF and non‐AF cohorts, followed by myocardial infarction, stroke, and major bleeding. There was no statistical difference between the main effects of AF and non‐AF cohorts in terms of cognitive impairment, once accounting for comorbidities. Second, machine learning models had high c‐indexes (0.702‐0.925) with good external validation and calibration. Third, decision curve analysis showed good results and positive net benefits for the five clinical outcomes in terms of using the machine‐learning predictive models in comparison to the “treat all” option.
Our results clearly illustrate the power of machine learning in improving the prediction of adverse outcomes in multimorbid AF patients. Prior studies have tended to investigate the impact of a risk factor in isolation, often assessing it at baseline, and not accounting for aging and incident comorbidities and polypharmacy. Machine learning facilitates the integration of multiple risk factors that tend to cluster (reflecting the real‐world scenario), and the dynamic nature of risk changes (which may be non‐linear). From a modeling standpoint, the presence of AF interacted with other conditions in a dynamic fashion. This approach is a significant departure from most of the work in the published literature and may allow us to achieve higher discriminatory results in addition to the law of large numbers. In addition, the interactions of significant multi‐morbidity and polypharmacy were modeled including other variables of demographic origin.
Hence, our analyses of machine‐learning predictive models yielded high C‐index values for both training and validation samples and comparable for the logistic regression and neural network algorithms, due to a number of reasons including the dynamic modeling of the presence of AF and its interactions with multimorbidity and polypharmacy, interactions of different components of prior clinical history, exploiting a number of strategies for uncovering non‐linear relationships, and the very large cohort. Also, the non‐linear relationships observed in the machine learning logistic regression algorithms were derived mainly from the interactions of model features or inputs. The first type of interaction was elicited due to the simultaneous presence of AF and prior clinical history or aging. The second type of interaction is due to the simultaneous presence of other conditions, medication, and/or aging.
Currently, conventional AF care management has emphasized stroke prevention and anticoagulation treatment, albeit this population suffers from other major adverse outcomes such as heart failure, major bleeding, myocardial infarction, and cognitive impairment. In alignment with recent research, 13 our data clearly indicate that caring for AF should take into account other management components to lessen the adverse effects of additional complications other than stroke.
The AF cohort in this study has a diversified and worse co‐morbid history than prior studies, which may have influenced the (high) adverse clinical outcomes in the present study. For example, the incidence rates for stroke and major bleeding in the recent GARFIELD AF registry 3 were only 1.25 (95% CI 1.13‐1.38) and 0.7 (95% CI 0.62‐0.8) cases/100 person‐years, respectively. In contrast, our findings in this large cohort using novel machine learning dynamic assessments of multimorbidity and aging show that heart failure had the highest odds ratios (6.02, 95% CI 5.72‐6.33) from among the adverse clinical outcomes in an incident AF cohort relative to a non‐AF cohort. A major finding of this study is that the relationship between any of the five clinical outcomes and input variables is mostly non‐linear by depending less on main effects and relying more on interaction terms among the prior history of co‐morbid conditions, medications, age, and AF burden status.
The presence of AF (vs non‐AF) is demonstrated clearly in terms of its multiple interactions with baseline characteristics including existing conditions, medications, and age, with consistency for heart failure, stroke, myocardial infarction, major bleeding, and cognitive impairment. Our observations also show the clustering of common comorbidities, as previously highlighted using cluster analysis. This is a large prospective incident AF cohort/non‐AF cohort study simulating real‐world conditions, in which patients differ in terms of co‐morbid conditions, medication history, and demographic characteristics in the baseline period upon entry into the study, and these differences feed into the incidence of AF cases. Consequently, these differences are accounted for in statistical modeling as model features (ie, inputs) together with the dynamic changes resulting from the interaction of AF diagnosis and these differences. Such observations suggest that, as a medically progressive condition, AF depends initially on significantly diverse and heterogeneous conditions, medications, and age to induce the condition, and then dynamically interacts with its environment in terms of the existing conditions and consumed medications, resulting in the observed clinical outcomes in question. Additionally, the non‐linear relationships include multiple interactions among the existing conditions, medications, and age. This is why one should also manage the relevant comorbidities in addition to the AF symptoms, in order to optimize the management of incident AF cases.
In a systematic review and meta‐analysis, Odutayo et al 1 reported that congestive heart failure had the highest relative risk (4.99, 95% CI 3.04‐8.22) compared with other cardiovascular and non‐cardiovascular events. In a cohort with existing chronic kidney disease, Bansal et al 2 also found that the AF cohort relative to the non‐AF cohort had the highest hazard ratio (5.17, 95% CI 3.89‐6.87) for congestive heart failure in comparison to other adverse clinical outcomes. Additionally, Bansal et al 2 indicated that myocardial infarction (3.64, 95% CI 2.5‐5.31) had the second highest hazard ratio in their study, followed by stroke (2.66, 95% CI 1.5‐4.74). In the present study, myocardial infarction was likewise the second highest (OR 2.68), followed by stroke (OR 2.19) and major bleeding (OR 1.36). These examples suggest that exposure to AF should be modeled in terms of non‐linear associations to uncover the dynamic complex system underlying it and its implications for integrated care management. An optimized integrated or holistic approach to AF care is required to manage the adverse outcomes in AF, given the heterogeneous prior multi‐morbid history (ie, conditions and medications). Indeed, Gallagher et al 14 show that the integrated care approach has the potential to reduce cardiovascular hospitalizations and all‐cause mortality.
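The heart failure estimates quoted above can be sanity‐checked with a little arithmetic. The sketch below assumes each interval is a Wald‐type 95% CI that is symmetric on the log scale, so the point estimate should sit near the geometric mean of the limits and the interval width implies a log‐scale standard error. The function names are illustrative. Odds ratios, relative risks, and hazard ratios are different measures, so this compares precision and internal consistency only, not effect sizes.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def implied_log_se(lower: float, upper: float) -> float:
    """Log-scale standard error implied by a 95% CI (assumes a Wald-type interval)."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def geometric_midpoint(lower: float, upper: float) -> float:
    """Centre of the interval on the log scale, mapped back to the ratio scale."""
    return math.sqrt(lower * upper)


# (label, reported point estimate, lower limit, upper limit), as quoted in the text
HEART_FAILURE_ESTIMATES = [
    ("Present study (OR)", 6.02, 5.72, 6.33),
    ("Odutayo et al (RR)", 4.99, 3.04, 8.22),
    ("Bansal et al (HR)", 5.17, 3.89, 6.87),
]

for label, point, lower, upper in HEART_FAILURE_ESTIMATES:
    mid = geometric_midpoint(lower, upper)
    se = implied_log_se(lower, upper)
    print(f"{label}: reported {point}, log-scale midpoint {mid:.2f}, implied SE {se:.3f}")
```

Under these assumptions, the much narrower interval in the present study reflects the size of the cohort rather than a larger effect.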
4.1. Practical implications
The potential opportunities of managing multimorbidity in an integrated, holistic, and dynamic approach are illustrated by our mobile health (mHealth) Atrial Fibrillation App (mAFA) program, which investigated mHealth technology for improved screening and optimized integrated care in patients with AF, facilitating early diagnosis, dynamic (re)assessments of risk profiles, and holistic AF management. Incorporating a dynamic machine learning model into our mHealth technology would allow “real time” assessment of the risks of adverse outcomes and thereby help to mitigate modifiable risk factors. In this study, we addressed five common complications in AF patients and developed five non‐linear algorithms, one for each clinical outcome. As such, a risk level (eg, low, medium, or high depending on the model score) can be assigned to each of the five clinical outcomes for a particular AF patient. This methodology (see Table S5 for the outputs of the non‐linear algorithms and Tables S6 through S8 for the glossary) could be implemented in an application (as above) to support integrated care among primary care physicians and the respective specialists, with patient engagement. The current design is a prospective incident design for the AF cohort, with the non‐AF cohort maintained in a similar structure. Together, the two cohorts cover the entire population, simulating a “real world” study that allows for the natural progression of the AF disease process while accounting statistically, through the model features, for the differences between the groups.
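A minimal sketch of the risk‐level assignment described above is given below. It assumes each of the five models returns a predicted probability between 0 and 1. The cut‐points, outcome keys, and function names are hypothetical and are not taken from this study or its supplementary tables; any real deployment would need clinically validated thresholds for each outcome.

```python
from typing import Dict

# The five outcomes modeled in the study (key names are illustrative).
OUTCOMES = (
    "heart_failure",
    "stroke",
    "myocardial_infarction",
    "major_bleeding",
    "cognitive_impairment",
)

# Hypothetical cut-points on the predicted probability; NOT from the paper.
LOW_UPPER = 0.05     # scores below this are labeled "low"
MEDIUM_UPPER = 0.20  # scores below this (and >= LOW_UPPER) are labeled "medium"


def risk_level(score: float) -> str:
    """Map one model score (a probability) to a low/medium/high label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be a probability in [0, 1], got {score}")
    if score < LOW_UPPER:
        return "low"
    if score < MEDIUM_UPPER:
        return "medium"
    return "high"


def risk_profile(scores: Dict[str, float]) -> Dict[str, str]:
    """Assign a risk level to each of the five outcomes for one patient."""
    missing = [name for name in OUTCOMES if name not in scores]
    if missing:
        raise KeyError(f"missing model scores for: {missing}")
    return {name: risk_level(scores[name]) for name in OUTCOMES}
```

Calling `risk_profile` with one score per outcome returns a five‐entry dictionary of labels, which is the kind of per‐patient summary that could be surfaced to clinicians in an application.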
4.2. Limitations
This study is observational in nature, and although every attempt was made to reduce potential biases, this limitation should be kept in mind. In addition, although the dynamic nature of the risk profile was modeled using the AF group status and its interactions with the clinical history, an explicit representation of the time characteristics of the complex system examined in this study is not possible at this time due to technological and methodological limitations.
5. CONCLUSION
Complex inter‐relationships of various comorbidities uncovered using a machine learning approach for incident AF cases have major consequences for integrated care management, with implications for adverse clinical events. Machine learning may potentially facilitate automated approaches for dynamic risk stratification, where multimorbidity is present.
DISCLOSURES
The authors report no conflicts of interest in this work.
Supporting information
Lip GYH, Tran G, Genaidy A, Marroquin P, Estes C. Revisiting the dynamic risk profile of cardiovascular/non‐cardiovascular multimorbidity in incident atrial fibrillation patients and five cardiovascular/non‐cardiovascular outcomes: A machine‐learning approach. J Arrhythmia. 2021;37:931–941. 10.1002/joa3.12555
DATA AVAILABILITY STATEMENT
Data are available as presented in the paper. According to US laws and corporate agreements, our own approvals to use the Anthem and IngenioRx data sources for the current study do not allow us to distribute or make patient data directly available to other parties.
REFERENCES
- 1. Odutayo A, Wong CX, Hsiao AJ, Hopewell S, Altman DG, Emdin CA. Atrial fibrillation and risks of cardiovascular disease, renal disease, and death: systematic review and meta‐analysis. BMJ. 2016;354:i4482. 10.1136/bmj.i4482
- 2. Bansal N, Xie D, Sha D, Appel LJ, Deo R, Feldman HI, et al. Cardiovascular events after new‐onset atrial fibrillation in adults with CKD: results from the chronic renal insufficiency cohort (CRIC) study. J Am Soc Nephrol. 2018;29(12):2859–69. 10.1681/ASN.2018050514
- 3. Bassand JP, Accetta G, Camm AJ, Cools F, Fitzmaurice DA, Fox KA, et al. Two‐year outcomes of patients with newly diagnosed atrial fibrillation: results from GARFIELD‐AF. Eur Heart J. 2016;37(38):2882–9.
- 4. Gage BF, Waterman AD, Shannon W, Boechler M, Rich MW, Radford MJ. Validation of clinical classification schemes for predicting stroke: results from the National Registry of Atrial Fibrillation. JAMA. 2001;285(22):2864–70.
- 5. Lip GY, Nieuwlaat R, Pisters R, Lane DA, Crijns HJ. Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor‐based approach: The euro heart survey on atrial fibrillation. Chest. 2010;137(2):263–72.
- 6. Chao TF, Lip GYH, Liu CJ, Lin YJ, Chang SL, Lo LW, et al. Relationship of aging and incident comorbidities to stroke risk in patients with atrial fibrillation. J Am Coll Cardiol. 2018;71(2):122–32.
- 7. Chao T‐F, Lip G, Lin Y‐J, Chang S‐L, Lo L‐W, Hu Y‐F, et al. Incident risk factors and major bleeding in patients with atrial fibrillation treated with oral anticoagulants: A comparison of baseline, follow‐up and delta HAS‐BLED scores with an approach focused on modifiable bleeding risk factors. Thromb Haemost. 2018;118(4):768–77.
- 8. Inohara T, Shrader P, Pieper K, Blanco RG, Thomas L, Singer DE, et al. Association of atrial fibrillation clinical phenotypes with treatment patterns and outcomes: A multicenter registry study. JAMA Cardiol. 2018;3(1):54–63. 10.1001/jamacardio.2017.4665
- 9. Inohara T, Piccini JP, Mahaffey KW, Kimura T, Katsumata Y, Tanimoto K, et al. A cluster analysis of the Japanese multicenter outpatient registry of patients with atrial fibrillation. Am J Cardiol. 2019;124:871–8. 10.1016/j.amjcard.2019.05.071
- 10. Tu K, Nieuwlaat R, Cheng SY, Wing L, Ivers N, Atzema CL, et al. Identifying patients with atrial fibrillation in administrative data. Can J Cardiol. 2016;32:1561–5. 10.1016/j.cjca.2016.06.006
- 11. Piccini JP, Hammill BG, Sinner MF, Jensen PN, Hernandez AF, Heckbert SR, et al. Incidence and prevalence of atrial fibrillation and associated mortality among Medicare beneficiaries: 1993–2007. Circ Cardiovasc Qual Outcomes. 2012;5:85–93. 10.1161/CIRCOUTCOMES.111.962688
- 12. Statistical Analysis Software. Base SAS 9.4 Procedures Guide: Statistical Procedures. 2nd Edition. Cary, NC, USA: SAS; 2013.
- 13. Bassand JP, Accetta G, Al Mahmeed W, Corbalan R, Eikelboom J, Fitzmaurice DA, et al. Risk factors for death, stroke, and bleeding in 28,628 patients from the GARFIELD‐AF registry: Rationale for comprehensive management of atrial fibrillation. PLoS One. 2018;13(1):e0191592. 10.1371/journal.pone.0191592
- 14. Gallagher C, Elliott AD, Wong CX, Geetanjali Rangnekar G, Middeldorp ME, Mahajan R, et al. Integrated care in atrial fibrillation: A systematic review and meta‐analysis. Heart. 2017;103:1947–53. 10.1136/heartjnl-2016-310952