Abstract
OBJECTIVE
Sodium–glucose cotransporter 2 (SGLT2) inhibitors have well-documented cardioprotective effects but are underused, partly because of high cost. We aimed to develop a machine learning–based decision support tool to individualize the atherosclerotic cardiovascular disease (ASCVD) benefit of canagliflozin in type 2 diabetes.
RESEARCH DESIGN AND METHODS
We constructed a topological representation of the Canagliflozin Cardiovascular Assessment Study (CANVAS) using 75 baseline variables collected from 4,327 patients with type 2 diabetes randomly assigned 1:1:1 to one of two canagliflozin doses (n = 2,886) or placebo (n = 1,441). Within each patient’s 5% neighborhood, we calculated age- and sex-adjusted risk estimates for major adverse cardiovascular events (MACEs). An extreme gradient boosting algorithm was trained to predict the personalized ASCVD effect of canagliflozin using features most predictive of topological benefit. For validation, this algorithm was applied to the CANVAS-Renal (CANVAS-R) trial, comprising 5,808 patients with type 2 diabetes randomly assigned 1:1 to canagliflozin or placebo.
RESULTS
In CANVAS (mean age 60.9 ± 8.1 years; 33.9% women), 1,605 (37.1%) patients had a neighborhood hazard ratio (HR) more protective than the effect estimate of 0.86 reported for MACEs in the original trial. A 15-variable tool, INSIGHT, trained to predict the personalized ASCVD effects of canagliflozin in CANVAS, was tested in CANVAS-R (mean age 62.4 ± 8.4 years; 2,164 [37.3%] women), where it identified patient phenotypes with greater ASCVD canagliflozin effects (adjusted HR 0.60 [95% CI 0.41–0.89] vs. 0.99 [95% CI 0.76–1.29]; Pinteraction = 0.04).
CONCLUSIONS
We present an evidence-based, machine learning–guided algorithm to personalize the prescription of SGLT2 inhibitors for patients with type 2 diabetes for ASCVD effects.
Introduction
Sodium–glucose cotransporter 2 (SGLT2) inhibitors are a class of antihyperglycemic agents with well-documented cardioprotective and renoprotective effects (1–11). Across several trials in patients with type 2 diabetes, SGLT2 inhibitors have consistently reduced the incidence of adverse atherosclerotic cardiovascular (CV) disease (ASCVD) outcomes (2,7–9), including death resulting from CV causes, the composite of myocardial infarction/stroke/death resulting from CV causes, and hospitalization for heart failure, as reflected in product labeling for the individual agents in the class. However, the uptake of these medications has remained suboptimal (12). It has been estimated that more than half of all patients with type 2 diabetes are eligible for treatment with SGLT2 inhibitors (13–15); however, <10% of eligible patients in the U.S. receive a prescription for an SGLT2 inhibitor (16–18), with the cost of these medications representing a major barrier to their wider use. A tool that identifies individuals most likely to benefit from SGLT2 inhibitor use is likely to increase the societal benefit from SGLT2 inhibitors in the setting of limited resources.
To date, >45,000 unique patients with type 2 diabetes have been evaluated in clinical trials, the results from which have demonstrated cardioprotective effects of SGLT2 inhibitors for a variety of CV clinical outcomes (19). While there is modest to high heterogeneity in effect estimates across trials for death resulting from CV causes and ASCVD composite outcomes, subgroup assessments based on broad clinical groups have not identified which individual patients are most likely to benefit from treatment (19). Moreover, a focus on broad subgroups cannot define the effect for a given individual, which is essential for precision care.
We recently developed and validated an algorithm that enables phenotyping of a clinical trial population based on sets of prerandomization variables to predict therapeutic response (20). Our machine learning–based method projects each patient to a topological space, an n-dimensional representation of all baseline phenotypes, and defines for each patient a unique neighborhood to calculate personalized treatment effect estimates.
Here, we hypothesize that patients with type 2 diabetes who were enrolled in the Canagliflozin Cardiovascular Assessment Study (CANVAS) testing the CV safety and efficacy of the SGLT2 inhibitor canagliflozin exhibit differential effects based on their complex phenotypical profile at baseline (2). Using an evidence-based, machine learning–guided approach, we developed a decision support tool that can be used to individualize the predicted effect of canagliflozin therapy on ASCVD outcomes and validated its predictive value in an independent cohort of patients enrolled in the CANVAS-Renal (CANVAS-R) trial (2).
RESEARCH DESIGN AND METHODS
Data Source
We obtained participant-level data of the CANVAS and CANVAS-R trials through the Yale University Open Data Access Project (project 2020-4553), which has an agreement with Janssen Research & Development, LLC. Briefly, CANVAS (Clinical trial reg. no. NCT01032629, ClinicalTrials.gov) was an international trial that recruited patients with type 2 diabetes (HbA1c 7.0–10.5% [53–91 mmol/mol]) who were randomly assigned in a 1:1:1 ratio to receive canagliflozin at a dose of 100 mg, 300 mg, or matching placebo (2). CANVAS-R (Clinical trial reg. no. NCT01989754, ClinicalTrials.gov) was a second randomized CV outcomes trial that was designed to meet a postapproval CV safety regulatory commitment (2). CANVAS-R randomly assigned participants at 1:1 ratio to receive canagliflozin (100 mg daily with an optional increase to 300 mg starting from week 13) or matching placebo.
Participants in both trials were men and women who were either ≥30 years of age with established ASCVD or ≥50 years of age with two or more ASCVD risk factors, including duration of diabetes of at least 10 years, systolic blood pressure >140 mm Hg while receiving one or more antihypertensive agents, current smoker status, microalbuminuria or macroalbuminuria, or HDL cholesterol level of <38.7 mg/dL (1 mmol/L). Detailed inclusion and exclusion criteria are presented in the primary publication of the pooled results of the CANVAS program (2). The Yale Institutional Review Board approved the study and waived the requirement for informed consent for post hoc analyses of deidentified data. We confirm that the present study complied with the Declaration of Helsinki.
Study Population and Covariates
We included all participants who received at least one dose of investigational drug (4,327 in CANVAS and 5,808 in CANVAS-R). We captured patient characteristics at trial enrollment and index/baseline visit, including demographics (age, sex, race, and ethnicity), anthropometrics (height, weight, and BMI), vital signs (systolic and diastolic blood pressure and heart rate), ASCVD risk factors (hypertension, diabetes, smoking status, and family history), medical history (microvascular and macrovascular complications of diabetes), laboratory measurements (complete blood count and differential, comprehensive metabolic panel, glycated hemoglobin, lipid panel, creatine kinase, urinalysis, and urine microalbumin and creatinine), medications (antihyperglycemic and other CV or LDL cholesterol–lowering medications), and electrocardiographic parameters (rate and PR, QRS, and QTc intervals) (Supplementary Table 1 provides a detailed list). Following the chronological order of the two trials, we made the a priori decision to use CANVAS as the development (discovery) cohort and CANVAS-R as an independent external validation cohort to assess the validity and generalizability of our observations.
Data Preprocessing
A total of 126 variables were reviewed (Supplementary Table 1), and those with >20% missingness (n = 35) were excluded. To avoid collinearity of variables, we performed pairwise correlations across variables, and wherever pairs exceeded an absolute correlation coefficient of 0.7, we removed the variable with the largest mean absolute correlation across other pairwise comparisons (n = 16), resulting in 75 included variables. We imputed missing data for these 75 selected variables using chained random forests with predictive mean matching (21). Following imputation, we transformed continuous variables into standardized scores (z scores) by subtracting their mean and dividing by their respective SD (Supplementary Material).
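The collinearity filter and standardization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`pearson`, `drop_collinear`, `z_scores`), the use of Pearson correlation, the iterative removal order, and the toy values are all assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_collinear(columns, threshold=0.7):
    """Drop variables until no pair has |r| above `threshold`.

    `columns` maps a variable name to its list of values. From the most
    correlated offending pair, the variable with the larger mean absolute
    correlation against all other remaining variables is removed.
    (Assumed reading of the procedure; constant columns are not handled.)
    """
    kept = dict(columns)
    while True:
        names = list(kept)
        r = {(a, b): abs(pearson(kept[a], kept[b]))
             for a in names for b in names if a != b}
        offending = [(value, pair) for pair, value in r.items() if value > threshold]
        if not offending:
            return kept

        def mean_abs(v):
            return sum(r[(v, o)] for o in names if o != v) / (len(names) - 1)

        _, (a, b) = max(offending)
        kept.pop(a if mean_abs(a) >= mean_abs(b) else b)

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample SD."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Toy data (hypothetical values): weight and BMI are nearly collinear.
toy = {
    "weight": [60, 70, 80, 90, 100],
    "bmi": [21, 24, 27.5, 30, 34],
    "age": [70, 50, 65, 55, 60],
}
kept = drop_collinear(toy)
```

In this toy example one member of the weight/BMI pair is removed and age is retained; in the study, the same idea reduced the candidate set to 75 variables before imputation and standardization.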
Study Outcomes
Consistent with the definitions of the trials (2), our primary outcome was time to first event of a composite of major adverse CV events (MACEs), including death resulting from CV causes, nonfatal myocardial infarction, and nonfatal stroke.
Defining Phenotypical Neighborhoods
We assessed the phenotypical similarity of all individuals based on their prerandomization characteristics according to the Gower method, which computes a metric of dissimilarity between two data points (patient characteristics) that takes into account both numerical (e.g., age) and nonnumerical data (e.g., history of hypertension). In simple terms, lower Gower distances suggest that two patients are more alike (22). Similar to our prior work (20), for each individual in CANVAS, we defined a neighborhood consisting of the 5% phenotypically most similar participants, defined as those with the shortest Gower distance from the index patient. In additional sensitivity analyses, we repeated our analysis for neighborhoods of variable sizes (3%, 5%, 7.5%, 10%, 15%, 20%, 25%, and 50%), comparing risk estimates with our default neighborhood size of 5%.
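To make the neighborhood definition concrete, the sketch below computes a Gower dissimilarity for mixed numerical and categorical data and selects the closest fraction of patients around an index patient. It is illustrative only: the function names, the dictionary-based patient records, equal weighting of variables, and the choice to count the index patient as part of its own neighborhood are assumptions, not details confirmed by the text.

```python
def gower_distance(a, b, ranges):
    """Gower dissimilarity between two patients given as dicts.

    Numeric variables contribute |a - b| / range; categorical variables
    contribute 0 when equal and 1 otherwise. `ranges` maps each numeric
    variable to its observed range (max - min) in the cohort.
    """
    total = 0.0
    for key in a:
        if key in ranges:
            total += abs(a[key] - b[key]) / ranges[key] if ranges[key] else 0.0
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)

def neighborhood(index, patients, ranges, fraction=0.05):
    """Indices of the `fraction` of patients closest to patient `index`.

    The index patient has distance 0 to itself and is therefore included
    (an assumption of this sketch).
    """
    size = max(1, round(fraction * len(patients)))
    dists = sorted(
        (gower_distance(patients[index], p, ranges), i)
        for i, p in enumerate(patients)
    )
    return [i for _, i in dists[:size]]

# Hypothetical four-patient cohort with one numeric and one categorical variable.
patients = [
    {"age": 60, "hypertension": "yes"},
    {"age": 62, "hypertension": "yes"},
    {"age": 80, "hypertension": "no"},
    {"age": 40, "hypertension": "no"},
]
ranges = {"age": 40}  # max - min of age in this toy cohort
closest = neighborhood(0, patients, ranges, fraction=0.5)
```

With 4,327 participants, a 5% neighborhood under this convention would contain roughly 216 patients.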
Individualized Risk Phenomapping
Within each patient-centered neighborhood, we measured the association of treatment with canagliflozin versus placebo with the outcome of interest using age- and sex-adjusted Cox regression models, thus providing individualized efficacy estimates based on each patient’s unique neighborhood. To favor a normal distribution of risk estimates, we extracted the natural logarithmic transformations of the hazard ratios (log HRs) from the Cox models comparing canagliflozin treatment with placebo within each patient’s neighborhood (with negative values favoring treatment with canagliflozin as more protective against the primary outcome).
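In symbols, the model fitted within the neighborhood of patient j can be written in the standard Cox proportional hazards form. The notation is ours rather than the original text's: N_j is the neighborhood of patient j, T_i equals 1 for canagliflozin and 0 for placebo, and h_0(t) is the baseline hazard.

```latex
h_i(t) = h_{0}(t)\,\exp\!\left(\beta_{T}\,T_i + \beta_{\text{age}}\,\text{age}_i + \beta_{\text{sex}}\,\text{sex}_i\right), \quad i \in N_j,
\qquad
\log \mathrm{HR}_j = \hat{\beta}_{T}, \qquad \mathrm{HR}_j = e^{\hat{\beta}_{T}}
```

A negative log HR_j therefore corresponds to HR_j < 1, that is, a lower hazard of the primary outcome with canagliflozin than with placebo among the patients most similar to patient j.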
To visualize the phenotypical variation in the CANVAS population and neighborhoods, we used uniform manifold approximation and projection, a data visualization method that constructs a two-dimensional representation of the high-dimensional feature space (23). In such an embedding, phenotypically similar patients tend to also be topologically closer. We used color maps to visualize the spatial distribution of the patient baseline characteristics and neighborhood log HR estimates in the phenomap.
Creating and Validating an Algorithm to Identify High Responders to Canagliflozin Treatment
To develop a decision support tool that can best describe the observed heterogeneity in the treatment effects of canagliflozin across the CANVAS trial, we trained an extreme gradient boosting algorithm (24) to predict the personalized (neighborhood) log HRs of MACEs with canagliflozin treatment (vs. placebo) using a subset of 46 variables (Supplementary Table 1), selected based on their availability across both CANVAS and CANVAS-R.
Specifically, we constructed a regression task training the extreme gradient boosting algorithm to identify the prerandomization phenotypes most strongly linked to the neighborhood-derived log HR values (log HR values were winsorized in the −2 to +2 range to avoid the effects of outliers, as per our previous methodology (20)). Given the continuous dependent variable, model performance was evaluated using root mean square error as the loss function, where root mean square error is defined as the square root of the mean of the square of all of the prediction errors (20). Optimal hyperparameters (parameters whose values are used to control the learning process) were selected using a grid search and fivefold cross-validation.
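The two numerical steps named above, winsorizing the target and scoring predictions by root mean square error, are simple enough to show directly. The helper names and toy values below are illustrative assumptions.

```python
from math import sqrt

def winsorize(values, lower=-2.0, upper=2.0):
    """Clamp each value into [lower, upper] to limit the influence of outliers."""
    return [min(max(v, lower), upper) for v in values]

def rmse(observed, predicted):
    """Square root of the mean of the squared prediction errors."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical neighborhood log HRs, two of which fall outside the -2 to +2 range.
raw_log_hr = [-3.1, -0.4, 0.2, 2.7]
target = winsorize(raw_log_hr)  # [-2.0, -0.4, 0.2, 2.0]
```

During a grid search with fivefold cross-validation, a score such as `rmse` would be computed on each held-out fold and averaged for every candidate hyperparameter combination.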
To allow interpretation of the predictions of our model, we assessed feature importance using SHAP (Shapley Additive Explanations) values to identify the relative contribution of a predictor, either positively or negatively, to the final prediction (25). SHAP values are derived from coalitional game theory and, in simple terms, attempt to fairly distribute the payout (prediction) among all players (predictors). SHAP generalizes this to predictive modeling and quantifies the contribution that each feature brings to the prediction made by the model.
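The game-theoretic idea can be illustrated with an exact Shapley computation on a toy model. This is a teaching sketch, not the SHAP library or the study's model: the two-feature `toy_model`, its coefficients, the generic feature names, and the convention of replacing absent features with a baseline value are all assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction by enumerating coalitions.

    Features outside a coalition are set to their `baseline` value,
    a simplifying assumption of this sketch.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for feature in names:
        others = [k for k in names if k != feature]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {feature}) - value(s))
        phi[feature] = total
    return phi

def toy_model(x):
    """Hypothetical predicted log HR with an interaction term (arbitrary numbers)."""
    return -0.3 * x["x1"] + 0.2 * x["x2"] - 0.1 * x["x1"] * x["x2"]

instance = {"x1": 1, "x2": 1}
baseline = {"x1": 0, "x2": 0}
phi = shapley_values(toy_model, instance, baseline)
```

The contributions sum to the difference between the prediction for the instance and the prediction at baseline, which is the "fair distribution of the payout" property described above. Exact enumeration scales exponentially with the number of features, which is why tree-based approximations are used in practice.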
Additional details on the preprocessing, setup of the training algorithm, and hyperparameter tuning are presented in the Supplementary Material.
To improve the practical application of the model, we selected variables that were most strongly associated with the ASCVD effects of canagliflozin therapy based on an SHAP feature importance of ≥0.01, resulting in 15 features. We retrained our model using this limited set of variables, based on the same approach described above, and named the resulting model INSIGHT (Individualized Cardiovascular Risk Reduction With SGLT2 Inhibitors). In INSIGHT, a more negative log HR (corresponding to HR of <1) predicted a greater ASCVD benefit with the use of canagliflozin compared with placebo.
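A threshold-based selection of this kind might look like the sketch below. It assumes that feature importance is the mean absolute SHAP value across patients, a common convention that the text does not spell out; the function names, feature names, and numbers are hypothetical.

```python
def mean_abs_shap(shap_rows, names):
    """Mean absolute SHAP value per feature across patients (rows)."""
    n = len(shap_rows)
    return {name: sum(abs(row[j]) for row in shap_rows) / n
            for j, name in enumerate(names)}

def select_features(shap_rows, names, threshold=0.01):
    """Features with importance >= threshold, in descending order of importance."""
    importance = mean_abs_shap(shap_rows, names)
    ranked = sorted(importance, key=importance.get, reverse=True)
    return [name for name in ranked if importance[name] >= threshold]

# Hypothetical SHAP values for two patients and three features.
names = ["x1", "x2", "x3"]
shap_rows = [
    [0.20, -0.004, 0.03],
    [-0.10, 0.002, -0.01],
]
selected = select_features(shap_rows, names)
```

The retained features would then be used to retrain the model, as described for the 15-variable INSIGHT tool.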
External Validation in CANVAS-R
We then applied the final model in the CANVAS-R trial to provide patient-level predictions of the expected CV effect of canagliflozin. We assessed the unbiased allocation of treatment by comparing the risk predictions of expected ASCVD effects between the canagliflozin and placebo arms and evaluated the agreement between the CV effects of canagliflozin among predicted high responders and the actual (observed) ASCVD effects. We defined high responders as those individuals whose predicted log HR was more than half an SD below the average predicted log HR (i.e., those with the greatest predicted benefit). We evaluated this association using a treatment × high versus low effect interaction term in Cox regression models for the time-to-ASCVD outcome. In sensitivity analyses, we evaluated individuals in the highest versus lowest tertiles of predicted response. We also repeated this process with interaction terms between treatment and traditional CV risk factors. Finally, we retrained our model after excluding variables that are not routinely measured in the management of type 2 diabetes.
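The high-responder rule can be expressed in a few lines. The sketch below is an assumed reading of that rule: it uses the sample SD of the predictions in the validation cohort, and the function name and example values are hypothetical.

```python
from statistics import mean, stdev

def high_responders(predicted_log_hr, sd_fraction=0.5):
    """Flag patients whose predicted log HR lies more than `sd_fraction`
    SDs below the mean prediction (more negative means greater predicted benefit).
    """
    cutoff = mean(predicted_log_hr) - sd_fraction * stdev(predicted_log_hr)
    return [value < cutoff for value in predicted_log_hr]

# Hypothetical predictions for five patients.
predictions = [-0.6, -0.1, 0.0, 0.1, 0.6]
flags = high_responders(predictions)
```

The resulting binary flag is what would enter the Cox model as the treatment × high versus low effect interaction term.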
Browser-Accessible Decision Support Tool
The final extreme gradient boosting algorithm (INSIGHT) was deployed as an interactive, browser-accessible R/Shiny application (version 1.6.0), with a user interface available at https://www.cards-lab.org/insight.
Statistical Analyses
Categorical variables are summarized as numbers and percentages, and continuous variables as means and SDs or medians and ranges, as appropriate. Canagliflozin- and placebo-treated groups were compared using Student t test for continuous variables, with density plots visualizing between-group differences. Spearman ρ was used to assess the correlation between continuous variables, as appropriate. Survival analyses were performed using Cox proportional hazards regression and graphically presented as unadjusted Kaplan-Meier plots with associated risk tables. While neighborhoods were matched on prerandomization covariates, we explicitly adjusted all our Cox models for age and sex.
All statistical tests were two sided, with a level of significance of 0.05 without correction for multiplicity of comparisons. Analyses were performed using R (version 4.0.2) and Python (version 3.8.5). Reporting of the study design and findings is consistent with the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines (26).
Results
Study Population
From CANVAS, we included the 4,327 study participants (mean age 60.9 ± 8.1 years; 1,467 [33.9%] women) who successfully underwent random assignment and received at least one dose of their allocated treatment, which was either canagliflozin (n = 2,886 [66.7%]) or placebo (n = 1,441 [33.3%]). Study participants received treatment for a mean of 4.3 ± 2.3 years and were followed up for a mean of 5.7 ± 1.3 years, during which a total of 626 primary (MACEs) events were reported, including 303 deaths resulting from CV causes (from 478 total), 259 myocardial infarctions, and 196 stroke events.
In CANVAS-R, 5,808 participants (mean age 62.4 ± 8.4 years; 2,164 [37.3%] women) received canagliflozin (n = 2,904 [50.0%]) versus placebo (n = 2,904 [50.0%]) for a mean active treatment period of 1.8 ± 0.5 years. They were followed for a mean period of 2.0 ± 0.4 years with 322 MACEs recorded, including 116 deaths resulting from CV causes (among 205 death events), 129 myocardial infarctions, and 130 stroke events. The demographics and baseline characteristics of the two trials are summarized in Supplementary Table 2.
Phenomapping the CANVAS Population of Type 2 Diabetes
We produced a phenomap of the CANVAS trial population by defining pairwise distances between all trial participants based on the Gower dissimilarity metric using 75 prerandomization phenotypical variables, followed by visualization as a two-dimensional representation using uniform manifold approximation and projection. Visual assessment of the risk phenomaps showed that the treatment arms, which were randomly allocated, appeared uniformly distributed in the topological space (Fig. 1A). In contrast, baseline phenotypical variables, such as sex (Fig. 1B), age (Fig. 1C), presence of coronary artery disease (CAD) (Fig. 1D), glycated hemoglobin levels (Fig. 1E), and heart failure history (Fig. 1F) were heterogeneously distributed in the topological manifold, reflecting distinct neighborhoods characterized by different combinations of these baseline phenotypes.
Figure 1.
Phenotypical architecture of the CANVAS trial. Graphs illustrate a manifold representation of all 4,327 participants included in CANVAS (each dot represents a CANVAS participant). Patients are distributed in the topological space based on dissimilarity metrics (Gower distance) derived from 75 prerandomization variables; therefore, phenotypically similar individuals tend to be topologically closer. A: Visualization of treatment arm allocation demonstrates a random distribution of canagliflozin and placebo in the topological space. B–F: In contrast, sex (B), age (C), CAD (D), glycated hemoglobin (E), and heart failure history (F) demonstrate nonuniform distribution, reflecting the topological representation of patients according to their baseline phenotypes. It should be noted that because the dimensionality reduction is nonlinear, axes have been omitted, and only the comparisons between distances are meaningful.
Detecting Treatment Effect Heterogeneity With Canagliflozin Treatment
Using the manifold representation shown in Fig. 1, we defined 4,327 neighborhoods, one around each trial participant, and for each neighborhood we calculated age- and sex-adjusted risk estimates of MACEs with canagliflozin therapy versus placebo. Moving from larger neighborhoods (e.g., 50, 25, 20, and 15% of all participants) (Supplementary Fig. 1A–D) to smaller neighborhoods (e.g., 10, 7.5, 5, and 3% of all participants) (Supplementary Fig. 1E–H) uncovered treatment effect heterogeneity that was nonuniformly distributed in the phenotypical space.
We focused on 5% neighborhoods that enabled the analysis of smaller, more homogenous populations surrounding individuals and correlated well with risk estimates derived from 3 (ρ = 0.72; P < 0.001), 7.5 (ρ = 0.77; P < 0.001), and 10% neighborhoods (ρ = 0.65; P < 0.001), as opposed to larger neighborhood sizes (ρ = 0.12; P < 0.001 for a 50% neighborhood size). There was a wide distribution in neighborhood-specific risk estimates, with a median log HR of −0.01 and a 25th–75th percentile range of −0.28 to 0.28 (Fig. 2A). In total, 1,605 (37.1%) patients had a lower neighborhood HR than the overall effect estimate of 0.86 (log HR −0.15) for canagliflozin versus placebo reported in the original CANVAS trial (Fig. 2B) (2). Of note, the topological distribution of the treatment effect heterogeneity did not match that of any of the major CV or demographic profiles shown in Fig. 1.
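The correspondence between the trial-level HR and its log HR, and the proportion of patients below that threshold, can be checked with a short calculation. The figures are those reported above; the helper name `share_below` is an assumption for illustration.

```python
from math import log

overall_hr = 0.86
overall_log_hr = log(overall_hr)  # natural logarithm, about -0.15

def share_below(neighborhood_log_hrs, threshold):
    """Fraction of patients whose neighborhood log HR is below `threshold`."""
    below = sum(1 for value in neighborhood_log_hrs if value < threshold)
    return below / len(neighborhood_log_hrs)

# Reported counts: 1,605 of 4,327 patients had a more protective neighborhood HR.
reported_share = 1605 / 4327
```

Comparisons on the log scale and on the HR scale are equivalent because the logarithm is monotonic: a neighborhood log HR below −0.15 is the same condition as a neighborhood HR below 0.86.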
Figure 2.
ASCVD benefit phenomaps of canagliflozin therapy in patients with type 2 diabetes. A: Graphical representation of the individualized neighborhood-derived hazards for major adverse cardiovascular events with canagliflozin therapy versus placebo demonstrates distinct topological and therefore phenotypical neighborhoods that benefit most from canagliflozin, which are scattered across different areas of the phenotypical manifold. B: In additional analyses, 1,605 (37.1%) patients had a lower individualized neighborhood HR than the overall HR estimate of 0.86 reported in the original CANVAS trial. Taken together, these phenomaps demonstrate that individuals deriving the greatest atherosclerotic benefit from canagliflozin use have complex phenotypical profiles that are not adequately described by a single phenotypical variable.
Individualized Treatment Effect Estimates Based on Risk Phenomaps
To uncover the baseline phenotypes that were most strongly associated with the observed heterogeneity in the ASCVD effects of canagliflozin therapy, we trained and internally validated in CANVAS an extreme gradient boosting algorithm that predicts a patient’s neighborhood hazard (log HR) based on a set of 46 baseline phenotypical variables, also collected in CANVAS-R (Supplementary Material). SHAP analysis (Fig. 3) revealed retinopathy, CAD, race, history of smoking, dyslipidemia, female sex, hypertension history, HDL cholesterol levels, BMI, circulating creatine kinase levels, kidney function based on estimated glomerular filtration rate, Hispanic/Latino ethnicity, and neuropathy as the most important predictors of the individualized ASCVD effect of canagliflozin therapy.
Figure 3.
SHAP feature importance for the individualized ASCVD effects of canagliflozin in patients with type 2 diabetes. To offer some insight into the contribution of each variable, we used an SHAP summary plot, in which the y-axis represents the independent predictors (in descending order of importance), and the x-axis indicates the change in the prediction for each patient (as represented by a dot). The gradient color denotes the original value for that variable (e.g., for categorical variables such as hypertension or diabetes, it only takes two colors, whereas for continuous variables, it contains the whole spectrum), with each point representing an individual from CANVAS. More negative SHAP values (left on the x-axis) indicate a higher ASCVD benefit with canagliflozin therapy versus placebo. The top 15 most important variables (with importance of ≥0.01) were used to create the INSIGHT tool. eGFR, estimated glomerular filtration rate.
INSIGHT: A Tool to Individualize the Predicted ASCVD Effects of Canagliflozin in Type 2 Diabetes
We then trained an extreme gradient boosting tree algorithm on the subset of the 15 most important features (shown in Fig. 3) to derive the INSIGHT tool (Supplementary Fig. 2), with an internal cross-validation root mean square error of 0.46. Application of INSIGHT in the independent external validation set of the CANVAS-R trial demonstrated that there was no significant difference in the predicted individualized effects of canagliflozin therapy between the canagliflozin and placebo arms, consistent with the random allocation of these treatments and the reliance of the decision support tool on baseline, prerandomization phenotypes (Fig. 4A). Individuals with the highest INSIGHT-predicted ASCVD benefit had a lower ASCVD risk when treated with canagliflozin compared with placebo (adjusted HR 0.60 [95% CI 0.41–0.89]), a significant difference compared with the rest of the CANVAS-R participants in whom canagliflozin therapy showed no significant ASCVD effects for the primary MACE composite outcome (adjusted HR 0.99 [95% CI 0.76–1.29]; Pinteraction = 0.04) (Fig. 4B–D).
Figure 4.
External validation of INSIGHT to individualize the prediction of ASCVD effects of canagliflozin therapy in patients with type 2 diabetes. A: Density plot of predicted log HR distributions in the canagliflozin and placebo arms of CANVAS-R, the external validation cohort. B–D: HRs (and 95% CIs) of MACEs in canagliflozin versus placebo, based on categories of predicted benefit (B), with cumulative incidence of MACEs in patients with high predicted benefit (C) and low predicted benefit (D).
Sensitivity Analyses
These findings were robust to alternative definitions of high response, including individuals in the highest (adjusted HR 0.65 [95% CI 0.45–0.95]) versus lowest tertile (adjusted HR 0.97 [95% CI 0.74–1.27]) of INSIGHT-predicted effects. Of note, traditional ASCVD risk factors, such as sex (Pinteraction = 0.34), age (≥65 vs. <65 years; Pinteraction = 0.73), CAD (Pinteraction = 0.73), heart failure (Pinteraction = 0.77), retinopathy (Pinteraction = 0.58), or chronic kidney disease (estimated glomerular filtration rate ≥60 vs. <60 mL/min/1.73 m2; Pinteraction = 0.55), could not reliably identify which individuals were most likely to derive ASCVD benefit from canagliflozin therapy. Moreover, there was no significant difference in the risk of serious adverse events or serious adverse events requiring drug discontinuation between predicted high and low responders (Supplementary Table 3). Finally, an alternative version of the INSIGHT tool that did not include creatine kinase levels (which may be missing or hard to obtain in routine practice) maintained its ability to identify patient subgroups deriving differential ASCVD benefit from canagliflozin therapy (Supplementary Fig. 3).
Conclusions
Using two clinical trials that evaluated the CV safety and efficacy of the SGLT2 inhibitor canagliflozin in patients with type 2 diabetes, we developed and validated a novel machine learning–based tool that leverages the phenotypical diversity of patients and enables a more personalized assessment of the predicted ASCVD effects with canagliflozin therapy in patients with type 2 diabetes. Our algorithm constructed a high-dimensional topological representation of the CANVAS trial population using the full breadth and complex interrelationship of recorded baseline phenotypical information. In a series of in silico experiments, each CANVAS trial participant formed the center of his or her own unique cluster, surrounded by the phenotypically most similar patients, thus enabling the extraction of personalized estimates of ASCVD effects in neighborhoods. Using machine learning, we identified the baseline phenotypes that were most strongly associated with the individualized ASCVD effects of canagliflozin therapy and constructed a 15-variable tool, named INSIGHT, to extrapolate these observations to different populations. We validated INSIGHT in the independent CANVAS-R study, where INSIGHT reliably identified patient phenotypes that derived greater ASCVD benefit from canagliflozin therapy.
Unlike traditional subgroup analyses that use categorical subgroups across a single phenotypical axis, such as use of specific medications (diuretics and β-blockers) or baseline risk profiles (2,19), our method treats the phenotypical spectrum as a multidimensional continuum, enabling the algorithm to discover inherent patterns in the data rather than relying on provider assumptions. Moreover, the approach does not rely on assumptions about high absolute risk to identify broad groups of patients likely to benefit from therapy; rather, it focuses on individualized aggregate estimates of expected relative risk reduction from the intervention compared with the placebo. As such, the approach complements prior studies that have modeled the absolute CV risk in type 2 diabetes (27,28) by providing a personalized estimate of the extent to which an individual’s ASCVD risk may be modifiable through SGLT2 inhibition.
Using a machine learning approach, our algorithm performed an unbiased ranking of all recorded baseline phenotypes and identified microvascular complications (e.g., retinopathy and neuropathy), macrovascular complications (e.g., CAD and heart failure), baseline kidney function, traditional ASCVD risk factors (e.g., dyslipidemia, hypertension, HDL levels, and obesity), but also sex, ethnicity, and race, as the most important predictors of ASCVD effects with canagliflozin therapy. These findings highlight a complex interrelationship between a patient’s demographic and clinical profiles as a major driver of the benefit seen from canagliflozin therapy. Adding to the external validity and robustness of our method, INSIGHT, a tool trained in CANVAS based on the 15 key predictors of ASCVD effects, reliably identified which individuals were most likely to benefit from canagliflozin therapy in the external, independent CANVAS-R trial, a study that chronologically followed CANVAS.
Most importantly, our approach leverages the phenotypical diversity in randomized clinical trials to define a precision therapeutic strategy. While precision medicine was conceived as an approach that ensures best patient outcomes through the selection of effective therapies, defining such therapeutic options has focused on biological mechanisms and dedicated genomic or biomarker evaluation (29–32), with associated burden on health care resources. Our approach expands the concept of precision medicine to a deeper understanding of clinical characteristics, which are readily available and therefore applicable to a larger portion of the population, integrating both nature and nurture to link distinct patient phenotypes to heterogeneous treatment effects. Through validation in a distinct clinical trial population, our approach also demonstrates reproducibility, generalizability, and clinical value. We therefore view this approach as a new way to inform the implementation of results from clinical trials, forming a bridge between the population-level inference on treatment effectiveness and the individualized application of trial results to inform therapeutic decision making. To further facilitate the translation of our findings, we have decided to make the INSIGHT online decision support tool freely accessible for future research use (Supplementary Fig. 4) (33).
Our study should be interpreted in light of the following limitations. First, effect estimates drawn from topological analyses around individual patients may be prone to random variation. However, our approach focused on deriving features that consistently define participation in neighborhoods with the strongest canagliflozin treatment effects. Second, the findings presented here do not necessarily apply to different SGLT2 inhibitors or patients with low atherosclerotic CV disease risk and should not be used to guide the use of other novel antihyperglycemic therapies, such as glucagon-like peptide 1 analogues. Similarly, they may not generalize to patient populations that were underrepresented in the original studies, such as Black individuals, who formed ∼3.3% of the CANVAS population. Third, results from the present study do not represent a justification to restrict the use of canagliflozin to specific groups; instead, they define a strategy to identify patients most likely to benefit from canagliflozin in resource-limited settings. In the absence of mechanistic data, it is possible that clinical variables function as surrogates for other biological or even social determinants of personalized benefit. For instance, retinopathy may flag microvascular disease in type 2 diabetes, which has been shown to benefit from SGLT2 inhibition (34–36), whereas demographic characteristics may be associated with differences in medication adherence. However, these assertions could not be tested based on the available data. Fourth, to ensure consistency with the original trials, we chose the original primary outcome as our outcome of interest, thus excluding heart failure hospitalizations and progression of kidney disease from our assessments, although these outcomes are consistently improved across this class of medications. Finally, the complexity of the model structure makes it hard to describe how the algorithm functions at each level. However, to promote explainability in our models, we used the SHAP method to quantify the importance of each feature and developed an online platform for interested parties to interact with our decision support tool.
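To illustrate the idea behind SHAP-style feature attribution, the following minimal sketch computes exact Shapley values by brute force for a toy scoring function. It is not the INSIGHT model or the tree-specific SHAP algorithm cited above (25); the function names, feature names, coefficients, and reference patient are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single prediction.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for small illustrative examples.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude the target
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_target = dict(without)
                with_target[target] = x[target]
                total += weight * (predict(with_target) - predict(without))
        phi[target] = total
    return phi


def toy_model(f):
    """Hypothetical score; NOT the published algorithm."""
    score = 0.3 * f["retinopathy"] + 0.02 * (f["age"] - 60)
    score += 0.1 * f["retinopathy"] * f["female"]  # interaction term
    return score


# Hypothetical patient and reference ("baseline") patient
patient = {"retinopathy": 1, "age": 70, "female": 1}
reference = {"retinopathy": 0, "age": 60, "female": 0}

phi = shapley_values(toy_model, patient, reference)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

In this sketch, the per-feature contributions sum to the difference between the patient's score and the reference score, and the interaction term is split equally between the two features involved. Tree-specific SHAP implementations reach comparable attributions far more efficiently for gradient-boosted models.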
In conclusion, we developed a machine learning–guided, evidence-based tool, INSIGHT, to personalize the consideration of canagliflozin based on its predicted ASCVD benefit among patients with type 2 diabetes who have or are at high risk for ASCVD. This individualized approach to evidence synthesis may represent a strategy to maximize the benefits of novel therapeutics and may be a valuable adjunct to inform shared decision making in clinical practice.
Article Information
Funding. This study, carried out under YODA project 2020-4553, used data obtained from the Yale University Open Data Access Project, which has an agreement with Janssen Research & Development, LLC. This study was supported by research funding awarded by the Yale School of Medicine to R.K. R.K. also receives support from the National Heart, Lung, and Blood Institute of the National Institutes of Health under the award K23HL153775-01A1, outside of the submitted work.
The funders had no role in the design or conduct of the study; collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication. The interpretation and reporting of research using these data are solely the responsibility of the authors and do not necessarily represent the official views of the Yale University Open Data Access Project or Janssen Research & Development, LLC.
Duality of Interest. R.K. and E.K.O. are co-founders of Evidence2Health, a precision health analytics company, and are named as inventors in a U.S. Provisional Patent Application (“Methods For Neighborhood Phenomapping For Clinical Trials,” no. 63/177,117 filed 20 April 2021). D.K.M. has received honoraria or consultancy fees from Afimmune, Applied Therapeutics, AstraZeneca, Boehringer Ingelheim, CSL Behring, Eisai, Esperion, GlaxoSmithKline, Janssen Research & Development, LLC, Lexicon, Lilly USA, Merck Sharp & Dohme, Metavant, Novo Nordisk, Pfizer, and Sanofi US. No other potential conflicts of interest relevant to this article were reported.
Author Contributions. E.K.O. designed the study, performed data analysis and statistical analysis, and wrote the manuscript. M.A.S. and D.K.M. provided scientific direction and contributed to writing the manuscript. R.K. designed the study, raised the funding for data access and analysis, coordinated and directed the project, and wrote the manuscript. E.K.O. and R.K. are the guarantors of this work and, as such, had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Footnotes
This article contains supplementary material online at https://doi.org/10.2337/figshare.18283793.
References
- 1. Heerspink HJL, Stefansson BV, Correa-Rotter R, et al.; DAPA-CKD Committees and Investigators. Dapagliflozin in patients with chronic kidney disease. N Engl J Med 2020;383:1436–1446
- 2. Neal B, Perkovic V, Mahaffey KW, et al.; CANVAS Program Collaborative Group. Canagliflozin and cardiovascular and renal events in type 2 diabetes. N Engl J Med 2017;377:644–657
- 3. Perkovic V, de Zeeuw D, Mahaffey KW, et al. Canagliflozin and renal outcomes in type 2 diabetes: results from the CANVAS Program randomised clinical trials. Lancet Diabetes Endocrinol 2018;6:691–704
- 4. Perkovic V, Jardine MJ, Neal B, et al.; CREDENCE Trial Investigators. Canagliflozin and renal outcomes in type 2 diabetes and nephropathy. N Engl J Med 2019;380:2295–2306
- 5. Radholm K, Figtree G, Perkovic V, et al. Canagliflozin and heart failure in type 2 diabetes mellitus: results from the CANVAS program. Circulation 2018;138:458–468
- 6. Petrie MC. Sodium glucose cotransporter 2 inhibitors: searching for mechanisms in the wake of large, positive cardiovascular outcomes trials. Circulation 2019;140:1703–1705
- 7. Zinman B, Wanner C, Lachin JM, et al.; EMPA-REG OUTCOME Investigators. Empagliflozin, cardiovascular outcomes, and mortality in type 2 diabetes. N Engl J Med 2015;373:2117–2128
- 8. Cannon CP, Pratley R, Dagogo-Jack S, et al.; VERTIS CV Investigators. Cardiovascular outcomes with ertugliflozin in type 2 diabetes. N Engl J Med 2020;383:1425–1435
- 9. Wiviott SD, Raz I, Bonaca MP, et al.; DECLARE-TIMI 58 Investigators. Dapagliflozin and cardiovascular outcomes in type 2 diabetes. N Engl J Med 2019;380:347–357
- 10. Bhatt DL, Szarek M, Steg PG, et al.; SOLOIST-WHF Trial Investigators. Sotagliflozin in patients with diabetes and recent worsening heart failure. N Engl J Med 2021;384:117–128
- 11. Cahn A, Raz I, Leiter LA, et al. Cardiovascular, renal, and metabolic outcomes of dapagliflozin versus placebo in a primary cardiovascular prevention cohort: analyses from DECLARE-TIMI 58. Diabetes Care 2021;44:1159–1167
- 12. Ceriello A, Standl E, Catrinoiu D, et al.; Diabetes and Cardiovascular Disease (D&CVD) EASD Study Group. Issues of cardiovascular risk management in people with diabetes in the COVID-19 era. Diabetes Care 2020;43:1427–1432
- 13. Castellana M, Procino F, Sardone R, Trimboli P, Giannelli G. Generalizability of sodium-glucose co-transporter-2 inhibitors cardiovascular outcome trials to the type 2 diabetes population: a systematic review and meta-analysis. Cardiovasc Diabetol 2020;19:87
- 14. Wittbrodt ET, Eudicone JM, Bell KF, Enhoffer DM, Latham K, Green JB. Eligibility varies among the 4 sodium-glucose cotransporter-2 inhibitor cardiovascular outcomes trials: implications for the general type 2 diabetes US population. Am J Manag Care 2018;24:S138–S145
- 15. Nargesi AA, Jeyashanmugaraja GP, Desai N, Lipska K, Krumholz H, Khera R. Contemporary national patterns of eligibility and utilization of novel cardioprotective anti-hyperglycemic agents in type 2 diabetes. J Am Heart Assoc 2021;10:e021084
- 16. Hamid A, Vaduganathan M, Oshunbade AA, et al. Antihyperglycemic therapies with expansions of US Food and Drug Administration indications to reduce cardiovascular events: prescribing patterns within an academic medical center. J Cardiovasc Pharmacol 2020;76:313–320
- 17. Arnold SV, de Lemos JA, Rosenson RS, et al.; Gould Investigators. Use of guideline-recommended risk reduction strategies among patients with diabetes and atherosclerotic cardiovascular disease. Circulation 2019;140:618–620
- 18. Arnold SV, Inzucchi SE, Tang F, et al. Real-world use and modeled impact of glucose-lowering therapies evaluated in recent cardiovascular outcomes trials: an NCDR® Research to Practice project. Eur J Prev Cardiol 2017;24:1637–1645
- 19. McGuire DK, Shih WJ, Cosentino F, et al. Association of SGLT2 inhibitors with cardiovascular and kidney outcomes in patients with type 2 diabetes: a meta-analysis. JAMA Cardiol 2021;6:148–158
- 20. Oikonomou EK, Van Dijk D, Parise H, et al. A phenomapping-derived tool to personalize the selection of anatomical vs. functional testing in evaluating chest pain (ASSIST) [published correction appears in Eur Heart J 2022;ehab916]. Eur Heart J 2021;42:2536–2548
- 21. Wright MN, Ziegler A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 2017;77:17
- 22. Gower JC. A general coefficient of similarity and some of its properties. Biometrics 1971;27:857–871
- 23. McInnes L, Healy J, Melville J. UMAP: uniform manifold approximation and projection for dimension reduction. Accessed 28 January 2022. Available from https://arxiv.org/abs/1802.03426
- 24. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Accessed 28 January 2022. Available from https://arxiv.org/abs/1603.02754
- 25. Lundberg SM, Erion G, Chen H, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2020;2:56–67
- 26. von Elm E, Altman DG, Egger M, et al.; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet 2007;370:1453–1457
- 27. Ferrannini E, Rosenstock J. Clinical translation of cardiovascular outcome trials in type 2 diabetes: is there more or is there less than meets the eye? Diabetes Care 2021;44:641–646
- 28. Shao H, Shi L, Fonseca VA. Using the BRAVO risk engine to predict cardiovascular outcomes in clinical trials with sodium-glucose transporter 2 inhibitors. Diabetes Care 2020;43:1530–1536
- 29. Claassens DMF, Vos GJA, Bergmeijer TO, et al. A genotype-guided strategy for oral P2Y12 inhibitors in primary PCI. N Engl J Med 2019;381:1621–1631
- 30. Sibbing D, Aradi D, Jacobshagen C, et al.; TROPICAL-ACS Investigators. Guided de-escalation of antiplatelet treatment in patients with acute coronary syndrome undergoing percutaneous coronary intervention (TROPICAL-ACS): a randomised, open-label, multicentre trial. Lancet 2017;390:1747–1757
- 31. Pereira NL, Sargent DJ, Farkouh ME, Rihal CS. Genotype-based clinical trials in cardiovascular disease. Nat Rev Cardiol 2015;12:475–487
- 32. Kimmel SE, French B, Kasner SE, et al.; COAG Investigators. A pharmacogenetic versus a clinical algorithm for warfarin dosing. N Engl J Med 2013;369:2283–2293
- 33. INSIGHT: personalized atherosclerotic benefit estimation for canagliflozin in type 2 diabetes. Accessed 21 November 2021. Available from https://cards-lab.shinyapps.io/INSIGHT/
- 34. Adingupu DD, Gopel SO, Gronros J, et al. SGLT2 inhibition with empagliflozin improves coronary microvascular function and cardiac contractility in prediabetic ob/ob(-/-) mice. Cardiovasc Diabetol 2019;18:16
- 35. Sugiyama S, Jinnouchi H, Kurinami N, et al. The SGLT2 inhibitor dapagliflozin significantly improves the peripheral microvascular endothelial function in patients with uncontrolled type 2 diabetes mellitus. Intern Med 2018;57:2147–2156
- 36. Ott C, Jumar A, Striepe K, et al. A randomised study of the impact of the SGLT2 inhibitor dapagliflozin on microvascular and macrovascular circulation. Cardiovasc Diabetol 2017;16:26