American Journal of Epidemiology. 2024 Sep 11;194(7):1999–2011. doi: 10.1093/aje/kwae355

Tree-based scan statistics to generate drug repurposing hypotheses: a test case using sodium-glucose cotransporter-2 inhibitors

George S Q Tan 1,2, Judith C Maro 3, Shirley V Wang 4, Sengwee Toh 5,6, Jedidiah I Morton 7,8, Jenni Ilomäki 9, Jenna Wong 10,✉,#, Xiaojuan Li 11,✉,#
PMCID: PMC12461578  PMID: 39270669

Abstract

Most drug repurposing studies using real-world data focused on validating, instead of generating, hypotheses. We used tree-based scan statistics to generate repurposing hypotheses for sodium-glucose cotransporter-2 inhibitors (SGLT2i). We used an active-comparator, new-user study design to create a 1:1 propensity-score matched cohort of SGLT2i and dipeptidyl peptidase-4 inhibitors (DPP4i) initiators in the Merative MarketScan Research Databases. Tree-based scan statistics were estimated across an ICD-10-CM-based hierarchical outcome tree using incident outcomes identified from hospital and outpatient diagnoses. We used an adjusted P ≤ .01 as the threshold for statistical alert to prioritize associations for evaluation as repurposing signals. We varied the analyses by tree size, scanning level, and clinical settings for outcomes. There were 80 510 matched SGLT2i-DPP4i initiator pairs with 215 333 outcomes among SGLT2i initiators and 223 428 outcomes among DPP4i initiators. There were 18 prioritized associations, which included chronic kidney disease (P = .0001), an expected signal, and anemia (P = .0001). Heart failure (P = .0167), another expected signal, was identified slightly beyond the statistical alert threshold. Narrowing the outcome tree, scanning at different tree levels, and including outcomes from different clinical settings influenced the scan statistics. We identified signals aligning with recently approved indications of SGLT2i, plus potential repurposing signals supported by existing evidence but requiring future validation.

Keywords: drug repurposing, drug repositioning, tree-based scan statistics, TreeScan, data-mining, real-world data, pharmacoepidemiology

Introduction

Drug repurposing, defined as finding new indications for existing drugs, has garnered much interest in the past decade because it offers significant cost and time savings, as well as greater success rates across the drug development and regulatory approval pipeline, compared to de novo drug development.1–4 One computational approach to drug repurposing is retrospective analysis of real-world data (RWD), defined by the United States Food and Drug Administration (FDA) as data collected during routine delivery of healthcare.5 Most previous drug repurposing studies using RWD have focused on validating, rather than generating, repurposing hypotheses.6 Using RWD to generate novel repurposing hypotheses holds much promise given improving data quality and availability.7,8

Tree-based scan statistics (TBSS), enabled by TreeScan, is a data mining method originally developed to conduct scan statistics across a hierarchical tree.9 In general, a hierarchical tree consists of variables arranged in a tree structure, for example, occupations, pharmaceutical drugs, and clinical diagnoses. Applications of TBSS thus far have been predominantly for occupational disease and medication safety surveillance.9–13

We aimed to demonstrate how TBSS can be used to generate new drug repurposing hypotheses from RWD. In essence, an inverse association between drug exposure and a health outcome identified by the scan statistics may suggest a potential repurposing signal relating to the outcome. As a test case, we used sodium-glucose cotransporter-2 inhibitors (SGLT2i), a new class of glucose-lowering drugs initially approved for the treatment of type 2 diabetes. Sodium-glucose cotransporter-2 inhibitors were additionally approved in the United States for the treatment of heart failure in 2020 and chronic kidney disease in 2021.14–20 These new indications could serve as “positive controls” to evaluate the performance of this approach.

Methods

Data sources

We used data from the Merative MarketScan Research Databases from October 1, 2014, to December 31, 2021, where the data were converted to the Sentinel Common Data Model (version 8.1). MarketScan captures one of the largest convenience samples of individuals (and their spouses and dependents) with employer-sponsored health insurance plans across the United States.21,22 It provides de-identified patient-level health data, including insurance enrollment status, diagnosis and procedure codes for inpatient and outpatient services, and outpatient prescription medication dispensing data based on National Drug Codes. This study was approved by the Institutional Review Board of Harvard Pilgrim Health Care Institute and Monash University.

Study design and cohort

We used an active-comparator, new-user study design, comparing initiators of SGLT2i (canagliflozin, dapagliflozin, empagliflozin, ertugliflozin, other SGLT2i-containing combination products) to initiators of dipeptidyl peptidase-4 inhibitors (DPP4i; alogliptin, linagliptin, saxagliptin, sitagliptin, other DPP4i-containing combination products; Table S1). Dipeptidyl peptidase-4 inhibitors were chosen as the active comparator because, like SGLT2i, they are second-line glucose-lowering drugs for type 2 diabetes.23 DPP4i have commonly been used as active comparators for SGLT2i in comparative studies.24–27

The study cohort consisted of beneficiaries aged ≥18 years who initiated treatment with an SGLT2i or DPP4i between October 1, 2015, and October 31, 2019. The latter date was selected because the pivotal DAPA-HF trial, published in November 2019, was the first to report that dapagliflozin use was associated with a reduced risk of heart failure outcomes irrespective of diabetes status, which could have influenced prescribing practices of SGLT2i.28 The index date was defined as the first dispensing of either SGLT2i or DPP4i. Eligible individuals were required to have at least 1 year of continuous medical and pharmacy coverage prior to the index date, with allowable gaps of no more than 45 days. We used a 1-year washout period (with no prior dispensing of either SGLT2i or DPP4i) prior to the index date to identify new users. We excluded individuals who initiated treatment with both SGLT2i and DPP4i on the index date. The drug exposure periods were constructed using the days’ supply of medication in a dispensing, allowing for stockpiling when an additional dispensing occurred before the end of the days’ supply of the previous dispensing. A grace period of up to 14 days was used to bridge brief gaps between exposure periods. Using an adapted version of the Chronic Condition Warehouse algorithm for type 2 diabetes,29 we required eligible individuals to have a diagnosis of type 2 diabetes and excluded those with a diagnosis of type 1 diabetes, using at least 1 inpatient diagnosis or at least 2 ambulatory or emergency department diagnoses on separate days, within 1 year prior to the index date. Figure 1 illustrates the complete study design.30
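The exposure-period construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the logic of the Sentinel Routine Query Modules: the function name `build_episodes`, the tuple layout, and the handling of edge cases are assumptions made for the example. Only the stockpiling rule and the 14-day grace period come from the text.

```python
from datetime import date, timedelta

def build_episodes(dispensings, grace_days=14):
    """Collapse dispensings into continuous exposure episodes.

    dispensings: list of (dispensing_date, days_supply) tuples.
    Returns a list of (start, end) dates, with end inclusive.
    Hypothetical helper: an early refill is stockpiled (its supply starts
    when the previous supply runs out), and gaps of at most `grace_days`
    between supplies are bridged into one episode.
    """
    episodes = []
    for disp_date, days_supply in sorted(dispensings):
        if episodes:
            start, end = episodes[-1]
            next_uncovered = end + timedelta(days=1)
            gap = (disp_date - next_uncovered).days
            if gap <= grace_days:
                # A negative gap is an overlap (stockpiled); a short gap is bridged.
                supply_start = max(disp_date, next_uncovered)
                new_end = supply_start + timedelta(days=days_supply - 1)
                episodes[-1] = (start, new_end)
                continue
        # First dispensing, or a gap longer than the grace period: new episode.
        episodes.append((disp_date, disp_date + timedelta(days=days_supply - 1)))
    return episodes


example = build_episodes([
    (date(2016, 1, 1), 30),   # first dispensing
    (date(2016, 1, 25), 30),  # early refill, stockpiled
    (date(2016, 3, 10), 30),  # 9-day gap, bridged by the grace period
    (date(2016, 6, 1), 30),   # long gap, starts a new episode
])
```

In this toy input, the first three dispensings form one episode and the fourth starts a second one. In the study, follow-up would be censored at the end of the first episode.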

Figure 1.


Graphical representation of the longitudinal study design. Modified from: Schneeweiss S, Rassen JA, Brown JS, et al. Graphical Depiction of Longitudinal Study Designs in Health Care Databases. Ann Intern Med 2019; 170: 398-406. DOI: 10.7326/m18-3079.

Propensity score matching

To reduce potential confounding across all outcomes, we used 1:1 propensity score matching where SGLT2i initiators were matched with DPP4i initiators using optimal nearest-neighbor matching with a caliper of 0.025 and no replacement. We estimated propensity scores for initiating SGLT2i using a predefined set of baseline covariates measured in a 1-year baseline period prior to the index date.5 These included demographic factors (age and sex); calendar year of the index date; combined Charlson/Elixhauser comorbidity score31,32; adapted Diabetes Complications Severity Index33; baseline use of glucose-lowering drugs; comorbidities and other medications; procedures; and healthcare utilization characteristics (Table S2). We examined covariate balance after matching, with covariate imbalance defined as an absolute value of the standardized mean difference of > .1.
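The balance check can be illustrated with the conventional standardized mean difference formulas. The text does not state which formula was used, so the pooled-variance forms below are an assumption, and the function names are hypothetical. They do, however, reproduce the pre-matching values reported in Table 1 for age and sex to three decimal places.

```python
import math

def smd_continuous(mean_exposed, sd_exposed, mean_comparator, sd_comparator):
    """Standardized mean difference for a continuous covariate."""
    pooled_sd = math.sqrt((sd_exposed ** 2 + sd_comparator ** 2) / 2)
    return (mean_exposed - mean_comparator) / pooled_sd

def smd_binary(p_exposed, p_comparator):
    """Standardized mean difference for a binary covariate (proportions)."""
    pooled_var = (p_exposed * (1 - p_exposed)
                  + p_comparator * (1 - p_comparator)) / 2
    return (p_exposed - p_comparator) / math.sqrt(pooled_var)

def is_imbalanced(smd, threshold=0.1):
    """Imbalance as defined in the text: absolute SMD greater than .1."""
    return abs(smd) > threshold


# Pre-matching values from Table 1 (SGLT2i vs DPP4i initiators).
age_smd = smd_continuous(54.7, 9.8, 58.0, 11.9)            # about -0.303
female_smd = smd_binary(47272 / 106143, 55173 / 118575)    # about -0.040
```

By this criterion, age was imbalanced before matching, whereas sex was not.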

Hierarchical outcome tree

We used a pruned version of the hierarchical tree based on International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes. The ICD-10-CM codes are inherently organized into a hierarchical tree-like structure with up to 7 levels, corresponding to the maximum 7 digits of the diagnosis codes. Broad categories of diagnoses start at the “root” and progressively “branch” into more specific groups of diagnoses, culminating in specific diagnosis codes at the “leaf” (Figure S1). Each level has multiple nodes, which encompass all downstream diagnoses. We pruned the ICD-10-CM tree to remove branches containing diagnoses that are less plausible as drug-related outcomes: external causes of morbidity (V00-Y99) and factors influencing health status and contact with health services (Z00-Z99). We also excluded codes for conditions originating in the perinatal period (P00-P96) and codes for pregnancy, childbirth, and the puerperium (O00-O9A), as we did not intend to evaluate pregnancy-related outcomes. The tree was further pruned in the sensitivity analyses (described later on). Refer to Table S3 for specifications of the tree.
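As a rough sketch of how a diagnosis code maps onto the tree, the snippet below derives the node at each level from the code's characters and applies the primary pruning rules. It assumes that level k corresponds to the first k characters of the code (ignoring the dot), which matches the description for levels 3 to 7. Levels 1 and 2 would need a lookup table and are not covered. The function names are hypothetical. Because each excluded range (V00-Y99, Z00-Z99, P00-P96, O00-O9A) is treated here as covering every code that starts with its letters, pruning is reduced to a first-letter check.

```python
# First letters of the branches removed in the primary analysis.
PRUNED_FIRST_LETTERS = set("VWXY") | {"Z", "P", "O"}

def ancestors(icd_code, min_level=3):
    """Return {level: node} from `min_level` down to the code itself."""
    chars = icd_code.replace(".", "").upper()
    nodes = {}
    for level in range(min_level, len(chars) + 1):
        prefix = chars[:level]
        # Re-insert the dot after the third character for readability.
        nodes[level] = prefix if level <= 3 else prefix[:3] + "." + prefix[3:]
    return nodes

def is_pruned(icd_code):
    """True if the code falls in a branch removed from the outcome tree."""
    return icd_code[0].upper() in PRUNED_FIRST_LETTERS


hypomagnesemia_nodes = ancestors("E83.42")
```

For example, an outcome coded E83.42 contributes to nodes E83 (level 3), E83.4 (level 4), and E83.42 (level 5), all of which appear in the scan.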

Follow-up for outcomes

Follow-up began on the day following the first dispensing of the drug of interest and continued until the earliest of any of the following events: end of the drug exposure period, disenrollment, death, end of data availability (December 31, 2021), initiation of opposite study drug, censoring of 1 person from the matched pair for any of the aforementioned reasons, or end of the 2-year (730 days) follow-up period. We defined incident outcomes based on diagnoses (in any diagnosis position) from inpatient admissions, emergency department presentations, or ambulatory care. Each incident outcome was considered separately. However, to be considered an incident outcome, the individual must have had no diagnosis with the first 3 digits of the ICD-10-CM code recorded in at least 1 year preceding its occurrence. In other words, incidence was defined as level 3 of the outcome tree. This was to exclude closely related diagnoses (categorized within the same level 3) that were recorded within the same timeframe, which could reflect a related follow-up diagnosis or nuanced differences when coding a similar condition.
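The incidence rule can be expressed as a simple lookback check. The sketch below is illustrative only: the function name, the 365-day window used for "1 year", and the choice to compare only diagnoses recorded strictly before the outcome date are assumptions.

```python
from datetime import date, timedelta

def is_incident(code, dx_date, prior_diagnoses, lookback_days=365):
    """Check whether a diagnosis counts as an incident outcome.

    prior_diagnoses: iterable of (date, code) recorded for the same person.
    The outcome is incident if no diagnosis sharing the same first 3
    characters (level 3 of the tree) was recorded in the lookback window.
    """
    category = code.replace(".", "")[:3].upper()
    window_start = dx_date - timedelta(days=lookback_days)
    for prev_date, prev_code in prior_diagnoses:
        same_category = prev_code.replace(".", "")[:3].upper() == category
        if same_category and window_start <= prev_date < dx_date:
            return False
    return True


history = [
    (date(2016, 3, 1), "N18.3"),  # CKD stage 3, six months earlier
    (date(2014, 1, 1), "D64.9"),  # anemia, more than a year earlier
]
ckd_stage4_incident = is_incident("N18.4", date(2016, 9, 1), history)
anemia_incident = is_incident("D64.9", date(2016, 9, 1), history)
```

Here the N18.4 diagnosis is not incident because N18.3 shares the same level 3 node (N18) within the prior year, whereas the anemia diagnosis is incident because the earlier record falls outside the window.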

Scan statistics

As the interest of this study was identifying repurposing signals rather than safety signals, we looked for nodes where the observed probability of the outcome in the exposure group was lower than the corresponding expected probability if there were truly no difference with the comparator group (inverse associations). In the TreeScan software, this was implemented by interchanging the exposure and comparator groups because TreeScan was designed to evaluate safety signals (positive associations) “out of the box.”5,12 The expected number of outcomes at each node was calculated as half of the total number of outcomes from both exposure groups, given that follow-up time was matched between groups. Any node that was scanned had to have at least 2 outcomes among the exposed. We used the unconditional Bernoulli scan statistic, as we assumed that outcomes occur in the exposed group with a fixed probability of .5 within the 1:1 matched cohort.
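For a single node, the expected count and the scan statistic can be sketched as below. The log likelihood ratio is written in the standard unconditional Bernoulli form with p = .5. Whether this matches TreeScan's internal computation exactly is an assumption, although the function reproduces the values reported in Table 2 for CKD (N18) and heart failure (I50) to within rounding. The function names are hypothetical.

```python
import math

def expected_outcomes(total_outcomes, p=0.5):
    """Expected outcomes among the exposed: half of all outcomes at the node."""
    return total_outcomes * p

def bernoulli_llr_inverse(observed_exposed, total_outcomes, p=0.5):
    """Log likelihood ratio for an inverse association at one node.

    Returns 0 when the exposed group has at least as many outcomes as
    expected, because only inverse associations are of interest here.
    """
    c, n = observed_exposed, total_outcomes
    if n == 0 or c >= n * p:
        return 0.0
    llr = (n - c) * math.log((n - c) / (n * (1 - p)))
    if c > 0:
        llr += c * math.log(c / (n * p))
    return llr


# Node N18 in Table 2: 594 outcomes among SGLT2i initiators out of 1470.
ckd_expected = expected_outcomes(1470)        # 735.0
ckd_llr = bernoulli_llr_inverse(594, 1470)    # about 27.217
```

Because p = .5, the statistic is symmetric in the two groups, which is why swapping the exposure and comparator labels in the software yields the same value.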

Because thousands of outcomes were evaluated concurrently in this study, it was important to limit false-positive signals.34 Tree-based scan statistics derives multiplicity-adjusted P values nonparametrically using Monte Carlo simulations.9 A P value can be interpreted as the 1-sided probability of observing the difference between observed and expected outcomes at the specific node if the composite null hypothesis were true. The composite null hypothesis was that there is no difference between observed and expected outcomes across all nodes. The alternative hypothesis in this study was that there is an inverse association at 1 or more nodes, unlike in drug safety studies, which look for a positive association.5,12 We describe how the P values were derived in more detail in Appendix S1. Importantly, the P values were used to prioritize signals for further evaluation.35 We specified in the TreeScan software to output all inverse associations with P < 1.
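The Monte Carlo step can be sketched as follows. This is a simplified illustration of the general approach, not the procedure documented in Appendix S1: outcomes at each leaf are randomly reassigned to the exposed group with probability .5, the maximum log likelihood ratio across all scanned nodes is recorded for each simulated dataset, and a node's adjusted P value is the rank of its observed statistic among those maxima. The function names, the toy tree, and the number of replicates are assumptions. The leaf labelled `N18.other` is a hypothetical remainder obtained by subtracting the N18.3 counts from the N18 counts in Table 2.

```python
import math
import random

def llr_inverse(c, n, p=0.5):
    """Bernoulli log likelihood ratio, nonzero only for inverse associations."""
    if n == 0 or c >= n * p:
        return 0.0
    llr = (n - c) * math.log((n - c) / (n * (1 - p)))
    if c > 0:
        llr += c * math.log(c / (n * p))
    return llr

def monte_carlo_p_values(leaf_counts, node_leaves, n_sim=999, p=0.5, seed=0):
    """Multiplicity-adjusted P values for every scanned node.

    leaf_counts: {leaf: (exposed_outcomes, total_outcomes)}
    node_leaves: {node: [leaves under that node]}
    """
    rng = random.Random(seed)

    def node_llrs(exposed_by_leaf):
        llrs = {}
        for node, leaves in node_leaves.items():
            c = sum(exposed_by_leaf[leaf] for leaf in leaves)
            n = sum(leaf_counts[leaf][1] for leaf in leaves)
            llrs[node] = llr_inverse(c, n, p)
        return llrs

    observed = node_llrs({leaf: c for leaf, (c, _n) in leaf_counts.items()})

    # Under the composite null, each outcome is exposed with probability p.
    max_null = []
    for _ in range(n_sim):
        simulated = {
            leaf: sum(rng.random() < p for _draw in range(n))
            for leaf, (_c, n) in leaf_counts.items()
        }
        max_null.append(max(node_llrs(simulated).values()))

    # Rank of the observed statistic among the simulated maxima.
    return {
        node: (1 + sum(m >= llr for m in max_null)) / (n_sim + 1)
        for node, llr in observed.items()
    }


toy_leaves = {"N18.3": (270, 722), "N18.other": (324, 748)}
toy_nodes = {"N18": ["N18.3", "N18.other"], "N18.3": ["N18.3"]}
toy_p = monte_carlo_p_values(toy_leaves, toy_nodes, n_sim=199, seed=1)
```

With R replicates, the smallest attainable P value is 1/(R + 1). For instance, 9999 replicates would give a minimum of .0001, which is consistent with the smallest values in Table 2, although the number of replicates is not stated in this section.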

Repurposing signals

We only looked for associations using outcome nodes in levels 3, 4, and 5, so as not to expend statistical power looking for signals that were clinically either too broad or too specific. Similar to some previous TBSS studies,5,12 we used P ≤ .01 as the threshold for statistical alerts prioritizing associations for evaluation as potential repurposing signals, rather than the conventional P ≤ .05 to further guard against type 1 error. However, we presented all inverse associations with P < 1 sorted by ascending P values for transparency.

The established cardiorenal benefits of SGLT2i, specifically for heart failure and chronic kidney disease (CKD), were expected signals and served as positive controls in this study. Evaluation of unexpected signals as potential repurposing signals included consideration of biological and pharmacological plausibility, clinical context, confounding, and bias by study design. We summarize the workflow for using TBSS to identify potential repurposing signals in Figure 2.

Figure 2.


Workflow using tree-based scan statistics to identify repurposing signals.

Sensitivity analyses

We conducted a number of sensitivity analyses to investigate the impact of modifying certain analytic parameters on the repurposing signals identified. First, we further pruned the ICD-10-CM outcome tree to preserve statistical power (Table S3). Codes for neoplasms (C00-D49) were excluded, as outcomes with long induction and latent periods, such as cancers, are less likely to be causally associated with the exposure within 2 years of follow-up.36 Codes for diabetes mellitus (E10-E14) were also excluded as both the exposure and comparator drugs are already indicated for diabetes. Finally, codes relating to symptoms, signs, and abnormal laboratory findings (R00-R99) were excluded as most are nonspecific or subclinical symptoms of diseases. Second, we repeated the analyses to also scan across nodes at level 2 (in addition to levels 3, 4, and 5), where the incidence of outcomes was redefined at level 2 of the ICD-10-CM outcome tree. Third, we restricted the analyses such that incident outcomes were identified using diagnoses from only inpatient admissions or emergency department presentations and not ambulatory care.

Software

Sentinel Routine Query Modules (version 12.1.2) were executed in SAS Studio 3.7 (SAS Institute, Inc., Cary, North Carolina) to extract the matched cohorts and outcome data (see Table S4 for parameter specifications used for the modules). Sentinel Query Request Package Reporting Tool (version 2.1.0) was used to generate tables and figures. We used TreeScan software (version 2.1.1; www.treescan.org) to conduct the TBSS.

Results

Cohort characteristics

We identified a total of 106 143 SGLT2i initiators and 118 575 DPP4i initiators. The baseline characteristics of individuals in the 2 exposure groups before matching are included in Table 1. Briefly, compared to DPP4i initiators, SGLT2i initiators were slightly younger (mean age, 55 vs 58 years), had fewer comorbidities (mean Charlson/Elixhauser combined comorbidity score, 0.9 vs 1.3), and had fewer or less severe diabetes complications (mean adapted Diabetes Complications Severity Index, 0.9 vs 1.1). SGLT2i initiators were also less likely than DPP4i initiators to have a baseline diagnosis of CKD (9.9% vs 14.8%) or heart failure (4.1% vs 6.6%). The median follow-up time before matching for SGLT2i and DPP4i initiators was 116 (interquartile range [IQR], 43-336) and 104 (IQR, 43-290) days, respectively. The distribution of censoring reasons for both groups before matching was comparable (Table S5), with most censored due to end of treatment episode (76%-77%) and disenrollment (18%-19%).

Table 1.

Baseline characteristics of the study cohort before and after 1:1 propensity score matching.

| Characteristic | SGLT2i initiators before matching, number/mean | SGLT2i initiators before matching, %/SD | DPP4i initiators before matching, number/mean | DPP4i initiators before matching, %/SD | Standardized mean difference before matching | SGLT2i initiators after matching, number/mean | SGLT2i initiators after matching, %/SD | DPP4i initiators after matching, number/mean | DPP4i initiators after matching, %/SD | Standardized mean difference after matching |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of patients | 106 143 | | 118 575 | | | 80 510 | | 80 510 | | |
| Patient characteristics | | | | | | | | | | |
| Age, years | 54.7 | 9.8 | 58.0 | 11.9 | −0.303 | 55.3 | 9.8 | 55.2 | 10.6 | 0.018 |
| Female | 47 272 | 44.5% | 55 173 | 46.5% | −0.040 | 35 984 | 44.7% | 35 816 | 44.5% | 0.004 |
| Index year of initiation | | | | | | | | | | |
| 2015 | 6386 | 6.0% | 8756 | 7.4% | −0.055 | 5235 | 6.5% | 5203 | 6.5% | 0.002 |
| 2016 | 27 446 | 25.9% | 37 974 | 32.0% | −0.136 | 22 441 | 27.9% | 22 501 | 27.9% | −0.002 |
| 2017 | 26 589 | 25.1% | 31 015 | 26.2% | −0.025 | 20 636 | 25.6% | 20 704 | 25.7% | −0.002 |
| 2018 | 21 933 | 20.7% | 23 022 | 19.4% | 0.031 | 16 633 | 20.7% | 16 685 | 20.7% | −0.002 |
| 2019 | 23 789 | 22.4% | 17 808 | 15.0% | 0.190 | 15 565 | 19.3% | 15 417 | 19.1% | 0.005 |
| Diabetes-related covariates | | | | | | | | | | |
| Adapted diabetes complications severity index | 0.9 | 1.3 | 1.1 | 1.7 | −0.171 | 0.9 | 1.4 | 0.8 | 1.4 | 0.020 |
| Glucose-lowering drugs | | | | | | | | | | |
| Metformin | 85 236 | 80.3% | 89 307 | 75.3% | 0.120 | 64 009 | 79.5% | 64 245 | 79.8% | −0.007 |
| Sulfonylurea | 34 389 | 32.4% | 41 780 | 35.2% | −0.060 | 26 771 | 33.3% | 26 712 | 33.2% | 0.002 |
| GLP-1 agonist | 25 303 | 23.8% | 7166 | 6.0% | 0.515 | 7461 | 9.3% | 7077 | 8.8% | 0.017 |
| Thiazolidinedione | 7800 | 7.3% | 6215 | 5.2% | 0.087 | 4961 | 6.2% | 4847 | 6.0% | 0.006 |
| α-glucosidase inhibitor | 299 | 0.3% | 347 | 0.3% | −0.002 | 222 | 0.3% | 216 | 0.3% | 0.001 |
| Insulin | 28 243 | 26.6% | 16 994 | 14.3% | 0.308 | 14 617 | 18.2% | 14 287 | 17.7% | 0.011 |
| Comorbidities and comedications | | | | | | | | | | |
| Charlson/Elixhauser combined comorbidity score | 0.9 | 1.6 | 1.3 | 2.2 | −0.204 | 0.9 | 1.7 | 0.9 | 1.7 | 0.020 |
| Diagnoses and procedures | | | | | | | | | | |
| Anemia | 10 232 | 9.6% | 16 068 | 13.6% | −0.122 | 8014 | 10.0% | 7881 | 9.8% | 0.006 |
| Arrhythmia | 9362 | 8.8% | 14 176 | 12.0% | −0.103 | 7230 | 9.0% | 6928 | 8.6% | 0.013 |
| Autoimmune disease | 8951 | 8.4% | 10 008 | 8.4% | −0.000 | 6353 | 7.9% | 6171 | 7.7% | 0.008 |
| Bacterial infection | 15 519 | 14.6% | 21 251 | 17.9% | −0.090 | 11 958 | 14.9% | 11 795 | 14.7% | 0.006 |
| Coagulopathy | 1309 | 1.2% | 2454 | 2.1% | −0.066 | 1086 | 1.3% | 1057 | 1.3% | 0.003 |
| Colonoscopy | 10 006 | 9.4% | 11 217 | 9.5% | −0.001 | 7505 | 9.3% | 7529 | 9.4% | −0.001 |
| Degenerative disease of the central nervous system | 11 246 | 10.6% | 14 741 | 12.4% | −0.058 | 8525 | 10.6% | 8409 | 10.4% | 0.005 |
| Durable medical equipment | 2615 | 2.5% | 4536 | 3.8% | −0.078 | 2053 | 2.5% | 2030 | 2.5% | 0.002 |
| Fecal occult blood test | 6340 | 6.0% | 7030 | 5.9% | 0.002 | 4865 | 6.0% | 4954 | 6.2% | −0.005 |
| Fluid and electrolyte disorder | 6464 | 6.1% | 11 682 | 9.9% | −0.139 | 5176 | 6.4% | 5027 | 6.2% | 0.008 |
| Gallstones | 1719 | 1.6% | 2398 | 2.0% | −0.030 | 1342 | 1.7% | 1327 | 1.6% | 0.001 |
| Human papillomavirus DNA test | 54 | 0.1% | 75 | 0.1% | −0.005 | 39 | 0.0% | 50 | 0.1% | −0.006 |
| Hyperparathyroidism | 460 | 0.4% | 766 | 0.6% | −0.029 | 354 | 0.4% | 325 | 0.4% | 0.006 |
| Kawasaki disease | 1 | 0.0% | 3 | 0.0% | −0.004 | 1 | 0.0% | 3 | 0.0% | −0.005 |
| Mammogram | 21 314 | 20.1% | 22 794 | 19.2% | 0.022 | 15 912 | 19.8% | 15 999 | 19.9% | −0.003 |
| Organ transplant | 531 | 0.5% | 1033 | 0.9% | −0.045 | 434 | 0.5% | 432 | 0.5% | 0.000 |
| Other infections | 5545 | 5.2% | 6223 | 5.2% | −0.001 | 4159 | 5.2% | 4143 | 5.1% | 0.001 |
| Prostate-specific antigen test | 23 039 | 21.7% | 23 668 | 20.0% | 0.043 | 17 541 | 21.8% | 17 726 | 22.0% | −0.006 |
| Pap smear | 11 602 | 10.9% | 11 635 | 9.8% | 0.037 | 8564 | 10.6% | 8632 | 10.7% | −0.003 |
| Psychosis | 11 903 | 11.2% | 13 444 | 11.3% | −0.004 | 8652 | 10.7% | 8610 | 10.7% | 0.002 |
| Pulmonary circulation disorders | 569 | 0.5% | 1086 | 0.9% | −0.045 | 461 | 0.6% | 425 | 0.5% | 0.006 |
| Pulmonary disease | 10 912 | 10.3% | 14 737 | 12.4% | −0.068 | 8492 | 10.5% | 8398 | 10.4% | 0.004 |
| Renal failure | 4698 | 4.4% | 11 991 | 10.1% | −0.220 | 4004 | 5.0% | 3623 | 4.5% | 0.022 |
| Reye’s syndrome | 0 | 0.0% | 0 | 0.0% | - | 0 | 0.0% | 0 | 0.0% | - |
| Screening, examinations and disease management training | 8372 | 7.9% | 8472 | 7.1% | 0.028 | 6073 | 7.5% | 6126 | 7.6% | −0.002 |
| Thrombotic and thrombocytopenic purpura | 4 | 0.0% | 26 | 0.0% | −0.016 | 4 | 0.0% | 11 | 0.0% | −0.009 |
| Weight loss | 248 | 0.2% | 781 | 0.7% | −0.064 | 219 | 0.3% | 188 | 0.2% | 0.008 |
| Acute myocardial infarction | 1640 | 1.5% | 2207 | 1.9% | −0.024 | 1251 | 1.6% | 1191 | 1.5% | 0.006 |
| Alzheimer’s disease | 78 | 0.1% | 614 | 0.5% | −0.082 | 71 | 0.1% | 134 | 0.2% | −0.022 |
| Asthma | 7494 | 7.1% | 8813 | 7.4% | −0.014 | 5670 | 7.0% | 5684 | 7.1% | −0.001 |
| Benign prostatic hyperplasia | 5208 | 4.9% | 7612 | 6.4% | −0.065 | 4185 | 5.2% | 4090 | 5.1% | 0.005 |
| Cataract | 14 747 | 13.9% | 19 153 | 16.2% | −0.063 | 11 137 | 13.8% | 11 091 | 13.8% | 0.002 |
| Chronic kidney disease | 10 543 | 9.9% | 17 564 | 14.8% | −0.149 | 7884 | 9.8% | 7526 | 9.3% | 0.015 |
| Chronic obstructive pulmonary disease | 7262 | 6.8% | 10 748 | 9.1% | −0.082 | 5828 | 7.2% | 5703 | 7.1% | 0.006 |
| Depressive bipolar disorder | 13 215 | 12.5% | 14 561 | 12.3% | 0.005 | 9540 | 11.8% | 9501 | 11.8% | 0.002 |
| Diabetes | 106 142 | 100.0% | 118 572 | 100.0% | 0.004 | 80 509 | 100.0% | 80 509 | 100.0% | 0.000 |
| Glaucoma | 6878 | 6.5% | 9338 | 7.9% | −0.054 | 5231 | 6.5% | 5137 | 6.4% | 0.005 |
| Heart failure | 4375 | 4.1% | 7822 | 6.6% | −0.110 | 3437 | 4.3% | 3252 | 4.0% | 0.012 |
| Hip fracture | 93 | 0.1% | 344 | 0.3% | −0.047 | 78 | 0.1% | 58 | 0.1% | 0.009 |
| Hyperlipidemia | 81 260 | 76.6% | 87 646 | 73.9% | 0.061 | 60 287 | 74.9% | 60 218 | 74.8% | 0.002 |
| Hypertension | 80 440 | 75.8% | 90 103 | 76.0% | −0.005 | 60 265 | 74.9% | 60 177 | 74.7% | 0.003 |
| Hyperthyroidism | 15 875 | 15.0% | 17 662 | 14.9% | 0.002 | 11 692 | 14.5% | 11 488 | 14.3% | 0.007 |
| Ischemic heart disease | 13 969 | 13.2% | 17 381 | 14.7% | −0.043 | 10 325 | 12.8% | 10 091 | 12.5% | 0.009 |
| Non-Alzheimer’s dementia | 287 | 0.3% | 1747 | 1.5% | −0.130 | 262 | 0.3% | 424 | 0.5% | −0.031 |
| Osteoporosis | 1141 | 1.1% | 2252 | 1.9% | −0.068 | 969 | 1.2% | 920 | 1.1% | 0.006 |
| Parkinson | 142 | 0.1% | 488 | 0.4% | −0.053 | 120 | 0.1% | 105 | 0.1% | 0.005 |
| Pneumonia | 3712 | 3.5% | 6208 | 5.2% | −0.085 | 3020 | 3.8% | 2893 | 3.6% | 0.008 |
| Rheumatoid arthritis | 20 303 | 19.1% | 25 487 | 21.5% | −0.059 | 15 499 | 19.3% | 15 408 | 19.1% | 0.003 |
| Stroke and transient ischemic attack | 2456 | 2.3% | 4537 | 3.8% | −0.088 | 1973 | 2.5% | 1916 | 2.4% | 0.005 |
| Attention deficit and hyperactivity disorder | 1299 | 1.2% | 1144 | 1.0% | 0.025 | 881 | 1.1% | 871 | 1.1% | 0.001 |
| Alcohol use | 1042 | 1.0% | 1365 | 1.2% | −0.016 | 827 | 1.0% | 845 | 1.0% | −0.002 |
| Autism | 48 | 0.0% | 57 | 0.0% | −0.001 | 35 | 0.0% | 35 | 0.0% | 0.000 |
| Anxiety disorder | 12 323 | 11.6% | 13 626 | 11.5% | 0.004 | 9144 | 11.4% | 9098 | 11.3% | 0.002 |
| Bipolar disorder | 1617 | 1.5% | 1929 | 1.6% | −0.008 | 1139 | 1.4% | 1232 | 1.5% | −0.010 |
| Cerebral palsy | 31 | 0.0% | 49 | 0.0% | −0.006 | 24 | 0.0% | 30 | 0.0% | −0.004 |
| Cystic fibrosis | 810 | 0.8% | 845 | 0.7% | 0.006 | 576 | 0.7% | 568 | 0.7% | 0.001 |
| Depressive disorder | 11 533 | 10.9% | 12 720 | 10.7% | 0.004 | 8374 | 10.4% | 8247 | 10.2% | 0.005 |
| Drug use disorder | 1223 | 1.2% | 1463 | 1.2% | −0.008 | 906 | 1.1% | 933 | 1.2% | −0.003 |
| Epilepsy | 570 | 0.5% | 895 | 0.8% | −0.027 | 454 | 0.6% | 435 | 0.5% | 0.003 |
| Fibromyalgia and chronic pain | 16 575 | 15.6% | 18 119 | 15.3% | 0.009 | 12 159 | 15.1% | 12 119 | 15.1% | 0.001 |
| Human immunodeficiency virus | 305 | 0.3% | 403 | 0.3% | −0.009 | 239 | 0.3% | 238 | 0.3% | 0.000 |
| Intellectual disability | 49 | 0.0% | 84 | 0.1% | −0.010 | 36 | 0.0% | 48 | 0.1% | −0.007 |
| Learning disability | 44 | 0.0% | 85 | 0.1% | −0.013 | 32 | 0.0% | 40 | 0.0% | −0.005 |
| Leukemia and lymphoma | 657 | 0.6% | 988 | 0.8% | −0.025 | 507 | 0.6% | 506 | 0.6% | 0.000 |
| Liver disease | 9089 | 8.6% | 10 095 | 8.5% | 0.002 | 6690 | 8.3% | 6754 | 8.4% | −0.003 |
| Migraine | 4683 | 4.4% | 5207 | 4.4% | 0.001 | 3497 | 4.3% | 3451 | 4.3% | 0.003 |
| Mobility impairment | 569 | 0.5% | 1389 | 1.2% | −0.069 | 460 | 0.6% | 550 | 0.7% | −0.014 |
| Muscular dystrophy | 17 | 0.0% | 26 | 0.0% | −0.004 | 16 | 0.0% | 13 | 0.0% | 0.003 |
| Multiple sclerosis | 313 | 0.3% | 397 | 0.3% | −0.007 | 241 | 0.3% | 255 | 0.3% | −0.003 |
| Obesity | 42 081 | 39.6% | 38 386 | 32.4% | 0.152 | 29 189 | 36.3% | 29 058 | 36.1% | 0.003 |
| Opioid use disorder | 751 | 0.7% | 900 | 0.8% | −0.006 | 546 | 0.7% | 575 | 0.7% | −0.004 |
| Developmental disorder | 15 | 0.0% | 20 | 0.0% | −0.002 | 13 | 0.0% | 11 | 0.0% | 0.002 |
| Peripheral vascular disorder | 5681 | 5.4% | 9350 | 7.9% | −0.102 | 4449 | 5.5% | 4327 | 5.4% | 0.007 |
| Personality disorder | 1223 | 1.2% | 1322 | 1.1% | 0.004 | 877 | 1.1% | 877 | 1.1% | 0.000 |
| Post-traumatic stress disorder | 659 | 0.6% | 654 | 0.6% | 0.009 | 454 | 0.6% | 472 | 0.6% | −0.003 |
| Pressure and chronic ulcer | 2113 | 2.0% | 3286 | 2.8% | −0.051 | 1586 | 2.0% | 1508 | 1.9% | 0.007 |
| Schizophrenia | 149 | 0.1% | 300 | 0.3% | −0.025 | 132 | 0.2% | 151 | 0.2% | −0.006 |
| Schizophrenic psychosis | 421 | 0.4% | 894 | 0.8% | −0.047 | 358 | 0.4% | 348 | 0.4% | 0.002 |
| Blind and visual impairment | 65 | 0.1% | 178 | 0.2% | −0.027 | 49 | 0.1% | 74 | 0.1% | −0.011 |
| Deaf and hearing impairment | 3040 | 2.9% | 4252 | 3.6% | −0.041 | 2402 | 3.0% | 2213 | 2.7% | 0.014 |
| Spina bifida | 38 | 0.0% | 78 | 0.1% | −0.013 | 31 | 0.0% | 29 | 0.0% | 0.001 |
| Spinal injury | 151 | 0.1% | 321 | 0.3% | −0.028 | 118 | 0.1% | 107 | 0.1% | 0.004 |
| Tobacco use | 6862 | 6.5% | 8418 | 7.1% | −0.025 | 5439 | 6.8% | 5449 | 6.8% | −0.000 |
| Traumatic brain injury | 149 | 0.1% | 243 | 0.2% | −0.016 | 111 | 0.1% | 127 | 0.2% | −0.005 |
| Viral hepatitis | 1620 | 1.5% | 2751 | 2.3% | −0.058 | 1347 | 1.7% | 1286 | 1.6% | 0.006 |
| Mental and physical impairment | 3717 | 3.5% | 5833 | 4.9% | −0.071 | 2936 | 3.6% | 2878 | 3.6% | 0.004 |
| Other comedications | | | | | | | | | | |
| Gout medications | 56 857 | 53.6% | 63 665 | 53.7% | −0.003 | 42 995 | 53.4% | 42 772 | 53.1% | 0.006 |
| Oxicam medications | 19 350 | 18.2% | 21 135 | 17.8% | 0.011 | 14 496 | 18.0% | 14 509 | 18.0% | −0.000 |
| Sertraline | 4683 | 4.4% | 4981 | 4.2% | 0.010 | 3351 | 4.2% | 3359 | 4.2% | −0.000 |
| Sulfa antibiotics | 9487 | 8.9% | 11 044 | 9.3% | −0.013 | 7094 | 8.8% | 7058 | 8.8% | 0.002 |
| Health care utilization characteristics | | | | | | | | | | |
| Mean number of ambulatory encounters | 14.2 | 12.8 | 14.9 | 15.3 | −0.053 | 13.8 | 12.8 | 13.7 | 13.6 | 0.008 |
| Mean number of emergency room encounters | 0.4 | 1.1 | 0.5 | 1.3 | −0.092 | 0.4 | 1.1 | 0.4 | 1.0 | 0.007 |
| Mean number of inpatient hospital encounters | 0.1 | 0.4 | 0.2 | 0.6 | −0.164 | 0.1 | 0.4 | 0.1 | 0.4 | 0.011 |
| Mean number of nonacute institutional encounters | 0.0 | 0.0 | 0.0 | 0.0 | −0.004 | 0.0 | 0.0 | 0.0 | 0.0 | −0.003 |
| Mean number of other ambulatory encounters | 3.0 | 5.0 | 3.7 | 7.5 | −0.105 | 3.0 | 5.1 | 3.0 | 5.3 | −0.013 |
| Mean number of filled prescriptions | 35.1 | 26.2 | 33.2 | 26.3 | 0.075 | 33.0 | 25.1 | 32.8 | 26.0 | 0.007 |
| Mean number of generics dispensed | 10.0 | 5.8 | 9.7 | 5.9 | 0.061 | 9.6 | 5.7 | 9.5 | 5.8 | 0.016 |
| Mean number of unique drug classes dispensed | 8.9 | 5.0 | 8.7 | 5.2 | 0.035 | 8.6 | 4.9 | 8.5 | 5.0 | 0.010 |

Abbreviations: SD: Standard deviation; %: Percentage.

After 1:1 propensity score matching, there were 80 510 pairs (Figure 3), corresponding to a reduction in sample size of approximately 25% after matching. All baseline characteristics were balanced after matching, as indicated by absolute standardized mean differences ≤.1 (Table 1). Figure S2 of the supplemental material shows the propensity score distribution of both groups before and after matching. From inpatient admissions, emergency department presentations, and ambulatory care, there were 215 133 incident outcomes among 45 444 SGLT2i initiators and 223 428 among 45 931 DPP4i initiators.

Figure 3.


Cohort attrition in preparing the analytic cohort for tree-based scan statistic analysis. ED: Emergency department; PS: Propensity score; T1DM(+): Presence of a diagnosis for type 1 diabetes; T2DM(-): Absence of a diagnosis for type 2 diabetes

Repurposing signals

In the original pruned outcome tree, there were 175 922 incident outcomes among SGLT2i initiators and 183 824 among DPP4i initiators, across 30 555 nodes (levels 3, 4, and 5). Tree-based scan statistics analysis using the original pruned tree yielded 18 statistical alerts (ie, prioritized associations that met the statistical threshold for alerting; P ≤ .01; Table 2). The statistical alerts were predominantly outcomes relating to kidney diseases, anemia, and clinical symptoms, such as edema and dyspnea. As for the expected signals, CKD (N18) was identified as the most likely node (P = .0001), while heart failure (I50) was the first node that fell beyond the threshold for prioritization (P = .0167). We present the complete list of inverse associations with P < 1 in Table S6 of the supplemental material.

Table 2.

Tree-based scan statistics for associations between SGLT2i vs DPP4i and outcomes (only for associations with P < .1).a

| Node | Description | Total outcomes | Observed outcomes (SGLT2i) | Expected outcomes (SGLT2i) b | Observed:expected outcomes (SGLT2i) | Log likelihood ratio (scan statistic) | P value |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Associations with statistical alert (P ≤ .01) | | | | | | | |
| N18 | Chronic kidney disease (CKD) | 1470 | 594 | 735 | 0.81 | 27.21738 | 0.0001 |
| N18.3 | Chronic kidney disease, stage 3 (moderate) | 722 | 270 | 361 | 0.75 | 23.18839 | 0.0001 |
| D64 | Other anemias | 1415 | 581 | 707.5 | 0.82 | 22.7401 | 0.0001 |
| D64.9 | Anemia, unspecified | 1356 | 556 | 678 | 0.82 | 22.07283 | 0.0001 |
| R60.0 | Localized edema | 941 | 371 | 470.5 | 0.79 | 21.20169 | 0.0001 |
| R60 | Edema, not elsewhere classified | 1564 | 656 | 782 | 0.84 | 20.39056 | 0.0001 |
| I12 | Hypertensive chronic kidney disease | 833 | 333 | 416.5 | 0.8 | 16.85408 | 0.0001 |
| E83.4 | Disorders of magnesium metabolism | 307 | 106 | 153.5 | 0.69 | 14.94276 | 0.0002 |
| I12.9 | Hypertensive chronic kidney disease with stage 1-4 or unspecified chronic kidney disease | 802 | 324 | 401 | 0.81 | 14.87777 | 0.0002 |
| R06 | Abnormalities of breathing | 4176 | 1920 | 2088 | 0.92 | 13.53186 | 0.0005 |
| R80 | Proteinuria | 852 | 351 | 426 | 0.82 | 13.2733 | 0.0006 |
| E83.42 | Hypomagnesemia | 287 | 101 | 143.5 | 0.7 | 12.7779 | 0.0008 |
| N25.8 | Other disorders resulting from impaired renal tubular function | 74 | 17 | 37 | 0.46 | 11.41062 | 0.0055 |
| R06.0 | Dyspnea | 3177 | 1454 | 1588.5 | 0.92 | 11.40191 | 0.0055 |
| R80.9 | Proteinuria, unspecified | 754 | 312 | 377 | 0.83 | 11.26309 | 0.0056 |
| D63.1 | Anemia in chronic kidney disease | 124 | 36 | 62 | 0.58 | 11.24766 | 0.0058 |
| N20 | Calculus of kidney and ureter | 1070 | 458 | 535 | 0.86 | 11.12082 | 0.0066 |
| I51 | Complications and ill-defined descriptions of heart disease | 1232 | 534 | 616 | 0.87 | 10.94805 | 0.0089 |
| Nonsignificant associations (P > .01) a | | | | | | | |
| I50 | Heart failure | 846 | 357 | 423 | 0.84 | 10.34007 | 0.0167 |
| R14.2 | Eructation | 42 | 7 | 21 | 0.33 | 10.18861 | 0.018 |
| N25 | Disorders resulting from impaired renal tubular function | 95 | 26 | 47.5 | 0.55 | 10.09454 | 0.0198 |
| D63 | Anemia in chronic diseases classified elsewhere | 217 | 76 | 108.5 | 0.7 | 9.886091 | 0.0236 |
| N25.81 | Secondary hyperparathyroidism of renal origin | 70 | 17 | 35 | 0.49 | 9.715734 | 0.0266 |
| R09 | Other symptoms and signs involving the circulatory and respiratory system | 1967 | 887 | 983.5 | 0.9 | 9.483731 | 0.0372 |
| N13.2 | Hydronephrosis with renal and ureteral calculous obstruction | 127 | 40 | 63.5 | 0.63 | 8.907116 | 0.0713 |
| N19 | Unspecified kidney failure | 119 | 37 | 59.5 | 0.62 | 8.72376 | 0.0784 |

a Refer to Table S6 for associations with P > .1.

b The number of expected outcomes at a node was calculated as half of the total number of outcomes from both the exposure and comparator groups.

Sensitivity analyses

After further pruning of the outcome tree, the total node count scanned decreased by 14% from 30 555 to 26 288 (Table S3). The number of incident outcomes decreased by 24% after additional pruning (133 821 incident outcomes among SGLT2i initiators; 139 083 among DPP4i initiators). The analysis using this further pruned tree yielded a total of 12 statistical alerts (P ≤ .01; Table S7). All the inverse associations in this sensitivity analysis were also identified in the primary analysis using the original pruned tree (and in the same order), but most had lower P values, such as heart failure (P = .0167 [original pruned tree] vs P = .0134 [further pruned tree]). No additional inverse associations with P < 1 were identified using the further pruned tree.

When scanning additionally at level 2 of the original outcome tree, TBSS analysis yielded a total of 15 statistical alerts (P ≤ .01; Table S8). Notably, several level 2 outcome nodes were also identified as statistical alerts. As for the expected signals, CKD (N18) remained as one of the statistical alerts (P = .0027); heart failure (I50) remained slightly beyond the threshold for prioritization (P = .0333).

When restricting incident outcomes to diagnoses from inpatient admissions and emergency department presentations only, there were 29 773 incident outcomes among 5942 SGLT2i initiators and 34 001 incident outcomes among 6473 DPP4i initiators. The analysis conducted without including diagnoses from ambulatory care yielded a total of 5 statistical alerts (P ≤ .01; Table S9). Neither expected signal was included as a statistical alert (CKD, P = .9695; heart failure, P = .2922).

Discussion

This study demonstrated a novel implementation of TBSS to generate drug repurposing hypotheses. Our test case using the glucose-lowering drug class, SGLT2i, identified the 2 expected signals, CKD and heart failure, that align with indications newly approved for SGLT2i in recent years. Chronic kidney disease was identified as the most statistically significant alert (P = .0001); heart failure (P = .0167) fell just beyond the statistical alert threshold (P ≤ .01), which might be influenced by the specifications of the statistical alert threshold and outcome tree (discussed later), in addition to the number of events and the magnitude of the observed association. Furthermore, most of the statistical alerts could be related to clinical signs, symptoms, and abnormal laboratory results linked to heart failure and/or CKD, such as dyspnea, edema, and proteinuria.37,38 Statistical alerts pertaining to anemia are also suggestive of complications of heart failure and/or CKD.39,40 However, previous clinical studies have reported an association between SGLT2i use and improved hematocrit,41-43 which is supported by emerging evidence that SGLT2i may stimulate erythropoiesis independently of their diuretic effect.44-46

The multiplicity-adjusted P value was used as a metric to prioritize nodes to be evaluated as repurposing signals, similar to previous studies using TBSS to look for drug safety signals.5,10,12,47 The list of statistical alerts drew our attention to nodes where the reduced risk of outcomes associated with SGLT2i use was least likely to be due to chance. We used a conservative P value threshold of 0.01, but a standard significance level of 0.05 has been used in some TBSS studies for safety signals.10,48 The threshold for prioritization may also be further relaxed (eg, P ≤ .1) in underpowered studies, such as for rare diseases.48 It is important to note the arbitrary nature of prespecifying the statistical alert threshold for prioritization, and one should consider the trade-off between minimizing the false-positive rate and missing some “true” repurposing signals when using a more stringent threshold. If the significance level were relaxed to 0.05 in our study, there would be 6 additional statistical alerts, which would then have included heart failure. This finding also highlights the value of having clinicians review not only associations meeting the prespecified threshold for prioritization but also those falling slightly beyond the arbitrary threshold. In fact, a clinician might have been able to point out heart failure as a repurposing signal by piecing together a clinical picture based on the many clinical signs and symptoms of heart failure that were among the statistical alerts (eg, dyspnea and edema). Additionally, one could consider dispensing with a threshold for statistical alerting altogether and reviewing the list of associations (prioritized in order of increasing P value) with an emphasis on associations with lower P values. 
Although some studies have also prioritized associations based on their magnitude in addition to statistical significance,47 we did not do this because some collateral drug benefits may have a relatively small effect size but lead to substantial public health implications due to the prevalence of the condition. We also did not prioritize associations based on absolute effect size, as a small absolute effect size may still suggest important novel therapeutics for rare or orphan diseases. Lastly, the multiplicity adjustment employed in TBSS accounts for dependencies between associations evaluated and is more correct than traditional methods of accounting for multiple testing, such as Bonferroni corrections, which could have led to higher rates of false negatives.47
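The effect of the alert threshold on which nodes are prioritized can be illustrated with a few of the multiplicity-adjusted P values quoted in this article. The sketch below uses only a small subset of the reported associations (it is not the full result set), and the `prioritize` helper is a hypothetical name introduced for illustration.

```python
# Subset of multiplicity-adjusted P values quoted in this article
# (abstract, results table, and discussion); not the full set of results.
associations = {
    "N18 Chronic kidney disease": 0.0001,
    "R06.0 Dyspnea": 0.0055,
    "I50 Heart failure": 0.0167,
    "N13.2 Hydronephrosis with renal and ureteral calculous obstruction": 0.0713,
    "N19 Unspecified kidney failure": 0.0784,
}


def prioritize(p_values: dict, threshold: float) -> list:
    """Return node labels with P <= threshold, ordered by increasing P value."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    return [node for node, p in ranked if p <= threshold]


for threshold in (0.01, 0.05, 0.1):
    print(threshold, prioritize(associations, threshold))
```

In this subset, heart failure is absent at the 0.01 threshold and appears once the threshold is relaxed to 0.05, which mirrors the trade-off described above.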

This study demonstrated several other important considerations when designing drug repurposing studies using TBSS. Notably, the size of the outcome tree (ie, total number of nodes scanned) affects the ability to prioritize associations and identify repurposing signals. When using a “narrower” tree with a lower total number of nodes scanned, the maximum likelihood ratios generated from the 9999 simulated data sets decrease and have a narrower distribution across the smaller number of nodes. This effectively increases the probability that the likelihood ratios of the observed nodes rank higher, which leads to smaller P values for the same node and makes the node easier to prioritize. In our test case of SGLT2i, the P value for heart failure was slightly smaller after further pruning the outcome tree (P = .0167 before vs P = .0134 after; Table S7). Outcome trees may be pruned to remove outcomes that are of less interest or less informative, such as nonspecific signs and symptoms, or those potentially affected by incorrect temporality relative to exposure, for example, reverse causation for cancer outcomes. Going a step further, the entire tree could be restricted to a specific therapeutic area if a priori knowledge warrants it, which would address a different and more targeted research aim (eg, identifying repurposing signals of SGLT2i for cardiovascular outcomes). If we had restricted the outcome tree to only cardiovascular diseases (I00-I99; post hoc analysis), then the P value for heart failure would have decreased notably further and met the prespecified threshold for prioritization (P = .0167 before vs P = .0018 after; see Table S10).
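The link between tree size and P values can be sketched with a toy Monte Carlo simulation. This is a simplified illustration under stated assumptions, not the TreeScan procedure: it scans leaf nodes only (no hierarchy), uses a Bernoulli log-likelihood ratio with a null probability of 0.5, and all node sizes and counts are hypothetical. The P value is the rank of the observed statistic among the simulated maxima, divided by the number of simulations plus 1, so with 9999 simulated data sets the smallest attainable P value is .0001.

```python
import math
import random


def llr(exposed: int, total: int) -> float:
    """Bernoulli (p = 0.5) log-likelihood ratio for an inverse association."""
    if exposed * 2 >= total:
        return 0.0
    other = total - exposed
    term = 0.0 if exposed == 0 else exposed * math.log(exposed / total)
    return term + other * math.log(other / total) + total * math.log(2)


def simulate_node_llrs(node_totals, n_sims, seed=0):
    """For each null replicate, draw Binomial(total, 0.5) per node; return LLRs."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        # The popcount of `total` random bits is a Binomial(total, 0.5) draw.
        sims.append([llr(bin(rng.getrandbits(t)).count("1"), t) for t in node_totals])
    return sims


def monte_carlo_p(observed_llr, sims, n_nodes):
    """Rank of the observed LLR among the null maxima over the first n_nodes."""
    maxima = [max(rep[:n_nodes]) for rep in sims]
    rank = 1 + sum(m >= observed_llr for m in maxima)
    return rank / (len(sims) + 1)


node_totals = [200] * 60   # hypothetical tree: 60 leaves, 200 outcomes each
sims = simulate_node_llrs(node_totals, n_sims=999)
observed = llr(80, 200)    # hypothetical node: 80 of 200 outcomes among exposed
p_wide = monte_carlo_p(observed, sims, n_nodes=60)
p_narrow = monte_carlo_p(observed, sims, n_nodes=15)
print(p_wide, p_narrow)
```

Because the narrower tree here is a subset of the wider one and the same simulated replicates are reused, each simulated maximum over the narrow tree cannot exceed the maximum over the wide tree. The P value for the same observed node therefore cannot be larger under the narrow tree.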

The hierarchical grouping of diagnoses within the outcome tree influences the identification of potential repurposing signals using TBSS. Clinical conditions whose parent diagnoses and related diagnoses fall within the same branch, or spatially close to each other within the tree, will have larger aggregated sample sizes at the nodes, translating to greater power to detect potential signals. In our study, dyspnea (R06.0; P = .0055) was identified as a more likely cut than heart failure (I50; P = .0167) due to the substantially larger sample size of outcomes (3177 vs 846). It is possible that heart failure could have been identified as a more likely node (ie, a lower P value) if the increased occurrence of dyspnea, a common yet spatially distant (within the tree) clinical presentation of heart failure, could be considered. However, it is important to acknowledge that nonspecific symptoms, such as dyspnea, can suggest a myriad of other parent diagnoses, such as obstructive respiratory diseases. Furthermore, parent diagnoses of interest may be grouped at higher hierarchical levels of the tree, and scanning at higher levels of the tree may affect the associations prioritized. The statistical significance of the scan statistics for liver diseases increased when scanning additionally at level 2 of the outcome tree (K70-K77; P = .0452; Table S8). Indeed, previous clinical studies have reported reduced hepatic fibrosis and steatosis from SGLT2i use.49,50 This finding has been attributed to various pharmacological effects, including a reduction in oxidative stress and inflammation.51,52 However, it is important to note that scanning at a higher level of the tree requires a more stringent incidence criterion (ie, defining the incidence of outcomes at the highest level of the tree scanned). Similar to reducing the size of the outcome tree via pruning, scanning across fewer levels (eg, at level 3 only) will theoretically increase statistical power. 
However, one should consider the trade-off between power and the possibility to detect potentially important associations at a finer-grained level, which may be more useful for drug repurposing. Lastly, future studies may also customize or construct a bespoke outcome tree with an enhanced grouping of diagnoses.
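The role of aggregated sample size can be illustrated by holding the share of outcomes in the exposed group fixed and varying the node size. The sketch below assumes a Bernoulli log-likelihood ratio with a null probability of 0.5. The node totals are the outcome counts quoted above for heart failure and dyspnea, whereas the exposed share of 0.46 is a hypothetical value chosen for illustration and is not a study result.

```python
import math


def llr_at_share(total: int, exposed_share: float) -> float:
    """Bernoulli (p = 0.5) LLR when `exposed_share` of outcomes are in the exposed group."""
    if exposed_share >= 0.5:
        return 0.0
    q = exposed_share
    first = q * math.log(q) if q > 0 else 0.0
    per_outcome = first + (1 - q) * math.log(1 - q) + math.log(2)
    return total * per_outcome


share = 0.46  # hypothetical share of outcomes among exposed; not a study result
for label, total in [("I50 heart failure", 846), ("R06.0 dyspnea", 3177)]:
    print(label, total, round(llr_at_share(total, share), 2))
```

For a fixed share, this statistic is proportional to the number of outcomes at the node. A node with roughly 3.8 times as many outcomes therefore yields a proportionally larger log-likelihood ratio for the same relative difference between groups.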

Another consideration is the clinical settings from which the outcome data are sourced, as these influence the overall number of outcomes, as well as the prevalence and distribution of recorded diagnoses. In our study, using diagnoses from inpatient admissions, emergency department presentations, and ambulatory care conferred an approximately 7-fold larger number of incident outcomes (n = 438 561) compared to using diagnoses from inpatient admissions and emergency department presentations only (n = 63 774). Moreover, the total number of incident outcomes was slightly more balanced between SGLT2i initiators and DPP4i initiators when considering diagnoses from all 3 clinical settings (Figure 2). This balance may suggest better exchangeability between the analytical cohorts, as people have a comparable number of incident outcomes diagnosed during follow-up regardless of the study drug received.53 Inpatient admissions and emergency department presentations may better capture acute medical events or diseases of greater severity, while ambulatory care may provide a more comprehensive record of subacute medical conditions or milder stages of diseases. Therefore, including diagnoses from all clinical settings may provide a more complete picture of clinical outcomes. For example, in our study, neither expected signal was identified as a statistical alert when restricting the analysis to diagnoses from only inpatient admissions and emergency department presentations (heart failure, P = .2922; CKD, P = .9695; Table S9).

Our study had some notable strengths. First, we used data from a large set of linked administrative databases capturing longitudinal records of healthcare utilization and outcomes from hospital and ambulatory settings, which provided a large cohort size and a large number of events across the hierarchical outcome tree, especially at finer-grained levels. Second, we used an example drug class where recently approved indications for these medications could serve as positive controls (ie, expected repurposing signals) to evaluate the utility of the methodology.

However, our study had several limitations. First, there may have been residual confounding in the observed associations since granular clinical characteristics (eg, renal function and glycaemic control) were not available in the data. Furthermore, SGLT2i was initially contraindicated in individuals with poor renal function, and there was an early perception of less benefit with SGLT2i use in people with impaired renal function.54 This might have introduced some confounding by indication in the observed inverse association between SGLT2i use and renal-related outcomes. Although we accounted for a general list of common confounders across all the outcomes assessed, prioritized repurposing signals would need to be further scrutinized and validated in follow-up pharmacoepidemiologic studies with more tailored confounding control specific to the drug-outcome pair, or in a randomized trial. Second, signals identified using TBSS could theoretically suggest potential safety signals of the comparator drug instead of potential repurposing signals of the exposure drug. This concern can be mitigated by excluding signals for known adverse effects of the comparator drug when evaluating the prioritized nodes. Third, we used a 1-year lookback period to ascertain baseline comorbidities, which may have resulted in under-ascertainment. A longer lookback period could have improved the sensitivity of capturing chronic comorbidities but would have limited the sample size and representativeness of the study population. Fourth, we censored both individuals in a matched pair if either person met a censoring criterion. This design was required for the propensity score-matched TBSS approach, but it reduced the number of events and hence the power of the analyses. Fifth, we did not use a baseline washout period for outcomes, which means prevalent health outcomes, especially chronic diseases, may have been included. 
Sixth, MarketScan data have not included death data since 2016 for patient privacy.55 Hence, censoring due to deaths might not be complete. Last, while MarketScan data are nationally representative of individuals in the United States with employer-sponsored insurance, who account for a significant portion (~65%) of the population,56 it is possible that these data may not be generalizable to individuals with public insurance, such as Medicare and Medicaid services, as well as those who are uninsured.

Conclusion

In our case study using the class of SGLT2i drugs, TBSS was able to identify expected repurposing signals representing new additional indications recently approved for this drug class. Several potential repurposing signals, such as for anemia and liver disease, were detected and should be further investigated. There are several important considerations when conducting TBSS for drug repurposing, including the statistical threshold used to prioritize associations, specification of the outcome tree, and clinical settings used to capture outcomes. Future studies could apply this methodology to other drugs of interest to generate repurposing hypotheses from RWD.

Author contributions

G.S.Q.T. contributed to the design of the study, performed the statistical analysis and literature search, and wrote and revised the manuscript. X.L., J.W., J.C.M., S.V.W., J.I.M., and J.I. contributed to the design of the study and revision of the manuscript. S.T. contributed to the acquisition of data, design of the study, and revision of the manuscript. G.S.Q.T. is the guarantor of this work and, as such, had full access to all the data in the study, and takes responsibility for the integrity of the data and the accuracy of the data analyses.

Supplementary Material

Web_Material_kwae355
web_material_kwae355.zip (185.4KB, zip)

Acknowledgments

We thank Jenice Ko from Harvard Pilgrim Health Care Institute for her assistance with the Sentinel Routine Query Modules, and Dr. Thuy Thai from Harvard Pilgrim Health Care Institute for her help with using and interpreting results from the TreeScan software.

Contributor Information

George S Q Tan, Centre for Medicine Use and Safety, Faculty of Pharmacy and Pharmaceutical Sciences, Monash University, Parkville, Australia; Baker Heart and Diabetes Institute, Melbourne, Australia.

Judith C Maro, Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, United States.

Shirley V Wang, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States.

Sengwee Toh, Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, United States; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, United States.

Jedidiah I Morton, Centre for Medicine Use and Safety, Faculty of Pharmacy and Pharmaceutical Sciences, Monash University, Parkville, Australia; Baker Heart and Diabetes Institute, Melbourne, Australia.

Jenni Ilomäki, Centre for Medicine Use and Safety, Faculty of Pharmacy and Pharmaceutical Sciences, Monash University, Parkville, Australia.

Jenna Wong, Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, United States.

Xiaojuan Li, Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, United States.

Supplementary material

Supplementary material is available at American Journal of Epidemiology online.

Funding

G.S.Q.T. was supported by the Monash Graduate Scholarship and the Enhanced Research Experience program, Monash University, Australia. J.C.M. received support from the Harvard Pilgrim Health Care Institute Robert H. Ebert Career Development Award. X.L. received support from grant K01AG073651 from the National Institute on Aging.

Conflict of interest

S.V.W. has consulted for Veracity Healthcare Analytics, Exponent Inc, and MITRE, an FFRDC for the Centers for Medicare and Medicaid Services, for unrelated work. S.T. consults for Pfizer, Inc. and TriNetX, LLC. for unrelated work. J.I. has received funding from AstraZeneca, PLC., and Amgen, Inc. for unrelated work.

Data availability

The MarketScan data that support the findings of this study are available from Merative and were licensed for use by Harvard Pilgrim Health Care Institute. Restrictions apply to the availability of these data, and so they are not publicly available. Results are, however, available from the authors upon reasonable request and according to the data-use agreement. The computing codes were from Sentinel Routine Query Modules (version 12.1.2), namely the Cohort Identification and Descriptive Analysis, Propensity Score Analysis, and Signal Identification modules.

References

  • 1. Baker NC, Ekins S, Williams AJ. A bibliometric review of drug repurposing. Drug Discov Today. 2018;23(3):661-672. doi: 10.1016/j.drudis.2018.01.018
  • 2. Parvathaneni V, Kulkarni NS, Muth A. Drug repurposing: a promising tool to accelerate the drug discovery process. Drug Discov Today. 2019;24(10):2076-2085. doi: 10.1016/j.drudis.2019.06.014
  • 3. Pushpakom S, Iorio F, Eyers PA, et al. Drug repurposing: progress, challenges and recommendations. Nat Rev Drug Discov. 2019;18(1):41-58. doi: 10.1038/nrd.2018.168
  • 4. Roy S, Dhaneshwar S, Bhasin B. Drug repurposing: an emerging tool for drug reuse, recycling and discovery. Curr Drug Res Rev. 2021;13(2):101-119. doi: 10.2174/2589977513666210211163711
  • 5. Wang SV, Maro JC, Gagne JJ, et al. A general propensity score for signal identification using tree-based scan statistics. Am J Epidemiol. 2021;190(7):1424-1433. doi: 10.1093/aje/kwab034
  • 6. Tan GSQ, Sloan EK, Lambert P, et al. Drug repurposing using real-world data. Drug Discov Today. 2023;28(1):103422. doi: 10.1016/j.drudis.2022.103422
  • 7. Liu F, Panagiotakos D. Real-world data: a brief review of the methods, applications, challenges and opportunities. BMC Med Res Methodol. 2022;22(1):287. doi: 10.1186/s12874-022-01768-6
  • 8. Brown JS, Maro JC, Nguyen M. Using and improving distributed data networks to generate actionable evidence: the case of real-world outcomes in the Food and Drug Administration's sentinel system. J Am Med Inform Assoc. 2020;27(5):793-797. doi: 10.1093/jamia/ocaa028
  • 9. Kulldorff M, Fang Z, Walsh SJ. A tree-based scan statistic for database disease surveillance. Biometrics. 2003;59(2):323-331. doi: 10.1111/1541-0420.00039
  • 10. Kulldorff M, Dashevsky I, Avery TR, et al. Drug safety data mining with a tree-based scan statistic. Pharmacoepidemiol Drug Saf. 2013;22(5):517-523. doi: 10.1002/pds.3423
  • 11. Yih WK, Kulldorff M, Dashevsky I. Using the self-controlled tree-temporal scan statistic to assess the safety of live attenuated herpes zoster vaccine. Am J Epidemiol. 2019;188(7):1383-1388. doi: 10.1093/aje/kwz104
  • 12. Wang SV, Maro JC, Baro E, et al. Data Mining for Adverse Drug Events with a propensity score-matched tree-based scan statistic. Epidemiology. 2018;29(6):895-903. doi: 10.1097/EDE.0000000000000907
  • 13. Yih WK, Daley MF, Duffy J, et al. A broad assessment of covid-19 vaccine safety using tree-based data-mining in the vaccine safety datalink. Vaccine. 2023;41(3):826-835. doi: 10.1016/j.vaccine.2022.12.026
  • 14. McGuire DK, Shih WJ, Cosentino F, et al. Association of SGLT2 inhibitors with cardiovascular and kidney outcomes in patients with type 2 diabetes: a meta-analysis. JAMA Cardiol. 2021;6(2):148-158. doi: 10.1001/jamacardio.2020.4511
  • 15. Nuffield Department of Population Health Renal Studies Group; SGLT2 inhibitor Meta-Analysis Cardio-Renal Trialists' Consortium. Impact of diabetes on the effects of sodium glucose co-transporter-2 inhibitors on kidney outcomes: collaborative meta-analysis of large placebo-controlled trials. Lancet. 2022;400(10365):1788-1801. doi: 10.1016/S0140-6736(22)02074-8
  • 16. Zelniker TA, Wiviott SD, Raz I, et al. SGLT2 inhibitors for primary and secondary prevention of cardiovascular and renal outcomes in type 2 diabetes: a systematic review and meta-analysis of cardiovascular outcome trials. Lancet. 2019;393(10166):31-39. doi: 10.1016/S0140-6736(18)32590-X
  • 17. Fadiran O, Nwabuo C. The evolution of sodium-glucose Co-Transporter-2 inhibitors in heart failure. Cureus. 2021;13(11):e19379. doi: 10.7759/cureus.19379
  • 18. Heerspink HJL, Stefansson BV, Correa-Rotter R, et al. Dapagliflozin in patients with chronic kidney disease. N Engl J Med. 2020;383(15):1436-1446. doi: 10.1056/NEJMoa2024816
  • 19. Giorgino F, Vora J, Fenici P. Renoprotection with SGLT2 inhibitors in type 2 diabetes over a spectrum of cardiovascular and renal risk. Cardiovasc Diabetol. 2020;19(1):196. doi: 10.1186/s12933-020-01163-9
  • 20. Kaneto H, Obata A, Kimura T, et al. Unexpected pleiotropic effects of SGLT2 inhibitors: pearls and pitfalls of this novel antidiabetic class. Int J Mol Sci. 2021;22(6):3062. doi: 10.3390/ijms22063062
  • 21. Butler AM, Nickel KB, Overman RA, et al. IBM MarketScan Research Databases. In: Sturkenboom M, Schink T, eds. Databases for Pharmacoepidemiological Research. Cham: Springer International Publishing; 2021:243-251.
  • 22. Kulaylat AS, Schaefer EW, Messaris E. Truven health analytics MarketScan databases for clinical research in colon and rectal surgery. Clin Colon Rectal Surg. 2019;32(1):54-60. doi: 10.1055/s-0038-1673354
  • 23. ElSayed NA, Aleppo G, Aroda VR, et al. 9. Pharmacologic approaches to glycemic treatment: standards of Care in Diabetes-2023. Diabetes Care. 2023;46(Suppl 1):S140-S157. doi: 10.2337/dc23-S009
  • 24. Kosiborod M, Cavender MA, Fu AZ, et al. Lower risk of heart failure and death in patients initiated on sodium-glucose Cotransporter-2 inhibitors versus other glucose-lowering drugs: the CVD-REAL study (comparative effectiveness of cardiovascular outcomes in new users of sodium-glucose Cotransporter-2 inhibitors). Circulation. 2017;136(3):249-259. doi: 10.1161/CIRCULATIONAHA.117.029190
  • 25. Huang W, Whitelaw J, Kishore K, et al. The comparative epidemiology and outcomes of hospitalized patients treated with SGLT2 or DPP4 inhibitors. J Diabetes Complications. 2021;35(12):108052. doi: 10.1016/j.jdiacomp.2021.108052
  • 26. D'Andrea E, Wexler DJ, Kim SC, et al. Comparing effectiveness and safety of SGLT2 inhibitors vs DPP-4 inhibitors in patients with type 2 diabetes and varying baseline HbA1c levels. JAMA Intern Med. 2023;183(3):242-254. doi: 10.1001/jamainternmed.2022.6664
  • 27. Tan GSQ, Morton JI, Wood S, et al. SGLT-2 inhibitor use and cause-specific hospitalization rates: an outcome-wide study to identify novel associations of SGLT-2 inhibitors. Clin Pharmacol Ther. 2024;115(6):1304-1315. doi: 10.1002/cpt.3194
  • 28. McMurray JJV, Solomon SD, Inzucchi SE, et al. Dapagliflozin in patients with heart failure and reduced ejection fraction. N Engl J Med. 2019;381(21):1995-2008. doi: 10.1056/NEJMoa1911303
  • 29. Sakshaug JW, Weir DR, Nicholas LH. Identifying diabetics in Medicare claims and survey data: implications for health services research. BMC Health Serv Res. 2014;14(1):150. doi: 10.1186/1472-6963-14-150
  • 30. Schneeweiss S, Rassen JA, Brown JS, et al. Graphical depiction of longitudinal study designs in health care databases. Ann Intern Med. 2019;170(6):398-406. doi: 10.7326/M18-3079
  • 31. Gagne JJ, Glynn RJ, Avorn J, et al. A combined comorbidity score predicted mortality in elderly patients better than existing scores. J Clin Epidemiol. 2011;64(7):749-759. doi: 10.1016/j.jclinepi.2010.10.004
  • 32. Sun JW, Rogers JR, Her Q, et al. Adaptation and validation of the combined comorbidity score for ICD-10-CM. Med Care. 2017;55(12):1046-1051. doi: 10.1097/MLR.0000000000000824
  • 33. Chang HY, Weiner JP, Richards TM, et al. Validating the adapted diabetes complications severity index in claims data. Am J Manag Care. 2012;18(11):721-726.
  • 34. Ranganathan P, Pramesh CS, Buyse M. Common pitfalls in statistical analysis: the perils of multiple testing. Perspect Clin Res. 2016;7(2):106-107. doi: 10.4103/2229-3485.179436
  • 35. Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature. 2019;567(7748):305-307. doi: 10.1038/d41586-019-00857-9
  • 36. Pottegard A, Friis S, Sturmer T, et al. Considerations for Pharmacoepidemiological studies of drug-cancer associations. Basic Clin Pharmacol Toxicol. 2018;122(5):451-459. doi: 10.1111/bcpt.12946
  • 37. Watson RD, Gibbs CR, Lip GY. ABC of heart failure. Clinical features and complications. BMJ. 2000;320(7229):236-239. doi: 10.1136/bmj.320.7229.236
  • 38. Webster AC, Nagler EV, Morton RL. Chronic kidney disease. The Lancet. 2017;389(10075):1238-1252. doi: 10.1016/S0140-6736(16)32064-5
  • 39. Silverberg DS, Wexler D, Blum M, et al. The use of subcutaneous erythropoietin and intravenous iron for the treatment of the anemia of severe, resistant congestive heart failure improves cardiac and renal function and functional cardiac class, and markedly reduces hospitalizations. J Am Coll Cardiol. 2000;35(7):1737-1744. doi: 10.1016/S0735-1097(00)00613-6
  • 40. Portoles J, Martin L, Broseta JJ, et al. Anemia in chronic kidney disease: from pathophysiology and current treatments, to future agents. Front Med (Lausanne). 2021;8:642296. doi: 10.3389/fmed.2021.642296
  • 41. Docherty KF, Curtain JP, Anand IS, et al. Effect of dapagliflozin on anaemia in DAPA-HF. Eur J Heart Fail. 2021;23(4):617-628. doi: 10.1002/ejhf.2132
  • 42. Murashima M, Tanaka T, Kasugai T, et al. Sodium-glucose cotransporter 2 inhibitors and anemia among diabetes patients in real clinical practice. J Diabetes Investig. 2022;13(4):638-646. doi: 10.1111/jdi.13717
  • 43. Maruyama T, Takashima H, Oguma H, et al. Canagliflozin improves erythropoiesis in diabetes patients with anemia of chronic kidney disease. Diabetes Technol Ther. 2019;21(12):713-720. doi: 10.1089/dia.2019.0212
  • 44. Ghanim H, Abuaysheh S, Hejna J, et al. Dapagliflozin suppresses hepcidin and increases erythropoiesis. J Clin Endocrinol Metab. 2020;105(4):e1056-e1063. doi: 10.1210/clinem/dgaa057
  • 45. Marathias KP, Lambadiari VA, Markakis KP, et al. Competing effects of renin angiotensin system blockade and sodium-glucose Cotransporter-2 inhibitors on erythropoietin secretion in diabetes. Am J Nephrol. 2020;51(5):349-356. doi: 10.1159/000507272
  • 46. Osonoi T, Shirabe S, Saito M, et al. Dapagliflozin improves erythropoiesis and iron metabolism in type 2 diabetic patients with renal anemia. Diabetes Metab Syndr Obes. 2023;16:1799-1808. doi: 10.2147/DMSO.S411504
  • 47. Wang SV, Kulldorff M, Poor S, et al. Screening medications for association with progression to wet age-related macular degeneration. Ophthalmology. 2021;128(2):248-255. doi: 10.1016/j.ophtha.2020.08.004
  • 48. Suarez EA, Nguyen M, Zhang D, et al. Novel methods for pregnancy drug safety surveillance in the FDA sentinel system. Pharmacoepidemiol Drug Saf. 2023;32(2):126-136. doi: 10.1002/pds.5512
  • 49. Zhou P, Tan Y, Hao Z, et al. Effects of SGLT2 inhibitors on hepatic fibrosis and steatosis: a systematic review and meta-analysis. Front Endocrinol (Lausanne). 2023;14:1144838. doi: 10.3389/fendo.2023.1144838
  • 50. Hsiang JC, Wong VW. SGLT2 inhibitors in liver patients. Clin Gastroenterol Hepatol. 2020;18(10):2168-2172.e2. doi: 10.1016/j.cgh.2020.05.021
  • 51. Androutsakos T, Nasiri-Ansari N, Bakasis AD, et al. SGLT-2 inhibitors in NAFLD: expanding their role beyond diabetes and Cardioprotection. Int J Mol Sci. 2022;23(6). doi: 10.3390/ijms23063107
  • 52. Miyamoto Y, Honda A, Yokose S, et al. The effects of SGLT2 inhibitors on liver cirrhosis patients with refractory ascites: a literature review. J Clin Med. 2023;12(6):2253. doi: 10.3390/jcm12062253
  • 53. Grimes DA, Schulz KF. Bias and causal associations in observational research. Lancet. 2002;359(9302):248-252. doi: 10.1016/S0140-6736(02)07451-2
  • 54. Davidson JA. SGLT2 inhibitors in patients with type 2 diabetes and renal disease: overview of current evidence. Postgrad Med. 2019;131(4):251-260. doi: 10.1080/00325481.2019.1601404
  • 55. Xie F, Beukelman T, Sun D, et al. Identifying inpatient mortality in MarketScan claims data using machine learning. Pharmacoepidemiol Drug Saf. 2023;32(11):1299-1305. doi: 10.1002/pds.5658
  • 56. Bundorf MK, Gupta S, Kim C. Trends in US health insurance coverage during the COVID-19 pandemic. JAMA Health Forum. 2021;2(9):e212487. doi: 10.1001/jamahealthforum.2021.2487


Articles from American Journal of Epidemiology are provided here courtesy of Oxford University Press
