Validation of a Risk Score for Cancer-Associated Thrombosis Using Nationwide EHR Data

Ang Li; Omid Jafari; Barbara D Lam; Jun Y Jiang; Rock Bum Kim; Shengling Ma; Emily Zhou; Joyce W Tiong; Elizabeth C Chiang; Justine Ryu; Christopher I Amos; Jennifer La; Nathanael R Fillmore

doi:10.1001/jamanetworkopen.2025.44428

. 2025 Nov 25;8(11):e2544428. doi: 10.1001/jamanetworkopen.2025.44428

Ang Li ¹,✉, Omid Jafari ¹, Barbara D Lam ², Jun Y Jiang ¹, Rock Bum Kim ¹, Shengling Ma ¹, Emily Zhou ³, Joyce W Tiong ⁴, Elizabeth C Chiang ⁴, Justine Ryu ⁵, Christopher I Amos ⁶, Jennifer La ⁷,⁸, Nathanael R Fillmore ⁷,⁸

¹Section of Hematology-Oncology, Baylor College of Medicine, Houston, Texas

²Division of Hematology & Oncology, Fred Hutch Cancer Center, University of Washington, Seattle

³McGovern Medical School, University of Texas Health Science Center at Houston

⁴School of Medicine, Baylor College of Medicine, Houston, Texas

⁵Department of Medicine, Section of Hematology, Yale School of Medicine, New Haven, Connecticut

⁶Population Sciences and Cancer Control, University of New Mexico, Albuquerque

⁷Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston, Massachusetts

⁸VA Boston Healthcare System, Harvard Medical School, Boston, Massachusetts

Accepted for Publication: September 26, 2025.

Published: November 25, 2025. doi:10.1001/jamanetworkopen.2025.44428

✉ Corresponding Author: Ang Li, MD, MS, Section of Hematology-Oncology, Baylor College of Medicine, One Baylor Plaza, 011DF, Houston, TX 77030 (ang.li2@bcm.edu).

Author Contributions: Drs Li and Jafari had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Li, Ma, La.

Acquisition, analysis, or interpretation of data: Li, Jafari, Jiang, Lam, Kim, Ma, Zhou, Tiong, Chiang, Ryu, Amos, Fillmore.

Drafting of the manuscript: Li, Lam, Amos, Fillmore.

Critical review of the manuscript for important intellectual content: Li, Jafari, Jiang, Lam, Kim, Ma, Zhou, Tiong, Chiang, Ryu, Amos, La.

Statistical analysis: Li, Jafari, Kim, Ma.

Obtained funding: Li.

Administrative, technical, or material support: Li, Jafari, Tiong, Chiang, Amos, La, Fillmore.

Supervision: Li.

Conflict of Interest Disclosures: Dr Li, a Cancer Prevention and Research Institute of Texas (CPRIT) Scholar in Cancer Research, was supported by CPRIT (No. RR190104); the National Institutes of Health (NIH)’s National Heart, Lung, and Blood Institute (No. K23 HL159271); the NIH’s Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity (AIM-AHEAD) (No. 3OT2OD032581-01S1); American Society of Hematology (ASH) Scholar Award; and Conquer Cancer (ASCO) Career Development Award, all outside of the current work. Dr Jiang was supported by CPRIT (No. RP210027), via the Baylor College of Medicine Comprehensive Cancer Training Program, outside of the current work. Drs La and Fillmore were supported by American Heart Association (No. 857078) outside of the current work. Dr Fillmore reported grants from VA Cooperative Studies Program and grants from American Heart Association during the conduct of the study; he reported grants from Bayer and grants from Merck outside the submitted work. No other disclosures were reported.

Data Sharing Statement: See Supplement 2.

Additional Contributions: We would like to acknowledge and thank the Epic Cosmos team for creating, maintaining, and curating the Cosmos database used in the current study.

PMCID: PMC12648341 PMID: 41288979

Key Points

Question

Can routinely collected electronic health record (EHR) data from diverse health systems be used to model cancer-associated thrombosis (CAT) risk?

Findings

In this prognostic study using a retrospective cohort of 732 594 patients with cancer receiving systemic therapy between 2018 and 2023 from 184 health systems, the EHR-CAT score outperformed the benchmark Khorana score (C statistic, 0.70 vs 0.63) and reclassified 20% of patients into risk categories that better matched the observed thrombosis risk. The model had consistent calibration across demographic subgroups, health system sites, and cohorts stratified by bleeding risk.

Meaning

These results suggest that standardized structured EHR data from different health systems can support scalable validation and implementation of CAT risk assessment.

This prognostic study of US patients with cancer receiving systemic therapy validates an electronic health record–based risk prediction model for mortality and morbidity associated with venous thromboembolism.

Abstract

Importance

Venous thromboembolism (VTE) is associated with increased mortality and morbidity in patients with cancer. Existing risk prediction models are typically validated within individual sites, a fragmented approach that limits clinical adoption.

Objective

To validate the electronic health record cancer-associated thrombosis (EHR-CAT) score compared with the benchmark Khorana score in a contemporary cohort of patients with cancer across the nation, before and after excluding those at high risk of bleeding.

Design, Setting, and Participants

This prognostic study included patients in a nationwide longitudinal EHR database from January 2018 to December 2023 with follow-up continuing to April 2025. Patients with newly diagnosed, invasive, solid, or hematologic malignant neoplasms (defined using validated International Statistical Classification of Diseases, Tenth Revision, Clinical Modification [ICD-10-CM] algorithms) receiving systemic therapy (defined using the first antineoplastic medication) were included. Those with recent history of acute VTE diagnosis or anticoagulant prescription were excluded.

Exposures

Demographics, risk model variables, and common anticoagulant trial exclusion criteria (as a proxy for identifying people at high risk of bleeding) were extracted on or before index therapy initiation date.

Main Outcomes

Incident VTE and bleeding outcomes at 6 months were defined using validated ICD-10-CM algorithms.

Results

A total of 732 594 patients (median [IQR] age, 65.0 [56.9-73.0] years; 425 124 female [58.0%]; 25 634 Asian [3.5%], 94 269 Black [12.9%], 48 266 Hispanic [6.6%], 583 047 White [79.6%]) with active cancer receiving systemic therapy between 2018 and 2023 from 184 health systems were identified. With a median (IQR) follow-up of 676 (340-1151) days, the incidence of 6-month VTE, bleeding, and mortality was 4.7% (34 499 patients), 3.7% (26 993 patients), and 8.4% (60 239 patients), respectively. Bleeding risk was higher in the 26.0% of patients (190 413) meeting anticoagulant trial exclusion criteria (7.2% vs 2.4%; hazard ratio, 2.55 [95% CI, 2.51-2.59]). The EHR-CAT score stratified patients into discriminative risk groups (C statistic, 0.70-0.71) both before and after exclusion for bleeding risk. When compared with the benchmark Khorana score (C statistic, 0.63), EHR-CAT reclassified 20% of patients into revised categories with improved prediction accuracy. Furthermore, EHR-CAT had consistent calibration in subgroups by age, sex, race, ethnicity, and individual health system sites.

Conclusions

This prognostic study of the EHR-CAT risk score demonstrated the external validity and feasibility of using readily available structured EHR data to estimate VTE risk in patients with cancer.

Introduction

Cancer-associated venous thromboembolism (VTE), particularly pulmonary embolism and lower extremity deep vein thrombosis (LE-DVT), is associated with increased risk of death and complications in patients with active cancer.^1,2 Various society guidelines suggest low-dose thromboprophylaxis in patients with cancer who are deemed high risk for thrombosis by a validated risk assessment tool.^3,4 The Khorana score, initially derived in 2008, has been the most cited benchmark model,⁵ but the proportion of VTE cases captured in the high-risk group varies between 23% and 55%.⁶ Newer models, including our own electronic health record cancer-associated thrombosis (EHR-CAT) risk score,⁷ have incorporated more clinical covariates to improve model accuracy and discrimination, albeit at the potential cost of increased complexity. Currently, most oncology clinicians do not perform VTE risk assessments or prescribe thromboprophylaxis to their high-risk patients.^8,9

The implementation of VTE risk assessment and prevention in cancer has been hindered by several challenges. The variables required to calculate risk scores are not consistently available or standardized across electronic health record (EHR) systems, making wide-scale external validation and adoption difficult. Bleeding risk is also a critical consideration for patients with cancer, but there are no clear criteria for identifying high-risk patients. These challenges are especially pronounced in the US, where fragmented EHR systems limit the assessment of risk models as well as the ability to accurately track national patterns in cancer-related complications.

In this study, we used a contemporary, longitudinal EHR database that incorporated data from numerous health systems and defined a nationwide cohort of patients with active cancer receiving systemic therapy over a 6-year period. Using objective and readily available structured EHR data, we extracted baseline covariates for the VTE risk scores and identified patient characteristics aligned with anticoagulant trial exclusion criteria as a proxy for bleeding risk. We evaluated the feasibility and performance of the VTE risk scores before and after excluding patients at high risk for bleeding.

Methods

Data Source

We used retrospective data collected by Epic Cosmos, a dataset created in collaboration with a community of Epic EHR health systems representing more than 298 million patients from over 1711 hospitals and 39 900 clinics nationwide.²¹ Linked and deidentified data were made available in the Expertly Determined De-Identified (EDDI) data set. The current analysis was performed using the April 9, 2025, EDDI data refresh and included patients with active cancer receiving systemic therapy from January 2018 through December 2023 with a lookback window to January 1, 2017, and a follow-up truncation on April 1, 2025.

The study was deemed non–human participants research by the institutional review board at Baylor College of Medicine. The analytic plan and reporting followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline checklist for prediction model validation.

Participants

Full details are in eMethods and eTables 1 through 3 in Supplement 1. Briefly, health systems were required to be US based, to be validated for data completeness, and to have a minimum frequency of face-to-face encounters in hematology and oncology departments, ensuring that each organization contributed complete, continuous, and cancer-relevant EHR data. From each eligible site, newly diagnosed cancers were defined using a validated International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) coding algorithm.⁷ Cohort inclusion criteria were adult age and an incident cancer diagnosis with at least 1 new systemic therapy. Cohort exclusion criteria were an acute VTE diagnosis or an active anticoagulant prescription in the 12 months before the index date, so that the cohort did not include patients with a recent outcome event or those receiving anticoagulant prophylaxis. A subcohort that excluded patients at risk for bleeding, based on anticoagulation clinical trial exclusion criteria, was created for sensitivity analysis.^11,12 These criteria included acute leukemia, primary or metastatic brain tumor, recent history of bleeding, platelet count less than 50 × 10³/μL, alanine aminotransferase levels more than 5 times the upper limit, bilirubin more than 2 times the upper limit, glomerular filtration rate (GFR) below 30 mL/min/1.73 m², weight less than 40 kg, anticoagulants, nonaspirin antiplatelet drugs, or drugs with strong CYP3A4 interactions.
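The bleeding-risk proxy described above can be sketched as a simple rule set. The following Python sketch is illustrative only and is not the study's code (the analysis was performed in R). The numeric cutoffs are the ones listed in Table 1 (platelet count <50 × 10³/μL, alanine transaminase ≥260 units/L, total bilirubin ≥2.4 mg/dL, eGFR <30 mL/min/1.73 m², weight <40 kg). The class name, field names, and the choice to treat a missing laboratory value as not meeting a criterion are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BaselineRecord:
    """Hypothetical container for baseline variables; field names are illustrative."""
    acute_leukemia: bool = False              # AML or ALL
    brain_tumor: bool = False                 # primary or metastatic brain tumor
    bleeding_last_year: bool = False
    prior_anticoagulant: bool = False         # stopped before the index date
    nonaspirin_antiplatelet: bool = False
    strong_cyp3a4_drug: bool = False
    platelet_k_per_ul: Optional[float] = None  # 10^3/uL
    alt_u_per_l: Optional[float] = None
    bilirubin_mg_dl: Optional[float] = None
    egfr: Optional[float] = None               # mL/min/1.73 m^2
    weight_kg: Optional[float] = None


def _below(value: Optional[float], cutoff: float) -> bool:
    # Assumption: a missing value does not trigger the criterion.
    return value is not None and value < cutoff


def _at_or_above(value: Optional[float], cutoff: float) -> bool:
    return value is not None and value >= cutoff


def bleeding_exclusion_reasons(r: BaselineRecord) -> List[str]:
    """Return the names of the trial exclusion criteria that a record meets."""
    checks = [
        ("acute leukemia", r.acute_leukemia),
        ("brain tumor", r.brain_tumor),
        ("recent bleeding", r.bleeding_last_year),
        ("platelet count <50 x 10^3/uL", _below(r.platelet_k_per_ul, 50)),
        ("ALT >=260 units/L", _at_or_above(r.alt_u_per_l, 260)),
        ("total bilirubin >=2.4 mg/dL", _at_or_above(r.bilirubin_mg_dl, 2.4)),
        ("eGFR <30", _below(r.egfr, 30)),
        ("weight <40 kg", _below(r.weight_kg, 40)),
        ("prior anticoagulant", r.prior_anticoagulant),
        ("nonaspirin antiplatelet", r.nonaspirin_antiplatelet),
        ("strong CYP3A4 drug", r.strong_cyp3a4_drug),
    ]
    return [name for name, met in checks if met]


def meets_trial_exclusion(r: BaselineRecord) -> bool:
    """True if at least 1 exclusion criterion is met (the bleeding-risk proxy)."""
    return bool(bleeding_exclusion_reasons(r))
```

Returning the list of reasons rather than a single flag keeps the rule auditable, which matters because the individual criteria were not equally associated with bleeding.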

Outcomes

The primary outcome was incident overall VTE, which was defined as the first occurrence of acute pulmonary embolism, LE-DVT, or upper extremity (UE) DVT. Secondary outcomes included acute pulmonary embolism or LE-DVT, all-cause mortality, and bleeding. Patients were followed from the index date of systemic therapy initiation until an outcome event, the censor date (last face-to-face encounter before a 6-month encounter-free gap), death, or April 1, 2025, whichever came first. All outcomes were defined using validated ICD-10-CM codes with 95% or more precision. Full outcome definitions and validation metrics are in eMethods and eTable 4 in Supplement 1.
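The follow-up rule in this paragraph (earliest of outcome event, censor date, death, or data truncation) can be written out explicitly. The Python sketch below is a hedged illustration, not the study's implementation. It assumes that the index date counts as an encounter, that a 6-month gap can be approximated as 183 days, and that an outcome takes precedence over other reasons on the same date; the function and variable names are hypothetical.

```python
from datetime import date, timedelta

TRUNCATION_DATE = date(2025, 4, 1)   # follow-up truncation stated in the Methods
GAP = timedelta(days=183)            # assumption: 6 months approximated as 183 days


def censor_date(index_date, encounter_dates, truncation=TRUNCATION_DATE):
    """Last face-to-face encounter before the first encounter-free gap longer than GAP.

    Returns None when no such gap occurs before the truncation date.
    """
    last_seen = index_date  # assumption: therapy initiation counts as an encounter
    for d in sorted(e for e in encounter_dates if index_date < e <= truncation):
        if d - last_seen > GAP:
            break                      # gap found between two encounters
        last_seen = d
    else:
        if truncation - last_seen <= GAP:
            return None                # seen regularly up to truncation
    return last_seen


def follow_up(index_date, encounter_dates, event_date=None, death_date=None):
    """Return (end date, reason, days of follow-up); the earliest date wins."""
    candidates = []
    if event_date is not None:
        candidates.append((event_date, "event"))
    if death_date is not None:
        candidates.append((death_date, "death"))
    censored = censor_date(index_date, encounter_dates)
    if censored is not None:
        candidates.append((censored, "censored"))
    candidates.append((TRUNCATION_DATE, "truncation"))
    # min() keeps the first of equal dates, so an event outranks other reasons.
    end, reason = min(candidates, key=lambda c: c[0])
    return end, reason, (end - index_date).days


if __name__ == "__main__":
    visits = [date(2020, 2, 1), date(2020, 5, 1), date(2021, 6, 1)]
    print(follow_up(date(2020, 1, 1), visits))
```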

Predictors

Baseline data encompassed demographic, clinical, and laboratory variables, including 26 cancer types and 4 systemic therapy types. Detailed information defining each variable is given in eMethods and eTable 5 in Supplement 1. Definitions and thresholds for the EHR-CAT and Khorana scores are described in eTable 6 in Supplement 1. Both risk models included body mass index (BMI; calculated as weight in kilograms divided by height in meters squared) and complete blood count (CBC). In addition to reclassified cancer types, EHR-CAT contained 6 additional variables (advanced stage, history of VTE, history of paralysis, recent hospitalization, targeted or endocrine monotherapy, Asian or Pacific Islander race). EHR-CAT was grouped into 6 categories (0 or less, 1, 2, 3, 4, 5 or more) and the Khorana score was grouped into 4 categories (0, 1, 2, 3 or more) according to initial model development suggestions.^5,7

Statistical Analysis

To ensure accurate external validation, we assessed the model performance using the sum of integer scores without model updating or recalibration. Specifically, discrimination was evaluated using time-dependent receiver operating characteristic (ROC) curve (C statistic) and bootstrapped 95% CIs.¹⁰ Calibration was evaluated using cumulative incidence calibration plots and compared with the incidence at each risk score from the initial derivation cohort. Missing BMI and CBC values were strongly associated with early-stage breast and prostate cancer and adjuvant endocrine therapy. Consistent with the derivation study, we excluded patients with these missing values because they would not typically be considered for thromboprophylaxis. Metastatic ICD-10-CM codes were used to define advanced stage when documented stage was missing. There were no missing data in the remaining variables, and no imputation was performed. Key sensitivity analysis was performed after excluding patients at risk for bleeding based on anticoagulant clinical trial exclusion criteria.^11,12 Additional subgroup analyses were done by age, sex, self-reported race, ethnicity, and individual site to ensure model fairness. All analyses were performed using R version 4.4.3 (R Project for Statistical Computing).
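To make the discrimination metric concrete, the sketch below computes a C statistic for an integer risk score with a percentile bootstrap CI. It is a deliberate simplification and not the method used in the study: it treats the 6-month outcome as a binary label and ignores censoring, whereas the analysis above used a time-dependent ROC in R. The function names, the default of 200 resamples, and the fixed seed are assumptions for illustration.

```python
import random
from collections import Counter


def c_statistic(scores, events):
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    cases = Counter(s for s, e in zip(scores, events) if e)
    controls = Counter(s for s, e in zip(scores, events) if not e)
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    controls_below = 0
    # Walk the score values in ascending order, counting non-cases seen so far.
    for s in sorted(set(cases) | set(controls)):
        concordant += cases[s] * (controls_below + 0.5 * controls[s])
        controls_below += controls[s]
    return concordant / (n_cases * n_controls)


def bootstrap_ci(scores, events, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the C statistic, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_events = [events[i] for i in idx]
        if 0 < sum(resampled_events) < n:      # skip resamples with a single class
            stats.append(c_statistic([scores[i] for i in idx], resampled_events))
    stats.sort()
    k = int(alpha / 2 * n_boot)
    return stats[k], stats[n_boot - 1 - k]


if __name__ == "__main__":
    demo_scores = [0, 0, 1, 1, 2, 2, 3, 3, 4, 5]
    demo_events = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
    print(c_statistic(demo_scores, demo_events), bootstrap_ci(demo_scores, demo_events))
```

Grouping by score value keeps the calculation linear in the number of distinct scores, which is convenient for integer scores with few levels.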

Results

Organizations and Participants

As of April 9, 2025, there were 298 million unique patients from 256 sites in Cosmos. After organizational exclusion to ensure high-quality, longitudinal, and cancer-relevant data, 252 million patients from 186 health systems in all 50 states remained eligible for cohort selection. Among these sites, 185 had 3 or more years of data, 164 had 5 or more years of data, and 150 had 7 or more years of data. From eligible health systems, we identified 2 411 655 patients with newly diagnosed cancer and 1 050 833 patients who started new systemic therapy after diagnosis from 184 sites between January 2018 and December 2023. After further cohort exclusion, 732 594 patients with active cancer receiving systemic therapy remained in the primary analytic cohort (Figure 1).

Figure 1. ICD-10-CM indicates International Classification of Diseases, Tenth Revision, Clinical Modification; VTE, venous thromboembolism.

The median (IQR) age was 65.0 (56.9-73.0) years; 425 124 (58.0%) were female (Table 1). Self-reported race included 25 634 Asian or Pacific Islander (3.5%), 94 269 Black (12.9%), and 583 047 White (79.6%) patients; 48 266 (6.6%) had Hispanic ethnicity, and 144 926 (19.8%) lived in rural or micropolitan areas. The median (IQR) social vulnerability index (SVI) was 0.6 (0.3-0.8), and the median (IQR) National Cancer Institute Comorbidity Index was 0.3 (0-0.7). The most common cancers were breast (205 298 [28.0%]), lung (83 208 [11.4%]), colorectal (59 852 [8.2%]), and prostate (51 589 [7.0%]). By cancer stage, 123 806 patients (16.9%) had stage I to II disease, 58 126 (7.9%) had stage III disease, 206 334 (28.2%) had stage IV or metastatic disease, and 119 956 (16.4%) had unstageable disease (brain or leukemia). Treatment included 446 048 patients (60.9%) receiving cytotoxic chemotherapy, 156 644 (21.4%) receiving endocrine therapy, 81 987 (11.2%) receiving targeted therapy, and 47 915 (6.5%) receiving immune checkpoint inhibitor therapy. The median (IQR) time between first cancer diagnosis encounter and therapy initiation was 35 (14-77) days.

Table 1. Baseline Patient Characteristics and Risk Score Assignments.

Characteristic	Patients, No. (%) (N = 732 594)	EHR-CAT	Khorana	Bleed risk^a
Age, median (IQR), y	65.0 (56.9-73.0)	0	0	0
Sex
Female	425 124 (58.0)	0	0	0
Male	307 430 (42.0)	0	0	0
Self-reported race
Asian or Pacific Islander	25 634 (3.5)	−1	0	0
Black	94 269 (12.9)	0	0	0
White	583 047 (79.6)	0	0	0
Other or unknown^b	29 644 (4.0)	0	0	0
Ethnicity
Non-Hispanic	663 433 (90.6)	0	0	0
Hispanic	48 266 (6.6)	0	0	0
Unknown	20 895 (2.9)	0	0	0
Rural-urban commuting area
Metropolitan	585 625 (79.9)	0	0	0
Micropolitan	75 499 (10.3)	0	0	0
Rural	69 427 (9.5)	0	0	0
Unknown	2043 (0.3)	0	0	0
Social Vulnerability Index, median (IQR)	0.6 (0.3-0.8)	0	0	0
NCI Comorbidity Index, median (IQR)	0.3 (0-0.7)	0	0	0
Cancer type
Breast	205 298 (28.0)	0	0	0
Prostate	51 589 (7.0)	0	0	0
Lung	83 208 (11.4)	+2	+1	0
Lower GI^c	59 852 (8.2)	+1	0	0
Upper GI^d	19 257 (2.6)	+3	+2	0
Pancreas	24 055 (3.3)	+3	+2	0
Bile and gallbladder	8040 (1.1)	+3	0	0
Liver	8465 (1.2)	0	0	0
Head and neck	29 782 (4.1)	0	0	0
Bladder	14 612 (2.0)	+2	+1	0
Kidney	11 625 (1.6)	+2	+1	0
Testis	4078 (0.6)	+2	+1	0
Uterus	16 928 (2.3)	+2	+1	0
Ovary	14 571 (2.0)	+2	+1	0
Cervix	8005 (1.1)	0	+1	0
Brain	16 110 (2.2)	+2	0	1
Melanoma	13 915 (1.9)	0	0	0
Sarcoma	5439 (0.7)	+2	0	0
Myeloma	24 344 (3.3)	+2	0	0
Lymphoma (aggressive)^e	26 094 (3.6)	+2	+1	0
Lymphoma (indolent)^f	22 481 (3.1)	0	+1	0
Leukemia (AML)	10 692 (1.5)	0	0	1
Leukemia (ALL)	3285 (0.4)	+2	0	1
Leukemia (CML, MDS)	16 728 (2.3)	0	0	0
Leukemia (CLL)	11 926 (1.6)	0	0	0
Other cancer	22 215 (3.0)	0	0	0
Cancer stage
Stage I	79 368 (10.8)	0	0	0
Stage II	44 438 (6.1)	0	0	0
Stage III	58 126 (7.9)	+1	0	0
Stage IV	60 700 (8.3)	+1	0	0
Metastatic cancer ICD-10-CM code^g	145 634 (19.9)	+1	0	0
Metastatic brain ICD-10-CM code^g	20 974 (2.9)	0	0	1
Unstageable^h	119 956 (16.4)	0	0	0
Unknown	224 372 (30.6)	0	0	0
Therapy type
Cytotoxic chemotherapy	446 048 (60.9)	0	0	0
Immune checkpoint inhibitor	47 915 (6.5)	0	0	0
Targeted therapy	81 987 (11.2)	−1	0	0
Endocrine therapy	156 644 (21.4)	−1	0	0
BMI
Median (IQR)	27.5 (23.8-32.2)	0	0	0
≥35	113 333 (15.5)	+1	+1	0
Weight <40 kg	2412 (0.3)	0	0	1
Historical diagnoses
Hospitalization last 3 mo	171 145 (23.4)	+1	0	0
Paralysis last year	9891 (1.4)	+1	0	0
Chronic VTE last yearⁱ	23 474 (3.2)	+1	0	0
Bleeding last year	96 907 (13.2)	0	0	1
Active or reported prescription
Anticoagulant (stopped)^j	3350 (0.5)	0	0	1
Antiplatelet (nonaspirin)	17 419 (2.4)	0	0	1
CYP3A4-interacting drug	12 667 (1.7)	0	0	1
White blood cell
Median (IQR), cells/μL	7200 (5600-9500)	0	0	0
Count >11 000 cells/μL	117 748 (16.1)	+1	+1	0
Unknown	0	0	0	0
Hemoglobin
Median (IQR), g/dL	12.6 (11.0-13.8)	0	0	0
Levels <10 g/dL	105 419 (14.4)	+1	+1	0
Unknown	0	0	0	0
Platelet
Median (IQR), 10³/μL	249.0 (196.0-312.0)	0	0	0
Count ≥350 × 10³/μL	117 572 (16.0)	+1	+1	0
Count <50 × 10³/μL	14 349 (2.0)	0	0	1
Unknown	0	0	0	0
Alanine transaminase
Median (IQR), units/L	20.0 (14.0-29.0)	0	0	0
Levels ≥260 units/L	2074 (0.3)	0	0	1
Unknown	50 843 (6.9)	0	0	0
Total bilirubin
Median (IQR), mg/dL	0.5 (0.3-0.7)	0	0	0
Levels ≥2.4 mg/dL	7176 (1.1)	0	0	1
Unknown	68 744 (9.4)	0	0	0
Creatinine
Median (IQR), mg/dL	0.8 (0.7-1.0)	0	0	0
eGFR levels <30 mL/min/1.73 m²	17 708 (2.5)	0	0	1
Unknown	32 056 (4.4)	0	0	0

Abbreviations: ALL, acute lymphoblastic leukemia; AML, acute myeloid leukemia; BMI, body mass index (calculated as weight in kilograms divided by height in meters squared); CLL, chronic lymphocytic leukemia; CML, chronic myeloid leukemia; eGFR, estimated glomerular filtration rate; GI, gastrointestinal; ICD-10-CM, International Classification of Diseases, Tenth Revision, Clinical Modification; MDS, myelodysplastic syndrome; NCI, National Cancer Institute; VTE, venous thromboembolism.

SI conversion factor: to convert alanine transaminase to microkatal/L, multiply by 0.0167; bilirubin to micromoles/L, multiply by 17.104; creatinine to micromoles/L, multiply by 88.4; hemoglobin to g/L, multiply by 10; white blood cells to × 10⁹ per liter, multiply by 0.001.

^a This column indicates the variables that were considered when excluding patients at risk for bleeding for sensitivity analysis based on anticoagulation clinical trial exclusion criteria. These included acute leukemia, primary or metastatic brain tumor, recent history of bleeding, platelet count less than 50 × 10³/μL, alanine transaminase levels more than 5 times the upper limit, total bilirubin more than 2 times the upper limit, eGFR less than 30 mL/min/1.73 m², weight less than 40 kg, anticoagulants, nonaspirin antiplatelet drugs, or drugs with strong CYP3A4 interactions.

^b Other includes American Indian or Alaska Native, other race, and unknown (missing).

^c Lower GI includes colorectal and intestinal cancers.

^d Upper GI includes gastric and esophageal cancers.

^e Aggressive lymphoma includes diffuse large B-cell lymphoma, Burkitt lymphoma, lymphoblastic lymphoma, systemic T/natural killer-cell lymphoma.

^f Indolent lymphoma includes follicular lymphoma, mantle cell lymphoma, cutaneous T-cell lymphoma, Hodgkin lymphoma, and other lymphoma.

^g Metastatic cancer and metastatic brain are defined using ICD-10-CM codes (eTable 5 in Supplement 1).

^h Unstageable cancer includes all hematologic malignant neoplasms and brain cancer.

ⁱ Only history of or chronic VTE and acute events more than 1 year before the index date are included (recent acute VTE is an exclusion criterion).

^j Only recent anticoagulant prescriptions with a clear stop date before the index date are included (recent anticoagulation initiation and active use are exclusion criteria).

The allocation and distribution of individual VTE risk predictors for the EHR-CAT and Khorana scores are shown in Table 1. Notably, the 2 risk scores had distinct scoring metrics for cancer type. Shared risk predictors included BMI of 35 or higher (113 333 patients [15.5%]), white blood cell count above 11 000/μL (117 748 patients [16.1%]; to convert to cells × 10⁹ per liter, multiply by 0.001), hemoglobin levels below 10 g/dL (105 419 patients [14.4%]; to convert to grams per liter, multiply by 10), and platelet count of 350 × 10³/μL or higher (117 572 patients [16.0%]; to convert to cells × 10⁹ per liter, multiply by 1). Additional risk predictors for EHR-CAT included advanced stage (264 460 patients [36.1%]), targeted or endocrine therapy (238 631 patients [32.6%]), recent hospitalization (171 145 patients [23.4%]), recent paralysis or immobilization (9891 patients [1.4%]), remote VTE history (23 474 patients [3.2%]), and Asian or Pacific Islander race (25 634 patients [3.5%]). Furthermore, 190 413 patients (26.0%) would have met at least 1 exclusion criterion from anticoagulant clinical trials.
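As a worked illustration of how the integer scores are summed, the Python sketch below transcribes the point assignments from Table 1. It is not the authors' code; the authoritative definitions and thresholds are in eTable 6 in Supplement 1. The dictionary keys, function names, and argument names are hypothetical, and the high-risk cutoffs (EHR-CAT of 3 or more, Khorana score of 2 or more) follow the classification column of Table 2.

```python
# Illustrative sketch only: points transcribed from Table 1; all names are hypothetical.
EHR_CAT_CANCER = {
    "upper_gi": 3, "pancreas": 3, "bile_gallbladder": 3,
    "lung": 2, "bladder": 2, "kidney": 2, "testis": 2, "uterus": 2, "ovary": 2,
    "brain": 2, "sarcoma": 2, "myeloma": 2, "lymphoma_aggressive": 2, "leukemia_all": 2,
    "lower_gi": 1,
}
KHORANA_CANCER = {
    "upper_gi": 2, "pancreas": 2,
    "lung": 1, "bladder": 1, "kidney": 1, "testis": 1, "uterus": 1, "ovary": 1,
    "cervix": 1, "lymphoma_aggressive": 1, "lymphoma_indolent": 1,
}


def shared_points(bmi, wbc_per_ul, hemoglobin_g_dl, platelet_k_per_ul):
    """Predictors common to both scores, 1 point each."""
    return sum([bmi >= 35, wbc_per_ul > 11_000, hemoglobin_g_dl < 10, platelet_k_per_ul >= 350])


def khorana_score(cancer, **labs):
    # Cancer types not listed in the dictionary contribute 0 points.
    return KHORANA_CANCER.get(cancer, 0) + shared_points(**labs)


def ehr_cat_score(cancer, *, advanced_stage=False, targeted_or_endocrine=False,
                  asian_or_pacific_islander=False, recent_hospitalization=False,
                  paralysis=False, prior_vte=False, **labs):
    score = EHR_CAT_CANCER.get(cancer, 0) + shared_points(**labs)
    score += advanced_stage + recent_hospitalization + paralysis + prior_vte   # +1 each
    score -= targeted_or_endocrine + asian_or_pacific_islander                 # -1 each
    return score


def risk_group(score, high_cutoff):
    """Cutoff is 3 for EHR-CAT and 2 for the Khorana score (Table 2)."""
    return "high" if score >= high_cutoff else "low"


# Hypothetical patient: stage IV pancreatic cancer, recently hospitalized, on chemotherapy.
labs = dict(bmi=28.0, wbc_per_ul=12_500, hemoglobin_g_dl=9.5, platelet_k_per_ul=400)
ehr = ehr_cat_score("pancreas", advanced_stage=True, recent_hospitalization=True, **labs)
kho = khorana_score("pancreas", **labs)
print(ehr, risk_group(ehr, 3), kho, risk_group(kho, 2))  # prints: 8 high 5 high
```

Because EHR-CAT includes negative points, scores below 0 are possible and fall in the "0 or less" category.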

Incidence of VTE and Bleeding Outcomes

With a median (IQR) continuous follow-up of 676 (340-1151) days, the 6-month incidences of overall VTE and of pulmonary embolism or LE-DVT were 4.7% (34 499 patients) and 3.9% (28 341 patients), respectively. Nonexclusively, this represented 18 250 pulmonary embolism (2.5%), 15 658 LE-DVT (2.1%), and 7206 UE-DVT (1.0%) cases within 6 months. The incidence of VTE remained stable over 6 years (4.5% to 5.0%). With a median (IQR) follow-up of 1148 (688-1148) days for vital status, the 6-month overall mortality was 8.4% (60 239 patients).

The cumulative incidence of 6-month overall VTE and pulmonary embolism or LE-DVT by cancer type is shown in eFigure 1 in Supplement 1. Pancreatic (10.3% [2472 of 24 055 patients]), biliary or gallbladder (10.2% [822 of 8040 patients]), and acute lymphocytic leukemia (9.0% [298 of 3285 patients]) cancers had the highest 6-month incidence of VTE, while breast (2.0% [4072 of 205 298 patients]), chronic lymphocytic leukemia (1.8% [215 of 11 926 patients]), and prostate (1.7% [857 of 51 589 patients]) cancers had the lowest. Except for acute leukemias, which had a higher incidence of catheter-related UE-DVT, the incidence of pulmonary embolism or LE-DVT tracked that of overall VTE in most cancers.

The 6-month incidence of bleeding was 3.7% (26 993 patients). Clinical trial exclusion criteria were variably associated with bleeding risk, ranging from high (total bilirubin above 2.4 mg/dL [to convert to micromoles per liter, multiply by 17.104]: hazard ratio [HR], 2.78 [95% CI, 2.62-2.95]; acute leukemia: HR, 2.33 [95% CI, 2.22-2.44]; metastatic brain cancer: HR, 2.17 [95% CI, 2.09-2.26]; bleeding history: HR, 2.24 [95% CI, 2.20-2.29]; GFR <30 mL/min/1.73 m²: HR, 2.19 [95% CI, 2.11-2.29]) to low (alanine aminotransferase >260 units/L: HR, 1.32 [95% CI, 1.17-1.49]; strong CYP3A4 inducer/inhibitor: HR, 1.27 [95% CI, 1.21-1.34]; weight <40 kg: HR, 1.43 [95% CI, 1.25-1.62]; prior anticoagulant: HR, 1.42 [95% CI, 1.28-1.56]) (eTable 7 in Supplement 1). The incidence of 6-month bleeding was 2.4% in those with none of the exclusion criteria (542 181 patients) and 7.2% in those with any of them (190 413 patients) (HR, 2.55 [95% CI, 2.51-2.59]).

Model Performance

We assessed the VTE risk scores’ performance in all eligible patients and among those deemed low bleeding risk from clinical trial exclusion criteria. In the primary analysis, the 6-month cumulative incidence of VTE when stratified by EHR-CAT was 1.3% for scores of 0 or less (3062 of 229 808 patients), 3.4% for a score of 1 (4059 of 117 796 patients), 4.7% for a score of 2 (5499 of 116 288 patients), 6.3% for a score of 3 (7096 of 112 900 patients), 8.2% for a score of 4 (7102 of 86 456 patients), and 11.1% for scores of 5 or more (7681 of 69 346 patients) (Table 2, Figure 2). The corresponding HRs (with scores of 0 or less as reference) were 2.12 (95% CI, 2.05-2.18), 2.99 (95% CI, 2.92-3.11), 4.01 (95% CI, 3.90-4.12), 5.28 (95% CI, 5.14-5.44), and 7.31 (95% CI, 7.10-7.52). In contrast, the 6-month cumulative incidence of VTE when stratified by the Khorana score was 2.7% for a score of 0 (7447 of 275 609 patients), 4.7% for a score of 1 (12 019 of 258 259 patients), 6.8% for a score of 2 (9423 of 139 135 patients), and 9.4% for scores of 3 or more (5610 of 59 591 patients). The corresponding HRs (with a score of 0 as reference) were 1.65 (95% CI, 1.61-1.68), 2.45 (95% CI, 2.40-2.51), and 3.37 (95% CI, 3.29-3.46). The pulmonary embolism or LE-DVT outcome followed a similar pattern in both models.

Table 2. Performance of EHR-CAT vs Khorana Score for VTE at 6 Months.

Score	Cancer patients, No. (%) (N = 732 594)	Classification	VTE at 6 mo, No. (row %) (n = 34 499)	VTE at 6 mo, HR (95% CI)	VTE at 6 mo, TD-ROC	PE or LE-DVT at 6 mo, No. (row %) (n = 28 341)	PE or LE-DVT at 6 mo, HR (95% CI)	PE or LE-DVT at 6 mo, TD-ROC
EHR-CAT score
≤0	229 808 (31.4)	Low risk	3062 (1.3)	1 [Reference]	0.70 (0.70-0.70)	2285 (1.0)	1 [Reference]	0.71 (0.71-0.71)
1	117 796 (16.1)	Low risk	4059 (3.4)	2.12 (2.05-2.18)		3132 (2.7)	2.08 (2.01-2.15)
2	116 288 (15.9)	Low risk	5499 (4.7)	2.99 (2.92-3.11)		4350 (3.7)	3.01 (2.92-3.11)
3	112 900 (15.4)	High risk	7096 (6.3)	4.01 (3.90-4.12)		5829 (5.2)	4.12 (3.99-4.25)
4	86 456 (11.8)	High risk	7102 (8.2)	5.28 (5.14-5.44)		6033 (7.0)	5.53 (5.36-5.70)
≥5	69 346 (9.5)	High risk	7681 (11.1)	7.31 (7.10-7.52)		6712 (9.7)	7.83 (7.59-8.08)
Khorana score
0	275 609 (37.6)	Low risk	7447 (2.7)	1 [Reference]	0.63 (0.63-0.63)	5989 (2.2)	1 [Reference]	0.63 (0.63-0.63)
1	258 259 (35.3)	Low risk	12 019 (4.7)	1.65 (1.61-1.68)		9769 (3.8)	1.65 (1.61-1.69)
2	139 135 (19.0)	High risk	9423 (6.8)	2.45 (2.40-2.51)		7829 (5.6)	2.49 (2.43-2.55)
3+	59 591 (8.1)	High risk	5610 (9.4)	3.37 (3.29-3.46)		4754 (8.0)	3.44 (3.35-3.54)

Abbreviations: HR, hazard ratio; LE-DVT, lower extremity deep vein thrombosis; TD-ROC, time-dependent receiver operating characteristic; VTE, venous thromboembolism.

Figure 2. EHR-CAT indicates electronic health record cancer-associated thrombosis; LE-DVT, lower extremity deep vein thrombosis; PE, pulmonary embolism; TD-AUC, time-dependent area under the receiver operating characteristic curve; VTE, venous thromboembolism.

For model discrimination, the time-dependent ROC (C statistic) for EHR-CAT was 0.697 (95% CI, 0.696-0.699) for overall VTE and 0.708 (95% CI, 0.706-0.711) for pulmonary embolism or LE-DVT (Figure 2). In contrast, the C statistic for the Khorana score was 0.626 (95% CI, 0.625-0.627) for overall VTE and 0.630 (95% CI, 0.628-0.630) for pulmonary embolism or LE-DVT. Similar to the derivation study, the EHR-CAT model outperformed the Khorana score by 0.07.⁷ For model calibration, we compared the observed with the predicted cumulative incidence of VTE at each score for EHR-CAT (eFigure 2 in Supplement 1). The model was well calibrated for pulmonary embolism or LE-DVT and modestly calibrated for overall VTE. We could not calibrate the Khorana score due to the lack of 6-month estimates from the initial study. When comparing the previously proposed clinical risk groups, EHR-CAT reclassified 20% of patients from the Khorana score groups into revised categories that showed improved concordance with the observed VTE risk (Table 3). The reclassification increased the total proportion of potentially preventable VTEs in the high-risk group from 43.6% to 63.4%.

Table 3. Comparison of EHR-CAT vs Khorana Score Risk Groups for VTE at 6 Months.

Category	Khorana score	EHR-CAT	Cancer patients, No. (%) (N = 732 594)	Overall VTE at 6 mo, No. (row %) (n = 34 499)	PE or LE-DVT at 6 mo, No. (row %) (n = 28 341)
Concordant (80%)	Low risk	Low risk	423 899 (57.9)	11 355 (2.7)	8829 (2.1)
Concordant (80%)	High risk	High risk	158 733 (21.7)	13 768 (8.7)	11 645 (7.3)
Reclassified (20%)	Low risk	High risk	109 969 (15.0)	8111 (7.4)	6929 (6.3)
Reclassified (20%)	High risk	Low risk	39 993 (5.5)	1265 (3.2)	938 (2.3)

Abbreviations: EHR-CAT, electronic health record cancer-associated thrombosis; LE-DVT, lower extremity deep vein thrombosis; PE, pulmonary embolism; VTE, venous thromboembolism.
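The reclassification figures quoted in the text can be reproduced directly from the counts in Table 3. The short Python check below is only an arithmetic illustration; the variable and function names are hypothetical, and the counts are copied from the table.

```python
# (Khorana group, EHR-CAT group) -> (patients, overall VTE events at 6 months), from Table 3
TABLE_3 = {
    ("low", "low"): (423_899, 11_355),
    ("high", "high"): (158_733, 13_768),
    ("low", "high"): (109_969, 8_111),
    ("high", "low"): (39_993, 1_265),
}

total_patients = sum(n for n, _ in TABLE_3.values())
total_vte = sum(v for _, v in TABLE_3.values())

# Share of patients whose risk group differs between the two scores.
reclassified = sum(n for (kho, ehr), (n, _) in TABLE_3.items() if kho != ehr) / total_patients


def vte_share_in_high_risk(position):
    """Share of all VTE events in the high-risk group (position 0 = Khorana, 1 = EHR-CAT)."""
    captured = sum(v for groups, (_, v) in TABLE_3.items() if groups[position] == "high")
    return captured / total_vte


print(f"patients: {total_patients}, VTE events: {total_vte}")
print(f"reclassified: {100 * reclassified:.1f}%")
print(f"VTE in Khorana high-risk group: {100 * vte_share_in_high_risk(0):.1f}%")
print(f"VTE in EHR-CAT high-risk group: {100 * vte_share_in_high_risk(1):.1f}%")
```

The totals recover the cohort size (732 594) and the number of VTE events (34 499), and the two shares match the 43.6% and 63.4% reported above.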

Sensitivity Analyses

Approximately 26% of the cohort met anticoagulant trial exclusion criteria and had higher bleeding risk. In sensitivity analysis excluding these individuals (eTable 8 in Supplement 1), the 6-month incidence of VTE decreased from 4.7% to 4.2% and that of bleeding decreased from 3.7% to 2.4%; however, both risk scores retained similar covariate distribution and performance (C statistic: EHR-CAT, 0.71-0.72; Khorana score, 0.63-0.64).

We performed additional sensitivity analyses to ensure model fairness, generalizability, and interoperability. First, we performed subgroup analysis by age, sex, self-reported race, and ethnicity (eTable 9 in Supplement 1). EHR-CAT discrimination in all subgroups was similar to that in the primary analysis, except in the male subgroup, which had a slightly lower C statistic of 0.67 driven by the loss of breast cancer (28% of the cohort). Second, we randomly selected 10 health systems among the 135 sites that contributed over 1000 patients to demonstrate model performance on a site level (eTable 10 in Supplement 1). Each site had a distinct patient and cancer composition; the 6-month VTE incidence varied from 3.5% to 6.2% (average, 4.8%), and the C statistic varied from 0.64 to 0.76 (average, 0.70). Finally, we attempted to simplify the model predictors to address the 2 variables with the most missingness. An exploratory multivariable analysis using individual variables is shown in eTable 11 in Supplement 1. When cancer stage was entirely substituted by metastatic ICD-10-CM codes, the C statistic was unchanged (0.70 for VTE). When all CBC values were removed from the model, the C statistic was slightly worse (0.69 for VTE).

Discussion

Using an EHR database, we externally validated the performance and generalizability of VTE risk scores in a contemporary cohort of 732 594 patients with newly diagnosed cancer receiving systemic therapy from 2018 to 2023 across 184 health systems in the US. At 6 months after systemic therapy initiation, the incidence was 4.7% for overall VTE, 3.9% for pulmonary embolism or LE-DVT, and 3.7% for bleeding. Using objective and extractable EHR data, the EHR-CAT risk score demonstrated robust performance across different sites and accurately classified patients into groups with 6-month incidences of overall VTE ranging from 1.3% to 11.1% (C statistic of 0.70). As in the initial derivation study, the updated risk score outperformed the benchmark Khorana score by 0.07 (C statistic of 0.63) in the current analysis and reclassified 20% of patients into more appropriate risk groups. Finally, the EHR-CAT risk score performed well even after excluding the 26% of patients at risk for bleeding according to common anticoagulant trial exclusion criteria (C statistic of 0.70).

In contrast to previous validation studies,^13,14,15,16 our current analytic cohort was large (732 594 patients), representative (all 50 states, with racial and ethnic demographics similar to those of the US census),¹⁷ and contemporary (outcome ascertainment until April 2025), and had a median follow-up of 676 days. To ensure accuracy and external validity, we performed various data quality checks. The VTE and bleeding incidence remained similar across all years, indicating the absence of data shift and drift. The magnitude and pattern of VTE incidence across different cancer types, stages, and treatments were consistent with our previous study in male US veterans.¹⁸ The bleeding incidence of 2.4% after trial exclusion was also consistent with the 3.0% overall bleeding in the placebo arm of the CASSINI trial.¹² Finally, the factors associated with VTE and bleeding were consistent with prior clinical knowledge.

While we defined baseline predictors and model fitting in the same way as the initial derivation studies, there were several notable differences in the sampling strategy that significantly improved the model's generalizability. First, we updated the study years from 2011-2020 to 2018-2024 and increased the number of sites from 1 to 184. Second, instead of relying on manually extracted cancer registry data to define cancer type, stage, and diagnosis date, we used only readily available structured data from the EHR. Third, we expanded the systemic therapy initiation date to any time instead of only the first year after cancer diagnosis. Fourth, we demonstrated that the model performed equally well in all patients and in those deemed at lower risk for bleeding using anticoagulant clinical trial exclusion criteria. This is important because the fear of bleeding and the challenge of identifying patients who would qualify for clinical trials are often key barriers to implementation.^8,9 While not all trial exclusion criteria affected the bleeding risk to a similar magnitude, this was a starting point to build physician and patient trust for implementing the EHR-based VTE risk score. Finally, we showed that the risk score could vary in performance when applied to single health systems (C statistic, 0.64 to 0.76), as it depended on patient composition. This likely explains the conflicting data on EHR-CAT's absolute C statistic in previous studies.^13,14,15,16

Limitations

Our study has limitations due to the use of a deidentified, aggregated, and multi-institutional EHR database. First, while longitudinal data were joined across all the health systems that participate in Cosmos, interval missingness of follow-up data from nonparticipating institutions remained a source of error. To mitigate this, we censored patients who had a gap of more than 6 months from the previous face-to-face encounter, even if they had subsequent data. Second, missing data in baseline predictors could detract from the usability of any risk score. Among the 11 predictors in EHR-CAT, the only 2 with significant missingness were cancer stage and CBC values. Fortunately, metastatic ICD-10-CM codes were a reasonable replacement for documented advanced cancer stage. Among patients with missing CBC values, 70% had early-stage breast or prostate cancers and were receiving adjuvant endocrine therapy. We excluded this very low-risk group in the primary analysis; however, if we had included these patients and imputed missing CBC values as low risk, the C statistic for EHR-CAT would have improved to 0.72. Third, given the lack of unstructured notes, we were not able to use a natural language processing algorithm to confirm VTE or bleeding cases. Therefore, we relied on previously validated ICD-based algorithms with relatively high positive predictive value (95%). The true incidence of outcome events could be underestimated, though the overall incidence was similar to that in previously published epidemiology studies.^18,19,20 Finally, we acknowledge the trade-off between complexity and accuracy in any risk score. To reduce cost and facilitate implementation, we are actively working on translating our query into a standardized database query that can be applied and embedded in the EHR at each individual site.
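
The gap-based censoring rule described in the first limitation could be sketched as follows. This is only an illustration under stated assumptions: "6 months" is approximated as 183 days, the censoring date is taken to be the last encounter before the first long gap, and the function name `censor_date` is hypothetical. The study's actual query may differ on each of these points.

```python
from datetime import date, timedelta

# Assumption: "6 months" is approximated as 183 days.
MAX_GAP = timedelta(days=183)


def censor_date(encounters, max_gap=MAX_GAP):
    """Return the date at which follow-up is censored for one patient.

    encounters: face-to-face encounter dates in any order.
    Follow-up ends at the last encounter before the first gap longer
    than max_gap, even if later encounters exist. If no such gap
    exists, follow-up ends at the final encounter.
    """
    if not encounters:
        raise ValueError("at least one encounter is required")
    ordered = sorted(encounters)
    for previous, following in zip(ordered, ordered[1:]):
        if following - previous > max_gap:
            return previous
    return ordered[-1]


# Toy example: a 10-month gap follows the March 2020 encounter.
visits = [date(2020, 1, 1), date(2020, 3, 1), date(2021, 1, 1), date(2021, 2, 1)]
print(censor_date(visits))
```

In this toy example the patient is censored at the March 2020 encounter, so any event recorded in 2021 would not count toward the outcome. This trades some follow-up time for protection against events that occurred, unobserved, at a nonparticipating institution during the gap.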

Conclusions

In summary, the EHR-CAT risk score performed well in discrimination and calibration of VTE outcomes in a contemporary cohort of patients with active cancer receiving systemic therapy, even after applying clinical trial exclusions to lower their bleeding risk. We further demonstrated the feasibility of using standardized and readily available structured data elements from the EHR to calculate the risk score. With appropriate implementation, the EHR-CAT model has the potential to support thromboprophylaxis strategies across different health systems.

Supplement 1.

eMethods.

eTable 1. Organization Filters

eTable 2. Cohort Filters

eTable 3. Systemic Therapy Classifications

eTable 4. Outcome Filters

eTable 5. Baseline Variables

eTable 6. Detailed Predictor Definitions for EHR-CAT and Khorana Score

eTable 7. Risk of Bleeding Based on Clinical Trial Exclusion Criteria

eTable 8. Performance of EHR-CAT vs Khorana Score for VTE at 6 Months After Exclusion for Bleeding Risk

eTable 9. Performance of EHR-CAT in Prespecified Subgroups

eTable 10. Performance of EHR-CAT in 10 Randomly Selected Health Systems

eTable 11. Exploratory Multivariable Cox Regression for EHR-CAT and Khorana Score Individual Predictors

eFigure 1. Incidence of VTE at 6 Months by Cancer Type

eFigure 2. Calibration Plots of EHR-CAT vs Original Derivation Model

jamanetwopen-e2544428-s001.pdf (1.1 MB, pdf)

Supplement 2.

Data Sharing Statement

jamanetwopen-e2544428-s002.pdf (16.4 KB, pdf)

References

1. Sørensen HT, Pedersen L, van Es N, Büller HR, Horváth-Puhó E. Impact of venous thromboembolism on the mortality in patients with cancer: a population-based cohort study. Lancet Reg Health Eur. 2023;34:100739. doi: 10.1016/j.lanepe.2023.100739
2. Mahajan A, Brunson A, Adesina O, Keegan THM, Wun T. The incidence of cancer-associated thrombosis is increasing over time. Blood Adv. 2022;6(1):307-320. doi: 10.1182/bloodadvances.2021005590
3. Lyman GH, Carrier M, Ay C, et al. American Society of Hematology 2021 guidelines for management of venous thromboembolism: prevention and treatment in patients with cancer. Blood Adv. 2021;5(4):927-974. doi: 10.1182/bloodadvances.2020003442
4. Key NS, Khorana AA, Kuderer NM, et al. Venous thromboembolism prophylaxis and treatment in patients with cancer: ASCO guideline update. J Clin Oncol. 2023;41(16):3063-3071. doi: 10.1200/JCO.23.00294
5. Khorana AA, Kuderer NM, Culakova E, Lyman GH, Francis CW. Development and validation of a predictive model for chemotherapy-associated thrombosis. Blood. 2008;111(10):4902-4907. doi: 10.1182/blood-2007-10-116327
6. Mulder FI, Candeloro M, Kamphuisen PW, et al.; CAT-prediction collaborators. The Khorana score for prediction of venous thromboembolism in cancer patients: a systematic review and meta-analysis. Haematologica. 2019;104(6):1277-1287. doi: 10.3324/haematol.2018.209114
7. Li A, La J, May SB, et al. Derivation and validation of a clinical risk assessment model for cancer-associated thrombosis in two unique US health care systems. J Clin Oncol. 2023;41(16):2926-2938. doi: 10.1200/JCO.22.01542
8. Martin KA, Lyleroehr MJ, Cameron KA. Barriers and facilitators to preventing venous thromboembolism in oncology practice. Thromb Res. 2022;220(October):21-23. doi: 10.1016/j.thromres.2022.09.026
9. Martin KA, Cameron KA, Linder JA, Hirschhorn LR. Preventing venous thromboembolism for ambulatory patients with cancer: developing the form and content of implementation strategies. Thromb Update. 2024;15:100168. doi: 10.1016/j.tru.2024.100168
10. Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000;56(2):337-344. doi: 10.1111/j.0006-341X.2000.00337.x
11. Carrier M, Abou-Nassar K, Mallick R, et al.; AVERT Investigators. Apixaban to prevent venous thromboembolism in patients with cancer. N Engl J Med. 2019;380(8):711-719. doi: 10.1056/NEJMoa1814468
12. Khorana AA, Soff GA, Kakkar AK, et al.; CASSINI Investigators. Rivaroxaban for thromboprophylaxis in high-risk ambulatory patients with cancer. N Engl J Med. 2019;380(8):720-728. doi: 10.1056/NEJMoa1814630
13. Li A, De Las Pozas G, Andersen CR, et al. External validation of a novel electronic risk score for cancer-associated thrombosis in a comprehensive cancer center. Am J Hematol. 2023;98(7):1052-1057. doi: 10.1002/ajh.26928
14. Dulberger KN, La J, Li A, et al. External validation of a novel cancer-associated venous thromboembolism risk assessment score in a safety-net hospital. Res Pract Thromb Haemost. 2024;9(1):102650. doi: 10.1016/j.rpth.2024.102650
15. Lanting V, Vágó E, Horváth-Puhó E, et al. Validation of clinical risk assessment scores for venous thromboembolism in patients with cancer: a population-based cohort study. J Thromb Haemost. 2025;23(2):600-609. doi: 10.1016/j.jtha.2024.10.021
16. Vladić N, Englisch C, Berger JM, et al. Validation of risk assessment models for venous thromboembolism in patients with cancer receiving systemic therapies. Blood Adv. 2025;9(13):3340-3349. doi: 10.1182/bloodadvances.2025016044
17. US Census Bureau. Detailed Races and Ethnicities in the US and Puerto Rico: 2020 Census. US Census quick facts. Updated online September 21, 2023. Accessed August 27, 2025. https://www.census.gov/library/visualizations/interactive/detailed-race-ethnicities-2020-census.html
18. Martens KL, Li A, La J, et al. Epidemiology of cancer-associated venous thromboembolism in patients with solid and hematologic neoplasms in the Veterans Affairs health care system. JAMA Netw Open. 2023;6(6):e2317945. doi: 10.1001/jamanetworkopen.2023.17945
19. Mulder FI, Horváth-Puhó E, van Es N, et al. Venous thromboembolism in cancer patients: a population-based cohort study. Blood. 2021;137(14):1959-1969. doi: 10.1182/blood.2020007338
20. Englisch C, Moik F, Steiner D, et al. Bleeding events in patients with cancer: incidence, risk factors, and impact on prognosis in a prospective cohort study. Blood. 2024;144(22):2349-2359. doi: 10.1182/blood.2024025362
21. Epic Research. Epic Cosmos website. Accessed August 27, 2025. https://cosmos.epic.com/
