Validation of a Composite Mortality End Point in a Large Clinicogenomic Real-World Database of Patients With Advanced Cancer

Joshuah Kapilivsky; Farahnaz Islam; Emma K Roth; Jessica Dow; Shannon Moran; Emilie Scherrer; Seung Won Hyun; Chithra Sangli

doi:10.1200/CCI-25-00291

2026 Apr 20;10(2):e2500291. doi: 10.1200/CCI-25-00291

PMCID: PMC13105484 PMID: 42008779

Abstract

PURPOSE

Real-world data from electronic health records and next-generation sequencing are used to study treatment effectiveness in molecularly refined patient populations. Incomplete mortality data can lead to overestimated survival in these studies. The National Death Index (NDI) is the gold standard for mortality data in the United States, but limited accessibility and reporting delays hinder timely research. External sources can supplement and improve mortality data capture. We evaluated a composite mortality variable against NDI records in a large real-world cohort of patients with advanced cancer.

METHODS

Deidentified clinical and molecular data from patients with advanced solid tumors were linked with third-party mortality and claims data sets using deterministic tokenization. Vital status and death dates were harmonized across sources. Patient identifiers were submitted to NDI; true matches were deidentified and joined for analysis. Performance metrics were calculated using NDI as ground truth. Date agreement was assessed at 0-, ±15-, and ±30-day tolerances. Subgroup analyses and a cumulative case/dynamic control (CC/DC) approach were also performed.

RESULTS

Among 17,597 patients, the composite mortality variable demonstrated 82% sensitivity and 95% specificity against NDI. The positive predictive value was 96%, and the negative predictive value was 77%. Exact date agreement was 86%, increasing to 94% within a ±15-day tolerance and 96% within a ±30-day tolerance. Incorporating third-party data substantially improved the sensitivity from 17% to 82%. With the CC/DC approach, the sensitivity was 96% at 6 months, 97% at 12 months, and 98% at 24 months, with specificity above 98% across these time frames.

CONCLUSION

The composite mortality variable is a robust and reliable end point for real-world evidence analyses with high accuracy for identified deaths and appropriate censoring of patients lost to follow-up.

BACKGROUND

Real-world data (RWD), derived from sources like electronic health records (EHR) from routine clinical care, provide a vital complement to evidence generated from traditional clinical trials. These data allow researchers to study treatment effectiveness in real-world populations that might have more diverse clinical characteristics than what can practically be studied in a clinical trial. With the development and adoption of modern targeted therapies in oncology, it is increasingly important to integrate molecular data from next-generation sequencing (NGS) into these clinical data sets to investigate relationships between specific genomic alterations and patient outcomes. Overall survival (OS) analysis is commonly used for assessing treatment effectiveness, which fundamentally depends on accurate mortality data. Therefore, the full potential of these rich clinicogenomic data sets can only be realized if their mortality data are accurate and reliable. In the United States, the National Death Index (NDI) is widely regarded as the gold standard for mortality data because of its comprehensive coverage.^1,2 However, significant time lags, logistical hurdles, and limitations in accessing NDI data can create a bottleneck, limiting its utility for the pace and volume of research in precision oncology. Alternative data sources, such as EHRs, are more readily accessible but often incomplete as patients may be lost to follow-up because of care transfers, transitions to hospice, cessation of medical treatment, or name changes.^3,4 Therefore, relying solely on EHR data can lead to underascertainment of death events and overestimation of survival rates, skewing research findings.^5,6

CONTEXT

Key Objective
To evaluate a composite mortality variable in a large real-world cohort.
Knowledge Generated
When benchmarked against the National Death Index, this composite mortality variable demonstrated a sensitivity of 82%, a specificity of 95%, and a positive predictive value of 96%, with missed death events properly captured as lost-to-follow-up. These results validate our composite mortality variable as a robust and reliable end point for overall survival analyses in real-world evidence.
Relevance (U. Topaloglu)
Improving mortality ascertainment using a composite variable greatly enhances the accuracy of real-world survival analyses, reducing bias introduced by incomplete electronic health record-based death data. This strengthened mortality endpoint allows clinicians and researchers to generate more reliable evidence on treatment effectiveness in molecularly defined cancer populations, supporting better-informed clinical and regulatory decision-making.*
*Relevance section written by JCO Clinical Cancer Informatics Associate Editor Umit Topaloglu, PhD.

A composite mortality variable can address these limitations by integrating data from multiple sources. Numerous studies have shown that combining EHR data with information from third-party mortality databases, which aggregate data from sources like the Social Security Death Index, public records, and obituaries, can significantly improve the sensitivity and completeness of mortality capture.^2,6-8 Although this approach may not achieve the same level of completeness as the NDI, the advantages of reduced data latency and improved accessibility can provide a mortality end point better suited to the pace and volume of research needed for precision oncology.

In this study, we evaluated the performance of the composite mortality variable used within the Tempus real-world multimodal database. To validate this end point, we benchmarked its performance against the NDI in a large cohort of patients with advanced cancer who underwent Tempus NGS as part of their routine clinical care.

METHODS

Study Design

This study was a noninterventional retrospective analysis of mortality data from the Tempus real-world multimodal database. The objective was to validate our composite mortality variable by benchmarking it against death information from the NDI.

Data Sources

The Tempus multimodal database consists of deidentified longitudinal clinical and molecular data for patients with cancer who undergo NGS at Tempus. Tempus oncology NGS services roughly 65% of academic medical centers and several hundred community institutions in the United States.⁹ Clinical histories provided in furtherance of testing are principally obtained from EHRs or other systems. Deidentified records are supplemented with two distinct third-party deidentified data sets: (1) a standalone open administrative claims data set and (2) a mortality data set provided by Veritas Data Research. While the Veritas index serves as a dedicated mortality aggregator (compiling signals from sources including the Social Security Administration's Limited Access Death Master File, obituary notices, public cemetery records, state death registries, and medical claims), the standalone administrative claims data set provides a broader scope: it contains explicit mortality records sourced from its own external vendor alongside broader longitudinal claims activity signals (eg, hospital admission dates) used to determine vital status. Data from all three streams—Tempus EHR-derived data, Veritas mortality data, and third-party claims—were combined into the integrated clinical record (the final, multimodal patient profile). Within this record, the “Composite Mortality Variable” represents the harmonized synthesis of all available mortality signals. Details on how these data streams are combined are provided in the Data Supplement (Appendix Methods).

In this study, patient identifiers—including name, date of birth, sex, and, when available, Social Security Number—were submitted to the NDI via secure file transfer. The NDI returned potential matches, and true matches were selected based on the recommendations in the NDI User's Guide.¹⁰ Only matches meeting these criteria were considered true deaths; patients without a true death recorded in NDI were considered alive. After matching, NDI results were deidentified and joined to the corresponding Tempus record for the limited purposes of performing analysis in support of this study.

Cohort Selection

Patients were eligible for inclusion in this study if they had a primary diagnosis of non–small cell lung cancer (NSCLC), breast cancer (BC), ovarian cancer (OC), pancreatic cancer (PANC), colorectal cancer (CRC), or prostate cancer (PC) between January 1, 2016, and December 31, 2020 (Fig 1). These specific cancer types were chosen because they represent the predominant solid tumor populations within the Tempus multimodal database. The diagnosis period was selected to guarantee at least 2 years of mortality ascertainment through the NDI. At the time of NDI data submission, NDI records were available through December 31, 2022. Patients were 18 years or older at the time of cancer diagnosis. To enrich for death events, only patients with locally advanced, unresectable, or metastatic disease stages were included, defined by disease-specific staging criteria. Additional inclusion criteria required that patients have sufficient personally identifiable information available for linking with third-party data sets and for submission to the NDI.

FIG 1. — Cohort funnel. NDI, National Death Index; PII, personally identifiable information.

Initial patient eligibility was determined before submission to NDI. After receipt from NDI, patient eligibility was reconfirmed with refreshed data from the live database and patients rejected from the NDI search were excluded. The study data set was then frozen, and all analyses were performed using the frozen data set.

Vital Status Harmonization

All data sources in the integrated patient record were combined into a single high-confidence, internally consistent representation of vital status (ie, dead/alive). First, all dates on which a patient was known to be alive were extracted using a predefined list of fields representing active clinical interactions. These included confirmed encounters, therapeutic administrations (excluding unfulfilled orders), diagnostic measurements (eg, vitals, specimen collection), and physician assessments. For example, the administration of an infusion or an inpatient hospital admission indicates that a patient is alive, but an automated prescription fill in the EHR or the date a claim was submitted to a payer may actually take place after a patient's date of death and thus does not indicate aliveness. The latest of these alive dates was selected as each patient's last known alive date. Next, each death date available in the integrated clinical record was extracted and compared against the last known alive date. Any death date preceding the last known alive date was discarded. Remaining death dates were ranked by the reliability of their source to select up to one mortality date per patient.
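The harmonization steps described above can be illustrated with a minimal Python sketch. It is not the production logic: the record structure, the function name `harmonize`, and in particular the source ranking in `SOURCE_RANK` are hypothetical, since the actual reliability order is given only in the Data Supplement.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical reliability ranking (lower = more trusted).
# The true ordering used in the study is not stated in the main text.
SOURCE_RANK = {"ehr": 0, "third_party_mortality": 1, "third_party_claims": 2}


@dataclass
class DeathRecord:
    source: str
    death_date: date


def harmonize(
    alive_dates: List[date], death_records: List[DeathRecord]
) -> Tuple[Optional[date], Optional[date]]:
    """Return (last known alive date, selected death date or None)."""
    # Step 1: the latest qualifying clinical interaction is the last known alive date.
    last_alive = max(alive_dates) if alive_dates else None
    # Step 2: discard any death date that precedes the last known alive date.
    plausible = [
        r for r in death_records
        if last_alive is None or r.death_date >= last_alive
    ]
    if not plausible:
        return last_alive, None
    # Step 3: keep at most one death date, taken from the most reliable source.
    best = min(plausible, key=lambda r: SOURCE_RANK[r.source])
    return last_alive, best.death_date


example = harmonize(
    [date(2020, 1, 5), date(2020, 3, 10)],
    [
        DeathRecord("third_party_claims", date(2020, 2, 1)),     # precedes alive date
        DeathRecord("third_party_mortality", date(2020, 4, 2)),  # retained
    ],
)
```

In this toy example the claims-derived death date is discarded because an infusion-type interaction was recorded after it, and the remaining date from the mortality aggregator is selected.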

At the time of this study, the routine NDI data file included deaths through the end of 2022. To enable fair comparison against NDI, Tempus vital status data were truncated to match. Patients with a last known alive date or deceased date after December 31, 2022, were considered alive and their last known alive date was changed to December 31, 2022.
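As a small illustration of this truncation rule, the following Python sketch resets any patient whose last known alive date or death date falls after the NDI cutoff. The function name and the tuple representation are assumptions made for the example.

```python
from datetime import date
from typing import Optional, Tuple

NDI_CUTOFF = date(2022, 12, 31)  # last date covered by the NDI file used in the study


def truncate_to_cutoff(
    last_alive: Optional[date],
    death: Optional[date],
    cutoff: date = NDI_CUTOFF,
) -> Tuple[Optional[date], Optional[date]]:
    """Apply the cutoff: events after it are unobservable in the NDI comparison."""
    after_cutoff = (death is not None and death > cutoff) or (
        last_alive is not None and last_alive > cutoff
    )
    if after_cutoff:
        # Patient is treated as alive, with follow-up ending at the cutoff.
        return cutoff, None
    return last_alive, death
```

A patient with a recorded death in February 2023, for example, would be returned as alive with a last known alive date of December 31, 2022.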

Validation of Composite Mortality Variable

For each patient, vital status (alive or deceased) and date of death were compared between the Tempus data set and the NDI, with NDI considered as ground truth. Sensitivity was defined as the proportion of NDI-classified deaths that were also identified as deceased in the Tempus data set. Specificity was defined as the proportion of NDI-classified living patients who were not identified as deceased in the Tempus data set. Positive predictive value (PPV) and negative predictive value (NPV) were calculated as the proportions of Tempus-classified deaths and living patients, respectively, which were confirmed by the NDI. Exact 95% CIs for all performance metrics were calculated using the Clopper-Pearson method.
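The four metrics and their exact intervals can be computed from a 2×2 table as in the sketch below. It uses only the Python standard library and finds the Clopper-Pearson bounds by bisection on the binomial distribution; the function names and the toy counts are illustrative assumptions, not study data or the authors' code.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def _bisect(f, target: float, increasing: bool, iters: int = 60) -> float:
    """Solve f(p) = target on (0, 1) for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) CI for a binomial proportion k/n."""
    # Lower bound: P(X >= k | p) = alpha/2 (increasing in p); 0 when k == 0.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True
    )
    # Upper bound: P(X <= k | p) = alpha/2 (decreasing in p); 1 when k == n.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False
    )
    return lower, upper


def performance(tp: int, fn: int, fp: int, tn: int):
    """Point estimates and exact CIs, with the NDI as ground truth."""
    pairs = {
        "sensitivity": (tp, tp + fn),  # NDI deaths also flagged deceased
        "specificity": (tn, tn + fp),  # NDI-alive patients not flagged deceased
        "ppv": (tp, tp + fp),          # flagged deaths confirmed by the NDI
        "npv": (tn, tn + fn),          # flagged-alive patients with no NDI death
    }
    return {name: (k / n, clopper_pearson(k, n)) for name, (k, n) in pairs.items()}


# Toy counts for illustration only (not the study's 2x2 table).
toy = performance(tp=40, fn=10, fp=5, tn=45)
```

The bisection approach avoids a dependency on a statistics package; an implementation based on beta-distribution quantiles would give the same interval.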

Date agreement was assessed by calculating the proportion of Tempus death dates that matched NDI deaths with identical dates, as well as within ±15-day and ±30-day windows. Death dates in the Tempus database that fall within the allowed time window from the NDI death date were counted as an agreement. Death dates that do not fall within the time window were counted as a disagreement. Patients with a Tempus death date without an NDI death date were also counted as a disagreement.

While the majority of death dates are available from the source at year-month-day resolution, some death dates are available at only year-month resolution. Year-month dates are imputed to year-month-day resolution as part of Tempus's standard deidentification process. All NDI death dates are available at year-month-day resolution. Only Tempus death dates available at year-month-day resolution were included in exact day and ±15-day agreement metrics; ±30-day metrics included Tempus death dates at either year-month-day or year-month resolution.
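The date agreement rules, including the handling of date resolution, can be sketched in Python as follows. The tuple layout and the `"day"`/`"month"` resolution labels are assumptions made for illustration.

```python
from datetime import date
from typing import List, Optional, Tuple

# (Tempus death date, Tempus date resolution, NDI death date or None)
Pair = Tuple[date, str, Optional[date]]


def date_agreement(pairs: List[Pair], tolerance_days: int) -> Optional[float]:
    """Proportion of Tempus death dates within +/- tolerance_days of the NDI date."""
    # Exact and +/-15-day metrics use only year-month-day dates;
    # the +/-30-day metric also admits dates known only to year-month.
    allowed = {"day"} if tolerance_days < 30 else {"day", "month"}
    eligible = [(t, n) for t, res, n in pairs if res in allowed]
    if not eligible:
        return None
    agree = sum(
        1 for t, n in eligible
        # A Tempus death date with no NDI death date counts as a disagreement.
        if n is not None and abs((t - n).days) <= tolerance_days
    )
    return agree / len(eligible)


toy_pairs = [
    (date(2021, 3, 10), "day", date(2021, 3, 10)),   # exact match
    (date(2021, 5, 1), "day", date(2021, 5, 12)),    # 11 days apart
    (date(2021, 7, 15), "month", date(2021, 7, 2)),  # imputed day, 13 days apart
    (date(2021, 9, 1), "day", None),                 # no NDI death date
]
```

With these toy records, agreement is 1 of 3 at 0 days, 2 of 3 at ±15 days, and 3 of 4 at ±30 days, because the year-month record enters the denominator only for the ±30-day metric.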

The contribution of individual mortality data streams was assessed by calculating the metrics described above on individual mortality data streams (ie, EHR, third-party mortality, third-party claims), as well as on addition of each data stream (ie, EHR, EHR + third-party mortality, EHR + third-party mortality + third-party claims). Exploratory subgroup analyses were conducted by age at diagnosis, cancer type, geographic region, year of diagnosis, metastatic status, number of recorded lines of therapy, race, and sex. In addition, sensitivity and date agreement metrics were reported stratified by year of death according to NDI. Because this stratification focuses solely on patients with NDI-classified deaths, components necessary for calculating specificity, PPV, and NPV (ie, true negatives and false positives) are eliminated. Therefore, these metrics were not reported for this stratification variable.

The above end points assess the completeness of Tempus's capture of all deaths in the NDI, treating any missed death event as a false negative. However, in real-world OS analyses—such as those using Kaplan-Meier estimation or Cox proportional hazards models—mortality data are typically interpreted alongside the last known alive date, with patients classified as deceased, alive, or censored. To provide additional context on how Tempus data reflect patients without a recorded date of death, we conducted a naïve estimation of sensitivity and specificity at t = 6, 12, and 24 months using the cumulative case/dynamic control (CC/DC) approach.¹¹ For this analysis, patients without a recorded death date were censored at their calculated last known alive date (as defined in the Vital Status Harmonization section). This method restricts the analysis to patients with sufficient follow-up to be evaluable for survival at each time point by excluding those censored before time t.
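One way to read this naïve CC/DC calculation is sketched below in Python. Times are expressed in months from an index date; the choice of index date is not specified in the text, so it is an assumption here, as are the `Patient` structure and the function name `ccdc`.

```python
from typing import List, NamedTuple, Optional, Tuple


class Patient(NamedTuple):
    composite_death: Optional[float]  # months from index to composite death, if any
    censor_time: float                # months from index to last known alive date
    ndi_death: Optional[float]        # months from index to NDI death, if any


def ccdc(patients: List[Patient], t: float) -> Tuple[Optional[float], Optional[float]]:
    """Naive cumulative case/dynamic control sensitivity and specificity at time t."""
    tp = fn = tn = fp = 0
    for p in patients:
        # Patients with no recorded death who were censored before t are not evaluable.
        if p.composite_death is None and p.censor_time < t:
            continue
        flagged = p.composite_death is not None and p.composite_death <= t
        is_case = p.ndi_death is not None and p.ndi_death <= t  # cumulative case
        if is_case:
            tp += flagged
            fn += not flagged
        else:  # dynamic control: no NDI death by time t
            fp += flagged
            tn += not flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


toy_cohort = [
    Patient(4, 4, 4),          # death captured by both sources
    Patient(None, 3, 5),       # NDI death at 5 months, censored at 3: excluded at t = 6
    Patient(None, 10, 5),      # NDI death missed, still followed at t = 6
    Patient(None, 12, None),   # alive in both sources
    Patient(2, 2, None),       # composite death not confirmed by NDI
    Patient(None, 20, None),   # alive in both sources
]
result_6m = ccdc(toy_cohort, 6)
```

The second toy patient shows the intent of the approach: a death missed by the composite variable does not count as a false negative at 6 months when the patient was already censored at 3 months.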

Ethics Statement

Institutional Review Board (IRB) approval of the study protocol was obtained. This was a noninterventional study using routinely collected data, and informed consent was waived by the IRB (Advarra Pro00076789). This study was also reviewed and approved by staff from the NDI.

RESULTS

Study Population

A total of 17,597 patients with advanced or metastatic solid tumors (NSCLC, BC, OC, PANC, CRC, and PC) diagnosed between 2016 and 2020 were included in the evaluable cohort (Fig 1). All patients were 18 years and older at diagnosis and had sufficient identifiers for NDI matching and third-party data integration.

Primary End Point

When benchmarked against the NDI as the gold standard, the composite mortality data set demonstrated a sensitivity of 82%, indicating that 82% of deaths recorded in the NDI were also captured in the Tempus data set (Table 1). The specificity was 95%, reflecting a low rate of false-positive deaths assigned to patients not classified as deceased in the NDI. The PPV was 96%, and the NPV was 77%. In practical terms, this indicates that for every 100 deaths recorded in the composite data set, 96 were also present in the NDI; conversely, for every 100 patients without a recorded death in the composite data set, 77 had no corresponding death record in the NDI. Subgroup analyses are given in the Data Supplement (Appendix and Appendix Table S1).

TABLE 1.

Primary End Point Results

No. of Patients	Sensitivity, %	Specificity, %	PPV, %	NPV, %	Date Agreement (0-day tolerance), %	Date Agreement (15-day tolerance), %	Date Agreement (30-day tolerance), %
17,597	81.79 (81.05 to 82.52)	94.82 (94.26 to 95.33)	96.19 (95.78 to 96.57)	76.51 (75.59 to 77.42)	86.11 (85.37 to 86.82)	93.78 (93.26 to 94.27)	95.61 (95.17 to 96.02)

NOTE. 95% CIs are shown.

Abbreviations: NPV, negative predictive value; PPV, positive predictive value.

Date Agreement

Death dates available at full precision (YYYY-MM-DD) in Tempus data matched the NDI date exactly in 86% of cases. Date agreement increased to 94% when allowing for a ±15-day tolerance. For dates available at full precision (YYYY-MM-DD) or year-month precision (YYYY-MM), 96% matched the NDI within a ±30-day tolerance.

Combining Mortality Data Sets

Analysis of individual mortality data streams revealed that the composite approach substantially improved death capture (Table 2). Without the inclusion of third-party data sets, only 17% of NDI deaths would have been captured using Tempus EHR-derived data alone. The addition of third-party mortality data increased sensitivity to 80%, and further addition of third-party claims data increased the sensitivity to 82%. The specificity was 99% for Tempus EHR-derived data alone and decreased to 95% with the addition of mortality sources. The PPV was relatively stable with the addition of mortality sources, starting at 97% for Tempus EHR-derived data alone and decreasing to 96% for the Tempus Composite Mortality Variable. The NPV for Tempus EHR-derived data alone was 43% and increased to 77% with the inclusion of additional sources. For exact date agreement, we observed an increase from 80% with Tempus EHR-derived data alone to 86% with the composite variable. The ±15-day agreement similarly increased from 93% to 94%, and ±30-day agreement remained stable at 96%.

TABLE 2.

Performance Metrics by Mortality Data Stream(s)

Mortality Source	Sensitivity, %	Specificity, %	PPV, %	NPV, %	Date Agreement (0-day tolerance), %	Date Agreement (15-day tolerance), %	Date Agreement (30-day tolerance), %
EHR-derived (only)	16.77 (16.07 to 17.48)	99.10 (98.84 to 99.31)	96.75 (95.84 to 97.50)	42.69 (41.91 to 43.47)	79.77 (77.83 to 81.60)	93.01 (91.73 to 94.15)	95.57 (94.52 to 96.46)
Third-party mortality (only)	76.79 (75.99 to 77.59)	95.10 (94.56 to 95.60)	96.16 (95.73 to 96.55)	71.94 (71.00 to 72.87)	87.30 (86.58 to 87.99)	93.94 (93.41 to 94.43)	95.80 (95.36 to 96.21)
Third-party claims (only)	63.95 (63.04 to 64.86)	96.88 (96.44 to 97.29)	97.04 (96.62 to 97.42)	62.71 (61.77 to 63.63)			96.03 (95.55 to 96.47)
EHR-derived + third-party mortality	79.97 (79.21 to 80.72)	94.89 (94.34 to 95.40)	96.16 (95.74 to 96.54)	74.77 (73.84 to 75.69)	86.11 (85.37 to 86.82)	93.78 (93.26 to 94.27)	95.63 (95.19 to 96.05)
EHR-derived + third-party mortality + third-party claims	81.79 (81.05 to 82.52)	94.82 (94.26 to 95.33)	96.19 (95.78 to 96.57)	76.51 (75.59 to 77.42)	86.11 (85.37 to 86.82)	93.78 (93.26 to 94.27)	95.61 (95.17 to 96.02)

NOTE. 95% CIs are shown.

Abbreviations: EHR, electronic health record; NPV, negative predictive value; PPV, positive predictive value.

CC/DC

When excluding patients lost to follow-up before time t, the sensitivity was 95.96% at t = 6 months, 96.57% at t = 12 months, and 97.96% at t = 24 months. The specificity was 99.64% at 6 months, 99.29% at 12 months, and 98.25% at 24 months (Table 3).

TABLE 3.

CC and DC

Time Frame, Months	Sensitivity, %	Specificity, %
6	95.96 (94.96 to 96.82)	99.64 (99.53 to 99.73)
12	96.57 (95.88 to 97.16)	99.29 (99.13 to 99.43)
24	97.96 (97.56 to 98.31)	98.25 (97.96 to 98.50)

NOTE. 95% CIs are shown.

Abbreviations: CC, cumulative case; DC, dynamic control.

DISCUSSION

Real-world databases are valuable research tools for precision oncology, and the reliability of OS analyses is dependent on the quality of the mortality data captured in these databases. Here, we assessed the performance of a composite mortality variable by benchmarking it against the NDI. Our findings indicate that the composite mortality variable captures 82% of death events recorded in the NDI. We observed increased sensitivity when using the CC/DC approach, suggesting that patients with deaths not captured by this variable are censored appropriately rather than misclassified as alive. The high PPV observed indicates that when the composite mortality variable identifies a patient as deceased, this classification is highly reliable. Date agreement analyses showed strong concordance with NDI-reported dates of death, with an exact agreement of 86%. Agreement increased when allowing for ±15-day and ±30-day tolerance windows. Collectively, these results suggest that while the composite mortality variable may not capture every death event, the death events it does report are highly dependable, and patients with uncaptured death events are correctly categorized as lost to follow-up rather than misclassified as alive.

Consistent with previous research,^7,12 we observed that the addition of external mortality data sources dramatically improved sensitivity to levels suitable for outcome research. A substantial portion of this sensitivity improvement is attributed to the inclusion of third-party mortality data from Veritas, which aggregates information from various mortality sources, including government records, obituary notices, and public cemetery records. We also observed a modest additional increase in sensitivity with the inclusion of third-party claims data, although mortality data are not the primary rationale for integrating claims into the Tempus database. While the marginal gain in mortality sensitivity from third-party claims is limited because of overlap with the dedicated mortality aggregator, the claims data set serves a critical second purpose: supplying longitudinal activity signals (ie, known alive dates from administrative claims) used to confirm alive status.

Much of the variability observed across subgroups in our study aligns with patterns reported previously. For example, results from the study by Zhang et al¹² similarly show BC and CRC among the cancer types with the lowest sensitivities, NSCLC and PANC among the highest, and sensitivity increasing with patient age. Given these similarities, we hypothesize that this variability may be driven by differences in expected survival. Patients with longer survival require extended follow-up, which increases their chance of being lost to follow-up from the source EHR. Subsequently, locating these patients in supplementary external data sets becomes more challenging over time as the extended period increases the likelihood of personal data changes that complicate record linkage. This association extends to metastatic status, where premetastatic populations may experience longer survival and consequently lower sensitivity compared with those with metastatic disease. Therefore, reduced sensitivity in these groups may stem from this dual challenge of patient attrition from the EHR and diminished ability to match identifiers across external sources.

Expected survival also appears to influence specificity, a hypothesis supported by two patterns observed in both our study and the study by Zhang et al:¹² PANC had the lowest specificity, and specificity consistently declined with increasing patient age. While the NDI is the gold standard, it may not capture every death. In a cohort with a very high mortality rate like PANC, any true death missed by the NDI but captured by an RWD source would be misclassified as a false positive, thereby artificially lowering the reported specificity. Though not all false positives in our study are NDI errors, this effect could impose an upper limit on achievable specificity in shorter-lived populations.

While the expected survival hypothesis may explain many observed patterns, other factors likely contribute to the variability in specific subgroups. For instance, consistent with previous research,^6,12 sensitivity was lower among patients from several racial and ethnic minority groups, including Asian and Black or African American patients, compared with White patients. For some populations, there may be challenges in harmonizing patient names across different cultural and linguistic contexts, including variations in name structure (eg, compound surnames, multiple given names, or transposition of family and given names), which complicate patient identification and record linkage across data sets. Similarly, our observation of lower sensitivity in the Western United States may be influenced by other factors, including regional differences in end-of-life practices. Variations in where death occurs (eg, hospital v at home), customs for death notification, or the prevalence of cremation over burial could affect data capture. Ultimately, these specific variations underscore the need for continued refinement of methods for RWD collection and data linkage.

This study has several limitations. First, the primary objective was to validate the mortality end point specifically for use within the Tempus multimodal database. Therefore, the study population was intentionally limited to patients who received Tempus NGS testing. While this design is essential for the intended use case of enabling molecularly driven outcome analysis, it means that the cohort may not be fully representative of the general oncology population. Second, this study validated an all-cause mortality end point rather than cancer-specific mortality; therefore, the captured events include deaths from other causes. This scope is consistent with standard definitions of OS in oncology. Third, this study evaluated the ascertainment of death, not the ascertainment of aliveness. Consequently, the absence of a recorded event should not be used to infer that a patient is under active clinical management. Despite these limitations, this study successfully validates our composite mortality variable against the NDI gold standard for use in real-world outcome research.

In conclusion, this study validates our composite mortality variable as a robust and reliable end point for conducting real-world evidence analyses. The high accuracy of identified deaths and the appropriate censoring of patients lost to follow-up support its use in OS analyses. Validating the quality of these mortality data is a foundational step for the broader goal of the Tempus multimodal data set: enabling high-quality research to improve patient outcomes and advance cancer drug development.

DISCLAIMER

All authors are current or former employees of Tempus AI, Inc and may hold stock in the company.

PREPRINT VERSION

Preprint version available on medRxiv (https://doi.org/10.1101/2025.08.20.25334011).

DATA SHARING STATEMENT

Data used in this research were collected in a real-world health care setting and are subject to controlled access for privacy and proprietary reasons. Furthermore, access to and use of NDI data in this study are governed by a formal agreement signed as part of the NDI application process. This agreement stipulates that the data may not be published or released in any other form and are to be utilized exclusively for the objectives of this specific study.¹⁰

AUTHOR CONTRIBUTIONS

Conception and design: Joshuah Kapilivsky, Farahnaz Islam, Jessica Dow, Shannon Moran, Emilie Scherrer, Seung Won Hyun, Chithra Sangli

Administrative support: Joshuah Kapilivsky

Collection and assembly of data: Joshuah Kapilivsky, Emma K. Roth, Chithra Sangli

Data analysis and interpretation: Joshuah Kapilivsky, Emma K. Roth, Emilie Scherrer, Chithra Sangli

Manuscript writing: All authors

Final approval of manuscript: All authors

Accountable for all aspects of the work: All authors

AUTHORS’ DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians (Open Payments).

Joshuah Kapilivsky

Employment: Tempus

Stock and Other Ownership Interests: Tempus

Patents, Royalties, Other Intellectual Property: SYSTEMS AND METHODS FOR MULTI-LABEL CANCER CLASSIFICATION (Patent Application, Number 17/150,992)

Farahnaz Islam

Employment: Tempus AI, Inc

Stock and Other Ownership Interests: Tempus AI, Inc

Jessica Dow

Employment: Tempus AI

Stock and Other Ownership Interests: Tempus AI

Shannon Moran

Employment: Tempus AI

Emilie Scherrer

Employment: Tempus AI

Stock and Other Ownership Interests: Tempus AI, Pfizer, Merck

Seung Won Hyun

Employment: Tempus AI, Inc

Leadership: Tempus AI, Inc

Stock and Other Ownership Interests: Tempus AI, Inc

Chithra Sangli

Employment: Tempus AI

Stock and Other Ownership Interests: AbbVie/Abbott, Tempus, Novo Nordisk, Johnson & Johnson/Janssen

No other potential conflicts of interest were reported.

REFERENCES

1. Ter-Minassian M, Basra SS, Watson ES, et al: Validation of US CDC National Death Index mortality data, focusing on differences in race and ethnicity. BMJ Health Care Inform 30:e100737, 2023
2. Jamal-Allial A, Sponholtz T, Vojjala SK, et al: Validation of mortality data sources compared to the National Death Index in the healthcare integrated research database. Pragmat Obs Res 16:19-25, 2025
3. Kim MK, Rouphael C, McMichael J, et al: Challenges in and opportunities for electronic health record-based data analysis and interpretation. Gut Liver 18:201-208, 2024
4. Haneuse S, Arterburn D, Daniels MJ: Assessing missing data assumptions in EHR-based studies: A complex and underappreciated task. JAMA Netw Open 4:e210184, 2021
5. Liu P, Lan Z, Walker B, et al: Augmentation of real-world mortality data in the electronic medical record: Assessing the impact on overall survival estimates in multiple myeloma. Blood 142:7395, 2023 (suppl 1)
6. Lerman MH, Holmes B, St Hilaire D, et al: Validation of a mortality composite score in the real-world setting: Overcoming source-specific disparities and biases. JCO Clin Cancer Inform 10.1200/CCI.20.00143
7. Curtis MD, Griffith SD, Tucker M, et al: Development and validation of a high-quality composite real-world mortality endpoint. Health Serv Res 53:4460-4476, 2018
8. Shao P, Tepsick JG, Walker B, et al: Improving real-world mortality data quality in oncology research: Augmenting electronic medical records with obituary, social security death index, and commercial claims data. JCO Clin Cancer Inform 10.1200/CCI.23.00014
9. Tempus AI: 43rd Annual J.P. Morgan Healthcare Conference, 2025. https://investors.tempus.com/static-files/80e049bc-5dac-4b68-b699-984cd28763f2
10. Centers for Disease Control and Prevention: National Death Index: User’s Guide. https://www.cdc.gov/nchs/data/ndi/2024-NDI-User-Guide.pdf
11. Kamarudin AN, Cox T, Kolamunnage-Dona R: Time-dependent ROC curve analysis in medical research: Current methods and applications. BMC Med Res Methodol 17:53, 2017
12. Zhang Q, Gossai A, Monroe S, et al: Validation analysis of a composite real-world mortality endpoint for patients with cancer in the United States. Health Serv Res 56:1281-1287, 2021
