Abstract
Background:
Recent therapeutic advances and screening technologies have improved survival among patients with lung cancer, who are now at high risk of developing second primary lung cancer (SPLC). Recently, an SPLC risk-prediction model (called SPLC-RAT) was developed and validated using data from population-based epidemiological cohorts and clinical trials, but real-world validation has been lacking. We evaluate the predictive performance of SPLC-RAT in a hospital-based cohort of lung cancer survivors.
Methods:
We analyzed data from 8,448 ever-smoking patients diagnosed with initial primary lung cancer (IPLC) in 1997–2006 at Mayo Clinic, with each patient followed for SPLC through 2018. We assessed the predictive performance of SPLC-RAT and further explored the potential of improving SPLC detection through risk model-based surveillance using SPLC-RAT versus existing clinical surveillance guidelines.
Results:
Of 8,448 IPLC patients, 483 (5.7%) developed SPLC over 26,470 person-years. The application of SPLC-RAT showed high discrimination (AUC: 0.81). When the cohort was stratified by a 10-year risk threshold of ≥5.6% (i.e., 80th percentile from the SPLC-RAT development cohort), the observed SPLC incidence was significantly elevated in the high-risk versus low-risk subgroup (13.1% vs. 1.1%, P<1×10⁻⁶). The risk-based surveillance through SPLC-RAT (≥5.6% threshold) outperformed the NCCN guidelines with higher sensitivity (86.4% vs. 79.4%) and specificity (38.9% vs. 30.4%) and required 20% fewer CT follow-ups to detect one SPLC (162 vs. 202).
Conclusion:
In a large, hospital-based cohort, we validated the predictive performance of SPLC-RAT in identifying high-risk survivors of SPLC and showed its potential to improve SPLC detection through risk-based surveillance.
Keywords: Second Primary Neoplasms, Lung Cancer, Active Surveillance, Risk Assessment, Early Detection of Cancer
Plain Language Summary
Lung cancer survivors have a high risk of developing second primary lung cancer (SPLC). However, no evidence-based guidelines for SPLC surveillance are available for lung cancer survivors. Recently, an SPLC risk-prediction model was developed and validated using data from population-based epidemiological cohorts and clinical trials, but real-world validation has been lacking. Using a large, real-world cohort of lung cancer survivors, we showed the high predictive accuracy and risk-stratification ability of the SPLC risk-prediction model. Furthermore, we demonstrated the potential to enhance efficiency in detecting SPLC using risk model-based surveillance strategies compared to the existing consensus-based clinical guidelines, including the NCCN.
Precis
Given the rapidly growing number of lung cancer survivors who are now at high risk of developing second primary lung cancer (SPLC), previous studies have identified SPLC risk factors and developed SPLC risk-prediction models, but they lack insight into real-world validation to help improve clinical decision-making in SPLC surveillance for lung cancer survivors. Using a large, hospital-based real-world cohort of lung cancer survivors, we validated the predictive accuracy of an SPLC risk-prediction model (AUC of 0.81), which can identify high-risk lung cancer survivors for SPLC and can be incorporated into clinical decision-making for SPLC surveillance to improve the systematic management of lung cancer survivors.
Introduction
Lung cancer is the leading cause of cancer death.1 With advances in treatment and the adoption of low-dose computed tomography (CT) screening, the number of lung cancer survivors is expected to increase rapidly.2 Recent studies show that lung cancer survivors have a high risk of developing second primary lung cancer (SPLC), which is 3–4 times higher than the risk of a person in the general population developing initial primary lung cancer (IPLC).3,4 Moreover, over 80% of detected SPLC cases occur in asymptomatic individuals,5 and SPLC patients show significantly worse survival than those with a single primary lung cancer.6 Thus, evidence from prior studies suggests the need to develop efficient strategies to identify and manage survivors at high risk of SPLC.
Recently, we developed a risk-prediction model for SPLC among lung cancer patients using a large, ethnically diverse population from the Multiethnic Cohort Study (MEC).7 This risk-prediction tool, called the SPLC-Risk Assessment Tool (SPLC-RAT), incorporates several risk factors for SPLC,8–12 including smoking history, IPLC tumor characteristics, and treatment history (https://splc.shinyapps.io/SPLC-RAT/). This model was validated using data from large, population-based clinical trials, including the National Lung Screening Trial (NLST) and the Prostate, Lung, Colorectal, and Ovarian (PLCO) cancer screening trial (AUC of 72.7–78.8%).7 However, SPLC-RAT has not been validated using real-world data that include patient populations seen in actual clinical practice, although such validation is necessary to improve clinical decision-making for efficient and evidence-based SPLC surveillance.
The existing post-treatment surveillance guidelines, which are based on expert panels’ consensus opinion and include those of the National Comprehensive Cancer Network (NCCN)13 and the American Society of Clinical Oncology (ASCO),14 recommend CT every 2–6 months for the first 2–5 years and annual CT thereafter, depending on stage and treatment history, for IPLC patients. However, recent studies showed that several other factors, such as smoking history,7,8 are associated with ongoing and increased risk of SPLC. Despite their importance, these risk factors are not reflected in the existing guidelines, which raises questions about the efficiency of the current surveillance guidelines for identifying patients at high risk of SPLC to be screened or followed up.
In this study, we validate the predictive performance of SPLC-RAT using a large, hospital-based observational cohort of lung cancer survivors. We further explore the model’s potential of improving SPLC detection through risk-based surveillance by estimating various surveillance efficiency parameters in this cohort.
Methods
STUDY POPULATION AND STUDY OUTCOME
The study population consisted of 8,448 patients with a smoking history in the Mayo Clinic Epidemiology and Genetics of Lung Cancer Research Program (EGLC) who were diagnosed with IPLC between 1997 and 2006 and followed through 2018. The Mayo Clinic EGLC cohort comprised patients who sought care on the Minnesota campus and granted research authorization (>97%). Over 70% of the patients resided in the U.S. Midwest region; approximately 88% were White, 2% Native American, 1% each African American and Asian/Pacific Islander, and less than 1% Latino.
We excluded patients with missing smoking status (N=10), because smoking status is one of the key variables in predicting SPLC risk. The study outcome was defined as the time from IPLC diagnosis to SPLC, death, or loss to follow-up, whichever occurred first. SPLC cases were identified through electronic medical records from Mayo Clinic and outside medical facilities where patients received care. SPLC cases were confirmed using one of the following two methods: (i) manual verification using a radiologic report if a single malignant pulmonary nodule was either in a different lobe of the lung with a different histology from IPLC, or diagnosed more than 2 years after IPLC diagnosis with no indication of metastasis15,16 or (ii) biopsy (i.e., pathologically confirmed to be of a different histological cell type from that of the IPLC).17 The patient’s vital status was verified through various sources and was virtually 100% complete. For censored patients, the last date of follow-up was obtained from the most recent medical record or the last follow-up of a repeated annual questionnaire.
VALIDATION OF SPLC-RAT USING THE MAYO CLINIC DATA
Overview of SPLC-RAT.
SPLC-RAT7 was developed to predict a 10-year SPLC risk at the time of IPLC diagnosis by applying cause-specific Cox regression (CSC)18 to account for the competing risk of death among IPLC patients (Table S1). The model included the following factors: the histology and stage of IPLC, surgery for IPLC, prior history of cancer, smoking intensity (average number of cigarettes smoked per day during the time one smoked), and meeting the 2013 USPSTF eligibility criteria (i.e., aged 55–80 years, smoked ≥30 pack-years and ≤15 years since cessation). In SPLC-RAT, the risk of SPLC increased with a prior history of cancer, a large cell IPLC (vs. a squamous cell IPLC), and IPLC surgery. Those who met the 2013 USPSTF criteria at the time of IPLC diagnosis also had an increased risk of SPLC (Table S1). We used the model implementation that is available as an open-access application for public use (https://splc.shinyapps.io/SPLC-RAT/).
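The 2013 USPSTF eligibility indicator and the smoking-intensity predictor described above can be sketched as follows. This is an illustrative sketch only, not the SPLC-RAT implementation: the class and field names are hypothetical, and the conversion from pack-years to cigarettes per day assumes the conventional 20 cigarettes per pack.

```python
from dataclasses import dataclass


@dataclass
class SmokingHistory:
    """Hypothetical container; field names are illustrative, not from SPLC-RAT."""
    age_at_diagnosis: float   # years, at IPLC diagnosis
    pack_years: float         # cumulative smoking exposure
    years_since_quit: float   # 0 for current smokers


def meets_uspstf_2013(h: SmokingHistory) -> bool:
    """True if the 2013 USPSTF criteria quoted in the text are met:
    aged 55-80 years, >=30 pack-years, and <=15 years since cessation."""
    return (
        55 <= h.age_at_diagnosis <= 80
        and h.pack_years >= 30
        and h.years_since_quit <= 15
    )


def cigarettes_per_day(pack_years: float, years_smoked: float) -> float:
    """Average smoking intensity while smoking (assumes 20 cigarettes per pack)."""
    if years_smoked <= 0:
        raise ValueError("years_smoked must be positive")
    return 20.0 * pack_years / years_smoked
```

For instance, a 66-year-old with 51 pack-years who quit 8 years before diagnosis would be flagged as eligible, whereas a 50-year-old current smoker would not, regardless of pack-years.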
Validation of predictive accuracy of SPLC-RAT.
SPLC-RAT was applied to the patient-level dataset from the Mayo Clinic cohort, and the predictive performance was assessed by evaluating calibration, discrimination,19 and predictive accuracy.20 A calibration plot was used to assess calibration (i.e., the overall agreement between predicted and observed event probability). Discrimination (i.e., the ability to differentiate between those with and without SPLC) was measured using the area under the receiver operating characteristic curve (AUC), which is considered clinically useful if it is higher than 0.7. The Brier score (i.e., the expected quadratic error of prediction) was used to assess predictive accuracy, which reflects the combined performance of discrimination and calibration. As a sensitivity analysis, we evaluated the model performance among patients with early-stage IPLC, who are more likely to survive longer, and across different subgroups defined by age at IPLC diagnosis, smoking history, and IPLC histology.
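To make the two summary measures concrete, the sketch below computes an AUC and a Brier score for a binary outcome. It is a simplification under stated assumptions: it treats SPLC status as fully observed at the prediction horizon and ignores censoring and the competing risk of death, which a time-to-event evaluation such as the one in this study has to handle. The function names are illustrative.

```python
from typing import Sequence


def auc(risk: Sequence[float], event: Sequence[int]) -> float:
    """Probability that a randomly chosen case has a higher predicted risk
    than a randomly chosen non-case (ties count as 0.5)."""
    cases = [r for r, e in zip(risk, event) if e == 1]
    controls = [r for r, e in zip(risk, event) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def brier_score(risk: Sequence[float], event: Sequence[int]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    if len(risk) != len(event) or not risk:
        raise ValueError("risk and event must be non-empty and equally long")
    return sum((r - e) ** 2 for r, e in zip(risk, event)) / len(risk)


# Toy data (made up for illustration): predicted risks and observed outcomes.
toy_risk = [0.10, 0.40, 0.35, 0.80]
toy_event = [0, 0, 1, 1]
toy_auc = auc(toy_risk, toy_event)            # 3 of 4 case/non-case pairs are concordant
toy_brier = brier_score(toy_risk, toy_event)
```

A lower Brier score indicates better accuracy, whereas a higher AUC indicates better discrimination.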
Overall missing rates in the variables used in SPLC-RAT were 0.1–12.5% in the Mayo Clinic data (Table S2). We examined the missing data mechanism in these data and, on that basis, assumed that the data were Missing at Random (MAR) (Table S3).21 Hence, we performed multiple imputation and applied Rubin’s rules to obtain the pooled estimate of the predicted probabilities from the model in evaluating predictive performance.21 In addition, we conducted a sensitivity analysis with complete-case data.
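As a reminder of how Rubin’s rules combine results across imputed datasets, the sketch below pools a scalar estimate and its variance. It is a generic illustration, not the study’s code: the function name and the example inputs are hypothetical. For a point prediction such as a predicted probability, the pooled value is simply the mean across imputations.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               within_variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total variance), where
    total variance = W + (1 + 1/m) * B,
    W = mean within-imputation variance,
    B = between-imputation (sample) variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need >= 2 imputations and matching variances")
    pooled = mean(estimates)
    w = mean(within_variances)
    b = variance(estimates)  # sample variance, denominator m - 1
    return pooled, w + (1.0 + 1.0 / m) * b


# Hypothetical per-imputation estimates and variances (made up for illustration).
pooled_est, total_var = pool_rubin([0.80, 0.82, 0.81], [0.0001, 0.0001, 0.0001])
```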
Validation of risk stratification ability of SPLC-RAT.
We divided all IPLC patients into two groups with high versus low risk of developing SPLC based on a 10-year risk threshold of ≥5.6% (i.e., SPLC-RAT-5.6%) that was derived from the 80th percentile of the estimated risk in the development cohort (MEC). That is, an individual was deemed to be in the high-risk group if the estimated 10-year SPLC risk from SPLC-RAT was equal to or greater than 5.6%. We then evaluated the observed cumulative incidence of SPLC in the high- versus low-risk groups using the Aalen-Johansen estimator to account for the competing risk of death22,23 and tested the difference using the two-sided Gray’s method.24 For comparison, we also evaluated the risk stratification ability among patients eligible vs. ineligible for surveillance under the clinical surveillance criteria recommended by the NCCN and ASCO (Table S4; Supplementary Methods).
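The stratification rule and the cumulative-incidence estimate can be sketched as follows. This is a minimal illustration, assuming a hypothetical status coding (0 = censored, 1 = SPLC, 2 = death) and made-up toy data; it implements the Aalen-Johansen estimator for two competing event types but does not implement Gray’s test or confidence intervals.

```python
from typing import Sequence

RISK_THRESHOLD = 0.056  # 10-year SPLC risk threshold of 5.6% from the text


def is_high_risk(predicted_risk: float) -> bool:
    """High-risk if the predicted 10-year SPLC risk is >= 5.6%."""
    return predicted_risk >= RISK_THRESHOLD


def cumulative_incidence(times: Sequence[float], status: Sequence[int],
                         cause: int, horizon: float) -> float:
    """Aalen-Johansen cumulative incidence of `cause` by `horizon`.

    status: 0 = censored, any other integer = an event type (assumed coding).
    At each distinct time, the increment is
    (all-cause event-free survival just before t) * (events of `cause` / at risk).
    """
    data = sorted(zip(times, status))
    at_risk = len(data)
    surv = 1.0   # probability of being free of any event just before t
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_cause = d_any = leaving = 0
        while i < len(data) and data[i][0] == t:
            s = data[i][1]
            leaving += 1
            if s != 0:
                d_any += 1
                if s == cause:
                    d_cause += 1
            i += 1
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
        at_risk -= leaving
    return cif


# Toy follow-up data: years to event and event type.
toy_times = [1, 2, 3, 4]
toy_status = [1, 2, 1, 0]
toy_cif_splc = cumulative_incidence(toy_times, toy_status, cause=1, horizon=10)
toy_cif_death = cumulative_incidence(toy_times, toy_status, cause=2, horizon=10)
```

In practice, the estimator would be applied separately to the high-risk and low-risk groups defined by `is_high_risk`, and the two curves compared.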
SURVEILLANCE EFFICIENCY
To evaluate the potential of improving SPLC detection through risk-based surveillance using SPLC-RAT, we compared various surveillance efficiency metrics under the three eligibility criteria for SPLC surveillance proposed by (i) SPLC-RAT-5.6%, (ii) the NCCN, and (iii) the ASCO (Table S4) in the Mayo Clinic cohort. We considered a hypothetical surveillance program for SPLC, implemented over the five years after IPLC diagnosis. This program involved a CT follow-up every six months for five years, with a maximum of 10 rounds of CT follow-ups. We assumed that an SPLC that occurred within 6 months from each follow-up time was detectable by CT.
For SPLC-RAT-5.6%, an individual was eligible for a follow-up CT if the 10-year SPLC risk predicted by SPLC-RAT was greater than or equal to 5.6% (see Validation of risk stratification ability of SPLC-RAT). Eligibility for follow-up CTs under the NCCN and ASCO criteria was based on IPLC stage, treatment, and histology (Table S4).
We calculated the following surveillance efficiency metrics: sensitivity (the probability of successfully labeling an IPLC patient with SPLC as “eligible”, i.e., a true positive rate), specificity (the probability of successfully labeling an IPLC patient without SPLC as “non-eligible”, i.e., a true negative rate), and the number needed to surveil (NNS) to detect one SPLC (the total number of surveillance CT follow-ups divided by the number of surveillance-detected SPLC cases). We estimated the surveillance efficiency metrics for the three criteria (i.e., SPLC-RAT-5.6%, NCCN, and ASCO) at each CT follow-up after IPLC diagnosis, and then evaluated overall surveillance efficiency using cumulative estimates of these performance metrics summed across all follow-ups (up to 10 rounds) for five years from IPLC diagnosis. We conducted various sensitivity analyses to evaluate surveillance efficiency under varying settings, taking into account varied follow-up intervals and specific patient populations (see Supplementary Methods).
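The three metrics reduce to simple ratios of the four cumulative counts labeled (A) to (D) in Table 2. The sketch below applies those formulas to the counts reported in the SPLC-RAT-5.6% column of Table 2; the function and field names are illustrative. Because the tabulated counts are whole numbers, values recomputed from them need not match the reported percentages to the last digit.

```python
from typing import NamedTuple


class SurveillanceMetrics(NamedTuple):
    sensitivity_pct: float
    specificity_pct: float
    nns: float


def surveillance_metrics(a: int, b: int, c: int, d: int) -> SurveillanceMetrics:
    """Compute cumulative surveillance efficiency from the Table 2 counts.

    a: eligible, developed SPLC      b: eligible, no SPLC
    c: ineligible, developed SPLC    d: ineligible, no SPLC
    """
    if a <= 0:
        raise ValueError("NNS is undefined when no SPLC is detected")
    return SurveillanceMetrics(
        sensitivity_pct=100.0 * a / (a + c),   # A / (A + C) x 100
        specificity_pct=100.0 * d / (b + d),   # D / (B + D) x 100
        nns=(a + b) / a,                       # CT follow-ups per detected SPLC
    )


# Counts from the SPLC-RAT-5.6% column of Table 2.
splc_rat = surveillance_metrics(a=118, b=19051, c=18, d=12165)
```

Here `a + b` equals the total number of CT follow-ups among eligible patients, which is why the NNS can be read as the number of follow-up CTs performed per SPLC detected.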
ROLE OF THE FUNDING SOURCE
The study’s funder had no role in study design, data collection, data analysis, data interpretation, or report writing.
Results
Of 8,448 IPLC patients, 483 (5.7%) developed SPLC over 26,470 person-years (Table 1). The cumulative incidence of SPLC gradually increased over time with a 10-year cumulative incidence of 5.5% (95% confidence interval (CI) [4.9–5.9]) (Figure S1). The Mayo Clinic cohort comprised 51.5% active smokers at initial diagnosis and 62.5% patients with early-stage IPLC (Table 1).
Table 1.
Characteristics of ever-smoking patients diagnosed with IPLC in the Mayo Clinic cohort
Variables | Total (N, %) | Outcome: SPLC (N, %) | Outcome: All-cause death (N, %) | Outcome: Censored (N, %) |
---|---|---|---|---|
Total (N, Row %) | 8448 (100.0) | 483 (5.7) | 7444 (88.1) | 521 (6.2) |
Follow-up Time (Mean, SD)ᵃ | 3.1 (4.3) | 3.1 (3.7) | 2.6 (3.6) | 10.9 (6.3) |
DEMOGRAPHICS | ||||
Age at IPLC diagnosis | ||||
Mean (SD) | 66.4 (10.5) | 67.0 (8.9) | 66.9 (10.5) | 59.3 (10.6) |
Age group at IPLC diagnosis | ||||
<50 | 631 (7.5) | 19 (3.9) | 508 (6.8) | 104 (20.0) |
50–59 | 1545 (18.3) | 81 (16.8) | 1302 (17.5) | 162 (31.1) |
60–69 | 2843 (33.7) | 188 (38.9) | 2486 (33.4) | 169 (32.4) |
70–79 | 1554 (18.4) | 104 (21.5) | 1394 (18.7) | 56 (10.7) |
80+ | 1875 (22.2) | 91 (18.8) | 1754 (23.6) | 30 (5.8) |
Sex | ||||
Female | 3410 (40.4) | 240 (49.7) | 2894 (38.9) | 276 (53.0) |
Male | 5038 (59.6) | 243 (50.3) | 4550 (61.1) | 245 (47.0) |
Race/ethnicity | ||||
White | 7158 (84.7) | 446 (92.3) | 6297 (84.6) | 415 (79.7) |
African American | 78 (0.9) | 2 (0.4) | 70 (0.9) | 6 (1.2) |
Hispanic | 36 (0.4) | 3 (0.6) | 29 (0.4) | 4 (0.8) |
Asian | 26 (0.3) | 2 (0.4) | 19 (0.3) | 5 (1.0) |
Others | 245 (2.9) | 15 (3.1) | 192 (2.6) | 38 (7.3) |
Missing | 905 (10.7) | 15 (3.1) | 837 (11.2) | 53 (10.2) |
Education | ||||
High school or lower | 4316 (51.1) | 239 (49.5) | 3861 (51.9) | 216 (41.5) |
Some college/4-year college | 2013 (23.8) | 139 (28.8) | 1703 (22.9) | 171 (32.8) |
Postgraduate | 655 (7.8) | 46 (9.5) | 548 (7.4) | 61 (11.7) |
Missing | 1464 (17.3) | 59 (12.2) | 1332 (17.9) | 73 (14.0) |
INITIAL TUMOR CHARACTERISTICS | ||||
Stage of IPLC | ||||
Early (I-III or Limited) | 5278 (62.5) | 465 (96.3) | 4342 (58.3) | 471 (90.4) |
Advanced (IV or Extensive) | 3058 (36.2) | 16 (3.3) | 2995 (40.2) | 47 (9.0) |
Missing | 112 (1.3) | 2 (0.4) | 107 (1.4) | 3 (0.6) |
Histology of IPLC | ||||
Adenocarcinoma | 3566 (42.2) | 276 (57.1) | 3028 (40.7) | 262 (50.3) |
Large cell | 220 (2.6) | 17 (3.5) | 190 (2.6) | 13 (2.5) |
Squamous cell carcinoma | 2090 (24.7) | 133 (27.5) | 1846 (24.8) | 111 (21.3) |
NSCLC/NOS | 1119 (13.2) | 17 (3.5) | 1070 (14.4) | 32 (6.1) |
Small cell lung cancer | 1025 (12.1) | 22 (4.6) | 976 (13.1) | 27 (5.2) |
Other | 421 (5.0) | 17 (3.5) | 328 (4.4) | 76 (14.6) |
Missing | 7 (0.1) | 1 (0.2) | 6 (0.1) | 0 (0.0) |
Radiation for IPLC | ||||
No | 5369 (63.6) | 332 (68.7) | 4621 (62.1) | 416 (79.8) |
Yes | 2705 (32.0) | 151 (31.3) | 2460 (33.0) | 94 (18.0) |
Missing | 374 (4.4) | 0 (0.0) | 363 (4.9) | 11 (2.1) |
Surgery for IPLC | ||||
No | 4767 (56.4) | 46 (9.5) | 4620 (62.1) | 101 (19.4) |
Yes | 3307 (39.1) | 437 (90.5) | 2461 (33.1) | 409 (78.5) |
Missing | 374 (4.4) | 0 (0.0) | 363 (4.9) | 11 (2.1) |
Eligibility to USPSTF criteria at diagnosisᵇ | ||||
Not eligible | 3906 (46.2) | 197 (40.8) | 3388 (45.5) | 321 (61.6) |
Eligible | 3764 (44.6) | 270 (55.9) | 3313 (44.5) | 181 (34.7) |
Missing | 778 (9.2) | 16 (3.3) | 743 (10.0) | 19 (3.6) |
CLINICAL FACTORS | ||||
Prior history of cancer | ||||
No | 6213 (73.5) | 344 (71.2) | 5441 (73.1) | 428 (82.1) |
Yes | 1318 (15.6) | 126 (26.1) | 1123 (15.1) | 69 (13.2) |
Missing | 917 (10.9) | 13 (2.7) | 880 (11.8) | 24 (4.6) |
Family history of lung cancer | ||||
No | 7081 (83.8) | 393 (81.4) | 6241 (83.8) | 447 (85.8) |
Yes | 1367 (16.2) | 90 (18.6) | 1203 (16.2) | 74 (14.2) |
COPD | ||||
No | 5217 (61.8) | 252 (52.2) | 4603 (61.8) | 362 (69.5) |
Yes | 2638 (31.2) | 174 (36.0) | 2331 (31.3) | 133 (25.5) |
Missing | 593 (7.0) | 57 (11.8) | 510 (6.9) | 26 (5.0) |
BMI | ||||
Mean (SD) | 26.5 (5.4) | 26.6 (4.9) | 26.5 (5.4) | 27.2 (5.2) |
Missing | 1135 (13.4) | 19 (3.9) | 1066 (14.3) | 50 (9.6) |
SMOKING-RELATED FACTORS | ||||
Smoking status | ||||
Former | 4096 (48.5) | 261 (54.0) | 3558 (47.8) | 277 (53.2) |
Current/Active | 4352 (51.5) | 222 (46.0) | 3886 (52.2) | 244 (46.8) |
Smoking duration | ||||
Mean (SD) | 37.4 (13.2) | 39.4 (12.4) | 37.8 (13.1) | 30.5 (12.7) |
Missing | 1060 (12.5) | 23 (4.8) | 1001 (13.4) | 36 (6.9) |
Cigarettes per day (CPD) | ||||
Mean (SD) | 27.0 (13.0) | 25.4 (12.0) | 27.3 (13.1) | 24.2 (12.7) |
Missing | 984 (11.6) | 21 (4.3) | 927 (12.5) | 36 (6.9) |
Smoking pack-years | ||||
Mean (SD) | 51.4 (30.7) | 51.2 (29.0) | 52.3 (30.9) | 39.4 (26.5) |
Missing | 1031 (12.2) | 21 (4.3) | 978 (13.1) | 32 (6.1) |
Smoking quit yearsᶜ | ||||
Mean (SD) | 8.2 (11.9) | 8.8 (11.6) | 8.1 (11.9) | 8.9 (12.6) |
Abbreviations: BMI, body mass index; COPD, chronic obstructive pulmonary disease; CPD, cigarettes smoked per day; IPLC, initial primary lung cancer; NSCLC/NOS, non-small cell lung cancer/not otherwise specified; SPLC, second primary lung cancer; SD, standard deviation; USPSTF, United States Preventive Services Task Force.
ᵃ From time at initial diagnosis of lung cancer
ᵇ Eligibility to the USPSTF 2013 criteria for lung cancer screening (Age 55–80 years, ≥30 smoking pack-years and ≤15 quit years)
ᶜ Among current smokers (quit years=0)
When applied to the Mayo Clinic cohort, SPLC-RAT showed high discrimination (i.e., the ability to distinguish individuals with SPLC from those without SPLC), with an AUC of 0.80 (95% CI: 0.78–0.82), and high predictive accuracy (Brier score 4.4% [95% CI: 3.7–5.1]), with good calibration as shown in the calibration plot (Figure 1). The predictive performance was robust among early-stage IPLC patients who were likely to survive longer (Figure S2), across different subgroups (Figures S3–5), and when using complete-case data (Figure S6).
Figure 1. Validation of SPLC-RAT for predicting a 10-year SPLC risk among ever-smoking lung cancer patients in the Mayo Clinic cohort (N=8,448).
(A) Calibration plots with discriminative performance (area under the receiver operating characteristic curve; AUC) and prediction accuracy (Brier score). (B) Plots of mean difference between predicted and observed probability (calibration error) across risk deciles. All estimates were based on the pooled prediction from 10 multiply imputed datasets (N=8448×10) using Rubin’s rules.
Abbreviations: SPLC, second primary lung cancer; SPLC-RAT, second primary lung cancer-risk assessment tool.
When the study population was stratified by a 10-year risk threshold of ≥5.6%, the observed SPLC incidence was significantly elevated in the high-risk versus low-risk subgroup (13.1% vs. 1.1%, P<1×10⁻⁶) (Figure 2), and the difference was larger than that obtained when applying the existing surveillance criteria (Table S4) of the NCCN (7.6% vs. 1.1%, P<1×10⁻⁶) or the ASCO (8.5% vs. 0.9%, P<1×10⁻⁶). Example profiles of patients who developed SPLC and were classified as “high-risk” (i.e., eligible for surveillance) under SPLC-RAT-5.6% but were not eligible under the NCCN criteria are shown in Table S5; these patients tended to be heavy smokers with a high smoking intensity (cigarettes/day: range 25–60) at initial diagnosis who were diagnosed with early-stage IPLC and underwent surgery for IPLC. Similarly, we compared the distributions of the patient characteristics in the high-risk versus low-risk groups identified by SPLC-RAT-5.6% (Figures S7–9). We also evaluated the distribution of 10-year SPLC risk scores in various patient subgroups, capturing lung cancer patients at a higher risk of SPLC for surveillance (Figure S10).
Figure 2. Comparison of the risk stratification ability using SPLC-RAT-5.6% versus alternative clinical guidelines for SPLC surveillance by the National Comprehensive Cancer Network (NCCN) and the American Society of Clinical Oncology (ASCO).
The 8,448 ever-smoking lung cancer patients in the Mayo Clinic cohort were stratified into high-risk vs. low-risk groups using the 10-year SPLC risk threshold of ≥5.6% based on SPLC-RAT in (A), by the eligibility criteria of the NCCNᵃ follow-up guidelines for SPLC in (B), and by the eligibility criteria of the ASCOᵇ follow-up guidelines for SPLC in (C). In each subgroup, the observed cumulative incidence of SPLC was calculated using the Aalen-Johansen estimator and tested by Gray’s method to account for the competing risk of death. The 95% confidence interval for the observed cumulative incidence of SPLC is presented as shaded areas around the curve.
Note: ᵃ Eligible lung cancer patients in NCCN criteria: Patients with stage I-II NSCLC with surgery or radiotherapy, all stage III NSCLC, stage IV oligometastatic NSCLC, or all SCLC (Table S4).
ᵇ Eligible lung cancer patients in ASCO criteria: Patients with stage I-III NSCLC or stage I-III SCLC (Table S4).
Abbreviations: NCCN, National Comprehensive Cancer Network; SPLC, second primary lung cancer; SPLC-RAT, second primary lung cancer-risk assessment tool.
To evaluate the potential of improving SPLC detection by using SPLC-RAT-5.6%, we compared various surveillance efficiency metrics (i.e., sensitivity, specificity, and NNS) under the three eligibility criteria for SPLC surveillance proposed by SPLC-RAT-5.6%, the NCCN, and the ASCO (Table S4) in the Mayo Clinic cohort. To this end, we considered a hypothetical surveillance program for SPLC (see Methods), implemented over five years following IPLC diagnosis, assuming that an SPLC that occurred within six months of each follow-up time was detectable by CT (Tables S6–8). According to the cumulative estimates of the surveillance efficiency measures summed across all follow-ups for five years (Table 2), SPLC-RAT-5.6% had the highest sensitivity (i.e., true positive rate) at 86.4%, followed by the NCCN (79.4%) and the ASCO (63.7%). Similarly, SPLC-RAT-5.6% demonstrated the highest specificity (i.e., true negative rate) at 39.0%, followed by the NCCN (30.4%) and the ASCO (18.6%) (Table 2). The NNS to detect one SPLC was lowest using SPLC-RAT-5.6% (n=162), a 20–25% reduction compared to the NNS under the NCCN (n=202) and the ASCO (n=217), thus demonstrating the potential for increased efficiency in detecting SPLCs in the Mayo Clinic cohort (Table 2).
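The 20–25% reduction quoted above follows directly from the reported NNS values; a short check, with an illustrative helper name, is sketched here.

```python
def relative_reduction_pct(reference: float, new: float) -> float:
    """Percentage reduction of `new` relative to `reference`."""
    return 100.0 * (reference - new) / reference


# NNS values reported in the text: SPLC-RAT-5.6% = 162, NCCN = 202, ASCO = 217.
vs_nccn = relative_reduction_pct(202, 162)   # about 19.8%
vs_asco = relative_reduction_pct(217, 162)   # about 25.3%
```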
Table 2.
Comparison of the surveillance efficiency of risk-based strategy using SPLC-RAT-5.6% versus existing clinical criteria by the NCCN and the ASCO in the Mayo Clinic cohort
Surveillance efficiency metricsᵇ | Surveillance criteriaᵃ: SPLC-RAT-5.6% | Surveillance criteriaᵃ: NCCN | Surveillance criteriaᵃ: ASCO |
---|---|---|---|
Total number of CT follow-ups during five years after IPLC diagnosis | 19169 | 21838 | 18629 |
(A) Sum of N. surveillance-eligible patients who developed SPLC within the 6-month window from each follow-up during five years | 118 | 107 | 86 |
(B) Sum of N. surveillance-eligible patients without SPLC within the 6-month window from each follow-up during five years | 19051 | 21731 | 25410 |
(C) Sum of N. surveillance-ineligible patients who developed SPLC within the 6-month window from each follow-up during five years | 18 | 28 | 49 |
(D) Sum of N. surveillance-ineligible patients without SPLC within the 6-month window from each follow-up during five years | 12165 | 9485 | 5806 |
Sensitivityᶜ = A/(A+C) × 100, % | 86.41 | 79.41 | 63.70 |
Specificityᵈ = D/(B+D) × 100, % | 38.97 | 30.39 | 18.60 |
NNSᵉ to detect one SPLC = (A+B)/A, N | 162 | 202 | 217 |
Abbreviation: ASCO, American Society of Clinical Oncology; IPLC, initial primary lung cancer; NCCN, National Comprehensive Cancer Network; NNS, number needed to surveil to detect one SPLC; SPLC, second primary lung cancer.
ᵃ In SPLC-RAT-5.6%, IPLC patients with an estimated 10-year SPLC risk by SPLC-RAT greater than or equal to 5.6% are eligible for CT surveillance. The NCCN and ASCO eligibility criteria for surveillance are based on IPLC characteristics (stage and histology) and treatment (Table S4).
ᵇ Each indicator was calculated based on the total sum of numbers across the 10 CT follow-ups from 6 to 60 months. Surveillance efficiency indicators at each CT follow-up are shown in Tables S6–8.
ᶜ Sensitivity is the probability of successfully labeling an IPLC patient with SPLC as “high-risk” or “eligible” (i.e., a true positive rate).
ᵈ Specificity is the probability of successfully labeling an IPLC patient without SPLC as “low-risk” or “non-eligible” (i.e., a true negative rate).
ᵉ NNS to detect one SPLC is calculated as the total number of surveillance follow-ups divided by the number of surveillance-detected SPLC cases.
A sensitivity analysis that restricted the study population to non-small-cell lung cancer and small-cell lung cancer patients (N=8,020) showed consistent results, with higher sensitivity and specificity through SPLC-RAT-5.6% (Table S9). A sensitivity analysis under the modified SPLC-RAT-5.6% criteria (i.e., SPLC-RAT-5.6%/9.5%) showed that the overall performance of the modified criteria was similar to that under the original SPLC-RAT-5.6% (Table S10); specificity improved from 38.9% to 45.3% and the NNS decreased from n=162 to n=158, although sensitivity decreased slightly from 86.4% to 80.7% due to the decreased surveillance interval (see Supplementary Methods). A sensitivity analysis that assumed that stage-IV oligometastatic NSCLC patients were ineligible under the NCCN criteria showed findings that were overall consistent with those under the original NCCN guidelines, with slightly improved surveillance efficiency (Table S11).
Discussion
In this study, we validated the predictive performance of SPLC-RAT in a prospective hospital-based cohort. The application of SPLC-RAT, which incorporates various patient-level risk factors for SPLC, such as smoking history, treatment, and IPLC tumor characteristics, showed high discrimination (AUC of 0.81) as well as good calibration and risk stratification ability. We also showed that risk-based surveillance for SPLC through SPLC-RAT has great potential to improve the efficiency of detecting SPLC cases versus existing guidelines. In particular, the risk-based surveillance criteria for SPLC using the 10-year risk threshold of 5.6% through SPLC-RAT showed increased overall sensitivity and specificity compared to the NCCN and the ASCO guidelines.
Real-world data reflect actual clinical settings and diverse patient populations. Validation using data in electronic health records can help improve the model’s generalizability and applicability. Although the data in this study reflect clinical settings seen in practice, the patient cohort from Mayo Clinic was mostly White (>80%) and of affluent socioeconomic status. Nevertheless, SPLC-RAT, which was developed in an ethnically diverse population-based cohort from the MEC7 and previously validated in data from two population-based randomized trials,7 still showed a clinically useful level of predictive accuracy (AUC, 0.81) in the Mayo Clinic data.
To the best of our knowledge, this study is among the first to validate a prediction model for SPLC using large, hospital-based cohort data. We also evaluated the potential of improving SPLC detection through risk-based surveillance strategies using this large cohort. The risk-based surveillance criteria through SPLC-RAT-5.6% considered comprehensive risk factors for SPLC, such as smoking and medical history, and IPLC treatment. Thus, the risk-based criteria identified SPLC cases that the NCCN or ASCO guidelines might have overlooked, particularly in heavy smokers, those who underwent surgery, and those with early-stage IPLC who had a high chance of longer survival and a high risk of SPLC.7
Our study has limitations. The ascertainment of SPLC was manually verified based on medical records from Mayo Clinic and outside medical facilities, and we applied several criteria in an attempt to accurately distinguish SPLC from recurrence based on comprehensive histologic assessments15 and evaluation of pathologic reports.17 We further found that the characteristics of lung cancer patients who developed SPLC vs. recurrence differ by directly comparing the current study population (N=470 stage I-III IPLC patients with SPLC in Mayo Clinic) to a previous study population (N=445 stage I-III IPLC patients with recurrence) in the Mayo Clinic.25 For example, patients with SPLC had a much higher proportion of stage I-II (99% vs. 63%) and adenocarcinoma IPLCs (60.4% vs. 51.0%) than patients with recurrence. Thus, the potential for misclassification is low, but we cannot exclude the possibility. Therefore, we further employed a stricter definition of SPLC,25 defining SPLC as a new, distinct pulmonary malignancy diagnosed 4 years after the primary tumor (IPLC), and our main finding of higher surveillance efficiency using SPLC-RAT vs. the NCCN or ASCO guidelines remained consistent with the primary analysis (data not shown). The current SPLC-RAT is designed to predict the SPLC risk based on the data collected before or at the time of IPLC diagnosis and does not incorporate any changes in patient data (e.g., smoking cessation) after diagnosis to capture the changing SPLC risk over time. Therefore, we limited the time window for evaluating hypothetical risk-based surveillance to the first 5 years from IPLC diagnosis. Dynamic prediction modeling that incorporates longitudinal changes in patients’ data, which is currently underway, can help ensure the long-term performance of risk-based surveillance.
Future studies are needed to determine the optimal risk thresholds for SPLC surveillance under various SPLC surveillance scenarios at the population level.26,27 Finally, while genetic data were not available in this study, adding them to SPLC risk prediction would be of interest.
To conclude, we evaluated the predictive performance of SPLC-RAT for identifying high-risk survivors of SPLC in a large, hospital-based cohort. The present study shows that a comprehensive risk-prediction modeling approach for SPLC holds great potential to improve efficiency in managing lung cancer survivors.
Supplementary Material
Funding:
This study is supported by a grant from the National Institutes of Health (1R37CA226081). This work was supported by the Stanford Cancer Institute, an NCI-designated Comprehensive Cancer Center. The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.
Footnotes
Data Sharing and Data Availability: Data generated by the authors or analyzed during the study were provided by the Mayo Clinic Epidemiology and Genetics of Lung Cancer (EGLC) Research Program under a data use agreement. Researchers interested in the Mayo Clinic EGLC data may submit an inquiry online: https://www.mayo.edu/research/centers-programs/epidemiology-genetics-lung-cancer-program/contact
Declaration of interests
Dr. Han and her team developed the SPLC risk assessment tool (SPLC-RAT). The model is available free to noncommercial users. Dr. Tammemägi is a consultant for Johnson & Johnson/Janssen, Medial EarlySign, Nucleix, bioAffinity Technologies, and AstraZeneca. Dr. Backhus reports personal fees from Johnson & Johnson, outside the submitted work. Ms. Su reports participation in a summer internship at ZS Associates, Inc. in 2021. Dr. Neal reports grants from Genentech/Roche, grants from Merck, grants from Boehringer Ingelheim, grants from Exelixis, grants from Nektar Therapeutics, grants from Takeda Pharmaceuticals, grants from Adaptimmune, grants from GSK, grants from Janssen, grants from AbbVie, other from CME Matters, other from Clinical Care Options, other from Research to Practice, other from Medscape, other from Biomedical Learning Institute, other from MLI Peerview, other from Prime Oncology, other from Projects in Knowledge, other from Rockpointe, other from MJH Life Science, other from AstraZeneca, other from Genentech/Roche, other from Exelixis, other from Jounce Therapeutics, other from Takeda Pharmaceuticals, other from Eli Lilly and Company, other from Calithera Biosciences, other from Amgen, other from Iovance Biotherapeutics, other from Blueprint Pharmaceuticals, other from Regeneron Pharmaceuticals, other from Natera, outside the submitted work. Dr. Wakelee reports grants from ACEA Biosciences, grants from Arrys Therapeutics, grants from BMS, grants from Celgene, grants from Clovis Oncology, grants from Exelixis, grants from Genentech/Roche, grants from Gilead, grants from Merck, grants from Novartis, grants from Pharmacyclics, grants from Sea Gen, grants from Xcovery, other from AstraZeneca, other from Xcovery, other from Janssen, other from Daiichi Sankyo, other from Blueprint, other from Mirati, other from Helsinn, other from Merck - Not compensated, other from Genentech/Roche - Not compensated, other from IASLC, other from ECOG-ACRIN, outside the submitted work. All remaining authors report no other disclosures.
References
1. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71(3):209–249.
2. Miller KD, Nogueira L, Mariotto AB, et al. Cancer treatment and survivorship statistics, 2019. CA Cancer J Clin 2019;69(5):363–385.
3. Thakur MK, Ruterbusch JJ, Schwartz AG, Gadgeel SM, Beebe-Dimmer JL, Wozniak AJ. Risk of Second Lung Cancer in Patients with Previously Treated Lung Cancer: Analysis of Surveillance, Epidemiology, and End Results (SEER) Data. J Thorac Oncol 2018;13(1):46–53.
4. Surapaneni R, Singh P, Rajagopalan K, Hageboutros A. Stage I lung cancer survivorship: risk of second malignancies and need for individualized care plan. J Thorac Oncol 2012;7(8):1252–1256.
5. Johnson BE. Second lung cancers in patients after treatment for an initial lung cancer. J Natl Cancer Inst 1998;90(18):1335–1345.
6. Choi E, Luo SJ, Aredo JV, et al. The Survival Impact of Second Primary Lung Cancer in Patients with Lung Cancer. J Natl Cancer Inst 2021.
7. Choi E, Sanyal N, Ding VY, et al. Development and Validation of a Risk Prediction Tool for Second Primary Lung Cancer. J Natl Cancer Inst 2021.
8. Aredo JV, Luo SJ, Gardner RM, et al. Tobacco Smoking and Risk of Second Primary Lung Cancer. J Thorac Oncol 2021;16(6):968–979.
9. Luo SJ, Choi E, Aredo JV, et al. Smoking Cessation After Lung Cancer Diagnosis and the Risk of Second Primary Lung Cancer: The Multiethnic Cohort Study. JNCI Cancer Spectrum 2021.
10. Han SS, Rivera GA, Tammemägi MC, et al. Risk Stratification for Second Primary Lung Cancer. J Clin Oncol 2017;35(25):2893–2899.
11. Fisher A, Kim S, Farhat D, et al. Risk Factors Associated with a Second Primary Lung Cancer in Patients with an Initial Primary Lung Cancer. Clin Lung Cancer 2021.
12. Leroy T, Monnet E, Guerzider S, et al. Let us not underestimate the long-term risk of SPLC after surgical resection of NSCLC. Lung Cancer 2019;137:23–30.
13. National Comprehensive Cancer Network: Treatment by Cancer Type, 2022. https://www.nccn.org/guidelines/category_1. Accessed.
14. Schneider BJ, Ismaila N, Aerts J, et al. Lung Cancer Surveillance After Definitive Curative-Intent Therapy: ASCO Guideline. J Clin Oncol 2020;38(7):753–766.
15. Detterbeck FC, Franklin WA, Nicholson AG, et al. The IASLC Lung Cancer Staging Project: Background Data and Proposed Criteria to Distinguish Separate Primary Lung Cancers from Metastatic Foci in Patients with Two Lung Tumors in the Forthcoming Eighth Edition of the TNM Classification for Lung Cancer. J Thorac Oncol 2016;11(5):651–665.
16. Martini N, Melamed MR. Multiple primary lung cancers. J Thorac Cardiovasc Surg 1975;70(4):606–612.
17. Wittekind C, Meyer H. TNM classification of malignant tumours, 7th ed. New York: Wiley-Liss; 2010.
18. Prentice RL, Kalbfleisch JD, Peterson AV Jr., Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics 1978;34(4):541–554.
19. Blanche P, Dartigues JF, Jacqmin-Gadda H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Stat Med 2013;32(30):5381–5397.
20. Gerds TA, Andersen PK, Kattan MW. Calibration plots for risk prediction models in the presence of competing risks. Stat Med 2014;33(18):3191–3203.
21. Steyerberg EW. Clinical prediction models. Springer; 2019.
22. Aalen OO, Johansen S. An Empirical Transition Matrix for Non-Homogeneous Markov Chains Based on Censored Observations. Scandinavian Journal of Statistics 1978;5(3):141–150.
23. Allignol A, Schumacher M, Beyersmann J. A note on variance estimation of the Aalen-Johansen estimator of the cumulative incidence function in competing risks, with a view towards left-truncated data. Biom J 2010;52(1):126–137.
24. Gray RJ. A Class of K-Sample Tests for Comparing the Cumulative Incidence of a Competing Risk. The Annals of Statistics 1988;16(3):1141–1154, 1114.
25. Williams BA, Sugimura H, Endo C, et al. Predicting postrecurrence survival among completely resected nonsmall-cell lung cancer patients. Ann Thorac Surg 2006;81(3):1021–1027.
26. de Koning HJ, Meza R, Plevritis SK, et al. Benefits and harms of computed tomography lung cancer screening strategies: a comparative modeling study for the U.S. Preventive Services Task Force. Ann Intern Med 2014;160(5):311–320.
27. Meza R, Jeon J, Toumazis I, et al. Evaluation of the Benefits and Harms of Lung Cancer Screening With Low-Dose Computed Tomography: Modeling Study for the US Preventive Services Task Force. JAMA 2021;325(10):988–997.