JNCI Journal of the National Cancer Institute. 2022 Sep 9;114(12):1665–1673. doi: 10.1093/jnci/djac176

Lung Cancer Absolute Risk Models for Mortality in an Asian Population using the China Kadoorie Biobank

Matthew T Warkentin 1,2, Martin C Tammemägi 3, Osvaldo Espin-Garcia 4,5, Sanjeev Budhathoki 6, Geoffrey Liu 7,8, Rayjean J Hung 9,10
PMCID: PMC9949588  PMID: 36083018

Abstract

Background

Lung cancer is the leading cause of cancer mortality globally. Early detection through risk-based screening can markedly improve prognosis. However, most risk models were developed in North American cohorts of smokers, whereas less is known about risk profiles for never-smokers, which represent a growing proportion of lung cancers, particularly in Asian populations.

Methods

Based on the China Kadoorie Biobank, a population-based prospective cohort of 512 639 adults with up to 12 years of follow-up, we built Asian Lung Cancer Absolute Risk Models (ALARM) for lung cancer mortality using flexible parametric survival models, separately for never and ever-smokers, accounting for competing risks of mortality. Model performance was evaluated in a 25% hold-out test set using the time-dependent area under the curve and by comparing model-predicted and observed risks for calibration.

Results

Predictors assessed in the never-smoker lung cancer mortality model were demographics, body mass index, lung function, history of emphysema or bronchitis, personal or family history of cancer, passive smoking, and indoor air pollution. The ever-smoker model additionally assessed smoking history. The 5-year areas under the curve in the test set were 0.77 (95% confidence interval = 0.73 to 0.80) and 0.81 (95% confidence interval = 0.79 to 0.84) for ALARM-never-smokers and ALARM-ever smokers, respectively. The maximum 5-year risk for never and ever-smokers was 2.6% and 12.7%, respectively.

Conclusions

This study is among the first to develop risk models specifically for Asian populations separately for never and ever-smokers. Our models accurately identify Asians at high risk of lung cancer death and may identify those with risks exceeding common eligibility thresholds who may benefit from screening.


Lung cancer is the leading cause of cancer mortality globally. In 2020, there were an estimated 2.2 million incident lung cancers and 1.8 million deaths due to lung cancer (1). The 5-year survival proportion for lung cancer patients remains poor at only 10%-20%, though this varies between countries (1). However, several large, randomized trials in populations of predominantly European ancestry have demonstrated a statistically significant reduction in lung cancer mortality with low-dose computed tomography screening (2-5). Identifying high-risk individuals who could benefit from lung cancer screening remains an important public health priority for reducing lung cancer mortality.

The United States Preventive Services Task Force (USPSTF) recommendations were recently updated and suggest screening adults aged 50 to 80 years with 20 or more smoking pack-years and who have quit smoking within 15 years (6). Using these criteria would fail to screen any light, long-term former, or never-smokers, which represent a growing proportion of all lung cancer diagnoses, particularly in Asian populations. The proportion of lung cancers among never-smokers varies geographically, with approximately 15% of lung cancers occurring among never-smokers in North America and as much as 30%-40% in Asian countries (7).

Absolute risk models have been shown to be superior in identifying high-risk individuals for lung cancer screening compared with the USPSTF criteria; however, these models were primarily developed in North American cohorts of ever-smokers. The widely used PLCOm2012 risk model (8) was developed in a smoker cohort in the United States, which may not be generalizable to Asian populations due to potential differences in risk factors and baseline risks. Less is known about the risk profiles for never-smokers in Asian populations, where never-smoker lung cancer is more common than for other racial groups. Although approximately 54% of worldwide lung cancers occurred in Asian countries (1), currently there is no validated risk prediction model specifically for Asian populations.

The goal of this study was to develop and evaluate a lung cancer mortality risk prediction model, specifically for Asian populations, with separate models developed for ever- and never-smokers.

Methods

Study Participants

This study used data from the China Kadoorie Biobank (CKB), which was previously described in detail (9). In brief, the CKB is a population-based prospective cohort of 512 639 adults aged 30 to 79 years recruited between 2004 and 2008 from 10 geographical regions in China, with 5.1 million person-years of follow-up and detailed collections of epidemiologic data. We excluded any participants with a self-reported personal history of lung cancer within 5 years of baseline. Mortality status was collected through record linkage to mortality registries with up to 12 years of follow-up. Lung cancer death was the primary outcome in this study. Death from all other causes (hereafter referred to as “other cause mortality”) was considered as a competing risk of mortality. Central ethics approval for the CKB was obtained from Oxford University and the China National Centres for Disease Control, and local ethics approval was obtained at each recruitment site. All participants provided written informed consent. This project was approved by the Research Ethics Board at Mount Sinai Hospital.

Statistical Analysis

Absolute Risk Models

In the competing risks setting, the probability of death from lung cancer occurring within a defined time period is estimated as a function of all relevant cause-specific hazards (10,11). To estimate the probability of lung cancer mortality in the presence of the competing risk of other-cause mortality, we separately modeled the cause-specific hazards for lung cancer death and death from all other causes. We estimated the 5-year probability of lung cancer mortality using the cumulative incidence function, which accounts for both hazard functions, conditional on a set of risk factors (details in Supplementary Methods, available online).
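As a rough numerical illustration of this calculation (a sketch under stated assumptions, not the authors' code, which was written in R), the cumulative incidence of lung cancer death by a given time can be approximated by integrating the lung cancer hazard multiplied by the probability of having survived both causes up to each time point. The function name and the constant hazards below are hypothetical and were chosen only so that the result can be checked against a closed form.

```python
import math

def cumulative_incidence(h_lung, h_other, horizon, steps=5000):
    """Approximate F_lung(horizon) = integral over [0, horizon] of h_lung(u) * S(u) du,
    where S(u) = exp(-(H_lung(u) + H_other(u))) is survival from both causes."""
    dt = horizon / steps
    cum_hazard = 0.0  # H_lung + H_other accumulated up to the start of the interval
    cif = 0.0
    for i in range(steps):
        u = (i + 0.5) * dt  # midpoint of the i-th interval
        total_hazard = h_lung(u) + h_other(u)
        surv_mid = math.exp(-(cum_hazard + 0.5 * total_hazard * dt))
        cif += h_lung(u) * surv_mid * dt
        cum_hazard += total_hazard * dt
    return cif

# Hypothetical constant hazards per year (illustrative values, not estimates from this study).
lung_rate, other_rate = 0.002, 0.010
risk_5y = cumulative_incidence(lambda u: lung_rate, lambda u: other_rate, horizon=5.0)

# Closed form for constant hazards, used only as a sanity check on the integration.
total_rate = lung_rate + other_rate
closed_form = lung_rate / total_rate * (1.0 - math.exp(-total_rate * 5.0))
```

In practice the two hazard functions would come from the fitted cause-specific models evaluated at a participant's covariate values.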

We modeled each cause-specific hazard using a flexible parametric survival model (ie, Royston-Parmar model) (12) on the cumulative hazard scale to estimate baseline hazards and cause-specific hazard ratios for predictor effects. Lung cancer mortality models were fit separately for never- and ever-smokers using complete-case analysis. We fit models on the time-since-entry timescale using restricted cubic splines with internal knots placed at quantiles of the log uncensored cause-specific event-time distributions.
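For readers unfamiliar with this model class, the sketch below shows the restricted cubic spline basis in the parameterization commonly attributed to Royston and Parmar, in which the log cumulative hazard is a spline function of log time plus a linear predictor. It is an illustration only: the knot positions, coefficient layout, and function names are assumptions made for this example, not values or code from the study.

```python
import math

def rcs_basis(x, knots):
    """Restricted cubic spline basis evaluated at x (here x = ln t).

    Returns [x, v_1(x), ..., v_m(x)] for boundary knots knots[0] and knots[-1]
    and m interior knots. Each v_j is zero below the first knot, cubic between
    knots, and linear beyond the last knot."""
    k_min, k_max = knots[0], knots[-1]
    pos3 = lambda z: max(z, 0.0) ** 3
    basis = [x]
    for k_j in knots[1:-1]:
        lam = (k_max - k_j) / (k_max - k_min)
        basis.append(pos3(x - k_j) - lam * pos3(x - k_min) - (1.0 - lam) * pos3(x - k_max))
    return basis

def log_cumulative_hazard(t, gammas, knots, linear_predictor=0.0):
    """ln H(t | x) = gamma_0 + sum_j gamma_j * basis_j(ln t) + x'beta."""
    basis = rcs_basis(math.log(t), knots)
    return gammas[0] + sum(g * b for g, b in zip(gammas[1:], basis)) + linear_predictor
```

With no interior knots the spline reduces to a straight line in log time, which corresponds to a Weibull model; interior knots add flexibility to the baseline hazard.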

Asian Lung Cancer Absolute Risk Models (ALARM) were built separately for never- (ALARM-NS) and ever-smokers (ALARM-ES). We used stratified random sampling to split the data into training and testing sets, maintaining similar proportions of the outcome, separately for never- and ever-smokers. Models were fitted in the training (75%) data, and the hold-out testing (25%) data were used to estimate out-of-sample performance for internal model validation.
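A stratified 75%/25% split of this kind can be sketched as follows. This is a minimal illustration that assumes the outcome is stored as an integer code per participant; the coding, seed, and function name are hypothetical, and the split would be run once for never-smokers and once for ever-smokers.

```python
import random

def stratified_split(outcomes, test_fraction=0.25, seed=2022):
    """Return (train_idx, test_idx), sampling test_fraction within each outcome level."""
    rng = random.Random(seed)
    by_level = {}
    for idx, y in enumerate(outcomes):
        by_level.setdefault(y, []).append(idx)
    train_idx, test_idx = [], []
    for level in sorted(by_level):
        members = by_level[level]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy data with a hypothetical coding: 0 = alive or censored,
# 1 = lung cancer death, 2 = death from another cause.
outcomes = [0] * 900 + [1] * 20 + [2] * 80
train_idx, test_idx = stratified_split(outcomes)
```

Sampling within each outcome level keeps the proportion of lung cancer deaths nearly identical in the training and testing sets, which matters when the outcome is rare.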

For the lung cancer mortality models, candidate predictors were selected based on their known associations with lung cancer or by improving model performance. ALARM-NS included age, sex, body mass index (BMI), family history of cancer, personal cancer history, lung function (forced expiratory volume in 1 second [FEV1]/forced vital capacity [FVC]), personal history of emphysema or bronchitis, and household income. ALARM-ES included these variables with the addition of smoking status, smoking duration, and smoking intensity. Cooking fuel exposure was computed as a composite of cooking fuel type and duration of usage in years. Numeric variables were evaluated for potential nonlinear relationships and time-dependent effects. Age was included in the final models as the natural logarithm, and BMI was included as a quadratic term to capture the potential nonlinear relationship with lung cancer mortality (13,14). The other-cause-mortality model included age, sex, and smoking status (never, former, current) as covariates. We report cause-specific hazard ratios (HR) and 95% confidence intervals (CI). All statistical analyses were performed using R version 4.0.5 (15).

Model Performance

The performance of the prediction models was assessed in 2 complementary ways: 1) how closely model-predicted risks corresponded to observed risks (calibration), and 2) the ability of the model to assign higher predicted risks to those who died of lung cancer or died at an earlier time (discrimination). Model calibration was assessed by comparing the observed 5-year risks with model-predicted (expected) 5-year risks. Graphical assessment was performed by comparing risks within binned expected risk groups. Observed risks were estimated based on the nonparametric Aalen-Johansen estimator for the cumulative incidence function (16). Model-predicted risks were estimated based on the absolute risk models described above. The ratio of expected to observed deaths and difference in expected and observed deaths per 100 000 are reported for the never- and ever-smoker models. Discrimination was measured by the area under the time-dependent receiver operating characteristic curve (AUC) for competing risks (17,18). We constructed 95% confidence intervals for the AUC and calibration metrics using a percentile-based approach based on 1000 bootstrap resamples. Calibration and discrimination metrics are reported for the hold-out test data only.
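The observed risks used for calibration rely on the Aalen-Johansen estimator. The sketch below is a plain-Python illustration of that estimator for a single horizon, assuming each participant has a follow-up time and an integer cause code (0 for censored); the function name and coding are hypothetical, and a production analysis would use an established survival library.

```python
def aalen_johansen(times, causes, cause_of_interest, horizon):
    """Nonparametric cumulative incidence of `cause_of_interest` by `horizon`.

    times  : follow-up time for each participant
    causes : 0 = censored, otherwise an integer code for the cause of death
    """
    at_risk = len(times)
    surv = 1.0  # all-cause survival just before the current event time
    cif = 0.0
    for t in sorted(set(times)):
        if t > horizon:
            break
        at_t = [c for u, c in zip(times, causes) if u == t]
        d_interest = sum(1 for c in at_t if c == cause_of_interest)
        d_any = sum(1 for c in at_t if c != 0)
        cif += surv * d_interest / at_risk  # S(t-) times the cause-specific hazard increment
        surv *= 1.0 - d_any / at_risk       # update all-cause survival
        at_risk -= len(at_t)                # deaths and censored participants leave the risk set
    return cif
```

When nobody is censored before the horizon, the estimate reduces to the simple proportion of participants who died of the cause of interest by that time.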

Results

Demographic characteristics for the CKB cohort by vital status and by smoking status are presented in Table 1 and Supplementary Table 1 (available online), respectively. In general, those who died from lung cancer were older at baseline (62.2 years vs 52.0 years), more likely to be male (64% vs 41%), and more likely to be current or former smokers with an extensive smoking history compared with those still alive.

Table 1.

Distribution of demographic characteristics for the China Kadoorie Biobank based on mortality status

Characteristic | Alive (n = 480 115) | Lung cancer mortality (n = 2754) | All-cause mortality^a (n = 29 770) | Total (n = 512 639)
Age, y
 Mean (SD) | 51.3 (10.4) | 62.2 (9.1) | 62.4 (9.8) | 52.0 (10.7)
Length of follow-up, y
 Mean (SD) | 10.3 (1.1) | 4.9 (2.3) | 4.8 (2.4) | 9.9 (1.8)
Sex, No. (%)
 Female | 288 874 (60) | 991 (36) | 12 632 (42) | 302 497 (59)
 Male | 191 241 (40) | 1763 (64) | 17 138 (58) | 210 142 (41)
Family history of cancer, No. (%)
 No | 397 851 (83) | 2253 (82) | 25 228 (85) | 425 332 (83)
 Yes | 82 264 (17) | 501 (18) | 4542 (15) | 87 307 (17)
Personal cancer history, No. (%)
 No | 478 187 (>99) | 2724 (99) | 29 225 (98) | 510 136 (>99)
 Yes | 1928 (<1) | 30 (1) | 545 (2) | 2503 (<1)
Emphysema or bronchitis, No. (%)
 No | 469 102 (98) | 2553 (93) | 27 696 (93) | 499 351 (97)
 Yes | 11 013 (2) | 201 (7) | 2074 (7) | 13 288 (3)
Smoking status, No. (%)
 Never | 330 283 (69) | 1060 (38) | 15 274 (51) | 346 617 (68)
 Former | 32 901 (7) | 415 (15) | 4652 (16) | 37 968 (7)
 Current | 116 931 (24) | 1279 (46) | 9844 (33) | 128 054 (25)
Smoking intensity, cigarettes/d^b
 Mean (SD) | 17.9 (10.8) | 19.0 (11.2) | 16.7 (11.1) | 17.8 (10.8)
Smoking duration, y^b
 Mean (SD) | 22.7 (7.3) | 22.2 (7.4) | 23.3 (8.8) | 22.7 (7.5)
Time since quitting, y^c
 Mean (SD) | 8.7 (8.5) | 8.2 (8.5) | 8.5 (8.9) | 8.7 (8.5)
FEV1/FVC, %
 Mean (SD) | 84.8 (8.2) | 80.6 (10.5) | 80.4 (11.4) | 84.5 (8.5)
Body mass index, kg/m2
 Mean (SD) | 23.7 (3.3) | 22.8 (3.6) | 23.0 (3.8) | 23.7 (3.4)
Household income^d, No. (%)
 <2500 yuan | 30 566 (7) | 236 (9) | 3822 (14) | 34 624 (7)
 2500–4999 yuan | 87 540 (19) | 502 (19) | 6519 (24) | 94 561 (19)
 5000–9999 yuan | 139 550 (30) | 829 (32) | 8553 (32) | 148 932 (30)
 10 000–19 999 yuan | 121 086 (26) | 647 (25) | 4948 (18) | 126 681 (25)
 20 000–34 999 yuan | 88 968 (19) | 404 (15) | 2932 (11) | 92 304 (19)
 ≥35 000 yuan | 139 550 (30) | 829 (32) | 8553 (32) | 148 932 (30)
Cooking fuel exposure^e, No. (%)
 None | 191 331 (40) | 1242 (45) | 11 803 (40) | 204 376 (40)
 Low | 76 064 (16) | 443 (16) | 4321 (15) | 80 828 (16)
 Medium | 143 175 (30) | 602 (22) | 6474 (22) | 150 251 (29)
 High | 69 545 (14) | 467 (17) | 7172 (24) | 77 184 (15)
^a All-cause mortality included all causes of death except death attributable to lung cancer. FEV1 = forced expiratory volume in 1 second; FVC = forced vital capacity.

^b Average smoking intensity and age at smoking initiation include current and former smokers; never-smokers are excluded.

^c Years since smoking cessation includes former smokers only; current and never-smokers are excluded.

^d Approximate equivalents in USD ($), rounded to the nearest dollar: <384, 384–767, 767–1537, 1537–3075, 3075–5381, and ≥5381.

^e Cooking fuel exposure groups were formed based on the 25th and 75th percentiles of the cumulative cooking exposure distribution. Cumulative cooking exposure was calculated based on the self-reported frequency of exposure to potentially harmful cooking fuels (ie, coal, wood, or other compared with gas or electricity).

In total, 3 separate flexible parametric cause-specific hazard models were fitted for 1) lung cancer mortality for never-smokers, 2) lung cancer mortality for ever-smokers, and 3) all-other-cause mortality. The smoking status–specific absolute risk models were formed based on models 1 and 3 (ALARM-NS) and models 2 and 3 (ALARM-ES). The cause-specific hazard ratios and 95% confidence intervals for the lung cancer mortality models are reported in Table 2 and for all-other-cause mortality in Supplementary Table 2 (available online).

Table 2.

Estimates from lung cancer mortality flexible parametric survival models fit separately for never-smokers (ALARM-NS) and ever-smokers (ALARM-ES)

Predictor | ALARM-NS HR (95% CI) | ALARM-NS Beta (SE) | ALARM-ES HR (95% CI) | ALARM-ES Beta (SE)
Age at entry, natural log^a | | 4.7174 (0.2200) | | 4.2272 (0.2725)
Sex, female vs male | 0.84 (0.70 to 1.00) | −0.1744 (0.0902) | 1.17 (0.96 to 1.44) | 0.1586 (0.1040)
Family history of cancer, yes vs no | 0.88 (0.73 to 1.07) | −0.1245 (0.1000) | 1.29 (1.12 to 1.48) | 0.2541 (0.0713)
Personal history of cancer, yes vs no | 1.30 (0.62 to 2.74) | 0.2643 (0.3803) | 1.95 (1.15 to 3.31) | 0.6683 (0.2705)
FEV1/FVC, per 5% | 0.94 (0.90 to 0.99) | −0.0583 (0.0253) | 0.98 (0.95 to 1.02) | −0.0164 (0.0186)
Personal history of emphysema or bronchitis, yes vs no | 1.03 (0.78 to 1.37) | 0.0314 (0.1428) | 1.31 (1.08 to 1.58) | 0.2681 (0.0967)
Household income, per level | 0.98 (0.93 to 1.03) | −0.0174 (0.0263) | 1.08 (1.03 to 1.12) | 0.0729 (0.0210)
BMI, kg/m2^b
 BMI | | −0.1976 (0.1088) | | −0.1890 (0.1536)
 BMI^2 | | 0.0036 (0.0020) | | 0.0032 (0.0030)
Smoking status, former vs current | | | 0.89 (0.77 to 1.03) | −0.1132 (0.0736)
Smoking duration, per 5 y | | | 1.18 (1.14 to 1.22) | 0.1665 (0.0178)
Smoking intensity, per 10 cigarettes/d | | | 1.20 (1.15 to 1.26) | 0.1833 (0.0233)
^a Age (in years) was modeled on the natural-log scale. We have excluded the hazard ratio because it is difficult to interpret without inverting the transformation. ALARM = Asian Lung Cancer Absolute Risk Model; BMI = body mass index; CI = confidence interval; ES = ever-smoker; FEV1 = forced expiratory volume in 1 second; FVC = forced vital capacity; HR = hazard ratio; NS = never-smoker; SE = standard error.

^b BMI (in kg/m2) was modeled as a quadratic relationship (ie, BMI and BMI squared). We have excluded the hazard ratios because they are difficult to interpret without considering both effects jointly.
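Because age and BMI enter the models through transformations, their coefficients are easiest to read as contrasts between two concrete values. The sketch below shows one way to compute such contrasts from the Table 2 point estimates. It assumes the reported betas act as log hazard ratios, ignores their standard errors, and uses hypothetical function names; it is an interpretive aid, not part of the published analysis.

```python
import math

# Point estimates copied from Table 2.
COEF = {
    "ALARM-NS": {"ln_age": 4.7174, "bmi": -0.1976, "bmi_sq": 0.0036},
    "ALARM-ES": {"ln_age": 4.2272, "bmi": -0.1890, "bmi_sq": 0.0032},
}

def age_hazard_ratio(model, age, ref_age):
    """HR for `age` vs `ref_age` when age enters as ln(age): (age / ref_age) ** beta."""
    return (age / ref_age) ** COEF[model]["ln_age"]

def bmi_hazard_ratio(model, bmi, ref_bmi):
    """HR for `bmi` vs `ref_bmi`, using the linear and squared terms jointly."""
    c = COEF[model]
    log_hr = c["bmi"] * (bmi - ref_bmi) + c["bmi_sq"] * (bmi ** 2 - ref_bmi ** 2)
    return math.exp(log_hr)

def bmi_at_lowest_hazard(model):
    """Vertex of the fitted quadratic: -beta_bmi / (2 * beta_bmi_sq)."""
    c = COEF[model]
    return -c["bmi"] / (2.0 * c["bmi_sq"])
```

For example, `bmi_hazard_ratio("ALARM-NS", 20.0, 24.0)` compares a BMI of 20 with a BMI of 24 among never-smokers. The BMI coefficients have wide standard errors, so any such contrast should be read as indicative only.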

Age increased the risk of lung cancer mortality for both never- and ever-smokers. Female sex was found to be protective for lung cancer mortality in never-smokers (HR = 0.84, 95% CI = 0.70 to 1.00) but a modest risk factor in ever-smokers (HR = 1.17, 95% CI = 0.96 to 1.44). Family history of any cancer was a risk factor for lung cancer death among ever-smokers (HR = 1.29, 95% CI = 1.12 to 1.48) but not for never-smokers. Personal cancer history increased the hazard of lung cancer mortality in never- (HR = 1.30, 95% CI = 0.62 to 2.74) and ever-smokers (HR = 1.95, 95% CI = 1.15 to 3.31). For every 5% increase in lung function performance (FEV1/FVC), the hazard of lung cancer mortality was reduced by 6% and 2% for never- and ever-smokers, respectively. In addition, a self-reported history of emphysema or bronchitis had a hazard ratio of 1.03 (95% CI = 0.78 to 1.37) and 1.31 (95% CI = 1.08 to 1.58) for never- and ever-smokers, respectively. Cooking fuel exposure was not found to improve model performance and was therefore not included in the final models.

ALARM-ES further included several smoking variables. Compared with current smokers, former smokers had a lower hazard of lung cancer mortality (HR = 0.89, 95% CI = 0.77 to 1.03). For every 5 additional years of smoking, the hazard of lung cancer mortality increased by 18% (HR = 1.18, 95% CI = 1.14 to 1.22). For every additional 10 cigarettes smoked per day (approximately equivalent to half a pack), the hazard of lung cancer mortality increased by 20% (HR = 1.20, 95% CI = 1.15 to 1.26). Nonlinear relationships and 2-way interactions were explored for smoking variables but did not contribute to model improvement and were not included in the final models.
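The hazard ratios quoted above can be recovered from the betas and standard errors in Table 2. The sketch below assumes a Wald-type interval, exp(beta ± 1.96 × SE); the article does not state how its intervals were computed, but this assumption reproduces the reported values to 2 decimal places.

```python
import math

def hazard_ratio_with_ci(beta, se, z=1.96):
    """Return (HR, lower, upper) for a log hazard ratio and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Beta (SE) values for ALARM-ES, copied from Table 2.
former_vs_current = hazard_ratio_with_ci(-0.1132, 0.0736)
duration_per_5y = hazard_ratio_with_ci(0.1665, 0.0178)
intensity_per_10 = hazard_ratio_with_ci(0.1833, 0.0233)
```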

We estimated the 5-year absolute risk of lung cancer mortality (ie, cumulative incidence of death from lung cancer), conditional on a participant’s risk profile. The distribution of risks for never- and ever-smokers is presented in Supplementary Figure 1 (available online). The maximum 5-year predicted risk of lung cancer mortality was 12.7% for ever-smokers and 2.6% for never-smokers. According to our models, 8.1% of ever-smokers and less than 1% of never-smokers in the CKB would be eligible for screening assuming a 1.5% 5-year risk threshold.

Calibration plots comparing model-predicted and observed 5-year risks, separately for ALARM-NS and ALARM-ES, are presented in Supplementary Figure 2 (available online). Both models show very good calibration in the hold-out test data. Additional calibration metrics are summarized in Supplementary Table 3 (available online). The 5-year time-dependent AUC in the hold-out test set was 0.77 (95% CI = 0.73 to 0.80) and 0.81 (95% CI = 0.79 to 0.84) for ALARM-NS and ALARM-ES, respectively. The 5-year time-dependent receiver operating characteristic curves for the 25% hold-out test data are presented in Figure 1. When assessing across several time horizons, we found that the time-dependent AUC remained largely consistent (Supplementary Table 4, available online).

Figure 1.

Five-year receiver operating characteristic curves and areas under the curve (AUC) for the 25% hold-out test data, separately for Asian Lung Cancer Absolute Risk Model (ALARM)-NS (never-smokers, lower curve) and ALARM-ES (ever-smokers, upper curve). CI = confidence interval.

Absolute risk trajectories for average- and high-risk current and former smokers are presented in Figure 2 according to lung function performance and smoking intensity. Lung cancer mortality risks are higher for those with suboptimal lung function and higher smoking intensity. Absolute risk trajectories for never-smokers are presented in Figure 3 separately for men and women by lung function. Lung cancer mortality risk is higher for those with suboptimal lung function and for men. Contour plots of absolute risk according to smoking intensity, age at smoking initiation, and lung function for theoretical average- or high-risk current and former smokers are shown in Supplementary Figure 3 (available online). Contour plots for a theoretical average- or high-risk never-smoker according to age, FEV1/FVC, and sex are shown in Figure 4, with risks as high as 2.25% to 2.50% for the highest risk profile.

Figure 2.

Five-year absolute risk trajectories for lung cancer mortality based on Asian Lung Cancer Absolute Risk Model-ES for current and former smokers for varying smoking intensity (cigarettes per day) and lung function (FEV1/FVC) for average-risk and high-risk profiles. Smoking intensity varied from 5 cigarettes per day (bottom line) up to 40 cigarettes per day (top line) in 5 cigarette increments. An average-risk profile is defined as having the average covariate value for all predictors other than those varied, and a high-risk profile is defined as having the highest risk covariate value observed in the China Kadoorie Biobank (based on the direction of effect) for all predictors other than those varied.

Figure 3.

Five-year absolute risk trajectories for lung cancer mortality based on Asian Lung Cancer Absolute Risk Model-NS for never-smoker men and women across levels of FEV1/FVC for average risk and high risk profiles. FEV1/FVC varied from 100% (bottom line) to 10% (top line) in 10% increments. An average-risk profile is defined as having the average covariate value for all predictors other than those varied, and a high-risk profile is defined as having the highest risk covariate value observed in the China Kadoorie Biobank (based on direction of effect) for all predictors other than those varied.

Figure 4.

Five-year absolute risk of lung cancer mortality based on Asian Lung Cancer Absolute Risk Model-NS for an average risk or high risk never-smoker for combinations of age, lung function (FEV1/FVC), sex, and history of COPD (emphysema or bronchitis). An average risk profile is defined as having the average covariate value for all predictors other than those varied, and a high risk profile is defined as having the highest risk covariate value observed in the China Kadoorie Biobank (based on direction of effect) for all predictors other than those varied. The contour plots display risk bands starting with the lowest risk band in the bottom-left corner and bands increase in risk toward the top-right corner in 0.25% increments. COPD = Chronic obstructive pulmonary disease.

We compared our absolute risk models with the recently updated USPSTF criteria for lung cancer screening (see Table 3). Using the USPSTF criteria, 35.5% of CKB ever-smokers would be eligible for lung cancer screening. To compare our model with the USPSTF criteria, we applied the 5-year risk threshold that produced an equivalent specificity to the USPSTF criteria; our model (ALARM-ES) demonstrated an improved sensitivity (81.3% vs 68.6% based on USPSTF) while selecting an equivalent proportion of the population for screening. At a 5-year risk threshold that has an equivalent sensitivity to the USPSTF criteria, ALARM-ES would select 11.5 percentage points fewer ever-smokers for screening (24.0% vs 35.5% based on USPSTF).
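The idea of matching a risk threshold to a target specificity can be sketched as below. This is a deliberately simplified illustration that treats 5-year case status as a fully observed binary label, whereas the study used inverse-probability-of-censoring-weighted estimates in a competing-risks setting. The function names and toy data are hypothetical.

```python
def sens_spec(risks, is_case, threshold):
    """Sensitivity and specificity of the rule 'screen if risk >= threshold'."""
    tp = sum(1 for r, c in zip(risks, is_case) if c and r >= threshold)
    tn = sum(1 for r, c in zip(risks, is_case) if not c and r < threshold)
    n_case = sum(1 for c in is_case if c)
    return tp / n_case, tn / (len(is_case) - n_case)

def threshold_for_specificity(risks, is_case, target_specificity):
    """Smallest observed risk value whose use as a cut-off reaches the target specificity."""
    for threshold in sorted(set(risks)):
        if sens_spec(risks, is_case, threshold)[1] >= target_specificity:
            return threshold
    return float("inf")  # target cannot be reached with any observed value

# Toy predicted 5-year risks and case labels (illustrative only).
risks = [0.001, 0.002, 0.004, 0.008, 0.010, 0.015, 0.020, 0.030, 0.050, 0.080]
is_case = [False, False, False, True, False, False, True, False, True, True]

cutoff = threshold_for_specificity(risks, is_case, target_specificity=0.66)
sensitivity, specificity = sens_spec(risks, is_case, cutoff)
```

Matching on sensitivity works the same way with the roles of the two measures exchanged: the largest cut-off that still attains the target sensitivity is chosen, and specificity is then compared.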

Table 3.

Comparison of ALARM-ES and LCDRAT against the USPSTF 2021 criteria for lung cancer screening^a

Criteria or model | Specificity, %^b | Sensitivity, %^b | % Eligible for screening
USPSTF-2021 | 64.7 | 68.6 | 35.5
ALARM-ES
 Matched on specificity | Same as USPSTF | 81.3 | 35.5
 Matched on sensitivity | 76.2 | Same as USPSTF | 24.0
LCDRAT
 Matched on specificity | Same as USPSTF | 81.2 | 35.5
 Matched on sensitivity | 76.3 | Same as USPSTF | 23.9
^a The USPSTF recommends annual screening for adults aged 50-80 years with a smoking history of 20 or more pack-years and, for former smokers, smoking cessation within the previous 15 years. We compared the ever-smoker models 2 ways: (1) using a risk threshold that matches the specificity of USPSTF criteria, and (2) using a risk threshold that matches the sensitivity of USPSTF criteria. ALARM = Asian Lung Cancer Absolute Risk Model; ES = ever-smoker; LCDRAT = Lung Cancer Death Risk Assessment Tool; USPSTF = United States Preventive Services Task Force.

^b Time-dependent sensitivity and specificity are based on the inverse-probability-of-censoring weighted (IPCW) estimates for a cumulative/dynamic definition of cases and controls in a competing-risks setting proposed by Blanche et al. (2013) (17).

We applied the established Lung Cancer Death Risk Assessment Tool (LCDRAT) (19) to the CKB ever-smokers to assess how ALARM-ES performed compared with a validated ever-smoker lung cancer mortality model (details described in the Supplementary Methods, available online). Calibration and discrimination statistics are presented in Supplementary Table 3 (available online). LCDRAT and LCDRAT-Constrained had discriminative performance (ie, AUC) similar to that of ALARM-ES; however, ALARM-ES had superior calibration. The expected to observed death ratios were 1.04 (95% CI = 0.92 to 1.20) in ALARM-ES and 1.57 (95% CI = 1.46 to 1.69) in LCDRAT, and the differences between expected and observed deaths (per 100 000) were 18.68 (95% CI = −45.68 to 89.53) and 288.02 (95% CI = 250.10 to 326.47), respectively. These observations remained the same when we assessed LCDRAT models using the same 25% hold-out test set (Supplementary Table 3, available online). This is not surprising because risk models developed in North American cohorts may not be well calibrated in Asian populations, which is reflected in the calibration statistics, despite good overall discrimination.
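The two calibration summaries reported here, the expected-to-observed ratio and the difference in deaths per 100 000, along with a percentile bootstrap interval, can be sketched as follows. The example is simplified: it takes the observed risk from a fully observed binary death indicator rather than from the Aalen-Johansen estimator used in the study, and the function names, seed, and toy data are assumptions made for illustration.

```python
import random

def calibration_summary(predicted, observed_risk):
    """Expected/observed ratio and expected-minus-observed deaths per 100 000."""
    expected_risk = sum(predicted) / len(predicted)
    return expected_risk / observed_risk, (expected_risk - observed_risk) * 100_000

def bootstrap_eo_ratio(predicted, died, n_boot=1000, seed=1):
    """Percentile 95% interval for E/O from participant-level resampling."""
    rng = random.Random(seed)
    n = len(predicted)
    ratios = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        observed = sum(died[i] for i in idx) / n
        if observed > 0:  # skip resamples with no deaths, where E/O is undefined
            ratios.append(calibration_summary([predicted[i] for i in idx], observed)[0])
    ratios.sort()
    return ratios[int(0.025 * len(ratios))], ratios[int(0.975 * len(ratios)) - 1]

# Toy data: 200 participants with a predicted risk of 2%, of whom 4 died.
predicted = [0.02] * 200
died = [1] * 4 + [0] * 196
eo_ratio, diff_per_100k = calibration_summary(predicted, sum(died) / len(died))
ci_low, ci_high = bootstrap_eo_ratio(predicted, died, n_boot=200)
```

A ratio near 1 and a difference near 0 indicate that the model predicts about as many deaths as were observed; a ratio well above 1, as reported for LCDRAT in the CKB, indicates overprediction.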

Discussion

We established risk prediction models based on a relatively simple set of predictors that can be ascertained during a routine physician visit. These models discriminate high- and low-risk individuals for lung cancer mortality in an Asian population better than the currently recommended US-based screening guidelines. These are some of the first prediction models for lung cancer mortality developed and evaluated specifically for an Asian population and separately for never- and ever-smokers. Our model has shown superior calibration compared with an established lung cancer mortality model (LCDRAT) while maintaining comparable discrimination. At a risk threshold that matches the specificity of the USPSTF criteria, ALARM-ES has better sensitivity while selecting an equivalent proportion of the population for lung cancer screening.

Despite reductions in smoking prevalence in most parts of the world (20), lung cancers among never-smokers are an increasingly important public health concern (21). The proportion of lung cancers occurring in never-smokers has been increasing, and lung cancer in never-smokers is now one of the most common cancers (22). It is estimated that the epicenter of lung cancer in the next few decades will be in Asian countries, where a substantial proportion of lung cancers will occur in never-smokers. This highlights the importance of developing risk models specifically for this population. The International Association for the Study of Lung Cancer recently released a statement specifically focused on never-smokers in which it emphasizes the importance of risk-based screening in this population (21). Although widespread screening of never-smokers at the current stage would not be effective, the development and validation of risk prediction models will play a critical role (21). The report encourages and recommends modeling studies focused on lung cancer risk estimation for never-smokers to eventually realize the benefit of risk-based screening in this group (21).

Furthermore, it is well known that lung cancers occurring in never-smokers represent a distinct form of the disease (22,23). Lung cancers occurring among never-smokers are more commonly adenocarcinomas, with higher proportions of EGFR mutations, which makes them highly targetable by therapeutics and leads to improved prognosis (24). This contributes, in part, to the improved survival observed among never-smokers (24). In the absence of primary smoking history as a risk factor, statistical models based on a constellation of risk factors will be required to identify high-risk never-smokers for screening. The ability to identify and screen high-risk never-smokers for lung cancer is an important public health concern.

In this study, we developed models to estimate the absolute risk of lung cancer mortality and evaluated them in the hold-out test data. Because lung cancer incidence data were not yet available from the CKB at the time of our analysis, lung cancer mortality was modeled as the endpoint. In general, incidence data are preferred to develop prediction models for lung cancer risk, when available. However, lung cancer is a highly fatal disease; therefore, mortality is a suitable proxy for incidence. In addition, individuals who are at a high risk of dying from lung cancer would, by definition, be suitable candidates to screen. Given that mortality is the primary endpoint of computed tomography screening programs, modeling lung cancer mortality as the outcome also has the clinical advantage of identifying individuals who are at high risk of lung cancer death and reduces the possibility of overdiagnosis.

Note that in the competing risks setting, the cause-specific hazard ratios only reflect how a variable affects the rate at which the cause-specific event occurs and might not directly coincide with how that variable affects the probability of that event occurring, because the absolute risk depends on both the cause-specific hazard for lung cancer mortality and the cause-specific hazard for all-other-cause mortality. Still, we may draw some conclusions about the cause-specific hazard models fitted in this study.
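A small numerical example may make this point concrete. The sketch below assumes constant cause-specific hazards, for which the cumulative incidence has a closed form; the hazard values are hypothetical and were chosen only to show the direction of the effects.

```python
import math

def cif_constant_hazards(h_lung, h_other, t):
    """Cumulative incidence of lung cancer death by time t under constant hazards."""
    total = h_lung + h_other
    return h_lung / total * (1.0 - math.exp(-total * t))

# Reference profile (hypothetical hazards per year).
baseline = cif_constant_hazards(0.004, 0.010, 5.0)

# A factor with a cause-specific hazard ratio of 2 for lung cancer death only.
doubled_lung = cif_constant_hazards(0.008, 0.010, 5.0)

# A factor that leaves the lung cancer hazard unchanged but triples other-cause mortality.
more_competing = cif_constant_hazards(0.004, 0.030, 5.0)
```

In this example the 5-year risk ratio for the first factor is slightly below 2 even though its hazard ratio is exactly 2, and the second factor lowers the 5-year lung cancer risk despite having no effect on the lung cancer hazard, because more people die of other causes first.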

Lung cancers have previously been found to affect never-smoking women disproportionately more than men (24,25), though women are still observed to have a survival advantage (26), which is consistent with our findings. Due to limitations in the data, we were unable to determine a participant’s family history of lung cancer, so we used family history of any cancer, which is expected to be a relatively weak proxy. Exposure to cooking fuel has previously been identified as a risk factor for lung cancer (27,28), particularly among Asian women (29,30). However, in our study we did not observe an association between cooking fuel exposure and lung cancer mortality. We believe this may be due, in part, to imprecise ascertainment of this putative risk factor. Similar to passive tobacco exposure (ie, second-hand smoke), cooking fuel exposure is subject to recall bias, and thus it is difficult to accurately measure cumulative lifetime exposure.

We identified 4 studies that developed or adapted risk models for never-smokers, and 2 were based in Asian populations. The PLCOall2014 model was developed in both ever- and never-smoker populations and was adapted for use in never-smokers by removing smoking-related predictors (31). We applied the PLCOall2014 to the CKB, and our model achieved higher AUC for never-smokers (0.77 vs 0.76 for PLCOall2014) and ever-smokers (0.81 vs 0.75 based on PLCOall2014). Details of this comparison are described in the Supplementary Methods (available online). A second model was developed in a predominantly Caucasian cohort (UK Biobank) (32). A third model was developed in a Taiwanese cohort, with models built separately for never-, light, and heavy smokers (33). This study achieved a training set AUC of 0.806 (95% CI = 0.790 to 0.819) among never-smokers but included several serological protein biomarkers that are unavailable in the CKB and are not routinely tested in healthy populations; the model therefore could not be validated. The authors presented limited evidence of model calibration across risk groups and by smoking status. A fourth model was developed for never-smoker females only in an Asian case-control study and achieved an AUC of 0.71 (95% CI = 0.66 to 0.77) (34). However, this study was not based on prospective data and would require the collection of genetic data (ie, genotyping), which may be prohibitive for widespread application. The model without the 9 genetic variants performed moderately worse (AUC = 0.69, 95% CI = 0.64 to 0.75).

Based on ALARM-NS, fewer than 1% of the never-smokers in the CKB reached a 5-year risk threshold of at least 1.5% to be eligible for lung cancer screening, with the highest risk never-smoker achieving a risk of 2.6%. The marginal age-specific lung cancer and other-cause-mortality rates for men and women were lower in the CKB than those observed in the general Chinese population, based on Global Burden of Disease 2019 mortality estimates (see Supplementary Figure 4, available online). This may be due, in part, to a “healthy volunteer” effect, whereby those who volunteer for a study are healthier than, and not fully representative of, the general population. As such, we anticipate the actual risk distribution in the general Chinese population would be higher than what is observed in the CKB (Supplementary Figure 1, available online). A larger proportion of individuals would be at a higher risk for lung cancer mortality and may be eligible for screening based on applying our model to the general Chinese population.

In summary, we developed absolute risk models that accurately identify both ever- and never-smokers at high risk for lung cancer–related mortality, accounting for the competing risk of all-other-cause mortality. These models were developed exclusively using prospective data collected in a large Asian population, and models were fitted separately based on smoking history (ALARM-NS and ALARM-ES). Our models discriminated between high and low risk of lung cancer death about as well as an established lung cancer mortality model (ie, LCDRAT) but demonstrated much better absolute risk calibration for an Asian population. In the future, these models may be useful for identifying a subset of the Asian population at high risk for lung cancer mortality who may benefit from lung cancer screening. The next step will be to externally validate our models in an independent cohort.
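To illustrate what accounting for the competing risk of other-cause mortality means for an absolute risk, the sketch below computes a cumulative incidence from two cause-specific hazards. It assumes constant hazards purely for illustration; the ALARM models use flexible parametric survival models, and the hazard values and function names here are hypothetical. Under cause-specific hazards $h_1$ (lung cancer death) and $h_2$ (other-cause death), the cumulative incidence by time $t$ is $\int_0^t h_1(u)\,\exp\{-\int_0^u [h_1(s) + h_2(s)]\,ds\}\,du$.

```python
import math


def cif_constant_hazards(t: float, h_cause: float, h_other: float) -> float:
    """Closed-form cumulative incidence by time t when both hazards are constant."""
    total = h_cause + h_other
    return h_cause / total * (1.0 - math.exp(-total * t))


def cif_numeric(t, h_cause_fn, h_other_fn, steps: int = 10000) -> float:
    """Midpoint-rule approximation of the integral of h1(u) * S(u) over [0, t],
    where S(u) is the probability of surviving all causes up to time u."""
    dt = t / steps
    cum_hazard = 0.0  # all-cause cumulative hazard at the start of each interval
    cif = 0.0
    for i in range(steps):
        u = (i + 0.5) * dt
        h1, h2 = h_cause_fn(u), h_other_fn(u)
        # All-cause survival evaluated at the interval midpoint.
        surv_mid = math.exp(-(cum_hazard + 0.5 * (h1 + h2) * dt))
        cif += h1 * surv_mid * dt
        cum_hazard += (h1 + h2) * dt
    return cif


# Hypothetical yearly hazards (assumptions, not estimates from the CKB).
h_lung, h_other_cause = 0.002, 0.01
risk_5y = cif_constant_hazards(5.0, h_lung, h_other_cause)
naive_5y = 1.0 - math.exp(-h_lung * 5.0)  # ignores other-cause death
print(risk_5y, naive_5y)
```

The competing-risk estimate is always somewhat lower than the naive one, because individuals who die of other causes can no longer die of lung cancer; the gap widens as other-cause mortality increases, for example at older ages.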

Funding

This work was supported by Canadian Institutes of Health Research (FDN 167273) and the National Institutes of Health (U19 CA203654).

Notes

Role of the funder: The funding sources had no role in the conception, design, implementation, analysis, or interpretation of the study, or writing of the report, or the decision to submit the report for publication.

Disclosures: The authors report no potential conflicts of interest.

Author contributions: RJH: conceptualization, funding acquisition. MTW: data curation, formal analysis. RJH, OEG, and MCT: formal analysis. MTW and RJH: methodology, writing—original draft. All authors: investigation, writing—review and editing.

Supplementary Material

djac176_Supplementary_Data

Contributor Information

Matthew T Warkentin, Prosserman Center for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada; Department of Public Health Sciences, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.

Martin C Tammemägi, Department of Health Sciences, Brock University, St. Catharines, ON, Canada.

Osvaldo Espin-Garcia, Department of Public Health Sciences, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Department of Biostatistics, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada.

Sanjeev Budhathoki, Prosserman Center for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada.

Geoffrey Liu, Department of Public Health Sciences, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Department of Medical Oncology and Hematology, Princess Margaret Cancer Centre, Toronto, ON, Canada.

Rayjean J Hung, Prosserman Center for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada; Department of Public Health Sciences, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.

Data Availability

All of the data used in the completion of this study are publicly available, by request, from the China Kadoorie Biobank Data Access Committee (https://www.ckbiobank.org/site/Data+Access).

The absolute risk models presented in this article are made available as a user-friendly, free and open-source tool, at the following link: https://github.com/mattwarkentin/ALARM.

References

1. Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209-249.
2. de Koning HJ, van der Aalst CM, de Jong PA, et al. Reduced lung-cancer mortality with volume CT screening in a randomized trial. N Engl J Med. 2020;382(6):503-513.
3. National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. 2011;365(5):395-409.
4. National Lung Screening Trial Research Team. Lung cancer incidence and mortality with extended follow-up in the national lung screening trial. J Thorac Oncol. 2019;14(10):1732-1742.
5. Pastorino U, Silva M, Sestini S, et al. Prolonged lung cancer screening reduced 10-year mortality in the MILD trial: new confirmation of lung cancer screening efficacy. Ann Oncol. 2019;30(7):1162-1169.
6. Moyer VA; U.S. Preventive Services Task Force. Screening for lung cancer: US Preventive Services Task Force recommendation statement. Ann Intern Med. 2014;160(5):330-338.
7. Toh CK, Lim WT. Lung cancer in never-smokers. J Clin Pathol. 2007;60(4):337-340.
8. Tammemägi MC, Katki HA, Hocking WG, et al. Selection criteria for lung-cancer screening. N Engl J Med. 2013;368(8):728-736.
9. Chen Z, Chen J, Collins R, et al.; China Kadoorie Biobank (CKB) Collaborative Group. China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up. Int J Epidemiol. 2011;40(6):1652-1666.
10. Andersen PK, Geskus RB, de Witte T, Putter H. Competing risks in epidemiology: possibilities and pitfalls. Int J Epidemiol. 2012;41(3):861-870.
11. Hinchliffe SR, Lambert PC. Flexible parametric modelling of cause-specific hazards to estimate cumulative incidence functions. BMC Med Res Methodol. 2013;13(1):13-14.
12. Royston P, Parmar MK. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med. 2002;21(15):2175-2197.
13. Aune D, Sen A, Prasad M, et al. BMI and all cause mortality: systematic review and non-linear dose-response meta-analysis of 230 cohort studies with 3.74 million deaths among 30.3 million participants. BMJ. 2016;353:i2156.
14. Shepshelovich D, Xu W, Lu L, et al. Body mass index (BMI), BMI change, and overall survival in patients with SCLC and NSCLC: a pooled analysis of the international lung cancer consortium. J Thorac Oncol. 2019;14(9):1594-1607.
15. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2021. https://www.R-project.org/.
16. Gerds TA, Ozenne B. riskRegression: risk regression models and prediction scores for survival analysis with competing risks. 2020. https://CRAN.R-project.org/package=riskRegression.
17. Blanche P, Dartigues JF, Jacqmin-Gadda H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Stat Med. 2013;32(30):5381-5397. http://onlinelibrary.wiley.com/doi/10.1002/sim.5958/full.
18. Katki HA, Kovalchik SA, Berg CD, Cheung LC, Chaturvedi AK. Development and validation of risk models to select ever-smokers for CT lung cancer screening. JAMA. 2016;315(21):2300-2311.
19. World Health Organization. WHO Global Report on Trends in Prevalence of Tobacco Smoking 2000-2025. 2nd ed. Geneva: World Health Organization; 2018.
20. Kerpel-Fronius A, Tammemägi M, Cavic M, et al.; ED and Screening Committee. Screening for lung cancer in individuals who never smoked: an international association for the study of lung cancer early detection and screening committee report. J Thorac Oncol. 2022;17(1):56-66.
21. Sun S, Schiller JH, Gazdar AF. Lung cancer in never smokers–a different disease. Nat Rev Cancer. 2007;7(10):778-790.
22. Couraud S, Zalcman G, Milleron B, Morin F, Souquet PJ. Lung cancer in never smokers–a review. Eur J Cancer. 2012;48(9):1299-1311.
23. Subramanian J, Govindan R. Lung cancer in never smokers: a review. J Clin Oncol. 2007;25(5):561-570.
24. Freedman ND, Leitzmann MF, Hollenbeck AR, Schatzkin A, Abnet CC. Cigarette smoking and subsequent risk of lung cancer in men and women: analysis of a prospective cohort study. Lancet Oncol. 2008;9(7):649-656.
25. Thun MJ, Hannan LM, Adams-Campbell LL, et al. Lung cancer occurrence in never-smokers: an analysis of 13 cohorts and 22 cancer registry studies. PLoS Med. 2008;5(9):e185.
26. Kurmi OP, Arya PH, Lam KBH, Sorahan T, Ayres JG. Lung cancer risk and solid fuel smoke exposure: a systematic review and meta-analysis. Eur Respir J. 2012;40(5):1228-1237.
27. Lissowska J, Bardin-Mikolajczak A, Fletcher T, et al. Lung cancer and indoor pollution from heating and cooking with solid fuels: the IARC international multicentre case-control study in eastern/central Europe and the United Kingdom. Am J Epidemiol. 2005;162(4):326-333.
28. Xue Y, Jiang Y, Jin S, Li Y. Association between cooking oil fume exposure and lung cancer among Chinese nonsmoking women: a meta-analysis. Onco Targets Ther. 2016;9:2987.
29. Chen TY, Fang YH, Chen HL, et al. Impact of cooking oil fume exposure and fume extractor use on lung cancer risk in non-smoking Han Chinese women. Sci Rep. 2020;10(1):1-10.
30. Tammemägi MC, Church TR, Hocking WG, et al. Evaluation of the lung cancer risks at which to screen ever- and never-smokers: screening rules applied to the PLCO and NLST cohorts. PLoS Med. 2014;11(12):e1001764.
31. Muller DC, Johansson M, Brennan P. Lung cancer risk prediction model incorporating lung function: development and validation in the UK biobank prospective cohort study. J Clin Oncol. 2017;35(8):861-869.
32. Wu X, Wen CP, Ye Y, et al. Personalized risk assessment in never, light, and heavy smokers in a prospective cohort in Taiwan. Sci Rep. 2016;6(1):1-9.
33. Chien LH, Chen CH, Chen TY, et al. Predicting lung cancer occurrence in never-smoking females in Asia: TNSF-SQ, a prediction model. Cancer Epidemiol Biomarkers Prev. 2020;29(2):452-459.

Articles from JNCI Journal of the National Cancer Institute are provided here courtesy of Oxford University Press
