Key Points
Question
Can a risk prediction model based on circulating protein biomarkers improve on a traditional risk prediction model for lung cancer and the current US screening criteria?
Findings
In a validation study of 63 ever-smoking patients with lung cancer and 90 matched controls, a biomarker-based risk prediction model consisting of 4 protein markers that was developed in a cohort of US individuals at high risk of lung cancer outperformed a model based on smoking history alone when blindly validated using prediagnostic samples from 2 European cohorts.
Meaning
Biomarker-based risk profiling has the potential to improve eligibility criteria for lung cancer screening.
Abstract
Importance
There is an urgent need to improve lung cancer risk assessment because current screening criteria miss a large proportion of cases.
Objective
To investigate whether a lung cancer risk prediction model based on a panel of selected circulating protein biomarkers can outperform a traditional risk prediction model and current US screening criteria.
Design, Setting, and Participants
Prediagnostic samples from 108 ever-smoking patients with lung cancer diagnosed within 1 year after blood collection and samples from 216 smoking-matched controls from the Carotene and Retinol Efficacy Trial (CARET) cohort were used to develop a biomarker risk score based on 4 proteins (cancer antigen 125 [CA125], carcinoembryonic antigen [CEA], cytokeratin-19 fragment [CYFRA 21-1], and the precursor form of surfactant protein B [Pro-SFTPB]). The biomarker score was subsequently validated blindly using absolute risk estimates among 63 ever-smoking patients with lung cancer diagnosed within 1 year after blood collection and 90 matched controls from 2 large European population-based cohorts, the European Prospective Investigation into Cancer and Nutrition (EPIC) and the Northern Sweden Health and Disease Study (NSHDS).
Main Outcomes and Measures
Model validity in discriminating between future lung cancer cases and controls. Discrimination estimates (area under the receiver operating characteristic curve [AUC], sensitivity, and specificity) were weighted to reflect the background populations of the EPIC and NSHDS validation studies.
Results
In the validation study of 63 ever-smoking patients with lung cancer and 90 matched controls (mean [SD] age, 57.7 [8.7] years; 68.6% men) from EPIC and NSHDS, an integrated risk prediction model that combined smoking exposure with the biomarker score yielded an AUC of 0.83 (95% CI, 0.76-0.90) compared with 0.73 (95% CI, 0.64-0.82) for a model based on smoking exposure alone (P = .003 for difference in AUC). At an overall specificity of 0.83, based on the US Preventive Services Task Force screening criteria, the sensitivity of the integrated risk prediction (biomarker) model was 0.63 compared with 0.43 for the smoking model. Conversely, at an overall sensitivity of 0.42, based on the US Preventive Services Task Force screening criteria, the integrated risk prediction model yielded a specificity of 0.95 compared with 0.86 for the smoking model.
Conclusions and Relevance
This study provided a proof of principle in showing that a panel of circulating protein biomarkers may improve lung cancer risk assessment and may be used to define eligibility for computed tomography screening.
This validation study investigates the use of circulating protein biomarkers to improve lung cancer risk assessment and eligibility criteria for screening with low-dose computed tomography.
Introduction
The National Lung Screening Trial (NLST) findings suggested that screening with low-dose computed tomography (LDCT) can reduce lung cancer mortality.1 As a result, the US Preventive Services Task Force (USPSTF) recommends LDCT screening for lung cancer among individuals aged 55 to 80 years who have a smoking history of at least 30 pack-years and who currently smoke or have quit within the past 15 years.1,2 However, LDCT screening results in a large number of indeterminate nodules,1 and fewer than 50% of incident lung cancer cases occur among individuals who are eligible for screening.3 Biomarkers may improve lung cancer risk assessment beyond traditional smoking-based risk models and improve current screening eligibility criteria.4,5
Previous studies have shown that the precursor form of surfactant protein B (Pro-SFTPB) is predictive of lung cancer risk.5,6 Other markers that have been shown to be useful for the workup and diagnosis of lung cancer include cancer antigen 125 (CA125), cytokeratin-19 fragment (CYFRA 21-1), carcinoembryonic antigen (CEA), and human epididymis protein 4 (HE4).7,8,9,10,11,12 However, there are limited data regarding the performance of these markers in discriminating between future lung cancer cases and controls.
This study aimed to assess the potential of these 5 protein biomarkers to inform lung cancer risk assessment when tested blindly using prediagnostic samples.
Methods
A full account of the methods is provided in the eMethods in the Supplement. In brief, samples obtained from ever-smoking patients with lung cancer (cases) diagnosed within 1 year after blood collection (n = 108) and smoking-matched controls (n = 216) from the US Carotene and Retinol Efficacy Trial (CARET) cohort were used to develop a biomarker score based on circulating measures of Pro-SFTPB, CA125, CEA, HE4, and CYFRA 21-1 using logistic regression. All study participants gave written informed consent to participate in the study, and the research was approved by the institutional review boards of all of the participating institutions.
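As a rough illustration of how a logistic-regression biomarker score reduces several protein measurements to one number per participant, the sketch below computes a linear predictor from log-transformed concentrations of the 4 proteins named in the Abstract. The intercept, coefficients, log transformation, and example concentrations are hypothetical placeholders chosen for illustration. They are not the values fitted in CARET, which are described in the Supplement.

```python
import math
from typing import Dict

# Hypothetical values for illustration only; NOT the fitted CARET coefficients.
HYPOTHETICAL_INTERCEPT = -3.0
HYPOTHETICAL_COEFS: Dict[str, float] = {
    "CA125": 0.4,
    "CEA": 0.5,
    "CYFRA21_1": 0.6,
    "ProSFTPB": 0.7,
}


def biomarker_score(concentrations: Dict[str, float]) -> float:
    """Linear predictor (log-odds scale) from log-transformed concentrations."""
    score = HYPOTHETICAL_INTERCEPT
    for marker, coef in HYPOTHETICAL_COEFS.items():
        value = concentrations[marker]
        if value <= 0:
            raise ValueError(f"{marker} concentration must be positive")
        score += coef * math.log(value)
    return score


def logistic(score: float) -> float:
    """Map a log-odds score to a probability within the case-control sample.

    This is not a population risk: a matched case-control design fixes the
    case fraction by sampling.
    """
    return 1.0 / (1.0 + math.exp(-score))


# Two made-up participants (arbitrary units).
low = {"CA125": 1.0, "CEA": 1.0, "CYFRA21_1": 1.0, "ProSFTPB": 1.0}
high = {"CA125": 2.0, "CEA": 2.0, "CYFRA21_1": 2.0, "ProSFTPB": 2.0}
print(biomarker_score(low), biomarker_score(high))
```

Because all hypothetical coefficients are positive, a participant with higher concentrations of every marker receives a higher score. The real score may weight the markers very differently.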
The extent to which the biomarker score improved discrimination of incident lung cancer cases and controls was validated externally using ever-smoking patients with lung cancer (cases) diagnosed within 1 year after blood collection (n = 63) and matched controls (n = 90) from the European Prospective Investigation into Cancer and Nutrition (EPIC) study and the Northern Sweden Health and Disease Study (NSHDS) (eFigure 1 in the Supplement). Absolute 1-year risks of lung cancer were estimated for each study participant in the validation study by modeling the cumulative hazards of lung cancer using flexible parametric survival models.13 Two models were evaluated: a traditional smoking history–based risk model and an integrated risk prediction model that combined the smoking model and the biomarker score. Model discrimination was assessed by receiver operating characteristic (ROC) analysis using the predicted 1-year lung cancer risks as the scoring rule. Discrimination estimates included area under the ROC curve (AUC), sensitivity, and specificity, which were weighted to reflect the background populations. In the context of using the 1-year absolute risk of lung cancer to define screening eligibility, the sensitivity provides an estimate of the fraction of future lung cancer cases that would be eligible for screening at a certain absolute risk threshold. Conversely, the specificity provides an estimate of the fraction of individuals from the background population who remain healthy and would not be eligible for screening. A sensitivity of 1.00 (or 100%) would indicate that all lung cancer cases are eligible for screening, and a specificity of 1.00 (or 100%) would indicate that all individuals who remain healthy are not eligible for screening (ie, that there are no false-positive controls). Statistical significance was assumed at a 2-sided P < .05.
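The weighted discrimination measures described above can be sketched as follows. This is a minimal illustration, assuming each participant carries a sampling weight that represents how many members of the background population he or she stands for. The function names, example risks, and weights are hypothetical, and the way the study actually derived its weights (see the eMethods) is not reproduced here.

```python
from typing import List, Optional, Sequence


def _weights(n: int, w: Optional[Sequence[float]]) -> List[float]:
    """Use unit weights when none are supplied."""
    return [1.0] * n if w is None else list(w)


def weighted_auc(case_risks: Sequence[float], control_risks: Sequence[float],
                 case_w: Optional[Sequence[float]] = None,
                 control_w: Optional[Sequence[float]] = None) -> float:
    """Weighted probability that a case has a higher risk than a control."""
    cw = _weights(len(case_risks), case_w)
    kw = _weights(len(control_risks), control_w)
    concordant = 0.0
    total = 0.0
    for r_case, w_case in zip(case_risks, cw):
        for r_ctrl, w_ctrl in zip(control_risks, kw):
            pair_weight = w_case * w_ctrl
            total += pair_weight
            if r_case > r_ctrl:
                concordant += pair_weight
            elif r_case == r_ctrl:
                concordant += 0.5 * pair_weight  # ties count half
    return concordant / total


def weighted_sensitivity(case_risks: Sequence[float], threshold: float,
                         case_w: Optional[Sequence[float]] = None) -> float:
    """Weighted fraction of cases at or above the risk threshold."""
    w = _weights(len(case_risks), case_w)
    return sum(wi for r, wi in zip(case_risks, w) if r >= threshold) / sum(w)


def weighted_specificity(control_risks: Sequence[float], threshold: float,
                         control_w: Optional[Sequence[float]] = None) -> float:
    """Weighted fraction of controls below the risk threshold."""
    w = _weights(len(control_risks), control_w)
    return sum(wi for r, wi in zip(control_risks, w) if r < threshold) / sum(w)


# Made-up predicted risks and control weights.
cases = [0.9, 0.4, 0.2]
controls = [0.3, 0.1, 0.5, 0.05]
control_weights = [1.0, 1.0, 1.0, 3.0]

print(weighted_auc(cases, controls))                             # unweighted
print(weighted_auc(cases, controls, control_w=control_weights))  # weighted
print(weighted_specificity(controls, 0.3, control_weights))
```

In this toy example, up-weighting a low-risk control raises both the AUC and the specificity, which shows why unweighted estimates from a matched sample can differ from population-level estimates.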
Results
Details of the biomarker score and discrimination estimates in the CARET training study are available in eTables 1 and 2 and eFigures 2 and 3 in the Supplement. In the validation study of 63 ever-smoking patients with lung cancer and 90 matched controls (mean [SD] age, 57.7 [8.7] years; 68.6% men) from EPIC and NSHDS, the predicted risk of receiving a diagnosis of lung cancer within 1 year for a 60-year-old man with 30 pack-years of smoking history was estimated at 0.37% using the smoking model (Figure 1). In comparison, using the integrated risk prediction model, we estimated 1-year risks at 0.07% and 1.56% for the same man assuming a biomarker score equal to the average of the first and fourth quartiles, respectively. The 1-year lung cancer risk estimates for each study participant in the validation study according to the smoking and integrated risk prediction models are shown in Figure 2. In comparison with the smoking model, the median 1-year risk estimates from the integrated risk prediction model increased for cases from 0.27% (interquartile range [IQR], 0.14%-0.50%) to 0.45% (IQR, 0.18%-1.5%) and decreased for controls from 0.12% (IQR, 0.05%-0.21%) to 0.04% (IQR, 0.015%-0.17%).
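To make the link between a cumulative hazard and an absolute risk concrete, the sketch below uses the standard relation risk = 1 - exp(-H), where H is the cumulative hazard over the period of interest. It assumes a proportional-hazards form in which a covariate shift multiplies the cumulative hazard by exp(log hazard ratio). That form, the function names, and the log hazard ratios are illustrative assumptions and are not the fitted model from this study. Only the 0.37% reference risk is taken from the text.

```python
import math


def risk_from_cumulative_hazard(cum_hazard: float) -> float:
    """Absolute risk over a period from the cumulative hazard over that period."""
    return 1.0 - math.exp(-cum_hazard)


def cumulative_hazard_from_risk(risk: float) -> float:
    """Inverse of risk_from_cumulative_hazard."""
    return -math.log(1.0 - risk)


def shifted_risk(reference_risk: float, log_hazard_ratio: float) -> float:
    """Risk after scaling the cumulative hazard by exp(log_hazard_ratio).

    Assumes proportional hazards; the hazard ratio is a hypothetical input.
    """
    cum_hazard = cumulative_hazard_from_risk(reference_risk)
    return risk_from_cumulative_hazard(cum_hazard * math.exp(log_hazard_ratio))


# 0.37% is the smoking-model estimate quoted in the text for a 60-year-old
# man with 30 pack-years; the log hazard ratios below are made up.
reference = 0.0037
for lhr in (-1.0, 0.0, 1.0):
    print(lhr, f"{100 * shifted_risk(reference, lhr):.3f}%")
```

For risks this small, the risk and the cumulative hazard are nearly equal, so a hazard ratio acts almost like a multiplier on the absolute risk.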
In the validation study, the population-weighted AUC was 0.73 (95% CI, 0.64-0.82) for the smoking model and 0.83 (95% CI, 0.76-0.90) for the integrated risk prediction model (P = .003 for difference in AUC) (Figure 3A). The AUCs were consistently higher for the integrated model than for the smoking model across relevant strata (eTable 3 in the Supplement). At an overall specificity of 0.83 based on the USPSTF screening criteria, the integrated risk prediction model yielded a sensitivity of 0.63 (95% CI, 0.49-0.76) compared with 0.43 (95% CI, 0.23-0.65) for the smoking model. Similarly, at an overall sensitivity of 0.42 (USPSTF), the integrated risk prediction model yielded a specificity of 0.95 (95% CI, 0.85-0.99) compared with 0.86 (95% CI, 0.72-0.94) for the smoking model. The improvement in AUC for the integrated risk prediction model (AUC, 0.80; 95% CI, 0.75-0.85) over the smoking model (AUC, 0.73; 95% CI, 0.68-0.79) was more modest when cases diagnosed up to 2 years after blood draw were considered (eFigure 4 in the Supplement). A full account of all conducted analyses is provided in the eResults; eTables 1, 2, and 4 to 10; and eFigures 6 to 10 of the Supplement.
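Comparing two models "at an overall specificity of 0.83" means choosing, for each model, the risk threshold that reaches that specificity and then reading off the sensitivity. The sketch below shows one simple way to do this on unweighted data. The decision rule (eligible if risk is at or above the threshold), the function names, and the example risks are assumptions for illustration, and the study's weighted procedure may differ.

```python
from typing import Sequence, Tuple


def specificity(control_risks: Sequence[float], threshold: float) -> float:
    """Fraction of controls below the threshold (not eligible)."""
    return sum(r < threshold for r in control_risks) / len(control_risks)


def sensitivity(case_risks: Sequence[float], threshold: float) -> float:
    """Fraction of cases at or above the threshold (eligible)."""
    return sum(r >= threshold for r in case_risks) / len(case_risks)


def sensitivity_at_specificity(case_risks: Sequence[float],
                               control_risks: Sequence[float],
                               target_specificity: float) -> Tuple[float, float]:
    """Return (threshold, sensitivity) at the lowest threshold that reaches
    the target specificity."""
    candidates = sorted(set(case_risks) | set(control_risks)) + [float("inf")]
    for threshold in candidates:
        # Small tolerance guards against floating-point rounding.
        if specificity(control_risks, threshold) >= target_specificity - 1e-12:
            return threshold, sensitivity(case_risks, threshold)
    raise ValueError("target specificity must not exceed 1.0")


# Made-up predicted risks.
example_cases = [0.9, 0.4, 0.2, 0.6]
example_controls = [0.3, 0.1, 0.5, 0.05, 0.15]
print(sensitivity_at_specificity(example_cases, example_controls, 0.8))
```

Applying the same function to the risks from two competing models, with the same target specificity, gives sensitivities that can be compared directly.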
Discussion
This is, to our knowledge, the first study in which a blood-based biomarker score was developed using one cohort and externally validated using prediagnostic samples from other independent cohorts. We observed a notable improvement in discrimination between future lung cancer cases and controls over a traditional smoking-based risk prediction model by incorporating information from a biomarker score consisting of 4 circulating proteins.
In our validation study, 26 of the 62 incident lung cancer cases (42%, corresponding to a sensitivity of 0.42) would have qualified for LDCT screening according to USPSTF criteria (USPSTF eligibility criteria could not be assessed for 1 case). Using the biomarker score together with smoking information, we estimated that 40 of 63 cases (63%, corresponding to a sensitivity of 0.63) could be identified without increasing the number of eligible controls (ie, without decreasing the specificity). The data further suggested that the biomarker score could alternatively be used to reduce screening of individuals not destined to develop lung cancer (false positives) from 15 of 90 controls (17%) to 4 of 90 controls (5%) without affecting the uptake of future lung cancer cases (sensitivity). These improvements in sensitivity and specificity were consistently observed across each evaluated stratum. Our findings also indicated that the improvement in discrimination afforded by the biomarker score is more modest beyond the initial year after blood draw, which suggests that an annual biomarker test may be necessary in a screening program.
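The proportions quoted above can be checked from the raw counts. The short sketch below does this arithmetic on unweighted counts only. Because the reported discrimination estimates were weighted to the background populations, the raw proportions need not match every reported figure exactly. The variable names are illustrative.

```python
def proportion(numerator: int, denominator: int) -> float:
    """Simple unweighted proportion."""
    return numerator / denominator


# Counts as stated in the text.
uspstf_sensitivity = proportion(26, 62)         # cases eligible under USPSTF
integrated_sensitivity = proportion(40, 63)     # cases identified by integrated model
uspstf_false_positive = proportion(15, 90)      # controls eligible under USPSTF
integrated_false_positive = proportion(4, 90)   # controls eligible, integrated model

print(f"USPSTF sensitivity:      {uspstf_sensitivity:.2f}")
print(f"Integrated sensitivity:  {integrated_sensitivity:.2f}")
print(f"USPSTF specificity:      {1 - uspstf_false_positive:.2f}")
print(f"Integrated specificity:  {1 - integrated_false_positive:.3f} (raw, unweighted)")
```

The first three raw values round to the reported 0.42, 0.63, and 0.83.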
Strengths and Limitations
Naive discrimination estimates, as typically provided in a matched, nested, case-control setting, are inherently biased. An important strength of our study was the use of absolute risks and population-based discrimination estimates, which were necessary to estimate the number of individuals who would be selected for screening using the biomarker-based eligibility criterion in the overall background cohorts, beyond our specific case-control study.
A limitation of our study was that 3 variables that were originally included in a validated risk prediction model (the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial model from 2012 [PLCOM2012]) were not available in our validation studies.14 However, with use of the original PLCO data, the exclusion of these variables from the PLCOM2012 model only nominally decreased the model’s performance, which suggests that our risk prediction model represented a valid comparison for the biomarker score (eMethods and eFigure 5 in the Supplement).14
Although this study provided a proof of principle for using biomarkers in lung cancer risk assessment to define screening eligibility, the integrated risk prediction model must be validated and calibrated in a larger sample of prediagnostic specimens before such a risk prediction tool can be used in practice. A larger sample size will also allow stratified analysis to evaluate the performance of the biomarker panel in predicting lung cancer cases associated with different characteristics, such as stage at diagnosis and histologic subtype. Furthermore, our study was limited to a select panel of circulating proteins, and we note that other types of biomarkers may also be informative.4,5 We also note that the population that would most benefit from a biomarker test before undergoing LDCT screening remains to be defined. A thorough cost-effectiveness assessment based on a large study sample is warranted to determine the threshold in absolute risk of developing lung cancer during a specific period, above which the benefits of screening outweigh the harms.15
Conclusions
This study provides a proof of principle in demonstrating that circulating biomarkers have the potential to inform lung cancer risk assessment and substantially improve on current criteria for LDCT screening.
References
- 1. National Lung Screening Trial Research Team; Aberle DR, Adams AM, Berg CD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. 2011;365(5):395-409. doi: 10.1056/NEJMoa1102873
- 2. Humphrey L, Deffebach M, Pappas M, et al. Screening for Lung Cancer. Rockville, MD: Systematic Review to Update the US Preventive Services Task Force Recommendation; 2013.
- 3. Muller DC, Johansson M, Brennan P. Lung cancer risk prediction model incorporating lung function: development and validation in the UK Biobank Prospective Cohort Study. J Clin Oncol. 2017;35(8):861-869. doi: 10.1200/JCO.2016.69.2467
- 4. Shiels MS, Pfeiffer RM, Hildesheim A, et al. Circulating inflammation markers and prospective risk for lung cancer. J Natl Cancer Inst. 2013;105(24):1871-1880. doi: 10.1093/jnci/djt309
- 5. Sin DD, Tammemagi CM, Lam S, et al. Pro-surfactant protein B as a biomarker for lung cancer prediction. J Clin Oncol. 2013;31(36):4536-4543. doi: 10.1200/JCO.2013.50.6105
- 6. Taguchi A, Hanash S, Rundle A, et al. Circulating pro-surfactant protein B as a risk biomarker for lung cancer. Cancer Epidemiol Biomarkers Prev. 2013;22(10):1756-1761. doi: 10.1158/1055-9965.EPI-13-0251
- 7. Bigbee WL, Gopalakrishnan V, Weissfeld JL, et al. A multiplexed serum biomarker immunoassay panel discriminates clinical lung cancer patients from high-risk individuals found to be cancer-free by CT screening. J Thorac Oncol. 2012;7(4):698-708. doi: 10.1097/JTO.0b013e31824ab6b0
- 8. Rastel D, Ramaioli A, Cornillie F, Thirion B; CYFRA 21-1 Multicentre Study Group. CYFRA 21-1, a sensitive and specific new tumour marker for squamous cell lung cancer: report of the first European multicentre evaluation. Eur J Cancer. 1994;30A(5):601-606. doi: 10.1016/0959-8049(94)90528-2
- 9. Patz EF Jr, Campa MJ, Gottlin EB, Kusmartseva I, Guan XR, Herndon JE II. Panel of serum biomarkers for the diagnosis of lung cancer. J Clin Oncol. 2007;25(35):5578-5583. doi: 10.1200/JCO.2007.13.5392
- 10. Schneider J, Bitterlich N, Kotschy-Lang N, Raab W, Woitowitz HJ. A fuzzy-classifier using a marker panel for the detection of lung cancers in asbestosis patients. Anticancer Res. 2007;27(4A):1869-1877.
- 11. Zeng Q, Liu M, Zhou N, Liu L, Song X. Serum human epididymis protein 4 (HE4) may be a better tumor marker in early lung cancer. Clin Chim Acta. 2016;455:102-106. doi: 10.1016/j.cca.2016.02.002
- 12. Taguchi A, Politi K, Pitteri SJ, et al. Lung cancer signatures in plasma based on proteome profiling of mouse tumor models. Cancer Cell. 2011;20(3):289-299. doi: 10.1016/j.ccr.2011.08.007
- 13. Royston P, Parmar MK. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med. 2002;21(15):2175-2197. doi: 10.1002/sim.1203
- 14. Tammemägi MC, Katki HA, Hocking WG, et al. Selection criteria for lung-cancer screening. N Engl J Med. 2013;368(8):728-736. doi: 10.1056/NEJMoa1211776
- 15. Ten Haaf K, van Rosmalen J, de Koning HJ. Lung cancer detectability by test, histology, stage, and gender: estimates from the NLST and the PLCO trials. Cancer Epidemiol Biomarkers Prev. 2015;24(1):154-161. doi: 10.1158/1055-9965.EPI-14-0745