JAMA Oncol. 2018 Jul 12;4(10):e182078. doi: 10.1001/jamaoncol.2018.2078

Assessment of Lung Cancer Risk on the Basis of a Biomarker Panel of Circulating Proteins

Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) Consortium for Early Detection of Lung Cancer, Florence Guida 1, Nan Sun 2, Leonidas E Bantis 3, David C Muller 4, Peng Li 1,5, Ayumu Taguchi 6, Dilsher Dhillon 2, Deepali L Kundnani 2, Nikul J Patel 2, Qingxiang Yan 3, Graham Byrnes 7, Karel G M Moons 8, Anne Tjønneland 9, Salvatore Panico 10, Claudia Agnoli 11, Paolo Vineis 4,12, Domenico Palli 13, Bas Bueno-de-Mesquita 4,14, Petra H Peeters 8, Antonio Agudo 15, Jose M Huerta 16,17, Miren Dorronsoro 18, Miguel Rodriguez Barranco 17,19,20, Eva Ardanaz 17,21,22, Ruth C Travis 23, Karl Smith Byrne 23, Heiner Boeing 24, Annika Steffen 24, Rudolf Kaaks 25, Anika Hüsing 25, Antonia Trichopoulou 26,27, Pagona Lagiou 26,27,28, Carlo La Vecchia 26,29, Gianluca Severi 12,30, Marie-Christine Boutron-Ruault 30, Torkjel M Sandanger 31, Elisabete Weiderpass 31,32,33,34, Therese H Nøst 31, Kostas Tsilidis 4,35, Elio Riboli 4, Kjell Grankvist 36, Mikael Johansson 37, Gary E Goodman 38, Ziding Feng 3, Paul Brennan 1, Mattias Johansson 1, Samir M Hanash 2
PMCID: PMC6233784  PMID: 30003238

Key Points

Question

Can a risk prediction model based on circulating protein biomarkers improve on a traditional risk prediction model for lung cancer and the current US screening criteria?

Findings

In a validation study of 63 ever-smoking patients with lung cancer and 90 matched controls, a biomarker-based risk prediction model consisting of 4 protein markers that was developed in a cohort of US individuals at high risk of lung cancer outperformed a model based on smoking history alone when blindly validated using prediagnostic samples from 2 European cohorts.

Meaning

Biomarker-based risk profiling has the potential to improve eligibility criteria for lung cancer screening.

Abstract

Importance

There is an urgent need to improve lung cancer risk assessment because current screening criteria miss a large proportion of cases.

Objective

To investigate whether a lung cancer risk prediction model based on a panel of selected circulating protein biomarkers can outperform a traditional risk prediction model and current US screening criteria.

Design, Setting, and Participants

Prediagnostic samples from 108 ever-smoking patients with lung cancer diagnosed within 1 year after blood collection and samples from 216 smoking-matched controls from the Carotene and Retinol Efficacy Trial (CARET) cohort were used to develop a biomarker risk score based on 4 proteins (cancer antigen 125 [CA125], carcinoembryonic antigen [CEA], cytokeratin-19 fragment [CYFRA 21-1], and the precursor form of surfactant protein B [Pro-SFTPB]). The biomarker score was subsequently validated blindly using absolute risk estimates among 63 ever-smoking patients with lung cancer diagnosed within 1 year after blood collection and 90 matched controls from 2 large European population-based cohorts, the European Prospective Investigation into Cancer and Nutrition (EPIC) and the Northern Sweden Health and Disease Study (NSHDS).

Main Outcomes and Measures

Model validity in discriminating between future lung cancer cases and controls. Discrimination estimates were weighted to reflect the background populations of EPIC and NSHDS validation studies (area under the receiver-operating characteristics curve [AUC], sensitivity, and specificity).

Results

In the validation study of 63 ever-smoking patients with lung cancer and 90 matched controls (mean [SD] age, 57.7 [8.7] years; 68.6% men) from EPIC and NSHDS, an integrated risk prediction model that combined smoking exposure with the biomarker score yielded an AUC of 0.83 (95% CI, 0.76-0.90) compared with 0.73 (95% CI, 0.64-0.82) for a model based on smoking exposure alone (P = .003 for difference in AUC). At an overall specificity of 0.83, based on the US Preventive Services Task Force screening criteria, the sensitivity of the integrated risk prediction (biomarker) model was 0.63 compared with 0.43 for the smoking model. Conversely, at an overall sensitivity of 0.42, based on the US Preventive Services Task Force screening criteria, the integrated risk prediction model yielded a specificity of 0.95 compared with 0.86 for the smoking model.

Conclusions and Relevance

This study provided a proof of principle in showing that a panel of circulating protein biomarkers may improve lung cancer risk assessment and may be used to define eligibility for computed tomography screening.


This validation study investigates the use of circulating protein biomarkers to improve lung cancer risk assessment and eligibility criteria for screening with low-dose computed tomography.

Introduction

The National Lung Screening Trial (NLST) findings suggested that screening with low-dose computed tomography (LDCT) can reduce lung cancer mortality.1 As a result, the US Preventive Services Task Force (USPSTF) recommends LDCT screening for lung cancer among individuals aged 55 to 80 years who have a smoking history of at least 30 pack-years and who currently smoke or quit within the past 15 years.1,2 However, LDCT screening results in a large number of indeterminate nodules,1 and less than 50% of incident lung cancer cases are among individuals who are eligible for screening.3 Biomarkers may improve lung cancer risk assessment beyond traditional smoking-based risk models and improve current screening eligibility criteria.4,5

Previous studies have shown that the precursor form of surfactant protein B (Pro-SFTPB) is predictive of lung cancer risk.5,6 Other markers that have been shown to be useful for the workup and diagnosis of lung cancer include cancer antigen 125 (CA125), cytokeratin-19 fragment (CYFRA 21-1), carcinoembryonic antigen (CEA), and human epididymis protein 4 (HE4).7,8,9,10,11,12 However, there are limited data regarding the performance of these markers in discriminating between future lung cancer cases and controls.

This study aimed to assess the potential of these 5 protein biomarkers to inform about lung cancer risk when tested blindly using prediagnostic samples.

Methods

A full account of the methods is provided in the eMethods in the Supplement. In brief, samples obtained from ever-smoking patients with lung cancer (cases) diagnosed within 1 year after blood collection (n = 108) and smoking-matched controls (n = 216) from the US Carotene and Retinol Efficacy Trial (CARET) cohort were used to develop a biomarker score based on circulating measures of Pro-SFTPB, CA125, CEA, HE4, and CYFRA 21-1 using logistic regression. All study participants gave written informed consent to participate in the study, and the research was approved by the institutional review boards of all of the participating institutions.
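
As an illustration of how a logistic-regression biomarker score of this kind could be computed, the following minimal Python sketch forms a linear predictor from log-transformed marker concentrations for the 4 proteins in the final score. The intercept, the coefficients, the log transformation, and the function names are hypothetical placeholders chosen only for illustration; they are not the fitted values from the CARET training study, for which the Supplement should be consulted.

```python
import math

# Hypothetical values for illustration only; NOT the fitted values from the study.
HYPOTHETICAL_INTERCEPT = -3.0
HYPOTHETICAL_COEFS = {
    "CA125": 0.4,
    "CEA": 0.5,
    "CYFRA21_1": 0.6,
    "ProSFTPB": 0.7,
}


def biomarker_score(markers):
    """Linear predictor (log-odds scale) from log-transformed marker levels."""
    return HYPOTHETICAL_INTERCEPT + sum(
        coef * math.log(markers[name])
        for name, coef in HYPOTHETICAL_COEFS.items()
    )


def predicted_probability(score):
    """Inverse logit: maps the linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# Two made-up marker profiles (arbitrary units).
low = {"CA125": 1.0, "CEA": 1.0, "CYFRA21_1": 1.0, "ProSFTPB": 1.0}
high = {"CA125": 2.0, "CEA": 2.0, "CYFRA21_1": 2.0, "ProSFTPB": 2.0}

print(biomarker_score(low), biomarker_score(high))
```

With positive coefficients, higher marker levels give a higher score. In a matched case-control sample the intercept does not reflect population risk, which is one reason the score is later combined with a survival model to obtain absolute risks.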

The extent to which the biomarker score improved discrimination of incident lung cancer cases and controls was validated externally using ever-smoking patients with lung cancer (cases) diagnosed within 1 year after blood collection (n = 63) and matched controls (n = 90) from the European Prospective Investigation into Cancer and Nutrition (EPIC) study and the Northern Sweden Health and Disease Study (NSHDS) (eFigure 1 in the Supplement). Absolute 1-year risks of lung cancer were estimated for each study participant in the validation study by modeling the cumulative hazards of lung cancer using flexible parametric survival models.13 Two models were evaluated: a traditional smoking history–based risk model and an integrated risk prediction model that combined the smoking model and the biomarker score. Model discrimination was assessed by receiver operating characteristic (ROC) analysis using the predicted 1-year lung cancer risks as the scoring rule. Discrimination estimates included area under the ROC curve (AUC), sensitivity, and specificity, which were weighted to reflect the background populations. In the context of using the 1-year absolute risk of lung cancer to define screening eligibility, the sensitivity provides an estimate of the fraction of future lung cancer cases that would be eligible for screening at a certain absolute risk threshold. Conversely, the specificity provides an estimate of the fraction of individuals from the background population who remain healthy and would not be eligible for screening. A sensitivity of 1.00 (or 100%) would indicate that all lung cancer cases are eligible for screening and a specificity of 1.00 (100%) would indicate that all individuals who remain healthy are not eligible for screening (ie, that there are no false-positive controls). Statistical significance was assumed at a 2-sided P < .05.
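
The step from a modeled cumulative hazard to an absolute 1-year risk can be sketched as follows. In a flexible parametric proportional-hazards model, the log cumulative hazard is a smooth function of log time plus a linear predictor, and the risk by time t is 1 - exp(-H(t)). The baseline value, the log hazard ratios, and the function name below are hypothetical and are not estimates from this study.

```python
import math

# Hypothetical values for illustration only (not study estimates).
LOG_BASELINE_CUM_HAZARD_1Y = math.log(0.001)  # ln H0(t) evaluated at t = 1 year
LOG_HR_PER_10_PACK_YEARS = 0.25
LOG_HR_PER_UNIT_SCORE = 0.9


def one_year_risk(pack_years, biomarker_score=None):
    """Absolute 1-year risk, 1 - exp(-H(1 | x)), under proportional hazards."""
    linear_predictor = LOG_HR_PER_10_PACK_YEARS * (pack_years / 10.0)
    if biomarker_score is not None:
        # Integrated model: smoking terms plus the biomarker score.
        linear_predictor += LOG_HR_PER_UNIT_SCORE * biomarker_score
    cum_hazard = math.exp(LOG_BASELINE_CUM_HAZARD_1Y + linear_predictor)
    # -expm1(-H) equals 1 - exp(-H) but is more accurate for small H.
    return -math.expm1(-cum_hazard)


smoking_only = one_year_risk(30)
with_high_score = one_year_risk(30, biomarker_score=1.0)
print(f"{smoking_only:.4%} {with_high_score:.4%}")
```

Because the predicted risk is a monotone function of the linear predictor, ranking participants by risk or by linear predictor gives the same ROC curve; the absolute scale matters when a risk threshold is used to define screening eligibility.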

Results

Details of the biomarker score and discrimination estimates in the CARET training study are available in eTables 1 and 2 and eFigures 2 and 3 in the Supplement. In the validation study of 63 ever-smoking patients with lung cancer and 90 matched controls (mean [SD] age, 57.7 [8.7] years; 68.6% men) from EPIC and NSHDS, the predicted risk of receiving a diagnosis of lung cancer within 1 year for a 60-year-old man with 30 pack-years of smoking history was estimated at 0.37% using the smoking model (Figure 1). In comparison, using the integrated risk prediction model, we estimated 1-year risks at 0.07% and 1.56% for the same man assuming a biomarker score equal to the average of the first and fourth quartile, respectively. The 1-year lung cancer risk estimates for each study participant in the validation study according to the smoking and integrated risk prediction models are shown in Figure 2. In comparison with the smoking model, the median 1-year risk estimates from the integrated risk prediction model increased for cases from 0.27% (interquartile range [IQR], 0.14%-0.50%) to 0.45% (IQR, 0.18%-1.5%) and decreased for controls from 0.12% (IQR, 0.05%-0.21%) to 0.04% (IQR, 0.015%-0.17%).

Figure 1. Predicted Probability of Lung Cancer Within 1 Year for a Male From the Northern Sweden Health and Disease Study (NSHDS) According to Smoking History.

A, Predicted probability of lung cancer according to the smoking risk prediction model based on age in years and smoking history. The rug plot shows the observed distribution of age in the validation study (European Prospective Investigation into Cancer and Nutrition [EPIC] and NSHDS, ever smokers). B, Predicted probability of lung cancer according to the integrated risk prediction model based on the biomarker score and the smoking history. The rug plot shows the observed distribution of the biomarker score in the validation study (EPIC and NSHDS, ever smokers). The vertical lines correspond to the quartile thresholds for the biomarker score among controls (Q1, Q2, Q3, and Q4).

Figure 2. Predicted Probabilities of Lung Cancer Within 1 Year Based on the Smoking and Integrated Risk Prediction Models in the Validation Study (European Prospective Investigation Into Cancer and Nutrition [EPIC] and Northern Sweden Health and Disease Study [NSHDS], Ever Smokers).

The validation samples consist of EPIC and NSHDS ever-smoking participants who received a diagnosis of lung cancer within 1 year after blood collection. For the controls, the size of the points is proportional to the number of eligible participants represented (corresponding to the inverse of the sampling probability). The right panel represents a magnified excerpt of the full figure.

In the validation study, the population-weighted AUC was 0.73 (95% CI, 0.64-0.82) for the smoking model and 0.83 (95% CI, 0.76-0.90) for the integrated risk prediction model (P = .003 for difference in AUC) (Figure 3A). The AUCs were consistently higher for the integrated model than for the smoking model across relevant strata (eTable 3 in the Supplement). At an overall specificity of 0.83 based on the USPSTF screening criteria, the integrated risk prediction model yielded a sensitivity of 0.63 (95% CI, 0.49-0.76) compared with 0.43 (95% CI, 0.23-0.65) for the smoking model. Similarly, at an overall sensitivity of 0.42 (USPSTF), the integrated risk prediction model yielded a specificity of 0.95 (95% CI, 0.85-0.99) compared with 0.86 (95% CI, 0.72-0.94) for the smoking model. The improvement in AUC for the integrated risk prediction model (AUC, 0.80; 95% CI, 0.75-0.85) over the smoking model (AUC, 0.73; 95% CI, 0.68-0.79) was more modest when cases diagnosed up to 2 years after blood draw were considered (eFigure 4 in the Supplement). A full account of all conducted analyses is provided in the eResults; eTables 1, 2, and 4 to 10; and eFigures 6 to 10 of the Supplement.
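
Reading a sensitivity off the ROC curve at a fixed specificity can be sketched as below. The function picks the smallest risk threshold whose weighted specificity among controls reaches the target and then reports the weighted sensitivity among cases at that threshold. The function name, the toy risks, and the weights are assumptions for illustration and are not study data.

```python
def sensitivity_at_specificity(case_risks, control_risks, control_weights,
                               target_spec, case_weights=None):
    """Return (threshold, specificity, sensitivity) at the first threshold
    whose weighted specificity is at least target_spec.

    A participant counts as screening-eligible when risk >= threshold.
    """
    if case_weights is None:
        case_weights = [1.0] * len(case_risks)
    total_control = sum(control_weights)
    total_case = sum(case_weights)
    # Observed risks are the only thresholds at which the estimates change;
    # infinity guarantees that a specificity of 1.0 is always reachable.
    candidates = sorted(set(case_risks) | set(control_risks)) + [float("inf")]
    for threshold in candidates:
        spec = sum(
            w for r, w in zip(control_risks, control_weights) if r < threshold
        ) / total_control
        if spec >= target_spec:
            sens = sum(
                w for r, w in zip(case_risks, case_weights) if r >= threshold
            ) / total_case
            return threshold, spec, sens


# Toy data: predicted 1-year risks and inverse-sampling-probability weights.
cases = [0.002, 0.004, 0.010, 0.020]
controls = [0.001, 0.002, 0.003, 0.015]
weights = [50, 30, 15, 5]

threshold, spec, sens = sensitivity_at_specificity(cases, controls, weights, 0.80)
print(threshold, spec, sens)
```

Swapping the roles of cases and controls gives the converse comparison, the specificity achieved at a fixed sensitivity.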

Figure 3. Receiver Operating Characteristic (ROC) Curve Analysis in the Validation Study (European Prospective Investigation Into Cancer and Nutrition [EPIC] and Northern Sweden Health and Disease Study [NSHDS], Ever Smokers).

A, ROC curve analysis in the validation study (EPIC and NSHDS ever-smoker participants who received a diagnosis of lung cancer within 1 year after blood collection) for 2 risk prediction models: a model that used smoking variables only (smoking) and an integrated model with the smoking variables and the biomarker score combined (smoking + biomarkers). AUC indicates area under the curve; USPSTF, US Preventive Services Task Force. The horizontal dashed line indicates sensitivity and the vertical dashed line, specificity. B, Sensitivity and specificity in relation to the probability of lung cancer within 1 year predicted by the integrated model.

Discussion

This is, to our knowledge, the first study in which a blood-based biomarker score was developed using one cohort and externally validated using prediagnostic samples from other independent cohorts. We observed a notable improvement in discrimination between future lung cancer cases and controls over a traditional smoking-based risk prediction model by incorporating information from a biomarker score consisting of 4 circulating proteins.

In our validation study, 26 of the 62 incident lung cancer cases (42%, corresponding to a sensitivity of 0.42) would have qualified for LDCT screening according to USPSTF criteria (USPSTF eligibility criteria could not be assessed for 1 case). Using the biomarker score together with smoking information, we estimated that 40 of 63 cases (63%, corresponding to a sensitivity of 0.63) could be identified without increasing the number of eligible controls (ie, without decreasing the specificity). The data further suggested that the biomarker score could alternatively be used to reduce screening of individuals not destined to develop lung cancer (false positives) from 15 of 90 controls (17%) to 4 of 90 controls (5%) without affecting the uptake of future lung cancer cases (sensitivity). These improvements in sensitivity and specificity were consistently observed across each evaluated stratum. Our findings also indicated that the improvement in discrimination afforded by the biomarker score is more modest beyond the initial year after blood draw, which suggests that an annual biomarker test may be necessary in a screening program.
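
The sensitivities quoted above follow from the case counts by simple division, as this short check shows. Only counts stated in the text are used; the variable and function names are illustrative. The specificities are not recomputed here because they are weighted to the background populations and so need not equal raw fractions of the 90 sampled controls.

```python
def proportion(numerator, denominator):
    """Fraction of participants meeting a criterion."""
    return numerator / denominator


# USPSTF eligibility could be assessed for 62 of the 63 cases.
uspstf_sensitivity = proportion(26, 62)
# Integrated model (smoking + biomarker score), all 63 cases.
integrated_sensitivity = proportion(40, 63)

print(round(uspstf_sensitivity, 2), round(integrated_sensitivity, 2))
```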

Strengths and Limitations

Naive discrimination estimates, as typically provided in a matched, nested, case-control setting, are inherently biased. An important strength of our study was the use of absolute risks and population-based discrimination estimates, which were necessary to estimate the number of individuals who would be selected for screening using the biomarker-based eligibility criterion in the overall background cohorts, beyond our specific case-control study.
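
One way to obtain population-based discrimination estimates from a nested case-control sample is to weight each control by the inverse of its sampling probability, so that it stands in for the eligible cohort members it represents (as in Figure 2). The sketch below computes a weighted AUC as the weighted proportion of case-control pairs in which the case has the higher predicted risk, counting ties as one half. This weighting scheme is assumed here for illustration, the data are made up, and the study's own procedure is described in the eMethods.

```python
def weighted_auc(case_risks, control_risks, control_weights, case_weights=None):
    """Weighted probability that a case has a higher risk than a control."""
    if case_weights is None:
        case_weights = [1.0] * len(case_risks)
    concordant = 0.0
    total = 0.0
    for case_risk, case_w in zip(case_risks, case_weights):
        for control_risk, control_w in zip(control_risks, control_weights):
            pair_weight = case_w * control_w
            total += pair_weight
            if case_risk > control_risk:
                concordant += pair_weight
            elif case_risk == control_risk:
                concordant += 0.5 * pair_weight  # ties count as one half
    return concordant / total


# Toy data: low-risk controls represent many more cohort members.
case_risks = [0.004, 0.010]
control_risks = [0.001, 0.004, 0.020]
control_weights = [60.0, 30.0, 10.0]

naive = weighted_auc(case_risks, control_risks, [1.0, 1.0, 1.0])
weighted = weighted_auc(case_risks, control_risks, control_weights)
print(round(naive, 3), round(weighted, 3))
```

In this toy example the naive and weighted estimates differ because the sampled controls are not equally representative of the background population, which is the source of bias the paragraph above refers to.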

A limitation of our study was that 3 variables that were originally included in a validated risk prediction model (the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial model from 2012 [PLCOM2012]) were not available in our validation studies.14 However, with use of the original PLCO data, the exclusion of these variables from the PLCOM2012 model only nominally decreased the model’s performance, which suggests that our risk prediction model represented a valid comparison for the biomarker score (eMethods and eFigure 5 in the Supplement).14

Although this study provided a proof of principle of the potential of using biomarkers in lung cancer risk assessment to define screening eligibility, validating and calibrating the integrated risk prediction model using a larger sample of prediagnostic specimens is clearly needed before such a risk prediction tool can be used in practice. A larger sample size will also allow stratified analysis to evaluate the performance of the biomarker panel in predicting lung cancer cases associated with different characteristics, such as stage at diagnosis and histologic subtype. Furthermore, our study was limited to a select panel of circulating proteins, and we note that other types of biomarkers may also be informative.4,5 We also note that the population that would most benefit from a biomarker test before undergoing LDCT screening remains to be defined. A thorough cost-effectiveness assessment based on a large study sample is warranted to determine the threshold in absolute risk of developing lung cancer during a specific period, above which the benefits of screening outweigh the harms.15

Conclusions

This study provides a proof of principle in demonstrating that circulating biomarkers have the potential to inform lung cancer risk assessment and substantially improve on current criteria for LDCT screening.

Supplement.

eMethods. Supplementary Methods

eResults. Supplementary Results

eTables 1 through 10. Supplementary Tables

eFigures 1 through 10. Supplementary Figures

References

  • 1. National Lung Screening Trial Research Team; Aberle DR, Adams AM, Berg CD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. 2011;365(5):395-409. doi: 10.1056/NEJMoa1102873
  • 2. Humphrey L, Deffebach M, Pappas M, et al. Screening for Lung Cancer. Rockville, MD: Systematic Review to Update the US Preventive Services Task Force Recommendation; 2013.
  • 3. Muller DC, Johansson M, Brennan P. Lung cancer risk prediction model incorporating lung function: development and validation in the UK Biobank Prospective Cohort Study. J Clin Oncol. 2017;35(8):861-869. doi: 10.1200/JCO.2016.69.2467
  • 4. Shiels MS, Pfeiffer RM, Hildesheim A, et al. Circulating inflammation markers and prospective risk for lung cancer. J Natl Cancer Inst. 2013;105(24):1871-1880. doi: 10.1093/jnci/djt309
  • 5. Sin DD, Tammemagi CM, Lam S, et al. Pro-surfactant protein B as a biomarker for lung cancer prediction. J Clin Oncol. 2013;31(36):4536-4543. doi: 10.1200/JCO.2013.50.6105
  • 6. Taguchi A, Hanash S, Rundle A, et al. Circulating pro-surfactant protein B as a risk biomarker for lung cancer. Cancer Epidemiol Biomarkers Prev. 2013;22(10):1756-1761. doi: 10.1158/1055-9965.EPI-13-0251
  • 7. Bigbee WL, Gopalakrishnan V, Weissfeld JL, et al. A multiplexed serum biomarker immunoassay panel discriminates clinical lung cancer patients from high-risk individuals found to be cancer-free by CT screening. J Thorac Oncol. 2012;7(4):698-708. doi: 10.1097/JTO.0b013e31824ab6b0
  • 8. Rastel D, Ramaioli A, Cornillie F, Thirion B; CYFRA 21-1 Multicentre Study Group. CYFRA 21-1, a sensitive and specific new tumour marker for squamous cell lung cancer: report of the first European multicentre evaluation. Eur J Cancer. 1994;30A(5):601-606. doi: 10.1016/0959-8049(94)90528-2
  • 9. Patz EF Jr, Campa MJ, Gottlin EB, Kusmartseva I, Guan XR, Herndon JE II. Panel of serum biomarkers for the diagnosis of lung cancer. J Clin Oncol. 2007;25(35):5578-5583. doi: 10.1200/JCO.2007.13.5392
  • 10. Schneider J, Bitterlich N, Kotschy-Lang N, Raab W, Woitowitz HJ. A fuzzy-classifier using a marker panel for the detection of lung cancers in asbestosis patients. Anticancer Res. 2007;27(4A):1869-1877.
  • 11. Zeng Q, Liu M, Zhou N, Liu L, Song X. Serum human epididymis protein 4 (HE4) may be a better tumor marker in early lung cancer. Clin Chim Acta. 2016;455:102-106. doi: 10.1016/j.cca.2016.02.002
  • 12. Taguchi A, Politi K, Pitteri SJ, et al. Lung cancer signatures in plasma based on proteome profiling of mouse tumor models. Cancer Cell. 2011;20(3):289-299. doi: 10.1016/j.ccr.2011.08.007
  • 13. Royston P, Parmar MK. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med. 2002;21(15):2175-2197. doi: 10.1002/sim.1203
  • 14. Tammemägi MC, Katki HA, Hocking WG, et al. Selection criteria for lung-cancer screening. N Engl J Med. 2013;368(8):728-736. doi: 10.1056/NEJMoa1211776
  • 15. Ten Haaf K, van Rosmalen J, de Koning HJ. Lung cancer detectability by test, histology, stage, and gender: estimates from the NLST and the PLCO trials. Cancer Epidemiol Biomarkers Prev. 2015;24(1):154-161. doi: 10.1158/1055-9965.EPI-14-0745

Articles from JAMA Oncology are provided here courtesy of American Medical Association