2023 Jan;77:1–12. doi: 10.1016/j.annepidem.2022.10.014

Design and methodological considerations for biomarker discovery and validation in the Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) Program

Hilary A Robbins a,, Karine Alcala a,^, Elham Khodayari Moez b,^, Florence Guida c, Sera Thomas b, Hana Zahed a, Matthew T Warkentin b,d, Karl Smith-Byrne e, Yonathan Brhane b, David Muller f, Xiaoshuang Feng a, Demetrius Albanes g, Melinda C Aldrich h, Alan A Arslan i, Julie Bassett j, Christine D Berg k, Qiuyin Cai l, Chu Chen m, Michael PA Davies n, Brenda Diergaarde o,p, John K Field n, Neal D Freedman g, Wen-Yi Huang g, Mikael Johansson q, Michael Jones r, Woon-Puay Koh s,t, Stephen Lam u, Qing Lan g, Arnulf Langhammer v,w, Linda M Liao g, Geoffrey Liu x, Reza Malekzadeh y, Roger L Milne j,z,aa, Luis M Montuenga bb,cc,dd, Thomas Rohan ee, Howard D Sesso ff, Gianluca Severi gg, Mahdi Sheikh a, Rashmi Sinha g, Xiao-Ou Shu l, Victoria L Stevens hh, Martin C Tammemägi ii,jj, Lesley F Tinker kk, Kala Visvanathan ll, Ying Wang mm, Renwei Wang nn, Stephanie J Weinstein g, Emily White oo, David Wilson pp, Jian-Min Yuan qq,p, Xuehong Zhang ff, Wei Zheng l, Christopher I Amos rr, Paul Brennan a, Mattias Johansson a,⁎,#, Rayjean J Hung b,d,1,#,
PMCID: PMC9835888  PMID: 36404465

Abstract

The Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) program is an NCI-funded initiative with an objective to develop tools to optimize low-dose CT (LDCT) lung cancer screening. Here, we describe the rationale and design for the Risk Biomarker and Nodule Malignancy projects within INTEGRAL.

The overarching goal of these projects is to systematically investigate circulating protein markers to include on a panel for use (i) pre-LDCT, to identify people likely to benefit from screening, and (ii) post-LDCT, to differentiate benign versus malignant nodules. To identify informative proteins, the Risk Biomarker project measured 1161 proteins in a nested case-control study within 2 prospective cohorts (n = 252 lung cancer cases and 252 controls) and replicated associations for a subset of proteins in 4 cohorts (n = 479 cases and 479 controls). Eligible participants currently or formerly smoked, and cases were diagnosed up to 3 years following blood draw. The Nodule Malignancy project measured 1078 proteins among participants with a heavy smoking history within four LDCT screening studies (n = 425 cases diagnosed up to 5 years following blood draw, 430 benign-nodule controls, and 398 nodule-free controls).

The INTEGRAL panel will enable absolute quantification of 21 proteins. We will evaluate its performance in the Risk Biomarker project using a case-cohort study including 14 cohorts (n = 1696 cases and 2926 subcohort representatives), and in the Nodule Malignancy project within five LDCT screening studies (n = 675 cases, 680 benign-nodule controls, and 648 nodule-free controls). Future progress to advance lung cancer early detection biomarkers will require carefully designed validation, translational, and comparative studies.

Keywords: Lung cancer screening, early detection, biomarkers, risk prediction, nodule malignancy, biomarker discovery and validation, study design

Introduction

Lung cancer screening by low-dose computed tomography (LDCT) has accelerated the field of lung cancer research with a renewed focus on early detection [1,2]. However, several questions remain regarding how to best implement LDCT screening [3], including how to identify individuals who are likely to benefit from screening, and how to manage nodules of indeterminate malignancy status identified on LDCT scans. Here, we describe the rationale and design of a large international research effort to develop and validate biomarker tools that can be applied in these two settings.

In 2018, the U.S. National Cancer Institute (NCI) funded the Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) U19 program, which includes an objective to develop early detection biomarkers and risk prediction tools for lung cancer screening. The INTEGRAL program comprises 3 projects: the Genetics project, which studies germline genetics; the Risk Biomarker project, which studies prediagnostic blood biomarkers; and the Nodule Malignancy project, which studies applications in LDCT screening studies including nodule evaluation. Here, we describe a joint effort of the Risk Biomarker and Nodule Malignancy projects to systematically investigate circulating protein markers for pre- and post-LDCT applications.

The primary objective of the Risk Biomarker project is to identify and validate biomarkers that can improve lung cancer risk prediction among people with a smoking history. A secondary objective is to develop and validate questionnaire-based lung cancer risk prediction models. The objectives for the Nodule Malignancy project are to identify biomarkers and establish quantitative imaging models that can differentiate benign versus malignant nodules following an initial LDCT scan. The Risk Biomarker project leverages resources from the Lung Cancer Cohort Consortium (LC3) [4], [5], [6], [7], [8], which was initially established in 2010 within the NCI Cohort Consortium [9]. The Nodule Malignancy project brings together LDCT screening studies in the framework of the International Lung Cancer Consortium (ILCCO), which has provided a foundation for collaborative research on lung cancer since 2004 (http://ilcco.iarc.fr).

This paper provides a design overview of the biomarker studies within the INTEGRAL Risk Biomarker and Nodule Malignancy projects. We highlight considerations that motivated the design, present details of the study population, and describe the harmonized databases resulting from these projects. Finally, we discuss perspectives for research to follow this initiative with a view toward implementation of the prediction tools in clinical practice.

Development and validation of a protein biomarker panel for early lung cancer detection

Motivation

The US Preventive Services Task Force (USPSTF) currently recommends lung cancer screening for people aged 50–80 years who have smoked at least 20 pack-years and currently smoke or have quit within the past 15 years [10]. However, more than one-third of lung cancer deaths that could be prevented among people who have smoked fall outside of these criteria [11]. To better target the highest-risk population, screening can instead be offered to people whose individual lung cancer risk exceeds a certain threshold as estimated by a risk prediction model [12], [13], [14], [15]. This approach is included in the US National Comprehensive Cancer Network (NCCN) guidelines [16].

Biomarkers may provide additional or complementary information on lung cancer risk and represent a promising avenue to improve existing risk prediction models. Conceptually, this could improve efficiency in two ways: by offering screening to people who have high risk based on biomarkers but are not otherwise eligible for screening based on the current recommendation, and by deprioritizing screening for individuals who are eligible but have a low-risk biomarker profile. Various domains of biomarkers have been investigated, but the translation of this research into practice has been slow, partly due to the lack of appropriately designed studies to establish and validate biomarker-based risk prediction models [17,18].

Another setting in which biomarkers could be applied in lung cancer screening is to better distinguish between malignant and benign nodules on LDCT images. Nodules are detected in up to one-quarter of participants, but the vast majority are benign. Managing nodules with uncertain clinical significance (i.e., indeterminate nodules) represents an important challenge because false-positive nodules can lead to interventions with risks of long-term harm. On the other hand, missed malignant nodules can lead to a lost opportunity for curative treatment. Several prediction models for nodule malignancy have been developed [19], [20], [21], but their classification accuracies remain imperfect.

Recent papers have highlighted common limitations in the design of studies aiming to identify and validate biomarkers for early cancer detection [22], including lung cancer [18]. To avoid common biases resulting from systematic differences between cases and controls, the prospective-specimen-collection, retrospective-blinded-evaluation (PRoBE) design emphasizes the use of pre-diagnostic samples, sampling from the same source population, and matching on important factors that impact biomarker measurements and outcome [23]. In validation studies, it is critical that the added contribution of the biomarker, compared with existing tools, can be clearly identified and quantified [18].

Several studies led by our group and others informed our overall choice to pursue a research program focused on protein biomarkers within INTEGRAL. First, in a pilot study published in 2018, members of our team found that a pre-defined set of cancer-related protein biomarkers improved discrimination between lung cancer cases and controls compared to a smoking-based risk prediction model, when the markers were measured in an independent validation study using samples collected within the year before diagnosis [24]. Second, we carried out a modeling study which suggested that using such biomarkers to optimize screening eligibility could be cost-effective, as long as the biomarker provides moderate or better risk discrimination at modest cost [25]. Studies also suggest that protein markers can improve discrimination between malignant and benign lung nodules [26,27]. Therefore, building on these promising preliminary data, the INTEGRAL program was formed to conduct a comprehensive protein biomarker evaluation from discovery to validation for both population-based risk prediction (Risk Biomarker project), and nodule differentiation (Nodule Malignancy project).

Our overarching aims are i) to identify circulating proteins that provide information on lung cancer risk and nodule malignancy beyond that of current standard tools (smoking-based risk prediction models and nodule malignancy models), and ii) to develop and validate a multiplex lung cancer biomarker assay that can quantify key lung cancer risk and/or nodule malignancy proteins in small volumes of peripheral blood in a cost-effective manner. Use of a single assay will help to streamline clinical implementation along the various steps of the LDCT screening pathway.

Design

Overview

Figure 1 outlines the sequential study phases of the INTEGRAL Risk Biomarker and Nodule Malignancy projects. In the Risk Biomarker project, using pre-diagnostic samples from population cohorts, an initial ‘full discovery’ phase scanned a broad set of protein markers, followed by a ‘targeted discovery’ phase which replicated results for a subset of proteins. The Nodule Malignancy project started with an expanded targeted discovery phase and analyzed samples from LDCT screening studies to identify proteins that are specifically useful to distinguish between benign and malignant lung nodules. The results from both projects will be used to configure the INTEGRAL panel with 21 circulating protein markers, whose performance will be assessed in a validation phase conducted separately within each project. Table 1 summarizes the key characteristics of the participating cohorts and LDCT screening studies in each phase.

Fig. 1.


Schematic describing the development and validation of the INTEGRAL protein panel for lung cancer early detection and nodule malignancy.

See Table 1 for definitions of the cohort abbreviations.

a: Cardiometabolic, Cardiovascular II, Cardiovascular III, Cell Regulation, Development, Immune Response, Inflammation, Metabolism, Neurology, Oncology II, Oncology III, Organ Damage, NeuroExploratory

b: Cardiovascular III, Inflammation, Immuno-Oncology, Oncology II, Oncology III, NeuroExploratory

c: Cardiometabolic, Cardiovascular II, Cardiovascular III, Development, Immune Response, Inflammation, Metabolism, Neurology, Oncology II, Oncology III, Organ Damage, NeuroExploratory.

Table 1.

Description of lung cancer cases participating in the development and validation of the INTEGRAL protein panel for lung cancer early detection and nodule malignancy. Rows labeled Total give the totals for each study component.

Study component Location Years of blood draw(s) Lung cancer cases: Total, Former smoking n (%), Current smoking n (%) Matched controls (discovery phases) or subcohort representatives (validation phase)

Risk Biomarker: Full discovery
European Prospective Investigation into Cancer and Nutrition (EPIC) Europe 1991–2002 188 59 (31%) 129 (69%) 188
Northern Sweden Health and Disease Study (NSHDS) Sweden 1988–2016 64 26 (41%) 38 (59%) 64
Total 252 85 (34%) 167 (66%) 252

Risk Biomarker: Targeted discovery*
Cancer Prevention Study II (CPS-II) USA 1998–2001 115 94 (82%) 21 (18%) 115
Nord-Trøndelag Health Study (HUNT) Norway 1995–1997; 2006–2008 164 61 (37%) 103 (63%) 164
Melbourne Collaborative Cohort Study (MCCS)⁎⁎ Australia 1990–1994; 2003–2007 108 65 (60%) 43 (40%) 108
Singapore Chinese Health Study (SCHS) Singapore 1994–2005 92 29 (32%) 63 (68%) 92
Total 479 249 (52%) 230 (48%) 479

Risk Biomarker: Validation – training set*
Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study (ATBC) Finland 1985–1988 327 – 327 (100%) 654
Campaign Against Cancer and Heart Disease (CLUE) USA 1989 60 33 (55%) 27 (45%) 123
Cancer Prevention Study II (CPS-II) USA 1998–2001 115 94 (82%) 21 (18%) 115
Nord-Trøndelag Health Study (HUNT) Norway 1995–1997; 2006–2008 164 61 (37%) 103 (63%) 165
Melbourne Collaborative Cohort Study (MCCS)⁎⁎ Australia 1990–1994; 2003–2007 108 65 (60%) 43 (40%) 111
Physicians’ Health Study (PHS) USA 1995–2002 29 20 (69%) 9 (31%) 58
Singapore Chinese Health Study (SCHS) Singapore 1994–2005 92 29 (32%) 63 (68%) 92
Women's Health Initiative (WHI) (1)⁎⁎ USA 1993–1997 241 167 (69%) 74 (31%) 482
Total 1136 469 (41%) 667 (59%) 1800

Risk Biomarker: Validation – testing set
Golestan Cohort Study (GCS) Iran 2004–2008 14 – 14 (100%) 28
New York University Women's Health Study (NYUWHS) USA 1985–1991 19 7 (37%) 12 (63%) 38
Shanghai Cohort Study (SCS) China 1986–1989 56 8 (14%) 48 (86%) 112
Southern Community Cohort Study (SCCS) USA 2002–2009 143 31 (22%) 112 (78%) 292
Shanghai Men's Health Study (SMHS) China 2001–2006 91 19 (21%) 72 (79%) 182
Women's Health Initiative (WHI) (2)⁎⁎ USA 1998–2002 204 145 (71%) 59 (29%) 408
Women's Health Study (WHS) USA 1993–1996 33 19 (58%) 14 (42%) 66
Total 560 229 (41%) 331 (59%) 1126

Study component Location Years of blood draw(s) Lung cancer cases: Total, Former smoking n (%), Current smoking n (%) Nodule-free controls Benign-nodule controls

Nodule Malignancy: Targeted discovery
Pan-Canadian Early Detection of Lung Cancer Study (PanCan) Canada 2008–2014 169 60 (36%) 109 (64%) 169 169
The UK Lung Cancer Pilot Screening Trial (UKLS) England 2011–2013 101 41 (41%) 60 (59%) 64 92
The International Early Lung Cancer Action Program (IELCAP-Toronto) Canada 2003–2019 79 30 (38%) 49 (62%) 89 87
The International Early Lung Cancer Action Program (Pamplona-IELCAP) Spain 2001–2020 76 29 (38%) 47 (62%) 76 82
Total 425 160 (38%) 265 (62%) 398 430
Nodule Malignancy: Validation
The Pittsburgh Lung Screening Study (PLuSS) USA 2002–2016 250 77 (31%) 173 (69%) 250 250

INTEGRAL, the Integrative Analysis of Lung Cancer Etiology and Risk program. IELCAP, the International Early Lung Cancer Action Program. Details on the eligibility criteria, data collection, and outcome ascertainment for each cohort are described in the Supplement. Further description of the lung cancer cases is given in Supplementary Table 1.

*Cohorts in the Risk Biomarker targeted discovery phase are also included in the validation phase training set and are listed twice in the table.

⁎⁎For the Risk Biomarker project, in MCCS and WHI, participants were sampled separately at two different blood draws. We chose to include the first WHI blood draw in the training set, and the second blood draw in the testing set, to achieve a similar balance of current and former smoking cases between the two sets. For the stratified selection of subcohort representatives, WHI included a stratification by study arm (observational study or the non-intervention arm of the clinical trial).

We are using the Olink proteomics platform (Olink Proteomics, Uppsala, Sweden) throughout the project [28]. Olink discovery assays allow high-throughput, semi-quantitative measurement of the concentrations of well-annotated proteins in less than 50 µL of plasma or serum. The platform uses proximity extension assay (PEA) technology, which is highly sensitive, minimizes cross-reactivity, and has high reproducibility. Relative protein concentrations are expressed as normalized protein expression (NPX) values on a log2 scale, which are estimated from quantitative PCR cycle threshold values; NPX values were standardized for analysis. For all laboratory analyses in INTEGRAL, cases and controls are randomly allocated across plates, with matched pairs plated together where relevant.
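The plate-allocation rule described above can be sketched as follows. This is a minimal illustration, not the INTEGRAL laboratory protocol: the function name, the integer pair identifiers, and the assumption of 80 sample wells per plate are hypothetical.

```python
import random

def allocate_pairs_to_plates(pair_ids, wells_per_plate=80, seed=2019):
    """Randomly assign matched case-control pairs to plates (hypothetical helper).

    Both members of a pair share a plate, so plate-level batch effects are the
    same for a case and its matched control. Shuffling the pairs spreads
    cohorts, blood-draw dates, and case status randomly across plates.
    """
    rng = random.Random(seed)
    order = list(pair_ids)
    rng.shuffle(order)
    pairs_per_plate = wells_per_plate // 2
    plates = {}
    for i, pair in enumerate(order):
        plate = i // pairs_per_plate + 1
        roles = ["case", "control"]
        rng.shuffle(roles)  # randomize which member takes the first of the two wells
        plates.setdefault(plate, []).extend((pair, role) for role in roles)
    return plates

plates = allocate_pairs_to_plates(range(252))
print(len(plates))  # 252 pairs at 40 pairs per plate -> 7 plates
```

Keeping pairs together means a plate effect shifts both members of a pair equally, so it cancels in the within-pair comparisons used by conditional analyses.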

To enable absolute quantification of proteins for clinical applications, we will develop the INTEGRAL panel as an Olink customized panel. Customized panels are also based on PEA technology and can measure up to 21 proteins in less than 50 µL of plasma or serum [29]. We plan to include 21 proteins on our panel, the technical maximum; we chose the maximum because measuring fewer proteins would reduce neither the assay cost nor the required sample volume.

Risk Biomarker project

The design of the Risk Biomarker project was informed by several considerations. Given that a key application for biomarkers in screening eligibility could be to identify individuals at high risk for lung cancer despite not meeting eligibility criteria (e.g., USPSTF criteria), it was crucial that the Risk Biomarker project include individuals who are both eligible and ineligible by current criteria. Therefore, pre-diagnostic samples collected within prospective cohorts provided an ideal study resource. Within cohorts, we first restricted to participants who currently or formerly smoked because they represent the current target population for lung cancer screening [10]. Second, we included cases diagnosed up to 3 years following blood draw, to predict lung cancer within a clinically actionable timeframe [24]. Third, we used a matched case-control design for the discovery phases, but a case-cohort design for the validation phase. For discovery, the matched design is important to minimize the influence of factors such as storage duration and biospecimen handling. In the validation phase, we changed to a case-cohort design to facilitate development of an integrated risk prediction model that is well-calibrated and representative of the source population (i.e., representative of all participants in the cohorts who ever smoked).

Full discovery phase

In the Risk Biomarker project full discovery phase, we measured all 13 Olink proteomics panels available in late 2019, which cover a range of domains including inflammation, oncology, and cardiovascular disease (1161 proteins; Appendix Table and Table 2). The objective of the full discovery phase was to select panels to measure in the targeted discovery phase, and the sample included the European Prospective Investigation into Cancer and Nutrition (EPIC, n = 188 lung cancer cases) and the Northern Sweden Health and Disease Study (NSHDS, n = 64 cases) (Table 1; further details in Supplementary Table 1). We included all confirmed lung cancer cases among people who ever smoked that were diagnosed within 3 years of blood draw. For each case, one control was randomly chosen using incidence density sampling from risk sets consisting of people who ever smoked and were alive and free of cancer at the time of diagnosis of the index case. Matching criteria included cohort, study center (where relevant), sex, date of blood collection (within 1 month of the index case, relaxed to 3 months for cases without available controls), date of birth (within 1 year of the index case, relaxed to 3 years), and smoking status in 4 categories: people who formerly smoked and quit less than 10 vs. at least 10 years prior, and people who currently smoked less than 15 vs. at least 15 cigarettes per day.
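A minimal sketch of incidence density sampling with the matching windows described above follows. The data layout and names are hypothetical, and several simplifications are assumptions rather than details from the source: dates are integer days, 1 month is approximated as 30 days and 1 year as 365 days, both windows are relaxed together, study center is omitted, and a control is not reused for a second case.

```python
import random
from dataclasses import dataclass

@dataclass
class Participant:
    pid: str
    cohort: str
    sex: str
    smoking_cat: str   # one of the 4 smoking categories used for matching
    blood_day: int     # date of blood collection (days since a reference date)
    birth_day: int     # date of birth (days since the same reference date)
    exit_day: int      # lung cancer diagnosis, other cancer, death, or end of follow-up
    is_case: bool = False

def eligible(case, cand, blood_tol, birth_tol):
    return (cand.pid != case.pid
            and cand.exit_day > case.exit_day  # alive and cancer-free at the case's diagnosis
            and cand.cohort == case.cohort
            and cand.sex == case.sex
            and cand.smoking_cat == case.smoking_cat
            and abs(cand.blood_day - case.blood_day) <= blood_tol
            and abs(cand.birth_day - case.birth_day) <= birth_tol)

def sample_controls(participants, max_lag_days=3 * 365, seed=1):
    """Return {case_pid: control_pid}, drawing one control per eligible case."""
    rng = random.Random(seed)
    used = set()
    matches = {}
    # Only cases diagnosed within ~3 years of blood draw, in order of diagnosis.
    cases = sorted((p for p in participants
                    if p.is_case and p.exit_day - p.blood_day <= max_lag_days),
                   key=lambda p: p.exit_day)
    for case in cases:
        # Strict windows first (1 month, 1 year), then relaxed (3 months, 3 years).
        for blood_tol, birth_tol in [(30, 365), (90, 3 * 365)]:
            pool = [c for c in participants
                    if c.pid not in used and eligible(case, c, blood_tol, birth_tol)]
            if pool:
                control = rng.choice(pool)
                used.add(control.pid)
                matches[case.pid] = control.pid
                break
    return matches
```

Because eligibility only requires being event-free at the index case's diagnosis date, a participant who develops lung cancer later can still serve as a control, which is the defining property of incidence density sampling.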

Table 2.

Proteomics panels tested in the full and targeted discovery phases to develop the INTEGRAL protein panel for lung cancer early detection and nodule malignancy.

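Columns by project and phase: Risk Biomarker project, full discovery (EPIC, NSHDS); Risk Biomarker project, targeted discovery (SCHS, CPS-II, HUNT, MCCS); Nodule Malignancy project, targeted discovery (PanCan, UKLS, IELCAP-Toronto, Pamplona-IELCAP)
Cohorts EPIC NSHDS SCHS CPS-II HUNT MCCS PanCan UKLS IELCAP-Toronto Pamplona-IELCAP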
Number of lung cancer cases 188 64 92 115 164 108 169 101 79 76
Number of panels measured 13 13 5 6 5 6 12 12 12 12
Number of measurements* 1196 1196 460 552 460 552 1104 1104 1104 1104
Number of unique proteins* 1161 1161 394 484 392 484 1078 1078 1078 1078
Proteomics panels
Cardiovascular III X X X X X X X X X X
Inflammation X X X X X X X X X X
Immuno-Oncology (X) (X) X X X X (X) (X) (X) (X)
Oncology II X X X X X X X X X X
Oncology III X X X X X X X X X
NeuroExploratory X X X X X X X X X
Cardiometabolic X X X X X X
Cardiovascular II X X X X X X
Cell Regulation X X
Development X X X X X X
Immune Response X X X X X X
Metabolism X X X X X X
Neurology X X X X X X
Organ Damage X X X X X X

*Some proteins are measured on multiple panels. In these cases, we used a single measurement of each protein for analysis, selecting the one measured in more cohorts and, if still tied, the one with the highest variance.

(X): the Immuno-Oncology panel was not assayed separately in these cohorts; all of its proteins are included on other panels that were assayed.

Details of the proteins measured on each panel are provided in the Appendix Table.
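The de-duplication rule in the Table 2 footnote can be written as a two-level sort key. A minimal sketch, assuming each candidate measurement is stored with its panel name, the number of cohorts in which it was measured, and its pooled NPX values; the protein names and values below are hypothetical, and using the population variance is an assumption.

```python
from statistics import pvariance

def choose_measurement(measurements):
    """Pick one measurement of a protein assayed on several panels.

    Prefer the measurement available in more cohorts; break ties by the
    highest variance of the (pooled) NPX values.
    """
    return max(measurements, key=lambda m: (m["n_cohorts"], pvariance(m["values"])))

def deduplicate(by_protein):
    """Map each protein to the panel whose measurement is kept."""
    return {protein: choose_measurement(ms)["panel"] for protein, ms in by_protein.items()}

example = {
    "PROT_A": [  # hypothetical protein measured on two panels
        {"panel": "Oncology III", "n_cohorts": 9, "values": [1.0, 5.0, 9.0]},
        {"panel": "Inflammation", "n_cohorts": 10, "values": [2.0, 2.1, 2.2]},
    ],
    "PROT_B": [  # same cohort coverage, so variance decides
        {"panel": "Oncology II", "n_cohorts": 6, "values": [0.0, 0.1, 0.2]},
        {"panel": "Cardiovascular III", "n_cohorts": 6, "values": [0.0, 1.0, 2.0]},
    ],
}
print(deduplicate(example))  # {'PROT_A': 'Inflammation', 'PROT_B': 'Cardiovascular III'}
```

Comparing tuples lets Python apply the cohort-count criterion first and fall back to variance only on ties.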

The dataset generated by the full discovery phase therefore includes 252 case-control pairs with 1161 proteins measured on each participant (Table 2). Statistical analyses used conditional logistic regression and penalized regression. We used the results to examine, for each of the 13 proteomics panels, the number of highly ranked and consistently selected proteins.
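For 1:1 matched pairs, the conditional logistic likelihood has a simple form: with d_i the case-minus-control difference in a standardized protein, it is the product of sigmoid(beta * d_i), so the matching factors drop out and no intercept is estimated. The sketch below fits one protein at a time by Newton-Raphson using only the standard library; the function name is hypothetical, and the penalized multi-protein models are not shown.

```python
import math

def clogit_1to1(x_case, x_control, tol=1e-10, max_iter=50):
    """Conditional logistic regression for one protein in 1:1 matched pairs.

    The conditional likelihood is prod_i sigmoid(beta * d_i), where
    d_i = x_case_i - x_control_i; matched-set factors cancel, so there is no
    intercept. Returns (beta, standard error). Assumes no perfect separation
    (the d_i must not all share one sign).
    """
    d = [a - b for a, b in zip(x_case, x_control)]

    def score_and_info(beta):
        p = [1.0 / (1.0 + math.exp(-beta * di)) for di in d]
        score = sum(di * (1.0 - pi) for di, pi in zip(d, p))
        info = sum(di * di * pi * (1.0 - pi) for di, pi in zip(d, p))
        return score, info

    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_info(beta)
        step = score / info  # Newton-Raphson update for a scalar parameter
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_info(beta)
    return beta, 1.0 / math.sqrt(info)

beta, se = clogit_1to1([1.0, 1.0, 0.0], [0.0, 0.0, 1.0])
print(round(math.exp(beta), 6))  # odds ratio per unit: 2.0
```

In the toy example the differences are (1, 1, -1), so the likelihood is maximized where sigmoid(beta) = 2/3, giving beta = log 2. With standardized NPX values, exp(beta) is the odds ratio per standard deviation of the protein.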

Targeted discovery phase

The targeted discovery phase of the Risk Biomarker project used the same design to independently replicate associations for a subset of proteomics panels, chosen to maximize coverage of the promising proteins while minimizing the total cost. This phase included 4 cohorts with 479 total eligible lung cancer cases: the Cancer Prevention Study II, the Nord-Trøndelag Health Study, the Melbourne Collaborative Cohort Study, and the Singapore Chinese Health Study (Table 1; further details in Supplementary Table 1). To cover as many of the promising proteins as possible, we measured the Immuno-Oncology, Oncology II, Cardiovascular III, and Inflammation panels in all four cohorts, and the Oncology III and NeuroExploratory panels in three cohorts each (Table 2).
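The source does not specify how panels were chosen. One simple way to formalize "maximize coverage of the promising proteins while minimizing the total cost" is a greedy set-cover heuristic, sketched below as an assumption; the panel names, their protein contents, and the unit costs are hypothetical.

```python
def greedy_panels(panel_proteins, promising, budget, cost=None):
    """Greedy set cover: repeatedly add the panel that covers the most
    still-uncovered promising proteins per unit cost, until every promising
    protein is covered or the next pick would exceed the budget."""
    cost = cost or {panel: 1.0 for panel in panel_proteins}
    uncovered = set(promising)
    chosen, spent = [], 0.0
    while uncovered:
        candidates = [p for p in panel_proteins if p not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda p: len(panel_proteins[p] & uncovered) / cost[p])
        gain = len(panel_proteins[best] & uncovered)
        if gain == 0 or spent + cost[best] > budget:
            break
        chosen.append(best)
        spent += cost[best]
        uncovered -= panel_proteins[best]
    return chosen, uncovered

# Hypothetical panel contents (letters stand for proteins).
panels = {"P1": {"A", "B", "C"}, "P2": {"C", "D"}, "P3": {"E"}, "P4": {"F"}}
print(greedy_panels(panels, {"A", "B", "C", "D", "E"}, budget=3))  # (['P1', 'P2', 'P3'], set())
```

Greedy selection is not guaranteed to be optimal, but it is easy to audit; measuring some panels in only a subset of cohorts, as described above, is a further cost trade-off that this sketch does not model.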

The dataset generated for the targeted discovery phase therefore includes 479 case-control pairs with between 392 and 484 proteins measured for each participant (Table 2). Statistical analyses included conditional logistic regression, penalized regression, and stratified approaches. For the INTEGRAL panel, we are prioritizing proteins selected in penalized regression models that show a consistent association with lung cancer across cohorts.
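The source does not state how "consistent association across cohorts" is judged. One common option, shown here as an assumption rather than the INTEGRAL criterion, is a fixed-effect inverse-variance meta-analysis of cohort-specific log odds ratios, summarizing heterogeneity with Cochran's Q and I² and checking that all cohorts agree in direction.

```python
import math

def cohort_consistency(log_ors, ses):
    """Fixed-effect inverse-variance pooling of cohort-specific log odds ratios,
    with Cochran's Q and I^2 as heterogeneity summaries."""
    w = [1.0 / s ** 2 for s in ses]
    pooled = sum(wi * b for wi, b in zip(w, log_ors)) / sum(w)
    se_pooled = math.sqrt(1.0 / sum(w))
    q = sum(wi * (b - pooled) ** 2 for wi, b in zip(w, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation beyond chance
    same_direction = all(b > 0 for b in log_ors) or all(b < 0 for b in log_ors)
    return {"pooled": pooled, "se": se_pooled, "Q": q, "I2": i2,
            "same_direction": same_direction}

# Hypothetical estimates: two cohorts with SE 0.1 and log ORs 0.2 and 0.8.
print(cohort_consistency([0.2, 0.8], [0.1, 0.1]))
```

In the example, Q is about 18 on 1 degree of freedom, so I² is about 0.94: both cohorts point the same way, but the magnitudes disagree far more than sampling error would suggest.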

Validation phase

The Risk Biomarker project validation phase includes 14 cohorts and employs a case-cohort design. In each cohort, all cases diagnosed within 3 years of blood draw were included. Subcohort representatives were randomly sampled at the time of blood draw within eight strata jointly defined by age (above or below the median age among cases), sex (male or female, except for single-sex cohorts), and smoking status (current or former). We then weight each selected participant by the inverse of their probability of selection, so that the weighted sample represents all participants in the cohorts who ever smoked at the time of enrollment. To maximize statistical power, we included the four cohorts from the targeted discovery phase again in the validation phase, analyzing the same cases as in the targeted discovery phase but selecting one new subcohort representative per case. For the 10 cohorts included for the first time in the validation phase, we selected two subcohort representatives per case.
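The inverse-probability weighting described above can be sketched as follows, assuming a hypothetical layout of (participant id, stratum, case status) for every ever-smoking cohort member, where the stratum is one of the eight age × sex × smoking-status groups. Cases are all selected, so their weight is 1; a sampled non-case's weight is the inverse of its stratum's sampling fraction.

```python
from collections import Counter

def case_cohort_weights(cohort, subcohort_ids):
    """Inverse-probability-of-selection weights for a stratified case-cohort sample.

    cohort: (pid, stratum, is_case) for every ever-smoking participant.
    subcohort_ids: pids randomly sampled into the subcohort.
    Participants who are neither cases nor sampled get no weight (not assayed).
    """
    n_in_stratum = Counter(stratum for _, stratum, _ in cohort)
    n_sampled = Counter(stratum for pid, stratum, _ in cohort if pid in subcohort_ids)
    weights = {}
    for pid, stratum, is_case in cohort:
        if is_case:
            weights[pid] = 1.0  # every case is selected, so P(selection) = 1
        elif pid in subcohort_ids:
            # P(selection) = n_sampled / N within the stratum; the weight is its inverse.
            weights[pid] = n_in_stratum[stratum] / n_sampled[stratum]
    return weights

# Hypothetical strata: "A" has 100 members (1 case, 10 sampled), "B" has 50 (5 sampled).
cohort = [(f"p{i}", "A", i == 0) for i in range(100)] + [(f"q{i}", "B", False) for i in range(50)]
sub = {f"p{i}" for i in range(1, 11)} | {f"q{i}" for i in range(5)}
w = case_cohort_weights(cohort, sub)
print(w["p0"], w["p1"], w["q0"])  # 1.0 10.0 10.0
```

Variance estimation for weighted case-cohort analyses needs a robust or design-based estimator, which this sketch does not cover.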

The validation phase samples will be assayed for absolute quantification of the 21 proteins on the INTEGRAL panel. The cohorts will be divided into training and testing sets (Table 1). To maintain full independence of the testing set, the four cohorts that contributed to the targeted discovery phase will be included in the training set. In addition to these four cohorts, the training set will also include the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study, the Campaign Against Cancer and Heart Disease, the Physicians’ Health Study, and the first blood draw from the Women's Health Initiative. The testing set will include the Golestan Cohort Study, the New York University Women's Health Study, the Shanghai Cohort Study, the Southern Community Cohort Study, the Shanghai Men's Health Study, the second blood draw from the Women's Health Initiative, and the Women's Health Study. These groupings were chosen to balance the training and testing sets by geographical location, U.S. racial/ethnic groups, people who currently or formerly smoked, and lung cancer histological types. For the Women's Health Initiative, two independent groups of participants were selected from two blood draws, and we chose to separate these to achieve a similar balance of current and former smoking cases between the training and testing sets.
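The balance of current and former smoking cases between the training and testing sets can be checked directly from the Table 1 counts. The short script below does this arithmetic; the cohort rows are copied from Table 1, and the helper name is illustrative.

```python
# (cohort, cases, former smoking, current smoking) from Table 1, validation phase
TRAINING = [("ATBC", 327, 0, 327), ("CLUE", 60, 33, 27), ("CPS-II", 115, 94, 21),
            ("HUNT", 164, 61, 103), ("MCCS", 108, 65, 43), ("PHS", 29, 20, 9),
            ("SCHS", 92, 29, 63), ("WHI (1)", 241, 167, 74)]
TESTING = [("GCS", 14, 0, 14), ("NYUWHS", 19, 7, 12), ("SCS", 56, 8, 48),
           ("SCCS", 143, 31, 112), ("SMHS", 91, 19, 72), ("WHI (2)", 204, 145, 59),
           ("WHS", 33, 19, 14)]

def summarize(rows):
    """Return (total cases, % of cases who formerly smoked)."""
    cases = sum(r[1] for r in rows)
    former = sum(r[2] for r in rows)
    current = sum(r[3] for r in rows)
    assert former + current == cases, "former + current should equal all cases"
    return cases, round(100 * former / cases, 1)

print(summarize(TRAINING))  # (1136, 41.3)
print(summarize(TESTING))   # (560, 40.9)
```

The two sets differ by less than half a percentage point in the proportion of cases who formerly smoked, consistent with the stated aim of the WHI split.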

Statistical analyses in the validation phase will use the training set to establish flexible parametric survival models that predict the absolute risk of lung cancer over 3 years [30]. Predictors will include a subset of the 21 proteins from the INTEGRAL panel in addition to demographic, health history, and smoking information. The final model will be evaluated in the testing set to measure its calibration (ratio of expected to observed cases) and discrimination. Discrimination analyses will calculate the area under the receiver operating characteristic curve (AUC) and the sensitivity and specificity of the biomarker model at different thresholds. We will also compare its performance directly with existing definitions of screening eligibility, including the USPSTF criteria and the PLCOm2012 risk model [14]; our large sample size is intended to provide sufficient power to detect AUC differences of clinically meaningful magnitude. A sensitivity analysis will exclude late-stage cases whose blood was drawn close to diagnosis.
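The calibration and discrimination measures named above can be sketched with optional case-cohort weights. This is a simplified illustration, not the planned analysis code: it assumes every participant has complete 3-year follow-up (no censoring), and the risks and outcomes in the example are hypothetical.

```python
def _w(n, weights):
    return weights if weights is not None else [1.0] * n

def expected_observed(pred_risk, outcome, weights=None):
    """Calibration: weighted sum of predicted 3-year risks over weighted observed cases."""
    w = _w(len(pred_risk), weights)
    expected = sum(wi * r for wi, r in zip(w, pred_risk))
    observed = sum(wi * y for wi, y in zip(w, outcome))
    return expected / observed

def weighted_auc(score, outcome, weights=None):
    """Probability that a random case outranks a random non-case (ties count 1/2)."""
    w = _w(len(score), weights)
    cases = [(s, wi) for s, y, wi in zip(score, outcome, w) if y == 1]
    ctrls = [(s, wi) for s, y, wi in zip(score, outcome, w) if y == 0]
    num = 0.0
    for sc, wc in cases:
        for sn, wn in ctrls:
            if sc > sn:
                num += wc * wn
            elif sc == sn:
                num += 0.5 * wc * wn
    return num / (sum(wi for _, wi in cases) * sum(wi for _, wi in ctrls))

def sens_spec(score, outcome, threshold, weights=None):
    """Sensitivity and specificity when 'score >= threshold' is called positive."""
    w = _w(len(score), weights)
    tp = sum(wi for s, y, wi in zip(score, outcome, w) if y == 1 and s >= threshold)
    pos = sum(wi for y, wi in zip(outcome, w) if y == 1)
    tn = sum(wi for s, y, wi in zip(score, outcome, w) if y == 0 and s < threshold)
    neg = sum(wi for y, wi in zip(outcome, w) if y == 0)
    return tp / pos, tn / neg

risk = [0.10, 0.40, 0.35, 0.80]
case = [0, 0, 1, 1]
print(weighted_auc(risk, case))      # 0.75
print(sens_spec(risk, case, 0.35))   # (1.0, 0.5)
```

Passing case-cohort weights lets the weighted non-cases stand in for the full ever-smoking cohort, so the E/O ratio and specificity refer to the source population rather than to the sample.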

Nodule Malignancy project

The goal of the Nodule Malignancy project is to identify biomarkers that can differentiate benign versus malignant pulmonary nodules, and the study design is based on the following considerations. First, to focus on the actionable time window while maximizing sample size, we included cases diagnosed up to 5 years following blood draw. For lung cancers diagnosed at the baseline screen, the sample collected at baseline was included. Such samples differ from post-diagnostic samples because, at baseline, all individuals participating in LDCT screening have no cancer diagnosis and are mostly asymptomatic. Second, to maximize statistical power and ensure robust discovery results, we included 4 of the LDCT screening studies in the expanded targeted discovery phase (Fig. 1). Third, the main comparison group comprises individuals with benign nodules who did not develop lung cancer, frequency matched on age at enrollment, age at the abnormal finding, age at blood collection, sex, and follow-up time. When multiple participants with benign nodules were available as matched controls, we chose those with a higher estimated probability of nodule malignancy based on the Brock/PanCan model, to increase power for nodules with higher malignancy potential [19]. To examine protein levels among nodule-free individuals in the screening-eligible population, we also included one control with no nodule findings per case, frequency matched on age at enrollment, age at blood collection, sex, and follow-up time.
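The control-selection rule for benign-nodule controls (frequency matching, then preferring the highest Brock/PanCan probability) can be sketched as below. The matching cell used here (sex and 5-year age band) is a hypothetical simplification of the matching factors listed above, and the participants and probabilities are invented for illustration.

```python
from collections import Counter, defaultdict

def select_benign_controls(cases, candidates, stratum_of):
    """Frequency-match benign-nodule controls to cases, preferring high Brock risk.

    stratum_of(person) returns the matching cell; each candidate carries
    'brock_prob', its Brock/PanCan malignancy probability. Within each cell,
    take as many controls as there are cases, highest probability first.
    """
    needed = Counter(stratum_of(c) for c in cases)
    pools = defaultdict(list)
    for cand in candidates:
        pools[stratum_of(cand)].append(cand)
    chosen = []
    for cell, n in needed.items():
        ranked = sorted(pools[cell], key=lambda d: d["brock_prob"], reverse=True)
        chosen.extend(ranked[:n])  # fewer than n if the cell has too few candidates
    return chosen

def cell(person):  # hypothetical matching cell: sex and 5-year age band
    return (person["sex"], person["age"] // 5)

cases = [{"sex": "F", "age": 62}, {"sex": "F", "age": 63}, {"sex": "M", "age": 70}]
candidates = [
    {"id": "b1", "sex": "F", "age": 61, "brock_prob": 0.02},
    {"id": "b2", "sex": "F", "age": 64, "brock_prob": 0.15},
    {"id": "b3", "sex": "F", "age": 60, "brock_prob": 0.08},
    {"id": "b4", "sex": "M", "age": 72, "brock_prob": 0.05},
    {"id": "b5", "sex": "M", "age": 58, "brock_prob": 0.30},
]
print(sorted(d["id"] for d in select_benign_controls(cases, candidates, cell)))  # ['b2', 'b3', 'b4']
```

Candidate b5 has the highest Brock probability but falls in a cell with no cases, so frequency matching takes precedence over the risk-based preference.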

Targeted discovery phase

The Nodule Malignancy project used a broad targeted discovery phase. We measured all available panels except the Cell Regulation panel, which did not show any robust associations with lung cancer in the Risk Biomarker project full discovery phase (Table 2). We included samples from the Pan-Canadian Early Detection of Lung Cancer Study (PanCan), the UK Lung Cancer Pilot Screening Trial (UKLS), the International Early Lung Cancer Action Program (IELCAP)-Toronto, and Pamplona-IELCAP (Table 1; further details in Supplementary Table 1). All samples within each LDCT study were randomly plated regardless of their cancer or nodule status, so that plate (batch) effects could not be confounded with case status.

Statistical analyses applied multivariable logistic regression for each protein, adjusting for the Brock/PanCan nodule malignancy score, which includes age, sex, family history of lung cancer, emphysema, and nodule size, type, location, count, and spiculation (when available) [19]. To select protein markers for the INTEGRAL panel, we are using elastic net penalized regression and a random-forest-based feature selection approach to identify the combination of markers that best predicts nodule malignancy [31], [32]. We will also conduct analyses stratified by time to diagnosis. We will prioritize markers based on selection by either elastic net or random forest and consistency of results across studies.
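The per-protein analysis can be sketched as an ordinary logistic regression of malignancy on a standardized protein plus the Brock/PanCan score. Entering the score on the logit scale is an assumption made here for illustration, as are the function names and data; the fit uses Newton-Raphson with a small Gaussian-elimination solver so that it needs only the standard library. Elastic net and random-forest selection are not shown.

```python
import math

def solve(a, b):
    """Solve a small linear system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def logistic_fit(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson logistic regression; each row of X starts with 1 (intercept)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for l in range(k):
                    info[j][l] += p * (1.0 - p) * xi[j] * xi[l]
        step = solve(info, grad)  # beta_new = beta + I(beta)^-1 * score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def protein_or_adjusted(brock_prob, protein_z, malignant):
    """Odds ratio per SD of a protein, adjusted for logit(Brock/PanCan probability)."""
    X = [[1.0, math.log(p / (1.0 - p)), z] for p, z in zip(brock_prob, protein_z)]
    return math.exp(logistic_fit(X, malignant)[2])

# Hypothetical data for 10 participants with nodules.
brock = [0.05, 0.1, 0.2, 0.3, 0.05, 0.1, 0.2, 0.3, 0.15, 0.25]
protein = [-1.0, 0.5, 0.2, 1.5, 0.3, -0.7, 1.1, -0.2, 0.9, -1.2]
malignant = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
or_per_sd = protein_or_adjusted(brock, protein, malignant)
print(round(or_per_sd, 3))
```

Adjusting for the existing nodule score means the protein odds ratio reflects information beyond the Brock/PanCan model, which is the quantity of interest when deciding whether a marker adds value in nodule evaluation.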

Validation phase

To evaluate the results obtained from the targeted discovery phase, which were based on relative abundance, we will measure the INTEGRAL panel with absolute quantification in the same set of samples (PanCan, UKLS, IELCAP-Toronto, Pamplona-IELCAP), plus one independent study, the Pittsburgh Lung Screening Study (PLuSS). The model will be trained on the 4 original studies and then evaluated in PLuSS. This enables evaluation of the protein markers using absolute quantification (in the same set of studies), as well as external validation of the predictive accuracy (in the independent study).

Harmonized databases created within the framework of the INTEGRAL risk biomarker and nodule malignancy projects

Risk Biomarker project

One challenge for implementing risk-model-based eligibility for lung cancer screening is the unclear generalizability of risk prediction models in diverse worldwide populations [13,14,33]. We therefore leveraged the infrastructure from the Risk Biomarker project and the Lung Cancer Cohort Consortium to develop a comprehensive study database for lung cancer incidence and mortality. Our vision is that this database will serve as a key resource for future research on lung cancer. For example, additional epidemiologic studies and development and validation of risk prediction tools will likely be needed to support health authorities in making decisions about lung cancer screening implementation over time in different geographical regions, particularly as the tobacco epidemic evolves.

The cohorts contributing data on all participants to the LC3 harmonized database include most cohorts in the Risk Biomarker project and some additional cohorts. In total, 24 cohorts have contributed data on nearly 3 million participants (Table 3, descriptions in Supplement). The years of enrollment range from 1976 to 2010 and geographical regions include North America, Europe, Asia, and Australia. More than 69,000 lung cancer cases have been diagnosed during follow-up, including over 7600 cases among people who never smoked.

Table 3.

Description of the harmonized Lung Cancer Cohort Consortium database.

Cohort Location Years of enrollment Participants, N Median follow-up (years)* Female participants, % Age at enrollment, median (min–max) Lung cancer cases, N (%): Total⁎⁎, Never smoking, Former smoking, Current smoking
AARP USA 1995–1996 565,645 15.5 40% 62 (50–71) 28,652 2124 (8) 15,272 (55) 10,189 (37)
ATBC Finland 1985–1988 29,133 17.7 0% 57 (49–70) 3959 - - 3959 (100)
CLUE USA 1989 30,461 29.1 57% 48 (18–101) 762 69 (9) 271 (36) 422 (55)
CPS-II USA 1992–1993 144,670 13.8 55% 70 (47–90) 3745 446 (12) 2519 (67) 778 (21)
CSDLH Canada 1992–1998 11,189 12.3 49% 62 (23–100) 367 65 (18) 203 (56) 93 (26)
EPIC Europe 1992–2000 518,112 14.9 71% 51 (19–98) 5233 610 (12) 1468 (28) 3155 (60)
GCS Iran 2004–2008 50,032 13.0 58% 52 (36–78) 118 53 (45) 4 (3) 61 (52)
GS UK 2003–2009 106,761 9.6 100% 47 (18–102) 217 57 (29) 87 (44) 52 (27)
HPFS USA 1986 50,444 25.2 0% 55 (32–81) 1295 164 (13) 635 (51) 444 (36)
HUNT Norway 1995–1997 78,941 16.9 53% 48 (19–101) 719 34 (5) 167 (24) 504 (71)
MCCS Australia 1990–1994 41,473 23.1 59% 55 (28–76) 855 139 (16) 377 (44) 338 (40)
NHS USA 1976 120,617 39.9 100% 43 (29–56) 3986 383 (10) 489 (12) 3103 (78)
NYUWHS USA 1985–1991 14,266 30.0 100% 50 (31–70) 484 77 (18) 166 (38) 194 (44)
PHS USA 1982 26,338 11.7 0% 65 (50–99) 228 49 (21) 127 (56) 52 (23)
PLCO USA 1993–2001 154,884 11.9 50% 63 (49–78) 3827 311 (8) 1821 (50) 1551 (42)
SCCS USA 2002–2009 84,429 11.2 60% 52 (40–79) 1846 109 (6) 369 (21) 1316 (73)
SCHS Singapore 1999–2003 50,962 13.5 57% 63 (46–86) 1300 393 (30) 267 (21) 640 (49)
SCS China 1986–1989 18,069 25.3 0% 56 (31–79) 1098 167 (15) 69 (6) 862 (79)
SMHS China 2002–2006 61,469 12.2 0% 55 (40–75) 1164 173 (15) 178 (15) 813 (70)
SWHS China 1996–2000 79,940 18.1 100% 50 (40–70) 975 898 (92) 12 (1) 65 (7)
UKBB UK 2006–2010 502,105 12.1 54% 57 (37–73) 4094 728 (18) 1764 (44) 1550 (38)
VITAL USA 2000–2002 77,118 10.0 52% 62 (50–77) 1374 110 (8) 782 (58) 450 (34)
WHI USA 1993–1998 118,749 18.2 100% 64 (49–83) 2389 415 (18) 1371 (58) 574 (24)
WHS USA 1992–1995 39,852 24.1 100% 55 (39–90) 588 91 (15) 200 (34) 297 (51)
Total 2,970,659 69,275 7665 (11) 28,618 (42) 31,462 (47)

* Follow-up time for lung cancer incidence. Mortality follow-up time may differ.

** Cases with missing smoking status are included in the total but not in the stratified counts, so the stratified counts may not sum to the total.

Details on the eligibility criteria, data collection, and outcome ascertainment for each cohort are described in the Supplement. Time-varying variables such as age were assessed as of the time of blood draw or, if blood was not collected, as of enrollment. Participants with a history of lung cancer prior to enrollment were excluded. For CSDLH, the dataset provided is a case-cohort sample (see Supplement). For SCHS, the initial enrollment took place during 1993–1998, but the 1999–2003 follow-up visit was used as the baseline for the LC3 dataset (further information in Supplement). For WHI, the data include the observational study and the control arms of the Clinical Trials.

AARP: NIH-AARP Diet and Health Study; ATBC: Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study; CLUE: Campaign Against Cancer and Heart Disease II; CPS-II: American Cancer Society Cancer Prevention Study-II Nutrition Cohort; CSDLH: Canadian Study of Diet, Lifestyle and Health; EPIC: European Prospective Investigation into Cancer and Nutrition; GCS: Golestan Cohort Study; GS: Generations Study; HPFS: Health Professionals Follow-up Study; HUNT2 & HUNT3: Trøndelag Health Study; MCCS: Melbourne Collaborative Cohort Study; NHS: Nurses’ Health Study I and II; NYUWHS: New York University Women's Health Study; PHS: Physicians’ Health Study; PLCO: Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial; SCCS: Southern Community Cohort Study; SCHS: Singapore Chinese Health Study; SCS: Shanghai Cohort Study; SMHS: Shanghai Men's Health Study; UKBB: UK Biobank; VITAL: VITamins And Lifestyle Study; WHI: Women's Health Initiative; WHS: Women's Health Study.
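Footnote ** implies that the smoking-stratified percentages in Table 3 are not taken over the total case count. Suppose instead that they are computed among cases with known smoking status. This is an inference from the table, not a statement by the authors. Under that assumption, the arithmetic below reproduces the AARP and HUNT rows, for example. The helper name is hypothetical.

```python
def smoking_percentages(never: int, former: int, current: int) -> tuple:
    """Percentage of cases per smoking category, using cases with known smoking status as denominator."""
    known = never + former + current  # cases with missing smoking status are left out
    return tuple(round(100 * n / known) for n in (never, former, current))


# AARP row: 28,652 cases in total, 27,585 of them with known smoking status.
print(smoking_percentages(2124, 15272, 10189))  # (8, 55, 37)
# HUNT row: 719 cases in total, 705 with known smoking status.
print(smoking_percentages(34, 167, 504))  # (5, 24, 71)
# Dividing by the full AARP total instead would give 7% never-smoking cases, not 8%.
print(round(100 * 2124 / 28652))  # 7
```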

The harmonized variables are listed in Table 4. They were chosen to maximize our ability to calculate risk estimates for existing lung cancer prediction models [34,35]. A summary of methods for harmonization and imputation is provided in the Supplement. An initial analysis in the harmonized dataset compared the performance of lung cancer risk models in the United Kingdom [36].
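The Supplement describes the actual harmonization and imputation methods. As a hedged illustration only, the sketch below shows how several smoking variables in Table 4 are commonly derived from one another when a cohort collected only some of them. Several elements are assumptions rather than the consortium's rules:

* the field names;
* the convention of 20 cigarettes per pack;
* ignoring quit-and-restart periods.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingHistory:
    """Hypothetical harmonized smoking fields; None marks a variable a cohort did not collect."""
    status: str                            # "never", "former", or "current"
    age: float                             # age at blood draw (or at enrollment)
    age_start: Optional[float] = None      # age at smoking initiation
    age_quit: Optional[float] = None       # age at smoking cessation (former smokers only)
    cigs_per_day: Optional[float] = None   # smoking intensity


def years_smoked(h: SmokingHistory) -> Optional[float]:
    """Duration from initiation to cessation (or to current age); quit-and-restart periods are ignored."""
    if h.status == "never":
        return 0.0
    end = h.age_quit if h.status == "former" else h.age
    if h.age_start is None or end is None:
        return None
    return max(end - h.age_start, 0.0)


def years_since_cessation(h: SmokingHistory) -> Optional[float]:
    """Only defined for former smokers with a known age at cessation."""
    if h.status != "former" or h.age_quit is None:
        return None
    return h.age - h.age_quit


def pack_years(h: SmokingHistory) -> Optional[float]:
    """Pack-years = (cigarettes per day / 20) * years smoked, assuming 20 cigarettes per pack."""
    duration = years_smoked(h)
    if duration is None:
        return None
    if duration == 0.0:
        return 0.0
    if h.cigs_per_day is None:
        return None
    return h.cigs_per_day / 20 * duration


former = SmokingHistory(status="former", age=62.0, age_start=18.0, age_quit=55.0, cigs_per_day=30.0)
print(years_smoked(former), years_since_cessation(former), pack_years(former))  # 37.0 7.0 55.5
print(pack_years(SmokingHistory(status="never", age=50.0)))  # 0.0
```

Returning `None` rather than a guessed value keeps "not collected" distinct from "zero". Downstream, that distinction is what an imputation step would act on.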

Table 4.

Variables included in the harmonized databases for the Lung Cancer Cohort Consortium (Risk Biomarker project) and LDCT screening studies (Nodule Malignancy project).

Variables included in the harmonized Lung Cancer Cohort Consortium database (Risk Biomarker project)

Demographic information
  • Age
  • Sex
  • Education
  • Race/ethnicity
  • Year of enrollment or blood draw
  • State or region of residence (for USA cohorts)

Follow-up and outcomes
  • Follow-up time for lung cancer and death
  • Lung cancer diagnosis with TNM stage and histology
  • Vital status and cause of death, including lung cancer death

Smoking
  • Smoking status
  • Years smoked
  • Age at smoking initiation
  • Age at smoking cessation
  • Years since cessation
  • Pack-years smoked
  • Smoking intensity (cigarettes per day)
  • Type of tobacco product
  • Time to first cigarette

Exposures other than smoking
  • Secondhand smoke exposure
  • Asbestos exposure
  • Indoor air pollution (e.g., cookstoves)

Personal health history
  • Body mass index
  • Family history of lung cancer
  • Personal history of cancer
  • COPD or emphysema
  • Asthma
  • Tuberculosis
  • Daily cough
  • Liver or kidney condition
  • Diabetes
  • Chronic bronchitis
  • Hypertension
  • Stroke
  • Heart attack or heart disease


Variables included in the harmonized LDCT screening study database (Nodule Malignancy project)

Demographic information
  • Age
  • Sex
  • Education
  • Race/ethnicity
  • Country

Follow-up and outcomes
  • Follow-up time for lung cancer and death
  • Lung cancer diagnosis with TNM stage and histology
  • Vital status and cause of death, including lung cancer death

Smoking
  • Smoking status
  • Duration of smoking
  • Age at smoking initiation
  • Age at smoking cessation
  • Years since quitting
  • Pack-years smoked
  • Smoking intensity (cigarettes per day)

Nodule characteristics
  • Screening round
  • Date of screening
  • Nodule location
  • Nodule size
  • Attenuation
  • Nodule count
  • Semantic features (spiculation, margin, calcification)
  • Malignant status

Personal health history
  • Body mass index
  • Family history of lung cancer
  • Personal history of cancer
  • COPD
  • Spirometry measures
  • Asthma
  • Chronic bronchitis

Many variables are not available in all cohorts. Cohorts participating in the Risk Biomarker project (see Table 1) also provided information on biospecimens including the year of blood draw, storage temperature, number of freeze-thaw cycles, preprocessing time, and details regarding case/control status or subcohort membership.

We have made it a priority to facilitate sharing of the LC3 harmonized database. We are currently establishing a legal and technical infrastructure that will allow investigators outside of the LC3 consortium to request permission to remotely access and analyze the data in a secure computing environment. Available data will include the variables listed in Table 4, the metabolomics biomarkers measured in the first project of the LC3 [37], and eventually the proteomics biomarkers.

Nodule Malignancy project

For the Nodule Malignancy project, data from 6 LDCT screening studies were harmonized within the framework of ILCCO. In addition to the 5 LDCT screening studies described above, the National Lung Screening Trial (NLST) is also participating in the Nodule Malignancy project for quantitative imaging analysis. The design of each CT screening program, including its eligibility criteria and recruitment framework, is described in the Supplement.

For quality control, data were systematically checked for missing values, outliers, inadmissible values, aberrant distributions, and internal inconsistencies. All procedures were recorded and a central data dictionary was maintained throughout the process. A total of 2088 cases and 42,940 screened individuals from the six LDCT screening studies are included in the harmonized database of screening studies (Supplementary Table 2). The variables that are compatible across the screening studies are shown in Table 4.
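The record-level checks named above can be sketched as follows. Several elements are hypothetical placeholders rather than the project's data dictionary: the required fields, the admissible ranges, and the consistency rules. The sketch flags outliers with the 1.5 × IQR rule. That is a common convention assumed here, not necessarily the method used.

```python
import statistics
from typing import Any, Dict, List, Sequence

# Hypothetical data-dictionary entries; the project's actual rules are not given in the text.
REQUIRED = ("age", "sex", "smoking_status")
ADMISSIBLE = {"age": (40, 90), "pack_years": (0, 300), "nodule_size_mm": (0, 100)}


def record_flags(record: Dict[str, Any]) -> List[str]:
    """Record-level checks: missing values, inadmissible values, internal inconsistencies."""
    flags = [f"missing: {f}" for f in REQUIRED if record.get(f) is None]
    for field, (low, high) in ADMISSIBLE.items():
        value = record.get(field)
        if value is not None and not low <= value <= high:
            flags.append(f"inadmissible: {field}={value}")
    start, quit_age, age = record.get("age_start"), record.get("age_quit"), record.get("age")
    if start is not None and quit_age is not None and quit_age < start:
        flags.append("inconsistent: age_quit < age_start")
    if quit_age is not None and age is not None and quit_age > age:
        flags.append("inconsistent: age_quit > age")
    if record.get("smoking_status") == "never" and (record.get("pack_years") or 0) > 0:
        flags.append("inconsistent: never smoker with pack_years > 0")
    return flags


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Distribution-level check: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


record = {"age": 58, "sex": "F", "smoking_status": "former",
          "age_start": 20, "age_quit": 17, "pack_years": 40, "nodule_size_mm": 6.5}
print(record_flags(record))  # ['inconsistent: age_quit < age_start']
print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # [100]
```

Flags are returned rather than values being silently corrected. This mirrors the text's emphasis on recording all procedures, so that each fix can be logged against a central data dictionary.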

Perspectives

With the advent of LDCT screening, the potential to substantially reduce lung cancer mortality has vastly expanded, and so has the domain of potential research questions. The current work of the INTEGRAL program addresses two specific ways in which biomarkers might contribute: improving the selection of individuals for screening, and better distinguishing between malignant and benign nodules on LDCT images. At the completion of our current work, we anticipate having developed a fit-for-purpose biomarker panel that can be applied in both settings. For pre-screening risk assessment, we will deliver an integrated risk prediction model including the biomarkers on the panel, together with the results of a comprehensive independent validation study of its performance. For nodule discrimination, we will establish an integrated nodule probability model including quantitative radiological features and biomarkers.

If these steps are successful, important work will remain to implement the INTEGRAL panel in clinical practice. While the use of biomarkers in lung cancer screening may have advantages, such as more accurate identification of future cases, there are also potential disadvantages, such as the need for a blood draw, delays in obtaining biomarker test results, and financial costs. Specific considerations related to biomarker implementation have been outlined [38]. We plan to assess whether repeated measurements of the panel could improve our ability to predict lung cancer risk. Implementation studies will be needed to determine the feasibility and acceptability of this approach in practice. The design of future evaluations will require careful thought, as we regard it as infeasible to evaluate the incremental improvement in performance offered by biomarkers in the setting of a randomized trial. Finally, another future goal might be to identify predictors of lung cancer among people who never smoked.

It is important to note that many other tools exist or are being developed to refine risk estimation for lung cancer, including both biomarkers and risk prediction models. Another important future direction will be to directly compare the performance of these tools or, where feasible and cost-effective, to integrate them. Comparisons should be made in the same set of samples so that discrimination metrics can be directly compared.
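Measuring competing tools in the same samples allows a paired comparison. The following is a minimal sketch, assuming each tool yields one numeric score per participant. It estimates the difference in AUC directly and obtains a 95% interval with a percentile bootstrap that resamples participants, so the two scores always stay paired. This is one common approach among several; DeLong's test is another. It is not a method specified by the program. The function names and the scores are made up for illustration.

```python
import random
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random case > score of a random control), ties counted as 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC requires at least one case and one control")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def paired_auc_difference(score_a: Sequence[float], score_b: Sequence[float],
                          labels: Sequence[int], n_boot: int = 1000,
                          seed: int = 0) -> Tuple[float, float, float]:
    """AUC(A) - AUC(B) on the same participants, with a percentile bootstrap 95% interval.

    Each replicate resamples participants, so both tools are always scored on
    exactly the same individuals.
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = auc(score_a, labels) - auc(score_b, labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip replicates that lack cases or controls
            diffs.append(auc([score_a[i] for i in idx], y) - auc([score_b[i] for i in idx], y))
    diffs.sort()
    return observed, diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Made-up scores from two hypothetical tools evaluated on the same 8 participants.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
tool_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5, 0.1, 0.35]
tool_b = [0.6, 0.3, 0.5, 0.4, 0.2, 0.7, 0.1, 0.45]
diff, lower, upper = paired_auc_difference(tool_a, tool_b, labels)
print(round(diff, 3))  # 0.267
```

Comparing AUCs from different sample sets would mix differences between tools with differences between populations. Resampling participants rather than scores keeps the comparison within one population.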

The INTEGRAL biomarker program represents an ambitious initiative to develop a flexible biomarker tool to improve early lung cancer detection via optimized LDCT screening. With a focus on protein biomarkers, the program spans discovery, panel development, model training and validation – all whilst remaining in an observational framework. The forthcoming results from the validation phase of INTEGRAL will provide a definitive benchmark on the potential for circulating protein biomarkers to improve early detection of lung cancer – and most importantly – whether it is justified to introduce them in a screening scenario to inform who should be screened and how to manage nodules.

Acknowledgments

This study was supported by the US NCI (INTEGRAL program U19 CA203654 and R03 CA245979), the Lung Cancer Research Foundation, l'Institut National du Cancer (2019–1-TABAC-01, INCa, France), the Cancer Research Foundation of Northern Sweden (AMP19–962), and an early detection of cancer development grant from the Swedish Ministry of Health. RJH is supported by the Canada Research Chair of the Canadian Institutes of Health Research. LMM was supported by FIMA, Fundación ARECES, ISCIII-Fondo de Investigación Sanitaria-Fondo Europeo de Desarrollo Regional (PI19/00098) and a grant from The Lung Ambition Alliance. MCA is supported by NCI R01 CA251758. The ATBC Study is supported by the Intramural Research Program of the U.S. National Cancer Institute, National Institutes of Health, Department of Health and Human Services. The Southern Community Cohort Study was supported by NCI U01CA202979. The Physicians’ Health Study (PHS) is supported by research grants CA097193, CA34944, CA40360, HL26490, and HL34595 from the NIH. The Women's Health Study (WHS) is supported by research grants EY06633, EY18820, CA047988, HL043851, HL080467, HL099355, and CA182913 from the NIH. The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts 75N92021D00001, 75N92021D00002, 75N92021D00003, 75N92021D00004, 75N92021D00005. CLUE II funding was from the National Cancer Institute (U01 CA86308, Early Detection Research Network; P30 CA006973), National Institute on Aging (U01 AG18033), and the American Institute for Cancer Research. Cancer data were provided by the Maryland Cancer Registry (MCR), Center for Cancer Prevention and Control, Maryland Department of Health, with funding from the State of Maryland and the Maryland Cigarette Restitution Fund. The collection and availability of cancer registry data are also supported by the Cooperative Agreement NU58DP006333, funded by the Centers for Disease Control and Prevention. Acknowledgements for the NIH-AARP study are available at: https://dietandhealth.cancer.gov/acknowledgement.html. PLuSS was supported by NCI P50 CA090440 and NCI P30 CA047904.

Data sharing statement

Researchers who are interested in analyzing the Lung Cancer Cohort Consortium (LC3) dataset are encouraged to contact Dr Robbins or Dr Johansson. The LC3 Access Policy is available at the following link: https://www.iarc.who.int/wp-content/uploads/2021/12/LC3_Access_Policy.pdf.

Disclaimer

Where authors are identified as personnel of the International Agency for Research on Cancer / World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy, or views of the International Agency for Research on Cancer / World Health Organization. The contents of this manuscript are solely the responsibility of the authors and do not necessarily represent the official views of the Centers for Disease Control and Prevention or the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. government.

Footnotes

Dr Montuenga reports the following potential conflicts of interest: AstraZeneca (speaker's bureau and research grant), Bristol Myers Squibb (research grant), and AMADIX (licensed patent co-holder on complement fragments for lung cancer early detection). All other authors report no conflicts of interest.

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.annepidem.2022.10.014.

Contributor Information

Hilary A. Robbins, Email: RobbinsH@iarc.fr.

Mattias Johansson, Email: JohanssonM@iarc.fr.

Rayjean J. Hung, Email: Rayjean.hung@lunenfeld.ca.

Appendix. Supplementary materials

mmc1.xlsx (91.6KB, xlsx)
mmc2.docx (115.8KB, docx)

References

  • 1. National Lung Screening Trial Research Team; Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. 2011;365(5):395–409. doi: 10.1056/NEJMoa1102873.
  • 2. de Koning HJ, van der Aalst CM, de Jong PA, Scholten ET, Nackaerts K, Heuvelmans MA, et al. Reduced lung-cancer mortality with volume CT screening in a randomized trial. N Engl J Med. 2020. doi: 10.1056/NEJMoa1911793. Published online January 29, 2020.
  • 3. Oudkerk M, Liu S, Heuvelmans MA, Walter JE, Field JK. Lung cancer LDCT screening and mortality reduction - evidence, pitfalls and future perspectives. Nat Rev Clin Oncol. 2021;18(3):135–151. doi: 10.1038/s41571-020-00432-6.
  • 4. Fanidi A, Muller DC, Yuan JM, Stevens VL, Weinstein SJ, Albanes D, et al. Circulating folate, vitamin B6, and methionine in relation to lung cancer risk in the Lung Cancer Cohort Consortium (LC3). J Natl Cancer Inst. 2018;110(1). doi: 10.1093/jnci/djx119.
  • 5. Muller DC, Hodge AM, Fanidi A, Albanes D, Mai XM, Shu XO, et al. No association between circulating concentrations of vitamin D and risk of lung cancer: an analysis in 20 prospective studies in the Lung Cancer Cohort Consortium (LC3). Annals of Oncology. 2018;29(6):1468–1475. doi: 10.1093/annonc/mdy104.
  • 6. Fanidi A, Carreras-Torres R, Larose TL, Yuan JM, Stevens VL, Weinstein SJ, et al. Is high vitamin B12 status a cause of lung cancer? Int J Cancer. 2019;145(6):1499–1503. doi: 10.1002/ijc.32033.
  • 7. Huang JY, Larose TL, Luu HN, Wang R, Fanidi A, Alcala K, et al. Circulating markers of cellular immune activation in prediagnostic blood sample and lung cancer risk in the Lung Cancer Cohort Consortium (LC3). Int J Cancer. 2020;146(9):2394–2405. doi: 10.1002/ijc.32555.
  • 8. Muller DC, Larose TL, Hodge A, Guida F, Langhammer A, Grankvist K, et al. Circulating high sensitivity C reactive protein concentrations and risk of lung cancer: nested case-control study within Lung Cancer Cohort Consortium. BMJ. 2019:k4981. doi: 10.1136/bmj.k4981. Published online January 3, 2019.
  • 9. US National Cancer Institute. NCI Cohort Consortium. https://epi.grants.cancer.gov/cohort-consortium/. Published 2022. Accessed February 21.
  • 10. US Preventive Services Task Force. Screening for Lung Cancer: US Preventive Services Task Force Recommendation Statement. JAMA. 2021;325(10):962–970. doi: 10.1001/jama.2021.1117.
  • 11. Landy R, Young CD, Skarzynski M, Cheung LC, Berg CD, Rivera MP, et al. Using prediction models to reduce persistent racial/ethnic disparities in draft 2020 USPSTF lung cancer screening guidelines. J Natl Cancer Inst. 2021. doi: 10.1093/jnci/djaa211. Published online January 2021.
  • 12. Kovalchik SA, Tammemagi M, Berg CD, Caporaso NE, Riley TL, Korch M, et al. Targeting of low-dose CT screening according to the risk of lung-cancer death. N Engl J Med. 2013;369(3):245–254. doi: 10.1056/NEJMoa1301851.
  • 13. Katki HA, Kovalchik SA, Berg CD, Cheung LC, Chaturvedi AK. Development and validation of risk models to select ever-smokers for CT lung cancer screening. JAMA. 2016;315(21):2300–2311. doi: 10.1001/jama.2016.6255.
  • 14. Tammemägi MC, Katki HA, Hocking WG, Church TR, Caporaso N, Kvale PA, et al. Selection criteria for lung-cancer screening. N Engl J Med. 2013;368(8):728–736. doi: 10.1056/NEJMoa1211776.
  • 15. Tammemägi MC, Ruparel M, Tremblay A, Myers R, Mayo J, Yee J, et al. USPSTF2013 versus PLCOm2012 lung cancer screening eligibility criteria (International Lung Screening Trial): interim analysis of a prospective cohort study. Lancet Oncol. 2021. doi: 10.1016/S1470-2045(21)00590-8. Published online December 13, 2021.
  • 16. National Comprehensive Cancer Network. NCCN Clinical Practice Guidelines in Oncology: Lung Cancer Screening, version 1.2022. https://www.nccn.org/professionals/physician_gls/pdf/lung_screening.pdf. Published 2021. Accessed December 20, 2021.
  • 17. Seijo LM, Peled N, Ajona D, Boeri M, Field JK, Sozzi G, et al. Biomarkers in lung cancer screening: Achievements, promises, and challenges. Journal of Thoracic Oncology. 2019;14(3):343–357. doi: 10.1016/J.JTHO.2018.11.023.
  • 18. Baldwin D, Callister M, Crosbie PA, O'Dowd E, Rintoul R, Robbins HA, et al. Biomarkers in lung cancer screening: the importance of study design. European Respiratory Journal. 2021;57(1). doi: 10.1183/13993003.04367-2020.
  • 19. McWilliams A, Tammemagi MC, Mayo JR, Roberts H, Liu G, Soghrati K, et al. Probability of cancer in pulmonary nodules detected on first screening CT. New England Journal of Medicine. 2013;369(10):910–919. doi: 10.1056/NEJMoa1214726.
  • 20. Al-Ameri A, Malhotra P, Thygesen H, Plant PK, Vaidyanathan S, Karthik S, et al. Risk of malignancy in pulmonary nodules: A validation study of four prediction models. Lung Cancer. 2015;89(1):27–30. doi: 10.1016/j.lungcan.2015.03.018.
  • 21. Horeweg N, van Rosmalen J, Heuvelmans MA, van der Aalst CM, Vliegenthart R, Scholten ET, et al. Lung cancer probability in patients with CT-detected pulmonary nodules: a prespecified analysis of data from the NELSON trial of low-dose CT screening. Lancet Oncol. 2014;15(12):1332–1341. doi: 10.1016/S1470-2045(14)70389-4.
  • 22. Feng Z, Pepe MS. Adding rigor to biomarker evaluations—EDRN experience. Cancer Epidemiology Biomarkers & Prevention. 2020;29(12):2575–2582. doi: 10.1158/1055-9965.EPI-20-0240.
  • 23. Pepe MS, Feng Z, Janes H, Bossuyt PM, Potter JD. Pivotal evaluation of the accuracy of a biomarker used for classification or prediction: standards for study design. J Natl Cancer Inst. 2008;100(20):1432–1438. doi: 10.1093/jnci/djn326.
  • 24. Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) Consortium for Early Detection of Lung Cancer. Assessment of lung cancer risk on the basis of a biomarker panel of circulating proteins. JAMA Oncol. 2018. doi: 10.1001/jamaoncol.2018.2078. Published online July 12, 2018.
  • 25. Larose TL, Meheus F, Brennan P, Johansson M, Robbins HA. Assessment of biomarker testing for lung cancer screening eligibility. JAMA Netw Open. 2020;3(3). doi: 10.1001/jamanetworkopen.2020.0409.
  • 26. Silvestri GA, Tanner NT, Kearney P, Vachani A, Massion PP, Porter A, et al. Assessment of plasma proteomics biomarker's ability to distinguish benign from malignant lung nodules: Results of the PANOPTIC (Pulmonary Nodule Plasma Proteomic Classifier) Trial. Chest. 2018;154(3):491–500. doi: 10.1016/J.CHEST.2018.02.012.
  • 27. Ostrin EJ, Bantis LE, Wilson DO, Patel N, Wang R, Kundnani D, et al. Contribution of a blood-based protein biomarker panel to the classification of indeterminate pulmonary nodules. Journal of Thoracic Oncology. 2021;16(2):228–236. doi: 10.1016/j.jtho.2020.09.024.
  • 28. Olink Proteomics. Measuring protein biomarkers with Olink - technical comparisons and orthogonal validation. https://www.olink.com/content/uploads/2021/09/olink-technical-comparisons-and-orthogonal-validation-1118-v2.0.pdf. Published 2020. Accessed October 17, 2022.
  • 29. Olink Proteomics. Development and validation of customized PEA biomarkers with clinical utility. https://www.olink.com/content/uploads/2021/09/olink-development-and-validation-of-customized-pea-biomarker-panels-1083-v2.0.pdf. Published 2017. Accessed October 17, 2022.
  • 30. Royston P, Parmar MKB. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med. 2002;21(15):2175–2197. doi: 10.1002/sim.1203.
  • 31. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Series B Stat Methodol. 2005;67(2):301–320. doi: 10.1111/j.1467-9868.2005.00503.x.
  • 32. Degenhardt F, Seifert S, Szymczak S. Evaluation of variable selection methods for random forests and omics data sets. Brief Bioinform. 2019;20(2):492–503. doi: 10.1093/bib/bbx124.
  • 33. Bach PB, Kattan MW, Thornquist MD, Kris MG, Tate RC, Barnett MJ, et al. Variations in lung cancer risk among smokers. J Natl Cancer Inst. 2003;95(6):470–478. doi: 10.1093/jnci/95.6.470. http://www.ncbi.nlm.nih.gov/pubmed/12644540.
  • 34. Katki HA, Petito LC, Cheung LC, Jacobs E, Jemal A, Berg CD, et al. Implications of 9 risk prediction models for selecting ever-smokers for CT lung-cancer screening. Ann Intern Med. 2018;169(1):10–19. doi: 10.7326/M17-2701.
  • 35. Cheung LC, Berg CD, Castle PE, Katki HA, Chaturvedi AK. Life-gained-based versus risk-based selection of smokers for lung cancer screening. Ann Intern Med. 2019;171(9):623–632. doi: 10.7326/M19-1263.
  • 36. Robbins HA, Alcala K, Swerdlow AJ, Schoemaker MJ, Wareham N, Travis RC, et al. Comparative performance of lung cancer risk models to define lung screening eligibility in the United Kingdom. Br J Cancer. 2021;124(12):2026–2034. doi: 10.1038/s41416-021-01278-0.
  • 37. Zahed H, Johansson M, Ueland PM, Midttun Ø, Milne RL, Giles GG, et al. Epidemiology of 40 blood biomarkers of one-carbon metabolism, vitamin status, inflammation, and renal and endothelial function among cancer-free older adults. Sci Rep. 2021;11(1):13805. doi: 10.1038/s41598-021-93214-8.
  • 38. Hung RJ. Biomarker-Based Lung Cancer Screening Eligibility: Implementation Considerations. Cancer Epidemiology, Biomarkers & Prevention. 2022;31(4):698–701. doi: 10.1158/1055-9965.EPI-22-0099.
