Author manuscript; available in PMC 2017 Sep 1.
Published in final edited form as: Artif Intell Med. 2016 Jul 27;72:42–55. doi: 10.1016/j.artmed.2016.07.001

Prediction of lung cancer incidence on the low-dose computed tomography arm of the National Lung Screening Trial: A dynamic Bayesian network

Panayiotis Petousis 1,2, Simon X Han 1,2, Denise Aberle 1,2, Alex AT Bui 1,2
PMCID: PMC5082434  NIHMSID: NIHMS812728  PMID: 27664507

Abstract

Introduction

Identifying high-risk lung cancer individuals at an early disease stage is the most effective way of improving survival. The landmark National Lung Screening Trial (NLST) demonstrated the utility of low-dose computed tomography (LDCT) imaging to reduce mortality (relative to x-ray screening). As a result of the NLST and other studies, imaging-based lung cancer screening programs are now being implemented. However, LDCT interpretation results in a high number of false positives. A set of dynamic Bayesian networks (DBNs) was designed and evaluated to provide insight into how longitudinal data can be used to help inform lung cancer screening decisions.

Methods

The LDCT arm of the NLST dataset was used to build and explore five DBNs for high-risk individuals. Three of these DBNs were built using a backward construction process, and two using structure learning methods. All models employ demographic information, smoking status, personal cancer history, family lung cancer history, exposure risk factors, comorbidities related to lung cancer, and LDCT screening outcomes. Given the uncertainty arising from lung cancer screening, a cancer state-space model based on lung cancer staging was used to characterize the cancer status of an individual over time. The models were evaluated on balanced training and test sets of cancer and non-cancer cases to deal with data imbalance and overfitting.

Results

Results were comparable to expert decisions. The average area under the curve (AUC) of the receiver operating characteristic (ROC) for the three intervention points of the NLST trial was higher than 0.75 for all models. Evaluation of the models on the complete LDCT arm of the NLST dataset (N = 25,486) demonstrated satisfactory generalization. Agreement between the models' and the physicians' predictions on the same cases is reported as concordance statistics. The models' predictive ability with respect to missing data was also evaluated using the cases that missed the second screening exam of the trial (N = 417). The DBNs outperformed comparison models such as logistic regression and naïve Bayes.

Conclusion

The lung cancer screening DBNs demonstrated high discrimination and predictive power for the majority of cancer and non-cancer cases.

Keywords: Dynamic Bayesian networks, Structure learning, Expert-driven networks, Lung stage cancer state-space, Individualized lung cancer screening, Cancer incidence, Annual NLST cancer risk

1. Introduction

Lung cancer is the leading cause of cancer death worldwide. In the United States, it is estimated to be responsible for over 150,000 annual deaths [1, 2], comprising 27% of all cancer deaths [3, 4]. A number of factors have been associated with the high incidence and mortality of lung cancer, the most important being cigarette smoking [5] and late-stage/advanced diagnosis [6], at which point treatment is non-curative. Patients with lung cancer have a higher probability of metastases and a relatively low 5-year survival rate of 18% [7]. Notably, when diagnosed early, the 5-year survival rate increases to 54%. However, only 15% of all lung cancer cases are detected at an early stage [7]. Considering the high mortality associated with late-stage lung cancer diagnosis, it is crucial that patients who are at high risk of lung cancer be identified and monitored so that early treatment can be initiated if needed.

Screening has the potential to detect the formation of problematic pulmonary nodules at an early stage; when detected earlier, more treatment choices are available, along with improved chances of survival. Evidence regarding the benefits of lung screening comes from the landmark National Lung Screening Trial (NLST), which demonstrated a 20% reduction in lung cancer mortality in individuals who underwent screening using low-dose computed tomography (LDCT) relative to plain chest radiography [8]. Given this evidence, the American Patient Protection and Affordable Care Act (ACA) has mandated that CT screening be covered by private insurers; the Centers for Medicare and Medicaid Services (CMS) has also approved reimbursement of CT screening in Medicare-eligible patients up to the age of 77. Unfortunately, LDCT also detects many benign nodules and non-cancer related pathologies (e.g., inflammation, emphysema, other lesions), resulting in many false positives and the need for further diagnostic evaluation to confirm findings. In fact, the false positive rate of the screening strategies used by the NLST was over 23% for individuals who underwent additional diagnostic imaging [8]. Confirmed cancer cases comprised 3.6% of all cases in the NLST CT positive arm, and any detected lung cancer had an 18.5% probability of being an over-diagnosis [9]. This suggests that while an acceptable false negative rate is achieved, the majority of healthy patients in a population are over-screened and/or over-diagnosed. Unnecessary diagnostic procedures, such as biopsies and thoracotomies, place healthy patients at a higher risk of complications and incur an unnecessary psychological burden [10]. A framework that optimizes early detection while reducing false positive rates would be ideal, and could then be used to support more individually-tailored screening recommendations.

This work aims to provide insights into how recommendations can be individualized over time in the context of lung cancer screening. We explore the issues surrounding the development and evaluation of a dynamic Bayesian network (DBN), built from the NLST dataset, to predict the development of lung cancer in high-risk patients. We compare DBNs built using the “backward construction” method against “learned” DBNs. We also compare and contrast the DBNs’ performance versus experts and other predictive models for lung cancer. Relative to existing predictive models, our methodology has several advantages. First, it can make sensible predictions even with missing data, a common occurrence in real-world settings (e.g., a missed screening exam). Second, it is built on top of a lung cancer state-space defined on lung cancer staging. This state space unites lung cancer risk factors and diagnostic procedures in a meaningful network structure, while also enabling the flow of probabilistic influence between these variables. Third, contrary to existing predictive methods for lung cancer screening, our methodology, and DBNs in particular, can explain and show the factors contributing to a prediction (i.e., the factors investigated in lung cancer screening). We present the results of our evaluations and discuss the advantages and limitations of our work, providing some directions for further improvement.

2. Background

In recent years, many risk models have been published to predict the development of different cancers [11]. In lung cancer, Bach et al. [12] developed an analog of the well-known Gail model used to calculate the risk of developing breast cancer [13, 14]. The model predicts the 10-year probability of an individual being diagnosed with lung cancer. This 10-year risk was obtained through the use of two one-year risk models: a lung cancer diagnosis model and a competing model of dying without lung cancer. The one-year models were run recursively over 10 epochs (i.e., years) to obtain cumulative probabilities over time [15]. Even though the model does not distinguish the risks of the various types of lung cancer, it can identify those subjects who are most likely to develop lung cancer [12]. The model’s validity was assessed by Cronin et al. [16] using 6,239 smokers from the placebo arm of the Alpha-Tocopherol, Beta-Carotene Cancer Prevention (ATBC) Study. The risk and competing-risk models both underestimated the observed lung cancer risk and the observed non-lung cancer mortality risk for individuals who smoked fewer than 20 cigarettes per day. Raji et al. [17] evaluated the predictive accuracy of the Liverpool Lung Project Risk Model. This single log-odds model, built using logistic regression, was developed from the Liverpool Lung Project Cohort (LLPC) study [15]. The model was evaluated in three independent external datasets from Europe and North America, with good discrimination in all three; the area under the curve (AUC) in these datasets varied from 0.67 to 0.82 [17]. Spitz et al. [18] used multivariate regression analysis to develop log-odds lung cancer risk models for never, current, and former smokers. The models’ concordance statistics (0.57, 0.63, and 0.58, respectively) and discriminatory ability (true positive rates in high-risk groups of current and former smokers were 0.69 and 0.70, respectively [18]) were satisfactory, but precision was modest [19]. Finally, more recently, Tammemagi et al. [20] developed lung cancer models that demonstrated high discrimination and calibration using the Prostate, Lung, Colorectal and Ovarian Cancer (PLCO) Screening Trial. In contrast with most lung cancer prediction studies, this study’s models incorporated a wider range of risk factors that were incrementally evaluated using AUC as a comparison metric. The two models were evaluated for the prediction of lung cancer on the entire PLCO dataset, achieving AUCs of 0.857 and 0.841, and on a subset of ever-smokers, achieving AUCs of 0.805 and 0.784, respectively.
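To make the recursive construction concrete, the sketch below composes hypothetical constant one-year risks of lung cancer diagnosis and of death without lung cancer into a cumulative 10-year diagnosis probability, in the spirit of the Bach model; the annual rates and the function name are illustrative placeholders, not the model's covariate-dependent estimates.

```python
# A minimal sketch of recursively composing two one-year risk models into a
# 10-year cumulative risk, as in the Bach model. The constant annual risks
# below are hypothetical placeholders for covariate-based model outputs.
def ten_year_risk(p_cancer: float = 0.01, p_death: float = 0.02, years: int = 10) -> float:
    alive_undiagnosed = 1.0   # probability of being alive and undiagnosed
    cumulative = 0.0          # cumulative probability of a lung cancer diagnosis
    for _ in range(years):
        cumulative += alive_undiagnosed * p_cancer
        alive_undiagnosed *= 1.0 - p_cancer - p_death   # survive the year undiagnosed
    return cumulative

print(round(ten_year_risk(), 4))  # ~0.0875 with these illustrative rates
```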

The models we describe for lung cancer screening are based on a dynamic Bayesian network. DBNs, as well as Bayesian networks (BNs), are increasingly being used in clinical screening and treatment decision making. For example, DBNs and BNs have been used in the domains of nosocomial infections [21], pneumonia [22], cardiac surgery [23], gait analysis [24], osteoporosis [25], oral cancer [26], colon cancer [27], cervical cancer [28], and breast cancer [29, 30, 31, 32]. Notably, the authors of [33] proposed a Bayesian network for lung cancer built from both physical and biological (biomarker) data for the prediction of local failure in non-small cell lung cancer (NSCLC) after radiotherapy. This integrated approach was tested on two different NSCLC datasets, with the biological data contributing the most to the model’s performance. In this study, to handle the inherent temporal nature of screening observations over time, we propose a set of DBNs to obtain individualized predictions for patients at high risk for lung cancer. In the following sections, we present the methodology as well as the theoretical formulations supporting our models.

3. Methods

We used the NLST dataset to create DBNs for the prediction of lung cancer incidence. The description of the dataset, overall methods, measured outcomes, and statistical evaluation methods used in this study are as follows.

3.1. The NLST dataset

The NLST is a randomized, multi-site trial that examined lung cancer-specific mortality among participants in an asymptomatic high-risk cohort. Subjects underwent screening with the use of low-dose CT or a chest x-ray. Over 53,000 participants each underwent three annual screenings from 2002–2007 (approximately 25,500 in the LDCT study arm), with follow-up post-screening through 2009. Lung cancers identified as pulmonary nodules were confirmed by diagnostic procedures (e.g., biopsy, cytology); participants with confirmed lung cancer were subsequently removed from the trial for treatment.

The NLST dataset provides a longitudinal perspective on high-risk lung cancer patients in terms of demographics, clinical history, and imaging data. We used subjects from the LDCT arm, across all three screening events and the post-screening period of the trial. Information used in our study includes: demographics (e.g., age, gender, body mass index); smoking history; family history of cancer; personal history of cancer; history of comorbidities related to lung cancer; occupational exposures (e.g., asbestos, coal, chemicals); and LDCT screening outcomes. Table 1 summarizes the number of cases determined to have cancer during any of the three imaging points of intervention (and the remaining number of non-cancer patients), as well as post-screening cancer patients (i.e., those individuals who went on to develop lung cancer after the third screening event).

Table 1.

NLST dataset, detailing the determined health state of a subject after each screening exam. Post-trial cancer cases are those for which lung cancer was the recorded cause of death but that were not identified as lung cancer cases during the NLST trial. The numbers shown represent the patients for whom we have information about the development of lung cancer. A cancer incidence occurring after the first screening and before the second was assigned to the first screening; one occurring after the second screening and before the third was assigned to the second screening; and one occurring after the third screening and before the post-screening period was assigned to the third screening. The above information was computed from the NLST dataset in our possession.

                                   First      Second     Third      Post-       Post-    Total
                                   Screening  Screening  Screening  screening   trial    Cases
Remaining non-cancer subjects      25,530     25,217     24,842     24,477      24,461   -
Individuals with confirmed cancer  305        174        223        365         16       1,083
Deceased subjects                  11         139        152        -           -        302
Total subjects                     25,846     25,530     25,217     24,842      24,477   -

Based on the true state of each patient (i.e., cancer or non-cancer), we designed a simplified state-space model representing the “ground truth” disease state of each patient after each screening time point. Figure 1 shows the state space and the allowed transitions between states. No-Cancer (NC) is the state in which the individual has no abnormalities or has abnormalities that are not suspicious for lung cancer (e.g., lung nodules smaller than 4 mm). The In Situ-Cancer (SC) state captures an individual who has abnormalities suspicious for lung cancer (e.g., findings larger than 4 mm). In terms of lung cancer staging, the SC state captures Stage 0 and occult carcinoma [34]. The Invasive-Cancer (IC) state represents individuals with diagnoses of cancer confirmed through additional diagnostic procedures (e.g., biopsy); it captures Stage IA–IV lung cancers. The Treatment state represents the state in which the individual was confirmed with cancer and is receiving treatment. Lastly, the Death state indicates an individual who is deceased, either from the cancer (without treatment) or due to some other cause. From this state model, the three cancer-related states (NC, SC, IC) were used to represent discrete characterizations of a given patient’s likelihood of cancer following screening observations over time.

Figure 1.


The underlying disease state-space model for lung cancer used in this study, modeled after the process flow in the NLST. The arrows depict allowed transitions in the state space. In the Non-Cancer state, where everyone starts, the individual has no abnormalities or abnormalities smaller than 4 mm. In the In Situ Cancer state the individual has abnormalities larger than 4 mm that are not confirmed to be cancerous. In the Invasive Cancer state the individual is confirmed to have cancer through the use of diagnostic procedures, such as biopsy. In the Treatment state the individual is receiving care for the cancer and is removed from the screening process. Finally, in the Death state the individual is deceased. The process described in this study terminates when an individual enters the Death or the Treatment state. The transition from the Treatment to the Death state is not depicted, as we focus only on the process of identifying an individual with lung cancer, which ends when the individual enters the Death or Treatment state.

3.2. Dynamic Bayesian networks

A dynamic Bayesian network is a model that repeats the static interactions of a conventional Bayesian network over time [35]. In DBNs, we represent a joint probability distribution over temporal trajectories that specify the assignment of values to each random variable $X_i^{(t)}$ at different time points $t$. A DBN follows the Markov assumption, in which the future state of the system depends only on the current state and is independent of the past. Thus, in a DBN, which is an unrolled Bayesian network, each random variable $X_i$ depends only on its parents, $Par(X_i)$:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Par(X_i)). \tag{1}$$

The structure and the probabilities P(X(t+1)|X(t)) can be assumed the same for all t (i.e., time invariant). Such a system is a stationary dynamical system. In this case the model can consist of two parts [36]:

  1. A prior model that specifies the initial distribution of the process:
     $$P(X^{(0)}) = \prod_{X_i^{(0)} \in X^{(0)}} P\left(X_i^{(0)} \mid Par(X_i^{(0)})\right) \tag{2}$$
  2. A transition model that specifies the evolution of the process across time points:
     $$P(X^{(t+1)} \mid X^{(t)}) = \prod_{X_i^{(t+1)} \in X^{(t+1)}} P\left(X_i^{(t+1)} \mid Par(X_i^{(t+1)})\right). \tag{3}$$
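As a purely illustrative instance of such a prior-plus-transition model, the sketch below propagates a three-state belief (mirroring the NC/SC/IC space used in this paper) through a fixed transition matrix over three epochs; all probability values are hypothetical.

```python
# A minimal sketch of a stationary dynamical system (Equations 2-3): a prior
# distribution P(X(0)) and a time-invariant transition matrix P(X(t+1)|X(t)),
# unrolled by repeated matrix-vector products. All numbers are illustrative.
import numpy as np

states = ["NC", "SC", "IC"]              # No-Cancer, In Situ-Cancer, Invasive-Cancer
prior = np.array([0.96, 0.03, 0.01])     # hypothetical P(X(0))

# Row i gives the distribution over next states when the current state is i.
transition = np.array([
    [0.98, 0.015, 0.005],                # from NC
    [0.00, 0.90,  0.10],                 # from SC
    [0.00, 0.00,  1.00],                 # from IC (absorbing here for simplicity)
])

belief = prior
for t in range(3):                       # three screening epochs
    print(f"t={t}:", dict(zip(states, belief.round(4))))
    belief = belief @ transition         # P(X(t+1)) = sum_x P(X(t)=x) P(X(t+1)|X(t)=x)
```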

A DBN can be used to estimate conditional distributions through the use of the chain rule for Bayesian networks. This ability was used in our lung cancer screening DBN to obtain the probability of a positive biopsy outcome for a given individual. Equation 4 represents the conditional probability of variable $X_i^{(t)}$ given evidence about certain random variables $\mathbf{X} = \{X_1, \ldots, X_{n-1}\}$ in the network structure.

$$P(X_i^{(t)} \mid \mathbf{X}) = \prod_{t \in T} \prod_{i=1}^{n} P(X_i^{(t)} \mid Par(X_i^{(t)})). \tag{4}$$

An example of the computation of the probability of the Biopsy outcome on a patient at the second screening (t = 1) based on the networks in Figure 2 is shown below. The computation of the conditional probability is based on the evidence of the individual on the variables of the model:

$$P(\text{Biopsy}^{(1)} \mid \text{Gender} = \text{Female}, \text{Family History} = \text{Yes}, \text{Body Mass Index} = \text{Obese}, \text{Work Exposure} = \text{Yes}, \text{Disease History} = \text{Yes}, \text{Age} = 64, \text{Cancer History} = \text{No}, \text{Smoking Status} = \text{Yes}).$$
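The toy computation below illustrates the same kind of chain-rule query on a much smaller network (Smoking → Cancer → {LDCT, Biopsy}), marginalizing over the hidden Cancer state; the structure and all CPT values are hypothetical, not the paper's learned parameters.

```python
# Inference by enumeration for a toy version of the Equation (4) query:
# P(Biopsy = positive | evidence), summing out the hidden Cancer variable.
# Network: Smoking -> Cancer -> {LDCT, Biopsy}. All CPT numbers are made up.
p_cancer_given_smoking = {True: 0.05, False: 0.01}       # P(Cancer | Smoking)
p_growth_given_cancer = {True: 0.80, False: 0.10}        # P(LDCT = growth | Cancer)
p_biopsy_pos_given_cancer = {True: 0.95, False: 0.05}    # P(Biopsy = + | Cancer)

def posterior_biopsy(smoking: bool, ldct_growth: bool) -> float:
    """P(Biopsy = + | Smoking, LDCT), normalized over the hidden Cancer state."""
    num = den = 0.0
    for cancer in (True, False):
        pc = p_cancer_given_smoking[smoking]
        pc = pc if cancer else 1.0 - pc
        pl = p_growth_given_cancer[cancer]
        pl = pl if ldct_growth else 1.0 - pl
        joint = pc * pl                                   # weight of this Cancer value
        den += joint
        num += joint * p_biopsy_pos_given_cancer[cancer]
    return num / den

print(round(posterior_biopsy(smoking=True, ldct_growth=True), 3))  # ~0.317
```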

Figure 2.


The diagram above depicts the structure of the lung cancer screening DBNs. Italicized text indicates the discretized states considered per variable. (a) The Forward-Arrow DBNs. (b) The Reversed-Arrow DBN. The total number of epochs in both models is 3.

3.3. The lung cancer screening DBNs

Deriving a DBN broadly involves two steps: first, deriving the structure (i.e., a directed acyclic graph); and second, parameterizing the structure (i.e., estimating the probabilities for the conditional probability tables (CPTs) of the network). In this work, we used the NLST dataset to build five different variations of networks: three expert-driven DBNs (“backward construction”) and two DBNs derived from structure learning methods. Specifically, the models are as follows:

  • The expert-driven DBNs consist of two Forward-Arrow DBNs (see Figure 2a) and one Reversed-Arrow DBN (see Figure 2b): 1) a Forward-Arrow DBN using a NoisyMax gate for parameter reduction of the Cancer node (Model A); 2) for comparison, a Forward-Arrow DBN without a NoisyMax gate (Model C); and 3) a Reversed-Arrow DBN (Model B), which is equivalent to a naïve Bayes classifier at the first time point.

  • The learned DBNs consist of two DBNs created through structure learning methods: 4) a learned DBN with “compositional” variables (Model D); and 5) a learned DBN without “compositional” variables (i.e., with variables as referenced in the NLST dataset; see Appendix Section B) (Model E).

The design process of the models consisted of five steps:

  1. Variable selection. The structured data captured during the NLST provide a wide array of variables that can be considered in a predictive model. To confine the scope, we limited consideration to variables found in previously published studies [18, 17, 20], as well as comorbidities and exposures known to be correlated with lung cancer. Information on family and personal cancer history and related diseases was represented as “compositional” variables, combining several pieces of evidence into one larger variable. For example, the family history variable is the aggregation of the father, mother, sibling, and child having had cancer. This approach reduces the dimensionality of the associated CPTs in the network. Figure 2 depicts all the variables of our models; more information on the variables used can be found in Section B of the Appendix. In the case of the learned DBN without “compositional” variables, all the variables shown in Section B of the Appendix are nodes in the network.

  2. Defining the structure (network topology).

    • Defining the structure of the backward construction DBNs. The Forward-Arrow and Reversed-Arrow DBNs were constructed using a backward construction process, starting from the variable of interest, in this case lung cancer, and adding the associated precursors and related contributors to the disease (leftmost part of the networks at t = 0, as shown in Figure 2 (a)–(b)). The middle and rightmost parts of the networks (t = 1, t = 2) reflect the observations made during screening in the NLST trial. This approach [35] aims to reflect a causal hierarchy for lung cancer screening, in which causes are parents of effects. For example, the evidence of growing abnormalities in an individual’s CT screening exam is one of the causes of that individual having a positive biopsy outcome.

    • Defining the structure of the learned DBNs. The structures of these networks (see the Appendix) were learned using the Bayesian search algorithm (see Appendix E, Table 7) provided in Genie [37], constrained with temporal background knowledge. That is, we preserved the transition model structure of the DBNs across screenings (e.g., we enforced that the Cancer node at the first screening precedes the Cancer node at the second screening, and that each Cancer node is at least linked to its corresponding LDCT outcome node).

  3. Computing the probabilities. Given these network topologies, the CPTs and associated probabilities were computed from the observational data of the NLST dataset. The Forward-Arrow DBN with a NoisyMax gate (A), the Reversed-Arrow DBN (B), and the learned DBNs (D, E) were parameterized using the expectation maximization (EM) algorithm, which iteratively computes maximum-likelihood estimates of the network parameters given the data and the structure of the network [38]. For the random variables in the leftmost part of the network, such as Gender and BMI, the CPTs represent an estimate of the probability distribution of the variables in the training set. For instance, the CPT for the random variable Gender represents the percentage of females vs. males in the training set. The Cancer node, at baseline, has the most complex CPT in terms of dimensionality: in the Forward-Arrow DBN, the number of parameters of the Cancer node at baseline is 2,304. This CPT consists of conditional probabilities that represent the percentage of training-set cases in each of the three states (NC, SC, and IC) of the Cancer node for each combination of risk factors in the leftmost part of the network. To deal with this high number of parameters and estimate them from our data, we used a NoisyMax gate to represent the Cancer node, which reduced the number of parameters of the Cancer node CPT from 2,304 to 60 (a parameter-count sketch follows this list). NoisyMax, a generalization of the NoisyOR gate, can be used to represent more highly connected nodes [39] by taking advantage of the independence of causal interactions to provide a logarithmic reduction in the parameters of a complex CPT. The LDCT CPT represents the percentage of cases in each of the three states (NC, SC, IC) of the Cancer node with each of the three outcomes (growth, stable, or negative) after the first LDCT screening at baseline. The Biopsy node’s probabilities of a positive/negative outcome were abstracted from the literature (i.e., the false negative/positive rates for biopsies) [40]. The Death node represents the death rate of individuals across the whole NLST dataset at the onset of the trial. Both the Biopsy and Death nodes in all models were set as fixed nodes (i.e., fixed CPT parameters) during parameterization. The Forward-Arrow DBN without a NoisyMax gate was not parameterized using the EM algorithm; more details regarding its parameterization can be found in Section F.1 of the Appendix.

  4. Computing the probabilities of the transition model. Our DBN models are not stationary systems. Even though the transition model structure of the networks is repeated over the three time points of the process, the transition models’ CPTs change based on the number of cancer cases detected in the NLST dataset annually. For example, the Cancer node at t = 1 and t = 2 represents the percentage of cases that transitioned from one of the three states at t = 0 and t = 1 to one of the three states of the Cancer node at t = 1 and t = 2, respectively. The LDCT nodes’ CPTs at t = 1 and t = 2 represent the percentage of cases in each of the three states (NC, SC, IC) of the Cancer node with each of the three outcomes (growth, stable, or negative) after the second and third LDCT screening. The Biopsy and Death node CPTs at t = 1 and t = 2 (fixed nodes) are the same as at baseline. Our DBNs were parameterized using the EM algorithm in a manner akin to a regular Bayesian network (BN), given the way nodule growth was reported in the NLST trial. Reporting of nodule growth commenced in the second screening period. For example, a suspicious abnormality (>4 mm, considered a positive finding) that remained stable in size in the second screening was classified as “stable,” but if this occurred in the third screening, the abnormality could have been classified as negative. Additionally, during the first screening all suspicious abnormalities were classified as positive, and all non-suspicious abnormalities and negative screenings as negative; there was no reporting of stable cases in the first screening, as there was no comparison LDCT scan at baseline. This reporting convention partially persisted for a portion of cases in the second screening and was eliminated by the third screening of the trial.

  5. Training and testing. Given a training set with data for each node of our networks, all the models were trained with the Biopsy and Death nodes set as fixed nodes (i.e., fixed CPT parameters). In testing, we had to take temporality into account: we tested each Biopsy node independently and in sequential order. In addition, during testing, instantiating the cancer nodes with evidence would require the individual to undergo additional diagnostic procedures, such as a biopsy, to confirm their cancer stage. Our classification task was to identify whether an individual should undergo a biopsy given that the probability of a positive Biopsy is sufficiently high. A classification was deemed correct if the individual with a high probability of a positive Biopsy had developed cancer, and vice versa. Thus, during testing, we did not instantiate any cancer nodes at any screening point of the trial, as cancer staging is only validated using additional diagnostic procedures. While this inevitable uncertainty is unfortunate, by d-separation it allows probabilistic influence to flow between nodes at any screening point of the trial for the Forward-Arrow DBNs.
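To give a feel for step 3's parameter reduction, the sketch below contrasts the size of a full CPT with a NoisyMax-style parameterization for a multi-parent node; the parent cardinalities and the NoisyMax counting convention shown are illustrative (counting conventions differ slightly across tools), so the printed numbers will not reproduce the paper's exact 2,304-to-60 figures.

```python
# Back-of-the-envelope comparison of full-CPT vs. NoisyMax parameter counts
# for a three-state child node with many risk-factor parents. Cardinalities
# below are hypothetical; the paper reports 2,304 -> 60 for its parent set.
from math import prod

child_states = 3                           # NC, SC, IC
parent_cards = [2, 2, 3, 2, 2, 3, 2, 2]    # hypothetical parent cardinalities

# Full CPT: one child distribution per joint configuration of the parents.
full_cpt = child_states * prod(parent_cards)

# NoisyMax (independence of causal influence): one child distribution per
# non-baseline parent state, plus one leak distribution.
noisy_max = child_states * (sum(card - 1 for card in parent_cards) + 1)

print(f"full CPT parameters: {full_cpt}")   # 1728 with these cardinalities
print(f"NoisyMax parameters: {noisy_max}")  # 33 with these cardinalities
```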

3.4. Comparison methods

All DBN models were compared with a naïve Bayes model, in which each screening was modeled as independent. Figure 8 in Section E of the Appendix depicts the structure of the naïve Bayes model. This model was trained using the EM algorithm, and tested in Genie. A logistic regression model (LR) [41] without spiculation, trained and tested on NLST cases at baseline, and a decision tree model were also employed for comparison purposes. The decision tree model was implemented using RapidMiner, which uses a variation of the C4.5 algorithm.

4. Evaluation and results

A 10-fold cross-validation was conducted on the complete NLST dataset for each model. The NLST dataset is imbalanced: the ratio of cancer to non-cancer cases is 1083:24461, or around 1 cancer case for every 24 non-cancer cases. As such, imbalance problems arise in classic cross-validation studies: a model trained mainly on negative cases will tend to be inherently biased towards the majority class. Notably, metrics such as the receiver operating characteristic (ROC) curve and the area under the curve (AUC) can be deceiving when training and testing on imbalanced datasets [42]. In our situation, such an evaluation will always have high accuracy, and thus would not provide insight into whether the model truly identifies cancer cases or how it compares with other models. More informative metrics for imbalanced datasets include precision, recall, and the F-score [42]. In Section H, Figure 21 of the Appendix, we present the F-score over recall curves of the 10-fold cross-validation evaluation of the Forward-Arrow DBN model with a NoisyMax gate. The F-score curves improve with additional screenings. However, we note that we cannot truly evaluate whether our model identifies cancer cases, compared with other models over the same dataset, given the large number of non-cancer cases that flatten the F-score curves.

One approach to deal with data imbalance problems is through the use of resampling techniques [43]. In this work, we under-sampled the training and test sets from the majority class (i.e., non-cancer cases) to preserve a 1:1 ratio of the cancer to non-cancer cases. The models were trained and evaluated a total of 10 times. Each time, the training and test sets were randomly selected from the NLST cohort and each consisted of 200 cancer cases and 200 randomly selected non-cancer cases, matched by age and gender. This process was used to assess overfitting and the variability in accuracy of the models, as well as to create a balanced dataset for computing the associated probabilities of a positive Biopsy of an individual. Figure 3 illustrates this process. Additionally, the models were tested against the full NLST dataset to assess generalization.
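A simplified sketch of this resampling procedure is shown below; the DataFrame column name `cancer` is hypothetical, and the age/gender matching described above is omitted for brevity.

```python
# A sketch of the balanced under-sampling procedure: draw 200 cancer and 200
# non-cancer cases for training and a disjoint, equally balanced test set.
# The `cancer` column name is hypothetical; matching on age/gender is omitted.
import pandas as pd

def balanced_split(df: pd.DataFrame, n_per_class: int = 200, seed: int = 0):
    cancer = df[df["cancer"] == 1].sample(n=2 * n_per_class, random_state=seed)
    controls = df[df["cancer"] == 0].sample(n=2 * n_per_class, random_state=seed)
    train = pd.concat([cancer.iloc[:n_per_class], controls.iloc[:n_per_class]])
    test = pd.concat([cancer.iloc[n_per_class:], controls.iloc[n_per_class:]])
    return train, test

# Repeated 10 times with fresh random draws to assess variability:
# for seed in range(10):
#     train, test = balanced_split(nlst_df, seed=seed)
```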

Figure 3.


The training and testing sets’ random selection process of cases from the NLST dataset. The training and test set consist of 200 cancer and 200 non-cancer cases, respectively. Ten random training and test sets, with replacement, were selected for our analysis.

The evaluation of the models was based on the computed probability of the Biopsy variable for a test case, given all prior and current evidence, for each of the three intervention points of the NLST trial. A threshold, θ, was determined for the probability of a positive biopsy outcome: probability values below θ were classified as non-cancer cases, and values greater than or equal to θ as cancer cases. This enabled us to perform a binary classification. A positive case prediction by a physician represents any case that resulted in ordering an additional diagnostic procedure. Subsequently, we present for each screen the sensitivity and counts of cancer cases detected by our models at specific thresholds for θ, which were determined based on the distribution of the positive Biopsy probability values (see Figure 4), as well as the receiver operating characteristic (ROC) curve. For discussion purposes we focus our comparisons on models A and B. Figure 4 depicts the probability of a positive biopsy, as predicted by the models in each screening, for confirmed cancer (red) and non-cancer (blue) cases in the trial. Both DBN A and B tend to discriminate cancer and non-cancer cases better with an increasing number of screenings. The thresholds for θ were chosen in a way that favors recall: each threshold aims to minimize the number of cancer cases missed while preserving an acceptable rate of falsely predicted cancer cases. The results for each of the 10 randomized test sets and resultant models, as well as the physicians’ predictions, were averaged for visualization purposes.
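One simple, recall-favoring way to pick such a threshold is sketched below: sweep candidate values of θ over the predicted positive-biopsy probabilities and keep the largest threshold whose sensitivity stays above a target; the target value and variable names are illustrative.

```python
# A sketch of recall-favoring threshold selection: choose the largest theta
# whose recall (sensitivity) on the cancer class stays above a target value.
import numpy as np

def choose_theta(probs: np.ndarray, labels: np.ndarray, min_recall: float = 0.95) -> float:
    best = 0.0
    n_cancer = (labels == 1).sum()
    for theta in np.unique(probs):
        predicted_positive = probs >= theta
        recall = (predicted_positive & (labels == 1)).sum() / n_cancer
        if recall >= min_recall:
            best = max(best, theta)   # keep the strictest threshold that still qualifies
    return best
```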

Figure 4.


The combined probability distributions for a positive biopsy, of DBN A (top) and DBN B (bottom), for all cases across the 10 random test sets, for each screen. Red indicates all confirmed cancer cases in the trial, irrespective of screening time points. Blue indicates the confirmed non-cancer cases. The three subplots depict the probability of a positive biopsy in each of the three screening points of the trial. With successive screenings we can see that the probability of a positive biopsy for non-cancer (blue) and cancer (red) cases tends to move towards the left and right side of each subplot, respectively. The solid black lines represent the thresholds chosen to discriminate cancer cases from non-cancer cases in the DBN predictions.

Comparison with experts

Concordance between the models’ positive prediction for a biopsy (i.e., ≥ θ) and an NLST clinician’s recommendation for biopsy and confirmation of lung cancer was determined. For our lung cancer screening DBNs, the identification of cancer cases was comparable to the physicians’ performance during the NLST, in terms of both the number of predicted cases and the discrimination of the same cases, across the three intervention points of the trial, as shown in Table 2. After each screening point, cases that were confirmed as positive lung cancers or deceased were removed from the subsequent screening evaluation. McNemar’s test for each of the contingency tables of similar cases was significant (p < 0.01) at each of the three intervention points of the trial. This means that the contingency tables of similar cases are asymmetric, and confirms that the models minimize the false negative (fn) rate of cancer cases while maintaining an acceptable false positive (fp) rate. Additionally, the 95% C.I. of the type I and II errors (b − c) and of the test of proportions (p2 − p1) demonstrate that the direction of this asymmetry is toward the fp cases.
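For reference, a minimal version of this test is sketched below: the McNemar chi-square uses only the discordant cells b and c of the paired table, and the confidence interval for b − c uses a normal approximation; the counts shown are illustrative, not the averaged values from Table 2.

```python
# McNemar's test on the discordant cells of a paired 2x2 table, plus a
# normal-approximation CI for b - c. Counts below are illustrative only.
from math import sqrt

def mcnemar_chi2(b: int, c: int) -> float:
    """Chi-square statistic (no continuity correction) for discordant counts."""
    return (b - c) ** 2 / (b + c)

def ci_b_minus_c(b: int, c: int, n: int, z: float = 1.96):
    """Approximate 95% CI for the difference b - c among n paired cases."""
    se = sqrt(b + c - (b - c) ** 2 / n)
    return (b - c) - z * se, (b - c) + z * se

print(mcnemar_chi2(b=95, c=2))          # ~89.2
print(ci_b_minus_c(b=95, c=2, n=400))   # ~(76.0, 110.0)
```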

Table 2.

The results of the lung screening DBNs A and B and the physicians of the NLST trial for the first, second, and third screening, as well as the percentages of equivalent predictions of true positives (tp), false negatives (fn), false positives (fp), and true negatives (tn). Deceased and already-identified cancer cases before each intervention point of the trial were excluded from the evaluation of the DBNs as well as the evaluation of the physicians’ predictions. DBN A and B Predictions: the contingency tables of the lung screening DBNs for each screening at thresholds of 0.04, 0.21, and 0.25, respectively. Physicians’ Predictions: the contingency table of the physicians’ predictions in the trial. DBNs A and B and Physicians’ Concurrence: the percentage of equivalent predictions of tp, fn, fp, and tn, i.e., how many cases were predicted identically (as tp, fn, fp, or tn) by both the DBN and the physicians, over the total number of tp, fn, fp, and tn, respectively. McNemar’s Test: the chi-square and p-value of McNemar’s test for the contingency matrix of similar cases identified by the models and the physicians. 95% C.I. (b − c): confidence interval of the difference of type I and type II errors of the concordance matrix, with b = fp and c = fn. 95% C.I. (p2 − p1): confidence interval of the difference of proportions of the contingency matrix of similar cases, where p1 = (tp + fn)/N and p2 = (tp + fp)/N.

First Screening
  DBN A predictions: 53 tp, 221 fp, 2 fn, 121 tn
  DBN B predictions: 51 tp, 134 fp, 4 fn, 208 tn
  Physicians’ predictions: 49 tp, 108 fp, 6 fn, 235 tn
  DBN A & physicians’ concurrence: 70.4% tp, 49.3% fp, 35.2% fn, 46.0% tn
  DBN B & physicians’ concurrence: 71.4% tp, 59.1% fp, 48.7% fn, 73.3% tn
  McNemar’s test: A: χ² = 91.03, p < 1.0e−10; B: χ² = 77.39, p < 1.0e−10
  95% C.I. (b − c): A: 94.5 (77.5, 110.57); B: 82.1 (64.87, 98.69)
  95% C.I. (p2 − p1): A: 0.3778 (0.3099, 0.4421); B: 0.2655 (0.2098, 0.3192)

Second Screening
  DBN A predictions: 27 tp, 50 fp, 4 fn, 244 tn
  DBN B predictions: 27 tp, 50 fp, 4 fn, 244 tn
  Physicians’ predictions: 29 tp, 61 fp, 2 fn, 233 tn
  DBN A & physicians’ concurrence: 69.0% tp, 39.3% fp, 30.5% fn, 78.9% tn
  DBN B & physicians’ concurrence: 68.7% tp, 39.3% fp, 29.0% fn, 79.0% tn
  McNemar’s test: A: χ² = 26.05, p = 3.28e−7; B: χ² = 25.95, p = 3.5e−7
  95% C.I. (b − c): A: 28.6 (16.45, 39.52); B: 28.5 (16.42, 39.52)
  95% C.I. (p2 − p1): A: 0.1092 (0.0628, 0.1509); B: 0.1088 (0.0627, 0.1509)

Third Screening
  DBN A predictions: 35 tp, 24 fp, 7 fn, 227 tn
  DBN B predictions: 35 tp, 24 fp, 7 fn, 227 tn
  Physicians’ predictions: 37 tp, 32 fp, 4 fn, 219 tn
  DBN A & physicians’ concurrence: 71.0% tp, 41.6% fp, 46.0% fn, 85.8% tn
  DBN B & physicians’ concurrence: 71.0% tp, 41.6% fp, 46.0% fn, 85.8% tn
  McNemar’s test: A: χ² = 8.45, p = 0.0036; B: χ² = 8.45, p = 0.0036
  95% C.I. (b − c): A: 12.3 (2.83, 21.18); B: 12.7 (3.61, 22.34)
  95% C.I. (p2 − p1): A: 0.0492 (0.0113, 0.0847); B: 0.0507 (0.0144, 0.0892)

Moreover, we examined whether models A and B can predict the majority of cancer cases at a specific screening point of the NLST trial, and assessed whether these models could identify cancer cases before their occurrence. We evaluated how many of our false positive cases in each screening of the trial turned out to be cancer cases later in the trial. Figure 5 illustrates the sensitivity of the lung cancer screening DBNs in each screening, as well as the counts of cancer cases predicted by the models alongside the total number of true cancer cases in the trial. Figure 5 also illustrates how many false positive cases at a particular screening point of the trial end up being cancer cases at future screening points. Interestingly, a significant portion of false positive cases are cancer cases in subsequent screenings. Note that confirmed cancer cases from the trial first received an LDCT screening exam and were then confirmed through the use of additional diagnostic procedures. In comparison, the DBN models infer that these cases are likely cancer without the diagnostic procedure (i.e., that the outcome of a biopsy will likely be positive).

Figure 5.


Top: The diagrams show the true number of cancer cases at each screening point of the trial and the number of cancer cases predicted by the models at each screening. For example, in the leftmost histogram for the first screening, DBN A predicted 51 out of 55 cancer cases. From the same screening we examined the false positive cases and identified how many of those were cancer cases in subsequent screenings. In the second screening of the trial there were 32 cancer cases; 20 of those 32 were false positive cases in the first screening of the trial. Similarly, in the third screening, 19 out of 41 cancer cases were false positive cases in the first screening, and in the post-screening period 25 out of 68 cancer cases were false positive cases in the first screening. The middle diagram shows how many cancer cases were identified in the second screening and how many false positive cases from the second screening are cancer cases in the third screening and post-screening period. The diagram on the right shows how many cancer cases were identified in the third screening and how many false positive cases from the third screening are cancer cases in the post-screening period. Bottom: (Left) The sensitivity of the lung cancer screening DBNs for the first, second, third, and post-screening cases after the first screening event (baseline); the sensitivities at the second, third, and post-screening points represent the true positive rate achieved from the pool of false positive cases in the first screen. (Middle) Sensitivity of the DBN for the second, third, and post-screening events after the second screening exam; the sensitivities at the third and post-screening points represent the true positive rate achieved from the pool of false positive cases in the second screen. (Right) The sensitivity of the DBN for the third and post-screening cases after the last screening exam; the sensitivities at the post-screening point represent the true positive rate achieved from the pool of false positive cases in the third screen.

ROC curves with 95% confidence intervals for the first, second, and third screens are shown in Figure 6. Table 3 summarizes the area under the curve (AUC) for each screen’s evaluation and the corresponding confidence interval. The AUC increased with the number of screens, which suggests that the models’ predictive power improves with time. The AUCs of the Forward-Arrow DBN without a NoisyMax gate, the two learned DBNs, and the naïve Bayes model are similar to those of DBNs A and B, and can be found in Table 3. More details on the results of the evaluation of each model are provided in Section F of the Appendix. Overall, all models have similar AUCs and confidence intervals (C.I.) of the AUC for each screening. The learned DBNs have similar performance to the other models, except that the first-screening AUC of model E is lower, and its C.I. wider, than those of the other models. In addition, as shown by the NLST and the models themselves, performance improves with consecutive screens. This is evident both from Table 2 and from the precision/recall (PR) and F-score curves (see Appendix I and H) computed for each screening time point. The desirable position for PR and F-score curves is the upper-right-hand corner, and the PR and F-score curves in Appendix I and H tend to move towards it with an increasing number of screenings. Models A–E achieved the best PR curves across screenings, with the curves improving as screenings accumulate. The worst PR curves, lying in the bottom-left-hand corner, belong to the naïve Bayes model (see Appendix I); its PR curves worsen with increasing number of screenings, indicating overfitting to specific features, such as the LDCT outcome. We also tested the performance of a decision tree on the dataset, using a variation of the C4.5 algorithm; its performance was extremely low compared to the other models and is not reported.
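For completeness, the sketch below shows how the per-screen ROC and PR evaluations can be computed from per-case positive-biopsy probabilities and ground-truth labels using scikit-learn; the toy arrays stand in for a model's actual outputs.

```python
# ROC and precision/recall evaluation from predicted P(Biopsy = +) values
# and ground-truth cancer labels. The toy arrays are placeholders for the
# probabilities produced by a model at one screening point.
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_auc_score, roc_curve

labels = np.array([0, 0, 1, 1, 0, 1])                     # toy ground truth
probs = np.array([0.05, 0.20, 0.30, 0.80, 0.10, 0.60])    # toy P(Biopsy = +)

print("AUC:", roc_auc_score(labels, probs))
fpr, tpr, roc_thresholds = roc_curve(labels, probs)
precision, recall, pr_thresholds = precision_recall_curve(labels, probs)
```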

Figure 6.


The ROC curves for the three intervention points of the NLST trial, with point-wise 95% confidence bounds.

Table 3.

The AUC and the 95% confidence interval for the first, second and third screening. A: The Forward-Arrow DBN with a NoisyMax gate; B: The Reversed-Arrow DBN; C: The Forward-Arrow DBN without a NoisyMax gate; D: The learned DBN with “compositional” variables; E: The learned DBN without “compositional” variables; F: The naïve Bayes Model.

Model   First Screening            Second Screening           Third Screening
        AUC    95% C.I.            AUC    95% C.I.            AUC    95% C.I.
A       0.778  0.757 – 0.800       0.857  0.834 – 0.880       0.887  0.869 – 0.905
B       0.798  0.776 – 0.821       0.858  0.832 – 0.884       0.887  0.866 – 0.907
C       0.789  0.774 – 0.804       0.844  0.819 – 0.869       0.884  0.863 – 0.906
D       0.790  0.769 – 0.810       0.862  0.839 – 0.886       0.877  0.858 – 0.896
E       0.751  0.654 – 0.849       0.853  0.832 – 0.875       0.878  0.859 – 0.897
F       0.799  0.777 – 0.821       0.865  0.844 – 0.885       0.886  0.866 – 0.907

The models’ predictive power was also assessed by investigating the number of future cancer cases predicted by the models using only observations from one screening. For example, if we were testing for cancer cases at t = 0 (first screening) we assumed that all cancer cases at t > 0 were cancer cases at t = 0 (i.e., ignored time). In this way, we can evaluate how many cancer cases are predicted before incidence. Out of the 121 true positive cases detected by DBN B on the first screening (see Table 6 of the Appendix), given that the DBN predicted 51 cancer cases that were cancer cases of the first screening (see Figure 5 - top left), the DBN predicted 70 additional cancer cases that were diagnosed with cancer later in the trial (see Table 6 in Appendix C).

Assessing model performance given missing data

We grouped all cases in the study that missed the second screen in the NLST but underwent the first and third screens. There were 417 such cases in the complete NLST dataset, which we used to evaluate whether the models could predict the cancer status (i.e., cancer or non-cancer) of an individual who missed the second LDCT screening exam and was subsequently screened at the third screen. Table 4 provides the contingency tables for these cases that went on to develop cancer by the third screening or after it. DBN A and DBN B predicted 8 and 6, respectively, of the 11 cases that developed lung cancer by the third screening, and both DBNs predicted 4 out of 7 cases that developed cancer after the third screening.

Table 4.

Contingency tables for individuals who missed the second screen of the trial. DBN Predictions By 3rd Screening: individuals who missed the second screen and were diagnosed with cancer by the third screen (evaluated at the t = 1 epoch of the DBN). DBN Predictions After 3rd Screening: individuals who missed the second screen and were diagnosed with cancer after the third screen (i.e., third-screening cancer).

Cases that missed the second screening:
  DBN A, predictions by 3rd screening: 8 tp, 91 fp, 3 fn, 315 tn
  DBN A, predictions after 3rd screening: 4 tp, 88 fp, 3 fn, 311 tn
  DBN B, predictions by 3rd screening: 6 tp, 71 fp, 5 fn, 335 tn
  DBN B, predictions after 3rd screening: 4 tp, 67 fp, 3 fn, 332 tn

The NLST dataset is complete in terms of patient information (i.e., parent nodes). To evaluate the effect of missing data in the parent nodes of the training set on the end performance of the Forward-Arrow DBN without a NoisyMax gate, we randomly selected parent nodes and assigned missing data to each one to simulate a “missing at random” scenario. For example, we selected one random parent node and set 50 random cases to missing for that node. We repeated this in incremental steps of 50 cases up to 350 (our training set consisted of 400 cases). We then reiterated the process with two random parents, increasing up to all parent nodes. Our results showed that the AUC and its confidence interval remained relatively stable: changing the distribution of these priors does not significantly affect performance. The largest impact on the AUC, on the order of −0.01, was at the first screening. This subtle change may be attributed to the fact that the biopsy and cancer nodes of the first screening are conditionally dependent on the priors. Strength-of-influence diagrams for each structure, depicting the influence among variables in each network, are provided in Appendix E.
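A sketch of this missingness simulation is given below: values for randomly chosen parent-node columns are blanked out in increasing increments before retraining; the column names and DataFrame layout are hypothetical.

```python
# Simulating "missing at random" data: blank out values of randomly chosen
# parent-node columns for an increasing number of training cases.
# Column names and the DataFrame layout are hypothetical.
import numpy as np
import pandas as pd

def inject_mcar(train: pd.DataFrame, parent_cols: list, n_parents: int,
                n_missing: int, seed: int = 0) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    out = train.copy()
    for col in rng.choice(parent_cols, size=n_parents, replace=False):
        rows = rng.choice(out.index, size=n_missing, replace=False)
        out.loc[rows, col] = np.nan      # mark these entries as missing
    return out

# Sweep 50..350 missing cases for 1..len(parents) randomly chosen parents:
# for k in range(1, len(parents) + 1):
#     for m in range(50, 351, 50):
#         degraded = inject_mcar(train_df, parents, k, m)
```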

Generalization and comparison to other models

We assessed the generalizability/overfitting of the models on the whole NLST dataset. Table 5 shows that the true positive (tp), false negative (fn), false positive (fp), and true negative (tn) rates of the models over the whole dataset and over the random balanced test sets appear stable. The numbers of test cases in the whole dataset and in the random balanced test sets are 25,446 and 400, respectively. Lastly, we compared how the full logistic regression model (LR) of [41], without spiculation, performs on the NLST cases at baseline. We first evaluated how the LR performs on the NLST cases when trained with NLST cases, and we also evaluated how the parameterized model, with parameters published in [41], performs on the NLST cases. In both cases, compared to the DBN, the LR maintains a high true negative rate, a high false negative rate, and a significantly lower true positive rate (see Table 5). The LR models were evaluated only at baseline, as they were trained and evaluated in [41].

Table 5.

The true positive (tp), false negative (fn), false positive (fp), and true negative (tn) rates of the DBN and logistic regression models. DBN A & B, whole dataset: the average tp, fn, fp, and tn rates of the 10 DBN models trained on 400 cases and evaluated on the remaining NLST dataset of 25,446 cases. DBN A & B, random sets: the average tp, fn, fp, and tn rates of the 10 DBN models trained and tested on random balanced sets of 400 cases. Physicians, whole dataset: the average tp, fn, fp, and tn rates of the physicians’ classifications on the entire NLST dataset of 25,446 cases. Physicians, random sets: the average tp, fn, fp, and tn rates of the physicians’ classifications on the random balanced sets of 400 cases. LR model: the full LR model without spiculation [41], evaluated on 5,353 NLST cases with nodule information at baseline (t = 0); the parameters of the model used in this evaluation were adopted from [41]. LR-trained model: the average tp, fn, fp, and tn rates of the logistic regression (LR) model at baseline (t = 0), trained on 2,663 cases and tested on 2,690 cases. Dashes represent cases for which no tp, fn, fp, or tn rates were computed for the LR models.

                                First Screening                          Second Screening                         Third Screening
DBN A, whole dataset            92.6% tp, 30.2% fp, 7.40% fn, 69.8% tn   87.3% tp, 9.40% fp, 12.7% fn, 90.6% tn   83.9% tp, 6.70% fp, 16.1% fn, 93.3% tn
DBN A, random test sets         96.4% tp, 65.6% fp, 3.60% fn, 35.4% tn   87.1% tp, 17.0% fp, 12.9% fn, 83.0% tn   83.3% tp, 9.60% fp, 16.7% fn, 90.4% tn
DBN B, whole dataset            92.6% tp, 30.2% fp, 7.40% fn, 69.8% tn   87.3% tp, 9.40% fp, 12.7% fn, 90.6% tn   83.9% tp, 6.90% fp, 16.1% fn, 93.1% tn
DBN B, random test sets         92.7% tp, 39.2% fp, 7.30% fn, 60.8% tn   87.1% tp, 17.0% fp, 12.9% fn, 83.0% tn   83.3% tp, 9.60% fp, 16.7% fn, 90.4% tn
Physicians, whole dataset       89.7% tp, 23.2% fp, 10.3% fn, 76.8% tn   93.7% tp, 15.0% fp, 6.30% fn, 85.0% tn   90.1% tp, 9.60% fp, 9.90% fn, 90.4% tn
Physicians, random test sets    89.1% tp, 31.5% fp, 10.9% fn, 68.5% tn   93.6% tp, 20.7% fp, 6.40% fn, 79.3% tn   90.2% tp, 12.8% fp, 9.80% fn, 87.3% tn
Logistic Regression [41]        11.0% tp, 0.50% fp, 89.0% fn, 99.5% tn   -                                        -
Logistic Regression, retrained  16.3% tp, 0.80% fp, 83.7% fn, 99.2% tn   -                                        -

5. Discussion

In this work we built and tested five different DBNs for lung cancer screening prediction using backward construction and structure learning methods. Given the uncertain nature of lung cancer and the necessity of performing a biopsy to confirm the underlying disease, we used a three-state cancer state-space model to represent the cancer status of an individual throughout the screening process. Such a representation offers the following advantages. First, it represents the cancer state of an individual in terms of cancer staging, capturing concepts like disease dynamics and nodule growth, instead of the standard binary “yes” and “no” states. Second, the fact that the cancer nodes are never instantiated with evidence during testing, due to the uncertainty of the disease (i.e., cancer staging is only validated using additional diagnostic procedures), allows demographic characteristics as well as previous screening outcomes to exert probabilistic influence at any screening point of the trial (i.e., via d-separation and the sequential configuration). The performance of the learned DBNs is similar to that of the Forward-Arrow and Reversed-Arrow DBNs. The learned structures exhibit relationships similar to those in the expert-driven Forward-Arrow models with respect to the imaging assessment over time (see Figure 8 in Appendix E); additional relationships were inferred, but without significant change in model performance. Qualitatively, the expert-driven models provide a more straightforward understanding of the relationships between variables over time. Notably, NLST patient information (e.g., demographics) was captured only at the start of the trial. While some measures are typically invariant over time (e.g., gender), others do change over time (e.g., age, body mass index); the underlying dataset did not reflect these latter variables at subsequent time points in the screening process. In our opinion, it would be inaccurate to model them as such (and the imaging interpretations were also not informed by any such additional information). Nevertheless, given such data at different time points, the performance of the DBNs could improve with additional modeling.

Based on the results of our evaluation, DBNs A and B provide results comparable to those of the radiologists who participated in and read the NLST LDCT imaging studies. We also tested other models on this dataset, such as decision trees and a naïve Bayes model, but their performance was suboptimal compared to the DBNs. The use of a DBN for our analysis, rather than a BN as in [32, 33], takes into account the temporal evolution of a cancer, with improved discriminative performance in future screenings. A standard 10-fold cross-validation on the entire dataset would be ideal to assess overfitting; but given the class imbalance present in the dataset (1:24 cancer to non-cancer cases), we would not gain insight into the models’ ability for the more important predictive classification of cancer. The authors of [43] used similar methods to deal with imbalance in their dataset, but instead chose to oversample the minority class until a 1:1 ratio was achieved in their training set; they also reported metrics such as precision, recall, and F-score to compare performance on imbalanced datasets. The AUC for all networks remained higher than 0.75 on the balanced test sets across the three screening points of the trial, and the ROC curves improve over time. The use of balanced test sets allows the effective comparison of each model in the ROC and PR space over the cancer class. All models’ performance was comparable in the ROC space (AUC of the ROC). However, in the PR space all models have a clear advantage over the naïve Bayes model (see Appendix I, Figures 23–29). This model adjusts to very specific features, such as the LDCT nodes, and thus overfits its predictions on these features. It can accurately discriminate negative cases (comparable AUC to other models); but when asked for the probability of a real cancer case given that this cancer case is predicted by the model (PR curve), its performance is lower.

Models A and B were also able to identify a significant number of cases at each intervention point of the trial that were future cancer cases (see Appendix C). The Brier score as well as the calibration curves of DBNs A and B improve with the increasing number of screenings (see Appendix D), demonstrating the ability of the models to perform calibrated cancer incidence predictions over time. Interestingly, the lung screening DBNs A and B only require a small training set, on the order of 50 times smaller than the original dataset, to make predictions on a large number of cases they have never encountered before. The models demonstrate good discrimination when evaluated on the whole NLST dataset. In addition, the tp, fp, fn, and tn rates over the whole dataset, compared to the random balanced test sets, are consistent and in some cases better. Still, it is important to note that in this study the DBNs were developed and trained using data from a randomized controlled trial, where information was gathered in structured case report forms and a large degree of standardization took place. Despite the performance over the entire NLST dataset, real-world application of these DBNs will require adaptation to handle observations made in routine clinical screening processes (i.e., adjusting for “noise” and variance). Ultimately, external validation of the DBNs is required.

DBNs present certain advantages regarding lung cancer incidence prediction, including their ability to utilize datasets with missing data. Although the NLST dataset is from a controlled trial, and thus is largely complete with only some missing data (e.g., due to individuals missing a screening exam), our models appear to be robust against missing values and still make reasonable predictions in light of missing data. In our investigation of the cancer status of cases that missed only the second NLST screening, both DBN A and B predicted the majority of cases that were cancer cases by the third screening or after the third screening of the trial. Suggesting that certain lung cancer risk factors and the outcome of the first LDCT are sufficient for an accurate future prediction of cancer. This short-term predictive ability may be applicable in cases where missing a screening exam would result in symptomatic cancer. Cases with missing data were also used in the training phase of the DBN without affecting the models’ predictive ability. We can improve the parameterization of a model from cases with incomplete data by only using the information we do have for each case, with incomplete data, for the computation of the corresponding CPT tables of the DBN network. For example, cases that developed lung cancer at the baseline of the trial before they received their first screening exam, even though we do not have information about them after baseline, were still used in the computation of the baseline CPTs (e.g., Gender, Age). To match a real lung cancer screening setting we included all of the aforementioned cases in our evaluation. We used the EM algorithm to train the Forward-Arrow DBN with a NoisyMax gate, the Reversed-Arrow DBN, and both the learned DBNs. One advantage of the EM algorithm is its ability to estimate the parameters of a network using the observed data. In particular, it iteratively fills in missing values with estimated values and subsequently re-estimates the parameters from this complete dataset. We believe it would be inappropriate to estimate the disease status of a deceased individual in subsequent screenings as individuals who died during the course of the trial, or who were diagnosed with cancer, were removed from the screening process of the trial. Thus, in the Forward-Arrow DBN without a NoisyMax gate, we estimated the parameters of this network empirically from observations in the dataset. Interestingly, both techniques provide similar results (see Appendix F). As such, EM would be a more appropriate algorithm in cases that missed a screening exam but is unsuitable with participants who were diagnosed with cancer or who died during the course of the trial. A method that takes into account both types of missing data would be more appropriate in eliminating bias during training. When compared with the full logistic regression model without spiculation [41] the Lung Screening DBNs had better tp, fp and fn rates. This suggests a superior discriminatory power on the NLST dataset. Nevertheless, the LR model’s results in Table 5 are trained and tested on a specific portion of the dataset: individuals with reported nodule abnormalities and nodule consistency. The DBN models, in contrast, were trained on a balanced set of cancer cases and non-cancer cases, with the majority of non-cancer cases without abnormalities. Also, the classification task of each model is somewhat different. 
When compared with the full logistic regression model without spiculation [41], the lung screening DBNs had better tp, fp, and fn rates, suggesting superior discriminatory power on the NLST dataset. Nevertheless, the LR model's results in Table 5 were trained and tested on a specific portion of the dataset: individuals with reported nodule abnormalities and nodule consistency. The DBN models, in contrast, were trained on a balanced set of cancer and non-cancer cases, with the majority of non-cancer cases having no abnormalities. The classification task of each model is also somewhat different: our DBN models identify lung cancer individuals, whereas the LR model identifies cancerous nodules. Further standardization of the dataset and of the classification task across the different types of models would be needed for a fair comparison. Similar to other models, however, baseline information on smoking status, demographics, health status, history of cancer, and exposure risk factors was employed as input. We did not use quantitative imaging information: McWilliams et al. [41] utilized the maximum nodule size, the type of nodule, and the number of nodules per CT scan, resulting in a parsimonious multivariate logistic regression model that achieved an AUC higher than 0.90. In this study, we did not explicitly use nodule characteristics in our analysis, but rather included the radiologists' interpretation of the LDCT, which was based on nodules' overall growth between consecutive screening exams. We speculate that a nodule's rate of growth is a significant predictor of lung cancer, as both our models' and the physicians' predictions improve as information accrues. An exploration of how much "history" is needed in terms of interpretation and predictive power is also required: it may be that in this domain only the past n years of observation are required, rather than the entire longitudinal history. The NLST provided only three time points, so it is not possible to ascertain what amount of information would be optimal for temporal analysis of lung cancer screening data. Nodule features such as consistency, location, and size are likely strong predictors of lung cancer [44] and will be included in subsequent iterations of our model, in combination with automated segmentation methods [45], to automatically provide additional evidence for predicting diagnoses.

We recognize that there are some limitations to this work. For example, the screenings received by individuals in the NLST did not occur at exactly the same three discrete time points; rather, screening times were continuous, as individuals received their exams on different days. Given the nature of real-world lung screening programs, it is unlikely that observations will occur at a fixed frequency, for any number of reasons. As such, a DBN may ultimately not be well-suited to handle longer sequences of observation and clinical decision-making; alternative continuous-time temporal models will be explored as part of our future work. Also, the thresholds used in this work were selected to favor recall, providing a conservative prediction that errs on the side of detecting a cancer rather than missing a cancer case. Thus, the optimal threshold was considered to be one that minimized the number of missed cancer cases while maintaining an acceptable false positive rate. Threshold-determining methods that take into consideration factors such as utilities (e.g., quality of life) and monetary costs will be explored in the future.
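One simple way to operationalize such a recall-favoring threshold selection, sketched here as our illustration rather than the study's exact procedure, is to choose the highest-recall operating point on the ROC curve whose false positive rate stays below a chosen cap:

import numpy as np
from sklearn.metrics import roc_curve

def recall_favoring_threshold(y_true, y_prob, max_fpr=0.35):
    # Among operating points with fpr <= max_fpr, return the threshold
    # with the highest recall (tpr); max_fpr is a hypothetical cap.
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    ok = fpr <= max_fpr
    return thresholds[ok][np.argmax(tpr[ok])]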

6. Conclusion

In this work we explored five DBNs for lung cancer screening, constructed using the results of the NLST study, and demonstrated the challenges in providing screening recommendations with a DBN. We dealt with data imbalance, and introduced a training and testing procedure for DBNs in diseases with uncertain status, such as cancer, that uses a hidden cancer node during testing built on a cancer staging state-space model. Parameter reduction methods and the EM algorithm for parameterization with missing data were also explored. The DBNs aim to identify individuals who will go on to develop lung cancer based on data collected at baseline and radiologist interpretations of sequential (annual) imaging exams. All models achieved high AUC scores across all three screening points of the NLST, demonstrating performance comparable to the experts. As may be expected, the DBNs' performance improved over time, as more of the patient's history unfolded. Additionally, we examined the models' ability to predict future cancer cases in advance, finding that they identified some cases before the expert (i.e., cases that were deemed false positives by a radiologist but that, in later studies, proved to be cancer). This work is a first step in understanding how we may subsequently tailor the lung cancer screening process to optimize early detection while minimizing false positive findings.

Acknowledgments

The authors would like to acknowledge the contribution of Dr. James Sayre for the preparation of the statistical evaluations, and Dr. William Hsu for his comments on the paper. This work was supported by the National Institutes of Health (NIH) grants R01 LM011333, R01 NS076534, and R01 EB00362.

Appendix

A. Eligibility criteria

The eligibility criteria used to obtain the complete set of 25,846 cases from the CT arm of the NLST dataset were: 1) the participant met the NLST eligibility criteria (e.g., age between 55 and 74 years); 2) the participant's last contact status was either active or deceased; and 3) the participant's case was neither withdrawn nor lost.
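As a concrete illustration, such a cohort filter might be expressed in pandas as follows; the file and column names here are hypothetical placeholders, not the actual NLST data dictionary identifiers:

import pandas as pd

df = pd.read_csv("nlst_ct_arm.csv")                    # hypothetical extract
cohort = df[
    df["age"].between(55, 74)                          # criterion 1
    & df["last_contact"].isin(["active", "deceased"])  # criterion 2
    & ~df["withdrawn"] & ~df["lost"]                   # criterion 3
]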

B. Variables

Variables used from the NLST data and the associated categories/discretizations in the dynamic Bayesian network are as follows:

Variable Name | Description | Discretization

Age | Age of the individual | Under 60 years old; between 60 and 70 years old; more than 70 years old
Gender | Gender of the study subject | Male; female
Smoking status | The smoking status of the individual at the outset of the NLST | Yes; no
Body mass index (BMI) | Body mass index (weight relative to height) of the individual at the start of the NLST | Underweight; normal; overweight; obese
Cancer history | Specifies if the individual had a prior history of bladder, breast, cervical, colorectal, esophageal, larynx, lung, nasal, oral, pancreatic, pharynx, stomach, thyroid, or transitional cell cancer | Yes; no
Disease history | Boolean variable representing the individual's history of diagnosis of asthma (adult or childhood), COPD, emphysema, fibrosis of the lung, sarcoidosis, or tuberculosis | Yes; no
Work history | Represents work-based exposures related to the development of lung cancer, including asbestos, coal, and other chemicals | Yes; no
Family history of lung cancer | Boolean variable indicating if an immediate family member (parent, sibling, child) was previously diagnosed with lung cancer | Yes; no
Cancer | The state of the individual with respect to a suspected lung cancer, based on Figure 1 | Non-cancer; in situ; invasive cancer
LDCT | The outcome of the imaging study for the individual, based on radiologist interpretation | Screening with abnormalities detected and growth since prior study; screening with abnormalities detected but no growth or change since prior study; no abnormalities
Biopsy | The results of a diagnostic biopsy | Positive; negative
Death | Boolean variable giving the probability of death | Yes; no
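For illustration, the Age and BMI discretizations above could be produced with pandas as follows; the BMI bin edges are the standard underweight/normal/overweight/obese cut points (18.5, 25, 30), which we assume here, and the boundary handling of the age bins is our assumption:

import pandas as pd

df["age_cat"] = pd.cut(df["age"], bins=[0, 60, 70, 200], right=False,
                       labels=["under_60", "60_to_70", "over_70"])
df["bmi_cat"] = pd.cut(df["bmi"], bins=[0, 18.5, 25, 30, 100], right=False,
                       labels=["underweight", "normal", "overweight", "obese"])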

C. Prediction of future cancer cases

Table 6.

Top: contingency table evaluating the DBN predictions from the first screening against all cancer cases in the trial across the 10 random balanced test sets, including the cancer cases of the first screening. Middle: contingency table evaluating the DBN predictions from the second screening against the remaining cancer cases in the trial, including the cancer cases of the second screening. Bottom: contingency table evaluating the DBN predictions from the third screening against all remaining cancer cases in the trial, including the cancer cases of the third screening. The 150 true positives shown for DBN A at the first screening consist of the 51 true positives predicted by the model in the first-screening evaluation (which did not take the remaining cancer cases of the trial into consideration) plus 99 additional future cancer cases that the initial evaluation counted as false positives. In other words, the majority of cases flagged as false positives at the first screening prove, in later screenings, to be true cancer cases.

                  DBN A Predictions                       DBN B Predictions
First Screening   150 (tp); 124 (fp); 47 (fn); 77 (tn)    121 (tp); 64 (fp); 76 (fn); 121 (tn)
Second Screening  58 (tp); 20 (fp); 76 (fn); 172 (tn)     58 (tp); 19 (fp); 76 (fn); 172 (tn)
Third Screening   45 (tp); 13 (fp); 58 (fn); 175 (tn)     45 (tp); 13 (fp); 58 (fn); 175 (tn)
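The scoring logic described in the caption, in which a prediction at a given screen is evaluated against a case's eventual cancer status rather than its status at that screen, can be sketched as follows (our illustration):

def score_against_eventual_outcome(pred_positive, eventual_cancer):
    # pred_positive: model calls at the current screen (booleans);
    # eventual_cancer: whether the case was ever confirmed as cancer.
    tp = fp = fn = tn = 0
    for p, c in zip(pred_positive, eventual_cancer):
        if p and c:
            tp += 1          # an early call on a future cancer counts as tp
        elif p:
            fp += 1
        elif c:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn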

D. Calibration curves

Figure 7. The calibration curves of the DBN models for each screening, together with the Brier score; the Brier score decreases over successive screenings. Bottom: histogram of the positive cases over the probability of a positive biopsy for each screening.

E. The DBN networks

Figure 8. The network structure, with the strength of influence depicted by the thickness of the arrow connecting two variables. (a) The Forward-Arrow DBN without a NoisyMax gate; (b) the Forward-Arrow DBN with a NoisyMax gate as the cancer node at t = 0; (c) the Reversed-Arrow DBN; (d) the learned network with compositional nodes. The learned DBN without compositional variables is not depicted due to its highly complex structure.

Table 7.

Structure learning algorithm parameters.

Structure Learning
Dataset number of cases: 25046
Learning algorithm: Bayesian Search
Algorithm parameters:
  Max parent count: 8
  Iterations: 20
  Sample size: 50
  Seed: 0
  Link probability: 0.1
  Prior link probability: 0.001
Background knowledge:
  Forced arcs: 5
  Nodes assigned to tiers: 6
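As a hedged, open-source stand-in for the Bayesian Search algorithm used here, pgmpy's score-based hill climbing accepts analogous constraints (maximum parent count, forced arcs). The file, column, and edge names below are ours, and the score class is named BDeuScore or BDeu depending on the pgmpy version:

import pandas as pd
from pgmpy.estimators import HillClimbSearch, BDeuScore

data = pd.read_csv("nlst_variables.csv")            # hypothetical extract
est = HillClimbSearch(data)
dag = est.estimate(
    scoring_method=BDeuScore(data),
    max_indegree=8,                                 # cf. "Max parent count: 8"
    fixed_edges={("SmokingStatus", "Cancer")},      # cf. forced arcs
)
print(sorted(dag.edges()))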

F. Statistics

We present the results of the performance of each DBN structure over the same random balanced test sets of 400 cases (200 cancer and 200 non-cancer cases). All DBNs were trained on balanced training sets of 400 cases (200 cancer and 200 non-cancer cases). The thresholds used in these evaluations were 0.04, 0.21, and 0.25 for the first, second, and third screenings, respectively.
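The confidence intervals reported in the tables below could be produced in several ways; the exact procedure is not restated here, but a common approach is a nonparametric bootstrap over test cases, sketched as our illustration:

import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_prob, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if y_true[idx].min() == y_true[idx].max():
            continue                                # resample lost a class
        aucs.append(roc_auc_score(y_true[idx], y_prob[idx]))
    return np.quantile(aucs, [alpha / 2, 1 - alpha / 2])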

F.1. The Forward-Arrow DBN without a NoisyMax gate

The Forward-Arrow DBN without a NoisyMax gate was not parameterized using the EM algorithm. The CPTs of all nodes were estimated empirically from the dataset observations, except for the Biopsy node (abstracted from the literature) and the Death node (death rate at baseline), which were fixed, and the Cancer node at baseline. The Cancer variable would be impossible to parameterize without imposing some domain assumptions about an individual's cancer state, as this node comprises 2,304 parameters over 3 states (Non-cancer, In Situ, Invasive Cancer); the data do not contain sufficient observations to cover every parameter (i.e., every combination of parent states and node state).

We dealt with this parameterization problem using two assumptions. First, we assumed that for every parent-state combination with no instances in the In Situ or Invasive Cancer states, the majority of instances are in the Non-cancer state. Second, when we had data instances for the In Situ or Invasive Cancer states, we computed the probabilities of those states and assigned the remaining probability mass to the Non-cancer state (i.e., the probability complement). We pursued this parameterization approach because most existing training algorithms do not support this type of missing data (e.g., deceased patients with no observations in subsequent screenings). EM is better suited to missing values such as a missing age or BMI, where it would impute a statistical estimate of the value; we believe it would be undesirable to estimate the disease status of a deceased individual in subsequent screenings, as individuals who died or were diagnosed with cancer were removed from the screening process of the trial.
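A minimal sketch of this empirical parameterization under the two stated assumptions follows; the function and column names are ours:

def estimate_cancer_cpt(df, parent_cols, state_col="cancer_state"):
    # df is a pandas DataFrame of observed cases. Assumption 2: keep the
    # observed In Situ / Invasive frequencies and give the remaining
    # probability mass to Non-cancer (the probability complement).
    cpt = {}
    for combo, grp in df.groupby(parent_cols):
        p_in_situ = (grp[state_col] == "in_situ").mean()
        p_invasive = (grp[state_col] == "invasive").mean()
        cpt[combo] = {"non_cancer": 1.0 - p_in_situ - p_invasive,
                      "in_situ": p_in_situ,
                      "invasive": p_invasive}
    # Assumption 1: parent-state combinations absent from the data default
    # to P(Non-cancer) = 1 when the CPT is later queried.
    return cpt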

Table 8.

The tp, fp, fn, and tn rates and counts of the DBN for each screen. The thresholds used were 0.04, 0.21, and 0.25 for screens 1, 2, and 3, respectively.

The Forward-Arrow DBN without a NoisyMax gate
          Screen 1   Screen 2   Screen 3
tp rate   0.927      0.903      0.854
fp rate   0.347      0.228      0.139
fn rate   0.073      0.097      0.146
tn rate   0.653      0.772      0.861
tp count  51         28         35
fp count  119        67         35
fn count  4          3          6
tn count  224        227        216
Table 9.

The reported AUCs of the ROC and the C.I. of the AUCs for each screening.

                  AUC     C.I. of AUC     C.I. width
First Screening   0.789   0.774 – 0.804   0.0304
Second Screening  0.844   0.819 – 0.869   0.0496
Third Screening   0.884   0.863 – 0.906   0.0435

F.2. The Forward-Arrow DBN with a NoisyMax gate

Table 10.

The tp, fp, fn, and tn rates and counts of the DBN for each screen. The thresholds used were 0.04, 0.21, and 0.25 for screens 1, 2, and 3, respectively.

The Forward-Arrow DBN with a NoisyMax gate
          Screen 1   Screen 2   Screen 3
tp rate   0.96       0.87       0.83
fp rate   0.65       0.17       0.10
fn rate   0.04       0.13       0.17
tn rate   0.35       0.83       0.90
tp count  53         27         35
fp count  221        50         24
fn count  2          4          7
tn count  121        244        227
Table 11.

The reported AUCs of the ROC and the C.I. of the AUCs for each screening.

                  AUC     C.I. of AUC     C.I. width
First Screening   0.778   0.757 – 0.800   0.043
Second Screening  0.857   0.834 – 0.880   0.046
Third Screening   0.887   0.869 – 0.905   0.035

F.3. Reversed-Arrow DBN

Table 12.

The tp, fp, fn, and tn rates and counts of the DBN for each screen. The thresholds used were 0.04, 0.21, and 0.25 for screens 1, 2, and 3, respectively.

Reversed-Arrow DBN
          Screen 1   Screen 2   Screen 3
tp rate   0.93       0.87       0.83
fp rate   0.39       0.17       0.10
fn rate   0.07       0.13       0.17
tn rate   0.61       0.83       0.90
tp count  51         27         34
fp count  134        50         24
fn count  4          4          7
tn count  208        244        227
Table 13.

The reported AUCs of the ROC and the C.I. of the AUCs for each screening.

                  AUC     C.I. of AUC     C.I. width
First Screening   0.798   0.776 – 0.821   0.045
Second Screening  0.858   0.832 – 0.884   0.052
Third Screening   0.887   0.866 – 0.907   0.041

F.4. Learned DBN with compositional variables (structure learning)

Table 14.

The tp, fp, fn, and tn rates and counts of the DBN for each screen. The thresholds used were 0.04, 0.21, and 0.25 for screens 1, 2, and 3, respectively.

Learned DBN with compositional variables
          Screen 1   Screen 2   Screen 3
tp rate   0.93       0.87       0.81
fp rate   0.36       0.18       0.10
fn rate   0.07       0.13       0.19
tn rate   0.64       0.82       0.90
tp count  51         27         34
fp count  122        53         26
fn count  4          4          8
tn count  220        241        225
Table 15.

The reported AUCs of the ROC and the C.I. of the AUCs for each screening.

                  AUC     C.I. of AUC     C.I. width
First Screening   0.790   0.769 – 0.810   0.040
Second Screening  0.862   0.839 – 0.886   0.047
Third Screening   0.877   0.858 – 0.896   0.038

F.5. Learned DBN without compositional variables

Table 16.

The tp, fp, fn, and tn rates and counts of the DBN for each screen. The thresholds used were 0.04, 0.21, and 0.25 for screens 1, 2, and 3, respectively.

Learned DBN without compositional variables
          Screen 1   Screen 2   Screen 3
tp rate   0.95       0.81       0.83
fp rate   0.42       0.17       0.11
fn rate   0.05       0.19       0.17
tn rate   0.58       0.83       0.89
tp count  52         26         34
fp count  145        51         28
fn count  3          6          7
tn count  198        244        222
Table 17.

The reported AUCs of the ROC and the C.I. of the AUCs for each screening.

                  AUC     C.I. of AUC     C.I. width
First Screening   0.751   0.654 – 0.849   0.195
Second Screening  0.853   0.832 – 0.875   0.043
Third Screening   0.878   0.859 – 0.897   0.038

F.6. Naïve Bayes (NB)

Table 18.

The tp, fp, fn, and tn rates and counts of the Naïve Bayes model for each screen. The thresholds used were 0.04, 0.21, and 0.25 for screens 1, 2, and 3, respectively.

Naïve Bayes
          Screen 1   Screen 2   Screen 3
tp rate   0.927      0.871      0.833
fp rate   0.392      0.170      0.096
fn rate   0.073      0.129      0.167
tn rate   0.608      0.830      0.904
tp count  51         27         35
fp count  134        50         24
fn count  4          4          7
tn count  208        244        227
Table 19.

The reported AUCs of the ROC and the C.I. of the AUCs for each screening.

                  AUC     C.I. of AUC     C.I. width
First Screening   0.799   0.777 – 0.821   0.044
Second Screening  0.865   0.844 – 0.885   0.041
Third Screening   0.886   0.866 – 0.907   0.041

G. Probability distributions over each screen for confirmed cancer and non-cancer cases

G.1. The Forward-Arrow DBN without a NoisyMax gate

Figure 9. The combined probability distributions for a positive biopsy for all cases across the 10 random test sets, for each screen. Red indicates all confirmed cancer cases in the trial, irrespective of time. Blue indicates the confirmed non-cancer cases. The 3 subplots depict the probability of a positive biopsy in each of the three screening points of the trial. With successive screenings we can see that the probability of a positive biopsy for non-cancer (blue) and cancer (red) cases tends to move towards the left and right side of each subplot, respectively. The solid black lines represent the thresholds chosen to discriminate cancer cases from non-cancer cases in the DBN predictions.

G.2. The Forward-Arrow DBN with a NoisyMax gate

Figure 10. The combined probability distributions for a positive biopsy for all cases across the 10 random test sets, for each screen. Red indicates all confirmed cancer cases in the trial, irrespective of time. Blue indicates the confirmed non-cancer cases. The 3 subplots depict the probability of a positive biopsy in each of the three screening points of the trial. With successive screenings we can see that the probability of a positive biopsy for non-cancer (blue) and cancer (red) cases tends to move towards the left and right side of each subplot, respectively. The solid black lines represent the thresholds chosen to discriminate cancer cases from non-cancer cases in the DBN predictions.

G.3. Reversed-Arrow DBN

Figure 11. The combined probability distributions for a positive biopsy for all cases across the 10 random test sets, for each screen. Red indicates all confirmed cancer cases in the trial, irrespective of time. Blue indicates the confirmed non-cancer cases. The 3 subplots depict the probability of a positive biopsy in each of the three screening points of the trial. With successive screenings we can see that the probability of a positive biopsy for non-cancer (blue) and cancer (red) cases tends to move towards the left and right side of each subplot, respectively. The solid black lines represent the thresholds chosen to discriminate cancer cases from non-cancer cases in the DBN predictions.

G.4. Learned DBN with compositional variables

Figure 12. The combined probability distributions for a positive biopsy for all cases across the 10 random test sets, for each screen. Red indicates all confirmed cancer cases in the trial, irrespective of time. Blue indicates the confirmed non-cancer cases. The 3 subplots depict the probability of a positive biopsy in each of the three screening points of the trial. With successive screenings we can see that the probability of a positive biopsy for non-cancer (blue) and cancer (red) cases tends to move towards the left and right side of each subplot, respectively. The solid black lines represent the thresholds chosen to discriminate cancer cases from non-cancer cases in the DBN predictions.

G.5. Learned DBN without compositional variables

Figure 13. The combined probability distributions for a positive biopsy for all cases across the 10 random test sets, for each screen. Red indicates all confirmed cancer cases in the trial, irrespective of time. Blue indicates the confirmed non-cancer cases. The 3 subplots depict the probability of a positive biopsy in each of the three screening points of the trial. With successive screenings we can see that the probability of a positive biopsy for non-cancer (blue) and cancer (red) cases tends to move towards the left and right side of each subplot, respectively. The solid black lines represent the thresholds chosen to discriminate cancer cases from non-cancer cases in the DBN predictions.

G.6. 10-fold cross validation of the Forward-Arrow DBN with a NoisyMax gate

Figure 14. The combined probability distributions for a positive biopsy for all cases across the 10 random test sets, for each screen. Red indicates all confirmed cancer cases in the trial, irrespective of time. Blue indicates the confirmed non-cancer cases. The 3 subplots depict the probability of a positive biopsy in each of the three screening points of the trial. With successive screenings we can see that the probability of a positive biopsy for non-cancer (blue) and cancer (red) cases tends to move towards the left and right side of each subplot, respectively. The solid black lines represent the thresholds chosen to discriminate cancer cases from non-cancer cases in the DBN predictions.

G.7. Naïve Bayes

Figure 15. The combined probability distributions for a positive biopsy for all cases across the 10 random test sets, for each screen. Red indicates all confirmed cancer cases in the trial, irrespective of time. Blue indicates the confirmed non-cancer cases. The 3 subplots depict the probability of a positive biopsy in each of the three screening points of the trial. With successive screenings we can see that the probability of a positive biopsy for non-cancer (blue) and cancer (red) cases tends to move towards the left and right side of each subplot, respectively. The solid black lines represent the thresholds chosen to discriminate cancer cases from non-cancer cases in the model predictions.

H. F-score curves

H.1. The Forward-Arrow DBN without a NoisyMax gate

Figure 16. F-score over recall curve.

H.2. The Forward-Arrow DBN with a NoisyMax gate

Figure 17. F-score over recall curve.

H.3. Reversed-Arrow DBN

Figure 18. F-score over recall curve.

H.4. Learned DBN with compositional variables

Figure 19. F-score over recall curve.

H.5. Learned DBN without compositional variables

Figure 20. F-score over recall curve.

H.6. 10-fold cross validation of the Forward-Arrow DBN with a NoisyMax gate

Figure 21. F-score over recall curve.

H.7. Naïve Bayes (NB)

Figure 22. F-score over recall curve.

I. PR curves of the original models

I.1. The Forward-Arrow DBN without a NoisyMax gate

Figure 23. The precision-recall curve.

I.2. The Forward-Arrow DBN with a NoisyMax gate

Figure 24. The precision-recall curve.

I.3. Reversed-Arrow DBN

Figure 25. The precision-recall curve.

I.4. Learned DBN with compositional variables

Figure 26. The precision-recall curve.

I.5. Learned DBN without compositional variables

Figure 27. The precision-recall curve.

I.6. 10-fold cross validation of the Forward-Arrow DBN with a NoisyMax gate

Figure 28. The precision-recall curve.

I.7. Naïve Bayes (NB)

Figure 29. The precision-recall curve.

J. Missing value statistics

Table 20.

Missing value counts for the parent nodes.

                    Age     BMI     Family    Disease   Cancer    Smoking   Work      Gender
                                    History   History   History   Status    Exposure
Count    Present    25846   25573   25846     25846     25846     25846     25846     25846
         Missing    0       93      0         0         0         0         0         0
Fraction Present    1       0.9964  1         1         1         1         1         1
         Missing    0       0.0036  0         0         0         0         0         0

Table 21.

Missing values for the LDCT outcome nodes. The missing values comprise individuals who died, individuals who were diagnosed with cancer and entered treatment, and individuals who missed a screening exam.

                    LDCT Screen 1 Outcome   LDCT Screen 2 Outcome   LDCT Screen 3 Outcome
Count    Present    25827                   24335                   23696
         Missing    19                      1511                    2150
Fraction Present    0.9993                  0.942                   0.917
         Missing    0.0007                  0.058                   0.083

Footnotes

1. Here, learned DBNs represent models generated through the use of structure learning methods.


References
