Estimation of Preclinical State Onset Age and Sojourn Time for Heavy Smokers in Lung Cancer

Dongfeng Wu; Shesh N Rai; Albert Seow

doi:10.4310/21-sii696

Stat Interface. Author manuscript; available in PMC: 2022 Aug 5.

Published in final edited form as: Stat Interface. 2022;15(3):349–358. doi: 10.4310/21-sii696

PMCID: PMC9355113 NIHMSID: NIHMS1734205 PMID: 35936652

Abstract

Estimation of three key parameters, namely the onset age of the preclinical state, the sojourn time, and the screening sensitivity, is critical in cancer screening, since all other quantities of interest are functions of these three. Because sensitivity depends on how long one has been in the preclinical state relative to the total sojourn time, this project used a novel link function connecting sensitivity with time in the preclinical state, together with the likelihood method. Markov chain Monte Carlo (MCMC) simulations and maximum likelihood estimation (MLE) were used to estimate the key parameters separately for male and female heavy smokers in the low-dose computed tomography group of the National Lung Screening Trial. Sensitivity for male and female heavy smokers was 0.883 and 0.915, respectively, at the onset of the preclinical state, and increased to 0.972 and 0.981 at the end. The mean age at transition into the preclinical state was 70.94 and 71.15 years for male and female heavy smokers, respectively; 90% of heavy smokers at risk for lung cancer would enter the preclinical state in the age interval (55.7, 85.8) for males and (54.2, 87.7) for females, and the transition peaked around age 69 for both genders. The mean sojourn time in the preclinical state was 1.43 and 1.49 years, and the 99% credible intervals for the sojourn time were (0.21, 2.96) and (0.37, 2.69) years, for male and female heavy smokers, respectively. Based on these results, low-dose CT screening for heavy smokers should start at age 55 and end before age 85. These findings provide important information for policy makers.

Keywords and phrases: Sensitivity, Sojourn time, Transition density, Low-dose computed tomography, Heavy smoker, Cancer screening

1. INTRODUCTION

Lung cancer is the leading cause of cancer death and accounts for about 22.4% of all expected cancer deaths in the United States in 2020 [1]. In a recent draft recommendation, the U.S. Preventive Services Task Force recommends annual screening for lung cancer with low-dose computed tomography (CT) in adults aged 50 to 80 years who have a 20 pack-year smoking history and currently smoke or have quit smoking within the past 15 years [2]. However, to better decide when to initiate lung cancer screening for smokers, it is important to know at what age they enter the preclinical state, in which an asymptomatic individual unknowingly has a disease that a screening exam can detect. It is equally important to know how long the preclinical state lasts, since the goal of screening is to catch the disease before it becomes symptomatic.

Several major lung cancer screening studies have been carried out in North America: the Mayo Lung Project, the Johns Hopkins Study, the Memorial Sloan-Kettering Study, the Early Lung Cancer Action Project, the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO), and the National Lung Screening Trial (NLST) [3]-[23]. The recently completed NLST was designed to compare two screening modalities for early detection among heavy smokers: low-dose helical computed tomography (LDCT) versus standard chest X-rays [16, 17].

We briefly review the cancer screening data and the statistical issues involved. A cohort of initially asymptomatic individuals is enrolled in a screening program to detect the presence of a specific cancer. We use the commonly adopted progressive disease model [24], in which a clinical disease is assumed to progress through three states: S₀ → S_p → S_c. S₀ denotes the disease-free state, or the early state in which the disease cannot be detected; S_p denotes the preclinical state, in which an asymptomatic individual unknowingly has a disease that a screening exam can detect; and S_c denotes the clinical state, in which the disease manifests itself through clinical symptoms. The model describes the natural history of tumor development, where each state is a time period, not a time point. The goal of a screening program is to detect the cancer in the preclinical state S_p.

Three key parameters uniquely determine the screening process, and all other quantities can be expressed as functions of them: the screening sensitivity, the sojourn time in the preclinical state, and the transition density from the disease-free state to the preclinical state. Sensitivity is the probability that a screening result is positive, given that an individual is in the preclinical state S_p. Sojourn time is the duration of the preclinical state S_p. The transition density from the disease-free state S₀ to the preclinical state S_p describes the age at which one enters the preclinical state (equivalently, how long one stays in the disease-free state from birth). The nature of data collection in a screening program makes it impossible to observe the onset of either S_p or S_c; therefore, the sojourn time and the transition density are difficult to estimate. Because some people stay cancer-free throughout their lifetime, the transition density is not a proper probability density function (PDF) but a sub-PDF: the total area under the density curve (i.e., the lifetime risk of making a transition into the preclinical state) is less than 1.

The three key parameters can be estimated from screening data with the likelihood method in a parametric model [20, 21]. We have previously applied this method to several lung cancer screening datasets, with sensitivity modeled as a function of age [3, 9, 10, 22]. However, preliminary work on the NLST and PLCO studies showed that screening sensitivity barely changes with age [13, 19]. In fact, sensitivity depends more on how long an individual has stayed in the preclinical state, or how far the tumor cells have developed. These tumors are very hard to diagnose until they have reached a macroscopic size, which usually corresponds to a cell population of about 10⁹ and at least 31 generations from the first post-malignant division [25]. Based on these facts, we model sensitivity as a function of the ratio of the time spent in the preclinical state at diagnosis to the sojourn time. In Kim and Wu (2016), we explored several parametric models and obtained useful information [12]. In this project we modify the sensitivity link function to build a better model of the NLST low-dose CT group data for male and female heavy smokers. This provides better modeling and more accurate estimates of the age at which male or female smokers enter the preclinical state and of how long they stay in it. The improved model also makes it possible to estimate the sensitivity at the onset and at the end of the preclinical state. This project therefore lays a foundation for future research on cancer screening schedules, such as when to initiate low-dose CT screening for lung cancer in heavy smokers.

2. METHODS AND MATERIAL

2.1. The NLST Low-Dose CT Screening Data

The NLST was designed to compare two screening modalities for early detection of lung cancer among heavy smokers: low-dose helical computed tomography (CT) versus standard chest X-rays [16, 17]. Spiral low-dose CT uses X-rays to obtain a multiple-image scan of the entire chest, while a standard chest X-ray produces a single image of the whole chest. Participants were required to have a smoking history of at least 30 pack-years and to be current or former heavy smokers without signs, symptoms, or history of lung cancer. About 54,000 male and female heavy smokers aged 55 to 74 at initial screening were enrolled at thirty-three centers across the US between August 2002 and April 2004, and were randomized in equal proportions to either the chest X-ray arm or the low-dose CT arm. Participants in each arm underwent three annual screening exams. In total, 15,537 men and 10,769 women were assigned to the low-dose CT arm, and 15,396 men and 10,634 women to the X-ray arm. If any screening result was abnormal, the screen was considered positive and a biopsy was performed as a diagnostic test. Participants were followed for a median of 6.5 years.

The data needed in the likelihood function for estimating the three key parameters are simple. Participants were divided into subgroups by their age at initial screening; for each entry age t₀ and each screening exam i, we recorded the total number of people screened n_i, the number of confirmed cancer cases s_i, and the number of interval incident cases r_i before the next exam. Participants who dropped out during the program were included in the exams they had attended. These data were extracted from the NLST-CT arm for male and female heavy smokers.

2.2. Probability Formulas and The Likelihood Function

Consider a cohort of initially asymptomatic individuals who enroll in a screening program. Let β(s|T) be the sensitivity, where s is the length of time one has stayed in S_p, and T is the (total) sojourn time in S_p, a random variable. Intuitively, β increases as s increases and decreases as T increases. We define w(t) as the probability density of making a transition from S₀ to S_p at age t, a sub-PDF. Let q(x) be the PDF of the sojourn time in S_p, and let $Q(z) = \int_{z}^{\infty} q(x)\,dx$ be the survival function of the sojourn time. We use female smokers to set up the model; the results apply to male smokers as well.

Assume that, for a subgroup of women who were all aged t₀ at study entry, there are K ordered screening exams occurring at ages $t_0 < t_1 < \cdots < t_{K-1}$. Define the i-th screening interval as $(t_{i-1}, t_i)$, i = 1, 2, …, K. The i-th generation consists of the individuals who enter S_p during this interval. The 0-th generation includes all who enter S_p before t₀, and we let $t_{-1} = 0$. Let $n_{i, t_{0}}$ be the total number of individuals examined at the i-th screening, $s_{i, t_{0}}$ the number of cases diagnosed at the i-th screening, and $r_{i, t_{0}}$ the number of interval incident cases within $(t_{i-1}, t_i)$.

Let $D_{k, t_{0}}$ be the probability that an individual is in the preclinical state S_p and is diagnosed at the k-th scheduled exam. Let $I_{k, t_{0}}$ be the probability of being an interval incident case in the k-th screening interval. We have derived the following formulas [12]:

$$D_{1, t_{0}} = \int_{0}^{t_{0}} w(x) \int_{t_{0} - x}^{\infty} q(t)\, \beta(t_{0} - x \mid t)\, dt\, dx. \tag{1}$$

$$D_{k, t_{0}} = \sum_{i=0}^{k-2} \int_{t_{i-1}}^{t_{i}} w(x) \int_{t_{k-1}-x}^{\infty} q(t) \left\{\prod_{j=i}^{k-2} \left[1 - \beta(t_{j} - x \mid t)\right]\right\} \beta(t_{k-1} - x \mid t)\, dt\, dx + \int_{t_{k-2}}^{t_{k-1}} w(x) \int_{t_{k-1}-x}^{\infty} q(t)\, \beta(t_{k-1} - x \mid t)\, dt\, dx, \quad k = 2, \dots, K. \tag{2}$$

$$I_{k, t_{0}} = \sum_{i=0}^{k-1} \int_{t_{i-1}}^{t_{i}} w(x) \int_{t_{k-1}-x}^{t_{k}-x} q(t) \left\{\prod_{j=i}^{k-1} \left[1 - \beta(t_{j} - x \mid t)\right]\right\} dt\, dx + \int_{t_{k-1}}^{t_{k}} w(x) \left[1 - Q(t_{k} - x)\right] dx, \quad k = 1, \dots, K. \tag{3}$$

The double integrals arise because the sensitivity depends on both the sojourn time and the time already spent in the preclinical state.
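To make Eqs. (1)-(3) concrete, here is a minimal Python sketch that evaluates $D_{k,t_0}$ and $I_{k,t_0}$ with a plain composite midpoint rule. It is not the authors' C implementation. Our assumptions: the infinite upper limit of the sojourn-time integral is truncated at a hypothetical cap `t_upper` (20 years), the grid size `n` is arbitrary, and `ages` lists $t_0, \dots, t_K$, with the last entry closing the K-th screening interval (the value 63 below is hypothetical). Equation (1) and the final term of Eq. (2) are handled by the same loop, because the newest generation simply has an empty product of missed screens. The example inputs use the parametric forms of Section 3.1 at the female posterior means in Table 1.

```python
import math
from typing import Callable, Sequence

Density = Callable[[float], float]
Sensitivity = Callable[[float, float], float]  # beta(s, T): s = time in S_p, T = sojourn time


def midpoint(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Composite midpoint rule for the integral of f over (a, b); 0 if the range is empty."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(f(a + (m + 0.5) * h) for m in range(n))


def age(ages: Sequence[float], i: int) -> float:
    """t_i, using the convention t_{-1} = 0."""
    return 0.0 if i < 0 else ages[i]


def prob_detected(k: int, ages: Sequence[float], w: Density, q: Density,
                  beta: Sensitivity, t_upper: float = 20.0, n: int = 200) -> float:
    """D_{k,t0} from Eqs. (1)-(2); generation i = k-1 has an empty product."""
    tk1 = age(ages, k - 1)
    total = 0.0
    for i in range(k):  # generations 0, ..., k-1
        def outer(x: float, i: int = i) -> float:
            def inner(t: float) -> float:
                missed = 1.0
                for j in range(i, k - 1):  # missed at exams t_i, ..., t_{k-2}
                    missed *= 1.0 - beta(age(ages, j) - x, t)
                return q(t) * missed * beta(tk1 - x, t)
            return w(x) * midpoint(inner, tk1 - x, t_upper, n)
        lo = max(age(ages, i - 1), tk1 - t_upper)  # below this the inner range is empty
        total += midpoint(outer, lo, age(ages, i), n)
    return total


def prob_interval(k: int, ages: Sequence[float], w: Density, q: Density,
                  Q: Density, beta: Sensitivity, t_upper: float = 20.0,
                  n: int = 200) -> float:
    """I_{k,t0} from Eq. (3); needs ages[k], the end of the k-th interval."""
    tk1, tk = age(ages, k - 1), age(ages, k)
    total = 0.0
    for i in range(k):  # generations 0, ..., k-1, missed at t_i, ..., t_{k-1}
        def outer(x: float, i: int = i) -> float:
            def inner(t: float) -> float:
                missed = 1.0
                for j in range(i, k):
                    missed *= 1.0 - beta(age(ages, j) - x, t)
                return q(t) * missed
            return w(x) * midpoint(inner, tk1 - x, tk - x, n)
        lo = max(age(ages, i - 1), tk1 - t_upper)
        total += midpoint(outer, lo, age(ages, i), n)
    # generation k: enters S_p after t_{k-1} and surfaces clinically before t_k
    total += midpoint(lambda x: w(x) * (1.0 - Q(tk - x)), tk1, tk, n)
    return total


# Illustrative inputs: Section 3.1 forms at the female posterior means (Table 1).
mu, s2, lam, alpha, b0, b1 = 4.254, 0.021, 0.194, 4.238, 2.972, 1.223
w = lambda t: 0.3 / (math.sqrt(2 * math.pi * s2) * t) * math.exp(-(math.log(t) - mu) ** 2 / (2 * s2))
Q = lambda x: math.exp(-lam * x ** alpha)
q = lambda x: lam * alpha * x ** (alpha - 1) * Q(x)
beta = lambda s, T: 1.0 / (1.0 + math.exp(-b0 - b1 * s / T))
ages = [60.0, 61.0, 62.0, 63.0]  # hypothetical: t0 = 60, annual exams, 63 closes interval 3
print(prob_detected(1, ages, w, q, beta), prob_interval(1, ages, w, q, Q, beta))
```

With β ≡ 1 (a perfect screen), $D_{2,t_0} + I_{1,t_0}$ should collapse to $\int_{t_0}^{t_1} w(x)\,dx$, which makes a convenient correctness check for any implementation of these formulas.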

For each entry-age group t₀, the likelihood is proportional to the multinomial probability:

$$L(\cdot \mid t_{0}) = \prod_{k=1}^{3} D_{k, t_{0}}^{s_{k, t_{0}}}\, I_{k, t_{0}}^{r_{k, t_{0}}} \left(1 - D_{k, t_{0}} - I_{k, t_{0}}\right)^{n_{k, t_{0}} - s_{k, t_{0}} - r_{k, t_{0}}}$$

The initial screening age t₀ ranged from 55 to 74 in the NLST low-dose CT arm; the few cases in the age-75 group were excluded. Therefore, the total likelihood is proportional to:

$$L = \prod_{t_{0}=55}^{74} L(\cdot \mid t_{0}) = \prod_{t_{0}=55}^{74} \prod_{k=1}^{3} D_{k, t_{0}}^{s_{k, t_{0}}}\, I_{k, t_{0}}^{r_{k, t_{0}}} \left(1 - D_{k, t_{0}} - I_{k, t_{0}}\right)^{n_{k, t_{0}} - s_{k, t_{0}} - r_{k, t_{0}}}$$
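Once $D_{k,t_0}$ and $I_{k,t_0}$ are available (for example from numerical integration of Eqs. (1)-(3)), the log of this likelihood is a simple accumulation over entry-age groups and exams; the multinomial coefficient is dropped because the likelihood is only needed up to proportionality. In this sketch the `ExamCounts` container and the counts in the usage line are hypothetical, not NLST data.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ExamCounts:
    """Counts for exam k within one entry-age group t0 (hypothetical container)."""
    n: int  # n_{k,t0}: number screened at exam k
    s: int  # s_{k,t0}: screen-detected cases at exam k
    r: int  # r_{k,t0}: interval cases in the k-th screening interval


def count_log(count: int, p: float) -> float:
    """count * log(p), with the convention 0 * log(0) = 0."""
    if count == 0:
        return 0.0
    return count * math.log(p) if p > 0 else -math.inf


def log_likelihood(counts: Dict[int, List[ExamCounts]],
                   probs: Dict[int, List[Tuple[float, float]]]) -> float:
    """log L = sum over t0 and k of s log D + r log I + (n - s - r) log(1 - D - I)."""
    total = 0.0
    for t0, exams in counts.items():
        for c, (d, i) in zip(exams, probs[t0]):
            total += (count_log(c.s, d) + count_log(c.r, i)
                      + count_log(c.n - c.s - c.r, 1.0 - d - i))
    return total


# Hypothetical counts for a single entry-age group and a single exam.
counts = {60: [ExamCounts(n=1000, s=10, r=2)]}
print(log_likelihood(counts, {60: [(0.010, 0.002)]}))
```

In a full fit, `probs` would be recomputed from θ at every optimizer or sampler step, which is where the double integrals make the computation expensive.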

3. APPLICATION TO THE NLST CT DATA

We applied the likelihood method to the NLST low-dose CT arm data for male and female heavy smokers separately.

3.1. Parametric Link Functions

In our previous work [13, 20, 22], the sensitivity was modeled as β(t) = [1 + exp(−b₀ − b₁ × (t − m))]⁻¹, where t is age at diagnosis, m is the average age at study entry, and (b₀, b₁) are parameters to be estimated. For this project, we tried several sensitivity functions, based on the fact that the longer one has stayed in the preclinical state, the easier the tumor is to detect. However, there are different possible paths for the sensitivity to move from a lower value β₀ at the onset of the preclinical state to a higher value β₁ at its end.

We use the ratio x = s/T to measure tumor growth, where s is the time one has stayed in the preclinical state S_p, T is the total sojourn time in S_p, and 0 ≤ s ≤ T. There are two major types of paths: above the straight line connecting the sensitivity at x = 0 and at x = 1, or below it. We tried the following two paths, with 0 ≤ x ≤ 1, and their variations (with different parameters) to generate pseudo screening data, and compared the results with the NLST low-dose CT data; see Figure 1.

$$V_{1}: \beta_{1}(s \mid T) = b_{0} + (b_{1} - b_{0})(2^{x} - 1), \quad 0.5 \leq b_{0} < b_{1} \leq 1,$$

$$V_{2}: \beta_{2}(s \mid T) = \frac{1}{1 + \exp(-b_{0} - b_{1} x)}, \quad b_{0} \geq 0,\ b_{1} \geq 0.$$

Data generated under model V₁ usually had fewer screen-detected cases s_i and more interval cases r_i at the i-th screening than the observed NLST-CT data. We therefore chose model V₂ (above the straight line) to model sensitivity in the likelihood.

Figure 1. — Moving path of sensitivity β in the preclinical state
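The two candidate paths can be compared directly: at the same fraction x = s/T, V₁ lies below the straight line joining its endpoints and V₂ lies above it. This Python sketch writes V₁ with slope factor (b₁ − b₀), so that β(0) = b₀ and β(1) = b₁. The parameter values are illustrative only: the V₂ values are the female MLE from Table 1, the V₁ values are arbitrary choices satisfying 0.5 ≤ b₀ < b₁ ≤ 1, and the helper `chord` is ours.

```python
import math


def beta_v1(s: float, T: float, b0: float, b1: float) -> float:
    """Path V1: b0 + (b1 - b0)(2**x - 1), convex in x = s/T (below the chord)."""
    x = s / T
    return b0 + (b1 - b0) * (2.0 ** x - 1.0)


def beta_v2(s: float, T: float, b0: float, b1: float) -> float:
    """Path V2: logistic in x = s/T; concave for b0, b1 >= 0 (above the chord)."""
    x = s / T
    return 1.0 / (1.0 + math.exp(-b0 - b1 * x))


def chord(path, x: float, b0: float, b1: float) -> float:
    """Straight line joining a path's values at x = 0 and x = 1 (evaluated with T = 1)."""
    start, end = path(0.0, 1.0, b0, b1), path(1.0, 1.0, b0, b1)
    return start + (end - start) * x


# Halfway through a 2-year sojourn (x = 0.5).
print(beta_v1(1.0, 2.0, 0.90, 0.99), chord(beta_v1, 0.5, 0.90, 0.99))
print(beta_v2(1.0, 2.0, 3.139, 1.860), chord(beta_v2, 0.5, 3.139, 1.860))
```

Because the logistic function is concave for non-negative arguments, the constraints b₀ ≥ 0 and b₁ ≥ 0 are what keep V₂ above the chord on [0, 1].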

For the transition density w(t) and the sojourn time distribution q(x), we use the same parametric functions as in Liu et al. (2015) [13]:

$$w(t) = \frac{0.3}{\sqrt{2\pi}\,\sigma t} \exp\left\{-\frac{(\log t - \mu)^{2}}{2\sigma^{2}}\right\}, \quad \sigma > 0,$$

$$Q(x) = \exp(-\lambda x^{\alpha}), \quad q(x) = \lambda \alpha x^{\alpha - 1} Q(x), \quad \lambda > 0,\ \alpha > 0.$$

There are six parameters θ = (b₀, b₁, μ, σ², λ, α) in the model. We used maximum likelihood estimation (MLE) and Markov chain Monte Carlo (MCMC) simulation to estimate θ separately for male and female heavy smokers in the NLST CT data.
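The two parametric forms can be checked numerically. The sketch below (Python, standard library only) codes w(t), Q(x), and q(x) as written above and integrates them with a midpoint rule, plugging in the female posterior means from Table 1 as example values. It shows that w(t) integrates to 0.3, the implied lifetime risk of entering S_p (hence a sub-PDF), while q(x) integrates to 1. The integration ranges and grid size are our choices.

```python
import math


def w(t: float, mu: float, sigma2: float) -> float:
    """Transition sub-density: 0.3 times a lognormal(mu, sigma2) PDF in age t."""
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    if t <= 0:
        return 0.0
    return 0.3 / (math.sqrt(2 * math.pi * sigma2) * t) * math.exp(
        -(math.log(t) - mu) ** 2 / (2 * sigma2))


def Q(x: float, lam: float, alpha: float) -> float:
    """Weibull survival function of the sojourn time, P(T > x)."""
    return math.exp(-lam * x ** alpha)


def q(x: float, lam: float, alpha: float) -> float:
    """Weibull sojourn-time PDF, q(x) = lam * alpha * x**(alpha - 1) * Q(x)."""
    return lam * alpha * x ** (alpha - 1) * Q(x, lam, alpha)


def integrate(f, a: float, b: float, n: int = 20000) -> float:
    """Composite midpoint rule over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (m + 0.5) * h) for m in range(n))


# Female posterior means from Table 1.
lifetime_risk = integrate(lambda t: w(t, 4.254, 0.021), 0.0, 200.0)
sojourn_mass = integrate(lambda x: q(x, 0.194, 4.238), 0.0, 20.0)
print(round(lifetime_risk, 4), round(sojourn_mass, 4))
```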

3.2. The MLE and MCMC Implementation

To find the MLE, we called the built-in function "nlminb" in S+ or R. However, the likelihood is not easy to compute because of the double integrals in the probability formulas. The likelihood function was implemented in C using Visual Studio and called from an S function. We had to impose appropriate bounds on the parameters to ensure meaningful results. For example, the sensitivity β was assumed to be at least 0.5 at the onset (entry point) of S_p, less than 1 at the end of S_p, and increasing as the time spent in S_p increased. We therefore imposed the bounds b₀ ≥ 0, b₁ ≥ 0, and b₀ + b₁ ≤ 5, which ensure that the sensitivity at the entry of S_p is at least 0.5, i.e., at x = 0, β(0) = 1/[1 + exp(−b₀)] ≥ 1/[1 + exp(0)] = 0.5, and that the sensitivity at the end of S_p stays below 1, i.e., at x = 1, β(1) = 1/[1 + exp(−b₀ − b₁)] ≤ 1/[1 + exp(−5)] ≈ 0.993. The choice of bounds for w(t), q(x), and Q(x) is discussed in Section 2 (Methodology) of Liu et al. (2015) [13].

We used a Gibbs sampler combined with the Metropolis-Hastings (MH) algorithm to generate posterior samples from the likelihood with non-informative (flat) priors. Three MH sub-chains were used to sample (b₀, b₁), (μ, σ²), and (λ, α) separately. The jumping density for the sub-chains (μ, σ²) and (λ, α) was bivariate normal, centered at the current values; to speed convergence, we first ran a few rounds to fine-tune the covariance matrices of these two bivariate normal jumping densities. The jumping density for the sub-chain (b₀, b₁) was uniform. For each gender, eight chains were run from overdispersed initial values. Each chain was run for 6000 steps with a burn-in of 1000 steps; Bayesian output analysis using the Geweke diagnostic and the Gelman-Rubin statistic indicated convergence after the burn-in. This left 5000 samples per chain, which we thinned every 50 steps (based on the autocorrelation plot), giving 100 posterior samples per chain. Finally, we combined the samples from the eight chains to obtain 800 posterior samples for each gender. Table 1 summarizes the MLE and the Bayesian posterior estimates.

Table 1.

MLE and Bayesian posterior estimates using the NLST-CT data

Parameter	Female MLE	Female posterior mean	Female posterior median	Female posterior S.E.	Male MLE	Male posterior mean	Male posterior median	Male posterior S.E.
b₀	3.139	2.972	2.992	1.242	4.084	2.636	2.577	1.339
b₁	1.860	1.223	0.875	1.107	0	1.285	0.936	1.112
μ	4.266	4.254	4.253	0.014	4.262	4.253	4.253	0.010
σ²	0.023	0.021	0.021	0.004	0.019	0.017	0.017	0.002
λ	0.214	0.194	0.190	0.089	0.354	0.294	0.305	0.109
α	2.803	4.238	3.512	2.173	2.288	3.168	2.614	1.662
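The sampling scheme described above can be sketched as a block-wise random-walk Metropolis sampler. This Python version is illustrative only: it replaces the tuned bivariate normal jumping densities with independent normal steps, uses a made-up Gaussian stand-in (`toy_log_post`, hypothetical) instead of the NLST likelihood, and its step sizes are arbitrary. It does show how flat priors on a constrained region reduce to rejecting proposals outside the bounds, and how 6000 steps with a 1000-step burn-in and thinning every 50 steps leave 100 draws per chain.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

LogPost = Callable[[List[float]], float]
Block = Tuple[Tuple[int, ...], str, Tuple[float, ...]]  # (indices, "uniform"/"normal", step sizes)


def mh_within_gibbs(log_post: LogPost, init: Sequence[float], blocks: Sequence[Block],
                    steps: int = 6000, burn_in: int = 1000, thin: int = 50,
                    seed: int = 1) -> List[List[float]]:
    """Block-wise random-walk Metropolis; returns thinned draws after burn-in."""
    rng = random.Random(seed)
    theta = list(init)
    current = log_post(theta)
    kept: List[List[float]] = []
    for step in range(1, steps + 1):
        for idx, kind, scales in blocks:
            proposal = list(theta)
            for m, scale in zip(idx, scales):
                if kind == "uniform":
                    proposal[m] += rng.uniform(-scale, scale)
                else:
                    proposal[m] += rng.gauss(0.0, scale)
            candidate = log_post(proposal)
            # Symmetric proposals: accept with probability min(1, posterior ratio).
            if math.log(1.0 - rng.random()) < candidate - current:
                theta, current = proposal, candidate
        if step > burn_in and (step - burn_in) % thin == 0:
            kept.append(list(theta))
    return kept


def toy_log_post(theta: List[float]) -> float:
    """Flat prior on the constrained region times a stand-in Gaussian 'likelihood'."""
    b0, b1, mu, sigma2, lam, alpha = theta
    if b0 < 0 or b1 < 0 or b0 + b1 > 5 or sigma2 <= 0 or lam <= 0 or alpha <= 0:
        return -math.inf
    center = [3.0, 1.2, 4.25, 0.02, 0.2, 4.0]  # hypothetical, not fitted values
    return -0.5 * sum((v - c) ** 2 for v, c in zip(theta, center))


blocks = [((0, 1), "uniform", (0.5, 0.5)),
          ((2, 3), "normal", (0.01, 0.002)),
          ((4, 5), "normal", (0.05, 0.3))]
draws = mh_within_gibbs(toy_log_post, [2.0, 1.0, 4.2, 0.02, 0.3, 3.0], blocks)
print(len(draws))  # 100 draws per chain; eight chains would give 800
```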


3.3. Results

Table 1 shows that the MLE of (μ, σ²) and the posterior mean (or median) are very close to each other for each gender, implying that the transition density curves based on the MLE and on the posterior mean/median should be very close. However, the MLE of (b₀, b₁) differs markedly from the posterior mean for both genders; in particular, the MLE of b₁ is zero for male heavy smokers, implying that the sensitivity is the same at the beginning and at the end of the preclinical state, which seems incompatible with clinical observations. Similarly, the MLE of (λ, α) differs from the posterior mean for each gender. In general, we prefer the Bayesian estimates, since the posterior samples provide a distribution for each parameter rather than a single value, which makes it easier to make inferences about variability and credible intervals. Figure 2 shows six graphs corresponding to the six components of θ = (b₀, b₁, μ, σ², λ, α), based on the 800 posterior samples. Each graph plots the estimated density of the posterior samples of one parameter for both genders: the solid line for female heavy smokers and the dotted line for their male counterparts. The posterior distributions of b₁ for male and female smokers are very close to each other, implying that screening sensitivity increases at about the same rate for both genders as time in the preclinical state increases. However, the distribution of b₀ for female smokers is shifted to the right of that for male smokers, implying that sensitivity for female smokers is higher at the onset of S_p.

Figure 2. — Estimated density of parameters using posterior samples: solid line for female heavy smokers; dotted line for the male counterparts.

If the MLE $(\hat{b}_{0}, \hat{b}_{1})$ is used, the sensitivity for female smokers is 0.958 at the onset of S_p and increases to 0.993 before the onset of S_c. The sensitivity for male smokers is 0.983 at both the onset and the end of S_p, since the MLE $\hat{b}_{1}$ is zero.

If we use the posterior means of the 800 posterior samples $(b_{0}^{*}, b_{1}^{*})$ in Table 1 to estimate the sensitivity, it is 0.951 and 0.985 for female heavy smokers at the beginning and at the end of S_p (i.e., right before the onset of S_c), respectively, and 0.933 and 0.980 for their male counterparts. However, a better inference uses all 800 posterior samples of $(b_{0}^{*}, b_{1}^{*})$ to calculate the sensitivity at the onset of S_p, $β_{0}^{*} = 1 / (1 + \exp(- b_{0}^{*}))$, and at the end of S_p, $β_{1}^{*} = 1 / (1 + \exp(- b_{0}^{*} - b_{1}^{*}))$. This yields 800 posterior samples of $β_{0}^{*}$ and $β_{1}^{*}$, from which we can compute the average, the 95% highest posterior density (HPD) interval, and so on. Using this approach, the estimated sensitivity for female heavy smokers was 0.915 at the onset of S_p, with a standard error (SE) of 0.096 and a 95% HPD interval of (0.700, 0.993); it increased to 0.981 at the end of S_p, with an SE of 0.015 and a 95% HPD interval of (0.952, 0.993). Similarly, the sensitivity for male heavy smokers was 0.884 (SE 0.120, 95% HPD interval (0.613, 0.993)) at the onset of S_p, and it increased to 0.972 (SE 0.028, 95% HPD interval (0.906, 0.993)) right before the onset of S_c. Figure 3 visualizes the result. Each panel contains four curves: the pointwise mean sensitivity at each point x = s/T ∈ [0, 1] (solid line), the corresponding pointwise 95% HPD intervals (two dotted lines), and the sensitivity curve based on the posterior mean (broken line).

Figure 3. — Plots of estimated sensitivity for heavy smokers
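The sample-based summaries above can be sketched as follows. Each posterior draw (b₀*, b₁*) is mapped to β₀* and β₁*, and an empirical HPD interval is taken as the shortest interval containing the required share of the sorted draws; this shortest-interval estimator is our assumption about how such an interval can be computed, and the draws in the usage line are hypothetical. The last line illustrates why averaging the transformed draws (0.915 for women) can differ from transforming the posterior mean (0.951): the logistic function is concave for positive arguments, so averaging after the transform pulls the value down.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval covering `mass` of the draws (empirical HPD, unimodal posterior)."""
    xs = sorted(samples)
    m = max(1, math.ceil(mass * len(xs) - 1e-9))  # number of draws inside the interval
    start = min(range(len(xs) - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    return xs[start], xs[start + m - 1]


def sensitivity_draws(b_draws: Sequence[Tuple[float, float]]) -> Tuple[List[float], List[float]]:
    """Map each draw (b0*, b1*) to beta0* = beta(0) and beta1* = beta(1) under path V2."""
    beta0 = [logistic(b0) for b0, _ in b_draws]
    beta1 = [logistic(b0 + b1) for b0, b1 in b_draws]
    return beta0, beta1


# Two hypothetical draws: averaging after the transform differs from transforming the average.
beta0, _ = sensitivity_draws([(0.0, 0.0), (6.0, 0.0)])
print(mean(beta0), logistic(3.0))
```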

We can estimate the age at transition from the disease-free state S₀ to the preclinical state S_p using the posterior mean of $(μ^{*}, σ^{2*})$ in Table 1. For female heavy smokers, the mean, median, and mode of the transition age into S_p were 71.14, 70.39, and 68.93 years, respectively, with a standard error of 10.38 years; the density reached its highest value, close to 0.012, at the mode. For male heavy smokers, the mean, median, and mode of the transition age into the preclinical state were 70.94, 70.34, and 69.15 years, with a standard error of 9.28 years; the highest density, at the mode, was 0.013. However, a better way to make inferences about the transition density w(t) is to use all 800 posterior samples: each pair $(μ^{*}, σ^{2*})$ provides one curve w(t). Table 2 summarizes the mean, median, and mode, with the corresponding 95% HPD intervals (in years), for w(t) and q(x). For example, the first row under the column "Mean" shows that the average of the mean transition age into the preclinical state for female heavy smokers was 71.15 years, with a 95% HPD interval of (69.25, 73.28) years.

Table 2.

Transition age and sojourn time with corresponding 95% HPD intervals (in years) based on the posterior samples.

Group	Quantity	Mean (95% HPD)	Median (95% HPD)	Mode (95% HPD)
Female heavy smokers	Transition age, w(t)	71.15 (69.25, 73.28)	70.40 (68.55, 72.31)	68.93 (67.26, 70.65)
Female heavy smokers	Sojourn time, q(x)	1.49 (1.02, 1.92)	1.47 (1.06, 1.89)	1.46 (1.10, 1.80)
Male heavy smokers	Transition age, w(t)	70.94 (69.40, 72.32)	70.34 (68.96, 71.66)	69.16 (68.03, 70.38)
Male heavy smokers	Sojourn time, q(x)	1.43 (1.05, 1.83)	1.39 (1.09, 1.73)	1.30 (1.01, 1.64)


Figure 4 shows the transition density curve and the pointwise 95% HPD intervals. The density curve based on the posterior mean (broken line) is almost identical to the pointwise posterior average (solid line). For comparison, using the MLE, the mean, median, and mode of the transition age from S₀ to S_p for female heavy smokers were 72.06, 71.24, and 69.62 years, respectively, with a standard error of 10.99 years, and the highest density value was 0.011 at the mode of 69.62 years. For male heavy smokers, the mean, median, and mode were 71.63, 70.95, and 69.62 years, respectively, with a standard error of 9.92 years, and the highest density at the mode was 0.012, slightly higher.
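Because w(t) is a scaled lognormal density, its plug-in summaries have closed forms: mean $e^{\mu+\sigma^2/2}$, median $e^{\mu}$, mode $e^{\mu-\sigma^2}$, and standard deviation equal to the mean times $\sqrt{e^{\sigma^2}-1}$, where these refer to the age at transition among those who make the transition (the factor 0.3 cancels on normalization and only scales the peak height). The sketch below plugs in the posterior means and MLEs of (μ, σ²) from Table 1; up to the rounding of those table entries, it lands within a few hundredths of a year of the plug-in values quoted above. It covers the plug-in calculation only, not the sample-based summaries in Table 2.

```python
import math
from typing import Dict


def transition_age_summary(mu: float, sigma2: float, risk: float = 0.3) -> Dict[str, float]:
    """Closed-form summaries of w(t) = risk * lognormal(mu, sigma2) PDF.

    Mean, median, mode and SD refer to the normalized lognormal (age at transition
    among those who make it); `risk` only scales the peak height.
    """
    sigma = math.sqrt(sigma2)
    mean = math.exp(mu + sigma2 / 2)
    mode = math.exp(mu - sigma2)
    return {
        "mean": mean,
        "median": math.exp(mu),
        "mode": mode,
        "sd": mean * math.sqrt(math.exp(sigma2) - 1),
        # at the mode, (log mode - mu)^2 / (2 sigma2) = sigma2 / 2
        "peak": risk * math.exp(-sigma2 / 2) / (math.sqrt(2 * math.pi) * sigma * mode),
    }


female = transition_age_summary(4.254, 0.021)  # posterior means, Table 1
male = transition_age_summary(4.253, 0.017)
female_mle = transition_age_summary(4.266, 0.023)  # MLE, Table 1
print({k: round(v, 3) for k, v in female.items()})
print({k: round(v, 3) for k, v in male.items()})
```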

For the sojourn time, there are 800 posterior samples of (λ*, α*); each pair generates a density curve q(x) and a corresponding mean sojourn time. We calculated the average of the 800 mean sojourn times and its 95% HPD interval; the results are also summarized in Table 2. For female heavy smokers, the average of the mean sojourn time was 1.49 years, with a 95% HPD interval of (1.02, 1.92) years. For male heavy smokers, the average and the 95% HPD interval of the mean sojourn time were 1.43 and (1.05, 1.83) years. Hence female heavy smokers have a slightly longer mean sojourn time. Figure 5 shows the sojourn time density curve and the pointwise 95% HPD intervals; for comparison, the density curve based on the posterior sample mean of (λ*, α*) is plotted in the same graph as a broken line. For each density curve q(x) generated by a pair (λ*, α*), we can also calculate the 95% or 99% HPD interval of the sojourn time distribution, and then average the 800 intervals to obtain a Bayesian credible interval for the sojourn time itself. The 95% credible interval of the sojourn time is (0.37, 2.54) and (0.56, 2.40) years for male and female heavy smokers, respectively. The 99% credible interval is (0.21, 2.96) and (0.37, 2.69) years for male and female heavy smokers, respectively.

Figure 5. — Plots of estimated sojourn time density for heavy smokers
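A sketch of the interval calculation for one density curve: under the Weibull sojourn-time model, the shortest interval with a given probability can be found by searching over the lower-tail probability p and using the closed-form quantile function; averaging the per-draw endpoints then follows the description above. The ternary search, its iteration count, and the two illustrative (λ*, α*) pairs (the female posterior mean and MLE from Table 1, standing in for posterior draws) are our assumptions, so the printed values are not expected to reproduce the reported credible intervals.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def weibull_quantile(p: float, lam: float, alpha: float) -> float:
    """Inverse of F(x) = 1 - exp(-lam * x**alpha)."""
    return (-math.log1p(-p) / lam) ** (1.0 / alpha)


def weibull_hpd(lam: float, alpha: float, mass: float = 0.95,
                iters: int = 200) -> Tuple[float, float]:
    """Shortest interval with probability `mass`, via ternary search over the
    lower-tail probability p (the width is unimodal in p for a unimodal density)."""
    def width(p: float) -> float:
        return weibull_quantile(p + mass, lam, alpha) - weibull_quantile(p, lam, alpha)
    lo, hi = 0.0, 1.0 - mass
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if width(m1) < width(m2):
            hi = m2
        else:
            lo = m1
    p = (lo + hi) / 2
    return weibull_quantile(p, lam, alpha), weibull_quantile(p + mass, lam, alpha)


def averaged_hpd(draws: Sequence[Tuple[float, float]], mass: float = 0.95) -> Tuple[float, float]:
    """Average the per-draw HPD endpoints over (lambda*, alpha*) draws."""
    intervals: List[Tuple[float, float]] = [weibull_hpd(lam, a, mass) for lam, a in draws]
    return mean(a for a, _ in intervals), mean(b for _, b in intervals)


print(weibull_hpd(0.194, 4.238))
print(averaged_hpd([(0.194, 4.238), (0.214, 2.803)], mass=0.99))
```

At the optimum the density takes the same value at both endpoints, which is the defining property of an HPD interval for a unimodal density.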

If we used only the posterior mean of the 800 posterior samples (λ*, α*), we would obtain a slightly shorter mean sojourn time. For female heavy smokers, the mean, median, and mode of the sojourn time in the preclinical state S_p were 1.34, 1.35, and 1.38 years, respectively, with a standard error of 0.36 years. For male heavy smokers, the mean, median, and mode of the sojourn time in S_p were 1.32, 1.31, and 1.31 years, with a standard error of 0.46 years. However, important information is lost if only the posterior mean is used. Similarly, based on the MLE $(\hat{λ}, \hat{α})$, the mean, median, and mode of the sojourn time in S_p were 1.54, 1.52, and 1.48 years for female heavy smokers, with a standard error of 0.60 years, and 1.39, 1.34, and 1.22 years for their male counterparts, with a standard error of 0.65 years. However, it is hard to obtain confidence intervals for the mean, median, or mode of the sojourn time from the MLE alone.
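Under the Weibull form $Q(x) = \exp(-\lambda x^{\alpha})$, the plug-in summaries quoted above have closed forms: with scale $\lambda^{-1/\alpha}$, the mean is scale × Γ(1 + 1/α), the median is scale × (ln 2)^{1/α}, the mode is scale × ((α − 1)/α)^{1/α} for α > 1, and the variance uses Γ(1 + 2/α). This short sketch reproduces, to within rounding, the posterior-mean plug-in values above (1.34, 1.35, and 1.38 years for female heavy smokers). As the text notes, such single-curve numbers carry no interval information on their own.

```python
import math
from typing import Dict


def sojourn_summary(lam: float, alpha: float) -> Dict[str, float]:
    """Closed-form summaries of the Weibull sojourn time with Q(x) = exp(-lam * x**alpha)."""
    scale = lam ** (-1.0 / alpha)
    g1 = math.gamma(1 + 1 / alpha)
    g2 = math.gamma(1 + 2 / alpha)
    mode = scale * ((alpha - 1) / alpha) ** (1 / alpha) if alpha > 1 else 0.0
    return {
        "mean": scale * g1,
        "median": scale * math.log(2) ** (1 / alpha),
        "mode": mode,
        "sd": scale * math.sqrt(g2 - g1 ** 2),
    }


for label, (lam, alpha) in {"female, posterior mean": (0.194, 4.238),
                            "male, posterior mean": (0.294, 3.168),
                            "female, MLE": (0.214, 2.803)}.items():
    print(label, {k: round(v, 2) for k, v in sojourn_summary(lam, alpha).items()})
```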

4. DISCUSSION

We applied the likelihood method to the NLST low-dose CT data for male and female heavy smokers separately, to estimate the three key parameters for each gender with a new link function for sensitivity. The goal was to obtain accurate and reliable estimates of the sojourn time and the transition age to the preclinical state, so that policy makers can use this information to decide when to initiate lung cancer screening exams for heavy smokers and how frequently to schedule subsequent exams.

There has been much research on estimating the three key parameters in lung cancer. This research is an incremental improvement over existing methods, using a different link function for the sensitivity. This link function makes it possible to estimate the sensitivity at the onset (entry point) and at the end of the preclinical state S_p. Although it is a small change in the sensitivity model, the probability formulas for screen-detected cases and interval cases change dramatically and become much harder to evaluate. For the same reason, the formulas for other screening quantities of interest would also change substantially, such as the distribution of the lead time (the time by which diagnosis is advanced by screening), the probability of overdiagnosis, and the probability of true early detection.

Liu et al. (2015) used the NLST low-dose CT data to estimate sensitivity as a function of age, with the same link functions for the sojourn time and the transition density [13]. They showed that the age effect on sensitivity was negligible, which motivated us to find a better link function that models sensitivity based on the fact that the longer one has stayed in the preclinical state, the easier the cancer is to detect. Compared with their results, our estimated averages of the mean sojourn time for female and male smokers, 1.49 and 1.43 years, are slightly shorter than theirs (1.62 and 1.44 years).

We want to point out that using all Bayesian posterior samples is more reliable than using only the posterior mean or the MLE to estimate the three key parameters. For example, using all posterior samples, the sensitivity at the onset of the preclinical state was 0.915 for female and 0.884 for male heavy smokers, and it increased to 0.981 (female) and 0.972 (male) at the end of the preclinical state. This is close to the epidemiologists' rough estimate of the average sensitivity, obtained by dividing the number of screen-detected cases by the number of all cancer cases (screen-detected plus interval cases). In the NLST CT female heavy smoker group, this rough estimate was 0.966, 0.932, and 0.922 at the three annual exams, and it was 0.918, 0.892, and 0.877 for male smokers. We can also use the posterior samples to estimate the HPD interval for w(t) itself: each pair $(μ^{*}, σ^{2*})$ generates a density curve w(t) and a corresponding HPD interval, and we then average the HPD intervals to obtain a more accurate one. For example, the 90% HPD interval for w(t) was (54.2, 87.7) for female heavy smokers and (55.7, 85.8) for their male counterparts, implying that heavy smokers at risk for lung cancer should be screened within these respective age intervals.
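The rough estimate used for comparison is simply the share of each round's cancers that were caught by the screen, s_k/(s_k + r_k). A minimal sketch with hypothetical counts (not the NLST totals); as the text says, it is only a rough check on the model-based sensitivity.

```python
from typing import List, Sequence


def crude_sensitivity(screen_detected: Sequence[int], interval_cases: Sequence[int]) -> List[float]:
    """Per-exam s_k / (s_k + r_k): screen-detected cases over all cases in that round."""
    return [s / (s + r) if s + r > 0 else float("nan")
            for s, r in zip(screen_detected, interval_cases)]


# Hypothetical counts for three annual exams.
print(crude_sensitivity([120, 60, 80], [5, 4, 7]))
```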

This research lays a foundation for future research on planning screening programs. People at potential risk and policy makers need to know at what ages the transition from the disease-free state to the preclinical state starts and ends. Our results show that early transitions could happen before age 50 and that transitions continue after age 80. Using the posterior samples, among those who would develop lung cancer, the probability of making a transition to the preclinical state before age 50 is 0.96% and 0.46% for female and male heavy smokers, respectively, and the probability of making it after age 80 is 18.80% and 16.12%. The transition density peaks between ages 67 and 71 for both male and female heavy smokers.
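These tail probabilities follow from the lognormal form of w(t): after dividing by the lifetime risk 0.3, the transition age is lognormal, so the probabilities are normal tail areas on the log-age scale. The sketch below uses plug-in posterior means from Table 1 rather than averaging over all 800 posterior samples as the text does, so its values (roughly 0.9% and 18.8% for women, 0.45% and 16.1% for men) are close to, but not the same calculation as, the reported figures.

```python
import math
from typing import Tuple


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def transition_tail_probs(mu: float, sigma2: float, early: float = 50.0,
                          late: float = 80.0) -> Tuple[float, float]:
    """P(transition age < early) and P(transition age > late) under the normalized
    lognormal, i.e. among those who would develop lung cancer (the 0.3 factor cancels)."""
    sigma = math.sqrt(sigma2)
    before = normal_cdf((math.log(early) - mu) / sigma)
    after = 1.0 - normal_cdf((math.log(late) - mu) / sigma)
    return before, after


print(transition_tail_probs(4.254, 0.021))  # female heavy smokers, posterior means
print(transition_tail_probs(4.253, 0.017))  # male heavy smokers, posterior means
```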

Our analysis also revealed that the MLE is not reliable for the NLST CT data. The main reason is that too few screening exams make it more difficult to obtain an accurate MLE [26]. Although the MLE procedure provides point estimates of the parameters, and the corresponding standard errors can be obtained by bootstrapping [20], these standard errors are usually underestimated, and the approach can be very time-consuming and inefficient. A Bayesian approach using non-informative priors and MCMC can overcome these hurdles. We will conduct further simulation studies to explore the minimum number of screening exams needed to obtain a reliable MLE in cancer screening.

ACKNOWLEDGEMENTS

We thank the National Cancer Institute’s Cancer Data Access System for allowing us to use the NLST data. The statements contained herein are solely those of the authors and do not represent or imply concurrence or endorsement by the NCI. We also thank the Editor and the two anonymous referees for their valuable input.

This research was partially supported by NIH/NCI 1R15CA242482 (Wu). Dr. Rai was supported by the Wendell Cherry Chair in Clinical Trial Research and the KY Lung Cancer Program.

Contributor Information

Dongfeng Wu, Department of Bioinformatics and Biostatistics, University of Louisville, USA.

Shesh N. Rai, Department of Bioinformatics and Biostatistics, University of Louisville, USA.

Albert Seow, Department of Radiology, University of Louisville, USA.

REFERENCES

[1] National Cancer Institute Surveillance, Epidemiology, and End Results Program. Cancer Stat Facts: Lung and Bronchus Cancer. http://seer.cancer.gov/statfacts/html/lungb.html. Last accessed on 8/23/2020.
[2] U.S. Preventive Services Task Force (2020). Lung Cancer Screening Draft Recommendation Statement. July 7, 2020. https://www.uspreventiveservicestaskforce.org/uspstf/draftrecommendation/lung-cancer-screening-2020. Last accessed on 8/23/2020.
[3] Chen Y, Erwin D, and Wu D (2014). Over-diagnosis in lung cancer screening using the MSKC-LCSP data. Journal of Biometrics and Biostatistics. 5:201. doi: 10.4172/2155-6180.1000201.
[4] Chien CR and Chen THH (2008). Mean sojourn time and effectiveness of mortality reduction for lung cancer screening with computed tomography. International Journal of Cancer. 122:2594–2599.
[5] Fontana RS, Sanderson DR, Woolner LB, Miller WE, Bernatz PE, Payne WS, and Taylor WF (1975). The Mayo Lung Project for early detection and localization of bronchogenic carcinoma: a status report. Chest. 67(5):511–522.
[6] Gohagan JK, Prorok PC, Hayes RB, Kramer BS, and PLCO Project Team (2000). The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial of the National Cancer Institute: History, organization, and status. Controlled Clinical Trials. 21:251S–272S.
[7] Henschke CI, McCauley DI, Yankelevitz DF, et al. (1999). Early lung cancer action project: overall design and findings from baseline screening. Lancet. 354(9173):99–105.
[8] Henschke CI, Naidich DP, Yankelevitz DF, et al. (2001). Early lung cancer action project: initial findings on repeat screenings. Cancer. 92(1):153–159.
[9] Jang H, Kim S, and Wu D (2013). Bayesian lead time estimation for the Johns Hopkins lung project data. Journal of Epidemiology and Global Health. 3(3):157–163. doi: 10.1016/j.jegh.2013.05.001.
[10] Kim S, Erwin D, and Wu D (2012). Efficacy of dual lung cancer screening by chest x-ray and sputum cytology using Johns Hopkins lung project data. Journal of Biometrics and Biostatistics. 3:139. doi: 10.4172/2155-6180.1000139.
[11] Kim S, Jang H, Wu D, and Abrams J (2015). A Bayesian nonlinear mixed-effects disease progression model. Journal of Biometrics and Biostatistics. 6:271. doi: 10.4172/2155-6180.1000271.
[12] Kim S and Wu D (2016). Estimation of sensitivity depending on sojourn time and time spent in preclinical state. Statistical Methods in Medical Research. 25(2):728–740. doi: 10.1177/0962280212465499.
[13] Liu R, Levitt B, Riley T, and Wu D (2015). Bayesian estimation of the three key parameters in CT for the National Lung Screening Trial data. Journal of Biometrics and Biostatistics. 6:263. doi: 10.4172/2155-6180.1000263.
[14] Marcus PM, Bergstralh EJ, Zweig MH, Harris A, Offord KP, and Fontana RS (2006). Extended lung cancer incidence followup in the Mayo Lung Project and overdiagnosis. Journal of the National Cancer Institute. 98:748–756.
[15] Marshall HM, Bowman RV, Yang IA, Fong KM, and Berg CD (2013). Screening for lung cancer with low-dose computed tomography: a review of current status. Journal of Thoracic Disease. 5(S5):S524–S539. doi: 10.3978/j.issn.2072-1439.2013.09.06.
[16] The National Lung Screening Trial Research Team (2011a). The National Lung Screening Trial: overview and study design. Radiology. 258(1):243–253.
[17] The National Lung Screening Trial Research Team (2011b). Reduced lung-cancer mortality with low-dose computed tomographic screening. New England Journal of Medicine. 365:395–409. doi: 10.1056/NEJMoa1102873.
[18] Prorok PC, Andriole GL, Bresalier RS, et al. (2000). Design of the Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial. Controlled Clinical Trials. 21:273S–309S.
[19] Wang D, Levitt B, Riley T, and Wu D (2017). Estimation of sojourn time and transition probability of lung cancer for smokers using the PLCO data. Journal of Biometrics and Biostatistics. 8:360. doi: 10.4172/2155-6180.1000360.
[20] Wu D, Rosner GL, and Broemeling LD (2005). MLE and Bayesian inference of age-dependent sensitivity and transition probability in periodic screening. Biometrics. 61:1056–1063.
[21] Wu D, Carino RL, and Wu X (2008). When sensitivity is a function of age and time spent in the preclinical state in periodic cancer screening. Journal of Modern Applied Statistical Methods. 7(1):297–303.
[22] Wu D, Erwin D, and Rosner GL (2011). Sojourn time and lead time projection in lung cancer screening. Lung Cancer. 72(3):322–326. doi: 10.1016/j.lungcan.2010.10.010.
[23] Wu D, Erwin D, and Kim S (2011). Projection of longterm outcomes using x-rays and pooled cytology in lung cancer screening. Open Access Medical Statistics. 2011:1, 13–19. doi: 10.2147/OAMS.S22987.
[24] Zelen M and Feinleib M (1969). On the Theory of Screening for Chronic Diseases. Biometrika. 56(3):601–614.
[25] Chen WY, Annamreddy PR, and Fan LT (2003). Modeling growth of a heterogeneous tumor. Journal of Theoretical Biology. 221:205–227.
[26] Wu D and Kim S (2020). Problems in the estimation of the key parameters using MLE in lung cancer screening. Journal of Clinical Research and Reports. 5(3):117. doi: 10.31579/2690-1919/117.
