A few remarks on “A capture-recapture approach for screening using two diagnostic tests with availability of disease status for the test positives only” by Böhning and Patilea

Haitao Chu; Lei Nie

doi:10.1198/016214508000000940

Author manuscript; available in PMC: 2010 Jun 9.

Published in final edited form as: J Am Stat Assoc. 2008 Dec;103(484):1518–1519. doi: 10.1198/016214508000000940

PMCID: PMC2882802 NIHMSID: NIHMS90014 PMID: 20539836

Using a capture-recapture approach, Böhning and Patilea (2008) proposed two useful estimators for unobserved cell counts, assuming homogeneous association of the screening tests over disease status. However, they are mistaken in claiming that the maximum likelihood estimators (MLEs) are difficult to obtain. The point of this note is to present closed-form MLEs for, in their notation: 1) the α model where $α = p_{11}^{(i)} p_{00}^{(i)} / (p_{01}^{(i)} p_{10}^{(i)})$ is assumed to be identical for all i = 1, 2, …, d; and 2) the θ model where $θ = p_{1 | 1}^{(i)} / p_{1 +}^{(i)}$ is assumed to be identical for all i.

One way to write the likelihood function (ignoring constant terms) in this setting is in terms of $q_{i}$ and $p_{jk}^{(i)}$ ($i = 1, 2, \dots, d$; $j = 0, 1$; $k = 0, 1$), as the authors did:

$$
x_{00}^{(+)} \log \Big(\sum_{i} p_{00}^{(i)} q_{i}\Big) + \sum_{i} x_{11}^{(i)} \log (p_{11}^{(i)} q_{i}) + \sum_{i} x_{10}^{(i)} \log (p_{10}^{(i)} q_{i}) + \sum_{i} x_{01}^{(i)} \log (p_{01}^{(i)} q_{i}). \tag{1}
$$

This parameterization involves a mixture likelihood, which prevents a closed-form solution for the MLEs. To obtain closed-form MLEs, we consider an alternative parameterization in terms of $π_{jk}$ and $π_{jk}^{(i)}$ ($j, k = 0, 1$ and $i = 1, 2, \dots, d$), where $π_{jk} = P (T_{1} = j, T_{2} = k)$ and $π_{jk}^{(i)} = P (D = i | T_{1} = j, T_{2} = k)$. The log-likelihood function (ignoring constant terms) is

$$
\begin{aligned}
\log L &= \sum_{j} \sum_{k} x_{jk}^{(+)} \log (π_{jk}) + \sum_{i} x_{11}^{(i)} \log (π_{11}^{(i)}) + \sum_{i} x_{10}^{(i)} \log (π_{10}^{(i)}) + \sum_{i} x_{01}^{(i)} \log (π_{01}^{(i)}) \\
&= x_{00}^{(+)} \log (π_{00}) + \sum_{i} x_{11}^{(i)} \left\{\log (π_{11}^{(i)}) + \log (π_{11})\right\} + \sum_{i} x_{10}^{(i)} \left\{\log (π_{10}^{(i)}) + \log (π_{10})\right\} + \sum_{i} x_{01}^{(i)} \left\{\log (π_{01}^{(i)}) + \log (π_{01})\right\} \\
&= x_{00}^{(+)} \log (π_{00}) + \sum_{i} x_{11}^{(i)} \log (π_{11}^{(i)} π_{11}) + \sum_{i} x_{10}^{(i)} \log (π_{10}^{(i)} π_{10}) + \sum_{i} x_{01}^{(i)} \log (π_{01}^{(i)} π_{01}).
\end{aligned} \tag{2}
$$

This representation relates to previous work in some other settings (Satten and Kupper 1993; Lyles 2002; Pepe and Janes 2007). Note that $π_{jk}^{(i)} π_{jk} = P (D = i | T_{1} = j, T_{2} = k) P (T_{1} = j, T_{2} = k) = P (T_{1} = j, T_{2} = k | D = i) P (D = i) = p_{jk}^{(i)} q_{i}$ and $π_{00} = P (T_{1} = 0, T_{2} = 0) = \sum_{i} P (T_{1} = 0, T_{2} = 0, D = i) = \sum_{i} P (T_{1} = 0, T_{2} = 0 | D = i) P (D = i) = \sum_{i} p_{00}^{(i)} q_{i}$. Therefore, equation (2) is equivalent to equation (1).
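The equivalence can also be checked numerically. The following is a minimal sketch, not part of the original note: the function names (`to_pi`, `loglik_eq1`, `loglik_eq2`) and all parameter values and counts are hypothetical and chosen only for illustration. It maps $(q_{i}, p_{jk}^{(i)})$ to $(π_{jk}, π_{jk}^{(i)})$ by Bayes' rule and evaluates both forms of the log-likelihood.

```python
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
POSITIVE_CELLS: List[Cell] = [(1, 1), (1, 0), (0, 1)]


def to_pi(q: List[float], p: List[Dict[Cell, float]]):
    """Map (q_i, p_jk^(i)) to (pi_jk, pi_jk^(i)) via Bayes' rule."""
    cells = POSITIVE_CELLS + [(0, 0)]
    d = len(q)
    # pi_jk = sum_i p_jk^(i) q_i
    pi = {c: sum(p[i][c] * q[i] for i in range(d)) for c in cells}
    # pi_jk^(i) = p_jk^(i) q_i / pi_jk
    pi_cond = {c: [p[i][c] * q[i] / pi[c] for i in range(d)] for c in cells}
    return pi, pi_cond


def loglik_eq1(q, p, x_pos: Dict[Cell, List[int]], x00: int) -> float:
    """Log-likelihood in the (q, p) parameterization, as in equation (1)."""
    ll = x00 * math.log(sum(p[i][(0, 0)] * q[i] for i in range(len(q))))
    for c in POSITIVE_CELLS:
        ll += sum(x * math.log(p[i][c] * q[i]) for i, x in enumerate(x_pos[c]))
    return ll


def loglik_eq2(pi, pi_cond, x_pos: Dict[Cell, List[int]], x00: int) -> float:
    """Log-likelihood in the (pi, pi^(i)) parameterization, last line of (2)."""
    ll = x00 * math.log(pi[(0, 0)])
    for c in POSITIVE_CELLS:
        ll += sum(x * math.log(pi_cond[c][i] * pi[c]) for i, x in enumerate(x_pos[c]))
    return ll


# Hypothetical values for d = 2 disease classes (illustration only).
q_example = [0.9, 0.1]
p_example = [
    {(1, 1): 0.02, (1, 0): 0.03, (0, 1): 0.05, (0, 0): 0.90},
    {(1, 1): 0.50, (1, 0): 0.20, (0, 1): 0.20, (0, 0): 0.10},
]
x_pos_example = {(1, 1): [10, 20], (1, 0): [15, 30], (0, 1): [5, 40]}
x00_example = 880

pi_example, pi_cond_example = to_pi(q_example, p_example)
print(loglik_eq1(q_example, p_example, x_pos_example, x00_example))
print(loglik_eq2(pi_example, pi_cond_example, x_pos_example, x00_example))
```

Under these assumed inputs the two printed values should agree up to floating-point rounding, which mirrors the term-by-term identity stated above.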

These equations are tractable and yield closed-form MLEs of $π_{jk}$ ($j, k = 0, 1$) and of $π_{jk}^{(i)}$ for $j + k > 0$. Omitting the algebra, we obtain the MLEs as $\hat{π}_{jk} = x_{jk} / n$ ($j, k = 0, 1$) and $\hat{π}_{jk}^{(i)} = x_{jk}^{(i)} / x_{jk}$ for $j + k > 0$. Therefore, by the invariance property of maximum likelihood, the MLEs of the $q_{i}$, which can be written as functions of $π_{jk}$ ($j, k = 0, 1$) and $π_{jk}^{(i)}$ ($j + k > 0$) under the α or θ model assumptions, have closed-form solutions. The details are given below.
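As an illustration of these closed forms, here is a small sketch that computes $\hat{π}_{jk}$ and $\hat{π}_{jk}^{(i)}$ from cell counts. The function name `saturated_mles`, the data layout, and the counts are assumptions made for this example; they do not come from the note or from Böhning and Patilea (2008).

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def saturated_mles(x_pos: Dict[Cell, List[int]], x00: int):
    """Closed-form MLEs of pi_jk and pi_jk^(i).

    x_pos[(j, k)][i] holds x_jk^(i) for the test-positive cells (j + k > 0);
    x00 is the number of double negatives, whose disease status is unobserved.
    """
    # Cell totals x_jk, summing over disease classes where status is observed.
    totals = {cell: sum(counts) for cell, counts in x_pos.items()}
    totals[(0, 0)] = x00
    n = sum(totals.values())
    # pi_hat_jk = x_jk / n for all four cells.
    pi_hat = {cell: total / n for cell, total in totals.items()}
    # pi_hat_jk^(i) = x_jk^(i) / x_jk, available only when j + k > 0.
    pi_cond_hat = {
        cell: [count / totals[cell] for count in counts]
        for cell, counts in x_pos.items()
    }
    return n, pi_hat, pi_cond_hat


# Hypothetical counts for d = 2 disease classes (illustration only).
x_pos_example = {(1, 1): [10, 20], (1, 0): [15, 30], (0, 1): [5, 40]}
n, pi_hat, pi_cond_hat = saturated_mles(x_pos_example, x00=880)
print(n, pi_hat, pi_cond_hat)
```

Note that $\hat{π}_{00}^{(i)}$ is absent from the output: it is not identified by the data alone and must come from the α or θ model assumption.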

Under the α model, $α = \frac{p_{11}^{(i)} p_{00}^{(i)}}{p_{01}^{(i)} p_{10}^{(i)}}$ is assumed to be identical for all i = 1, 2, …, d; by Bayes’ rule,

$$
\begin{aligned}
α &= \frac{p_{11}^{(i)} p_{00}^{(i)}}{p_{01}^{(i)} p_{10}^{(i)}} = \frac{P (T_{1} = 1, T_{2} = 1 | D = i) P (T_{1} = 0, T_{2} = 0 | D = i)}{P (T_{1} = 0, T_{2} = 1 | D = i) P (T_{1} = 1, T_{2} = 0 | D = i)} \\
&= \frac{P (D = i | T_{1} = 1, T_{2} = 1) P (T_{1} = 1, T_{2} = 1) P (D = i | T_{1} = 0, T_{2} = 0) P (T_{1} = 0, T_{2} = 0)}{P (D = i | T_{1} = 1, T_{2} = 0) P (T_{1} = 1, T_{2} = 0) P (D = i | T_{1} = 0, T_{2} = 1) P (T_{1} = 0, T_{2} = 1)} \\
&= \frac{π_{11} π_{00}}{π_{01} π_{10}} \times \frac{π_{11}^{(i)} π_{00}^{(i)}}{π_{01}^{(i)} π_{10}^{(i)}}.
\end{aligned}
$$

Thus

$$
\begin{aligned}
α &= \frac{π_{11} π_{00}}{π_{01} π_{10}} \times \left[\sum_{i} \frac{π_{01}^{(i)} π_{10}^{(i)}}{π_{11}^{(i)}}\right]^{-1}, \qquad π_{00α}^{(i)} = \frac{π_{01}^{(i)} π_{10}^{(i)}}{π_{11}^{(i)}} \times \left[\sum_{i} \frac{π_{01}^{(i)} π_{10}^{(i)}}{π_{11}^{(i)}}\right]^{-1}, \\
q_{iα} &= π_{11} π_{11}^{(i)} + π_{10} π_{10}^{(i)} + π_{01} π_{01}^{(i)} + π_{00} \frac{π_{01}^{(i)} π_{10}^{(i)}}{π_{11}^{(i)}} \left[\sum_{i} \frac{π_{01}^{(i)} π_{10}^{(i)}}{π_{11}^{(i)}}\right]^{-1},
\end{aligned}
$$

where the subscript α indicates the α model assumption. Since the MLEs of the parameters $π_{jk}$ and $π_{jk}^{(i)}$ are $\hat{π}_{jk} = x_{jk} / n$ ($j, k = 0, 1$) and $\hat{π}_{jk}^{(i)} = x_{jk}^{(i)} / x_{jk}$ for $j + k > 0$, the closed-form MLE of $n_{iα} = n q_{iα}$ under the α model is

$$
\hat{n}_{iα} = n \hat{q}_{iα} = x_{11}^{(i)} + x_{10}^{(i)} + x_{01}^{(i)} + x_{00} \frac{x_{01}^{(i)} x_{10}^{(i)}}{x_{11}^{(i)}} \left[\sum_{i} \frac{x_{01}^{(i)} x_{10}^{(i)}}{x_{11}^{(i)}}\right]^{-1}, \tag{3}
$$

which is essentially the same as equation (15) in Böhning and Patilea (2008) without the stability correction. In other words, the estimator in their equation (15) is the MLE under the α model assumption, modified by the stability correction.
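The α-model estimator is straightforward to compute from the cell counts. The sketch below is illustrative only: the function name `n_hat_alpha` and the counts are hypothetical, and the stability correction of Böhning and Patilea (2008) is deliberately not included, so the function simply refuses inputs with a zero $x_{11}^{(i)}$.

```python
from typing import List, Sequence


def n_hat_alpha(
    x11: Sequence[int], x10: Sequence[int], x01: Sequence[int], x00: int
) -> List[float]:
    """Estimated class sizes under the alpha model, without stability correction.

    x11[i], x10[i], x01[i] are the observed counts x_jk^(i) for disease class i;
    x00 is the number of double negatives with unobserved disease status.
    """
    if not (len(x11) == len(x10) == len(x01)):
        raise ValueError("count vectors must have the same length")
    if any(c <= 0 for c in x11):
        raise ValueError("every x11[i] must be positive (no stability correction here)")
    # w_i = x01^(i) * x10^(i) / x11^(i): unnormalized share of x00 for class i.
    w = [a * b / c for a, b, c in zip(x01, x10, x11)]
    total_w = sum(w)
    if total_w == 0:
        raise ValueError("all weights are zero; x00 cannot be allocated")
    # Observed counts plus the allocated share of the double negatives.
    return [
        c + b + a + x00 * wi / total_w
        for c, b, a, wi in zip(x11, x10, x01, w)
    ]


# Hypothetical counts for d = 2 disease classes (illustration only).
estimates = n_hat_alpha(x11=[10, 20], x10=[15, 30], x01=[5, 40], x00=880)
print([round(e, 2) for e in estimates], round(sum(estimates), 2))
```

By construction the estimates sum to $n$, because the weights that allocate $x_{00}$ across disease classes sum to one.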

Under the θ model, $θ = \frac{p_{1 | 1}^{(i)}}{p_{1 +}^{(i)}}$ is assumed to be identical for all i = 1, 2, …, d; by Bayes’ rule,

$$
\begin{aligned}
θ &= \frac{p_{1 | 1}^{(i)}}{p_{1 +}^{(i)}} = \frac{P (T_{1} = 1 | T_{2} = 1, D = i)}{P (T_{1} = 1 | D = i)} = \frac{P (D = i | T_{1} = 1, T_{2} = 1) P (T_{1} = 1, T_{2} = 1)}{P (T_{2} = 1, D = i) P (T_{1} = 1, D = i)} \times P (D = i) \\
&= \frac{π_{11} π_{11}^{(i)}}{(π_{01} π_{01}^{(i)} + π_{11} π_{11}^{(i)}) (π_{10} π_{10}^{(i)} + π_{11} π_{11}^{(i)})} \times P (D = i).
\end{aligned}
$$

Thus $θ = {[\sum_{i} (\frac{π_{10} π_{10}^{(i)}}{π_{11} π_{11}^{(i)}} + 1) (π_{01} π_{01}^{(i)} + π_{11} π_{11}^{(i)})]}^{- 1}$ and

$$
π_{00θ}^{(i)} = \frac{1}{π_{00}} \left\{ \left(\frac{π_{10} π_{10}^{(i)}}{π_{11} π_{11}^{(i)}} + 1\right) (π_{01} π_{01}^{(i)} + π_{11} π_{11}^{(i)}) \left[\sum_{i} \left(\frac{π_{10} π_{10}^{(i)}}{π_{11} π_{11}^{(i)}} + 1\right) (π_{01} π_{01}^{(i)} + π_{11} π_{11}^{(i)})\right]^{-1} - π_{11} π_{11}^{(i)} - π_{10} π_{10}^{(i)} - π_{01} π_{01}^{(i)} \right\},
$$

$$
q_{iθ} = \left(\frac{π_{10} π_{10}^{(i)}}{π_{11} π_{11}^{(i)}} + 1\right) (π_{01} π_{01}^{(i)} + π_{11} π_{11}^{(i)}) \left[\sum_{i} \left(\frac{π_{10} π_{10}^{(i)}}{π_{11} π_{11}^{(i)}} + 1\right) (π_{01} π_{01}^{(i)} + π_{11} π_{11}^{(i)})\right]^{-1},
$$

where the subscript θ indicates the θ model assumption. Similarly, the closed-form MLE of $n_{iθ} = n q_{iθ}$ under the θ model is

$$
\hat{n}_{iθ} = n \hat{q}_{iθ} = n \left(\frac{x_{10}^{(i)}}{x_{11}^{(i)}} + 1\right) (x_{01}^{(i)} + x_{11}^{(i)}) \left[\sum_{i} \left(\frac{x_{10}^{(i)}}{x_{11}^{(i)}} + 1\right) (x_{01}^{(i)} + x_{11}^{(i)})\right]^{-1} = n \frac{x_{1 +}^{(i)} x_{+ 1}^{(i)}}{x_{11}^{(i)}} \left[\sum_{i} \frac{x_{1 +}^{(i)} x_{+ 1}^{(i)}}{x_{11}^{(i)}}\right]^{-1}, \tag{4}
$$

which is essentially the same as equation (10) in Böhning and Patilea (2008) without the stability correction.
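A corresponding sketch for the θ model follows. As before, the function name `n_hat_theta` and the counts are hypothetical, and no stability correction is applied. The code takes $\hat{n}_{iθ} = n \hat{q}_{iθ}$, with $\hat{q}_{iθ}$ proportional to $x_{1+}^{(i)} x_{+1}^{(i)} / x_{11}^{(i)}$ and normalized to sum to one over the disease classes.

```python
from typing import List, Sequence


def n_hat_theta(
    x11: Sequence[int], x10: Sequence[int], x01: Sequence[int], x00: int
) -> List[float]:
    """Estimated class sizes under the theta model, without stability correction."""
    if not (len(x11) == len(x10) == len(x01)):
        raise ValueError("count vectors must have the same length")
    if any(c <= 0 for c in x11):
        raise ValueError("every x11[i] must be positive (no stability correction here)")
    n = sum(x11) + sum(x10) + sum(x01) + x00
    # a_i = x_{1+}^(i) * x_{+1}^(i) / x_11^(i), with
    # x_{1+}^(i) = x11 + x10 (test 1 positive), x_{+1}^(i) = x11 + x01 (test 2 positive).
    a = [(c + b) * (c + e) / c for c, b, e in zip(x11, x10, x01)]
    total_a = sum(a)
    # q_hat_i = a_i / sum(a); n_hat_i = n * q_hat_i.
    return [n * ai / total_a for ai in a]


# Hypothetical counts for d = 2 disease classes (illustration only).
estimates = n_hat_theta(x11=[10, 20], x10=[15, 30], x01=[5, 40], x00=880)
print([round(e, 2) for e in estimates], round(sum(estimates), 2))
```

Unlike the α-model estimator, $x_{00}$ enters here only through the total $n$; the split across disease classes is driven entirely by the test-positive counts.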

As a byproduct of this alternative parameterization, we can test the difference between $\hat{q}_{iθ}$ and $\hat{q}_{iα}$ (or, equivalently, between $\hat{n}_{iθ}$ and $\hat{n}_{iα}$) to infer whether the two assumptions yield statistically significantly different predictions for the probability (or, equivalently, the number) of individuals in a given disease class i. Although the formula for $\mathrm{se}(\hat{q}_{iθ} - \hat{q}_{iα})$ is tedious, its numerical value is easily obtained with statistical software using the delta method. We note that the estimated probabilities of disease classes under the α and θ models can differ statistically significantly, and potentially meaningfully, within the same study. For example, in the Health Insurance Plan Study for breast cancer screening in New York (Strax, Venet, Shapiro, and Gross 1967), the estimated probability of having cancer is 4.8% under the α model (95% confidence interval [CI]: 0.3% to 9.3%) and 7.5% under the θ model (95% CI: 2.8% to 12.2%). The difference is 2.7% (95% CI: 1.4% to 4%), with a p-value less than 0.001. A difference of this size can have a substantial impact on cancer surveillance and prevention. Unfortunately, the data do not contain information to distinguish between the α model and the θ model.
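One possible way to obtain such a standard error numerically is sketched below. It is an illustration under stated assumptions rather than the computation used for the figures quoted above: the observed cells ($x_{00}$ and the $x_{jk}^{(i)}$ with $j + k > 0$) are treated as a single multinomial sample, the gradient is approximated by central finite differences, and the counts are hypothetical. The function names (`q_alpha`, `q_theta`, `delta_method_se`) are likewise introduced only for this example.

```python
import math
from typing import Callable, List, Sequence


def _split(p: Sequence[float], d: int):
    # Cell order: [p00, p11^(1..d), p10^(1..d), p01^(1..d)], joint cell probabilities.
    return p[0], p[1:1 + d], p[1 + d:1 + 2 * d], p[1 + 2 * d:1 + 3 * d]


def q_alpha(p: Sequence[float], d: int) -> List[float]:
    """Class probabilities q_i under the alpha model, as a function of cell probabilities."""
    p00, p11, p10, p01 = _split(p, d)
    w = [p01[i] * p10[i] / p11[i] for i in range(d)]
    s = sum(w)
    return [p11[i] + p10[i] + p01[i] + p00 * w[i] / s for i in range(d)]


def q_theta(p: Sequence[float], d: int) -> List[float]:
    """Class probabilities q_i under the theta model, as a function of cell probabilities."""
    _, p11, p10, p01 = _split(p, d)
    a = [(p10[i] / p11[i] + 1.0) * (p01[i] + p11[i]) for i in range(d)]
    s = sum(a)
    return [ai / s for ai in a]


def delta_method_se(
    g: Callable[[Sequence[float]], float], p: Sequence[float], n: int, h: float = 1e-6
) -> float:
    """Delta-method standard error of g(p_hat) for multinomial proportions p_hat."""
    grad = []
    for c in range(len(p)):
        up, down = list(p), list(p)
        up[c] += h
        down[c] -= h
        grad.append((g(up) - g(down)) / (2.0 * h))  # central difference
    # grad' Sigma grad with Sigma = (diag(p) - p p') / n.
    mean = sum(pc * gc for pc, gc in zip(p, grad))
    var = (sum(pc * gc * gc for pc, gc in zip(p, grad)) - mean * mean) / n
    return math.sqrt(max(var, 0.0))


# Hypothetical counts for d = 2: [x00, x11^(1), x11^(2), x10^(1), x10^(2), x01^(1), x01^(2)].
counts = [880, 10, 20, 15, 30, 5, 40]
n, d, i = sum(counts), 2, 0
p_hat = [c / n for c in counts]


def diff(p: Sequence[float]) -> float:
    return q_theta(p, d)[i] - q_alpha(p, d)[i]


estimate = diff(p_hat)
se = delta_method_se(diff, p_hat, n)
print(estimate, se, (estimate - 1.96 * se, estimate + 1.96 * se))  # Wald 95% interval
```

Perturbing each cell probability separately is acceptable here because the multinomial covariance matrix annihilates the direction along which the probabilities fail to sum to one. An analytic gradient or an automatic-differentiation tool could replace the finite differences.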

The alternative parameterization in (2) sheds light on maximum likelihood approaches in the setting considered here; the corresponding closed-form ML estimators under the α and θ models allow tests of the difference between the estimated probabilities of a specific disease class under the α versus the θ model. Our results complement the estimators obtained in equations (10) and (15) of Böhning and Patilea (2008) using a capture-recapture approach, and ensure the usual MLE properties.

Acknowledgments

Dr. Chu was supported in part by the Lineberger Cancer Center Core Grant CA16086 from the U.S. National Cancer Institute. The authors are very grateful to the editor for his helpful comments and suggestions.

Contributor Information

Haitao Chu, Department of Biostatistics and the Lineberger Comprehensive Cancer Center, The University of North Carolina, Chapel Hill, NC 27516 (Email: hchu@bios.unc.edu).

Lei Nie, Office of Biostatistics, Food and Drug Administration, Silver Spring, MD 20993 (Email: lei.nie@fda.hhs.gov).

References

Böhning D, Patilea V. A capture-recapture approach for screening using two diagnostic tests with availability of disease status for the test positives only. Journal of the American Statistical Association. 2008;103:212–221. doi: 10.1198/016214508000000940.
Lyles RH. A note on estimating crude odds ratios in case-control studies with differentially misclassified exposure. Biometrics. 2002;58:1034–1036. doi: 10.1111/j.0006-341x.2002.1034_1.x.
Pepe MS, Janes H. Insights into latent class analysis of diagnostic test performance. Biostatistics. 2007;8:474–484. doi: 10.1093/biostatistics/kxl038.
Satten GA, Kupper LL. Inferences About Exposure-Disease Associations Using Probability-Of-Exposure Information. Journal of the American Statistical Association. 1993;88:200–208.
Strax P, Venet L, Shapiro S, Gross S. Mammography and Clinical Examination in Mass Screening for Cancer of the Breast. Cancer. 1967;20:2184–2188. doi: 10.1002/1097-0142(196712)20:12<2184::aid-cncr2820201217>3.0.co;2-3.

