Using a capture-recapture approach, Böhning and Patilea (2008) proposed two useful estimators for unobserved cell counts, assuming homogeneous association of the screening tests over disease status. However, they are mistaken in claiming that the maximum likelihood estimators (MLEs) are difficult to obtain. The point of this note is to present closed-form MLEs for, in their notation: 1) the α model, in which αi is assumed to be identical for all i = 1, 2, …, d; and 2) the θ model, in which θi is assumed to be identical for all i.
One way to write the likelihood function (ignoring constant terms) in this setting is in terms of qi and the test-outcome probabilities within each disease class, as the authors did:
(1) |
This parameterization involves a mixture likelihood, which prevents a closed-form solution for the MLEs. To obtain closed-form MLEs, we consider an alternative parameterization in terms of πjk = P(T1 = j, T2 = k) and the conditional probabilities of disease class i given the test results (T1 = j, T2 = k). The log-likelihood function (ignoring constant terms) is
(2) |
This representation relates to previous work in some other settings (Satten and Kupper 1993; Lyles 2002; Pepe and Janes 2007). The two sets of parameters are linked by the law of total probability and Bayes' rule; therefore, equation (2) is equivalent to equation (1).
The resulting likelihood equations are tractable and yield closed-form MLEs of πjk (j, k = 0, 1) and of the conditional probabilities of disease class i given the test results (j, k) if j + k > 0. Omitting the algebra, these MLEs are the corresponding observed proportions. Therefore, the MLEs of the qi, which can be written as functions of πjk (j, k = 0, 1) and these conditional probabilities under the α or θ model assumptions, have closed-form solutions. The details are given below.
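As a sketch of why the MLEs are available in closed form, the log-likelihood under this parameterization can be written as below. The symbols are assumptions introduced here for illustration rather than the original notation: $D$ is the disease class, $n_{jk}$ is the number of individuals with $T_1 = j$, $T_2 = k$, $n_{jki}$ is the number of those in disease class $i$ (observed only when $j + k > 0$), $n = \sum_{j,k} n_{jk}$, and $p_{i \mid jk} = P(D = i \mid T_1 = j, T_2 = k)$.

```latex
\ell(\pi, p) \;=\; \sum_{j=0}^{1}\sum_{k=0}^{1} n_{jk}\,\log \pi_{jk}
  \;+\; \sum_{(j,k):\, j+k>0}\;\sum_{i=1}^{d} n_{jki}\,\log p_{i \mid jk},
\qquad
\sum_{j,k}\pi_{jk} = 1, \quad \sum_{i=1}^{d} p_{i \mid jk} = 1 .

% Each block is a separate multinomial log-likelihood, so the maximizers are
% the observed proportions:
\hat{\pi}_{jk} = \frac{n_{jk}}{n} \quad (j, k = 0, 1),
\qquad
\hat{p}_{i \mid jk} = \frac{n_{jki}}{n_{jk}} \quad (j + k > 0).
```

In this form $p_{i \mid 00}$ does not appear in the likelihood at all, which is why a model assumption such as the α or θ model is needed to express it through the estimable quantities.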
Under the α model, αi is assumed to be identical for all i = 1, 2, …, d; by Bayes’ rule,
Thus
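where the subscript α indicates the α model assumption. Since the MLEs of πjk and of the conditional probabilities of disease class i given the test results (j, k), for j + k > 0, are the corresponding observed proportions, the closed-form MLE of niα under the α model is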
(3) |
which is essentially the same as equation (15) in Böhning and Patilea (2008) without the stability correction. In other words, the estimator obtained in equation (15) is the MLE under the α model assumption with the stability correction added.
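The α-model calculation can be sketched numerically. This is a minimal illustration under a stated assumption, not a reproduction of equation (3): it assumes that αi denotes the odds ratio between the two tests within disease class i. If that odds ratio is common to all classes, Bayes' rule makes P(D = i | T1 = 0, T2 = 0) proportional to n10i n01i / n11i once the observed proportions are plugged in. The function name, argument names, and the counts used below are hypothetical.

```python
import numpy as np

def alpha_model_mle(n11, n10, n01, n00):
    """Plug-in estimates under an assumed common odds ratio across classes.

    n11, n10, n01: per-class counts for test results (1,1), (1,0), (0,1).
    n00: total count with both tests negative (disease status unknown).
    Returns (n00_i, n_i, q_i): estimated class counts in the (0,0) cell,
    estimated class totals, and estimated class probabilities.
    """
    n11 = np.asarray(n11, dtype=float)
    n10 = np.asarray(n10, dtype=float)
    n01 = np.asarray(n01, dtype=float)
    if np.any(n11 <= 0):
        # No stability correction is applied in this sketch.
        raise ValueError("every n11 entry must be positive")

    # Assumption: constant odds ratio implies
    # P(D=i | T1=0, T2=0) is proportional to n10_i * n01_i / n11_i.
    w = n10 * n01 / n11
    p_i_given_00 = w / w.sum()

    n00_i = n00 * p_i_given_00        # split the (0,0) cell across classes
    n_i = n11 + n10 + n01 + n00_i     # estimated total per disease class
    q_i = n_i / n_i.sum()             # estimated class probabilities
    return n00_i, n_i, q_i

# Hypothetical counts for d = 2 disease classes (not data from any study).
n00_i, n_i, q_i = alpha_model_mle([20, 5], [10, 40], [8, 30], 1000)
print(n00_i, n_i, q_i)
```

The check `n11 > 0` makes visible why a stability correction can be useful in practice: a class with no doubly positive individuals leaves the ratio undefined.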
Under the θ model, θi is assumed to be identical for all i = 1, 2, …, d; by Bayes’ rule,
Thus and
where the subscript θ indicates the θ model assumption. Similarly, the closed-form MLE of niθ under the θ model is
(4) |
which is essentially the same as equation (10) in Böhning and Patilea (2008) without the stability correction.
As a byproduct of this alternative parameterization, we can test the difference between q̂iθ and q̂iα (or equivalently, the difference between n̂iθ and n̂iα) to infer whether the two assumptions yield significantly different predictions for the probability (or equivalently, the number) of individuals in a given disease class i. Although the formula for se(q̂iθ − q̂iα) is tedious, its numerical value can be obtained easily through statistical software using the delta method. We note that the estimated probabilities of disease classes under the α and θ models can differ significantly, and potentially meaningfully, within the same study. For example, in the Health Insurance Plan Study for breast cancer screening in New York (Strax, Venet, Shapiro, and Gross 1967), the estimated probability of having cancer under the α model is 4.8% with a 95% confidence interval (CI) of 0.3% to 9.3%, while the estimated probability under the θ model is 7.5% with a 95% CI of 2.8% to 12.2%. The difference is 2.7% (95% CI: 1.4% to 4.0%) with a p-value less than 0.001. This difference can have a substantial impact on cancer surveillance and prevention. Unfortunately, the data do not contain information to distinguish the α model from the θ model.
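The delta-method step can be sketched with a numerical gradient. The helper below is an illustration under stated assumptions, not the authors' computation: it treats the observed cell counts as a single multinomial sample, and it accepts any smooth scalar function of those counts. The function names, the layout of the count vector, and the example counts are hypothetical. The example function `q_alpha` additionally assumes that αi is the within-class odds ratio; the same helper would apply to the difference q̂iθ − q̂iα once a θ-model estimator is coded in the same way.

```python
import numpy as np

def delta_method_se(f, counts, rel_step=1e-5):
    """Approximate standard error of f(counts) for multinomial counts.

    f: smooth function mapping a float count vector to a scalar.
    counts: observed cell counts, treated as one multinomial sample.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts / n
    # Multinomial covariance matrix of the counts: n * (diag(p) - p p^T).
    cov = n * (np.diag(p) - np.outer(p, p))

    # Central-difference gradient of f with respect to each cell count.
    grad = np.empty_like(counts)
    for m in range(counts.size):
        h = rel_step * max(counts[m], 1.0)
        up, down = counts.copy(), counts.copy()
        up[m] += h
        down[m] -= h
        grad[m] = (f(up) - f(down)) / (2.0 * h)

    return float(np.sqrt(grad @ cov @ grad))

def q_alpha(counts, i):
    """Class-i probability under an assumed common odds ratio.

    Hypothetical layout: [n11_1..n11_d, n10_1..n10_d, n01_1..n01_d, n00].
    """
    counts = np.asarray(counts, dtype=float)
    d = (counts.size - 1) // 3
    n11, n10, n01 = counts[:d], counts[d:2 * d], counts[2 * d:3 * d]
    n00 = counts[-1]
    w = n10 * n01 / n11
    n_i = n11 + n10 + n01 + n00 * w / w.sum()
    return n_i[i] / counts.sum()

# Hypothetical counts for d = 2 classes (not data from any study).
cells = [20, 5, 10, 40, 8, 30, 1000]
se = delta_method_se(lambda c: q_alpha(c, 0), cells)
print(se)
```

A Wald-type interval or test for a difference of two estimators would then follow from passing the difference as `f`, since both estimators are functions of the same counts and the covariance between them is handled by the shared gradient.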
The alternative parameterization in (2) sheds light on maximum likelihood approaches in the setting considered here; the corresponding closed-form ML estimators under the α and θ models allow tests of the difference between the estimated probabilities of a specific disease class under the α versus the θ model. Our results complement the estimators obtained in equations (10) and (15) by Böhning and Patilea (2008) using a capture-recapture approach, and ensure the usual MLE properties.
Acknowledgments
Dr. Chu was supported in part by the Lineberger Cancer Center Core Grant CA16086 from the U.S. National Cancer Institute. The authors are very grateful to the editor for his helpful comments and suggestions.
Contributor Information
Haitao Chu, Department of Biostatistics and the Lineberger Comprehensive Cancer Center, The University of North Carolina, Chapel Hill, NC 27516 (Email: hchu@bios.unc.edu).
Lei Nie, Office of Biostatistics, Food and Drug Administration, Silver Spring, MD 20993 (Email: lei.nie@fda.hhs.gov).
References
- Böhning D, Patilea V. A capture-recapture approach for screening using two diagnostic tests with availability of disease status for the test positives only. Journal of the American Statistical Association. 2008;103:212–221. doi: 10.1198/016214508000000940.
- Lyles RH. A note on estimating crude odds ratios in case-control studies with differentially misclassified exposure. Biometrics. 2002;58:1034–1036. doi: 10.1111/j.0006-341x.2002.1034_1.x.
- Pepe MS, Janes H. Insights into latent class analysis of diagnostic test performance. Biostatistics. 2007;8:474–484. doi: 10.1093/biostatistics/kxl038.
- Satten GA, Kupper LL. Inferences About Exposure-Disease Associations Using Probability-Of-Exposure Information. Journal of the American Statistical Association. 1993;88:200–208.
- Strax P, Venet L, Shapiro S, Gross S. Mammography and Clinical Examination in Mass Screening for Cancer of the Breast. Cancer. 1967;20:2184–2188. doi: 10.1002/1097-0142(196712)20:12<2184::aid-cncr2820201217>3.0.co;2-3.