Abstract
Suppose that we wish to know the probability that an object belongs to a class. For example, we may wish to estimate the probability that a patient has a particular disease, given a set of symptoms, or the probability that a novel peptide binds to a receptor, given the peptide's amino-acid composition. The conventional approach is first to use a classification algorithm to partition feature space and assign each partition to a class, and then to estimate the conditional probabilities as the proportions of patients or peptides in each partition that are correctly and incorrectly classified. Unfortunately, this estimation method often gives probability estimates that are in error by 20% or more, and can therefore lead to incorrect decisions. We have implemented and compared alternative methods. In Monte Carlo simulations, the alternative methods are substantially more accurate than the conventional method.
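The conventional estimator described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The feature space is assumed to be one-dimensional, the "classifier" is a hypothetical set of cut points (`cut_points`) that split it into intervals, and each interval is assigned to its majority class. The reported probability for a partition is the fraction of training objects in it that are correctly classified. The function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def assign_partition(x, cut_points):
    """Return the index of the interval containing x.

    Hypothetical 1-D partition: sorted cut points split the line into
    len(cut_points) + 1 intervals, numbered from 0.
    """
    return bisect_right(cut_points, x)


def conventional_estimates(xs, ys, cut_points):
    """Conventional per-partition probability estimate (sketch).

    Each partition is assigned its majority class; the estimated
    probability that an object in the partition belongs to that class is
    the fraction of training objects in the partition classified correctly.
    Returns {partition_index: (assigned_class, estimated_probability)}.
    """
    counts = defaultdict(Counter)
    for x, y in zip(xs, ys):
        counts[assign_partition(x, cut_points)][y] += 1

    result = {}
    for part, class_counts in sorted(counts.items()):
        n = sum(class_counts.values())
        label, n_correct = class_counts.most_common(1)[0]
        result[part] = (label, n_correct / n)
    return result


xs = [0.1, 0.2, 0.4, 0.6, 0.7, 0.9]
ys = [0, 0, 1, 1, 1, 0]
print(conventional_estimates(xs, ys, cut_points=[0.5]))
# {0: (0, 0.6666666666666666), 1: (1, 0.6666666666666666)}
```

Here each interval contains three training points, two of which match its majority class, so the conventional method reports a probability of 2/3 for every object that falls in either interval, regardless of where in the interval it lies.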