Author manuscript; available in PMC: 2015 Jun 23. Published in final edited form as: Am Stat. 2014 May 20;68(2):125–126. doi: 10.1080/00031305.2014.882867

Vardeman, S. B. and Morris, M. D. (2013), “Majority Voting by Independent Classifiers can Increase Error Rates,” The American Statistician, 67, 94–96: Comment by Baker, Xu, Hu, and Huang and Reply

Stuart G Baker 1, Jian-Lun Xu 2, Ping Hu 3, Peng Huang 4
PMCID: PMC4477968  NIHMSID: NIHMS687457  PMID: 26113746

Vardeman and Morris (VM) found a counterexample to the assertion that a majority voting classifier always performs better than its independent component classifiers. VM's counterexample applies to independent classifiers, but biostatisticians are often more interested in conditionally independent classifiers. In biomedical studies, where the class is disease status, classifiers are inherently dependent simply because the positivity of any reasonable classifier depends on the presence or absence of disease. Conditional independence of classifiers, given disease status, could arise if the classifiers detect different biological phenomena, such as tissue abnormalities versus protein markers.

To explore how majority voting affects classification performance with conditionally independent classifiers, we investigated many examples (Figure 1). As we expected, we found that majority voting generally works quite well. However, we also found that conditional independence is not a sufficient condition to ensure that majority voting always leads to better classification performance than the individual classifiers.

Figure 1. Comparison of ROC curves for the majority voting classifier and conditionally independent component classifiers. The 45-degree line is included for reference.

Like VM, we considered two classes and component classifiers with identical classification performances. To measure classification performance we used receiver operating characteristic (ROC) curves, which play a central role in the evaluation of diagnostic and screening tests (Baker 2003; Pepe 2003). In accordance with a decision-theoretic view of ROC curves (Baker, Van Calster, and Steyerberg 2012), we restricted our investigation to ROC curves that are concave, namely with monotonically decreasing slopes from left to right. For a given cutpoint x of a score, let fpr(x) and tpr(x) denote the false positive and true positive rates of the component classifier. The ROC curve for the component classifier plots tpr(x) versus fpr(x). At a given cutpoint, the true positive rate for the majority voting classifier is the probability that all three or exactly two of the component classifiers yield true positives, namely tprM(x) = tpr(x)³ + 3 tpr(x)² {1 − tpr(x)}. Similarly, the false positive rate for the majority voting classifier is fprM(x) = fpr(x)³ + 3 fpr(x)² {1 − fpr(x)}. The ROC curve for the majority voting classifier plots tprM(x) versus fprM(x). We considered the following nine cases.
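As an illustration, the mapping from component rates to majority-vote rates can be sketched in a few lines of Python (the function name `majority_rates` is ours, not from the paper):

```python
def majority_rates(tpr, fpr):
    """Map a component classifier's (tpr, fpr) at one cutpoint to the
    rates of the majority vote of three conditionally independent
    component classifiers sharing that cutpoint."""
    # P(at least 2 of 3 votes positive) = p^3 + 3 p^2 (1 - p)
    majority = lambda p: p**3 + 3 * p**2 * (1 - p)
    return majority(tpr), majority(fpr)
```

For example, a component classifier with tpr = 0.9 and fpr = 0.1 yields majority-vote rates tprM = 0.972 and fprM = 0.028, illustrating the usual improvement from voting.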

Binormal

In this standard formulation of ROC curves (Pepe 2003), the score that determines classification follows a normal distribution with a different mean for each class. Here we set fpr(x) = 1 − ϕ(x; 0, 0.25) and tpr(x) = 1 − ϕ(x; 0.12, 0.25), where ϕ(x; μ, σ) denotes the normal cumulative distribution function with mean μ and standard deviation σ.
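A minimal sketch of this case using only the Python standard library (the helper names are ours):

```python
import math

def norm_cdf(x, mu, sigma):
    # Normal cumulative distribution function via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binormal_point(x):
    # Component rates at cutpoint x: class means 0 and 0.12, common sd 0.25
    fpr = 1.0 - norm_cdf(x, 0.0, 0.25)
    tpr = 1.0 - norm_cdf(x, 0.12, 0.25)
    return fpr, tpr
```

Sweeping x over a grid traces the component ROC curve; for instance, at x = 0 the curve passes through (0.5, ≈0.684).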

Mixture: VM example

Based on the online supplement of VM, the score that determines classification follows a mixture of normal distributions and gives rise to fpr(x) = 1−{(12/33) ϕ(x; 0, 0.25) +(21/33) ϕ(x; 1, 0.25)} and tpr(x) = 1−{(4/736) ϕ(x; 0, 0.25) +(732/736) ϕ(x; 1, 0.25)}.

Square root

The true positive rate is tpr = √fpr.

Constant odds ratio

In the constant odds ratio (OR) formulation, tpr (1 − fpr)/{(1 − tpr) fpr} = OR, which gives tpr = fpr OR/{1 + fpr (OR − 1)}. In this formulation an odds ratio of 3, which is large by epidemiological standards, implies poor classification performance (Pepe et al. 2004). Here we set OR = 10.
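The algebraic inversion above is easy to check numerically; a sketch (function name ours):

```python
def tpr_from_or(fpr, odds_ratio=10.0):
    # Solve tpr(1 - fpr) / {(1 - tpr) fpr} = OR for tpr
    return fpr * odds_ratio / (1.0 + fpr * (odds_ratio - 1.0))
```

At fpr = 0.5 with OR = 10 this gives tpr = 10/11 ≈ 0.909, and substituting back into the odds ratio reproduces OR = 10.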

Three-segment A

The three-segment ROC curve successively connects the points (0, 0), (a1, b1), (a2, b2), and (1, 1) with line segments using the formula tpr = (b1/a1) fpr if fpr ≤ a1; tpr = {(b2 − b1)/(a2 − a1)} (fpr − a1) + b1 if a1 < fpr ≤ a2; and tpr = {(1 − b2)/(1 − a2)} (fpr − a2) + b2 if fpr > a2. Here (a1, b1) = (0.3, 0.5) and (a2, b2) = (0.5, 0.7).

Three-segment B

For this three-segment ROC curve, (a1, b1) = (0.1, 0.5) and (a2, b2) = (0.75, 0.99).

Two-segment VM

The two-segment ROC curve successively connects the points (0, 0), (a, b), and (1, 1) with line segments using the formula tpr = (b/a) fpr if fpr ≤ a and tpr = {(1 − b)/(1 − a)} (fpr − a) + b if fpr > a. Let fc denote the binary outcome of component classifier c = 1, 2, 3 in Table 1 of VM, from which we computed a = pr(f1 = 1 | Y = 0) = pr(f2 = 1 | Y = 0) = pr(f3 = 1 | Y = 0) and b = pr(f1 = 1 | Y = 1) = pr(f2 = 1 | Y = 1) = pr(f3 = 1 | Y = 1). For classifier 1, pr(f1 = 1 | Y = y) = Σ_{j=0,1} Σ_{k=0,1} pr(f1 = 1, f2 = j, f3 = k, y) / Σ_{i=0,1} Σ_{j=0,1} Σ_{k=0,1} pr(f1 = i, f2 = j, f3 = k, y), and similarly for the other classifiers. This procedure gave (a, b) = (0.636, 0.995).

Two-segment A

In this two-segment ROC curve, (a, b) = (0.1, 0.50).

Two-segment B

In this two-segment ROC curve, (a, b) = (0.75, 0.99).

In six of these nine cases, the ROC curve for the majority voting classifier was superior to the ROC curves of the conditionally independent component classifiers over the entire range of false positive rates. However, to our surprise, in three cases (Two-Segment B, Two-Segment VM, Three-Segment B) the ROC curve for the component classifier was slightly greater (by at most 0.02, 0.01, and 0.003, respectively) than the ROC curve for the majority voting classifier over a small range of component false positive rates ([0.647, 0.718], [0.585, 0.613], and [0.670, 0.688], respectively). To better understand how these results arise, we derived a simple set of sufficient conditions covering part of these ranges for the two-segment cases (Appendix); a key quantity is the ratio b/a for a ≥ 1/2.
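The crossover for Two-Segment B can be reproduced numerically. The sketch below (helper names ours) compares, at the majority vote's false positive rate, the component ROC curve with the majority-vote true positive rate:

```python
def two_segment(fpr, a, b):
    # Two-segment ROC curve through (0, 0), (a, b), (1, 1)
    if fpr <= a:
        return (b / a) * fpr
    return b + (1.0 - b) / (1.0 - a) * (fpr - a)

def majority(p):
    # Majority vote of three conditionally independent classifiers
    return p**3 + 3 * p**2 * (1 - p)

def compare(fpr, a=0.75, b=0.99):
    """Return (tprC, tprM): the component ROC curve evaluated at fprM,
    and the majority-vote true positive rate, for Two-Segment B."""
    fpr_m = majority(fpr)
    tpr_m = majority(two_segment(fpr, a, b))
    tpr_c = two_segment(fpr_m, a, b)
    return tpr_c, tpr_m
```

At fpr = 0.66, inside the reported range, the component classifier wins (tprC ≈ 0.966 versus tprM ≈ 0.955); at fpr = 0.63, below 0.647, the majority vote wins.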

If one ROC curve is higher than another ROC curve over the entire range of false positives, the classifier for the former is clearly superior to that for the latter. How does one compare classifiers when the ROC curves cross? In the three cases discussed, the crossover was so slight as to make little difference in classification performance. VM noted “the importance of precision of language when saying what mathematical results mean in practice.” In this spirit, we mention a precise way to compare classifiers with crossing ROC curves based on an anticipated cost/benefit ratio for a practical application. Typically one class is positive for an event, such as disease, and one class is negative for the event. The optimal point on the ROC curve, which corresponds to the largest net benefit, occurs where the slope of the ROC curve equals {(cost of a false positive)/(benefit of a true positive)} × {(probability of no event)/(probability of event)} (Metz 1978; Baker and Kramer 2007). The decision-maker chooses the classifier whose optimal point on the ROC curve (which also determines the optimal cutpoint on the classifier) has the largest net benefit. When considering a range of cost-benefit ratios, analysts can more easily compare classifiers via decision curves (Vickers 2008) or relative utility curves (Baker 2009; Baker, Van Calster, and Steyerberg 2012).
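The operating-point choice described above can be sketched as follows; the parameter names (prevalence, benefit, cost) are ours, and the net benefit scaling is one common convention:

```python
def net_benefit(tpr, fpr, prevalence, benefit_tp, cost_fp):
    # Expected net benefit per subject at one ROC operating point
    return prevalence * tpr * benefit_tp - (1.0 - prevalence) * fpr * cost_fp

def best_operating_point(points, prevalence, benefit_tp, cost_fp):
    """Pick the (fpr, tpr) point with the largest net benefit. For a
    concave ROC curve this occurs where the curve's slope equals
    (cost_fp / benefit_tp) * ((1 - prevalence) / prevalence)."""
    return max(points, key=lambda p: net_benefit(p[1], p[0],
                                                 prevalence, benefit_tp, cost_fp))
```

For the square-root ROC curve with prevalence 1/2 and equal cost and benefit, the target slope is 1; because the slope of √fpr is 1/(2√fpr), the optimum is (0.25, 0.5).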

Appendix

For the two-segment ROC curve with point (a, b), we derive a set of five simple conditions that are sufficient for the component classifier to perform better than the majority voting classifier. We begin with Condition 1, fpr ≤ a, and Condition 2, fprM ≤ a. Conditions 1 and 2 imply tpr = (b/a) fpr and tprC = (b/a) fprM, where tprC is the true positive rate of the component classifier corresponding to false positive rate fprM. Condition 3, fpr > (3/2)/(1 + b/a), implies the superiority of the component classifier at fprM, namely tprC > tprM. We next need to specify the values of a and b for which these conditions can hold. Combining Conditions 1 and 3 gives b ≥ 3/2 − a; with the additional requirement that b ≥ a for a proper ROC curve, we obtain Condition 4, b ≥ max(a, 3/2 − a). Combining b ≤ 1 with Condition 4 implies Condition 5, a ≥ 1/2. Consider Two-Segment B. Because a = 0.75 and b = 0.99, Conditions 4 and 5 hold. Under Condition 2, fprM ≤ 0.75, which implies fpr ≤ 0.674, thereby satisfying Condition 1. Under Condition 3, fpr > 0.647. Consider Two-Segment VM. Because a = 0.636 and b = 0.995, Conditions 4 and 5 hold. Under Condition 2, fprM ≤ 0.636, which implies fpr ≤ 0.592, thereby satisfying Condition 1. Under Condition 3, fpr > 0.585.
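The Condition 3 threshold is easy to compute for both two-segment cases (function name ours):

```python
def condition3_threshold(a, b):
    """Condition 3 lower bound on fpr: the component classifier is
    superior at fprM once fpr > (3/2) / (1 + b/a), given that
    Conditions 1, 2, 4, and 5 also hold."""
    return 1.5 / (1.0 + b / a)
```

This gives thresholds of about 0.647 for Two-Segment B (a = 0.75, b = 0.99) and about 0.585 for Two-Segment VM (a = 0.636, b = 0.995).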

Contributor Information

Stuart G. Baker, National Cancer Institute

Jian-Lun Xu, National Cancer Institute.

Ping Hu, National Cancer Institute.

Peng Huang, Johns Hopkins Medical Institution.

References

  1. Baker SG. The Central Role of Receiver Operating Characteristic (ROC) Curves in Evaluating Tests for the Early Detection of Cancer. Journal of the National Cancer Institute. 2003;95:511–515. doi: 10.1093/jnci/95.7.511.
  2. Baker SG. Putting Risk Prediction in Perspective: Relative Utility Curves. Journal of the National Cancer Institute. 2009;101:1538–1542. doi: 10.1093/jnci/djp353.
  3. Baker SG, Kramer BS. Peirce, Youden, and Receiver Operating Characteristic Curves. The American Statistician. 2007;61:343–346.
  4. Baker SG, Van Calster B, Steyerberg EW. Evaluating a New Marker for Risk Prediction Using the Test Tradeoff: An Update. International Journal of Biostatistics. 2012;8:5. doi: 10.1515/1557-4679.1395.
  5. Metz CE. Basic Principles of ROC Analysis. Seminars in Nuclear Medicine. 1978;VIII:283–298. doi: 10.1016/s0001-2998(78)80014-2.
  6. Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford, UK: Oxford University Press; 2003.
  7. Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of the Odds Ratio in Gauging the Performance of a Diagnostic, Prognostic, or Screening Marker. American Journal of Epidemiology. 2004;159:882–890. doi: 10.1093/aje/kwh101.
  8. Vardeman SB, Morris MD. Majority Voting by Independent Classifiers Can Increase Error Rates. The American Statistician. 2013;67:94–96.
  9. Vickers AJ. Decision Analysis for the Evaluation of Diagnostic Tests, Prediction Models and Molecular Markers. The American Statistician. 2008;62:314–320. doi: 10.1198/000313008X370302.
