Author manuscript; available in PMC: 2024 Jul 8.

Published in final edited form as: IEEE Winter Conf Appl Comput Vis. 2021 Jun 14;2021:325–334. doi: 10.1109/wacv48630.2021.00037

Table 4:

Average pairwise model similarity scores of true positive, false positive, and all positive (foreground) predictions for the different ensemble methods in the myocardium segmentation experiment. Lower values indicate more diversity. A good ensemble should show high diversity in its errors (e.g. false positives) but low diversity in its correct predictions.

Method	True Positive	False Positive	All Positive

Baseline Ensemble	0.926	0.760	0.898
Low Prec Ensemble (β=0.95)	0.971	0.644	0.842
Low Prec Ensemble (random β)	0.974	0.621	0.818
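As a rough illustration of how scores of this kind could be computed, the sketch below averages a pairwise overlap measure across ensemble members, separately for true positive, false positive, and all positive voxels. The caption does not state which similarity measure was used, so the Dice overlap here is an assumption, and the names `dice` and `average_pairwise_similarity` are hypothetical helpers rather than the authors' code.

```python
from itertools import combinations

import numpy as np


def dice(a, b):
    """Dice overlap of two boolean masks (assumed similarity measure).

    Returns 1.0 when both masks are empty, so that two models that
    agree on "nothing here" count as identical.
    """
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def average_pairwise_similarity(predictions, ground_truth):
    """Mean similarity over all model pairs for TP, FP, and all positives.

    predictions: sequence of binary masks, one per ensemble member
                 (at least two are needed to form a pair).
    ground_truth: binary mask of the same shape.
    """
    gt = np.asarray(ground_truth, dtype=bool)
    preds = [np.asarray(p, dtype=bool) for p in predictions]

    # Restrict each model's foreground to the subset of interest.
    subsets = {
        "true_positive": [p & gt for p in preds],
        "false_positive": [p & ~gt for p in preds],
        "all_positive": preds,
    }
    return {
        name: float(np.mean([dice(a, b) for a, b in combinations(masks, 2)]))
        for name, masks in subsets.items()
    }


# Toy example: two models that agree on the target but err in different places.
gt = [1, 1, 0, 0]
model_a = [1, 1, 1, 0]
model_b = [1, 1, 0, 1]
scores = average_pairwise_similarity([model_a, model_b], gt)
```

In this toy case the two models share all their true positives (similarity 1.0) and none of their false positives (similarity 0.0), which is the pattern the caption describes as desirable: agreement on correct predictions, diversity in errors.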