
2023 Jul 14;4(7):100790. doi: 10.1016/j.patter.2023.100790


© 2023 The Author(s)

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Illustrations of different cases of binary classification under group underrepresentation

Circles and crosses denote the two possible outcomes (values of $y$). Blue (majority) and red (minority) denote the two patient groups of interest. The variables $x_{1}$ and $x_{2}$ denote model inputs.

(A) Group underrepresentation is not problematic if the same decision boundary is optimal for all groups.

(B) If the optimal decision boundaries differ between groups, and either the model or the input data are not sufficiently expressive to capture the optimal decision boundaries for all groups simultaneously, standard (empirical risk minimizing) learning approaches will optimize for performance in the majority group (here, the blue group).

(C) An expressive model could learn a decision boundary (red) that is optimal for both groups. In practice, however, it is unclear whether a training procedure will indeed identify this optimal boundary. This is due to inductive biases,¹⁹ local optimization schemes, and limited dataset size for the minority groups, all combined with standard empirical risk minimization, which prioritizes optimizing performance for the majority group.
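The situation in panel (B) can be reproduced with a small synthetic experiment. The sketch below is illustrative only and is not taken from the article: the group sizes, the uniform input distribution, the labeling rules (the majority outcome depends on $x_{1}$, the minority outcome on $x_{2}$), and the helper names `make_group`, `fit_logistic`, and `accuracy` are all assumptions made for this example. A linear logistic model is fit by minimizing the pooled empirical risk, in which each group contributes in proportion to its share of the samples, and accuracy is then reported per group.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_group(n, axis):
    """Hypothetical group: inputs uniform on [-1, 1]^2, outcome set by one axis."""
    x = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = (x[:, axis] > 0).astype(float)
    return x, y

# Assumed setup: the majority outcome depends on x1, the minority outcome on x2,
# so no single linear boundary is optimal for both groups.
x_maj, y_maj = make_group(2000, axis=0)
x_min, y_min = make_group(100, axis=1)

# Pool the data without a group indicator, as a standard ERM pipeline would.
X = np.vstack([x_maj, x_min])
y = np.concatenate([y_maj, y_min])

def add_bias(X):
    """Append a constant column so the model has an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient descent on the mean logistic loss over all samples."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    """Fraction of samples on the correct side of the learned boundary."""
    pred = (add_bias(X) @ w) > 0
    return float(np.mean(pred == (y > 0.5)))

w = fit_logistic(X, y)
acc_maj = accuracy(w, x_maj, y_maj)
acc_min = accuracy(w, x_min, y_min)
print(f"majority accuracy: {acc_maj:.3f}")
print(f"minority accuracy: {acc_min:.3f}")
```

Under these assumptions, the learned boundary stays close to the majority group's boundary, so accuracy for the minority group falls near chance. Changing the group sizes shifts the boundary accordingly, which is one way to see how the pooled objective weights each group.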