Biostatistics. 2022 Sep 5;25(1):188–202. doi: 10.1093/biostatistics/kxac035

Fig. 1.

The [symbol] discordance metric varies as a function of the proportion of within-cluster distances, which in turn depends on the group balance. We randomly sampled 1000 observations with 500 features from a two-component mixture distribution, in which each observation comes from the first component with a given probability and from the second component otherwise. The two components have (a,b) no mean difference (a "null" setting), (c,d) a small mean difference, and (e,f) a large mean difference. We simulate data with (a,c,e) balanced groups (mixing probability 0.5) and (b,d,f) imbalanced groups (mixing probability 0.9). For each simulation, the top row shows the observations from each group along the first two principal components (PCs), and the bottom row shows histograms of the within- and between-cluster (Euclidean) distances for the balanced and imbalanced groups. Refer to Figure S1 of the Supplementary material available at Biostatistics online for an illustration of the relationship between the proportion of within-cluster distances and the group balance, and to Section 2.4.1 for the explicit relationship. For each simulation, the bottom row also reports [symbol] and the two discordance metrics, [symbol] and [symbol]. Generally, values close to zero indicate more concordance, while larger values indicate more discordance.
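Why the proportion of within-cluster distances depends on group balance can be seen from a standard counting argument (our own notation and sketch, which may differ in form from the expression in Section 2.4.1). Writing N for the number of observations, n1 and n2 for the two group sizes, π for the mixing probability and α for the proportion of within-cluster distances, there are n1(n1 − 1)/2 + n2(n2 − 1)/2 within-cluster pairs out of N(N − 1)/2 total, so α = [n1(n1 − 1) + n2(n2 − 1)] / [N(N − 1)]. With n1 ≈ πN and n2 ≈ (1 − π)N this tends to π² + (1 − π)² as N grows, i.e. about 0.50 for balanced groups and about 0.82 for π = 0.9. The hypothetical helper `within_fraction` below uses fixed group sizes round(πN) rather than randomly drawn ones.

```python
def within_fraction(n: int, pi: float) -> float:
    """Proportion of the n*(n-1)/2 pairwise distances that are within-group,
    for two groups of sizes round(pi * n) and n - round(pi * n)."""
    n1 = round(pi * n)
    n2 = n - n1
    # Within-group pairs: pairs inside group 1 plus pairs inside group 2
    within = n1 * (n1 - 1) // 2 + n2 * (n2 - 1) // 2
    return within / (n * (n - 1) // 2)


for pi in (0.5, 0.9):
    # Exact count for n = 1000 next to the large-n limit pi**2 + (1 - pi)**2
    print(pi, within_fraction(1000, pi), pi**2 + (1 - pi) ** 2)
```

Imbalance raises α because the larger group contributes a quadratically growing number of within-group pairs.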