2020 Dec 2;6:60. doi: 10.1038/s41522-020-00160-w

Fig. 3. Box plot of residuals between true sampling fraction and its estimate for each sample.

In the box plot, the lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles), and the median is shown as a solid line within the box. The upper whisker extends from the upper hinge to the largest value no further than 1.5 times the interquartile range (IQR, the distance between the first and third quartiles) from that hinge; the lower whisker extends from the lower hinge to the smallest value no further than 1.5 times the IQR from that hinge. Data beyond the ends of the whiskers are called “outlying” points. N = 90 samples were examined over three study groups (denoted by circles, crosses, and triangles, with 30 samples per group), and the data points are overlaid on each box. Each facet title indicates the normalization method, with its variance given in parentheses. The microbial absolute abundances in the ecosystem are generated from the log-normal distribution. When residuals are compared across groups, an ideal box plot should be narrow in height (i.e., show small variability), and samples from different groups should be intermixed without any systematic separation. We note that all existing methods have larger variances than ANCOM-BC, and TSS has the largest variance. For all methods except ANCOM-BC, UQ, and TMM, the circles, crosses, and triangles are systematically separated in the plot, which indicates that ELib-UQ, ELib-TMM, CSS, MED, and TSS do not account for systematic bias due to differences in sampling fractions across groups.
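
The hinge and whisker rules described in the caption can be made concrete with a minimal Python sketch. It is illustrative only and is not the code used to produce Fig. 3: the function name `box_plot_stats`, the toy `residuals` values, and the use of NumPy's default (linear-interpolation) percentile convention are all assumptions introduced here.

```python
import numpy as np

def box_plot_stats(values, whisker=1.5):
    """Hinges, whisker ends, and outlying points for one box (illustrative sketch)."""
    x = np.sort(np.asarray(values, dtype=float))

    # Hinges are the first and third quartiles; the median sits between them.
    # Assumption: NumPy's default linear-interpolation percentiles.
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1

    # Fences lie 1.5 * IQR beyond each hinge.
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr

    # Whiskers end at the most extreme data values still inside the fences.
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    outliers = x[(x < lower_fence) | (x > upper_fence)]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": outliers,
    }

# Hypothetical residuals (true sampling fraction minus its estimate), toy values only.
residuals = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 2.5]
stats = box_plot_stats(residuals)
print(stats["whisker_low"], stats["whisker_high"], stats["outliers"])
```

In this toy input, the value 2.5 lies more than 1.5 times the IQR above the upper hinge, so it is reported as an outlying point and the upper whisker stops at 0.3. Plotting libraries may use slightly different quartile conventions, so hinge values can differ marginally from those shown in a rendered figure.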