2021 May 25;22:265. doi: 10.1186/s12859-021-04193-6

Fig. 2.


Effect of taxonomic data filtering on pairwise concordances and proportion of genera detected as differentially abundant. Differential abundance testing and calculation of pairwise concordances were performed again for 133 genera in dataset 1 and 195 genera in dataset 2 after filtering out genera that were found in < 10% of samples. Column a: for each method, the distributions of pairwise concordances and the proportion of genera detected as differentially abundant for dataset 1 (top row), dataset 2 (middle row), and for DA signatures that replicated across datasets (bottom row). Column b: the relationship between pairwise concordances and the proportion of genera detected as differentially abundant. Each dot in the boxplots represents a method, plotted according to the concordance it had with the method on the x-axis (22 dots for each method). The bottom, middle, and top boundaries of each box in the boxplots represent the first, second (median), and third quartiles of the concordances. The whiskers (lines extending from the top and bottom of the box and ending in a horizontal cap) extend to points within 1.5 times the interquartile range. The points extending above the whiskers are outliers. Red circles indicate the mean concordance for a method. The horizontal red line indicates the mean concordance for either dataset 1, dataset 2, or replicated signatures. Values above the box and whiskers are the differences in mean concordance between filtered and unfiltered (Fig. 1) data. Values above the bars in bar plots are the differences in proportion of differentially abundant genera between filtered and unfiltered (Fig. 1) data. For dot plots, each concordance value was plotted against the proportion of genera deemed differentially abundant by a method, and a linear trend line (black solid line) was fitted to the data. The grey area surrounding the trend line is the 95% CI of the fitted line. Pearson’s correlation coefficient (r) and corresponding P value (P) were calculated for each dot plot to test the strength of the relationship.
Concordances: pairwise concordances for each method; Proportion DA: proportion of genera detected as differentially abundant (DA) by a method; GLM: generalized linear model; CLR: centered log-ratio; KW: Kruskal–Wallis; TSS: total sum scaling (relative abundances); rCLR: robust centered log-ratio transformation with matrix completion; RLE: relative log expression; TMM: trimmed mean of M-values; NBZI: negative binomial zero-inflated
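Two of the computations named in this caption, the 10% prevalence filter and Pearson's r, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the toy counts, and the choice to treat a genus as "found" in a sample when its count is greater than zero are assumptions made for illustration. The sketch keeps genera present in exactly 10% of samples, since the caption removes only those found in < 10%. The P value and the 95% CI of the trend line are not computed here.

```python
from math import sqrt


def filter_by_prevalence(counts, min_prevalence=0.10):
    """Keep genera detected in at least `min_prevalence` of samples.

    `counts` maps a genus name to its list of per-sample counts.
    Assumption: a genus is "found" in a sample when its count is > 0.
    """
    kept = {}
    for genus, values in counts.items():
        prevalence = sum(1 for v in values if v > 0) / len(values)
        if prevalence >= min_prevalence:  # genera below 10% are dropped
            kept[genus] = values
    return kept


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: 3 genera across 20 samples (not from the study).
toy_counts = {
    "GenusA": [3] * 20,            # found in 20/20 samples
    "GenusB": [1] + [0] * 19,      # found in 1/20 samples (5%), removed
    "GenusC": [2, 4] + [0] * 18,   # found in 2/20 samples (10%), kept
}

filtered = filter_by_prevalence(toy_counts)
print(sorted(filtered))
```

In the figure, the inputs to the correlation would be each method's concordance values and its proportion of genera detected as differentially abundant; `pearson_r` accepts any two numeric sequences of equal length.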