Fig. 2. The bias introduced by cross-sample variations in sampling fractions.
The sampling fraction is defined as the ratio of the expected absolute abundance in a sample to the corresponding absolute abundance in the ecosystem; it can be estimated empirically by the ratio of library size to microbial load. Differences in sampling fractions across samples may introduce bias and inflate both the false positive and the false negative rates in differential abundance analysis. In this toy example, the microbial load in a unit volume of the ecosystem (e.g., a unit volume of gut) is 18 (12 red + 6 green) for subject A and 27 (18 red + 9 green) for subject B. However, the samples taken from subjects A and B have the same library size of 6 (4 red + 2 green), and therefore the same observed absolute abundances and the same relative abundances of the red and green taxa. Thus, one may mistakenly conclude that the red and green taxa are not differentially abundant, even though their absolute abundances differ between the two ecosystems. This false negative conclusion is caused by the difference in sampling fractions between the two samples: the sampling fraction is 6/18 = 3/9 for sample A and 6/27 = 2/9 for sample B. One can similarly construct examples that lead to a false positive conclusion. Thus, a normalization method must account for differences in sampling fractions to avoid such erroneous conclusions.
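The arithmetic of the toy example can be checked with a short Python sketch. It is an illustration only, not the normalization method itself: the variable and function names (`ecosystem`, `sample`, `sampling_fraction`, `relative_abundance`) are hypothetical, and it assumes the true microbial load is known, which is generally not the case in practice.

```python
from fractions import Fraction

# Counts from the toy example (names are illustrative assumptions).
# ecosystem: absolute abundance per unit volume; sample: observed counts.
ecosystem = {"A": {"red": 12, "green": 6}, "B": {"red": 18, "green": 9}}
sample = {"A": {"red": 4, "green": 2}, "B": {"red": 4, "green": 2}}


def sampling_fraction(subject):
    """Empirical sampling fraction: library size / microbial load."""
    library_size = sum(sample[subject].values())
    microbial_load = sum(ecosystem[subject].values())
    return Fraction(library_size, microbial_load)


def relative_abundance(counts):
    """Proportion of each taxon within one set of counts."""
    total = sum(counts.values())
    return {taxon: Fraction(n, total) for taxon, n in counts.items()}


fraction_by_subject = {s: sampling_fraction(s) for s in sample}

# Dividing observed counts by the sampling fraction recovers the
# ecosystem-level abundances (possible here only because the load is known).
rescaled = {
    s: {taxon: n / fraction_by_subject[s] for taxon, n in sample[s].items()}
    for s in sample
}

for s in sample:
    print(s, "sampling fraction:", fraction_by_subject[s])
    print(s, "relative abundance:", relative_abundance(sample[s]))
    print(s, "rescaled counts:", rescaled[s])
```

The observed counts and relative abundances are identical for the two subjects, yet the sampling fractions differ (1/3, i.e. 3/9, for A and 2/9 for B). Rescaling by the sampling fraction restores the difference in absolute abundance that the raw samples hide.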