Table 2. Results from a forward, stepwise model selection of factors influencing microbial community beta-diversity.
| Data type | Distance metric | Factor | Adjusted R² | df | AIC | F | p-value |
|---|---|---|---|---|---|---|---|
| 16S | Unweighted UniFrac | Sample type | 0.87 | 24 | -556.59 | 172.97 | 0.0002 |
| | | Host identity | 0.01 | 30 | -583.89 | 2.85 | 0.0002 |
| | | Extraction protocol | 0.001 | 2 | -588.47 | 3.92 | 0.004 |
| | Weighted UniFrac | Sample type | 0.76 | 24 | -165.42 | 79.55 | 0.0002 |
| | | Host identity | 0.06 | 30 | -320.67 | 7.83 | 0.0002 |
| | | Extraction protocol | 0.001 | 2 | -323.72 | 3.21 | 0.02 |
| | Jaccard | Sample type | 0.89 | 24 | -651.49 | 206.18 | 0.0002 |
| | | Host identity | 0.02 | 30 | -756.85 | 5.76 | 0.0002 |
| | | Extraction protocol | 0.001 | 2 | -762.48 | 4.40 | 0.0008 |
| | RPCA | Sample type | 0.86 | 24 | -495.50 | 154.16 | 0.0002 |
| | | Host identity | 0.03 | 30 | -619.04 | 6.49 | 0.0002 |
| | | Extraction protocol | 0.001 | 2 | -625.14 | 4.61 | 0.0002 |
| Metagenomics | Unweighted UniFrac | Sample type | 0.93 | 26 | -958.24 | 317.60 | 0.0002 |
| | | Host identity | 0.01 | 31 | -1062.60 | 5.57 | 0.0002 |
| | | Extraction protocol | 0.001 | 2 | -1067.53 | 4.08 | 0.0006 |
| | Weighted UniFrac | Sample type | 0.87 | 26 | -602.92 | 173.32 | 0.0002 |
| | | Host identity | 0.02 | 31 | -676.11 | 4.42 | 0.0002 |
| | | Extraction protocol | 0.003 | 2 | -693.97 | 10.09 | 0.0002 |
| | Jaccard | Sample type | 0.94 | 26 | -1084.87 | 391.42 | 0.0002 |
| | | Host identity | 0.01 | 31 | -1217.42 | 6.67 | 0.0002 |
| | RPCA | Sample type | 0.85 | 26 | -496.04 | 143.29 | 0.0002 |
| | | Host identity | 0.03 | 31 | -620.86 | 6.36 | 0.0002 |
| | | Extraction protocol | 0.005 | 2 | -645.41 | 13.24 | 0.0002 |
Values are based on permutation tests of the variation explained by redundancy analysis, performed separately for each of four distance metrics on both the 16S and metagenomics data. The full model included bead-beating time (i.e., 2 vs 20 min), sample biomass (i.e., high vs low biomass), sample type, host subject identity, and extraction protocol (i.e., MagMAX 2-min, MagMAX 20-min, PowerSoil) as model variables. The 16S data were rarefied, as noted for Figure 3. Metagenomics data were rarefied to 17,000 host- and quality-filtered reads per sample or, when using RPCA distances, had samples with fewer than 17,000 reads excluded (n = 647 samples). Rarefaction depths were selected to retain at least 75% of samples from both the high- and low-biomass datasets.
AIC: Akaike information criterion; df: degrees of freedom; RPCA: Robust principal component analysis.
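
The rarefaction step described in the table notes can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `rarefy`, the toy feature table, and the random seed are hypothetical, and only the depth of 17,000 reads comes from the notes above. Samples below the target depth are excluded, and the remaining samples are subsampled without replacement to exactly that depth.

```python
import random
from collections import Counter

DEPTH = 17_000  # rarefaction depth stated in the table notes


def rarefy(counts, depth, rng):
    """Subsample one sample's feature counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, so the
    caller can exclude it.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one label per read, then draw `depth` reads.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    drawn = rng.sample(reads, depth)
    return dict(Counter(drawn))


# Hypothetical toy feature table: sample ID -> {feature: read count}.
table = {
    "sample_A": {"taxon_1": 30_000, "taxon_2": 5_000},
    "sample_B": {"taxon_1": 9_000, "taxon_2": 1_000},  # only 10,000 reads
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
rarefied = {}
for sample_id, counts in table.items():
    result = rarefy(counts, DEPTH, rng)
    if result is not None:  # samples below the depth are excluded
        rarefied[sample_id] = result
```

In this toy input, `sample_B` is dropped because it has fewer than 17,000 reads, while `sample_A` is retained with exactly 17,000 reads. Expanding counts into individual reads is simple but memory-hungry, so a real feature table would normally be rarefied with a dedicated library routine.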