2021 Jan 29;70(3):149–159. doi: 10.2144/btn-2020-0153

Table 2. Results from a forward, stepwise model selection of factors influencing microbial community beta-diversity.

| Data type | Distance metric | Factor | Adjusted R² | df | AIC | F | p-value |
|---|---|---|---|---|---|---|---|
| 16S | Unweighted UniFrac | Sample type | 0.87 | 24 | -556.59 | 172.97 | 0.0002 |
| | | Host identity | 0.01 | 30 | -583.89 | 2.85 | 0.0002 |
| | | Extraction protocol | 0.001 | 2 | -588.47 | 3.92 | 0.004 |
| | Weighted UniFrac | Sample type | 0.76 | 24 | -165.42 | 79.55 | 0.0002 |
| | | Host identity | 0.06 | 30 | -320.67 | 7.83 | 0.0002 |
| | | Extraction protocol | 0.001 | 2 | -323.72 | 3.21 | 0.02 |
| | Jaccard | Sample type | 0.89 | 24 | -651.49 | 206.18 | 0.0002 |
| | | Host identity | 0.02 | 30 | -756.85 | 5.76 | 0.0002 |
| | | Extraction protocol | 0.001 | 2 | -762.48 | 4.40 | 0.0008 |
| | RPCA | Sample type | 0.86 | 24 | -495.50 | 154.16 | 0.0002 |
| | | Host identity | 0.03 | 30 | -619.04 | 6.49 | 0.0002 |
| | | Extraction protocol | 0.001 | 2 | -625.14 | 4.61 | 0.0002 |
| Metagenomics | Unweighted UniFrac | Sample type | 0.93 | 26 | -958.24 | 317.60 | 0.0002 |
| | | Host identity | 0.01 | 31 | -1062.60 | 5.57 | 0.0002 |
| | | Extraction protocol | 0.001 | 2 | -1067.53 | 4.08 | 0.0006 |
| | Weighted UniFrac | Sample type | 0.87 | 26 | -602.92 | 173.32 | 0.0002 |
| | | Host identity | 0.02 | 31 | -676.11 | 4.42 | 0.0002 |
| | | Extraction protocol | 0.003 | 2 | -693.97 | 10.09 | 0.0002 |
| | Jaccard | Sample type | 0.94 | 26 | -1084.87 | 391.42 | 0.0002 |
| | | Host identity | 0.01 | 31 | -1217.42 | 6.67 | 0.0002 |
| | RPCA | Sample type | 0.85 | 26 | -496.04 | 143.29 | 0.0002 |
| | | Host identity | 0.03 | 31 | -620.86 | 6.36 | 0.0002 |
| | | Extraction protocol | 0.005 | 2 | -645.41 | 13.24 | 0.0002 |

Values are based on permutation tests of variation explained by redundancy analysis, performed separately for each of four distance metrics for both 16S and metagenomics data. The full model included bead-beating time (i.e., 2 vs 20 min), sample biomass (i.e., high vs low biomass), sample type, host subject identity and extraction protocol (i.e., MagMAX 2-min, MagMAX 20-min, PowerSoil) as model variables. The 16S data were rarefied, as noted for Figure 3. Metagenomics data were rarefied to 17,000 host- and quality-filtered reads per sample; when using RPCA distances, samples with fewer than 17,000 reads were instead excluded (n = 647 samples). Rarefaction depths were selected to retain at least 75% of samples from both the high- and low-biomass datasets.
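The rarefaction step described in this note can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`rarefy`, `rarefy_table`, `fraction_retained`), the toy counts and the depth of 10 reads are all hypothetical and chosen only for illustration. Each sample is subsampled without replacement to a fixed depth, samples below that depth are dropped, and the fraction of samples retained at a candidate depth is reported. For RPCA distances the note describes only the exclusion step, which corresponds to the depth check here.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample one sample's feature counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, so the caller
    can exclude it.
    """
    if sum(counts) < depth:
        return None
    # Expand the counts into one entry per read, labelled by feature index.
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    rarefied = [0] * len(counts)
    for feature in rng.sample(pool, depth):
        rarefied[feature] += 1
    return rarefied


def rarefy_table(table, depth, seed=0):
    """Rarefy every sample in `table` (sample id -> counts); drop shallow samples."""
    rng = random.Random(seed)
    kept = {}
    for sample_id, counts in table.items():
        result = rarefy(counts, depth, rng)
        if result is not None:
            kept[sample_id] = result
    return kept


def fraction_retained(table, depth):
    """Fraction of samples with at least `depth` reads."""
    return sum(1 for c in table.values() if sum(c) >= depth) / len(table)


# Hypothetical toy data: three samples, four features, depth of 10 reads.
toy = {"s1": [8, 5, 0, 2], "s2": [3, 1, 1, 0], "s3": [10, 10, 10, 10]}
rarefied = rarefy_table(toy, depth=10)
retained = fraction_retained(toy, depth=10)
```

In this toy table, sample `s2` has only 5 reads and is excluded, so `retained` is 2/3. A depth would be acceptable under the criterion in the note only if this fraction were at least 0.75 in both the high- and low-biomass datasets.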

AIC: Akaike information criterion; df: degrees of freedom; RPCA: Robust principal component analysis.
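The adjusted R² values in Table 2 penalize the raw proportion of variation explained for the number of model terms. The source does not state which formula or software was used; the sketch below assumes the Ezekiel adjustment that is commonly applied to redundancy analysis, R²adj = 1 - (1 - R²)(n - 1)/(n - m - 1), where n is the number of samples and m the number of model degrees of freedom. The function name and the example inputs are hypothetical and are not values from the table.

```python
def adjusted_r2(r2, n_samples, n_params):
    """Ezekiel-adjusted R2 (assumed formula, not confirmed by the source).

    r2        : raw proportion of variation explained (0 to 1)
    n_samples : number of samples in the analysis
    n_params  : model degrees of freedom (number of fitted parameters)
    """
    residual_df = n_samples - n_params - 1
    if residual_df <= 0:
        raise ValueError("need more samples than model parameters plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / residual_df


# Hypothetical inputs chosen for easy arithmetic, not taken from Table 2.
example = adjusted_r2(r2=0.50, n_samples=101, n_params=10)
```

With these inputs the adjustment lowers the raw value of 0.50 to 1 - 0.5 × 100/90, about 0.44, showing how a factor with many levels (large df) is penalized relative to one with few.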