TABLE 1.
Random forest regression modeling of amplicon data using features collapsed at different taxonomic levels^a
Amplicon | Season(s) | Most accurate level | Range of MAEs | Top five important features | Range of importance |
---|---|---|---|---|---|
16S rRNA | Spring and summer (“combined”) | ASV | 793.33–851.41 | Phyllobacteriaceae, Defluvibacter, Corynebacterium, Shinella, Devosia | 0.040–0.023 |
16S rRNA | Spring | ASV | 872.02–1,074.76 | Gallicola, Cellulosimicrobium, Brachybacterium, Comamonas, Leucobacter | 0.075–0.042 |
16S rRNA | Summer | ASV | 723.98–853.38 | Phyllobacteriaceae, Sphingopyxis, Alcaligenaceae, Devosia, Pseudaminobacter | 0.014–0.010 |
18S rRNA | Spring and summer (“combined”) | 8 | 941.22–1,128.13 | Eurotiomycetes, Sordariomycetes, Metazoa, Saccharomycetes, Tremellomycetes | 0.067–0.033 |
18S rRNA | Spring | 5 | 1,025.53–1,443.86 | Mucoromycota, Metazoa, Vannellida, Eumetazoa, Dikarya | 0.102–0.047 |
18S rRNA | Summer | 7 | 820.67–1,083.95 | Nematoda, Saccharomycotina, BOLA868, Alveolata, Eumetazoa | 0.071–0.037 |
Model accuracy is assessed using mean absolute error (MAE). The range of MAEs resulting from modeling at all taxonomic levels is reported. The top five most important features within each model are listed from most to least important, as determined by the random forest regression. Note that some important features could not be classified all the way down to the taxonomic level at which the model was built (e.g., Metazoa). Underlined features include those commonly important between model types. Note that there are no commonly important features in the 18S rRNA models due to differences in the most accurate levels, whereas in the 16S rRNA models, all of the most accurate models were at the ASV level. MAEs for all levels are reported in Table S2.
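
The collapsing, scoring, and ranking steps named in the footnote can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the semicolon-delimited lineage strings, and all toy values are assumptions, and the random forest training itself is omitted.

```python
from collections import defaultdict


def collapse_to_level(counts, taxonomy, level):
    """Sum feature counts that share the same lineage down to `level` ranks."""
    collapsed = defaultdict(float)
    for feature_id, count in counts.items():
        ranks = [r.strip() for r in taxonomy[feature_id].split(";")]
        # A feature classified to fewer than `level` ranks keeps its shorter
        # lineage, which is why a coarse label can appear in a finer-level model.
        collapsed[";".join(ranks[:level])] += count
    return dict(collapsed)


def mean_absolute_error(observed, predicted):
    """Average of |observed - predicted| over all samples."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def top_features(importances, n=5):
    """Return the n (feature, importance) pairs with the highest importance."""
    return sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical toy data: three features with placeholder lineages.
toy_counts = {"f1": 10, "f2": 5, "f3": 2}
toy_taxonomy = {"f1": "k1;p1;c1", "f2": "k1;p1;c2", "f3": "k1;p2"}

level2 = collapse_to_level(toy_counts, toy_taxonomy, level=2)
mae = mean_absolute_error([100, 200, 300], [110, 180, 300])
ranked = top_features({"a": 0.040, "b": 0.023, "c": 0.030}, n=2)
```

In this toy case, `f3` is classified to only two ranks, so it keeps its two-rank label even when collapsing at level 3, mirroring the footnote's remark about features such as Metazoa.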