mSphere. 2021 Jul 14;6(4):e00455-21. doi: 10.1128/mSphere.00455-21

TABLE 1.

Random forest regression modeling of amplicon data using features collapsed at different taxonomic levels [a]

| Amplicon | Season(s) | Most accurate level | Range of MAEs | Top five important features | Range of importance |
| --- | --- | --- | --- | --- | --- |
| 16S rRNA | Spring and summer (“combined”) | ASV | 793.33–851.41 | Phyllobacteriaceae, Defluvibacter, Corynebacterium, Shinella, Devosia | 0.040–0.023 |
| 16S rRNA | Spring | ASV | 872.02–1,074.76 | Gallicola, Cellulosimicrobium, Brachybacterium, Comamonas, Leucobacter | 0.075–0.042 |
| 16S rRNA | Summer | ASV | 723.98–853.38 | Phyllobacteriaceae, Sphingopyxis, Alcaligenaceae, Devosia, Pseudaminobacter | 0.014–0.010 |
| 18S rRNA | Spring and summer (“combined”) | 8 | 941.22–1,128.13 | Eurotiomycetes, Sordariomycetes, Metazoa, Saccharomycetes, Tremellomycetes | 0.067–0.033 |
| 18S rRNA | Spring | 5 | 1,025.53–1,443.86 | Mucoromycota, Metazoa, Vannellida, Eumetazoa, Dikarya | 0.102–0.047 |
| 18S rRNA | Summer | 7 | 820.67–1,083.95 | Nematoda, Saccharomycotina, BOLA868, Alveolata, Eumetazoa | 0.071–0.037 |
[a] Model accuracy is assessed using mean absolute error (MAE). The range of MAEs obtained across all taxonomic levels modeled is reported. The top five features within each model are listed from most to least important, as determined by the random forest regression. Note that some important features could not be classified down to the taxonomic level at which the model was built (e.g., Metazoa). Underlined features are those commonly important between model types. There are no commonly important features in the 18S rRNA models due to differences in the most accurate levels, whereas in the 16S rRNA models, all of the most accurate models were at the ASV level. MAEs for all levels are reported in Table S2.
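
As an illustration of how the "Most accurate level" column relates to MAE, the following is a minimal sketch, not the authors' code. It assumes that the most accurate level is simply the one whose predictions give the lowest MAE. The function names, the level labels, and all numbers are hypothetical toy values and are not data from the study.

```python
from statistics import mean


def mean_absolute_error(observed, predicted):
    """Average of |observed - predicted| over paired samples."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one sample is required")
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def most_accurate_level(mae_by_level):
    """Return the (level, MAE) pair with the lowest MAE."""
    return min(mae_by_level.items(), key=lambda item: item[1])


# Toy values for illustration only; NOT data from the study.
observed = [100.0, 250.0, 400.0]
predictions_by_level = {
    "level_5": [150.0, 200.0, 500.0],
    "level_7": [110.0, 240.0, 430.0],
    "ASV": [90.0, 260.0, 410.0],
}

# One MAE per taxonomic level; the min and max of these values
# correspond to what the table calls the "range of MAEs".
mae_by_level = {
    level: mean_absolute_error(observed, preds)
    for level, preds in predictions_by_level.items()
}
best_level, best_mae = most_accurate_level(mae_by_level)

print(mae_by_level)
print(best_level, best_mae)
```

Because MAE is expressed in the units of the predicted variable, lower values indicate closer agreement between predictions and observations, which is why the minimum is taken here.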