2023 Apr 27;89(5):e02108-22. doi: 10.1128/aem.02108-22

TABLE 2.

Degree of overestimation due to intragenomic heterogeneity and of underestimation caused by insufficient interspecific variation for different 16S rRNA gene regions at the ASV (100% identity) and 97%-OTU levels

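| Identity threshold | 16S gene region | HIQ-T-NCBI^a: No. of sequences | HIQ-T-NCBI^a: No. of OTUs | HIQ-C-NCBI^a: No. of sequences | HIQ-C-NCBI^a: No. of OTUs | Overestimation (%)^b | Underestimation (%)^b |
|---|---|---|---|---|---|---|---|
| 100% | Full-length | 29,416 | 15,727 | 6,550 | 6,131 | 156.5 | 6.4 |
| 100% | V1–V2 | 29,459 | 10,287 | 6,562 | 5,433 | 89.3 | 17.2 |
| 100% | V1–V3 | 28,883 | 11,623 | 6,339 | 5,523 | 110.5 | 12.9 |
| 100% | V3 | 29,246 | 5,467 | 6,554 | 4,060 | 34.6 | 38.0 |
| 100% | V3–V4 | 29,994 | 8,079 | 6,866 | 5,325 | 51.7 | 22.4 |
| 100% | V4 | 29,903 | 5,593 | 6,829 | 4,341 | 28.8 | 36.4 |
| 100% | V4–V5 | 30,020 | 5,636 | 6,890 | 4,392 | 28.3 | 36.2 |
| 100% | V5–V7 | 29,058 | 7,333 | 6,402 | 4,823 | 52.0 | 24.7 |
| 100% | V6 | 26,669 | 3,816 | 5,713 | 2,998 | 27.3 | 47.5 |
| 100% | V6–V8 | 30,027 | 8,310 | 6,890 | 5,407 | 53.7 | 21.5 |
| 100% | V7–V9 | 26,179 | 6,474 | 5,875 | 4,374 | 48.0 | 25.6 |
| 97% | Full-length | 29,416 | 3,181 | 6,550 | 3,035 | 4.8 | 53.7 |
| 97% | V1–V2 | 29,459 | 4,074 | 6,562 | 3,647 | 11.7 | 44.4 |
| 97% | V1–V3 | 28,883 | 3,788 | 6,339 | 3,478 | 8.9 | 45.1 |
| 97% | V3 | 29,246 | 2,715 | 6,554 | 2,556 | 6.2 | 61.0 |
| 97% | V3–V4 | 29,994 | 2,794 | 6,866 | 2,663 | 4.9 | 61.2 |
| 97% | V4 | 29,903 | 2,284 | 6,829 | 2,186 | 4.5 | 68.0 |
| 97% | V4–V5 | 30,020 | 2,314 | 6,890 | 2,217 | 4.4 | 67.8 |
| 97% | V5–V7 | 29,058 | 2,511 | 6,402 | 2,384 | 5.3 | 62.8 |
| 97% | V6 | 26,669 | 2,989 | 5,713 | 2,623 | 14.0 | 54.1 |
| 97% | V6–V8 | 30,027 | 2,692 | 6,890 | 2,566 | 4.9 | 62.8 |
| 97% | V7–V9 | 26,179 | 2,114 | 5,875 | 2,025 | 4.4 | 65.5 |
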
^a HIQ-T-NCBI, the data set constructed considering intragenomic heterogeneity; HIQ-C-NCBI, the data set constructed ruling out intragenomic heterogeneity.

^b The overestimation rate was calculated as (A − B)/B × 100%, where A represents the number of OTUs from HIQ-T-NCBI and B represents the number of OTUs from HIQ-C-NCBI. The underestimation rate was calculated as (C − B)/C × 100%, where C is the number of sequences in HIQ-C-NCBI.
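
As a worked check of the two rates, the short Python sketch below recomputes them for the full-length, 100%-identity row. It assumes the formulas are (A − B)/B × 100 for overestimation and (C − B)/C × 100 for underestimation, which is what the tabulated values imply. The function and variable names are illustrative only and do not come from the original article.

```python
def overestimation_rate(otus_t: int, otus_c: int) -> float:
    """Percent overestimation: (A - B) / B * 100.

    A = number of OTUs in HIQ-T-NCBI, B = number of OTUs in HIQ-C-NCBI.
    """
    return (otus_t - otus_c) / otus_c * 100.0


def underestimation_rate(otus_c: int, seqs_c: int) -> float:
    """Percent underestimation: (C - B) / C * 100.

    B = number of OTUs in HIQ-C-NCBI, C = number of sequences in HIQ-C-NCBI.
    """
    return (seqs_c - otus_c) / seqs_c * 100.0


# Values copied from the full-length, 100%-identity row of the table.
A = 15_727  # OTUs in HIQ-T-NCBI
B = 6_131   # OTUs in HIQ-C-NCBI
C = 6_550   # sequences in HIQ-C-NCBI

over = round(overestimation_rate(A, B), 1)
under = round(underestimation_rate(B, C), 1)
print(f"overestimation = {over}%, underestimation = {under}%")
```

For this row the sketch gives 156.5% overestimation and 6.4% underestimation, matching the tabulated values. Other rows can be checked the same way, although a last-digit difference is possible where the published figures were rounded from unrounded intermediate values.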