Table 3.

Comparison of OTU picking approaches. The table shows when each OTU picking approach should be used and when it cannot be applied, and briefly describes the advantages and disadvantages of each.

	de novo	closed-reference	open-reference
Must use if	There is no reference sequence collection to cluster against (e.g. infrequently used marker gene)	Comparing non-overlapping amplicons. The reference set of sequences must span both of the regions being sequenced	-

Cannot use if	Comparing non-overlapping amplicons (e.g. V2 and V4 regions of 16S rRNA)	There is no reference sequence collection to cluster against (e.g. infrequently used marker gene)	Comparing non-overlapping amplicons (e.g. V2 and V4 regions of 16S rRNA); there is no reference sequence collection to cluster against (e.g. infrequently used marker gene)

Pros	All reads are clustered	Fast, as it is fully parallelizable (useful for extremely large datasets). Better tree and taxonomy quality, since the OTUs are already defined on the reference set	All reads are clustered. Fast, as it runs partially in parallel

Cons	Time-consuming, since it runs in serial	Inability to detect novel diversity with respect to the reference set: reads that do not hit the reference sequence collection are discarded, so the analysis focuses on the “already known” diversity. If the studied environment is not well characterized, a large fraction of the reads can be thrown away	Some steps are still performed in serial. If the dataset contains a lot of novel diversity with respect to the reference set, this can still be slow
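
The "Must use if" and "Cannot use if" rows can be read as a small decision rule. The Python sketch below is illustrative only: the function name and its two boolean parameters are hypothetical and do not belong to any OTU picking software. It encodes only the constraints listed in the table, not the speed or quality trade-offs.

```python
from typing import List


def allowed_otu_picking_approaches(has_reference: bool,
                                   non_overlapping_amplicons: bool) -> List[str]:
    """Return the OTU picking approaches the table does not rule out.

    has_reference: a reference sequence collection exists for the marker gene.
    non_overlapping_amplicons: the study compares non-overlapping amplicons
        (e.g. V2 and V4 regions of 16S rRNA).
    Both parameters are hypothetical names chosen for this sketch.
    """
    approaches = []
    # De novo cannot be used when comparing non-overlapping amplicons.
    if not non_overlapping_amplicons:
        approaches.append("de novo")
    # Closed-reference needs a reference sequence collection.
    if has_reference:
        approaches.append("closed-reference")
    # Open-reference needs a reference and overlapping amplicons.
    if has_reference and not non_overlapping_amplicons:
        approaches.append("open-reference")
    return approaches
```

When a single approach remains, it corresponds to the "Must use if" row: only de novo is left when there is no reference collection, and only closed-reference is left when the amplicons do not overlap. An empty result means the table offers no applicable approach for that combination.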