. 2022 Sep 5;10:e13821. doi: 10.7717/peerj.13821

Table 1. Summary of the six datasets. Each dataset is numbered and named, with a description, its intended use, and a reference.

Dataset	Name	Description	Intended use	Reference
1	Boston outbreak	A cohort of 63 samples from a real outbreak with three introductions, metagenomic approach	To understand the features of virus transmission during a real outbreak setting	Lemieux et al. (2021)
2	CoronaHiT rapid	A cohort of 39 samples prepared with an 18 h wet-lab protocol and sequenced on two platforms (Illumina vs MinION), amplicon-based approach	To verify that a bioinformatics pipeline finds virtually no differences between sequences from the same genome run on different platforms.	Baker et al. (2021)
3	CoronaHiT routine	A cohort of 69 samples prepared with a 30 h wet-lab protocol and sequenced on two platforms (Illumina vs MinION), amplicon-based approach	To verify that a bioinformatics pipeline finds virtually no differences between sequences from the same genome run on different platforms.	Baker et al. (2021)
4	VOI/VOC lineages	A cohort of 16 samples from 11 representative CDC-defined VOI/VOC^a lineages as of 05/30/2021, amplicon-based approach	To benchmark lineage-calling bioinformatics software, especially for VOI/VOCs.	This study
5	Non-VOI/VOC lineages	A cohort of 39 samples from representative non-VOI/VOC^a lineages, amplicon-based approach	To benchmark lineage-calling bioinformatics software, nonspecific to VOI/VOCs.	This study
6	Failed QC	A cohort of 24 samples that failed basic QC metrics, covering 8 possible failure scenarios, amplicon-based approach	To serve as controls to test bioinformatics QC cutoffs.	This study

Notes.

^a VOI, variant of interest; VOC, variant of concern.
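
As an illustration of the intended use of datasets 2 and 3, a cross-platform comparison could be sketched as below. This is a minimal sketch, not the method of any published pipeline. It assumes the two consensus sequences for a sample are already aligned to the same coordinates and have equal length. The function name `count_differences` and the toy sequences are hypothetical. Positions where either sequence carries an ambiguous base (N) or a gap are skipped, which is itself an assumption about how uncovered sites should be treated.

```python
def count_differences(seq_a: str, seq_b: str, ignore: str = "N-") -> int:
    """Count positions where two aligned consensus sequences disagree.

    Assumes both sequences share the same coordinates (equal length).
    Positions where either sequence has a character in `ignore` are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = set(ignore.upper())
    differences = 0
    # Compare case-insensitively, one aligned position at a time.
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a in skip or base_b in skip:
            continue  # uncovered or ambiguous site: not counted
        if base_a != base_b:
            differences += 1
    return differences


# Hypothetical toy consensus sequences for one sample on two platforms.
illumina = "ACGTNACGTA"
minion = "ACGTAACGTT"
print(count_differences(illumina, minion))
```

A pipeline that behaves as intended on these two datasets would be expected to yield counts at or near zero for each sample pair. The threshold that counts as "virtually no differences" is left to the user.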