Table 3.
Organism | Tech | Type | NCBI accession | Size (Mbp) | Time (CPU s) | LCA | Best hit |
---|---|---|---|---|---|---|---|
E. coli K12 MG1655 | MiSeq | Assembly | (SPAdes) | 4.6 | 2.45 | Entero. | E. coli K12 MG1655 |
E. coli K12 MG1655 | PacBio | Assembly | GCA_000801205 | 4.6 | 2.66 | Entero. | E. coli K12 MG1655 |
E. coli DH1 | ABI 3730 | Reads | (Trace Archive) | 60 | 17.08 | Entero. | E. coli DH1 |
E. coli K12 MG1655 | 454 | Reads | SRR797242 | 233 | 57.12 | Entero. | E. coli K12 MG1655 |
E. coli K12 MG1655 | Ion PGM | Reads | SRR515925 | 407 | 72.01 | E. coli | E. coli K12 1655 |
E. coli K12 MG1655 | MiSeq | Reads | SRR1770413 | 387 | 72.01 | Entero. | E. coli KLY |
E. coli K12 MT203 | HiSeq | Reads | SRR490124 | 2155 | 369.86 | E. coli | E. coli GCF_000833635 |
E. coli K12 MG1655 | PacBio | Reads | SRR1284073 | 397 | 77.96 | E. coli | E. coli XH140A GCF_000226585 |
E. coli K12 MG1655 | MinION | 1D | ERR764952..55 | 248 | 55.52 | Entero. | E. coli O113 H21 |
E. coli K12 MG1655 | MinION | 2D | ERR764952..55 | 134 | 27.82 | E. coli | E. coli GCF_000953515 |
B. anthracis Ames | MinION | 1D + 2D | SRR2671867 | 210 | 44.66 | B. anthracis | B. anthracis str. Carbosap |
B. cereus ATCC 10987 | MinION | 1D + 2D | SRR2671868 | 266 | 76.85 | B. cereus ATCC 10987 | B. cereus ATCC 10987 |
Zaire ebolavirus | MinION | 1D + 2D | ERR1050070 | 8.7 | 2.06 | Zaire ebolavirus | Zaire ebolavirus Mayinga |
In all cases, Mash search required 21 MB of RAM for genome assemblies and 209 MB of RAM for sequencing runs (due to the additional Bloom filter overhead). Organism: source strain. Tech: sequencing technology (ABI 3730, 454 GS FLX, Illumina MiSeq, Illumina HiSeq, Ion PGM, PacBio RSII, or Oxford Nanopore MinION). Type: assembly, reads, or 1D and 2D nanopore reads. NCBI accession: NCBI accession of the dataset or reads. The SPAdes [63] assembly was derived from the MiSeq reads. Size: total dataset size in Mbp. LCA: lowest common ancestor classification, based on the NCBI taxonomy, of the hits within a significance tolerance of the best hit. In several cases, the LCA is at the family level (Enterobacteriaceae, abbreviated Entero.) due to significant Mash hits to both E. coli and S. sonnei. This is a known species naming conflict within the NCBI taxonomy, with some genomes sharing ANI >98% between these species. Best hit: reports the smallest significant distance reported