Clin Microbiol Rev. 2017 Aug 30;30(4):1015–1063. doi: 10.1128/CMR.00016-17

TABLE 4.

Performance analysis of comparative genomics toolsᵃ

| Analysis tool (reference[s]) | Concept | Method | Run time (h) | Topology score (%) | Web address(es) | Input type(s) | Input format(s) | Output format(s) |
|---|---|---|---|---|---|---|---|---|
| Web based | | | | | | | | |
| PubMLST (158) | Web-accessible database in which cgMLST and wgMLST analyses can be run | cgMLST/wgMLST | NA | NA | https://pubmlst.org/ | Contigs | FASTA | cgMLST/wgMLST profile |
| CSI Phylogeny 1.4 (161) | High-quality SNP method based on reference mapping of reads, with quality assessment of both the mapping and the SNP calling | Reference-based SNP | ND | ND | https://cge.cbs.dtu.dk/services/CSIPhylogeny/ | Raw sequences, contigs | FASTA, FASTQ | ND |
| NDtree 1.2 (161) | Creates k-mers from reads and maps them to a reference; applies a simple model to estimate the number of SNPs | Statistical method | 3–3.5ᵇ | ND | https://cge.cbs.dtu.dk/services/NDtree/ | Raw sequences | FASTQ | Newick |
| Command line | | | | | | | | |
| kSNP3 (154, 155) | Uses k-mer analysis to detect SNPs between strains without a multiple-sequence alignment or a reference genome | Non-reference-based SNP | 0.5ᶜ | 91.80–95.80ᶜ,ᵉ | https://sourceforge.net/projects/ksnp/ | Raw sequences, contigs | FASTA | Newick, MSA |
| Roary (169) | Tool for constructing pangenomes from contigs | Pangenome | 4.30ᵈ | 100ᵈ | https://sanger-pathogens.github.io/Roary/ | Contigs | GFF3 | FASTA, TXT, CSV, Rtab |
| Pan-Seqᶠ (175) | Pangenome assembler with an additional locus finder for core/accessory gene allele profiles | Pangenome | ND | ND | https://github.com/chadlaing/Panseq, https://lfz.corefacility.ca/panseq/ | Contigs | FASTA | TXT, FASTA |
| Lyve-SET (179) | High-quality SNP method based on reference mapping of reads, with quality assessment of both the mapping and the SNP calling | Reference-based SNP | 6.25ᶜ | 85ᶜ | https://github.com/lskatz/lyve-SET | Raw sequences, contigsᵍ | FASTA, FASTQ | Matrix, FASTA, Newick, VCF |
| SPANDx (182) | Complete workflow that builds SNP/indel matrices and locus presence/absence matrices from raw sequencing reads produced by a range of NGS technologies | Reference-based SNP | 3.1ᶜ | 100ᶜ | https://sourceforge.net/projects/spandx/ | Raw sequences | FASTA, FASTQ | NEXUS |
ᵃ All quantitative performance measures were taken from previously reported data, as indicated. ND, no data; NA, not applicable; MSA, multiple-sequence alignment; GFF3, General Feature Format 3; VCF, variant call format.

ᵇ Based on 46 VTEC genomes (20).

ᶜ Based on 21 E. coli genomes (167).

ᵈ Wall time for 1,000 S. enterica serovar Typhi genomes (169).

ᵉ Using core SNPs.

ᶠ A Web-based version is also available.

ᵍ Contigs are converted into simulated reads.
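The kSNP3 entry describes SNP detection from k-mers without a multiple-sequence alignment or a reference genome. The Python sketch below illustrates that general idea under simplifying assumptions. It is not kSNP3's implementation, and the function name `kmer_snps` is hypothetical. Each odd-length k-mer is split into a "context" (the flanking bases with the central base masked) and its central base. A context that occurs in several genomes but carries different central bases marks a candidate SNP locus. The sketch assumes single-strand input (reverse complements are ignored), a toy k value, and no frequency or quality filtering.

```python
from collections import defaultdict


def kmer_snps(genomes, k=5):
    """Alignment-free, reference-free SNP detection from k-mers (simplified).

    genomes: dict of genome name -> DNA string (one strand only; reverse
             complements are ignored in this sketch).
    k:       odd k-mer length; the central base is the candidate SNP site.

    Returns {context: {genome: base}}, where a context is a k-mer with its
    central base replaced by '.', kept only when genomes disagree at that base.
    """
    if k < 3 or k % 2 == 0:
        raise ValueError("k must be an odd integer >= 3")
    mid = k // 2
    # context -> genome -> set of central bases seen with that context
    contexts = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip k-mers containing N or other ambiguity codes
            context = kmer[:mid] + "." + kmer[mid + 1:]
            contexts[context][name].add(kmer[mid])

    snps = {}
    for context, by_genome in contexts.items():
        # Drop contexts that carry more than one central base within a single
        # genome (e.g. repeats): no single allele can be assigned there.
        if any(len(bases) > 1 for bases in by_genome.values()):
            continue
        alleles = {name: next(iter(bases)) for name, bases in by_genome.items()}
        # Candidate SNP locus: the same context has different central bases.
        if len(set(alleles.values())) > 1:
            snps[context] = alleles
    return snps


if __name__ == "__main__":
    demo = {"A": "ACGTACGTTT", "B": "ACGTTCGTTT"}
    print(kmer_snps(demo, k=5))  # {'GT.CG': {'A': 'A', 'B': 'T'}}
```

In the toy example, the two sequences differ at a single position. Only the 5-mer centered on that position yields a shared context (`GT.CG`) with conflicting central bases, so no alignment or reference is needed to find it. Production tools must also handle reverse complements, sequencing errors, and much longer k values before building a tree from the resulting SNP matrix.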