2017 Mar 13;8:375. doi: 10.3389/fmicb.2017.00375

Table 1.

Features of Lyve-SET.

Feature	Description	Lyve-SET	kSNP	RealPhy	SNP-Pipeline	SNVPhyl
Repeat detection	Detection of repeat elements that could confound SNP results	0^a	0	0	0	1^a
Auto-choose reference or reference-free	Independence of a reference genome or a user-defined reference genome to find SNPs	0	1	1	0	0
Removal of distant genomes	Removal of genomes from the analysis when they exceed a given SNP threshold	0	0	0	1	0
Phage detection	Detection and masking of phages	1	0	0	0	0
Cliff detection	Detection and masking of cliffs	1	0	0	0	0
SNP cluster detection	Detection and masking of clustered SNPs	1	0^b	0	1	1
Read cleaning	Cleaning and trimming of raw reads	1	0	0	0	0
BAM file for each individual genome	Standardized BAM files that describe the locations of mapped reads	1	0	1	1	1
VCF file for each individual genome	Standardized VCF files that describe the locations of SNPs and evidence supporting them	1	1	1	1	1
Pooled VCF file	Standardized VCF file that describes the locations of all SNPs for all genomes in a single file. This file is created with the `bcftools merge` command	1	0	0	1	0
Fasta alignment of all sites	Standardized fasta file of all sites across the reference genome, whether they are invariant or SNP sites	1	0	1	0	1
Fasta alignment of SNPs	Standardized fasta file of SNP sites	1	1	1	1	1
Standardized tree file	File representing the phylogeny in a standardized format, e.g., Newick	1	1	1	0	1
Settings for different species	Does the pipeline have customizable settings for different species? Lyve-SET has customized settings using the `--presets` flag (Table 2)	1	0	0	0	0
Audit trail: repeatability	Displays the path to the SNP pipeline installation and the exact command to repeat the analysis. Lyve-SET provides the command and all explicit and implicit options	1	0	0	1	1
Automated quality control	Reviews the analysis results and describes low-quality results. This quality control can be a review of the length of the multiple sequence alignment, the number of positions masked in each genome, or simply reviewing something minor like the insert length of each genome. Lyve-SET encompasses this quality control step in `set_diagnose.pl`	1	0	0	1	1

^a Although Lyve-SET does not have repeat detection, it does not allow the short-read mapper to place reads where they map equally well in two locations, i.e., repeat regions. SNVPhyl can perform the same function but also straightforwardly identifies repeat regions in the reference genome.

^b Although kSNP does not have SNP cluster detection directly, its fundamental algorithm prohibits any two SNPs from occurring within k-1 bp of each other, where k is the length of the kmer. For example, with a kmer length of 5, two SNPs must be at least 4 bp apart.
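The spacing rule in this footnote can be illustrated with a minimal sketch. It is not kSNP code; the function name `snps_too_close` and the example positions are hypothetical. It only checks whether a set of SNP positions satisfies the "at least k-1 bp apart" condition as the footnote states it.

```python
def snps_too_close(positions, k):
    """Return adjacent SNP position pairs that are closer than k-1 bp.

    Illustrative only: this applies the spacing rule as stated in the
    footnote (minimum distance of k-1 bp). It is not part of kSNP.
    """
    min_gap = k - 1
    ordered = sorted(positions)
    # Compare each position with its nearest neighbour to the right.
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < min_gap]


# With k = 5, SNPs must be at least 4 bp apart.
# Positions 14 and 16 are only 2 bp apart, so they violate the rule.
print(snps_too_close([10, 14, 16, 30], k=5))  # → [(14, 16)]
```

An empty list means every pair of neighbouring SNPs respects the minimum spacing.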

Features of Lyve-SET are shown alongside those of the other SNP pipelines compared in this study. “1” indicates that the feature is present; “0” indicates that the feature is absent. A comparison of software-level features, e.g., command-line vs. web interface, has already been performed in Petkau et al. (2016).
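The "Pooled VCF file" row states only that the pooled file is created with `bcftools merge`; the exact options Lyve-SET passes are not given here. As a rough sketch, the helper below assembles one plausible invocation without running it. The function name `bcftools_merge_command` and the file names are hypothetical, and the sketch assumes the per-genome VCFs are already bgzip-compressed and indexed, which `bcftools merge` requires.

```python
import shlex


def bcftools_merge_command(vcf_paths, output_path):
    """Build (but do not run) a `bcftools merge` command line.

    Assumption: inputs are bgzip-compressed, indexed VCF files. The options
    chosen here are illustrative, not necessarily those used by Lyve-SET.
    """
    if len(vcf_paths) < 2:
        raise ValueError("merging needs at least two per-genome VCF files")
    return [
        "bcftools", "merge",
        "--output-type", "z",      # write compressed VCF
        "--output", output_path,   # pooled output file
        *vcf_paths,
    ]


# Hypothetical per-genome VCF files.
cmd = bcftools_merge_command(["genomeA.vcf.gz", "genomeB.vcf.gz"], "pooled.vcf.gz")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a system where `bcftools` is installed.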