2017 Mar 13;8:375. doi: 10.3389/fmicb.2017.00375

Table 1.

Features of Lyve-SET.

| Feature | Description | Lyve-SET | kSNP | RealPhy | SNP-Pipeline | SNVPhyl |
|---|---|---|---|---|---|---|
| Repeat detection | Detection of repeat elements that could confound SNP results | 0ᵃ | 0 | 0 | 0 | 1ᵃ |
| Auto-choose reference or reference-free | Independence of a reference genome or a user-defined reference genome to find SNPs | 0 | 1 | 1 | 0 | 0 |
| Removal of distant genomes | Removal of genomes from analysis when they are greater than a certain threshold of SNPs | 0 | 0 | 0 | 1 | 0 |
| Phage detection | Detection and masking of phages | 1 | 0 | 0 | 0 | 0 |
| Cliff detection | Detection and masking of cliffs | 1 | 0 | 0 | 0 | 0 |
| SNP cluster detection | Detection and masking of clustered SNPs | 1 | 0ᵇ | 0 | 1 | 1 |
| Read cleaning | Cleaning and trimming of raw reads | 1 | 0 | 0 | 0 | 0 |
| BAM file for each individual genome | Standardized BAM files that describe the locations of mapped reads | 1 | 0 | 1 | 1 | 1 |
| VCF file for each individual genome | Standardized VCF files that describe the locations of SNPs and evidence supporting them | 1 | 1 | 1 | 1 | 1 |
| Pooled VCF file | Standardized VCF file that describes the locations of all SNPs for all genomes in a single file. This file is created with the bcftools merge command | 1 | 0 | 0 | 1 | 0 |
| Fasta alignment of all sites | Standardized fasta file of all sites across the reference genome, whether they are invariant or SNP sites | 1 | 0 | 1 | 0 | 1 |
| Fasta alignment of SNPs | Standardized fasta file of SNP sites | 1 | 1 | 1 | 1 | 1 |
| Standardized tree file | File representing the phylogeny in a standardized format, e.g., Newick | 1 | 1 | 1 | 0 | 1 |
| Settings for different species | Does the pipeline have customizable settings for different species? Lyve-SET has customized settings using the --presets flag (Table 2) | 1 | 0 | 0 | 0 | 0 |
| Audit trail: repeatability | Displays the path to the SNP pipeline installation and the exact command to repeat the analysis. Lyve-SET provides the command and all explicit and implicit options | 1 | 0 | 0 | 1 | 1 |
| Automated quality control | Reviews the analysis results and describes low-quality results. This quality control can be a review of the length of the multiple sequence alignment, the number of positions masked in each genome, or simply reviewing something minor like the insert length of each genome. Lyve-SET encompasses this quality control step in set_diagnose.pl | 1 | 0 | 0 | 1 | 1 |
ᵃ Although Lyve-SET does not have repeat detection, it does not allow the short-read mapper to place reads where they map equally well in two locations, i.e., repeat regions. SNVPhyl can perform the same function but also straightforwardly identifies repeat regions in the reference genome.

ᵇ Although kSNP does not have SNP cluster detection directly, its fundamental algorithm prohibits any two SNPs from occurring within k-1 bp of each other, where k is the length of the k-mer. For example, with a k-mer length of 5, two SNPs must occur at least 4 bp from each other.
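
The spacing rule in footnote b can be illustrated with a short Python sketch. It is not kSNP code; the function name `snps_too_close` and the example positions are hypothetical, and the sketch only encodes the rule as stated in the footnote (adjacent SNPs must be at least k-1 bp apart).

```python
def snps_too_close(positions, k):
    """Return adjacent SNP position pairs that are closer than k-1 bp.

    Illustrative only: encodes the footnote's rule, not kSNP's algorithm.
    """
    min_gap = k - 1  # minimum allowed distance between two SNPs
    ordered = sorted(positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < min_gap]


# With k = 5, SNPs must be at least 4 bp apart.
# 10 and 14 are exactly 4 bp apart (allowed); 14 and 16 are only 2 bp apart.
print(snps_too_close([10, 14, 16, 30], k=5))  # [(14, 16)]
```

Under this reading, a pair such as (14, 16) could not both be reported by a k-mer approach with k = 5, which is why the table marks kSNP with a footnoted 0 rather than a plain 0.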

Features of Lyve-SET are shown alongside those of the other SNP pipelines compared in this study. “1” indicates that the feature is present; “0” indicates that the feature is absent. A comparison of software-level features, e.g., command-line vs. web interface, has already been performed in Petkau et al. (2016).
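
For readers who want to query Table 1 programmatically, the presence/absence values can be encoded as a small lookup, sketched below in Python. The values are transcribed from the table, with the footnoted entries (a, b) flattened to plain 0 or 1. The names `PIPELINES`, `FEATURES`, `pipelines_with`, and `unique_to` are hypothetical helpers and are not part of any of the pipelines.

```python
# Column order follows Table 1.
PIPELINES = ["Lyve-SET", "kSNP", "RealPhy", "SNP-Pipeline", "SNVPhyl"]

# 1 = feature present, 0 = feature absent (footnote nuances are not encoded).
FEATURES = {
    "Repeat detection": (0, 0, 0, 0, 1),
    "Auto-choose reference or reference-free": (0, 1, 1, 0, 0),
    "Removal of distant genomes": (0, 0, 0, 1, 0),
    "Phage detection": (1, 0, 0, 0, 0),
    "Cliff detection": (1, 0, 0, 0, 0),
    "SNP cluster detection": (1, 0, 0, 1, 1),
    "Read cleaning": (1, 0, 0, 0, 0),
    "BAM file for each individual genome": (1, 0, 1, 1, 1),
    "VCF file for each individual genome": (1, 1, 1, 1, 1),
    "Pooled VCF file": (1, 0, 0, 1, 0),
    "Fasta alignment of all sites": (1, 0, 1, 0, 1),
    "Fasta alignment of SNPs": (1, 1, 1, 1, 1),
    "Standardized tree file": (1, 1, 1, 0, 1),
    "Settings for different species": (1, 0, 0, 0, 0),
    "Audit trail: repeatability": (1, 0, 0, 1, 1),
    "Automated quality control": (1, 0, 0, 1, 1),
}


def pipelines_with(feature):
    """List the pipelines marked 1 for the given feature."""
    return [name for name, flag in zip(PIPELINES, FEATURES[feature]) if flag]


def unique_to(pipeline):
    """List the features that only the given pipeline is marked 1 for."""
    idx = PIPELINES.index(pipeline)
    return [f for f, flags in FEATURES.items()
            if flags[idx] == 1 and sum(flags) == 1]


print(pipelines_with("Pooled VCF file"))  # ['Lyve-SET', 'SNP-Pipeline']
print(unique_to("Lyve-SET"))
```

The second call lists the rows in which Lyve-SET is the only pipeline marked 1: phage detection, cliff detection, read cleaning, and settings for different species.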