Table 1.
Program<sup>a</sup> | Reference | Target | Implementation | Method Description | Advantages and Limitations | Available from |
---|---|---|---|---|---|---|
OLGenie | This study | Protein-coding sequence | Perl | Estimates dN/dS by introducing three modifications to Wei–Zhang: 1) minimal overlapping units of 6 nt, that is, 1 reference codon and 2 alternate codons; 2) the Nei–Gojobori method; and 3) only single nucleotide differences rather than all mutational pathways | Fast; applicable to multiple sequence alignments; tree-agnostic; conservative for purifying selection and high levels of divergence, but nonconservative for positive selection; loss of power for pairwise distance >0.1 and neighboring variants | https://github.com/chasewnelson/OLGenie, last accessed April 10, 2020. |
“Frameshift” | Schlub et al. (2018) | Protein-coding sequence | R | Finds ORFs longer than expected by chance given nucleotide context; includes two complementary methods: “codon permutation” and “synonymous mutation” | Medium to high accessibility as an R script requiring minor modifications. Can only detect relatively long OLGs. Slow for long sequences. | https://github.com/TimSchlub/Frameshift, last accessed April 10, 2020. |
“StopStatistics” | Cassan et al. (2016) | Protein-coding sequence | Python, bash | Tests for depletion of those stop codons in sas12 that would be synonymous in the reference gene; also applicable to enrichment of start codons | Low accessibility; scripts specific to particular data sets | https://figshare.com/s/9668ef62e84488d4787a, last accessed April 10, 2020. |
FRESCo | Sealfon et al. (2015) | Constraint at synonymous sites | HYPHY batch language | Rates of nucleotide evolution across an alignment inferred using a maximum-likelihood model. Models of neutral and nonneutral evolution tested in sliding windows to infer regions with excess synonymous constraint | Suitable for short genomes/regions despite using a codon model; requires a phylogenetic tree; performs best at deep sequence coverage and increased sequence divergence | https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-015-0603-7/MediaObjects/13059_2015_603_MOESM1_ESM.zip, last accessed April 10, 2020. |
Wei–Zhang method | Wei and Zhang (2015) | Protein-coding sequence | Perl | Estimates dN/dS in minimal-length coding regions flanked by variant-free codons (i.e., data-dependent minimal overlapping units) to determine the effects of all mutational pathways in the reference and alternate genes using the modified Nei–Gojobori method | Accurate but slow, especially for highly diverged sequences; tree-agnostic; outperforms Sabath et al. method (according to Wei and Zhang [2015]); only implemented for pairs of sequences; low accessibility and scalability | http://www.umich.edu/~zhanglab/download/Xinzhu_GBE2014/index.htm, last accessed April 10, 2020. |
Synplot2 | Firth (2014) | Constraint at synonymous sites | C++; Web-interface | Evolution at synonymous sites in a codon alignment compared to a null model of neutral evolution in order to infer sites with excess constraint; expected diversity at synonymous sites is set equal to diversity over the full alignment, and diversity is measured between sequential pairs around a phylogenetic tree | Medium accessibility; fast; limited use in the case of sas12; requires a phylogenetic tree; does not distinguish between coding and noncoding overlapping features | http://guinevere.otago.ac.nz/cgi-bin/aef/synplot.pl, last accessed April 10, 2020; http://www.firthlab.path.cam.ac.uk/SynPlot2.zip, last accessed April 10, 2020. |
KaKi (“Multilayer”) | Rubinstein et al. (2011) | Unexpected variation at synonymous sites | C++ | Maximum-likelihood codon model approach that allows variation in both the synonymous and nonsynonymous substitution rates along a sequence; accounting for variability in the baseline substitution rate allows more reliable inference of positive selection | Low accessibility (requires an old Linux distribution to install); requires a phylogenetic tree; complex input and results; focus of explicit testing is on positive selection; applicable (but not specific) to protein-coding OLGs. | https://www.tau.ac.il/~talp/multilayer.tar.gz, last accessed April 10, 2020; https://www.tau.ac.il/~talp/readme.txt, last accessed April 10, 2020. |
Sabath et al. method | Sabath et al. (2008) | Protein-coding sequence | MATLAB | Maximum-likelihood framework for estimating dN/dS; similar to the (nonimplemented) method of Pedersen and Jensen (2001) | Slower than Wei–Zhang; not recommended for highly similar sequences (pairwise distance <0.08); similar to OLGenie in the use of 6 nt (“sextet”) units; only implemented for pairs of sequences; low accessibility and scalability | http://nsmn1.uh.edu/dgraur/Software.html, last accessed April 10, 2020. |
MLOGD | Firth and Brown (2006) | Protein-coding sequence | C++ | Simple statistics on properties of sequence variation by codon position, and a maximum-likelihood statistic (MLOGD) taking into account nucleotide and amino acid substitution rates and codon usage | Less sensitive at detecting OLGs than Synplot2 (according to Firth [2014]); requires a minimum of ∼20 independent nucleotide variants; sas12 frame generates false positives. | http://guinevere.otago.ac.nz/aef/MLOGD/software.html, last accessed April 10, 2020. |
<sup>a</sup>Programs are listed in descending order by year of publication; methods lacking implementations at active URLs are not listed.
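
The 6 nt unit described for OLGenie (1 reference codon and 2 alternate codons, with only single nucleotide differences considered) can be illustrated with a minimal Python sketch. This is not OLGenie's code, which is written in Perl. The function name `classify_sextet`, the `ref_offset` parameter, and the two-letter labels (effect in the reference codon first, effect in the alternate codon second) are assumptions made for illustration. The sketch handles only same-strand overlaps, uses the standard genetic code, and treats a change to or from a stop codon as nonsynonymous, which a real Nei–Gojobori implementation may handle differently.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effect(codon, mutant):
    """Return 'S' if the two codons encode the same residue, else 'N'."""
    return "S" if CODON_TABLE[codon] == CODON_TABLE[mutant] else "N"


def classify_sextet(sextet, ref_offset=1):
    """Classify all single-nucleotide changes at the reference-codon sites.

    The 6-nt unit holds two alternate-frame codons (positions 0-2 and 3-5)
    and one reference codon starting at ref_offset (1 or 2, hypothetical
    parameter). Each of the 9 possible changes is labeled by its effect in
    the reference codon followed by its effect in the alternate codon.
    """
    sextet = sextet.upper()
    if len(sextet) != 6 or set(sextet) - set(BASES):
        raise ValueError("sextet must be 6 unambiguous DNA bases")
    if ref_offset not in (1, 2):
        raise ValueError("ref_offset must be 1 or 2")

    counts = {"NN": 0, "NS": 0, "SN": 0, "SS": 0}
    ref = slice(ref_offset, ref_offset + 3)
    for pos in range(ref_offset, ref_offset + 3):
        # The alternate codon containing this site.
        alt = slice(0, 3) if pos < 3 else slice(3, 6)
        for base in BASES:
            if base == sextet[pos]:
                continue
            mutant = sextet[:pos] + base + sextet[pos + 1:]
            label = effect(sextet[ref], mutant[ref]) + effect(sextet[alt], mutant[alt])
            counts[label] += 1
    return counts


print(classify_sextet("AAAAAA"))
# {'NN': 7, 'NS': 1, 'SN': 1, 'SS': 0}
```

Because each unit has a fixed length and only 9 single-nucleotide changes are examined, the cost per unit is constant, which is consistent with the speed advantage listed for OLGenie relative to enumerating all mutational pathways.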