2020 Apr 3;37(8):2440–2449. doi: 10.1093/molbev/msaa087

Table 1.

Methods with Available Implementations for Detecting Selection in Overlapping Genes.

| Program^a | Reference | Target | Implementation | Method Description | Advantages and Limitations | Available from |
| --- | --- | --- | --- | --- | --- | --- |
| OLGenie | This study | Protein-coding sequence | Perl | Estimates dN/dS by introducing three modifications to Wei–Zhang: 1) minimal overlapping units of 6 nt, that is, 1 reference codon and 2 alternate codons; 2) the Nei–Gojobori method; and 3) only single nucleotide differences rather than all mutational pathways | Fast; applicable to multiple sequence alignments; tree-agnostic; conservative for purifying selection and high levels of divergence, but nonconservative for positive selection; loss of power for pairwise distance >0.1 and neighboring variants | https://github.com/chasewnelson/OLGenie, last accessed April 10, 2020. |
| “Frameshift” | Schlub et al. (2018) | Protein-coding sequence | R | Finds ORFs longer than expected by chance given nucleotide context; includes two complementary methods: “codon permutation” and “synonymous mutation” | Medium to high accessibility as an R script requiring minor modifications. Can only detect relatively long OLGs. Slow for long sequences. | https://github.com/TimSchlub/Frameshift, last accessed April 10, 2020. |
| “StopStatistics” | Cassan et al. (2016) | Protein-coding sequence | Python, bash | Tests for depletion of those stop codons in sas12 that would be synonymous in reference; also applicable to enrichment of start codons | Low accessibility; scripts specific to particular data sets | https://figshare.com/s/9668ef62e84488d4787a, last accessed April 10, 2020. |
| FRESCo | Sealfon et al. (2015) | Constraint at synonymous sites | HYPHY batch language | Rates of nucleotide evolution across an alignment inferred using a maximum-likelihood model. Models of neutral and nonneutral evolution tested in sliding windows to infer regions with excess synonymous constraint | Suitable for short genomes/regions despite using a codon model; requires a phylogenetic tree; performs best at deep sequence coverage and increased sequence divergence | https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-015-0603-7/MediaObjects/13059_2015_603_MOESM1_ESM.zip, last accessed April 10, 2020. |
| Wei–Zhang method | Wei and Zhang (2015) | Protein-coding sequence | Perl | Estimates dN/dS in minimal-length coding regions flanked by variant-free codons (i.e., data-dependent minimal overlapping units) to determine the effects of all mutational pathways in the reference and alternate genes using the modified Nei–Gojobori method | Accurate but slow, especially for highly diverged sequences; tree-agnostic; outperforms Sabath et al. method (according to Wei and Zhang [2015]); only implemented for pairs of sequences; low accessibility and scalability | http://www.umich.edu/~zhanglab/download/Xinzhu_GBE2014/index.htm, last accessed April 10, 2020. |
| Synplot2 | Firth (2014) | Constraint at synonymous sites | C++; Web-interface | Evolution at synonymous sites in a codon alignment compared to a null model of neutral evolution in order to infer sites with excess constraint; expected diversity at synonymous sites is set equal to diversity over the full alignment, and diversity is measured between sequential pairs around a phylogenetic tree | Medium accessibility; fast; limited use in the case of sas12; requires a phylogenetic tree; does not distinguish between coding and noncoding overlapping features | http://guinevere.otago.ac.nz/cgi-bin/aef/synplot.pl, last accessed April 10, 2020; http://www.firthlab.path.cam.ac.uk/SynPlot2.zip, last accessed April 10, 2020. |
| KaKi (“Multilayer”) | Rubinstein et al. (2011) | Unexpected variation at synonymous sites | C++ | Maximum-likelihood codon model approach that allows variation in both the synonymous and nonsynonymous substitution rates along a sequence; accounting for variability in the baseline substitution rate allows more reliable inference of positive selection | Low accessibility (requires an old Linux distribution to install); requires a phylogenetic tree; complex input and results; focus of explicit testing is on positive selection; applicable (but not specific) to protein-coding OLGs. | https://www.tau.ac.il/~talp/multilayer.tar.gz, last accessed April 10, 2020; https://www.tau.ac.il/~talp/readme.txt, last accessed April 10, 2020. |
| Sabath et al. method | Sabath et al. (2008) | Protein-coding sequence | MATLAB | Maximum-likelihood framework for estimating dN/dS; similar to the (nonimplemented) method of Pedersen and Jensen (2001) | Slower than Wei–Zhang; not recommended for highly similar sequences (pairwise distance <0.08); similar to OLGenie in the use of 6 nt (“sextet”) units; only implemented for pairs of sequences; low accessibility and scalability | http://nsmn1.uh.edu/dgraur/Software.html, last accessed April 10, 2020. |
| MLOGD | Firth and Brown (2006) | Protein-coding sequence | C++ | Simple statistics on properties of sequence variation by codon position, and a maximum-likelihood statistic (MLOGD) taking into account nucleotide and amino acid substitution rates and codon usage | Less sensitive at detecting OLGs than Synplot2 (according to Firth [2014]); requires a minimum of ∼20 independent nucleotide variants; sas12 frame generates false-positives. | http://guinevere.otago.ac.nz/aef/MLOGD/software.html, last accessed April 10, 2020. |

^a Programs in descending order by year of publication; methods lacking implementations at active URLs are not listed.
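
The dN/dS-based entries in Table 1 (OLGenie, Wei–Zhang, Sabath et al.) depend on the fact that a single nucleotide change in an overlapping region can be synonymous or nonsynonymous in each of the two reading frames independently. The Python sketch below illustrates only that classification step for one 6-nt stretch. It is not the implementation of any listed program. The function names, the two-letter labels (reference effect followed by alternate effect), the use of the standard genetic code, and the choice of a same-strand alternate frame shifted by one nucleotide are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, built in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def effect_in_frame(seq, pos, new_base, frame_start):
    """Return 'S' (synonymous) or 'N' (nonsynonymous) for a single change.

    seq         -- nucleotide string (uppercase A/C/G/T)
    pos         -- 0-based index of the changed nucleotide
    new_base    -- replacement nucleotide
    frame_start -- 0-based index where this reading frame's first codon begins
    """
    codon_start = pos - (pos - frame_start) % 3
    if codon_start < frame_start or codon_start + 3 > len(seq):
        raise ValueError("position is not covered by a complete codon in this frame")
    old_codon = seq[codon_start:codon_start + 3]
    i = pos - codon_start
    new_codon = old_codon[:i] + new_base + old_codon[i + 1:]
    return "S" if CODON_TABLE[old_codon] == CODON_TABLE[new_codon] else "N"


def classify_change(seq, pos, new_base, alt_offset=1):
    """Label a change by its effect in the reference frame (starting at index 0)
    and in a hypothetical same-strand alternate frame starting at alt_offset."""
    ref_effect = effect_in_frame(seq, pos, new_base, 0)
    alt_effect = effect_in_frame(seq, pos, new_base, alt_offset)
    return ref_effect + alt_effect


if __name__ == "__main__":
    # Reference codons CTG|GCA; alternate codon TGG occupies positions 1-3.
    print(classify_change("CTGGCA", 2, "A"))  # CTG->CTA in reference, TGG->TAG in alternate
    print(classify_change("GGCCTA", 3, "T"))  # CTA->TTA in reference, GCC->GCT in alternate
```

A dN/dS estimator for overlapping genes would additionally count the numbers of sites and differences in each class across an alignment; that step is omitted here.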