2020 Apr 3;37(8):2440–2449. doi: 10.1093/molbev/msaa087

Table 1.

Methods with Available Implementations for Detecting Selection in Overlapping Genes.

Program^a	Reference	Target	Implementation	Method Description	Advantages and Limitations	Available from
OLGenie	This study	Protein-coding sequence	Perl	Estimates d_N/d_S by introducing three modifications to Wei–Zhang: 1) minimal overlapping units of 6 nt, that is, 1 reference codon and 2 alternate codons; 2) the Nei–Gojobori method; and 3) only single nucleotide differences rather than all mutational pathways	Fast; applicable to multiple sequence alignments; tree-agnostic; conservative for purifying selection and high levels of divergence, but nonconservative for positive selection; loss of power for pairwise distance >0.1 and neighboring variants	https://github.com/chasewnelson/OLGenie, last accessed April 10, 2020.
“Frameshift”	Schlub et al. (2018)	Protein-coding sequence	R	Finds ORFs longer than expected by chance given nucleotide context; includes two complementary methods: “codon permutation” and “synonymous mutation”	Medium to high accessibility as an R script requiring minor modifications. Can only detect relatively long OLGs. Slow for long sequences.	https://github.com/TimSchlub/Frameshift, last accessed April 10, 2020.
“StopStatistics”	Cassan et al. (2016)	Protein-coding sequence	Python, bash	Tests for depletion of those stop codons in sas12 that would be synonymous in reference; also applicable to enrichment of start codons	Low accessibility; scripts specific to particular data sets	https://figshare.com/s/9668ef62e84488d4787a, last accessed April 10, 2020.
FRESCo	Sealfon et al. (2015)	Constraint at synonymous sites	HYPHY batch language	Rates of nucleotide evolution across an alignment inferred using a maximum-likelihood model. Models of neutral and nonneutral evolution tested in sliding windows to infer regions with excess synonymous constraint	Suitable for short genomes/regions despite using a codon model; requires a phylogenetic tree; performs best at deep sequence coverage and increased sequence divergence	https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-015-0603-7/MediaObjects/13059_2015_603_MOESM1_ESM.zip, last accessed April 10, 2020.
Wei–Zhang method	Wei and Zhang (2015)	Protein-coding sequence	Perl	Estimates d_N/d_S in minimal-length coding regions flanked by variant-free codons (i.e., data-dependent minimal overlapping units) to determine the effects of all mutational pathways in the reference and alternate genes using the modified Nei–Gojobori method	Accurate but slow, especially for highly diverged sequences; tree-agnostic; outperforms Sabath et al. method (according to Wei and Zhang [2015]); only implemented for pairs of sequences; low accessibility and scalability	http://www.umich.edu/~zhanglab/download/Xinzhu_GBE2014/index.htm, last accessed April 10, 2020.
Synplot2	Firth (2014)	Constraint at synonymous sites	C++; web interface	Evolution at synonymous sites in a codon alignment compared to a null model of neutral evolution in order to infer sites with excess constraint; expected diversity at synonymous sites is set equal to diversity over the full alignment, and diversity is measured between sequential pairs around a phylogenetic tree	Medium accessibility; fast; limited use in the case of sas12; requires a phylogenetic tree; does not distinguish between coding and noncoding overlapping features	http://guinevere.otago.ac.nz/cgi-bin/aef/synplot.pl and http://www.firthlab.path.cam.ac.uk/SynPlot2.zip, both last accessed April 10, 2020.
KaKi (“Multilayer”)	Rubinstein et al. (2011)	Unexpected variation at synonymous sites	C++	Maximum-likelihood codon model approach that allows variation in both the synonymous and nonsynonymous substitution rates along a sequence; accounting for variability in the baseline substitution rate allows more reliable inference of positive selection	Low accessibility (requires an old Linux distribution to install); requires a phylogenetic tree; complex input and results; focus of explicit testing is on positive selection; applicable (but not specific) to protein-coding OLGs.	https://www.tau.ac.il/~talp/multilayer.tar.gz and https://www.tau.ac.il/~talp/readme.txt, both last accessed April 10, 2020.
Sabath et al. method	Sabath et al. (2008)	Protein-coding sequence	MATLAB	Maximum-likelihood framework for estimating d_N/d_S; similar to the (nonimplemented) method of Pedersen and Jensen (2001)	Slower than Wei–Zhang; not recommended for highly similar sequences (pairwise distance <0.08); similar to OLGenie in the use of 6 nt (“sextet”) units; only implemented for pairs of sequences; low accessibility and scalability	http://nsmn1.uh.edu/dgraur/Software.html, last accessed April 10, 2020.
MLOGD	Firth and Brown (2006)	Protein-coding sequence	C++	Simple statistics on properties of sequence variation by codon position, and a maximum-likelihood statistic (MLOGD) taking into account nucleotide and amino acid substitution rates and codon usage	Less sensitive at detecting OLGs than Synplot2 (according to Firth [2014]); requires a minimum of ∼20 independent nucleotide variants; sas12 frame generates false-positives.	http://guinevere.otago.ac.nz/aef/MLOGD/software.html, last accessed April 10, 2020.

^a Programs are listed in descending order by year of publication; methods lacking implementations at active URLs are not listed.
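
The idea of a 6 nt minimal overlapping unit (1 reference codon and 2 alternate codons) evaluated one single-nucleotide difference at a time, as described for OLGenie in Table 1, can be illustrated with a short sketch. The code below is not taken from OLGenie (which is implemented in Perl) and does not estimate d_N/d_S; it only classifies one change as synonymous (S) or nonsynonymous (N) in each frame. Several things are assumptions made for illustration: the function names `effect` and `classify_change`, the standard genetic code, the treatment of a stop codon as its own class (so any change to or from a stop counts as N), and a single same-strand layout in which the reference codon occupies positions 2 to 4 of the unit and the alternate codons occupy positions 1 to 3 and 4 to 6.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: no alternative code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effect(before, after):
    """Return 'S' if two codons encode the same product, otherwise 'N'.

    Stop codons ('*') are treated as their own class (illustrative assumption).
    """
    return "S" if CODON_TABLE[before] == CODON_TABLE[after] else "N"


def classify_change(unit, ref_pos, new_base):
    """Classify one nucleotide change within a 6 nt overlapping unit.

    Hypothetical layout (0-based indices into `unit`):
      reference codon  -> unit[1:4]
      alternate codons -> unit[0:3] and unit[3:6]
    `ref_pos` is the position (1, 2, or 3) within the reference codon.
    Returns a two-letter class: effect in the reference frame followed by
    effect in the alternate frame, e.g. 'NS'.
    """
    if len(unit) != 6:
        raise ValueError("unit must be exactly 6 nt long")
    if ref_pos not in (1, 2, 3):
        raise ValueError("ref_pos must be 1, 2, or 3")
    idx = ref_pos  # the reference codon starts at index 1 of the unit
    if unit[idx] == new_base:
        raise ValueError("new_base must differ from the current base")

    mutant = unit[:idx] + new_base + unit[idx + 1:]

    # Effect on the reference codon.
    ref_effect = effect(unit[1:4], mutant[1:4])

    # Exactly one alternate codon contains the changed site.
    alt = slice(0, 3) if idx <= 2 else slice(3, 6)
    alt_effect = effect(unit[alt], mutant[alt])

    return ref_effect + alt_effect
```

Under this layout, for the unit `GCTAGA` (reference codon `CTA`, alternate codons `GCT` and `AGA`), `classify_change("GCTAGA", 3, "C")` is classified as `SS`, because `CTA` to `CTC` leaves leucine unchanged and `AGA` to `CGA` leaves arginine unchanged. Other frame relationships, including opposite-strand overlaps, would need different index mappings and are not covered by this sketch.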