Abstract
Summary: GenAlEx: Genetic Analysis in Excel is a cross-platform package for population genetic analyses that runs within Microsoft Excel. GenAlEx offers analysis of diploid codominant, haploid and binary genetic loci and DNA sequences. Both frequency-based (F-statistics, heterozygosity, HWE, population assignment, relatedness) and distance-based (AMOVA, PCoA, Mantel tests, multivariate spatial autocorrelation) analyses are provided. New features include calculation of new estimators of population structure: G′ST, G′′ST, Jost’s Dest and F′ST through AMOVA, Shannon Information analysis, linkage disequilibrium analysis for biallelic data and novel heterogeneity tests for spatial autocorrelation analysis. Export to more than 30 other data formats is provided. Teaching tutorials and expanded step-by-step output options are included. The comprehensive guide has been fully revised.
Availability and implementation: GenAlEx is written in VBA and provided as a Microsoft Excel Add-in (compatible with Excel 2003, 2007, 2010 on PC; Excel 2004, 2011 on Macintosh). GenAlEx and its supporting documentation and tutorials are freely available at: http://biology.anu.edu.au/GenAlEx.
Contact: rod.peakall@anu.edu.au
1 INTRODUCTION
GenAlEx 6 was originally developed as a tool to facilitate the teaching of population genetic analysis at the graduate level (Peakall and Smouse, 2006). GenAlEx operates within Microsoft Excel, the widely used spreadsheet software that forms part of the cross-platform Microsoft Office suite. Packaging genetic analysis within a familiar and flexible environment helped students to understand and perform population genetic analyses quickly and effectively. Taking advantage of the rich graphical options available within Excel, GenAlEx offers a wide range of graphical outputs that aid genetic data analysis and interpretation. GenAlEx is now widely used by university teachers at both undergraduate and graduate levels around the world. Moreover, the software has attracted a large number of researchers who utilize its unique features. Here we provide an update on the new features offered in GenAlEx 6.5 that we believe will be welcomed by students, teachers and researchers.
GenAlEx offers population genetic analysis of diploid codominant, haploid, haplotypic and binary genetic data from animals, plants and microorganisms. It accommodates a wide range of genetic markers, including microsatellites (SSRs), single-nucleotide polymorphisms (SNPs), amplified fragment length polymorphisms and DNA sequences. Both allele frequency-based and distance-based analysis options are provided. The former includes estimates of heterozygosity and genetic diversity, F-statistics, Nei’s genetic distance, population assignment and relatedness. The latter includes Analysis of Molecular Variance (AMOVA), Principal Coordinates Analysis (PCoA), Mantel tests, TwoGener, multivariate and 2D spatial autocorrelation. Readers are referred to Peakall and Smouse (2006) for a more comprehensive outline of these standard procedures, data formats and data import options.
GenAlEx 6.5 maintains backward compatibility, but it provides access to the expanded spreadsheet of Excel 2007 onward. Thus, the maximum numbers of loci and samples are vastly expanded and only constrained by memory. More than 30 different Excel graphs summarize the outcomes of genetic analyses. Graphics can be further manipulated with Excel options and easily converted to pdf or other publication-quality formats.
2 NEW FEATURES
2.1 New estimators of population structure
There has been much recent debate about the utility of FST as a measure of population genetic structure (Jost, 2008; Ryman and Leimar, 2009; Whitlock, 2011). GenAlEx 6.5 offers the calculation of G′ST, G′′ST and Jost’s Dest, which provide [0,1]-standardized allele frequency-based estimators of population genetic structure, following Meirmans and Hedrick (2011). The null hypothesis is tested by random permutation, and variances are estimated by jackknifing and bootstrapping over loci. New AMOVA routines now enable the estimation of standardized F′ST, following Meirmans (2006). The calculation of these statistics was validated by comparison with the software GenoDive v2.0b22 (Meirmans and Van Tienderen, 2004).
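To make the relationship among these estimators concrete, the following minimal Python sketch computes GST, G′ST, G′′ST and Dest for a single locus from per-population allele frequencies, using the formulas summarized by Meirmans and Hedrick (2011). It is an illustration only and not the GenAlEx VBA implementation: the function name structure_stats is hypothetical, populations are weighted equally, and small-sample bias corrections for HS and HT, as well as the permutation, jackknife and bootstrap procedures, are omitted.

```python
def structure_stats(pop_freqs):
    """Single-locus structure estimators from allele frequencies.

    pop_freqs: one list of allele frequencies per population (same allele
    order in every list, each list summing to 1). Hypothetical helper:
    equal population weights, no small-sample bias correction.
    """
    k = len(pop_freqs)
    if k < 2:
        raise ValueError("at least two populations are required")
    n_alleles = len(pop_freqs[0])

    # HS: mean expected heterozygosity within populations.
    hs = sum(1.0 - sum(p * p for p in freqs) for freqs in pop_freqs) / k

    # HT: expected heterozygosity of the mean allele frequencies.
    mean = [sum(freqs[a] for freqs in pop_freqs) / k for a in range(n_alleles)]
    ht = 1.0 - sum(p * p for p in mean)

    if ht == 0.0:
        # A monomorphic locus carries no information about structure.
        return {"Gst": 0.0, "G'st": 0.0, "G''st": 0.0, "Dest": 0.0}

    gst = (ht - hs) / ht
    # Hedrick's standardization: GST divided by its maximum given HS and k.
    g1 = gst * (k - 1 + hs) / ((k - 1) * (1.0 - hs))
    # Further corrected form that accounts for a small number of populations.
    g2 = k * (ht - hs) / ((k * ht - hs) * (1.0 - hs))
    # Jost's D.
    d = (k / (k - 1)) * (ht - hs) / (1.0 - hs)
    return {"Gst": gst, "G'st": g1, "G''st": g2, "Dest": d}


# Two populations that share no alleles, each with two equally common alleles.
stats = structure_stats([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In this example the two populations share no alleles, yet GST is only one third because within-population heterozygosity is high, whereas the standardized estimators all reach their upper bound of 1. This is the behavior that motivated the debate cited above.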
2.2 Shannon’s information statistics
Shannon information indices have been widely used in ecology but largely overlooked in genetics despite offering a framework for quantifying biological diversity across multiple scales (genes to landscapes). GenAlEx offers the calculation of a series of Shannon indices, including the mutual information index SHUA, an alternative estimator of population structure. The methods follow Sherwin et al. (2006), who assessed the performance of Shannon indices for estimating genetic diversity, and extend to multiple hierarchical levels following Smouse and Ward (1978). GenAlEx 6.5 offers a unique three-level partition option and statistical testing by random permutation.
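As a rough illustration of the simplest (two-level) case, the Python sketch below partitions Shannon diversity at one locus into a within-population component and a between-population component, the latter being the mutual information between alleles and populations. The function names are hypothetical, natural logarithms and sample-size weighting are assumptions made here, and the three-level partition and permutation testing are not shown.

```python
import math


def shannon(freqs):
    """Shannon index -sum(p * ln p), skipping zero frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def shannon_partition(counts):
    """Two-level Shannon partition for one locus.

    counts: one list of allele counts per population (same allele order).
    Returns (total, within, between), where 'between' is the mutual
    information between alleles and populations. Populations are weighted
    by sample size and natural logs are used; both are assumptions here.
    """
    totals = [sum(c) for c in counts]
    n = sum(totals)

    # Weighted mean of the within-population Shannon indices.
    within = sum(
        (t / n) * shannon([x / t for x in c]) for c, t in zip(counts, totals)
    )

    # Shannon index of the pooled allele frequencies.
    pooled = [sum(c[a] for c in counts) / n for a in range(len(counts[0]))]
    total = shannon(pooled)

    return total, within, total - within


# Two populations fixed for different alleles: all diversity is between them.
total, within, between = shannon_partition([[10, 0], [0, 10]])
print(f"total={total:.4f} within={within:.4f} between={between:.4f}")
```

When populations have identical allele frequencies the between-population component is zero, and for two equally sized populations fixed for different alleles it equals ln 2.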
2.3 Tools for comparing pairwise population statistics
The Mantel test capability of GenAlEx has been extended to allow multiple comparisons among pairwise population statistics such as FST, F′ST, G′ST, G′′ST, Dest and SHUA. This will allow informed comparison of the new estimators of population structure.
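For readers unfamiliar with the procedure, a simple Mantel test between two pairwise population matrices can be sketched in Python as follows. This is a generic permutation version written for illustration, not the GenAlEx routine: the function names, the one-tailed test, the default of 999 permutations and the fixed random seed are all assumptions.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Upper-triangle entries of a square matrix, optionally after
    relabelling rows and columns together according to 'order'."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a, b, n_perm=999, seed=0):
    """One-tailed Mantel test: returns (r_observed, p_value).

    Rows and columns of 'b' are permuted jointly; the observed value is
    counted as one of the permutations.
    """
    rng = random.Random(seed)
    x = upper_triangle(a)
    r_obs = pearson(x, upper_triangle(b))
    order = list(range(len(b)))
    hits = 1
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(x, upper_triangle(b, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)


# Hypothetical pairwise matrices for five populations.
stat_one = [[abs(i - j) for j in range(5)] for i in range(5)]
stat_two = [[2.0 * abs(i - j) for j in range(5)] for i in range(5)]
r, p = mantel(stat_one, stat_two)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Permuting rows and columns together, rather than shuffling individual matrix entries, preserves the dependence among the pairwise values that involve the same population.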
2.4 Heterogeneity testing for spatial autocorrelation
GenAlEx 6.5 introduces novel heterogeneity tests (Smouse et al., 2008), extending application of the multiallelic, multilocus spatial autocorrelation analysis methods of Smouse and Peakall (1999), Peakall et al. (2003) and Double et al. (2005). These new methods provide valuable insights into fine-scale genetic processes across a wide range of animals and plants. Banks and Peakall (2012) have confirmed the statistical power and performance of this heterogeneity test by spatially explicit computer simulations.
2.5 Linkage disequilibrium tests (LD) for biallelic data
Despite its importance, there is no universal test for disequilibrium (Slatkin, 2008). GenAlEx 6.5 offers pairwise tests for disequilibrium between biallelic markers such as SNPs. When phase is known, this includes the calculation of D, D′, r and r², following Hedrick (2005). Maximum likelihood estimation is used to calculate D and r when phase is unknown (Weir, 1990, p. 310). The results were validated against GDA (Lewis and Zaykin, 2001). Inclusion of LD fills an important technical gap, particularly for teachers. For large SNP sets, or multiallelic data, GenAlEx users are encouraged to take advantage of the options to export their data to other packages such as Arlequin 3.5 (Excoffier and Lischer, 2010).
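For the phase-known case, the standard definitions of these statistics can be sketched in a few lines of Python. The function ld_stats and its arguments are hypothetical and serve only as a teaching illustration; the sign convention for D′ (carrying the sign of D) is an assumption, and the maximum likelihood procedure for unknown phase is not shown.

```python
import math


def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise LD between two biallelic loci from phased haplotype counts.

    Returns a dict with D, D' (signed like D), r and r squared.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n  # frequency of allele B at locus 2

    var = p_A * (1 - p_A) * p_B * (1 - p_B)
    if var == 0:
        raise ValueError("both loci must be polymorphic")

    # D: deviation of the observed haplotype frequency from independence.
    d = p_AB - p_A * p_B

    # D': D scaled by the largest magnitude it could take given the
    # allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max

    # r: correlation between alleles at the two loci.
    r = d / math.sqrt(var)
    return {"D": d, "D'": d_prime, "r": r, "r2": r * r}


# Hypothetical counts of haplotypes AB, Ab, aB and ab.
for name, value in ld_stats(40, 10, 10, 40).items():
    print(f"{name}: {value:.3f}")
```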
2.6 New allele frequency format
Retrospective calculation of the new estimators of population structure, such as G′ST, Dest and Shannon indices, is now possible from published allele frequency data. Teachers will also find this a helpful option for the re-analysis of textbook examples.
2.7 Import and export options
GenAlEx offers data import from several popular formats and tools for importing and manipulating raw data from DNA sequencers. Export to more than 30 other data formats is provided, enabling access to myriad other software packages. For example, direct export is offered to programs such as GENEPOP (Rousset, 2008) and STRUCTURE (Pritchard et al., 2000), and via these same formats to many other programs, including genetic packages in R such as adegenet (Jombart, 2008) and pegas (Paradis, 2010). The full list of export options, along with notes on the export process, can be found at the website.
3 SPECIAL FEATURES FOR TEACHING
Offering a user-friendly software package for university students and teachers remains an ongoing goal of GenAlEx. We continue to expand the popular step-by-step output options that allow students to follow the steps in the analytical pathway. Teaching-specific menu options are also provided. For example, the Rand menu allows students to permute and bootstrap hypothetical datasets with color tracking, to aid an understanding of how these statistical tests work. Finally, we have made freely available a set of tutorial notes and supporting datasets drawn from the graduate workshops that we have offered (both jointly and independently) around the world.
4 DOCUMENTATION
More than 150 pages of documentation are provided. This includes Appendix 1, which outlines the statistical analyses used and their supporting references. The revised guide to GenAlEx 6.5 fully cross-links with the GenAlEx tutorials and Appendix 1.
5 CONCLUSION
GenAlEx 6.5 offers a wide range of population genetic analysis options for the full spectrum of genetic markers within the Microsoft Excel environment on both PC and Macintosh computers. When combined with its user-friendly interface, rich graphical outputs for data exploration and publication, tools for data manipulation and export options to many other software packages, we believe that GenAlEx offers an ideal launching pad for population genetic analysis by students, teachers and researchers alike.
ACKNOWLEDGEMENTS
We thank the many students, teachers and researchers who have enthusiastically adopted GenAlEx as one of their tools, especially those who have offered suggestions for improvement. Michaela Blyton revised the guide, performed extensive beta-testing and offered crucial advice on improving the user interface. Sasha Peakall re-designed the GenAlEx logo.
Conflict of Interest: none declared.
REFERENCES
- Banks SC, Peakall R. Genetic spatial autocorrelation can readily detect sex-biased dispersal. Mol. Ecol. 2012;21:2092–2105. doi: 10.1111/j.1365-294X.2012.05485.x.
- Double MC, et al. Dispersal, philopatry and infidelity: dissecting local genetic structure in superb fairy-wrens (Malurus cyaneus). Evolution. 2005;59:625–635.
- Excoffier L, Lischer HEL. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Res. 2010;10:564–567. doi: 10.1111/j.1755-0998.2010.02847.x.
- Hedrick PW. Genetics of Populations. 3rd edn. Jones and Bartlett Publishers: Sudbury, MA; 2005.
- Jost L. GST and its relatives do not measure differentiation. Mol. Ecol. 2008;17:4015–4026. doi: 10.1111/j.1365-294x.2008.03887.x.
- Jombart T. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics. 2008;24:1403–1405. doi: 10.1093/bioinformatics/btn129.
- Lewis PO, Zaykin D. Genetic Data Analysis V1.1. 2001. Available at http://www.eeb.uconn.edu/people/plewis/software.php (30 May 2012, date last accessed)
- Meirmans PG. Using the AMOVA framework to estimate a standardized genetic differentiation measure. Evolution. 2006;60:2399–2402.
- Meirmans PG, Hedrick PW. Assessing population structure: FST and related measures. Mol. Ecol. Res. 2011;11:5–18. doi: 10.1111/j.1755-0998.2010.02927.x.
- Meirmans PG, Van Tienderen PH. GENOTYPE and GENODIVE: two programs for the analysis of genetic diversity of asexual organisms. Mol. Ecol. Notes. 2004;4:792–794.
- Paradis E. pegas: an R package for population genetics with an integrated-modular approach. Bioinformatics. 2010;26:419–420. doi: 10.1093/bioinformatics/btp696.
- Peakall R, et al. Spatial autocorrelation analysis offers new insights into gene flow in the Australian bush rat, Rattus fuscipes. Evolution. 2003;57:1182–1195. doi: 10.1111/j.0014-3820.2003.tb00327.x.
- Peakall R, Smouse PE. GenAlEx 6: genetic analysis in Excel. Population genetic software for teaching and research. Mol. Ecol. Notes. 2006;6:288–295. doi: 10.1111/j.1471-8286.2005.01155.x.
- Pritchard JK, et al. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–959. doi: 10.1093/genetics/155.2.945.
- Rousset F. GENEPOP’007: a complete re-implementation of the genepop software for Windows and Linux. Mol. Ecol. Res. 2008;8:103–106. doi: 10.1111/j.1471-8286.2007.01931.x.
- Ryman N, Leimar O. GST is still a useful measure of genetic differentiation—a comment on Jost's D. Mol. Ecol. 2009;18:2084–2087. doi: 10.1111/j.1365-294X.2009.04187.x.
- Sherwin W, et al. Measurement of biological information with applications from genes to landscapes. Mol. Ecol. 2006;15:2857–2869. doi: 10.1111/j.1365-294X.2006.02992.x.
- Slatkin M. Linkage disequilibrium—understanding the evolutionary past and mapping the medical future. Nat. Rev. Genet. 2008;9:477–485. doi: 10.1038/nrg2361.
- Smouse PE, Peakall R. Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure. Heredity. 1999;82:561–573. doi: 10.1038/sj.hdy.6885180.
- Smouse PE, Ward RH. A comparison of the genetic infrastructure of the Ye'cuana and Yanomama: a likelihood analysis of genotypic variation among populations. Genetics. 1978;88:611–631. doi: 10.1093/genetics/88.3.611.
- Smouse PE, et al. A heterogeneity test for fine-scale genetic structure. Mol. Ecol. 2008;17:3389–3400. doi: 10.1111/j.1365-294x.2008.03839.x.
- Weir BS. Genetic Data Analysis. Sinauer Associates, Inc: Sunderland, MA; 1990.
- Whitlock MC. G'ST and D do not replace FST. Mol. Ecol. 2011;20:1083–1091. doi: 10.1111/j.1365-294X.2010.04996.x.