Abstract
SLiM is an efficient forward population genetic simulation designed for studying the effects of linkage and selection on a chromosome-wide scale. The program can incorporate complex scenarios of demography and population substructure, various models for selection and dominance of new mutations, arbitrary gene structure, and user-defined recombination maps.
Keywords: population genetic simulation, genetic hitchhiking, background selection
Recent studies suggest that linkage effects such as genetic draft and background selection can profoundly alter the patterns of genetic variation in many species (Sella et al. 2009; Lohmueller et al. 2011; Weissman and Barton 2012; Messer and Petrov 2013). Understanding the potential impact of these linkage effects on population genetic methods requires efficient forward simulations that can model the evolution of whole chromosomes with realistic gene structure.
Forward simulations have a long-standing tradition in population genetics and many programs have been developed (Carvajal-Rodriguez 2010; Hoban et al. 2011). For any such program, there is typically a trade-off between efficiency and flexibility. Simulations based on combined forward-backward approaches, such as MSMS (Ewing and Hermisson 2010), can be very fast but remain limited to scenarios with only a single selected locus. Current programs that can model scenarios with multiple linked selected polymorphisms, such as FREGENE (Chadeau-Hyam et al. 2008), GENOMEPOP (Carvajal-Rodriguez 2008), simuPOP (Peng and Kimmel 2005), forwsim (Padhukasahasram et al. 2008), or SFS_CODE (Hernandez 2008), either lack the ability to model realistic gene structure or are not efficient enough to allow for simulations on the scale of a whole chromosome in reasonably large populations.
Here I present SLiM, a population genetic simulation targeted at bridging the gap between efficiency and flexibility for the simulation of linkage and selection on a chromosome-wide scale. The program can incorporate complex scenarios of demography and population substructure, various models for selection and dominance, realistic gene structure, and user-defined recombination maps. Special emphasis was further placed on the ability to model and track individual selective sweeps—both complete and partial. While retaining all capabilities of a forward simulation, SLiM utilizes sophisticated algorithms and optimized data structures that enable simulations in reasonably large populations. All these features are implemented in an easy-to-use C++ command line program. The source code is freely available under the GNU General Public License and can be downloaded from http://www.stanford.edu/~messer/software.
SLiM simulates the evolution of diploid genomes in a population of hermaphrodites under an extended Wright–Fisher model with selection (Figure 1A). In each generation, a new set of offspring is created, descending from the individuals in the previous generation. The probability of becoming a parent is proportional to an individual’s fitness, which is determined by the selection and dominance effects of the mutations present in its diploid genome. Gametes are generated by recombining parental chromosomes (both crossing over and gene conversion can be modeled) and adding new mutations.
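The generation cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not SLiM's implementation: it assumes multiplicative fitness across mutations, models crossing over only (no gene conversion), identifies each mutation by its integer position, and makes every new mutation neutral. All function names are hypothetical.

```python
import bisect
import math
import random

def poisson(mean, rng):
    """Knuth's method; adequate for the small means used in this sketch."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def fitness(genome_a, genome_b, effects):
    """Multiplicative fitness (an assumption); effects[m] = (s, h)."""
    w = 1.0
    for m in genome_a | genome_b:
        s, h = effects[m]
        homozygous = m in genome_a and m in genome_b
        w *= 1.0 + (s if homozygous else h * s)
    return max(w, 0.0)

def make_gamete(parent, effects, length, rec_rate, mut_rate, rng):
    """Recombine the two parental genomes, then add new (neutral) mutations."""
    first, second = parent if rng.random() < 0.5 else parent[::-1]
    n_breaks = poisson(length * rec_rate, rng)
    breakpoints = sorted(rng.randrange(1, length) for _ in range(n_breaks))
    gamete = set()
    for m in first | second:
        # An odd number of breakpoints at or before m means the strand switched.
        switched = bisect.bisect_right(breakpoints, m) % 2
        source = second if switched else first
        if m in source:
            gamete.add(m)
    for _ in range(poisson(length * mut_rate, rng)):
        pos = rng.randrange(length)
        effects.setdefault(pos, (0.0, 0.5))  # simplification: new mutations are neutral
        gamete.add(pos)
    return frozenset(gamete)

def next_generation(population, effects, length, rec_rate, mut_rate, rng):
    """population is a list of (genome_a, genome_b) pairs of frozensets."""
    weights = [fitness(a, b, effects) for a, b in population]
    offspring = []
    for _ in range(len(population)):
        # Parents are drawn with probability proportional to fitness.
        mother, father = rng.choices(population, weights=weights, k=2)
        offspring.append(tuple(
            make_gamete(p, effects, length, rec_rate, mut_rate, rng)
            for p in (mother, father)))
    return offspring
```

Calling next_generation repeatedly on a list of empty genome pairs, for example [(frozenset(), frozenset())] * 500, would play the role of a burn-in under these simplified assumptions.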
Mutations can be of different user-defined mutation types, specified by their dominance coefficients and distributions of fitness effects (DFE); examples could be synonymous, adaptive, and lethal mutations. The possibility to specify arbitrary dominance effects allows for the simulation of a variety of evolutionary scenarios, including balancing selection and recessive deleterious mutations. Genomic regions can be of different user-defined genomic element types, specified by the particular mutation types that can occur in such elements and their relative proportions; examples could be exons and introns.
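One way to picture the relationship between mutation types and genomic element types is the following Python sketch. The class names, the example types, and all numerical values (dominance coefficients, DFE parameters, proportions) are hypothetical and chosen only for illustration; they are not SLiM's input format.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MutationType:
    name: str
    dominance: float                          # dominance coefficient h
    draw_s: Callable[[random.Random], float]  # sampler for the DFE

@dataclass
class GenomicElementType:
    name: str
    mutation_types: List[MutationType]
    proportions: List[float]                  # relative weights, need not sum to 1

    def draw_mutation(self, rng):
        """Pick a mutation type by its relative proportion, then draw s from its DFE."""
        mt = rng.choices(self.mutation_types, weights=self.proportions, k=1)[0]
        return mt.name, mt.dominance, mt.draw_s(rng)

# Hypothetical example types; every number here is illustrative only.
synonymous = MutationType("synonymous", 0.5, lambda rng: 0.0)
deleterious = MutationType(
    "deleterious", 0.1,
    lambda rng: -rng.expovariate(1 / 0.05))   # exponential DFE with mean -0.05
adaptive = MutationType("adaptive", 0.5, lambda rng: 0.01)

exon = GenomicElementType("exon", [synonymous, deleterious, adaptive],
                          [0.25, 0.74, 0.01])
intron = GenomicElementType("intron", [synonymous], [1.0])
```

With these definitions, exon.draw_mutation(random.Random(0)) returns a (name, dominance, selection coefficient) triple for one new mutation arising in an exon.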
Each mutation has a specified position along the chromosome but remains abstract in the sense that the simulation does not specify the particular nucleotide states of ancestral and derived alleles. Note, however, that the user has the freedom to associate abstract mutation types with specific classes of events. There are no back mutations in the simulations. Fixed mutations are removed from the population and recorded as substitutions.
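The bookkeeping for fixed mutations might look roughly like the sketch below. It assumes genomes are stored as Python sets of mutation identifiers and that a mutation counts as fixed when it is present in all 2N genomes; the function name and data layout are hypothetical.

```python
from collections import Counter

def remove_fixed(population, generation, substitutions):
    """Strip mutations carried by every genome and record them as substitutions.

    population: list of (genome_a, genome_b) pairs, each a set of mutation ids.
    substitutions: list that receives (mutation_id, fixation_generation) tuples.
    """
    counts = Counter(m for pair in population for genome in pair for m in genome)
    n_genomes = 2 * len(population)
    fixed = {m for m, c in counts.items() if c == n_genomes}
    substitutions.extend((m, generation) for m in sorted(fixed))
    return [(a - fixed, b - fixed) for a, b in population]
```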
SLiM is capable of modeling complex scenarios of demography and population substructure. The simulation can include arbitrary numbers of subpopulations that can be added at user-defined times, initialized either with new individuals or with individuals drawn from another subpopulation to model a population split or colonization event. Subpopulation sizes, migration rates, and selfing rates can be changed over time.
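How migration and selfing could enter the choice of parents is sketched below. The parameterization is an assumption made for illustration: a migration rate is read as the fraction of a target subpopulation's offspring whose parents are drawn from a given source subpopulation, and a selfing rate as the probability that an offspring has a single parent. The function and argument names are hypothetical.

```python
import random

def pick_parents(target, fitness, migration, selfing_rate, rng):
    """Return (source_subpop, mother_index, father_index) for one offspring.

    fitness[p]      : list of individual fitnesses in subpopulation p
    migration[t][s] : fraction of t's offspring with parents from s (assumed meaning)
    selfing_rate[p] : probability that an offspring of parents in p is selfed
    """
    sources = migration.get(target, {})
    local = 1.0 - sum(sources.values())        # remainder has local parents
    names = [target] + list(sources)
    weights = [local] + list(sources.values())
    source = rng.choices(names, weights=weights, k=1)[0]

    indices = range(len(fitness[source]))
    mother = rng.choices(indices, weights=fitness[source], k=1)[0]
    if rng.random() < selfing_rate.get(source, 0.0):
        father = mother                        # selfing: one parent supplies both gametes
    else:
        father = rng.choices(indices, weights=fitness[source], k=1)[0]
    return source, mother, father
```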
To establish genetic diversity, simulations first have to go through a burn-in. Alternatively, simulations can be initialized with a set of predefined genomes provided by the user or with the output of a previous simulation run. The user can further specify predetermined mutations to be introduced at certain time points. Such mutations can be used, for example, to investigate individual selective sweeps or to track the frequency trajectories of particular mutations in the population. Predetermined adaptive mutations can also be limited to partial selective sweeps, where positive selection ceases once the mutation has reached a particular population frequency.
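The idea of a partial sweep can be illustrated with a deterministic toy recursion. The sketch below ignores drift, assumes genic selection at a single locus, and treats the mutation as permanently neutral once its frequency has reached the stopping threshold; these are simplifications for illustration, and the function name is hypothetical.

```python
def partial_sweep_trajectory(s, start_freq, stop_freq, generations):
    """Deterministic allele-frequency trajectory of a partial sweep.

    Selection with coefficient s acts until the frequency first reaches
    stop_freq; afterwards the mutation is neutral and, without drift,
    its frequency stays constant.
    """
    freq, selected = start_freq, True
    trajectory = [freq]
    for _ in range(generations):
        if selected and freq >= stop_freq:
            selected = False                   # positive selection ceases
        if selected:
            freq = freq * (1.0 + s) / (1.0 + s * freq)
        trajectory.append(freq)
    return trajectory
```

In a stochastic simulation the frequency would continue to drift after selection ceases, which this toy version deliberately leaves out.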
SLiM provides a variety of output options, including (i) the complete state of the population at specified time points, in terms of all mutations and genomes present in the population; (ii) random samples of specific sizes drawn from a subpopulation at given time points; (iii) lists of all mutations that have become fixed, together with the times when each mutation became fixed; and (iv) frequency trajectories of particular mutations over time.
I ran SLiM under various test scenarios to check whether its output agrees with theoretical predictions. Specifically, I analyzed levels of neutral heterozygosity under different scenarios of demography, population substructure, and selfing, the fixation probabilities of new mutations of different selective effects, and the reduction in neutral diversity around adaptive substitutions. Simulation results conformed with the respective theoretical predictions in all tests (Supporting Information, File S1, section 6).
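The text does not spell out which formulas were used; the details are in File S1. As an illustration of the kind of prediction such tests compare against, the sketch below evaluates two standard results: the expected neutral heterozygosity 4Nu per site at mutation-drift equilibrium in a panmictic diploid population, and Kimura's diffusion approximation for the fixation probability of a new codominant mutation. The parameterization (s as the homozygous effect, h = 0.5) and the function names are assumptions of this sketch.

```python
import math

def neutral_heterozygosity(n, u):
    """Expected per-site heterozygosity 4Nu (valid for 4Nu << 1, random mating)."""
    return 4.0 * n * u

def fixation_probability(n, s):
    """Kimura's approximation for a new mutation at frequency 1/(2N).

    Assumes codominance (h = 0.5) with genotype fitnesses 1, 1 + s/2, 1 + s,
    which gives (1 - exp(-s)) / (1 - exp(-2Ns)).
    """
    if s == 0.0:
        return 1.0 / (2 * n)                   # neutral limit
    if -2.0 * n * s > 700.0:
        return 0.0                             # strongly deleterious; avoids overflow
    return math.expm1(-s) / math.expm1(-2.0 * n * s)
```

An observed fraction of fixed mutations across many replicate runs could then be compared with fixation_probability(N, s) for the simulated N and s.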
SLiM utilizes sophisticated algorithms and optimized data structures to achieve its high computational efficiency (File S1, section 7.1). The simulation is based on a hierarchical data architecture that minimizes the amount of information stored redundantly. At many stages of the program, large quantities of random numbers have to be drawn from general probability distributions. To do this efficiently, SLiM uses lookup tables that are precomputed only once per generation and then allow one to draw random numbers in O(1) time. Most algorithms are implemented using fast routines provided in the GNU Scientific Library (Galassi et al. 2009).
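A standard technique with exactly these properties (one preprocessing pass, then constant-time draws) is Walker's alias method, which is also what the GNU Scientific Library's general discrete distribution routines implement. Whether SLiM relies on this particular routine is an assumption here; the pure-Python sketch below only illustrates the principle, for example for drawing parents with probability proportional to fitness.

```python
import random

def build_alias_table(weights):
    """Precompute the alias table in O(n) for a list of nonnegative weights."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]  # mean of scaled weights is 1
    prob, alias = [0.0] * n, list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l       # slot s is topped up by outcome l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                    # leftovers equal 1 up to rounding
        prob[i] = 1.0
    return prob, alias

def draw(prob, alias, rng):
    """Draw one index in O(1): pick a slot uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

The table would be rebuilt once per generation from the current fitness values, after which each of the many parent draws costs a constant amount of work regardless of population size.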
To evaluate SLiM’s performance I compared its runtime and memory requirements with those of SFS_CODE, a popular forward simulation of similar scope (Hernandez 2008). For these tests, a chromosome of length L was simulated with uniform mutation rate u and recombination rate r in a population of size N over the course of 10N generations, assuming an exponential DFE (File S1, section 7.3). The base scenario used values L = 5 Mbp, N = 500, u = 10^−9, and r = 10^−8 per site per generation. I then varied these four parameters independently to analyze how each individually affects the performance of either program. Simulations were conducted on a standard iMac desktop with a 2.8-GHz Intel Core 2 Duo CPU and 4 GB of memory. Figure 1B shows that in all analyzed scenarios SLiM outcompetes SFS_CODE by a substantial margin, typically running 5–10 times faster and requiring 20–100 times less memory. The large discrepancy in memory consumption between the two programs reflects the fact that SFS_CODE simulates the sequence of the whole chromosome, whereas SLiM simulates only the actual mutations.
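A back-of-the-envelope calculation for the base scenario makes the difference between tracking mutations and tracking whole sequences concrete. The sketch below only counts expected events and sites; it does not model how either program actually stores its data, and the function name is hypothetical.

```python
def per_generation_load(n, length, u, r):
    """Expected event counts per generation for a diploid population of size n."""
    genomes = 2 * n
    return {
        "new_mutations": genomes * length * u,   # expected new mutations
        "crossovers": genomes * length * r,      # expected crossover events
        "sites_if_every_base_were_stored": genomes * length,
    }

# Base scenario from the text: L = 5 Mbp, N = 500, u = 1e-9, r = 1e-8.
base = per_generation_load(n=500, length=5_000_000, u=1e-9, r=1e-8)
```

Under these values about 5 new mutations and about 50 crossovers are expected per generation, whereas an explicit representation of every site in every genome would cover 5 billion positions.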
Its computational performance enables SLiM to simulate entire eukaryotic chromosomes in reasonably large populations. For instance, simulating the functional regions in a typical human chromosome of length L = 100 Mbp over 10^5 generations in a population of size N = 10^4 with u = 10^−8 and r = 10^−9 per site per generation, assuming a functional density of 5%, takes ∼4 days on a single core.
SLiM has already been successfully applied in several projects that required efficient forward simulations on large genomic scales. For example, Kousathanas and Keightley (2013) used the program to examine how linked selection can affect their method for inferring the DFE from polymorphism data in fruit flies and mice. In Messer and Petrov (2013), SLiM was used to investigate the effects of linked selection on the McDonald-Kreitman (MK) test, and it was shown that such effects can severely bias the test. These studies highlight the need for efficient forward simulations that can model chromosomes with realistic gene structure.
Most of the current machinery of population genetics is still deeply rooted in the mindset of neutral theory, which assumes that adaptation is rare and that linkage effects from recurrent selective sweeps can thus be neglected. However, this assumption may be violated in many species. It is hence essential to verify with forward simulations under realistic scenarios of selection and linkage whether population genetics methods, and our estimates of key evolutionary parameters obtained from them, are robust to linkage effects. SLiM is specifically designed for this purpose and I believe that it will become an important tool for future population genetic studies.
Acknowledgments
The author thanks Dmitri Petrov for continuous support throughout the project; members of the Petrov lab, especially Zoe Assaf, David Enard, and Nandita Garud for testing the program; and three anonymous reviewers for their valuable comments on program and documentation. Part of this research was funded by the National Institutes of Health (grants GM089926 and HG002568 to Dmitri Petrov).
Footnotes
Communicating editor: J. Wall
Literature Cited
- Carvajal-Rodriguez A., 2008. GENOMEPOP: a program to simulate genomes in populations. BMC Bioinformatics 9: 223.
- Carvajal-Rodriguez A., 2010. Simulation of genes and genomes forward in time. Curr. Genomics 11: 58–61.
- Chadeau-Hyam M., Hoggart C. J., O’Reilly P. F., Whittaker J. C., De Iorio M., et al., 2008. Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinformatics 9: 364.
- Ewing G., Hermisson J., 2010. MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics 26: 2064–2065.
- Galassi M., Davies J., Theiler J., Gough B., Jungman G., et al., 2009. GNU Scientific Library: Reference Manual, Ed. 3. Network Theory, Bristol, UK.
- Hernandez R. D., 2008. A flexible forward simulator for populations subject to selection and demography. Bioinformatics 24: 2786–2787.
- Hoban S., Bertorelle G., Gaggiotti O. E., 2011. Computer simulations: tools for population and evolutionary genetics. Nat. Rev. Genet. 13: 110–122.
- Kousathanas A., Keightley P. D., 2013. A comparison of models to infer the distribution of fitness effects of new mutations. Genetics 193: 1197–1208.
- Lohmueller K. E., Albrechtsen A., Li Y., Kim S. Y., Korneliussen T., et al., 2011. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome. PLoS Genet. 7: e1002326.
- Messer P. W., Petrov D. A., 2013. Frequent adaptation and the McDonald-Kreitman test. Proc. Natl. Acad. Sci. USA 110: 8615–8620.
- Padhukasahasram B., Marjoram P., Wall J. D., Bustamante C. D., Nordborg M., 2008. Exploring population genetic models with recombination using efficient forward-time simulations. Genetics 178: 2417–2427.
- Peng B., Kimmel M., 2005. simuPOP: a forward-time population genetics simulation environment. Bioinformatics 21: 3686–3687.
- Sella G., Petrov D. A., Przeworski M., Andolfatto P., 2009. Pervasive natural selection in the Drosophila genome? PLoS Genet. 5: e1000495.
- Weissman D. B., Barton N. H., 2012. Limits to the rate of adaptive substitution in sexual populations. PLoS Genet. 8: e1002740.