2014 May 19;15:149. doi: 10.1186/1471-2105-15-149

Table 1.

Genome simulators

| Tool | Description | Outputs |
| --- | --- | --- |
| ART [9] | Simulation of sequence reads with error models for multiple platforms (454, Solexa, SOLiD). | Single- or paired-end sequence reads. |
| MetaSIM [10] | Simulation of sequence reads for metagenomics, particularly for highly variable data (taxonomically distinct but related organisms). | Single- or paired-end sequence reads. |
| GENOME [14] | Population simulation within a set of alleles using genome-level events such as recombination, migration, bottlenecks, and expansions. | Alleles identified as mutated (1) or not (0) across the simulated population. |
| GWASimulator [12] | Simulation of loci across a population that follows a given LD structure in case–control studies. | SNVs per individual for input loci. |
| FreGene [13] | Mutation simulation using a theoretical sequence of a given size with hotspot, conversion, and selection parameters. | Mutation selection across the population for a theoretical sequence. |
| genomeSIMLA [11] | Simulation of disease loci within a family or case–control setting using specific LD patterns for investigations of disease. | Affy-identified SNPs selected by disease association. |
| ALF [15] | Population simulation for a specific gene set using a model for variation at the sequence and individual level. | FASTA protein and DNA sequences for specific genes. |

Example simulators used in various types of genome investigations. Many use the Wright-Fisher model of population genetics theory [8] to generate populations that vary over time given some set of event frequencies such as LD, hotspots, and population bottlenecks (GENOME, genomeSIMLA, FreGene). Others provide a set of sequences that could be generated by a given sequencing technology with an error model (ART and MetaSIM). The choice of simulator depends on the type of investigation. In planning a new GWAS, for instance, one would select a simulator that uses LD patterns and can provide predicted genomic regions for disease-related mutations. However, such a simulator would not be of use in planning a metagenomic study of an organism that may not yet be fully sequenced or that is highly variable. None of these simulators provides whole-genome FASTA files as output.
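To make the Wright-Fisher idea concrete, the following is a minimal sketch of neutral genetic drift at a single biallelic locus, with a population bottleneck expressed as a change in per-generation population size. It is not the implementation used by GENOME, genomeSIMLA, or FreGene. The function name `wright_fisher`, its parameters, and the diploid, mutation-free, selection-free setup are all assumptions made for illustration.

```python
import random


def wright_fisher(pop_sizes, p0, seed=0):
    """Sketch of neutral Wright-Fisher drift at one biallelic locus.

    pop_sizes: number of diploid individuals in each successive generation
               (hypothetical input; a dip in size models a bottleneck).
    p0:        starting frequency of the tracked allele, between 0 and 1.
    Returns the allele frequency at the start and after each generation.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    freqs = [p0]
    p = p0
    for n in pop_sizes:
        copies = 2 * n  # assumption: diploid, so 2N allele copies
        # Each copy in the next generation is drawn independently from the
        # current allele pool, which is binomial sampling with probability p.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        freqs.append(p)
    return freqs


# Usage: 20 generations at N=100, a 5-generation bottleneck at N=10,
# then recovery to N=100.
sizes = [100] * 20 + [10] * 5 + [100] * 20
trajectory = wright_fisher(sizes, p0=0.5, seed=42)
```

Frequencies tend to fluctuate more during the small-population generations, which is the effect a bottleneck parameter is meant to capture. A fuller simulator would add mutation, recombination, selection, and migration on top of this sampling step.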