Table 1.
Tool | Description | Outputs |
---|---|---|
ART [9] | Simulation of sequence reads with error models for multiple platforms (454, Solexa, SOLiD). | Single- or paired-end sequence reads. |
MetaSIM [10] | Simulation of sequence reads for metagenomics, particularly for highly variable data (taxonomically distinct but related organisms). | Single- or paired-end sequence reads. |
GENOME [14] | Population simulation within a set of alleles using genome-level events such as recombination, migration, bottlenecks, and expansions. | Alleles identified as mutated (1) or not (0) across the simulated population. |
GWASimulator [12] | Simulation of loci across a population that follows a given LD structure in case–control type studies. | SNVs per individual for input loci. |
FreGene [13] | Mutation simulation using a theoretical sequence of a given size with hotspot, conversion, and selection parameters. | Mutation selection across population for a theoretical sequence. |
genomeSIMLA [11] | Simulation of disease loci within a family or case–control setting using specific LD patterns for investigations of disease. | Affy identified SNPs selected by disease association. |
ALF [15] | Population simulation for a specific gene set using a model for variation at the sequence and individual level. | FASTA protein and DNA sequences for specific genes. |
Example simulators used in various types of genome investigations. Many use the Wright-Fisher model of population genetics theory [8] to generate populations that vary over time given a set of event parameters such as linkage disequilibrium (LD), hotspots, and population bottlenecks (GENOME, genomeSIMLA, FreGene); others provide a set of sequences that could be generated by a given sequencing technology with an error model (ART and MetaSIM). The choice of simulator depends on the type of investigation. In planning a new GWAS, for instance, one would select a simulator that uses LD patterns and can provide predicted genomic regions for disease-related mutations. However, such a simulator would be of no use in planning a metagenomic study of an organism that is highly variable or not yet fully sequenced. None of these simulators provides whole-genome FASTA files as output.
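
To make the Wright-Fisher idea mentioned in the caption concrete, the following is a minimal sketch of a neutral, haploid Wright-Fisher process. It is not the implementation used by any of the tools in Table 1; the function names (`wright_fisher`, `allele_frequencies`) and all parameters are hypothetical, and recombination, migration, selection, and bottlenecks are deliberately omitted. The result is a matrix of 0/1 alleles per individual, similar in spirit to the mutated (1) or not (0) encoding listed for GENOME.

```python
import random


def wright_fisher(pop_size, n_loci, generations, mutation_rate, seed=0):
    """Neutral haploid Wright-Fisher sketch (illustrative assumption only).

    Returns a pop_size x n_loci list of lists holding 0 (unmutated)
    or 1 (mutated) for each individual and locus.
    """
    rng = random.Random(seed)
    # Start from an unmutated population: every allele is 0.
    population = [[0] * n_loci for _ in range(pop_size)]
    for _ in range(generations):
        next_generation = []
        for _ in range(pop_size):
            # Each offspring copies one parent drawn uniformly, with replacement.
            child = list(rng.choice(population))
            for locus in range(n_loci):
                # An unmutated allele mutates with probability mutation_rate;
                # back mutation is not modelled in this sketch.
                if child[locus] == 0 and rng.random() < mutation_rate:
                    child[locus] = 1
            next_generation.append(child)
        # Generations are non-overlapping: offspring replace the parents.
        population = next_generation
    return population


def allele_frequencies(population):
    """Fraction of individuals carrying the mutated allele at each locus."""
    n = len(population)
    n_loci = len(population[0])
    return [sum(ind[locus] for ind in population) / n for locus in range(n_loci)]


if __name__ == "__main__":
    pop = wright_fisher(pop_size=50, n_loci=8, generations=100, mutation_rate=0.001)
    print(allele_frequencies(pop))
```

Because each generation is resampled from the previous one, allele frequencies drift over time even without selection, which is the behaviour the population simulators above build on when they add events such as bottlenecks or expansions.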