Table 7. Single-nucleotide and dinucleotide frequencies for our simulated data (left) closely match those in the twelve Drosophila genomes (right).
|  | Simulated A | Simulated C | Simulated G | Simulated T | Drosophila A | Drosophila C | Drosophila G | Drosophila T |
|---|---|---|---|---|---|---|---|---|
| Single | 0.273 | 0.228 | 0.228 | 0.271 | 0.285 | 0.204 | 0.204 | 0.284 |
| A | 0.070 | 0.053 | 0.052 | 0.060 | 0.094 | 0.049 | 0.052 | 0.077 |
| C | 0.052 | 0.047 | 0.048 | 0.055 | 0.065 | 0.041 | 0.036 | 0.051 |
| G | 0.055 | 0.049 | 0.047 | 0.050 | 0.051 | 0.053 | 0.041 | 0.048 |
| T | 0.058 | 0.052 | 0.054 | 0.069 | 0.061 | 0.051 | 0.065 | 0.094 |
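For readers who want to compute comparable statistics on their own sequences, the sketch below counts single-nucleotide frequencies and overlapping dinucleotide frequencies. It assumes one common definition: dinucleotide XY is counted over every adjacent pair of positions, and symbols other than A/C/G/T (such as N) are skipped. The exact normalization used to produce Table 7 is not stated here, so treat this as an illustrative assumption rather than the procedure behind the tabulated values. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def nucleotide_frequencies(seq: str) -> dict[str, float]:
    """Fraction of each base among A/C/G/T characters; other symbols (e.g. N) are ignored."""
    seq = seq.upper()
    counts = Counter(b for b in seq if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return {b: counts[b] / total for b in BASES}


def dinucleotide_frequencies(seq: str) -> dict[str, float]:
    """Fraction of each overlapping adjacent pair XY, counting only pairs
    where both positions are A/C/G/T (assumed definition)."""
    seq = seq.upper()
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("sequence contains no valid adjacent base pairs")
    # Report all 16 dinucleotides, including those never observed (frequency 0).
    return {x + y: pairs[x + y] / total for x, y in product(BASES, repeat=2)}
```

For example, `nucleotide_frequencies("ACGTTGCAAT")` gives 0.3 for A and T and 0.2 for C and G, while each of its nine distinct adjacent pairs receives a dinucleotide frequency of 1/9.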
Our simulated data model heterogeneity in base composition across genomic features, such as coding and intergenic sequence, but not local fluctuations in base composition within a feature.
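To make the distinction concrete, the following is a deliberately simplified, hypothetical model and not the simulator used here. Each feature type has its own base composition, and bases within a segment are drawn independently from that composition. Composition therefore changes only at feature boundaries. Within a segment, it varies only through sampling noise, with no local drift. The per-feature compositions and the `simulate` helper are invented for illustration.

```python
import random

BASES = "ACGT"

# Hypothetical (A, C, G, T) compositions per feature type; NOT values from this paper.
FEATURE_COMPOSITION = {
    "coding": (0.24, 0.26, 0.26, 0.24),
    "intergenic": (0.30, 0.20, 0.20, 0.30),
}


def simulate(segments: list[tuple[str, int]], seed: int = 0) -> str:
    """Concatenate segments whose bases are drawn i.i.d. from their feature's composition.

    Heterogeneity exists only *between* features; inside a segment there is
    no local fluctuation in composition beyond random sampling noise.
    """
    rng = random.Random(seed)
    parts = []
    for feature, length in segments:
        weights = FEATURE_COMPOSITION[feature]
        parts.append("".join(rng.choices(BASES, weights=weights, k=length)))
    return "".join(parts)
```

Calling `simulate([("coding", 1000), ("intergenic", 2000)], seed=1)` returns a 3,000-base sequence whose GC content is near 0.52 in its first 1,000 bases and near 0.40 afterward. A model that also captured local fluctuations would additionally let composition vary within a segment, for example across isochore-like regions.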