Table 6. Genome-wide statistics of our simulated alignments of twelve Drosophila genomes, after realignment with PECAN, closely match those of the real data.
Dataset | % identity | % gap | % coding | % intronic
PECAN | 83% | 89% | 33% | 18%
simgenome (realigned) | 85% | 83% | 33% | 18%
simgenome (original) | 69% | 41% | 33% | 18%
The simulated alignments averaged 240K columns in length, compared with 142K columns for the PECAN alignments; however, our windowing approach makes our method insensitive to the sizes of syntenic regions. In total, we generated 3.6M alignment columns. "simgenome (realigned)" denotes the simulated alignments after realignment with PECAN; we use these for all subsequent analyses and refer to them simply as "simgenome". "simgenome (original)" denotes the alignments as originally generated by simgenome, before realignment. Sequence identity and gap fraction were estimated from the PECAN alignments; coding and intronic fractions were estimated from [27].
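The text does not define how "% identity" and "% gap" are computed. The sketch below shows one common way to compute both statistics for a multiple alignment, so the columns can be read concretely. It assumes that identity is pooled pairwise identity (identical residues over all sequence pairs, counting only columns where both sequences have a residue), that a gap fraction is the share of columns containing at least one gap, and that gaps are written as "-". These definitions and the function names are hypothetical illustrations and are not necessarily the ones used to produce Table 6.

```python
from itertools import combinations


def percent_identity(alignment):
    """Pooled pairwise identity (ASSUMED definition): summed over all
    sequence pairs, identical residues divided by the number of columns
    where both sequences have a non-gap character."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    matches = 0
    compared = 0
    for a, b in combinations(alignment, 2):
        for x, y in zip(a, b):
            if x != "-" and y != "-":  # skip positions where either row has a gap
                compared += 1
                matches += x.upper() == y.upper()
    return 100.0 * matches / compared if compared else 0.0


def percent_gap(alignment):
    """Gap fraction (ASSUMED definition): percentage of alignment columns
    that contain at least one gap character."""
    ncols = len(alignment[0])
    gapped = sum(1 for column in zip(*alignment) if "-" in column)
    return 100.0 * gapped / ncols


if __name__ == "__main__":
    # Toy three-sequence alignment, not real Drosophila data.
    aln = ["ACGT-A", "ACGTTA", "AC-TTG"]
    print(f"% identity = {percent_identity(aln):.1f}")
    print(f"% gap      = {percent_gap(aln):.1f}")
```

On the toy alignment, 12 of 14 comparable pairwise positions are identical, and 2 of 6 columns contain a gap. With many species, even rare per-sequence gaps make most columns "gapped" under this column-level definition, so a high gap percentage can coexist with a high identity.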