Table 3.
Comparison of two simple tiling metrics that incorporate repetitive nucleotides to improve non-repetitive sequence coverage
Organism | Genome size (bp) | Repeats (%) | Case 1 (threshold, ≤50 bp repeats): non-repeat bp covered (%) | Case 1: repeat bp included, relative to all non-repeat bp (%) | Case 1: tile quality | Case 2 (percentage, ≤25% repeats): non-repeat bp covered (%) | Case 2: repeat bp included, relative to all non-repeat bp (%) | Case 2: tile quality |
---|---|---|---|---|---|---|---|---|
Pan troglodytes | 3,083,993,401 | 57.74 | 64.85 | 4.15 | 62.04 | 66.85 | 17.94 | 52.24 |
Homo sapiens | 3,070,537,687 | 52.38 | 65.01 | 4.09 | 62.24 | 67.11 | 18.22 | 52.16 |
Rattus norvegicus | 2,795,745,218 | 48.75 | 66.66 | 4.28 | 63.68 | 69.42 | 19.84 | 52.24 |
Mus musculus | 2,638,213,512 | 45.62 | 77.56 | 4.30 | 74.07 | 80.82 | 20.15 | 60.43 |
Caenorhabditis elegans | 100,277,879 | 11.26 | 89.71 | 2.18 | 96.68 | 99.84 | 11.12 | 87.47 |
Drosophila melanogaster | 129,323,838 | 14.23 | 97.63 | 0.03 | 99.97 | 100 | 2.39 | 97.55 |
Fugu rubripes | 349,519,338 | 15.06 | 95.09 | 1.86 | 97.74 | 100 | 6.33 | 93.24 |
Arabidopsis thaliana | 119,186,497 | 0.16 | 99.51 | 1.29 | 98.22 | 100 | 13.29 | 84.68 |
In Case 1, tiles could include repeat sequences of up to 50 bp; in Case 2, up to 25% of a tile's nucleotides could be repetitive. As in Table 1, tile sizes range from 300 bp to 1.5 kb. Case 1 improves non-repetitive sequence coverage only marginally compared with the optimal tiling algorithm at the same level of repeat nucleotide inclusion. In Case 2, non-repetitive sequence coverage in mammalian genomes improves by only about 2 to 3 percentage points over Case 1, despite the inclusion of a much higher percentage of repetitive DNA (about 18% to 20%, versus about 4%), and tile quality falls sharply (by roughly 10 to 14 points). In each case, performance on mammalian genomes is significantly lower than that of the optimal tiling algorithm (Table 2).
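The two repeat-inclusion rules can be illustrated with a minimal Python sketch. This is not the authors' tiling procedure. It assumes soft-masked input, in which lowercase letters mark repeat nucleotides. It also reads the Case 1 threshold as a limit on each contiguous repeat run within a tile, which is an assumption. The helper names (`repeat_runs`, `accept_case1`, `accept_case2`) are hypothetical. Only the 50 bp threshold, the 25% limit, and the 300 bp to 1.5 kb size bounds come from the caption.

```python
import re

# Tile length bounds from the caption (300 bp to 1.5 kb).
MIN_TILE_BP, MAX_TILE_BP = 300, 1500


def repeat_runs(tile: str) -> list[int]:
    """Return the lengths of contiguous repeat runs in a tile.

    Assumes soft-masked sequence: lowercase letters mark repeat nucleotides.
    """
    return [len(m.group()) for m in re.finditer(r"[a-z]+", tile)]


def within_size_bounds(tile: str) -> bool:
    """True if the tile length lies within the allowed size range."""
    return MIN_TILE_BP <= len(tile) <= MAX_TILE_BP


def accept_case1(tile: str, max_run_bp: int = 50) -> bool:
    """Case 1 (threshold): every repeat run is at most max_run_bp long.

    Treating the 50 bp threshold as a per-run limit is an assumption.
    """
    runs = repeat_runs(tile)
    return within_size_bounds(tile) and all(r <= max_run_bp for r in runs)


def accept_case2(tile: str, max_fraction: float = 0.25) -> bool:
    """Case 2 (percentage): repeats make up at most max_fraction of the tile."""
    repeat_bp = sum(repeat_runs(tile))
    return within_size_bounds(tile) and repeat_bp <= max_fraction * len(tile)


if __name__ == "__main__":
    demo_tiles = {
        "one 40 bp repeat": "A" * 400 + "a" * 40 + "C" * 60,
        "one 100 bp repeat": "A" * 300 + "g" * 100 + "T" * 100,
        "ten 50 bp repeats": ("C" * 10 + "t" * 50) * 10,
    }
    for name, tile in demo_tiles.items():
        print(name, accept_case1(tile), accept_case2(tile))
    # one 40 bp repeat True True
    # one 100 bp repeat False True
    # ten 50 bp repeats True False
```

The last demo tile shows how the rules can disagree. Each of its ten 50 bp repeats passes the Case 1 threshold, but together they account for 500 of 600 bp, well above the 25% Case 2 limit.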