2006 Feb;16(2):271–281. doi: 10.1101/gr.4452906

Table 3.

Comparison of two simple tiling metrics that incorporate repetitive nucleotides to improve non-repetitive sequence coverage

Case 1: Threshold repeat inclusion (50 bp)
Case 2: Percentage repeat inclusion (25%)

| Organism | Genome size (bp) | Percent repeats | Case 1: Percent non-repeat bp covered | Case 1: Percent repeat bp included vs all non-repeat bp | Case 1: Tile quality | Case 2: Percent non-repeat bp covered | Case 2: Percent repeat bp included vs all non-repeat bp | Case 2: Tile quality |
|---|---|---|---|---|---|---|---|---|
| Pan troglodytes | 3,083,993,401 | 57.74 | 64.85 | 4.15 | 62.04 | 66.85 | 17.94 | 52.24 |
| Homo sapiens | 3,070,537,687 | 52.38 | 65.01 | 4.09 | 62.24 | 67.11 | 18.22 | 52.16 |
| Rattus norvegicus | 2,795,745,218 | 48.75 | 66.66 | 4.28 | 63.68 | 69.42 | 19.84 | 52.24 |
| Mus musculus | 2,638,213,512 | 45.62 | 77.56 | 4.30 | 74.07 | 80.82 | 20.15 | 60.43 |
| Caenorhabditis elegans | 100,277,879 | 11.26 | 89.71 | 2.18 | 96.68 | 99.84 | 11.12 | 87.47 |
| Drosophila melanogaster | 129,323,838 | 14.23 | 97.63 | 0.03 | 99.97 | 100 | 2.39 | 97.55 |
| Fugu rubripes | 349,519,338 | 15.06 | 95.09 | 1.86 | 97.74 | 100 | 6.33 | 93.24 |
| Arabidopsis thaliana | 119,186,497 | 0.16 | 99.51 | 1.29 | 98.22 | 100 | 13.29 | 84.68 |

In Case 1, repeat sequences of ≤50 bp are allowed within a tile; in Case 2, up to 25% of a tile may consist of repetitive nucleotides. As in Table 1, tile sizes range from 300 bp to 1.5 kb. Case 1 achieves only marginal improvement in non-repetitive sequence coverage when compared with the same level of repeat nucleotide inclusion in the optimal tiling case. Non-repetitive sequence coverage in mammalian genomes falls sharply in Case 2 despite the inclusion of a high percentage of repetitive DNA. In each case, performance on mammalian genomes is significantly lower than that of the optimal tiling algorithm (Table 2).
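
The two acceptance rules described in this footnote can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the boolean repeat-mask representation (one flag per base, `True` for a repetitive nucleotide), and the reading of Case 1 as a limit on each contiguous repeat run are assumptions made for illustration only.

```python
from itertools import groupby

# Tile size range stated in the footnote (300 bp to 1.5 kb).
MIN_TILE, MAX_TILE = 300, 1500


def repeat_run_lengths(is_repeat):
    """Return the length of each contiguous run of repeat-flagged bases."""
    return [sum(1 for _ in run) for flag, run in groupby(is_repeat) if flag]


def size_ok(is_repeat):
    """Check that the candidate tile length lies in the allowed range."""
    return MIN_TILE <= len(is_repeat) <= MAX_TILE


def passes_threshold(is_repeat, max_run=50):
    """Case 1 (assumed reading): every repeat run in the tile is <= max_run bp."""
    return size_ok(is_repeat) and all(
        n <= max_run for n in repeat_run_lengths(is_repeat)
    )


def passes_percentage(is_repeat, max_fraction=0.25):
    """Case 2: at most max_fraction of the tile's bases are repetitive."""
    return size_ok(is_repeat) and sum(is_repeat) <= max_fraction * len(is_repeat)


# Hypothetical 400-bp tile with a single 60-bp repeat (15% repetitive).
tile_a = [False] * 200 + [True] * 60 + [False] * 140
# Hypothetical 300-bp tile with four 40-bp repeats (about 53% repetitive).
tile_b = [False] * 100 + ([True] * 40 + [False] * 10) * 4

print(passes_threshold(tile_a), passes_percentage(tile_a))
print(passes_threshold(tile_b), passes_percentage(tile_b))
```

The two example tiles show that the rules are not nested: a tile with one long repeat can fail the threshold rule while staying under the percentage limit, and a tile with many short repeats can do the opposite.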