Table 2.
Optimal and naive tiling of various sequenced genomes for tile sizes between 300 bp and 1.5 kb
Organism | Genome size (bp) | Percent repeats | Naive partitioning: tile quality | Optimal sequence tiling: percent non-repeat bp covered | Optimal sequence tiling: percent repeat bp included vs. all non-repeat bp | Optimal sequence tiling: tile quality | Comparison: percent improvement |
---|---|---|---|---|---|---|---|
Pan troglodytes | 3,083,993,401 | 57.74 | 66.05 | 89.81 | 4.23 | 85.58 | 19.53 |
Homo sapiens | 3,070,537,687 | 52.38 | 66.07 | 89.60 | 4.06 | 85.53 | 19.47 |
Rattus norvegicus | 2,795,745,218 | 48.75 | 66.86 | 91.43 | 5.54 | 85.89 | 19.03 |
Mus musculus | 2,638,213,512 | 45.62 | 66.18 | 91.09 | 5.51 | 85.58 | 19.41 |
Caenorhabditis elegans | 100,277,879 | 11.26 | 84.29 | 98.54 | 3.10 | 95.44 | 11.16 |
Drosophila melanogaster | 129,323,838 | 14.23 | 86.89 | 99.40 | 2.62 | 96.78 | 9.89 |
Fugu rubripes | 349,519,338 | 15.06 | 87.97 | 99.07 | 2.13 | 96.94 | 8.97 |
Arabidopsis thaliana | 119,186,497 | 0.16 | 99.97 | 100.00 | 0.00 | 100.00 | 0.02 |
Repetitive elements were identified using RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html) and Tandem Repeats Finder (Benson 1999). The genome sequences vary widely in repeat density, ranging from mammalian genomes with roughly 45% to 58% repeat content to the nearly repeat-free Arabidopsis genome. Obtaining a high degree of non-repetitive sequence coverage is straightforward for genomes at the low-repeat end of this spectrum. For the repeat-rich genomes of higher eukaryotes, however, the highly repetitive sequences cannot be tiled optimally without further processing.
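
The figures in Table 2 appear to follow two simple relations, although neither is stated in the table itself: the higher of the two tile-quality values (optimal tiling) matches the percent of non-repeat bp covered minus the percent of repeat bp included, and the percent improvement matches the difference between the higher and the lower (naive partitioning) tile quality, in percentage points. The following Python sketch checks this reading against the transcribed rows. The relations are an inference from the numbers, not a definition given by the source, and the function names are hypothetical.

```python
# Rows transcribed from Table 2:
# (organism, naive tile quality, percent non-repeat bp covered,
#  percent repeat bp included, optimal tile quality, percent improvement)
ROWS = [
    ("Pan troglodytes",         66.05,  89.81, 4.23,  85.58, 19.53),
    ("Homo sapiens",            66.07,  89.60, 4.06,  85.53, 19.47),
    ("Rattus norvegicus",       66.86,  91.43, 5.54,  85.89, 19.03),
    ("Mus musculus",            66.18,  91.09, 5.51,  85.58, 19.41),
    ("Caenorhabditis elegans",  84.29,  98.54, 3.10,  95.44, 11.16),
    ("Drosophila melanogaster", 86.89,  99.40, 2.62,  96.78,  9.89),
    ("Fugu rubripes",           87.97,  99.07, 2.13,  96.94,  8.97),
    ("Arabidopsis thaliana",    99.97, 100.00, 0.00, 100.00,  0.02),
]


def derived_quality(covered, repeat_included):
    """Assumed relation: tile quality = coverage minus repeat inclusion."""
    return round(covered - repeat_included, 2)


def derived_improvement(naive_quality, optimal_quality):
    """Assumed relation: improvement = optimal minus naive, in percentage points."""
    return round(optimal_quality - naive_quality, 2)


def max_discrepancies(rows):
    """Return the largest absolute gaps between derived and reported values."""
    quality_gap = max(
        abs(derived_quality(cov, rep) - opt) for _, _, cov, rep, opt, _ in rows
    )
    improvement_gap = max(
        abs(derived_improvement(naive, opt) - imp)
        for _, naive, _, _, opt, imp in rows
    )
    return quality_gap, improvement_gap


if __name__ == "__main__":
    quality_gap, improvement_gap = max_discrepancies(ROWS)
    print(f"largest tile-quality gap: {quality_gap:.2f}")
    print(f"largest improvement gap:  {improvement_gap:.2f}")
```

Under these assumed relations, every row agrees with the reported values to within about 0.01 percentage points, which is what rounding of the published figures would produce.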