2006 Feb;16(2):271–281. doi: 10.1101/gr.4452906

Table 2.

Optimal and naive tiling of various sequenced genomes for tile sizes between 300 bp and 1.5 kb

| Organism | Genome size (bp) | Percent repeats | Naive partitioning: tile quality | Optimal sequence tiling: percent non-repeat bp covered | Optimal sequence tiling: percent repeat bp included vs. all non-repeat bp | Optimal sequence tiling: tile quality | Comparison: percent improvement |
|---|---|---|---|---|---|---|---|
| Pan troglodytes | 3,083,993,401 | 57.74 | 66.05 | 89.81 | 4.23 | 85.58 | 19.53 |
| Homo sapiens | 3,070,537,687 | 52.38 | 66.07 | 89.60 | 4.06 | 85.53 | 19.47 |
| Rattus norvegicus | 2,795,745,218 | 48.75 | 66.86 | 91.43 | 5.54 | 85.89 | 19.03 |
| Mus musculus | 2,638,213,512 | 45.62 | 66.18 | 91.09 | 5.51 | 85.58 | 19.41 |
| Caenorhabditis elegans | 100,277,879 | 11.26 | 84.29 | 98.54 | 3.10 | 95.44 | 11.16 |
| Drosophila melanogaster | 129,323,838 | 14.23 | 86.89 | 99.40 | 2.62 | 96.78 | 9.89 |
| Fugu rubripes | 349,519,338 | 15.06 | 87.97 | 99.07 | 2.13 | 96.94 | 8.97 |
| Arabidopsis thaliana | 119,186,497 | 0.16 | 99.97 | 100.00 | 0.00 | 100.00 | 0.02 |
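
The last five values in each row appear to be linked by two simple arithmetic relations, which is handy as a sanity check when re-keying the table. The sketch below assumes (as an inference from the numbers, not a definition stated here) that the second, higher tile-quality value equals percent non-repeat bp covered minus percent repeat bp included, and that percent improvement is the difference between the two tile-quality values. The names `ROWS`, `TOL`, and `check_row` are illustrative only.

```python
# Last five columns of Table 2, in row order:
# (organism, first tile quality, % non-repeat bp covered,
#  % repeat bp included, second tile quality, % improvement)
ROWS = [
    ("Pan troglodytes",         66.05,  89.81, 4.23,  85.58, 19.53),
    ("Homo sapiens",            66.07,  89.60, 4.06,  85.53, 19.47),
    ("Rattus norvegicus",       66.86,  91.43, 5.54,  85.89, 19.03),
    ("Mus musculus",            66.18,  91.09, 5.51,  85.58, 19.41),
    ("Caenorhabditis elegans",  84.29,  98.54, 3.10,  95.44, 11.16),
    ("Drosophila melanogaster", 86.89,  99.40, 2.62,  96.78,  9.89),
    ("Fugu rubripes",           87.97,  99.07, 2.13,  96.94,  8.97),
    ("Arabidopsis thaliana",    99.97, 100.00, 0.00, 100.00,  0.02),
]

# Published values are rounded to two decimals, so allow a small slack.
TOL = 0.02


def check_row(row, tol=TOL):
    """Return True if a row satisfies both assumed relations within tol."""
    _, tq_first, covered, included, tq_second, improvement = row
    # Assumed relation 1: second tile quality = covered - included.
    quality_ok = abs((covered - included) - tq_second) <= tol
    # Assumed relation 2: improvement = second tile quality - first.
    improvement_ok = abs((tq_second - tq_first) - improvement) <= tol
    return quality_ok and improvement_ok


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row[0]:<26} consistent: {check_row(row)}")
```

All eight rows pass with this tolerance; the largest discrepancy is 0.01, which is what rounding to two decimals would produce.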

Repetitive elements were identified using RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html) and Tandem Repeats Finder (Benson 1999). The genome sequences vary widely in repeat density, from mammalian genomes with roughly 46% to 58% repeat content to the nearly repeat-free Arabidopsis genome (0.16%). Obtaining high non-repetitive sequence coverage for genomes at the low-repeat end of the spectrum is straightforward. However, as higher eukaryotes are considered, it becomes impossible to tile the highly repetitive sequences optimally without further processing.
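
To make the coverage columns concrete, here is a minimal sketch of how the two percentages could be computed for a toy sequence. It rests on several assumptions that are not taken from the source: repeats are soft-masked as lowercase letters, a tiling is a list of non-overlapping half-open `(start, end)` intervals, and tile quality is covered minus included. `score_tiling` is a hypothetical helper, and neither example tiling is meant to reproduce the paper's naive or optimal procedure.

```python
def score_tiling(masked_seq, tiles):
    """Score non-overlapping tiles against a soft-masked sequence.

    Uppercase bases count as non-repeat, lowercase bases as repeat.
    Returns (pct_nonrepeat_covered, pct_repeat_included, quality), where
    both percentages are relative to the total number of non-repeat bases.
    """
    total_nonrepeat = sum(1 for base in masked_seq if base.isupper())
    if total_nonrepeat == 0:
        raise ValueError("sequence has no non-repeat bases")

    covered = 0   # non-repeat bases that fall inside some tile
    included = 0  # repeat bases that fall inside some tile
    for start, end in tiles:
        for base in masked_seq[start:end]:
            if base.isupper():
                covered += 1
            else:
                included += 1

    pct_covered = 100.0 * covered / total_nonrepeat
    pct_included = 100.0 * included / total_nonrepeat
    # Assumed quality measure: reward coverage, penalize included repeats.
    return pct_covered, pct_included, pct_covered - pct_included


if __name__ == "__main__":
    # Toy sequence: 8 non-repeat bases, 8 repeat bases, 8 non-repeat bases.
    seq = "ACGTACGT" + "acgtacgt" + "ACGTACGT"

    # Fixed-size tiles that ignore the repeat mask.
    fixed = [(0, 6), (6, 12), (12, 18), (18, 24)]
    print(score_tiling(seq, fixed))         # (100.0, 50.0, 50.0)

    # Tiles placed only on the non-repeat stretches.
    repeat_aware = [(0, 8), (16, 24)]
    print(score_tiling(seq, repeat_aware))  # (100.0, 0.0, 100.0)
```

The toy tiles are far shorter than the 300 bp to 1.5 kb range in Table 2. The example only shows how including repeat bases lowers the score even when coverage is complete.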