2006 Feb;16(2):271–281. doi: 10.1101/gr.4452906

Table 2.

Optimal and naive tiling of various sequenced genomes for tile sizes between 300 bp and 1.5 kb

			Naive partitioning	Optimal sequence tiling			Comparison
Organism	Genome size (bp)	Percent repeats	Tile quality	Percent non-repeat bp covered	Percent repeat bp included vs. all non-repeat bp	Tile quality	Percent improvement
Pan troglodytes	3,083,993,401	57.74	66.05	89.81	4.23	85.58	19.53
Homo sapiens	3,070,537,687	52.38	66.07	89.60	4.06	85.53	19.47
Rattus norvegicus	2,795,745,218	48.75	66.86	91.43	5.54	85.89	19.03
Mus musculus	2,638,213,512	45.62	66.18	91.09	5.51	85.58	19.41
Caenorhabditis elegans	100,277,879	11.26	84.29	98.54	3.10	95.44	11.16
Drosophila melanogaster	129,323,838	14.23	86.89	99.40	2.62	96.78	9.89
Fugu rubripes	349,519,338	15.06	87.97	99.07	2.13	96.94	8.97
Arabidopsis thaliana	119,186,497	0.16	99.97	100.00	0.00	100.00	0.02
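
The tile-quality and improvement columns of Table 2 can be cross-checked arithmetically. The sketch below assumes, based only on the published figures and not on a definition given in this section, that the optimal tile quality equals the percent non-repeat bp covered minus the percent repeat bp included, and that the percent improvement is the optimal tile quality minus the naive tile quality. Both relations hold for every row to within about 0.01, which is consistent with two-decimal rounding. The names `ROWS`, `TOLERANCE`, and `check_row` are illustrative.

```python
# Values copied from Table 2:
# (organism, naive tile quality, % non-repeat bp covered,
#  % repeat bp included, optimal tile quality, % improvement)
ROWS = [
    ("Pan troglodytes",         66.05,  89.81, 4.23,  85.58, 19.53),
    ("Homo sapiens",            66.07,  89.60, 4.06,  85.53, 19.47),
    ("Rattus norvegicus",       66.86,  91.43, 5.54,  85.89, 19.03),
    ("Mus musculus",            66.18,  91.09, 5.51,  85.58, 19.41),
    ("Caenorhabditis elegans",  84.29,  98.54, 3.10,  95.44, 11.16),
    ("Drosophila melanogaster", 86.89,  99.40, 2.62,  96.78,  9.89),
    ("Fugu rubripes",           87.97,  99.07, 2.13,  96.94,  8.97),
    ("Arabidopsis thaliana",    99.97, 100.00, 0.00, 100.00,  0.02),
]

# Assumed tolerance: published values are rounded to two decimals.
TOLERANCE = 0.015


def check_row(row, tol=TOLERANCE):
    """Return (quality_ok, improvement_ok) for one table row.

    Assumed relations (inferred from the numbers, not stated in the text):
      optimal quality ~= covered - repeat_included
      improvement     ~= optimal quality - naive quality
    """
    _, naive_q, covered, repeat_incl, optimal_q, improvement = row
    quality_ok = abs((covered - repeat_incl) - optimal_q) <= tol
    improvement_ok = abs((optimal_q - naive_q) - improvement) <= tol
    return quality_ok, improvement_ok


if __name__ == "__main__":
    for row in ROWS:
        print(row[0], check_row(row))
```

Under these assumptions, the gain from optimal tiling is largest for the repeat-rich mammalian genomes (about 19 percentage points) and negligible for Arabidopsis.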

Repetitive elements were identified using RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html) and Tandem Repeats Finder (Benson 1999). The genomes vary widely in repeat density, from the mammalian genomes, with roughly 46%–58% repeat content, to the nearly repeat-free Arabidopsis genome (0.16%). Obtaining high coverage of non-repetitive sequence is straightforward for genomes at the repeat-poor end of this spectrum. For higher eukaryotes, however, the highly repetitive sequences cannot be tiled optimally without further processing.
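
A figure like the "Percent repeats" column can be approximated from a masked sequence. The following is a minimal sketch, not the authors' pipeline. It assumes the sequence has been soft-masked so that repeat bases are lowercase (RepeatMasker can produce this with its `-xsmall` option) and that every base, including any `N`, counts toward the total. The function name `percent_repeats` and the toy sequence are hypothetical.

```python
def percent_repeats(seq: str) -> float:
    """Percentage of bases that are lowercase (soft-masked as repeats).

    Assumption: uppercase letters are non-repeat bases, lowercase letters
    are repeat bases, and non-letter characters (whitespace, digits) are
    ignored.
    """
    bases = [c for c in seq if c.isalpha()]
    if not bases:
        raise ValueError("sequence contains no bases")
    masked = sum(1 for c in bases if c.islower())
    return 100.0 * masked / len(bases)


if __name__ == "__main__":
    # Toy sequence: 20 bases, 6 of them soft-masked.
    toy = "ACGTacgtacACGTACGTAC"
    print(f"{percent_repeats(toy):.2f}% repeats")  # prints "30.00% repeats"
```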