2011 Mar 31;12:169. doi: 10.1186/1471-2164-12-169

Figure 3.

Statistical analysis of contig populations. A: Log-log plots of contig length (X axis) vs. number of reads (Y axis) for the identified sub-populations of contigs. Viral and bacterial sequences (upper row) show a linear relationship between contig length and number of reads, but eukaryotic contigs (lower row) include a sub-population with an unusually high number of reads for their length. Sequences containing tandem repeats contribute to this phenomenon but do not fully explain it, because a large proportion of these "overread" contigs remain after tandem repeats are removed (lower right). B: Regression analysis of eukaryotic contigs with tandem repeats removed. Most contigs show a linear relationship between contig length and read number (black), but a second population does not (gray). This second population contains disproportionately few genic contigs. X axis: log-log of contig length. Y axis: log-log of number of reads.
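The caption does not say how the gray population in panel B was separated from the black one. A minimal sketch, assuming ordinary least-squares regression of log10(read count) on log10(contig length) and a hypothetical one-sided residual cutoff (`threshold`, in log10 units), might flag "overread" contigs as follows. The line is refit on the retained contigs until the retained set stops changing, a simple outlier-trimming scheme that is not necessarily the authors' method; `fit_loglog` and `flag_overread` are illustrative names, not code from the study.

```python
import math


def fit_loglog(lengths, reads):
    """Ordinary least-squares fit of log10(reads) = intercept + slope * log10(length).

    All lengths and read counts must be positive, and at least two distinct
    lengths are required.
    """
    xs = [math.log10(x) for x in lengths]
    ys = [math.log10(y) for y in reads]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct contig lengths")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def flag_overread(lengths, reads, threshold=0.5, max_iter=20):
    """Return indices of contigs with more reads than the log-log fit predicts.

    `threshold` is a hypothetical cutoff in log10 units (0.5 is roughly a
    3.2-fold excess of reads). Only positive residuals are flagged, because
    "overread" contigs lie above the line. The fit is repeated on the retained
    contigs so that strong outliers do not pull the line toward themselves.
    """
    keep = list(range(len(lengths)))
    for _ in range(max_iter):
        intercept, slope = fit_loglog([lengths[i] for i in keep],
                                      [reads[i] for i in keep])
        new_keep = [
            i for i in range(len(lengths))
            if math.log10(reads[i])
            - (intercept + slope * math.log10(lengths[i])) <= threshold
        ]
        # Stop when the retained set is stable or too small to refit.
        if new_keep == keep or len(new_keep) < 2:
            break
        keep = new_keep
    kept = set(keep)
    return [i for i in range(len(lengths)) if i not in kept]


# Toy data (not from the study): five contigs at 1 read per 10 bp,
# plus two contigs with 100-fold more reads than their length predicts.
lengths = [100, 200, 400, 800, 1600, 300, 500]
reads = [10, 20, 40, 80, 160, 3000, 5000]
print(flag_overread(lengths, reads))  # → [5, 6]
```

On the toy data the first fit, which includes the outliers, is pulled upward, but both outliers still exceed the cutoff; the refit on the five remaining contigs recovers a slope of 1 in log-log space and leaves the two flagged contigs outside it. A robust estimator (for example, a median-based fit) would be an alternative to iterative trimming.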