Table 2.
Number of | Average (reads) | Std. Dev. (reads) |
---|---|---|
Distinct WGS in overlaps (a) | 2342 | 2011 |
WGS passing rarity (b) | 1431 | 352 |
WGS passing overlap quality (b) | 2276 | 1947 |
WGS passing both (b) | 1417 | 351 |
WGS binned with BAC (c) | 1314 | 310 |
WGS binned + mates | 1757 | 390 |
WGS in Phrap contigs | 1675 | 368 |
(a) Distinct WGS in all overlaps produced by the overlapper with 95% identity and 100 k-mer copies allowed.
(b) Filtering done in Binner based only on overlapper information. The repeat heuristic limits k-mer copies to 12 (three times the coverage). The overlap quality heuristic requires 3 × span / (3 + span − score) ≥ 35, where score is the banded alignment score; 2 × span / (2 + span − score) would approximate the average distance between discrepancies were there only substitutions (indels have added penalties).
(c) Beyond the k-mer repeat and quality heuristics, only the top six (i.e., coverage × 1.5) WGS overlaps from each end of a BAC read are examined, and they are kept only if strictly better by the quality heuristic than the top discarded overlap.
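
The overlap quality heuristic and the top-six rule described in the notes can be sketched in a few lines of Python. This is an illustrative reading of the notes, not the Binner's actual code: the function names and the representation of overlaps as bare quality values are hypothetical. The sketch also assumes that score never exceeds span. Under a further assumed scoring of +1 per match and −1 per substitution, span − score equals twice the number of substitutions m, so 2 × span / (2 + span − score) reduces to span / (1 + m), which matches the "average distance between discrepancies" interpretation given in the notes.

```python
from typing import List, Sequence


def overlap_quality(span: int, score: int, k: int = 3) -> float:
    """Quality value k * span / (k + span - score).

    Assumes score <= span, so the denominator is always positive.
    k = 3 is the constant used in the filter; k = 2 gives the
    'average distance between discrepancies' approximation.
    """
    return k * span / (k + span - score)


def passes_quality(span: int, score: int, threshold: float = 35.0) -> bool:
    """Apply the stated threshold: 3 * span / (3 + span - score) >= 35."""
    return overlap_quality(span, score, k=3) >= threshold


def select_top_overlaps(qualities: Sequence[float], top_n: int = 6) -> List[float]:
    """Keep at most top_n overlaps for one end of a BAC read.

    A top-ranked overlap survives only if it is strictly better than
    the best discarded overlap (the one ranked top_n + 1), if any.
    """
    ranked = sorted(qualities, reverse=True)
    top, rest = ranked[:top_n], ranked[top_n:]
    if not rest:
        return top  # nothing was discarded, so nothing to compare against
    cutoff = rest[0]
    return [q for q in top if q > cutoff]
```

For example, a 100-base overlap with two substitutions would have score 96 under the assumed scoring, giving a quality of 300 / 7 ≈ 42.9, which passes the threshold of 35. In `select_top_overlaps`, ties with the best discarded overlap are dropped, which is one way to read "strictly better".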