Table 3. Sequence statistics by library.
| Library | G+C (fraction) | Avg. read length (bp) | Total reads | Mapped reads | Unmapped: ambiguous hits to ref. orgs | Unmapped, no hits to ref. orgs: hits NCBI | Unmapped, no hits to ref. orgs: no hits to NCBI |
|---|---|---|---|---|---|---|---|
| Enz+Pyrosequencing | 0.58 | 202 | 505962 | 495428 | 2354 | 8004 | 176 |
| Enz+Sanger | 0.56 | 578 | 11781 | 9054 | 364 | 272 | 2091 |
| Additional Enz+Sanger | 0.59 | 688 | 6542 | 5351 | 149 | 267 | 775 |
| Glycerol Enz+Sanger | 0.61 | 655 | 6186 | 4338 | 294 | 388 | 1166 |
| EnzBB+Sanger | 0.48 | 563 | 14418 | 11552 | 760 | 519 | 1587 |
| Additional EnzBB+Sanger | 0.56 | 665 | 2040 | 1258 | 39 | 266 | 477 |
| Glycerol EnzBB+Sanger | 0.52 | 699 | 1348 | 718 | 51 | 180 | 399 |
| DNeasy+Sanger | 0.6 | 568 | 14692 | 11865 | 759 | 592 | 1476 |
| Additional DNeasy+Sanger | 0.6 | 654 | 2625 | 1945 | 90 | 168 | 422 |
| Glycerol DNeasy+Sanger | 0.59 | 694 | 1726 | 1054 | 36 | 155 | 481 |
For each library, the table gives the average read length, the G+C content (as a fraction), the total number of reads, and the numbers of mapped and unmapped reads. The unmapped reads fall into two categories: (1) reads that have BLAST hits to our reference organisms but cannot be mapped to a single organism, either because they have high sequence identity to more than one organism or because their sequence identity is below the 95% threshold; and (2) reads that have no BLAST hits to our reference organisms. Reads in the second category are further subdivided into those that hit other organisms in the NCBI non-redundant nucleotide database and those that do not.
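The binning rules in the legend can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not the pipeline used to produce Table 3: the `RefHit` record and `classify_read` function are hypothetical, and we assume a read counts as "mapped" only when exactly one distinct reference organism has a hit at or above 95% identity. The sketch also encodes the table's counts to confirm that the four categories partition each library's total reads.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Sequence

# Minimum percent identity for assigning a read to a reference organism
# (the 95% threshold stated in the Table 3 legend).
IDENTITY_THRESHOLD = 95.0


class Category(Enum):
    MAPPED = "mapped"
    AMBIGUOUS = "unmapped: ambiguous hits to ref. orgs"
    NCBI_HIT = "unmapped: no ref. hits, hits NCBI"
    NO_HIT = "unmapped: no ref. hits, no hits to NCBI"


@dataclass(frozen=True)
class RefHit:
    """Hypothetical record: one BLAST hit of a read to a reference organism."""
    organism: str
    identity: float  # percent identity of the hit


def classify_read(ref_hits: Sequence[RefHit], has_ncbi_hit: bool) -> Category:
    """Assign a read to one of the four Table 3 categories.

    Assumption: 'mapped' means exactly one distinct reference organism
    has a hit at or above IDENTITY_THRESHOLD.
    """
    if not ref_hits:
        # No hit to any reference organism: split on the NCBI nt search result.
        return Category.NCBI_HIT if has_ncbi_hit else Category.NO_HIT
    passing = {h.organism for h in ref_hits if h.identity >= IDENTITY_THRESHOLD}
    if len(passing) == 1:
        return Category.MAPPED
    # Either no hit reaches the threshold, or several organisms do.
    return Category.AMBIGUOUS


# (total, mapped, ambiguous, hits NCBI, no hits to NCBI), copied from Table 3.
TABLE_3 = {
    "Enz+Pyrosequencing": (505962, 495428, 2354, 8004, 176),
    "Enz+Sanger": (11781, 9054, 364, 272, 2091),
    "Additional Enz+Sanger": (6542, 5351, 149, 267, 775),
    "Glycerol Enz+Sanger": (6186, 4338, 294, 388, 1166),
    "EnzBB+Sanger": (14418, 11552, 760, 519, 1587),
    "Additional EnzBB+Sanger": (2040, 1258, 39, 266, 477),
    "Glycerol EnzBB+Sanger": (1348, 718, 51, 180, 399),
    "DNeasy+Sanger": (14692, 11865, 759, 592, 1476),
    "Additional DNeasy+Sanger": (2625, 1945, 90, 168, 422),
    "Glycerol DNeasy+Sanger": (1726, 1054, 36, 155, 481),
}


def partitions_total(row: tuple[int, int, int, int, int]) -> bool:
    """True if the four read categories add up to the library's total reads."""
    total, *categories = row
    return sum(categories) == total
```

For example, `all(partitions_total(r) for r in TABLE_3.values())` evaluates to `True`: every read in every library falls into exactly one of the four categories. A read with hits to two organisms at 99% and 97% identity is binned as ambiguous, while two hits to the same organism above 95% still count as mapped under the assumption above.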