Table 1. Hybrid assembly statistics for individual libraries.
| Library 1 | | Library 1+2 | | Library 3 | |
Full-length cDNA clones | 200 | | 800 | | 780 | |
Assembly step | # contigs | N50 (bp) | # contigs | N50 (bp) | # contigs | N50 (bp) |
de novo contigs | 424 | 1,721 | 2,824 | 1,214 | 1,910 | 1,184 |
After merging overlapping contigs | 283 | 1,912 | 1,605 | 1,558 | 965 | 1,542 |
After exon gaps are closed | 194 | 2,157 | 1,019 | 1,770 | 709 | 1,707 |
After intron gaps are closed | 187 | 2,157 | 890 | 1,864 | 674 | 1,741 |
After association with clones | 161 | 2,116 | 695 | 1,925 | 679 | 1,764 |
The number of contigs and the N50 contig size are given for each step of the assembly process. Ideally, the contig count converges toward the number of PCR-amplified full-length cDNA clones in each library as the assembly proceeds. However, the actual number of PCR-amplified full-length cDNA clones is known only for library 1 (158 clones). Weakly amplified clones were often reconstructed well, but their bands in the gel electrophoresis image were too faint to identify, which made it difficult to measure the number of PCR-amplified full-length cDNA clones experimentally without aligning the shotgun reads. In the last step, the N50 contig size for library 1 decreased (from 2,157 to 2,116 bp) because some contigs from the previous step were not associated with any Sanger read and were discarded to avoid false positives. For library 3, the number of contigs increased in the last step (from 674 to 679) because multiple Sanger reads were associated with the same contig; this was possibly an error that could be eliminated by manually examining the output.
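For readers unfamiliar with the N50 statistic reported in the table, the following is a minimal sketch of how it is commonly computed: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled bases. The function name `n50` and the example lengths are hypothetical illustrations, not part of the original analysis pipeline.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Assumes the common definition: sort contigs from longest to
    shortest and return the length at which the running total first
    reaches half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))  # prints 400
```

Under this definition, discarding contigs changes both the total and the sorted list, so N50 can go down as well as up when a step removes contigs, as in the last step for library 1.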