Table 3.
Comparison of the Drosophila transcriptome assemblies produced by our postprocessing algorithm, Oases, and Trans-ABySS from six publicly available libraries, over different values of the k-mer coverage cutoff c.

postprocess

k_c | initial nodes | largest tangle | largest SCC | splicing graphs | max length | N50 | >1-node graphs | max nodes | avg nodes | SNPs | total hits | unique hits | >1-hit graphs | max hits | time (mins) | memory (GB) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
35_3 | 227614 | 178545 | 88094 | 75367 | 10539 | 544 | 2048 | 124 | 6 | 16703 | 38448 | 10719 | 392 | 5 | 86,18 | 22,2 |
35_5 | 125414 | 87895 | 41654 | 47958 | 8678 | 705 | 1720 | 93 | 6 | 11334 | 27010 | 9889 | 429 | 13 | 86,17 | 22,2 |
35_10 | 57978 | 31785 | 12695 | 27695 | 6383 | 705 | 1020 | 63 | 6 | 5034 | 17271 | 8070 | 308 | 5 | 86,16 | 22,2 |

Oases

k_c | locus | max length | N50 | >1-trans locus | max trans | avg trans | total hits | unique hits | >1-hit locus | max hits | time (mins) | memory (GB) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
35_3 | 39584 | 15586 | 801 | 3824 | 13 | 3 | 29928 | 10898 | 256 | 4 | 94,28 | 29,32 |
35_5 | 28537 | 15586 | 936 | 2616 | 16 | 3 | 22460 | 10103 | 245 | 4 | 94,26 | 29,30 |
35_10 | 17075 | 11104 | 982 | 1377 | 14 | 3 | 13800 | 8201 | 185 | 5 | 94,24 | 29,26 |

Trans-ABySS

k_c | trans | max length | N50 | >1-node trans | max nodes | avg nodes | total hits | unique hits | time (mins) | memory (GB) |
---|---|---|---|---|---|---|---|---|---|---|
35_3 | 91365 | 15586 | 898 | 50467 | 60 | 8 | 33600 | 10639 | 205,1 | 4,1 |
35_5 | 55164 | 10582 | 997 | 27763 | 46 | 7 | 25779 | 9944 | 195,1 | 4,1 |
35_10 | 28455 | 8865 | 929 | 13665 | 43 | 6 | 16032 | 8154 | 178,1 | 4,1 |

The k-mer length is fixed to 35 because, on machines with 32 GB of physical memory, Oases can assemble these libraries only when k is large. For our postprocessing algorithm, the notation is the same as in Table 1. For Oases, locus denotes the number of predicted loci; max length denotes the length of the longest predicted transcript; N50 denotes the N50 value computed over the length of the longest transcript in each predicted locus; >1-trans locus denotes the number of predicted loci with more than one transcript; max trans denotes the maximum number of transcripts in a predicted locus; and avg trans denotes the average number of transcripts in predicted loci with more than one transcript. Total hits denotes the total number of hits from a translated BLAST search of each predicted transcript against Drosophila (isoforms are considered the same gene, only the top hit with E-value below 10⁻⁷ is considered for each transcript in a predicted locus, and hits from transcripts within the same predicted locus to the same gene are counted once); unique hits denotes the number of unique hits to different genes; >1-hit locus denotes the number of predicted loci that have BLAST hits to more than one gene; and max hits denotes the maximum number of different genes that have BLAST hits to a single predicted locus. Time (mins) denotes the computational time in minutes, with the values to the left and to the right of "," indicating the running time of Velvet (without setting cov_cutoff) and Oases, respectively. Memory (GB) denotes the memory requirement in gigabytes, with the values to the left and to the right of "," indicating the memory requirement of Velvet (without setting cov_cutoff) and Oases, respectively.
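
The hit-counting rules in this legend can be made concrete with a short sketch. It is an illustration only, not the code used to produce the table. It assumes a hypothetical input of `(locus_id, transcript_id, gene_id, e_value)` tuples in which isoforms have already been mapped to their gene, and it treats the hit with the lowest E-value as the "top hit" for each transcript. The function name `summarize_hits` and the output keys are likewise assumptions.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-7  # threshold stated in the table legend


def summarize_hits(records, cutoff=E_VALUE_CUTOFF):
    """Summarize BLAST hits per predicted locus.

    records: iterable of (locus_id, transcript_id, gene_id, e_value) tuples.
    This layout is hypothetical; gene_id is assumed to already merge isoforms.
    """
    # Keep only the top hit per transcript, taken here as the lowest E-value
    # among hits strictly below the cutoff (an assumption about "top hit").
    best = {}
    for locus, transcript, gene, e_value in records:
        if e_value >= cutoff:
            continue
        key = (locus, transcript)
        if key not in best or e_value < best[key][1]:
            best[key] = (gene, e_value)

    # Hits from transcripts of the same locus to the same gene count once,
    # so collapse each locus to its set of distinct genes.
    genes_by_locus = defaultdict(set)
    for (locus, _transcript), (gene, _e_value) in best.items():
        genes_by_locus[locus].add(gene)

    hit_counts = [len(genes) for genes in genes_by_locus.values()]
    return {
        "total_hits": sum(hit_counts),
        "unique_hits": len(set().union(*genes_by_locus.values())),
        "multi_hit_loci": sum(1 for n in hit_counts if n > 1),
        "max_hits": max(hit_counts, default=0),
    }
```

Under these assumptions, `total_hits` sums distinct genes per locus, so a gene hit by two different loci contributes twice to `total_hits` but once to `unique_hits`.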
For Trans-ABySS, trans denotes the total number of predicted transcripts; max length denotes the length of the longest predicted transcript; N50 denotes the N50 value of the lengths of the predicted transcripts; >1-node trans denotes the number of predicted transcripts that are the concatenation of more than one node; max nodes denotes the maximum number of nodes in a predicted transcript; avg nodes denotes the average number of nodes in predicted transcripts with more than one node; total hits denotes the total number of predicted transcripts that have BLAST hits; and unique hits denotes the number of unique hits to different genes. Time (mins) denotes the computational time in minutes, with the values to the left and to the right of "," indicating the running time of ABySS and Trans-ABySS, respectively. Memory (GB) denotes the memory requirement in gigabytes, with the values to the left and to the right of "," indicating the memory requirement of ABySS and Trans-ABySS, respectively.
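
The N50 columns follow different conventions per tool: for Trans-ABySS the statistic is taken over all predicted transcript lengths, whereas for Oases only the longest transcript of each locus contributes. The sketch below illustrates both, assuming the common definition of N50 (the largest length L such that sequences of length at least L cover at least half of the total length); the exact convention used for the table is not stated, and the helper names `n50` and `longest_per_locus` are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumes the common definition: walk the lengths from longest to
    shortest and return the length at which the running total first
    reaches half of the overall total.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


def longest_per_locus(transcripts):
    """Reduce (locus_id, length) pairs to one length per locus.

    Keeps the longest transcript of each locus, matching the Oases
    N50 convention described in the legend.
    """
    longest = {}
    for locus, length in transcripts:
        longest[locus] = max(length, longest.get(locus, 0))
    return list(longest.values())
```

A Trans-ABySS-style value would be `n50(all_transcript_lengths)`, while an Oases-style value would be `n50(longest_per_locus(pairs))`. This difference is worth keeping in mind when comparing the N50 columns across the three sub-tables.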