2014 Jul 14;15(Suppl 5):S6. doi: 10.1186/1471-2164-15-S5-S6

Table 1.

Statistics of the simulated transcriptome assemblies of Drosophila, using its known complete genome, for different values of k and of the k-mer coverage cutoff c (row labels give k_c), with 0.1% mismatches in the reads.

k_c    Initial  Largest  Largest  Splicing  Max     N50   >1-node  Max    Avg    SNPs  Total  Unique  >1-hit  Max   Time    Memory
       nodes    tangle   SCC      graphs    length        graphs   nodes  nodes        hits   hits    graphs  hits  (mins)  (GB)
25_3   38884    17900    9937     15713     37380   2366  1361     3106   10     883   12731  10162   643     27    80,3    21,2
25_5   34822    16979    9255     15521     37380   2374  1351     266    7      517   12708  10160   643     27    80,3    21,2
25_10  34494    16712    9057     15486     37380   2373  1345     194    7      481   12699  10158   639     27    80,3    21,2

31_3   28342    5037     2080     13819     45158   2704  1719     1007   7      496   12523  11112   546     12    76,3    18,2
31_5   27307    4971     1898     13740     45158   2714  1717     167    6      381   12494  11110   552     13    76,3    18,2
31_10  27265    4947     1885     13829     45158   2704  1698     161    6      377   12536  11109   542     13    76,3    18,2

Initial nodes denotes the number of nodes in the initial assembly. Largest tangle denotes the number of nodes in the largest connected component. Largest SCC denotes the number of nodes in the largest strongly connected component. Splicing graphs denotes the number of splicing graphs. Max length denotes the length (in nucleotides) of the longest path over all splicing graphs. N50 denotes the N50 of the longest-path lengths (in nucleotides), where the longest path is taken within each splicing graph. >1-node graphs denotes the number of splicing graphs with more than one node. Max nodes and Avg nodes denote the maximum and the average number of nodes in these non-linear graphs. SNPs denotes the number of SNPs recovered. Total hits denotes the total number of hits from a translated BLAST search of each node against Drosophila; isoforms are treated as the same gene, only the top hit with E-value below 10^-7 is included for each node in a splicing graph, and hits from nodes within the same splicing graph to the same gene are counted once. Unique hits denotes the number of distinct genes hit. >1-hit graphs denotes the number of splicing graphs with BLAST hits to more than one gene. Max hits denotes the maximum number of different genes with BLAST hits to a single splicing graph. Time (mins) denotes the computational time in minutes and Memory (GB) the memory requirement in gigabytes; in both columns, the values to the left and to the right of "," are for Velvet and for our postprocessing algorithm, respectively.
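The column definitions above can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the authors' postprocessing code: each assembly graph is assumed to be given as hypothetical node and directed-edge lists, "connected component" is read as ignoring edge direction, the per-graph longest-path lengths fed to `n50` are assumed to be computed elsewhere, and BLAST results are assumed to be pre-parsed into `(graph_id, node_id, gene_id, evalue)` tuples with isoforms already collapsed to one `gene_id`. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def n50(lengths):
    """Largest L such that lengths >= L account for at least half of the total length."""
    total, running = sum(lengths), 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def largest_tangle(nodes, edges):
    """Node count of the largest connected component, ignoring edge direction (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return max(Counter(find(n) for n in nodes).values(), default=0)


def largest_scc(nodes, edges):
    """Node count of the largest strongly connected component (iterative Kosaraju)."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    seen, order = set(), []  # order: nodes by increasing DFS finish time
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(fwd[s]))]
        while stack:
            u, children = stack[-1]
            for w in children:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(fwd[w])))
                    break
            else:  # all successors of u have been explored
                stack.pop()
                order.append(u)
    done, best = set(), 0
    for s in reversed(order):  # each DFS tree on the reversed graph is one SCC
        if s in done:
            continue
        done.add(s)
        stack, size = [s], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in rev[u]:
                if w not in done:
                    done.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


def hit_stats(hits, max_evalue=1e-7):
    """hits: (graph_id, node_id, gene_id, evalue) tuples, isoforms already mapped to one gene_id."""
    top = {}  # (graph_id, node_id) -> (evalue, gene_id) of that node's best hit
    for graph_id, node_id, gene_id, evalue in hits:
        key = (graph_id, node_id)
        if evalue < max_evalue and (key not in top or evalue < top[key][0]):
            top[key] = (evalue, gene_id)
    genes = defaultdict(set)  # graph_id -> distinct genes hit by its nodes (each counted once)
    for (graph_id, _), (_, gene_id) in top.items():
        genes[graph_id].add(gene_id)
    counts = [len(g) for g in genes.values()]
    return {
        "total_hits": sum(counts),
        "unique_hits": len(set().union(*genes.values())),
        "multi_hit_graphs": sum(c > 1 for c in counts),
        "max_hits": max(counts, default=0),
    }
```

For a toy graph with edges a->b, b->c, c->a, c->d plus an isolated node e, `largest_tangle` returns 4 and `largest_scc` returns 3. Union-find and an iterative Kosaraju pass were chosen only to keep the sketch dependency-free and safe from recursion limits on large graphs.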