Table 2.
Statistic | Level | Metric | Method | 100 M paired-end reads | 150 M paired-end reads | 200 M paired-end reads
---|---|---|---|---|---|---
REF-EVAL | Contig | Recall | Original | 0.3098 | 0.3122 | 0.3126
REF-EVAL | Contig | Recall | MapReduce | 0.3274 | 0.3308 | 0.3314
REF-EVAL | Contig | Precision | Original | 0.3258 | 0.3283 | 0.3280
REF-EVAL | Contig | Precision | MapReduce | 0.3389 | 0.3419 | 0.3422
REF-EVAL | Nucleotide | Recall | Original | 0.9752 | 0.9764 | 0.9779
REF-EVAL | Nucleotide | Recall | MapReduce | 0.9763 | 0.9783 | 0.9793
REF-EVAL | Nucleotide | Precision | Original | 0.9847 | 0.9845 | 0.9845
REF-EVAL | Nucleotide | Precision | MapReduce | 0.9870 | 0.9869 | 0.9869
N1 | | | Original | 32,712 | 39,273 | 43,862
N1 | | | MapReduce | 33,687 | 40,452 | 45,344
N2 | | | Original | 16 | 20 | 26
N2 | | | MapReduce | 4 | 12 | 8
Statistics from the REF-EVAL component of DETONATE [41] for three simulated read datasets. Recall is the fraction of reference elements that are correctly recovered by an assembly. Precision is the fraction of assembly elements that correctly recover a reference element. At the Contig level, a 99% alignment cutoff was used to identify a recovered transcript (left-hand bars in Fig. 3). Original refers to the results of Trinity run with the original version of Inchworm. MapReduce refers to the results of Trinity run with the MapReduce-Inchworm method presented here. Also shown are the N1 and N2 statistics, as given by the script FL_trans_analysis_pipeline.pl. N1 is the total number of assembled transcripts that give full-length matches to the reference. N2 is the number of fused transcripts. For comparison, there are 80,867 reference transcripts.
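The recall and precision definitions in the caption can be illustrated with a minimal Python sketch. This is not the REF-EVAL implementation: the function `recall_precision`, the toy identifiers, and the assumption that correct matches are already available as (assembly element, reference element) pairs are all hypothetical and chosen only for illustration. The last lines relate the 100 M Original N1 value to the 80,867 reference transcripts, following the caption's "for comparison" framing; that ratio is not a statistic reported by REF-EVAL or by FL_trans_analysis_pipeline.pl.

```python
def recall_precision(n_reference, n_assembly, matches):
    """Compute (recall, precision) from a list of correct matches.

    matches: iterable of (assembly_id, reference_id) pairs that have
    already been judged correct (hypothetical input format; how a match
    is decided, e.g. the 99% alignment cutoff, is outside this sketch).
    """
    recovered_refs = {ref for _, ref in matches}   # reference elements recovered
    correct_asm = {asm for asm, _ in matches}      # assembly elements that recover something
    recall = len(recovered_refs) / n_reference
    precision = len(correct_asm) / n_assembly
    return recall, precision


# Toy example: 4 reference transcripts, 5 assembled contigs.
# Contigs c1 and c2 both recover t1, and c3 recovers t2.
toy_matches = [("c1", "t1"), ("c2", "t1"), ("c3", "t2")]
recall, precision = recall_precision(4, 5, toy_matches)
# recall = 2 recovered references / 4 references
# precision = 3 correct contigs / 5 contigs

# N1 for the 100 M Original run, set against the reference transcript count.
n1_original_100m = 32_712
n_reference_transcripts = 80_867
n1_fraction = n1_original_100m / n_reference_transcripts
```

Because several assembly elements can recover the same reference element, the two counts are taken over different sets, which is why recall and precision can differ even when the number of matches is fixed.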