Table 1.
Data sets | Pipeline | Size (g)/time (h)/speed (g/h) | Error rate (%) | ≤5%(%) | N50 | N75 | Read number with HERS |
---|---|---|---|---|---|---|---|
E. coli | raw reads | 1.38/–/– | 17.8 | 0.01 | 41,074 | 35,484 | 121 |
E. coli | Canu | 0.22/1.63/0.14 | 7.06 | 20.45 | 37,747 | 32,127 | 1 |
E. coli | NECAT | 1.41/0.76/1.86 | 2.23 (4.27) | 99.34 (80.51) | 43,140 | 37,502 | 1 |
S. cerevisiae | raw reads | 5.48/–/– | 12 | 1.61 | 34,668 | 28,152 | 7589 |
S. cerevisiae | Canu | 2.18/30.83/0.071 | 3.13 | 87.3 | 10,554 | 4567 | 4820 |
S. cerevisiae | NECAT | 4.57/3.90/1.17 | 1.53 (3.08) | 95.04 (88.09) | 31,364 | 24,480 | 268 |
D. melanogaster | raw reads | 8.30/–/– | 16.2 | 2.3 | 17,730 | 13,621 | 12,438 |
D. melanogaster | Canu | 4.79/18.10/0.26 | 8.15 | 57.57 | 15,220 | 10,658 | 6523 |
D. melanogaster | NECAT | 7.52/4.20/1.79 | 4.89 (7.03) | 72.03 (64.18) | 17,369 | 13,104 | 3481 |
A. thaliana | raw reads | 3.08/–/– | 20.1 | 1.57 | 23,386 | 16,253 | 14,483 |
A. thaliana | Canu | 2.59/12.07/0.22 | 12.05 | 8.09 | 21,472 | 13,133 | 8722 |
A. thaliana | NECAT | 2.85/1.33/2.14 | 9.01 (11.35) | 45.85 (25.67) | 23,600 | 15,944 | 7158 |
C. reinhardtii | raw reads | 14.84/–/– | 15 | 1.16 | 54,409 | 46,812 | 4231 |
C. reinhardtii | Canu | 4.61/59.40/0.078 | 5.35 | 76.05 | 53,891 | 45,934 | 726 |
C. reinhardtii | NECAT | 14.89/11.53/1.29 | 1.99 (4.40) | 95.18 (82.13) | 56,427 | 48,708 | 278 |
O. sativa | raw reads | 63.40/–/– | 15.6 | 0.49 | 56,325 | 50,847 | 24,205 |
O. sativa | Canu | 15.23/43.20/0.35 | 7.99 | 44.42 | 55,010 | 49,612 | 4413 |
O. sativa | NECAT | 63.83/18.95/3.37 | 4.66 (6.45) | 74.62 (51.49) | 56,573 | 51,141 | 3511 |
S. pennellii | raw reads | 132.74/–/– | 18.49 | 1.7 | 24,801 | 22,226 | 127,808 |
S. pennellii | Canu | 37.53/88.8/0.42 | 9.69 | 34.04 | 21,653 | 19,364 | 5511 |
S. pennellii | NECAT | 121.07/137.77/0.88 | 6.45 (9.23) | 63.04 (38.77) | 23,810 | 21,480 | 5445 |
NA12878 (rel3,4) | raw reads | 106.52/–/– | 18.50 | 0.67 | 12,196 | 7209 | 286,641 |
NA12878 (rel3,4) | NECAT | 101.28/34.65/2.92 | 5.04 (7.38) | 77.60 (34.33) | 13,018 | 7883 | 53,130 |
NA12878 (rel6) | raw reads | 123.80/–/– | 12.08 | 8.91 | 13,630 | 7984 | 315,117 |
NA12878 (rel6) | NECAT | 98.36/39.35/2.49 | 6.28 (6.46) | 75.45 (77.24) | 14,839 | 9638 | 64,210 |
Size is the total number of base pairs in the corrected reads. Time is the running time of the correction tool, and speed is size divided by time. Error rate denotes the mean error rate of the raw reads and of the corrected reads. ≤5% denotes the percentage of corrected reads with <5% error rate; values in parentheses are the NECAT results after the first correction. N50 and N75 are the read lengths at which the cumulative length of the reads, taken from longest to shortest, reaches 50% and 75% of the total length of all reads. Read number with HERS denotes the number of reads with at least one HERS (a 500 bp window with more than 50% error). For NECAT, the last three metrics (N50, N75, and read number with HERS) were evaluated on reads corrected from the longest 40× of the raw data set, the subset that Canu selects by default; see Supplementary Note 6 for details.
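The definitions of N50, N75, and speed in the note above can be illustrated with a minimal Python sketch. The function names `nx` and `correction_speed` and the toy read lengths are assumptions made for illustration only; they are not taken from NECAT or Canu, and the sketch is not the evaluation code behind Table 1.

```python
def nx(read_lengths, fraction):
    """Return the read length at which the cumulative length of reads,
    sorted from longest to shortest, first reaches `fraction` of the
    total length (fraction=0.5 gives N50, fraction=0.75 gives N75)."""
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return min(read_lengths)  # only reachable through rounding effects


def correction_speed(size, hours):
    """Speed as defined in the table note: corrected size divided by running time."""
    return size / hours


# Hypothetical toy read lengths (not from any data set in the table).
toy_lengths = [10, 8, 6, 4, 2]
n50 = nx(toy_lengths, 0.50)  # 8
n75 = nx(toy_lengths, 0.75)  # 6

# Size and time taken from the E. coli NECAT row of the table.
ecoli_necat_speed = round(correction_speed(1.41, 0.76), 2)  # 1.86
```

For the toy lengths the total is 30 bases: the cumulative sum reaches 15 at the read of length 8 (N50) and 22.5 at the read of length 6 (N75). Applied to the E. coli NECAT row, the speed calculation reproduces the tabulated 1.86.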