Table 2: Statistics of assembly results on real datasets
| Dataset | Tool<sup>a</sup> | Assembly length (bp)<sup>b</sup> | #Ctgs<sup>c</sup> | N10 (kb)<sup>d</sup> | N50 (kb)<sup>d</sup> | N90 (kb)<sup>d</sup> | Genome fraction (%)<sup>e</sup> | #Misassemblies<sup>f</sup> | #SEs<sup>g</sup> | Real time<sup>h</sup> | Peak memory (GB)<sup>i</sup> |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C. elegans | xRead-nd | 106,383,147 | 50 | 6,183.59 | 3,561.75 | 970.09 | 97.989 | 186 | 18 | 360.4 (s) | 7.97 |
| | xRead-wtpoa | 102,908,753 | 50 | 7,218.37 | 3,645.98 | 948.68 | 98.893 | 45 | 11 | 753.8 (s) | 18.93 |
| | xRead-pipe | 102,454,687 | 98 | 4,753.84 | 2,176.12 | 637.66 | 99.155 | 39 | 0 | — | — |
| | NextDenovo | 103,679,213 | 32 | 15,672.21 | 5,776.25 | 2,136.32 | 99.547 | 76 | 2 | 7,979.5 (s) | 12.94 |
| | Wtdbg2 | 99,706,610 | 82 | 10,659.58 | 3,841.09 | 941.37 | 97.711 | 62 | 11 | 1,591.5 (s) | 20.47 |
| | Flye | 102,735,830 | 57 | 6,598.99 | 3,310.01 | 1,190.73 | 99.745 | 81 | 3 | 9,046.0 (s) | 45.57 |
| | Shasta | 99,144,317 | 66 | 5,428.85 | 2,831.68 | 950.70 | 97.369 | 38 | 1 | 611.5 (s) | 40.47 |
| | Hifiasm | — | — | — | — | — | — | — | — | — | — |
| D. melanogaster | xRead-nd | 145,917,905 | 64 | 21,360.15 | 14,378.63 | 1,523.78 | 89.955 | 598 | 82 | 719.2 (s) | 12.56 |
| | xRead-wtpoa | 146,420,770 | 64 | 21,611.07 | 14,367.83 | 1,508.29 | 90.973 | 393 | 77 | 1,210.0 (s) | 20.42 |
| | xRead-pipe | 142,950,657 | 151 | 15,209.22 | 10,453.23 | 532.13 | 90.979 | 333 | 1 | — | — |
| | NextDenovo | 136,332,796 | 31 | 27,936.35 | 22,718.49 | 2,307.26 | 92.572 | 205 | 13 | 7,444.5 (s) | 19.45 |
| | Wtdbg2 | 155,277,279 | 933 | 21,514.16 | 7,152.20 | 66.96 | 91.496 | 336 | 31 | 1,753.0 (s) | 27.92 |
| | Flye | 139,561,820 | 165 | 27,939.08 | 21,917.50 | 950.81 | 93.763 | 301 | 8 | 7,841.6 (s) | 50.84 |
| | Shasta | 133,568,381 | 152 | 27,932.29 | 21,763.69 | 944.65 | 91.181 | 242 | 6 | 457.5 (s) | 27.07 |
| | Hifiasm | — | — | — | — | — | — | — | — | — | — |
| H. sapiens (ONT fast mode) | xRead-nd | 2,858,898,337 | 971 | 27,779.62 | 6,055.25 | 1,386.12 | 90.401 | 3,018 | 358 | 20.89 (h) | 22.76 |
| | xRead-wtpoa | 2,856,254,648 | 971 | 27,779.99 | 6,216.56 | 1,554.92 | 92.459 | 686 | 261 | 24.18 (h) | 37.05 |
| | xRead-pipe | 2,850,159,937 | 1,410 | 21,912.39 | 4,486.01 | 1,155.18 | 92.405 | 372 | 29 | — | — |
| | NextDenovo | 2,782,862,247 | 638 | 57,559.56 | 25,059.84 | 3,780.30 | 91.614 | 329 | 108 | 71.89 (h) | 168.75 |
| | Wtdbg2 | 2,729,955,397 | 4,257 | 31,781.95 | 11,412.92 | 1,354.65 | 88.623 | 522 | 400 | 20.12 (h) | 158.35 |
| | Flye | 2,847,205,020 | 3,048 | 59,028.68 | 21,863.18 | 2,830.99 | 93.101 | 964 | 177 | 32.39 (h) | 217.85 |
| | Shasta | 2,768,613,021 | 5,627 | 4,426.75 | 1,775.63 | 340.55 | 90.738 | 366 | 34 | 1.45 (h) | 293.24 |
| | Hifiasm | — | — | — | — | — | — | — | — | — | — |
| H. sapiens (PacBio HiFi) | xRead-nd | 2,972,271,147 | 2,478 | 31,619.87 | 12,035.12 | 662.49 | 95.209 | 3,269 | 253 | 7.16 (h) | 20.61 |
| | xRead-wtpoa | 2,957,818,291 | 2,478 | 32,009.50 | 12,031.25 | 688.13 | 94.894 | 2,975 | 157 | 9.44 (h) | 22.88 |
| | xRead-pipe | 2,893,264,635 | 2,607 | 26,185.02 | 8,411.10 | 509.92 | 93.775 | 1,583 | 12 | — | — |
| | NextDenovo | 2,852,886,493 | 1,689 | 77,210.10 | 20,583.16 | 1,351.32 | 93.392 | 769 | 79 | 16.42 (h) | 86.67 |
| | Wtdbg2 | 2,769,287,788 | 2,295 | 44,134.56 | 14,168.42 | 2,005.26 | 90.804 | 619 | 135 | 10.32 (h) | 110.71 |
| | Flye | 2,918,011,841 | 3,121 | 59,296.15 | 24,686.38 | 2,047.19 | 94.656 | 2,678 | 107 | 25.88 (h) | 151.04 |
| | Shasta | 3,060,152,361 | 11,261 | 90,749.00 | 31,895.10 | 878.29 | 97.218 | 3,357 | 53 | 3.84 (h) | 547.01 |
| | Hifiasm | 3,089,368,991 | 664 | 139,105.11 | 87,025.29 | 9,357.14 | 98.353 | 3,693 | 854 | 6.07 (h) | 93.43 |
<sup>a</sup> The assemblies were generated with 8 pipelines and benchmarked with QUAST, together with an in-house script for assessing misassemblies.
<sup>b</sup> The total number of bases in all contigs.
<sup>c</sup> The total number of contigs.
<sup>d</sup> N10/N50/N90: the length of the shortest contig such that contigs of this length or longer together cover at least 10%/50%/90% of the total assembly length.
<sup>e</sup> The percentage of reference-genome bases covered by aligned contigs.
<sup>f</sup> The total number of misassemblies, including both local sequence errors and structural errors.
<sup>g</sup> The number of structural errors.
<sup>h</sup> The overall wall-clock (real) time of each tool on the real datasets using 30 threads; values marked "(s)" and "(h)" are in seconds and hours, respectively.
<sup>i</sup> The peak memory usage of each tool (in GB) on the real datasets.
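To make footnotes d and e concrete, the sketch below computes Nx values from a list of contig lengths and a genome fraction from aligned reference intervals. It is a minimal illustration under stated assumptions, not QUAST's implementation: the function names `nx` and `genome_fraction` are hypothetical, and the genome-fraction helper assumes the alignments have already been reduced to half-open `[start, end)` intervals on a single reference sequence of known length. QUAST itself derives these intervals from contig-to-reference alignments and handles multi-chromosome references.

```python
from typing import Iterable, Sequence, Tuple


def nx(contig_lengths: Sequence[int], x: float) -> int:
    """Hypothetical helper: return Nx, the length L of the shortest contig such
    that contigs of length >= L together cover at least x% of the assembly."""
    if not contig_lengths:
        return 0
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare running/total >= x/100 without dividing.
        if running * 100 >= total * x:
            return length
    return 0  # only reached if x > 100


def genome_fraction(aligned: Iterable[Tuple[int, int]], reference_length: int) -> float:
    """Hypothetical helper: percentage of reference bases covered by at least one
    aligned interval. Assumes half-open [start, end) coordinates on one sequence."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(aligned):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: close it and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / reference_length


if __name__ == "__main__":
    lengths = [100, 80, 60, 40, 20]  # toy contig lengths (bp), total = 300
    print(nx(lengths, 10), nx(lengths, 50), nx(lengths, 90))  # 100 80 40
    # Overlapping alignments (0-50 and 40-120) are merged before counting.
    print(genome_fraction([(0, 50), (40, 120), (200, 300)], 400))  # 55.0
```

Sorting in descending order before accumulating is what makes Nx the *shortest* contig at the threshold. Merging overlapping intervals keeps reference bases covered by several contigs from being counted twice. The table reports Nx in kb, so a bp result from `nx` would be divided by 1,000 for comparison.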