Table 2.
Estimation of the node multiplicity in de Bruijn graphs (k=21) built from real Illumina data for 5 organisms (2 bacteria, 3 eukaryotes)
| Organism | s | 10× node acc. (%) | 10× k-mer acc. (%) | 25× node acc. (%) | 25× k-mer acc. (%) | 50× node acc. (%) | 50× k-mer acc. (%) |
|---|---|---|---|---|---|---|---|
| P. aeruginosa | 0 | 84.31 | 96.47 | 95.02 | 98.72 | 97.47 | 99.21 |
| | 1 | 92.85 | 98.80 | 98.27 | 99.49 | 98.89 | 99.46 |
| | 3 | 93.91 | 98.96 | 98.60 | 99.50 | 99.13 | 99.52 |
| | 5 | 94.11 | 98.94 | 98.74 | 99.51 | 99.17 | 99.51 |
| S. enterica | 0 | 84.50 | 96.46 | 93.65 | 98.25 | 95.96 | 98.53 |
| | 1 | 88.81 | 97.18 | 94.98 | 98.39 | 96.44 | 98.60 |
| | 3 | 89.41 | 97.18 | 95.27 | 98.45 | 96.53 | 98.63 |
| | 5 | 89.55 | 97.22 | 95.32 | 98.46 | 96.57 | 98.64 |
| C. elegans | 0 | 68.65 | 93.93 | 80.69 | 96.42 | 87.35 | 97.10 |
| | 1 | 78.90 | 96.48 | 86.47 | 97.77 | 90.74 | 98.05 |
| | 3 | 81.02 | 97.21 | 87.16 | 98.01 | 91.27 | 98.24 |
| | 5 | 81.29 | 97.18 | 87.32 | 98.05 | 91.25 | 98.25 |
| A. thaliana | 0 | 67.84 | 89.10 | 82.20 | 96.16 | 89.91 | 97.05 |
| | 1 | 73.67 | 94.64 | 85.45 | 96.83 | 91.27 | 97.54 |
| | 3 | 73.92 | 95.26 | 85.83 | 97.09 | 91.46 | 97.71 |
| | 5 | 73.93 | 95.43 | 85.68 | 97.17 | 91.56 | 97.70 |
| H. sapiens | 0 | 75.26 | 92.27 | 83.29 | 94.67 | 88.09 | 95.51 |
| | 1 | 80.68 | 93.92 | 85.66 | 95.23 | 89.23 | 95.83 |
| | 3 | 81.33 | 94.56 | 86.12 | 95.50 | 89.52 | 95.95 |
| | 5 | 81.57 | 94.71 | 86.26 | 95.58 | 89.59 | 95.97 |
The datasets were downsampled to coverage depths of 10×, 25× and 50×. For H. sapiens, the multiplicity was inferred for one million randomly sampled nodes; for all other datasets, the multiplicity was inferred for all nodes. The node (resp. k-mer) accuracy is the percentage of nodes (resp. k-mers) in the de Bruijn graph that were assigned the correct multiplicity. The accuracy generally improves when using CRFs with increasing neighbourhood size s, with the largest gain between s = 0 and s = 1.
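
The difference between the two accuracy measures can be illustrated with a minimal sketch. It assumes that each node of the de Bruijn graph represents one or more k-mers (for example, a compacted non-branching path), so that k-mer accuracy weights each node by the number of k-mers it contains. The `Node` class, its field names and the toy values are hypothetical and serve only as illustration; this is not the evaluation code used to produce the table.

```python
from dataclasses import dataclass


@dataclass
class Node:
    num_kmers: int   # number of k-mers represented by this node (assumed)
    true_mult: int   # reference ("correct") multiplicity
    est_mult: int    # multiplicity assigned by the model


def node_accuracy(nodes):
    """Percentage of nodes with a correctly assigned multiplicity."""
    correct = sum(1 for n in nodes if n.est_mult == n.true_mult)
    return 100.0 * correct / len(nodes)


def kmer_accuracy(nodes):
    """Percentage of k-mers lying in nodes with a correct multiplicity."""
    total = sum(n.num_kmers for n in nodes)
    correct = sum(n.num_kmers for n in nodes if n.est_mult == n.true_mult)
    return 100.0 * correct / total


# Toy example: two long nodes are correct, two short nodes are wrong.
nodes = [
    Node(num_kmers=100, true_mult=1, est_mult=1),
    Node(num_kmers=5, true_mult=2, est_mult=1),
    Node(num_kmers=20, true_mult=0, est_mult=0),
    Node(num_kmers=3, true_mult=1, est_mult=2),
]

print(node_accuracy(nodes))  # 50.0
print(kmer_accuracy(nodes))  # 93.75
```

Under this assumption, errors concentrated in short nodes lower the node accuracy much more than the k-mer accuracy, which would be consistent with the k-mer accuracy exceeding the node accuracy in every row of the table.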