2020 Sep 14;21:402. doi: 10.1186/s12859-020-03740-x

Table 2.

Estimation of the node multiplicity in de Bruijn graphs (k=21) built from real Illumina data for 5 organisms (2 bacteria, 3 eukaryotes)

Organism       s  10×                     25×                     50×
                  node acc.   k-mer acc.  node acc.   k-mer acc.  node acc.   k-mer acc.
P. aeruginosa  0  84.31       96.47       95.02       98.72       97.47       99.21
               1  92.85       98.80       98.27       99.49       98.89       99.46
               3  93.91       98.96       98.60       99.50       99.13       99.52
               5  94.11       98.94       98.74       99.51       99.17       99.51
S. enterica    0  84.50       96.46       93.65       98.25       95.96       98.53
               1  88.81       97.18       94.98       98.39       96.44       98.60
               3  89.41       97.18       95.27       98.45       96.53       98.63
               5  89.55       97.22       95.32       98.46       96.57       98.64
C. elegans     0  68.65       93.93       80.69       96.42       87.35       97.10
               1  78.90       96.48       86.47       97.77       90.74       98.05
               3  81.02       97.21       87.16       98.01       91.27       98.24
               5  81.29       97.18       87.32       98.05       91.25       98.25
A. thaliana    0  67.84       89.10       82.20       96.16       89.91       97.05
               1  73.67       94.64       85.45       96.83       91.27       97.54
               3  73.92       95.26       85.83       97.09       91.46       97.71
               5  73.93       95.43       85.68       97.17       91.56       97.70
H. sapiens     0  75.26       92.27       83.29       94.67       88.09       95.51
               1  80.68       93.92       85.66       95.23       89.23       95.83
               3  81.33       94.56       86.12       95.50       89.52       95.95
               5  81.57       94.71       86.26       95.58       89.59       95.97

The datasets were downsampled to coverage depths of 10×, 25× and 50×. For H. sapiens, the multiplicity was inferred for one million randomly sampled nodes; for all other datasets, the multiplicity was inferred for all nodes. The node (resp. k-mer) accuracy is the percentage of nodes (resp. k-mers) in the de Bruijn graph that were assigned the correct multiplicity. In most cases, the accuracy improves when using CRFs with increasing neighbourhood size s, with the largest gain between s = 0 and s = 1.
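The two accuracy measures can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each node represents one or more k-mers, so that the k-mer accuracy is the node accuracy weighted by the number of k-mers per node. The function names and the toy values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def node_accuracy(estimated: Sequence[int], true: Sequence[int]) -> float:
    """Percentage of nodes whose estimated multiplicity equals the true one."""
    if len(estimated) != len(true) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(e == t for e, t in zip(estimated, true))
    return 100.0 * correct / len(true)


def kmer_accuracy(
    estimated: Sequence[int],
    true: Sequence[int],
    kmers_per_node: Sequence[int],
) -> float:
    """Percentage of k-mers that lie in a node with a correct multiplicity.

    Assumption: every k-mer inherits the multiplicity of its node, so each
    node is weighted by the number of k-mers it contains.
    """
    if not (len(estimated) == len(true) == len(kmers_per_node)) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(kmers_per_node)
    correct = sum(
        n for e, t, n in zip(estimated, true, kmers_per_node) if e == t
    )
    return 100.0 * correct / total


# Toy example (hypothetical values): four nodes, one wrongly estimated.
estimated = [1, 2, 1, 0]
true = [1, 2, 2, 0]
kmers_per_node = [10, 5, 1, 4]

print(node_accuracy(estimated, true))                   # 75.0
print(kmer_accuracy(estimated, true, kmers_per_node))   # 95.0
```

In this toy example the only misestimated node contains a single k-mer, so the k-mer accuracy (95%) exceeds the node accuracy (75%). Under the stated assumption, this is one way the k-mer accuracies in the table could be consistently higher than the node accuracies: errors would have to fall mostly on nodes that contain few k-mers.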