Table 1. Accuracies for Calling C5-Cytosine Variants Using a Random Forest Classifier^a
context | count (events) | accuracy, % (mean ± SD) |
---|---|---|
AnCGA | 250 | 96.6 (±0.54) |
AnCGC | 250 | 95.9 (±0.72) |
AnCGG | 251 | 91.9 (±0.57) |
AnCGT | 250 | 92.7 (±0.93) |
CnCGA | 287 | 94.0 (±0.55) |
CnCGC | 251 | 97.0 (±0.36) |
CnCGG | 330 | 93.3 (±0.61) |
CnCGT | 250 | 93.4 (±0.94) |
GnCGA | 250 | 93.6 (±0.75) |
GnCGC | 513 | 98.3 (±0.22) |
GnCGG | 250 | 94.2 (±0.69) |
GnCGT | 250 | 96.5 (±0.65) |
TnCGA | 250 | 95.6 (±0.81) |
TnCGC | 250 | 95.0 (±0.66) |
TnCGG | 250 | 95.6 (±0.61) |
TnCGT | 259 | 91.6 (±1.19) |
^a Twenty iterations of 5-fold cross-validation were performed for each XnCGY context. For each iteration, one accuracy measurement was made (percent correct across the entire data set of C, mC, hmC, fC, and caC for that context). Column 3 gives the mean and standard deviation of those 20 measurements. Column 2 gives the total number of events quantified for each XnCGY context.
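
The evaluation procedure described in the table note can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions: the events are synthetic placeholders produced by a hypothetical `make_toy_events` helper, the random forest is swapped for a simple nearest-centroid classifier so that the example needs only the Python standard library, folds are drawn by plain shuffling without stratification, and the sample standard deviation is used (the note does not say which form was reported). All function names are illustrative.

```python
import random
import statistics

LABELS = ["C", "mC", "hmC", "fC", "caC"]


def make_toy_events(n_per_label=50, seed=0):
    """Hypothetical placeholder data: one 2-feature vector per labeled event."""
    rng = random.Random(seed)
    events = []
    for k, label in enumerate(LABELS):
        for _ in range(n_per_label):
            features = [rng.gauss(3.0 * k, 1.0), rng.gauss(-2.0 * k, 1.0)]
            events.append((features, label))
    return events


def fit_centroids(train):
    """Stand-in classifier (NOT a random forest): per-label feature means."""
    sums, counts = {}, {}
    for x, y in train:
        s = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            s[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def one_iteration_accuracy(events, n_folds, rng):
    """One k-fold pass: every event is predicted once while held out,
    and a single percent-correct value is computed over the whole set."""
    order = list(range(len(events)))
    rng.shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [events[i] for i in order if i not in held]
        model = fit_centroids(train)
        correct += sum(
            predict(model, events[i][0]) == events[i][1] for i in held_out
        )
    return 100.0 * correct / len(events)


def repeated_cv(events, n_iterations=20, n_folds=5, seed=0):
    """Repeat the k-fold pass and summarize the per-iteration accuracies."""
    rng = random.Random(seed)
    accs = [
        one_iteration_accuracy(events, n_folds, rng)
        for _ in range(n_iterations)
    ]
    # Sample standard deviation; an assumption, see the lead-in.
    return statistics.mean(accs), statistics.stdev(accs), accs


mean_acc, sd_acc, accs = repeated_cv(make_toy_events())
print(f"{mean_acc:.1f} (±{sd_acc:.2f})")  # same layout as column 3
```

Because each iteration reshuffles the events before splitting, the 20 accuracy values differ slightly, and their spread is what the ± term in column 3 summarizes. Applied to real data, the procedure would be run once per XnCGY context with a random forest in place of the stand-in classifier.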