Table 1.
| | Genome-aware^8 | | GIREMI | | | | Multiple data sets method^9,c | | | |
---|---|---|---|---|---|---|---|---|---|---|
Region | Number of sites | %AG | Number of sites | %AG | Accuracy^a | Overlap^b | Number of sites | %AG | Accuracy^a | Overlap^b |
All | 41,027 | 98.8% | 37,591 | 98.6% | 98.1% | 90.0% | 8,307 | 90.2% | 85.0% | 18.5% |
Alu | 39,757 | 99.7% | 36,131 | 99.0% | 99.4% | 90.4% | 7,797 | 98.5% | 87.1% | 24.9% |
Repetitive non-Alu | 260 | 88.6% | 267 | 83.7% | 84.3% | 86.4% | 26 | 65.6% | 65.4% | 14.8% |
Non-repetitive | 1,010 | 73.5% | 1,193 | 82.8% | 73.8% | 87.6% | 484 | 41.0% | 55.6% | 29.2% |
^a Accuracy was defined as 1 minus the fraction of SNPs among predicted editing sites in each category; 30% of GM12878 SNPs were assumed to be unknown when applying the GIREMI and multiple data sets methods.
^b Overlap was calculated relative to the results of the genome-aware method.
^c Results were derived using two data sets (GM12878 and YH RNA-Seq, Supplementary Note 3). Editing sites were identified in the two data sets separately, and final GM12878 editing sites were called by requiring their presence in the YH results. Results of another mode of the multiple data sets method (pooled samples) are included in Supplementary Table 2.
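
The accuracy and overlap columns, and the replication rule used for the multiple data sets method, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names (`accuracy`, `overlap`, `call_by_replication`), the representation of a site as a `(chromosome, position)` tuple, and the toy coordinates are all hypothetical. The sketch also assumes that overlap uses the genome-aware site set as its denominator, which is one reading of "relative to the results of the genome-aware method".

```python
def accuracy(predicted_sites, snp_sites):
    """Return 1 minus the fraction of predicted sites that are SNPs."""
    predicted = set(predicted_sites)
    if not predicted:
        raise ValueError("no predicted sites")
    return 1.0 - len(predicted & set(snp_sites)) / len(predicted)


def overlap(predicted_sites, reference_sites):
    """Return the fraction of reference sites that were also predicted.

    Assumption: the reference (genome-aware) set is the denominator.
    """
    reference = set(reference_sites)
    if not reference:
        raise ValueError("no reference sites")
    return len(set(predicted_sites) & reference) / len(reference)


def call_by_replication(sites_a, sites_b):
    """Keep only the sites of data set A that are also present in data set B."""
    return set(sites_a) & set(sites_b)


# Toy data (hypothetical coordinates, not taken from the paper).
predicted = {("chr1", 100), ("chr1", 200), ("chr2", 300), ("chr2", 400)}
held_out_snps = {("chr2", 400), ("chr3", 500)}
genome_aware = {("chr1", 100), ("chr1", 200), ("chr2", 300),
                ("chr5", 900), ("chr5", 950)}
second_data_set = {("chr1", 100), ("chr2", 300), ("chr7", 10)}

print(f"accuracy = {accuracy(predicted, held_out_snps):.2f}")  # accuracy = 0.75
print(f"overlap  = {overlap(predicted, genome_aware):.2f}")    # overlap  = 0.60
print(sorted(call_by_replication(predicted, second_data_set)))
```

In the toy data, one of four predicted sites is a held-out SNP (accuracy 0.75), and three of the five genome-aware sites are recovered (overlap 0.60). Requiring presence in a second data set keeps only the sites shared by both, which is consistent with the much smaller site counts reported for that method in the table.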