Table 2.
Sl. No. | Sources of error | True value in data | Observed value in data | Error percentage |
---|---|---|---|---|
1. | Entity detection error | 633,074* | 582,428 | 8.00% |
2. | Entity absent in text | 633,074* | 615,650 | 2.75% |
3. | Failure to detect entity | 633,074* | 609,413 | 3.73% |
4. | Entity normalization error | | | |
a. | Gene normalization error | 42,607 | 50,336 | 18.14% |
b. | Disease normalization error | 71,704 | 92,481 | 28.97% |
c. | Drug normalization error | 11,033 | 14,563 | 31.99% |
PharmGKB was used as the gold-standard dataset for all comparisons. *In the total PGx corpus extracted from MEDLINE. The error percentage was calculated according to Formula 1.
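
As a rough check on the table, the error percentages can be recomputed from the true and observed counts. Formula 1 is not reproduced in this section, so the sketch below assumes it is the absolute difference between the true and observed values divided by the true value, times 100. The function name `error_percentage` and the `TABLE_2` dictionary are illustrative only and do not come from the original work.

```python
# Minimal sketch, ASSUMING Formula 1 is |true - observed| / true * 100.
# The function and variable names here are hypothetical.

def error_percentage(true_value: int, observed_value: int) -> float:
    """Absolute deviation of the observed count from the true count, in percent."""
    if true_value <= 0:
        raise ValueError("true_value must be positive")
    return abs(true_value - observed_value) / true_value * 100


# (true value, observed value, error percentage reported in Table 2)
TABLE_2 = {
    "Entity detection error": (633_074, 582_428, 8.00),
    "Entity absent in text": (633_074, 615_650, 2.75),
    "Failure to detect entity": (633_074, 609_413, 3.73),
    "Gene normalization error": (42_607, 50_336, 18.14),
    "Disease normalization error": (71_704, 92_481, 28.97),
    "Drug normalization error": (11_033, 14_563, 31.99),
}

if __name__ == "__main__":
    for source, (true_value, observed_value, reported) in TABLE_2.items():
        computed = error_percentage(true_value, observed_value)
        print(f"{source}: computed {computed:.3f}%, reported {reported:.2f}%")
```

Under this assumed formula, every recomputed value lies within 0.01 percentage points of the reported figure. The small residual differences are consistent with the reported values having been truncated rather than rounded to two decimals, although the source does not state this.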