Table 1.
Common automated mutation curation tools and their extraction strategies and quality measures
| Tool | Extraction approach | Extraction pair | Literature set used | Quality measures (P; R; F) | Refs |
|---|---|---|---|---|---|
| MuteXt | Regular expression, word proximity, Swiss-Prot entry | Variant-protein (at amino acid level) | GPCR and NR protein-related full texts and abstracts | 0.87; 0.87; U# | [26] |
| MEMA | Regular expression, word proximity | Variant-gene (at amino acid and DNA levels) | Medline abstracts | 0.93; 0.35; U∗ | [4], [16] |
| Mutation GraB | Regular expression, graph metric, sequence check | Variant-protein-organism (at amino acid level) | Full-text articles | 0.84; 0.90; 0.87 | [16] |
| Mutation Miner | Regular expression, sentence co-mention | Variant-organism (at amino acid level) | Abstracts | 0.91; 0.46; 0.61 | [10], [16] |
| MutationFinder | Regular expression | Gene-variant (at amino acid level) | Full-text articles | 0.98; 0.81; 0.81 | [31] |
| Yip et al., 2007 | Regular expression, rule-based system | Gene-variant (at amino acid level) | Full-text articles | 0.89; U; U | [32] |
| coagMDB | Regular expression, graph metric, sequence check | Gene-variant (at amino acid level) | Full-text articles; serine proteases | 87-93%; 96-99%; U | [33] |
| MuGeX | Regular expression | Gene-variant (at protein and DNA levels) | Medline abstracts; Alzheimer’s disease-associated genes | 88.9%; 91.3%; U | [34] |
| Krallinger et al., 2009 | Regular expression, residue disambiguation and classification | Gene-variant (at protein level); natural vs artificial variants | Abstracts and full-text articles; kinase proteins | 72%; U; U, and 93.88%; U; U for natural vs artificial variants | [35] |
| PolySearch | Sentence co-mention, word association | SNP detection; gene-variant | Abstracts, full-text articles | U; U; U | [36] |
Note: P, precision; R, recall; F, F-score; U, undetermined; GPCR, G-protein-coupled receptor; NR, nuclear hormone receptor. # Values are for GPCR mutations. ∗ Values are from a test of MEMA on 100 abstracts, extracting variant-gene pairs for mutations cited in one-letter code (P = 0.93, R = 0.35). See the text for details.
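
To make the quality measures in Table 1 concrete, the following is a minimal sketch of regular-expression extraction of one-letter point mutations, followed by the computation of P, R and F against a hand-curated gold set. The pattern, the function names and the example sentence are illustrative assumptions and are not taken from any of the tools listed in the table, which use more elaborate patterns and additional checks. F is computed here as the harmonic mean of P and R, the usual balanced definition.

```python
import re

# Assumed, simplified pattern: wild-type residue, position, mutant residue
# in one-letter amino acid code (e.g. "A123T"). Not the pattern of any listed tool.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ONE_LETTER_MUTATION = re.compile(
    rf"\b[{AMINO_ACIDS}][1-9]\d*[{AMINO_ACIDS}]\b"
)


def extract_mutations(text):
    """Return the set of one-letter point mutation mentions found in text."""
    return {match.group(0) for match in ONE_LETTER_MUTATION.finditer(text)}


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted, gold):
    """Compute (P, R, F) for a predicted set against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# Hypothetical sentence and gold annotations, for illustration only.
sentence = "The A123T and G45D substitutions reduced binding, unlike Ala67Val."
gold = {"A123T", "G45D", "A67V"}

predicted = extract_mutations(sentence)
p, r, f = evaluate(predicted, gold)
print(sorted(predicted), round(p, 2), round(r, 2), round(f, 2))
```

In this toy case the three-letter mention is missed, so precision stays high while recall drops, the same pattern reported for several regular-expression tools in the table. A bare pattern of this kind also matches strings that are not mutations, such as the cell line name T47D, which is one reason some tools add word proximity, sequence checks or graph metrics.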