2012 Nov 29;10(6):317–325. doi: 10.1016/j.gpb.2012.06.006

Table 1.

Common automated mutation curation tools and their extraction strategies and quality measures

| Tool | Extraction approach | Extraction pair | Literature set used | Quality measures (P; R; F) | Refs |
|---|---|---|---|---|---|
| MuteXt | Regular expression, word proximity, Swiss-Prot entry | Variant–protein (at amino acid level) | GPCR and NR protein-related full texts and abstracts | 0.87; 0.87; U # | [26] |
| MEMA | Regular expression, word proximity | Variant–gene (at amino acid and DNA levels) | Medline abstracts | 0.93; 0.35; U ∗ | [4], [16] |
| Mutation GraB | Regular expression, graph metric, sequence check | Variant–protein–organism (at amino acid level) | Full text articles | 0.84; 0.90; 0.87 | [16] |
| Mutation miner | Regular expression, sentence co-mention | Variant–organism (at amino acid level) | Abstracts | 0.91; 0.46; 0.61 | [10], [16] |
| Mutation finder | Regular expression | Gene–variant (at amino acid level) | Full text articles | 0.98; 0.81; 0.81 | [31] |
| Yip et al., 2007 | Regular expression, rule-based system | Gene–variant (at amino acid level) | Full text articles | 0.89; U; U | [32] |
| coagMDB | Regular expression, graph metric, sequence check | Gene–variant (at amino acid level) | Full text articles; serine protease | 87–93%; 96–99%; U | [33] |
| MuGeX | Regular expression | Gene–variant (at protein and DNA levels) | Medline abstracts; Alzheimer’s disease associated genes | 88.9%; 91.3%; U | [34] |
| Krallinger et al., 2009 | Regular expression, residue disambiguation and classification | Gene–variant (at protein level); natural vs artificial variants | Abstracts and full text articles; kinase proteins | 72%; U; U and 93.88%; U; U for natural vs artificial variants | [35] |
| PolySearch | Sentence co-mention, word association | SNP detection; gene–variant | Abstracts, full text articles | U; U; U | [36] |

Note: U, undetermined; NR, nuclear hormone receptor; P, precision; R, recall; F, F-score. # G-protein-coupled receptor (GPCR) mutations. ∗ For example, when MEMA was tested on 100 abstracts for mutations cited in one-letter code (variant–gene extraction pair), the P and R values were 0.93 and 0.35, respectively. See the text for more details.
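
The table does not say how F is derived from P and R. A minimal sketch, assuming F is the balanced F-score (F1, the harmonic mean of precision and recall), shows how the three quality measures relate. The function name `f_score` and the `rows` dictionary are illustrative and not part of any listed tool; the P and R values are taken from the table rows for Mutation GraB and Mutation miner.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the table's F column uses this balanced form.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both measures are 0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs copied from the table
rows = {
    "Mutation GraB": (0.84, 0.90),
    "Mutation miner": (0.91, 0.46),
}

for tool, (p, r) in rows.items():
    print(f"{tool}: F = {f_score(p, r):.2f}")
# Mutation GraB: F = 0.87
# Mutation miner: F = 0.61
```

Under this assumption, both computed values agree with the F column reported for these two tools. Because the harmonic mean is pulled toward the lower of the two inputs, a tool with high precision but low recall, such as Mutation miner, receives a modest F-score.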