| Processing step | Example |
|---|---|
| Get raw text instance | ‘…detected ankyrin G, which was…’ |
| Split into words, trim non-alphanumeric characters, and discard stop words and single letters | ‘detected’, ‘ankyrin’ |
| Match each word as a substring to the name dictionary: 118 hits for ‘ankyrin’, no hit for ‘detected’ | ‘ankyrin’ → …, ‘ankyrin-3’, ‘ankyrin G’, … |
| Use rules to generate spelling alternatives | …, ‘ankyrin 3’, ‘ankyrin-3’, ‘ankyrin G’, ‘ankyrin-G’, … |
| Match each alternative in full length to the raw text | ‘ankyrin G’ matches in ‘…detected ankyrin G, which was…’ |
| Output: recognized protein name, primary accession number, and PubMed identifier | ‘ankyrin G’, Q12955, 12507143 |

(Text sample from Kretschmer et al. 2002)
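The word-splitting and dictionary-lookup steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the stop-word list and the small name dictionary are hypothetical, and "trim non-alphanumeric characters" is assumed to mean stripping such characters from both ends of each whitespace-separated token (ASCII letters and digits only).

```python
import re

# Hypothetical stop-word list; the pipeline's actual list is not given in the source.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "was", "which"}

# Strip leading/trailing characters that are not ASCII letters or digits (assumption).
_TRIM = re.compile(r"^[^0-9A-Za-z]+|[^0-9A-Za-z]+$")


def candidate_words(raw_text: str) -> list[str]:
    """Split raw text into words, trim non-alphanumeric characters at the
    word edges, and discard stop words and single letters."""
    words = []
    for token in raw_text.split():
        word = _TRIM.sub("", token)
        # len <= 1 drops single letters such as 'G' and tokens that became empty.
        if len(word) <= 1 or word.lower() in STOP_WORDS:
            continue
        words.append(word)
    return words


def dictionary_hits(word: str, names: list[str]) -> list[str]:
    """Return every dictionary name that contains `word` as a
    (case-insensitive) substring."""
    w = word.lower()
    return [name for name in names if w in name.lower()]


if __name__ == "__main__":
    text = "…detected ankyrin G, which was…"
    names = ["ankyrin-3", "ankyrin G", "spectrin alpha"]  # hypothetical dictionary
    for word in candidate_words(text):
        print(word, dictionary_hits(word, names))
```

Note that the single letter ‘G’ is discarded at this stage, so only ‘ankyrin’ produces dictionary hits; the full name ‘ankyrin G’ is recovered later by matching whole dictionary names back against the raw text.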
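The spelling-alternative and full-length matching steps can be sketched as follows, starting from dictionary hits that already carry an accession number. The rule set is an assumption read off the example (space and hyphen are interchangeable between name parts, so ‘ankyrin-3’ yields ‘ankyrin 3’); the real system's rules are not specified here. "Full length" is taken to mean that the alternative must not be preceded or followed by a letter or digit in the raw text, and matching is case-sensitive. The function names are hypothetical.

```python
import itertools
import re


def spelling_alternatives(name: str) -> set[str]:
    """Hypothetical rule: each separator between name parts may be a space or a hyphen."""
    parts = [p for p in re.split(r"[ -]", name) if p]
    if len(parts) <= 1:
        return {name}
    variants = set()
    for seps in itertools.product(" -", repeat=len(parts) - 1):
        variant = parts[0]
        for sep, part in zip(seps, parts[1:]):
            variant += sep + part
        variants.add(variant)
    return variants


def full_length_matches(raw_text: str, alternatives: set[str]) -> list[str]:
    """Return the alternatives that occur in raw_text as a whole, i.e. not
    embedded in a longer alphanumeric run (case-sensitive)."""
    found = []
    for alt in sorted(alternatives):
        pattern = r"(?<![0-9A-Za-z])" + re.escape(alt) + r"(?![0-9A-Za-z])"
        if re.search(pattern, raw_text):
            found.append(alt)
    return found


def recognize(raw_text: str, hits: dict[str, str], pmid: str) -> list[tuple[str, str, str]]:
    """For each dictionary hit (name -> accession), report
    (matched name, accession, PubMed ID) for every alternative found in the text."""
    results = []
    for name, accession in hits.items():
        for alt in full_length_matches(raw_text, spelling_alternatives(name)):
            results.append((alt, accession, pmid))
    return results


if __name__ == "__main__":
    text = "…detected ankyrin G, which was…"
    # Both names refer to the same UniProtKB entry in this illustration.
    hits = {"ankyrin-3": "Q12955", "ankyrin G": "Q12955"}
    print(recognize(text, hits, "12507143"))
```

Generating alternatives from the dictionary side and then matching them in full keeps the raw text untouched. A name only counts as recognized if one of its spellings appears there completely, which is why ‘ankyrin-3’ is rejected for this sentence.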