| ‘…detected ankyrin G, which was…’ | get raw text instance |
| ‘detected’, ‘ankyrin’, | split into words, trim non-alphanumeric characters, discard stop words and single letters |
| ‘ankyrin’ | match as substring to name dictionary: 118 hits for ‘ankyrin’, no hit for ‘detected’ |
| …ankyrin-3 \| ankyrin G… | use rules to generate spelling alternatives |
| …ankyrin 3 \| ankyrin- 3 \| ankyrin G \| ankyrin-G… | match each alternative in full length to the raw text |
| ‘…detected ankyrin G’ | ‘ankyrin G’ matches |
| ‘ankyrin G’, Q12955, 12507143 | recognized protein name, primary accession number, and PubMed identifier |
| (Text sample from Kretschmer et al. 2002) | |
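
The steps in the table can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study: the two-entry dictionary, the stop-word list, the single spelling rule (a space, a hyphen, or a hyphen plus space between name parts), and all function names are assumptions made for the example. The real name dictionary is far larger (118 hits for ‘ankyrin’ alone) and the actual rule set is not given here.

```python
import re

# Hypothetical toy dictionary (name -> primary accession number).
# The real dictionary yields 118 hits for 'ankyrin'; only two are shown.
NAME_DICT = {"ankyrin-3": "Q12955", "ankyrin G": "Q12955"}

# Illustrative stop-word list, assumed for this example.
STOP_WORDS = {"which", "was", "the", "a", "of"}


def candidate_words(text):
    """Split into words, trim non-alphanumeric edges, drop stop words and single letters."""
    words = (re.sub(r"^\W+|\W+$", "", w) for w in text.split())
    return [w for w in words if len(w) > 1 and w.lower() not in STOP_WORDS]


def dictionary_hits(word):
    """Return dictionary names that contain the word as a substring."""
    return [name for name in NAME_DICT if word.lower() in name.lower()]


def spelling_alternatives(name):
    """Assumed rule: join the parts of a name with ' ', '-' or '- '."""
    parts = re.split(r"[-\s]+", name)
    return {sep.join(parts) for sep in (" ", "-", "- ")}


def recognize(text, pmid):
    """Match every spelling alternative in full length against the raw text."""
    found = set()
    for word in candidate_words(text):
        for name in dictionary_hits(word):
            for alt in spelling_alternatives(name):
                # Require the whole alternative, not a fragment of a longer token.
                pattern = r"(?<!\w)" + re.escape(alt) + r"(?!\w)"
                if re.search(pattern, text):
                    found.add((alt, NAME_DICT[name], pmid))
    return sorted(found)


if __name__ == "__main__":
    sample = "…detected ankyrin G, which was…"
    print(recognize(sample, "12507143"))
    # prints [('ankyrin G', 'Q12955', '12507143')]
```

Matching the full alternative against the raw text, rather than accepting the substring hit directly, is what rejects the many dictionary entries that merely contain ‘ankyrin’ but do not occur in the sentence.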