Table 1.
Example of a rule. Row 2 shows a rule for capturing Street mentions. The rule contains four types of components: a pattern, orthographic indicators, semantic/lexical clues, and contextual clues.
Feature type | Pattern | Orthographic | Semantic/Lexical | Contextual |
 | {RegEx} = [1-9][0-9]{0,3} | {ORTHO} = {upperInitial, allCapital, ...} | {STREET_CLUE} = {Street, St, Drive, Dr, ...} | {SYMBOL} = {Ø, ‘.’} |
A rule | {RegEx} {ORTHO} {STREET_CLUE}{SYMBOL} |
In text | ... 62 Angora Dr . ... ... 1 Jefferson Road ... ... 55 Bury St ... |
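The rule in Table 1 can be approximated with a single regular expression. The following is a minimal Python sketch, not the authors' implementation. It makes several assumptions: the names `STREET_CLUES`, `STREET_RULE` and `find_streets` are hypothetical; the clue list holds only the four clues shown in the table plus "Road" (taken from the "In text" row), since the full list is elided; `{ORTHO}` is reduced to the two indicators shown; and the optional period may be separated from the clue by one space, as in the tokenised example "62 Angora Dr .".

```python
import re

# Assumed clue list: only the clues visible in Table 1, plus "Road" from the
# "In text" row. The original list is elided ("...") and is not reproduced here.
STREET_CLUES = ["Street", "St", "Drive", "Dr", "Road"]

# {RegEx}: a house number of one to four digits with no leading zero.
NUMBER = r"[1-9][0-9]{0,3}"

# {ORTHO}: only the two indicators shown in the table.
UPPER_INITIAL = r"[A-Z][a-z]+"   # e.g. "Angora"
ALL_CAPITAL = r"[A-Z]+"          # e.g. "MAIN"
ORTHO = rf"(?:{UPPER_INITIAL}|{ALL_CAPITAL})"

# {STREET_CLUE}: longest alternatives first so "Street" is tried before "St".
CLUE = "|".join(sorted(map(re.escape, STREET_CLUES), key=len, reverse=True))

# {SYMBOL}: either nothing or a period, optionally preceded by one space
# (an assumption based on the tokenised example "62 Angora Dr .").
SYMBOL = r"(?:\s?\.)?"

STREET_RULE = re.compile(rf"\b{NUMBER}\s+{ORTHO}\s+(?:{CLUE})\b{SYMBOL}")


def find_streets(text: str) -> list[str]:
    """Return every substring of `text` that matches the street rule."""
    return [m.group(0) for m in STREET_RULE.finditer(text)]


if __name__ == "__main__":
    sample = "Lives at 62 Angora Dr . near 1 Jefferson Road and 55 Bury St"
    for mention in find_streets(sample):
        print(mention)
```

Keeping each component in its own named fragment mirrors the layout of the table, so a clue list or orthographic indicator can be extended without touching the rest of the rule.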