Table 2.
Orthographic features used in our system.
Feature | Regular Expression | Feature | Regular Expression |
---|---|---|---|
ALLCAPS | `^[A-Z]+$` | MANY_NUM | `^[0-9]{1,2}(,[0-9]{1,2})+$` |
INITCAP | `^[A-Z].*` | REAL_NUM | `^-?[0-9]+[\.][0-9]+$` |
HASCAP | `^.*[A-Z].*$` | INDASH | `^([\w+][\-]+)+\w+$` |
SINGLECAP | `^[A-Z]$` | HASDIGIT | `.*[0-9].*` |
PUNCTUATION | `^[,;:\'\"]$` | IS_DASH | `^[-]+$` |
INITDIGIT | `^[0-9].*` | ROMAN | `^[IVXDLCM]+$` |
SINGLEDIGIT | `^[0-9]$` | END_PUNC | `^[.?!]$` |
ALPHANUM | `.*[A-Za-z].*[0-9].*\|.*[0-9].*[A-Za-z].*` | | |
CAPSMIX | `.*[A-Z].*[a-z].*\|.*[a-z].*[A-Z].*` | | |
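As an illustration, the patterns in Table 2 could be turned into binary token features roughly as in the Python sketch below. It assumes that each pattern is applied to a single token with `re.search`, and that a feature fires whenever its pattern matches. The table does not specify how the system applies the patterns. The names `ORTHOGRAPHIC_PATTERNS`, `orthographic_features`, and `feature_vector` are hypothetical. The regular expressions themselves are copied verbatim from the table.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical container: feature name -> compiled regex, copied verbatim
# from Table 2 (left column first, then right column).
ORTHOGRAPHIC_PATTERNS: Dict[str, Pattern[str]] = {
    name: re.compile(regex)
    for name, regex in [
        ("ALLCAPS", r"^[A-Z]+$"),
        ("INITCAP", r"^[A-Z].*"),
        ("HASCAP", r"^.*[A-Z].*$"),
        ("SINGLECAP", r"^[A-Z]$"),
        ("PUNCTUATION", r"^[,;:\'\"]$"),
        ("INITDIGIT", r"^[0-9].*"),
        ("SINGLEDIGIT", r"^[0-9]$"),
        ("ALPHANUM", r".*[A-Za-z].*[0-9].*|.*[0-9].*[A-Za-z].*"),
        ("CAPSMIX", r".*[A-Z].*[a-z].*|.*[a-z].*[A-Z].*"),
        ("MANY_NUM", r"^[0-9]{1,2}(,[0-9]{1,2})+$"),
        ("REAL_NUM", r"^-?[0-9]+[\.][0-9]+$"),
        ("INDASH", r"^([\w+][\-]+)+\w+$"),
        ("HASDIGIT", r".*[0-9].*"),
        ("IS_DASH", r"^[-]+$"),
        ("ROMAN", r"^[IVXDLCM]+$"),
        ("END_PUNC", r"^[.?!]$"),
    ]
}


def orthographic_features(token: str) -> List[str]:
    """Return the names of all features whose pattern matches the token.

    Assumption: features are not mutually exclusive, so one token can
    activate several of them (e.g. "XIV" is both ALLCAPS and ROMAN).
    """
    return [name for name, pattern in ORTHOGRAPHIC_PATTERNS.items()
            if pattern.search(token)]


def feature_vector(token: str) -> List[int]:
    """Encode the token as a 0/1 vector in the dictionary's key order."""
    return [1 if pattern.search(token) else 0
            for pattern in ORTHOGRAPHIC_PATTERNS.values()]


if __name__ == "__main__":
    for tok in ["XIV", "3.14", "12,34", "iPhone", "x-ray", "?"]:
        print(tok, orthographic_features(tok))
```

The features overlap rather than partition the token space. For example, "XIV" activates ALLCAPS, INITCAP, HASCAP, and ROMAN at the same time. For this reason the sketch returns a multi-hot vector and not a single class label.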