Table 1.
| Feature set | Feature | Description |
|---|---|---|
| Baseline | TokenDistance | Number of tokens between markables |
| | SentenceDistance | Number of sentences between markables |
| | ExactMatch | Markables are exact string matches |
| | StartMatch | Markables match at the start |
| | EndMatch | Markables match at the end |
| | SoonStr | Markables match apart from determiners |
| | Pronoun1 | Proposed antecedent is a pronoun |
| | Pronoun2 | Proposed anaphor is a pronoun |
| | Definite1 (A) | Proposed antecedent is definite |
| | Definite2 | Proposed anaphor is definite |
| | Demonstrative2 | Proposed anaphor is demonstrative |
| | NumberMatch (A) | Markables have the same number |
| | WnClass (A) | Markables have the same named entity semantic category |
| | Alias | Markables have overlapping UMLS Concept Unique Identifiers (CUIs) |
| | ProStr (A) | Markables are the same string and are pronouns |
| | SoonStrNonpro | Markables are the same string and are not pronouns |
| | WordOverlap | Markables share at least one word |
| | WordSubstr | With stopwords removed, one markable is a substring of the other |
| | BothDefinites | Both markables are definite |
| | BothPronouns | Both markables are pronouns |
| | Indefinite (A) | Antecedent is indefinite |
| | Pronoun | Antecedent is a pronoun and anaphor is not |
| | ClosestComp | Antecedent is the closest semantically compatible markable |
| | NPHead (A) | Antecedent span ends the noun phrase (NP) span |
| | Anaph | Probability output by the anaphoricity classifier (NP markables only) |
| | PermStrDist | String similarity under various permutations |
| Manually selected syntactic features | PathLength | Length of the path between markables in the syntax tree |
| | NPunderVP1 | Antecedent node is a child of a VP (verb phrase) node |
| | NPunderVP2 | Anaphor node is a child of a VP node |
| | NPunderS1 | Antecedent node is a child of an S (sentence) node |
| | NPunderS2 | Anaphor node is a child of an S node |
| | NPunderPP1 | Antecedent node is a child of a PP (prepositional phrase) node |
| | NPunderPP2 | Anaphor node is a child of a PP node |
| | NPSubj1 | Antecedent node has the SBJ (subject) function tag |
| | NPSubj2 | Anaphor node has the SBJ function tag |
| | NPSubjBoth | Both nodes have the SBJ function tag |
| Automatically selected syntactic features | Path n-grams | See text in the ‘Features’ section |
Italicized features are those taken directly from Ng and Cardie [22]. ‘(A)’ indicates that the feature is used in the anaphoricity classifier.
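Several of the baseline features are simple string predicates over a pair of markables. The Python sketch below computes six of them (ExactMatch, StartMatch, EndMatch, SoonStr, WordOverlap, WordSubstr), assuming markables arrive as token lists and that all comparisons are case-insensitive. The determiner and stopword lists are hypothetical placeholders, and reading StartMatch/EndMatch as first-token and last-token equality is an interpretation, not a definition given in the table.

```python
# Hypothetical word lists; the table does not say which determiners or stopwords are used.
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}
STOPWORDS = DETERMINERS | {"of", "in", "on", "and", "or", "to", "with"}


def _norm(tokens):
    # Assumption: every comparison below is case-insensitive.
    return [t.lower() for t in tokens]


def string_features(antecedent, anaphor):
    """Compute pairwise string features for two markables given as token lists."""
    a, b = _norm(antecedent), _norm(anaphor)
    a_nodet = [t for t in a if t not in DETERMINERS]
    b_nodet = [t for t in b if t not in DETERMINERS]
    a_content = " ".join(t for t in a if t not in STOPWORDS)
    b_content = " ".join(t for t in b if t not in STOPWORDS)
    return {
        "ExactMatch": a == b,
        # Interpreted as: first tokens agree / last tokens agree.
        "StartMatch": bool(a and b) and a[0] == b[0],
        "EndMatch": bool(a and b) and a[-1] == b[-1],
        # Same token sequence once determiners are dropped.
        "SoonStr": bool(a_nodet) and a_nodet == b_nodet,
        # Taken literally: any shared word, stopwords included.
        "WordOverlap": bool(set(a) & set(b)),
        # Character-level substring test on the stopword-filtered strings.
        "WordSubstr": bool(a_content and b_content)
        and (a_content in b_content or b_content in a_content),
    }


if __name__ == "__main__":
    print(string_features(["the", "left", "knee"], ["knee"]))
    print(string_features(["The", "pain"], ["this", "pain"]))
```

In the second call, SoonStr fires even though ExactMatch does not, because the two markables differ only in their determiner.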