2012 Jan 31;19(4):660–667. doi: 10.1136/amiajnl-2011-000599

Table 1.

Features used by support vector machine

Feature Description
Baseline
TokenDistance Number of tokens between markables
SentenceDistance Number of sentences between markables
ExactMatch Markables are exact string matches
StartMatch Markables match at the start
EndMatch Markables match at the end
SoonStr Markables match besides determiners
Pronoun1 Proposed antecedent is pronoun
Pronoun2 Proposed anaphor is pronoun
Definite1 (A) Proposed antecedent is definite
Definite2 Proposed anaphor is definite
Demonstrative2 Proposed anaphor is demonstrative
NumberMatch (A) Markables have same number
WnClass (A) Markables have same named entity semantic category
Alias Markables have UMLS Concept Unique Identifier (CUI) overlap
ProStr (A) Markables are same and are pronouns
SoonStrNonpro Markables are same and are not pronouns
WordOverlap Markables share at least one word
WordSubstr With stopwords removed, one is substring of the other
BothDefinites Both markables are definite
BothPronouns Both markables are pronouns
Indefinite (A) Antecedent is indefinite
Pronoun Antecedent is pronoun and anaphor is not
ClosestComp Antecedent is closest semantically compatible markable
NPHead (A) Antecedent span ends noun phrase (NP) span
Anaph Probability output by anaphoricity classifier (NP markables only)
PermStrDist String similarity under various permutations
Manually selected syntactic features
PathLength Length of path between markables in syntax tree
NPunderVP1 Antecedent node is child of VP (verb phrase) node
NPunderVP2 Anaphor node is child of VP node
NPunderS1 Antecedent node is child of S (sentence) node
NPunderS2 Anaphor node is child of S node
NPunderPP1 Antecedent node is child of PP (prepositional phrase) node
NPunderPP2 Anaphor node is child of PP node
NPSubj1 Antecedent node has SBJ (subject) function tag
NPSubj2 Anaphor node has SBJ function tag
NPSubjBoth Both nodes have SBJ function tag
Automatically selected syntactic features
Path n-grams See text in the ‘Features’ section

Italicized features are those taken directly from Ng and Cardie [22]. ‘(A)’ indicates that the feature is used in the anaphoricity classifier.
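To make a few of the baseline rows concrete, the following Python sketch computes TokenDistance, ExactMatch, StartMatch, EndMatch, SoonStr, and WordOverlap for one antecedent/anaphor pair. It is an illustration only, not the authors' implementation: the `Markable` class, the `pair_features` function, the determiner list, and the token-level reading of "match at the start/end" are all assumptions made here, since the table gives only one-line descriptions.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed determiner list; the table does not say which words were stripped.
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}


@dataclass
class Markable:
    """Hypothetical mention record: lower-cased tokens plus document offsets."""
    tokens: List[str]
    start: int  # index of the first token in the document
    end: int    # index one past the last token


def pair_features(antecedent: Markable, anaphor: Markable) -> Dict[str, object]:
    """Compute a handful of the baseline pairwise features from Table 1."""
    a, b = antecedent.tokens, anaphor.tokens
    # Drop determiners before comparing, as in the SoonStr description.
    a_nodet = [t for t in a if t not in DETERMINERS]
    b_nodet = [t for t in b if t not in DETERMINERS]
    both_nonempty = bool(a) and bool(b)
    return {
        # Tokens lying strictly between the two mentions (assumed definition).
        "TokenDistance": max(0, anaphor.start - antecedent.end),
        "ExactMatch": a == b,
        # Assumed to mean "first (last) tokens are identical".
        "StartMatch": both_nonempty and a[0] == b[0],
        "EndMatch": both_nonempty and a[-1] == b[-1],
        "SoonStr": a_nodet == b_nodet,
        "WordOverlap": bool(set(a) & set(b)),
    }


if __name__ == "__main__":
    antecedent = Markable(["the", "chest", "pain"], start=3, end=6)
    anaphor = Markable(["chest", "pain"], start=12, end=14)
    for name, value in pair_features(antecedent, anaphor).items():
        print(f"{name}: {value}")
```

Here "the chest pain" and "chest pain" differ as exact strings but agree once determiners are removed, which is the case SoonStr is meant to capture. Features that depend on external resources (for example Alias, which needs UMLS CUIs, or the syntactic features, which need a parse tree) are left out of the sketch.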