2012 Jan 31;19(4):660–667. doi: 10.1136/amiajnl-2011-000599

Table 1.

Features used by support vector machine

Feature group	Feature	Description
Baseline	TokenDistance	Number of tokens between markables
	SentenceDistance	Number of sentences between markables
	ExactMatch	Markables are exact string matches
	StartMatch	Markables match at the start
	EndMatch	Markables match at the end
	SoonStr	Markables match after determiners are removed
	Pronoun1	Proposed antecedent is pronoun
	Pronoun2	Proposed anaphor is pronoun
	Definite1 (A)	Proposed antecedent is definite
	Definite2	Proposed anaphor is definite
	Demonstrative2	Proposed anaphor is demonstrative
	NumberMatch (A)	Markables have same number
	WnClass (A)	Markables have same named entity semantic category
	Alias	Markables have UMLS Concept Unique Identifier (CUI) overlap
	ProStr (A)	Markables are the same string and are pronouns
	SoonStrNonpro	Markables are the same string and are not pronouns
	WordOverlap	Markables share at least one word
	WordSubstr	With stopwords removed, one is substring of the other
	BothDefinites	Both markables are definite
	BothPronouns	Both markables are pronouns
	Indefinite (A)	Antecedent is indefinite
	Pronoun	Antecedent is pronoun and anaphor is not
	ClosestComp	Antecedent is closest semantically compatible markable
	NPHead (A)	Antecedent span ends noun phrase (NP) span
	Anaph	Probability output by anaphoricity classifier (NP markables only)
	PermStrDist	String similarity under various permutations
Manually selected syntactic features	PathLength	Length of path between markables in syntax tree
	NPunderVP1	Antecedent node is child of VP (verb phrase) node
	NPunderVP2	Anaphor node is child of VP node
	NPunderS1	Antecedent node is child of S (sentence) node
	NPunderS2	Anaphor node is child of S node
	NPunderPP1	Antecedent node is child of PP (prepositional phrase) node
	NPunderPP2	Anaphor node is child of PP node
	NPSubj1	Antecedent node has SBJ (subject) function tag
	NPSubj2	Anaphor node has SBJ function tag
	NPSubjBoth	Both nodes have SBJ function tag
Automatically selected syntactic features	Path n-grams	See text in the ‘Features’ section

Italicized features are taken directly from Ng and Cardie.²² ‘(A)’ indicates that the feature is also used in the anaphoricity classifier.
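To make the string-matching rows of Table 1 more concrete, the following is a minimal sketch of how a few of the baseline features (ExactMatch, StartMatch, EndMatch, SoonStr, WordOverlap, WordSubstr) might be computed for a pair of markables. It is not the authors' implementation. The function name `string_features`, the tokenizer, the case-insensitive comparison, and the determiner and stopword lists are all assumptions made for illustration; the table itself does not specify them.

```python
import re

# Assumed word lists; the paper does not state which determiners
# or stopwords were used.
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}
STOPWORDS = DETERMINERS | {"of", "in", "on", "and", "or", "with", "to", "for"}


def tokens(text):
    """Lowercase a markable and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def string_features(antecedent, anaphor):
    """Return a dict of boolean string-match features for one markable pair."""
    a, b = tokens(antecedent), tokens(anaphor)

    # SoonStr: compare the markables after dropping determiners.
    a_nodet = [t for t in a if t not in DETERMINERS]
    b_nodet = [t for t in b if t not in DETERMINERS]

    # WordSubstr: drop stopwords, then test substring containment on the
    # re-joined strings (character-level containment, an assumption).
    a_content = " ".join(t for t in a if t not in STOPWORDS)
    b_content = " ".join(t for t in b if t not in STOPWORDS)

    both_nonempty = bool(a and b)
    return {
        "ExactMatch": a == b,
        "StartMatch": both_nonempty and a[0] == b[0],
        "EndMatch": both_nonempty and a[-1] == b[-1],
        "SoonStr": bool(a_nodet) and a_nodet == b_nodet,
        "WordOverlap": bool(set(a) & set(b)),
        "WordSubstr": bool(a_content and b_content)
        and (a_content in b_content or b_content in a_content),
    }


if __name__ == "__main__":
    # Hypothetical markable pair, for illustration only.
    for name, value in string_features("the chest pain", "this chest pain").items():
        print(f"{name}: {value}")
```

In this sketch each feature is a boolean that would become one dimension of the feature vector passed to the support vector machine. Distance features such as TokenDistance and SentenceDistance would be integers computed from document offsets, which are not modeled here.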