2015 Jan 19;7(Suppl 1):S3. doi: 10.1186/1758-2946-7-S1-S3

Table 1.

Comparison of Model 1 and Model 2.

Aspect	Model 1	Model 2
System adapted	BANNER [22]	tmVar [24]

Preprocessing

Unicode transliteration	No	Yes

Tokenization	whitespace, punctuation, digits, lowercase to uppercase	whitespace, punctuation, digits, lowercase to uppercase, uppercase to lowercase

Sentence segmentation	Java BreakIterator	None

Conditional random field configuration and settings

Implementation	MALLET [25]	CRF++ [23]

Order	1	2

Label model	IOB with one entity label	IOB with one entity label

Regularization	L₂	L₂

Gaussian prior variance (σ)	1.0	4.0

Feature frequency threshold	0	3

Features

Individual tokens	Yes	Yes

Morphology	Lemmatization	Stemming

Part of speech	Yes	No

Word shapes	Yes	Yes

Characters	N-grams of length 2-4	Prefixes and suffixes of length 2-5

Character counts	None	Total characters, digits, uppercase, lowercase

ChemSpot [4]	Yes	No

Semantic affixes	None	Suffixes, alkane stems, trivial rings, simple multipliers, etc.

Chemical elements	Name and symbol	Name

Amino acids	Name, 3-char abbreviation, 1-char abbreviation	None

Chemical formulas	Within a single token	None

Amino acid sequences	Across tokens	None

Context window	2	3

Postprocessing

Consistency	Yes	No

Abbreviation resolution	Yes	Yes

Parenthesis balancing	Yes	Yes

Chemical identifiers	Yes	Yes

This table compares the setup and configuration of Model 1 and Model 2.
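
The tokenization and character-feature rows can be made concrete with a small sketch. The Python below is illustrative only and is not the BANNER or tmVar code. It assumes the tokenization entries name the points at which a token boundary is placed, that each punctuation character becomes its own token, and that the character features are plain substrings. The function names (`char_class`, `tokenize`, `char_ngrams`, `affixes`) are hypothetical.

```python
def char_class(ch):
    """Coarse character class used to decide token boundaries."""
    if ch.isspace():
        return "space"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "upper" if ch.isupper() else "lower"
    return "punct"


def tokenize(text, break_upper_to_lower=False):
    """Split at whitespace, punctuation, digit/non-digit changes and
    lowercase-to-uppercase changes (the Model 1 column). Setting
    break_upper_to_lower=True also splits at uppercase-to-lowercase
    changes (the Model 2 column). Assumption: every punctuation
    character is emitted as a separate token."""
    tokens, current, prev = [], "", None
    for ch in text:
        cls = char_class(ch)
        if cls == "space":
            boundary = True
        elif prev is None:
            boundary = False
        else:
            boundary = (
                cls == "punct"
                or prev == "punct"
                or (cls == "digit") != (prev == "digit")
                or (prev == "lower" and cls == "upper")
                or (break_upper_to_lower and prev == "upper" and cls == "lower")
            )
        if boundary and current:
            tokens.append(current)
            current = ""
        if cls == "space":
            prev = None  # whitespace is dropped, not kept as a token
        else:
            current += ch
            prev = cls
    if current:
        tokens.append(current)
    return tokens


def char_ngrams(token, n_min=2, n_max=4):
    """All character n-grams of length n_min..n_max (Model 1 column)."""
    return [
        token[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(token) - n + 1)
    ]


def affixes(token, n_min=2, n_max=5):
    """Prefixes and suffixes of length n_min..n_max (Model 2 column)."""
    lengths = [n for n in range(n_min, n_max + 1) if n <= len(token)]
    return [token[:n] for n in lengths], [token[-n:] for n in lengths]


if __name__ == "__main__":
    print(tokenize("acetylCoA 2-butanol"))
    print(tokenize("acetylCoA 2-butanol", break_upper_to_lower=True))
    print(char_ngrams("NaCl"))
    print(affixes("ethanol"))
```

Under these assumptions, the only difference between the two tokenizers is the extra uppercase-to-lowercase boundary, which splits a fragment such as "Co" into "C" and "o". The default length ranges mirror the "Characters" row of the table.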