2015 Jan 19;7(Suppl 1):S6. doi: 10.1186/1758-2946-7-S1-S6

Table 7.

Character and word n-gram features extracted by NERsuite by default.

Feature	Brief description	Sample features (bigrams)
Character n-grams	the set of all possible combinations of a token's consecutive characters, taken n at a time (n = 2, 3, 4)	{GS}, {SK}, {K2}, {21}, {14}, {4a}

Token n-grams	unigrams and bigrams of surface forms; unigrams and bigrams of normalised surface forms, in which numbers are replaced with '0's and consecutive '0's are compressed	{It, attenuated}, {attenuated, GSK214a}; {Aa, aaaaaaaaaa}, {aaaaaaaaaa, AAA000a}

Lemma n-grams	unigrams and bigrams of lemmatised surface forms	{It, attenuate}, {attenuate, GSK214a}

POS tag n-grams	unigrams and bigrams of part-of-speech (POS) tags	{PRP, VBD}, {VBD, NN}

Lemma & POS tag n-grams	unigrams and bigrams of lemmatised forms combined with POS tags	{It:PRP, attenuate:VBD}, {attenuate:VBD, GSK214a:NN}

Chunk information	chunk tag of current token; surface form of the enclosing chunk's	{B-NP}; {gestation}
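
The character and token n-gram rows of Table 7 can be illustrated with a minimal Python sketch. This is not NERsuite's own code: the function names (char_ngrams, normalise, token_ngrams) and the compress_digits flag are hypothetical, and the mapping of upper-case letters to 'A' and lower-case letters to 'a' is an assumption inferred from the sample features in the table rather than from its description. The flag is off by default because the sample {AAA000a} keeps its run of '0's, whereas the description states that such runs are compressed.

```python
import re


def char_ngrams(token, sizes=(2, 3, 4)):
    """Return all n-grams of consecutive characters for each n in sizes."""
    return [token[i:i + n] for n in sizes for i in range(len(token) - n + 1)]


def normalise(token, compress_digits=False):
    """Map a surface form to a normalised shape.

    Assumption: upper-case -> 'A', lower-case -> 'a', digit -> '0',
    as suggested by the sample features in Table 7.
    """
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    if compress_digits:
        # Collapse each run of consecutive '0's into a single '0'.
        shape = re.sub(r"0+", "0", shape)
    return shape


def token_ngrams(tokens):
    """Return unigrams followed by bigrams over a token sequence."""
    unigrams = [(t,) for t in tokens]
    bigrams = list(zip(tokens, tokens[1:]))
    return unigrams + bigrams


tokens = ["It", "attenuated", "GSK214a"]
print(char_ngrams("GSK214a", sizes=(2,)))
print(token_ngrams(tokens))
print(token_ngrams([normalise(t) for t in tokens]))
```

With sizes=(2,), char_ngrams reproduces the six character bigrams listed for GSK214a, and applying token_ngrams to the normalised tokens yields the bigrams shown in the second half of the token n-gram sample.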