Skip to main content
. 2008 Sep 1;9(Suppl 2):S12. doi: 10.1186/gb-2008-9-s2-s12

Table 5.

An example for constructing string features

Input and processed documents and condidate string features Details
Input document The Three Human Syntrophin Genes Are Expressed in Diverse Tissues, Have Distinct Chromosomal Locations, and Each Bind to Dystrophin and Its Relatives
Processed document the three human syntrophin genes are expressed in diverse tissues have distinct chromosomal locations and each bind to dystrophin and its relatives
Candidate string features the thr
he thre
e three
three h
hree hu
ree hum
ee huma
e human
...

The length of substring is fixed to 7. The example document only has one sentence (the title of the document of PMID:8576247). A seven-character window moves along the sequential text. All characters are converted to lower case. Only alphabetical letters and the space character are processed. Punctuation is converted to the space character.