. 2007;2007:620–624.

Table 1.

The table shows the general approaches taken by participants at each phase of retrieval and the features we extracted from them for use in multivariate regression modeling.

| Step | General approaches | Features extracted |
|---|---|---|
| Document preprocessing and indexing | Conversion of HTML to plain text by eliminating the tags; conversion of HTML to XML; filtering out certain sections, such as references and acknowledgements; conversion of HTML to records of a relational database structure [IIT]; stemming and stop-word filtering | Stemming; stop-word filtering |
| Query expansion | Identification of keywords using automated, manual and interactive methods; synonym lookup using online biomedical resources such as UMLS, Entrez Gene, MeSH, HUGO, MetaMap, etc.; assigning weights to keywords in the query; normalizing keywords into their root forms | Run type; UMLS use; Entrez use; MeSH use; HUGO use; MetaMap use; web-based look-up; keyword normalization; assigning weights to keywords; acronym expansion |
| Document retrieval | Use of IR algorithms such as tf-idf, BM25, I(n)B2, dtu.dtn, Jelinek-Mercer smoothing, KL-divergence, SVM classifiers and an ensemble of standard algorithms; retrieval of different units of text (document, paragraph, subset of paragraphs, sentence) using these algorithms | Retrieval algorithm; unit of text retrieved |
| Passage retrieval | Passage extraction using one of: sentence, paragraph, HMM-based estimate, subset of paragraphs; reranking of the extracted passages | Passage definition; passage rescoring |
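The preprocessing row of Table 1 combines tag stripping, stop-word filtering and stemming. A minimal standard-library sketch of such a pipeline might look like the following; the stop-word list and the `crude_stem` suffix rules are illustrative assumptions standing in for a full stop list and a real stemmer such as Porter's, not what any participant used.

```python
import re
from html.parser import HTMLParser

# Illustrative stop-word list (assumption); real systems use much longer lists.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML page, discarding tags, scripts and styles."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so text from adjacent elements does not run together.
    return " ".join(parser.chunks)


def crude_stem(token: str) -> str:
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def preprocess(html: str) -> list[str]:
    """HTML -> lowercase tokens -> stop-word filtering -> stemming."""
    tokens = re.findall(r"[a-z0-9]+", html_to_text(html).lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]
```

Filtering out sections such as references and acknowledgements would be an additional step before tokenization and is not shown here.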
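The query-expansion approaches in Table 1 (synonym lookup, acronym expansion, keyword normalization and weighting) can be sketched as a function that turns a list of keywords into a weighted query. The `SYNONYMS` and `ACRONYMS` tables and the default expansion weight of 0.5 are hypothetical placeholders; a real run would obtain expansion terms from resources such as UMLS, Entrez Gene, MeSH or HUGO. Normalization here is limited to case and whitespace; reducing keywords to root forms would additionally apply a stemmer.

```python
# Hypothetical lookup tables; a real run would query UMLS, Entrez Gene, MeSH, HUGO, etc.
SYNONYMS: dict[str, list[str]] = {
    "brca1": ["breast cancer 1"],
    "breast cancer": ["mammary carcinoma"],
}
ACRONYMS: dict[str, str] = {"ad": "alzheimer disease"}


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so lookups match regardless of case or spacing."""
    return " ".join(term.lower().split())


def _add(weights: dict[str, float], term: str, weight: float) -> None:
    # Keep the highest weight if a term is reached more than once.
    weights[term] = max(weights.get(term, 0.0), weight)


def expand_query(keywords: list[str], expansion_weight: float = 0.5) -> dict[str, float]:
    """Give each original keyword weight 1.0 and each expansion term expansion_weight."""
    weights: dict[str, float] = {}
    for keyword in keywords:
        term = normalize(keyword)
        if not term:
            continue
        _add(weights, term, 1.0)
        if term in ACRONYMS:
            _add(weights, ACRONYMS[term], expansion_weight)
        for synonym in SYNONYMS.get(term, []):
            _add(weights, normalize(synonym), expansion_weight)
    return weights
```

Down-weighting expansion terms relative to the original keywords is one common way to limit topic drift from noisy synonyms.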
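Of the retrieval algorithms listed in Table 1, tf-idf is the simplest to illustrate. The sketch below scores pre-tokenized documents against a query with one common variant, (1 + log tf) multiplied by log(N / df); participants' systems may have used other tf-idf weighting schemes.

```python
import math
from collections import Counter


def tfidf_scores(query_terms: list[str], docs: list[list[str]]) -> list[float]:
    """Score each tokenized document with log-tf times log(N/df), summed over query terms."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] > 0:
                score += (1 + math.log(tf[term])) * math.log(n / df[term])
        scores.append(score)
    return scores
```

For example, with documents `[["gene", "mutation", "gene"], ["protein", "binding"], ["gene", "protein"]]` and query `["gene"]`, the first document scores highest because it contains "gene" twice.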
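BM25 extends tf-idf-style weighting with term-frequency saturation (k1) and document-length normalization (b). This sketch uses the widely cited defaults k1 = 1.2 and b = 0.75 and the non-negative idf variant log(1 + (N - df + 0.5) / (df + 0.5)); these choices are assumptions, not values reported for the runs summarized in Table 1.

```python
import math
from collections import Counter


def bm25_scores(query_terms: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized document for the given query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            # Saturating tf component, normalized by this document's length.
            norm = f + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Setting b = 0 disables length normalization, and large k1 makes the term-frequency component grow almost linearly, so the two parameters control how far BM25 departs from plain tf-idf.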
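Jelinek-Mercer smoothing, also listed under document retrieval, ranks a document by the query log-likelihood under a mixture of the document and collection language models, log P(q|d) = Σ_t log((1 - λ) P(t|d) + λ P(t|C)), with maximum-likelihood estimates for both models. A minimal sketch follows; λ = 0.5 is an illustrative value only.

```python
import math
from collections import Counter


def jm_log_likelihood(query_terms: list[str], docs: list[list[str]],
                      lam: float = 0.5) -> list[float]:
    """Query log-likelihood of each document with Jelinek-Mercer smoothing."""
    collection = Counter()
    for doc in docs:
        collection.update(doc)
    collection_len = sum(collection.values())
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            p_c = collection[term] / collection_len  # P(t|C)
            if p_c == 0.0:
                # Unseen in the whole collection: log(0) for every document,
                # so the term cannot change the ranking and is skipped.
                continue
            p_d = tf[term] / len(doc)  # maximum-likelihood P(t|d)
            score += math.log((1 - lam) * p_d + lam * p_c)
        scores.append(score)
    return scores
```

With a maximum-likelihood query model, KL-divergence ranking orders documents identically to this query likelihood, so the same smoothed document model can serve both approaches.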
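Three of the passage definitions in the passage-retrieval row (sentence, paragraph, subset of paragraphs) can be sketched as simple text-splitting functions. The sketch assumes that paragraphs are separated by blank lines, splits sentences naively after `.`, `!` or `?`, and models a "subset of paragraphs" as a sliding window of consecutive paragraphs; all three are assumptions, and the HMM-based estimate is not covered.

```python
import re


def paragraph_passages(text: str) -> list[str]:
    """Paragraphs, assuming they are separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def sentence_passages(text: str) -> list[str]:
    """Naive sentence split after terminal punctuation followed by whitespace."""
    sentences = []
    for paragraph in paragraph_passages(text):
        parts = re.split(r"(?<=[.!?])\s+", paragraph)
        sentences.extend(s.strip() for s in parts if s.strip())
    return sentences


def paragraph_window_passages(text: str, window: int = 2) -> list[str]:
    """Overlapping windows of `window` consecutive paragraphs (hypothetical definition)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    paragraphs = paragraph_passages(text)
    if len(paragraphs) <= window:
        return [" ".join(paragraphs)] if paragraphs else []
    return [" ".join(paragraphs[i:i + window]) for i in range(len(paragraphs) - window + 1)]
```

Smaller units such as sentences tend to give more precise passages, while paragraph windows keep more context; which unit a run used is the "Passage definition" feature in Table 1.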
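Passage rescoring can take many forms. One simple, hypothetical scheme interpolates the min-max-normalized score of a passage's parent document with the passage's own normalized score and sorts by the result; the interpolation weight `alpha` and the normalization are assumptions for illustration, not a method reported by the participants.

```python
def _min_max(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def rerank_passages(passages: list[tuple[str, float, float]], alpha: float = 0.3) -> list[str]:
    """passages: (passage_id, doc_score, passage_score) triples.

    Returns passage ids ordered by alpha * doc + (1 - alpha) * passage,
    computed on min-max-normalized scores so the two scales are comparable.
    """
    if not passages:
        return []
    doc_norm = _min_max([p[1] for p in passages])
    passage_norm = _min_max([p[2] for p in passages])
    combined = [
        (alpha * d + (1 - alpha) * s, p[0])
        for p, d, s in zip(passages, doc_norm, passage_norm)
    ]
    combined.sort(key=lambda item: item[0], reverse=True)
    return [pid for _, pid in combined]
```

Normalizing first matters because document-level and passage-level scores from different algorithms are usually on unrelated scales.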
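The features in the last column of Table 1 are categorical properties of each run. One way to model them, which is an assumption about the model form and not necessarily the authors' exact specification, is to encode each feature as a 0/1 indicator and fit a multiple linear regression by ordinary least squares. The sketch below uses only the standard library and solves the normal equations by Gaussian elimination; the feature names passed to `encode_run` are hypothetical.

```python
def encode_run(run: dict, features: list[str]) -> list[float]:
    """Encode a run's feature flags as 0/1 indicators in a fixed feature order."""
    return [1.0 if run.get(name) else 0.0 for name in features]


def ols_fit(rows: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    c = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear or constant features)")
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= factor * A[col][j]
            c[r] -= factor * c[col]
    # Back substitution.
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b
```

Each fitted coefficient estimates the change in the performance measure associated with a feature while the other encoded features are held fixed; a full analysis would also need standard errors, which this sketch does not compute.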