2005;3806:376–389. doi: 10.1007/11581062_28

Document Re-ranking by Generality in Bio-medical Information Retrieval

Xin Yan, Xue Li, Dawei Song
Editors: Anne H H Ngu, Masaru Kitsuregawa, Erich J Neuhold, Jen-Yao Chung, Quan Z Sheng
PMCID: PMC7121049

Abstract

Document ranking is an important process in information retrieval (IR). It presents retrieved documents in order of their estimated degree of relevance to the query. Traditional document ranking methods are mostly based on computing the similarity between documents and the query. In this paper we argue that similarity-based document ranking is insufficient in some cases, for two reasons. The first is the increased variety of information: there are now far too many different types of documents available for users to search. The second is the variety of users: in many cases a user may want to retrieve documents that are not only similar to the query but also general or broad with respect to a certain topic. This is particularly the case in some domains such as bio-medical IR. We propose a novel approach that re-ranks the retrieved documents by combining their similarity with their generality. Document generality is quantified through an ontology-based analysis of the semantic cohesion of the text. The retrieved documents are then re-ranked by a combined score of their similarity and the closeness of each document’s generality to that of the query. Our experiments show encouraging performance on a large bio-medical document collection, OHSUMED, containing 348,566 medical journal references and 101 test queries.
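The re-ranking idea described above can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the `Doc` class, the generality scores in [0, 1], the closeness measure `1 - |g_doc - g_query|`, and the linear weight `alpha` are all hypothetical and should not be read as the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    similarity: float  # first-pass similarity to the query, assumed to lie in [0, 1]
    generality: float  # hypothetical generality score, assumed to lie in [0, 1]


def generality_closeness(doc_generality: float, query_generality: float) -> float:
    """Return 1.0 when the two generality scores match, 0.0 when they are maximally apart."""
    return 1.0 - abs(doc_generality - query_generality)


def rerank(docs, query_generality, alpha=0.5):
    """Sort docs by a weighted mix of similarity and generality closeness.

    The linear combination and the weight alpha are illustrative assumptions,
    not the formula used in the paper.
    """
    def combined(doc):
        closeness = generality_closeness(doc.generality, query_generality)
        return alpha * doc.similarity + (1.0 - alpha) * closeness

    return sorted(docs, key=combined, reverse=True)


docs = [
    Doc("d1", similarity=0.90, generality=0.10),
    Doc("d2", similarity=0.80, generality=0.70),
    Doc("d3", similarity=0.60, generality=0.75),
]
reranked = rerank(docs, query_generality=0.8, alpha=0.5)
print([d.doc_id for d in reranked])  # ['d2', 'd3', 'd1']
```

In this toy example the document with the highest similarity (`d1`) drops to last place because its generality is far from the query's, while setting `alpha=1.0` would recover the pure similarity ranking.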

Keywords: Generality, Relevance, Document Ranking

Contributor Information

Anne H. H. Ngu, Email: hn12@txstate.edu

Masaru Kitsuregawa, Email: kitsure@tkl.iis.u-tokyo.ac.jp

Erich J. Neuhold, Email: erich.neuhold@univie.ac.at

Jen-Yao Chung, Email: jychung@us.ibm.com

Quan Z. Sheng, Email: qsheng@cse.unsw.edu.au

Xin Yan, Email: yanxin@itee.uq.edu.au

Xue Li, Email: xueli@itee.uq.edu.au

Dawei Song, Email: dawei_song2005@hotmail.com


Articles from Web Information Systems Engineering – WISE 2005 are provided here courtesy of Nature Publishing Group
