J Am Med Inform Assoc. 2012 Oct 16;20(5):882–886. doi: 10.1136/amiajnl-2012-001350

Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification

Vijay N Garla 1, Cynthia Brandt 1,2
PMCID: PMC3756260  PMID: 23077130

Abstract

Background

Word sense disambiguation (WSD) methods automatically assign an unambiguous concept to an ambiguous term based on context, and are important to many text-processing tasks. In this study we developed and evaluated a knowledge-based WSD method that uses semantic similarity measures derived from the Unified Medical Language System (UMLS) and evaluated the contribution of WSD to clinical text classification.

Methods

We evaluated our system on biomedical WSD datasets and determined the contribution of our WSD system to clinical document classification on the 2007 Computational Medicine Challenge corpus.

Results

Our system compared favorably with other knowledge-based methods. Machine learning classifiers trained on disambiguated concepts significantly outperformed those trained using all concepts.

Conclusions

We developed a WSD system that achieves high disambiguation accuracy on standard biomedical WSD datasets and showed that our WSD system improves clinical document classification.

Data sharing

We integrated our WSD system with MetaMap and the clinical Text Analysis and Knowledge Extraction System, two popular biomedical natural language processing systems. All code required to reproduce our results and all tools developed as part of this study are released as open source and are available at http://code.google.com/p/ytex.

Keywords: Word Sense Disambiguation, Semantic similarity, Natural Language Processing

Introduction

Terms in a natural language may be ambiguous—that is, they can have multiple meanings. For example, the word ‘cold’ can refer to the viral infection ‘common cold’ or the ‘sensation of cold’. Humans can relatively easily disambiguate the meaning of a term from its context. Word sense disambiguation (WSD) systems use the context surrounding an ambiguous term to assign it a unique unambiguous concept. WSD is an important stage in many text-processing tasks.1–3

One subdomain of biomedical text processing to which WSD can potentially be applied is clinical text processing: clinical document classification, information extraction, and document retrieval have important applications to operational aspects of healthcare delivery and to electronic medical record-based biomedical research.4 5 Many biomedical WSD approaches estimate term distributions, compute term co-occurrence, or train statistical models from biomedical literature corpora. It is not clear if WSD techniques based on literature corpora are applicable to narrative clinical text.

The goal of this study was to develop and evaluate a practical biomedical WSD system that can disambiguate both biomedical literature and narrative clinical text, and to evaluate the contribution of WSD to clinical text classification. We developed a knowledge-based WSD system that relies solely on the domain knowledge encoded in the Unified Medical Language System (UMLS) and evaluated it on standard biomedical WSD datasets. To quantify the contribution of WSD to clinical text classification, we evaluated machine learning classifiers on a clinical document classification benchmark. We compared the accuracy of models trained on all concepts mapped to ambiguous terms with models trained using disambiguated concepts.

Background

Biomedical WSD

In the biomedical domain, WSD uses the context surrounding an ambiguous term (target term) to assign it a unique concept from the UMLS Metathesaurus, a compendium of over 100 biomedical vocabularies that includes the Systematized Nomenclature of Medicine-Clinical Terminology (SNOMED-CT).6 The UMLS Metathesaurus enumerates concepts, assigns concepts unique identifiers (CUIs), maps concepts to terms, and encodes semantic relationships between concepts. UMLS terms may be ambiguous—that is, they may be assigned to multiple CUIs. For example, the term ‘pt’ maps to 12 CUIs including Patient (C0030705), Platinum (C0032207), and Pint (C0560012).

Approaches to WSD in the biomedical domain can roughly be divided into supervised, semi-supervised, and knowledge-based methods.7 Supervised and semi-supervised WSD approaches train a statistical model to assign a concept to a target term based on its context. Supervised approaches use manually annotated training data whereas semi-supervised approaches automatically create these data.8–10 Although supervised and semi-supervised approaches outperform knowledge-based approaches, the need to assemble a training set for each target term makes it impractical to implement (semi-)supervised approaches for a large set of terms.

Knowledge-based techniques include vector-based, Personalized PageRank (PPR), and the adapted Lesk methods.10–13 Knowledge-based techniques are unsupervised: they do not require labeled training data. In this study we implemented the adapted Lesk method (figure 1), which scores an ambiguous term's candidate concepts by summing the semantic relatedness between each candidate concept and surrounding context concepts.7 14 In our implementation we compute relatedness via semantic similarity measures.

Figure 1. Candidate concept scoring. Adapted from McInnes et al.7
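
The scoring procedure in figure 1 is straightforward to express in code. The following is a minimal Python sketch of adapted Lesk scoring, not our released implementation; the `similarity` callback stands in for whichever semantic similarity measure is configured (the measures themselves are described under Methods).

```python
from typing import Callable, Dict, List

def adapted_lesk(candidates: List[str],
                 context: List[str],
                 similarity: Callable[[str, str], float]) -> str:
    """Adapted Lesk: score each candidate concept for the target term by
    summing its semantic similarity to every concept in the context
    window, then return the highest-scoring candidate."""
    scores: Dict[str, float] = {
        cand: sum(similarity(cand, ctx) for ctx in context)
        for cand in candidates
    }
    return max(scores, key=scores.get)

# Hypothetical usage: disambiguate a target term given the CUIs of its
# candidate senses and of the surrounding context concepts.
# best_cui = adapted_lesk(candidate_cuis, context_cuis, lch_similarity)
```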

Semantic similarity measures estimate the similarity between a pair of concepts and can be roughly divided into knowledge-based and distributional methods.15–17 Knowledge-based similarity methods use the taxonomic structure of a biomedical terminology to compute similarity; these include path finding measures and intrinsic information content (IC) measures.16 17 Distributional similarity methods use the distribution of concepts within a corpus in conjunction with the taxonomic structure to compute similarity; these include corpus IC-based measures.16 McInnes et al evaluated the adapted Lesk method using path finding-based and corpus IC-based measures and found that corpus IC-based measures parameterized by concept frequencies derived from MEDLINE outperformed path finding-based measures.

It is important to note that significant differences may exist between biomedical literature and narrative clinical text. Clinical text is often composed of semi-grammatical ‘telegraphic’ phrases, uses a narrower vocabulary than biomedical literature, and is rife with domain-specific acronyms. For example, the term ‘pt’ when used in a clinical note almost always refers to the concept ‘Patient’ (C0030705) and rarely to any of the 11 other possible candidate concepts; this distribution may be significantly different in biomedical literature. Because of these differences, WSD approaches based on term distributions from biomedical literature may not generalize to clinical notes.

Clinical document classification

Many approaches to clinical document classification rely on accurate mapping of narrative text to concepts from a biomedical knowledge source. A common approach is to train machine learning algorithms on document representations that include UMLS CUIs as features.18 19 More knowledge-intensive approaches enrich the feature set with related concepts20 or apply semantic kernels that project documents that contain related concepts closer together in a feature space.21 22 WSD can complement these document classification approaches by filtering out concepts mapped incorrectly to ambiguous terms.
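
For concreteness, one common form of such a semantic kernel (we do not reproduce the exact formulations of refs 21 and 22, which differ in how the similarity matrix is constructed) replaces the inner product between the concept vectors $\mathbf{x}_i$ and $\mathbf{x}_j$ of two documents with

$$K(d_i, d_j) = \mathbf{x}_i^{\top} S\, \mathbf{x}_j,$$

where $S_{kl}$ encodes the semantic similarity between concepts $k$ and $l$; $S$ must be positive semidefinite for $K$ to be a valid kernel. Documents sharing related, not just identical, concepts thus receive a higher kernel value.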

Methods

The accuracy of our WSD method is influenced by the choice of semantic similarity measure, UMLS source vocabularies, context window size, and named entity recognition (NER) system implementation. In this section we discuss the semantic similarity measures and NER systems used in this study and describe the WSD datasets used for evaluation, the UMLS source vocabularies selected, and our approach to evaluating WSD via document classification.

Semantic similarity measures

Semantic similarity measures use a directed concept graph whose vertices represent concepts and edges represent taxonomical relationships. Semantic similarity measures calculate the shortest path between two concepts; this path traverses the least common subsumer—that is, the closest common parent concept. Path finding measures compute similarity as a function of the length of the path between a pair of concepts. One limitation of path finding measures is that they give equal weight to all relationships.16 IC-based measures attempt to correct for this by weighting edges based on concept specificity.16 17 23 IC can be estimated solely from the structure of a taxonomy (intrinsic IC) or from the distribution of concepts in a text corpus in conjunction with a taxonomy (corpus IC).17 23 24 In this study we use intrinsic IC, as this mitigates bias towards a specific corpus.

In this study we evaluated the adapted Lesk method with the Path, Leacock and Chodorow (LCH) and Wu and Palmer (WUP) path finding measures25 26 and the Lin and Jiang and Conrath (JC) intrinsic IC-based measures.27 28 A detailed description of these measures is given in the online appendix.
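
For reference, the standard formulations of these measures are given below; $\operatorname{len}(c_1,c_2)$ is the shortest path length between concepts $c_1$ and $c_2$, $\operatorname{lcs}(c_1,c_2)$ their least common subsumer, $D$ the maximum depth of the taxonomy, $N$ the number of concepts, and $\operatorname{hypo}(c)$ the set of descendants of $c$ (the intrinsic IC shown follows Seco et al24). Our implementation may normalize some of these scores differently (see online appendix):

$$
\begin{aligned}
\operatorname{sim}_{\text{path}}(c_1,c_2) &= \frac{1}{\operatorname{len}(c_1,c_2)} \\
\operatorname{sim}_{\text{LCH}}(c_1,c_2)  &= -\log\frac{\operatorname{len}(c_1,c_2)}{2D} \\
\operatorname{sim}_{\text{WUP}}(c_1,c_2)  &= \frac{2\,\operatorname{depth}(\operatorname{lcs}(c_1,c_2))}{\operatorname{depth}(c_1)+\operatorname{depth}(c_2)} \\
\operatorname{IC}_{\text{intrinsic}}(c)   &= 1 - \frac{\log(|\operatorname{hypo}(c)|+1)}{\log N} \\
\operatorname{sim}_{\text{Lin}}(c_1,c_2)  &= \frac{2\,\operatorname{IC}(\operatorname{lcs}(c_1,c_2))}{\operatorname{IC}(c_1)+\operatorname{IC}(c_2)} \\
\operatorname{dist}_{\text{JC}}(c_1,c_2)  &= \operatorname{IC}(c_1)+\operatorname{IC}(c_2)-2\,\operatorname{IC}(\operatorname{lcs}(c_1,c_2))
\end{aligned}
$$

Note that JC is a distance; it is typically converted to a similarity, for example as $1/(1+\operatorname{dist}_{\text{JC}})$.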

Named entity recognition systems

Named entity recognition (NER) is the process of mapping natural language text to concepts from a knowledge source. The accuracy of our WSD method may differ significantly based on the NER method used to map text surrounding a target term to concepts. We evaluated our system with MetaMap and the clinical Text Analysis and Knowledge Extraction System (cTAKES), two popular publicly available biomedical natural language processing (NLP)/NER systems.29 30

WSD evaluation

Word sense disambiguation datasets

We used the National Library of Medicine's WSD (NLM WSD) and MSH WSD datasets to evaluate the adapted Lesk algorithm.31 32 The NLM WSD dataset contains 50 ambiguous terms; for each term, 100 instances were randomly selected from 1998 PubMed abstracts. In cases where no appropriate UMLS concept existed, annotators assigned the term the label ‘None’. We followed the same set-up as previous evaluations and discarded instances labeled ‘None’.10 13 33

The MSH WSD dataset contains 203 ambiguous terms from 37 888 MEDLINE abstracts. MEDLINE is a bibliographic database containing over 19 million references to journal articles in the life sciences.34 The MSH WSD dataset was automatically created by identifying ambiguous terms in MEDLINE abstracts and assigning the terms the concept corresponding to the manually indexed MeSH concept.32

We processed all abstracts with cTAKES V.2.5 and MetaMap V.2011 to map text to UMLS concepts30 (see online appendix for a detailed description of cTAKES and MetaMap parameters). We defined the context window as a fixed number of concepts, varied from 10 to 70, on either side of the term to be disambiguated (the target term). In addition, for the NLM WSD corpus, we included all concepts from the title of the article.
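
As a minimal sketch of the window extraction (assuming concept annotations are available as a list in text order; the function and its arguments are illustrative, not part of cTAKES or MetaMap):

```python
from typing import List

def context_window(concepts: List[str], target_index: int, window: int) -> List[str]:
    """Return up to `window` concepts on either side of the target term.
    `concepts` is the document's concept sequence in text order (as
    produced by the NER system); `window` ranged from 10 to 70 in our
    experiments. The target concept itself is excluded."""
    left = concepts[max(0, target_index - window):target_index]
    right = concepts[target_index + 1:target_index + 1 + window]
    return left + right
```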

Concept graphs

Similarity measures use a directed acyclic concept graph from which path lengths, depth, and IC are calculated. To determine the effect of source vocabulary selection on WSD accuracy, we performed WSD with concept graphs built from various subsets of the UMLS. The ‘parsimonious’ graph contained the minimal set of UMLS source vocabularies used by Agirre et al13 in their evaluation of the PPR algorithm on the NLM WSD dataset: SNOMED-CT, MeSH, the CRISP thesaurus, and the Alcohol and Other Drug (AOD) thesaurus. The ‘full’ concept graph used all restriction-free (level 0) UMLS 2011AB source vocabularies plus SNOMED-CT. In the MSH WSD dataset, the candidate concepts of target terms are found in the MeSH vocabulary; for this dataset we also evaluated our method with a concept graph built from the MeSH vocabulary alone.
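
As an illustration of how such a concept graph can be assembled, the sketch below builds a child-to-parent edge map from the UMLS MRREL.RRF distribution file, keeping only taxonomic (PAR) relationships from the ‘parsimonious’ vocabularies. This is a simplified sketch rather than our exact build procedure; the field positions follow the published MRREL format.

```python
from collections import defaultdict
from typing import DefaultDict, Set

# 'Parsimonious' source vocabularies; the strings are the UMLS source
# abbreviations (SAB) for SNOMED-CT, MeSH, CRISP, and AOD in 2011AB.
PARSIMONIOUS_SABS = {"SNOMEDCT", "MSH", "CSP", "AOD"}

def build_concept_graph(mrrel_path: str) -> DefaultDict[str, Set[str]]:
    """Build a child CUI -> parent CUIs map from the pipe-delimited
    MRREL.RRF file. Relevant 0-based fields: 0=CUI1, 3=REL, 4=CUI2,
    10=SAB. We assume REL='PAR' means CUI2 is a parent of CUI1, per the
    MRREL documentation."""
    parents: DefaultDict[str, Set[str]] = defaultdict(set)
    with open(mrrel_path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("|")
            if fields[3] == "PAR" and fields[10] in PARSIMONIOUS_SABS:
                parents[fields[0]].add(fields[4])
    return parents
```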

Statistical analysis

The accuracy of our WSD system is influenced by the NER method, vocabulary selection, similarity measure, and context window size. To determine the impact of each parameter, we modeled disambiguation accuracy as a linear function of these parameters for each WSD dataset. We used the magnitude of the regression coefficients and their significance to quantify the impact of each parameter on WSD performance.
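
A minimal sketch of this analysis in Python (we do not claim this is the exact code used; the file name and column names are illustrative):

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per evaluated WSD run: the accuracy obtained plus the
# parameters that produced it.
df = pd.read_csv("results.csv")  # columns: accuracy, ner, vocab, measure, window

# Model accuracy as a linear function of the parameters; categorical
# predictors (NER system, vocabulary, similarity measure) are dummy-coded.
model = smf.ols("accuracy ~ C(ner) + C(vocab) + C(measure) + window", data=df).fit()
print(model.summary())  # coefficients, p values, and R-squared (cf. table 1)
```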

Clinical document classification evaluation

We evaluated document classifiers on the Cincinnati Computational Medicine Center's (CMC) 2007 Medical NLP Challenge.35 Forty-four teams participated in the CMC 2007 challenge, a multilabel classification task whose goal was to assign ICD-9-CM codes (labels) to radiology reports. The challenge dataset comprises a training set (n=978) and a test set (n=976).

We processed all documents with both cTAKES and MetaMap and disambiguated text using the optimal WSD parameters from the MSH WSD dataset evaluation. We represented documents as binary vectors indexed by concepts. For each NER system we generated feature vectors with all concepts (all) and feature vectors with only disambiguated concepts (disambiguated). We used LibSVM V.3.1 to train a linear support vector machine (SVM) on each feature representation and label, optimizing SVM parameters by cross-validation on the training set. For the final evaluation we trained an SVM for each label on the training set using the optimal parameters identified by cross-validation and evaluated it on the test set. Participating teams were ranked by the micro-averaged F1 score; we computed both micro-averaged and macro-averaged F1 scores for each NER system/feature representation combination, and applied the z-test of proportions to test the significance of differences in F1 scores between the all and disambiguated feature vectors.
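
The sketch below outlines this pipeline using scikit-learn's LinearSVC in place of LibSVM (both fit linear SVMs); the feature construction and the z-test variant shown (a pooled two-proportion test) are stated assumptions, since the paper does not prescribe a specific implementation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

def classify_and_score(train_docs, Y_train, test_docs, Y_test):
    """Train one linear SVM per label on binary concept vectors and
    return (micro-F1, macro-F1). Documents are space-joined CUI strings
    such as "C0030705 C0037303"; Y_* are binary document-by-label
    matrices."""
    vec = CountVectorizer(binary=True, token_pattern=r"C\d{7}")
    X_train, X_test = vec.fit_transform(train_docs), vec.transform(test_docs)
    preds = []
    for j in range(Y_train.shape[1]):  # one binary classifier per ICD-9-CM code
        grid = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1, 10]}, cv=5)
        grid.fit(X_train, Y_train[:, j])
        preds.append(grid.predict(X_test))
    Y_pred = np.column_stack(preds)
    return (f1_score(Y_test, Y_pred, average="micro"),
            f1_score(Y_test, Y_pred, average="macro"))

def z_test_proportions(f1_a, f1_b, n):
    """Two-sided pooled z-test treating two F1 scores as proportions
    over n classification decisions."""
    p = (f1_a + f1_b) / 2
    z = (f1_a - f1_b) / np.sqrt(2 * p * (1 - p) / n)
    return 2 * (1 - norm.cdf(abs(z)))
```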

Results and discussion

Word sense disambiguation

We evaluated a range of parameter combinations with our WSD method, hoping to identify the best parameters to inform future WSD applications. However, we found that the best parameter combination varied across WSD datasets.

For each dataset we modeled WSD accuracy as a linear function of NER method, vocabulary selection, similarity measure, and context window size. Table 1 lists the coefficients and their significance for each parameter (see online appendix for WSD results for all parameter combinations and model coefficient p values). Parameters with a significantly positive coefficient improved WSD accuracy in general. The models had a high coefficient of determination (R²) for both datasets, suggesting that these parameters account for most of the variability in WSD performance.

Table 1.

Effect of parameters on word sense disambiguation (WSD) accuracy

Parameter                      MSH WSD coefficient   NLM WSD coefficient
NER: MetaMap                    0.0653 ****          −0.0073 **
Vocabulary: Entire UMLS        −0.0324 ****          −0.0087 ***
Vocabulary: MeSH                0.0129 ***           —
Similarity measure: Lin        −0.0111 *              0.0346 ****
Similarity measure: JC          0.0450 ****          −0.0520 ****
Similarity measure: LCH        −0.0497 ****           0.0141 **
Similarity measure: Path        0.0006               −0.0417 ****
Similarity measure: WUP        −0.0723 ****          −0.0991 ****
Context window size            −0.0001                0.0002 **
R²                              0.8538                0.7929

*p<0.20, **p<0.05, ***p<0.01, ****p<0.001.

JC, Jiang and Conrath; LCH, Leacock and Chodorow; NER, named entity recognition; UMLS, Unified Medical Language System; WUP, Wu and Palmer.

Effect of context window size

For the NLM WSD dataset, accuracy increased significantly with window size (table 1), while for the MSH WSD dataset this held only in combination with MetaMap for NER. Figure 2 depicts WSD accuracy by context window size for the JC and LCH measures on the MSH WSD and NLM WSD datasets, respectively.

Figure 2. Word sense disambiguation (WSD) accuracy by context window size. cTAKES, clinical Text Analysis and Knowledge Extraction System.

Effect of NER method

On the NLM WSD dataset, using context concepts identified via cTAKES resulted in significantly higher accuracy than MetaMap (table 1, figure 2). On the MSH WSD dataset this relationship was reversed, with MetaMap yielding a significantly better disambiguation performance.

Path finding versus intrinsic IC-based measures

The path finding-based LCH achieved significantly higher accuracy than the other measures on the NLM WSD dataset, whereas the IC-based JC measure achieved significantly higher accuracy on the MSH WSD dataset (tables 1 and 2).

Table 2.

Word sense disambiguation (WSD) accuracy by measure and concept graph

                                                   Concept graph
Dataset and parameters        Similarity measure   MeSH     SNOMED-CT, MeSH, AOD, CSP   Entire UMLS
MSH WSD, window 70, MetaMap   JC                   0.8071   0.8013                      0.7876
                              Path                 0.7689   0.7343                      0.6865
                              Lin                  0.7636   0.6986                      0.6699
                              LCH                  0.6983   0.6577                      0.6224
                              WUP                  0.7206   0.6155                      0.5696
NLM WSD, window 70, cTAKES    LCH                  —        0.5855                      0.6430
                              Lin                  —        0.5654                      0.5355
                              Path                 —        0.5609                      0.5333
                              JC                   —        0.5561                      0.5182
                              WUP                  —        0.4652                      0.4856

AOD, Alcohol and Other Drug; CSP, computer retrieval of information on scientific projects (CRISP) thesaurus; cTAKES, clinical Text Analysis and Knowledge Extraction System; JC, Jiang and Conrath; LCH, Leacock and Chodorow; SNOMED-CT, Systematized Nomenclature of Medicine-Clinical Terminology; UMLS, Unified Medical Language System; WUP, Wu and Palmer.

Effect of vocabulary selection

On both datasets, using concept graphs derived from smaller subsets of the UMLS increased WSD accuracy in general. On the NLM WSD corpus, using the entire UMLS had a significantly negative effect on WSD accuracy (table 1). However, the best performance on the NLM WSD corpus was achieved using the entire UMLS (table 2). On the MSH WSD dataset, using the MeSH concept graph significantly increased accuracy compared with using the combination of SNOMED-CT, MeSH, AOD, and CRISP, which in turn significantly outperformed use of the entire UMLS.

These results suggest that tuning the set of UMLS vocabularies to the application domain improves WSD performance. The MSH WSD task was focused on disambiguation of terms found in MeSH, which may explain why a concept graph using just the MeSH vocabulary achieved the highest performance.

Comparison with other methods

The WSD method we developed outperformed all previously reported knowledge-based approaches with the exception of PPR (table 3). The semi-supervised Automatically Extracted Corpus (AEC) method outperformed ours on both WSD datasets.

Table 3.

Comparison with other methods

Study                          NLM WSD              MSH WSD
This study                     LCH+cTAKES: 0.6430   JC+MetaMap: 0.8071
Previous studies
  Adapted Lesk7                —                    0.7400
  PPR13                        0.6810               —
  MRD31                        0.6389               0.8070
  2-MRD31                      0.5500               0.7799
  AEC31                        0.6836               0.8383

AEC, automatically extracted corpus; cTAKES, clinical Text Analysis and Knowledge Extraction System; JC, Jiang and Conrath; LCH, Leacock and Chodorow; MRD, machine readable dictionary; PPR, Personalized PageRank; WSD, word sense disambiguation.

In addition to achieving excellent performance, our method avoids some limitations of other approaches. The machine readable dictionary (MRD) method requires concept definitions, but the UMLS contains definitions for only 7% of its 2.1 million concepts. The AEC method requires training a model for each word to be disambiguated, and models trained on biomedical literature may not generalize to clinical text.

Our method outperformed the adapted Lesk implementation of McInnes et al,7 who used path finding-based and corpus IC-based measures and found that corpus IC-based measures achieved the best performance. McInnes et al estimated corpus IC from concept frequencies in the MEDLINE bibliographic database, which may limit applicability to clinical WSD. The difference between our results and theirs may be attributable to the different NER systems used.

The PPR method, like our method, uses semantic relationships between concepts and does not rely on biomedical literature. PPR achieved a higher accuracy on the NLM WSD dataset but was orders of magnitude more computationally intensive than the adapted Lesk method (see below).

On the surface it seems that identifying the optimal parameters for our method represents a challenge. However, this is less complicated than it would appear: using a wide context window should improve WSD performance. The choice of NLP/NER system is typically motivated by considerations other than its effect on WSD accuracy. Most text-processing applications are focused on particular clinical phenomena or a specific problem domain; our results suggest that selecting a parsimonious set of UMLS source vocabularies based on the problem domain will yield optimal WSD performance. The remaining parameter is the choice of similarity measure. Unfortunately, it is not clear which similarity measure—the path finding-based LCH or the intrinsic IC-based JC—will achieve the best performance in general. Further research is needed to address this question.

Clinical document classification

Table 4 presents the results of document classification using all CUIs and disambiguated CUIs on the CMC 2007 challenge dataset, with p values for the significance of the differences. Informed by the results on the MSH WSD dataset, we used the following parameters for WSD on the CMC 2007 corpus: the parsimonious concept graph built from the SNOMED-CT, MeSH, CRISP, and AOD vocabularies; a context window size of 50; and the JC measure. Disambiguating terms led to a substantial reduction in the number of distinct concepts; classifiers trained on disambiguated concepts achieved significantly higher micro-averaged and macro-averaged F1 scores than those trained on all concepts. This improvement was meaningful: using MetaMap and all concepts, our system would have placed 21st in the challenge, while using disambiguated concepts would have boosted it to 13th place.

Table 4.

Clinical document classification with and without word sense disambiguation (WSD)

                              All      Disambiguated   p value
MetaMap   Micro-F1            0.8199   0.8418          0.12
          Macro-F1            0.3867   0.4292          0.02
          No. of concepts     3302     1938
cTAKES    Micro-F1            0.8175   0.8336          0.27
          Macro-F1            0.3600   0.3942          0.06
          No. of concepts     2867     1828

cTAKES, clinical Text Analysis and Knowledge Extraction System.

One limitation of this study is that we did not directly evaluate WSD accuracy on ambiguous terms from clinical text. However, WSD is a means to an end: the value of a WSD system cannot be measured solely by its disambiguation accuracy on a benchmark, but also by its applicability to text-processing tasks such as document classification.

System performance and interoperability

Our open source WSD system is written in the platform-independent Java language. We developed a WSD module for the unstructured information management architecture (UIMA) and integrated it with the MetaMap UIMA annotator and with cTAKES, two popular NLP frameworks for the biomedical domain.30 36

Our WSD system disambiguated the entire NLM WSD corpus in 30 s on a 64-bit Ubuntu 10 Linux workstation with dual quad-core 3.00 GHz Intel Xeon processors; this corresponds to 10 000 terms/min. For comparison, the PPR system developed by Agirre et al running on similar hardware processed 37 terms/min.13

Conclusions

We have developed a WSD system that achieves high disambiguation accuracy on standard biomedical WSD datasets and shown that our WSD system improves clinical document classification. We developed a WSD module compatible with MetaMap and cTAKES, two popular biomedical NLP systems. We have released as open source all tools, source code, and scripts needed to reproduce our results (available at http://code.google.com/p/ytex).

Acknowledgments

We are especially thankful to the NLM and others who developed the Word Sense Disambiguation benchmarks.

Footnotes

Funding: This work was supported in part by NIH grant T15 LM07056 from the National Library of Medicine, CTSA grant number UL1 RR024139 from the NIH National Center for Advancing Translational Sciences (NCATS), and VA grant HIR 08-374 HSR&D: Consortium for Health Informatics.

Competing interests: None.

Provenance and peer review: Not commissioned; internally peer reviewed.

References

1. Friedman C. A broad-coverage natural language processing system. Proceedings of AMIA Symposium; 2000:270–4.
2. MacMullen WJ, Denn SO. Information problems in molecular biology and bioinformatics. J Am Soc Inform Sci Technol 2005;56:447–56.
3. Aronson AR, Bodenreider O, Chang HF, et al. The NLM indexing initiative. Proceedings of AMIA Symposium; 2000:17–21.
4. Liao KP, Cai T, Gainer V, et al. Electronic medical records for discovery research in rheumatoid arthritis. Arthritis Care Res (Hoboken) 2010;62:1120–7.
5. Demner-Fushman D, Chapman WW, McDonald CJ. What can natural language processing do for clinical decision support? J Biomed Inform 2009;42:760–72.
6. National Library of Medicine. UMLS® Reference Manual—NCBI Bookshelf. 2009. http://www.ncbi.nlm.nih.gov/books/NBK9676/ (accessed 30 Mar 2011).
7. McInnes BT, Pedersen T, Liu Y, et al. Knowledge-based method for determining the meaning of ambiguous biomedical terms using information content measures of similarity. Proceedings of AMIA Symposium; 2011:895–904.
8. Savova GK, Coden AR, Sominsky IL, et al. Word sense disambiguation across two domains: biomedical literature and clinical notes. J Biomed Inform 2008;41:1088–100.
9. Liu H. A multi-aspect comparison study of supervised word sense disambiguation. J Am Med Inform Assoc 2004;11:320–31.
10. Plaza L, Jimeno-Yepes AJ, Díaz A, et al. Studying the correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts. BMC Bioinformatics 2011;12:355.
11. Humphrey SM, Rogers WJ, Kilicoglu H, et al. Word sense disambiguation by selecting the best semantic type based on Journal Descriptor Indexing: preliminary experiment. J Am Soc Inform Sci Technol 2006;57:96–113.
12. Lesk M. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. Proceedings of the 5th Annual International Conference on Systems Documentation; New York, NY, USA, 1986:24–6.
13. Agirre E, Soroa A, Stevenson M. Graph-based word sense disambiguation of biomedical documents. Bioinformatics 2010;26:2889–96.
14. Patwardhan S, Banerjee S, Pedersen T. Using measures of semantic relatedness for word sense disambiguation. In: Gelbukh A, ed. Computational Linguistics and Intelligent Text Processing. Berlin/Heidelberg: Springer, 2003:241–57.
15. Agirre E, Alfonseca E, Hall K, et al. A study on similarity and relatedness using distributional and WordNet-based approaches. Proceedings of Human Language Technologies: the 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics; Boulder, Colorado, 2009:19–27.
16. Pedersen T, Pakhomov SVS, Patwardhan S, et al. Measures of semantic similarity and relatedness in the biomedical domain. J Biomed Inform 2007;40:288–99.
17. Sánchez D, Batet M. Semantic similarity estimation in the biomedical domain: an ontology-based information-theoretic perspective. J Biomed Inform 2011;44:749–59.
18. de Bruijn B, Cherry C, Kiritchenko S, et al. Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J Am Med Inform Assoc 2011;18:557–62.
19. D'Avolio LW, Nguyen TM, Farwell WR, et al. Evaluation of a generalizable approach to clinical information retrieval using the automated retrieval console (ARC). J Am Med Inform Assoc 2010;17:375–82.
20. Suominen H, Ginter F, Pyysalo S, et al. Machine learning to automate the assignment of diagnosis codes to free-text radiology reports: a method description. In: Hauskrecht M, Schuurmans D, Szepesvari C, eds. Proceedings of the ICML/UAI/COLT 2008 Workshop on Machine Learning for Healthcare Applications; Helsinki, Finland, 2008. http://www.tucs.fi:8080/publications/attachment.php?fname=inpSuGiPyAiPaSaSa08a.pdf
21. Aseervatham S, Bennani Y. Semi-structured document categorization with a semantic kernel. Pattern Recognit 2009;42:2067–76.
22. Garla VN, Brandt C. Ontology-guided feature engineering for clinical text classification. J Biomed Inform 2012;45:992–8.
23. Resnik P. Using information content to evaluate semantic similarity in a taxonomy. Proceedings of the 14th International Joint Conference on Artificial Intelligence; 1995:448–53.
24. Seco N, Veale T, Hayes J. An intrinsic information content metric for semantic similarity in WordNet. ECAI 2004, the 16th European Conference on Artificial Intelligence; 2004.
25. Wu Z, Palmer M. Verbs, semantics and lexical selection. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics; Las Cruces, New Mexico, USA, 1994:133–8.
26. Leacock C, Chodorow M. Combining local context with WordNet similarity for word sense identification. In: WordNet: A Lexical Reference System and its Application; 1998.
27. Lin D. An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning; Morgan Kaufmann, 1998:296–304.
28. Jiang JJ, Conrath DW. Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics; 1997:19–33.
29. Aronson AR, Lang F-M. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc 2010;17:229–36.
30. Savova GK, Masanz JJ, Ogren PV, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc 2010;17:507–13.
31. Weeber M, Mork JG, Aronson AR. Developing a test collection for biomedical word sense disambiguation. Proceedings of AMIA Symposium; 2001:746–50.
32. Jimeno-Yepes AJ, McInnes BT, Aronson AR. Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation. BMC Bioinformatics 2011;12:223.
33. McInnes BT. An unsupervised vector approach to biomedical term disambiguation: integrating UMLS and Medline. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Student Research Workshop; Stroudsburg, Pennsylvania, USA: Association for Computational Linguistics, 2008:49–54.
34. U.S. National Library of Medicine. MEDLINE Fact Sheet. http://www.nlm.nih.gov/pubs/factsheets/medline.html (accessed 9 Mar 2012).
35. Pestian JP, Brew C, Matykiewicz P, et al. A shared task involving multi-label classification of clinical free text. Proceedings of ACL BioNLP; Prague, 2007:97–104.
36. Rogers W. Using the MetaMap UIMA Annotator. http://metamap.nlm.nih.gov/README_uima.html (accessed 15 Jul 2012).
