Letter. J Am Med Inform Assoc. 2001 Sep-Oct;8(5):512–515. doi: 10.1136/jamia.2001.0080512

UMLS Concept Indexing for Production Databases: A Feasibility Study

Furman S McDonald 1, Peter L Elkin 1
PMCID: PMC131049  PMID: 11522772

To the Editor:—In the recently published study by Nadkarni et al.,1 the authors used text-mining software to extract concepts from clinical documents. Matching of these concepts was attempted with the UMLS 99 Metathesaurus. Matches were then categorized as true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) from 8,745 terms in a “training set” and 1,701 terms in a “test set,” for a total of 10,446 terms. True positives were reported as 82.6 percent for the training set and 76.3 percent for the test set.

In 1999, we carried out an almost identical study on a larger scale, using the same version of the UMLS. It produced very similar results, which were presented at the 1999 AMIA Annual Symposium.2 In our study, 4,994 of the most frequently referenced terms were chosen from 1,000,000 terms randomly extracted from the general Mayo Clinic Master Sheet Index and the Impression/Report/Plan section of the Mayo Clinic clinical notes system, to form a general medicine set. The Mayo Clinic Department of Dermatology independently developed a lexicon of 9,050 unique terms describing lesions photographed in their practice, which formed a specialty-specific set. We used automated term composition and the UMLS to assess match rates. In addition, we examined match rates across all 14,044 terms after filtering by UMLS semantic type.

Comparison of the data from the two studies (Table 1) reveals striking similarities.

Table 1. Comparison of Data

| | Nadkarni et al.1: Training Set | Nadkarni et al.1: Test Set | McDonald et al.2: General Test Set | McDonald et al.2: Dermatology Test Set |
|---|---|---|---|---|
| True positives | 7,227 (82.6%) | 1,298 (76.3%) | 4,213 (84.4%) | 6,947 (76.8%) |
| False positives | 96 (1.1%) | 34 (2.0%) | 509 (10.2%) | 964 (10.7%) |
| True negatives | 1,306 (14.9%) | 257 (15.1%) | 245 (4.9%) | 1,029 (11.4%) |
| False negatives | 116 (1.3%) | 112 (6.6%) | 27 (0.5%) | 110 (1.2%) |
| Totals | 8,745 | 1,701 | 4,994 | 9,050 |

What we recognized in 1999, and what was omitted from the analysis of Nadkarni et al., is that other metrics are important in the clinical interpretation of these data. Representing the data as shown in Table 1 allows for useful combinations. The true-positive rate, or sensitivity, is the number of true positives divided by the sum of true positives and false negatives (TP/[TP+FN]). Specificity (TN/[TN+FP]), positive predictive value (TP/[TP+FP]), and positive likelihood ratio (sensitivity/[1−specificity]) can be calculated similarly. When these combinations are computed, it is evident that concept matching in the UMLS is actually much better than was implied by the true-positive incidence alone, the only figure quoted by Nadkarni et al. (Table 2).

Table 2. Comparison of Metrics

| | Nadkarni et al.1: Training Set | Nadkarni et al.1: Test Set | McDonald et al.2: General Test Set | McDonald et al.2: Dermatology Test Set |
|---|---|---|---|---|
| Sensitivity (%) | 98.4 | 92.1 | 99.4 | 98.4 |
| Specificity (%) | 93.2 | 88.3 | 32.5 | 51.6 |
| Positive predictive value (%) | 98.7 | 97.4 | 89.2 | 87.8 |
| Positive likelihood ratio | 14.4 | 7.8 | 1.5 | 2.0 |
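As an illustration, the figures in Table 2 can be recomputed from the counts in Table 1 using the formulas given above. The following Python sketch is not part of the original letter; the class and function names are assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for one data set (values taken from Table 1)."""
    tp: int
    fp: int
    tn: int
    fn: int


def metrics(c: Counts) -> dict:
    """Apply the formulas quoted in the letter to one set of counts."""
    sensitivity = c.tp / (c.tp + c.fn)   # TP / (TP + FN)
    specificity = c.tn / (c.tn + c.fp)   # TN / (TN + FP)
    ppv = c.tp / (c.tp + c.fp)           # TP / (TP + FP)
    lr_pos = sensitivity / (1.0 - specificity)
    return {
        "sensitivity_pct": 100 * sensitivity,
        "specificity_pct": 100 * specificity,
        "ppv_pct": 100 * ppv,
        "lr_pos": lr_pos,  # a ratio, not a percentage
    }


# Counts copied from Table 1; the dictionary keys are illustrative labels.
TABLE_1 = {
    "Nadkarni training": Counts(tp=7227, fp=96, tn=1306, fn=116),
    "Nadkarni test": Counts(tp=1298, fp=34, tn=257, fn=112),
    "McDonald general": Counts(tp=4213, fp=509, tn=245, fn=27),
    "McDonald dermatology": Counts(tp=6947, fp=964, tn=1029, fn=110),
}

for name, counts in TABLE_1.items():
    row = metrics(counts)
    print(name, {key: round(value, 1) for key, value in row.items()})
```

Computed from unrounded counts, some values may differ from the published table in the last digit; the positive likelihood ratio for the Nadkarni test set, for example, comes out near 7.9 rather than 7.8.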

The differences in these metrics across the data sets are related to the relatively liberal definition of true positives and the relatively strict definition of false positives given by Nadkarni et al. Unlike them, we did take negation into account when determining true positives. Their definition of false positive was limited to acronyms, abbreviations, spelling/grammar errors, and proper names, whereas we had each “match” judged by a practicing internist to make the determination of true positive or false positive, regardless of term classification.

By applying automated term composition with filters based on the UMLS semantic types, we showed that we could balance sensitivity and specificity to optimize the other metrics (Table 3).

Table 3. Metrics Derived by Use of Semantic Type Filtering

| | General Test Set | Dermatology Test Set |
|---|---|---|
| Sensitivity (%) | 88.1 | 75.2 |
| Specificity (%) | 73.7 | 82.2 |
| Positive predictive value (%) | 98.4 | 95.1 |
| Positive likelihood ratio | 3.34 | 4.22 |
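The letter does not describe how the semantic-type filter was implemented. As a rough sketch of the general idea only, the Python below keeps a candidate match when at least one of its semantic types appears in an allow-list. The candidate records, the allow-list, and all function names are hypothetical; only the semantic type labels themselves are real UMLS semantic types.

```python
from typing import Iterable, List, NamedTuple


class Candidate(NamedTuple):
    """A proposed match between a local term and a vocabulary concept."""
    term: str
    concept_name: str
    semantic_types: frozenset


# Hypothetical allow-list; the study's actual choice of types is not given.
ALLOWED_TYPES = frozenset({"Disease or Syndrome", "Sign or Symptom", "Finding"})


def filter_by_semantic_type(
    candidates: Iterable[Candidate],
    allowed: frozenset = ALLOWED_TYPES,
) -> List[Candidate]:
    """Keep candidates that share at least one semantic type with `allowed`."""
    return [c for c in candidates if c.semantic_types & allowed]


# Toy data for illustration only; not drawn from either study.
candidates = [
    Candidate("psoriasis", "Psoriasis", frozenset({"Disease or Syndrome"})),
    Candidate("rochester", "Rochester", frozenset({"Geographic Area"})),
    Candidate("itching", "Pruritus", frozenset({"Sign or Symptom"})),
]

kept = filter_by_semantic_type(candidates)
print([c.term for c in kept])
```

A filter of this kind rejects some matches that would otherwise be accepted, which is consistent with the pattern in Tables 2 and 3 for the general set: sensitivity falls from 99.4 to 88.1 percent while specificity rises from 32.5 to 73.7 percent.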

Our study publication predates that of Nadkarni et al. by almost 14 months. It is clear from the data of both studies that concept indexing with the UMLS is actually highly sensitive, with quite a high positive predictive value. This conclusion was omitted by Nadkarni et al. but is worthy of further analysis as the UMLS, and the algorithms that use it, increase their specificity to match the already quite excellent sensitivity.

Surely, for a vocabulary to be useful it must evolve and its content must grow. The UMLS now contains more than 700,000 concepts, but it still does not cover all clinically useful terminology. As Cimino3 states, “a formal methodology is needed for expanding content.” Chute et al.4 reinforce this statement with the argument that “in the absence of a single, all-embracing health care terminology, there need to be coordination and organizing support for interrelated terminologies” and that “developers of clinical classifications must consider ways they can develop their systems to become part of an integrated set of terminology systems.”

If terms are added to a vocabulary indiscriminately, however, redundancy and combinatorial explosion may make the vocabulary unwieldy and difficult to search in a timely fashion. “An alternative approach is to enumerate all the atoms of a terminology and allow users to combine them into necessary coded terms, allowing compositional extensibility.”3,5,6 One risk of this approach is its potential for making the use of the vocabulary more complex.

We hypothesized that automated term composition as developed and tested in a randomized controlled trial7 would allow large-scale coverage of specialty-specific and general local vocabularies. This automated process would facilitate the appropriate inclusion of such terms into a larger vocabulary without creating redundancy.

As we noted in 1999, user-directed composition may allow salvage of many of the false-positive and true-negative matches, thus significantly increasing the incorporation rate.8 The true-negative terms, which do not yield to user-directed post-coordination of concepts to form a positive exact match, could form a set of terms that could be considered for incorporation into larger vocabularies without the onus of redundancy.

Given the large size of both the specialty-specific and local general terminological corpora used in our study, this method should be generalizable to other local specialty-specific and general terminology sets. These results help solidify the need for compositional mechanisms for terminological representation and show the utility of the considerable synonymy offered by the UMLS. Future research should focus on how to integrate colloquial terminologies such as the UMLS with formal reference terminologies.

References

  • 1. Nadkarni P, Chen R, Brandt C. UMLS concept indexing for production databases: a feasibility study. J Am Med Inform Assoc. 2001;8:80–91.
  • 2. McDonald FS, Chute CG, Ogren PV, Wahner-Roedler D, Elkin PL. A large-scale evaluation of terminology integration characteristics. Proc AMIA Annu Symp. 1999:864–7.
  • 3. Cimino JJ. Desiderata for controlled medical vocabularies in the 21st century. Methods Inf Med. 1998;37:394–403.
  • 4. Chute CG, Cohn SP, Campbell JR. A framework for comprehensive health terminology systems in the United States: development guides, criteria for selection, and public policy implications. J Am Med Inform Assoc. 1998;5(6):503–10.
  • 5. Cote RA, Robboy S. Progress in medical information management: Systematized Nomenclature of Medicine (SNOMED). JAMA. 1980;243:756–62.
  • 6. Evans DA, Rothwell DJ, Monarch IA, Lefferts RG, Cote RA. Towards representations for medical concepts. Med Decis Making. 1991;11:S102–8.
  • 7. Elkin PL, Bailey KR, Chute CG. A randomized controlled trial of automated term composition. Proc AMIA Annu Symp. 1998:765–9.
  • 8. Elkin PL, Mohr DN, Tuttle MS, et al. Standardized problem list generation, utilizing the Mayo canonical vocabulary embedded within the Unified Medical Language System. Proc AMIA Annu Fall Symp. 1997:500–4.

Dr. Nadkarni replies:

Prakash Nadkarni 1

We thank Drs. McDonald and Elkin for pointing out the relevance of their 1999 paper1 to ours.2 (This excellent paper, which we had not read earlier, provided a valuable education for us.) However, we would politely demur to the description of their study (in the second paragraph of their letter) as “almost identical” to ours. The very real differences in our respective study designs and objectives, as we describe below, led us to significantly different conclusions about the utility of the 1999 UMLS for our respective purposes.

The study of McDonald et al. tried to match the contents of two hand-curated vocabularies with the UMLS, using an automated term composition approach. One of these vocabularies was a lexicon developed for dermatology, while the other comprised the 5,000 most common terms extracted from the Mayo clinical notes system. Curation involved correction of obvious spelling errors and removal of duplicates. Their objective was to quantify the value of algorithms for the electronic interrelation of different terminologies, a very valuable objective articulated by Chute et al.3

Our study, on the other hand, tried to match phrases in documents (not necessarily equivalent to terms) to UMLS concepts through an entirely electronic process. We took phrases as they existed in the documents, so malformed phrases or misspellings contributed to failure of concept recognition. Our objective was to determine the potential utility of electronic concept indexing in assisting document retrieval. We therefore had to classify and quantify the types of errors that occur when matching is attempted, in finer detail than merely as “false positives” or “false negatives.” Thus, we used categories such as algorithm failure (phrase too long), proper names confused with concepts, acronyms, and so on.

The first table in the letter, which quantifies TP, FP, TN and FN, is interesting. However, in our work, we had not attempted to quantify “true negatives,” so we had to spend some time trying to determine how McDonald and Elkin obtained the number of 1,306 TNs from our published work. After some guesswork, we finally figured out that this number represented the sum of the following match categories:

| Match category | Count |
|---|---|
| Redundant UMLS concepts | 490 |
| Homonyms | 481 |
| Concepts not in UMLS | 158 |
| General form of concept missing from UMLS | 127 |
| Concept not useful for indexing | 25 |
| Too many non-stop words (algorithm failure) | 25 |
| Total | 1,306 |
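The reconstruction above can be checked with a few lines of arithmetic. This Python sketch simply sums the six category counts listed in the reply and relates the total to the 8,745 training-set terms; the variable names are illustrative.

```python
# Category counts as listed in the reply (training set).
tn_breakdown = {
    "Redundant UMLS concepts": 490,
    "Homonyms": 481,
    "Concepts not in UMLS": 158,
    "General form of concept missing from UMLS": 127,
    "Concept not useful for indexing": 25,
    "Too many non-stop words (algorithm failure)": 25,
}

TRAINING_SET_TERMS = 8745  # total training-set terms reported in Table 1

total = sum(tn_breakdown.values())
share_pct = 100 * total / TRAINING_SET_TERMS

print(total)                 # sum of the six categories
print(round(share_pct, 1))   # share of the training set, in percent
```

The sum matches the 1,306 true negatives attributed to the training set in Table 1, which is about 14.9 percent of the 8,745 terms.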

From the perspective of McDonald and Elkin, which is to facilitate matching of terms between vocabularies using automated term composition, some of these categories (such as “Concepts not in UMLS”) could indeed be classified, for their purpose, as true negatives. However, a failure of our algorithm, which handles a maximum of five non-stop words per phrase, could hardly be classified so charitably. The categorization of homonyms as true negatives is also unduly charitable. A 1994 paper by Rindflesch and Aronson4 shows that it is possible to disambiguate different meanings of a homonym like “immunology,” which can refer to a laboratory test panel or to the study of a biological function, but that this requires natural language processing of the document using numerous special-purpose rules that are specific for individual homonyms.

From our perspective of trying to match phrases as they are encountered in text to UMLS concepts, however, all the above categories must be classified as failures, whether they are true negatives or not. This is because they cause the matching algorithm (more precisely, our present algorithm; better ones will, no doubt, be developed by others) to strike out. Thus, if an isolated homonymous phrase is encountered in a document and cannot be disambiguated, it cannot be indexed. The value of a concept index, whose whole purpose is to make it possible to retrieve a document in response to a query, is therefore reduced. (In such a case, the user must fall back on the word index, if it exists, but then numerous false positives are retrieved.) This is our reason for taking a somewhat more pessimistic view than McDonald and Elkin, as reflected in our conclusions. We reiterate that our guarded view applies to our objective, not to that of McDonald et al.

The similarity of our work to that of McDonald et al., as correctly pointed out in the letter, lies in the matching algorithm, which tries exact matches first and automated term composition later, although McDonald et al. also introduce the innovative step of using UMLS semantic types to assist the matching process. The authors' point that concepts cannot be proliferated endlessly by combination of existing concepts (because of combinatorial explosion) is well made; we fully agree. Eventually, however, the decision to create a compound concept is often a pragmatic one. (Thus, “blood pressure” began as a compound concept, pressure of the blood, but clinicians now consider this concept atomic: high blood pressure, another compound concept, is itself such an important and common condition that even lay persons are supposed to know about it.)
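Neither letter gives the matching algorithm in detail. The Python below is only a schematic of the "exact match first, composition second" idea, using a greedy longest-first composition as a stand-in for whatever the published algorithms actually do. The stop-word list, the toy vocabulary, and the concept identifiers are all hypothetical; the limit of five non-stop words is the one mentioned earlier in this reply.

```python
from typing import Dict, List, Optional

STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "the", "with"})
MAX_NON_STOP_WORDS = 5  # phrase-length limit mentioned in the reply

# Hypothetical toy vocabulary: normalized string -> made-up identifier.
VOCAB: Dict[str, str] = {
    "blood": "X1",
    "pressure": "X2",
    "blood pressure": "X3",
    "high blood pressure": "X4",
    "chronic": "X5",
}


def match_phrase(phrase: str, vocab: Dict[str, str] = VOCAB) -> Optional[List[str]]:
    """Return concept ids covering the phrase, or None if matching fails."""
    normalized = " ".join(phrase.lower().split())

    # Step 1: exact match on the whole phrase.
    if normalized in vocab:
        return [vocab[normalized]]

    # Step 2: composition from sub-phrases, after dropping stop words.
    words = [w for w in normalized.split() if w not in STOP_WORDS]
    if not words or len(words) > MAX_NON_STOP_WORDS:
        return None  # empty or too long: treated as an algorithm failure

    concepts: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest remaining sub-phrase first, then shorter ones.
        for j in range(len(words), i, -1):
            piece = " ".join(words[i:j])
            if piece in vocab:
                concepts.append(vocab[piece])
                i = j
                break
        else:
            return None  # a word could not be covered by any concept
    return concepts


print(match_phrase("High blood pressure"))
print(match_phrase("chronic high blood pressure"))
print(match_phrase("pressure of the blood"))
print(match_phrase("chronic cough"))
```

In this toy setting, "high blood pressure" is found as a single atomic entry, whereas "pressure of the blood" is assembled from two separate concepts, which mirrors the point about compound concepts becoming atomic by convention.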

References

  • 1. McDonald FS, Chute CG, Ogren PV, Wahner-Roedler D, Elkin PL. A large-scale evaluation of terminology integration characteristics. Proc AMIA Annu Symp. 1999:864–7.
  • 2. Nadkarni P, Chen R, Brandt C. UMLS concept indexing for production databases: a feasibility study. J Am Med Inform Assoc. 2001;8:80–91.
  • 3. Chute CG, Cohn SP, Campbell JR. A framework for comprehensive health terminology systems in the United States: development guides, criteria for selection, and public policy implications. J Am Med Inform Assoc. 1998;5(6):503–10.
  • 4. Rindflesch TC, Aronson AR. Ambiguity resolution while mapping free text to the UMLS Metathesaurus. Proc Annu Symp Comput Appl Med Care. 1994:240–4.
