To the Editor:—In the recently published study by Nadkarni et al.,1 the authors used text-mining software to extract concepts from clinical documents and attempted to match these concepts to the UMLS 99 Metathesaurus. They categorized the results for 10,446 terms (8,745 in a “training set” and 1,701 in a “test set”) as true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). True positives were reported as 82.6 percent of the training set and 76.3 percent of the test set.
In 1999, we carried out a nearly identical but larger study using the same version of the UMLS; its results, which were very similar, were presented at the 1999 AMIA Annual Symposium.2 In our study, the 4,994 most frequently referenced terms among 1,000,000 terms randomly extracted from the general Mayo Clinic Master Sheet Index and from the Impression/Report/Plan section of the Mayo Clinic clinical notes system formed a general medicine set. Independently, the Mayo Clinic Department of Dermatology developed a lexicon of 9,050 unique terms describing lesions photographed in its practice; this lexicon formed a specialty-specific set. We used automated term composition and the UMLS to assess match rates. In addition, we examined match rates for all 14,044 terms after filtering by UMLS semantic type.
Comparison of the data from the two studies (Table 1) reveals striking similarities.
Table 1. Comparison of Data

| | Nadkarni et al.1: Training Set | Nadkarni et al.1: Test Set | McDonald et al.2: General Test Set | McDonald et al.2: Dermatology Test Set |
|---|---|---|---|---|
| True positives | 7,227 (82.6%) | 1,298 (76.3%) | 4,213 (84.4%) | 6,947 (76.8%) |
| False positives | 96 (1.1%) | 34 (2.0%) | 509 (10.2%) | 964 (10.7%) |
| True negatives | 1,306 (14.9%) | 257 (15.1%) | 245 (4.9%) | 1,029 (11.4%) |
| False negatives | 116 (1.3%) | 112 (6.6%) | 27 (0.5%) | 110 (1.2%) |
| Totals | 8,745 | 1,701 | 4,994 | 9,050 |
What we recognized in 1999, and what was omitted from the analysis of Nadkarni et al., is that other metrics are important in the clinical interpretation of these data. Representing the data as in Table 1 allows these metrics to be derived. Sensitivity (the true-positive rate) is the number of true positives divided by the sum of true positives and false negatives (TP/[TP+FN]). Specificity (TN/[TN+FP]), positive predictive value (TP/[TP+FP]), and positive likelihood ratio (sensitivity/[1−specificity]) can be calculated in the same way. These metrics show that concept matching in the UMLS is considerably better than the true-positive proportion alone, the only figure quoted by Nadkarni et al., implies (Table 2).
Table 2. Comparison of Metrics

| | Nadkarni et al.1: Training Set | Nadkarni et al.1: Test Set | McDonald et al.2: General Test Set | McDonald et al.2: Dermatology Test Set |
|---|---|---|---|---|
| Sensitivity (%) | 98.4 | 92.1 | 99.4 | 98.4 |
| Specificity (%) | 93.2 | 88.3 | 32.5 | 51.6 |
| Positive predictive value (%) | 98.7 | 97.4 | 89.2 | 87.8 |
| Positive likelihood ratio | 14.4 | 7.8 | 1.5 | 2.0 |
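As a check on the arithmetic, the following minimal Python sketch recomputes the Table 2 metrics from the Table 1 counts using the formulas given above. The names `Counts`, `metrics`, and `TABLE_1` are illustrative only and are not part of either study's software.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for one term set."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def metrics(c: Counts) -> dict[str, float]:
    """Derive the Table 2 metrics from raw counts."""
    sensitivity = c.tp / (c.tp + c.fn)  # TP/(TP+FN)
    specificity = c.tn / (c.tn + c.fp)  # TN/(TN+FP)
    ppv = c.tp / (c.tp + c.fp)          # TP/(TP+FP)
    # Positive likelihood ratio = sensitivity/(1 - specificity); a ratio, not a percentage.
    lr_positive = math.inf if c.fp == 0 else sensitivity / (1 - specificity)
    return {
        "sensitivity_pct": 100 * sensitivity,
        "specificity_pct": 100 * specificity,
        "ppv_pct": 100 * ppv,
        "lr_positive": lr_positive,
    }


# Counts transcribed from Table 1.
TABLE_1 = {
    "Nadkarni training": Counts(tp=7227, fp=96, tn=1306, fn=116),
    "Nadkarni test": Counts(tp=1298, fp=34, tn=257, fn=112),
    "McDonald general": Counts(tp=4213, fp=509, tn=245, fn=27),
    "McDonald dermatology": Counts(tp=6947, fp=964, tn=1029, fn=110),
}

if __name__ == "__main__":
    for name, counts in TABLE_1.items():
        m = metrics(counts)
        print(
            f"{name}: sensitivity {m['sensitivity_pct']:.1f}%, "
            f"specificity {m['specificity_pct']:.1f}%, "
            f"PPV {m['ppv_pct']:.1f}%, LR+ {m['lr_positive']:.1f}"
        )
```

Run as a script, it prints one line per term set. The results agree with Table 2 to one decimal place, except that the Nadkarni et al. test-set likelihood ratio recomputes to about 7.88 rather than 7.8.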
The differences in these metrics between the two studies' data sets relate to the relatively liberal definition of true positives and the relatively strict definition of false positives used by Nadkarni et al. Unlike them, we took negation into account when determining true positives. Their false positives were limited to acronyms, abbreviations, spelling and grammar errors, and proper names, whereas we had a practicing internist judge each “match” as a true positive or a false positive, regardless of term classification.
By applying automated term composition with filters based on the UMLS semantic types, we showed that we could balance sensitivity and specificity to optimize the other metrics (Table 3).
Table 3. Metrics Derived by Use of Semantic Type Filtering

| | General Test Set | Dermatology Test Set |
|---|---|---|
| Sensitivity (%) | 88.1 | 75.2 |
| Specificity (%) | 73.7 | 82.2 |
| Positive predictive value (%) | 98.4 | 95.1 |
| Positive likelihood ratio | 3.34 | 4.22 |
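The filtering procedure itself is not detailed here, so the following Python sketch is only a hypothetical illustration of why a semantic-type filter trades sensitivity for specificity. If a returned match is rejected because its semantic type is not on an allow-list, a correct match becomes a false negative, while an incorrect match becomes a true negative (or a false negative when a correct concept does exist in the vocabulary). The `Term` record, the toy terms and their judgments, and the allow-list are invented; only the semantic type names are real UMLS semantic types. It is not the procedure used to produce Table 3.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Term:
    """A hypothetical reviewer-judged term; fields are illustrative only."""
    text: str
    candidate_type: Optional[str]  # semantic type of the returned concept, or None if nothing matched
    candidate_correct: bool        # reviewer judged the returned concept correct
    concept_exists: bool           # reviewer judged that a correct concept exists in the vocabulary


def classify(term: Term, allowed_types: Optional[set[str]] = None) -> str:
    """Return 'TP', 'FP', 'TN', or 'FN'; allowed_types=None disables filtering."""
    kept = term.candidate_type is not None and (
        allowed_types is None or term.candidate_type in allowed_types
    )
    if kept:
        return "TP" if term.candidate_correct else "FP"
    # No candidate was returned, or the filter rejected it.
    return "FN" if term.concept_exists else "TN"


def tally(terms: Iterable[Term], allowed_types: Optional[set[str]] = None) -> Counter:
    return Counter(classify(t, allowed_types) for t in terms)


def sens_spec(c: Counter) -> tuple[float, float]:
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP); missing keys count as 0."""
    return c["TP"] / (c["TP"] + c["FN"]), c["TN"] / (c["TN"] + c["FP"])


# Invented toy data (hypothetical judgments).
TERMS = [
    Term("left knee pain", "Sign or Symptom", True, True),
    Term("seborrheic keratosis", "Disease or Syndrome", True, True),
    Term("erythema", "Sign or Symptom", True, True),
    Term("biopsy", "Diagnostic Procedure", True, True),
    Term("Dr. Jones", "Population Group", False, False),       # proper name mis-mapped
    Term("as discussed", "Intellectual Product", False, False),
    Term("nummular lesion", None, False, True),                # missed by the matcher
    Term("see above", None, False, False),                     # nothing to match
]

# Hypothetical allow-list of accepted semantic types.
ALLOWED = {"Sign or Symptom", "Disease or Syndrome"}

if __name__ == "__main__":
    for label, allowed in (("unfiltered", None), ("filtered", ALLOWED)):
        counts = tally(TERMS, allowed)
        sens, spec = sens_spec(counts)
        print(f"{label}: {dict(counts)} sensitivity={sens:.2f} specificity={spec:.2f}")
```

On the toy data, filtering lowers sensitivity from 0.80 to 0.60 and raises specificity from 0.33 to 1.00, the same direction of change seen between Table 2 and Table 3.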
Our study was published almost 14 months before that of Nadkarni et al. The data of both studies show that concept indexing with the UMLS is highly sensitive and has a high positive predictive value. Nadkarni et al. did not draw this conclusion, but it merits further analysis as the UMLS, and the algorithms that use it, improve their specificity to match their already excellent sensitivity.
Surely, for a vocabulary to be useful it must evolve and its content must grow. The UMLS now contains more than 700,000 concepts, but it still does not cover all clinically useful terminology. As Cimino3 states, “a formal methodology is needed for expanding content.” Chute et al.4 reinforce this statement with the argument that “in the absence of a single, all-embracing health care terminology, there need to be coordination and organizing support for interrelated terminologies” and that “developers of clinical classifications must consider ways they can develop their systems to become part of an integrated set of terminology systems.”
If terms are added to a vocabulary indiscriminately, however, redundancy and combinatorial explosion may make the vocabulary unwieldy and difficult to search in a timely fashion. “An alternative approach is to enumerate all the atoms of a terminology and allow users to combine them into necessary coded terms, allowing compositional extensibility.”3,5,6 One risk of this approach is that it may make the vocabulary more complex to use.
We hypothesized that automated term composition as developed and tested in a randomized controlled trial7 would allow large-scale coverage of specialty-specific and general local vocabularies. This automated process would facilitate the appropriate inclusion of such terms into a larger vocabulary without creating redundancy.
As we noted in 1999, user-directed composition may allow salvage of many of the false-positive and true-negative matches, thus significantly increasing the incorporation rate.8 True-negative terms that cannot be matched exactly even by user-directed post-coordination of concepts could form a set of candidates for incorporation into larger vocabularies without the risk of redundancy.
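The idea of composing coded terms from atoms can be illustrated with a short Python sketch. It uses a generic greedy longest-match decomposition, which is an assumption chosen for brevity and not the automated term composition method evaluated in reference 7. The atom lexicon and its `ATOM:` identifiers are hypothetical placeholders, not UMLS concept identifiers.

```python
from typing import Optional

# Hypothetical atom lexicon: lowercased phrase -> placeholder identifier.
# These identifiers are invented for illustration and are NOT real UMLS CUIs.
ATOMS = {
    "left": "ATOM:LEFT",
    "knee": "ATOM:KNEE",
    "knee pain": "ATOM:KNEE_PAIN",
    "pain": "ATOM:PAIN",
    "chronic": "ATOM:CHRONIC",
    "erythematous": "ATOM:ERYTHEMATOUS",
    "plaque": "ATOM:PLAQUE",
}

# Longest atom phrase, in words, bounds the look-ahead window.
MAX_PHRASE_WORDS = max(len(k.split()) for k in ATOMS)


def compose(term: str) -> Optional[list[str]]:
    """Greedy longest-match decomposition of a term into known atoms.

    Returns the atom identifiers (a post-coordinated expression) if every
    word is covered, or None if some word has no matching atom.
    """
    words = term.lower().split()
    result: list[str] = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for n in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in ATOMS:
                result.append(ATOMS[phrase])
                i += n
                break
        else:
            return None  # uncovered word: no composition from these atoms
    return result


if __name__ == "__main__":
    for t in ("Chronic left knee pain", "erythematous plaque", "violaceous plaque"):
        print(t, "->", compose(t))
```

A term for which `compose` returns `None`, such as “violaceous plaque” here, contains a word that no existing atom covers; terms of that kind are the ones that could be added to a larger vocabulary without introducing redundancy. Greedy longest match is simple but does not guarantee the best decomposition; a fuller treatment might score all possible segmentations instead.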
Given the large size of both the specialty-specific and the local general terminological corpora used in our study, this method should be generalizable to other local specialty-specific and general terminology sets. These results help solidify the need for compositional mechanisms for terminological representation and show the utility of the considerable synonymy offered by the UMLS. Future research should focus on how to integrate colloquial terminologies such as the UMLS with formal reference terminologies.
References
1. Nadkarni P, Chen R, Brandt C. UMLS concept indexing for production databases: a feasibility study. J Am Med Inform Assoc. 2001;8:80–91.
2. McDonald FS, Chute CG, Ogren PV, Wahner-Roedler D, Elkin PL. A large-scale evaluation of terminology integration characteristics. Proc AMIA Annu Symp. 1999:864–7.
3. Cimino JJ. Desiderata for controlled medical vocabularies in the 21st century. Methods Inf Med. 1998;37:394–403.
4. Chute CG, Cohn SP, Campbell JR. A framework for comprehensive health terminology systems in the United States: development guides, criteria for selection, and public policy implications. J Am Med Inform Assoc. 1998;5(6):503–10.
5. Cote RA, Robboy S. Progress in medical information management: Systematized Nomenclature of Medicine (SNOMED). JAMA. 1980;243:756–62.
6. Evans DA, Rothwell DJ, Monarch IA, Lefferts RG, Cote RA. Towards representations for medical concepts. Med Decis Making. 1991;11:S102–8.
7. Elkin PL, Bailey KR, Chute CG. A randomized controlled trial of automated term composition. Proc AMIA Annu Symp. 1998:765–9.
8. Elkin PL, Mohr DN, Tuttle MS, et al. Standardized problem list generation, utilizing the Mayo canonical vocabulary embedded within the Unified Medical Language System. Proc AMIA Annu Fall Symp. 1997:500–4.
