Abstract
The Unified Medical Language System (UMLS) is being designed to provide uniform access to computer-based resources in biomedicine. For the foreseeable future, the foundation of the UMLS will be a metathesaurus of concepts, synthesized from existing sources, including MeSH, SNOMED, ICD-9-CM, CPT-4, DSM-III and other biomedical nomenclatures and classification systems. In Meta-1, the first version of the Metathesaurus, the synthesis is being implemented using a three-part methodology: 1) Concept names (terms) and intra-source relationships, such as synonymy, have been extracted from each source, and converted to a homogeneous representation; 2) inter-source lexical matches have been used to combine terms from different sources into Metathesaurus entries; and 3) some 30,000 of these entries, those containing MeSH terms and a selected sample of terms from other domains, will be reviewed by humans, enhanced, and modified, as appropriate. This methodology must eventually support incremental development and an audit trail, and it must preserve relationships added during human review. The 30,000 Meta-1 entries will contain in excess of 60,000 biomedical terms, and these terms will participate in more than 100,000 thesaurus relationships. These “normative” relationships will be supplemented by “empirical” relationships computed from certain UMLS resources. The first of the empirical relationships will be counts of the occurrence and co-occurrence of Meta-1 concepts in MEDLINE.
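As an illustration only, steps 1 and 2 of the methodology and the MEDLINE occurrence and co-occurrence counts might be sketched as follows. The abstract does not specify how lexical matches are computed, so the normalization rule here (lowercase, drop punctuation, sort the words) is an assumption, as are the function names, the data layout, and the toy input strings. This is not the Meta-1 implementation.

```python
import re
from collections import Counter
from itertools import combinations


def normalize(term):
    """Assumed lexical key: lowercase, drop punctuation, sort the words."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(words))


def merge_entries(groups):
    """Combine per-source synonym groups into cross-source entries.

    groups: list of (source, [terms]) pairs, i.e. a hypothetical
    homogeneous representation of each source's synonymy (step 1).
    Two groups are merged when any of their terms share a normalized
    key (step 2). Returns a list of entries, each a sorted list of
    (source, term) pairs.
    """
    parent = list(range(len(groups)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # normalized key -> index of first group using it
    for i, (_, terms) in enumerate(groups):
        for term in terms:
            key = normalize(term)
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    entries = {}
    for i, (source, terms) in enumerate(groups):
        entries.setdefault(find(i), set()).update((source, t) for t in terms)
    return [sorted(entry) for entry in entries.values()]


def count_occurrences(citations):
    """Count occurrence and pairwise co-occurrence of concepts.

    citations: iterable of sets of concept identifiers, one set per
    citation (a hypothetical stand-in for indexed MEDLINE records).
    """
    occurrence, cooccurrence = Counter(), Counter()
    for concepts in citations:
        occurrence.update(concepts)
        cooccurrence.update(combinations(sorted(concepts), 2))
    return occurrence, cooccurrence


# Toy input, invented for illustration; not actual source content.
groups = [
    ("MeSH", ["Myocardial Infarction", "Heart Attack"]),
    ("SNOMED", ["Infarction, Myocardial"]),
    ("ICD-9-CM", ["Asthma"]),
]
entries = merge_entries(groups)
occurrence, cooccurrence = count_occurrences(
    [{"C1", "C2"}, {"C1", "C2", "C3"}, {"C3"}]
)
```

In this sketch the MeSH and SNOMED groups fall into one entry because two of their terms share a normalized key, while the third group stays separate. Step 3, human review, has no counterpart here; a real pipeline would also need to record which merges were made automatically so that reviewers' changes and an audit trail can be preserved.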




