Author manuscript; available in PMC: 2020 Mar 2.
Published in final edited form as: J Biomed Inform. 2019 May 7;94:103193. doi: 10.1016/j.jbi.2019.103193

Alternative Classification of Identical Concepts in Different Terminologies: Different Ways to View the World

Vipina K Keloth 1, Zhe He 2, Gai Elhanan 3, James Geller 1
PMCID: PMC7050413  NIHMSID: NIHMS1562908  PMID: 31048072

Abstract

In previous research, we studied concepts that occur in pairs of medical terminologies and are known to be identical because they have the same ID number in the Unified Medical Language System (UMLS). We observed that such concepts rarely have exactly the same sets of children (=subconcepts) in the two terminologies. The number of common children was found to vary widely. A special situation was identified where the children in one terminology relate to the common parent in a very different way than the children in the other terminology. For example, the children might subdivide a parent concept by anatomical location in one terminology and by disease kind in the other. We coined the term “alternative classification” (of the same parent concept) for such situations. In previous work, only human experts could recognize alternative classifications. In this paper, we present a mathematically expressed criterion for likely cases of alternative classifications. We compare the recommendations of this criterion, expressed by a mathematical quantity called “EFI” becoming zero, with the decisions of a human expert. The human expert agreed with the criterion in 72% of all cases, a substantial improvement over having no computable criterion at all. Besides alternative classifications, common parent concepts in a pair of terminologies might also indicate a possible import of a child concept missing in one terminology, different granularities, or errors in either one of the two terminologies. In this paper, we further investigate different kinds of alternative classifications.

Keywords: Ontologies and Terminologies, UMLS, NCIt, MEDCIN, Concept import, Alternative Classification

1. Introduction

Over the past decades, the biomedical research community has made tremendous efforts in developing ontologies and terminologies that encode biomedical knowledge about entities and their relationships to each other [1]. A large amount of effort has also gone into providing the right infrastructure for their maintenance and enabling interoperability among the terminologies. The Unified Medical Language System (UMLS) [2] was designed with the aim to develop multi-purpose tools that would enable effective retrieval of information and understanding of the meaning of different entities across different ontologies and terminologies. The UMLS Metathesaurus has over 200 general and specialized biomedical terminologies in about 25 different languages; however, the majority of these terminologies are in English. Terminologies are used in clinical studies, research, public health reporting, for administrative purposes, etc. The concepts in these terminologies are organized in hierarchical structures based on their relationships to one another. Synonymous terms are clustered into a unique concept, identified by a Concept Unique Identifier (CUI). The source information for a concept that appears in different UMLS terminologies is preserved with the help of Atom Unique Identifiers (AUIs) and source abbreviations (SAB) [2].

Some of the sources in the UMLS qualify as ontologies, while others do not. These distinctions have been ably discussed by other researchers [3–5] and are not the topic of this paper. Thus, we will use the words “terminology” and “ontology” in an inclusive sense for the content of the UMLS, whether a source is an ontology or “only” a terminology (vocabulary,…). When a source defines itself as an ontology, such as SNOMED CT [6], we will prefer to use “ontology.” Terminologies vary from one another in several aspects, such as the domain or subject area they cover, the level of abstraction, the level of detail, the modeling philosophy and the language from which their terms are taken. For example, the National Cancer Institute thesaurus (NCIt) [7] is a reference terminology that includes broad coverage of the cancer domain, whereas the Gene Ontology (GO) [8] is an ontology for describing genes and their functions. Even though terminologies differ in their domains, there is significant overlap in the conceptual content among many pairs of terminologies. For instance, according to the 2018AA version of the UMLS, NCIt has a 20.9% overlap with MeSH [9] and an 18.3% overlap with SNOMED CT.

In our previous research [10–13] we studied this overlap in detail and found that there are substantial differences in the density of the conceptual content among different ontologies. To be precise, we found that when we constrained the parent-child hierarchy by two concepts that are common in pairs of ontologies, the lengths of the paths between those concepts often differ. This difference in vertical density raised the question as to whether the concept(s) present in one ontology are missing in the other ontology. We defined specific topological patterns called Cross-Ontology Diamonds to describe this problem [14].

For example, Figure 1 shows a case of a 2/1 diamond. The “anchor” concepts Malignant neoplasm of ovary and Small cell carcinoma of ovary are present in both MEDCIN and NCIt. There are two intermediate concepts, namely Malignant ovarian surface Epithelial-Stromal Tumor and Epithelial ovarian cancer, in between the anchor concepts in NCIt, whereas there is only one intermediate concept Ovarian Carcinoma in MEDCIN. The question arises, whether the two intermediate concepts in NCIt are missing from MEDCIN and whether they should be imported into MEDCIN. We previously “mined” diamonds from different ontologies in the UMLS for importing concepts into NCIt and SNOMED CT [10, 12, 14].

Figure 1.

An example of a 2/1 Cross-Ontology Diamond.

In later research, we relaxed the vertical constraint and addressed the problem of horizontal density differences [15]. We posed the following question: “Given a concept occurring in two different ontologies with some common children, is there a possibility of importing children from one ontology into the other?” We developed algorithms that suggest concepts that are potential imports as children of the common parent concept.

Figure 2 shows an example of a horizontal density difference between NCIt and MEDCIN. The concept Testosterone is present in both NCIt and MEDCIN. There are four child concepts that are common between these two terminologies, some of which are shown in Figure 2 (in matching color boxes). The concept Testosterone methyl, which is a child of Testosterone in MEDCIN, does not exist anywhere in NCIt, i.e., neither as a child of Testosterone nor anywhere else.

Figure 2.

An example of identical concepts in two different terminologies with overlapping children.

Additionally, we have observed examples where a concept existing in two terminologies has children in both, yet these sets of children do not have a single child in common. Figure 3 shows an example of such a case. Based on our previous research, this is an unexpected situation. In such cases, it would appear that the two concepts are actually quite different in the two terminologies, even though they have the same CUI. Thus, this extreme difference is a possible indicator of an “alternative classification” or an error. While the discovery of such an alternative classification candidate or error candidate should be made based on a formula/algorithm, the final interpretation and decision must always be made by a human expert. We will define “alternative classification” below and develop an algorithm for recognizing likely alternative classifications.

Figure 3.

An example of identical concepts in two different terminologies with no common children.

Figure 3 exemplifies a case where the two occurrences of the same concept in two different terminologies have no common children. The concept Benign Nasopharyngeal Neoplasm has three children in NCIt and four children in MEDCIN. The three child concepts present in NCIt do not exist anywhere in MEDCIN and similarly the four child concepts in MEDCIN do not exist anywhere in NCIt.

The three child concepts in NCIt specialize the parent concept Benign Nasopharyngeal Neoplasm by disease kind (polyp, squamous papilloma and angiofibroma), whereas the four child concepts in MEDCIN specialize Benign Nasopharyngeal Neoplasm by anatomical location (superior wall, posterior wall, etc.). These two different classifications, which arise because of the difference in the modeling philosophies of the two terminologies, constitute what we call an alternative classification. Although these two hierarchies are individually correct, merging them in a naïve way would lead to a loss of structural information.

In this paper our goals are to 1) develop a method to identify highly likely cases of alternative classifications algorithmically; 2) develop a metric that identifies concepts with children that are highly likely to be proposed for import from one terminology into the other; 3) compare how different ranges of the metric might affect the number of concepts that should be considered for import; and 4) analyze different cases of alternative classifications in more detail.

2. Background

2.1. UMLS

The Unified Medical Language System (UMLS), initiated in 1986, was designed by and is maintained by the US National Library of Medicine [16]. It brings together many health and biomedical vocabularies and standards to enable interoperability and can be used to enhance and develop applications, such as Electronic Health Records, classification tools, dictionaries and language translators. The UMLS has three components, namely the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon with Lexical Tools. The UMLS Metathesaurus contains terms and codes from over 200 vocabularies, including CPT, ICD-10, LOINC, MeSH, RxNorm and SNOMED CT. Synonymous terms are grouped into concepts identified by a Concept Unique Identifier (CUI). Different concepts are linked to each other by means of various types of relationships and each relationship is identified by a Relationship Unique Identifier (RUI). The relationships in the UMLS include PAR (has parent), CHD (has child), RB (has broader relationship), SIB (has sibling), etc. We used the 2017AB release of the UMLS and focused on the PAR relationship with the inverse_isa annotation for this study.
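As a rough illustration of the relationship filter described above, the following sketch keeps only PAR rows annotated with inverse_isa. It assumes the pipe-delimited column order of the UMLS MRREL.RRF file and assumes that, in a PAR row, the second concept (CUI2) is the parent of the first (CUI1); both assumptions should be checked against the UMLS documentation for the release in use. The function name, the sample rows and the identifiers are made up for illustration.

```python
# Assumed MRREL.RRF column order (pipe-delimited):
# CUI1|AUI1|STYPE1|REL|CUI2|AUI2|STYPE2|RELA|RUI|SRUI|SAB|...
CUI1, REL, CUI2, RELA, SAB = 0, 3, 4, 7, 10


def isa_parent_rows(lines, sabs):
    """Yield (sab, child_cui, parent_cui) for PAR rows annotated inverse_isa."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= SAB:
            continue  # skip malformed rows
        if (
            fields[REL] == "PAR"
            and fields[RELA] == "inverse_isa"
            and fields[SAB] in sabs
        ):
            # Assumption: in a PAR row, CUI2 is the parent of CUI1.
            yield fields[SAB], fields[CUI1], fields[CUI2]


# Made-up rows; the identifiers are placeholders, not real UMLS content.
sample_lines = [
    "C0000001|A1|AUI|PAR|C0000002|A2|AUI|inverse_isa|R1||NCI|NCI|||N||",
    "C0000001|A1|AUI|PAR|C0000003|A3|AUI||R2||NCI|NCI|||N||",
    "C0000004|A4|AUI|CHD|C0000001|A1|AUI|isa|R3||MEDCIN|MEDCIN|||N||",
]
filtered = list(isa_parent_rows(sample_lines, {"NCI", "MEDCIN"}))
print(filtered)
```

Only the first sample row survives: the second lacks the inverse_isa annotation and the third is a CHD row.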

2.2. NCIt

The National Cancer Institute Thesaurus (NCIt) is a reference terminology and biomedical ontology that mainly covers vocabulary for cancer-related clinical care and administrative activities. NCIt has been developed by NCI Enterprise Vocabulary Services (EVS) to facilitate standardization of terminology use across the biomedical community. NCIt is updated monthly and contains 129,111 concepts in the 2017AB version of the UMLS.

Medical terminologies are constantly evolving with concepts being added, updated and sometimes removed. NCIt is maintained by a multidisciplinary team of editors, who in the past have added about 700 new entries each month. NCIt also provides facilities that allow users to place requests for adding, updating and retiring concepts and relationships [17].

2.3. MEDCIN

MEDCIN was created and is maintained by Medicomp Systems Inc. It is a medical terminology that encompasses symptoms, tests, diagnoses, physical examinations, etc. It was designed to allow for rapid entry, retrieval and correlation of relevant clinical information at the point of care, to enable applications to store medical information as coded data elements and to produce narrative reports from the same data. In the 2017AB release of the UMLS, MEDCIN has 335,627 concepts. MEDCIN has about 3% source overlap with NCIt and about 12.9% with SNOMED CT.

3. Related Work

A large number of ontology matching [18], ontology mapping [19] or alignment issues that hinder semantic harmonization and interoperability are caused by the modeling policies adopted by different ontologies and by the level of detail they require when representing the knowledge in the same domain to support their applications. Literature in this area uses either “granularity” or “density” to denote this difference in the level of detail in different ontologies. Rector et al. [20] defined density as “The number of semantically ‘similar’ concepts in a particular conceptual region” and further stated that “High local density in an ontology usually co-occurs with high levels of specialisation and degree of detail, …”. Weng and Fridsma [21] proposed a conceptual design for collaborative semantic harmonization that identified three key design principles, namely, reuse, collaboration and harmonization as modeling.

3.1. Ontology alignment/matching:

Ontology alignment or ontology matching, the process of finding correspondences between concepts in ontologies, has been studied for a long time [22–24]. Most research in this field focuses on methods for finding simple 1-to-1 correspondences between concepts in two ontologies. The Ontology Alignment Evaluation Initiative (OAEI) works towards achieving consensus for evaluation of these methods. Only very few alignment systems have focused on finding complex correspondences. Recently, Zhou et al. proposed a complex alignment benchmark based on the real-world GeoLink dataset [25]. The alignments in this dataset not only cover 1:1 correspondences, but also contain 1:n and m:n complex relations. They identified 12 different kinds of simple and complex correspondence patterns and made the alignments available in both rule and EDOAL syntax. Oliveira and Pesquita proposed a set of algorithms to create ternary compound alignments (compound matching of three distinct ontologies) for large biomedical ontologies [26]. Another work, by Stoilos et al., addresses ontology integration based on mapping, repair and conservative alignment and focuses on building a framework that integrates a number of medical ontologies to support real-world health care services [27]. The integration starts with a seed ontology, to which new ontologies are added to enrich and extend it; the authors also developed algorithms to deal with structural incompatibilities.

3.2. Granularity differences in UMLS sources and ontology enrichment:

The research that explored granularity differences among ontologies in UMLS as a tool to enhance the conceptual content of the ontology mainly uses a rule-based approach or a topological-pattern-based approach. As an example of the rule-based approach, Sun and Zhang [28] identified granularity differences as well as similarities between large biomedical ontologies through rules. They investigated the examples of correspondence across two anatomical ontologies, synthesized patterns and constructed rules that were then fed into a rule inference engine to distinguish among different subclasses and classifications. In other work with this approach, the same authors [29] conducted a parallel study to construct rules to systematically identify the differences as well as similarities in a partonomy (a hierarchy of part-whole relationships) between two biomedical ontologies instead of using IS-A relationships. The rules were constructed by manual inspection and hence there is a chance that they do not cover all the cases of structural incompatibility. The main limitation of this approach is that every time a new mismatch pattern is identified new rules need to be added.

The topological-pattern-based approach mainly emphasizes identifying the vertical density differences among different biomedical ontologies. In this work, He et al. [13] used “structurally congruent concepts” in pairs of terminologies as a method for harmonizing them. They used six UMLS terminologies to pair with SNOMED CT in their study. The structurally congruent concepts were interpreted in six possible ways, including alternative classification, synonyms, structural errors, etc. This work was extended to identify the concepts that could enrich the conceptual content of SNOMED CT [12]. More complex topological patterns like m:n trapezoids were extracted with the help of the proposed trapezoid identification algorithm, and potential concepts for inclusion as parents, children, synonyms, etc. into SNOMED CT were identified. Analogous methods were tested to locate potentially missing concepts for NCIt, using eight source terminologies from the UMLS [10]. The usefulness of the NCI Metathesaurus instead of the UMLS Metathesaurus for enriching NCIt was also studied in detail [11].

The main limitation of the studies that use topological patterns is that although the process of identifying the potential missing concepts is automated, it still requires a human expert to review the suggestions and decide whether to import a concept or not. A study was conducted to estimate the difficulty of this task for a domain expert; it showed that the task remains challenging, even with the support of an algorithm that offers suggestions for import [14, 30].

Luo et al. [31] proposed a method for evaluating the granularity balance of IS-A and part-of relationships within a biomedical ontology. They used “parallel concept sets (PCS)” (two concepts that share a similar level of conceptual knowledge) and the length and strength of the paths between the PCSs to design evaluation models to improve the quality of one ontology.

4. Methods

4.1. Terminology Selection

We started by identifying terminologies in the UMLS that can be used as source terminologies to supply possible concepts for import into target terminologies. For this we used the following criteria:

  1. Only English terminologies could be processed, due to our own linguistic limitations.

  2. Only terminologies with an IS-A backbone support our hierarchy-based methodology. A necessary but not sufficient condition for this is that the terminology uses UMLS PAR(ent) relationships.

  3. PAR relationships are not always IS-A relationships, thus as an additional condition, only PAR relationships that are marked with inverse_isa annotations guarantee that IS-A relationships are expressed.

  4. If two terminologies have substantial overlap due to common ancestry and/or common domain, then only one of them will be chosen.

  5. The chosen terminologies should not have been the subject of our previous investigation on horizontal density [15].

  6. The research in this paper is oriented towards terms used in cancer care.

  7. As we are processing pairs of terminologies, interesting work can only be done if there is substantial overlap between the two terminologies of a pair.

In addition to NCIt and SNOMED CT there were ten terminologies/ontologies that satisfied the first three criteria, namely the Anatomical Therapeutic Chemical Classification System (ATC), Medical Entities Dictionary (CPM), Current Procedural Terminology (CPT), Foundational Model of Anatomy Ontology (FMA), Gene Ontology (GO), Human Phenotype Ontology (HPO), MEDCIN, The Veterinary Extension to SNOMED CT (SNOMEDCT_VET), Universal Medical Device Nomenclature System (UMD) and University of Washington Digital Anatomist (UWDA). Out of the ten terminologies/ontologies filtered by this process, we omitted two ontologies (SNOMEDCT_VET and UWDA) that are for the most part subsets of other ontologies in the list, satisfying the condition of the fourth criterion. We have worked extensively with SNOMED CT in our previous research on horizontal density differences [15], thus we need to exclude it by the fifth criterion. So far, we have only ruled out terminologies, but by the sixth criterion we need to rule in NCIt as a target of this research. Together with the overlap requirement of the seventh criterion, that leaves only the pair of MEDCIN and NCIt as a topic for the current research.

4.2. Algorithm Derivation

4.2.1. Previous Algorithm for Import Criteria

A detailed description of the previous algorithm that we used to import missing child concepts from a source into a target terminology can be found in our previous paper [15]. We created a multi-level “dictionary” (key-value pairs in the Python programming language), where each terminology has a sub-dictionary with all the concepts in it that have a parent-child relationship. Each concept has another sub-dictionary with all its parents and its children. This structure enables easy access to any concept and its parents and children in any of the terminologies.
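The multi-level dictionary described above might be sketched as follows. This is a minimal illustration, not the authors' implementation: the input format (tuples of source abbreviation, child, parent), the key names "parents" and "children", and all identifiers are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical input rows: (terminology, child_concept, parent_concept).
# The identifiers are made-up placeholders, not real UMLS CUIs.
rows = [
    ("NCI", "C_child_a", "C_parent"),
    ("NCI", "C_child_b", "C_parent"),
    ("MEDCIN", "C_child_a", "C_parent"),
    ("MEDCIN", "C_child_c", "C_parent"),
]


def build_hierarchy(rows):
    """Return {terminology: {concept: {"parents": set, "children": set}}}."""
    hierarchy = defaultdict(
        lambda: defaultdict(lambda: {"parents": set(), "children": set()})
    )
    for sab, child, parent in rows:
        # Record the link in both directions so that either end can be looked up.
        hierarchy[sab][child]["parents"].add(parent)
        hierarchy[sab][parent]["children"].add(child)
    return hierarchy


hierarchy = build_hierarchy(rows)
print(sorted(hierarchy["MEDCIN"]["C_parent"]["children"]))
```

Using sets for parents and children makes the later comparisons between terminologies (common and missing children) simple set operations.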

Our goal was to suggest missing child concepts for import into SNOMED CT and NCIt (target terminologies) from all source terminologies selected based on criteria similar to those discussed above. For each pair of source and target terminologies, we found all the concepts that are common in the target and the source terminology. For each of the common concepts, we then found all its children in both the source and target terminologies and the children that are common between the two terminologies. We finally identified all child concepts that are present in the source terminology but are missing from the target terminology. To suggest concepts that have higher potential for import we computed a metric J, which was calculated as the ratio of the number of children that are common in both the terminologies to the number of children in the source terminology.
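The comparison for a single common parent can be sketched with set operations. The function name and the child identifiers below are hypothetical; only the definition of J (common children divided by source children) is taken from the text.

```python
def import_candidates(source_children, target_children):
    """Compare the children of one common parent in source and target."""
    common = source_children & target_children
    missing = source_children - target_children  # in source, not under P in target
    # J as described in the text: common children / source children.
    j = len(common) / len(source_children) if source_children else None
    return common, missing, j


# Hypothetical child sets for a single shared parent concept.
source_children = {"c1", "c2", "c3", "c4"}
target_children = {"c1", "c2", "c5"}
common, missing, j = import_candidates(source_children, target_children)
print(sorted(common), sorted(missing), j)
```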

4.2.2. Improved Algorithm and Revised Metric

To compute alternative classification criteria with NCIt as the target and MEDCIN as the source terminology, we modified the previous algorithm to overcome an issue identified in prior work. A child concept that is present in the source terminology, but missing from the target terminology as an immediate child of the common parent, could still exist somewhere else in the target terminology. To handle this, we verified that each of the identified missing child concepts does not exist anywhere else in the target terminology by searching the target terminology’s sub-dictionary. We also completely revised the metric J to cover the different cases more precisely, as described in this section.
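The additional existence check can be sketched as a membership test against all concepts of the target terminology. In this sketch, `target_concepts` stands in for the keys of the target terminology's sub-dictionary; the function name and the data are hypothetical.

```python
def truly_missing(source_children, target_children, target_concepts):
    """Children of P in the source that occur nowhere in the target."""
    not_direct_children = source_children - target_children
    return {c for c in not_direct_children if c not in target_concepts}


# Hypothetical data: "c3" exists in the target, but not as a child of P.
target_concepts = {"p", "c1", "c2", "c3", "x"}
source_children = {"c1", "c3", "c4"}
target_children = {"c1", "c2"}
result = truly_missing(source_children, target_children, target_concepts)
print(result)
```

Here only "c4" remains a candidate, because "c3" already exists elsewhere in the target.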

With all the common concepts in NCIt and MEDCIN identified, as well as their children in both these terminologies along with all the common children and the child concepts missing from NCIt, we proceeded to develop a formula that computes the evidence for import (EFI). The goal was to distinguish likely cases of alternative classifications from cases in which concepts should be imported from the source terminology into the target terminology without further consideration.

We assumed a concept P that is the same in the source and target terminologies, by the authority of the UMLS Concept Unique Identifier (CUI), as discussed above. Intuitively, the more common children Psource and Ptarget have, the higher the evidence that the remaining children that are not common should be imported. Thus, we first compute

$$\widehat{EFI} = \frac{C_{common}}{C_{source}} \qquad (1)$$
$$C_{source} > 0 \qquad (2)$$

whereby Ccommon is the number of concepts that are children of P in both the source and the target terminology, and Csource is the number of concepts that are children of P in the source terminology. Note that EFI^ is a number between 0 and 1.

For example, having 11 children in the source and 8 common children is higher evidence for merging than having 11 children in the source and only 5 common children. In other words, we need to compute the ratio of the number of common children and source children, assuming the parent is the same in both terminologies.
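The ratio from equation (1) and the comparison in the example above can be checked with a few lines. This is only a sketch; the function name `efi_hat` is chosen for this illustration.

```python
from fractions import Fraction


def efi_hat(c_common, c_source):
    """Ratio of common children to source children, as in equation (1)."""
    if c_source <= 0:
        raise ValueError("c_source must be positive (condition 2)")
    if not 0 <= c_common <= c_source:
        raise ValueError("c_common must lie between 0 and c_source")
    return Fraction(c_common, c_source)


# 8 of 11 common children is stronger evidence than 5 of 11.
stronger = efi_hat(8, 11)
weaker = efi_hat(5, 11)
print(stronger, weaker, stronger > weaker)
```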

When the number of common children becomes “almost” equal to the number of source children, then the evidence should become “almost” 1. When the number of common children becomes equal to the number of source children, then the evidence becomes 1, but then there are no concepts left to import. Thus, this is an uninteresting case and we will assume that the number of common children is strictly smaller than the number of source children.

$$C_{common} < C_{source} \qquad (3)$$

However, the evidence for import should also be based on the total number of children, not just on the ratio. For example, if there are 70 common children out of 110 source children, this should provide stronger evidence than if there are only 7 common children out of 11 source children, even though the ratio would be the same.

Therefore, we need an additional corrective factor F. For this purpose, we first introduce the parameter MAX, which is the largest number of children that any single parent concept P has in the source terminology. (There might be several such parents with equally high numbers of children.)

Let #C (Pi, S) be the number of children of parent Pi in the source terminology S, then

$$MAX = \#C(P, S) \ \text{ such that } \ \forall i: \ \#C(P, S) \ge \#C(P_i, S) \qquad (4)$$

Thus, we compute

$$\frac{MAX - C_{common}}{MAX} \qquad (5)$$

which becomes smaller when there are more children in common between source and target terminology. As we want to increase the evidence when there are many common children, we compute the following corrective factor F instead:

$$F = 1 - \frac{MAX - C_{common}}{MAX} = \frac{C_{common}}{MAX} \qquad (6)$$

The problem is that this corrective factor F becomes too small for terminologies containing concepts with many children. As F is between 0 and 1, a convenient way to make it larger but keep it in the same range [0,1] is to apply the square root to it.
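The effect of the square root can be illustrated with made-up numbers. In this sketch the source sub-dictionary, the parent names and the count of common children are all hypothetical; MAX and F follow equations (4) and (6).

```python
import math

# Hypothetical source sub-dictionary: parent -> set of children.
source_children = {
    "p1": {f"p1_c{i}" for i in range(200)},  # one parent with very many children
    "p2": {"a", "b", "c", "d"},
    "p3": {"e"},
}

# MAX: the largest number of children of any parent in the source (equation 4).
MAX = max(len(children) for children in source_children.values())

c_common = 2  # hypothetical number of common children for parent p2
F = c_common / MAX  # corrective factor from equation (6)
print(MAX, F, math.sqrt(F))
```

With one parent of 200 children, F for a parent with two common children is tiny, while its square root is an order of magnitude larger and still lies within [0,1].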

Thus, applying the square root to F (6) and multiplying with (1), observing the conditions (2) and (3) we get as Evidence for Import:

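$$EFI = \begin{cases} \widehat{EFI} \cdot \sqrt{\dfrac{C_{common}}{MAX}} & \text{when } C_{common} < C_{source} \text{ and } C_{source} > 0 \\ \text{undefined} & \text{otherwise} \end{cases} \qquad (7)$$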

By the above definition, the value of EFI^ is in the interval [0,1). The corrective factor F can only become 1 if Ccommon = Csource, which we excluded, and therefore F is also in the range [0,1). Thus, the value of EFI will always be in the range [0,1). In the extreme case, when there are no common concepts, EFI becomes zero. As noted above, when there are no children in common, this appears to be an indicator of an alternative classification or an error. In cases when EFI is closer to 0 it is more likely to indicate a mix of imports, alternative classifications and errors. When the value of EFI becomes closer to 1, it most likely indicates a possible import. We will formulate three Hypotheses making the above ideas more precise.
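Equation (7) can be sketched as a small function. Returning `None` for the undefined case and the value of MAX used below are choices made for this illustration only.

```python
import math


def efi(c_common, c_source, max_children):
    """Evidence for import as in equation (7); None where it is undefined."""
    if c_source <= 0 or c_common >= c_source:
        return None  # conditions (2) and (3) are not met
    efi_hat = c_common / c_source
    return efi_hat * math.sqrt(c_common / max_children)


MAX = 200  # hypothetical maximum number of children in the source

# Same ratio (7/11), but more children overall gives stronger evidence.
large = efi(70, 110, MAX)
small = efi(7, 11, MAX)
print(large > small)

# No common children: EFI is zero, the candidate signal for an
# alternative classification or an error.
print(efi(0, 5, MAX))
```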

4.3. Sample Preparation

We computed the value of EFI^ and EFI for all concepts present in both NCIt and MEDCIN. A total of 1049 identical parent concepts in both MEDCIN and NCIt did not have any overlapping children and hence had their EFI = 0. The remaining 917 identical parent concepts had an EFI value >0 and <1. We created three samples for the domain expert (GE) to review.

Sample 1:

Sample 1 consisted of 50 parent concepts randomly chosen from those parent concepts that have no common children between the two terminologies (i.e., Ccommon = 0).
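Drawing such a random sample could look like the following sketch. The mapping `efi_by_parent`, the function name and the fixed seed are assumptions for illustration; the source does not state how the random choice was implemented.

```python
import random


def draw_zero_efi_sample(efi_by_parent, size, seed=0):
    """Randomly pick up to `size` parents whose EFI equals zero."""
    zero_parents = sorted(p for p, value in efi_by_parent.items() if value == 0)
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(zero_parents, min(size, len(zero_parents)))


# Hypothetical EFI values; None marks parents for which EFI is undefined.
efi_by_parent = {"a": 0, "b": 0.2, "c": 0, "d": 0, "e": None}
sample_1 = draw_zero_efi_sample(efi_by_parent, size=2)
print(sample_1)
```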

For each of the 50 concepts in Sample 1 (i.e., with no common children) we listed all the children of each parent concept (P) in both the source and target terminologies. We then presented the following four choices to the domain expert.

  1. The children in source and target terminologies form an alternative classification.

  2. Error in the source terminology – one or more children in the source terminology should not be children of P.

  3. Error in the target terminology – one or more children in the target terminology should not be children of P.

  4. A case of a finer level of detail in the source or target terminology – Suppose the parent concept (P) is Thoracic spinal ganglion. In the source terminology the immediate child concepts are T1 spinal ganglion, T2 spinal ganglion…, T12 spinal ganglion, whereas in the target terminology the immediate child is Variant thoracic spinal ganglion, while T1 spinal ganglion…, T12 spinal ganglion are listed as children of Variant thoracic spinal ganglion. This is a case of a finer level of detail in the target terminology.

These options are not mutually exclusive. It is possible that for a parent concept P there is an error in both the source and target terminology. Thus, we asked the domain expert to consider more than one choice if necessary.

Sample 2:

We ordered parent concepts according to their EFI value in decreasing order. Then we selected the top 50 parent concepts as Sample 2. We randomly selected one missing child concept for each of these 50 parent concepts and made the child concepts part of the sample. We performed a randomized controlled trial (RCT) for Sample 2. For this, we created a control group with the same 50 parent concepts as in Sample 2. For each of these parent concepts (P) we found a sibling of P and chose one of its children, which therefore is the cousin of P’s children, and included it into the control group. Thus, we have two groups for our RCT experiment – the experimental group and the control group. Figure 4 shows this selection process for one parent concept P. The order of the concepts in the two groups was randomized. When the domain expert is presented with two concepts for each parent concept (one suggested by the algorithm and the other the cousin of P’s children; separated due to the randomization) s/he should be able to distinguish whether there should exist IS-A links between these two concepts and the parent concept P.
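The choice of a control concept (a cousin of P's children) can be sketched as below. This is an illustration under assumptions: the hierarchy layout, the function name and the concept names are hypothetical, and the source does not specify which terminology supplies the sibling. Children of P are excluded so that the control concept is a cousin and not also a child.

```python
import random


def pick_cousin(parent, hierarchy, rng):
    """Pick a child of a sibling of `parent` that is not itself a child of `parent`."""
    own_children = hierarchy[parent]["children"]
    siblings = set()
    for grandparent in hierarchy[parent]["parents"]:
        siblings |= hierarchy[grandparent]["children"]
    siblings.discard(parent)
    cousins = set()
    for sibling in siblings:
        cousins |= hierarchy.get(sibling, {}).get("children", set())
    cousins -= own_children  # a concept may be both cousin and child; drop those
    cousins.discard(parent)
    return rng.choice(sorted(cousins)) if cousins else None


# Hypothetical hierarchy: G has children P and S; "b" is a child of both.
hierarchy = {
    "G": {"parents": set(), "children": {"P", "S"}},
    "P": {"parents": {"G"}, "children": {"a", "b"}},
    "S": {"parents": {"G"}, "children": {"b", "c"}},
}
control = pick_cousin("P", hierarchy, random.Random(0))
print(control)
```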

Figure 4.

Selecting experimental and control group concepts for a parent concept P.

Sample 3:

From the ordered list of parent concepts, based on the EFI values in decreasing order, we selected the bottom 50 concepts with an EFI value > 0. As before, we then added one randomly chosen missing child concept suggested by the algorithm for each parent concept to the sample.
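The ranking-based selection used for Samples 2 and 3 can be sketched as follows. The function name and the EFI values are hypothetical, and k is assumed to be positive; with fewer than 2k ranked parents the two selections would overlap.

```python
def top_and_bottom(efi_by_parent, k):
    """Return the k parents with the highest EFI and the k with the lowest EFI > 0."""
    ranked = sorted(
        (p for p, value in efi_by_parent.items() if value is not None and value > 0),
        key=lambda p: efi_by_parent[p],
        reverse=True,
    )
    return ranked[:k], ranked[-k:]


# Hypothetical EFI values; zero and undefined (None) entries are excluded.
efi_by_parent = {"a": 0.9, "b": 0.0, "c": 0.1, "d": 0.5, "e": None, "f": 0.3}
sample_2, sample_3 = top_and_bottom(efi_by_parent, k=2)
print(sample_2, sample_3)
```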

We observed that the number of common children in Sample 3 is small and in 47 of the cases, there is only one common child between the source and target terminology. The maximum number of common children in Sample 3 was four, which was observed in one case. Also, as discussed before, this sample is likely to have errors or cases of alternative classifications. Taking this into account we presented the domain expert with the following four choices for a parent concept P.

  1. The suggested child concept should be imported into the target terminology.

  2. The suggested child concept should not be imported into the target terminology.

  3. Error in the source terminology – the suggested child concept should not be a child of the parent concept P.

  4. The common children and suggested child concept form an alternative classification.

As for Sample 2, we performed an RCT for Sample 3. We created two groups, the experimental group and the control group, in the same way as for Sample 2 (Figure 4).

For both Sample 2 and Sample 3, we also provided the domain expert with all the common children from the source and target terminologies, separately for each parent concept, as part of the sample for a better understanding of the context. We note that it is difficult to find domain experts who are willing to analyze samples of hundreds of concepts.

Based on our previous work [15] and preliminary research we formulated the following hypotheses.

Concerning Hypothesis 1, in preliminary research it was observed that for the case of no common children “many” parents defined alternative classifications. To quantify “many,” we note that there are two cases to make this observation practically applicable. We can distinguish between a case of absolute majority, where there are more alternative classifications than other choices, or a relative majority, where there are more alternative classifications than cases of the most common second category (error, import, or finer level of detail). We will formulate Hypothesis 1 for the stronger case.

Hypothesis 1:

For parent concepts with EFI=0 (Ccommon = 0), i.e., parents whose children do not overlap in the two terminologies, it is more likely that the children define alternative classifications than that they fall under the union of the other possible cases (error, import, or finer level of detail).

Hypothesis 2:

Concepts proposed for import as children from a source terminology into a target terminology, based on their EFI values’ proximity to 1, will be distinguishable with statistical significance from concepts that are known to be cousins and not children.

Note that a concept could be both a cousin and a child at the same time; the wording of the hypothesis therefore excludes this case explicitly. The domain expert should clearly be able to distinguish between IS-A links suggested by the source terminology and IS-A links that should not be there.

Hypothesis 3:

Concepts proposed for import as children from a source terminology into a target terminology, based on their EFI values’ proximity to 0, will be distinguishable with statistical significance from concepts that are known to be cousins and not children.

5. Results

For Sample 1, in which the parent concepts have no overlapping children in the two terminologies, we provided the domain expert with four choices, as discussed. As the choices were not mutually exclusive, some concepts received more than one response. Table 1 shows all combinations of choices made by the domain expert. Overall, 36 (29+5+2) of the 50 concepts were found to have children that are alternative classifications in the source and target terminologies. Thus, 72% (=36/50) of the concepts fall into the category of alternative classification, which supports Hypothesis 1: when there are no overlapping children in the two terminologies for the same parent, an alternative classification is more likely than all other cases combined.

Table 1.

Results for Sample 1 as analyzed by the domain expert.

Sample 1: EFI=0
Alternative classification 29
Error in source terminology 5
Error in target terminology 1
Finer level of detail in source or target terminology 3
Alternative classification + Error in source terminology 5
Alternative classification + Error in target terminology 2
Error in source terminology + Error in target terminology 3
Error in source terminology + Finer level of detail 1
Error in target terminology + Finer level of detail 1
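
As a quick check of the arithmetic behind the 72% figure and the absolute-majority reading of Hypothesis 1, the counts of Table 1 can be tallied as in the following minimal sketch. Only the counts are taken from Table 1; the dictionary layout and the variable names are illustrative assumptions.

```python
# Counts from Table 1. Each key is the (possibly combined) set of choices
# the domain expert made for a parent concept; the value is the number of
# parent concepts that received exactly that combination.
table1 = {
    ("Alternative classification",): 29,
    ("Error in source terminology",): 5,
    ("Error in target terminology",): 1,
    ("Finer level of detail",): 3,
    ("Alternative classification", "Error in source terminology"): 5,
    ("Alternative classification", "Error in target terminology"): 2,
    ("Error in source terminology", "Error in target terminology"): 3,
    ("Error in source terminology", "Finer level of detail"): 1,
    ("Error in target terminology", "Finer level of detail"): 1,
}

total = sum(table1.values())

# A parent counts as an alternative classification if that choice appears
# anywhere in the expert's response, alone or in combination.
alternative = sum(
    count
    for choices, count in table1.items()
    if "Alternative classification" in choices
)
other = total - alternative
share = alternative / total

# Absolute majority: more alternative classifications than all other
# responses combined (the stronger reading used for Hypothesis 1).
absolute_majority = alternative > other

print(f"{alternative}/{total} = {share:.0%}, absolute majority: {absolute_majority}")
```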

A review of alternative classifications in our sample uncovered the following. The two parent concepts are identical in each case, but are divided up according to different organizational viewpoints. There are four cases.

a). One set of qualifiers (e.g., from the qualifier hierarchy of NCIt) is applied in one terminology, while no qualifiers are applied in the other terminology under the same parent.

An example would be the concept Visual Impairment. In one terminology the child concepts are a result of applying the spatial qualifiers left and right (i.e., Visual impairment of right eye and Visual impairment of left eye) whereas no qualifier is used in the other terminology (e.g., Myopia). We found five such cases, out of 36, in Sample 1.

b). One set of qualifiers is applied in one terminology, while another set of qualifiers is applied in the other terminology.

As an example, consider the concept Irradiation of breast. One terminology uses the qualifiers “left” and “right” (i.e., Irradiation of left breast and Irradiation of right breast) while the other terminology uses the qualifiers “whole” and “partial” (i.e., Whole Breast Irradiation and Partial Breast Irradiation). In Sample 1, out of 36 alternative classifications, this happened two times.

As we are importing concepts into NCIt, this categorization is based on different sets of qualifiers in the Property or Attribute hierarchy of NCIt alone. We note that for other terminologies, such as ICD-10, these qualifiers could be considered as different axes of classification (e.g., left/right can be viewed as laterality and whole/part can be values on a degree axis).

c). Different axes of classification are used in the two terminologies.

For example, in the case of the concept Cardiac Lipoma, the children in one terminology are subcategorized based on the anatomical structure (e.g., Epicardial Lipoma), whereas the children in the other terminology are organized based on the histological finding (e.g., Fibrolipoma of heart and Myelolipoma of heart). We found 11 concepts of this type in Sample 1. One could argue that the difference between categories b) and c) is one of degree and not one of kind.

d). Combinations

In many cases, alternative classifications show that the original modeling of the two terminologies is not consistent, and therefore they do not cleanly fit into one of the categories a), b), and c) above. For example, the parent Renal cyst (C3887499) has the children C0268799: acquired cyst of kidney, C0268800: simple renal cyst, C0403383: Infected renal cyst, C0431718: multiple renal cysts, and C3812408: congenital renal cyst in the source terminology MEDCIN, and the children C0521621: Solitary Multilocular Kidney Cyst and C4022836: Solitary Cyst of Kidney in the target terminology NCIt. Notably, “acquired,” “simple,” “multiple,” and “congenital,” used in the source terminology, are qualifiers in the target terminology. The word “solitary,” used in the target terminology, is a qualifier there as well. Yet the term “infected,” which is used in a parallel manner to the four qualifiers, is in fact a finding. Furthermore, it is remarkable that the two terminologies use qualifiers for the same parent concept, yet do not have a single qualifier in common. In Sample 1, 50% (18/36) of all parent concepts with alternative classifications fall into this category.

Proposed solutions for the different cases above from a user perspective are as follows.

For cases a) and c) the default solution is that concept import is still possible. However, straightforward import would lead to a notable loss of information. Thus, in such a case we propose to create an intermediate level of two concepts that make explicit the different nature of the original child concepts and the imported children. This is demonstrated visually and abstractly in Figure 5.

Figure 5.

Proposed solution for cases a) and c) of alternative classification.

For case b) we can import all concepts from the source terminology into the target terminology and this merge would not create any loss of information.

For case d) the correct solution would be to clean up the child structure first, e.g., in the above example by moving “infected” to a more appropriate position and then allow for direct import or for import while adding an additional level of concepts (Figure 5).

For Sample 2, out of the 50 concepts the domain expert (GE) determined that 40 concepts should be imported. Thus, we have 80% (=40/50) of cases of import. Table 2 shows the results for this sample. We performed Fisher’s exact test (two-tailed) on the control and experimental group for Sample 2. For Hypothesis 2 we obtained statistical significance with p-value<0.001 supporting our hypothesis. Thus, the child concepts suggested by our algorithm for import based on their EFI values’ proximity to 1 could be distinguished from the non-children (cousins) with statistical significance.

Table 2.

Results of each group as analyzed by the domain expert for Sample 2.

Sample 2
Groups Agreed with Import Disagreed
Experimental 40 10
Control 6 44

For Sample 3, out of the 50 concepts, the domain expert determined that 27 concepts should be imported. This corresponds to 54% of cases for import, which is much less than the percentage of import for Sample 2. Table 3 shows the results for both the experimental and control groups. We also performed Fisher’s exact test (two-tailed) on the control and experimental group for Sample 3. We obtained statistical significance with p-value<0.001 supporting Hypothesis 3. Thus, the child concepts suggested by our algorithm for import based on their EFI values’ proximity to 0 could be distinguished from the cousins with statistical significance.

Table 3.

Results of each group as analyzed by the domain expert for Sample 3.

Sample 3
Groups Agreed with Import Disagreed Alternative classification Error
Experimental 27 5 12 6
Control 6 29 1 14
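
The two significance tests reported above can be approximated with a short, dependency-free sketch of a two-tailed Fisher’s exact test on a 2×2 table. Several points are assumptions rather than statements from the text: the two-tailed p-value is computed by summing the probabilities of all tables that are no more likely than the observed one (a common convention), and the four columns of Table 3 are collapsed into “import” versus “everything else,” since the text does not say how they were handled in the test. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are at most as probable as the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that ties are not lost to rounding.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Table 2 (Sample 2): experimental 40 import / 10 not, control 6 / 44.
p_sample2 = fisher_exact_two_sided(40, 10, 6, 44)

# Table 3 (Sample 3), collapsed by assumption into import vs. all other
# choices: experimental 27 / (5 + 12 + 6), control 6 / (29 + 1 + 14).
p_sample3 = fisher_exact_two_sided(27, 5 + 12 + 6, 6, 29 + 1 + 14)

print(f"Sample 2: p = {p_sample2:.2e}")
print(f"Sample 3: p = {p_sample3:.2e}")
```

Under these assumptions both p-values fall below 0.001, in line with the reported results.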

6. Discussion

The idea of alternative classification means that the concepts in one group of children coming from one terminology are “different” from the concepts in the other group of children coming from the other terminology. For example, the children in one terminology could all be anatomical locations, while the children in the other terminology would be kinds of diseases. This was demonstrated with an example in Figure 3. Apart from the anatomical location and disease kind, other cases exist, for example, children of a parent concept are classified based on the procedural location, procedural finding and kind of procedure. Thus, the classification of the parent concept is done by two different criteria (of the above three) in the two terminologies.

We had initially assumed that a pair of concepts appearing in two terminologies, designated in the UMLS to be identical, but having no common children, would sometimes indicate that the concepts are in fact homonyms and not identical. This study and the detailed analysis of Sample 1 did not bear this out.

However, we observed cases where children of a parent concept are classified by two different criteria in the source terminology, which is covered by case d) above. Thus, if this observation is made in the source terminology, the question arises whether this should be classified as an error in the source. If so, it would be necessary to introduce an intermediate layer of concepts in the source terminology and thus there would be a major restructuring necessary whenever there are many such cases. Following Occam’s razor, it was desirable to keep changes in the source terminology minimal, because the task in this project is to import concepts into the target terminology, not to repair the source terminology. Furthermore, one could argue that mixing children of different kinds under one parent is not an error if each child-parent pair is observed in isolation.

When concepts in the source and target terminologies are classified by different criteria, we propose to insert two additional concepts in the target terminology. One will be the parent of all source concepts and the other one the parent of all target concepts. This corresponds to cases a) and c) and Figure 5 above. It avoids “mixing” children of different kinds under one parent. Thus, while we are not advocating a clean-up of the source terminology, we also do not want to create an inconsistent structure in the target ontology when importing new concepts.
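
The proposed intermediate layer can be sketched on a toy hierarchy as follows. This is only an illustration under stated assumptions: the hierarchy is represented as a plain dictionary from a concept to its list of children, the function name and the labels of the two inserted concepts are hypothetical, and the assignment of the Cardiac Lipoma children to source and target is chosen for the example, since the text does not state which terminology holds which children.

```python
def import_with_intermediate_layer(
    target_hierarchy, parent, source_children, target_group, source_group
):
    """Return a copy of the target hierarchy with two new concepts under
    `parent`: one grouping the original target children and one grouping
    the children imported from the source terminology."""
    updated = {concept: list(kids) for concept, kids in target_hierarchy.items()}
    original_children = updated.get(parent, [])
    updated[target_group] = list(original_children)  # existing target children
    updated[source_group] = list(source_children)    # imported source children
    updated[parent] = [target_group, source_group]   # parent keeps only the two groups
    return updated


# Toy example; which terminology is source or target is an assumption here.
target = {"Cardiac Lipoma": ["Epicardial Lipoma"]}
result = import_with_intermediate_layer(
    target,
    parent="Cardiac Lipoma",
    source_children=["Fibrolipoma of heart", "Myelolipoma of heart"],
    target_group="Cardiac Lipoma by anatomical structure",  # hypothetical label
    source_group="Cardiac Lipoma by histological finding",  # hypothetical label
)

for concept, kids in result.items():
    print(concept, "->", kids)
```

The input hierarchy is copied rather than modified, so the result can be reviewed by a human before any change is committed to the target terminology.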

Hypothesis 2 and Hypothesis 3 were separated, because the raw numbers seemed to indicate different “behavior” when EFI tends to 1, compared to when EFI tends to 0. Indeed, there are relatively more alternative classifications and errors for EFI close to 0. Nevertheless, for both polarities of EFI, the largest number of choices according to the domain expert was “import.” We observed that more than half of the pairs of identical parent concepts had no overlapping children. From the parent concepts that have at least one child in common, we selected 100 concepts, corresponding to about 10%, to create Samples 2 and 3. In our previous work on horizontal density [15] we experimented with threshold values beyond which the child concepts would be highly likely to be imported. However, we observed that a threshold value that works for one pair of terminologies might not work for another pair of terminologies, due to the variance in the Ccommon and Csource values, leading to either very small or very large samples. Thus, we decided to choose the top 50 and bottom 50 concepts based on the ordering of EFI values for concepts with EFI ≠ 0. As noted before, EFI cannot become 1.
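
The rank-based selection described above, as opposed to a fixed threshold, can be sketched as follows. The sketch assumes that EFI values have already been computed for each parent concept; the function name, the input dictionary and the toy values are hypothetical, and the EFI formula itself is not reproduced here.

```python
def select_samples(efi_by_parent, k=50):
    """Split parent concepts by EFI value.

    Returns three lists: all parents with EFI = 0 (the pool of likely
    alternative classifications), the k parents with the highest nonzero
    EFI, and the k parents with the lowest nonzero EFI.
    """
    zero = sorted(c for c, efi in efi_by_parent.items() if efi == 0)
    ranked = sorted(
        (c for c, efi in efi_by_parent.items() if efi != 0),
        key=lambda c: efi_by_parent[c],
    )
    lowest = ranked[:k]
    # If fewer than 2*k parents have nonzero EFI, the two lists overlap.
    highest = ranked[-k:] if k > 0 else []
    return zero, highest, lowest


# Toy EFI values, invented purely for illustration.
toy_efi = {
    "P1": 0.0, "P2": 0.05, "P3": 0.9, "P4": 0.4,
    "P5": 0.0, "P6": 0.7, "P7": 0.1,
}
zero, highest, lowest = select_samples(toy_efi, k=2)
print("EFI = 0:", zero)
print("highest nonzero EFI:", highest)
print("lowest nonzero EFI:", lowest)
```

Selecting by rank keeps the sample sizes fixed regardless of how the EFI values are distributed for a given pair of terminologies, which is the motivation given above for abandoning threshold values.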

An important distinction in this paper is between EFI = 0 and EFI ≠ 0. The case of EFI = 0 provides a fully automatic criterion for recognizing likely alternative classifications. To the best of our knowledge, no such criterion has been previously reported in the literature. Recognizing an alternative classification can then be followed by a human review to determine which approach should be taken when importing concepts into the target terminology. Ideally, the distinctions between different methods of import and different kinds of errors should be recognized by an algorithm with high reliability. While this goal is hard to reach, the current results are a step in that direction.

7. Conclusion

We started this paper by refining a previously developed metric to indicate when concepts with a common parent should be imported into another terminology. In this previous work, alternative classifications had been observed, but determining which parents defined them required a human expert. In this paper, we showed that the EFI (Evidence for Import) metric, when it becomes zero, provides good evidence that the occurrences of the same concept in two terminologies define an alternative classification. In 72% of the cases the domain expert agreed that the child concepts in the source and target terminologies are alternative classifications of the same parent concept.

In contrast, for EFI in the range of 0.10 – 0.35 it was found that direct import was the most likely choice, with statistical significance, when compared to a control sample. As the domain expert agreed with 80% of the algorithmically recommended imports, the proposed metric outperforms our previous approach where only 56% were recommended.

Acknowledgment

Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number R01CA190779. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References

  • [1] Bodenreider O. Biomedical ontologies in action: role in knowledge management, data integration and decision support. Yearbook of medical informatics. 2008:67–79.
  • [2] Bodenreider O. The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research. 2004;32:D267–D70.
  • [3] Zemmouchi-Ghomari L, Ghomari AR. Ontology versus terminology, from the perspective of ontologists. Int J of Web Science. 2012;1:315–31.
  • [4] Natalia G, Thierry H, Olivier B. Ontologies and Terminologies: Continuum or Dichotomy? Journal of Applied Ontology. 2012;7:375–86.
  • [5] Schulz S, Jansen L. Formal ontologies in biomedical knowledge representation. Yearbook of medical informatics. 2013;8:132–46.
  • [6] U.S. National Library of Medicine. SNOMED CT. https://www.nlm.nih.gov/healthit/snomedct/, 2018. (accessed 3 December 2018).
  • [7] The National Cancer Institute. NCI thesaurus. https://ncit.nci.nih.gov/ncitbrowser/, 2018. (accessed 3 December 2018).
  • [8] Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics. 2000;25:25–9.
  • [9] BioPortal. Medical Subject Headings. https://bioportal.bioontology.org/ontologies/MESH, 2018. (accessed 3 December 2018).
  • [10] He Z, Chen Y, de Coronado S, Piskorski K, Geller J. Topological-pattern-based recommendation of UMLS concepts for National Cancer Institute Thesaurus. AMIA Annual Symposium Proceedings. 2016;2016:618–27.
  • [11] He Z, Chen Y, Geller J. Perceiving the usefulness of the National Cancer Institute Metathesaurus for enriching NCIt with topological patterns. Studies in health technology and informatics. 2017;245:863–7.
  • [12] He Z, Geller J, Chen Y. A comparative analysis of the density of the SNOMED CT conceptual content for semantic harmonization. Artificial intelligence in medicine. 2015;64:29–40.
  • [13] He Z, Geller J, Elhanan G. Categorizing the relationships between structurally congruent concepts from pairs of terminologies for semantic harmonization. AMIA Summits on Translational Science Proceedings. 2014;2014:48–53.
  • [14] He Z, Keloth VK, Chen Y, Geller J. Extended Analysis of Topological-Pattern-Based Ontology Enrichment. 2018 IEEE International Conference on Bioinformatics and Biomedicine. Madrid, Spain: 2018. p. 1641–8.
  • [15] Keloth VK, He Z, Chen Y, Geller J. Leveraging Horizontal Density Differences Between Ontologies to Identify Missing Child Concepts: A Proof of Concept. AMIA Annual Symposium Proceedings. 2018;2018:644–53.
  • [16] Lindberg DA, Humphreys BF, McCray AT. The Unified Medical Language System. Methods Inf Med. 1993;32:281–91.
  • [17] National Cancer Institute. Enterprise Vocabulary Service – Term Suggestion. https://ncitermform.nci.nih.gov/ncitermform/?version=cdisc, 2018. (accessed 3 December 2018).
  • [18] Euzenat J, Shvaiko P. Ontology Matching. 2 ed. Berlin Heidelberg: Springer-Verlag; 2007.
  • [19] Kalfoglou Y, Schorlemmer M. Ontology mapping: the state of the art. Knowl Eng Rev. 2003;18:1–31.
  • [20] Rector A, Rogers J, Bittner T. Granularity, scale and collectivity: When size does and does not matter. J Biomed Inform. 2006;39:333–49.
  • [21] Weng C, Fridsma DB. A Call for Collaborative Semantic Harmonization. AMIA Annual Symposium Proceedings. 2006;2006:1142-.
  • [22] Noy NF, Musen MA. PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment. Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence: AAAI Press; 2000. p. 450–5.
  • [23] Euzenat J. An API for ontology alignment. Proceedings of the 3rd International Conference on Semantic Web Conference. Hiroshima, Japan: Springer-Verlag; 2004. p. 698–712.
  • [24] Doan A, Madhavan J, Domingos P, Halevy A. Ontology Matching: A Machine Learning Approach. In: Staab S, Studer R, editors. Handbook on Ontologies. Berlin, Heidelberg: Springer Berlin Heidelberg; 2004. p. 385–403.
  • [25] Zhou L, Cheatham M, Krisnadhi A, Hitzler P. A Complex Alignment Benchmark: GeoLink Dataset. The Semantic Web – ISWC 2018. p. 273–88.
  • [26] Oliveira D, Pesquita C. Improving the interoperability of biomedical ontologies with compound alignments. J Biomed Semantics. 2018;9.
  • [27] Stoilos G, Geleta D, Shamdasani J, Khodadadi M. A Novel Approach and Practical Algorithms for Ontology Integration. International Semantic Web Conference 2018. p. 458–76.
  • [28] Sun P, Zhang S. Identifying granularity differences between large biomedical ontologies through rules. AMIA Annual Symposium Proceedings. 2010;2010:927–31.
  • [29] Sun P, Zhang S. Using rules to investigate the differences in partonomy between biomedical ontologies. IEEE International Conference on Bioinformatics and Biomedicine; 2011:623–6.
  • [30] He Z, Geller J. Preliminary analysis of difficulty of importing pattern-based concepts into the National Cancer Institute Thesaurus. Studies in health technology and informatics. 2016;228:389–93.
  • [31] Luo L, Tong L, Zhou X, Mejino JLV, Ouyang C, Liu Y. Evaluating the granularity balance of hierarchical relationships within large biomedical terminologies towards quality improvement. J Biomed Inform. 2017;75:129–37.
