Author manuscript; available in PMC: 2017 Jun 1.
Published in final edited form as: J Biomed Inform. 2016 Mar 14;61:63–76. doi: 10.1016/j.jbi.2016.03.007

Utilizing a Structural Meta-ontology for Family-based Quality Assurance of the BioPortal Ontologies

Christopher Ochs a, Zhe He b, Ling Zheng a, James Geller a, Yehoshua Perl a, George Hripcsak c, Mark A Musen d
PMCID: PMC4893909  NIHMSID: NIHMS773712  PMID: 26988001

Abstract

An Abstraction Network is a compact summary of an ontology’s structure and content. In previous research, we showed that Abstraction Networks support quality assurance (QA) of biomedical ontologies. The development of an Abstraction Network and its associated QA methodologies, however, is a labor-intensive process that previously was applicable only to one ontology at a time. To improve the efficiency of the Abstraction-Network–based QA methodology, we introduced a QA framework that uses uniform Abstraction Network derivation techniques and QA methodologies that are applicable to whole families of structurally similar ontologies. For the family-based framework to be successful, it is necessary to develop a method for classifying ontologies into structurally similar families. We now describe a structural meta-ontology that classifies ontologies according to certain structural features that are commonly used in the modeling of ontologies (e.g., object properties) and that are important for Abstraction Network derivation. Each class of the structural meta-ontology represents a family of ontologies with identical structural features, indicating which types of Abstraction Networks and QA methodologies are potentially applicable to all of the ontologies in the family. We derive a collection of 81 families, corresponding to classes of the structural meta-ontology, that enable a flexible, streamlined family-based QA methodology, offering multiple choices for classifying an ontology. The structure of 373 ontologies from the NCBO BioPortal is analyzed and each ontology is classified into multiple families modeled by the structural meta-ontology.

Keywords: ontology classification, structural meta-ontology, abstraction network, family-based ontology quality assurance

1. Introduction

Ontologies are often large and complex, with many interconnected classes and axioms. Biomedical ontologies are becoming increasingly important for data integration [1], data annotation [2], data interoperability [3], and data encoding in electronic health records (EHRs) [4] to support their meaningful use. The National Center for Biomedical Ontology (NCBO) BioPortal [5] is an online repository hosting over 450 biomedical ontologies. Due to the size and complexity of ontologies, errors and inconsistencies are unavoidable and ontology quality assurance (QA) is difficult. In previous work, we have developed Abstraction Networks [6], which are compact visual summaries of an ontology’s structure and content, to support ontology QA. A detailed description of Abstraction Networks is provided in Section 2.2. We have successfully applied different kinds of Abstraction-Network–based QA methodologies to various ontologies [7–13], including the National Cancer Institute thesaurus (NCIt) [14], SNOMED CT [15], the Gene Ontology (GO) [16], the Ontology of Clinical Research (OCRe) [17], the Sleep Domain Ontology (SDO) [18], the Ontology for Drug Discovery Investigations (DDI) [19], and the Cancer Chemoprevention Ontology (CanCo) [20].

The development of Abstraction-Network–based QA methodologies is labor intensive. For each ontology, we have to design a proper Abstraction Network, a process that involves reviewing the ontology’s structure manually or semi-automatically, and identifying elements that can be used to automatically summarize the ontology in a way that is useful for QA.

In the past, this process usually involved several months of effort, since an Abstraction Network derivation methodology requires a number of laborious steps. After the derivation methodology is in place, it is necessary to design a visual display of the Abstraction Network that presents the information that is most useful for QA. Next, software for automatically deriving and visualizing the Abstraction Network must be implemented. Finally, one has to explore how the Abstraction Network can be used to support the identification of sets of classes that are expected to have a higher likelihood of errors. This last step typically involves several domain experts reviewing hundreds of ontology classes in a consensus-based QA review process. Overall, this “one ontology at a time” process model is slow and requires a substantial effort of research and implementation for each specific ontology.

The reward of this approach, however, is that editors of the ontology can concentrate their QA review on sets of classes that are expected to have a higher likelihood of error instances. They gain a higher yield of corrections, compared to a random set of ontology classes, for their QA efforts, as measured by the ratio of the number of erroneous classes to the number of classes reviewed. As the limited resources typically available for QA make it impossible to review all classes of an ontology, such a high yield is desirable. Consider, for example, the partial-area taxonomy, a kind of Abstraction Network (described in detail in Section 2.3), which we previously derived [8, 21] for SNOMED CT [15] and the Gene Ontology (GO) [22], among others.

Abstraction Network derivation methodologies are based on the structure of an ontology. Because ontologies may share certain kinds of structural elements, we can identify families that are sets of structurally similar ontologies. A detailed description of families of structurally similar ontologies appears in Section 2.5. In the family-based approach, we replace the “one ontology at a time” process for Abstraction-Network-based QA by a uniform process for all of the ontologies in a family. There are two main benefits of this family-based approach: (1) the ability to have a uniform Abstraction Network derivation methodology for each ontology of a family, which means that software needs to be written only once per family, and (2) the identification of characterizations of sets of classes with a high likelihood of errors, based on a few family members. These characterizations might be valid for most other ontologies in the family (a testable hypothesis).

We define a structural feature as a type of axiom used in the definition of an ontology class. Examples of structural features that are commonly used are object properties (OPs), data properties (DPs), and certain configurations in the class hierarchy (e.g., are there multiple parents?).

Consider, for example, the family of ontologies where each member has a class hierarchy that is a Directed Acyclic Graph (DAG), such that multiple superclasses are allowed (and exist), and where the ontology classes have object properties that are used only in class restrictions. The SNOMED CT and GO ontologies mentioned earlier belong to this family. In fact, as we will see in the Results section, there are 66 ontologies hosted in BioPortal that belong to this family. Since the ontologies in this family share the same basic structure, there exists a derivation methodology for automatically obtaining a certain Abstraction Network, called a partial-area taxonomy, for each family member using the same algorithm. Furthermore, both of these ontologies contain so-called overlapping classes (defined in Section 2.3), which have been shown in our previous research to have a higher error rate than randomly chosen classes [21, 23] (see Section 2.4.1).

Consider another ontology, called the Chemical Entities of Biological Interest (ChEBI) [24], which is in the same family as SNOMED CT and GO. We can automatically derive a partial-area taxonomy for ChEBI using the same methodology and software as used for SNOMED CT and GO. Furthermore, ChEBI has overlapping classes in its partial-area taxonomy (8,697 total; 15.5% of ChEBI’s classes). Thus, we can hypothesize that the overlapping classes in ChEBI are more likely to contain errors than ChEBI classes that are non-overlapping classes. Using the family-based methodologies for deriving Abstraction Networks and performing quality assurance will enhance the QA efficiency for the ontologies in a family.

In an initial study of the family-based approach, we described seven disjoint families of ontologies for classifying the ontologies in the NCBO BioPortal repository [13]. We classified 186 BioPortal ontologies into these seven disjoint families and illustrated how it was possible to create uniform Abstraction Network derivation methodologies that were applicable to every ontology in a given family. However, the initial classification into families provided only one feasible classification. It did not provide an exhaustive, flexible classification scheme for all of BioPortal’s ontologies.

In this paper, we present a new way of classifying groups of structurally similar ontologies into families, resulting in a structural meta-ontology. A structural meta-ontology is an ontology based on the structural features used to model the classes in a set of ontologies (i.e., it is an ontology about the structure of ontologies). Each class in a structural meta-ontology represents a family of ontologies exhibiting a certain fixed combination of structural features. For example, the above-mentioned family of 66 ontologies may be represented in the structural meta-ontology, but it was not one of the families described in our previous classification approach [13].

The remainder of this paper is organized as follows. In Section 2, we present Background on BioPortal, our previously published methods for deriving Abstraction Networks, and previously developed Abstraction-Network-based QA methodologies. Section 3 describes our method of deriving the structural meta-ontology. Additionally, in Section 3.2, we illustrate how the structural meta-ontology supports quality assurance of ontologies. Section 4 shows our results, including a figure of the complete structural meta-ontology for 373 BioPortal ontologies. Sections 5 and 6 contain Discussion and Conclusions, respectively.

2. Background

2.1 National Center for Biomedical Ontology (NCBO) BioPortal

BioPortal [5], created by the National Center for Biomedical Ontology (NCBO) [25], is a large repository of biomedical ontologies. BioPortal hosts over 450 biomedical ontologies released in various formats, including the Web Ontology Language (OWL) [26] and the Open Biological and Biomedical Ontologies (OBO) [1] format. BioPortal provides tools for browsing, searching, and visualizing [27] ontologies to support research in the biomedical sciences.

Because BioPortal is the largest repository of biomedical ontologies, many studies have examined different aspects of the structure of the ontologies it hosts. Mortensen et al. [28] encoded the Ontology Design Patterns (ODP) from several BioPortal ontologies to facilitate ontology development; however, they found that design patterns are used infrequently in the BioPortal ontologies. Bail et al. [29] examined the justifications from an independently motivated corpus of actively used biomedical ontologies from the BioPortal and exhibited the structural features represented in Description Logic. Quesada-Martínez et al. [30] used all the ontologies available in BioPortal as external resources and examined their labels for supporting the axiomatic enrichment of existing biomedical ontologies.

Ghazvinian et al. [31] analyzed BioPortal’s ontologies to create 4 million mappings among classes in the ontologies based on lexical similarity of class names and synonyms, and discussed how the mappings may help in the process of ontology design and evaluation. They found that most ontologies are only partially similar but that a large portion of ontologies are mapped to at least one other ontology. Ghazvinian et al. [32] analyzed 53 BioPortal ontologies, identified OBO Foundry candidates, and examined their level of term reuse and overlap. In that study, 96% of the OBO Foundry candidate ontologies were found to contain terms from other ontologies. Vescovo et al. [33] analyzed various aspects of partitioned BioPortal ontologies using “atomic decomposition” and presented an algorithm for extracting modules from decomposed ontologies, which makes it possible to quickly identify atoms for logically complete reasoning. Kamdar et al. [34] analyzed term reuse among 377 BioPortal ontologies and found that most terms are not reused by other ontologies. Ceusters et al. [35] discussed the importance of quality assurance in BioPortal’s ontologies. In particular, they reviewed ontologies in BioPortal that cover the domain of pain assessment and found that most of these ontologies have significant issues. Pathak and Chute [36] analyzed the quality of mappings between ontologies in BioPortal. Horridge et al. [37] performed a logic-based analysis of BioPortal’s ontologies and found that most of the ontologies were consistently defined.

2.2 Abstraction Networks

Large ontologies are typically viewed in one of two ways, as indented text or as node–link diagrams [38]. BioPortal provides both kinds of interfaces. For most ontologies, however, it is difficult to view more than a few classes (and all of their associated hierarchical relationships) using an indented hierarchy (lateral relationships or multiple hierarchical relationships are often not displayed). Additionally, while node–link diagrams can be used to display larger portions of the classes of an ontology and their associated relationships, compared to an indented hierarchy, the diagrams become overwhelming as more classes and relationships are added. Abstraction Networks are intended to overcome these problems by displaying summary diagrams of large portions of an ontology.

We define an Abstraction Network as follows. An Abstraction Network of an ontology is a network of nodes and links, where each node summarizes a set of “similar” classes within an ontology. We note that the definition of “similar” differs from one Abstraction Network to another. For example, in partial-area taxonomies (see Section 2.3), similarity is based on property domains. In Tribal Abstraction Networks (TANs) [39] similarity is defined according to the subhierarchies that sets of classes belong to. Nodes are hierarchically related via child-of links that selectively reflect the subclass links of the ontology. For a review of Abstraction Networks and their properties, see the paper by Halper et al. [6].

Figure 1 illustrates the general process of deriving an Abstraction Network. A “good” Abstraction Network (1) is significantly smaller than the ontology that it summarizes, and (2) reflects enough of the structure and content of the original ontology so that a person looking at the Abstraction Network can get a good idea of what is in the original ontology. In our ongoing research, we have developed several kinds of Abstraction Networks that are applicable to different kinds of ontologies. Table 1 brings together, in one place, the kinds of Abstraction Networks appearing in this paper.

Figure 1.

(a) An ontology composed of 23 classes. Classes are shown as colored ovals and subclass relationships between classes are shown as upward directed arrows. “Similar” classes have been color coded and surrounded by colored dashed lines. (b) The Abstraction Network, composed of six nodes, derived according to the “similarity” defined for (a). For example, for some Abstraction Network derivation methodology, the group of six “similar” red-colored classes is abstracted by the red node. Nodes of the Abstraction Network in (b) are color coded according to the colors of the classes in (a). Hierarchical child-of links between Abstraction Network nodes are shown as upward directed arrows.

Table 1.

Prior work on Abstraction Networks

| Number | Name of Abstraction Network | Example Publications and Ontologies | Necessary Features of the Ontology to Construct this Abstraction Network |
|---|---|---|---|
| 1 | Area Taxonomy | SNOMED CT [8], NCIt [9], OCRe [7], SDO [11], GO [21] | Object Properties and/or Data Properties |
| 2 | Domain-Defined Partial-Area Taxonomy | OCRe [7], SDO [11], CanCo [13] | Object Properties and/or Data Properties used with Domains (Some additional conditions on the object properties may apply.) |
| 3 | Restriction-Defined Partial Area Taxonomy | SNOMED CT [8], NCIt [9], GO [21], SDO [11] | Object Properties and/or Data Properties used in Restrictions on Classes (Some additional conditions on the object properties may apply.) |
| 4 | Domain-defined and Restriction-defined Partial-area Taxonomies | SDO [11], Ontology for Drug Discovery Investigations [12] | (combination of 2 and 3) |
| 5 | Disjoint Partial-Area Taxonomy (Can be created for 3, 4, 5). | SNOMED CT [40] | Object Properties and/or Data Properties and Multiple Parents |
| 6 | Diff Abstraction Networks | OCRe, SDO, and ERO [41] | Object Properties and/or Data Properties |
| 7 | Tribal Abstraction Network | SNOMED CT [39] | Multiple Parents |
| 8 | Ingredient Abstraction Network | NDF-RT [42] | Object Properties used in Restrictions on Classes |

Structural features, such as object properties (i.e., relationships), data properties (i.e., attributes), and multiple parents, can be used to identify groups of similar classes. We now describe how these common structural features can be used to derive Abstraction Networks.

In OWL, an object property is an important structural feature used to define classes. An object property represents a relationship that holds between instances. Object properties can be used in two different ways: (1) they can be assigned domains (and ranges; these are global restrictions on their usage), or (2) they can be utilized in restrictions on classes (local restrictions on their usage) in a subclass axiom or equivalence axiom [43]. Below is an example, in Manchester OWL syntax [44], of an object property with an explicitly defined domain taken from OCRe. The object property is named hasIdentifier; the union of the classes Study, Person, Organization, and Funding is defined as its domain, and the class Instance identifier is defined as its range.

  • ObjectProperty: hasIdentifier

    • Domain: Study or Person or Organization or Funding

    • Range: Instance identifier

Next, we provide an example of an object property used in a class restriction. Consider the class cell motility in GO (see Figure 2a). Cell motility is a subclass of two regular classes: cellular component movement and locomotion. Additionally, it has a restriction stating that any kind of cell motility must have a part of relationship to a localization of cell.

  • Class: cell motility

    • SubClassOf:

      • cellular component movement

      • locomotion

      • part of some localization of cell
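The two usage styles shown above can be sketched as plain data structures. This is a minimal illustration, not the software used in the paper: the names `ObjectProperty`, `OntClass`, and `property_usage` are hypothetical, and the OCRe property and the GO class are placed side by side only for demonstration, although they come from different ontologies.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ObjectProperty:
    name: str
    # Union of the classes declared as the domain; empty if no domain axiom exists.
    domain: FrozenSet[str] = frozenset()
    range_: Optional[str] = None


@dataclass(frozen=True)
class OntClass:
    name: str
    superclasses: Tuple[str, ...] = ()
    # Each restriction is a (property name, filler class) pair, e.g. "part of some X".
    restrictions: Tuple[Tuple[str, str], ...] = ()


def property_usage(properties, classes):
    """Return (properties with a declared domain, properties used in class restrictions)."""
    with_domain = {p.name for p in properties if p.domain}
    in_restriction = {prop for c in classes for prop, _filler in c.restrictions}
    return with_domain, in_restriction


# Global usage: hasIdentifier with an explicit domain and range (from the OCRe example).
has_identifier = ObjectProperty(
    name="hasIdentifier",
    domain=frozenset({"Study", "Person", "Organization", "Funding"}),
    range_="Instance identifier",
)

# Local usage: "part of" appears only in a restriction on cell motility (from the GO example).
cell_motility = OntClass(
    name="cell motility",
    superclasses=("cellular component movement", "locomotion"),
    restrictions=(("part of", "localization of cell"),),
)

with_domain, in_restriction = property_usage([has_identifier], [cell_motility])
```

Separating the two sets makes it straightforward to ask, for a given ontology, whether its object properties are used with domains, in restrictions, or both.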

Figure 2.

(a) An excerpt of 20 classes in GO’s Biological process hierarchy. Superclass relationships are shown using upward directed arrows (e.g., the superclass of Cell wall switching is Cell cycle process). Classes that have restrictions with the same set of object property types, either explicitly or through inheritance, are shown in colored, dashed bubbles. For example, Cell cycle process and Cell motility have a restriction involving has part. (b) The area taxonomy for the classes in (a), consists of four areas. An area is labeled with its set of object properties and the total number of classes summarized by the area. For example, the six classes in the has part bubble in (a) are now summarized by one area named {has part} in (b). Areas are organized into levels and color coded according to their numbers of object properties. Child-of links between areas are shown as upward directed bold arrows. (c) The partial-area taxonomy for the classes in (a), consisting of six partial-areas in four areas. Partial-areas are shown as white boxes within their respective areas. A partial-area is labeled with the name of its root and the total number of classes it summarizes. For example, the partial-area Regulation of biological process summarizes five classes. Child-of links between partial-areas are shown as upward directed bold arrows (e.g., the Signal transduction partial-area is a child-of the Regulation of biological process partial-area). Partial-area taxonomies such as (c) always contain more details than area taxonomies such as (b).

Data properties (attributes) are similar to object properties, except that their ranges are defined as literal values, such as strings or numbers. Like object properties, data properties can be explicitly assigned domains or they can be used in class restrictions.

In an ontology, each class, except for the root of the ontology, will have one or more subclass of (or superclass) axioms associated with it. Class hierarchies can be represented as a directed acyclic graph (DAG), where classes may have multiple parents, or as strict tree structures, where each class has at most one parent. Superclass axioms can be utilized in deriving Abstraction Networks.
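The distinction between a tree and a DAG with multiple parents can be checked mechanically. The sketch below assumes the hierarchy is given as a hypothetical dictionary mapping each class name to its direct superclasses and that it is already acyclic; only the two superclasses of cell motility named in the text are included, and the remaining entries are left empty for brevity.

```python
def classes_with_multiple_parents(parents):
    """Return the classes that have more than one distinct direct superclass."""
    return {cls for cls, ps in parents.items() if len(set(ps)) > 1}


def hierarchy_shape(parents):
    """Classify an (assumed acyclic) class hierarchy as a tree or a multi-parent DAG."""
    return "DAG" if classes_with_multiple_parents(parents) else "tree"


# Truncated GO excerpt: superclasses of the two parent classes are omitted here.
go_excerpt = {
    "cellular component movement": [],
    "locomotion": [],
    "cell motility": ["cellular component movement", "locomotion"],
}

shape = hierarchy_shape(go_excerpt)
multi = classes_with_multiple_parents(go_excerpt)
```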

2.3 Partial-Area Taxonomies

The Abstraction Network that we have used the most in our work on quality assurance has been the partial-area taxonomy in its different variations (see Table 1). In this Section, we briefly summarize the method of deriving partial-area taxonomies. However, it is not essential to understand the nuances of this method in order to comprehend the derivation of the structural meta-ontology. The latter is the central theme of this paper. The partial-area taxonomy derivation methodology is introduced to illustrate how an Abstraction Network summarizes the structure and content of an ontology. Specifically, Figure 2 illustrates the derivation of a restriction-defined partial-area taxonomy on a subset of classes from GO.

We define an area as the set of all classes that are explicitly defined or inferred to be the exact domain of a given set of object properties O. In the domain-defined partial-area taxonomy derivation methodology [7], object property domains are determined by analyzing each property’s domain axiom. This approach is in contrast to the restriction-defined methodology [11], where domains are determined according to which classes have a given object property in a restriction. The list of names of the object properties is used to name the area. We define an area taxonomy as an Abstraction Network where the nodes are areas connected by child-of links based on the underlying subclass hierarchy. Figure 2b shows the area taxonomy for the classes in Figure 2a.
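The grouping of classes into areas can be sketched as follows, in the restriction-defined style. This is an illustrative toy, not the published implementation: the class names `A` through `E`, the input dictionaries, and the function names are all hypothetical, and inheritance is modeled simply as the union of the properties of all ancestors.

```python
from collections import defaultdict


def property_sets(parents, own_props):
    """Map each class to its object properties, declared locally or inherited."""
    cache = {}

    def props(cls):
        if cls not in cache:
            result = set(own_props.get(cls, ()))
            for parent in parents.get(cls, ()):
                result |= props(parent)
            cache[cls] = frozenset(result)
        return cache[cls]

    return {cls: props(cls) for cls in parents}


def derive_areas(parents, own_props):
    """Group classes that share exactly the same set of object properties."""
    areas = defaultdict(set)
    for cls, prop_set in property_sets(parents, own_props).items():
        areas[prop_set].add(cls)
    return dict(areas)


# Hypothetical hierarchy: D has two parents and inherits properties from both.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}
own_props = {"B": ["has part"], "C": ["part of"]}

areas = derive_areas(parents, own_props)
```

Each key of `areas` is the property set that names an area, and each value is the set of classes the area summarizes; here the class `D` lands in the area named by both properties because it inherits one from each parent.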

We define a root of an area as a class that has no superclasses in the same area. An area may have more than one root. A root of an area defines a partial-area as a set of classes that includes the root and all its descendant classes in the area. Partial-areas are connected by child-of links derived from the underlying subclass relationships. Specifically, a partial-area A is child-of partial-area B if a parent of A’s root resides in B. The number of classes (including the root) in each partial-area is shown in parentheses. A partial-area taxonomy is an Abstraction Network where the nodes are partial-areas connected by child-of links. Figure 2c shows the partial-area taxonomy for the classes in Figure 2a.
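The root and partial-area definitions above translate into a short traversal. The following sketch assumes a hypothetical area given as a set of class names and a hypothetical `parents` dictionary; the names `R1`, `R2`, `X`, `Y`, `Z`, and `Top` are made up for illustration.

```python
def roots_of(area, parents):
    """Classes in the area that have no superclass inside the same area."""
    return {c for c in area if not any(p in area for p in parents.get(c, ()))}


def partial_areas(area, parents):
    """Map each root to the set containing the root and its descendants in the area."""
    # Build child links restricted to classes inside the area.
    children = {}
    for cls in area:
        for parent in parents.get(cls, ()):
            if parent in area:
                children.setdefault(parent, set()).add(cls)

    result = {}
    for root in roots_of(area, parents):
        members, stack = set(), [root]
        while stack:
            node = stack.pop()
            if node not in members:
                members.add(node)
                stack.extend(children.get(node, ()))
        result[root] = members
    return result


# Hypothetical area with two roots; "Top" lies outside the area.
area = {"R1", "R2", "X", "Y", "Z"}
parents = {"R1": ["Top"], "R2": ["Top"], "X": ["R1"], "Y": ["R1", "R2"], "Z": ["R2"]}

pareas = partial_areas(area, parents)
sizes = {root: len(members) for root, members in pareas.items()}
```

The `sizes` dictionary corresponds to the class counts that label each partial-area, with the root included in the count.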

Partial-areas are not necessarily disjoint [40]. A class may be in multiple partial-areas if it is in the subhierarchies of multiple roots in an area. If multiple partial-areas contain the same class they are referred to as overlapping partial-areas and the class(es) that are summarized by the multiple partial-areas are called overlapping classes. As described in Section 2.4.1, overlapping classes have been found to be more likely to have errors than non-overlapping classes.
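Given the partial-areas of an area, overlapping classes are those that appear in more than one of them. A minimal sketch, assuming the partial-areas are supplied as a hypothetical dictionary from root name to member set:

```python
from collections import Counter


def overlapping_classes(partial_areas):
    """Return the classes that are summarized by more than one partial-area."""
    counts = Counter(cls for members in partial_areas.values() for cls in members)
    return {cls for cls, n in counts.items() if n > 1}


# Hypothetical partial-areas: class Y descends from both roots R1 and R2.
example = {
    "R1": {"R1", "X", "Y"},
    "R2": {"R2", "Y", "Z"},
}

overlaps = overlapping_classes(example)
```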

The partial-area taxonomy derivation methodology described above utilizes only object properties. However, it is also possible to derive partial-area taxonomies using data properties. Data properties are similar to object properties except that their ranges are literal values (e.g., strings or numbers). Since the partial-area taxonomy derivation methodology defined above only considers the domains of object properties, the same methodology can be applied to data properties.

For ontologies with no (or few) object properties but with data properties, it is possible to derive data-property-based partial-area taxonomies. For example, the WikiPathways ontology [45] contains only data properties. While it is not possible to derive an object-property–based partial-area taxonomy for WikiPathways, one can derive a data-property-based partial-area taxonomy with eight areas and sixteen partial-areas, summarizing WikiPathways’ 42 classes.

2.4 Ontology Quality Assurance Methodologies

Quality assurance is an important part of an ontology’s life cycle [9]. In a special journal issue focused on the topic [46], Zhu et al. [47] review various quality assurance methodologies that are applicable to ontologies. Many of these methodologies were designed for larger ontologies, such as SNOMED CT, NCIt, and GO. Rector et al. [48, 49] described structure-based QA approaches for SNOMED CT. Mortensen et al. [50] discuss a crowdsourced methodology for verifying the correctness of hierarchical relationships in SNOMED CT and report that the crowd was almost as effective as a domain expert at uncovering incorrect relationships. Smith et al. [51] investigated GO’s adherence to formal ontology design principles and discussed various errors in GO’s modeling. Similarly, Ceusters et al. [52] looked at the NCIt’s adherence to ontological principles.

In particular, it is interesting to note the QA techniques used internally by some ontology curators. For example, Baorto et al. [53] discuss the maintenance of the Medical Entities Dictionary (MED), an ontology developed by Columbia Presbyterian Medical Center with over 100,000 classes. De Coronado et al. [54] discuss the quality assurance methodologies that are applied to NCIt. Gu et al. [55] discuss structural methods of auditing relationships in the Foundational Model of Anatomy (FMA).

The NCBO BioPortal hosts over 450 biomedical ontologies, including many highly used ontologies like SNOMED CT, NCIt, LOINC, FMA, NDF-RT, and ChEBI. Its purpose is similar to other ontology repositories, like the OBO Foundry [1] and Ontobee [56]. BioPortal is similar in scale to the National Library of Medicine’s Unified Medical Language System (UMLS) [57], which integrates over 160 biomedical ontologies into one consistent terminological system.

In previous research, we designed Abstraction-Network-based QA techniques to address ontology integration errors in the UMLS (see [58], [59]). For example, Gu et al. [58] showed that small sets of UMLS concepts with the same set of multiple semantic types were prone to having more errors. A thorough analysis of different kinds of inconsistent semantic type errors appears in He et al. [59]. Another example of UMLS integration errors is the appearance of directed cycles in the UMLS, even though no source ontology contains a directed cycle. For details see Bodenreider et al. [60] and Halper et al. [61].

BioPortal hosts each ontology separately. It does not combine classes from multiple ontologies into one terminological system (though there are cross maps between many ontologies). In contrast to our QA methods for the UMLS, which uncovered integration errors, in our current research we aim to identify QA techniques that are applicable to individual, but structurally similar, ontologies. Our goal is to improve the quality of individual ontologies using this approach. One example of such a QA technique is the review of overlapping classes in a partial-area taxonomy for an ontology (see below).

2.4.1 Abstraction-Network-based Quality Assurance

In previous studies we have developed various ontology QA methodologies using Abstraction Networks. These methodologies are focused on identifying Abstraction-Network-defined characteristics that indicate a set of classes has a higher likelihood of being erroneous as compared to classes that do not have the characteristic. If an ontology editor reviews the classes that exhibit one or more of these characteristics it is expected that more errors will be uncovered during the review than if other classes were reviewed. Table 2 lists six examples of characteristics that we have tested on different ontologies.

Table 2.

Six examples of Abstraction-Network-defined characteristics that indicate a higher error rate in classes

| # | Characteristic | Abstraction Network | Example Publications and Ontologies |
|---|---|---|---|
| 1 | Class in small partial-areas | Restriction-defined Partial-area Taxonomy | NCIt [9], SNOMED CT [62, 63] |
| 2 | Overlapping class | Restriction-defined Partial-area Taxonomy | GO [21], SNOMED CT [23, 64] |
| 3 | Class in small partial-areas that inherit all of their properties | Restriction-defined Partial-area Taxonomy | SNOMED CT [62] |
| 4 | Class in small disjoint partial-areas | Disjoint partial-area Taxonomy | SNOMED CT [64] |
| 5 | Class in large clusters that belong to many tribes | Tribal Abstraction Network | SNOMED CT [39] |
| 6 | Class in the root area that are deep in the class hierarchy | Area Taxonomy | GO [21] |

Consider the characteristic of classes in small partial-areas being more erroneous. In NCIt’s Biological process hierarchy we found that classes in partial-areas of size three or less were 2.63 (12.2% vs. 4.63%) times more likely to be in error than other classes (Table 1 in Min et al. [9]). In SNOMED CT’s Specimen hierarchy classes in partial-areas of size seven or less had an error rate 1.57 (10.7% vs. 6.8%) times higher than other classes (Table 2 in Halper et al. [62]). In SNOMED CT’s Procedure hierarchy the error rate was 1.75 (14.4% vs. 8.2%) times higher (Table 4 in Ochs et al. [63]). We hypothesize that these classes are more prone to errors because they represent a unique structure and semantics relative to other classes in the ontology.

Overlapping classes have been shown to be particularly prone to having a high error rate. Overlapping classes inherit the semantics of multiple roots; thus, they are more complex than non-overlapping classes. In SNOMED CT’s Specimen hierarchy overlapping classes were 1.89 (55% vs. 29%) times more likely to be in error than non-overlapping classes (Table 4 in Wang et al. [23]). In SNOMED CT’s Bleeding subhierarchy overlapping classes had an error rate 4.33 (39% vs. 9%) times higher than non-overlapping classes (Table 1 in Ochs et al. [64]). Using a disjoint partial-area taxonomy [40], we found that classes in small disjoint partial-areas in the Bleeding subhierarchy were 13.4 (37.3% vs. 2.78%) times more likely to be in error than other classes (Table 1 in Ochs et al. [64]).
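Each "times more likely" figure quoted above is the ratio of the error rate in the flagged group to the error rate in the comparison group. The sketch below recomputes these ratios from the rounded percentages given in the text; the function name and the short labels are hypothetical. Because the inputs are rounded percentages, a recomputed ratio can differ from the reported one in the last digit.

```python
def relative_error_rate(flagged_pct, other_pct):
    """Ratio of the error rate in the flagged group to that in the comparison group."""
    return flagged_pct / other_pct


# (label, flagged %, comparison %, ratio reported in the text)
reported = [
    ("NCIt Biological process, small partial-areas", 12.2, 4.63, 2.63),
    ("SNOMED CT Specimen, small partial-areas", 10.7, 6.8, 1.57),
    ("SNOMED CT Procedure hierarchy", 14.4, 8.2, 1.75),
    ("SNOMED CT Specimen, overlapping classes", 55.0, 29.0, 1.89),
    ("SNOMED CT Bleeding, overlapping classes", 39.0, 9.0, 4.33),
    ("SNOMED CT Bleeding, small disjoint partial-areas", 37.3, 2.78, 13.4),
]

for label, flagged, other, stated in reported:
    ratio = relative_error_rate(flagged, other)
    print(f"{label}: recomputed {ratio:.3f}, reported {stated}")
```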

By reviewing sets of classes with relatively high error rates we have uncovered various kinds of structural errors in ontology content (e.g., incorrect superclasses, incorrect restrictions, and missing restrictions). Table 3 provides five examples of errors found in our various Abstraction-Network-based QA studies.

Table 3.

Five examples of ontology errors uncovered using Abstraction-Network-based QA methodologies

| Ontology | Error Type | Characteristic | Error Description |
|---|---|---|---|
| NCIt | Redundant restriction | Class in small partial-area | Class Phagocytosis had two restrictions with the object property has associated location and the ranges of Cell and Phagocytic cell. The restriction with Cell was redundant, as Cell is more general than Phagocytic cell. [9] |
| SNOMED CT | Incorrect superclass | Class in small partial-area | The superclass Specimen from lung obtained by biopsy was incorrect. The correct superclass was Specimen obtained by fine needle aspiration procedure. [62] |
| GO | Incorrect superclass | Overlapping class | The DNA damage response, detection of DNA superclass of stimulus involved in DNA replication checkpoint was incorrect. [21] |
| SNOMED CT | Incorrect restriction | Overlapping class | The associated morphology restriction on the class Peptic ulcer with hemorrhage AND obstruction should have a range of Bleeding ulcer instead of Hemorrhage. [64] |
| SNOMED CT | Missing superclass | Overlapping class | The class Serum specimen from blood product was missing the superclass Blood specimen from blood product. [23] |

We have also shown that irregularities in an Abstraction Network often indicate errors in the underlying ontology. We have [7, 1113] uncovered errors in smaller ontologies such as the Ontology of Clinical Research (OCRe) [17], Sleep Domain Ontology (SDO) [18], Cancer Chemoprevention Ontology (CanCo) [20], and Drug Discovery Investigations Ontology (DDI) [19] by identifying irregularities in the partial-area taxonomies for these ontologies. For example, using a partial-area taxonomy for OCRe we identified 33 statistical classes that were erroneously inferred as belonging to the Entity subhierarchy [7].

Using the structural meta-ontology described in this paper, we aim to enable Abstraction-Network-based QA methodologies that are applicable to ontologies of all sizes. The structural meta-ontology will allow us to identify which Abstraction Networks are applicable to which ontologies and, thus, which characteristics should be investigated.

2.5 Prior Work on Families of Ontologies

Ontologies can be categorized in different ways. BioPortal categorizes ontologies based on their subject domain (e.g., there are 54 anatomy ontologies) and release format (e.g., 97 OBO format ontologies). In our research we are interested in classifying ontologies into families according to their structure.

We define a family of structurally similar ontologies as a set of ontologies satisfying some overarching condition regarding their structural features [13]. Preliminary research on families of structurally similar ontologies was presented by He et al. [13]. However, that work was limited, in that it assumed that families of structurally similar ontologies have to be disjoint. In this paper, we overcome this restriction. For comparison with the results described in this paper we mention some of the outcomes of this preliminary study.

He et al. [13] reported on seven disjoint families for 186 BioPortal ontologies. These included (a) ontologies where object properties have domains defined and are not used in class restrictions (19 out of 186); (b) ontologies where object properties are assigned a domain and object properties are used in class restrictions (62 of 186), and (c) ontologies without object properties but with data properties (3 of 186).

The Cancer Research and Management ACGT ontology (ACGT) [65] has 169 object properties with assigned domains, 46 object properties used in restrictions, and 10 data properties with assigned domains. In a preliminary study by He et al., ACGT was only in the family of ontologies “where object properties are assigned a domain and object properties are used in class restrictions.” By dropping the requirement that families have to be disjoint, we gain the advantage that we can view ACGT as belonging to several families. This will be shown as advantageous for identifying appropriate quality assurance methodologies for ACGT.

3. Methods

Due to the complexity of the method for deriving a structural meta-ontology that covers a good portion of the BioPortal ontologies, we will describe the process in two steps. In the first step, we will derive a structural meta-ontology with two levels. Then we will discuss the general case, which considers any number of structural features. Following presentation of the general case methodology, we will present a three-level structural meta-ontology in Section 4 (Results).

3.1 Deriving a Structural Meta-ontology

Recall that a structural meta-ontology is an ontology that classifies the ontologies in a given family of ontologies according to their structure. The classes of a structural meta-ontology (which we refer to as meta-classes) categorize ontologies into structurally similar families based on their structural features. Every family corresponds to exactly one meta-class in the structural meta-ontology. The structural meta-ontology derivation methodology utilizes combinations of the existence (or absence) of a set of structural features to define its classes. We will now formalize this idea.

Given a set of ontologies O and a set F = {f1, f2, f3, … , fk} of k structural features, the structural meta-ontology for O is organized into k+1 levels of meta-classes, L0 through Lk, where Level i, Li, 1 ≤ i ≤ k, is based on the combination of the existence or nonexistence of i structural features. At the level L0 we define a single root meta-class named Ontology that represents every ontology in O. All meta-classes in the structural meta-ontology are descendants of Ontology.

In Figure 3, we illustrate the derivation of the first two levels of the structural meta-ontology for a set of 373 BioPortal ontologies. The two structural features object properties (OPs) and data properties (DPs) are used to illustrate the derivation process (i.e., F = {object property, data property}). Each meta-class in Figure 3 is labeled with the (abbreviated) name of a combination of structural features, followed by the number of BioPortal ontologies in parentheses. For example, ∃ OP (279) means 279 BioPortal ontologies have at least one object property.

Figure 3.

(a) The first two levels of the structural meta-ontology. (b) The classes of Level 1 refined based on their usage. (c) The complete structural meta-ontology for the BioPortal ontologies with F = {object property, data property}.

Figures 3a, 3b, and 3c display various levels of the structural meta-ontology. Every meta-class at Level 1, as shown in Figure 3a, describes one structural feature.

In particular, level L1 describes how meta-classes broadly categorize ontologies based on the existence or non-existence of each structural feature. For each structural feature fi, two meta-classes are created, “∃fi” and “¬∃fi”, which are disjoint and represent existence and non-existence of structural feature fi, respectively. Therefore, L1 initially contains exactly 2k classes (for Figure 3, k=2). Every ontology in O will be an instance of exactly one class of each of the k disjoint pairs. The sum of instances for each disjoint pair is equal to the number of BioPortal ontologies in O. For convenience, each class is identified by a pair of numbers i-j. In this pair, j stands for the j-th class in level i. For example, 1–1 refers to ∃OP. Subclass relationships between classes are represented using upward directed arrows colored according to the superclass. In Figure 3(a), Ontology has two disjoint pairs of subclasses: (∃OP, ¬∃OP) and (∃DP, ¬∃DP). The numbers at each such pair add up to 373, the total number of ontologies in O, as follows: 279+94 = 373; 250+123 = 373.
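
The Level 1 construction above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy ontologies and the function name are hypothetical, and a feature is treated as present only when it is usable (assigned a domain or used in a restriction). It checks the stated invariant that each disjoint pair of classes partitions O.

```python
from collections import Counter


def level1_classes(present: set, features: list) -> list:
    """Return one Level 1 meta-class label per structural feature."""
    return [("∃" if f in present else "¬∃") + f for f in features]


FEATURES = ["OP", "DP"]  # object properties, data properties (k = 2)

# Hypothetical toy set O: ontology name -> structural features it exhibits.
toy_ontologies = {
    "A": {"OP", "DP"},
    "B": {"OP"},
    "C": set(),
    "D": {"DP"},
    "E": {"OP"},
}

counts = Counter(
    label
    for present in toy_ontologies.values()
    for label in level1_classes(present, FEATURES)
)

# Each disjoint pair (∃f, ¬∃f) must account for every ontology in O.
for f in FEATURES:
    print(f, counts["∃" + f], "+", counts["¬∃" + f], "=", len(toy_ontologies))
```

The same check applied to the counts in Figure 3(a) gives 279+94 = 373 and 250+123 = 373.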

We note that an ontology may contain object properties or data properties without assigning them to a domain or using them in a restriction. Such an ontology is categorized into ¬∃OP or ¬∃DP, respectively, since these properties cannot be used in Abstraction Network derivation.

Object properties and data properties can be utilized in several different ways when developing an ontology. Meta-ontology classes that identify the existence of a structural feature can be refined by adding subclasses that describe the utilization of the structural feature. In the second row of Level 1 in Figure 3b, the L1 class ∃OP is first refined into two non-disjoint subclasses that represent object properties having explicitly assigned domains and object properties that are used in restrictions, respectively. These meta-classes are named ∃OP with Domain and ∃OP in Restriction, respectively. The meta-class ∃DP is similarly refined, since data properties can also be used in these two ways.

In the third row of L1 in Figure 3b, these classes are further refined into three disjoint subclasses: ∃ only OP with Domain, ∃ only OP in Restriction, and ∃ OP with Domain & ∃ OP in Restriction. We repeat this process for ∃DP. Note that a meta-class without such a feature (e.g., ¬∃OP) appears in row 1 of L1 in Figure 3b and is not further refined in L1. This refinement provides the most complete picture.
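
The usage-based refinement can be sketched as a simple decision over two Boolean observations per property type. The function name and argument names below are illustrative assumptions; the labels follow the meta-class names used in the text.

```python
def refine_usage(prop: str, has_domain: bool, in_restriction: bool) -> str:
    """Map the usage of a property type ('OP' or 'DP') to its disjoint leaf meta-class."""
    if has_domain and in_restriction:
        return f"∃ {prop} with Domain & ∃ {prop} in Restriction"
    if has_domain:
        return f"∃ only {prop} with Domain"
    if in_restriction:
        return f"∃ only {prop} in Restriction"
    # Properties that are neither assigned a domain nor used in a restriction
    # cannot be used for Abstraction Network derivation.
    return f"¬∃ {prop}"


# ACGT has 169 object properties with assigned domains and 46 used in restrictions.
acgt_op_class = refine_usage("OP", has_domain=169 > 0, in_restriction=46 > 0)
print(acgt_op_class)
```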

Every meta-class at Level 2 describes a unique combination of two structural features. Each meta-class in L2 is a subclass of exactly two leaf classes of different structural features from L1, as shown in Figure 3b. In the structural meta-ontology of Figure 3c, L2 is the lowest level because this meta-ontology was created using only two structural features (object properties and data properties). These classes categorize the existence and usage of object properties and data properties in a set of ontologies. For example, the L2 class 2–4 ∃ OP only in Restrictions, ¬∃ DP categorizes the 88 BioPortal ontologies that have object properties that are used only in class restrictions and have no data properties. Its two parents are 1–11 ∃ OP only in Restrictions and 1–3 ¬∃ DP.

Similarly, the meta-class 2–11 ∃ OP with Domain & ∃ OP in Restriction & ∃ DP with Domain & ∃ DP in Restriction categorizes the 32 BioPortal ontologies that have object properties and data properties and some of the object properties and data properties are assigned domains and some are used in restrictions. Its two parents are 1–10 ∃ OP with Domain & ∃ OP in Restriction and 1–13 ∃ DP with Domains & ∃ DP in Restrictions.

We now continue with the general case of k larger than 2. A structural meta-ontology with k=3 is shown in Figure 4, but the method applies to any k > 2.

Figure 4.

The BioPortal structural meta-ontology for 373 BioPortal ontologies with F = {object properties, data properties, hierarchy type}. For readability, only an excerpt of subclass relationships is shown, for the classes with a relatively large number of member ontologies.

The meta-ontology classes at Level j, Lj (j ≤ k), are combinations of j features. Each Lj class C, representing the set of j structural features in the ontologies that belong to the class C, is a subclass of j classes from the level Lj-1 above it. The last level, Lk, is the most specific categorization in the structural meta-ontology. The classes at this level are disjoint and exhaustive (i.e., every ontology is an instance of exactly one class in Lk). However, the classes of L2 to Lk-1 are not necessarily disjoint.
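
As a rough check on the size of such a meta-ontology, the number of classes per level can be counted from the number of leaf classes of each feature. The sketch below is illustrative only. It assumes, following the refinement described for Figure 3b, four leaf classes for each property feature (only with domain, only in restriction, both, and non-existence) and seven Level 1 classes in total for each, plus two classes (Tree, DAG) for hierarchy type. Under these assumptions the total agrees with the 81 classes of the three-feature meta-ontology.

```python
from itertools import combinations
from math import prod


def classes_per_level(leaf_counts: dict, level1_counts: dict) -> dict:
    """Count meta-classes per level; Level j >= 2 combines leaf classes of j distinct features."""
    k = len(leaf_counts)
    levels = {0: 1, 1: sum(level1_counts.values())}  # Level 0 is the root, Ontology
    for j in range(2, k + 1):
        levels[j] = sum(
            prod(leaf_counts[f] for f in combo)
            for combo in combinations(leaf_counts, j)
        )
    return levels


# Assumed counts (see lead-in): leaf classes and all Level 1 classes per feature.
leaf_counts = {"OP": 4, "DP": 4, "hierarchy": 2}
level1_counts = {"OP": 7, "DP": 7, "hierarchy": 2}

levels = classes_per_level(leaf_counts, level1_counts)
total = sum(levels.values())
print(levels, total)
```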

By picking different sets of structural features, F, it is possible to derive different structural meta-ontologies for O (see Discussion). Different structural meta-ontologies will focus on different structural aspects of the ontologies in O. More structural features can be considered or removed from consideration to control the number of classes in the structural meta-ontology. Figure 4 shows a structural meta-ontology derived using F={object properties, data properties, hierarchy type}.

To obtain a compact, simplified view of all of the structural meta-ontology classes a given ontology is a member of, one can select the single disjoint Lk class the ontology is classified into and extract that class’ ancestor hierarchy. The resulting subhierarchy illustrates the various families the ontology is a member of. In Figure 5, we illustrate such a subhierarchy for Family 3–27.

Figure 5.

The structural meta-ontology subhierarchy consisting of Family 3–27 and its ancestor classes.
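
Extracting the ancestor hierarchy of an Lk class can be sketched by enumerating every smaller combination of its leaf-level feature values. The function below is a hypothetical illustration; for brevity it lists only the leaf-derived ancestors at each level and omits the broader Level 1 classes (e.g., ∃OP) and the root Ontology.

```python
from itertools import combinations


def ancestor_leaf_combinations(leaf_class: dict) -> dict:
    """Return, per level j < k, the meta-classes formed by j of the k feature values."""
    features = list(leaf_class)
    result = {}
    for j in range(len(features) - 1, 0, -1):
        result[j] = [
            " & ".join(leaf_class[f] for f in combo)
            for combo in combinations(features, j)
        ]
    return result


# Family 3–27, written as one leaf value per structural feature.
family_3_27 = {
    "hierarchy": "DAG",
    "OP": "∃ OP with Domain & ∃ OP in Restriction",
    "DP": "∃ DP with Domain & ∃ DP in Restriction",
}

ancestors = ancestor_leaf_combinations(family_3_27)
for level, names in ancestors.items():
    print(f"Level {level}:")
    for name in names:
        print("  ", name)
```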

3.2 Structural Meta-ontology in Support of Family-based Ontology QA

The goal of our work is to develop QA methodologies that are applicable to sets of structurally similar ontologies. This objective is accomplished by: (1) identifying families of structurally similar ontologies for which we can uniformly derive Abstraction Networks algorithmically and (2) finding characteristics for sets of classes within each ontology of a family that are known to indicate a higher probability of error. Examples of such characteristics include overlapping classes [23] and classes summarized by small partial-areas [62]. Overlapping classes are classes that belong to more than one partial-area in a partial-area taxonomy. Small partial-areas are partial-areas which summarize relatively few classes, where the bound for “small” depends on the specific ontology.

When examining GO and SNOMED CT, we have found that overlapping classes are more likely to have errors than are non-overlapping classes [10, 21, 23, 64]. Similarly, in NCIt and SNOMED CT, we have found that classes summarized by small partial-areas are more likely to have errors than are classes summarized by large partial-areas [9, 62, 63]. Following the family-based QA approach, one would expect that, once such characterizations are shown to imply more errors for a sufficient number of ontologies in a family (e.g., overlapping classes in GO and SNOMED CT), these characteristics should also apply to most of the remaining ontologies in the family. However, before one can investigate this phenomenon, one must identify which ontologies belong to the same family. The structural meta-ontology supports this family-based QA paradigm.

The applicability of the family-based QA approach is affected by the number of ontologies that belong to a given family. For a small family, there will be no benefits or limited benefits from applying the family-based QA approach. For example, any QA technique developed for a family consisting of only three ontologies will have to be confirmed for all three ontologies, but then no other ontologies exist to which the same technique can be applied.

While determining an optimal minimal family size is a topic of future research (see Section 5), one indicator that can be used to create a lower bound is based on estimating the success of a given Abstraction-Network-based characteristic within a family of ontologies. We hypothesize that, within a family of ontologies, classes with a certain characteristic (e.g., overlapping classes) will have statistically significantly higher error rates than classes that do not have that characteristic (e.g., non-overlapping classes). This will be a subject of future work.

However, the question is, for how many ontologies in a family does a given Abstraction-Network-defined characteristic have to be successful to conclude that it is likely going to be successful for most other ontologies of the family? The size of a structural meta-ontology family should be above such a threshold. Success is defined by classes with a certain characteristic having a statistically significant higher error rate than other classes. Since success or failure is binary, we calculated a confidence interval (CI) for a binomial distribution [66]. Six out of six successes in a row produces a 95% confidence interval of 0.54 to 1.00 for the underlying rate (using exact central CIs [67]), implying that the characteristic will be true, on average, for a majority of the remaining ontologies in the family. Alternatively, success for eight out of nine ontologies will suffice (95% CI 0.52 to 1.00).
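
The two intervals quoted above can be reproduced with a short standard-library sketch. It assumes that "exact central CI" refers to the Clopper–Pearson interval and solves for the bounds by bisection on the binomial tail probability; the function names are illustrative, and this is not the authors' code.

```python
from math import comb


def binom_tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); increasing in p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def solve_increasing(f, target: float, iters: int = 100) -> float:
    """Find p in [0, 1] with f(p) = target by bisection, for increasing f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(successes: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact central (Clopper-Pearson) confidence interval for a binomial rate."""
    # Lower bound: the p for which P(X >= successes) equals alpha/2.
    lower = 0.0 if successes == 0 else solve_increasing(
        lambda p: binom_tail_ge(successes, n, p), alpha / 2
    )
    # Upper bound: the p for which P(X <= successes) equals alpha/2.
    upper = 1.0 if successes == n else solve_increasing(
        lambda p: binom_tail_ge(successes + 1, n, p), 1 - alpha / 2
    )
    return lower, upper


for s, n in [(6, 6), (8, 9)]:
    low, high = exact_ci(s, n)
    print(f"{s}/{n}: 95% CI {low:.2f} to {high:.2f}")
# prints "6/6: 95% CI 0.54 to 1.00" and "8/9: 95% CI 0.52 to 1.00"
```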

Conducting six or nine characteristic QA evaluation studies takes considerable time and effort. The process of investigating whether a given characteristic works for an ontology involves one or more domain experts reviewing hundreds of classes [68]. Thus, a family should be of sufficient size to make the required evaluation effort worthwhile. For example, if a family has twenty ontologies, then after six successful studies have been conducted there is over a 50% chance that a characteristic will work for the other 14 ontologies in the family. This would indicate a savings of up to 70% (=14/20) versus studying every ontology in the family, as was done in the one-at-a-time ontology QA framework.

On the other hand, families that are too large pose a risk. Anecdotally, the member ontologies of a very large family are likely to vary widely. While they may share a common set of structural features, other factors may affect the applicability of the family-based QA approach to such a large family. Some member ontologies of such a family may have relatively few instances of a given structural feature. For example, there may be too few data properties with an explicitly defined domain, and thus it may not be possible to derive an Abstraction Network that adequately supports QA. Examples of such cases will be demonstrated in the Results section.

4. Results

In April 2015, we collected the then most recent release of every ontology hosted in BioPortal (439 total). We considered all ontologies that could be parsed by the OWL API [69] (373 total, 84.4% of BioPortal ontologies), including OWL ontologies [26], OBO format ontologies [1], and RDF/TTL format ontologies [70]. The 373 ontologies in our sample were converted into OWL format using Protégé [71]. When appropriate, an ontology was classified using the HermiT reasoner [72].

In our study, we analyzed three structural features—object properties, data properties, and hierarchy type— due to their importance for Abstraction Network derivation [7, 11, 39, 40]. Table 4 shows the number of ontologies that exhibit a particular structural feature. For object properties and data properties, we also looked at how many ontologies assigned properties to specific domains and how many used properties in class restrictions (some use both).

Table 4.

Commonality of the three structural features used to derive the structural meta-ontology.

Structural Feature (Usage) # of Ontologies % (n=373)
Directed Acyclic Graph 228 61.1%
Tree 145 38.9%
Object property 279 74.8%
Object property (Defined domain) 185 49.6%
Object property (Restriction) 224 60.1%
Data property 123 33.0%
Data property (Defined domain) 117 31.4%
Data property (Restriction) 39 10.5%

Note: The first column shows the structural features considered in this study, with the usage of the structural feature in parentheses. The second column indicates the total number of ontologies that exhibit the particular structural feature and usage. The third column provides the percentage of ontologies in the study that have the feature (# of Ontologies / 373).

After the structural features for each ontology were found, we derived a structural meta-ontology with F={object properties, data properties, hierarchy type} to classify each of the 373 BioPortal ontologies into families. The resulting structural meta-ontology is similar to the one shown in Figure 3c, derived using F={object properties, data properties}, except it now includes information on the structure of the subsumption hierarchy. For example, the first level of the new BioPortal structural meta-ontology has one additional pair of disjoint L1 classes: Tree and Directed Acyclic Graph (DAG). Specifically, the class Tree represents the set of all ontologies in which each class, except the root, has exactly one superclass (i.e., no multiple inheritance). The class DAG contains all other ontologies. No additional L1 classes were used to refine DAG or Tree based on usage.
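
The Tree/DAG distinction can be sketched as a check over a map from each class to its direct superclasses. This is a minimal illustration under stated assumptions: the hierarchy is already computed (e.g., after classification), it is acyclic, and it has a single known root; the function name and the toy hierarchies are hypothetical.

```python
def hierarchy_type(parents: dict, root: str) -> str:
    """Return 'Tree' if every class except the root has exactly one superclass, else 'DAG'."""
    for cls, superclasses in parents.items():
        if cls != root and len(superclasses) != 1:
            return "DAG"
    return "Tree"


# Hypothetical toy hierarchies: class -> set of direct superclasses.
tree_like = {"Thing": set(), "A": {"Thing"}, "B": {"A"}, "C": {"A"}}
dag_like = {"Thing": set(), "A": {"Thing"}, "B": {"Thing"}, "C": {"A", "B"}}

print(hierarchy_type(tree_like, root="Thing"))
print(hierarchy_type(dag_like, root="Thing"))
```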

Figure 4 shows the BioPortal structural meta-ontology derived using F={object properties, data properties, hierarchy type}. The structural meta-ontology consists of 81 classes organized into three levels. A total of 51 (=63.0%) of the classes on Levels 2 and 3 have instance ontologies. Unlike in Figure 3, in Figure 4 we do not show any structural meta-ontology classes that have no instance ontologies (e.g., 2–13 ¬ ∃OP & ∃ only DP in restrictions, shown in Figure 3b). Additionally, only an excerpt of subclass relationships is shown from the classes with relatively many instance ontologies.

Thirty-four (67%=34/51) of the structural meta-ontology’s Level 2 and 3 classes that have instance ontologies have more than ten member ontologies. Table 5 shows five Level 3 classes along with a few member ontologies from each family. Table 6 lists all of the members of Families 3–10 and 3–12, which both have eleven member ontologies.

Table 5.

Example ontologies from five Level 3 BioPortal structural meta-ontology families

Structural Meta-ontology Class | # of Ontologies (%) | Example Member Ontologies
3–6. Tree & ∃ only OP with Domains & ∃ only DP with Domains | 18 (4.83%) | Animal Natural History and Life History Ontology (ADW), Dermatology Lexicon (DERMLEX), International Classification of Functioning, Disability and Health (ICF)
3–16. Tree & ¬∃ OP & ¬∃ DP | 61 (16.4%) | Basic Formal Ontology (BFO), Clinical Trials Ontology (CTO)
3–20. DAG & ∃ only OP in Restrictions & ¬∃ DP | 61 (16.4%) | Chemical Entities of Biological Interest Ontology (ChEBI), Gene Ontology (GO), SNOMED CT, Uber Anatomy Ontology (UBERON), Zebrafish Anatomy and Development Ontology (ZFA)
3–26. DAG & ∃ OP with Domain & ∃ OP in Restrictions & ∃ only DP with Domains | 32 (8.58%) | Drug Interaction Knowledge Base Ontology (DIKB), Foundational Model of Anatomy (FMA), Ontology for Drug Discovery Investigations (DDI), NanoParticle Ontology (NPO)
3–32. DAG & ¬∃ OP & ¬∃ DP | 30 (8.04%) | Human Disease Ontology (DOID), Human Phenotype Ontology (HP), Vertebrate Trait Ontology (VT)

Table 6.

The member ontologies of Family 3–10 and Family 3–12.

Family 3–10: Tree & ∃ OP with Domain & ∃ OP in Restriction & ∃ only DP with Domain | Family 3–12: Tree & ∃ OP with Domain & ∃ OP in Restriction & W/O DP
Bioinformatics Web Service Ontology (OBIWS) | Fire Ontology (FIRE)
Common Terminology Criteria for Adverse Events (CTCAE) | Emotion Ontology (MFOEM)
Diagnostic Ontology (DIAGONT) | Mental Functional Ontology (MF)
Epoch Clinical Trial Ontology (CTONT) | NMR-Instrument Component of Metabolomics Investigations Ontology (NMR)
Information Artifact Ontology (IAO) | Ontology of Genes and Genomes (OGG)
Non-Randomized Controlled Trials Ontology (NONRCTO) | Ontology of Genes and Genomes – Mouse (OGG-MM)
Ontology for Genetic Interval (OGI) | Ontology for General Medical Science (OGMS)
Ontology for Newborn Screening Follow-up and Translational Research (ONSTR) | Ontology of Data Mining Investigations (OntoDM-KDD)
Parasite Experiment Ontology (PEO) | RNA Ontology (RNAO)
Semantic DICOM Ontology (SEDI) | Vaccine Ontology (VO)
Systems Chemical Biology and Chemogenomics Ontology (CHEMBIO) | XEML Environment Ontology (XEO)

The structural meta-ontology of Figure 4 provides a more complete and accurate classification of the BioPortal ontologies into families than our previously developed family-based classification [13]. We note that our previous family classification system can be directly mapped into the structural meta-ontology. For example, Family 2 in our previous classification [13] is equivalent to Family 1–11 in Figure 4 and Family 3 [13] is equivalent to Family 1–13 in Figure 4. One can see that the classifications provided by the structural meta-ontology have finer granularity and produce smaller families of ontologies which are more structurally similar. Let us consider the classifications of four ontologies: OCRe, ACGT, the Gene Ontology (GO), and the Human Phenotype Ontology (HP) [73].

Using our previous classification method, OCRe and ACGT belong to the family of ontologies that has object properties with explicitly defined domains and object properties used in restrictions (“Family 4” [13], equivalent to Family 1–12). As described in Section 3, this classification does not provide any information about their use of data properties or about how their class hierarchies are structured. Furthermore, the family consists of 62 (62/186=33%) of the ontologies analyzed in our sample. In the current study, the equivalent classification, Family 1–12, has 130 (130/373=35%) member ontologies. This family is most likely too large to effectively enable the family-based QA approach, as a high amount of variability is expected in the other structural features that are used in these ontologies and more refined classifications are needed.

In the structural-meta-ontology-based classification, OCRe and ACGT belong to several families, the most refined of which is Family 3–27 DAG & ∃ OP with Domain & ∃ OP in Restriction & ∃ DP with Domain & ∃ DP in Restriction. This family consists of only 27 (27/373=7%) of the ontologies in our sample that are very similar structurally. Family 2–11, Family 2–22, and Family 2–30, with 32 (8.6%), 104 (27%), and 29 (7.8%) member ontologies, respectively, provide less granular classifications for OCRe and ACGT. The least granular classifications for OCRe and ACGT can be found at Level 1 (e.g., Family 1–5, Family 1–12, and Family 1–15). Even though Family 2–22 is relatively large, it is a fairly cohesive family with two of the three structural features in common, rather than these three classes on Level 1. The various options of classification provided by the structural meta-ontology for OCRe and ACGT are illustrated by the subhierarchy in Figure 5.

Our prior classification of GO presented problems similar to those of the classification of OCRe and ACGT. GO was in the family of ontologies that has object properties used in class restrictions (“Family 3” [13], equivalent to Family 1–13), which had 69 (69/186=37%) member ontologies. Using the structural meta-ontology classification, the equivalent family has 94 (94/373=25%) member ontologies. Many OBO Format ontologies belong to this family, since the conversion from OBO Format to OWL converts OBO relationships into OWL restrictions [74]. It is therefore not surprising that this family is relatively large. However, the family is too broad for the family-based QA approach to be successful since it provides no information about the use of data properties and the structure of the class hierarchy. For example, the family contains both the Chemical Entities of Biological Interest Ontology (ChEBI) [75], which is represented by a DAG, and the BioMedBridges Diabetes Ontology (DIAB) [76], which has a simple tree structure—two ontologies with different class hierarchy structures.

The structural meta-ontology classifies GO into several families, the most specific of which is Family 3–20 DAG & ∃ only OP in Restriction & ¬∃ DP, along with 60 (60/373=16%) other ontologies. Family 3–20’s parent classes are Family 2–4, Family 2–18, and Family 2–32 with 88 (24%), 66 (18%), and 142 (38%) member ontologies, respectively. The first benefit of the Family 3–20 classification for GO is that it is a family that is proportionally much smaller than the original classification, better enabling the family-based QA approach. The second benefit is that it shows that there is the potential for overlapping classes in the restriction-defined partial-area taxonomies derived for this family’s member ontologies. In our review of GO, we found that overlapping classes are more likely to be erroneous [10] (see Section 2.4.1) and we expect to find similar results in many of the ontologies that are in this family.

As we see, the flexibility of the structural meta-ontology framework enables a user to explore various classification options for a given ontology and enables such a user to select a family with a higher chance of properly enabling a QA methodology. This is in contrast to the more rigid collection of disjoint families considered in our earlier work [13].

5. Discussion

The structural meta-ontology enables a flexible structural classification methodology for BioPortal’s ontologies. In comparison, our previously developed classification [13], which consisted of only seven disjoint families, provided a rigid classification, where each ontology can belong to only one family. The structural meta-ontology enables our family-based QA approach by providing several classification choices, as each ontology may belong to one of several families.

The families at Level 3 of the structural meta-ontology are more specific than those described in our initial study [13]. This results in families (1) that are typically smaller and (2) that contain ontologies that are significantly more structurally similar (i.e., they form a more cohesive family). These two properties are highly desired for supporting the family-based QA approach.

There are several significant issues with the classifications provided by the structural meta-ontology. First, we observe that certain families have few member ontologies. This situation occurs when only a few ontologies exhibit a certain combination of structural features. Such families do not support the family-based QA approach, as discussed in Section 3. At Level 3 in Figure 4, we observe seven families with fewer than ten member ontologies. For example, Family 3–17 DAG & ∃ only OP in Restriction & ∃ only DP in Restriction has only one member ontology: G Protein-Coupled Receptor BioAssays Ontology (BAO-GPCR) [77]. It is desirable to classify such an ontology into families that have more member ontologies. Our structural meta-ontology classifies ontologies into multiple families. If a family has only a few member ontologies then the more general classification(s) should be used. For example, Family 3–17’s superclasses are Family 2–4, Family 2–17, and Family 2–25, with 88, 28, and one member ontologies, respectively. One should use either of the first two families instead of Family 3–17, in covering the BAO-GPCR ontology.

Family 3–14 Tree & ¬∃ OP & ∃ only DP with Domain has only three member ontologies: the Cell Behavior Ontology (CBO), PAV Provenance, Authoring and Versioning (PAV), and WikiPathways. Looking at Family 3–14’s parent classes, Family 2–14 ¬∃ OP & ∃ only DP with Domain, Family 2–23 Tree & ¬∃ OP, and Family 2–27 Tree & ∃ only DP with Domain have 3, 64, and 32 member ontologies, respectively. Thus, it would be better to use either Family 2–23 or Family 2–27 to classify these three ontologies.
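
The fallback to a more general family can be sketched as a filter over the sizes of a family's parent classes. The cut-off of ten members used below is an assumption for illustration (the text treats families with fewer than ten members as small, but an optimal minimum size is left to future research), and the function name is hypothetical.

```python
def usable_parent_families(parent_sizes: dict, min_size: int = 10) -> list:
    """Return the parent families with at least min_size member ontologies."""
    return [family for family, size in parent_sizes.items() if size >= min_size]


# Member counts of the parent classes, as reported in the text.
parents_of_3_17 = {"2–4": 88, "2–17": 28, "2–25": 1}
parents_of_3_14 = {"2–14": 3, "2–23": 64, "2–27": 32}

print(usable_parent_families(parents_of_3_17))
print(usable_parent_families(parents_of_3_14))
```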

Another issue is that the derivation of a structural meta-ontology is based on only the existence of a given structural feature. Existence of a structural feature is typically a necessary but not sufficient condition for Abstraction Network derivation. The prevalence of a structural feature also plays a significant role in the resulting Abstraction Network. For example, OCRe is a member of Family 3–27, but it has only 3/343 (=0.8%) classes with multiple parents. A similar situation is found with the Population and Community Ontology (PCO) [78], which is a member of Family 3–20, but only 2/1546 (=0.1%) of its classes have a restriction. In addition to prevalence of a structural feature, the use of the structural feature within the ontology also plays an important role in Abstraction Network derivation. For example, NCIt is a member of Family 3–28, and it does have many object properties with explicitly defined domains, but their domains are all the same root class. NCIt’s Disease, Disorder, or Finding class is the domain of 27 object properties. In contrast, the design pattern used by NCIt uses object properties in restrictions throughout the various subhierarchies. Hence, for NCIt, we prefer to accentuate the feature of object properties used in restrictions, over the feature of object properties with defined domains.

For OCRe, the lack of a meaningful number of classes with multiple parents prevents the derivation of certain kinds of hierarchy-based Abstraction Networks (e.g., the disjoint partial-area taxonomy [40] and the Tribal Abstraction Network (TAN) [39]), which can be derived for other member ontologies of Family 3–27. For PCO, the lack of more classes with restrictions results in a restriction-defined taxonomy with only two partial-areas (one summarizing 1544 classes). The domain-defined partial-area taxonomies derived for many NCIt hierarchies (e.g., Disease, Disorder, or Finding) consist of a single partial-area, making them not useful for QA. In contrast, the restriction-defined partial-area taxonomies provide a more useful view for QA purposes.

These issues can be addressed in several ways. First, as with the issue of small families, one can use the more general classifications provided by the structural meta-ontology. Using this approach, one could classify OCRe according to the superclass of Family 3–27 that does not involve hierarchy type: Family 2–11. An alternative approach would involve using a threshold value during the structural meta-ontology derivation process. One could specify that, for example, at least 10% of the classes in an ontology must have multiple superclasses for the ontology to be classified as a DAG. A third possibility would involve automatically deriving Abstraction Networks using the various available structural features in a family, and automatically determining, using various heuristics, whether they provide a useful summarization of each ontology in the family. If a given Abstraction Network does not work for a member ontology (e.g., TANs for OCRe, restriction-defined partial-area taxonomies for PCO, and domain-defined partial-area taxonomies for NCIt), this information could be used to classify the respective ontologies more appropriately. In a future study, we will investigate each of these approaches and their impact on the family-based QA approach.
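The threshold-based alternative can be illustrated with a minimal Python sketch. It is not the derivation procedure used in this study; the function names, the `parents` mapping (class name to its set of direct superclasses), and the toy data are assumptions made only for illustration. The toy input mirrors OCRe's proportions (3 of 343 classes with multiple parents).

```python
from typing import Dict, Set


def multi_parent_fraction(parents: Dict[str, Set[str]]) -> float:
    """Return the fraction of classes with more than one direct superclass."""
    if not parents:
        return 0.0
    multi = sum(1 for supers in parents.values() if len(supers) > 1)
    return multi / len(parents)


def hierarchy_type(parents: Dict[str, Set[str]], threshold: float = 0.10) -> str:
    """Label the ontology a DAG only if the fraction reaches the threshold.

    A threshold of 0.0 reproduces the existence-only rule: a single class
    with multiple parents is enough to obtain the label "DAG".
    """
    fraction = multi_parent_fraction(parents)
    return "DAG" if fraction > 0 and fraction >= threshold else "Tree"


# Hypothetical toy input: 343 classes, 3 of which have two parents.
toy = {f"C{i}": {"Root"} for i in range(340)}
toy.update({f"M{i}": {"C0", "C1"} for i in range(3)})

print(hierarchy_type(toy))                 # thresholded rule (10%)
print(hierarchy_type(toy, threshold=0.0))  # existence-only rule
```

Under the 10% threshold the toy ontology is treated as a tree, whereas the existence-only rule labels it a DAG. The appropriate threshold value is an open question that the text leaves to future study.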

We observe that different portions of an ontology may be structured very differently. For example, eight of NCIt’s 20 subhierarchies do not have any classes with restrictions. A similar situation is observed in SNOMED CT, where only seven of its 19 hierarchies have attribute relationships (restrictions when SNOMED CT is represented in OWL). This situation is not surprising: given the size of these ontologies and the time span of their development, they were not necessarily modeled by the same editor or even by the same editorial team. While a classification provided by the structural meta-ontology may be correct for the entire ontology, it may not be correct for major subsets of the ontology. Large ontologies (e.g., NCIt, SNOMED CT, and FMA) have subhierarchies that are much larger than many of the BioPortal ontologies. When dealing with such large ontologies, one could classify each of their subhierarchies separately using a structural meta-ontology. Thus, the eight NCIt subhierarchies that do not have restrictions would be classified differently from the remaining subhierarchies that do, providing a more accurate picture of NCIt’s structure. Similarly, the 12 SNOMED CT hierarchies without attribute relationships (i.e., object properties) can be summarized by the Tribal Abstraction Network [39], while the seven hierarchies with attribute relationships can be summarized by partial-area taxonomies [8, 63, 64].
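A per-subhierarchy classification of this kind could be sketched as follows. This is an illustrative Python sketch only: the function name, the input format (subhierarchy root mapped to the number of its classes that have restrictions), and the example names and counts are hypothetical. The pairing of network types with the two cases follows the SNOMED CT example above.

```python
from typing import Dict


def suggest_networks(restricted_class_counts: Dict[str, int]) -> Dict[str, str]:
    """Suggest an Abstraction Network type for each subhierarchy.

    Subhierarchies with at least one restricted class are paired with a
    partial-area taxonomy; those without any are paired with a Tribal
    Abstraction Network (TAN).
    """
    suggestions = {}
    for root, count in restricted_class_counts.items():
        if count > 0:
            suggestions[root] = "partial-area taxonomy"
        else:
            suggestions[root] = "Tribal Abstraction Network"
    return suggestions


# Hypothetical subhierarchy names and counts, for illustration only.
example = {"Hierarchy A": 120, "Hierarchy B": 0, "Hierarchy C": 7}
for root, network in suggest_networks(example).items():
    print(f"{root}: {network}")
```

A fuller version would apply the entire structural meta-ontology to each subhierarchy rather than a single feature; the sketch only shows the idea of treating subhierarchies as separate units of classification.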

Another issue is the non-permanence of the classifications provided by the structural meta-ontology. As a BioPortal ontology undergoes development, additional structural features may be introduced. For example, if during a development or QA phase a data property with an assigned domain is added to an ontology that is a member of Family 3–20 (DAG & ∃ only OP in Restrictions & ¬∃ DP), then the ontology will become a member of Family 3–18 (DAG & ∃ only OP in Restrictions & ∃ only DP with Domains). Early in the development of an ontology, as new knowledge is represented in it, the ontology will likely migrate between families.
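The migration described above can be made concrete with a small Python sketch. The `Features` record, its field values, and the `FAMILIES` lookup are assumptions introduced for illustration; only the two family identifiers mentioned in the text are included, and the remaining families of the structural meta-ontology are deliberately not filled in.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Features:
    hierarchy: str      # "DAG" or "Tree"
    object_props: str   # e.g., "only in restrictions"
    data_props: str     # e.g., "none" or "only with domains"


# Partial lookup covering only the two families named in the text
# (identifiers written with an ASCII hyphen).
FAMILIES = {
    Features("DAG", "only in restrictions", "none"): "3-20",
    Features("DAG", "only in restrictions", "only with domains"): "3-18",
}


def family_of(features: Features) -> str:
    """Return the family identifier, or "unknown" if it is not in the lookup."""
    return FAMILIES.get(features, "unknown")


before = Features("DAG", "only in restrictions", "none")
# Adding a data property with an assigned domain changes one feature value.
after = replace(before, data_props="only with domains")

print(family_of(before), "->", family_of(after))
```

Because the family is a pure function of the structural features, any edit that changes a feature value changes the classification for the next release.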

In the context of the structural meta-ontology, changes in classification are expected over time. One of the major purposes of the structural meta-ontology is to identify which Abstraction Networks and QA methodologies are applicable to a given ontology. For a single release of an ontology, the ontology belongs to a fixed set of families. This situation reflects that, for the given release, a defined set of Abstraction Network derivation methodologies and QA methodologies is applicable. If the structure of the ontology changes from one release to another, then certain Abstraction Network derivation methodologies may no longer apply. The classification provided by the structural meta-ontology families should accurately reflect this information and thereby help to accommodate this dynamic situation.

5.1 Future Work

In a future study, we will investigate structural meta-ontologies derived using other structural features (e.g., equivalence axioms) that can be used to derive Abstraction Networks and that can be found in many BioPortal ontologies. Among the 373 BioPortal ontologies analyzed in this study, a total of 160 (43%) have at least one equivalence axiom with a restriction.

We note that the structural features used in the derivation of the structural meta-ontology in Figure 4 (object properties, data properties, and hierarchy type) were chosen due to their importance in Abstraction Network derivation. However, these structural features are relatively simple. One can pick different kinds of structural features for the derivation of a structural meta-ontology (e.g., one can classify ontologies according to use of ontology design patterns [28]). For example, one could use the import of BFO [79] as a structural feature (49/373=13% of ontologies in our sample imported BFO classes), creating a structural meta-ontology which captures the ontologies in BioPortal that import BFO.

The issues discussed above, regarding the number of times a structural feature appears in an ontology, relate to uniform Abstraction Network derivation for a family. To thoroughly test this approach, in a future study, we will derive the same type of Abstraction Network for every member of a structural meta-ontology family and evaluate the Abstraction Network according to various metrics. For example, for a family where we can derive domain-defined partial-area taxonomies, we will evaluate the number of areas, partial-areas, singleton partial-areas, and various other structural indicators. As discussed earlier, an Abstraction Network may not be practical when there are low values for some parameters (e.g., too few partial-areas in a partial-area taxonomy [11]). We anticipate that, within each family, the derived Abstraction Networks will have similar properties. For example, in initial results we have found that restriction-defined partial-area taxonomies derived for Family 3–20 (with member ontologies SNOMED CT, GO, and ChEBI, among others) have many overlapping classes [21, 40] relative to restriction-defined partial-area taxonomies for other families. In future studies we will investigate the error rates of overlapping classes in the ontologies of this family (e.g., ChEBI, UBERON).

In future work, we will track the classifications for a set of ontologies over time. For most ontologies, we anticipate little change will occur. However, if the classification for an ontology does change, it indicates a major difference in the ontology’s structure (e.g., being reorganized from a Tree into a DAG). Such a difference may warrant investigation to determine if the change was performed correctly for the whole ontology. We will investigate potential QA implications when such a change occurs.
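Tracking classifications over time amounts to comparing the set of families of consecutive releases. The following Python sketch shows one possible way to do this; the function name, the release labels, and the example history are hypothetical, and family identifiers are written with an ASCII hyphen.

```python
from typing import List, Set, Tuple

Release = Tuple[str, Set[str]]  # (release label, families the release belongs to)
Change = Tuple[str, str, Set[str], Set[str]]


def classification_changes(history: List[Release]) -> List[Change]:
    """Report each pair of consecutive releases whose family sets differ.

    Each entry is (previous release, release, families left, families joined).
    The history is assumed to be in chronological order.
    """
    changes = []
    for (prev_label, prev), (label, cur) in zip(history, history[1:]):
        if prev != cur:
            changes.append((prev_label, label, prev - cur, cur - prev))
    return changes


# Hypothetical release history for a single ontology.
history = [("r1", {"3-20"}), ("r2", {"3-20"}), ("r3", {"3-18"})]
for prev_label, label, left, joined in classification_changes(history):
    print(f"{prev_label} -> {label}: left {sorted(left)}, joined {sorted(joined)}")
```

Each reported change marks a release pair that may warrant a closer look, for example to check whether a structural reorganization was applied consistently across the whole ontology.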

6. Conclusions

In this paper, we described the derivation of a structural meta-ontology for classifying ontologies based on their structural features. We illustrated the usefulness of this meta-ontology by classifying a sample of 373 BioPortal ontologies into 81 families. In comparison to our previously developed classification approach, the structural meta-ontology was demonstrated to provide a more complete, flexible, and refined classification of BioPortal’s ontologies. Such a classification approach lends itself to a more effective use of the family-based QA framework for BioPortal ontologies.

Highlights.

  • We introduce a structural meta-ontology for classifying ontologies according to their structure.

  • The benefits of a structural meta-ontology for ontology quality assurance are illustrated.

  • A structural meta-ontology is derived to classify the ontologies in the NCBO BioPortal.

  • The structural meta-ontology enables family-based quality assurance of ontologies.

Acknowledgments

Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number R01 CA190779. The BioPortal resource of the National Center for Biomedical Ontology has been supported by the NIH Common Fund, the National Human Genome Research Institute, and the National Heart, Lung, and Blood Institute through grant U54 HG004028. The content is solely the responsibility of the authors and does not necessarily represent the views of the National Institutes of Health.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

The authors have no conflicts of interest. All authors are aware of the submission of this manuscript.

References

  • 1. Smith B, Ashburner M, Rosse C, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature biotechnology. 2007;25(11):1251–1255. doi: 10.1038/nbt1346.
  • 2. Consortium TGO. Gene Ontology annotations and resources. Nucleic acids research. 2013;(41):D530–D535. doi: 10.1093/nar/gks1050.
  • 3. Rubin DL, Shah NH, Noy NF. Biomedical ontologies: a functional perspective. Briefings in Bioinformatics. 2008;9(1):75–90. doi: 10.1093/bib/bbm059.
  • 4. Giannangelo K, Fenton SH. SNOMED CT survey: an assessment of implementation in EMR/EHR applications. Perspect Health Inf Manag. 2008;5:7.
  • 5. Whetzel PL, Noy NF, Shah NH, et al. BioPortal: Enhanced Functionality via New Web services from the National Center for Biomedical Ontology to Access and Use Ontologies in Software Applications. Nucleic Acids Research (NAR) 2011;39(Web Server issue):W541–W545. doi: 10.1093/nar/gkr469.
  • 6. Halper M, Gu H, Perl Y, et al. Abstraction Networks for Terminologies: Supporting Management of “Big Knowledge”. Artificial intelligence in medicine. 2015;64(1):1–16. doi: 10.1016/j.artmed.2015.03.005.
  • 7. Ochs C, Agrawal A, Perl Y, et al. Deriving an Abstraction Network to Support Quality Assurance in OCRe. AMIA Annu Symp Proc. 2012:681–689.
  • 8. Wang Y, Halper M, Min H, et al. Structural methodologies for auditing SNOMED. J Biomed Inform. 2007;40(5):561–581. doi: 10.1016/j.jbi.2006.12.003.
  • 9. Min H, Perl Y, Chen Y, et al. Auditing as part of the terminology design life cycle. J Am Med Inform Assoc. 2006;13(6):676–690. doi: 10.1197/jamia.M2036.
  • 10. Ochs C, Perl Y, Halper M, et al. Gene Ontology Summarization to Support Visualization and Quality Assurance. BICoB. 2015:167–174.
  • 11. Ochs C, He Z, Perl Y, et al. Choosing the Granularity of Abstraction Networks for Orientation and Quality Assurance of the Sleep Domain Ontology; Proceedings of the 4th International Conference on Biomedical Ontology; 2013. pp. 84–89.
  • 12. He Z, Ochs C, Soldatova L, et al. Auditing Redundant Import in Reuse of a Top Level Ontology for the Drug Discovery Investigations Ontology. VDOS. 2013
  • 13. He Z, Ochs C, Agrawal A, et al. A Family-Based Framework for Supporting Quality Assurance of Biomedical Ontologies in BioPortal. Proc AMIA Annu Symp. 2013:581–590.
  • 14. Fragoso G, de Coronado S, Haber M, et al. Overview and utilization of the NCI thesaurus. Comp Funct Genomics. 2004;5(8):648–654. doi: 10.1002/cfg.445.
  • 15. Stearns MQ, Price C, Spackman KA, et al. SNOMED clinical terms: overview of the development process and project status. Proc AMIA Annu Symp. 2001:662–666.
  • 16. Ashburner M, Ball CA, Blake JA, et al. Gene Ontology: tool for the unification of biology. Nature Genetics. 2000;25(1):25–29. doi: 10.1038/75556.
  • 17. Sim I, Carini S, Tu S, et al. The human studies database project: federating human studies design data using the ontology of clinical research. AMIA Summits Transl Sci Proc. 2010:51–55.
  • 18. Arabandi S, Ogbuji C, Redline S, et al. Developing a Sleep Domain Ontology. AMIA Clinical Research Informatics Summit. 2010
  • 19. Da Q, King R, Hopkins A, et al. An ontology for description of drug discovery investigations. J Integrative Bioinformatics. 2010;7(3):126–139. doi: 10.2390/biecoll-jib-2010-126.
  • 20. Zeginis D, Hasnain A, Loutas N, et al. A collaborative methodology for developing a semantic model for interlinking Cancer Chemoprevention linked-data sources. Semantic Web. 2014;5(2):127–142.
  • 21. Ochs C, Perl Y, Geller J, et al. Quality Assurance of the Gene Ontology Using Abstraction Networks. Journal of Bioinformatics and Computational Biology. 2015 doi: 10.1142/S0219720016420014. In press.
  • 22. Consortium GO. The Gene Ontology (GO) database and informatics resource. Nucleic acids research. 2004;32(Suppl 1):D258–D261. doi: 10.1093/nar/gkh036.
  • 23. Wang Y, Halper M, Wei D, et al. Auditing complex concepts of SNOMED using a refined hierarchical abstraction network. J Biomed Inform. 2012;45(1):1–14. doi: 10.1016/j.jbi.2011.08.016.
  • 24. Degtyarenko K, Matos PD, Ennis M, et al. ChEBI: a database and ontology for chemical entities of biological interest. Nucleic acids research. 2008;36(1):D344–D350. doi: 10.1093/nar/gkm791.
  • 25. Musen MA, Noy NF, Shah NH, et al. The National Center for Biomedical Ontology. J Am Med Inform Assoc. 2012;19(2):190–195. doi: 10.1136/amiajnl-2011-000523.
  • 26. Motik B, Patel-Schneider PF, Parsia B. OWL 2 Web Ontology Language Structural Specification and Functional Style Syntax. W3C -- World Wide Web Consortium. 2009
  • 27. Falconer SM, Callendar C, Storey M-A. A visualization service for the semantic web. Knowledge Engineering and Management by the Masses. 2010:554–564.
  • 28. Mortensen JM, Horridge M, Musen MA, et al. Applications of Ontology Design Patterns in Biomedical Ontologies. Proc AMIA Annu Symp. 2012:643–652.
  • 29. Bail S, Horridge M, Parsia B, et al. The justificatory structure of the NCBO bioportal ontologies. ISWC. 2011;2011:67–82.
  • 30. Quesada-Martínez M, Fernández-Breis JT, Stevens R. Extraction and analysis of the structure of labels in biomedical ontologies; Proceedings of the 2nd International Workshop on Managing Interoperability and Complexity in Health Systems; 2012. pp. 7–16.
  • 31. Ghazvinian A, Noy NF, Jonquet C, et al. What Four Million Mappings Can Tell You about Two Hundred Ontologies. The Semantic Web - ISWC 2009, Lecture Notes in Computer Science. 2009;5823:229–242.
  • 32. Ghazvinian A, Noy NF, Musen MA. How orthogonal are the OBO Foundry ontologies? J Biomed Semantics. 2011;(Suppl 2)(S2) doi: 10.1186/2041-1480-2-S2-S2.
  • 33. Vescovo CD, Gessler D, Klinov P, et al. Decomposition and Modular Structure of BioPortal Ontologies; International Semantic Web Conference; 2011. pp. 146–161.
  • 34. Kamdar MR, Tudorache T, Musen MA. Investigating Term Reuse and Overlap in Biomedical Ontologies. ICBO. 2015;2015:42–46.
  • 35. Ceusters W. Pain assessment terminology in the NCBO BioPortal: evaluation and recommendations. ICBO. 2014;2014:1–6.
  • 36. Pathak J, Chute CG. Debugging mappings between biomedical ontologies: Preliminary results from the NCBO bioportal mapping repository. ICBO. 2009;2009:95–98.
  • 37. Horridge M, Parsia B, Sattler U. The state of bio-medical ontologies. Bio-Ontologies. 2011;2011
  • 38. Katifori A, Halatsis C, Lepouras G, et al. Ontology visualization methods—a survey. ACM Computing Surveys (CSUR) 2007;39(4):10.
  • 39. Ochs C, Geller J, Perl Y, et al. A Tribal Abstraction Network for SNOMED CT Hierarchies without Attribute Relationships. J Am Med Inform Assoc. 2014;22(3):628–639. doi: 10.1136/amiajnl-2014-003173.
  • 40. Wang Y, Halper M, Wei D, et al. Abstraction of complex concepts with a refined partial-area taxonomy of SNOMED. J Biomed Inform. 2012;45(1):15–29. doi: 10.1016/j.jbi.2011.08.013.
  • 41. Ochs C, Perl Y, Geller J, et al. Summarizing and Visualizing Structural Changes during the Evolution of Biomedical Ontologies Using a Diff Abstraction Network. Journal Of Biomedical Informatics. 2015;56:127–144. doi: 10.1016/j.jbi.2015.05.018.
  • 42. Ochs C, Zheng L, Perl Y, et al. Drug-drug Interaction Discovery Using Abstraction Networks for "National Drug File – Reference Terminology" Chemical Ingredients. AMIA Annu Symp Proc. 2015:973–982.
  • 43. [cited 2012 February 23]; OWL Web Ontology Language Overview. Available from: http://www.w3.org/TR/owl-features.
  • 44. Horridge M, Drummond N, Goodwin J, et al. The Manchester OWL Syntax. OWLed. 2006:216.
  • 45. Pico AR, Kelder T, Iersel MPV, et al. WikiPathways: pathway editing for the people. PLoS Biol. 2008;6(7):e184. doi: 10.1371/journal.pbio.0060184.
  • 46. Geller J, Perl Y, Halper M, et al. Special issue on auditing of terminologies. J Biomed Inform. 2009;42(3):407–411. doi: 10.1016/j.jbi.2009.04.006.
  • 47. Zhu X, Fan J-W, Baorto DM, et al. A review of auditing methods applied to the content of controlled biomedical terminologies. J Biomed Inform. 2009;42(3):413–425. doi: 10.1016/j.jbi.2009.03.003.
  • 48. Rector AL, Brandt S, Schneider T. Getting the foot out of the pelvis: modeling problems affecting use of SNOMED CT hierarchies in practical applications. J Am Med Inform Assoc. 2011;18(4):432–440. doi: 10.1136/amiajnl-2010-000045.
  • 49. Rector AL, Iannone L. Lexically suggest, logically define: Quality assurance of the use of qualifiers and expected results of post-coordination in SNOMED CT. J Biomed Inform. 2011:199–209. doi: 10.1016/j.jbi.2011.10.002.
  • 50. Mortensen JM, Minty EP, Januszyk M, et al. Using the wisdom of the crowds to find critical errors in biomedical ontologies: a study of SNOMED CT. J Am Med Inform Assoc. 2014 doi: 10.1136/amiajnl-2014-002901.
  • 51. Smith B, Köhler J, Kumar A. On the application of formal principles to life science data: a case study in the Gene Ontology. Data Integration in the Life Sciences. 2004:79–94.
  • 52. Ceusters W, Smith B, Goldberg L. A terminological and ontological analysis of the NCI Thesaurus. Methods Inf Med. 2005;44(4):498–507.
  • 53. Baorto D, Li L, Cimino JJ. Practical experience with the maintenance and auditing of a large medical ontology. Journal of biomedical informatics. 2009;42(3):494–503. doi: 10.1016/j.jbi.2009.03.005.
  • 54. de Coronado S, Wright LW, Fragoso G, et al. The NCI Thesaurus quality assurance life cycle. J Biomed Inform. 2009;42(3):530–539. doi: 10.1016/j.jbi.2009.01.003.
  • 55. Gu H, Wei D, Mejino JL, Jr, et al. Relationship Auditing of the FMA Ontology. J Biomed Inform. 2009 doi: 10.1016/j.jbi.2009.01.001.
  • 56. Xiang Z, Mungall C, Ruttenberg A, et al. Ontobee: A Linked Data Server and Browser for Ontology Terms. ICBO. 2011
  • 57. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic acids research. 2004;32(Database issue):D267–D270. doi: 10.1093/nar/gkh061.
  • 58. Gu H, Perl Y, Geller J, et al. Representing the UMLS as an object-oriented database: modeling issues and advantages. J Am Med Inform Assoc. 2000;7(1):66–80. doi: 10.1136/jamia.2000.0070066.
  • 59. He Z, Morrey CP, Perl Y, Elhanan G, et al. Sculpting the UMLS Refined Semantic Network. Online journal of public health informatics. 2014;6(2):e181. doi: 10.5210/ojphi.v6i2.5412.
  • 60. Bodenreider O. Circular hierarchical relationships in the UMLS: etiology, diagnosis, treatment, complications and prevention. Proc AMIA Symp. 2001:57–61.
  • 61. Halper M, Morrey CP, Chen Y, et al. Auditing Hierarchical Cycles to Locate Other Inconsistencies in the UMLS. AMIA Annu Symp Proc. 2011;2011:529–536.
  • 62. Halper M, Wang Y, Min H, et al. Analysis of error concentrations in SNOMED. AMIA Annu Symp Proc. 2007:314–318.
  • 63. Ochs C, Perl Y, Geller J, et al. Scalability of Abstraction-Network-Based Quality Assurance to Large SNOMED Hierarchies. AMIA Annu Symp Proc. 2013:1071–1080.
  • 64. Ochs C, Geller J, Perl Y, et al. Scalable Quality Assurance for Large SNOMED CT Hierarchies Using Subject-based Subtaxonomies. J Am Med Inform Assoc. 2014;22(3):507–518. doi: 10.1136/amiajnl-2014-003151.
  • 65. Brochhausen M, Spear AD, Cocos C, et al. The ACGT Master Ontology and its applications – Towards an ontology-driven cancer research and management system. J Biomed Inform. 2011;44(1):8–25. doi: 10.1016/j.jbi.2010.04.008.
  • 66. Snedecor GW, Cochran WG. Statistical Methods. Iowa State University Press; 1967. p. 503.
  • 67. Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. 1934:404–413.
  • 68. Gu H, Elhanan G, Perl Y, et al. A study of terminology auditors' performance for UMLS semantic type assignments. J Biomed Inform. 2012;45(6):1042–1048. doi: 10.1016/j.jbi.2012.05.006.
  • 69. Horridge M, Bechhofer S. The OWL API: A Java API for Working with OWL 2 Ontologies. OWLED. 2009;529:11–21.
  • 70. [9 September 2015]; W3C. RDF 1.1 Turtle: Terse RDF Triple Language. 2014 Available from: http://www.w3.org/TR/turtle/
  • 71. Noy NF, Crubézy M, Fergerson RW, et al. Protege-2000: an open-source ontology-development and knowledge-acquisition environment. AMIA Annu Symp Proc. 2003:953.
  • 72. Shearer R, Motik B, Horrocks I. HermiT: a highly-efficient OWL reasoner; Proc 5th International Workshop on OWL: Experiences and Directions (OWLED); 2008.
  • 73. Köhler S, Doelken SC, Mungall CJ, et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic acids research. 2013:gkt1026. doi: 10.1093/nar/gkt1026.
  • 74. Tirmizi SH, Aitken S, Moreira DA, et al. Mapping between the OBO and OWL ontology languages. Journal of biomedical semantics. 2011;2(Suppl 1):S3. doi: 10.1186/2041-1480-2-S1-S3.
  • 75. Hastings J, Matos Pd, Dekker A, et al. The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic acids research. 2013;41(D1):D456–D463. doi: 10.1093/nar/gks1146.
  • 76. Vasant D, Neff F, Gormanns P, et al. DIAB: An Ontology of Type 2 Diabetes Stages and Associated Phenotypes. Phenotype Day at ISMB 2015. 2015:24–27.
  • 77. Przydzial MJ, Bhhatarai B, Koleti A, et al. GPCR ontology: development and application of a G protein-coupled receptor pharmacology knowledge framework. Bioinformatics. 2013:btt565. doi: 10.1093/bioinformatics/btt565.
  • 78. [4 September 2015]; Population and Community Ontology (PCO) Available from: https://github.com/PopulationAndCommunityOntology/pco.
  • 79. Grenon P, Smith B, Goldberg L. Biodynamic Ontology: Applying BFO in the Biomedical Domain. In: Pisanelli DM, editor. Ontologies in Medicine. IOS Press; 2004. pp. 20–38.
