Abstract
One of the basic challenges in developing structural methods for systematic auditing of the quality of biomedical ontologies is the computational cost usually involved in exhaustive sub-graph analysis. We introduce ANT-LCA, a new algorithm for computing all non-trivial lowest common ancestors (LCA) of each pair of concepts in the hierarchical graph induced by an ontology. The computation of LCA is a fundamental step for the non-lattice approach to ontology quality assurance. Distinct from existing approaches, ANT-LCA only computes LCAs for non-trivial pairs, those having at least one common ancestor. To skip all trivial pairs that may be of no practical interest, ANT-LCA employs a simple but innovative algorithmic strategy combining topological order and dynamic programming to keep track of non-trivial pairs. We provide correctness proofs and demonstrate a substantial reduction in computational time for two of the largest biomedical ontologies: SNOMED CT and Gene Ontology (GO). ANT-LCA achieved an average computation time of 30 and 3 seconds per version for SNOMED CT and GO, respectively, about 2 orders of magnitude faster than the best known approaches. Our algorithm overcomes a fundamental computational barrier in subgraph-based structural analysis of large ontological systems. It enables the implementation of a new breed of structural auditing methods that not only identify potential problematic areas, but also automatically suggest changes to fix the issues. Such structural auditing methods can lead to more effective tools supporting ontology quality assurance work.
Keywords: Biomedical ontology, partial order, graph-theoretic algorithm, SNOMED CT, lattice vs non-lattice, quality assurance
1. Introduction
In graph-theoretic representation of ontologies in biomedicine such as SNOMED CT [1], ontological concepts correspond to graph nodes, and is-a relations correspond to edges of the graph. When rendering the is-a relations as a graph, the Hasse diagram convention orients more general concepts above (or higher than) more specific concepts.
One of the desirable properties of the resulting graph structure is that the subsumption relationship (is-a hierarchy) should form a lattice ([2]). There are in general two types of lattice-based approaches to ontology quality assurance. One involves the direct application of Formal Concept Analysis (FCA [3]), mostly for auditing semantic completeness or missing concepts [4]. The second involves the extraction of lattice-violating fragments [5, 6], or non-lattice fragments, which represent violations of the FCA principle that systematic engineering approaches for constructing concept hierarchies always result in order structures that are lattices in the sense of lattice theory [3]. This non-lattice approach for ontology quality assurance involves the extraction of graph substructures (i.e. sub-orders) that violate the lattice property, which states that any two concept nodes have at most one minimal shared (common) ancestor and at most one maximal shared descendant.
As illustrated recently in [7], the use of the non-lattice approach for improving the quality of an ontology consists of the following general steps:
- Identify node-pairs that violate the lattice property (i.e. non-lattice pairs) and extract the associated non-lattice fragments; 
- Detect ontological defects such as misaligned is-a relations or missing concepts in the extracted non-lattice fragments, often leveraging additional or external information; 
- Formulate and generate change suggestions automatically and present the suggestions in a usable format; 
- Perform reviews of the suggested changes and accept or reject such suggestions by a qualified ontology engineer or ontology editor, and incorporate the accepted changes into the next release. 
The non-lattice approach is unique in that, while most ontology quality assurance techniques [8] merely identify potential errors, this approach can not only identify previously undiscovered errors confirmed by domain experts, but also suggest appropriate remediation (i.e., “auto-suggestion”) [7, 9]. For example, Figure 1 (top), extracted from the September 2017 release of SNOMED CT (US edition), contains a substructure (1A) of is-a relations on the left, involving 5 concepts. This is a non-lattice fragment, because the concept nodes labeled 1 and 2 have two maximal shared descendants: the concept nodes labeled 4 and 5. With a combination of structural and lexical information represented in this fragment, one can infer that “Epithelioid hemangioendothelioma of lung” is-a “Malignant tumor of lung parenchyma.” Remarkably, adding this missing edge (in red) also makes the resulting subgraph (1B) conform to the lattice property: the concept nodes labeled 1 and 2 now have a unique maximal shared descendant, the concept node labeled 4 (since concept 5 is no longer “maximal”). Similarly, the lower part of Figure 1 shows a non-lattice fragment (2A) in the Gene Ontology (GO) on the left, and the corrected structure (2B) on the right.
Figure 1.
An example (1A) of a non-lattice fragment of size 5 in SNOMED CT, as well as the resulting lattice subgraph (1B) after a missing IS-A relation is added (red link). Similarly, (2A) is a non-lattice fragment of size 5 in GO and (2B) is the correction.
Both the FCA- and the non-lattice-based approaches incur computational costs that sometimes make exhaustive analyses prohibitive. For example, in Jiang and Chute’s work [4], only 10% of SNOMED CT sub-hierarchies were sampled in order to assess semantic completeness. Three months of sequential computation ([5]) or three hours of 25-node parallel processing ([6]) were required to detect non-lattice pairs for each version of SNOMED CT. The detection of non-lattice pairs is a fundamental step for the non-lattice-based approach to ontology quality assurance. The non-lattice pairs serve as seeds for systematic generation of non-lattice fragments, by including all nodes in-between the seed nodes and the maximal shared descendants. Therefore, more efficient algorithms for the detection of non-lattice pairs are highly desirable.
This paper introduces ANT-LCA, a new algorithm for computing all non-trivial lowest common ancestors (LCA) of each pair of concepts in the graph induced by an ontological system. Here the lowest common ancestors in the context of a graph are exactly the maximal shared descendants in the context of an ontology. In the remainder of the paper, we discuss algorithms in graph-theoretic and order-theoretic terms, but whenever working with specific ontological examples, we switch back to maximal shared descendants. Distinct from existing approaches, ANT-LCA only computes LCAs for non-trivial pairs, those having at least one common ancestor. To skip all trivial pairs that may be of no practical interest, ANT-LCA employs a simple but innovative algorithmic strategy combining topological order and dynamic programming [10] to keep track of non-trivial pairs.
We provide correctness proofs and demonstrate a reduction in computational time of about 2 orders of magnitude, compared with the best parallel algorithms known to date, for two of the largest biomedical ontologies: SNOMED CT and Gene Ontology (GO). ANT-LCA achieved an average computation time of 30 and 3 seconds per version for SNOMED CT and GO, respectively, confirming our complexity analysis with a time-bound involving pairability-degree (i.e. the constant in big-O analysis of time-complexity) as a quadratic factor. ANT-LCA overcomes a fundamental computational barrier in subgraph analysis of ontological structures. It enables the implementation of a new breed of structural auditing methods that not only identify potential problematic areas, but also automatically suggest specific changes that are needed to fix the quality issues.
2. Background
2.1. LCA on directed acyclic graphs
In a directed acyclic graph (DAG), a common ancestor (CA) of a pair of nodes u, υ is a node w that is a shared ancestor of u, υ. A lowest CA is a node w such that no other shared ancestor is closer (nearer) to u, υ than w. A pair of nodes u, υ is trivial if they do not have a shared ancestor, or one of them is the ancestor of the other. Conversely, non-trivial pairs are those having at least one lowest common ancestor other than the nodes already in the pair. Given a subset of nodes X in a DAG, we denote the set of lowest common ancestors of X as lca(X), and common ancestors of X as ca(X), respectively. When X is a two-element set {a, b} with two or more lowest common ancestors, it is called a non-lattice pair.
A pair of nodes (x, y) is called pairable if lca{x, y} ≠ ∅, lca{x, y} ≠ {x}, and lca{x, y} ≠ {y}. Intuitively, x and y are pairable if they share at least one non-trivial common ancestor. In this case we also say that x is pairable with y, and that (x, y) is a non-trivial pair. We use the notation x ↓ y to indicate that x is pairable with y. A trivial pair is a pair (x, y) that is not pairable. In fact, (x, y) is trivial if and only if lca{x, y} ⊆ {x, y}, i.e., lca{x, y} = ∅, lca{x, y} = {x}, or lca{x, y} = {y}. We write π(u) = {υ | u ↓ υ} for the set of all nodes υ that are pairable with u.
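The definitions above can be checked directly on a small graph. The following is a minimal brute-force sketch, not ANT-LCA itself: the five-node toy graph, the function names, and the convention that nodes are integers with every edge (a, b) satisfying a < b are assumptions made for illustration. "Ancestor" is read reflexively here so that lca{x, y} = {x} when x is an ancestor of y, as in the definition of a trivial pair.

```python
from itertools import combinations

# Hypothetical toy DAG: an edge (a, b) means a is a direct ancestor of b.
TOY_EDGES = [(1, 3), (1, 4), (2, 3), (2, 4), (3, 5)]

def ancestors(n, edges):
    """Strict ancestor sets for nodes 1..n; assumes a < b for every edge."""
    anc = {v: set() for v in range(1, n + 1)}
    for a, b in sorted(edges):          # ascending source, so anc[a] is final
        anc[b] |= anc[a] | {a}
    return anc

def lca(x, y, anc):
    """Lowest common ancestors of x and y (reflexive reading of 'ancestor')."""
    common = (anc[x] | {x}) & (anc[y] | {y})
    # keep w only if no other common ancestor lies strictly below it
    return {w for w in common if not any(w in anc[c] for c in common)}

def pairable(x, y, anc):
    """True if lca{x, y} is neither empty, nor {x}, nor {y}."""
    return lca(x, y, anc) not in (set(), {x}, {y})

ANC = ancestors(5, TOY_EDGES)
PAIRS = [(x, y) for x, y in combinations(range(1, 6), 2) if pairable(x, y, ANC)]
NON_LATTICE = [pair for pair in PAIRS if len(lca(*pair, ANC)) >= 2]
```

In this toy graph both pairable pairs have two lowest common ancestors, so both are non-lattice pairs. The brute-force scan visits every pair of nodes, which is exactly the cost ANT-LCA is designed to avoid.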
2.2. Non-lattice approach
The non-lattice approach [5] provides a mathematically grounded, error-agnostic method for exhaustive structural auditing of large and complex biomedical ontologies such as SNOMED CT. This approach focuses on the graph structure induced by the subsumption relation (is-a) in an ontology. It extracts non-lattice pairs, those with two or more lowest common ancestors, violating the lattice property.
Any non-lattice pair generates an induced non-lattice fragment, consisting of concepts in-between any lowest common ancestor and any member of the non-lattice pair, as well as all the relations between these concepts. Such induced non-lattice fragments represent important areas of focus for ontological auditing ([5, 7]), because they are inconsistent with the ontology design principle that the subsumption relationship (is-a hierarchy) should form a lattice ([2, 5]). Non-lattice fragments are also in conflict with the Fundamental Theorem of Formal Concept Analysis ([3]), which states that concept hierarchies derived from the duality of intension and extension always have their order structure being a (complete) lattice.
In fact, non-lattice fragments are often indicative of missing hierarchical relations or concepts. As a demonstration of the practical utility of the non-lattice-based approach, Cui et al. [7] identified four lexical patterns among non-lattice subgraphs in SNOMED CT. Each lexical pattern is associated with a potential specific type of error. Applying the structural-lexical method to SNOMED CT (September 2015 U.S. edition), 6,801 non-lattice subgraphs matched these lexical patterns, of which 2,046 were amenable to visual inspection. Evaluation of a random sample of 100 small subgraphs resulted in 59 confirmed errors by domain experts. Abeysinghe et al. [9] further applied the four patterns to audit National Cancer Institute (NCI) Thesaurus (version 16.12d) and introduced two new lexical patterns to uncover potential errors and suggest remediations. A total of 8,143 non-lattice subgraphs were identified in NCI Thesaurus, among which 809 matched the six lexical patterns. Domain experts evaluated a random sample of 50 small subgraphs and verified that 33 of them contained errors and made correct suggestions. Such hybrid structural-lexical methods are innovative and proved effective not only in detecting errors, but also in suggesting remediation for these errors.
2.3. The computational challenge
Exhaustive generation of non-lattice fragments for large ontological graphs such as SNOMED CT, with over 300,000 concepts and 450,000 is-a relations, is computationally expensive if not prohibitive, using an exhaustive sequential approach. For example, in [5], 34 million pairs of SNOMED CT concepts were examined and 518,000 non-lattice pairs were identified using SPARQL queries over an RDF representation of the ontology. The time involved for such an exhaustive approach, 3 months using standard desktop machines, is inadequate for quality assurance applications.
In more recent work, a general MapReduce pipeline called MaPLE for Lattice-based Evaluation [6] has been introduced for detecting non-lattice pairs. Using a Cloudera Hadoop cluster, MaPLE detected all non-lattice pairs in SNOMED CT, with an average total compute time of about 3 hours per version.
Our ANT-LCA algorithm provides a dramatic further reduction in computational time using sequential computation by a strategy that skips trivial pairs altogether, without even checking them.
3. Methods
3.1. The ANT-LCA algorithm
We present ANT-LCA in three components: initialization, pairability computation, and finding of shared ancestors embedded into pairability computation. We treat pairability computation separately to highlight ANT-LCA’s core algorithmic insight without dealing with irrelevant overhead.
3.1.1. Initialization
The initialization phase for ANT-LCA takes a DAG (V, E) as input and uses a modified version of topological sort [10] to obtain a topological order (index) for each node in V. This step takes time linear in |V| + |E|.
After initialization, we have two order relations on V : ⊑ and ≤. Here ⊑ (and the strict version ⊏) stands for the partial order determined by the input DAG (V, E) [10] (i.e., υ1 ⊏ υ2 means there is a directed path from υ1 to υ2; in particular, every edge (υ1, υ2) ∈ E gives υ1 ⊏ υ2). ≤ represents the usual arithmetic order on the topological index. By the property of topological sort, u ⊏ υ implies u < υ for any u, υ ∈ V.
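The indexing step can be sketched with the standard Kahn-style topological sort. This is not the modified version used by ANT-LCA, whose details are not given here; the function name and the small example graph are assumptions for illustration.

```python
from collections import defaultdict, deque

def topological_index(nodes, edges):
    """Assign indices 1..|V| so that every edge (a, b) satisfies index[a] < index[b]."""
    successors = defaultdict(list)
    indegree = {v: 0 for v in nodes}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    ready = deque(v for v in nodes if indegree[v] == 0)   # nodes with no incoming edge
    index = {}
    while ready:
        v = ready.popleft()
        index[v] = len(index) + 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:          # all predecessors of w are already indexed
                ready.append(w)
    if len(index) != len(indegree):
        raise ValueError("input graph contains a cycle")
    return index

# Hypothetical example with string-labelled nodes.
NODES = ["root", "a", "b", "c"]
ARCS = [("root", "a"), ("root", "b"), ("a", "c"), ("b", "c")]
INDEX = topological_index(NODES, ARCS)
```

Each node and each edge is handled once, so the running time is proportional to |V| + |E|.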
3.1.2. Computing Pairability
The core algorithmic idea of ANT-LCA is captured by the computation of the pairable function pi(u), intended to compute the function π(u), where pi(u) is the set of nodes pairable with u computed up to step i, and π(u) is the set of all nodes pairable with u. Algorithm 1 initializes pi(u) by fixing proper values for p0(u) for each u ∈ V. Algorithm 2 updates pi(u) as i gets incremented, in order to capture all nodes pairable with u at the completion of the algorithm.
Algorithm 1.
Initialization phase for generating pairable sets. Here (V, E) is the input graph with V the set of nodes, and E the set of edges. p0(u) is the set of nodes pairable with u computed up to step 0 – the initialization step.
In Algorithm 1, i.to consists of all t such that (i, t) ∈ E. For each i, Algorithm 1 updates each u such that (i, u) ∈ E by appending distinct members in i.to, such as υ (Figure 2, left) that are not comparable with u, into p0(u). Strictly speaking, for Algorithm 1 to be correct for arbitrary graphs, line 3 should be modified as p0(u) ≔ i.to − {x | u ⊑ x}. This is, however, not necessary if nodes in i.to are not comparable with each other, as is the case when the input graph has no “redundant” edges (when the is-a relation in an ontology is minimally represented without edges that are derivable from transitive closure).
Figure 2.
Left: the iterative pattern for each edge (i, u) ∈ E. Right: initializing the pairable function for a graph consisting of topologically ordered nodes 1 to 9.
Figure 2 (right) contains the Hasse diagram of an example DAG in topological order. The initialization results for p0(u) obtained by Algorithm 1 are displayed beside each node.
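The initialization described above can be sketched as follows. The pseudocode of Algorithm 1 is not reproduced in this text, so the code is a reading of the prose rather than the published listing. The 9-node edge list is a hypothetical reconstruction chosen to agree with the values quoted in the walkthrough of section 3.1.4, and it has no redundant edges, so the simple form of line 3 suffices.

```python
from collections import defaultdict

# Hypothetical reconstruction of the 9-node example; (i, u) means u is an upper neighbor of i.
EDGES = [(1, 3), (1, 4), (1, 5), (2, 4), (2, 5), (3, 6), (3, 7),
         (4, 6), (4, 7), (5, 8), (6, 9), (7, 9), (8, 9)]

def init_pairable(edges):
    """p0(u): for every edge (i, u), add the other members of i.to to p0(u)."""
    to = defaultdict(set)                 # i.to = {t | (i, t) in E}
    for i, u in edges:
        to[i].add(u)
    p0 = defaultdict(set)
    for i in sorted(to):
        for u in to[i]:
            # valid only when members of i.to are mutually incomparable
            p0[u] |= to[i] - {u}
    return p0

P0 = init_pairable(EDGES)
```

After this step, nodes that share a direct lower neighbor are recorded as pairable with each other, while nodes 8 and 9, which have no such sibling, start with empty sets.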
Algorithm 2 updates each i’s upper neighbor u (lines 1 and 2) by adding those nodes that are pairable with i but not comparable with u (line 3). All nodes υ that are pairable with i also get updated by adjoining the upper neighbors of i to their sets of pairable nodes (line 5). For abbreviation, the notation of relative set union (∪x) is used in Algorithm 2. For subsets A, B of V and x ∈ V, we write A ∪x B for A ∪ B while making sure that nodes in the resulting set are not comparable to x, i.e.,
$$A \cup_x B = \{\, y \in A \cup B \mid y \not\sqsubseteq x \text{ and } x \not\sqsubseteq y \,\}.$$
In practice, one can take advantage of fast computation of transitive closure [11] and efficient disjoint union [12] for computing A ∪x B.
Algorithm 2.
Main steps for generating pairable sets. Here (V, E) is the input graph with V the set of nodes, and E the set of edges. pi(u) is the set of nodes pairable with u computed up to step i (> 0).
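A compact sketch of the pairability computation follows. It is reconstructed from the prose description of Algorithms 1 and 2, not from the published pseudocode, so the loop structure is an assumption. Comparability is tested against precomputed ancestor sets, which is adequate for a demonstration but not for ontologies the size of SNOMED CT. The edge list is the same hypothetical reconstruction of the 9-node example used in the walkthrough of section 3.1.4.

```python
from collections import defaultdict

EDGES = [(1, 3), (1, 4), (1, 5), (2, 4), (2, 5), (3, 6), (3, 7),
         (4, 6), (4, 7), (5, 8), (6, 9), (7, 9), (8, 9)]

def pairable_sets(n, edges):
    """Nodes are 1..n in topological order; every edge (i, u) has i < u."""
    to = defaultdict(list)
    for i, u in sorted(edges):
        to[i].append(u)
    anc = {v: set() for v in range(1, n + 1)}      # strict ancestors, for comparability
    for i in range(1, n + 1):
        for u in to[i]:
            anc[u] |= anc[i] | {i}

    def rel_union(a, b, x):
        """Relative union: members of a | b that are not comparable to x."""
        return {y for y in a | b
                if y != x and y not in anc[x] and x not in anc[y]}

    p = defaultdict(set)
    for i in range(1, n + 1):                      # initialization (Algorithm 1)
        for u in to[i]:
            p[u] = rel_union(p[u], set(to[i]), u)
    for i in range(1, n + 1):                      # main phase (Algorithm 2)
        for u in to[i]:
            p[u] = rel_union(p[u], p[i], u)        # u inherits nodes pairable with i
            for v in p[i]:
                p[v] = rel_union(p[v], {u}, v)     # and those nodes learn about u
    return p

P = pairable_sets(9, EDGES)
```

Processing nodes in ascending topological index means that p(i) already holds everything it will pass on by the time the edges leaving i are visited, which is the dynamic-programming element of the strategy.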
3.1.3. Computing common ancestors of non-trivial pairs
Algorithm 3 combines the computation of pairable nodes with the computation of (a subset of) their common ancestors qi(u, υ), which contains their lowest common ancestors. The main ingredient of Algorithm 3 is the addition of steps in lines 5, 13 and 14, which iteratively update common ancestors for pairable nodes (see section 3.1.4 for the intermediate results of a step-by-step run of Algorithm 3 on the example in Figure 2). Note that Algorithm 3 does not guarantee that all common ancestors of u, υ will eventually be included in qi(u, υ), but it does include all lowest common ancestors of u, υ (see Theorem 4 in section 3.2). Therefore, an additional straightforward step is needed to extract the lowest elements in qi(u, υ) to obtain lca{u, υ}.
Algorithm 3.
Main steps for generating common ancestors for all and only pairable nodes. Here (V, E) is the input graph with V the set of nodes, and E the set of edges. qi(u, υ) is the set of common ancestors for nodes u and υ computed up to step i. Note that when i = 0, q0(u, υ)represents the initialization result (lines 1–8), computed before the main phase (lines 9–18).
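The accumulation of common ancestors can be sketched in the same style. Again this is a reading of the prose, not the published listing: the update rule q(u, υ) := q(u, υ) ∪ q(i, υ) is inferred from the walkthrough values in section 3.1.4, the unordered-pair key is an implementation choice, and the edge list is a hypothetical reconstruction of the 9-node example.

```python
from collections import defaultdict

EDGES = [(1, 3), (1, 4), (1, 5), (2, 4), (2, 5), (3, 6), (3, 7),
         (4, 6), (4, 7), (5, 8), (6, 9), (7, 9), (8, 9)]

def common_ancestor_sets(n, edges):
    """Return (p, q): pairable sets and accumulated common ancestors per pair."""
    to = defaultdict(list)
    for i, u in sorted(edges):
        to[i].append(u)
    anc = {v: set() for v in range(1, n + 1)}
    for i in range(1, n + 1):
        for u in to[i]:
            anc[u] |= anc[i] | {i}

    def comparable(a, b):
        return a == b or a in anc[b] or b in anc[a]

    p = defaultdict(set)
    q = defaultdict(set)                      # key: frozenset({u, v}), so q is symmetric
    for i in range(1, n + 1):                 # initialization phase
        for u in to[i]:
            for v in to[i]:
                if not comparable(u, v):
                    p[u].add(v)
                    q[frozenset((u, v))].add(i)   # i is a common lower neighbor
    for i in range(1, n + 1):                 # main phase
        for u in to[i]:
            for v in sorted(p[i]):            # sorted() also takes a snapshot of p(i)
                if not comparable(u, v):
                    p[u].add(v)
                    p[v].add(u)
                    q[frozenset((u, v))] |= q[frozenset((i, v))]
    return p, q

P, Q = common_ancestor_sets(9, EDGES)
```

Only pairs that become pairable ever receive an entry in q, so trivial pairs are never stored or inspected.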
3.1.4. Illustrative example
Although Algorithm 3 uses only a small number of steps, its recursive nature and intricate behavior are best demonstrated through an example. The following figures illustrate a step-by-step run of Algorithm 3 on the example in Figure 2. Edges being iterated and incremental value changes are highlighted in blue.
Updating up to node 3 and edge (3, 6) gives the result illustrated in Figure 3. Note that nothing gets updated when i = 1, 2. When i = 3, u = 6, υ = 4, since nodes 6 and 4 are not pairable, nothing gets updated. When i = 3, u = 6, υ = 5, since nodes 6 and 5 are pairable, we have p3(6) = {5, 7}, p3(5) = {3, 4, 6}, q3(6, 5) = q3(5, 6) = {1}.
Figure 3.
Updating up to node 3 and edge (3, 6).
As shown in Figure 4, for i = 3, u = 7, υ = 4, since nodes 7 and 4 are not pairable, no updates took place. For i = 3, u = 7, υ = 5, since p3(7) = {5, 6} and p3(5) = {3, 4, 6, 7}, we have q3(7, 5) = q3(5, 7) = {1}.
Figure 4.
Step for node 3 and edge (3, 7).
Figure 5 captures the snapshot for i = 4 and u = 6: when υ = 3, we have nodes 6 and 3 are not pairable and no update is needed; when υ = 5, we have q4(6, 5) = q4(5, 6) = {1, 2}.
Figure 5.
Step for node 4 and edge (4, 6).
Figure 6 shows the step for i = 4 and u = 7: when υ = 3, we have nodes 7 and 3 are not pairable; when υ = 5, the updated result is q4(5, 7) = q4(7, 5) = {1, 2}.
Figure 6.
Step for node 4 and edge (4, 7).
Figure 7 captures the following configurations. i = 5, u = 8, υ = 3: p5(3) = {4, 5, 8}, p5(8) = {3}, q5(3, 8) = q5(8, 3) = {1}; i = 5, u = 8, υ = 4: p5(4) = {3, 5, 8}, p5(8) = {3, 4}, q5(8, 4) = q5(4, 8) = {1, 2}; i = 5, u = 8, υ = 6: p5(8) = {3, 4, 6}, p5(6) = {5, 7, 8}, q5(8, 6) = q5(6, 8) = {1, 2}; i = 5, u = 8, υ = 7: p5(8) = {3, 4, 6, 7}, p5(7) = {5, 6, 8}, q5(8, 7) = q5(7, 8) = {1, 2}.
Figure 7.
Step for node 5 and edge (5, 8).
Finally, Figure 8 shows that for i = 6, 7, 8, nothing gets updated since node 9 is not pairable to any other node.
Figure 8.
For node i = 6, 7, 8, nothing gets updated since node 9 is not pairable to any other node.
3.2. Correctness of the algorithm
We establish the correctness of Algorithm 2 and Algorithm 3 in a sequence of lemmas and theorems. Proof details are given in Appendix A for those who are interested.
With respect to a topologically sorted input graph (V, E), we distinguish the set π(u) of all nodes pairable with u, and pi(u), the dynamic store of nodes pairable with u at a stage i of the algorithm. In the remainder of the paper we refer to nodes in V solely by their topological indices, integers that can also be incremented for algorithmic iteration in a while-loop.
According to Algorithm 2, pi(u) has the following straightforward properties:
- Monotonicity: for all w ∈ V, for all i ≤ j ∈ V, we have pi(w) ⊆ pj(w); 
- Symmetry: for all u, υ ∈ V, for all i ∈ V, u ∈ pi(υ) implies υ ∈ pi(u); 
- Diagonality: for all υ ∈ V, pυ(υ) = pυ−1(υ). 
Since Algorithm 2 initializes and grows pi(u) with only nodes pairable with u, we have
Theorem 1
For all u ∈ V, for all i ∈ V, $p_i(u) \subseteq \pi(u)$.
For proving containment in the other direction the next three lemmas serve as building blocks. Notationally, we use [x, y] to stand for the closed integer interval {i | x ≤ i ≤ y}.
Lemma 1
Suppose b ∈ lca(u, υ) and (b, u) ∈ E. For i ∈ [0, n], let (υi, υi+1) ∈ E be edges such that b = υ0 and υn = υ. Then u ∈ pυ(i−1)(υi) for all i ∈ [1, n].
Lemma 2
Let (υi, υi+1) ∈ E be edges in (V, E) for i ∈ [0, n], with b = υ0 and υn = υ. Suppose lca(x, υ) = b and x ∈ π(υi) for i ∈ [0, n]. If x ∈ pυk(υk+1) for some k, then x ∈ pυj (υj+1) for all j ∈ [k, n].
Lemma 3
For all 0 < i < n, we have pυi(υi) ⊆ pυi(υi+1), and moreover pυi(υi) ⊆ pυ(i+1)(υi+1), by monotonicity.
Lemmas 1, 2, and 3 show how pairability information is propagated along a path. Next we deal with the general situation of how this information is propagated to a pair of (pairable) nodes starting from the initial setting. To do so, consider a subgraph D = A ∪ B of (V, E), with A = {ui | i ∈ [1, m]} and B = {υj | j ∈ [1, n]} such that (ui−1, ui) and (υj−1, υj) are distinct edges with i ∈ [1, m] and j ∈ [1, n], where (see Figure 9)
- u0 = υ0, um = u, and υn = υ, 
- u ∈ π(υ) and u0 ∈ lca(u, υ), and 
- A ∩ B = ∅. 
Figure 9.
Subgraph with A = {ui | i ∈ [1,m]} and B = {υj | j ∈ [1,n]}.
Consider W = {wi | i ∈ [1, m + n]} = A∪B, with topological indices appearing in A∪B sorted in ascending order.
Definition 1
The i-th alternation index for W is the index αi, such that either wαi ∈ A but wαi+1 ∈ B, or wαi ∈ B but wαi+1 ∈ A.
The next lemma, whose proof appears in Appendix A, characterizes how pairability information “jumps” from one branch (say A) to the other (say B) at critical junctures of an alternation index.
Lemma 4
For any alternation index αi, we have: 1. if υt = wαi and us = wαi+1 then us ∈ pυt(υt); 2. if us = wαi and υt = wαi+1 then υt ∈ pus(us).
The following Theorem 2, whose proof appears in Appendix A, deals with the opposite direction of Theorem 1. It allows us to conclude that for each u ∈ V, if υ ∈ π(u) then there exists i ∈ V, such that υ ∈ pi(u) by choosing a large enough i. With it, all nodes pairable with u are accounted for by the function pi(u).
Theorem 2
For all i ∈ V and for all w ≤ i, we have
where [1, i] stands for the integer interval {j | 1 ≤ j ≤ i}.
Similar to pi(u), the binary function qi(u, υ) has the following properties, as can be directly derived from Algorithm 3:
- Monotonicity: for all u, υ ∈ V, for all i ≤ j ∈ V, we have qi(u, υ) ⊆ qj(u, υ); 
- Symmetry: for all u, υ ∈ V, for all i ∈ V, we have qi(u, υ) = qi(υ, u); 
- Diagonality: For all u, υ ∈ V, we have qu(u, υ) = qu−1(u, υ). 
By inspecting steps involved in Algorithm 3, we can establish this fact:
Theorem 3
For each i ∈ V and for each u ∈ π(υ), we have qi(u, υ) ⊆ ca{u, υ}.
The next lemma shows how alternation indices help propagate the common ancestor information to all relevant pairs in the graph.
Lemma 5
Suppose x ∈ lca{u, υ}, u ∈ π(υ), and suppose that (see Figure 9) (ui−1, ui) and (υj−1, υj) are distinct edges with i ∈ [1, m] and j ∈ [1, n], where x = u0 = υ0, um = u, and υn = υ. For any alternation index αi as given in Definition 1, we have 1. if wαi = υt and wαi+1 = us, then x ∈ qυt(υt, us); 2. if wαi = ut and wαi+1 = υs, then x ∈ qut(ut, υs).
Lemma 5 leads to the following theorem, which affirms the correctness of Algorithm 3.
Theorem 4
Suppose x ∈ lca{u, υ} with u ∈ π(υ). Then either x ∈ qu(u, υ) or x ∈ qυ(u, υ).
Theorem 4 shows that Algorithm 3 finds all lowest common ancestors of u, υ in qi(u, υ), for some i. It does not, however, guarantee that all common ancestors of u, υ will eventually be included in qi(u, υ). Neither does Algorithm 3 ensure that all elements in qi(u, υ) are LCAs of u and υ. Therefore, an additional straightforward step is needed to extract the lowest elements in qi(u, υ) after the termination of Algorithm 3, to obtain lca{u, υ}.
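The final extraction step amounts to keeping the elements of qi(u, υ) that have no other element of the set strictly below them. A minimal sketch, assuming strict ancestor sets are available for the candidates; the function name and the small ancestor table are hypothetical:

```python
def lowest_elements(candidates, anc):
    """Keep x only if x is not a strict ancestor of another candidate."""
    return {x for x in candidates
            if not any(x in anc[y] for y in candidates)}

# Hypothetical strict-ancestor table: 1 is an ancestor of 3 and 4, 2 of 4 only.
ANC = {1: set(), 2: set(), 3: {1}, 4: {1, 2}}

# A candidate set holding two LCAs (3 and 4) together with two higher common ancestors.
LCAS = lowest_elements({1, 2, 3, 4}, ANC)
```

Here nodes 1 and 2 are discarded because each is an ancestor of another candidate, leaving the two lowest common ancestors. A pair is reported as a non-lattice pair when the extracted set has two or more elements.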
4. Results
ANT-LCA was implemented in Java based on JDK7. Experiments on SNOMED CT and GO were performed on a MacBook Pro running Mac OS X Yosemite, with 16GB RAM and Intel Core i7 processor. The Java code is available through GitHub (https://github.com/licongcui/nonlattice).
4.1. SNOMED CT
We used 9 versions of SNOMED CT (International Version) from 2012 to 2017, dated 07/2012 (i.e., July 2012), 01/2013, 07/2013, 01/2014, 07/2014, 01/2015, 07/2015, 01/2016, and 01/2017. Table 1 summarizes the basic results about each version of SNOMED CT, including number of concepts, number of is-a relations, number of concept pairs that are pairable after the initialization step in Algorithm 1, number of all pairable pairs, number of non-lattice pairs, and the compute time for non-lattice pairs and non-lattice fragments.
Table 1.
Summary of the basic statistics using ANT-LCA to process 9 versions of SNOMED CT. Initial Number of Pairable Pairs indicates the number of concept pairs that are pairable after the initialization step in Algorithm 1.
| | 07/2012 | 01/2013 | 07/2013 | 01/2014 | 07/2014 | 01/2015 | 07/2015 | 01/2016 | 01/2017 |
|---|---|---|---|---|---|---|---|---|---|
| Total Number of Concepts | 296,433 | 297,998 | 298,818 | 298,581 | 300,751 | 312,998 | 317,057 | 319,446 | 326,734 | 
| Total Number of is-a Relations | 440,049 | 442,711 | 444,919 | 443,944 | 446,462 | 463,339 | 470,040 | 473,121 | 487,686 | 
| Initial Number of Pairable Pairs | 150,639 | 151,996 | 153,892 | 153,645 | 153,934 | 158,488 | 161,346 | 162,689 | 171,966 | 
| Total Number of Pairable Pairs | 1,383,888 | 1,397,332 | 1,420,284 | 1,425,848 | 1,428,870 | 1,475,826 | 1,502,108 | 1,523,325 | 1,641,853 | 
| Total Number of Non-lattice Pairs | 578,237 | 583,433 | 593,498 | 594,076 | 594,106 | 614,018 | 625,484 | 633,307 | 683,744 | 
| Compute Time for Non-lattice Pairs (in seconds) | 28 | 28 | 29 | 29 | 27 | 29 | 30 | 28 | 32 | 
| Compute Time for Non-lattice Fragments (in seconds) | 524 | 527 | 548 | 524 | 502 | 512 | 541 | 554 | 747 | 
The 07/2012 version contained 296,433 concepts, with 440,049 direct is-a relations connecting concepts. Among all possible concept pairs, 150,639 were identified as pairable after the initialization step in Algorithm 1, and a total of 1,383,888 were detected as pairable, among which 578,237 were found to be non-lattice pairs. It took 28 seconds to compute non-lattice pairs and 524 seconds to compute non-lattice fragments.
In general, it takes about 30 seconds for our algorithm to detect all non-lattice pairs for each version of SNOMED CT, consistent with our linear time analysis. We ran each version 10 times and report the average time in row “Compute Time for Non-lattice Pairs” in Table 1.
The generation of all non-lattice fragments took less than 13 minutes for each version of SNOMED CT. This phase is more time-consuming than detection of non-lattice pairs because all nodes in-between a node in the non-lattice pair and the lowest common ancestors make up a fragment. For this part, we ran each version 5 times and report the average time in row “Compute Time for Non-lattice Fragments” in Table 1.
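As a rough check of the “about 2 orders of magnitude” claim, the per-version times in Table 1 can be compared with the roughly 3 hours per version reported for MaPLE in section 2.3. The sketch below only redoes that arithmetic; the variable names are ours, and the comparison ignores the difference in hardware (a 25-node cluster versus a single laptop).

```python
import math

# "Compute Time for Non-lattice Pairs" row of Table 1, in seconds.
SNOMED_PAIR_SECONDS = [28, 28, 29, 29, 27, 29, 30, 28, 32]
MAPLE_SECONDS = 3 * 3600        # "about 3 hours per version" from section 2.3

mean_seconds = sum(SNOMED_PAIR_SECONDS) / len(SNOMED_PAIR_SECONDS)
speedup = MAPLE_SECONDS / mean_seconds
orders_of_magnitude = math.log10(speedup)
```

The mean is just under 29 seconds, and the resulting ratio falls between two and three orders of magnitude.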
4.2. Gene Ontology
We used 8 versions of GO from July 2015 to February 2016. Table 2 shows the basic results about each version of GO. The 02/2016 version contained a total of 44,222 concepts, with 72,742 direct is-a relations connecting concepts. Among all possible concept pairs, 3,642 were identified as pairable after the initialization step in Algorithm 1, and a total of 328,760 were detected as pairable, among which 102,948 were found to be non-lattice pairs. It took 3 seconds to compute non-lattice pairs and 32 seconds to compute non-lattice fragments.
Table 2.
Summary of the basic statistics using ANT-LCA to process 8 versions of GO.
| | 07/2015 | 08/2015 | 09/2015 | 10/2015 | 11/2015 | 12/2015 | 01/2016 | 02/2016 |
|---|---|---|---|---|---|---|---|---|
| Total Number of Concepts | 43,330 | 43,507 | 43,654 | 43,758 | 43,880 | 43,980 | 44,049 | 44,222 | 
| Total Number of is-a Relations | 70,826 | 71,167 | 71,443 | 71,700 | 71,926 | 72,153 | 72,268 | 72,742 | 
| Initial Number of Pairable Pairs | 3,502 | 3,537 | 3,547 | 3,564 | 3,574 | 3,573 | 3,575 | 3,642 | 
| Total Number of Pairable Pairs | 305,270 | 308,314 | 309,684 | 311,490 | 312,667 | 314,340 | 314,448 | 328,760 | 
| Total Number of Non-lattice Pairs | 92,322 | 93,828 | 94,275 | 94,821 | 94,912 | 95,458 | 95,506 | 102,948 | 
| Compute Time for Non-lattice Pairs (in seconds) | 2 | 3 | 3 | 3 | 3 | 2 | 2 | 3 | 
| Compute Time for Non-lattice Fragments (in seconds) | 31 | 31 | 32 | 32 | 33 | 30 | 30 | 32 | 
4.3. Experiments on random graphs
We also evaluated the performance of ANT-LCA on randomly generated, ontologically shaped DAGs. We implemented an algorithm (see Appendix B) to generate a random DAG(N, d, Cmin, Cmax), where N is the number of nodes, d is edge density of the DAG, and Cmin/Cmax are the minimum/maximum number of children a node can have. The edge density is defined as the ratio of the extra edges (that will be added after a random tree is generated) to the number of edges in the tree.
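One possible way to generate such graphs is sketched below. The generator of Appendix B is not reproduced in this text, so the code is a hypothetical stand-in and not the authors' procedure: it grows a random tree breadth-first, giving each node between Cmin and Cmax children, and then adds round(d · (N − 1)) extra edges that always point from a lower to a higher node number, which keeps the graph acyclic. It assumes Cmin ≥ 1, and the extra edges may include redundant (transitively implied) ones.

```python
import random
from collections import deque

def random_dag(n, d, c_min, c_max, seed=0):
    """Random tree on nodes 1..n plus round(d * (n - 1)) extra forward edges."""
    if c_min < 1:
        raise ValueError("this sketch assumes c_min >= 1")
    rng = random.Random(seed)
    edges = set()
    frontier = deque([1])                 # nodes still waiting to receive children
    next_node = 2
    while next_node <= n:
        parent = frontier.popleft()
        for _ in range(rng.randint(c_min, c_max)):
            if next_node > n:
                break
            edges.add((parent, next_node))
            frontier.append(next_node)
            next_node += 1
    tree_edges = len(edges)               # n - 1 edges in the tree
    target = tree_edges + round(d * tree_edges)
    if target > n * (n - 1) // 2:
        raise ValueError("density too high for this many nodes")
    while len(edges) < target:
        a, b = sorted(rng.sample(range(1, n + 1), 2))
        edges.add((a, b))                 # a < b, so no cycle can be created
    return sorted(edges)

SAMPLE = random_dag(1000, 0.5, 1, 4)
```

Because every edge runs from a smaller to a larger node number, the node numbers already form a topological index and the output can be fed to the pairability computation without a separate sorting step.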
Densities in real world ontologies tend to be smaller than 1, even though it can be as high as . In our experiments, the edge density parameter was set between 0.02 and 1. This is a reasonable range to consider, since GO (02/2016 version) has d = 0.64 and SNOMED CT (01/2016 version) has d = 0.48 as their edge density, respectively.
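The quoted densities can be reproduced from the concept and relation counts in Tables 1 and 2. The sketch assumes that the tree part of each ontology has |V| − 1 edges, so that every remaining is-a relation counts as an extra edge; the function name is ours.

```python
def edge_density(num_nodes, num_edges):
    """Ratio of extra edges to tree edges, assuming a spanning tree of |V| - 1 edges."""
    tree_edges = num_nodes - 1
    return (num_edges - tree_edges) / tree_edges

GO_2016_02 = edge_density(44_222, 72_742)        # counts from Table 2
SNOMED_2016_01 = edge_density(319_446, 473_121)  # counts from Table 1
```

Rounded to two decimals, both values agree with the densities stated above for the two ontologies.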
Figure 10 is a plot of the average running time of ANT-LCA on randomly generated ontological structures of different sizes (number of nodes ranging from 100,000 to 1,200,000) and densities (0.02 to 1). The experimental results are consistent with our algorithmic analysis (see section 5): they show linear increase in time complexity across the density spectrum, with the slope (linear coefficient) getting larger for denser graphs.
Figure 10.
A plot of size vs. computational time in milliseconds. Different colors represent graphs of different density, with higher density requiring more computational time.
Figure 11 is a 3D view which illustrates the trend of time increase with respect to graph size and density. The experiments were performed on a Linux machine running CentOS, with 16GB RAM and an Intel(R) Xeon(R) X3430 2.40GHz quad-core CPU.
Figure 11.
A 3D rendering showing the effect of size and density on required computational time.
5. Discussion
5.1. Time Complexity for Algorithm 2
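Let σG be the pairability degree of graph G, defined as max_{u∈V} |π(u)|, i.e. the maximum number of pairable nodes a single node can have in graph G. Algorithm 2 involves a main iteration process over all edges (i, u) ∈ E of the input graph, as given in lines 1 and 2. Then the time complexity for line 3 is (using set union complexity)
$$\sum_{(i,u)\in E} \mathrm{cost}\big(p_{i-1}(u) \cup_u p_{i-1}(i)\big),$$
which is bounded by (with the assumption that the union cost is proportional to the size of the resulting set [12])
$$\sum_{(i,u)\in E} O\big(|p_i(u)|\big).$$
Hence,
$$\sum_{(i,u)\in E} O\big(|p_i(u)|\big) \le \sum_{(i,u)\in E} O(\sigma_G) = O(\sigma_G \cdot |E|).$$
Similarly, the time complexity for lines 4 and 5 is
$$\sum_{(i,u)\in E} \; \sum_{\upsilon \in p_{i-1}(i)} O\big(|p_i(\upsilon)|\big).$$
We have
$$\sum_{(i,u)\in E} \; \sum_{\upsilon \in p_{i-1}(i)} O\big(|p_i(\upsilon)|\big) \le \sum_{(i,u)\in E} \sigma_G \cdot O(\sigma_G) = O(\sigma_G^2 \cdot |E|).$$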
Therefore, the overall time-complexity of Algorithm 2 is bounded by $O(\sigma_G^2 \cdot |E|)$. Space complexity is similarly bounded, but is less of a concern here given the RAM sizes available on standard machines. For sparse graphs with small σG, Algorithm 2 performs well, as our experimental results in Section 4 show. In the worst case σG = |V|, and the running time is O(|V|² · |E|), the same as brute-force search. In the best case σG is a constant, and the running time is O(|E|). The actual time needed for the algorithm, Σ(i,u)∈E Συ∈pi−1(i) |pi(υ)|, is very close to the best case for the data sets in our experiments. Even though σG may be in the thousands, the average size of π(υ), a more realistic estimate of the actual computational time, is below 50.
Intuitively, the more tree-like the input ontology is, the closer our algorithm comes to the best-case time complexity of O(|E|). The worst case arises when every pair of nodes is pairable, which happens when the ontology is dense with shared descendant concepts among its concept nodes.
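The quantities in this analysis can be illustrated on a toy graph. The following Python sketch is not ANT-LCA; it computes pairable sets by brute force and then reports the pairability degree, the average size of π(u), and a proxy for the actual-work sum. It rests on several assumptions that are not taken from the paper: an edge (a, b) means a is a parent of b, two distinct nodes are treated as pairable when neither is an ancestor of the other and they share at least one common ancestor, and the final sets π are substituted for the intermediate sets p in the work sum. The function names are hypothetical.

```python
from itertools import combinations


def strict_ancestors(nodes, edges):
    """Map each node to the set of its strict ancestors.

    Assumption: an edge (a, b) means a is a parent of b, and the graph is a DAG.
    """
    parents = {n: set() for n in nodes}
    for a, b in edges:
        parents[b].add(a)
    memo = {}

    def visit(n):
        if n not in memo:
            acc = set()
            for p in parents[n]:
                acc.add(p)
                acc |= visit(p)
            memo[n] = acc
        return memo[n]

    return {n: visit(n) for n in nodes}


def pairable_sets(nodes, edges):
    """Brute-force pi(u): incomparable nodes sharing a common ancestor (assumed definition)."""
    anc = strict_ancestors(nodes, edges)
    pi = {n: set() for n in nodes}
    for u, v in combinations(nodes, 2):
        comparable = u in anc[v] or v in anc[u]
        if not comparable and anc[u] & anc[v]:
            pi[u].add(v)
            pi[v].add(u)
    return pi


def cost_profile(nodes, edges):
    """Return (pairability degree, average |pi(u)|, work-sum proxy)."""
    pi = pairable_sets(nodes, edges)
    sigma = max(len(s) for s in pi.values())
    average = sum(len(s) for s in pi.values()) / len(nodes)
    # Proxy for the double sum over edges (i, u) and nodes v pairable with i,
    # using the final sets pi in place of the intermediate sets p.
    work = sum(len(pi[v]) for i, _u in edges for v in pi[i])
    return sigma, average, work


nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5)]
sigma, average, work = cost_profile(nodes, edges)
```

On graphs where the average size of π(u) is far below the pairability degree, the work proxy stays well under the (σG)² · |E| bound, which is the gap the analysis above points to.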
5.2. Time Complexity for Algorithm 3
Note that we intentionally nested the for-loop in lines 4–6 of Algorithm 2 to faithfully account for the time complexity of Algorithm 3. For Algorithm 3, the double nesting is necessary in order to compute pairable pairs while accumulating common ancestors (between u and υ). If we are interested only in computing pairability, then the nesting in lines 4–6 of Algorithm 2 is not necessary, and we obtain a better time complexity of O(σG · (|E| + |V|)).
The key steps of Algorithm 3 are captured by Algorithm 2, except for the accumulation of common ancestors in steps 13 and 14. We assume that these two steps take constant time by keeping up to two LCAs, in order to provide a fair comparison with existing algorithms (which output only a representative LCA for each pair). Therefore, the time complexity of Algorithm 3 is also bounded by O((σG)² · |E|), and the best-case and worst-case analyses for Algorithm 2 apply to Algorithm 3 as well.
5.3. Related work on LCA
Many attempts have been made to improve the efficiency of algorithms for the all-pairs all-LCA problem [13, 14], i.e., finding all LCAs associated with each pair of nodes. More recently, Dash et al. [15] presented an approach that combines the efficiency of existing LCA algorithms on trees with a range-interval labeling scheme and efficient matrix multiplication. This approach achieves near-linear time for tree-like, rooted DAGs, but query results are limited to a single representative LCA for each pair of nodes. This is a limitation for applications that require all LCAs as query results. In general, the all-pairs all-LCA problem remains super-quadratic, since its time complexity is inherently tied to algorithms for matrix multiplication [16, 14]. For many DAGs arising in real-world applications, such as SNOMED CT (with over 300,000 nodes), existing algorithms become impractical.
In general approaches to the LCA problem, one distinguishes between off-line and online computation. Off-line computation preprocesses the input graph in order to speed up online LCA queries. Our paper focuses on off-line processing in order to support constant-time online queries for a representative LCA, or online queries for all LCAs (with performance parameterized by the size of the resulting set).
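The off-line/online split can be illustrated with a small Python sketch. It is a brute-force stand-in and not ANT-LCA: the off-line phase fills a table of all LCAs for every pair that has a common ancestor, and the online phase is a dictionary lookup. The assumptions here are that an edge (a, b) means a is a parent of b, that comparable pairs are skipped, and that node labels can be ordered so a pair has one canonical key. The names `build_lca_table`, `all_lcas`, and `representative_lca` are hypothetical.

```python
from itertools import combinations


def strict_ancestors(nodes, edges):
    """Map each node to its strict ancestors; edge (a, b) means a is a parent of b."""
    parents = {n: set() for n in nodes}
    for a, b in edges:
        parents[b].add(a)
    memo = {}

    def visit(n):
        if n not in memo:
            acc = set()
            for p in parents[n]:
                acc.add(p)
                acc |= visit(p)
            memo[n] = acc
        return memo[n]

    return {n: visit(n) for n in nodes}


def build_lca_table(nodes, edges):
    """Off-line phase: store all LCAs of each incomparable pair with a common ancestor."""
    anc = strict_ancestors(nodes, edges)
    table = {}
    for u, v in combinations(sorted(nodes), 2):
        if u in anc[v] or v in anc[u]:
            continue  # comparable pair, skipped in this sketch
        common = anc[u] & anc[v]
        if not common:
            continue  # trivial pair: no common ancestor
        # A common ancestor is lowest if it is not an ancestor of another common ancestor.
        table[(u, v)] = {c for c in common if not any(c in anc[d] for d in common)}
    return table


def all_lcas(table, u, v):
    """Online phase: look up all LCAs; cost depends on the size of the returned set."""
    return table.get((min(u, v), max(u, v)), set())


def representative_lca(table, u, v):
    """Online phase: return one LCA, or None for a pair without an entry."""
    lcas = all_lcas(table, u, v)
    return min(lcas) if lcas else None


nodes = [1, 2, 3, 4, 5]
edges = [(1, 3), (1, 4), (2, 3), (2, 4), (3, 5)]
table = build_lca_table(nodes, edges)
```

In this toy graph the pair (3, 4) has two LCAs, which is the kind of result a representative-LCA query alone would not expose.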
A key distinction of ANT-LCA from existing approaches is that it ensures computation is performed on all and only non-trivial pairs. In fact, the time complexity of ANT-LCA is determined by the number of non-trivial pairs in the input graph, as our complexity analysis shows. Using the average number of pairable nodes for a given node, which is a more realistic reflection of the actual computational time, the time complexity for our experimental cases is approximately (50)² · |E|.
Another distinction of our approach is that we compute all LCAs (of all non-trivial pairs) instead of a representative LCA. This makes our task more computationally intensive, and also makes many existing approaches to the LCA problem inapplicable. Our all-LCA requirement is motivated by the real-world need to implement the lattice-based approach to ontology quality assurance. Compared with the fastest all-pairs representative LCA algorithm known to date, which has O(|V| · |E|) time complexity [15], ANT-LCA provides a rough speed-up of three orders of magnitude for SNOMED CT. However, the worst-case time complexity of our algorithm, |V|² · |E|, is attained when virtually all nodes are pairable with all other nodes.
5.4. Related work using non-lattice subgraphs
This paper focused on an efficient algorithm to compute non-lattice pairs as a key part of step 1 in the 4-step non-lattice approach outlined in the Introduction. More recent work has addressed other steps and reported specific applications that improve SNOMED CT and NCI Thesaurus. In [7], a structural-lexical method was used to mine lexical patterns in non-lattice fragments in SNOMED CT to identify missing is-a relations and concepts. This method used 4 patterns to cover about 4% of all non-lattice fragments in SNOMED CT, with a solid precision rate (59%) of errors confirmed by domain experts. More recently, a new structural-lexical approach leveraged more existing knowledge in SNOMED CT by enriching the lexical attributes of each concept in non-lattice subgraphs to facilitate the identification of missing is-a relations [17]. This approach covered 7.4% of non-lattice subgraphs with higher precision (82.96%). Work reported in [9] demonstrated that the non-lattice approach can be applied to ontologies other than SNOMED CT (9.93% coverage of non-lattice fragments with 66% precision on identified errors in NCI Thesaurus).
Given such developments, it may seem reasonable to propose the reduced proliferation of non-lattice substructures (i.e., the total number of non-lattice pairs) as an ontology quality metric. However, because many factors are involved in creating newer releases of an ontology, we did not find that newer releases measure better than earlier ones. It may still be possible to use this method to measure and track the quality of specific sub-hierarchies where non-lattice fragments are unusually dense, or to demonstrate that a non-trivial portion of ontological changes between releases involves non-lattice fragments.
6. Limitations
Since ANT-LCA is designed for detecting lowest common ancestors for all non-trivial pairs in a DAG, it is generally applicable to other ontologies or terminologies which are hierarchically organized in a DAG. We have applied it to SNOMED CT, Gene Ontology, and NCI Thesaurus for ontology quality assurance.
There are two types of limitations. One is specific to the ANT-LCA algorithm, and the other is related to the non-lattice approach. The limitation of the ANT-LCA algorithm is that, although it is efficient and suitable for ontological graph structures that are tree-like, it may not perform well on graph structures in which most or all pairs of nodes are pairable.
Limitations of the non-lattice approach include the following. (1) The approach may not be efficient for ontologies that are “shallow,” such as Ontology for General Medical Science (maximum depth 6), BRENDA Tissue and Enzyme Source Ontology (maximum depth 6), and Current Procedural Terminology (maximum depth 7), from BioPortal. (2) Our algorithm itself is agnostic to relation types, so it will still work for relations such as “part-of.” However, the non-lattice approach is not applicable to other types of relations, since it is only meaningful for the is-a hierarchy (of any ontology) due to its theoretical underpinning in Formal Concept Analysis. We are not aware of any (theoretical) reasons that indicate non-lattice fragments to be problematic for other types of relations. However, this should not diminish the value of our non-lattice approach.
7. Conclusions
To summarize, this paper introduced an efficient algorithm for detecting non-lattice pairs and generating non-lattice fragments, for ontology quality assurance work. Our algorithm overcomes a fundamental computational barrier in sub-graph based structural analysis of large ontological systems. It enables the implementation of a new breed of structural auditing methods that not only identifies potential problematic areas, but also automatically suggests changes to fix the issues.
Highlights.
- A new algorithm for computing all non-trivial lowest common ancestors (LCA) of each pair of concepts in the hierarchical graph induced by an ontology. 
- Supports a fundamental step in non-lattice approaches for ontology quality assurance. 
- Provides correctness proofs for this fast algorithm, which achieved a speedup of 2 orders of magnitude compared with the best known approaches. 
- Non-lattice structural auditing methods can lead to more effective tools supporting ontology quality assurance work. 
Acknowledgments
This work was supported by the National Science Foundation through grants IIS-1657306 and ACI-1626364, and the National Institutes of Health (NIH) National Center for Advancing Translational Sciences through grant UL1TR001998. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Appendix A: correctness proofs
7.1. Proof of Lemma 4
We prove this by induction on the i-th alternation index α_i.
Basis i = 1. Without loss of generality, suppose υ_j < u_1 for all 1 ≤ j ≤ t. Then u_1 ∈ p_{υ_0}(υ_1). By monotonicity and Lemma 3 we have u_1 ∈ p_{υ_t}(υ_t).
For the inductive step, assume υ_t < u_s defines the (i + 1)-th alternation index, for some s > 1. We must have u_{s−1} < υ_t. By Definition 1, the i-th alternation index must be of the form u_{s−1} < υ_τ, for some τ ≤ t. By the induction hypothesis and the symmetry of the graph structure, we have υ_τ ∈ p_{u_{s−1}}(u_{s−1}).
By Algorithm 2, we have
Since p_{u_{s−1}}(u_{s−1}) = p_{u_{s−1}−1}(u_{s−1}) by diagonality, we have υ_τ ∈ p_{u_{s−1}}(u_s). Furthermore,
7.2. Proof of Theorem 2
Basis: i = 1, 2. We have π(1) = ∅, 1 ∉ π(2), and 2 ∉ π(2). Therefore, π(1) ∩ [1, 1] = π(2) ∩ [1, 2] = ∅.
For the inductive step, suppose, using course-of-values induction, that for all j ≤ k, for all w ≤ j,
We plan to prove that for k + 1 we have that for all u ≤ k + 1,
For a given u ≤ k + 1, there are two possible cases for u:
We show that for each case, the desired set containment holds.
Case 1. If k + 1 ∉ π(u), then we have
as needed, by induction hypothesis and monotonicity.
Case 2. Suppose k + 1 ∈ π(u), with u < k + 1. Then there exists b with b ∈ lca(k + 1, u) such that (u_0, u_1), (u_1, u_2), …, (u_{m−1}, u_m) and (υ_0, υ_1), (υ_1, υ_2), …, (υ_{n−1}, υ_n) are distinct edges of E with b = u_0 = υ_0, u_m = u and υ_n = k + 1.
We show that u ∈ p_{υ_n}(υ_n). There are two cases: (a) u_m < υ_1, and (b) υ_α < u_m for some α ≤ n − 1 (note that α cannot be n in this case).
For (a), by Lemma 4, we have υ_1 ∈ p_{u_m}(u_m). Furthermore,
For (b), assume α is the largest index such that υ_α < u_m. Then by Lemma 4, we have u_m ∈ p_{υ_α}(υ_α). By Lemma 3, we have u ∈ p_{υ_n}(υ_n). Hence k + 1 ∈ p_{k+1}(u), by symmetry.
By the induction hypothesis, we have
Since k + 1 ∈ p_{k+1}(u), we have p_{k+1}(u) ⊇ π(u) ∩ [1, k + 1] as needed.
7.3. Proof of Lemma 5
We prove this by induction on i, where α_i is the i-th alternation index.
Basis i = 1. If w_{α_1} = υ_t and w_{α_i+1} = u_1, then we have υ_j < u_1 for all 1 ≤ j ≤ t. We prove that x ∈ q_{υ_j}(υ_j, u_1) for all 1 ≤ j ≤ t. When j = 1, we have u_1 ∈ p_{υ_1}(υ_1), by the fact that u_1 ∈ p_0(υ_1) and by monotonicity. Since x ∈ q_0(u_1, υ_1) (by line 5 of Algorithm 3), we have x ∈ q_{υ_1}(υ_1, u_1) by monotonicity and symmetry.
Suppose x ∈ q_{υ_j}(υ_j, u_1) for some 1 ≤ j < t. Then x ∈ q_{υ_j−1}(υ_j, u_1) by diagonality. Since u_1 ∈ p_{υ_j}(υ_j) (by Lemma 3), we have u_1 ∈ p_{υ_j−1}(υ_j), again by diagonality. Since (υ_j, υ_{j+1}) ∈ E, we have
Therefore, x ∈ q_{υ_j}(υ_{j+1}, u_1) and so x ∈ q_{υ_{j+1}}(υ_{j+1}, u_1) (by monotonicity). This finishes the induction to give us x ∈ q_{υ_t}(υ_t, u_1). A similar argument holds for the case when w_{α_1} = u_t and w_{α_i+1} = υ_1.
For the inductive step, let υ_t < u_s define the (i + 1)-th alternation index. We must have u_{s−1} < υ_t. By Definition 1, the i-th alternation index must be of the form u_{s−1} < υ_τ, for some τ ≤ t. In increasing order, we have the sequence u_{s−1} < υ_τ ≤ υ_t < u_s. By the induction hypothesis, we have x ∈ q_{u_{s−1}}(u_{s−1}, υ_τ), and so x ∈ q_{u_{s−1}−1}(u_{s−1}, υ_τ) (by diagonality).
We now show that x ∈ q_{υ_j}(υ_{j+1}, u_s) for each j ∈ [τ, t] by induction.
For the basis j = τ, we first show that (a): u_s ∈ p_{υ_τ−1}(υ_τ), and (b): x ∈ q_{υ_τ}(υ_τ, u_s), because if these are true, then by diagonality we have x ∈ q_{υ_τ−1}(υ_τ, u_s), and Algorithm 3 gives us
Therefore x ∈ q_{υ_τ}(υ_{τ+1}, u_s), as needed.
- To show that u_s ∈ p_{υ_τ−1}(υ_τ), note that by Lemma 4, we have υ_τ ∈ p_{u_{s−1}}(u_{s−1}) (u_{s−1} < υ_τ is an alternation). Furthermore,
- To see that we have x ∈ q_{υ_τ}(υ_τ, u_s), note that the induction hypothesis gives us x ∈ q_{u_{s−1}}(u_{s−1}, υ_τ). Also,
 by instantiating Algorithm 3 with (u_{s−1}, u_s) ∈ E and υ_τ ∈ p_{u_{s−1}−1}(u_{s−1}). Therefore, x ∈ q_{u_{s−1}}(u_s, υ_τ). Furthermore,
If τ = t then the proof is already complete. If τ < t, this completes the induction basis j = τ, because x ∈ q_{υ_τ}(υ_τ, u_s) implies x ∈ q_{υ_τ}(υ_{τ+1}, u_s).
For the inductive step, assume x ∈ q_{υ_j}(υ_{j+1}, u_s) for some j such that τ ≤ j < t − 1. Since υ_j ≤ υ_{j+1} − 1, we have x ∈ q_{υ_{j+1}−1}(υ_{j+1}, u_s). Since u_s ∈ p_{υ_τ−1}(υ_τ), we have u_s ∈ p_{υ_{j+1}−1}(υ_{j+1}) by diagonality and Lemma 3. By instantiating Algorithm 3 with (υ_{j+1}, υ_{j+2}) ∈ E and u_s ∈ p_{υ_{j+1}−1}(υ_{j+1}),
Therefore, x ∈ q_{υ_{j+1}}(υ_{j+2}, u_s). By induction, we have x ∈ q_{υ_{t−1}}(υ_t, u_s), and so x ∈ q_{υ_t}(υ_t, u_s) (by monotonicity).
A similar argument holds for the case when the (i + 1)-th alternation index is defined by u_t < υ_s.
7.4. Proof of Theorem 4
If u_m < υ_j defines the last alternation index, then by Lemma 5, we have x ∈ q_{u_m}(u_m, υ_j). That is, x ∈ q_u(u, υ_j). If n = 1, then j = n = 1 and we have x ∈ q_u(u, υ).
We show that x ∈ q_{υ_k}(υ_{k+1}, u) for each k ∈ [j, n − 1] by induction for the case when n > 1.
Basis: k = j. By Lemma 4, we have υ_j ∈ p_{u_m}(u_m) = p_u(u). Moreover,
By instantiating Algorithm 3 with (υ_j, υ_{j+1}) ∈ E and u ∈ p_{υ_j−1}(υ_j), we have
Since x ∈ q_u(u, υ_j), we have x ∈ q_{υ_j−1}(u, υ_j) = q_{υ_j−1}(υ_j, u) by monotonicity (u ≤ υ_j − 1) and symmetry. Therefore, x ∈ q_{υ_j}(υ_{j+1}, u).
For the inductive step, assume x ∈ q_{υ_k}(υ_{k+1}, u) for some k such that j ≤ k < n − 1. Since u ∈ p_{υ_j}(υ_j), we have u ∈ p_{υ_{k+1}}(υ_{k+1}) by Lemma 3, and so u ∈ p_{υ_{k+1}−1}(υ_{k+1}) by diagonality.
By instantiating Algorithm 3 with (υ_{k+1}, υ_{k+2}) ∈ E and u ∈ p_{υ_{k+1}−1}(υ_{k+1}), we have
Since x ∈ q_{υ_k}(υ_{k+1}, u) (induction hypothesis), we have x ∈ q_{υ_{k+1}−1}(υ_{k+1}, u) by monotonicity (with υ_k ≤ υ_{k+1} − 1). Therefore, x ∈ q_{υ_{k+1}}(υ_{k+2}, u).
This completes the induction, and we have x ∈ q_{υ_{n−1}}(υ_n, u). Therefore, x ∈ q_{υ_n}(υ_n, u) by monotonicity, i.e., x ∈ q_υ(υ, u).
When υ_n < u_j defines the last alternation index, a similar argument gives x ∈ q_u(u, υ).
Appendix B: pseudocode for random graph generation
The random ontological graph generator algorithm (Algorithm 4) works in two steps:
- Lines 7 to 18 generate a random rooted tree. Starting from the root, the number of children is randomly selected between Cmin and Cmax. This process will continue until the number of nodes in the generated tree is equal to N. 
- Lines 19 to 26 add extra edges by selecting two random nodes. The number of edges that will be added is controlled by the density.
Algorithm 4. Procedure to generate a random ontology.
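The two steps described above can be sketched in Python as follows. This is a minimal illustration and not a transcription of Algorithm 4. Several details are assumptions: nodes are numbered in the order they are created so that every edge runs from a smaller to a larger number (which keeps the result acyclic), the minimum number of children is at least 1 so that tree growth cannot stall, and the density is read as the number of extra edges per node. The function name `random_ontology` and its parameters are hypothetical.

```python
import random
from collections import deque


def random_ontology(n, c_min, c_max, density, seed=None):
    """Return a sorted list of edges (parent, child) of a random rooted DAG on nodes 0..n-1."""
    if n < 1 or c_min < 1 or c_max < c_min:
        raise ValueError("need n >= 1 and 1 <= c_min <= c_max")
    rng = random.Random(seed)
    edges = set()

    # Step 1: grow a random rooted tree, breadth first, until it has n nodes.
    queue = deque([0])
    next_id = 1
    while next_id < n:
        parent = queue.popleft()
        for _ in range(rng.randint(c_min, c_max)):
            if next_id == n:
                break
            edges.add((parent, next_id))
            queue.append(next_id)
            next_id += 1

    # Step 2: add extra edges between random node pairs.
    # Assumption: density * n extra edges, capped by the number of unused pairs.
    target = min(int(density * n), n * (n - 1) // 2 - len(edges))
    added = 0
    while added < target:
        a, b = rng.sample(range(n), 2)
        edge = (min(a, b), max(a, b))  # orient from smaller to larger id
        if edge not in edges:
            edges.add(edge)
            added += 1
    return sorted(edges)


sample_edges = random_ontology(50, 1, 3, 0.5, seed=1)
```

Orienting each extra edge from the smaller to the larger node number is one simple way to guarantee a DAG; the original procedure may enforce acyclicity differently.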
References
- 1. SNOMED CT. 2018. http://www.snomed.org/snomed-ct [accessed 13 January 2018].
- 2. Zweigenbaum P, Bachimont B, Bouaud J, Charlet J, Boisvieux JF. Issues in the structuring and acquisition of an ontology for medical language understanding. Methods of information in medicine. 1995;34:15–24.
- 3. Ganter B, Wille R. Formal concept analysis: mathematical foundations. Springer Science & Business Media; Berlin, Germany: 2012.
- 4. Jiang G, Chute CG. Auditing the semantic completeness of SNOMED CT using formal concept analysis. J. Am. Med. Inform. Assoc. 2009;16(1):89–102. doi: 10.1197/jamia.M2541.
- 5. Zhang GQ, Bodenreider O. Large-scale, exhaustive lattice-based structural auditing of SNOMED CT; AMIA Annual Symposium Proceedings; 2010. pp. 922–926.
- 6. Cui L, Tao S, Zhang GQ. Biomedical ontology quality assurance using a big data approach. ACM T. Knowl. Discov. D. 2016;10(4):41.
- 7. Cui L, Zhu W, Tao S, Case JT, Bodenreider O, Zhang GQ. Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT. J. Am. Med. Inform. Assoc. 2017;24(4):788–798. doi: 10.1093/jamia/ocw175.
- 8. Zhu X, Fan JW, Baorto DM, Weng C, Cimino JJ. A review of auditing methods applied to the content of controlled biomedical terminologies. J. Biomed. Inform. 2009;42(3):413–425. doi: 10.1016/j.jbi.2009.03.003.
- 9. Abeysinghe R, Brooks MA, Talbert J, Cui L. Quality Assurance of NCI Thesaurus by Mining Structural-Lexical Patterns; AMIA Annual Symp Proc; 2017. pp. 364–373.
- 10. Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to Algorithms. The MIT Press; Cambridge MA: 1990.
- 11. Demetrescu C, Italiano GF. Fully dynamic transitive closure: Breaking through the O(n2) barrier; 41st Annual Symposium on Foundations of Computer Science Proceedings; 2000. pp. 381–389.
- 12. Galil Z, Italiano GF. Data structures and algorithms for disjoint set union problems. ACM Computing Surveys (CSUR) 1991;23(3):319–44.
- 13. Bender MA, Farach-Colton M, Pemmasani G, Skiena S, Sumazin P. Lowest common ancestors in trees and directed acyclic graphs. J. Algorithm. 2005;57(2):75–94.
- 14. Czumaj A, Kowaluk M, Lingas A. Faster algorithms for finding lowest common ancestors in directed acyclic graphs. Theor. Comput. Sci. 2007;380(1):37–46.
- 15. Dash SK, Scholz SB, Herhut S, Christianson B. A scalable approach to computing representative lowest common ancestor in directed acyclic graphs. Theor. Comput. Sci. 2013;513:25–37.
- 16. Eckhardt S, Mühling AM, Nowak J. Fast lowest common ancestor computations in DAGs. Algorithms ESA. 2007:705–716.
- 17. Cui L, Bodenreider O, Shi J, Zhang GQ. Auditing SNOMED CT hierarchical relations based on lexical features of concepts in non-lattice subgraphs. J. Biomed. Inform. doi: 10.1016/j.jbi.2017.12.010.















