Author manuscript; available in PMC: 2021 May 1.
Published in final edited form as: Aquat Toxicol. 2020 Mar 30;222:105478. doi: 10.1016/j.aquatox.2020.105478

Semantic Characterization of Adverse Outcome Pathways

Rong-Lin Wang 1
PMCID: PMC7393770  NIHMSID: NIHMS1607200  PMID: 32278258

Abstract

This study was undertaken to systematically assess the utilities and performance of ontology-based semantic analysis in adverse outcome pathway (AOP) research. With an increasing number of AOPs developed by scientific domain experts to organize toxicity information and facilitate chemical risk assessment, there is a pressing need for objective approaches to evaluate the biological coherence and quality of these AOPs. Powered by ontologies covering a wide range of biological domains, abundant phenotypic data annotated ontologically, and some sophisticated knowledge computing tools, semantic analysis has great potential in this area of application. With the events in the AOP-Wiki first annotated into logical definitions and then grouped into phenotypic profiles by individual AOPs, the coherence and quality of AOPs were assessed at several levels: paired key event relationships (KER), all possible event pair combinations within AOPs, and the phenotypic profiles of AOPs, genes, biological pathways, human diseases, and selected chemicals. The semantic similarities were assessed at all these levels based on a unified cross-species vertebrate phenotype ontology encompassing the logical definitions of AOP events as well as many other domain ontologies. A substantial number of KERs and AOPs in the AOP-Wiki were found to be semantically coherent. These same coherent AOPs also mapped to many more genes, pathways, and diseases biologically aligned with the intended chain of events therein leading to their respective adverse outcomes. Significantly, these findings imply that semantic analysis should also have utilities in developing future AOPs by selecting candidate events from either the existing AOP-Wiki events or a broader collection of ontology terms semantically similar to the molecular initiating events or adverse outcomes of interest. 
In addition, semantic analysis enabled AOP networks to be constructed at the level of phenotypic profiles based on similarities. Such networks complement those based on event sharing by bringing genes, pathways, diseases, and chemicals into the networks as well, thus greatly expanding the biological scope and our understanding of AOPs.

Keywords: adverse outcome pathway, ontology, semantic analysis, toxicity event network

1. Introduction

The concept of adverse outcome pathway (AOP) is increasingly being adopted as a framework to better organize chemical toxicity information along the levels of biological hierarchy. A typical AOP consists of a molecular-initiating event (MIE), multiple key events (KE), and an adverse outcome (AO) causally or mechanistically linked together (Ankley et al., 2010). An MIE marks the beginning interaction between a chemical stressor and a macromolecule such as a protein receptor. Some of the milestone responses downstream at cellular, tissue, and organ levels are then selected as key events. An AO is typically an apical endpoint of interest at organismal or population level. The AOP framework facilitates toxicological research in at least several ways. First, it provides a systemic biological context to a toxicity study typically of limited scope by design. The results from such a study, for example, the molecular phenotypes of gene expression and protein interaction by high throughput omics screenings or the morphological changes induced at organ and organismal levels after chemical exposures could then be better interpreted mechanistically along the entire pre-established phenotypic continuum of toxicities. Second, by attempting to bring together key biological milestones surrounding an AO or MIE, an AOP may help to reveal the potential knowledge and data gaps along its specific continuum of cascading events under consideration for future research. Third, a holistic approach of the AOP framework lays the foundation for jointly considering multiple AOPs as interacting networks, which likely are a more realistic biological model for organism responses to chemicals and lead to a better prediction of their apical adverse outcomes relevant to current regulatory needs. Over the years, there has been steady progress in the development of AOPs and their applications. 
As of February 2019, there were nearly 1,000 KEs, more than 1,100 key event relationships (KERs), and more than 200 AOPs in various stages of development deposited in the AOP-Wiki, a component of the AOP Knowledgebase sponsored by the Organization for Economic Cooperation and Development (OECD; 2018). A search of PubMed in December 2019 returned more than 400 studies with “adverse outcome pathway” as a keyword in their titles and/or abstracts. The rapid advances in this field underscore the need to develop and adopt innovative tools for objectively evaluating the biological coherence of existing AOPs and aiding the more efficient construction of additional putative AOPs in the future.

Semantic analysis, or ontology-based semantic mapping (OS-Mapping), brings a promising approach to the field of studying phenotypes across all levels of biological organization and integrating them with other types of omics data. An ontology can be defined as a representation of a knowledge domain by controlled vocabularies and associated grammars, the latter consisting of a set of Web Ontology Language (OWL) constructs (Gruber, 1995). Phenotypes are typically recorded as unstructured text, which, unlike other biological information such as numerical omics data or discrete DNA sequences, is not readily amenable to bioinformatics computing. Phenotypes, however, can be annotated into full logical definitions (i.e., ontology classes, terms, or concepts) by using the entity-quality (EQ) syntax (Washington et al., 2009; Mungall et al., 2010; Hoehndorf et al., 2011; Köhler et al., 2011; Gkoutos et al., 2017). A logical definition here refers to several phenotypic expressions composed of more atomic terms from multiple reference ontologies and linked together by appropriate object properties from the Relations Ontology (Mungall et al., 2011; Köhler et al., 2011; http://purl.obolibrary.org/obo/ro.owl). The EQ syntax describes how an entity, such as an anatomical part, a biological process, or a biological function, is altered in its quality. Many reference domain ontologies have been developed thus far, such as the Gene Ontology, Chemical Entities of Biological Interest Ontology, Cell Ontology, Phenotype and Trait Ontology, and various anatomy ontologies (http://obofoundry.org/). Once annotated as logical definitions, phenotypes can be incorporated into a comprehensive ontology graph encompassing many of these domain-specific and species-oriented ontologies. As nodes connected by various object properties in this graph, individual ontology classes can be characterized by their information contents as determined by their positions in the graph (Sánchez et al., 2011).
When a graph is viewed from the top (the pre-defined root class owl:Thing) to the bottom (the child nodes without further descendants), the lower a node is positioned, the more specific and thus the more informative it is. The information content of nodes can be incorporated into a variety of semantic similarity measures to relate nodes to one another computationally, either in pairs or in groups (Resnik, 1995; Pesquita et al., 2008). In essence, free-text phenotypes once difficult to analyze become computable logical definitions, facilitating their integration across the levels of biological organization, species, and new approach methodology (NAM) data types. Such an integration places a subject of interest in a much wider biological context and creates opportunities for knowledge discovery.
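
To illustrate how information content feeds into a similarity score, the following minimal Python sketch computes the Lin (1998) measure on a toy hierarchy. The hierarchy, the term names, and the simplified intrinsic information content (based only on the number of descendants) are assumptions made for illustration; they are not the ontology or the exact formulas used by the OS-Mapping application.

```python
import math

# Hypothetical toy hierarchy (child -> parents); not taken from any real ontology.
PARENTS = {
    "Thing": [],
    "phenotype": ["Thing"],
    "endocrine phenotype": ["phenotype"],
    "decreased testosterone": ["endocrine phenotype"],
    "increased estradiol": ["endocrine phenotype"],
    "lung phenotype": ["phenotype"],
    "decreased lung function": ["lung phenotype"],
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Simplified intrinsic IC: terms with fewer descendants are more informative."""
    # Count the term itself plus every term that has it as an ancestor.
    n_desc = sum(1 for t in PARENTS if term in ancestors(t))
    return -math.log(n_desc / len(PARENTS))


def lin_similarity(a, b):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    common = ancestors(a) & ancestors(b)
    ic_mica = max(information_content(t) for t in common)
    denom = information_content(a) + information_content(b)
    return 2 * ic_mica / denom if denom > 0 else 0.0


within = lin_similarity("decreased testosterone", "increased estradiol")
across = lin_similarity("decreased testosterone", "decreased lung function")
```

In this toy graph, the two endocrine phenotypes score higher than the endocrine and lung phenotypes because their most informative common ancestor sits lower in the hierarchy.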

While there have been considerable advances in AOP research in recent years, both the existing AOPs and their future development could benefit from an independent and more objective approach for assessing their biological coherence and quality. Currently, AOP developers primarily rely on their subject matter expertise surrounding a given MIE or AO to manually construct AOPs. This expert approach, while quite successful so far, is limited by its efficiency and by the subjectivity involved in selecting significant milestones between an MIE and AO of interest as candidate AOP events. Since these AOP events represent phenotypes across various levels of biological organization, they should be amenable to semantic analysis. As part of the biological effect continuum leading to a common AO, the relevant AOP events are assumed to be biologically coherent. One way to measure AOP coherence is by using semantic similarity as a proxy. This could be achieved by estimating the mean pairwise similarity of events within an AOP and the similarity between an AOP as a phenotypic profile (hereafter “profile”) and a large collection of pre-established target profiles of genes, biological pathways, and diseases. A profile here consists of multiple ontology terms, each of which represents a distinct phenotype. Each profile may be anchored or organized based on a chemical, gene, pathway, disease, or any other entity of interest. The organizing principle behind the AOP framework and the nature of an ontology graph are such that a coherent AOP is expected to have relatively high similarities within itself, because its biological connections should be reflected in the inherent logical connections of an ontology graph. Consequently, the profile of such an AOP should also map well to pre-established biological profiles.
With the rapid growth of public phenomics (high throughput phenotyping) data in multiple species, continuing development of ontologies across diverse knowledge domains such as anatomy, chemicals, genes, proteins, and diseases, and the emergence of increasingly sophisticated and publicly available bioinformatics resources, such an ontology-based semantic analysis approach could become a valuable resource to AOP research in the future.

The objectives of this study were several-fold: 1) to annotate all the events available in the AOP-Wiki into computable logical definitions by using EQ expressions; 2) to assess the biological coherence of existing AOPs as measured by both within-AOP pairwise event similarities and the degree of alignment between AOP profiles and a collection of pre-established profiles of genes, pathways, and diseases assembled from public domains; 3) to compare AOP networks based on event sharing or semantics; and 4) to assess the utility of ontology-based semantic analysis in constructing putative AOPs in the future. The findings will be discussed in the context of the strengths and weaknesses of this approach in its toxicological applications.

2. Materials and Methods

The overall workflow of this study, outlined in Figure 1A, includes the steps of compiling events from the AOP-Wiki, annotating them into logical definitions, assembling query and target phenotypic profiles, and executing semantic analysis. Additional information about some of these steps is also available elsewhere (Wang et al., 2019).

Figure 1.

Semantic analysis of AOPs. A) the overall workflow, and B) creation of logical definitions.

2.1. The AOP-Wiki repository

The AOP-Wiki serves as the central repository of AOPs hosted by the Society for the Advancement of Adverse Outcome Pathways (SAAOP; https://aopwiki.org/). It is also part of a larger effort of the AOP Knowledgebase sponsored by the OECD. The AOP developers represent many institutions from multiple countries. At the time this study was initiated, most of the AOPs deposited in the AOP-Wiki were in various stages of development (S. Table 1), with their member events varying considerably in the amount of furnished biological details and other supporting documentation (S. Table 2). Of the 1,104 events and 212 AOPs downloaded from the site around February 2018, including KEs, MIEs, and AOs, 910 of them were assigned into 208 AOPs at the time (S. File 1). One additional KE (KE1515) was also added later into the dataset. Among the 1,131 unique KERs with various degrees of evidence, 1,031 were considered. Due to the phenotypic complexity of some of these events and later updates on their membership assignment to AOPs, the exact number of events and AOPs stated in this report could vary slightly depending on the type of analysis. Hereafter, unless otherwise specified, the generic term “event” represents a KE, MIE, or AO; similarity refers to semantic similarity; a profile refers to a phenotypic profile anchored on an AOP, a chemical, a gene, a pathway, or a disease.

2.2. Pattern-based ontological annotation of AOP events

Ontological annotations could in theory be conducted based on events, KERs, or even AOPs. For increased efficiency and reduced likelihood of errors, however, events are considered the simplest targets of annotation since they represent the most basic organizing units of an AOP. Annotated events could then enable semantic analysis across all levels including KERs and AOPs. The potential downside of this strategy is that, depending on the biological coverage of an underlying ontology, a lack of applicable domain information in the original events as faithfully annotated may bias the estimated semantic coherence for some of the KERs and AOPs when their relevance and validities are contingent upon such contextual factors as specific tissues, organs, developmental processes, and life stages.

Ontological annotation of AOP events was conducted according to the EQ syntax (Washington et al., 2009; Hoehndorf et al., 2011; Gkoutos et al., 2017; Wang et al., 2019), but with some improvements. Briefly, a template specifically designed for annotating AOP events was first created for the Phenote editor (Phenote_1_8_13_windows-x64_install4j.exe, released 11-29-2012; http://www.berkeleybop.org/index.html). This template laid out a spreadsheet to import a text file containing all the event fields and instructed the editor to load the relevant Open Biological and Biomedical Ontologies (OBO) covering various knowledge domains required for annotations. A total of 21 OBO ontologies were selected, including chebi (chemical entities of biological interest), cl (cell), doid (human disease), ecocore (core ecological entities), geno (genotype), go (gene), hp (human phenotype), mp (mammalian phenotype), mpath (mouse pathology), nbo (neuro behavior), ncit (NCI Thesaurus), obi (biomedical investigations), pato (phenotype and trait), pco (population and community), pr (protein), ro (relation), so (sequence types and features), uberon (cross-species anatomy), uo (units of measurement), vt (vertebrate trait), and zfa (zebrafish anatomy and development). After a careful review of all the contextual details available, each event was annotated primarily based on its atomized title description by using appropriate terms from these ontologies following the EQ syntax. At times, it was also necessary to supplement the AOP-Wiki with external information from the literature. The goal was to capture all the essential biological elements involved in an event, such as the underlying biological process, molecular function, anatomy, and life stage, into a single or sometimes multiple logical definitions. The formula for an EQ annotation could be generalized as [E1a-R-E1b]-[Q-QL]-[E2a-R-E2b], where “E” stands for entity, “R” for object property, “Q” for quality, and “QL” for quality modifier.
Each of the square brackets here symbolizes a discrete component of expression. Some of the phenotypically complex events were composite in nature. They were split into multiple annotations and later logical definitions. Overall, the annotation of the 1,000-plus events in the AOP-Wiki was conducted over a period of several months in three rounds before being finalized for subsequent semantic analysis. At completion, a text file was generated containing EQ expressions denoting various entity and quality terms for each event.

At this stage, various components of an EQ annotation need to be further linked into a full logical definition by using additional object properties as appropriate (Figure 1B). Inspired by the approach of Dead Simple OWL Design Patterns (DSODP, Osumi-Sutherland et al., 2017), the patterns of these EQ expressions were generated by replacing all their ontology terms other than those of RO, Basic Formal Ontology (BFO) and PATO with a common letter “X” and shortening PATO term identifiers by removing their non-alphabetic parts (e.g., from PATO_0001997 to PATO). As a result, the original EQ annotations were reduced to a small number of unique patterns. One such example was KE10, “Accumulation, Acetylcholine in synapses”. It was annotated in EQ as “GO_0045202 (synapse),,,PATO_0002270 (increased accumulation),,,,CHEBI_15355 (acetylcholine),,” and reduced to the pattern of “X,,,PATO,,,,X,,”. Up to ten ontology terms could be present as part of an EQ annotation. Each pattern was then uniquely denoted by its corresponding class expression model, in this case, “PATO1 that (obo:RO_0000052 some X1) that (obo:RO_0002503 some X2)”. A custom Python script was developed to generate logical definitions in Manchester format by simply taking an EQ annotation, matching it to its unique pattern and class expression model, and substituting the corresponding variables with the original ontology terms. After these steps, the logical definition of KE10 became “obo:BFO_0000051 some (obo:PATO_0002270 that (obo:RO_0000052 some obo:GO_0045202) that (obo:RO_0002503 some obo:CHEBI_15355))”. This approach allowed an efficient and precise translation of 1,076 AOP events as EQ expressions into 1,187 logical definitions (1,070 events and 1,180 logical definitions after removing all but one equivalent class within AOPs; S. File 1). The remaining 29 events (1,105 minus 1,076) were excluded due to a lack of details or duplication. 
Of the 91 events each represented by multiple logical definitions, 76 had two, 12 had three, two had four, and one had five. These many-to-one logical definitions were treated effectively as event-equivalents in this study. All logical definitions were converted from Manchester to OWL format by using Robot, a versatile command line tool for ontology manipulations (http://robot.obolibrary.org/).
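
The pattern-matching step can be sketched in a few lines of Python using the KE10 example quoted above. This is a minimal illustration rather than the custom script used in the study: the function names and the single-entry model table are hypothetical, and the outer obo:BFO_0000051 wrapper is included in the model only so that the output reproduces the KE10 logical definition as given in the text.

```python
import re

# Pattern -> class expression model. Only the KE10 pattern from the text is listed;
# the outer BFO_0000051 wrapper mirrors the KE10 result quoted there (assumption).
MODELS = {
    "X,,,PATO,,,,X,,": (
        "obo:BFO_0000051 some (PATO1 that (obo:RO_0000052 some X1) "
        "that (obo:RO_0002503 some X2))"
    ),
}

KE10 = "GO_0045202 (synapse),,,PATO_0002270 (increased accumulation),,,,CHEBI_15355 (acetylcholine),,"


def term_id(field):
    """Extract the identifier from a field such as 'GO_0045202 (synapse)'."""
    match = re.match(r"\s*([A-Za-z]+_\d+)", field)
    return match.group(1) if match else ""


def to_pattern(eq):
    """Reduce an EQ annotation to its pattern: PATO ids -> 'PATO', other terms -> 'X'."""
    parts = []
    for field in eq.split(","):
        tid = term_id(field)
        if not tid:
            parts.append("")
        elif tid.startswith("PATO_"):
            parts.append("PATO")
        elif tid.startswith(("RO_", "BFO_")):
            parts.append(tid)  # RO and BFO terms are kept as they are
        else:
            parts.append("X")
    return ",".join(parts)


def to_logical_definition(eq):
    """Look up the model for the pattern and substitute the original terms."""
    model = MODELS[to_pattern(eq)]
    ids = [term_id(f) for f in eq.split(",") if term_id(f)]
    patos = [t for t in ids if t.startswith("PATO_")]
    others = [t for t in ids if not t.startswith(("PATO_", "RO_", "BFO_"))]

    def fill(match):
        # Placeholders are numbered by order of appearance in the annotation.
        pool = patos if match.group(1) == "PATO" else others
        return "obo:" + pool[int(match.group(2)) - 1]

    return re.sub(r"\b(PATO|X)(\d+)\b", fill, model)


ke10_definition = to_logical_definition(KE10)
```

Because every annotation sharing a pattern reuses the same model, only the small set of unique patterns needs to be written by hand.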

2.3. Assembly of query and target phenotypic profiles

Query profiles consisted of primarily the events organized by AOPs (Figure 1A). To learn if the toxicity responses by chemical species from a previous study could be linked to some of the AOPs, they were also included as part of the query profiles (Wang et al., 2019). The AOP profiles were assembled by grouping the logical definitions derived from EQ-annotated events according to their respective AOPs. The assignment of events to their AOPs was based on the updated AOP-event relationships in the AOP-Wiki in February 2019. Seven of the original 212 AOPs (AOP146, 151, 199, 211, 239, 240, and 243) without any events assigned to them were dropped. An additional 15 new AOPs were added. Those events not assigned to specific AOPs were all placed into a group as “unknown”. In the end, 221 AOP profiles (220 AOPs plus the unknown group) were assembled. In order not to bias the similarities within and among AOPs, the logical definitions of each AOP profile were made unique by excluding all but one equivalent ontology class as determined by a reasoner. For example, in AOP66, the phenotype “decreased testosterone by the fetal Leydig cells” appeared in multiple events. Only one of their logical definitions was retained. In addition, 19 chemical-species profiles (CSPPs) were also constructed by subjecting the previously annotated chemical toxicity responses (Wang et al., 2019) to the same procedure of pattern-based conversion of EQ annotations to logical definitions as described above. Target profiles of genes, biological pathways, and diseases were assembled from multiple public sources in February 2019 (S. Table 4). For pathway profiles, genes in individual pathways were simply replaced by their associated ontology terms (where available) from human, mouse, and zebrafish, respectively. In total, there were 37,325 target profiles assembled, including 22,920 genes, 5,759 pathways, 8,406 diseases, 221 AOPs, and 19 CSPPs. 
The AOP and CSPP profiles served as both the query and part of the target profiles so AOP-AOP and AOP-CSPP comparisons could be made in addition to their mappings to genes, pathways, and diseases.
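
The grouping and de-duplication of logical definitions into AOP profiles can be sketched as follows. This is a simplified illustration with hypothetical identifiers: in the study, equivalence between logical definitions was determined by a reasoner, whereas here it is supplied as a precomputed mapping.

```python
from collections import defaultdict


def assemble_profiles(event_defs, aop_events, canonical):
    """
    Group logical definitions into AOP profiles.

    event_defs: event id -> list of logical definition ids
    aop_events: AOP id -> list of event ids
    canonical:  logical definition id -> representative of its equivalence class
                (assumed to be precomputed; the study used a reasoner for this)
    """
    profiles = defaultdict(set)
    assigned = set()
    for aop, events in aop_events.items():
        for event in events:
            assigned.add(event)
            for ld in event_defs.get(event, []):
                # Keep only one member of each equivalence class per profile.
                profiles[aop].add(canonical.get(ld, ld))
    # Events without an AOP are pooled into a single "unknown" profile.
    for event, defs in event_defs.items():
        if event not in assigned:
            profiles["unknown"].update(canonical.get(ld, ld) for ld in defs)
    return {name: sorted(terms) for name, terms in profiles.items()}


# Hypothetical toy input: LD4 is equivalent to LD1, and AOP_B has no events.
toy_profiles = assemble_profiles(
    event_defs={"KE1": ["LD1"], "KE2": ["LD2", "LD3"], "KE3": ["LD4"], "KE4": ["LD5"]},
    aop_events={"AOP_A": ["KE1", "KE2", "KE3"], "AOP_B": []},
    canonical={"LD4": "LD1"},
)
```

In the toy input, the AOP without events yields no profile, and the equivalent definition is counted only once, which mirrors the two exclusion rules described above.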

2.4. OS-Mapping analysis

The semantic analysis was conducted based on a cross-species, cross-domain phenotype ontology (http://purl.obolibrary.org/obo/upheno/vertebrate.owl; Köhler, 2013). Once loaded, this root ontology initiated successive rounds of imports, starting with seven primary ontologies for managing imports and bridging domains, and 38 secondary ontologies covering domains of anatomy, behavior, cell, chemical, disease, gene, pathology, phenotypes, protein, relations, etc. These domain ontologies had different release dates, with most of them between January and March of 2019. After all these ontologies were merged and reasoned into a single ontology by using a reasoner, it became ready to serve as the foundation for semantic analysis of query against target profiles by using a previously developed OS-Mapping Java application (Wang et al., 2019).

Some of the analysis-specific input parameters for the OS-Mapping Java application were set as follows: ANALYZE_ONTOLOGY, true; GENERATE_SIM_SCORE_DISTRIBUTION, true; grpwiseDirNo, 2; grpwiseIndirNo, 4; icNo, 3; INFER_DISJOINT_CLASS_AXIOMS, false; inputOwl, vertebrateLocalLoad02202019Linux.owl; MIN_TERMS_PER_PROFILE, 1; NUM_ELK_WORKERS, 12; NUM_SIMULATED_GRPS, 200; OUTPUT_LCA, false; pairwiseNo, 1; reasonerNo, 3; ROOT_ONTOLOGY, true; SAVE_MERGED_ONTOLOGY, true; TARGET_TERM_PAIRWISE_SIMILARITY, true; TERM_PAIRWISE_SIMILARITY_OUTPUT_MIN_CUTOFF, 0.0. Additional details about these parameters are available in the supplementary materials (S. File 2). The analysis was divided into 36 different jobs executed on the EPA Atmos Linux cluster, taking a combined total of 2,180 hours.

Several post-analysis steps were taken to further process the outputs of OS-Mapping analysis. First, the semantic similarities of event/logical definition pairs were extracted from the outputs for all possible combinations in each AOP. Next, the statistical significance of the mean similarity of event/logical definition pairs in each AOP was assessed by resampling using a custom Python script. For a given AOP, its mean similarity value was compared against a cutoff generated by resampling 10,000 times from the similarities of all pairwise combinations of 1,180 logical definitions. The size of resampling was determined by the number of events/logical definitions contained in the AOP. Lastly, the network relationships among AOPs, CSPPs, genes, pathways, and diseases were visualized by importing their similarities into the Cytoscape software (Shannon et al., 2003) and displayed as nodes connected by non-directional edges weighted by their corresponding similarities.
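
One plausible reading of the resampling test is sketched below in Python. The details are assumptions, not the custom script itself: each resample draws, with replacement, as many similarity values from the global pool as there are pairs in an AOP of the given size, and the cutoff is taken as the (1 − α) quantile of the resampled means. The function names are hypothetical.

```python
import random


def resampling_cutoff(all_pair_sims, n_definitions, alpha=0.05, n_resamples=10_000, seed=0):
    """Cutoff for the mean similarity of an AOP with n_definitions logical definitions."""
    if n_definitions < 2:
        raise ValueError("at least two logical definitions are needed to form a pair")
    rng = random.Random(seed)
    k = n_definitions * (n_definitions - 1) // 2  # number of pairs within the AOP
    means = sorted(
        sum(rng.choices(all_pair_sims, k=k)) / k for _ in range(n_resamples)
    )
    # (1 - alpha) quantile of the resampled means
    return means[min(int((1 - alpha) * n_resamples), n_resamples - 1)]


def is_coherent(aop_pair_sims, all_pair_sims, n_definitions,
                alpha=0.05, n_resamples=10_000, seed=0):
    """True if the observed mean pairwise similarity exceeds the resampling cutoff."""
    observed = sum(aop_pair_sims) / len(aop_pair_sims)
    cutoff = resampling_cutoff(all_pair_sims, n_definitions, alpha, n_resamples, seed)
    return observed > cutoff


# Hypothetical pool of pairwise similarities, skewed toward zero.
pool = [0.0] * 50 + [0.1] * 40 + [0.5] * 10
coherent = is_coherent([0.5] * 6, pool, n_definitions=4, n_resamples=2000)
incoherent = is_coherent([0.0] * 6, pool, n_definitions=4, n_resamples=2000)
```

Tying the resample size to the size of each AOP matters because the mean of a few pairs fluctuates more than the mean of many, so small AOPs face a higher cutoff than large ones.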

3. Results

In this study, the entire collection of over 1,000 events from the AOP-Wiki was annotated into logical definitions, grouped into AOP-based profiles, and then semantically compared against themselves as well as more than 37,000 public profiles of genes, pathways, and diseases. The analysis was based on a cross-species phenotype ontology encompassing a variety of knowledge domains across human, mouse, and zebrafish species. Also included in the analysis were logical definitions previously derived from the toxicity responses reported in hundreds of chemical exposure studies and their respective profiles organized by chemical and species (Wang et al., 2019). This section is organized into the following parts: an overview of pairwise AOP event similarities (Figures 2A, 2B); the characterization of AOPs and their KERs by such similarities (Tables 1A, 1B); the number of AOP mappings to the various target profiles anchored by genes, pathways, and diseases (Figure 3); the relationship between these mappings and AOP semantic coherence (Tables 2A, 2B); the biology underlying a selected sample AOP (Table 3); and finally, the networking relationships among AOPs based on semantics or event sharing (Figures 4 and 5).

Figure 2.

A) Distribution of pairwise similarities among 1,180 logical definitions. Out of 695,610 possible pairs, about half had scores ≥ 0.115 and were shown. The rest were all zeros. The pairwise similarity measure was SIM_PAIRWISE_DAG_NODE_LIN_1998 (Lin 1998). B) Distribution of mean pairwise event similarities within 212 AOPs. The pairwise similarity measure was SIM_PAIRWISE_DAG_NODE_LIN_1998 (Lin 1998). The mean, median, maximum, and minimum number of events per AOP were 7.4, 7, 29, and 1 respectively.

Table 1A.

AOP semantic coherence by mean pairwise event similarities. Semantic coherence is measured by the pairwise event similarities within individual AOPs. Statistical significance for a given AOP was assessed by comparing its mean event similarity against the cutoffs generated by resampling from all pairwise combinations of 1,180 logical definitions 10,000 times. The size of resampling was determined by the number of events each AOP contained. A total of 212 AOPs remained after excluding eight AOPs each having only a single event and the unknown group. (NS: not significant).

| | P0.05 | P0.01 | P0.001 | NS (P0.05) | All |
|---|---|---|---|---|---|
| No. AOPs | 71 | 53 | 38 | 141 | 212 |
| Mean similarity | 0.276 | 0.303 | 0.335 | 0.098 | 0.122 |

Table 1B.

The distribution of mean similarities of KER event pairs classified by evidence and adjacency. The adjacency of a KER indicates if the two events involved in a relationship are immediate neighbors or not in an AOP sequence. Of 1,131 unique KERs as of February 2019, 1,031 were considered. The remaining KERs were not included because either the relevant KEs were not available in the AOP-Wiki at the onset of this study or they were removed due to having other equivalent classes.

| Weight of evidence by the AOP-Wiki | All adjacent KERs (No. KERs) | Adjacent KERs (No. KERs) in significant AOPs (P0.05) only | All non-adjacent KERs (No. KERs) | Non-adjacent KERs (No. KERs) in significant AOPs (P0.05) only |
|---|---|---|---|---|
| High | 0.194 (342) | 0.313 (206) | 0.130 (68) | 0.195 (40) |
| Moderate | 0.176 (149) | 0.267 (97) | 0.181 (50) | 0.254 (27) |
| Low | 0.084 (57) | 0.166 (20) | 0.127 (36) | 0.136 (32) |
| Not determined | 0.146 (394) | 0.239 (181) | 0.118 (51) | 0.093 (5) |
| Combined | 0.165 (942)¹ | 0.272 (504) | 0.139 (205)¹ | 0.188 (104) |

¹ The total number of KERs adds up to more than 1,031 because some events were annotated into multiple logical definitions.

Figure 3.

AOP and CSPP mappings to genes, pathways, diseases, and themselves. A total of 240 query profiles (AOPs: 221, CSPPs: 19) were compared to 37,325 target profiles (genes: 22,920; pathways: 5,759; diseases: 8,406; AOPs: 221; CSPPs: 19). Mappings were assessed at P0.01 level, with each P-value generated specifically for a query/target comparison by constructing 200 random profiles of the same size as the target from the entire collection of target profiles.

Table 2A.

AOP semantic coherence and biological mappings. AOPs are grouped by within-AOP event similarity. AOP mappings to target profiles were based on P0.01. (NS: not significant).

| Coherence measure and total targets mapped | AOPs NS (141) | AOPs P0.05 (71) |
|---|---|---|
| Mean pairwise event similarity (Std) | 0.098 (0.059) | 0.276 (0.124) |
| Genes (human, mouse, zebrafish) | 3364 (372, 2530, 462) | 8531 (1068, 6338, 1125) |
| Pathways (Reactome, KEGG) | 127 (121, 6) | 1132 (1019, 113) |
| Diseases (OMIM, ORPHA) | 926 (845, 81) | 2785 (2340, 445) |
| AOPs | 219 | 217 |
| CSPPs | 14 | 19 |

Table 2B.

Biological mappings of two contrasting AOPs with low or high semantic coherence. Also included for comparison is a group of logical definitions not assigned to any AOPs—in effect a random group. AOP mappings to target profiles were based on P0.01. AOP148: EGFR Activation Leading to Decreased Lung Function; AOP68, Modulation of Adult Leydig Cell Function Subsequent to Alterations in the Fetal Testis Proteome.

| AOP size/coherence measure/total targets mapped | AOP148 | AOP68 | A random group |
|---|---|---|---|
| No. logical definitions | 11 | 9 | 196 |
| Mean pairwise event similarity | 0.093 (NS) | 0.467 (P0.001) | 0.086 (NS) |
| Genes | 9 | 2060 | 66 |
| Pathways | 0 | 435 | 3 |
| Diseases | 1 | 535 | 2 |
| AOPs | 9 | 143 | 86 |
| CSPPs | 0 | 15 | 1 |

Table 3.

The top five mappings of AOP68 to genes, pathways, diseases, other AOPs, and chemical species. AOP68: Modulation of Adult Leydig Cell Function Subsequent to Alterations in the Fetal Testis Proteome. P-value = 0.01.

| Target profiles (No. mapped) | Target profile definition | Similarity score indirect |
|---|---|---|
| Genes (2060) | | 0.830–0.264 |
| M_Rhox13 | Reproductive homeobox 13 | 0.830 |
| M_Rfx2 | Regulatory factor x, 2 | 0.824 |
| M_Nanos2 | Nanos c2hc-type zinc finger 2 | 0.820 |
| M_Nup210l | Nucleoporin 210-like | 0.819 |
| M_Hmgb2 | High mobility group box 2 | 0.818 |
| Pathways (435) | | 0.764–0.264 |
| R-HSA-5601884 | Piwi-interacting RNA biogenesis | 0.764 |
| R-MMU-1300652 | Sperm oocyte membrane binding | 0.737 |
| R-MMU-1300642 | Sperm motility and taxes | 0.736 |
| R-MMU-2046105 | Linoleic acid metabolism | 0.732 |
| R-MMU-375281 | Hormone ligand-binding receptors | 0.722 |
| Diseases (535) | | 0.815–0.283 |
| M_OMIM_300200 | Adrenal hypoplasia, congenital | 0.815 |
| M_OMIM_102530 | Spermatogenic failure 6 | 0.803 |
| M_OMIM_154230 | 46, XY sex reversal 4 | 0.790 |
| M_OMIM_607080 | 46, XY gonadal dysgenesis, partial, with minifascicular neuropathy | 0.790 |
| M_OMIM_300068 | Androgen insensitivity syndrome | 0.777 |
| AOPs (142) | | 0.849–0.157 |
| AOP66 | Modulation of adult leydig cell function subsequent glucocorticoid activation in the fetal testis | 0.849 |
| AOP67 | Modulation of adult leydig cell function subsequent to estradiol activation in the fetal testis | 0.849 |
| AOP74 | Modulation of adult leydig cell function subsequent to hypermethylation in the fetal testis | 0.822 |
| AOP70 | Modulation of adult leydig cell function subsequent to proteomic alterations in the adult leydig cell | 0.723 |
| AOP238 | Excessive reactive oxygen species production leading to reduced ATP production-associated reproduction decline | 0.712 |
| CSPPs (15) | | 0.520–0.288 |
| Atrazine_rat | Atrazine_rat | 0.520 |
| EE2_ZF | Ethinylestradiol_zebrafish | 0.516 |
| Malathion_rat | Malathion_rat | 0.497 |
| EE2_FHM | Ethinylestradiol_fathead minnow | 0.485 |
| Cypermethrin_rat | Cypermethrin_rat | 0.483 |

Figure 4.

Semantic similarity networks of AOPs, CSPPs, genes, pathways, and diseases. The networks were constructed based on the top five mappings of genes, pathways, diseases, chemicals, and other AOPs by each of the 220 AOP queries at P0.01 in the prefuse force-directed layout by using Cytoscape, with a total of 1,041 unique nodes. The anchors of these profiles served as nodes and were connected by non-directional edges weighted by their corresponding similarity scores, with a wider edge in darker red denoting a higher similarity. A) all 1,041 nodes and 3,072 edges, and B) AOP68 and its first neighbors (34) and all their edges (175). Four highlighted AOPs also form a subnetwork due to their shared events. AOP68, Modulation of Adult Leydig Cell Function Subsequent to Alterations in the Fetal Testis Proteome.

Figure 5.

The AOP68 subnetwork based on event-sharing. The subnetwork contains four AOPs and 14 events. The events belonging to a common AOP are connected by arrows of the same color. A) default subnetwork, and B) consolidated subnetwork. AOP66, Modulation of Adult Leydig Cell Function Subsequent Glucocorticoid Activation in the Fetal Testis; AOP67, Modulation of Adult Leydig Cell Function Subsequent to Estradiol Activation in the Fetal Testis; AOP68, Modulation of Adult Leydig Cell Function Subsequent to Alterations in the Fetal Testis Proteome; AOP74, Modulation of Adult Leydig Cell Function Subsequent to Hypermethylation in the Fetal Testis. The 14 events are: 505, Decreased Sperm Quantity/Quality in the Adult, Decreased Fertility; 540, Decreased Testosterone by the Fetal Leydig Cells, Dysgenesis of Fetal Leydig Cells; 541, Decreased Testosterone by the Fetal Leydig Cells, Decreased Coup-Tfii Stem Leydig Cells; 543, Decreased Fertility in the Adult, Decreased Sperm Quantity and/or Quality in the Adult Testis; 653, Decreased Testosterone by the Fetal Leydig Cells, Increased Corticosterone; 654, Decreased Testosterone by the Fetal Leydig Cells, Activation by Other Glucocorticoid Receptor Agonists; 655, Decreased Testosterone by the Fetal Leydig Cells, Increased Coup-Tfii in Fetal Leydig Cells; 656, Decreased Number and Function of Adult Leydig Cells, Decreased Coup-Tfii Stem Leydig Cells; 657, Decreased Testosterone by the Fetal Leydig Cells, Dysgenesis of Fetal Leydig Cells; 658, Decreased Testosterone by the Fetal Leydig Cells, Increased Estradiol; 659, Decreased Testosterone by the Fetal Leydig Cells, Activation by Other Estradiol Agonists; 660, Decreased Testosterone by Fetal Leydig Cells, Dysgenesis of Fetal Leydig Cells; 661, Decreased Testosterone by the Fetal Leydig Cells, Alterations in the Fetal Testis Proteome; 662, Decreased Testosterone by the Fetal Leydig Cells, Hypermethylation in the Fetal Testis.

The pairwise similarities among 1,180 logical definitions representing 1,070 events varied widely and gave rise to a highly skewed distribution (Figure 2A, S. File 1). Among a total of 695,610 possible pairs, nearly half (49.2%) shared no similarity at all. The similarity ranges and respective percentages of the total for the remaining pairs were: 0.01 – 0.2 (33.8%), 0.21 – 0.4 (13.2%), 0.41 – 0.6 (2.2%), 0.61 – 0.8 (1.1%), and 0.81 – 1.0 (0.33%). The median and mean similarities were only 0.115 and 0.122, respectively. A similar trend was observed when event similarities were considered within the 212 AOPs individually (Figure 2B): almost 28% of AOPs had similarities below 0.086 (the mean value for the 196 events not assigned to any AOP, which are essentially a random group), and 41% of AOPs had similarities below 0.122. Knowledge of these event similarity distributions, both within AOPs and across all events, provided the necessary background for the subsequent evaluation of individual AOPs and KERs.
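The pair count reported above is the number of unordered pairs among the 1,180 logical definitions. The following minimal Python sketch reproduces that arithmetic and assigns a score to one of the similarity ranges listed in the text. The function name `bin_label` is hypothetical, and the sketch assumes scores are rounded to two decimals, as the reported ranges suggest.

```python
from math import comb

# Number of unordered pairs among the 1,180 logical definitions.
N_DEFINITIONS = 1180
N_PAIRS = comb(N_DEFINITIONS, 2)

# Upper bounds of the similarity ranges reported in the text.
BIN_UPPER_BOUNDS = [0.2, 0.4, 0.6, 0.8, 1.0]


def bin_label(score):
    """Return the reported similarity range that a score falls into.

    Assumption: scores lie in [0, 1] and are rounded to two decimals.
    """
    if not 0 <= score <= 1:
        raise ValueError("similarity scores are expected to lie in [0, 1]")
    if score == 0:
        return "no similarity"
    lower = 0.01
    for upper in BIN_UPPER_BOUNDS:
        if score <= upper:
            return f"{lower:.2f}-{upper:.1f}"
        lower = upper + 0.01


print(N_PAIRS)
print(bin_label(0.115))
```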

The event similarities within AOPs were further tested statistically by resampling. Of the 212 AOPs that remained after excluding eight single-event AOPs and the unknown group, 71 were significant at the 0.05 level (Table 1A, S. File 3). A majority of the 71 AOPs were developed for vertebrates, particularly rat, mouse, human, and fish. Six non-vertebrate AOPs were also found significant. For many others, the intended taxonomic applications were not specified. Of the 16 OECD-endorsed AOPs as of February 2020, only five were significant. On average, the significant AOPs each had 8.7 events and a similarity of 0.276, more than twice that across all event pairs. Common among the significant AOPs were those involved in hormone and reproductive functions (S. File 3). In contrast, the 141 non-significant AOPs averaged 7.1 events and a similarity of 0.098 per AOP, and their coverage of biology appeared to be far more diverse. As to the 1,000-plus KERs considered, several patterns were noticeable in their similarity distributions (Table 1B): the subgroup of KERs present in significant AOPs had greater similarities than the overall group; adjacent KERs (two immediately neighboring events) appeared to have greater similarities than non-adjacent ones, particularly in the significant AOPs; and among the adjacent KERs, those marked as having high evidence in the AOP-Wiki also had greater similarities.
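The resampling test can be illustrated with a short Python sketch: the mean pairwise similarity of an AOP's events is compared with that of same-sized groups drawn at random from all events. This is only a sketch under stated assumptions. The number of draws, the one-sided comparison, the add-one correction, and all names and toy data below are hypothetical and are not taken from the study.

```python
import random
from itertools import combinations


def mean_pairwise_similarity(events, similarity):
    """Mean similarity over all unordered pairs of events."""
    pairs = list(combinations(events, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def resampling_p_value(aop_events, all_events, similarity, n_draws=1000, seed=0):
    """Share of same-sized random event groups whose mean pairwise
    similarity is at least as high as that of the AOP (one-sided)."""
    rng = random.Random(seed)
    observed = mean_pairwise_similarity(aop_events, similarity)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(all_events, len(aop_events))
        if mean_pairwise_similarity(sample, similarity) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (hits + 1) / (n_draws + 1)


# Toy data (hypothetical): 8 biological "systems" with 5 events each.
toy_events = [f"{system}{i}" for system in "ABCDEFGH" for i in range(5)]


def toy_similarity(a, b):
    """High similarity within a system, low similarity across systems."""
    return 0.8 if a[0] == b[0] else 0.05


p_coherent = resampling_p_value(["A0", "A1", "A2", "A3"], toy_events, toy_similarity)
p_mixed = resampling_p_value(["A0", "B0", "C0", "D0"], toy_events, toy_similarity)
print(p_coherent, p_mixed)
```

In this toy setting the single-system group receives a small P value, whereas the group that mixes four systems does not, mirroring the contrast between coherent and non-coherent AOPs.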

Besides the event similarities within an AOP, its semantic coherence could also be assessed indirectly by grouping its member events as a query profile and comparing it against other pre-established profiles based on empirical evidence. Many of the 240 query profiles mapped to themselves and to numerous target profiles anchored by genes, pathways, and diseases (Figure 3, Table 2A), indicating broad alignments of some chemical-inducible toxicities with the phenotypic consequences of disruptions at various levels of biological organization in vertebrate species. The AOPs mapped most often to mouse genes, followed by the diseases of Online Mendelian Inheritance in Man (OMIM; https://www.omim.org) in HP terms, human genes, zebrafish genes, and OMIM diseases in MP terms. The taxonomy of these mappings coincided with the intended applicability of many AOPs in rat, mouse, human, and fish (S. File 3). In total, 14,090 of these targets were considered significant at the 0.01 level, almost a third of all targets compared. This number was likely inflated, however, for reasons discussed later.

With the mean pairwise event similarity within an AOP taken as a measure of its semantic coherence, the relationship between the number of biological mappings of an AOP and its semantic coherence was examined by dividing the 212 AOPs into significant (at the 0.05 level) and non-significant groups and comparing the two (Table 2A). The high-coherence group of 71 AOPs (S. File 3) had a similarity nearly three times that of its counterpart. Following the same trend, the numbers of genes, pathways, and diseases mapped to the high-coherence AOPs were more than two, eight, and three times, respectively, those of the low-coherence group. When two individual AOPs with widely diverging similarities, one from each group, were compared (Table 2B), a similar trend was evident in the number of mappings but with even greater gaps between the two.

AOP coherence could also be assessed by examining in greater depth the degree of alignment between the member events of an AOP and its mapped biological targets (Table 3). AOP68 was selected as a representative of highly coherent AOPs. According to the AOP-Wiki, this AOP was designed to model the process of “modulation of adult Leydig cell function subsequent to alterations in the fetal testis proteome”. It contained five events annotated into nine logical definitions (S. Table 3). In total, AOP68 mapped to 3,187 target profiles at the 0.01 significance level, including 2,060 genes, 435 pathways, 535 diseases, 142 AOPs, and 15 CSPPs. The top five mappings in each category all appeared to be involved in reproductive biology, and were thus in line with the overall biological scope of AOP68. More specifically, the top five genes all contribute to the phenotypes of cellular, endocrine/exocrine gland, and reproductive systems (www.informatics.jax.org). Among the top five pathways, three are obviously involved in reproductive functions (R-MMU-1300652, R-MMU-1300642, R-MMU-375281). Of the remaining two, the pathway of piwi-interacting RNA biogenesis (R-HSA-5601884) participates in male germline development (Pillai and Chuma, 2012; Ozata et al., 2019), while linoleic acid metabolism has reproductive implications as well (Wathes et al., 2007). All of the top five mapped diseases and AOPs were clearly reproduction-related in nature. Less obvious in this regard were some of the CSPPs, such as atrazine-rat, malathion-rat, and cypermethrin-rat, which involve a herbicide (atrazine), an organophosphate insecticide (malathion), and a pyrethroid insecticide (cypermethrin). While these chemicals were not listed as stressors for this AOP, there was evidence indicating their involvement in reproductive functions (Wang et al., 2019).

To analyze AOPs and their targets of chemicals, genes, pathways, and diseases in a network context, statistically significant query-target mappings were graphed into a network of nodes whose connecting edges were weighted by the corresponding similarity scores (Figure 4A). There were at least five major clusters in the network, one of which (marked by the dashed oval) contained some of the most highly similar AOPs and their targets. When a minimum similarity was arbitrarily set at 0.7, this cluster was reduced to 94 nodes, including 32 AOPs, 35 genes, five pathways, and 22 diseases (S. File 4). Most of them appeared to be related to reproductive biology in some way. To further examine this “reproductive” cluster, AOP68 was again selected along with its first neighbors (directly connected nodes) and all their connections to form a subnetwork (Figure 4B). As expected, by bringing in additional mappings of AOP68 (Table 3), this subnetwork greatly expanded the biological scope of the four highlighted AOPs linked together by shared events alone, and it was highly enriched with reproductive functions and processes. In addition, this subnetwork revealed more clearly the phenotypically weighted interrelationships between AOP68 and its top mappings, and among the top mappings themselves. For example, high similarity was evident between AOP268 and AOP238, between AOP216 and AOP238, and between AOP268 and R-MMU-1300652. In contrast, the AOP68 network based on event-sharing alone was limited to four AOPs (Figure 4B) but resolved to the level of individual events, thus effectively allowing the formation of alternative linear AOPs from different paths between an MIE and an AO (Figure 5A). Judging by their titles, several events appeared to be redundant and could be further consolidated (Figure 5B).
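The network construction described above (weighted non-directional edges, a minimum similarity cutoff, and a first-neighbor subnetwork) can be sketched in plain Python as follows. The study itself used Cytoscape. The function names are hypothetical, and the similarity scores in the toy mappings are invented for illustration only; they are not values from the study.

```python
from collections import defaultdict


def build_network(mappings, min_similarity=0.0):
    """Undirected weighted graph stored as {node: {neighbor: similarity}}."""
    graph = defaultdict(dict)
    for query, target, score in mappings:
        if query != target and score >= min_similarity:
            graph[query][target] = score
            graph[target][query] = score
    return graph


def first_neighbor_subnetwork(graph, center):
    """The center node, its direct neighbors, and every edge among them."""
    nodes = {center} | set(graph.get(center, {}))
    edges = {}
    for a in nodes:
        for b, weight in graph.get(a, {}).items():
            if b in nodes:
                edges[tuple(sorted((a, b)))] = weight
    return nodes, edges


# Toy query-target mappings; the scores are hypothetical.
toy_mappings = [
    ("AOP68", "AOP238", 0.90),
    ("AOP68", "AOP268", 0.85),
    ("AOP268", "AOP238", 0.80),
    ("AOP68", "R-MMU-1300652", 0.75),
    ("AOP268", "R-MMU-1300652", 0.72),
    ("AOP238", "gene_X", 0.60),
    ("AOP148", "gene_Y", 0.30),
]

network = build_network(toy_mappings, min_similarity=0.7)
nodes, edges = first_neighbor_subnetwork(network, "AOP68")
print(sorted(nodes))
print(len(edges))
```

Storing each edge under a sorted node pair keeps the graph non-directional, so an edge reached from either endpoint is counted once.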

4. Discussion

The AOP framework facilitates the paradigm shift in toxicology from animal testing to a more resource-efficient approach, one that focuses on mechanistic pathways, high-throughput screening of effects, and predictive toxicology (NRC, 2007). As more and more putative AOPs are developed manually by domain experts, there is a need for objective methods to independently assess their quality. With the ongoing advances in phenomics (Houle et al., 2010; Brown et al., 2018) and the availability of a wide variety of OBO domain ontologies, semantic analysis has the potential to help meet this need. This study was undertaken to evaluate the utility of this approach in AOP development.

The events in the AOP-Wiki were first annotated into computable logical definitions. They were then merged with many OBO ontologies of relevance and reasoned into a unified common ontology, which is essentially a directed acyclic graph. The AOP event-derived logical definitions were further organized into profiles according to individual AOPs and compared semantically against a collection of public profiles by genes, pathways, diseases, as well as themselves, based on the common ontology. To discuss the findings in this study, this section is arranged into the following parts: 1) the curation completeness of AOP events as related to their ontological annotations; 2) AOP quality as measured by their semantic coherence and the biological alignment with pre-established profiles of genes, pathways, and diseases; 3) informing AOP biology by network analysis; and 4) perspectives on future development of additional AOPs based on organizing AOP events and/or logical definitions according to their semantic similarities to AOs of interest.
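To make the idea of comparing profiles over a directed acyclic graph concrete, the sketch below computes a Jaccard similarity over ancestor sets and a best-match average between two profiles. The similarity measure and software actually used in the study are not described in this section, so Jaccard and the best-match average are stand-ins chosen for illustration. The toy ontology, its term names, and all function names are hypothetical.

```python
def ancestors(term, parents):
    """All superclasses of a term in the DAG, including the term itself."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def jaccard(a, b, parents):
    """Shared ancestors divided by all ancestors of either term."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def profile_similarity(query, target, parents):
    """Average, over query terms, of the best match in the target profile.

    Note that this score is asymmetric in query and target.
    """
    best = [max(jaccard(q, t, parents) for t in target) for q in query]
    return sum(best) / len(best)


# Toy ontology (hypothetical): each term maps to its direct superclasses.
toy_parents = {
    "phenotype": [],
    "reproductive phenotype": ["phenotype"],
    "respiratory phenotype": ["phenotype"],
    "decreased testosterone": ["reproductive phenotype"],
    "decreased fertility": ["reproductive phenotype"],
    "decreased lung function": ["respiratory phenotype"],
}

same_system = jaccard("decreased testosterone", "decreased fertility", toy_parents)
cross_system = jaccard("decreased testosterone", "decreased lung function", toy_parents)
print(same_system, cross_system)
```

Two terms that share a specific superclass score higher than two terms that meet only at the root, which is the property that lets a reasoned ontology graph relate events that were annotated independently.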

4.1. Curation of AOP events and ontological annotation

The quality and efficiency of ontological annotation of AOP events depend on the amount of biological detail in the AOP-Wiki as provided by their original contributors. While it is possible to computationally map free-text information to individual ontology terms in batch, the creation of a full logical definition still requires manual intervention and presents a major bottleneck in the process of semantic analysis. To accelerate this process, complete information about an event with regard to the relevant gene, protein, molecular function, biological process, anatomical parts, and life stage is important, so that no additional effort has to be made to delineate its biology. In addition, greater detail about an event also facilitates the selection of appropriate object properties and makes its logical definition more specific. The nature of an ontology is such that the more specific an annotation is, the more informative its derived OWL class will be in later analysis. Furthermore, the practice of reusing existing AOP events should be encouraged. Duplicative events not only slow down AOP development but also introduce bias into semantic analysis later. Lastly, a composite event where distinct sub-events are evident is less preferable, as it typically needs to be split into multiple logical definitions. In this study, the ontological annotation of events was primarily based on their title descriptions. For many of them, a title description alone was adequate. A case in point is KE525, defined as “apoptosis of adult Leydig cells, decreased testosterone by adult Leydig cells”. This title describes two likely related sub-events in a specific cell type at a given life stage: a cellular process and the reduction of a specific hormone. 
While their causal relationship was not explicitly made clear anywhere in the AOP-Wiki page and there was a lack of details about the mechanism and assay methods, it could still be satisfactorily annotated into two logical definitions: RLWIEIB_1005286__KE525, has part some (decreased rate that (inheres in some (testosterone biosynthetic process that (occurs in some (Leydig cell that (part of some adult organism)))))); RLWIEIB_1005287__KE525, has part some (present that (inheres in some (apoptotic process that (occurs in some (Leydig cell that (part of some adult organism)))))). A composite event like this was often seen in the AOP-Wiki. Its derived multiple logical definitions were later treated effectively as distinct events in semantic analysis without changing the outcome for their parent AOP. When an event lacks both an informative title and details about its mechanisms and assays, it would necessitate more lengthy research to resolve biological ambiguities for a better annotation. An example in this case is KE1421, “Activated, LXR”, for which hardly any information was provided. After some effort, it was later annotated as: RLWIEIB_1006100__KE1421, has part some (increased rate that (inheres in some (nuclear receptor activity that (has participant some oxysterols receptor LXR-alpha)))), considerably more specific and informative than the original description.
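The logical definitions quoted above share a nested pattern: a quality that inheres in an entity, which is in turn qualified by a chain of relations. The following Python sketch assembles such a string from its parts and reproduces the three definitions given in the text. The function `render_definition` and its argument layout are hypothetical conveniences for illustration and are not the tooling used in the study.

```python
def render_definition(quality, chain):
    """Build a nested class expression from a quality and a relation chain.

    `chain` lists (relation, class) pairs from the outermost entity inward.
    """
    filler = None
    for relation, cls in reversed(chain):
        if filler is None:
            # Innermost restriction: a bare class needs no parentheses.
            filler = f"{relation} some {cls}"
        else:
            filler = f"{relation} some ({cls} that ({filler}))"
    return f"has part some ({quality} that ({filler}))"


leydig_chain = [("occurs in", "Leydig cell"), ("part of", "adult organism")]

ke525_testosterone = render_definition(
    "decreased rate",
    [("inheres in", "testosterone biosynthetic process")] + leydig_chain,
)
ke525_apoptosis = render_definition(
    "present",
    [("inheres in", "apoptotic process")] + leydig_chain,
)
ke1421 = render_definition(
    "increased rate",
    [
        ("inheres in", "nuclear receptor activity"),
        ("has participant", "oxysterols receptor LXR-alpha"),
    ],
)
print(ke525_testosterone)
```

Splitting the composite event KE525 into two calls that reuse the same cell-type and life-stage chain mirrors how one title yields two logical definitions.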

4.2. AOP quality by semantic coherence and biological alignment

Semantic analysis provides an objective approach to assess the quality of AOPs. An AOP could be regarded as a phenotypic continuum from an MIE to an AO containing only selected causal events with their intermediate phenotypes left out. However, because of the logically interconnected nodes in a merged and reasoned phenotype ontology graph, the undescribed intermediate phenotypes can still be inferred by their semantic similarities to the corresponding AOP events. During development, events and their respective KERs were constructed and placed into various AOPs by domain experts. The quality of an AOP could perhaps be considered by its event composition (a given combination of events selected as representative milestones in a biological effect continuum), event order (the sequence of events indicating the biological progression from an MIE to an AO), and biological alignment (mappings of AOPs to the pre-established profiles of genes, pathways, and diseases). The first aspect could be indirectly assessed by examining the mean event similarities within an AOP, or semantic coherence. The second aspect could be addressed by evaluating the similarities of event pairs underlying KERs in the statistically significant AOPs versus those of the overall group. The third aspect could be examined by comparing the number of mappings of significant AOPs to that of non-significant AOPs as well as by comparing the purported biology of an AOP to that of its mappings. It is reasonable to assume that the member events causally leading to a common AO in an AOP should be phenotypically more related to one another than a random group of events of the same size. In other words, a valid AOP should be semantically coherent. Likewise, two causally related events in a KER should also be more similar to each other than a random pair and more likely to be enriched in an AOP. The events of an adjacent KER should be more similar than those indirectly connected, non-adjacent ones. 
And if an AOP is biologically valid, it should align with some of the pre-established profiles of genes, pathways, and diseases, as well as toxicologically related chemicals and other AOPs. Of the 212 AOPs analyzed, about a third had mean pairwise event similarities significantly greater than those of same-sized random groups at the 0.05 level (Table 1A), a strong indication of phenotypic consistency among their member events and thus of semantic coherence. The validity of the significant AOPs is further strengthened by two observations on their KERs: the event pairs of adjacent KERs were more similar than those of non-adjacent ones, and the event pairs of adjacent KERs with stronger biological weight of evidence were more similar than those of KERs in general (Table 1B). These findings thus indicate a broad agreement of the contributions made by domain experts with the AOP-Wiki development guidelines. Additional evidence for these AOPs comes from the finding that they mapped to a far greater number of genes, pathways, and diseases than non-significant AOPs and a random group of events (Table 2A, 2B). A closer examination of one of the significant AOPs, AOP68, reveals that this reproductive-function-oriented AOP mapped to many genes, pathways, and diseases sharing similar biology (Table 3). Overall, these findings largely support the utility of ontology-based semantic analysis in the validation of putative AOPs.

As to the remaining non-coherent AOPs, there are several possible interpretations. One is that some of them may have missing or misplaced events due to either biological knowledge gaps or human errors, as many AOPs were considered putative and still under development (S. Table 1). Alternatively, biological misinterpretation or ontological misrepresentation may have occurred for some of the events during their annotation into logical definitions. A third possibility is that the underlying ontologies, all works in progress and still evolving, may lack appropriate terms to accurately represent some of the events or the phenotypes of genes, pathways, and diseases. If so, the impact would be carried over into the derived profiles. Finally, the cross-species, cross-domain vertebrate phenotype ontology in this study may not be suited for many of the AOPs developed for non-vertebrates (S. File 3) unless they describe biology at more fundamental levels conserved across evolutionarily distant taxa. As to the specific case of the discrepancy between the OECD endorsement of AOPs and their semantic coherence, the underlying cause remains unclear. On the one hand, the OECD-endorsed AOPs have all passed rigorous scientific evaluations; on the other, it is difficult to dispute the assumption that a valid AOP should be semantically coherent. Besides the aforementioned possibilities, another likely contributing factor is the applicable domain information missing from some of the original events and their logical definitions. Such a lack of specificity with regard to relevant cells, tissues, organs, developmental processes, or life stages could distort the assessment of semantic coherence of KERs and AOPs with narrowly defined biological scopes. This issue may be further exacerbated when the underlying ontology has inadequate coverage of the specific biology under consideration. 
Given that a sizable amount of information in the AOP-Wiki is probably overlooked when annotated events are used alone, a better strategy, besides a more comprehensive phenotype ontology, lies in the joint consideration of events and KERs in AOP evaluations. This requires the future development of logical definitions dedicated to KERs, which is far more complex than annotating individual events.

It should be noted that the total number of significant mappings by query profiles to their targets (Figure 3, Table 2A) is probably inflated. There may be several contributing factors. First, the target profiles comprised orthologous genes and pathways from three vertebrates, as well as human diseases coded by HP or MP terms. Many of them thus have redundant information and lead to duplicated mappings. Second, the significance testing of a query profile to a target is based on resampling from an ontology graph with many logically connected nodes. At times, resampling may be biased because some sampled ontology classes, depending on the query profile size and the number of random profiles generated, could be correlated (Wang et al., 2019). Further research is needed to study how such a bias may impact statistical stringency and if there is a better alternative testing method. Finally, in the case of AOPs, many of them had shared events. Some events were also clearly duplicated. As a result, many more AOPs would become semantically similar to one another.

4.3. Informing AOP biology by network analysis

Semantic AOP networks expand and complement the default AOP networks based on event-sharing. AOP networks emerge by default because multiple AOPs become interconnected when they contain shared events (Figure 5A, 5B; Villeneuve et al., 2014a, b). Similar, but semantics-based, networks could also be constructed among AOPs and their mapped targets based on the similarity scores of the respective profiles (Figure 4A). These two types of AOP networks differ in their organizing principles, scopes, and probably research applications. Instead of being driven by individual shared events, semantic AOP networks account for all events in an AOP phenotypically. Such networks in effect offer a relational and weighted view of AOPs based on their composite phenotypes, each of which was represented by a logical definition. For example, the default AOP68 subnetwork contained the member events of four AOPs, describing how decreased levels of testosterone in fetal Leydig cells as a result of six MIEs (1. increased corticosterone, 2. activation by other glucocorticoid receptor agonists, 3. increased estradiol, 4. activation by other estradiol agonists, 5. alterations in the fetal testis proteome, and 6. hypermethylation in the fetal testis) lead to two AOs: decreased sperm quantity/quality in the adult, decreased fertility; decreased fertility in the adult, decreased sperm quantity and/or quality in the adult testis. Since the two AOs are almost identical, they should probably be merged. Likewise, the events 540, 657, and 660 may be consolidated into a single event (Figure 5B). There are likely cases, however, where event duplications are not as obvious, leading to potential errors in network topology. A great advantage of event-sharing based networks, on the other hand, is that their interconnected member events enable detailed analysis of network topology and the interactions among various types of events across AOPs (Knapen et al., 2018; Villeneuve et al., 2018). 
These interactions are in effect different possible paths between an MIE and AO, leading to the formation of novel candidate AOPs. In contrast, since semantic AOP networks do not consider events individually, their duplications should present less of an issue. Also apparent in semantic AOP networks is their much wider biological context as shown in the example of AOP68 (Figure 4B): additional connected AOPs such as AOP69, AOP70, and AOP238; many other genes, pathways, diseases, and chemicals similarly involved in reproductive biology. Such an expanded network allows not only a more contextual understanding of how a perturbation may propagate through multiple levels of biology and its range of adverse effects, but also specific hypotheses to be formulated for testing which path might be activated under what conditions. As such, the AOP networks based on semantics and event sharing should complement each other well.

4.4. Perspectives on semantic AOPs

Semantic analysis could have potential applications in developing putative AOPs. Historically, the development of putative AOPs has been conducted manually by domain experts. While this effort is largely successful, there is room for improved efficiency and reduced subjectivity. As demonstrated in this study, ontology-based semantic analysis is quite effective for assessing the quality of putative AOPs. It is thus reasonable to assume that this approach could be adopted for constructing additional putative AOPs in the future. Candidate events semantically similar to an MIE or AO of interest could be identified and arranged along a gradient of similarities. There are at least two potential sources for candidate events, including MIEs and AOs. The first is the AOP-Wiki itself. This can be illustrated by using AOP148 as an example. Titled “EGFR Activation Leading to Decreased Lung Function”, it contains ten events and is currently marked as under development (Table 4). This AOP had a very low mean pairwise event similarity and achieved few mappings (Table 2B). In fact, for reasons unclear, the similarities between the AO event 1250 and events 941, 919, 920, 921, and 924 were all zero. If a wider collection of the AOP-Wiki events is considered for their similarities to the AO event 1250, multiple alternative models to the current AOP148 can be proposed for further evaluation by domain experts. For example, one such model may start with the event “oxidative stress” as an MIE instead (Table 4; Hauber et al., 2006) and bring in several events as members with successively greater similarities to the AO. The second source for candidate events could be OBO ontologies, particularly GO, HP, MP, ZP, DO, and MPATH, which cover biological processes/functions, phenotypes of human/mouse/zebrafish, diseases, and pathologies across the levels of biological organization. 
One advantage of constructing an AOP by using OBO ontology terms is that it readily enables semantic analysis of AOPs without any additional annotations. The granularity of candidate events can also be easily controlled due to the inherent logical structure in ontologies. Because of the numerous terms in these ontologies potentially sharing similarities to an MIE or AO, there will be many possible models to choose from for a putative AOP of interest. One such alternative model based on ontology terms is shown as another example for the event 1250 (Table 4). Because of their reliance on the similarities of candidate events to either MIEs or AOs, semantic AOP models, as seen here in the two proposed alternatives, may appear to be somewhat deviated biologically from the original constructs. Also, notably absent in the proposed construct for AOP148 are GO terms representing molecular processes. It turned out that hundreds of them mapped to the event 1250 all had low similarities (around 0.1). In fact, of several dozen ontology terms examined representing lung development, morphology, disease, and function, many of them had maximum similarities to GO processes less than 0.2, with a few exceptions. The DOID_3082 (interstitial lung disease), DOID_850 (lung disease), UBERON_0002048 (lung), and UBERON_0019190 (mucous gland of lung) were similar to GO_0008150 (biological process) at 0.28. The GO_0030324 (lung development) and GO_0060425 (lung morphogenesis) were similar to GO_0048513 (animal organ development) and GO_0009887 (animal organ morphogenesis) at 0.88 and 0.92 respectively. The lack of mappings to more specific GO molecular processes at higher similarities by both the event 1250 and other lung-related terms here is not attributable to the limited representation (9450/42938 = 22%) of all GO processes in the cross-species phenotype ontology powering this study because the processes excluded from the latter are not connected to the phenotypes.
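The selection of candidate events along a gradient of similarity to an AO can be sketched as a simple filter and sort. The similarity values below are taken from Table 4. The function name `propose_chain`, the `KE`-prefixed identifiers, and the cutoff of 0.2 are hypothetical choices made for illustration, and any chain produced this way would still need evaluation by domain experts.

```python
def propose_chain(candidates, min_similarity=0.2):
    """Order candidate events by increasing similarity to the AO,
    dropping those below a minimum similarity (hypothetical cutoff)."""
    kept = [(event, score) for event, score in candidates.items()
            if score >= min_similarity]
    return sorted(kept, key=lambda item: item[1])


# Similarities to the AO event 1250 (Decrease, lung function) from Table 4.
similarity_to_ao = {
    "KE941": 0.0,         # Activation, EGFR (current MIE)
    "HP:0002088": 0.830,  # Abnormal lung morphology
    "KE962": 0.571,       # Increase, mucin production
    "MP:0003674": 0.244,  # Oxidative stress
    "MP:0001861": 0.715,  # Lung inflammation
}

chain = propose_chain(similarity_to_ao)
for event, score in chain:
    print(f"{event}\t{score:.3f}")
```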

Table 4.

Current AOP148 (under development) and its two proposed alternative constructs.

| Event ID | Similarity to the AO | Event definition |
|---|---|---|
| Current model | | |
| 941 (MIE) | 0 | Activation, EGFR |
| 919 | 0 | Occurrence, trans-differentiation of ciliated epithelial cells |
| 920 | 0 | Occurrence, metaplasia of goblet cells |
| 921 | 0 | Occurrence, hyperplasia of goblet cells |
| 923 | 0.244 | Increase, proliferation of goblet cells |
| 924 | 0 | Activation, Sp1 |
| 914 | 0.244 | Decrease, apoptosis of ciliated epithelial cells |
| 962 | 0.571 | Increase, mucin production |
| 1251 | 0.244 | Chronic, mucus hypersecretion |
| 1250 (AO) | --- | Decrease, lung function |
| Alternative 1: from the AOP-Wiki events | | |
| 1088 | 0.244 | Increased, oxidative stress |
| 1438 | 0.244 | Increased production of pulmonary, pro-inflammatory cytokines |
| 923 | 0.244 | Increase, proliferation of goblet cells |
| 149 | 0.270 | Increase, inflammation |
| 914 | 0.244 | Decrease, apoptosis of ciliated epithelial cells |
| 962 | 0.571 | Increase, mucin production |
| 445 | 0.597 | Increased, respiratory distress/arrest |
| 735 | 0.709 | Increase, hyperplasia (terminal bronchiolar cells) |
| 1250 (AO) | --- | Decrease, lung function |
| Alternative 2: from OBO ontologies | | |
| MP:0003674 | 0.244 | Oxidative stress |
| MP:0014030 | 0.246 | Abnormal mucous gland physiology |
| MP:0008713 | 0.258 | Abnormal cytokine level |
| HP:0002781 | 0.586 | Upper airway obstruction |
| MP:0010861 | 0.632 | Increased respiratory mucosa goblet cell number |
| MP:0011141 | 0.709 | Increased lung endothelial cell apoptosis |
| MP:0001861 | 0.715 | Lung inflammation |
| HP:0006703 | 0.717 | Aplasia/hypoplasia of the lungs |
| HP:0002088 | 0.830 | Abnormal lung morphology |
| 1250 (AO) | --- | Decrease, lung function |

5. Conclusion

In summary, by annotating toxicity responses described in unstructured text into computable logical definitions and taking advantage of vast and growing public phenomics resources including many domain ontologies, this study demonstrated that a significant number of putative AOPs currently under development were semantically coherent and well aligned with the genes, biological pathways, and diseases underlying the intended chain of events leading to the respective AOs. Besides its utility in objectively evaluating the quality of putative AOPs, this semantic analysis approach could potentially aid their future development as well by selecting from both the existing AOP events and the wider collection of various ontology terms as candidate events based on their semantic similarities to an MIE or an AO of interest. The bottleneck in this approach currently lies in the annotation of phenotypic responses into logical definitions, which is mostly a manual effort (Mungall et al., 2011; Köhler et al., 2011). While the discovery of appropriate ontology terms from unstructured text is relatively straightforward with tools currently available, the challenge lies in the subsequent step of binding these terms into biologically sound logical definitions using appropriate object properties such as those from the Relations Ontology to properly reflect the spatial, temporal, anatomical, mechanistic, and other significant relationships of the denoted entities. With the active research ongoing and advances made in ontology learning (Wong et al., 2012; Asim et al., 2018; Alobaidi et al., 2018), however, constructing logical definitions from unstructured text should become more automated in the future.

Acknowledgements

The author thanks many members of the toxicology community for their contributions to the development of AOPs thus making this study possible. Special thanks are due to Cataia Ives, Jennifer Olker, and Kellie Fay for their critical reviews of this manuscript. The author also deeply appreciates the valuable feedback from two anonymous reviewers during the peer review. The information in this document is funded wholly (or in part) by the U.S. Environmental Protection Agency. It has been subjected to review by the EPA Center for Computational Toxicology and Exposure and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.

Abbreviations

AO: adverse outcome
AOP: adverse outcome pathway
Bfo: Basic Formal Ontology
Chebi: Chemical Entities of Biological Interest ontology
Cl: Cell Ontology
CSPP: chemical species phenotypic profile
Doid: Disease Ontology
DRE: Danio rerio
DSODP: Dead Simple OWL Design Patterns
Ecocore: an Ontology of Core Ecological Entities
EQ: entity-quality
Geno: Genotype Ontology
Go: Gene Ontology
Hp: Human Phenotype Ontology
HSA: Homo sapiens
KE: key event
KER: key event relationship
MIE: molecular initiating event
MMU: Mus musculus
Mp: Mammalian Phenotype Ontology
Mpath: Mouse Pathology Ontology
NAM: new approach methodology
Nbo: Neuro Behavior Ontology
Ncit: National Cancer Institute Thesaurus OBO Edition
NRC: National Research Council
Obi: Ontology for Biomedical Investigations
OBO: Open Biological and Biomedical Ontology
OECD: Organisation for Economic Co-operation and Development
OMIM: Online Mendelian Inheritance in Man
OS-Mapping: ontology-based semantic mapping
OWL: web ontology language
Pato: Phenotype and Trait Ontology
Pco: Population and Community Ontology
Pr: Protein Ontology
SAAOP: Society for the Advancement of Adverse Outcome Pathways
Uberon: Uberon Multi-Species Anatomy Ontology
Uo: Units of Measurement Ontology
Vt: Vertebrate Trait Ontology
Zfa: Zebrafish Anatomy and Development Ontology
Zp: Zebrafish Phenotype Ontology

Footnotes

Declaration of interests

The author declares no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

