PLOS Computational Biology. 2022 Jun 10;18(6):e1010148. doi: 10.1371/journal.pcbi.1010148

Deriving time-concordant event cascades from gene expression data: A case study for Drug-Induced Liver Injury (DILI)

Anika Liu 1,2,3,*, Namshik Han 1,4, Jordi Munoz-Muriedas 2,5, Andreas Bender 3,*
Editor: James Gallo 6
PMCID: PMC9292124  PMID: 35687583

Abstract

Adverse event pathogenesis is often a complex process which comprises multiple events ranging from the molecular to the phenotypic level. In toxicology, Adverse Outcome Pathways (AOPs) aim to formalize this as temporal sequences of events, in which event relationships should be supported by causal evidence according to the tailored Bradford-Hill criteria. One of the criteria is whether events are consistently observed in a certain temporal order and, in this work, we study this time concordance using the concept of “first activation” as a data-driven means to generate hypotheses on potentially causal mechanisms. As a case study, we analysed liver data from repeat-dose studies in rats from the TG-GATEs database, which comprises measurements across eight timepoints, ranging from 3 hours to 4 weeks post-treatment. We identified time-concordant gene expression-derived events preceding adverse histopathology, which serves as a surrogate readout for Drug-Induced Liver Injury (DILI). We find known mechanisms in DILI to be time-concordant, and show further that significance, frequency and log fold change (logFC) of differential expression are metrics which can additionally prioritize events, although they are not necessary for an event to be mechanistically relevant. Moreover, we used the temporal order of transcription factor (TF) expression and regulon activity to identify transcriptionally regulated TFs and subsequently combined this with prior knowledge on functional interactions to derive detailed gene-regulatory mechanisms, such as reduced Hnf4a activity leading to decreased expression and activity of Cebpa. At the same time, potentially novel events are also identified, such as Sox13, which is highly significantly time-concordant and shows sustained activation over time.
Overall, we demonstrate how time-resolved transcriptomics can derive and support mechanistic hypotheses by quantifying time concordance and how this can be combined with prior causal knowledge, with the aim of both understanding mechanisms of toxicity, as well as potential applications to the AOP framework. We make our results available in the form of a Shiny app (https://anikaliu.shinyapps.io/dili_cascades), which allows users to query events of interest in more detail.

Author summary

Understanding mechanisms from systems-scale biological data is of great relevance in toxicology as well as drug discovery; however, how to generate causal hypotheses instead of correlations is by no means clear. In this work, we study the conserved temporal order of events and present an automatable framework to quantify and characterize time concordance across a large set of time-series. We apply this concept to events derived from time-resolved gene expression and histopathology from the TG-GATEs in vivo liver data as a case study. We were able to recover known events involved in the pathogenesis of Drug-Induced Liver Injury (DILI), and identify potentially novel pathways and transcription factors (TFs) which precede adverse histopathology. As complementary sources of evidence for causality, we additionally show how time concordance and prior knowledge on plausible interactions between TFs can be combined to derive causal hypotheses on the TFs’ mode of regulation and interaction partners. Overall, the results derived in our case study can serve as valuable hypothesis-free starting points for the development of Adverse Outcome Pathways for DILI, and demonstrate that our approach provides a novel angle to prioritize mechanistically relevant events.

Introduction

Adverse drug reactions are a major reason for compound failure in clinical trials [1,2] and a significant cause of post-marketing withdrawals. To avoid exposing patients to these risks, it is desirable to identify adverse events earlier, both in the individual patient and in the drug development process. Mechanistic understanding of adverse event pathogenesis is crucial in this regard, e.g. to derive early safety biomarkers or in vitro assays. However, current understanding of toxicity is largely incomplete, in particular for complex phenotypes such as organ injury, which can usually be caused by a wide range of compounds perturbing the biological system at different points, mediated through multiple biological scales and entities [3,4].

Multiple interrelated concepts have been introduced to formalize mechanistic knowledge in the context of toxicity, including Adverse Outcome Pathways (AOPs) [4,5]. These begin with a molecular initiating event (MIE) which describes the first interaction of the compound with the system, e.g. a target protein, which is then linked to the adverse event (AE) through a causal cascade of key events (KEs) on different biological levels, like activation of cellular pathways or changes in the tissue or organ. The Organization for Economic Co-operation and Development [6] published three criteria to evaluate causality between events within AOPs based on the original Bradford Hill considerations established in the context of epidemiological studies [7] and previous work on the related Mode Of Action (MOA) concept [8]: i) biological plausibility, ii) essentiality of key events and iii) empirical support for key event relationships. Empirical support for a causal relation between events E_cause, which could be a MIE or KE, and E_consequence, which could be an AE, is further separated into time concordance (E_cause happens before E_consequence), dose concordance (E_cause happens at a lower dose than E_consequence) and incidence concordance (E_cause affects a larger population and is hence more frequent than E_consequence).

Computational approaches can support these predominantly expert- and knowledge-driven mechanistic efforts by prioritizing mechanistically relevant events or by providing additional insight on the relation between an event and a given phenotype. For instance, computationally predicted AOPs (cpAOPs) prioritize plausible events and event relationships as starting points for expert-driven AOP development by integrating functional and statistical associations between biological entities on different levels [9–11]. In contrast, probabilistic quantitative AOPs (qAOPs) provide additional insight on the predictivity of key event relationships (KERs) by aiming to predict the adverse event from in vitro assays, using the expert-curated AOP as a scaffold [12,13]. To support increasing implementation of in vitro methods, however, it needs to be better understood which readouts describing molecular or cellular effects are indicative of systems-level adverse effects in the first place.

Biological readouts such as transcriptomics are particularly suited to study such intermediate key events as they provide broad insights into cellular changes, e.g. in contrast to target profiling, which can then lead to the identification of predictive signatures and mechanistically relevant insights. This is for example true in the context of DILI [14–18], which is a major cause of attrition in drug development and accounts for around half of the cases of acute liver failure in the US and European countries [19,20]. In this regard, time-series data is of particular interest as it is able to trace the dynamic effects throughout pathogenesis. Previous studies focussed on the time (and dose) dependence of gene expression-derived events in the context of adverse findings [21,22], i.e. the changes of individual events across changes in time (and dose), and also aimed to predict later adverse findings from fixed early timepoints [14,15]. From a mechanistic perspective, however, neither activation at a certain timepoint nor a certain progression over time is mandatory, but only time concordance, i.e. activation of the key event before the downstream adverse effects.

In this study, we hence quantify time concordance across gene expression-derived cellular events and adverse events based on histopathology across a wide range of compounds. To do so, we introduce the concept of “first activation” for mechanistic analysis, which focusses only on the earliest timepoint at which an event can be reliably detected and then orders events within a time-series by their timepoint of first activation (Fig 1A). In contrast to previous time concordance analyses in AOPs which addressed a defined set of KERs and known KEs [23–25], this analysis derives statistical evidence for temporal concordance across time-series and can do so for any combination of events based on gene expression or histopathology. Although the confidence of these temporal orders per time series is limited by the noisiness of gene expression data and the low time resolution, statistical significance can be evaluated across time series, and we only consider relations time-concordant if the preceding event is significantly more frequently found before the later event than in time series without it (Fig 1B). Furthermore, this also allows us to separate out events which depict a general perturbation response but are unspecific, as well as rare events, which are predictive but only observed for a small subset of compounds.
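The first-activation idea can be illustrated with a minimal Python sketch. It assumes that each event has already been reduced to one boolean "active" flag per ordered timepoint; the activation criteria themselves are described in Methods and are not reproduced here. All function names and toy values are hypothetical and only serve the illustration.

```python
from typing import Dict, List, Optional


def first_activation(active: List[bool]) -> Optional[int]:
    """Return the index of the first timepoint at which the event is active."""
    for index, is_active in enumerate(active):
        if is_active:
            return index
    return None  # event never activated in this time-series


def order_by_first_activation(series: Dict[str, List[bool]]) -> List[str]:
    """Order activated events by first activation; ties keep input order."""
    firsts = {name: first_activation(flags) for name, flags in series.items()}
    activated = [name for name, idx in firsts.items() if idx is not None]
    return sorted(activated, key=lambda name: firsts[name])


def is_time_concordant(preceding: List[bool], later: List[bool]) -> bool:
    """True if the preceding event first activates before or at the later event."""
    first_pe = first_activation(preceding)
    first_le = first_activation(later)
    return first_pe is not None and first_le is not None and first_pe <= first_le


# Toy time-series with four ordered timepoints (values are made up).
toy = {
    "event_A": [False, True, True, True],
    "event_B": [False, False, False, True],
    "event_C": [False, False, False, False],
    "later_event": [False, False, True, True],
}
print(order_by_first_activation(toy))
print(is_time_concordant(toy["event_A"], toy["later_event"]))
```

Only the earliest active timepoint is used, so an event that is later deactivated is treated the same as one with sustained activation.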

Fig 1. Quantifying time concordance based on first activation.


(A) The event activation of the events A-D and the later event is shown over time, as well as their timepoint of first activation, at which the event first passes the defined activation criteria. If an event takes place before a defined later event, which in our study is adverse histopathology, it is time-concordant. Time concordance indicates that there is potentially a causal relation between both events, and is distinct from time-dependence which is defined based on the correlation to the later event or time. (B) Based on the frequency of an event before or at the same time as the later event and its frequency in background time-series without the later event, a confusion matrix and different time concordance metrics can be derived. (C) Time concordance can both prioritize novel links and provide further evidence on potential mechanistic links between events. Events are indicated as nodes and mechanistic links between them as edges.

We demonstrate the utility of this concept in this work using liver gene expression and histopathology data from repeat-dose studies in rats provided by the TG-GATEs database (S1 Fig). This allows us to take advantage of previous data curation and work on the dataset itself, in particular by Sutherland et al. [15] who provide an adverse classification of each compound-dose combination and toxscores summarising histopathological findings in each condition. Furthermore, Drug-Induced Liver Injury (DILI) is well understood in comparison to other organ-level toxicities and we hence know which processes are expected to precede injury, including cell death, inflammation and other adaptive stress responses [19]. The concept, however, is generally applicable beyond the toxicity area and transcriptomics data and can be used to derive mechanistic event cascades from time-series data of any kind as long as the first activation of events within the time-series can be defined.

We first describe the time concordance for known processes, similar to mechanistic qAOPs, and then prioritize predictive, time-concordant KE providing a strong data-driven, automatable starting point for AOP development, aligning with the objective of cpAOPs (more detailed comparison in S1 Table). We then combine data-driven time concordance and prior knowledge on event relations between transcription factors (TFs) and gene expression to generate hypotheses for causal gene-regulatory mechanisms in DILI pathogenesis and to generally show how time concordance can stratify and support other streams of causal evidence. Overall, we show that time-resolved gene expression and histopathology data can be used to quantify time concordance across a large set of compounds and events, which allows us to characterize known mechanistic links and to prioritize new ones (Fig 1C).

Results and Discussion

To derive the time concordance between cellular events and later adverse histopathology, we use the workflow outlined in Fig 2. Each step is introduced in the subsequent sections, and details on the respective implementation are described in Methods. We first derived TF and pathway activity across expression profiles from the same experiment and subsequently defined the first up- or down-regulation of TFs or pathways as events. Furthermore, we obtained binary histopathology labels describing the occurrence of each histopathological finding at different levels of severity and frequency from the toxscores provided by Sutherland et al. [15]. Subsequently, we derive the earliest timepoint of each event, e.g. pathways or adverse histopathology, within each time-series. As the last step of the time concordance analysis, we then evaluate which gene expression-derived events are significantly enriched before or at the time where adverse histopathology is found, and compute the additional time concordance metrics outlined in Table 1.

Fig 2. Workflow to quantify time concordance between preceding gene expression-derived events and later adverse histopathology.


First, events are derived from the gene expression and histopathology data. Pathway and TF activities are inferred based on the expression of the respective gene sets using GSVA [26], and binary histopathology labels are derived from the continuous toxscores. Secondly, we derive the first activation of expression-based events as well as of adverse histopathology. Lastly, we quantify the time concordance between potential preceding events (PE), which are derived from gene expression, and adverse histopathology as the potential later event (LE).

Table 1. Metrics quantifying the time concordance between a potential preceding event PE and potential later event LE, and their relation to the original Bradford Hill (BH) considerations.

| BH consideration | Metric | Formula | Description |
| --- | --- | --- | --- |
| Consistency | True positive rate (TPR) | $p(\mathrm{PE} \rightarrow \mathrm{LE} \mid \mathrm{LE})$ | Fraction of time series with event PE with specified temporal relation among time series with event LE |
| Specificity | Positive predictive value (PPV) | $p(\mathrm{PE} \rightarrow \mathrm{LE} \mid \mathrm{PE})$ | Fraction of time series with event LE with specified temporal relation among time series with event PE |
| Temporality | p-value | One-sided Fisher’s Exact test | Likelihood of observing event PE and LE with specified temporal relation with equal or higher frequency by chance, assuming a hypergeometric distribution |
| Strength | Effect size in time series with LE | Median (logFC) | Median logFC of PE observed in time-series with LE (in comparison to vehicle control) |
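As a minimal sketch of how the frequency-based metrics in Table 1 could be computed, the snippet below takes the four cells of a confusion matrix as in Fig 1B and returns TPR, PPV and a one-sided Fisher's exact p-value. The cell names `a` to `d`, the function name and the toy counts are illustrative assumptions; the p-value is computed directly from the hypergeometric distribution with the standard library rather than with the tooling used in the study.

```python
from math import comb
from typing import Dict


def time_concordance_metrics(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Compute TPR, PPV and a one-sided Fisher's exact p-value.

    a: time-series with LE in which PE occurs before or at LE
    b: time-series with LE in which PE does not occur before or at LE
    c: background time-series (no LE) in which PE occurs
    d: background time-series (no LE) in which PE does not occur
    """
    n_le = a + b  # time-series with the later event
    n_pe = a + c  # time-series counted as showing the preceding event
    n_total = a + b + c + d
    tpr = a / n_le if n_le else float("nan")
    ppv = a / n_pe if n_pe else float("nan")
    # Hypergeometric upper tail: probability of finding at least `a`
    # LE time-series among the n_pe time-series that show PE.
    upper = min(n_le, n_pe)
    tail = sum(
        comb(n_le, k) * comb(n_total - n_le, n_pe - k)
        for k in range(a, upper + 1)
    )
    p_value = tail / comb(n_total, n_pe)
    return {"tpr": tpr, "ppv": ppv, "p_value": p_value}


# Toy counts (made up for illustration).
result = time_concordance_metrics(a=3, b=1, c=1, d=3)
print(result)
```

An event with a high PPV but a low TPR would correspond to a specific but rare event, whereas the opposite pattern would correspond to a frequent but unspecific one.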

Adverse histopathological findings and their temporal relation

To define the earliest timepoint of adverse histopathology within each time-series, we used the annotations of time series as adverse or non-adverse by Sutherland et al. [15], as well as the toxscores, which summarise the severity and frequency of each histological finding for each compound-dose-time combination as a mean severity score ranging from 0 (normal) to 4 (severe). These toxscores were used to define three levels for each histological finding: “null” (toxscore > 0), “low” (toxscore > 0.67) and “high” (toxscore > 1.34). For example, for a toxscore of 1, both “null” and “low” are considered to be present. We then evaluated which histology groups were frequently found in the adverse compound-dose combinations (observed in >10% of adverse time series, corresponding to at least 5 out of 40 cases) with at least 50% of findings being in adverse time series (Fig 3A). All of the included histology groups are significantly enriched in adverse conditions; however, these criteria were implemented to identify findings with a certain specificity and frequency instead of allowing a trade-off between both. The histology groups which passed the filtering are regarded as adverse histopathological findings and include hepatocellular single cell necrosis and biliary hyperplasia at all toxscore thresholds. In contrast, only some of the three toxscore thresholds were selected with the above criteria for all other findings, e.g. the two higher toxscore cut-offs for hepatocellular necrosis and inflammation and only the “high” cut-off for increased hepatocellular mitosis. In all of these cases, the lower toxscore levels were also frequently observed in non-adverse conditions and hence considered too unspecific. Conversely, only the two milder levels of fibrosis were included in the selection, as severe fibrosis was observed rarely.
While we focus on the described definition of adverse histopathological findings in this study, the difficulty in summarising a complex phenotype such as DILI into a binary classification, adverse or not adverse, is well established [27,28] and is also demonstrated by the discrepancies between DILI classifications from DILIst [29], DILIrank [30] and those derived by Sutherland et al. [15] based on the TG-GATEs data (S2 Table). We are aware that broader or more targeted phenotypes might also be of interest, and we hence provide a Shiny app where results for alternative definitions of adverse and non-adverse histopathology groups can be explored.
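The toxscore thresholds and the two selection criteria described in this section can be sketched as follows. The cut-offs (0, 0.67, 1.34), the >10% frequency criterion and the 50% specificity criterion come from the text; the function names, the default of 40 adverse time series as a parameter, and the integer-based comparisons are illustrative choices, not the authors' implementation.

```python
from typing import List

# Toxscore cut-offs per level, as stated in the text.
LEVEL_CUTOFFS = [("null", 0.0), ("low", 0.67), ("high", 1.34)]


def toxscore_levels(toxscore: float) -> List[str]:
    """Return all levels considered present for a given toxscore."""
    return [level for level, cutoff in LEVEL_CUTOFFS if toxscore > cutoff]


def passes_adverse_filter(
    n_in_adverse: int, n_occurrences: int, n_adverse_series: int = 40
) -> bool:
    """Check whether a histopathology label counts as an adverse finding.

    n_in_adverse: occurrences of the label in adverse time series
    n_occurrences: occurrences of the label in all time series
    n_adverse_series: number of adverse time series (40 in the text)
    """
    # Observed in >10% of adverse time series (integer arithmetic avoids
    # floating-point edge cases at exactly 10%).
    frequent = n_in_adverse * 10 > n_adverse_series
    # At least 50% of all occurrences fall into adverse time series.
    specific = n_in_adverse * 2 >= n_occurrences
    return frequent and specific


print(toxscore_levels(1.0))
print(passes_adverse_filter(n_in_adverse=5, n_occurrences=8))
```

Requiring both criteria at once, rather than a combined score, mirrors the stated intent of not allowing a trade-off between specificity and frequency.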

Fig 3. Distribution and relation of histopathological findings across time series.


A) Histopathology labels are defined for each histopathological finding at 3 different toxscore cut-offs, namely “null” (toxscore > 0), “low” (toxscore > 0.67) and “high” (toxscore > 1.34). For each label, the number of occurrences in the 40 adverse time series and the fraction of adverse time series among all occurrences of the given histopathology label are shown. Histopathological findings for which at least 50% and at least 5 of the occurrences were found in adverse time series were considered adverse. B) Number of conditions with histopathological findings at different timepoints, as well as the frequency of the respective first activations. C) Time of first activation across time series labelled as adverse or non-adverse. Each time series is annotated with the dose level in repeat-dose studies, as well as with whether or not the time series was considered adverse by Sutherland et al. [15].

For the adverse histopathology labels, the distribution of toxscores and first activation over time (Fig 3B) shows that some findings are predominantly found late, like fibrosis, while others are predominantly found early, e.g. hepatocellular single cell necrosis. Next, out of all 360 time-series with at least 6 measured timepoints, the 61 time-series in which any of the adverse histopathology labels is found were identified, which covered 38 compounds (S2 Table). In those, the earliest evidence of an adverse phenotype is used to approximate the timepoint of the primary adverse phenotype. Across all timeseries with adverse histopathology, we find that hepatocellular single cell necrosis is most frequently the primary adverse phenotype, while biliary hyperplasia at any severity is in most cases a secondary effect (Figs 3C and S3).
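To make the notion of a primary adverse phenotype concrete, the sketch below picks the adverse histopathology label(s) with the earliest first activation in one time-series and treats them as primary. The label strings, the index-based timepoints and the function name are hypothetical; ties at the earliest timepoint are all reported, which is an assumption of this sketch rather than a rule stated in the text.

```python
from typing import Dict, List, Optional, Tuple


def primary_adverse_phenotypes(
    first_seen: Dict[str, Optional[int]]
) -> Tuple[Optional[int], List[str]]:
    """Return the earliest timepoint index and the label(s) first seen there.

    first_seen maps each adverse histopathology label to the index of its
    first activation in one time-series, or None if it was never observed.
    """
    observed = {label: t for label, t in first_seen.items() if t is not None}
    if not observed:
        return None, []  # no adverse histopathology in this time-series
    earliest = min(observed.values())
    primary = sorted(label for label, t in observed.items() if t == earliest)
    return earliest, primary


# Toy time-series (labels and timepoint indices are made up).
toy_series = {
    "single cell necrosis (low)": 2,
    "biliary hyperplasia (null)": 5,
    "fibrosis (low)": None,
}
print(primary_adverse_phenotypes(toy_series))
```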

Known pathways in DILI preceding adverse histopathology

To identify cellular mechanisms in the early pathogenesis of DILI, we next studied time-concordant cellular changes preceding later adverse histopathology (see Methods). This identified 911 pathway-level events (37.3%) and 108 TF-level events (33.6%) with significant enrichment (p-value < 0.05) before or at adverse histopathology. We next evaluated time concordance for a set of ten known events in DILI (Fig 4 and S3 Table). Recycling of bile acids and salts was the most significantly enriched gene set overall and hence also among the ones linked to known events. Down-regulation of the other bile acid gene sets was also significantly enriched (p-value < 0.05), pointing to an overall down-regulation of bile acid metabolism. Similarly, cell death was only found to be up-regulated, whereas dysregulation in both directions was found to precede injury for all other key events (Fig 4). However, only for peroxisomal processes, namely peroxisomal protein import and beta-oxidation of very long fatty acids, were both directions significantly enriched, indicating that dysregulation in either direction might be linked to injury. Overall, significantly enriched gene sets are found for all ten represented known events in DILI (p-value < 0.05), indicating that our analysis is able to recover known cellular events.

Fig 4. Enrichment of known events in DILI before adverse histopathology based on gene sets as well as individual gene members.


The enrichment of first activation before or at adverse histopathology is shown for gene sets mapping to known key events in DILI, for which first activation was defined as the first timepoint of differential GSVA-derived gene set activity. Furthermore, enrichment of individual genes within these gene sets is also shown and was derived based on the first timepoints of differential expression. Aligning with the expected direction, a significant down-regulation of Liver X Receptor (LXR) signalling and bile acid-related pathways is observed, while all other gene sets were found to be more significantly up-regulated. Only for peroxisomal pathways were both directions significantly enriched, indicating that dysregulation in either direction might be linked to adverse histopathology.

To gain insights on a more fine-grained level, we next analysed the enrichment of significantly and strongly (absolute log fold change > 1) dysregulated individual genes from the above gene sets (S2 File). Among the ten most significantly enriched gene-level events, three are involved in known processes, namely the up-regulation of Acyl-CoA thioesterase 2 (Acot2), Acyl-CoA thioesterase 3 (Acot3) and Carnitine O-Acetyltransferase (Crat), which are involved in fatty acid beta oxidation [31,32]. The other genes among the ten most significantly enriched gene-level events are also involved in mitochondrial and peroxisomal processes, except for Growth Arrest And DNA Damage-Inducible Protein (Gadd45a), which has a known role in hepatic fibrosis [33], Neutral Cholesterol Ester Hydrolase 1 (Nceh1), which is involved in cholesterol metabolism in macrophages [34], Ras-Related Protein Rab-30 (Rab30), which is elevated in early liver regeneration [35], and the Serine/Threonine Protein Kinase NIM1 (Nim1k).

For JNK signalling, we did not find any significantly enriched genes, indicating that while the overall process is changing, none of the individual genes shows strong and frequent expression changes. The opposite was found for oxidative stress, with the Jun Proto-Oncogene (Jun) being one of the most significantly enriched gene-level events but the gene set itself lacking significant changes. This shows that gene- and gene set-level analyses can provide complementary insights into cellular changes preceding DILI, and that in some cases effects can be attributed to individual genes, which might give more detailed information about the cellular changes.

While significant enrichment before or at adverse histopathology can be regarded as a necessary criterion for time concordance, the temporal event relationship can be further characterised based on the observed behaviour across experimental conditions, which may be useful to further prioritize mechanistically relevant pathways in a hypothesis-free manner. Following the Bradford-Hill considerations, we hypothesize that this might be the case for observed effect size, frequency and specificity of event occurrence before adverse histopathology. Firstly, we investigated how strongly pathways were dysregulated by comparing the maximal absolute log fold changes (|logFCs|) before or at adverse histopathology in each adverse time-series for significantly time-concordant events (Fig 5). High median maximal |logFCs| were overall found for mitochondrial and peroxisomal pathways, and the highest median maximal |logFC| among all significant events was found for mitochondrial fatty acid oxidation of unsaturated fatty acids. At the same time, however, we observed a high variance for pathways with high median maximal |logFC|, as well as only moderately high |logFC|s for other known pathways in DILI, such as programmed cell death. This indicates that a high magnitude of |logFC| is not necessary to contribute to an adverse event, but at the same time can be a useful property to further prioritize important pathways.
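The effect-size summary described above can be sketched in a few lines of Python: for each adverse time-series, take the maximal |logFC| of an event up to and including the timepoint of first adverse histopathology, then take the median across time-series. The input layout (a list of logFC values per timepoint plus the index of first adverse histopathology), the function names and the toy values are assumptions made for illustration only.

```python
from statistics import median
from typing import List, Sequence, Tuple


def max_abs_logfc_before(logfc: Sequence[float], first_adverse_index: int) -> float:
    """Maximal |logFC| before or at the first adverse histopathology."""
    window = logfc[: first_adverse_index + 1]
    return max(abs(value) for value in window)


def median_max_abs_logfc(series: List[Tuple[Sequence[float], int]]) -> float:
    """Median across adverse time-series of the per-series maximal |logFC|."""
    return median(max_abs_logfc_before(logfc, idx) for logfc, idx in series)


# Toy data: (logFC per timepoint, index of first adverse histopathology).
toy_adverse_series = [
    ([0.2, -1.5, 0.4, 2.0], 2),  # the value 2.0 occurs after the adverse finding
    ([0.1, 0.3, -0.2], 1),
    ([-0.8, 0.5], 0),
]
print(median_max_abs_logfc(toy_adverse_series))
```

Using the median makes the summary robust to single time-series with extreme values, although the spread across time-series would still need to be inspected separately, as the high variance noted above suggests.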

Fig 5. Observed max. |logFC| before adverse histopathology.

For known processes in DILI which correspond to significantly enriched events before adverse histopathology, the max |logFC| before adverse histopathology is shown. In comparison to other known pathways and the overall background distribution, a high logFC is found for mitochondrial beta oxidation followed by peroxisomal beta oxidation and mitophagy.

We next analysed to what extent dysregulation in a pathway is predictive for later adverse histopathology. To this end, we calculated across how many adverse time-series each pathway is observed, summarised by the true positive rate (TPR), and the positive predictive value (PPV) indicating whether presence of the key event is a confident indicator for the later adverse event (Fig 6). We focus on significantly enriched events only (p-value < 0.05) and find a trade-off with respect to the highest TPR and PPV (Fig 6; for distribution of all events see S4 Fig). This generally shows that either highly frequent events with lower specificity can be identified, e.g. increased mitophagy (TPR: 0.41, PPV: 0.72), or more specific events at the expense of lower relative frequency, e.g. bile acid recycling (TPR: 0.30, PPV: 1).

Fig 6. True positive rate (TPR) and positive predictive value (PPV) before or at histopathology of genes and gene sets in known key events in DILI.


Events related to the given known key event are shown in red or blue indicating an up- or downregulation, respectively. Genes with a p-value < 0.0001 involved in known key events in DILI are additionally labelled. The background distribution of all significantly enriched genes or gene sets is shown in grey (p-value < 0.05).

Surprisingly, lower relative frequencies are particularly observed for stress response and signaling pathways, with only Liver X Receptor (LXR)-dependent gene expression linked to lipogenesis reaching a TPR over 20%. One explanation is that these pathways are predominantly and initially mediated through post-transcriptional alterations instead of gene expression changes [36,37], making the expression of pathway members a weak proxy for pathway activation in early pathogenesis. In fact, one reason LXR-dependent changes achieved higher frequencies might be that, unlike the other signalling and stress response pathways, they explicitly include the downstream regulated genes [38].

Due to the previously discussed complementarity of gene- and gene set-level analysis, we also show TPR and PPV for individual genes with a focus on those which are involved in gene sets mapping to known key events. The most significant genes, already highlighted in Fig 4, reveal a high frequency for the up-regulation of the Acyl-CoA thioesterases Acot2, Acot3 and Acot4, as well as for Enoyl-CoA Hydratase 1 (Ech1), which aligns with the relatively high frequency of pathway-level events linked to mitochondrial and peroxisomal processes. Furthermore, the most frequent gene-level events with a PPV = 1 are the up-regulation of Activating Transcription Factor 3 (Atf3), which was found to promote hepatic fibrosis [39], and Enoyl-CoA Delta Isomerase 1 (Eci1).

Known TFs in DILI preceding adverse histopathology

To gain insight into signalling and expression regulation preceding adverse histopathology, we next analysed transcription factors (TFs), which are involved in the early perturbation response preceding downstream gene expression changes and are also likely to show a strong signal in transcriptomics data given their direct link to gene expression. As known TFs in DILI, we included TFs mediating the stress response and signalling pathways already introduced above, as well as nuclear receptors, which play important roles in liver physiology and malfunction and can be either MIEs or KEs (mapping shown in Fig 7A). Consistent with the pathway-level results, an enriched up-regulation was found for Nuclear factor erythroid 2-related factor 2 (Nfe2l2), which is a key mediator of oxidative stress [18,40], as well as for the Nf-κB subunits Rela and Nfkb1, indicating inflammation [41], while the Oxysterol Receptors LXRα (Nr1h3) and LXRβ (Nr1h2), which control lipid metabolism, showed enriched down-regulation [42].

Fig 7. Temporal concordance of nuclear receptors and adaptive response transcription factors (TFs) in DILI.


For known TFs in DILI, the following time concordance metrics are shown: A) the enrichment significance before or at first adverse histopathology, B) Positive Predictive Value (PPV) and True Positive Rate (TPR), C) max. mean |logFC| before or at first adverse histopathology. The statistics for all inferred TFs are shown in grey as background distribution.

For ER stress, we included three TFs mapping to the three branches of the unfolded protein response [43]: Activating transcription factor 4 (Atf4), Activating transcription factor 6 (Atf6) and X-box binding protein 1 (Xbp1). Atf4 up-regulation was found to be most significantly enriched, most frequent and also showing the largest logFC (Fig 7). This highlights its overall importance in mediating ER stress and is consistent with the known role of ATF4 in DILI [44]. While Atf4 is a member of the pro-apoptotic unfolded protein response branch, the Atf6- and Xbp1-mediated branches tend to be cytoprotective [45]. In agreement with this, Atf6 was not significantly enriched; Xbp1, however, showed rare but significantly enriched down-regulation.

Transcription Factor AP-1 (Jun), which is one of the downstream target TFs of c-Jun N-terminal kinase (JNK) signaling, was not significantly enriched in either direction due to its rare activation among adverse time series, although JNK signaling up-regulation itself was significantly enriched, with Jun up-regulation being one of the most significantly enriched gene-level events. However, JNK signaling is particularly known in acetaminophen-induced liver injury and in this context leads to hepatocyte death through interactions with Sab on the mitochondrial outer membrane and not through transcriptional regulation mediated by AP-1 [46,47]. As increased Jun activity is hence known to be a consequence of JNK signaling but not a cause of injury, it would be plausible to see enriched pathway activity but not enriched TF activity before adverse histopathology. Overall, we were hence able to show significant enrichment of some of the known TFs in DILI before adverse histopathology and can also biologically reason the absence of significance for others.

While none of the included TF-level events ranked as most significant or most strongly changing before adverse histopathology, unlike in the analysis of pathway-level events, the down-regulation of Nr1h3, which is involved in lipid metabolism, was identified as the most frequent event (Fig 7B), indicating that the linked physiological changes are commonly but not specifically found before injury. Similarly, the up-regulation of the stress response, indicated by Nfe2l2 and Atf4, was found to be frequent, aligning with their role in the adaptive stress response [48]. Overall, frequency might hence be a useful metric to identify pre-adverse cellular events which precede injury but are not highly specific.

Data-driven prioritization of cellular events taking place before adverse histopathology

As many events were found to be significantly enriched before adverse histopathology, we next aimed at identifying and characterizing events most supported by time concordance, and hence at moving closer to the eventual aim of constructing AOPs from data. In our analysis, some known events in DILI ranked highest by enrichment p-value while others rank highest by max. |logFC| before adverse histopathology. In contrast, known TFs in DILI were found as most frequent ones in the dataset. We hence next looked into the top 10 TF- and pathway-level events identified using max. |logFC|, the enrichment p-value, and the TPR before or at adverse histopathology. These are shown in Fig 8 and S4 Table, while all time concordance metrics are summarised in S2 File. The most significantly enriched pathway-level event is decreased bile acid and salt recycling and also the down-regulation of multiple metabolic pathways, in particular targeting glycosaminoglycans, is found among the most significant pathway-level events pointing towards reduced liver function. Moreover, the most significantly enriched TF-level event was the down-regulation of Transcription factor activating enhancer binding protein 4 (Tfap4) which shows emerging roles in cell fate decisions [49], and is followed by Homeobox B13 (Hoxb13) for which expression has previously been found to correlate with hepatic inflammation in hepatic fibrosis [50].

Fig 8. Highest ranking events by time concordance metrics.

The ten transcription factor (TF)- and pathway-level events ranking highest by enrichment p-value, median max. |logFC| and true positive rate (TPR) before or at histopathology are shown.

Among the up-regulated events, the most significant enrichment is found for cell cycle checkpoints and DNA repair among the pathway-level events as well as E2F Transcription Factor 2 (E2f2), which controls cellular proliferation and liver regeneration [51], and was found among the most significant TF-level events. E2f2 up-regulation was also identified as 2nd most frequent TF event after the down-regulation of Nr1h3 and among the top 10 most strongly changing TF events further highlighting its strong time concordance. The most frequent genesets point to translation regulation via Eukaryotic translation initiation factor 2A (EIF2a) including the upstream response mediated by eIF-2-alpha kinase GCN2 and the downstream role in protein translation mediated through interactions with tRNA. EIF2a is part of the same branch of Unfolded Protein Response (UPR) as Atf4 and causes its preferential translation which, among others, mediates autophagy and proapoptotic response [43,52] and is a known predictor of DILI [44]. Furthermore, increased folding of actin by Chaperonin containing tailless complex polypeptide 1 (CCT) or tailless complex polypeptide 1 ring complex (TRiC) is found frequently and with large effect size. It has been previously linked to proteostasis and autophagy [53,54], but a role in DILI specifically is not yet known. As most strongly dysregulated events, metabolic pathways are found pointing to increased beta oxidation of fatty acids, as well as decreased cholesterol biosynthesis and tyrosine catabolism. Also the most strongly down-regulated TFs point towards lipid metabolism, i.e. the Sterol Regulatory Element Binding Transcription Factor 1, Srebf1, and the Sterol Regulatory Element Binding Transcription Factor 2, Srebf2, as well as Nr1h3 which controls Srebf1 expression. 
Overall, the derived time-concordant events, which take place between the beginning of treatment and onset of adverse histopathology, hence include known and plausible events in liver injury which can be further characterized based on their frequency, significance and logFC.

Identifying mechanistic hypotheses combining known TF functions and time concordance

While both pathways and TFs constitute interpretable events in this study, further prior knowledge is available on how TFs can function on a molecular level, allowing us to derive more detailed hypotheses. Firstly, TF activity can generally be modulated through changes in expression or through post-transcriptional regulation as a consequence of cellular signaling or environmental changes. In the case of transcriptional regulation, changes in mRNA levels should precede changes in TF activity estimated based on regulon expression, and hence time concordance can be used to gain support for transcriptional TF regulation. Being only interested in TF events with a potential mechanistic link to liver injury, we studied how significantly concordant expression and activity for each TF are enriched before adverse histopathology. The strongest evidence for a role in DILI pathogenesis is found for 18 TF events which show both significantly enriched TF expression and regulon activity, providing complementary evidence of TF importance and hinting at transcriptional regulation (Fig 9A). While this is not the case for the 17 TF events which only show significantly enriched TF activity but insignificant enrichment of differential expression, including increased E2f2 activity, this can be explained by post-transcriptional regulation potentially describing earlier response patterns which are a direct consequence of upstream signaling. In contrast, 35 TF events with only significant gene expression, such as increased Jun or Myc, might already show changes in expression but not yet sufficiently large changes in activity. As this rather indicates a role in later pathogenesis and expression is only regarded as supporting evidence, these TFs have not been included in the next analysis steps.

Fig 9. Transcription Factor (TF) activity and expression before adverse histopathology.

A) Significance of enrichment in adverse conditions for matched TF activity and expression-based events. Events only found on the expression or TF level are not included in the figure due to the inability to perform a statistical test for those. B) For significantly enriched TF activity-based events, the True Positive Rate (TPR) of observing TF activity before or at the time of adverse histopathology is shown, as well as the TPR for observing TF expression changes before TF activity in the time series where it precedes adverse histopathology.

To derive stronger mechanistic evidence for induction, we next evaluated how frequently expression changes precede TF activity in the same adverse time-series and compared this against the overall frequency of TF event occurrences preceding adverse histopathology (Fig 9B). Among the events with significant enrichment of TF expression and activity, the most frequent evidence for induction was found for the down-regulation of CCAAT/enhancer-binding protein alpha (Cebpa). In humans, decreased expression of the homologous CEBPA is not only known across liver diseases, but exogenously increased CEBPA expression has also been shown to reverse liver injury and is explored as a therapeutic target in hepatocellular carcinoma [55]. The event with the 2nd highest relative frequency of expression preceding TF activity as well as the highest frequency of TF activity preceding injury is Atf4, for which expression of the homologous gene in humans is known to be induced as part of the ER stress response contributing to adverse liver phenotypes [56,57]. In contrast, it was found that for the Aryl Hydrocarbon Receptor (Ahr) and the Peroxisome Proliferator-activated Receptor Alpha (Ppara) changes in expression never preceded those in TF activity, which aligns with their roles as nuclear receptors which are generally post-translationally activated via ligand binding [58,59]. As this provides counterevidence for transcriptional induction, these were not included as induced TFs in the subsequent analysis.

After investigating the mode of regulation for individual TFs above, we next considered how these TFs are interlinked. To this end, we identified protein-protein interactions between significantly enriched TFs and, for induced TFs, additionally TF-target gene interactions. Induced TFs were those which showed significant enrichment before adverse histopathology for both expression and regulon activity, as well as evidence of expression preceding TF activity within the same adverse time series. Results of this analysis are shown in Fig 10, and details on the observed absolute and relative frequencies, as well as the source of the interaction, are shown in S5 Table. One of the two interactions with the highest absolute frequency is Nr1h3 down-regulation resulting in reduced Srebf1 activity. Furthermore, Srebf1 is also linked to upstream regulation by Nr1h2 which interacts with Peroxisome Proliferator-Activated Receptor Alpha (Ppara) in both directions, and this cross-talk between Ppara and LXR regulating Srebf1 expression has been explicitly studied in the context of fatty acid metabolism regulation [60-62]. The 2nd most frequently observed interaction is the down-regulation of Transcription Factor 12 (Tcf12) inducing reduced activity of TEA Domain Transcription Factor 1 (Tead1). While Tead1 is indeed known to be involved in liver diseases and injury [63,64], the interaction itself has not been reported before in the context of liver injury and the same applies also for the other upstream Tead1 regulators identified. It should also be noted that for these interactions first activation is only found at the same time but not in the time-concordant order, providing weaker evidence than, for example, the interaction between Nr1h3 and Srebf1.
As an additional larger TF cluster, decreased activity of Hepatocyte Nuclear Factor 1 alpha (Hnf1a), Retinoic Acid Receptor alpha (Rara) and Pancreatic And Duodenal Homeobox 1 (Pdx1) was found to lead to decreased expression and activity of Hepatocyte Nuclear Factor 4 alpha (Hnf4a), which is linked to reduced expression and activity of CCAAT/enhancer-binding protein alpha (Cebpa) through edges in both directions. This cluster stands out due to the high confidence score of all interactions, except the edge between Pdx1 and Hnf4a, indicating that there is strong support based on prior knowledge for the involved interactions. Furthermore, it was previously found that artificially increased expression of Hnf4a is able to reverse liver failure in rats, while also restoring expression of a highly interconnected TF network including Hnf1a and Cebpa, which supports the identified interactions [65,66]. Two of the yet unknown TFs in DILI are Meis Homeobox 1 (Meis1) and Meis Homeobox 2 (Meis2), which are generally known in a developmental context [67,68]. However, their down-regulation in early pathogenesis is supported by enriched TF activity, differential expression before adverse histopathology as well as upstream regulators which are also enriched before adverse histopathology.

Fig 10. Causal relationships between TFs supported by time concordance.

For TFs which are significantly enriched before or at adverse histopathology, known causal relations are shown in which the upstream event is found before or at the downstream event in at least 20% of adverse cases. For induced TFs for which expression is found before regulon activity and significantly enriched, not only protein-protein interactions are considered but also upstream TF-target gene interactions annotated with DoRothEA [69] confidence scores (A: High confidence, C: Medium confidence).

Time-concordant events reflecting disease progression

While events do not have to be activated continuously to be causally involved in pathogenesis, events with consistent or increasing activation over time are particularly interesting as biomarkers, as they can be experimentally measured without the chance of missing the timepoint of activation and can potentially reflect disease progression beyond early pathogenesis. We therefore studied which TFs and pathways show time-dependent activation by testing for significant Spearman correlation between activation logFC and time in adverse time-series, and whether this overlaps with the previously derived time concordance (Fig 11). Overall, 118 pathways and 19 TFs were supported by both significant time concordance and time dependence, which represents 86.1% and 70.4% of the time-concordant events, and 59.9% and 48.7% of the time-dependent events, respectively.

Fig 11. Combining time dependence and concordance to identify mechanistically supported biomarkers.

A) The relation between time concordance, quantified by the enrichment p-value for event activation before adverse histopathology, and time dependence, quantified by the meta p-value for Spearman correlation between time and event activation across adverse conditions, is shown. B) For events with the most significant time-dependence, the distribution of correlation coefficients is shown providing further insight into the strength of correlation and consistency across adverse conditions.

On the pathway level, multiple genesets pointed to a reduced level of plasma lipoprotein particle assembly and remodelling, which indicates changes in lipid distribution. This aligns with the known dyslipidaemia in chronic liver diseases (CLD), including decreasing serum values of LDL, HDL, total cholesterol, and triglycerides with increasing severity of disease, based on which previous studies suggested that routine monitoring of lipid profiles can improve the outcome for CLD patients [70]. Furthermore, a down-regulation of response to metal ions was found which could be related to metallothioneins, which protect against oxidative stress and are able to chelate heavy metals [71]. Both directions of dysregulation were previously observed in liver diseases: While a negative correlation with disease progression was found in hepatocellular carcinoma [72], a positive correlation was found in most other liver diseases including acetaminophen-induced liver injury [73]. This indicates that the opposite directionality to the one observed here is more plausible based on current literature knowledge, but this cannot be fully clarified. The most time-concordant and -dependent TF event was down-regulation of SRY-Box Transcription Factor 13 (Sox13) which is generally involved in cell fate [74] and embryonal development [75]. As Sox13 does not yet have well understood functions on a more detailed level, experimental validation of a potential role in DILI would be interesting. In contrast, the next most significant time dependence is found for the hepatocyte nuclear factors Hnf1a and Hnf4a, as well as Cebpa, which are known to negatively correlate with liver cirrhosis in rats [76,77]. Overall, this shows that a mechanistic role for time-concordant and -dependent events is strongly supported by the understanding of adverse liver phenotypes.

While, in general, events with highly significant time dependence also showed highly significant time concordance, some exceptions were found in which only one of the two was highly significant. For instance, the pathway with the 2nd most significant time-dependence (p-value < 10^-44) is signaling via advanced glycosylation end product receptor (RAGE), which contributes to inflammation and oxidative stress generation and did not pass the significance threshold for time concordance (p-value = 0.058). RAGE expression and activity, which are both induced by binding of RAGE ligands, are thereby known to be up-regulated in various hepatic disorders, resulting in a positive feedback loop explaining increasing or sustained RAGE activation [78]. This indicates that, while RAGE signaling is correlated with progression, there is no clear evidence for a role in early pathogenesis preceding adverse histopathological changes. In contrast, SUMOylation of TFs is time-concordant (p-value = 0.002) but not -dependent (p-value = 0.48), indicating a mechanistic role in early pathogenesis which is not sustained over time. This aligns with the finely regulated and pleiotropic roles of SUMOylation in post-translational regulation which have also been found to be involved in the context of liver diseases [79].

Limitations of this study

In this study, we introduced a time-concordance based approach to derive mechanistic insight from gene expression and histopathology data. We were able to recover known mechanisms in DILI as well as to propose novel and detailed mechanistic hypotheses. However, the present analysis is based on a limited number of time-series as well as only a few timepoints within each time-series. This does not only mean that rare events might be missed as they occur between measured timepoints and that small effects might not be identified as significant, but also that there is potentially a bias based on the tested compounds towards the represented modes of toxicity.

Furthermore, the analysis is limited by how confidently biological processes are inferred from the data. This was for instance demonstrated by the differences between pathway and TF activation for signalling and stress response pathways highlighting the discrepancy between protein activation and gene expression. As only pathways induced through changes in gene expression or their downstream expression footprints [69] can be confidently detected, this means that good estimates of time concordance can predominantly be derived for intermediate or later key events while preceding key events or molecular initiating events which are not mediated by transcriptional regulation cannot be estimated based on the data. That being said, this is a limitation of gene expression data in general and the time concordance approach would also be able to integrate other data types describing events not covered yet.

Moreover, multiple choices were made to align our analysis to the AOP concept, prioritizing mechanisms supported by prior knowledge over purely data-driven hypotheses. First, detailed insights might be lost by summarising results to the pathway level. While generally measurements for individual genes can be noisy, this can be summarised in different ways e.g. based on similarity in expression profiles [15]. In this study, however, we used curated gene sets due to their interpretability and to derive modular events as defined in the AOP framework. Additionally, prior knowledge was taken as ground truth, both in the gene set and interaction analysis, meaning that only generally known pathways and interactions could be discovered. Like all methods based on curated gene sets and interactions, it was hence informed and biased by the current understanding of biology. However, this prior biological knowledge contributes to the biological plausibility of the derived events and relationships, adding to the weight of evidence of our findings in the context of AOPs.

Lastly, it should be highlighted that time concordance is necessary for causal relations but not sufficient to prove them. For instance, two events may be time-concordant because they are causally linked to a shared preceding cause. To distinguish these effects, the additional Bradford-Hill considerations can be helpful, but only prior knowledge has been considered in parts of this study. In particular, essentiality would provide strong evidence for causality but requires targeted experiments and hence is unsuitable for hypothesis generation. In contrast, dose and incidence concordance are generally feasible from a data-driven standpoint but were not pursued in this case study due to the low number of doses and replicates.

Conclusion

In this study, we introduce “first activation” as a concept to quantify the strength of temporal concordance between events across time series, with the assumption that each activated event may have downstream effects irrespective of whether it is continuously or only transiently activated. With this approach, we study gene expression-based TF and pathway-level events found before adverse histopathology indicating liver injury in repeat-dose studies in rats from TG-GATEs as a case study. We find some known processes in DILI to be highly significant, e.g. bile acid recycling, while others are highly frequent but less specific, including adaptive response pathways such as the eIF2α/ATF4 pathway [52].

Beyond quantifying time concordance for known and potentially novel events in DILI, we additionally show how time concordance can be combined with prior biological knowledge to generate hypotheses on potentially causal gene-regulatory cascades in DILI. Amongst others, this identifies LXRα down-regulation leading to decreased Srebf1 expression, an interaction known to regulate fatty acid synthesis in the liver [42], but also characterizes yet unknown TFs based on their time concordance, their mode of regulation (either transcriptional or post-transcriptional) and potential upstream regulators and downstream effectors. Two of the identified induced TFs are Meis1 and Meis2, which are supported by a significantly enriched decrease in expression and activity before adverse histopathology, as well as upstream regulators which also show significant enrichment of regulon activity and are found within the same time series. On top of time concordance, we also derive each event’s time dependence and show that events mechanistically involved in early pathogenesis do not necessarily reflect disease progression and vice versa. However, for some events, e.g. Sox13, both properties are found and these may be useful biomarkers which reflect injury progression and already change before histopathological manifestation.

We believe that the described analysis can provide supporting evidence for mechanistic links between events in line with the evolved Bradford-Hill considerations on time concordance and biological plausibility and can hence, for example, support AOP development. Furthermore, the approach is not limited to a particular adverse event and can instead quantify the interaction between any two events represented in time series in a data-driven and automatable fashion. Consequently, this type of analysis could also be of interest to study the mechanism of action of particular compound classes or patterns of disease progression. We make the results of our analysis on the TG-GATEs in vivo liver data publicly available in a Shiny app through which users can query the most time-concordant events for more specific types of histopathology and study in detail in which time series time concordance was observed or not observed (https://anikaliu.shinyapps.io/dili_cascades).

Methods

Open TG-GATEs data processing

The TG-GATEs gene expression data from studies in 6-week-old male Crl:CD Sprague-Dawley (SD) rats with daily repeat-dosing (S1 Fig) were downloaded from the Life Science Data Archive (DOI: 10.18908/lsdba.nbdc00954-01-000). The raw liver gene expression levels were background corrected, log2 transformed, and quantile normalized with the rma function of the affy package per treatment across all doses and timepoints [80]. Quality control was then performed using the ArrayQualityMetrics package [81], and detected outliers with high distance to other experiments or unusual signal distribution were removed (the list of removed outliers is summarised in S1 File). The platform information for the Affymetrix Rat Genome 230 2.0 Array was derived from Gene Expression Omnibus [82] (GEO accession: GPL1355) and was then used to summarise probe IDs to rat gene symbols by median for all probes mapping uniquely to one gene symbol. Only the 360 compound-dose combinations with at least 6 measured timepoints after quality control were included. Out of these, all eight timepoints were measured in most time series, while only six timepoints were measured in two time series, and only seven timepoints in seven time series.
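The probe-to-gene summarisation described above can be illustrated with a minimal Python sketch. It is not the R code used in the study: the function name `summarise_probes`, the input layout and the toy probe IDs and values are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median

def summarise_probes(expr, probe_to_genes):
    """Collapse probe-level values to gene-level values by median.

    expr: dict probe_id -> list of expression values (one per sample)
    probe_to_genes: dict probe_id -> list of gene symbols annotated to the probe
    Probes mapping to zero or to several gene symbols are dropped.
    """
    per_gene = defaultdict(list)
    for probe, values in expr.items():
        genes = probe_to_genes.get(probe, [])
        if len(genes) == 1:  # keep uniquely mapping probes only
            per_gene[genes[0]].append(values)
    # median across probes, computed separately for each sample
    return {g: [median(col) for col in zip(*rows)] for g, rows in per_gene.items()}

# Hypothetical toy input: two samples, four probes
expr = {
    "p1": [7.0, 8.0],
    "p2": [9.0, 6.0],
    "p3": [5.0, 5.0],
    "p4": [4.0, 4.5],
}
probe_to_genes = {"p1": ["Atf4"], "p2": ["Atf4"], "p3": ["Jun", "Junb"], "p4": ["Xbp1"]}
gene_expr = summarise_probes(expr, probe_to_genes)
```

In this toy input the probe annotated to two symbols is discarded, and the two Atf4 probes are merged sample by sample.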

Definition of adverse histopathology

To characterize the extent of histological findings, we used the toxscores by Sutherland et al. [15] in order to consider both severity and frequency of events in a single numerical output measure. These are based on the lesion severity per animal which was first converted to a numerical scale (normal = 0, minimal = 1, slight = 2, moderate = 3, marked or severe = 4) and then averaged across all biological replicates as an aggregate measure for lesion frequency and severity. One characteristic of this measure is that the overall distributions varied between different findings, e.g. inflammation was more frequently annotated with low than with high toxscores while a more balanced distribution of scores was observed for hepatocellular single cell necrosis (S2 Fig).
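As a minimal sketch of the toxscore calculation described above, the severity grades of all replicates in one treatment group can be mapped to the numerical scale and averaged. The function name `toxscore` and the example grades are hypothetical; only the scale itself is taken from the text.

```python
# Numerical scale as stated in the text (marked or severe = 4)
SEVERITY_SCALE = {
    "normal": 0,
    "minimal": 1,
    "slight": 2,
    "moderate": 3,
    "marked": 4,
    "severe": 4,
}

def toxscore(grades):
    """Average numeric lesion severity across the biological replicates of one group."""
    if not grades:
        raise ValueError("at least one replicate is required")
    return sum(SEVERITY_SCALE[g] for g in grades) / len(grades)

# Hypothetical group of five animals scored for one finding
score = toxscore(["normal", "minimal", "slight", "minimal", "normal"])
```

Because the grades are averaged, a finding seen mildly in many animals and a finding seen severely in a few animals can yield similar scores, which is the intended aggregation of frequency and severity.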

To study which histological findings were enriched in adverse conditions, we first defined binary histopathology labels describing the presence of histological findings with different extents in each time-series. Based on the toxscore ranges used by Sutherland et al. [15], three toxscore cut-offs were implemented to describe each histopathological finding: “Null” (toxscore > 0), “low” (toxscore > 0.67) and “high” (toxscore > 1.34). We then studied which labels were over-represented in adverse time-series. These were defined using the annotation of Sutherland et al. [15], where pathologists classified compound-dose combinations in the TG-GATEs database as adverse or non-adverse after 4 and 29 days of treatment. We used the 29 days classification to define 40 adverse time series and, as negative control, only regarded time-series as non-adverse for compounds which were not classified as adverse at any dose. This accounts for the fact that some of the cellular changes of interest might already take place at lower doses, although the resulting phenotype is not considered adverse yet.
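The three cut-offs can be turned into binary labels as sketched below. This is an illustration under the assumption that a label counts as present in a time series if the toxscore exceeds the cut-off at any measured timepoint; the label naming scheme and the function name `histopathology_labels` are hypothetical.

```python
# Cut-offs from the text: a label is present if the toxscore is strictly greater
CUTOFFS = {"null": 0.0, "low": 0.67, "high": 1.34}

def histopathology_labels(finding, toxscores):
    """Return the binary labels reached by one finding in one time series.

    toxscores: toxscore of the finding at each measured timepoint.
    Assumption: the highest toxscore in the series decides which labels apply.
    """
    peak = max(toxscores, default=0.0)
    return {f"{finding}_{name}" for name, cut in CUTOFFS.items() if peak > cut}

# Hypothetical time series for one finding
labels = histopathology_labels("necrosis", [0.0, 0.2, 0.8])
```

The labels are nested by construction: every time series carrying the "high" label also carries "low" and "null".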

We defined findings as adverse histopathology if they were observed in at least 5 out of 40 adverse time-series, to remove rare histopathological findings, and additionally required that at least 50% of findings were in adverse conditions, to remove findings which are unspecific. All labels which were identified with these criteria are significantly enriched among time-series labelled as adverse by Sutherland et al. [15] in comparison to those that were considered non-adverse using a one-sided Fisher’s Exact test (p-value < 0.0001), performed using the fisher.test function of the stats R package [83]. However, this combination of additional criteria was chosen to exclude findings which are rare or weakly associated.
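The two selection criteria can be written as a small filter. This is a sketch, not the authors' implementation: the function name `is_adverse_finding` and the example counts are hypothetical, and the denominator of the 50% criterion is assumed to be all compared time series in which the label occurs.

```python
def is_adverse_finding(n_adverse_with, n_total_with, min_adverse=5, min_fraction=0.5):
    """Check the frequency and specificity criteria for one histopathology label.

    n_adverse_with: adverse time series in which the label is observed
    n_total_with:   all compared time series in which the label is observed
    """
    if n_total_with == 0:
        return False
    frequent = n_adverse_with >= min_adverse
    specific = n_adverse_with / n_total_with >= min_fraction
    return frequent and specific

# Hypothetical counts
kept = is_adverse_finding(6, 8)          # frequent and 75% in adverse series
too_rare = is_adverse_finding(4, 4)      # specific but seen in only 4 adverse series
unspecific = is_adverse_finding(5, 12)   # frequent but only about 42% in adverse series
```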

While not all compounds in the TG-GATEs database are drugs and some mechanisms of toxicity may not translate to humans, out of the 38 compounds represented in adverse time-series, 22 have additionally been classified as hepatotoxic in DILIst [28] and 18 in DILIrank (vMost-DILI-Concern or vLess-DILI-Concern) [27] (S2 Table). This overlap with compound-level DILI annotations by the FDA shows that the compounds in this study partially represent known mechanisms of DILI in humans, while also highlighting the fact that a clear classification is not possible.

Pathway and TF activity inference

The activity of pathways and TFs across all doses and timepoints of a treatment including vehicle controls was derived based on the expression of its gene set members using GSVA [26], which computes a gene set enrichment by sample matrix from the gene expression by sample matrix. This was performed using a Gaussian kernel requiring at least 5 genes per gene set, and overall provides the basis for the subsequent pathway- and TF-centric steps. As prior knowledge, we used pathway maps from Reactome [38] which were derived through MSigDB [84] and the msigdbr package [85]. TF activity gene sets were derived from DoRothEA [86] and mapped from human to rat gene symbols with biomaRt [87]. These gene sets describe known, functional TF-gene interactions and are assigned a confidence level based on the strength of evidence of these interactions. Thereby, only the 207 TFs with a high to medium confidence level of A-C were included and the few TF-gene interactions with a negative mode of regulation were removed to better infer TF directionality. To evaluate which pathway or TF is dysregulated, we computed the differential activity in comparison to the vehicle control group, which was treated for the same amount of time and as part of the same experiment, using the moderated t-statistic in limma [88]. We identify significantly dysregulated gene sets with a False Discovery Rate (FDR) < 0.05.
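The final thresholding step can be illustrated as follows. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg adjustment shown here is an assumption, and the p-values are invented toy numbers rather than limma output. GSVA and the moderated t-statistic themselves are not reproduced.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest to the smallest p-value to enforce monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four gene sets in one treated-versus-control contrast
raw = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw)
dysregulated = [q < 0.05 for q in fdr]
```

In this toy example only the first gene set passes the FDR < 0.05 threshold, although three raw p-values are below 0.05.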

Temporal concordance of events

In this study, the order of events was derived based on each event’s timepoint of first activation within each time-series (Fig 1A). For pathways and TFs, we defined first activation as the earliest time of measurement at which significant differential regulation was observed (FDR < 0.05) in each direction, while an additional logFC cut-off was implemented for individual genes. As first evidence of adverse morphological changes in the liver, we used the first timepoint at which any of the adverse histopathology labels derived before was found.
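A minimal sketch of the first activation concept for one pathway or TF in one time series is given below. The function names, the data layout and all numbers are hypothetical, and timepoints are given in hours purely for illustration.

```python
def first_activation(timepoints, fdr, logfc, alpha=0.05):
    """Earliest timepoint with FDR < alpha, tracked separately per direction."""
    first = {"up": None, "down": None}
    for t, q, lfc in sorted(zip(timepoints, fdr, logfc)):
        if q >= alpha or lfc == 0:
            continue  # not significantly dysregulated at this timepoint
        direction = "up" if lfc > 0 else "down"
        if first[direction] is None:
            first[direction] = t
    return first

def first_adverse_timepoint(labels_by_time, adverse_labels):
    """First timepoint at which any adverse histopathology label is present."""
    hits = [t for t, labels in labels_by_time.items() if set(labels) & adverse_labels]
    return min(hits) if hits else None

# Hypothetical time series (hours): FDR and activity logFC versus vehicle control
event = first_activation([3, 6, 9, 24], [0.20, 0.01, 0.03, 0.50], [0.1, 0.8, -0.4, 0.2])
onset = first_adverse_timepoint(
    {9: set(), 24: {"necrosis_low"}, 96: {"necrosis_high"}},
    {"necrosis_low", "necrosis_high"},
)
```

Up- and down-regulation are kept as separate events, so the same pathway can have a first activation in both directions within one time series.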

We were then generally interested in potential preceding events PE which are first activated before or at the same time as a potential later event or outcome LE and used multiple metrics to quantify the degree of time concordance which can be related to the original work by Bradford Hill (Table 1). Thereby, the key later event in this study was adverse histopathology but we used a more general notation LE, as some of the following criteria to quantify time concordance are also applied in the TF analysis, where gene expression-derived events are used as later event. First, we used the true positive rate (TPR) which describes how frequently PE is observed before LE among all time-series with LE and hence its consistency across compounds. Secondly, we use the maximal effect size of PE observed before LE, summarised across time-series by median, to characterise the strength of association. To evaluate the significance of the findings, we additionally defined a set of background time-series unrelated to LE (Fig 1B). For adverse histopathology, these unrelated background time-series were the 133 time-series without any observed histological changes. We then computed the enrichment of PE before or at LE using the fisher.test function of the stats R package [83], first estimating the odds ratio using the conditional maximum likelihood estimate and subsequently testing the null hypothesis whether the odds ratio derived from a confusion matrix as described in Fig 1 is equal to or smaller than 1. Additionally, we compute the positive predictive value (PPV) of PE for LE, which describes how likely LE is observed at the same or a later time given the observation of PE.
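The metrics above can be sketched in Python as follows. This is an illustration, not the authors' R code: the layout of the confusion matrix is an assumption (Fig 1 is not reproduced here), the function names and toy data are hypothetical, and the conditional maximum likelihood estimate of the odds ratio reported by fisher.test is omitted. The one-sided p-value is the upper tail of the hypergeometric distribution.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value (odds ratio > 1) for [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    k_max = min(row1, col1)
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, k_max + 1))
    return tail / total

def concordance_metrics(with_le, background):
    """with_le: list of (pe_time, le_time) per time series with LE (pe_time None if absent).
    background: list of pe_time per background time series (None if absent)."""
    a = sum(1 for pe, le in with_le if pe is not None and pe <= le)  # PE before or at LE
    b = len(with_le) - a                                             # LE without prior PE
    c = sum(1 for pe in background if pe is not None)                # PE without LE
    d = len(background) - c                                          # neither
    return {
        "TPR": a / (a + b),
        "PPV": a / (a + c) if a + c else float("nan"),
        "p": fisher_greater(a, b, c, d),
    }

# Hypothetical first-activation times in hours
with_le = [(6, 24), (3, 96), (None, 24), (24, 24)]
background = [None, None, 9, None]
metrics = concordance_metrics(with_le, background)
```

With four time series per group, even a clear difference in frequency (three of four versus one of four) is not significant, which illustrates why the number of available time series limits the detection of small effects.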

Across all metrics, we only consider time-series in the statistics for which any event of the same type as PE, e.g. TF or pathway, was observed at the included timepoints, so before or at LE or at any timepoint in the background time-series. We do this to account for the fact that in some cases no changes are found which may be a consequence of the fact that there isn’t a measured timepoint before LE or that at the available timepoints expression changes cannot be detected. We argue that in these cases this should not be treated as evidence of absence of the given event, but rather as absence of evidence.
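This inclusion rule can be expressed as a simple filter. The sketch below is one possible reading of the rule; the function name `is_informative` and the example times are hypothetical.

```python
def is_informative(first_times, le_time=None):
    """Decide whether a time series enters the statistics for one event type.

    first_times: first-activation times of all events of the same type
                 (e.g. all TFs) in this time series, None if never activated
    le_time:     time of the later event, or None for a background time series
    """
    observed = [t for t in first_times if t is not None]
    if le_time is None:
        # background time series: every measured timepoint is included
        return bool(observed)
    # otherwise only timepoints before or at the later event are included
    return any(t <= le_time for t in observed)

# Hypothetical examples (hours)
kept = is_informative([None, 9, 48], le_time=24)      # one TF event before LE
dropped = is_informative([None, 48], le_time=24)      # TF events only after LE
background_kept = is_informative([72])                # background with any TF event
```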

Combining time concordance on TF-TF interactions

We used three sources of causal prior knowledge to derive mechanistic hypotheses linking TFs: protein-protein interactions between TFs derived from Omnipath through OmnipathR [89,90], TF-target gene interactions from DoRothEA [86], and the link between gene expression and protein levels following the central dogma of molecular biology. Using these interactions as backbone, we then derived those additionally supported by time concordance. Thereby, the dysregulation of the nodes was required to match the reported mode of regulation (edge sign) and the source node or upstream event was required to be observed in at least 20% of cases before or at the same time as the target node or downstream event. For induced TFs, significant enrichment of gene expression (|logFC|>0.5) and TF activity before adverse histopathology was required, as well as evidence for changes in expression preceding changes in the same direction in regulon activity within the same time series.
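The 20% support criterion for a single prior-knowledge edge can be sketched as below. The denominator is assumed to be the number of adverse time series, the check that dysregulation directions match the edge sign is assumed to have been applied beforehand, and the function name and example times are hypothetical.

```python
def edge_support(pairs, min_fraction=0.2):
    """Time-concordance support for one source -> target edge.

    pairs: (source_time, target_time) per adverse time series,
           None where the event was not activated.
    """
    n = len(pairs)
    both = [(s, t) for s, t in pairs if s is not None and t is not None]
    before = sum(1 for s, t in both if s < t) / n
    before_or_at = sum(1 for s, t in both if s <= t) / n
    return {
        "before": before,
        "before_or_at": before_or_at,
        "supported": before_or_at >= min_fraction,
    }

# Hypothetical first-activation times (hours) in five adverse time series
support = edge_support([(6, 24), (9, 9), (None, 24), (24, 9), (None, None)])
```

Reporting "before" and "before or at" separately distinguishes edges whose source strictly precedes the target from edges where both are first detected at the same measured timepoint, which constitutes weaker evidence.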

Time dependence

In each adverse time-series, we tested for Spearman correlation between timepoint and event activation logFC using the correlation R package [91], including a logFC of 0 at timepoint 0 h under the assumption that there are no differences from the control group before treatment. We then identified pathways and TFs which show significant Spearman correlation in only one direction, positive or negative. For those events, we applied Fisher’s combined probability test using the metap R package [92] across all adverse time-series to evaluate whether an overall significant correlation between event activation and time is found.
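
The two steps above can be sketched in plain Python. This is an illustrative sketch rather than the code used in this study, which relied on the correlation and metap R packages. The function names and example values are assumptions, the Spearman coefficient is computed as the Pearson correlation of average ranks, and the per-time-series p-values passed to the combination step are assumed to be available already.

```python
from math import exp, factorial, log, sqrt


def average_ranks(values):
    """Ranks starting at 1, with ties receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_with_baseline(times_h, logfc):
    """Spearman rho after adding a logFC of 0 at timepoint 0 h."""
    x = average_ranks([0] + list(times_h))
    y = average_ranks([0.0] + list(logfc))
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fisher_combined(p_values):
    """Fisher's combined probability test.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose upper tail has a closed form for even df.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)
    return exp(-half_stat) * sum(half_stat ** i / factorial(i) for i in range(k))


# Invented example: one time-series and three per-time-series p-values
rho = spearman_with_baseline([3, 6, 9, 24], [0.1, 0.3, 0.2, 0.9])
combined_p = fisher_combined([0.01, 0.04, 0.20])
```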

Supporting information

S1 Table. Comparison of quantitative Adverse Outcome Pathway (qAOP) models.

Comparison of the first activation concept and other qAOP models with respect to their potential roles in AOP development.

(DOCX)

S2 Table. Compounds classified as adverse based on histopathology and concordance with previous annotations.

For the annotations by Sutherland et al. [15], who classified each compound at each measured dose as adverse or non-adverse at day 4 and day 29, the adverse doses for each compound are listed. Furthermore, the binary classification as adverse (1) or non-adverse (0) from DILIst [29] is included, as well as the vDILIConcern and Severity Class classifications from DILIrank [30], which describe evidence for liver side effects observed in humans derived from post-marketing data.

(DOCX)

S3 Table. Time concordance metrics for Reactome pathway maps which map to known key events based on literature review.

(CSV)

S4 Table. Time concordance metrics for top 10 ranking events by True Positive Rate (TPR), significance and median max. |logFC|.

(DOCX)

S5 Table. TF-TF relations supported by known relations and time concordance.

For TF events which are significantly enriched before or at adverse histopathology, known interactions supported by time concordance are shown. With respect to the interaction, the absolute and relative frequency are shown for how often the source TF was observed “before” or “before or at” downstream TF activity. Additionally, the sources of the interactions provided in OmniPath are shown for protein-protein interactions, and the DoRothEA confidence level is shown for TF-target gene interactions.

(DOCX)

S1 Fig. Open TG-GATEs study design.

6-week-old male Crl:CD Sprague-Dawley (SD) rats were treated with a range of compounds using daily repeat-dosing. For each compound, four doses were used including a vehicle control, and samples were taken at 8 timepoints. For each combination of compound, timepoint and dose, histopathology was annotated and gene expression measured for 3 replicates.

(PDF)

S2 Fig. Distribution of toxscores across histopathological findings.

(PDF)

S3 Fig. Frequency of histopathological findings before and after first adverse histopathology.

For adverse and non-adverse histopathological findings, the frequency before or at first adverse histopathology is shown (left). For adverse findings, this indicates how frequently they were among the first adverse histopathological findings, given that by definition they cannot occur before first adverse histopathology. This identifies single-cell necrosis at any severity (“null”) as the most frequent finding, both in absolute and relative terms.

(PDF)

S4 Fig. Background distribution of temporal association metrics across pathway and Transcription Factor (TF) events.

The dependency between different metrics is shown. A) Frequency of events by median max. |logFC| before or at histopathology and enrichment p-value. B) Frequency of events by true positive rate (TPR) and positive predictive value (PPV) before or at adverse histopathology. C) Direct relation between TPR, PPV and enrichment p-value. D) Direct relation between TPR, PPV and frequency in background time-series.

(PDF)

S1 File. Removed outlier samples.

(CSV)

S2 File. Time concordance metrics for all TFs, pathways as well as genes using both a minimal |logFC| of 0.5 and 1.

(XLSX)

Data Availability

Histopathology data was derived from the supplementary information of Sutherland et al. (2018), accessed through https://doi.org/10.1038/tpj.2017.17. TG-GATEs gene expression data was derived from the Life Science Database Archive (https://dbarchive.biosciencedbc.jp/en/open-tggates/download.html). The files and code for the Shiny app are deposited in GitHub (https://github.com/anikaliu/DILICascades_App) and Zenodo (doi:10.5281/zenodo.5767783).

Funding Statement

AL received funding from and JM was a full-time employee of GlaxoSmithKline (https://www.gsk.com) throughout the study. NH is funded by LifeArc (https://www.lifearc.org). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Hwang TJ, Carpenter D, Lauffenburger JC, Wang B, Franklin JM, Kesselheim AS. Failure of investigational drugs in late-stage clinical development and publication of trial results. JAMA Internal Medicine. 2016;176: 1826–1833. doi: 10.1001/jamainternmed.2016.6008
  • 2. Harrison RK. Phase II and phase III failures: 2013–2015. Nature Reviews Drug Discovery. 2016;15: 817–818. doi: 10.1038/nrd.2016.184
  • 3. Bai JPF, Abernethy DR. Systems Pharmacology to Predict Drug Toxicity: Integration Across Levels of Biological Organization. 2012. [cited 5 Jun 2019]. doi: 10.1146/annurev-pharmtox-011112-140248
  • 4. Leist M, Ghallab A, Graepel R, Marchan R, Hassan R, Bennekou SH, et al. Adverse outcome pathways: opportunities, limitations and open questions. Archives of Toxicology. 2017;91: 3477–3505. doi: 10.1007/s00204-017-2045-3
  • 5. Ankley GT, Bennett RS, Erickson RJ, Hoff DJ, Hornung MW, Johnson RD, et al. Adverse outcome pathways: A conceptual framework to support ecotoxicology research and risk assessment. Environmental Toxicology and Chemistry. Wiley Blackwell; Mar 1, 2010. pp. 730–741. doi: 10.1002/etc.34
  • 6. OECD (Organisation for Economic Co-operation and Development). USERS’ HANDBOOK SUPPLEMENT TO THE GUIDANCE DOCUMENT FOR DEVELOPING AND ASSESSING AOPs. OECD Environment, Health and Safety Publications Series on Testing and Assessment. 2018; 1–62.
  • 7. Hill AB. The Environment and Disease: Association or Causation? Proc R Soc Med. 1965; 295–300.
  • 8. Meek ME, Palermo CM, Bachman AN, North CM, Lewis RJ, Meek B, et al. Mode of action human relevance (species concordance) framework: Evolution of the Bradford Hill considerations and comparative analysis of weight of evidence. Journal of Applied Toxicology. John Wiley and Sons Ltd; 2014. pp. 595–606. doi: 10.1002/jat.2984
  • 9. Oki NO, Nelms MD, Bell SM, Mortensen HM, Edwards SW. Accelerating Adverse Outcome Pathway Development Using Publicly Available Data Sources. Current Environmental Health Reports. 2016;3: 53–63. doi: 10.1007/s40572-016-0079-y
  • 10. Oki NO, Edwards SW. An integrative data mining approach to identifying adverse outcome pathway signatures. Toxicology. 2016;350–352: 49–61. doi: 10.1016/j.tox.2016.04.004
  • 11. Bell SM, Angrish MM, Wood CE, Edwards SW. Integrating publicly available data to generate computationally predicted adverse outcome pathways for fatty liver. Toxicological Sciences. 2016;150: 510–520. doi: 10.1093/toxsci/kfw017
  • 12. Burgoon LD, Angrish M, Garcia-Reyero N, Pollesch N, Zupanic A, Perkins E. Predicting the Probability that a Chemical Causes Steatosis Using Adverse Outcome Pathway Bayesian Networks (AOPBNs). Risk Analysis. 2020;40: 512–523. doi: 10.1111/risa.13423
  • 13. Jeong J, Garcia-Reyero N, Burgoon L, Perkins E, Park T, Kim C, et al. Development of Adverse Outcome Pathway for PPARγ Antagonism Leading to Pulmonary Fibrosis and Chemical Selection for Its Validation: ToxCast Database and a Deep Learning Artificial Neural Network Model-Based Approach. Chemical Research in Toxicology. 2019;32: 1212–1222. doi: 10.1021/acs.chemrestox.9b00040
  • 14. Kohonen P, Parkkinen JA, Willighagen EL, Ceder R, Wennerberg K, Kaski S, et al. A transcriptomics data-driven gene space accurately predicts liver cytopathology and drug-induced liver injury. Nature Communications. 2017;8. doi: 10.1038/ncomms15932
  • 15. Sutherland JJ, Webster YW, Willy JA, Searfoss GH, Goldstein KM, Irizarry AR, et al. Toxicogenomic module associations with pathogenesis: A network-based approach to understanding drug toxicity. Pharmacogenomics Journal. 2018;18: 377–390. doi: 10.1038/tpj.2017.17
  • 16. Souza TM, Kleinjans JCS, Jennen DGJ. Dose and Time Dependencies in Stress Pathway Responses during Chemical Exposure: Novel Insights from Gene Regulatory Networks. Frontiers in Genetics. 2017;8: 142–142. doi: 10.3389/fgene.2017.00142
  • 17. Rooney J, Hill T, Qin C, Sistare FD, Corton JC. Adverse outcome pathway-driven identification of rat liver tumorigens in short-term assays. Toxicology and Applied Pharmacology. 2018;356: 99–113. doi: 10.1016/j.taap.2018.07.023
  • 18. Rooney J, Oshida K, Vasani N, Vallanat B, Ryan N, Chorley BN, et al. Activation of Nrf2 in the liver is associated with stress resistance mediated by suppression of the growth hormone-regulated STAT5b transcription factor. PLoS ONE. 2018;13. doi: 10.1371/journal.pone.0200004
  • 19. Andrade RJ, Chalasani N, Björnsson ES, Suzuki A, Kullak-Ublick GA, Watkins PB, et al. Drug-induced liver injury. Nature Reviews Disease Primers. 2019;5: 1–22. doi: 10.1038/s41572-019-0105-0
  • 20. Regev A. Drug-induced liver injury and drug development: Industry perspective. Seminars in Liver Disease. 2014;34: 227–239. doi: 10.1055/s-0034-1375962
  • 21. Aguayo-Orozco A, Bois FY, Brunak S, Taboureau O. Analysis of Time-Series Gene Expression Data to Explore Mechanisms of Chemical-Induced Hepatic Steatosis Toxicity. Frontiers in Genetics. 2018;9: 396. doi: 10.3389/fgene.2018.00396
  • 22. Zhang JD, Berntenis N, Roth A, Ebeling M. Data mining reveals a network of early-response genes as a consensus signature of drug-induced in vitro and in vivo toxicity. Pharmacogenomics Journal. 2014;14: 208–216. doi: 10.1038/tpj.2013.39
  • 23. Zgheib E, Gao W, Limonciel A, Aladjov H, Yang H, Tebby C, et al. Application of three approaches for quantitative AOP development to renal toxicity. Computational Toxicology. 2019;11: 1–13. doi: 10.1016/J.COMTOX.2019.02.001
  • 24. Hassan I, El-Masri H, Kosian PA, Ford J, Degitz SJ, Gilbert ME. Neurodevelopment and Thyroid Hormone Synthesis Inhibition in the Rat: Quantitative Understanding Within the Adverse Outcome Pathway Framework. Toxicological Sciences. 2017;160: 57–73. doi: 10.1093/toxsci/kfx163
  • 25. Spinu N, Cronin MTD, Enoch SJ, Madden JC, Worth AP. Quantitative adverse outcome pathway (qAOP) models for toxicity prediction. Archives of Toxicology. Springer; 2020. pp. 1497–1510. doi: 10.1007/s00204-020-02774-7
  • 26. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics. 2013. Available: http://www.biomedcentral.com/1471-2105/14/7 doi: 10.1186/1471-2105-14-7
  • 27. Vall A, Sabnis Y, Shi J, Class R, Hochreiter S, Klambauer G. The Promise of AI for DILI Prediction. Frontiers in Artificial Intelligence. 2021;4: 15. doi: 10.3389/frai.2021.638410
  • 28. Liu A, Walter M, Wright P, Bartosik A, Dolciami D, Elbasir A, et al. Prediction and mechanistic analysis of drug-induced liver injury (DILI) based on chemical structure. Biology Direct. 2021;16: 1–15. doi: 10.1186/s13062-020-00285-0
  • 29. Thakkar S, Li T, Liu Z, Wu L, Roberts R, Tong W. Drug-induced liver injury severity and toxicity (DILIst): binary classification of 1279 drugs by human hepatotoxicity. Drug Discovery Today. 2020;25: 201–208. doi: 10.1016/j.drudis.2019.09.022
  • 30. Chen M, Suzuki A, Thakkar S, Yu K, Hu C, Tong W. DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans. Drug Discovery Today. 2016;21: 648–653. doi: 10.1016/j.drudis.2016.02.015
  • 31. Tillander V, Alexson SEH, Cohen DE. Deactivating Fatty Acids: Acyl-CoA Thioesterase-Mediated Control of Lipid Metabolism. Trends in Endocrinology and Metabolism. Elsevier Inc.; 2017. pp. 473–484. doi: 10.1016/j.tem.2017.03.001
  • 32. Violante S, Ijlst L, Ruiter J, Koster J, van Lenthe H, Duran M, et al. Substrate specificity of human carnitine acetyltransferase: Implications for fatty acid and branched-chain amino acid metabolism. Biochimica et Biophysica Acta - Molecular Basis of Disease. 2013;1832: 773–779. doi: 10.1016/j.bbadis.2013.02.012
  • 33. Hong L, Sun QF, Xu TY, Wu YH, Zhang H, Fu RQ, et al. New role and molecular mechanism of Gadd45a in hepatic Fibrosis. World Journal of Gastroenterology. 2016;22: 2779–2788. doi: 10.3748/wjg.v22.i9.2779
  • 34. Okazaki H, Igarashi M, Nishi M, Sekiya M, Tajima M, Takase S, et al. Identification of neutral cholesterol ester hydrolase, a key enzyme removing cholesterol from macrophages. J Biol Chem. 2008;283: 33357–33364. doi: 10.1074/jbc.M802686200
  • 35. Chiba M, Murata S, Myronovych A, Kohno K, Hiraiwa N, Nishibori M, et al. Elevation and characteristics of Rab30 and S100a8/S100a9 expression in an early phase of liver regeneration in the mouse. International Journal of Molecular Medicine. 2011;27: 567–574. doi: 10.3892/ijmm.2011.614
  • 36. Liu A, Trairatphisan P, Gjerga E, Didangelos A, Barratt J, Saez-Rodriguez J. From expression footprints to causal pathways: contextualizing large signaling networks with CARNIVAL. npj Systems Biology and Applications. 2019;5: 1–10. doi: 10.1038/s41540-019-0118-z
  • 37. Schubert M, Klinger B, Klünemann M, Sieber A, Uhlitz F, Sauer S, et al. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nature Communications. 2018;9: 20. doi: 10.1038/s41467-017-02391-6
  • 38. Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, et al. The Reactome Pathway Knowledgebase. Nucleic Acids Research. 2018;46: D649–D655. doi: 10.1093/nar/gkx1132
  • 39. Shi Z, Zhang K, Chen T, Zhang Y, Du X, Zhao Y, et al. Transcriptional factor ATF3 promotes liver fibrosis via activating hepatic stellate cells. Cell Death & Disease. 2020;11: 1–16. doi: 10.1038/s41419-020-03271-6
  • 40. Copple IM, Goldring CE, Jenkins RE, Chia AJL, Randle LE, Hayes JD, et al. The hepatotoxic metabolite of acetaminophen directly activates the Keap1-Nrf2 cell defense system. Hepatology. 2008;48: 1292–1301. doi: 10.1002/hep.22472
  • 41. Luedde T, Schwabe RF. NF-κB in the liver-linking injury, fibrosis and hepatocellular carcinoma. Nature Reviews Gastroenterology and Hepatology. NIH Public Access; 2011. pp. 108–118. doi: 10.1038/nrgastro.2010.213
  • 42. Schultz JR, Tu H, Luk A, Repa JJ, Medina JC, Li L, et al. Role of LXRs in control of lipogenesis. Genes and Development. 2000;14: 2831–2838. doi: 10.1101/gad.850400
  • 43. Hetz C, Zhang K, Kaufman RJ. Mechanisms, regulation and functions of the unfolded protein response. Nature Reviews Molecular Cell Biology. Nature Research; 2020. pp. 421–438. doi: 10.1038/s41580-020-0250-z
  • 44. Wijaya LS, Trairatphisan P, Gabor A, Niemeijer M, Keet J, Alcalà Morera A, et al. Integration of temporal single cell cellular stress response activity with logic-ODE modeling reveals activation of ATF4-CHOP axis as a critical predictor of drug-induced liver injury. Biochemical Pharmacology. 2021;190: 114591. doi: 10.1016/j.bcp.2021.114591
  • 45. Fredriksson L, Wink S, Herpers B, Benedetti G, Hadi M, de Bont H, et al. Drug-induced endoplasmic reticulum and oxidative stress responses independently sensitize toward TNFα-mediated hepatotoxicity. Toxicological Sciences. 2014;140: 144–159. doi: 10.1093/toxsci/kfu072
  • 46. Seki E, Brenner DA, Karin M. A liver full of JNK: Signaling in regulation of cell function and disease pathogenesis, and clinical approaches. Gastroenterology. 2012;143: 307–320. doi: 10.1053/j.gastro.2012.06.004
  • 47. Win S, Than TA, Zhang J, Oo C, Min RWM, Kaplowitz N. New insights into the role and mechanism of c-Jun-N-terminal kinase signaling in the pathobiology of liver diseases. Hepatology. John Wiley and Sons Inc.; 2018. pp. 2013–2024. doi: 10.1002/hep.29689
  • 48. Simmons SO, Fan C-Y, Ramabhadran R. Cellular Stress Response Pathway System as a Sentinel Ensemble in Toxicological Screening. Toxicological Sciences. 2009;111: 202–225. doi: 10.1093/toxsci/kfp140
  • 49. Wong MMK, Joyson SM, Hermeking H, Chiu SK. Transcription factor AP4 mediates cell fate decisions: To divide, age, or die. Cancers. MDPI AG; 2021. pp. 1–15. doi: 10.3390/cancers13040676
  • 50. Zuo L, Tan T, Wei C, Wang H, Tan L, Hao Y, et al. HOXB13 expression is correlated with hepatic inflammatory activity of patients with hepatic fibrosis. J Mol Histol. 2020;51: 183–189. doi: 10.1007/s10735-020-09868-7
  • 51. Delgado I, Fresnedo O, Iglesias A, Rueda Y, Syn WK, Zubiaga AM, et al. A role for transcription factor E2F2 in hepatocyte proliferation and timely liver regeneration. American Journal of Physiology - Gastrointestinal and Liver Physiology. 2011;301. doi: 10.1152/ajpgi.00481.2010
  • 52. B’Chir W, Maurin AC, Carraro V, Averous J, Jousse C, Muranishi Y, et al. The eIF2α/ATF4 pathway is essential for stress-induced autophagy gene expression. Nucleic Acids Research. 2013;41: 7683–7699. doi: 10.1093/nar/gkt563
  • 53. Pavel M, Imarisio S, Menzies FM, Jimenez-Sanchez M, Siddiqi FH, Wu X, et al. CCT complex restricts neuropathogenic protein aggregation via autophagy. Nature Communications. 2016;7: 1–18. doi: 10.1038/ncomms13821
  • 54. Grantham J. The Molecular Chaperone CCT/TRiC: An Essential Component of Proteostasis and a Potential Modulator of Protein Aggregation. Frontiers in Genetics. Frontiers Media S.A.; 2020. doi: 10.3389/fgene.2020.00172
  • 55. Reebye V, Huang KW, Lin V, Jarvis S, Cutilas P, Dorman S, et al. Gene activation of CEBPA using saRNA: Preclinical studies of the first in human saRNA drug candidate for liver cancer. Oncogene. 2018;37: 3216–3228. doi: 10.1038/s41388-018-0126-2
  • 56. Fusakio ME, Willy JA, Wang Y, Mirek ET, Baghdadi RJTA, Adams CM, et al. Transcription factor ATF4 directs basal and stress-induced gene expression in the unfolded protein response and cholesterol metabolism in the liver. Molecular Biology of the Cell. 2016;27: 1536–1551. doi: 10.1091/mbc.E16-01-0039
  • 57. Hao L, Zhong W, Dong H, Guo W, Sun X, Zhang W, et al. ATF4 activation promotes hepatic mitochondrial dysfunction by repressing NRF1-TFAM signalling in alcoholic steatohepatitis. Gut. 2020. [cited 6 May 2021]. doi: 10.1136/gutjnl-2020-321548
  • 58. Larigot L, Juricek L, Dairou J, Coumoul X. AhR signaling pathways and regulatory functions. Biochimie Open. 2018;7: 1–9. doi: 10.1016/j.biopen.2018.05.001
  • 59. Kersten S, Stienstra R. The role and regulation of the peroxisome proliferator activated receptor alpha in human liver. Biochimie. 2017;136: 75–84. doi: 10.1016/j.biochi.2016.12.019
  • 60. Yoshikawa T, Ide T, Shimano H, Yahagi N, Amemiya-Kudo M, Matsuzaka T, et al. Cross-talk between peroxisome proliferator-activated receptor (PPAR) α and liver X receptor (LXR) in nutritional regulation of fatty acid metabolism. I. PPARS suppress sterol regulatory element binding protein-1c promoter through inhibition of LXR signaling. Molecular Endocrinology. 2003;17: 1240–1254. doi: 10.1210/me.2002-0190
  • 61. Yoshikawa T, Shimano H, Amemiya-Kudo M, Yahagi N, Hasty AH, Matsuzaka T, et al. Identification of Liver X Receptor-Retinoid X Receptor as an Activator of the Sterol Regulatory Element-Binding Protein 1c Gene Promoter. Molecular and Cellular Biology. 2001;21: 2991–3000. doi: 10.1128/MCB.21.9.2991-3000.2001
  • 62. Boergesen M, Pedersen TA, Gross B, van Heeringen SJ, Hagenbeek D, Bindesboll C, et al. Genome-Wide Profiling of Liver X Receptor, Retinoid X Receptor, and Peroxisome Proliferator-Activated Receptor in Mouse Liver Reveals Extensive Sharing of Binding Sites. Molecular and Cellular Biology. 2012;32: 852–867. doi: 10.1128/MCB.06175-11
  • 63. Kusumanchi P, Liang T, Zhang T, Ross RA, Han S, Chandler K, et al. Stress-Responsive Gene FK506-Binding Protein 51 Mediates Alcohol-Induced Liver Injury Through the Hippo Pathway and Chemokine (C-X-C Motif) Ligand 1 Signaling. Hepatology. 2021. [cited 6 Sep 2021]. doi: 10.1002/hep.31800
  • 64. Manmadhan S, Ehmer U. Hippo Signaling in the Liver–A Long and Ever-Expanding Story. Frontiers in Cell and Developmental Biology. 2019;7: 33. doi: 10.3389/fcell.2019.00033
  • 65. Kyrmizi I, Hatzis P, Katrakili N, Tronche F, Gonzalez FJ, Talianidis I. Plasticity and expanding complexity of the hepatic transcription factor network during liver development. Genes and Development. 2006;20: 2293–2305. doi: 10.1101/gad.390906
  • 66. Nishikawa T, Bell A, Brooks JM, Setoyama K, Melis M, Han B, et al. Resetting the transcription factor network reverses terminal chronic hepatic failure. The Journal of Clinical Investigation. 2015;125: 1533. doi: 10.1172/JCI73137
  • 67. Schulte D, Geerts D. MEIS transcription factors in development and disease. Development. 2019;146. doi: 10.1242/dev.174706
  • 68. Berenguer M, Duester G. Role of Retinoic Acid Signaling, FGF Signaling and Meis Genes in Control of Limb Development. Biomolecules. 2021;11: 1–11. doi: 10.3390/BIOM11010080
  • 69. Garcia-Alonso L, Holland CH, Ibrahim MM, Turei D, Saez-Rodriguez J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Research. 2019;29: 1363–1375. doi: 10.1101/gr.240663.118
  • 70. Farooque U, Lohano AK, Dahri Q, Arain N, Farukhuddin F, Khadke C, et al. The Pattern of Dyslipidemia in Chronic Liver Disease Patients. Cureus. 2021;13. doi: 10.7759/cureus.13259
  • 71. Oliva L, D’Incà R, Medici V, Sturniolo GC. Metallothioneins and liver diseases. Metallothioneins in Biochemistry and Pathology. 2008; 289–316. doi: 10.1142/9789812778949_0014
  • 72. Huang G-W, Yang L-Y. Metallothionein expression in hepatocellular carcinoma. World Journal of Gastroenterology. 2002;8: 650. doi: 10.3748/wjg.v8.i4.650
  • 73. Devisscher L, Campenhout S van, Lefere S, Raevens S, Tilleman L, Nieuwerburgh F van, et al. Metallothioneins alter macrophage phenotype and represent novel therapeutic targets for acetaminophen-induced liver injury. Journal of Leukocyte Biology. 2021; 1–11. doi: 10.1002/JLB.3A0820-527R
  • 74. Lefebvre V. The SoxD transcription factors–Sox5, Sox6, and Sox13 –are key cell fate modulators. The International Journal of Biochemistry & Cell Biology. 2010;42: 429–432. doi: 10.1016/J.BIOCEL.2009.07.016
  • 75. Wang Y, Ristevski S, Harley VR. SOX13 exhibits a distinct spatial and temporal expression pattern during chondrogenesis, neurogenesis, and limb development. Journal of Histochemistry and Cytochemistry. 2006;54: 1327–1333. doi: 10.1369/jhc.6A6923.2006
  • 76. Liu L, Yannam GR, Nishikawa T, Yamamoto T, Basma H, Ito R, et al. The microenvironment in hepatocyte regeneration and function in rats with advanced cirrhosis. Hepatology. 2012;55: 1529–1539. doi: 10.1002/hep.24815
  • 77. Guzman-Lepe J, Cervantes-Alvarez E, l’Hortet AC de, Wang Y, Mars WM, Oda Y, et al. Liver-enriched transcription factor expression relates to chronic hepatic failure in humans. Hepatology Communications. 2018;2: 582–594. doi: 10.1002/hep4.1172
  • 78. Yamagishi S, Matsui T. Role of receptor for advanced glycation end products (RAGE) in liver disease. European Journal of Medical Research. 2015;20. doi: 10.1186/S40001-015-0090-Z
  • 79. Zeng M, Liu W, Hu Y, Fu N. Sumoylation in liver disease. Clinica Chimica Acta. Elsevier; 2020. pp. 347–353. doi: 10.1016/j.cca.2020.07.044
  • 80. Gautier L, Cope L, Bolstad BM, Irizarry RA. Affy - Analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004;20: 307–315. doi: 10.1093/bioinformatics/btg405
  • 81. Kauffmann A, Gentleman R, Huber W. arrayQualityMetrics - A bioconductor package for quality assessment of microarray data. Bioinformatics. 2009;25: 415–416. doi: 10.1093/bioinformatics/btn647
  • 82. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Research. 2002;30: 207–210. doi: 10.1093/nar/30.1.207
  • 83. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2020. Available: http://www.r-project.org/
  • 84. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27: 1739–1740. doi: 10.1093/bioinformatics/btr260
  • 85. Dolgalev I. msigdbr: MSigDB Gene Sets for Multiple Organisms in a Tidy Data Format. 2020.
  • 86. Garcia-Alonso L, Ibrahim MM, Turei D, Saez-Rodriguez J. Benchmark and integration of resources for the estimation of human transcription factor activities. bioRxiv. 2018; 337915. doi: 10.1101/337915
  • 87. Smedley D, Haider S, Durinck S, Pandini L, Provero P, Allen J, et al. The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Research. 2015;43: W589–W598. doi: 10.1093/nar/gkv350
  • 88. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015;43: e47. doi: 10.1093/nar/gkv007
  • 89. Türei D, Korcsmáros T, Saez-Rodriguez J. OmniPath: Guidelines and gateway for literature-curated signaling pathway resources. Nature Methods. 2016;13: 966–967. doi: 10.1038/nmeth.4077
  • 90. Türei D, Valdeolivas A, Gul L, Palacio-Escat N, Klein M, Ivanova O, et al. Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. Molecular Systems Biology. 2021;17: e9923. doi: 10.15252/msb.20209923
  • 91. Makowski D, Ben-Shachar MS, Patil I, Lüdecke D. Methods and Algorithms for Correlation Analysis in R. Journal of Open Source Software. 2020;5: 2306. doi: 10.21105/JOSS.02306
  • 92. Dewey M. metap: meta-analysis of significance values. 2020.
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010148.r001

Decision Letter 0

Mark Alber, James Gallo

22 Feb 2022

Dear Ms. Liu,

Thank you very much for submitting your manuscript "Deriving time-concordant event cascades from gene expression data: A case study for Drug-Induced Liver Injury (DILI)" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

James Gallo

Associate Editor

PLOS Computational Biology

Mark Alber

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This manuscript introduces “first activation” as the method to quantify time concordance to provide evidence for causality. The authors perform a case study on a liver dataset from repeat-dose study in rats using the gene expression and histopathology data. This manuscript uses known evidence and prior biological knowledge in DILI to validate the proposed methodology and points out some potential new TF interactions.

I think this manuscript is well-written and organized but could be improved with some additional analysis. I have the following detailed comments.

General comments:

1. The key concept of this manuscript is “first activation” and the authors argue that this is generalizable and automatable to other topics. However, the discussion on this concept is quite limited, and should be extended. How the authors defined “first activation” was only mentioned in Methods and should be written in the main text. The threshold seems to be very important as this could lead to distinct results and authors should discuss how robust the current threshold is.

2. The authors mainly used 10 known events in DILI for their case study, but as they mentioned, they identified hundreds of both pathway-level and TF-level events in the datasets. Although using well-known events to validate the methodology is very much needed, analysis on some other less well-studied events as examples should be also discussed.

Minor points:

1. This manuscript focuses on one case study, a liver dataset from repeat-dose studies in rats. A figure showing the concept of this experiment, depicting the collection timepoints, doses, etc., would be helpful and is needed.

2. To establish causality, there are multiple rules in the Bradford-Hill criteria. While this manuscript focuses only on temporal order, the other rules should also be discussed when inferring causality.

3. Results that rely on a method described in the Methods section should indicate this.

4. Line 72, AE is first mentioned here and needs to be defined in full.

5. Fig 1, defining A,B,C,D as candidates preceding events in panel A) causes confusion in panel B), where B is defined as the potential later event.

6. More description is needed for S1 Table; it is not clear what + or ++ means.

7. The reasoning behind choosing 10% and 50% as the thresholds is needed.

8. Fig2A, unclear about what is Number of adverse conditions vs Fraction of adverse conditions.

9. Fig3 right panel, texts overlap.

10. Fig 4, the figure should be reorganized. The text of ‘Up-regulated’ and ‘Down-regulated’ is unnecessary since there are color codes for it. The order of these ten events is reversed relative to Fig 3. Why is oxidative stress missing?

11. Fig5 is duplicated 3 times in the manuscript.

12. Fig5, ‘If the event was over- represented before adverse histopathology (p-value < 0.05) the point was additionally circled in black’, very hard to see the black circle in figure, need to change to a different coloring scheme.

13. Fig 5, a lot of individual genes are highlighted, but the text does not mention why they were highlighted.

14. The captions and legends of figures in this manuscript need substantial improvement. The audience should be able to get the key ideas from the figure by looking at the figure and caption and legends.

15. Fig 8B, description on the legend should be clearer in the captions. For example, ‘active_tf’ should be described. Also the axis label ‘TPR(Expr|Before or at TF&adverse)’ needs description.

16. The upper or lower case of term (e.g. TF) should be kept consistent between figures and texts. For example, Line 373, ATF4 while in Fig 8B it’s Atf 8. There are many other places having this inconsistency.

17. Line 375, any idea why Ahr never precedes TF?

18. Fig 9. Lighter gray and darker gray are very hard to distinguish, choose another color legend.

Reviewer #2: Reviewer’s Report

Title: Deriving time-concordant event cascades from gene expression data: A case study for Drug-Induced Liver Injury (DILI)

Reviewer's report:

The authors of this manuscript study time-concordant pathway cascades using the concept of first activation to generate hypotheses on potentially causal mechanisms following the Bradford-Hill criteria. They use liver data from repeat-dose studies in rats to study time-concordant gene expression-derived events preceding adverse histopathology, which serves as surrogate readout for Drug-Induced Liver Injury (DILI). While this manuscript is based on an interesting hypothesis, the manuscript could be improved.

Major Compulsory Revisions:

This study uses association analysis to discover the mechanisms of adverse outcome pathways. When a cause–effect relationship is to be established based on first activation:

How do authors discriminate direct effects from indirect effects, especially with limited time points with gaps in time? For example, gene C is an unobserved confounder for the association between gene A and B but A is not a cause of B.

How do authors account for transient changes of the genes due to stochastic variations or biological fluctuations from unwanted external factors that may be random?

Some genes may appear too significant because of their role as network hub genes in the dataset that are involved in many biological processes. These hub genes are connected to many upstream regulators so their occurrence in cause-effect relationship could be general and unspecific unless integrated with prior knowledge. I would like to hear the author's response on how this could affect their inferences.

I) Methods:

Some of the important details of the methods are scattered in results and method sections. The methods section of this manuscript could be further improved by including details on methodology. Please refer to my comments below. I would strongly recommend a flowchart to improve readability.

Authors state in the abstract section: “from the TG-GATEs database which comprises measurements across eight timepoints, ranging from 3 hours to 4 weeks post-treatment”. How frequent are the time points? How does the gap in time points affect false predictions? Also, it is not clear how many biological replicates are used for different time points*dose.

While the abstract mentions the TG-GATEs database, which comprises measurements across eight timepoints, ranging from 3 hours to 4 weeks post-treatment, the results section states: “These adverse histopathology labels were next used to define 61 time-series associated, covering 38 compounds, as adverse (S2 Table).” It is not clear whether the authors used 61 time series which had similar time points.

“To evaluate which pathway or TF is dysregulated, we computed the differential activity in comparison to the time-, vehicle-, and experiment-matched control group using limma and identify significantly dysregulated gene sets with a False Discovery Rate (FDR) < 0.05.” What are the exact comparisons authors performed?

“For pathways and TFs, we defined first activation as the earliest time of measurement at which significant differential regulation was observed (FDR < 0.05) in each direction, while an additional logFC cut-off has been implemented for individual genes.” It is not clear how first activation of a pathway is defined. Do you consider a pathway to be activated even if one TF is expressed?
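
The first-activation definition quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `first_activation`, the `(time, logFC, FDR)` record layout, and the example values are all hypothetical. Only the FDR < 0.05 cut-off and the per-direction tracking come from the quoted text.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record layout: (time in hours, logFC, FDR) for one gene set
# (pathway or TF regulon) within a single compound/dose time series.
Record = Tuple[float, float, float]


def first_activation(
    records: Iterable[Record], fdr_cutoff: float = 0.05
) -> Dict[str, Optional[float]]:
    """Return the earliest significant timepoint per direction, or None."""
    first: Dict[str, Optional[float]] = {"up": None, "down": None}
    # Sorting tuples orders the records by their first element, the time.
    for time, logfc, fdr in sorted(records):
        if fdr >= fdr_cutoff or logfc == 0:
            continue  # not significantly dysregulated at this timepoint
        direction = "up" if logfc > 0 else "down"
        if first[direction] is None:
            first[direction] = time  # keep only the earliest occurrence
    return first


# Made-up values, for illustration only.
example = [(3, 0.2, 0.40), (6, 0.9, 0.01), (24, -1.1, 0.03), (72, 1.5, 0.001)]
result = first_activation(example)
```

In this sketch a gene set can have a first activation in both directions within the same time series, which is one possible reading of "in each direction" in the quoted sentence.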

II) Results:

“These adverse histopathology labels were next used to define 61 time-series associated, covering 38 compounds, as adverse (S2 Table)”: What were the initial time series, and how did you define 61 time series based on histopathology labels? If the authors removed time points for downstream analysis, how does this bias the predictions?

“Fig 3: Enrichment of known events in DILI before adverse histopathology based on gene sets as well as individual gene members.” When you state gene sets, which comparisons did you perform? It is not clear from the writing.

Reviewer #3: This study has several strengths, including the time-resolved transcriptomics demonstrations. Importantly, the authors should be commended for their comprehensive efforts. The manuscript is clear and easy to read.

A discussion of the results in relation to the more general issue of the TFs and Drug-Induced Liver Injury is lacking.

Reviewer #4: First, I’d like to thank Dr. Gallo, associate editor of PLOS Computational Biology, for the opportunity to provide a peer review of the work “Deriving time-concordant event cascades from gene expression data: A case study for Drug-Induced Liver Injury (DILI)” by Anika Liu and collaborators.

“Adverse outcome pathway, or AOP,” is a conceptual framework in toxicology that attempts to link a molecular initiating event with a defined adverse outcome. Liu and collaborators propose to identify novel potentially causal pairs of AOP-framework related key events (KE) and adverse outcomes (AO) by means of analysing temporal associations between these elements in an automated, data-driven way, with the aim of aiding in identifying novel AOPs in the future.

The authors propose to achieve this goal by using several metrics obtained from a time series dataset of liver gene expression deposited in the TG-GATEs database. These data, together with liver histopathologies, were obtained from the livers of rats that had been treated with a battery of xenobiotics. After normalising the microarray gene expression data, the authors used the first significantly differentially expressed gene as a potential KE and capitalised on a previous exploitation of the same database that rendered scores of the histopathologies (as the AOs) to construct a model that captures the temporal dimension of the Bradford Hill criteria for causality [ref. 7 in the authors’ manuscript] to prioritise associations with some support of causality. Since capturing causality in the detected associations is a pillar motivation of this work, the authors limited their usable data to the perturbations, histopathology findings, and pathways that were relevant for the DILI pathology. This necessarily affected the capacity to generate pathway hypotheses that were purely data-driven, which is a central motivation of this work. However, despite the shortcomings that the authors nicely describe in a “Limitations of this study” section, they did a great job exploiting gene expression data to derive multiple metrics (significant enrichment, logFC and frequency/TPR) that combined maximise the ability to prioritise first event-outcome pairs from the database. This work is thus a valuable contribution to the global effort currently being undertaken to maximise the ability to exploit datasets deposited in countless databases, possibly serving not only as inspiration for novel ways of extracting important information from data, but also as a primer that can be further developed. Although I am not an expert in the toxicology domain, I believe that, given the linearity of the definition of AOP (Ankley et al., 2010 [ref. 5]; Leist et al, 2017 [ref. 4]), and the fact that it does not require full knowledge of the mechanism or pathway, the proposed method in this study is appropriate for its ambition of helping to establish new AOP hypotheses.

That said, I believe the readership would enjoy a better flow if the descriptions of the datasets, assays and methods were more detailed. For example, although the authors make a substantial depiction of the extracted data in Figure 2C, which includes insight on the TG-GATEs assays study design, I still fail to find there or elsewhere in the manuscript important information: it does not include details like the number of replicates performed per time series, a description of how the serial or “repeat-dose” treatment of animals was performed, what kinds of controls were used, how the rat livers were collected for histopathology and for gene expression measurement, strains of the animals, how much (or what fraction) of the dataset was excluded from the current analysis, etc. A supplementary figure detailing the TG-GATEs study design could be sufficiently effective in facilitating a better understanding of the methodology that was subsequently applied in this study.

Likewise, the authors should be more explicit in describing parameters and formulas, specifically of the hypergeometric distribution they used in their models. Additionally, they may want to describe in more detail some of the packages used. For example, on page 29 line 577, the authors refer to the use of the Limma package. The narrative could be more intuitive for the general audience if the authors add a brief description of the methods applied.

The figures and tables in the main document and supplementary material are good and generally adequate, but the authors may want to add a Q-Q plot associated with figure 3 to show the calibration of their hypothesis test. The authors should also be more detailed and descriptive in some of the legends. I list below some cases that could be improved (e.g., adding a legend to File S2, and a better description of Table S2), but maybe the authors will find more opportunities of improvement at other instances.

The abundance of detail in presenting their findings of the case study overshadows the core contribution of a novel methodology to derive the time-concordant events in the first place. The manuscript would benefit greatly if those two aspects would be presented each with more focus. One way of achieving this could be to generalise the examples that the authors curated into schematic models that highlight the strengths of the model: where in the generalized examples is the model contributing that a non-data-driven, non-hypothesis-free method would not.

Finally, I believe that the authors will have a more engaged audience, enjoying a better reading flow and understanding of the intended reasoning, after the authors make a new round of careful editing, either themselves or with the help of a professional scientific editor.

I share below more specific comments and suggestions, by section in the main document. Note that the comments are not ordered by importance, but by approximate order of appearance in the manuscript. Where only a subsection title is shown, it means there are no further comments for that section or figure.

Abstract

1. Adverse Outcome Pathway is not generally known. I suggest that the authors link the expression “Adverse Outcome Pathway” with the term “toxicity” early in the abstract (optimally in the first sentence), if they intend to help the reader immediately identify the domain of the work in the main abstract.

2. Page 2, line 33-34: When referring to “significance, frequency and log fold change (logFC)”, the authors may want to specify that “gene expression” is the object of the metrics they are referring to.

Introduction

3. Acronym inconsistencies:

Page 4, line 72: The AE acronym has not yet been introduced/defined in my version of the manuscript. The authors may want to define it here.

Page 4, line 76: The acronym for mode of action is MOA (as seen in Meek, 2014), therefore “mode of action” should include capitalisation of the three words, or no capitalisation at all. It could also be contained in brackets.

Page 4, line 87: Defining an acronym with another acronym may be too distracting for the reader. Maybe the authors would like to replace “KE relationships (KERs)” with “key event relationships (KERs)” for the sake of text fluidity/readability.

4. Definition inconsistencies:

Page 4, lines 77-80: The description of the criteria to evaluate plausibility may be confusing, in particular the explanations in parenthesis. It is not clear what “A” and “B” refer to. Later in the manuscript, “A” refers to a key event and “B” to an adverse outcome. This does not seem to be what the authors intend to convey here. For example, to describe “incidence concordance” the authors write “(The magnitude of event A is larger than that of event B)”. However, a quick check of the referenced work of Meek et al, 2014 shows incidence is explained with the question: “Is the occurrence of the end (adverse) effect less than that for the preceding key events?”. I fail to interpret the two explanations to mean the same.

It could be that I am completely missing the point of the authors, but for the sake of clarity and fluidity, the authors may want to put the general parameters/events described in the intro in context of the actual models and kind of data that is being presented throughout the manuscript (if need be, with examples).

On a similar note, in Figure 1 and across the manuscript, the authors may want to standardise the definitions of: “A”, “B”, “event”, “early event”, “later event”, “potential preceding event”, “potential later event”, “key event”, “anchoring event”, and “outcome”. It is not intuitive to use the word “event” for the “later event” and, if the authors believe it makes sense, they may want to distinguish “event” when it means “early event” from when it means “later event” by using e.g. the word “outcome” instead, across the manuscript, figures and legends, for the latter (i.e., use “outcome” every time the word “event” is used in the sense of the “later event”). If, for some sensible reason, this is not advisable, at least the word “event” should always be accompanied by the same preceding word, e.g., “early” and “anchoring” (or “later”).

5. Figure 1 (Page 6, line 114):

Panel 1A: I understand that “A”,”B”,”C” and ”D” in panel 1A are instances of “A” in the model (“early event” or “potential preceding event”), as defined in the panel 1B, while “B”, in panel 1B (the model), is the “potential later event”. In Panel 1A, “A”,”B”,”C” and ”D” could be replaced by e.g., “I”,”II”,”III”, and “IV”. This way, “A” should always represent “preceding event” and “B” “later event or outcome” in the figure and across the manuscript.

Panel 1B) As already stated, in case the authors also believe it makes sense, they may want to improve readability and consistency by defining “B” as “potential later outcome”.

Results and Discussion

Adverse histopathological findings and their temporal relation

6. Notation inaccuracy

Page 7, line 150: I believe the authors are referring to intervals, but the notation used is ambiguous. Using explicit intervals will help readership: “null” is within (0, 0.67] and “low” is (0.67, 1.34] ?

7. Typos

Page 7, line 138: The authors may want to use the plural of the word “hypothesis”, i.e., “hypotheses”

Page 8, line 163: The subject seems to be missing at the end of the excerpt: “labels were next used to define 61 time-series associated”.

Page 8, lines 165 and 166: If by “insult” the authors mean “adverse outcome”, it would be better to use the latter.

8. Figure 2 (Page 8, line 168)

Panel 2A: Colors are hard to distinguish: the authors may want to enlarge the label circles (in the legend) and consider changing colors or, since there seem to be pairs of harder to distinguish colors, add a pattern to one within each pair.

Figure 2 legend:

Page 9, line 172: see comment for Page 7, line 150

Page 9, line 174: panel 2B does not show frequencies, but number of conditions.

Page 9, line 177: numbered reference missing after “Sutherland et al”.

Known pathways in DILI preceding adverse histopathology

9. Typos and omissions

Page 9, line 185: replace “(Fig 3).” with: “(Fig 3 and Table S3).”

Page 10, line 208: perhaps authors want to replace “processes except Growth Arrest And (…)” with: “processes except genes involved in Growth Arrest and (…)”

10. Figure 3 (Page 10, line 193)

- It would be useful to show how well the data is following the model: to that effect, the authors may want to add Q-Q plots to this figure.

Figure 3 legend:

Page 10, line 197: The LXR acronym is not defined anywhere in the manuscript.

11. Figure 4 (Page 12, line 229)

- This plot attempts to illustrate the strength of pathway perturbation. For the first panel, the authors may want to display the distribution of the “all” datapoint. This could be achieved by superimposing violin instead of box plots for this panel in particular. (The same extra information could be displayed in other figures anytime that the data is not sparse, usually, when the “all” datapoints are presented - Figure 3, Figure 6A and 6C).

12. Figure 5 (Page 13, line 239) and related Figure S3 (Supplementary file)

- It could be interesting to see how the data vary with varying FDR (actual precision-recall curves).

- Note that figure 5 is repeated, at least on my copy of the document: Page 13, line 239, line 242 and page 14 line 254 show the same image.

Figure S3, related with Figure 5 (Supplementary file)

- Although these plots expose a number of metrics, it is not intuitive what one can learn from them. The authors may want to expand a bit more.

Known TFs in DILI preceding adverse histopathology

13. Figure 6 (Page 16, line 297)

Figure 6 legend:

- It is not clear what “all” refers to (all the TFs in the genome?).

14. Add reference to a figure in text

Page 15, line 292: In the sentence “(…)as most frequent event indicating that the(…)”, the authors may want to add a reference to Figure 6B: “(…)as most frequent event (fig.6B) indicating that the(…)”

Data-driven prioritisation of cellular events taking place before adverse histopathology

15. File S2 (Supplementary file)

Page 16, line 311: File S2 - Is a supplemental spreadsheet, first referred on Page 10, line 202 - it would be useful if the authors added a legend describing the columns names, e.g., in a new tab.

16. Typos, clarification and nomenclature consistency

Page 17, line 323: In addition to the acronym, “UPR” should be described.

Page 17, line 320-331: The authors may want to contextualize the analysis described in the 4 sentences starting in “The most frequent(…)” and ending in “(…)the most strongly down-regulated TFs.” relative to their inclusion as known or plausible events in a more specific way, in order to cater for the readership that is not expert in DILI.

Page 17, line 329: Gene names: please make sure you use this journal’s gene names convention and use it consistently across the manuscript.

Figure 7

Identifying mechanistic hypotheses combining known TF functions and time concordance

17. TF activity metrics clarification

- The authors may want to explain how regulon activity is measured, or what is the output of the packages that extract this data (here or/and in the material and methods section, see my point 25.1).

18. Figure 8 (Page 19, line 359)

Figure 8 B: It’s unclear what the units for transcription factor activity are or mean.

19. Figure 9 (Page 21, line 408)

19.1 - It’s hard to distinguish what is diamond from circles because the gene names prevent visibility; perhaps the shapes size could be increased, for example, or/and the names could be non-superimposing the shapes, or the color scheme changed.

19.2 - The shadings of grey are nearly indistinguishable for the important metrics of “Only at the same time” or “Ordered or at the same time”, authors may want to use a different color scheme

Time-concordant events reflecting disease progression

20. Figure 10 (Page 23, line 444)

Figure 10 legend:

Page 23, line 448 - typo in word “correlation”

Limitations of this study

Conclusions

21. Transforming specific pathway examples into a generalised model schematics depicting the strengths (and possibly shortcomings) of the method:

In addition to detailing the pathways themselves, it would be more useful to the readership interested in the methodology to use generalised terms that would explain the capabilities of this model. One way the authors could achieve this is by drawing a schematic illustrating how the authors’ method was able to confirm an example of known events in DILI, and contrasting it with other generalised example(s) where the model was able to fill in the gaps in knowledge, at different levels.

22. Page 27, line 524: Link to Shiny app is missing at this location

Methods

Open TG-GATES data processing

Definition of adverse histopathology

23. Expand on methodology details

Page 28, line 563: More details should be given. Please be explicit about the parameters and equation used in the hypergeometric test. Also, please provide intuition for the choice of hypothesis test (i.e. why the hypergeometric distribution).

Page 28, line 559: The authors may want to better define DILIst and DILIrank, and explain Table S2 further.

24. Typos

Page 27, line 539: Typo: Replace “To characterize the extend…” with “To characterize the extent…”

Page 27, line 542-3: And replace: ”(…)and then summarised across all biological replicates by mean as an aggregate(…)” with: “(…)and then averaged across all biological replicates as an aggregate(…)”

Page 27, line 550: see comment 6. (Page 7, line 150)

Pathway and TF activity inference

25. Expand on methodology details

25.1 - Page 28, line 567-8: The authors may want to give a more granular description of the TF activity output by the package(s) mentioned. What does TF activity data look like? It could also be useful to include a short explanation of the choices made.

25.2 - Page 28, line 575-7: The authors may also want to describe exactly which methods were used, in addition to the packages that assisted in the implementation of those methods.

25.3 - A thorough description of the TG-GATEs dataset must also be shared in order to effectively understand the analysis. As suggested earlier, this could include a figure.

25.4 - In addition to a schematic figure describing the assays, it would be most useful to include more schematics of the authors’ pipeline.

Temporal concordance of events

26. Page 30, line 598: The authors may want to share the parameters and equation used in the hypergeometric test.
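
For orientation, one common form of the hypergeometric over-representation test requested in the comment above can be sketched as below. This is a generic illustration under stated assumptions, not the test as parameterised in the manuscript: the function name `hypergeom_upper_tail`, the symbols N, K, n and k, their suggested mapping to time series in the comments, and the example numbers are all hypothetical. The quantity computed is the upper tail P(X >= k) = sum over i from k to min(K, n) of C(K, i) * C(N - K, n - i) / C(N, n).

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    Assumed (hypothetical) mapping for a time-concordance setting:
      N = all time series considered
      K = time series labelled adverse
      n = time series in which the candidate event is observed
      k = adverse time series in which the event is observed beforehand
    """
    lowest = max(k, 0)
    highest = min(K, n)
    # math.comb returns 0 when more items are chosen than are available,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(lowest, highest + 1)
    )
    return favourable / comb(N, n)


# Made-up numbers, for illustration only.
p_value = hypergeom_upper_tail(k=2, N=10, K=4, n=3)
```

Stating the four parameters explicitly, as in the docstring, is the kind of detail that would let a reader reproduce the reported p-values.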

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: None

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010148.r003

Decision Letter 1

Mark Alber, James Gallo

26 Apr 2022

Dear Ms. Liu,

We are pleased to inform you that your manuscript 'Deriving time-concordant event cascades from gene expression data: A case study for Drug-Induced Liver Injury (DILI)' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

James Gallo

Associate Editor

PLOS Computational Biology

Mark Alber

Deputy Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: My comments are addressed in the revised manuscript.

Reviewer #3: None

Reviewer #4: The authors have satisfactorily addressed my concerns.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: None

Reviewer #3: Yes

Reviewer #4: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

Reviewer #4: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010148.r004

Acceptance letter

Mark Alber, James Gallo

30 May 2022

PCOMPBIOL-D-21-02220R1

Deriving time-concordant event cascades from gene expression data: A case study for Drug-Induced Liver Injury (DILI)

Dear Dr Liu,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsanett Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    Supplementary Materials

    S1 Table. Comparison of quantitative Adverse Outcome Pathway (qAOP) models.

    Comparison of the first activation concept and other qAOP models with respect to their potential roles in AOP development.

    (DOCX)

    S2 Table. Compounds classified as adverse based on histopathology and concordance with previous annotations.

    Adverse doses are listed for each compound based on the annotations by Sutherland et al. [15], who classified each compound at each measured dose as adverse or non-adverse at day 4 and day 29. Also included are the binary classification as adverse (1) or non-adverse (0) from DILIst [29] and the vDILIConcern and Severity Class classifications from DILIRank [30], which describe evidence for liver side effects observed in humans derived from post-marketing data.

    (DOCX)

    S3 Table. Time concordance metrics for Reactome pathway maps which map to known key events based on literature review.

    (CSV)

    S4 Table. Time concordance metrics for top 10 ranking events by True Positive Rate (TPR), significance and median max. |logFC|.

    (DOCX)

    S5 Table. TF-TF relations supported by known relations and time concordance.

    For TF events which are significantly enriched before or at adverse histopathology, known interactions supported by time concordance are shown. For each interaction, the absolute and relative frequencies with which the source TF was observed “before” or “before or at” downstream TF activity are given. Additionally, the sources of the interactions provided in Omnipath are shown for protein-protein interactions, and the DoRothEA confidence level is shown for TF-target gene interactions.

    (DOCX)

    S1 Fig. Open TG-GATEs study design.

    Six-week-old male Crl:CD Sprague-Dawley (SD) rats were treated with a range of compounds using daily repeat-dosing. For each compound, four doses were used, including a vehicle control, and samples were taken at 8 timepoints. For each combination of compound, timepoint and dose, histopathology was annotated and gene expression was measured for 3 replicates.

    (PDF)

    S2 Fig. Distribution of toxscores across histopathological findings.

    (PDF)

    S3 Fig. Frequency of histopathological findings before and after first adverse histopathology.

    For adverse and non-adverse histopathological findings, the frequency before or at first adverse histopathology is shown (left). Adverse findings cannot, by definition, occur before the first adverse histopathology, so for them this frequency indicates how often they were among the first adverse findings. Single-cell necrosis at any severity (“null”) is the most frequent finding, in both absolute and relative terms.

    (PDF)

    S4 Fig. Background distribution of temporal association metrics across pathway and Transcription Factor (TF) events.

    The dependency between different metrics is shown. A) Frequency of events by median max. |logFC| before or at histopathology and enrichment p-value. B) Frequency of events by true positive rate (TPR) and positive predictive value (PPV) before or at adverse histopathology. C) Direct relation between TPR, PPV and enrichment p-value. D) Direct relation between TPR, PPV and frequency in background time-series.

    (PDF)

    S1 File. Removed outlier samples.

    (CSV)

    S2 File. Time concordance metrics for all TFs, pathways and genes, using minimal |logFC| thresholds of both 0.5 and 1.

    (XLSX)

    Attachment

    Submitted filename: point2point_AB.docx

    Data Availability Statement

    Histopathology data was derived from the supplementary information of Sutherland et al. (2018), accessed through https://doi.org/10.1038/tpj.2017.17. TG-GATEs gene expression data was derived from the Life Science Database Archive (https://dbarchive.biosciencedbc.jp/en/open-tggates/download.html). The files and code for the Shiny app are deposited in GitHub (https://github.com/anikaliu/DILICascades_App) and Zenodo (doi:10.5281/zenodo.5767783).


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
