Summary
Nonalcoholic steatohepatitis (NASH) is the most common chronic liver disease globally and a leading cause of liver transplantation in the US. Its pathogenesis remains imprecisely defined. We applied two high-resolution modalities, machine learning (ML)-based quantification of histological features and transcriptomics, to tissue samples from NASH clinical trials to identify genes that are associated with disease progression and clinical events. A histopathology-driven 5-gene expression signature predicted disease progression and clinical events in patients with NASH with F3 (pre-cirrhotic) and F4 (cirrhotic) fibrosis. Notably, the Notch signaling pathway and genes implicated in liver-related diseases were enriched in this expression signature. In a validation cohort where pharmacologic intervention improved disease histology, multiple Notch signaling components were suppressed.
Keywords: machine learning, NASH, transcriptomics, histology, pathology, pathogenesis, prognosis, fibrosis
Highlights
• Machine learning-based histology and transcriptomics reveal a NASH gene signature
• Expression of the 5-gene signature distinguishes stage F3 from F4 fibrosis
• The 5-gene signature is also associated with risk of clinical events
• Signature gene JAG1 validates Notch signaling role in a clinical cohort
Conway et al. identify a 5-gene signature associated with severe (F3 and F4) NASH, an increasingly prevalent disease with no available treatments. Higher expression of this signature correlates with greater disease severity and with risk of progression. This signature may provide insight into NASH pathogenesis and potential therapeutic targets.
Introduction
Nonalcoholic fatty liver disease (NAFLD) is a common clinical disorder representing the hepatic manifestation of metabolic syndrome. The combination of cellular injury, inflammation, and fibrosis characterizes the subset of patients with NAFLD with progressive nonalcoholic steatohepatitis (NASH).1 Patients with NASH with fibrosis progressing to cirrhosis have an increased risk of liver-related complications, and consequently, NASH is a leading cause of liver transplantation in the US.2,3 To better understand the etiology of NASH, genome-wide association studies (GWASs) have identified genetic risk factors associated with higher likelihood of NASH and progressive disease.4,5 The clinical consequence of these at-risk single-nucleotide polymorphisms (SNPs) is best illustrated by PNPLA3 (rs738409), while SNPs in other genes such as HSD17B13 (rs72613567) confer a decreased risk of NASH.4,6,7,8,9,10 The contribution of these SNPs in driving NASH pathogenesis, however, remains unclear.
The role of these alleles in NASH pathogenesis is poorly defined because multiple genes and environmental factors contribute to the NASH phenotype. This gene-environment interaction is best illustrated by the absence of liver disease in normal-weight individuals with the PNPLA3 risk allele, the variant most highly associated with progressive liver disease and clinical outcomes.11 Comparative analyses of RNA from tissues from patients with mild and severe forms of NASH have been limited by their reliance on cross-sectional data from patients with very early and advanced NASH.12,13 A potentially useful advance would be to prospectively characterize genomic-histology relationships in patients with advanced NASH fibrosis, who are more likely to have clinical manifestations. In this way, genomic signatures may be derived that have associations with clinically relevant outcomes.
Another limitation of analyzing genetic associations with histologic features is the methodology associated with NASH histology interpretation. The NASH Clinical Research Network (CRN) scoring system (comprising the NAFLD Activity Score [NAS] and fibrosis score) represents an ordinal approach to a disease with a varied and linear spectrum of injury.14 Moreover, the original NASH CRN implicated, but did not incorporate, features such as portal inflammation and ductular reaction that have been proposed as important in NASH pathogenesis.15,16 One methodology that may overcome these limitations is the use of machine learning (ML)-based identification and quantification of NASH histology. Using high-resolution assessment of NASH histology, trained models exhaustively annotate all liver tissues and features relevant to NASH disease and, from this, generate human-interpretable features (HIFs) that describe relationships between features and tissues to produce quantitative and reproducible results.17,18 Our recent work demonstrated strong concordance with human pathologic findings and provided a quantifiable and perfectly reproducible system to characterize NASH histology at the whole-slide level.17 This allowed us to capture data on the spectrum of histologic features and describe their changes in response to therapies.19
Due to the granularity of the histologic features afforded by the ML-powered pathology approach, we hypothesized that integrating hepatic transcriptomic data with ML-quantified histology would identify biological pathways relevant to NASH pathogenesis in those patients most at risk for liver-related complications. The comprehensive nature of ML-based assessment enables the exploration of histologic-genomic associations not accessible in previous studies where genomics was combined with traditional histology interpretation.13 This study integrates ML-based histopathologic features associated with clinical outcomes with RNA sequencing (RNA-seq) data extracted from liver biopsies to identify ML-model-predicted genes associated with advanced NASH (F3/F4 fibrosis) and clinical outcomes. From this, we defined a 5-gene expression signature that correlates strongly with a patient’s risk of disease progression and demonstrates the central role of Notch signaling.
Results
ML enabled identification of histology and genomic correlates
The workflow developed for integrating histologic and genomic associations to identify potentially novel transcriptomic signatures that correlate with NASH severity and progression is shown in Figure 1. ML-predicted HIFs were selected that had previously been shown to comprehensively characterize NASH histology and the relationship of these features to clinical outcomes.17 The nine HIFs were measures of the proportional areas of steatosis, hepatocellular ballooning, lobular inflammation, fibrosis, portal inflammation, bile duct/ductules, hepatocellular swelling, normal hepatocytes in the tissue, and the ratio of steatosis to ballooning (Figure 2A). In a discovery cohort consisting of paired biopsy and transcriptomic data from two clinical trials that enrolled patients with pre-cirrhotic F3 and cirrhotic F4 fibrosis,20 lasso linear regression analysis was used to evaluate the ability of each individual HIF to predict the expression of 16,500 genes identified by RNA-seq analysis of tissue extracted from the same blocks as the histology slides. This lasso linear regression model significantly predicted a subset of all expressed genes, referred to as the HIF-selected gene group (∼3,000 genes with Pearson correlation > 0.5 and Bonferroni corrected p value < 0.001), that were the most significantly associated with any of the nine HIFs. HIFs with the highest average lasso regression coefficients for the HIF-selected genes were then selected as significantly predictive of gene expression (Figure 2A). HIFs that characterize the proportional areas of portal inflammation, bile duct/ductules, and fibrosis had the greatest average coefficients for the HIF-selected genes. These three predictive HIFs were used for subsequent histologic-transcriptomic analyses (Figure 1B).
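The regression step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it fits a lasso model for a single synthetic "gene" from nine synthetic HIF columns using a plain coordinate-descent solver. The function names (`soft_threshold`, `lasso_fit`), the penalty value, and the simulated data are all assumptions made for illustration.

```python
import random

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the lasso proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_fit(X, y, alpha, n_iter=200):
    """Coordinate-descent lasso minimizing (1/2n)*||y - Xw||^2 + alpha*||w||_1.

    X is a list of rows (samples x features). Columns of X and the vector y
    are assumed to be centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # covariance of feature j with the residual after adding back j's own contribution
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w

# Synthetic data (hypothetical): 60 biopsies, 9 HIFs, one gene driven by HIFs 3 and 4.
random.seed(0)
n_samples, n_hifs = 60, 9
X = [[random.gauss(0, 1) for _ in range(n_hifs)] for _ in range(n_samples)]
means = [sum(row[j] for row in X) / n_samples for j in range(n_hifs)]
X = [[row[j] - means[j] for j in range(n_hifs)] for row in X]
y = [2.0 * row[3] + 1.0 * row[4] + random.gauss(0, 0.1) for row in X]
y_mean = sum(y) / n_samples
y = [v - y_mean for v in y]

coef = lasso_fit(X, y, alpha=0.3)
print([round(c, 2) for c in coef])
```

In this toy setting the L1 penalty shrinks the coefficients of uninformative HIFs toward zero, which is the property that makes average coefficient magnitude a reasonable way to rank HIFs by how well they predict expression.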
Figure 1.
Schematic representation of workflow
Integration of ML-based histology with transcriptomics in a discovery dataset to generate a 5-gene signature that predicted clinical events in advanced NASH with feature identification of HIFs from histopathology and gene expression signatures from transcriptomic data (A), selection of important genes related to HIFs (B), identification of key genes from integrative network analysis (C), clustering of samples based on gene expression (D), and evaluation of clinical events (E).
Figure 2.
Integration of identified genes with histologic features of NASH severity pinpoints key genes and gene clusters
(A) Distribution of coefficients for the top predicted genes (r-squared > 0.5) from lasso linear regression using the 9 HIFs.
(B) Venn diagram showing intersection of top correlated genes for the three most predictive HIFs. Heatmap shows that expression of 75 genes divided the discovery dataset into two primary groups, clusters 1 and 2, which correlate with fibrosis stages F3 and F4, respectively. A third subgroup can be seen within cluster 2. Distribution of CRN fibrosis scores in each cluster is shown in the table.
(C) Network integrating the three predictive HIFs with 75 top correlated genes. Network-based analysis via graphical lasso identifies five genes that have significant connections with all 3 HIFs (JAG1, VIM, VWF, PDGFRA, and CLSTN1).
(D) Differential expression of JAG1, VIM, VWF, PDGFRA, and CLSTN1 in tissue from clinical trial participants that had F1, F2, F3, or F4 fibrosis (fibrosis was staged by a central pathologist as part of clinical trial procedures).
For all, comparisons were made using the paired Mann-Whitney U test (p value < 0.05).
ML models quantified the three predictive HIFs in NASH liver tissue biopsy samples, and the model output was correlated with the gene expression data to identify the strongest histologic-genomic associations. To identify a prognostic gene expression signature, we identified 75 genes that were most strongly correlated with all three predictive HIFs (all Bonferroni-corrected p values < 0.001) from a subset of the HIF-selected genes (200 genes that were most significantly associated with any predictive HIF; Figure 2B). The combination of the 75 top correlated genes and three predictive HIFs was used in subsequent analyses to derive transcriptomic signatures and determine their association with NASH disease severity and clinical progression (Figures 1C–1E).
Association of identified genes and histologic features with NASH severity
Hierarchical clustering was used to investigate associations between the top correlated gene expression signature (n = 75 genes) and NASH CRN scores in the discovery cohort (determined manually by the central pathologist during the clinical trial period), revealing two distinct sample clusters (clusters 1 and 2) that correlated with fibrosis stage (Figure 2B). The majority of the samples in cluster 1 had a NASH CRN fibrosis score of F3 (65%; 548/837), whereas cluster 2 samples had mostly a NASH CRN fibrosis score of F4 (79%; 735/933). An expression signature of the top correlated genes based on three ML histology-based features therefore could distinguish F4 from F3 fibrosis at the gene level.
Notably, within cluster 2 is a clearly demarcated subgroup, subcluster 2, distinguished by almost uniform fibrosis severity (96% [324/337] F4 samples compared with 69% [411/596] F4 samples in the rest of cluster 2 excluding subcluster 2 and 25% F4 samples in cluster 1; Figure 2B) and higher expression of the top correlated gene group signature compared with the rest of cluster 2. Subcluster 2 samples were also similarly enriched for severe ballooning (87% NAS for ballooning 2) and lobular inflammation (73% NAS for lobular inflammation 3) compared with cluster 1 (69% and 38%, respectively), although this enrichment was in line with overall composition of cluster 2 (Figure S1). The presence of subcluster 2 supports the hypothesis that, within NASH cirrhosis, a distinct gene signature characterizes those patients with histologic features notable for active fibrogenesis.
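The cluster proportions quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative.

```python
# Counts reported in the text, as (F4 samples, total samples).
cluster2 = (735, 933)
subcluster2 = (324, 337)
rest_of_cluster2 = (411, 596)

def percent(part, total):
    """Percentage rounded to the nearest integer, as reported in the text."""
    return round(100 * part / total)

# Subcluster 2 and the remainder of cluster 2 should partition cluster 2.
partition_ok = (
    subcluster2[0] + rest_of_cluster2[0] == cluster2[0]
    and subcluster2[1] + rest_of_cluster2[1] == cluster2[1]
)

summary = {
    "cluster 2": percent(*cluster2),
    "subcluster 2": percent(*subcluster2),
    "rest of cluster 2": percent(*rest_of_cluster2),
}
print(partition_ok, summary)
```

The counts are internally consistent: the two subgroups sum to the cluster 2 totals, and the rounded percentages match the reported 79%, 96%, and 69%.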
Identification of driver genes of NASH using a histologic-genomic integrative network
Using the top correlated gene group, we next examined putative driver genes of NASH progression by integrating genomics with the predictive HIFs in network analysis. Network connectivity was derived using a graphical lasso,21 leading to identification of five key genes that were the most densely connected to all the predictive HIFs (Figures 1C and 2C) (see STAR Methods for additional details on network development). Each of the genes, JAG1, VIM, VWF, PDGFRA, and CLSTN1 (ordered based on average Pearson correlation to the predictive HIFs [0.70, 0.66, 0.65, 0.64, and 0.63], all Bonferroni-corrected p values < 0.001), was significantly differentially expressed in tissue across the NASH CRN fibrosis stages F1–F4 (for all, Mann-Whitney test p value < 0.05), and expression increased with increasing severity of fibrosis (Figure 2D).
Association of the 5-gene signature with F3 or F4 NASH was evaluated by splitting the discovery cohort into its two distinct clinical trial datasets: STELLAR 3 and 4. STELLAR 3 enrolled F3 fibrosis subjects at baseline, with 16% (59/379) progressing to cirrhosis. STELLAR 4 enrolled F4 fibrosis subjects at baseline, 2.3% (10/438) of whom had liver-related events at a median of 15.9 months after enrollment.20 Hierarchical clustering of STELLAR 3 or 4 patients based on the expression level of the 5-gene signature at baseline divided each cohort into 2 groups that corresponded to low and high levels of gene expression (Figures 3A and 3B). The 5-gene signature was highly expressed in 49% of STELLAR 3 subjects and 47% of STELLAR 4 subjects. Association of the high- and low-expression groups with disease progression in STELLAR 3 and 4 demonstrated that subjects with high expression of all 5 genes had an elevated risk of progression to cirrhosis or clinical events, respectively (Figure 3C). In STELLAR 3, subjects with high expression of all five genes had increased risk for progression to cirrhosis (p = 0.008 (log rank test); C-index = 0.60; hazard ratio [HR] 2.14, 95% confidence interval [CI] [1.18, 3.86]). In STELLAR 4, subjects with high expression had increased risk for development of liver-related clinical events (p < 0.005; C-index = 0.77; HR 5.48, 95% CI [1.60, 18.82]). Inclusion of the baseline areas of the three HIFs to this hierarchical clustering again stratified patients into two groups that correspond to high gene expression plus high proportional areas of predictive HIFs or low gene expression plus low proportional areas of predictive HIFs (Figures 3D and 3E). In STELLAR 3, 28% of subjects highly expressed the 5-gene signature and had high proportional areas of predictive HIFs, whereas in STELLAR 4, 53% of subjects had the same high gene expression/high HIF area pattern. 
Addition of the ML HIFs improved the association (C-index) of the groups with progression to cirrhosis only marginally in STELLAR 3 (p < 0.005; C-index = 0.63; HR 2.87; 95% CI [1.62, 5.07]) and did not alter the prediction of liver-related events in STELLAR 4 (C-index = 0.74; p < 0.005; HR 4.75; 95% CI [1.35, 16.70]) (Figure 3C). This suggests that the expression pattern of our 5-gene signature alone could determine a patient’s risk of progression to cirrhosis or clinical decompensation.
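For readers less familiar with the C-index reported above, the following sketch computes Harrell's concordance index on a small, entirely hypothetical dataset. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The function name and the toy follow-up data are assumptions; the study's own values come from its Cox models, not from this code.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: among comparable pairs, the fraction in which the subject
    with the earlier observed event also has the higher risk score.
    Tied risk scores count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an event strictly before j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical follow-up (months), event indicator (1 = event, 0 = censored),
# and a binary risk group (1 = high signature expression, 0 = low).
times = [5, 8, 12, 15, 20, 24]
events = [1, 1, 0, 1, 0, 0]
high_expression = [1, 0, 1, 1, 0, 0]

c_index = concordance_index(times, events, high_expression)
print(round(c_index, 3))
```

With a binary high/low grouping, many pairs share the same risk score and contribute 0.5 each, which is one reason a two-group stratification tends to yield a more modest C-index than a continuous risk score would.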
Figure 3.
A histologic-genomic network identifies a 5-gene signature that correlates with clinical events
(A and B) Heatmaps showing that expression of the 5-gene signature (JAG1, VIM, VWF, PDGFRA, CLSTN1) alone reveals two groups corresponding to high and low expression of this signature in STELLAR 3 (A) and STELLAR 4 (B).
(C) Level of expression of the 5-gene signature identifies clinical trial participants at high and low risk of disease progression. Addition of the abundance of 3 significant HIFs to the gene expression signature for stratification improves prediction of risk. Associations between the patient clusters and disease progression were determined using the Kaplan-Meier estimator and Cox proportional hazards regression analysis with elastic net regularization (ratio = 0.01).
(D and E) Addition of the area proportion of the 3 predictive HIFs to the gene expression signature divides each cohort into two groups corresponding to high gene expression and area proportion of HIFs and low gene expression and area proportion of HIFs as revealed in the STELLAR 3 (D) and STELLAR 4 (E) heatmaps.
Genetic pathways associated with NASH disease progression
To determine how our 5-gene signature may reflect NASH pathogenesis, we explored the biological function of each gene and identified their active pathways using the KEGG database.22,23 The KEGG analysis showed that the five genes intersect across multiple different pathways (Table S1), limiting interpretation of the molecular circuitry of NASH pathogenesis. To address this, we selected genes from the original HIF-selected gene group (1,139 genes significantly associated with all three HIFs, portal inflammation, bile duct/ductules, and fibrosis, all p < 0.001) to create a protein-protein interaction (PPI) network of the relationships between the proteins expressed by these genes (Figure 4A).24,25,26,27 Closely connected proteins or regions of the network correspond to functional modules.28 Network community detection was performed along with computation of the Wasserstein distance between genes (Figure 4B) to reveal six major groups or functional modules where the proteins within each group are densely connected (Figure 4C). Applying these six functional modules to the Database for Annotation, Visualization and Integrated Discovery (DAVID),29,30,31 eight enriched pathways were identified: T cell receptor signaling, NOD-like receptor, apoptosis, cGMP-PKG signaling, focal adhesion, PI3K-AKT signaling, transforming growth factor-β (TGF-β) signaling, and Notch signaling (Figure 4C). By cross-referencing with the genes in our 5-gene signature, we found that these genes are active in many of these pathways including Notch signaling, focal adhesion, and PI3K-AKT (Table S1). It should be noted that von Willebrand factor (VWF), one of the genes in the predictive signature, was not represented in this network (Figure 4C) because the database that we used for this analysis (DAVID) does not include VWF in the PI3K-AKT pathway.
Figure 4.
Network analysis identifies genetic pathways associated with disease progression
(A) Venn diagram showing the correlation of 1,500 significant genes with each of the 3 HIFs.
(B) Network community detection via Wasserstein distance applied to identify the subnetwork of genes that are densely connected internally within the network for pathway analysis.
(C) Protein-protein interaction network analysis of 1,139 genes that significantly correlated with all three HIFs identified pathways involved in progression of NASH from F3 to F4 fibrosis. Bonferroni-corrected p values for each pathway are shown in parentheses.
Evaluation of the 5-gene signature in a validation dataset
The 5-gene signature was evaluated in a validation dataset of paired RNA-seq gene expression data and ML-predicted HIFs generated from liver biopsies from a separate clinical trial (ATLAS) that enrolled patients with advanced NASH fibrosis (42% F3 and 56% F4) at baseline, a distribution similar to the discovery cohort.19 Mirroring the analyses conducted previously, the 200 most highly expressed genes that were most strongly correlated with the ML HIFs portal inflammation, bile duct/ductules, or fibrosis were selected. There was substantial overlap between the 200 genes selected in the validation dataset with those selected in the discovery dataset, including JAG1, VIM, VWF, PDGFRA, and CLSTN1: 74% of genes significantly correlated with bile duct/ductules, 56% significantly correlated with portal inflammation, and 70% significantly correlated with fibrosis and were commonly expressed in biopsies from validation (ATLAS) and discovery (STELLAR) cohorts (Figure 5A).
Figure 5.
Discovery and validation cohort gene signatures demonstrate high degree of concordance to HIFs
(A) Overlap between 200 of the most highly expressed genes in tissue from STELLAR and ATLAS clinical trials, correlated with each of the three significant HIFs. STELLAR and ATLAS share similar significantly correlated genes (74% in bile duct, 56% in portal inflammation, and 70% in fibrosis for top 200 genes).
(B) Application of the 5-gene expression signature (JAG1, VIM, VWF, PDGFRA, CLSTN1) to tissue from the ATLAS trial identifies two cohorts of patients that correlate with patient cohorts with F3 and F4 fibrosis (as staged by the central pathologist as part of clinical trial procedures). Fisher exact test p value < 0.001.
Next, expression of the 5-gene signature (JAG1, VIM, VWF, PDGFRA, and CLSTN1) in the validation dataset was investigated. As in the discovery cohort, expression of the 5-gene signature again stratified biopsies (baseline and end of treatment [EOT] biopsies) in the validation cohort into two distinct clusters that significantly associated with CRN fibrosis stages F3 and F4 (Fisher’s exact test p value < 0.001), where higher expression of the gene signature associated with increased severity of fibrosis (Figure 5B). Expression of each of the five genes individually in biopsies from patients with F3 or F4 fibrosis from this cohort was investigated, and the results showed that relatively higher expression of each gene was associated with more severe F4 fibrosis (Figure 6). Thus, expression of the 5-gene signature defined two distinct fibrosis-stage clusters in this dataset. These results validate these genes as drivers of advanced fibrosis and strongly suggest that these genes relate to progression of fibrosis from the F3 (pre-cirrhosis) to the F4 (cirrhosis) stage of NASH.
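The association between expression cluster and fibrosis stage is a 2×2 contingency problem, for which Fisher's exact test is the standard choice. The sketch below implements the two-sided test from the hypergeometric distribution. The function name and the example table are hypothetical, and the counts are not taken from the ATLAS data.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of every table at least as unlikely as the observed one.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)

# Hypothetical table: rows = low/high signature cluster, columns = F3/F4 biopsies.
p_example = fisher_exact_two_sided(40, 10, 12, 38)
print(p_example < 0.001)
```

The small multiplicative tolerance in the comparison guards against floating-point noise when two tables have theoretically equal probabilities.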
Figure 6.
Comparison of expression of the 5-gene signature genes in F3 and F4 fibrosis in the validation cohort
Boxplots of the expression of each of the 5 genes (JAG1, VIM, VWF, CLSTN1, and PDGFRA) in biopsies with F3 (light blue) or F4 (dark blue) fibrosis from the validation (ATLAS) cohort. Fibrosis stage was determined by the study’s central pathologist. In each case, the gene is expressed at higher levels in the samples with F4 fibrosis (p < 0.001). p values are from comparisons made between gene expression in F3 and F4 samples using the paired Mann-Whitney U test.
Regulation of Notch signaling is predictive of treatment responses in advanced NASH
Given the prominence of JAG1 and Notch signaling in the 5-gene signature and protein network analysis, respectively, we interrogated multiple components of Notch signaling at baseline and after 48 weeks of treatment in the validation cohort. The ATLAS trial demonstrated that combination treatment with an FXR agonist (cilofexor) and an ACC inhibitor (firsocostat) resulted in comprehensive histologic and biochemical treatment responses,19 allowing us an opportunity to determine whether treatment responses coincided with changes in Notch signaling. In placebo- and firsocostat-treated patients, expression levels of NOTCH1 declined (p = 8.4 × 10⁻⁴ and p = 0.011, respectively), but JAG1, NOTCH2, and the Notch transcriptional target HES1 demonstrated no changes over 48 weeks (Figures 7A and 7B). Cilofexor-treated patients demonstrated significant decreases in expression of JAG1 (p = 0.014) and NOTCH1 (p = 0.0075) (Figure 7C) without changes in HES1. Only the combination arm of cilofexor/firsocostat demonstrated significant decreases in expression of JAG1 (p = 1.2 × 10⁻⁴) and NOTCH1/2 (p = 2.6 × 10⁻⁶ and 2.5 × 10⁻⁴, respectively) and the transcriptional target HES1 (p = 0.011) (Figure 7D). In contrast, the other four genes in the 5-gene signature demonstrated less robust correlation by treatment group (Figure S2). These findings from a clinical trial demonstrate that decreases in expression of key Notch signaling components in response to therapy correspond with broad histologic improvement in patients with advanced NASH.
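The week 0 versus week 48 comparisons rely on the Mann-Whitney U test. As a minimal sketch of how the statistic is formed, the code below counts pairwise wins between two groups and converts U to a two-sided p value with the normal approximation. The function name and expression values are hypothetical, and the approximation omits tie and continuity corrections, so it will not exactly reproduce the p values reported in the study.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation without
    tie or continuity correction."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the x value exceeds the y value; ties count one half.
    u = sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return u, p

# Hypothetical normalized expression of one gene at baseline and at week 48.
week0 = [5.1, 4.8, 5.5, 5.0, 5.3]
week48 = [4.2, 4.5, 4.0, 4.9, 4.4]

u_stat, p_value = mann_whitney_u(week0, week48)
print(u_stat, round(p_value, 4))
```

For samples this small an exact permutation p value would normally be preferred; the normal approximation is used here only to keep the example short.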
Figure 7.
Gene expression changes in the Notch pathway predict treatment responses in advanced NASH
(A) Boxplots of expression at weeks 0 and 48 for placebo-treated patients; while expression levels of NOTCH1 decrease (p = 8.4 × 10⁻⁴), expression of JAG1, NOTCH2, and HES1 shows no change between weeks 0 and 48 (p > 0.05).
(B) Similar to placebo-treated patients, firsocostat-treated patients show a decrease in expression levels of only NOTCH1 (p = 0.011).
(C) Cilofexor-treated patients show significant decreases in expression of JAG1 (p = 0.014) and NOTCH1 (p = 0.0075).
(D) Only patients given the combination treatment of cilofexor and firsocostat demonstrated significant decreases in expression of JAG1 (p = 1.2 × 10⁻⁴), NOTCH1/2 (p = 2.6 × 10⁻⁶ and 2.5 × 10⁻⁴, respectively), and HES1 (p = 0.011). The two-sided p values are from Mann-Whitney U test comparisons of expression between weeks 0 and 48 within each treatment group (placebo, cilofexor, firsocostat, and the combination of cilofexor and firsocostat). The line within the box indicates the median, the lower and upper ends of the box correspond to the first and third quartiles (25th and 75th percentiles), respectively, and the whiskers extend to 1.5 times the interquartile range (IQR).
Discussion
The current study deployed an ML-based pathology platform to correlate histologic features with RNA-seq data and then evaluated the association of these features with clinical outcomes in patients prospectively enrolled in clinical trials of advanced (F3/F4) NASH fibrosis. The prognostic 5-gene signature identified using this approach predicted progression to cirrhosis in patients with F3 NASH and clinical events in those with F4 NASH cirrhosis. These results provide validation for the ML-based histologic assessment of portal inflammation and bile duct/ductule area, histologic features that have long been implicated, but not formalized, in NASH histologic scoring.1 The significant correlation between this gene signature and histologic features was reflected in the marginal improvement in predictive ability for clinical events when ML histology was combined with the 5-gene signature.
The development of an ML platform to reproducibly characterize NASH histology was instrumental in the development of the gene signature. The current NASH classification systems, stratified by an ordinal system, are limited in dynamic range and reproducibility.32 For instance, two of the key histologic features in this study, portal inflammation and bile duct/ductular proliferation, may be qualitatively assessed by expert pathologists, but reproducible and quantitative changes over time remain difficult to characterize without the benefit of an ML-based platform. In our previous study, we demonstrated that ML-derived histological features have superior prognostic ability compared with manual pathological features for routinely scored components of NASH histology.17 Finally, the reproducibility of the ML-based platform obviates the well-known inter- and intraobserver variability associated with traditional pathological interpretations of NASH.
Multiple studies have previously examined human NASH, mostly using a microarray approach, to identify gene signatures associated with pathogenesis.12,33,34,35,36 Govaere et al. recently performed a comprehensive genomic analysis across the entire spectrum (NAS scores and fibrosis stages) of biopsy-confirmed NASH and NAFLD using RNA-seq analysis.13 They found that when patients with NASH F0/F1 served as the reference point, no differences in gene expression were found in patients with NASH F2, while a stepwise increase in the number of differentially expressed genes occurred in F3 and F4 samples.12 The distinct transcriptomic profile separating F3/F4 patients from those with earlier forms of NASH indicates a broad range of genomic changes that coincide with the clinically relevant finding of increased liver-related outcomes in patients with F4 fibrosis.37 The absence of genomic differences in earlier stages of NASH suggests a strong degree of interchangeability among patients with early histologic fibrosis. In contrast, the larger number of genes separating F3 and F4 from earlier stages, and the even larger separation between F3 and F4 found in the present study, is reflected in multiple histologic features uniquely seen with the development of progressive fibrosis and cirrhosis.
A central finding of the present study was the identification of Notch signaling pathway genes in the advanced NASH population and their relationship to histologic progression and clinical events. JAG1, which encodes a canonical Notch ligand, correlated with two central histologic features of progressive NASH fibrosis: portal inflammation and ductular proliferation. Its product, the Jagged1 protein, is the central developmental signal that delineates biliary lineage specification. Its presence determines differentiation to cholangiocytes, whereas its absence results in hepatocyte lineage specification. Interestingly, human loss-of-function mutations in JAG1 result in the classic biliary disease Alagille syndrome, where the relative paucity of bile ducts results in dramatic cholestasis but, in many patients, disproportionately little hepatic fibrosis until late in the disease course.38 Pajvani and colleagues demonstrated in elegant genetic models of Notch overexpression and knockdown the enhancement and reduction, respectively, of experimental NASH fibrosis.39,40 In the present study, we analyzed the Notch pathway genes identified in a recent study of compensated NASH cirrhosis where the combination of an ACC inhibitor and FXR agonist led to histologic and biochemical markers of improvement.19 Multiple Notch signaling genes including HES1 were suppressed with the combination therapy of an ACC inhibitor and an FXR agonist, which improved histology and other outcomes such as decreases in serum bile acids, liver transaminases, and biomarkers of fibrosis.19 Thus, using a target-agnostic RNA-seq approach, we have identified a key role for Notch signaling genes in patients with advanced NASH fibrosis/cirrhosis and their decline with therapeutic modalities distinct from Notch signaling itself.
The other genes captured in the 5-gene signature have been implicated individually in the pathophysiology of advanced liver disease. For instance, the gene for von Willebrand factor (VWF) is expressed primarily by endothelial cells, and its serum protein (vWF) levels are relevant to advanced liver disease. Multiple clinical studies highlight vWF’s role as a biomarker that is relevant to portal hypertension, a measure of disease severity in cirrhosis and the physiologic determinant of clinical events in cirrhosis.41,42,43,44,45,46,47,48 The presence of VWF in the 5-gene signature suggests a pathophysiologic step whereby alterations in liver-directed blood flow and endothelial gene expression may precede overt changes in systemic hemodynamics.44 The identification of a VWF gene signal from baseline liver biopsies in this less-advanced cirrhosis population suggests a pathogenic role for vWF earlier in the spectrum of cirrhosis prior to the advent of obvious clinical manifestations. The presence of PDGFRA and VIM likely reflects pathological processes associated with activated hepatic stellate cells and the deposition of excessive matrix seen in advanced liver disease. PDGFRA encodes a receptor that transmits critical proliferative and chemotactic signals in fibrogenic hepatic stellate cells, while VIM codes for vimentin, an intermediate filament expressed at high levels in activated fibroblasts.49,50 One limitation of the current work is the absence of protein or in situ hybridization confirmation of these gene signals from our samples. The ample literature related to these targets, however, confirms their presence in advanced fibrosis and cirrhosis.43,51 Future studies specifically delineating these targets at the histologic level would provide valuable confirmation.
The 5-gene signature from the present study has implications for understanding NASH pathogenesis and therapeutic approaches to address NASH. The discovery of at-risk SNPs has informed the understanding of genetic susceptibility, but their role in pathogenesis remains unclear.52 Unsupervised clustering approaches typically find a lack of enrichment of these alleles in patients with NASH with advanced disease versus those with early disease.13 A major exception is the presence of the PNPLA3 risk allele, consistent with widely published literature from multiple different cohorts supporting this association.1,2,3 The results from this study of patients with advanced NASH support a multifactorial pathogenesis involving canonical signaling pathways such as Notch along with genetic signals indicative of abnormal matrix deposition and endothelial cell dysfunction. These findings suggest that once an advanced stage of NASH is reached, the drivers of NASH pathogenesis reflect those pathways typically associated with chronic wound healing. The histological responses seen with successful treatment and their concordance with suppression of Notch signaling genes suggest that modulation of this pathway, directly or indirectly, may be fundamental to addressing advanced NASH.39,40
In conclusion, our methods leveraged the high-resolution nature of complementary ML histology and genomic approaches to identify genetic signatures that are associated with NASH fibrosis progression and clinical outcomes. This approach identified a 5-gene signature that correlated with clinical outcomes and validated the importance of portal inflammation and ductular proliferation in advanced NASH pathogenesis. Genomic responses to recent investigational combination therapies confirmed that successful treatment of patients with NASH with advanced fibrosis, including cirrhosis, involves the suppression of the Notch signaling pathway.
Limitations of the study
A limitation of the present work is the restriction to the top 200 genes associated with the ML histologic features. Because a major goal of the study was to identify a parsimonious set of genes that could explain the molecular underpinnings of the histology and clinical outcomes, a threshold of the top 200 genes was defined. A network-based method (graphical lasso), which removes connections between conditionally independent genes, then yielded five genes. For pathway analysis, we considered a larger subset comprising most of the significantly correlated genes in order to comprehensively recover pathways consisting of multiple related genes that were related to the key histological features. Thus, our selection of the top 200 genes reflects a way of managing this bioinformatics approach for biological interpretation. We acknowledge the possibility that some relevant genes may have been missed with the current approach.
STAR★Methods
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Biological samples | | |
| Human liver biopsies from patients with F3 NASH participating in the STELLAR-3 clinical trial | Gilead Sciences, Inc. | https://clinicaltrials.gov/ct2/show/NCT03053050 |
| Human liver biopsies from patients with F4 NASH participating in the STELLAR-4 clinical trial | Gilead Sciences, Inc. | https://clinicaltrials.gov/ct2/show/NCT03053063 |
| Human liver biopsies from patients with F3 or F4 NASH participating in the ATLAS clinical trial | Gilead Sciences, Inc. | https://clinicaltrials.gov/ct2/show/NCT03449446 |
| RNA from liver biopsies from patients with F3 NASH participating in the STELLAR-3 clinical trial | Gilead Sciences, Inc. | https://clinicaltrials.gov/ct2/show/NCT03053050 |
| RNA from liver biopsies from patients with F4 NASH participating in the STELLAR-4 clinical trial | Gilead Sciences, Inc. | https://clinicaltrials.gov/ct2/show/NCT03053063 |
| RNA from liver biopsies from patients with F3 or F4 NASH participating in the ATLAS clinical trial | Gilead Sciences, Inc. | https://clinicaltrials.gov/ct2/show/NCT03449446 |
| Critical commercial assays | | |
| TruSeq RNA Exome | Illumina | Cat. No. 20020189, 20020490, 20020183, and 20020492 |
| Deposited data | | |
| Source code for downstream analysis and figure generation | GitHub | https://github.com/Path-AI/nash-rna-seq-cell-reports-medicine |
| Software and algorithms | | |
| Salmon | N/A | https://github.com/COMBINE-lab/Salmon |
| CNN-based models for quantification of NASH histological features from H&E- or trichrome-stained biopsies | PathAI | N/A |
| Other | | |
| Database for Annotation, Visualization and Integrated Discovery (DAVID) | N/A | https://david.ncifcrf.gov/home.jsp |
| Kyoto Encyclopedia of Genes and Genomes (KEGG) PATHWAY | N/A | https://www.genome.jp/kegg/ |
Resource availability
Lead contact
Further information and requests for resources described within should be directed to and will be fulfilled by the lead contact, Ilan Wapinski (ilan.wapinski@pathai.com).
Materials availability
No unique resources were developed for this study.
Experimental model and subject details
Human subjects
Anonymized liver tissue samples and digitized WSI of hematoxylin and eosin (H&E)- and trichrome-stained liver biopsies were used in this report as the source of histological and transcriptomic information. Samples came from adult patients with advanced fibrosis due to NASH who had participated in any of three completed randomized controlled trials of NASH therapeutics (STELLAR-3 [NCT03053050], STELLAR-4 [NCT03053063], and ATLAS [NCT03449446]19,20) and had provided informed consent for future genomic research and tissue histology. Institutional approval and oversight of these studies were previously reported.19,20 Liver tissue was collected from clinical trial participants by core needle biopsy at baseline and week 48 (end of treatment) and had to meet quality metrics as defined in the clinical protocols before a single central pathologist generated slide-level scores according to the NASH CRN.19,20 The dataset analyzed here was a subset of all available samples. In total, our dataset included a discovery cohort of liver tissue and WSI from 1,208 participants from the combined STELLAR studies (71.9% of all 1,679 patients) and 186 participants from the ATLAS study (47.4% of all 392 patients); the latter served as the validation cohort for the 5-gene signature derived from the STELLAR studies.
Details of the original clinical trials have been previously published, and those publications should be consulted for more details.19,20 In short, the phase 3 STELLAR studies enrolled adult patients (Age: 52–64; Female: 59.6%; White: 73.6%; Hispanic: 14.5%; F3: 47.8%; F4: 52.2%) with a histologic diagnosis of NASH (defined as the presence of at least grade 1 steatosis, hepatocellular ballooning, and lobular inflammation according to the NAS) and either bridging (F3) fibrosis (STELLAR-3) or compensated cirrhosis, stage F4 (STELLAR-4). Both studies were terminated after a preplanned efficacy analysis at week 48 demonstrated that the study drug, selonsertib (SEL), was ineffective compared with placebo.20 The phase 2b ATLAS study randomized adult patients (Age: 55–66; Female: 62%; White: 89%; Hispanic: 26.8%; F3: 44%; F4: 56%) with advanced fibrosis (F3-F4) due to NASH to treatment with SEL, firsocostat (FIR), or cilofexor (CILO), alone or in two-drug combinations, for 48 weeks.19
Method details
Generation and quantification of RNA-seq data from liver biopsy samples
To characterize NASH at the level of the predominant genetic networks and pathways active during disease progression, RNA-seq (TruSeq, Illumina, Inc., San Diego, CA) was used to generate transcriptomic profiles of liver tissue. RNA-seq was performed on samples extracted from paraffin-embedded tissue blocks as previously described.53 In brief, RNA-seq (SureSelect protocol) was performed on formalin-fixed, paraffin-embedded (FFPE) liver biopsies. RNA quality was assessed using a DV200 threshold of >10%. RNA was isolated from 2,006 tissue samples collected at baseline and week 48 in the STELLAR and ATLAS clinical trials. To avoid batch effects, RNA isolated from the treatment and placebo arms of the two STELLAR trials was assayed together and, likewise, RNA isolated from the treatment and placebo arms of the ATLAS trial was assayed together for further analysis.
RNA-seq data were aligned to the reference transcriptome using Salmon.54 The Salmon output was imported into R (v.4.0.5) and log transformed after conversion to counts per million (CPM) using edgeR (v.3.30.2).55 Patient samples (n = 194) were excluded if the time between treatment and biopsy was ≥3 days; the remaining samples were stratified by treatment arm (placebo, cilofexor, firsocostat, or combination cilofexor and firsocostat). Comparisons between week 0 and week 48 gene expression were performed using a paired Mann-Whitney U test.
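As a rough illustration of the normalization step, the sketch below converts raw counts for one sample to counts per million and applies a log2 transform. It is written in plain Python rather than R, the gene names and counts are made up, and the pseudocount of 1 is an assumption of this example; edgeR applies its own prior-count handling, so these values would not match edgeR output exactly.

```python
import math

def log_cpm(counts, pseudocount=1.0):
    """Return log2(CPM + pseudocount) for one sample's gene counts.

    counts: dict mapping gene name -> raw read count for a single sample.
    pseudocount: illustrative offset to avoid log(0); an assumption of this
    sketch, not a setting reported in the study.
    """
    library_size = sum(counts.values())
    if library_size <= 0:
        raise ValueError("sample has no reads")
    return {
        gene: math.log2(count / library_size * 1e6 + pseudocount)
        for gene, count in counts.items()
    }

# Toy sample with made-up counts (not study data).
sample = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 0}
normalized = log_cpm(sample)
```

Scaling by library size before the log transform makes expression values comparable between biopsies sequenced to different depths.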
ML model training for quantification of NAS features
Models were previously trained and are described in detail in both published data17,19 and unpublished data (J.S.I. et al., unpublished data). In summary, 116,000 pathologist-derived annotations of H&E- and trichrome-stained WSI were grouped into classes as appropriate and then used to generate training sets of image patches on the order of 500,000 samples. These patches were used to train a deep CNN with stochastic minibatch gradient descent using the Adam optimizer56 to produce pixel-level predictions of the NAS components steatosis, lobular inflammation, and hepatocellular ballooning, as well as fibrosis, portal inflammation, and bile duct. The models comprise 8–12 blocks of compound layers with a topology inspired by residual networks and inception networks with a softmax loss.57,58 Model training was monitored, and hyperparameters were adjusted based on the performance of the model on pathologist annotations from the held-out validation set, until convergence was achieved. Model training was further augmented using a method called distributionally robust optimization59,60 to improve model generalization across multiple clinical and research contexts. ML-quantified features were computed at the image level to summarize both the H&E- and trichrome-based predictions.
Statistical analyses
A multivariate analysis was performed to find the ML-quantified histology features that are most predictive of the gene expression data from the STELLAR trials. A linear regression model with lasso regularization was applied to predict the expression of 16,471 genes by each of the nine ML-quantified HIFs and detect the most predictive ML HIFs. The association of each of the three predictive ML-quantified histological features (proportional areas of bile duct, portal inflammation, and fibrosis) to transcriptomic data was further investigated using Spearman’s rank correlation analysis to identify 200 significantly correlated genes (all corrected p < 0.001).
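To make the correlation step concrete, here is a minimal pure-Python sketch of Spearman's rank correlation between one histologic feature and one gene. The function names and the toy values are hypothetical, and the sketch computes only the coefficient; the p-value calculation and the multiple-testing correction used in the study are not reproduced here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data (not study data): fibrosis proportionate area vs. log expression.
fibrosis_area = [2.0, 5.5, 9.1, 14.0, 20.3]
expression = [1.1, 1.9, 2.0, 3.5, 7.2]
rho = spearman(fibrosis_area, expression)
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship between feature area and expression, not just a linear one.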
An integrative, data-driven network was built using 75 genes (of the 200 genes) that significantly correlated with all three predictive ML-quantified histological features (proportional areas of bile duct, portal inflammation, and fibrosis). Network connectivity was derived from graphical lasso, a technique that derives a sparse graph by removing edges from an initially fully connected graph (i.e., all nodes are connected). Graphical lasso estimated the precision matrix (the inverse of the covariance matrix) associated with the initial graph. If the ij-th component of the precision matrix is zero, nodes i and j are conditionally independent given the values of the other nodes. The key genes were identified based on their connectivity to the three ML-quantified histologic features in the sparse integrative network obtained after graphical lasso had been applied to the initial (fully connected) network.
Patient stratification was performed by hierarchical agglomerative clustering using only the five key genes, as well as the five genes plus the three ML-histological features. For hierarchical clustering, the Euclidean distances among samples were computed using only gene expression or normalized (min-max scaling) gene expression and ML-histologic features. Agglomerative merging of clusters was performed using Ward’s minimum variance linkage. Associations between the two resulting patient clusters and disease progression were determined using the Kaplan-Meier estimator and Cox proportional hazards regression analysis with elastic net regularization (ratio = 0.01). Disease progression for F3 patients was defined as progression to cirrhosis (most often due to imaging based on clinical suspicion or laboratory abnormalities) or a hepatic clinical event such as ascites or bleeding from portal hypertension. For F4 patients at baseline, a liver-related clinical event was defined as a hepatic clinical event (ascites, portal hypertensive bleeding, hepatic encephalopathy), meeting listing criteria for liver transplantation, or death.
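The link between a zero in the precision matrix and conditional independence can be seen in a small worked example. The sketch below is not graphical lasso itself, which estimates a sparse precision matrix from noisy data with an L1 penalty; it only inverts an exactly known covariance matrix for a hypothetical three-variable chain (X1 drives X2, X2 drives X3, unit-variance noise at each step) to show which entry vanishes.

```python
from fractions import Fraction

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into place
        # (raises StopIteration if the matrix is singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Covariance of the chain X1 -> X2 -> X3 with unit-variance noise terms.
covariance = [[1, 1, 1],
              [1, 2, 2],
              [1, 2, 3]]
precision = invert(covariance)
# precision[0][2] is zero although covariance[0][2] is not: X1 and X3 are
# correlated, but conditionally independent given X2, so no edge joins them.
```

In the sparse graph, an edge is kept only where the corresponding precision entry is non-zero, which is why indirect associations such as X1 and X3 drop out.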
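For readers unfamiliar with the Kaplan-Meier estimator used to compare the two patient clusters, the following is a minimal pure-Python sketch. The follow-up times and event flags are invented for illustration, and the function name is hypothetical; it does not reproduce the study's Cox regression or elastic net step.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient.
    events: 1 if the event was observed at that time, 0 if censored.
    """
    survival = 1.0
    curve = []
    event_times = sorted(set(d for d, e in zip(durations, events) if e))
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Events observed exactly at time t.
        observed = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - observed / at_risk
        curve.append((t, survival))
    return curve

# Toy cluster of five patients (made-up follow-up times; 0 = censored).
durations = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
```

Censored patients contribute to the at-risk count until they leave follow-up, which is what lets the estimator use incomplete observations without treating them as events.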
Integrated transcriptomic and machine learning-based histologic profiling of liver biopsies
ML-based histologic assessment was performed using digitized images of H&E- and trichrome-stained slides from biopsies with corresponding RNA-seq data. Previously developed machine learning (ML) models were applied to the histology images to identify and quantify features assessed in the NASH Clinical Research Network (CRN) histologic scoring system (fibrosis, steatosis, lobular inflammation, and hepatocellular ballooning).61,62 The ML models also quantified bile ducts, portal inflammation, and normal hepatocytes, which are not evaluated in the NASH CRN system. Portal inflammation and bile duct proliferation in particular have previously been associated with NASH progression.15,16 The ML models generate human interpretable features (HIFs) that are measures of the proportionate area of liver tissue associated with a particular histologic feature. Therefore, the HIFs quantitate each histologic feature on a continuous scale (e.g., 0–100% of steatosis proportionate area), characterizing the prevalence of each feature on a biopsy with more granularity than manual assessment, which relies on an ordinal scoring system. See ML Model Training for Quantification of NAS Features for additional information on ML model development and quantitation.
The resulting dataset comprised transcriptomic profiles and matching ML HIFs from biopsies from 1,003 subjects enrolled in three trials of NASH with advanced fibrosis/cirrhosis. Data from the STELLAR-3 and STELLAR-4 trials were used as a discovery dataset to identify histologic and transcriptomic signatures associated with NASH severity and progression. Given that the drug evaluated in the STELLAR trials was deemed not efficacious, samples were aggregated across treatment arms for analysis. The ATLAS data served as the held-out validation dataset.
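The idea of a proportionate-area feature can be sketched as follows. This is only an assumed illustration of how a percentage could be derived from pixel-level class predictions; the class labels, the `background` convention, and the function name are hypothetical, and the actual HIF definitions used by the models are not disclosed in this paper.

```python
from collections import Counter

def proportionate_areas(pixel_labels, background="background"):
    """Percentage of tissue pixels assigned to each histologic class.

    pixel_labels: 2D list of per-pixel class labels for one slide region.
    background: label for non-tissue pixels, excluded from the denominator.
    """
    tissue = [label for row in pixel_labels for label in row if label != background]
    if not tissue:
        raise ValueError("no tissue pixels")
    counts = Counter(tissue)
    return {label: 100.0 * n / len(tissue) for label, n in counts.items()}

# Toy 2 x 5 prediction mask (made-up labels, not model output).
mask = [
    ["background", "steatosis", "steatosis", "fibrosis", "normal"],
    ["background", "steatosis", "normal", "normal", "normal"],
]
areas = proportionate_areas(mask)
```

A continuous percentage of this kind can separate two biopsies that would receive the same ordinal grade under manual scoring.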
Integrative network and pathway analysis
Network-based analysis was performed to detect the association of gene expression and ML-histologic features at the pathway level. To this end, a larger subset of 1,500 genes that significantly correlated with at least one of the three ML-histologic features was used. The protein-protein interaction (PPI) network was built for the 1,139 genes whose expression was significantly correlated with all three ML-quantified histological features (proportional areas of bile duct, portal inflammation, and fibrosis). The edges in this PPI network were obtained from the literature-derived genetic and protein interactions in the Biological General Repository for Interaction Datasets (BioGRID)27 and the Human Protein Reference Database (HPRD).25,26 Edge weights were assigned using the Pearson correlations of gene expression in the STELLAR samples.
Network community detection was performed to recover the components or subnetworks of associated genes within the PPI network for pathway analysis. To do this, the 1-Wasserstein distance, also known as the earth mover’s distance, was calculated among the genes within the PPI network.63 The Wasserstein distance measures the distance between two probability distributions as the minimal “cost” required to turn one distribution into the other. We assigned a probability distribution to each gene based on the weights of adjacent edges to that gene (to have a probability distribution we used the weight of the edge). The Wasserstein distance was then computed between these probability distributions, where the ground distance was defined as the shortest path between neighboring genes within the (unweighted) PPI network. The Wasserstein distance between two genes is consequently smaller when the neighboring genes have more connections within the network. Subsequent hierarchical clustering of genes within the network via the calculated Wasserstein distances defines a (non-overlapping) community structure of grouped genes that are densely connected internally. Pathway enrichment analysis was then performed for the genes in each of the 5 largest communities found using the Database for Annotation, Visualization and Integrated Discovery (DAVID),30,64 where the pathways are derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) PATHWAY database.22
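For reference, the distance described above can be written out in its standard form. The notation here is introduced only for illustration: $N(i)$ denotes the neighbors of gene $i$, $w_{ik}$ the weight of the edge between genes $i$ and $k$, and $d(x,y)$ the shortest-path length between genes $x$ and $y$ in the unweighted PPI network. The normalization of edge weights into a distribution $\mu_i$ is an assumption of this sketch, since the exact construction is not fully specified in the text.

```latex
% Assumed construction of the distribution attached to gene i
\mu_i(k) = \frac{|w_{ik}|}{\sum_{l \in N(i)} |w_{il}|}, \qquad k \in N(i)

% Standard 1-Wasserstein (earth mover's) distance between genes i and j
W_1(\mu_i, \mu_j) = \min_{\pi \ge 0} \sum_{x,y} d(x,y)\, \pi(x,y)
\quad \text{subject to} \quad
\sum_{y} \pi(x,y) = \mu_i(x), \qquad
\sum_{x} \pi(x,y) = \mu_j(y)
```

The transport plan $\pi(x,y)$ specifies how much probability mass moves from gene $x$ to gene $y$, so the distance is small when the two neighborhoods overlap or lie close together in the network.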
Acknowledgments
We thank Dr. Utpal Pavjani (Columbia University Medical Center, NY) for careful reading of the manuscript and thoughtful comments. We are grateful to the software engineering and ML teams at PathAI for developing the systems and pipelines used for model development and feature extraction. We also thank Biosciences Communications for developing the figures for this manuscript and SciStories for their work on the schematic depicting the project workflow. This work was jointly funded by Gilead Sciences, Inc., and PathAI.
Author contributions
M.P., A.T.-W., O.M.C.-Z., R.P.M., C.C., and I.W. conceived the project. M.P. generated the HIFs and identified the 5-gene signature. J.C., M.P., M.R., Y.G., R.S.H., and D.Z.P. performed all other data generation and analyses. M.P. and J.C. contributed equally to this project. All authors contributed to the preparation of the manuscript.
Declaration of interests
J.C. is an employee and shareholder of PathAI, Inc. M.P. is a shareholder of PathAI, Inc. Y.G. is a shareholder of Gilead Sciences, Inc. D.Z.P. is an employee of and shareholder of Gilead Sciences, Inc. O.M.C.-Z. is a shareholder of PathAI, Inc. V.M. is an employee and shareholder of PathAI, Inc. G.M.S. is an employee and shareholder of OrsoBio, Inc., and a shareholder of Gilead Sciences, Inc. M.C.M. is an employee and shareholder of PathAI, Inc. M.R. is an employee and shareholder of PathAI, Inc. A.H.B. is a co-founder, employee, and shareholder of PathAI, Inc. R.S.H. is an employee and shareholder of OrsoBio, Inc., and a shareholder of Gilead Sciences, Inc. R.P.M. is an employee and shareholder of OrsoBio, Inc., and a shareholder of Gilead Sciences, Inc. A.T.-W. is an employee and shareholder of PathAI, Inc. I.W. is an employee and shareholder of PathAI, Inc. C.C. is an employee and shareholder of Inipharm, Inc., and shareholder of Gilead Sciences, Inc.
Published: April 18, 2023
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.xcrm.2023.101016.
Contributor Information
Ilan Wapinski, Email: ilan.wapinski@pathai.com.
Chuhan Chung, Email: chuhanchung@inipharm.com.
Supplemental information
Data and code availability
- The histopathology data collected for this study are maintained by PathAI to preserve patient confidentiality and the proprietary image analysis. Access to histopathology features will be granted upon reasonable request from academic investigators without relevant conflicts of interest for non-commercial use who agree not to distribute the data. Access requests can be made to the lead contact.
- Not all original code can be made publicly available. The code for cell- and tissue-type model training, inference, and feature extraction is not disclosed. To safeguard PathAI’s intellectual property, access requests for such code will not be considered. The NASH histology cell- and tissue-type models that generated the HIFs used in this investigation are described in previous publications17,19 and in unpublished data (J.S.I. et al., unpublished data). The source code for all downstream data analyses and figure generation in this work is publicly available and can be downloaded from GitHub: https://github.com/Path-AI/nash-rna-seq-cell-reports-medicine.
- Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
- Gilead Sciences shares anonymized individual patient data upon request or as required by law or regulation with qualified external researchers based on a submitted curriculum vitae reflecting no conflict of interest. The request proposal must also include a statistician. Approval of such requests is at Gilead Sciences’ discretion and is dependent on the nature of the request, the merit of the research proposed, the availability of the data, and the intended use of the data. Data requests should be sent to the lead contact.
References
- 1. Chalasani N., Younossi Z., Lavine J.E., Charlton M., Cusi K., Rinella M., Harrison S.A., Brunt E.M., Sanyal A.J. The diagnosis and management of nonalcoholic fatty liver disease: practice guidance from the American Association for the Study of Liver Diseases. Hepatology. 2018;67:328–357. doi: 10.1002/hep.29367.
- 2. Romeo S., Kozlitina J., Xing C., Pertsemlidis A., Cox D., Pennacchio L.A., Boerwinkle E., Cohen J.C., Hobbs H.H. Genetic variation in PNPLA3 confers susceptibility to nonalcoholic fatty liver disease. Nat. Genet. 2008;40:1461–1465. doi: 10.1038/ng.257.
- 3. Eslam M., Valenti L., Romeo S. Genetics and epigenetics of NAFLD and NASH: clinical impact. J. Hepatol. 2018;68:268–279. doi: 10.1016/j.jhep.2017.09.003.
- 4. Anstee Q.M., Darlay R., Cockell S., Meroni M., Govaere O., Tiniakos D., Burt A.D., Bedossa P., Palmer J., Liu Y.-L., et al. Genome-wide association study of non-alcoholic fatty liver and steatohepatitis in a histologically characterised cohort. J. Hepatol. 2020;73:505–515. doi: 10.1016/j.jhep.2020.04.003.
- 5. Abul-Husn N.S., Cheng X., Li A.H., Xin Y., Schurmann C., Stevis P., Liu Y., Kozlitina J., Stender S., Wood G.C., et al. A protein-truncating HSD17B13 variant and protection from chronic liver disease. N. Engl. J. Med. 2018;378:1096–1106. doi: 10.1056/NEJMoa1712191.
- 6. Emdin C.A., Haas M.E., Khera A.V., Aragam K., Chaffin M., Klarin D., Hindy G., Jiang L., Wei W.-Q., Feng Q., et al. A missense variant in Mitochondrial Amidoxime Reducing Component 1 gene and protection against liver disease. PLoS Genet. 2020;16 doi: 10.1371/journal.pgen.1008629.
- 7. Rotman Y., Koh C., Zmuda J.M., Kleiner D.E., Liang T.J., the NASH CRN The association of genetic variability in patatin-like phospholipase domain-containing protein 3 (PNPLA3) with histological severity of nonalcoholic fatty liver disease. Hepatology. 2010;52:894–903. doi: 10.1002/hep.23759.
- 8. Mann J.P., Pietzner M., Wittemans L.B., Rolfe E.D.L., Kerrison N.D., Imamura F., Forouhi N.G., Fauman E., Allison M.E., Griffin J.L., et al. Insights into genetic variants associated with NASH-fibrosis from metabolite profiling. Hum. Mol. Genet. 2020;29:3451–3463. doi: 10.1093/hmg/ddaa162.
- 9. Kabarra K., Golabi P., Younossi Z.M. Nonalcoholic steatohepatitis: global impact and clinical consequences. Endocr. Connect. 2021;10:R240–R247. doi: 10.1530/EC-21-0048.
- 10. Stender S., Kozlitina J., Nordestgaard B.G., Tybjærg-Hansen A., Hobbs H.H., Cohen J.C. Adiposity amplifies the genetic risk of fatty liver disease conferred by multiple loci. Nat. Genet. 2017;49:842–847. doi: 10.1038/ng.3855.
- 11. Anstee Q.M., Day C.P. The genetics of nonalcoholic fatty liver disease: spotlight on PNPLA3 and TM6SF2. Semin. Liver Dis. 2015;35:270–290. doi: 10.1055/s-0035-1562947.
- 12. Moylan C.A., Pang H., Dellinger A., Suzuki A., Garrett M.E., Guy C.D., Murphy S.K., Ashley-Koch A.E., Choi S.S., Michelotti G.A., et al. Hepatic gene expression profiles differentiate presymptomatic patients with mild versus severe nonalcoholic fatty liver disease. Hepatology. 2014;59:471–482. doi: 10.1002/hep.26661.
- 13. Govaere O., Cockell S., Tiniakos D., Queen R., Younes R., Vacca M., Alexander L., Ravaioli F., Palmer J., Petta S., et al. Transcriptomic profiling across the nonalcoholic fatty liver disease spectrum reveals gene signatures for steatohepatitis and fibrosis. Sci. Transl. Med. 2020;12 doi: 10.1126/scitranslmed.aba4448.
- 14. Kleiner D.E., Brunt E.M., Van Natta M., Behling C., Contos M.J., Cummings O.W., Ferrell L.D., Liu Y.-C., Torbenson M.S., Unalp-Arida A., et al. Design and validation of a histological scoring system for nonalcoholic fatty liver disease. Hepatology. 2005;41:1313–1321. doi: 10.1002/hep.20701.
- 15. Brunt E.M., Kleiner D.E., Wilson L.A., Unalp A., Behling C.E., Lavine J.E., Neuschwander-Tetri B.A., NASH Clinical Research Network Portal chronic inflammation in nonalcoholic fatty liver disease (NAFLD): a histologic marker of advanced NAFLD-Clinicopathologic correlations from the nonalcoholic steatohepatitis clinical research network. Hepatology. 2009;49:809–820. doi: 10.1002/hep.22724.
- 16. Gadd V.L., Skoien R., Powell E.E., Fagan K.J., Winterford C., Horsfall L., Irvine K., Clouston A.D. The portal inflammatory infiltrate and ductular reaction in human nonalcoholic fatty liver disease. Hepatology. 2014;59:1393–1405. doi: 10.1002/hep.26937.
- 17. Taylor-Weiner A., Pokkalla H., Han L., Jia C., Huss R., Chung C., Elliott H., Glass B., Pethia K., Carrasco-Zevallos O., et al. A machine learning approach enables quantitative measurement of liver histology and disease monitoring in NASH. Hepatology. 2021;74:133–147. doi: 10.1002/hep.31750.
- 18. Noureddin M. Artificial intelligence in NASH histology: human teaches a machine for the machine to help humans. Hepatology. 2021;74:9–11. doi: 10.1002/hep.31777.
- 19. Loomba R., Noureddin M., Kowdley K.V., Kohli A., Sheikh A., Neff G., Bhandari B.R., Gunn N., Caldwell S.H., Goodman Z., et al. Combination therapies including cilofexor and firsocostat for bridging fibrosis and cirrhosis attributable to NASH. Hepatology. 2021;73:625–643. doi: 10.1002/hep.31622.
- 20. Harrison S.A., Wong V.W.-S., Okanoue T., Bzowej N., Vuppalanchi R., Younes Z., Kohli A., Sarin S., Caldwell S.H., Alkhouri N., et al. Selonsertib for patients with bridging fibrosis or compensated cirrhosis due to NASH: results from randomized phase III STELLAR trials. J. Hepatol. 2020;73:26–39. doi: 10.1016/j.jhep.2020.02.027.
- 21. Friedman J., Hastie T., Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9:432–441. doi: 10.1093/biostatistics/kxm045.
- 22. Kanehisa M., Sato Y., Kawashima M., Furumichi M., Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44:D457–D462. doi: 10.1093/nar/gkv1070.
- 23. KEGG: Kyoto Encyclopedia of Genes and Genomes. https://www.genome.jp/kegg/
- 24. Hwang W., Cho Y.-R., Zhang A., Ramanathan M. A novel functional module detection algorithm for protein-protein interaction networks. Algorithms Mol. Biol. 2006;1:24. doi: 10.1186/1748-7188-1-24.
- 25. Peri S., Navarro J.D., Amanchy R., Kristiansen T.Z., Jonnalagadda C.K., Surendranath V., Niranjan V., Muthusamy B., Gandhi T.K.B., Gronborg M., et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 2003;13:2363–2371. doi: 10.1101/gr.1680803.
- 26. Keshava Prasad T.S., Goel R., Kandasamy K., Keerthikumar S., Kumar S., Mathivanan S., Telikicherla D., Raju R., Shafreen B., Venugopal A., et al. Human protein reference database—2009 update. Nucleic Acids Res. 2009;37:D767–D772. doi: 10.1093/nar/gkn892.
- 27. Stark C., Breitkreutz B.-J., Reguly T., Boucher L., Breitkreutz A., Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006;34:D535–D539. doi: 10.1093/nar/gkj109.
- 28. Ying K.-C., Lin S.-W. Maximizing cohesion and separation for detecting protein functional modules in protein-protein interaction networks. PLoS One. 2020;15 doi: 10.1371/journal.pone.0240628.
- 29. Sherman B.T., Hao M., Qiu J., Jiao X., Baseler M.W., Lane H.C., Imamichi T., Chang W. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022;50:W216–W221. doi: 10.1093/nar/gkac194.
- 30. Huang D.W., Sherman B.T., Lempicki R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009;4:44–57. doi: 10.1038/nprot.2008.211.
- 31. DAVID Bioinformatics Resources. https://david.ncifcrf.gov/home.jsp.
- 32. Davison B.A., Harrison S.A., Cotter G., Alkhouri N., Sanyal A., Edwards C., Colca J.R., Iwashita J., Koch G.G., Dittrich H.C. Suboptimal reliability of liver biopsy evaluation has implications for randomized clinical trials. J. Hepatol. 2020;73:1322–1332. doi: 10.1016/j.jhep.2020.06.025.
- 33. Arendt B.M., Comelli E.M., Ma D.W.L., Lou W., Teterina A., Kim T., Fung S.K., Wong D.K.H., McGilvray I., Fischer S.E., Allard J.P. Altered hepatic gene expression in nonalcoholic fatty liver disease is associated with lower hepatic n-3 and n-6 polyunsaturated fatty acids. Hepatology. 2015;61:1565–1578. doi: 10.1002/hep.27695.
- 34. Gerhard G.S., Legendre C., Still C.D., Chu X., Petrick A., DiStefano J.K. Transcriptomic profiling of obesity-related nonalcoholic steatohepatitis reveals a core set of fibrosis-specific genes. J. Endocr. Soc. 2018;2:710–726. doi: 10.1210/js.2018-00122.
- 35. Haas J.T., Vonghia L., Mogilenko D.A., Verrijken A., Molendi-Coste O., Fleury S., Deprince A., Nikitin A., Woitrain E., Ducrocq-Geoffroy L., et al. Transcriptional network analysis implicates altered hepatic immune function in NASH development and resolution. Nat. Metab. 2019;1:604–614. doi: 10.1038/s42255-019-0076-1.
- 36. Teufel A., Itzel T., Erhart W., Brosch M., Wang X.Y., Kim Y.O., von Schönfels W., Herrmann A., Brückner S., Stickel F., et al. Comparison of gene expression patterns between mouse models of nonalcoholic fatty liver disease and liver tissues from patients. Gastroenterology. 2016;151:513–525.e0. doi: 10.1053/j.gastro.2016.05.051.
- 37. Angulo P., Kleiner D.E., Dam-Larsen S., Adams L.A., Bjornsson E.S., Charatcharoenwitthaya P., Mills P.R., Keach J.C., Lafferty H.D., Stahler A., et al. Liver fibrosis, but no other histologic features, is associated with long-term outcomes of patients with nonalcoholic fatty liver disease. Gastroenterology. 2015;149:389–397.e10. doi: 10.1053/j.gastro.2015.04.043.
- 38. Kamath B.M., Bason L., Piccoli D.A., Krantz I.D., Spinner N.B. Consequences of JAG1 mutations. J. Med. Genet. 2003;40:891–895. doi: 10.1136/jmg.40.12.891.
- 39. Zhu C., Kim K., Wang X., Bartolome A., Salomao M., Dongiovanni P., Meroni M., Graham M.J., Yates K.P., Diehl A.M., et al. Hepatocyte Notch activation induces liver fibrosis in nonalcoholic steatohepatitis. Sci. Transl. Med. 2018;10 doi: 10.1126/scitranslmed.aat0344.
- 40. Yu J., Zhu C., Wang X., Kim K., Bartolome A., Dongiovanni P., Yates K.P., Valenti L., Carrer M., Sadowski T., et al. Hepatocyte TLR4 triggers inter-hepatocyte Jagged1/Notch signaling to determine NASH-induced fibrosis. Sci. Transl. Med. 2021;13 doi: 10.1126/scitranslmed.abe1692.
- 41. Jachs M., Hartl L., Simbrunner B., Bauer D., Paternostro R., Scheiner B., Schwabl P., Stättermayer A.F., Pinter M., Eigenbauer E., et al. Decreasing von Willebrand factor levels upon nonselective beta blocker therapy indicate a decreased risk of further decompensation, acute-on-chronic liver failure, and death. Clin. Gastroenterol. Hepatol. 2022;20:1362–1373.e6. doi: 10.1016/j.cgh.2021.07.012.
- 42. Starlinger P., Ahn J.C., Mullan A., Gyoeri G.P., Pereyra D., Alva-Ruiz R., Hackl H., Reiberger T., Trauner M., Santol J., et al. The addition of C-reactive protein and von Willebrand factor to model for end-stage liver disease-sodium improves prediction of waitlist mortality. Hepatology. 2021;74:1533–1545. doi: 10.1002/hep.31838.
- 43. Györi G.P., Pereyra D., Rumpf B., Hackl H., Köditz C., Ortmayr G., Reiberger T., Trauner M., Berlakovich G.A., Starlinger P. The von Willebrand factor facilitates Model for End-Stage Liver Disease-independent risk stratification on the waiting list for liver transplantation. Hepatology. 2020;72:584–594. doi: 10.1002/hep.31047.
- 44. Mandorfer M., Schwabl P., Paternostro R., Pomej K., Bauer D., Thaler J., Ay C., Quehenberger P., Fritzer-Szekeres M., Peck-Radosavljevic M., et al. Von Willebrand factor indicates bacterial translocation, inflammation, and procoagulant imbalance and predicts complications independently of portal hypertension severity. Aliment. Pharmacol. Ther. 2018;47:980–988. doi: 10.1111/apt.14522.
- 45. Horvatits T., Drolz A., Roedl K., Herkner H., Ferlitsch A., Perkmann T., Müller C., Trauner M., Schenk P., Fuhrmann V. von Willebrand factor antigen for detection of hepatopulmonary syndrome in patients with cirrhosis. J. Hepatol. 2014;61:544–549. doi: 10.1016/j.jhep.2014.04.025.
- 46. Ferlitsch M., Reiberger T., Hoke M., Salzl P., Schwengerer B., Ulbrich G., Payer B.A., Trauner M., Peck-Radosavljevic M., Ferlitsch A. von Willebrand factor as new noninvasive predictor of portal hypertension, decompensation and mortality in patients with liver cirrhosis. Hepatology. 2012;56:1439–1447. doi: 10.1002/hep.25806.
- 47. La Mura V., Reverter J.C., Flores-Arroyo A., Raffa S., Reverter E., Seijo S., Abraldes J.G., Bosch J., García-Pagán J.C. Von Willebrand factor levels predict clinical outcome in patients with cirrhosis and portal hypertension. Gut. 2011;60:1133–1138. doi: 10.1136/gut.2010.235689.
- 48.Iwakiri Y. Pathophysiology of portal hypertension. Clin. Liver Dis. 2014;18:281–291. doi: 10.1016/j.cld.2013.12.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Kikuchi A., Pradhan-Sundd T., Singh S., Nagarajan S., Loizos N., Monga S.P. Platelet-derived growth factor receptor α contributes to human hepatic stellate cell proliferation and migration. Am. J. Pathol. 2017;187:2273–2287. doi: 10.1016/j.ajpath.2017.06.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Troeger J.S., Mederacke I., Gwak G.-Y., Dapito D.H., Mu X., Hsu C.C., Pradere J.-P., Friedman R.A., Schwabe R.F. Deactivation of hepatic stellate cells during liver fibrosis resolution in mice. Gastroenterology. 2012;143:1073–1083.e22. doi: 10.1053/j.gastro.2012.06.036. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Wilhelm A., Aldridge V., Haldar D., Naylor A.J., Weston C.J., Hedegaard D., Garg A., Fear J., Reynolds G.M., Croft A.P., et al. CD248/endosialin critically regulates hepatic stellate cell proliferation during chronic liver injury via a PDGF-regulated mechanism. Gut. 2016;65:1175–1185. doi: 10.1136/gutjnl-2014-308325. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Tilson S.G., Morell C.M., Lenaerts A.-S., Park S.B., Hu Z., Jenkins B., Koulman A., Liang T.J., Vallier L. Modeling PNPLA3-associated NAFLD using human-induced pluripotent stem cells. Hepatology. 2021;74:2998–3017. doi: 10.1002/hep.32063. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Gindin Y., Chung C., Jiang Z., Zhou J.Z., Xu J., Billin A.N., Myers R.P., Goodman Z., Landi A., Houghton M., et al. A fibrosis-independent hepatic transcriptomic signature identifies drivers of disease progression in primary sclerosing cholangitis. Hepatology. 2021;73:1105–1116. doi: 10.1002/hep.31488. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Patro R., Duggal G., Love M.I., Irizarry R.A., Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods. 2017;14:417–419. doi: 10.1038/nmeth.4197. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Robinson M.D., McCarthy D.J., Smyth G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Kingma D.P., Ba J. 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
- 57.Krizhevsky A., Sutskever I., Hinton G.E. In: Advances in neural information processing systems 25. Pereira F., Burges C., Bottou L., Weinberger K., editors. Curran Associates, Inc; 2012. ImageNet classification with deep convolutional neural networks; pp. 1097–1105.
- 58.He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition. Preprint at arXiv. 2015. doi: 10.48550/arXiv.1512.03385.
- 59.Sagawa S., Koh P.W., Hashimoto T.B., Liang P. Distributionally robust neural networks for group shifts: on the importance of regularization for worst-case generalization. Preprint at arXiv. 2019. doi: 10.48550/arXiv.1911.08731.
- 60.Heinze-Deml C., Meinshausen N. Conditional variance penalties and domain shift robustness. Preprint at arXiv. 2017. doi: 10.48550/arXiv.1710.11469.
- 61.Pouryahya M., Taylor-Weiner A., Pokkalla H., Pethia K., Elliott H., Glass B., Gindin Y., Han L., Jia C., Camargo M., et al. Integration of machine learning-based histopathology and hepatic transcriptomic data identifies genes associated with portal inflammation and ductular proliferation as predictors of disease progression in advanced fibrosis due to NASH. Hepatology. 2020;72:358A.
- 62.Carrasco-Zevallos O., Taylor-Weiner A., Pokkalla H., Pouryahya M., Biddle-Snead C., Han L., Huss R., Juyal D., Shanis Z., Pedawi A., et al. AI-based histologic measurement of NASH (AIM-NASH): a drug development tool for assessing clinical trial end points. J. Hepatol. 2021;75:S254.
- 63.Levina E., Bickel P. Proceedings Eighth IEEE International Conference on Computer Vision. Vol. 2. ICCV; 2001. The Earth Mover’s distance is the Mallows distance: some insights from statistics; pp. 251–256.
- 64.Huang D.W., Sherman B.T., Lempicki R.A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37:1–13. doi: 10.1093/nar/gkn923.
Data Availability Statement
- The histopathology data collected for this study is maintained by PathAI to preserve patient confidentiality and the proprietary image analysis. Access to histopathology features will be granted upon reasonable request from academic investigators without relevant conflicts of interest for non-commercial use who agree not to distribute the data. Access requests can be made to the lead contact.
- Not all original code can be made publicly available. The code for cell- and tissue-type model training, inference, and feature extraction is not disclosed. To safeguard PathAI’s intellectual property, access requests for such code will not be considered. The NASH histology cell- and tissue-type models that generated the HIFs used in this investigation are described in previous publications17,19 and in unpublished data (J.S.I. et al., unpublished data). The source code for all downstream data analyses and figure generation in this work is publicly available and can be downloaded from GitHub: https://github.com/Path-AI/nash-rna-seq-cell-reports-medicine.
- Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
- Gilead Sciences shares anonymized individual patient data upon request or as required by law or regulation with qualified external researchers based on submitted curriculum vitae and reflecting no conflict of interest. The request proposal must also include a statistician. Approval of such requests is at Gilead Sciences’ discretion and is dependent on the nature of the request, the merit of the research proposed, the availability of the data, and the intended use of the data. Data requests should be sent to the lead contact.







