Multiple mechanisms have been implicated in the contribution of the gut microbiome to the pathogenesis of nonalcoholic fatty liver disease (NAFLD). Here, we aim to investigate the contribution and potential application of altered bile acid (BA)-metabolizing microbes in NAFLD by post hoc analysis of whole metagenome sequencing (WMS) data. The discovery cohort consisted of 86 well-characterized patients with biopsy-proven NAFLD and 38 healthy controls. Assembly-based analysis was performed to identify BA-metabolizing microbes. Statistical tests, feature selection, and microbial coabundance analysis were integrated to identify microbial alterations and markers in NAFLD. An independent validation cohort was subjected to similar analyses. NAFLD microbiota exhibited decreased diversity and microbial associations. We established a classifier model with 53 differential species exhibiting a robust diagnostic accuracy [area under the receiver-operator curve (AUC) = 0.97] for detecting NAFLD. Next, eight important differential pathway markers including secondary BA biosynthesis were identified. Specifically, increased abundance of 7α-hydroxysteroid dehydrogenase (7α-HSDH), 3α-hydroxysteroid dehydrogenase (baiA), and bile acid-coenzyme A ligase (baiB) was detected in NAFLD. Furthermore, 10 of 50 BA-metabolizing metagenome-assembled genomes (MAGs) from Bacteroides ovatus and Eubacterium biforme were dominant in NAFLD and interacted as a synergistic ecological guild. Importantly, two subtypes of patients with NAFLD were observed according to secondary BA metabolism potentials. Elevated capability for secondary BA biosynthesis was also observed in the validation cohort. These bacterial BA-metabolizing genes and microbes identified in this study may serve as disease markers. Microbial differences in BA metabolism and strain-specific differences among patients highlight the potential for precision medicine in NAFLD treatment.
Keywords: gut microbiota, NAFLD, secondary BA synthesis, whole metagenome sequencing data
Nonalcoholic fatty liver disease (NAFLD) has become one of the leading causes of liver disease worldwide, with the global prevalence estimated to be 24% (1). NAFLD is expected to be the number one cause for cirrhosis in the United States within a decade (2).
The pathogenic mechanism of NAFLD remains unclear. The current multiple-hit hypothesis is that NAFLD is a consequence of a myriad of factors acting in a parallel and synergistic manner in individuals with genetic predisposition (3). Factors such as insulin resistance, central obesity, environmental or nutritional factors, and gut microbiota, as well as genetic and epigenetic factors, are linked to its pathogenesis (2, 4, 5).
Recently, the cross talk between the gut and the liver has been increasingly recognized, and many studies have reported dysregulated gut microbiota in patients with NAFLD (6–12). There are several potential mechanisms by which the gut microbiota may influence NAFLD development. These effects are mediated by microbial components and metabolites, such as lipopolysaccharide, alcohol, and bile acids (BA) (13).
BA not only facilitate the digestion and absorption of fatty foods as detergent, but they also act as important signaling molecules via nuclear receptors, such as farnesoid X receptor (FXR) and G protein-coupled BA receptor (GPBAR1 or TGR5), to modulate hepatic BA synthesis and glucose and lipid metabolism. Recently, we observed suppressed BA-mediated FXR signaling in NAFLD liver and intestine, which is in harmony with increased secondary BA production. Furthermore, using 16S rRNA data, we observed elevated abundance of secondary BA metabolism-related bacteria and pathways in the gut microbiome of both patients with NAFLD and high-fat diet rat models (14). However, the 16S rRNA sequencing data have limited resolution which does not allow the identification of the species or an accurate functional analysis (15).
Whole metagenome sequencing (WMS) allows us to achieve a satisfactory resolution of the microbiome and trace microbes with specific functions (16, 17). Earlier, we used WMS data to characterize the gut microbiota in patients with NAFLD with and without advanced fibrosis and identified 37 differential bacterial species, among which Escherichia coli and Bacteroides vulgatus were increased in abundance in patients with advanced fibrosis; the association of the gut microbiota with microbial metabolites was also examined (9, 12, 18–20). WMS data were also used to study the associations between the gut microbiome and steatosis in obesity (19, 21). However, microorganisms with the potential to metabolize BA and their contributions to the pathogenesis of NAFLD have been less investigated. Here, we took advantage of WMS data and report the structural and functional characteristics of the gut microbiome in NAFLD and its association with BA metabolism.
Data Information and Preprocessing
Discovery dataset.
The NAFLD datasets and relevant metadata (Sequence Read Archive, PRJNA373901), comprising 86 patients with biopsy-proven NAFLD, were described previously (9). All subjects provided written informed consent and the study protocol was approved by the Institutional Review Board (approval number: UCSD IRB11298). The healthy control dataset was from PRJEB6070 (22), with 38 healthy individuals with BMI < 25. These subjects were chosen because their age and sex ratio were similar to those of the patients with NAFLD, to reduce bias (23) (Table 1 and Supplemental Table S1; all Supplemental material is available at https://doi.org/10.6084/m9.figshare.14807655.v2).
Table 1.
Characteristics of the cohort included in this study
Characteristic | Discovery Cohort: NAFLD | Discovery Cohort: Control | Validation Cohort: NAFLD | Validation Cohort: Control |
Sample size | 86 | 38 | 10 | 11 |
Age, yr | 51.56 ± 1.37 | 55.71 ± 2.07 | 53.7 ± 3.65 | 56.18 ± 2.00 |
BMI, kg/m2 | 31.16 ± 0.59 | 23.03 ± 0.30 | 34.1 ± 1.2 | 23.19 ± 0.28 |
Sex, F%/M% | 44.19/55.81 | 50.00/50.00 | 20.00/80.00 | 63.63/36.36 |
AST, U/L | 41.05 ± 3.23 | NA | 30.8 ± 2.4 | NA |
LDL cholesterol, mg/dL | 111.76 ± 3.95 | NA | 112.1 ± 3.67# | NA |
HDL cholesterol, mg/dL | 48.53 ± 1.72 | NA | 43.7 ± 0.86# | NA |
Triglycerides, mg/dL | 160.30 ± 10.31 | NA | 247.9 ± 11.20 | NA |
Total cholesterol, mg/dL | 190.02 ± 4.68 | NA | 204.9 ± 3.67 | NA |
Validation dataset.
Ten middle-aged subjects with NAFLD (25) (PRJNA420817) were recruited to a diet trial, and the initial baseline data before diet intervention were selected in this study. Eleven healthy subjects from MetaHit Project (24) (PRJEB1220) with similar age and sex ratio were chosen as controls (Table 1 and Supplemental Table S1).
Although this is a post hoc analysis of public datasets, we performed consistent sequencing data preprocessing. Briefly, the KneadData (http://huttenhower.sph.harvard.edu/kneaddata) tool was used to ensure that the data consisted of high-quality microbial reads free from contaminants. In detail, low-quality reads were removed using the following settings in Trimmomatic: SLIDINGWINDOW:4:15 MINLEN:75 LEADING:10 TRAILING:10. The remaining reads were mapped to the human genome (hg38) by Bowtie2 (26), and the matching reads were removed as contaminant reads from the host.
Gene-Based Taxonomic and Functional Profiling of Gut Microbiota
MetaPhlAn2 (27) was used to identify the composition of the gut microbial community and to assess the abundance of the prokaryotes within each sample. Species that failed to exceed 0.01% relative abundance in at least 20% of samples were excluded.
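The prevalence filter described above can be illustrated with a minimal sketch. It assumes relative abundances are expressed in percent (0 to 100) and stored per species as one value per sample; the function name `filter_species` and the example profiles are hypothetical and are not taken from the study.

```python
def filter_species(abundance, min_abundance=0.01, min_prevalence=0.20):
    """Keep species exceeding `min_abundance` (percent) in at least
    `min_prevalence` (fraction) of the samples.

    abundance: dict mapping species name -> list of relative abundances,
               one value per sample (assumed to be in percent).
    """
    kept = {}
    for species, values in abundance.items():
        # Number of samples in which the species exceeds the abundance cutoff.
        n_pass = sum(1 for v in values if v > min_abundance)
        if n_pass / len(values) >= min_prevalence:
            kept[species] = values
    return kept


# Hypothetical profiles for 10 samples (values in percent).
profiles = {
    "species_a": [0.5, 0.0, 0.02, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0],    # 3/10 samples pass
    "species_b": [0.005, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # never above 0.01%
    "species_c": [0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],     # 1/10 samples pass
}
retained = filter_species(profiles)
print(sorted(retained))
```

Only `species_a` is retained here, because it is the only species above 0.01% in at least 20% of the samples.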
The functional profile of the gut microbiome was determined with the HMP Unified Metabolic Analysis Network (HUMAnN2) (28). In brief, high-quality metagenomic reads were mapped to the pangenomes of species identified with MetaPhlAn2; these pangenomes have been pre-annotated with UniRef90 families. Reads that failed to map to a pangenome were aligned to UniRef90 by translated search with DIAMOND (29). Hits to UniRef90 were weighted according to alignment quality, sequence length, and coverage. In this study, enzyme abundance was quantified by regrouping (summing) according to Enzyme Commission (EC) number, and pathway abundance by regrouping genes into pathways against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.
Identification of Genes Required for Secondary BA Synthesis
To identify genes that encode enzymes catalyzing secondary BA synthesis, hidden Markov models (HMMs) of BA-related genes were constructed. Secondary BA synthesis mainly involves 1) deconjugation, 2) oxidation and epimerization, and 3) multistep 7α-dehydroxylation. Enzymes participating in these processes are bile salt hydrolase (BSH), hydroxysteroid dehydrogenase (HSDH), and enzymes required in the multistep 7α-dehydroxylation, including the bile acid-inducible operon (bai) enzymes 3α-hydroxysteroid dehydrogenase (baiA), bile acid-coenzyme A ligase (baiB), 7α-hydroxy-3-oxo-D4-cholenoic acid oxidoreductase (baiCD), bile acid 7α-dehydratase (baiE), bile acid coenzyme A transferase/hydrolase (baiF), 7β-hydroxy-3-oxochol-24-oyl-CoA 4-desaturase (baiH), and bile acid 7β-dehydratase (baiI) (30). Representative protein sequences of target enzymes were obtained from the Integrated Microbial Genomes (IMG) database (31). High-quality sequences were selected and aligned in Clustal Omega (32) before they were used to construct HMMs on full-length proteins via hmmbuild in HMMER (3.1b2) (33). Model seed sequences were realigned to the model using hmmalign (default mode), and models were rebuilt from the resulting alignments until both model length and relative entropy per position were constant. Subsequently, all protein sequences in the nonredundant gene catalog were screened (hmmsearch) for candidate protein sequences, and sequences with an HMM score above the lower quartile score and an e-value less than 10⁻⁵ were identified as potential secondary BA synthesis-associated genes.
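The final filtering step can be sketched as follows. This is an illustration only: it assumes that the "lower quartile score" is the first quartile of the bit scores of all hits returned for a given model, which the text does not state explicitly, and the hit tuples, gene names, and function name `select_hits` are hypothetical.

```python
import statistics


def select_hits(hits, max_evalue=1e-5):
    """Return gene IDs whose score exceeds the lower quartile of all hit
    scores (assumed definition) and whose e-value is below `max_evalue`.

    hits: list of (gene_id, bit_score, e_value) tuples for one HMM.
    """
    scores = [score for _, score, _ in hits]
    # First of the three quartile cut points = lower quartile (Q1).
    lower_quartile = statistics.quantiles(scores, n=4)[0]
    return [
        gene_id
        for gene_id, score, evalue in hits
        if score > lower_quartile and evalue < max_evalue
    ]


# Hypothetical hmmsearch hits: (gene ID, bit score, e-value).
example_hits = [
    ("gene1", 10.0, 1e-2),
    ("gene2", 20.0, 1e-4),
    ("gene3", 30.0, 1e-8),
    ("gene4", 40.0, 1e-3),   # good score, but e-value too high
    ("gene5", 50.0, 1e-12),
    ("gene6", 60.0, 1e-15),
    ("gene7", 70.0, 1e-20),
    ("gene8", 80.0, 1e-30),
]
candidates = select_hits(example_hits)
print(candidates)
```

In this example the lower quartile falls between 20 and 30, so `gene1` and `gene2` fail the score criterion and `gene4` fails the e-value criterion.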
Assembly-Based Microbial Genomes
For functional analysis of the microbial genomes, we performed bin-based microbial genome assembly with the WMS data, including de novo assembly and nonredundant human gut gene catalog construction, coabundance clustering, and determination of metagenome-assembled genomes (MAGs), MAG-augmented assembly, and taxonomic annotation.
De novo assembly and nonredundant human gut gene catalog construction.
High-quality paired-end reads from each sample were used for de novo assembly with Megahit (34) into contigs of at least 500-bp length. Genes were predicted on the contigs with MetaGeneMark (35). A nonredundant gene catalog related to NAFLD was constructed with CD-HIT (36) using a sequence identity cutoff of 0.95 and a minimum coverage cutoff of 0.9 for the shorter sequences; the resulting catalog contained 11,348,567 microbial genes.
Coabundance clustering and determination of MAG.
Bowtie2 was used to align high-quality reads to the nonredundant gene catalog. To adjust for sequencing depth and technical variability, aligned results were randomly sampled and downsized to 15 million per sample (FR-173, FR-719, FR-730, SRR4275396, SRR4275459, SRR4275469, and SRR4275470 were excluded because they had too few reads). The soap.coverage script (available at http://soap.genomics.org.cn/down/soap.coverage.tar.gz) was used to calculate gene-length normalized base counts, and the gene abundance profile was calculated as the average abundance over 30 rounds of repeated sampling. All genes were clustered into MAGs using MSPminer (37) based on their abundance with default parameters.
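The downsizing-and-averaging idea can be sketched in a few lines. This is a simplified illustration, not the soap.coverage procedure: it counts reads per gene rather than aligned bases, and the function name, gene names, read list, and gene lengths are all hypothetical.

```python
import random
from collections import Counter


def rarefied_gene_abundance(read_gene_ids, gene_lengths, depth, repeats=30, seed=0):
    """Average gene-length normalized counts over repeated random subsampling.

    read_gene_ids: list with one gene ID per aligned read.
    gene_lengths:  dict mapping gene ID -> gene length (bp).
    depth:         number of reads drawn (without replacement) per repeat.
    Returns None when the sample has fewer reads than `depth`, mirroring
    the exclusion of shallow samples.
    """
    if len(read_gene_ids) < depth:
        return None
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = {gene: 0.0 for gene in gene_lengths}
    for _ in range(repeats):
        counts = Counter(rng.sample(read_gene_ids, depth))
        for gene, n_reads in counts.items():
            totals[gene] += n_reads / gene_lengths[gene]
    return {gene: total / repeats for gene, total in totals.items()}


# Hypothetical sample: 100 aligned reads over two genes.
reads = ["geneA"] * 60 + ["geneB"] * 40
lengths = {"geneA": 1000, "geneB": 500}

full_depth = rarefied_gene_abundance(reads, lengths, depth=100)
half_depth = rarefied_gene_abundance(reads, lengths, depth=50)
too_shallow = rarefied_gene_abundance(reads, lengths, depth=150)
print(full_depth, half_depth, too_shallow)
```

Averaging over repeated draws reduces the noise introduced by any single random subsample, which is presumably why 30 repetitions were used.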
MAG-augmented assembly and taxonomic annotation.
We performed augmented assembly for target MAG. Briefly, the MAG- and sample-specific reads were derived by aligning all high-quality reads to the MAG gene contigs with Burrows-Wheeler Aligner (0.7.17) (38), followed by de novo assembly with SPAdes (3.13.0) (39) using k-mers from 21 to 55. CVtree3.0 web server (40) was used to identify the taxonomy of the MAGs, which applies a composition vector to perform phylogenetic analysis.
Statistical Analysis
Differential feature identification and classification model construction.
Compositional and functional features that were present in at least 20% of the samples and had an average relative abundance over 0.01% in each group were selected for differential analysis. Differential features were identified by two-tailed Mann–Whitney U tests adjusted by the Benjamini–Hochberg procedure. Features with a false discovery rate (FDR) value < 0.05 (FDR values < 0.1 for species) were identified as differential features. Differential compositional and functional feature profiles were then used to build random forest (RF) models, a method that has been shown to outperform other learning algorithms for microbiome data (41), using Scikit-Learn v.0.19.0 in Python (v.3.7.4). Fivefold cross-validation was performed to limit overfitting, and we used an ensemble of 501 estimator trees with Gini impurity to evaluate the quality of a split at each node of a tree. Considering the imbalanced sample sizes of the control and NAFLD groups, the parameter “class_weight=‘balanced’” was set. Feature importance was estimated via Gini importance, and the best models were then rebuilt by adding features according to their importance ranks. Area under the receiver-operator curve (AUC) was used to measure the accuracy of the models.
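For readers unfamiliar with the multiple-testing correction, the standard Benjamini–Hochberg step-up adjustment can be written in a few lines of plain Python. This is a generic sketch of the procedure, not the authors' code; the software they used for this step is not stated, and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR) in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values from four Mann-Whitney U tests.
raw_p = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw_p)

significant_strict = [i for i, q in enumerate(fdr) if q < 0.05]   # threshold for pathways and genes
significant_species = [i for i, q in enumerate(fdr) if q < 0.10]  # looser threshold used for species
print(fdr, significant_strict, significant_species)
```

With these values, one feature passes the FDR < 0.05 threshold, whereas three pass the FDR < 0.1 threshold applied to species.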
Microbial coabundance analysis.
SparCC (42), which is capable of estimating correlation values from compositional data, was used to construct compositionality-corrected microbial coabundance networks. Correlations were calculated with 100 refining iterations, after which the statistical significance of each correlation was estimated with 1,000 permutations. Only correlations with P < 0.05 were included in downstream analysis, and those with magnitudes >0.4 were included in the “core community.” The importance of species in the community was calculated using the Hyperlink-Induced Topic Search (HITS) algorithm in the Python package “networkx.” The networks were then visualized with Cytoscape (43), and module analysis was performed with ModuLand in Cytoscape.
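The two-stage edge selection can be sketched as below, assuming the SparCC output has already been summarized as one (species, species, correlation, P value) tuple per pair. The function names and the example edges are hypothetical, and the HITS scoring step is not reproduced here.

```python
def build_networks(correlations, p_cutoff=0.05, core_magnitude=0.4):
    """Split correlation edges into the full network and the core community.

    correlations: list of (species_a, species_b, correlation, p_value).
    Returns (significant_edges, core_edges); the core keeps significant
    edges whose absolute correlation exceeds `core_magnitude`.
    """
    significant = [edge for edge in correlations if edge[3] < p_cutoff]
    core = [edge for edge in significant if abs(edge[2]) > core_magnitude]
    return significant, core


def nodes_of(edges):
    """Return the set of species that appear in at least one edge."""
    return {name for a, b, _, _ in edges for name in (a, b)}


# Hypothetical pairwise results.
example_edges = [
    ("sp1", "sp2", 0.62, 0.001),   # significant and strong: core
    ("sp2", "sp3", -0.45, 0.010),  # strong negative correlation: core
    ("sp3", "sp4", 0.25, 0.030),   # significant but weak: network only
    ("sp1", "sp4", 0.70, 0.200),   # strong but not significant: dropped
]
network_edges, core_edges = build_networks(example_edges)
print(len(network_edges), len(core_edges), sorted(nodes_of(core_edges)))
```

Using the absolute value keeps strong negative associations in the core community, which matches the use of correlation "magnitudes" in the text.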
Other statistics.
Analysis of similarities (ANOSIM) was performed based on a distance matrix for statistical comparisons of samples between groups or subtypes. The P value was calculated using 9,999 permutations, and P < 0.05 was considered to indicate a significant difference. The heatmap was plotted with the “pheatmap” package in R, and features were clustered based on Euclidean distance using the “ward.D” method. Differential features among the healthy, normal-BA, and high-BA groups were identified with Dunn’s tests adjusted by Benjamini–Hochberg, and features with FDR values <0.05 were considered significantly different.
The main workflow of this study is summarized in Supplemental Fig. S1.
Gut Microbiota Alterations between Patients with NAFLD and Healthy Controls
WMS data from 86 well-characterized patients with biopsy-proven NAFLD and 38 healthy controls with similar characteristics (Table 1 and Supplemental Table S1) were chosen to study the structural and functional differences in gut microbiota between patients with NAFLD and healthy controls. We confirmed that sex and age distributions did not account for the observed microbial differences in this study (Supplemental Fig. S2).
Compositional Changes in NAFLD Gut Microbiota
We determined the microbial compositions of NAFLD and healthy controls using WMS data. Bacteroidetes, Firmicutes, Actinobacteria, and Proteobacteria were the dominant phyla, collectively accounting for around 90% of the community in both groups (Supplemental Fig. S3A). NAFLD individuals had lower bacterial diversity than healthy controls (Supplemental Fig. S3B). In addition, significant compositional differences were observed between the two groups (Supplemental Fig. S3C).
To identify microbial markers that may distinguish NAFLD from healthy subjects, differential species were determined with Mann–Whitney U tests. Fifty-three species with FDR values < 0.1 were identified as differential species (Fig. 1 and Supplemental Table S2). Among these, 11 species were enriched in patients with NAFLD. They mainly belonged to the Clostridia class, including Eubacterium siraeum and Clostridium bolteae, and to the Bacteroidia class, including B. ovatus and Bacteroides stercoris; E. coli was also enriched. On the other hand, the 42 species significantly reduced in patients with NAFLD were mainly of the Bacteroidia class, including Bacteroides dorei and Alistipes shahii, and of the Clostridia class, including Eubacterium eligens, Eubacterium hallii, and Faecalibacterium prausnitzii. In addition, the RF model constructed with differential species achieved an AUC of 0.97 (accuracy: 0.91; sensitivity: 0.98; specificity: 0.75; precision: 0.89, and F1 score: 0.94) for distinguishing patients with NAFLD from controls (Supplemental Fig. S4A). The RF model was further validated with an additional independent dataset, achieving an AUC of 0.96 (accuracy: 0.92; sensitivity: 0.90; specificity: 0.93; precision: 0.90, and F1 score: 0.90; Supplemental Fig. S4B).
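For reference, the performance measures reported alongside the AUC follow from a confusion matrix in which NAFLD is treated as the positive class. The sketch below uses the standard definitions; the counts are hypothetical and are not the study's confusion matrix.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics.

    tp: NAFLD samples classified as NAFLD     fn: NAFLD classified as control
    tn: controls classified as control        fp: controls classified as NAFLD
    """
    sensitivity = tp / (tp + fn)               # recall for the NAFLD class
    specificity = tn / (tn + fp)               # recall for the control class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fn=10, tn=45, fp=5)
print({name: round(value, 3) for name, value in metrics.items()})
```

A model can combine high sensitivity with lower specificity, as in the discovery cohort, when the positive class is larger than the control class.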
Figure 1.
The differential species distinguishing patients with NAFLD from healthy controls. Differential species were selected by statistical tests (two-tailed Mann–Whitney U tests adjusted by Benjamini–Hochberg). Furthermore, the importance of the species that distinguish patients with NAFLD from healthy controls was evaluated with random forest model. The heatmap shows the relative abundance (log-transformed) of the differential species in the NAFLD and the healthy groups, the size of the dots is proportional to the importance and the color shows the FDR value (−log-transformed). “+” indicates increased abundance, whereas “−” indicates decreased abundance in NAFLD. FDR, false discovery rate; NAFLD, nonalcoholic fatty liver disease.
Ecological Structural Changes in NAFLD Gut Microbiota
Furthermore, at the whole-community level, microbial coabundance analysis was performed to investigate potential changes in ecological structure. Overall, the healthy microbial community based on 38 individuals was composed of 167 nodes and 1,613 edges, whereas the NAFLD microbial community based on 86 individuals was composed of 141 nodes and 1,776 edges. We then examined the “core community” (correlations with magnitudes > 0.4) of the healthy and NAFLD groups separately. Considerable discrepancies existed between the “core communities” of the healthy and NAFLD groups (Fig. 2, A and B). In detail, the healthy “core community” was more complex, with 162 species and 565 interactions, compared with the NAFLD community, with 81 species and 166 interactions. Since the sample size was smaller in the healthy control group than in the NAFLD group, the complexity of the gut microbiota in healthy controls was likely underestimated. In addition, the NAFLD community was separated into eight isolated components, an indication of an unstable microbial community. Among them, the major component harbored most species from the Clostridia class, such as the BA-producing bacteria C. bolteae (node number 78) and C. clostridioforme (node number 138), which had increased proportions in NAFLD, whereas species from the Bacilli class were dominant in the second major component. Moreover, species with increased abundance in patients with NAFLD (circle nodes in Fig. 2B) were dominant in the “core community” and positively associated with each other. We then looked into the top 20 hub species of each “core community.” Ten of them were common to both groups, such as C. bolteae, C. hathewayi, Dorea longicatena, and Flavonifractor plautii, which may act as “keystone” species that sustain homeostasis (Fig. 2, C and D).
Figure 2.
Microbiota “core community” in healthy controls (A and C) and patients with NAFLD (B and D). The microbial interactions were calculated using SparCC with 100 refining iterations, and the P value of each interaction was approximated with 1,000 permutations. Only interactions with P < 0.05 and magnitudes >0.4 were included in the “core community.” The species are color coded according to the class they belong to, and the node size indicates the hub score in their community. Subnetworks of the top 20 hub nodes in the healthy community (C) and the NAFLD community (D) are also plotted. The nodes labeled with species names were common to both subnetworks. NAFLD, nonalcoholic fatty liver disease.
Functional Changes in NAFLD Gut Microbiota
Microbial functional profiles were determined at the pathway level using HUMAnN2, and 92 differential pathways were identified between the NAFLD and the healthy groups (Supplemental Table S3). Similarly, we identified eight important pathway features (Fig. 3A) to build an RF model (AUC = 0.83; accuracy: 0.82; sensitivity: 0.93; specificity: 0.58; precision: 0.83, and F1 score: 0.88) that could distinguish patients with NAFLD from healthy subjects (Fig. 3B). Most pathways were more abundant in NAFLD microbiota than in controls. These pathways included secondary BA synthesis (ko00121; Fig. 3C), benzoate degradation (ko00362), biosynthesis of ansamycins (ko01051), and oxidative phosphorylation (ko00190; Supplemental Fig. S5).
Figure 3.
The differential pathway markers distinguishing patients with NAFLD from healthy controls. Differential pathways were selected by two-tailed Mann–Whitney U tests adjusted by Benjamini–Hochberg. Pathways with FDR values < 0.05 were included. Important differential pathway markers were then identified with a random forest model; the model achieved the highest AUC value with the top eight important pathways. A: the importance of pathways evaluated in NAFLD with the random forest model. B: the AUC curve of the random forest model with the top eight important pathways. C: the abundance of the secondary BA biosynthesis pathway (ko00121) in the healthy and the NAFLD groups. Values are means ± SD. *FDR < 0.05. AUC, area under the receiver-operator curve; FDR, false discovery rate; NAFLD, nonalcoholic fatty liver disease.
Microbial Genes and Genomes Associated with Secondary BA Synthesis
The fact that the secondary BA biosynthesis pathway was significantly elevated in NAFLD (Fig. 3C) prompted us to examine the relevant BA metabolism enzymes encoded by the microbiome. Taking advantage of the WMS data, we were able to quantify the gene abundance and to map these genes to specific microbial genomes.
Bacterial genes directly involved in secondary BA synthesis catalyze the deconjugation, the oxidation and epimerization, or the multistep 7α-dehydroxylation reactions (Fig. 4A). Protein sequences of target enzymes were collected from Integrated Microbial Genomes (IMG) database (Fig. 4A) (30). High-quality protein sequences were selected to construct HMMs to identify potential BA metabolism enzymes.
Figure 4.
The abundance of the bacterial genes related to secondary bile acid synthesis. A: genes responsible for secondary bile acid biosynthesis can be grouped into three categories: 1) deconjugation, 2) oxidation and epimerization, and 3) multistep 7α-dehydroxylation. B: gene abundance in healthy and NAFLD groups. Differences were identified by two-tailed Mann–Whitney U tests adjusted by Benjamini–Hochberg. ***FDR < 0.001. baiA, 3α-hydroxysteroid dehydrogenase; baiB, bile acid-coenzyme A ligase; baiCD, 7α-hydroxy-3-oxo-D4-cholenoic acid oxidoreductase; baiE, bile acid 7α-dehydratase; baiF, bile acid coenzyme A transferase/hydrolase; baiG, primary bile acid transporter; baiH, 7β-hydroxy-3-oxochol-24-oyl-CoA 4-desaturase; baiI, bile acid 7β-dehydratase; BSH, bile salt hydrolase; FDR, false discovery rate; HSDH, hydroxysteroid dehydrogenase; NAFLD, nonalcoholic fatty liver disease.
The data (Fig. 4B) showed that genes encoding 7α-hydroxysteroid dehydrogenase (7α-HSDH), BSH, baiA, baiB, baiCD, and baiH were relatively more abundant than baiE, baiF, and baiI. Importantly, significantly increased abundance of 7α-HSDH, baiA, and baiB was observed in the NAFLD group compared with controls. These data were consistent with the pathway analysis results and with the increased secondary BA production previously reported in NAFLD (14).
To identify BA-metabolizing microbial genomes, MAG analysis was performed. Prevalent genes in the nonredundant gene catalog that were present in more than five samples were binned into 252 MAGs, which were considered to represent distinct microbial genomes. Among these, 50 MAGs that contained at least one gene encoding BSH, HSDH, or bile acid-inducible operon enzymes (Supplemental Table S4) were defined as BA-metabolizing MAGs. To obtain relatively complete microbial genomes, we reassembled these 50 MAGs using high-quality reads mapped to the genes in each MAG.
Among these, 10 MAGs exhibited significantly increased abundance in NAFLD, whereas 18 MAGs were reduced in NAFLD (Fig. 5A). Among the 10 MAGs elevated in NAFLD, six MAGs belonged to Bacteroides (order Bacteroidales), including B. vulgatus, B. ovatus, and B. stercoris. The other MAGs were assigned to Eubacterium rectale and E. biforme (order Clostridiales). BA-metabolizing MAGs with reduced abundance in NAFLD were mainly from Ruminococcus bromii, D. longicatena, and B. dorei. Furthermore, we explored species’ contributions to pathways via HUMAnN2 and found that the secondary bile acid biosynthesis pathway was mainly encoded by E. eligens (48.3%) and B. vulgatus (26.2%; Supplemental Fig. S6). This is consistent with the increased BA-metabolizing MAGs belonging to species B. vulgatus and E. eligens.
Figure 5.
BA metabolism MAGs in NAFLD and healthy subjects. A: MAGs exhibiting differential abundance between healthy controls and patients with NAFLD. Differential MAGs were selected by two-tailed Mann–Whitney U tests adjusted by Benjamini–Hochberg. MAGs with FDR values < 0.1 were included. Values are means ± SE. Interaction network for the BA metabolism MAG community in healthy controls (B) and patients with NAFLD (C). Microbial interactions were calculated using SparCC with 100 refining iterations, and the P value of each interaction was approximated with 1,000 permutations. Only interactions with P < 0.05 were included. BA, bile acids; FDR, false discovery rate; MAG, metagenome-assembled genome; NAFLD, nonalcoholic fatty liver disease.
For a better understanding of the BA metabolism microbial community, microbial coabundance analysis was performed with BA-metabolizing MAGs. In contrast to the whole-community level, where more interactions existed in the healthy group, the subnetwork of BA-metabolizing MAGs was more complex in NAFLD than in controls (164 and 100 edges, respectively; Fig. 5, B and C). In addition, most MAGs with higher proportions in patients with NAFLD were hub nodes in both the healthy and NAFLD BA-metabolizing communities and interacted positively with each other, such as Bacteroides sp. MAG001, B. vulgatus MAG007, B. ovatus MAG026, B. vulgatus MAG030, and Bacteroides xylanisolvens MAG117. These are likely “house-keeping” species for BA metabolism. In contrast, B. stercoris MAG003, a MAG not included in the healthy network, was highly elevated in NAFLD, ranked high in the NAFLD network, and interacted positively with the “house-keeping” BA metabolism species. Similarly, E. biforme MAG036 and MAG089, which exhibited the lowest hub scores in the healthy network, ranked the highest in the NAFLD network.
In general, the observed species were represented by multiple MAGs. Here, R. bromii was represented by seven MAGs, and E. eligens was represented by five MAGs. However, only one of the seven R. bromii MAGs was significantly increased in NAFLD group, whereas four others showed decreased abundance (Supplemental Table S5). Situations were similar in B. vulgatus (two of three increased) and E. rectale (one increased and two decreased). Unexpectedly, multiple MAGs of the same species were distributed in different modules both in healthy and NAFLD communities (Supplemental Table S6). Apparently, these observations indicate that strains within the same species may function differently.
Different BA Metabolism Potentials among NAFLD Microbiota and Emergence of Two Subtypes of NAFLD: High-BA versus Normal BA Subtype
Although the average abundances of the secondary BA metabolism pathway and related genes were increased in NAFLD, we noticed that the abundances exhibited a broad distribution among patients with NAFLD (Fig. 3C and Fig. 4B). Many of the NAFLD microbiota exhibited BA metabolism potentials similar to those of healthy controls. Based on the abundance of three differential BA-metabolizing genes (7α-HSDH, baiA, and baiB), patients with NAFLD were clustered into two subtypes: a normal-BA subtype comprising 45 patients and a high-BA subtype comprising 37 patients (Fig. 6A); subtype membership was not related to disease severity (P = 0.7). The abundances of the three marker genes were all significantly higher in the high-BA subtype but were similar between the normal-BA subtype and the healthy control group (Fig. 6B). In addition, we performed principal component analysis (PCA) based on all differential microbial enzymes and found that the normal-BA subtype was closer to the healthy control group than the high-BA subtype was (Supplemental Fig. S7). In further characterizing the microbial profiles of the normal-BA and high-BA groups, we identified three species (Supplemental Table S7), 68 enzymes (Supplemental Table S8), and 16 pathways (Supplemental Table S9) that could distinguish the normal-BA subtype from the high-BA subtype and, at the same time, could distinguish NAFLD from the healthy group. Based on the relative abundance of these differential features, the study subjects were clustered into three groups consistent with their BA metabolism potentials. Features were also clustered into two groups (Supplemental Fig. S8). One group (including the species Flavonifractor plautii, the enzymes 2-dehydropantoate 2-reductase and glutamate 5-kinase, and the pathway glycosaminoglycan degradation, etc.) exhibited elevated abundance in the normal-BA subtype and reduced abundance in the high-BA subtype. The other group (including the species E. coli and R. bromii, the enzymes glycerol dehydrogenase and agmatinase, and the pathways citrate cycle and phosphotransferase system, etc.) exhibited the opposite distribution among the study groups.
Figure 6.
Subgroups of patients with NAFLD with different abundances of the secondary BA synthesis genes. A: patients with NAFLD were clustered into two subgroups, a normal-BA subgroup and a high-BA subgroup, according to the abundances of three differential secondary BA synthesis genes. B: comparison of the abundances of three differential secondary BA synthesis genes among the healthy control, normal-BA, and high-BA groups. All three genes were significantly increased in the high-BA subgroup but were not different between the normal-BA subgroup and the healthy group (Dunn’s tests adjusted by Benjamini–Hochberg). ***FDR < 0.001. BA, bile acids; baiA, 3α-hydroxysteroid dehydrogenase; baiB, bile acid-coenzyme A ligase; FDR, false discovery rate; NAFLD, nonalcoholic fatty liver disease; ns, not significant; 7α-HSDH, 7α-hydroxysteroid dehydrogenase.
Elevated Secondary BA Synthesis Capability in the Validation Cohort of NAFLD
Similar analyses were performed with the validation dataset. The secondary BA synthesis genes 7α-HSDH, BSH, baiA, baiB, baiCD, baiF, and baiH were relatively more abundant than baiE and baiI. Importantly, significantly increased abundance of most secondary BA synthesis genes was observed in NAFLD compared with controls (Supplemental Fig. S9).
For BA-metabolizing microbial genomes, we identified 13 MAGs, each carrying at least one gene encoding BSH, HSDH, or bai operon enzymes. Among these, nine MAGs exhibited a trend of increased abundance in NAFLD. Consistent with the discovery cohort, these nine MAGs belonged to B. vulgatus and R. bromii (Supplemental Table S10). Statistical significance was not achieved for the increased abundances of the MAGs, likely due to the small sample size.
In this study, we identified the structural and functional differences in gut microbiota between NAFLD and healthy subjects, at the resolutions of gene, species, and strain. The current study using public WMS data underpinned the role of BA metabolism microbiome in NAFLD and potentially identified two microbiota-derived subtypes of NAFLD.
Mouse studies and fecal microbiota transplantation (FMT) experiments have provided evidence for a causal role of gut microbiota in NAFLD development (44). Human studies have compared gut microbiota communities among patients with NAFLD, NASH, or NAFLD with cirrhosis and controls to explore microbial signatures as noninvasive diagnostic tools (9, 12, 44). An earlier characterization of the gut microbiota in patients with NAFLD with and without advanced fibrosis identified 37 differential bacterial species (9). Oh et al. (12) identified 19 discriminatory species that could accurately predict NAFLD with cirrhosis with an AUC of 0.91. Here, we established a classifier model with 53 differential species exhibiting a robust diagnostic accuracy (AUC = 0.97) for detecting NAFLD.
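For readers less familiar with the AUC values quoted above, the following minimal sketch shows how an AUC can be computed from classifier scores as the probability that a randomly chosen patient scores higher than a randomly chosen control. The function name, scores, and labels are hypothetical and are not data from this study or from the cited classifiers.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half, which matches the Mann-Whitney U formulation.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores (1 = NAFLD, 0 = healthy control).
scores = [0.91, 0.85, 0.40, 0.77, 0.35, 0.20]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 8 of 9 patient-control pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values of 0.91 and 0.97 should be read.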
Several hypotheses have provided mechanistic insights into the role of microbiota-related metabolites in NAFLD development, including lipopolysaccharide, alcohol, and BAs (13, 20, 45). We and others have reported elevated secondary BA production in NAFLD (14, 46). In our previous study (14), we observed markedly increased secondary BAs in NAFLD serum and, consistently, an elevated taurine-metabolizing microbiota, an indication of increased BA metabolism in the gut. However, we did not observe any significant change in the abundance of the microbes that directly metabolize BA (i.e., microbes encoding BSH, 7α-HSDH, and 7α-dehydroxylase), likely because the 16S rRNA sequencing approach could not provide sufficient resolution for functional analysis (17). With the advantage of WMS data, the current study was able to provide evidence at the resolution of species and strain (16, 17, 47). Importantly, increased abundance of secondary BA metabolism genes and microbes was identified in NAFLD (Figs. 4 and 5), and similar observations were made with an independent validation cohort (Supplemental Fig. S9). As secondary BAs are potent antagonistic ligands for FXR, the data presented here strongly support the hypothesis that elevated secondary BA synthesis by the microbiota contributes to NAFLD etiology (14, 48).
Although patients with NAFLD on average exhibited an elevated BA-metabolizing microbiota and higher serum deoxycholic acid (DCA; a secondary BA) compared with healthy controls, our data showed that an elevated BA-metabolizing microbiota was not a universal phenomenon in NAFLD. More than half of the patients with NAFLD (45 out of 82) had a microbiota with normal BA metabolism potential. Based on BA metabolism potentials, our patients with NAFLD can be clustered into two subtypes (Fig. 6). This indicates that the BA-related pathomechanism does not apply to many patients with NAFLD, in line with the current multihit hypothesis (3). Besides the difference in BA metabolism potentials, these two subtypes of the gut microbiota also exhibit different abundances of other genes, pathways, and bacterial species. It is interesting to note that NAFLD microbiota with higher BA metabolism potentials also exhibited elevated representation of E. coli, a potent alcohol producer (6, 49), suggesting that the gut microbiota may impact NAFLD pathogenesis through multiple mechanisms in the same patient. BA-based therapies such as obeticholic acid (OCA) have been shown to improve NASH (50). However, in the FLINT trial, the response rate to OCA, defined as improvement of one stage of fibrosis, was 35% versus 19% with placebo (51). It is plausible that patients with NAFLD with the high-BA subtype are more likely to respond to BA-based therapies, whereas those with the normal-BA subtype should receive an alternate strategy, paving the way for a microbiome-based precision medicine tool in NASH therapeutics. Because this is a proof-of-concept study based only on metagenome data, more evidence from larger samples and metabolic data is needed to validate the BA-derived NAFLD subtypes in clinical practice.
One interesting observation in this study and others (16) is that many strains of the same species are functionally different. Specifically, different strains of B. ovatus were clustered into different functional modules (modules 0, 2, 4 in healthy communities and modules 3, 4, 6 in NAFLD communities; Supplemental Table S6). It is also interesting to note that only one of the four observed strains of B. ovatus was significantly increased in the NAFLD group. Similar observations were reported for F. prausnitzii (16, 52) and E. coli (53, 54), suggesting genomic variability within a microbial species (55). Some microbiome studies based on 16S rRNA platforms may need reevaluation because of this genomic variability.
It was interesting to note that 10 BA-metabolizing bacterial strains, including B. stercoris, E. biforme, and R. bromii, were elevated and were dominant strains in NAFLD microbiota (Fig. 5A). These BA-metabolizing strains belong to two different phyla. Zhao et al. (16) proposed that, in the gut microbiota, a group of species that “exploit the same class of environmental resources in a similar way” may be considered a “guild” in the ecological sense (56). Members of a guild do not necessarily share taxonomic similarity, but they co-occur when adapting to the changing environment. Similarly, the 10 BA-metabolizing strains may act as a synergetic guild to promote secondary BA production in the NAFLD microbial community. There were more positive correlations among these 10 strains in the NAFLD community than in the healthy community (Fig. 5, B and C), indicating elevated capabilities of secondary BA production and intensified competition among these secondary BA producers within the microbial guild of NAFLD. It is likely that these strains are responsible for elevated secondary BA production in NAFLD, contributing to NAFLD pathogenesis (14). Among these 10 strains, MAG036, MAG089, and MAG003, which had increased abundance and the highest network importance in NAFLD, may act as “keystone” species (57) and therefore represent potential targets for intervention.
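The comparison of positive correlations between communities can be illustrated with a minimal coabundance sketch. It assumes Spearman rank correlation and an arbitrary cutoff of 0.6, neither of which is stated in this section, and it uses hypothetical strain names and abundances rather than the MAG data from this study.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for constant input; treated as 0
    return cov / (vx * vy) ** 0.5


def positive_edges(abundance, threshold=0.6):
    """List strain pairs whose Spearman correlation meets the cutoff."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if rho >= threshold:
            edges.append((a, b, rho))
    return edges


# Hypothetical per-sample abundances for three strains in one community.
community = {
    "strain_A": [1, 2, 3, 4, 5],
    "strain_B": [2, 3, 5, 7, 11],
    "strain_C": [5, 4, 3, 2, 1],
}
edges = positive_edges(community)  # only strain_A and strain_B co-vary
```

Running the same function on the NAFLD and healthy abundance tables separately and comparing the number of returned pairs mirrors the kind of comparison described above.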
At the whole community level, the NAFLD gut microbiota exhibited significantly reduced diversity compared with that of healthy controls. In addition, far fewer correlations among the members of the NAFLD gut microbiota were observed. With fewer strains and sparser correlations, the gut microbial community in NAFLD is relatively weak and unstable. Similarly, reduced biodiversity was reported in the gut of individuals with obesity (58). It is postulated that long-term dietary habit is the major cause of the altered gut microbiota (59). The biodiversity disaster in the human gut demands immediate attention. Restoration of gut microbial diversity may, at the same time, prevent or cure many microbiota-related diseases, including NAFLD.
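As an illustration of what reduced diversity means numerically, the sketch below computes the Shannon index, one common alpha-diversity measure. This section does not state which diversity metric was used, so the choice of index is an assumption, and the counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical communities with four taxa each (illustrative counts only).
even = shannon_index([25, 25, 25, 25])  # evenly distributed community
skewed = shannon_index([97, 1, 1, 1])   # community dominated by one taxon
```

The evenly distributed community reaches the maximum value for four taxa, ln 4, whereas the community dominated by a single taxon scores much lower despite containing the same number of taxa.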
Our retrospective study has limitations. Because the current study is a post hoc analysis of public WMS data with a small sample size and limited patient information, the influence of demographic and environmental variables, such as accompanying obesity and diabetes, on the microbiome remains to be explored. In addition, for ethical reasons, biopsy-proven healthy liver controls were not available. Instead, self-reported healthy volunteers were included as controls. Thus, the causality of microbiota-mediated secondary BA biosynthesis in NAFLD pathogenesis needs further investigation. For similar reasons, additional validation is needed for the proposed use of the microbiome signature for NAFLD diagnosis.
In summary, we identified specific genes and bacterial strains responsible for elevated secondary BA production in NAFLD. These genes and strains may serve as novel therapeutic targets for the microbiome-based high-BA subtype of NAFLD. Our study also revealed other microbial characteristics of NAFLD that demand attention, such as the markedly reduced diversity and the ecological guild in the gut of patients with NAFLD. These findings strongly support our hypothesis that elevated secondary BA synthesis contributes to the development of NAFLD.
The datasets supporting the conclusions of this article are available in the NCBI’s Sequence Read Archive repository (https://www.ncbi.nlm.nih.gov/bioproject/), under study accession numbers PRJNA373901, PRJNA420817, PRJEB1220, and PRJEB6070.
Supplemental Tables S1–S10 and Supplemental Figs. S1–S9: https://doi.org/10.6084/m9.figshare.14807655.v2.
This work was supported by National Natural Science Foundation of China 81774152 (to R.Z.), 81770571 (to L.Z.), 82000536 (to N.J.); National Postdoctoral Program for Innovative Talents of China BX20190393 (to N.J.); China Postdoctoral Science Foundation 2019M663252 (to N.J.) and 2019M651568 (to D.W.); Fundamental Research Funds for the Central Universities 19ykzd01 (to L.Z.) and 20kypy07 (to N.J.); the Guangzhou Science and Technology Plan Projects 201803040019 (to P.L.); Guangdong Province “Pearl River Talent Plan” Innovation and Entrepreneurship Team Project 2019ZT08Y464 (to L.Z.); and the National Key Clinical Discipline of China; and Funds from the University at Buffalo Community of Excellence in Genome, Environment and Microbiome (GEM) (to L.Z.). R.L. receives funding support from National Institute of Environmental Health Sciences (5P42ES010337), National Center for Advancing Translational Sciences (5UL1TR001442), and National Institute of Diabetes and Digestive and Kidney Diseases (R01DK106419).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
No conflicts of interest, financial or otherwise, are declared by the authors.
R.L., R.Z., and L.Z. conceived and designed research; N.J., R.L., Z.-H.Y., D.W., S.F., and R.B. analyzed data; N.J., D.W., R.Z., and L.Z. interpreted results of experiments; N.J. and D.W. prepared figures; N.J. and L.Z. drafted manuscript; N.J., R.L., Z.-H.Y., D.W., S.F., P.L., R.Z., and L.Z. edited and revised manuscript; N.J., R.L., Z.-H.Y., D.W., S.F., R.B., P.L., R.Z., and L.Z. approved final version of manuscript.