Analysis of machine learning based integration to identify the crosslink between inflammation and immune response in non-alcoholic fatty liver disease through bioinformatic analysis

Runzhi Yu; Yiqin Huang; Xiaona Hu; Jie Chen

doi:10.1016/j.heliyon.2024.e32783

Heliyon. 2024 Jun 29;10(14):e32783. doi: 10.1016/j.heliyon.2024.e32783

PMCID: PMC11301246 PMID: 39108890

Abstract

Background

Nonalcoholic fatty liver disease (NAFLD) is a major form of chronic liver disease. This study aimed to identify diagnostic biomarkers of NAFLD and examine their correlation with the immune microenvironment through bioinformatic analysis.

Methods

To identify genes associated with NAFLD, we obtained microarray datasets (GSE63067 and GSE89632) from the Gene Expression Omnibus (GEO) database. Machine learning techniques, namely Support Vector Machine (SVM), Least Absolute Shrinkage and Selection Operator (LASSO) and Random Forest (RF), were used to identify key genes. We performed gene ontology analysis to identify the driver pathways of NAFLD. External datasets (merging GSE48452, GSE66676 and GSE135251) were used to validate the identified genes, and protein levels were confirmed by Western blotting. The CIBERSORT algorithm and immune-related techniques, such as ssGSEA, were used to assess the level of infiltration of different immune cell types and their functions. Finally, Spearman's analysis was used to assess the relationship between pivotal genes and immune cells.

Results

Hub genes (BBOX1, FOSB, NR4A2, RAB26 and SOCS2) were identified as potential biomarkers. This study demonstrates that these hub genes are significantly dysregulated in NAFLD, suggesting that they may be useful as diagnostic indicators and possible targets for treatment. Also covered are their possible effects on inflammation, immune cell activation, and liver damage in NAFLD. A better understanding of the intricate relationship between metabolic inefficiency, immunological response, and liver pathology in NAFLD may be gained from this work, which can lead to the development of new diagnostic tools and clinical treatments.

Conclusion

The current study identified BBOX1, FOSB, NR4A2, RAB26 and SOCS2 as important diagnostic biomarkers for NAFLD. The study highlights the important role of immune cell infiltration in the development of NAFLD. These findings provide valuable molecular biological insights into the development of NAFLD and may lead to novel therapeutic strategies for treating this disease.

Keywords: Non-alcoholic fatty liver disease, Prognosis, Immune infiltration, Machine learning, Bioinformatics

1. Introduction

Nonalcoholic fatty liver disease (NAFLD) is a broad term that encompasses a range of liver conditions, including nonalcoholic fatty liver (simple fatty liver) and nonalcoholic steatohepatitis (NASH). NAFLD affects a significant portion of the global population, with prevalence rates ranging from 10 % to 48 % in different countries [1,2]. Recent studies have shown a sudden surge in the prevalence of NAFLD of up to 29.2 % [3]. The course of NAFLD usually includes a sequence of stages, namely, nonalcoholic simple fatty liver, nonalcoholic steatohepatitis, progressive liver fibrosis and eventually cirrhosis [4]. As a liver-related manifestation of the metabolic syndrome, NAFLD has become more prevalent due to the increasing prevalence of obesity, diabetes mellitus, hyperlipidemia and cardiovascular disease. Although NAFLD is a prevalent disease, it is difficult to diagnose early because of the lack of specific symptoms and biomarkers. Although liver biopsy is considered the gold standard for diagnosis and classification, it poses several challenges and uncertainties in assessment. Lifestyle interventions are the main treatment option for NAFLD because of the lack of precise molecular and immunological studies to guide targeted therapy. Therefore, exploring reliable diagnostic biomarkers for NAFLD is important for timely diagnosis and intervention to improve clinical outcomes. In the context of NAFLD, metabolic dysfunction-associated steatohepatitis (MASH) is a key factor in the development and progression of steatohepatitis. Diabetes, insulin resistance, obesity, dyslipidemia, and other forms of metabolic dysfunction may exacerbate NAFLD symptoms and lead to liver damage and inflammation. Like other types of NAFLD, MASH is characterized by inflammation of the liver and hepatic steatosis, the accumulation of fat in the liver. Features of metabolic syndrome, including obesity, insulin resistance, and dyslipidemia, as well as increased liver enzymes and an enlarged liver (hepatomegaly), may be seen in patients with MASH. Imaging investigations such as ultrasound or MRI may reveal signs of fat accumulation and inflammation in the liver; however, a histological study is still necessary to confirm the diagnosis of MASH.

Next-generation sequencing and other new technologies have led to significant advances in diagnosing NAFLD and identifying therapeutic biomarkers. Rapid advances in bioinformatics have led to new methods for predicting NAFLD. However, traditional methods, such as differential gene expression analysis, may lose intrinsic biological information when identifying pivotal genes. Furthermore, although multi-biomarker approaches can improve diagnostic accuracy, complex genetic architecture and imperfect methods limit their robustness [5,6,7]. Numerous predictive models for NAFLD have proven inefficient and inaccurate, falling short of the requirements for effective screening and early detection. However, the rise of machine learning algorithms, such as random forest (RF) and support vector machine-recursive feature elimination (SVM-RFE), offers a promising avenue for biomarker discovery and highly accurate prognostic risk modelling [8,9]. To identify potential genes involved in the development of NAFLD, we combined weighted gene co-expression network analysis (WGCNA) with differential gene expression analysis. Subsequently, we identified the five best feature genes for predicting NAFLD using machine learning algorithms, including LASSO, RF and SVM-RFE, and evaluated their predictive performance using receiver operating characteristic (ROC) curves. To better understand how these genes contribute to NAFLD, we also performed functional enrichment analysis, including GO, KEGG, DO and GSEA, and used immune-related algorithms, such as ssGSEA, to assess the level of infiltration of different immune cell types and their functions. Our findings highlight the presence of five highly efficient diagnostic genes in patients with NAFLD, suggesting their promise as new targets for the diagnosis and prognosis of NAFLD and potentially leading to improved clinical outcomes.

2. Materials and methods

2.1. Acquisition of NAFLD datasets and removal of batch effects

We downloaded NAFLD-related gene expression profile data from the Gene Expression Omnibus (GEO) database for bioinformatics analysis. We selected two microarray datasets, GSE63067 (GPL570, control: 7, NAFLD: 11) and GSE89632 (GPL14951, control: 24, NAFLD: 39), for subsequent analyses. Three additional datasets, GSE48452 (GPL11532, control: 41, NAFLD: 32), GSE66676 (GPL6244, control: 34, NAFLD: 33) and GSE135251 (GPL18573, control: 10, NAFLD: 206), were used as independent validation datasets. Supplementary Table 1 provides more information about these datasets. We converted probe IDs to gene symbols using the annotation files for the corresponding platforms and eliminated probes that did not correspond to gene symbols. If multiple probes represented the same gene, we used the average of those probes as the gene expression level. For further analysis, we converted the microarray data to logarithmic values. To account for batch effects, we used the ComBat algorithm in the R package “sva” [10] to remove these effects and create a combined dataset.
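The probe-to-gene step described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function name `collapse_probes`, the toy intensities, the `log2(x + 1)` transform and the choice to average before taking logarithms are all assumptions made for illustration, since the text does not specify the logarithm base or the order of operations.

```python
from collections import defaultdict
from math import log2
from statistics import mean


def collapse_probes(probe_expr, probe_to_gene):
    """Average probe-level values into one log2 expression vector per gene.

    probe_expr:    {probe_id: [value_sample1, value_sample2, ...]}
    probe_to_gene: {probe_id: gene_symbol}; unannotated probes are absent.
    """
    grouped = defaultdict(list)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene:  # drop probes that do not map to a gene symbol
            grouped[gene].append(values)

    collapsed = {}
    for gene, rows in grouped.items():
        # zip(*rows) walks the samples column by column, so each sample
        # gets the mean of all probes annotated to this gene.
        # The +1 pseudocount is an assumption to keep log2 defined at zero.
        collapsed[gene] = [log2(mean(col) + 1) for col in zip(*rows)]
    return collapsed


# Toy, made-up intensities for two samples (hypothetical probe and gene names).
probe_expr = {
    "p1": [2.0, 6.0],
    "p2": [4.0, 8.0],
    "p3": [15.0, 31.0],
    "p4": [100.0, 200.0],  # has no gene symbol and is discarded
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}

gene_expr = collapse_probes(probe_expr, probe_to_gene)
```

Batch correction with ComBat is a separate step applied to the merged gene-level matrix and is not reproduced here.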

2.2. Identification of differential expression genes

To identify differentially expressed genes (DEGs) between NAFLD samples and control samples, we performed differential expression analysis using the R package “limma” with thresholds of |logFC| > 0.5 and p-value < 0.05. We then used volcano plots and heat maps to show significantly up- and down-regulated genes.
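As a small illustration of the thresholds stated above, the following Python sketch splits a table of test results into up- and down-regulated genes. The function name `select_degs` and the example values are hypothetical; in the study the log fold changes and p-values come from limma, which is not reimplemented here.

```python
def select_degs(results, lfc_cutoff=0.5, p_cutoff=0.05):
    """Split genes into up- and down-regulated DEGs.

    results: {gene: (log_fc, p_value)}, e.g. exported from a limma table.
    A gene is kept when |logFC| > lfc_cutoff and p_value < p_cutoff.
    """
    up, down = [], []
    for gene, (log_fc, p_value) in results.items():
        if p_value < p_cutoff and abs(log_fc) > lfc_cutoff:
            (up if log_fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Made-up values for illustration only.
results = {
    "GENE_A": (1.2, 0.001),    # passes both thresholds, up-regulated
    "GENE_B": (-0.8, 0.010),   # passes both thresholds, down-regulated
    "GENE_C": (0.3, 0.0001),   # significant but effect size too small
    "GENE_D": (-1.5, 0.200),   # large effect but not significant
}

up, down = select_degs(results)
```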

2.3. Weighted gene Co-expression network analysis

We used the R package “WGCNA” to perform a weighted gene co-expression network analysis to identify possible functional modules that might delineate the biological functions of NAFLD samples [11]. We checked the combined gene matrix for abnormal samples and excluded them when necessary. Co-expression modules were identified based on weighted correlation adjacency matrices and cluster analysis, where genes with similar expression patterns were grouped. A topological overlap matrix (TOM) was generated from the adjacency matrix, and genes were classified into modules based on their TOM-based dissimilarity. To achieve this, the cut height, minimum module size and soft threshold power were set to 0.25, 50 and 24, respectively (scale-free R² = 0.9). Subsequently, we calculated gene significance (GS) and module membership (MM) and determined Spearman correlation coefficients and corresponding p-values between the functional modules and the control and NAFLD groups. Finally, we selected the pivotal modules and the corresponding genes for in-depth analysis.
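To make the adjacency and TOM steps concrete, here is a minimal pure-Python sketch of the standard unsigned formulation, a_ij = |cor(x_i, x_j)|^β and TOM_ij = (Σ_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij). It is an illustration only: whether the study used an unsigned or signed network is not stated, the helper names are hypothetical, and the toy call uses a power of 2 rather than the study's 24 so that the numbers stay readable.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def adjacency(expr, power):
    """Unsigned adjacency a_ij = |cor(i, j)| ** power with a zero diagonal."""
    n = len(expr)
    return [
        [0.0 if i == j else abs(pearson(expr[i], expr[j])) ** power
         for j in range(n)]
        for i in range(n)
    ]


def topological_overlap(adj):
    """Topological overlap matrix computed from a zero-diagonal adjacency."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Shared-neighbour term; u == i and u == j contribute zero
            # because the diagonal of adj is zero.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))
            denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = (shared + adj[i][j]) / denom
    return tom


# Toy expression matrix: three genes (rows) measured in four samples.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, 1.0, -1.0],
]
adj = adjacency(expr, power=2)  # the study reports a soft threshold of 24
tom = topological_overlap(adj)
# Hierarchical clustering would then operate on the dissimilarity 1 - TOM.
dissimilarity = [[1.0 - v for v in row] for row in tom]
```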

2.4. Functional enrichment analysis and protein-protein interaction network

We used gene set enrichment analysis (GSEA), with the c2.cp.kegg.v11.0.symbols gene set as a reference, to identify potentially significant functional terms between the NAFLD and control groups. We considered q-values (false discovery rate, FDR) < 0.05 and P < 0.05 [11] for the enriched gene set to be significant. We characterized up-regulated pathways as those with normalized enrichment scores (NES) greater than zero and down-regulated pathways as those with NES less than zero. We then identified overlapping candidate genes between the DEGs and module genes and represented them using a Venn diagram produced by Venn Diagrams software [2]. To explore the functions and pathways of overlapping candidate genes, we performed enrichment analysis of gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and disease ontology (DO) using the “clusterProfiler” and “DOSE” R packages [12]. In addition, we constructed a protein-protein interaction (PPI) network using the online mapping tool “STRING”, which allowed us to explore the interactions of overlapping candidate genes. Finally, we constructed a co-expression network using the R package “graph”, which allowed us to study the correlation strength between selected genes.

2.5. Machine learning screening for optimal feature genes

We employed three machine learning techniques, LASSO, SVM-RFE, and RF, to predict disease outcomes and identify key prognostic factors. LASSO regression performs variable selection and regularization simultaneously to improve the predictive accuracy of statistical models [12]. SVM is a supervised learning method for regression and classification; combined with the recursive feature elimination (RFE) algorithm, it helps prevent overfitting and produces interpretable results [13,14]. To identify the most relevant feature genes, we applied the SVM-RFE technique to find the gene set with the highest discriminatory potential. RF methods, which are rooted in classification trees, are often used to address various predictive challenges [15]. We determined the optimal number of trees by selecting the value with the lowest error rate and best stability among 1–500 trees. After selecting the appropriate parameters, we constructed an RF model and selected key genes for NAFLD diagnosis based on a mean-decrease importance measure (Gini coefficient). Using common screening criteria in the RF algorithm [16], we selected the top 10 important genes (importance >2) as novel genetic markers to predict the prognosis of NAFLD. Ultimately, we identified the best feature genes by taking the genes shared across the machine-learning techniques.
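The final overlap step can be sketched in a few lines of Python. The function name `consensus_features` and the placeholder gene labels are hypothetical; the three input lists stand in for whatever each selector (LASSO, SVM-RFE, RF) returns, and the selectors themselves are not reimplemented here.

```python
def consensus_features(*selections):
    """Return the genes selected by every method, in sorted order."""
    gene_sets = [set(selection) for selection in selections]
    return sorted(set.intersection(*gene_sets))


# Placeholder selections; real inputs would be the gene lists produced by
# the LASSO, SVM-RFE and random forest steps.
lasso_genes = ["G1", "G2", "G3", "G5"]
svm_rfe_genes = ["G1", "G2", "G4", "G5", "G6"]
rf_genes = ["G2", "G5", "G7"]

consensus = consensus_features(lasso_genes, svm_rfe_genes, rf_genes)
```

Requiring agreement across all three methods is a conservative choice: a gene favoured by only one algorithm, possibly because of that algorithm's particular bias, is dropped.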

2.6. Assessment of hallmark gene sets and immune cell infiltration

To estimate the relative abundance of immune cell types within each sample, we used the CIBERSORT algorithm, a deconvolution method based on gene expression data [17]. With this algorithm we estimated the proportions of 22 immune cell types in normal and NAFLD samples. We then used box plots to visually represent the immune cell composition of patients with different immune patterns. We applied the Wilcoxon rank sum test to assess differences in immune cell proportions, with statistical significance determined as P < 0.05. In addition, we used the ssGSEA algorithm [18] to determine the relative levels of the 50 hallmark gene sets (h.all.v7.5.1.symbols.gmt) in the combined dataset. Finally, we calculated Spearman's correlations between the best feature genes and the 50 hallmark gene sets.
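Spearman's correlation, used here to relate gene expression to immune cell estimates, is the Pearson correlation of the ranks. The sketch below is a minimal pure-Python version with average ranks for ties; the function names and toy numbers are hypothetical, and it omits the p-value that a statistics package would also report.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Made-up values: expression of one gene and the estimated fraction of one
# immune cell type across five samples (perfectly inverse ordering).
gene_expression = [1.0, 2.5, 3.1, 4.8, 9.9]
cell_fraction = [0.30, 0.25, 0.20, 0.10, 0.05]

rho = spearman(gene_expression, cell_fraction)
```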

2.7. GSEA and correlation analysis of optimal feature genes

In addition, we used GSEA to determine the biological importance of the best feature genes, using the “c2.cp.kegg.v11.0.symbols” gene set from the Molecular Signatures Database (MSigDB) as a reference. For each analysis, we performed 1000 permutations of gene sets to obtain normalized enrichment scores, with significant enrichment defined as FDR < 0.05. We also used Pearson correlation analysis to calculate correlations between the expression levels of the best feature genes.

2.8. Western blotting analysis

We prepared cell lysates on ice in RIPA lysis buffer (Beyotime, P0013B) containing 1 mM PMSF and 1 mM phosphatase inhibitor (Abcam, ab201112) to extract total proteins. The extracted proteins were mixed with SDS loading buffer and denatured at 100 °C for 15 min. After centrifugation, we subjected the protein supernatant to SDS-polyacrylamide gel electrophoresis and transferred the proteins to a nitrocellulose membrane. To detect the immunoreactive band signal, we incubated the membrane with the primary antibody at 4 °C overnight and then with the secondary antibody at room temperature for 60 min. Finally, we visualized the immunoreactive bands using the ECL system and captured them using the BioRad imaging system.

2.9. Statistical analysis

We used R software (version 4.1.1) and GraphPad Prism (version 8.0.2) for data processing, statistical analysis, and plotting. We used the Wilcoxon rank sum test or Student's t-test to compare the two groups and Pearson or Spearman correlation tests to assess the correlation between variables. All P values were two-tailed, and P < 0.05 was considered statistically significant.

3. Results

3.1. Screening of DEGs in NAFLD

Two microarray datasets (GSE63067 and GSE89632) from the GEO database were used, providing 31 control and 50 NAFLD samples. To ensure accurate data analysis, batch effects between datasets were removed (Fig. 1A). Subsequently, 596 DEGs were identified (Supplementary Table 2), including 346 down-regulated genes and 250 up-regulated genes, shown in the heat map (Fig. 1B). Several genes were significantly up-regulated, including RAB26, ISM1, INHBE, TNFSF10 and CYP7A1, while genes such as FOSB and HBEGF were significantly down-regulated (Fig. 1C). In addition, we performed a GSEA analysis of KEGG pathways to compare functional and biological pathways between NAFLD and control samples and found that immune-related biological functions and processes, such as steroid hormone biosynthesis, were significantly enriched in NAFLD (Fig. 1D). Ascorbate and aldarate metabolism, base excision repair, and pentose and glucuronate interconversions were also significantly enriched in the NAFLD group (Fig. 1E). In contrast, the control group showed significant enrichment in the AGE-RAGE signalling pathway in diabetic complications, the IL-17 signalling pathway and the TNF signalling pathway (Fig. 1F).

Fig. 1 — Identification of DEGs and functional annotation. (A) Gene expression level statistics of the integrated dataset after removal of batch effects. (B) Heatmap of the expression levels of NAFLD-related DEGs: blue, low gene expression; red, high gene expression. (C) Volcano plot of NAFLD-related DEG expression. (D) Ridgeline plot of GSEA results. (E,F) The main signalling pathways that are significantly enriched in the NAFLD group (E) and in the control group (F).

3.2. WGCNA and screening of hub modules

A co-expression network covering 15,647 genes was constructed with WGCNA from the 31 control and 50 NAFLD samples. The samples were clustered, and abnormal samples were removed according to a threshold value (Fig. 2A). By evaluating the scale-free R² and average connectivity, we determined a soft power threshold of 24 (Fig. 2B). We then merged strongly associated modules using a clustering height cut-off of 0.25, identifying seven modules presented on the clustering tree (Fig. 2C). Correlation analysis between modules showed no significant association between them (Fig. 2D). In addition, we assessed the reliability of module delineation by intra-module transcriptional correlation analysis and again found no substantial association between modules (Fig. 2E). Correlations between module eigengene (ME) values and clinical characteristics were examined by correlation analysis, and strong correlations were found between the green, brown, blue and red modules and NAFLD (Fig. 2F, Supplementary Fig. 1). A total of 3578 candidate genes in these four modules were used for subsequent analysis (Supplementary Table 4).

Fig. 2 — Weighted gene co-expression network analysis. (A) Sample clustering dendrogram with tree leaves corresponding to individual samples. (B) Analysis of the scale-free fit index (R²) and the mean connectivity for various soft-thresholding powers. (C) The original and merged modules under the clustering tree with a cut-off height of 0.25. (D) Collinear heat map of module feature genes. Red indicates a high correlation and blue indicates a low correlation. (E) Clustering dendrogram of module feature genes. (F) Heat map of module-trait correlations. Red represents positive correlations and blue represents negative correlations.

3.3. Functional enrichment analysis of overlapping DEGs

In total, 319 overlapping genes were identified between the differentially expressed genes and the genes of the green, brown, blue, and red modules (Supplementary Table 5); these are referred to as candidate signature genes (Fig. 3A). We performed GO, KEGG, and DO analyses to elucidate the potential biological functions and enrichment pathways of the identified genes. GO analysis covered three categories: biological process (BP), cellular component (CC), and molecular function (MF). We found that candidate signature genes were significantly enriched in multiple BP categories, including regulation of chemokine production, positive regulation of inflammatory response, and regulation of inflammatory response. In addition, we observed significant enrichment in various CC categories, such as phosphatidylinositol 3-kinase complex and muscle myosin complex. In the MF category, the significantly enriched terms were DNA-binding transcription activator activity (RNA polymerase II-specific), 1-phosphatidylinositol-3-kinase regulator activity and phosphatidylinositol 3-kinase regulator activity (Fig. 3B).

Fig. 3 — Identification and functional enrichment analyses of overlapping candidate genes. (A) Venn diagram showing the intersection of DEGs and WGCNA module genes. (B–E) GO (B) and KEGG (C–E) enrichment analyses. (F) Protein-protein interaction (PPI) network of overlapping candidate genes. (G) The co-expression network showing the correlation intensity of hub genes from overlapping candidate genes.

In the KEGG enrichment analysis, the identified genes were associated with various pathways, such as transcriptional misregulation in cancer, osteoclast differentiation and cytokine-cytokine receptor interaction (Fig. 3C–E). In addition, DO analysis revealed that candidate signature genes were mainly enriched in atherosclerotic cardiovascular disease, atherosclerosis and hepatitis B (Supplementary Fig. 2). These findings suggest that patients with NAFLD exhibit changes in multiple dimensions of the immune system and may share pathological processes with other autoimmune diseases. We also generated a protein-protein interaction (PPI) network of the candidate signature genes using the STRING website. This network highlighted the important functions of these genes in the development of NAFLD (Fig. 3F and G).

3.4. Integrated LASSO, SVM-RFE and RF algorithm for screening hub markers

We used three different machine learning algorithms to detect potential signature genes. Starting from the pool of 319 candidate signature genes, LASSO analysis selected 13 signature genes as diagnostic markers for NAFLD (Fig. 4A and Supplementary Table 6). Using the SVM-RFE algorithm with 5-fold cross-validation of the candidate signature genes, we identified 193 signature genes (Fig. 4B and Supplementary Table 7). For the RF algorithm, we identified the top 27 signature genes with importance greater than 2, such as FOS, FOSB, CYR61, GADD45B, ABCG8, and BBOX1 (Fig. 4C and D). Next, we took the intersection of the feature genes from the three machine learning algorithms and identified a total of five best feature genes: BBOX1, FOSB, NR4A2, RAB26, and SOCS2. These genes have the potential to serve as diagnostic markers for NAFLD and may play a key role in the progression of NAFLD (Fig. 4E).

Fig. 4 — Three machine learning algorithms were integrated to identify the optimal feature genes. (A) LASSO coefficient profiles of the candidate optimal feature genes; the optimal lambda was determined when the partial likelihood deviance reached its minimum value. Each coefficient curve in the left picture represents a single gene. The solid vertical lines in the right picture represent the partial likelihood deviance, and the number of genes corresponding to the lowest point of the curve is the most suitable for LASSO. (B) The SVM-RFE algorithm was used to further screen candidate optimal feature genes, with the highest accuracy and lowest error obtained in the curves. The x-axis shows the number of feature selections, and the y-axis shows the prediction accuracy. (C) Relative importance of overlapping candidate genes calculated in random forest (Top 10 genes importance >2). Importance indexes are plotted on the x-axis and genetic variables on the y-axis. (D) Random forest relationship between the number of trees and error rate. The x-axis represents the number of decision trees and the y-axis is the error rate. (E) Venn diagram showing the five optimal feature genes shared by the LASSO, Random Forest, and SVM-RFE algorithms.

3.5. The validation of hub biomarkers

3.5.1. BBOX1 (butyrobetaine (gamma), 2-oxoglutarate dioxygenase 1)

Fatty acid oxidation is an essential metabolic process, and BBOX1 encodes an enzyme involved in carnitine production. Changes in carnitine production and dysregulation of fatty acid metabolism are both believed to have a role in the development of non-alcoholic fatty liver disease (NAFLD). Although there is little evidence linking BBOX1 to immune response pathways, lipokines and inflammatory mediators produced by dysregulated lipid metabolism in NAFLD can influence the activity of immune cells.

3.5.2. FOSB (FosB proto-oncogene, AP-1 transcription factor subunit)

FOSB, a member of the AP-1 transcription factor family, controls gene expression in response to various stimuli, including metabolic stress and inflammation. Inflammation and fibrosis of the liver are features of non-alcoholic fatty liver disease (NAFLD) that may originate from dysregulated AP-1 activity. FOSB regulates immune and inflammatory response genes, including those encoding cytokines, chemokines, and cell adhesion molecules. It may also modulate the activation and activity of immune cells, including macrophages and T cells, in several organs, including the liver.

3.5.3. NR4A2 (nuclear receptor subfamily 4 group a member 2, also known as Nurr1)

NR4A2 is a nuclear receptor that controls oxidative stress, inflammation, and metabolism. In non-alcoholic fatty liver disease (NAFLD), NR4A2 may regulate transcriptional activities that affect inflammation, hepatic lipid metabolism, and cell death. NR4A2 also has a role in controlling how immune cells develop and function. It helps regulate immunological responses in several organs, including the liver, by influencing the activity of macrophages, dendritic cells, and T cells in response to inflammatory stimuli.

3.5.4. RAB26 (RAB26, member RAS oncogene family)

RAB26, a member of the RAS oncogene family, is an enzyme that regulates lipid metabolism and the transport of intracellular vesicles. Numerous metabolic diseases, including NAFLD, have been linked to abnormalities in RAB26 expression. Although its precise role in immune response pathways is still poorly understood, RAB26's action in intracellular trafficking and lipid metabolism may influence inflammation and immune cell function in NAFLD.

3.5.5. SOCS2 (suppressor of cytokine signalling 2)

SOCS2 inhibits several cytokine signalling pathways, including those involved in inflammation and insulin resistance. Dysregulation of SOCS2 expression has been linked to the pathophysiology of non-alcoholic fatty liver disease (NAFLD) and metabolic syndrome. SOCS2 regulates the activation and function of immune cells by blocking cytokine signalling pathways, including those mediated by interleukins and interferons. In NAFLD, dysregulated SOCS2 expression may contribute to inflammation and abnormal immunological responses.

We investigated the expression levels of the five signature genes in 50 NAFLD and 31 normal samples. The expression levels of BBOX1 and RAB26 were significantly elevated in NAFLD samples compared to controls (Fig. 5A, D; all P < 0.05), whereas the expression levels of FOSB, NR4A2 and SOCS2 were significantly decreased in NAFLD samples (Fig. 5B, C, E; all P < 0.05). In addition, we performed ROC curve analysis to quantitatively assess the diagnostic and predictive potential of the best feature genes (Fig. 5F). The AUC values of the ROC curves were as follows: BBOX1 (0.875, Fig. 5G), FOSB (0.906, Fig. 5H), NR4A2 (0.887, Fig. 5I), RAB26 (0.919, Fig. 5J), and SOCS2 (0.905, Fig. 5K). These results suggest that these signature genes can accurately estimate the progression of NAFLD and have a high diagnostic value.
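For a single-gene classifier, the AUC equals the probability that a randomly chosen NAFLD sample has a higher expression value than a randomly chosen control, with ties counted as one half. The following Python sketch computes it by direct pairwise comparison; the function name `roc_auc` and the expression values are made up for illustration and do not reproduce the figures reported above.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly.

    A pair counts 1 when the case score is higher, 0.5 when tied, 0 otherwise.
    This is equivalent to the Mann-Whitney U statistic divided by the number
    of pairs.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up expression values of one gene in four cases and four controls.
nafld_expression = [5.1, 6.3, 7.0, 4.0]
control_expression = [3.2, 4.5, 4.0, 5.5]

auc = roc_auc(nafld_expression, control_expression)
```

For a gene that is lower in disease, the same function returns a value below 0.5; reporting `1 - auc` (or negating the scores) gives the AUC of the inverted marker.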

Fig. 5 — Verification of expression and diagnostic efficacy in predicting NAFLD progression of optimal feature genes. (A–E) Box plots showing the expression of BBOX1 (A), FOSB (B), NR4A2 (C), RAB26 (D) and SOCS2 (E) in control and NAFLD samples. Statistical test: Wilcoxon rank-sum test. (F–K) ROC curves (F) estimating the diagnostic performance of BBOX1 (G), FOSB (H), NR4A2 (I), RAB26 (J), and SOCS2 (K).

To evaluate the diagnostic efficacy of our biomarkers, we validated their performance in an external validation set (GSE135251, control: 10, NAFLD: 206) that also involved liver tissue sequencing. Of the five best feature genes, only FOSB, NR4A2, RAB26, and SOCS2 showed significant differential expression between samples in this dataset (Supplementary Fig. 3). To ensure the accuracy and reliability of our results, we integrated three external validation sets to obtain a comprehensive external validation set consisting of 271 NAFLD samples and 85 normal samples. Before analysis, we also standardized the GSE48452, GSE66676, and GSE135251 datasets (Supplementary Fig. 4). As shown in Fig. 6A–E, the differential expression of the five best feature genes in the external validation set was consistent with the differential expression in the training set (all P < 0.05). In the external validation dataset, the AUC values were as follows: BBOX1 (AUC: 0.734), FOSB (AUC: 0.751), NR4A2 (AUC: 0.709), RAB26 (AUC: 0.670), and SOCS2 (AUC: 0.652) (Fig. 6F–K). Furthermore, with the diagnostic bar graph, we could illustrate how the expression levels of these five biomarkers contribute to the clinical diagnosis of NAFLD. By combining the scores of each gene, we could calculate the likelihood of a person being diagnosed with NAFLD (Fig. 6L).

3.6. Correlation between potential biomarkers and gene set enrichment analysis

First, correlation analysis showed that BBOX1 was positively correlated with the level of RAB26, while FOSB was positively correlated with the expression of NR4A2 and SOCS2 but negatively correlated with RAB26 (Fig. 7A). These results suggest that the identified signature genes have significant functional similarity. Subsequently, we performed a GSEA functional analysis of these five potential biomarkers and identified several enriched pathways. In particular, the high BBOX1 subgroup was significantly associated with pathways such as aminoacyl-tRNA biosynthesis, basal transcription factors, the Fanconi anaemia pathway, one carbon pool by folate, and ubiquinone and other terpenoid-quinone biosynthesis (Fig. 7B). On the other hand, the low BBOX1 subgroup showed significant enrichment in pathways such as African trypanosomiasis, apoptosis (multiple species), cocaine addiction, the IL-17 signalling pathway, and pertussis (Supplementary Fig. 5A). Similarly, the high FOSB subgroup was significantly associated with African trypanosomiasis, apoptosis (multiple species), the IL-17 signalling pathway, the NF-kappa B signalling pathway and the TNF signalling pathway (Fig. 7C). In contrast, the low FOSB subgroup showed significant enrichment in aminoacyl-tRNA biosynthesis, basal transcription factors, the chemokine signalling pathway, the Fanconi anaemia pathway and one carbon pool by folate, among other pathways (Supplementary Fig. 5B). In addition, the high NR4A2 subgroup was significantly associated with allograft rejection, the C-type lectin receptor signalling pathway, glycosaminoglycan biosynthesis (chondroitin sulfate/dermatan sulfate), leishmaniasis, and nicotinate and nicotinamide metabolism (Fig. 7D), while the low NR4A2 subgroup showed significant enrichment in nicotine addiction, olfactory transduction, phototransduction, sphingolipid metabolism, and taurine and hypotaurine metabolism (Supplementary Fig. 5C). The high RAB26 subgroup was significantly associated with aminoacyl-tRNA biosynthesis, basal transcription factors, the Fanconi anaemia pathway, one carbon pool by folate and selenocompound metabolism (Fig. 7E), while the low RAB26 subgroup showed significant enrichment in apoptosis (multiple species), cocaine addiction, endocrine and other factor-regulated calcium reabsorption, graft-versus-host disease and the IL-17 signalling pathway (Supplementary Fig. 5D). Finally, the high SOCS2 subgroup was significantly associated with African trypanosomiasis, apoptosis (multiple species), the IL-17 signalling pathway, legionellosis and the NF-kappa B signalling pathway (Fig. 7F), while the low SOCS2 subgroup showed significant enrichment in aminoacyl-tRNA biosynthesis, basal transcription factors, the Fanconi anaemia pathway, one carbon pool by folate and thermogenesis (Supplementary Fig. 5E).

Fig. 7 — (A) Correlation analysis between potential biomarkers. (B–F) GSEA functional analysis of potential biomarkers (B, BBOX1, C, FOSB, D, NR4A2, E, RAB26, F, SOCS2).

3.7. Hallmark gene sets and immune cell infiltration

We used the CIBERSORT algorithm to compare immune cell infiltration between NAFLD and control samples. Our results (Fig. 8A and B) show that monocytes and resting mast cells were significantly more abundant in NAFLD samples than in control samples, whereas neoplastic B cells and activated dendritic cells were significantly less abundant. We also investigated the correlation between the identified core genes (BBOX1, FOSB, NR4A2, RAB26 and SOCS2) and immune cell content. BBOX1 expression was significantly positively correlated with resting mast cells, M2 macrophages and γδ T cells, and negatively correlated with Tregs, neutrophils and activated mast cells (Fig. 8C). FOSB expression was significantly positively correlated with resting mast cells, Tregs, monocytes, neutrophils, and M0 and M1 macrophages, and negatively correlated with activated dendritic cells, activated CD4 memory T cells, γδ T cells, plasma cells and M2 macrophages (Fig. 8D). NR4A2 expression showed no statistically significant correlation with the content of any of the 22 immune cell types (Fig. 8E). RAB26 expression was significantly positively correlated with several immune cell types, including γδ T cells, resting mast cells, plasma cells and M2 macrophages, and strongly negatively correlated with monocytes, M0 macrophages, Tregs, follicular helper T cells, neutrophils and activated mast cells (Fig. 8F). Finally, SOCS2 expression was significantly positively correlated with activated mast cells, follicular helper T cells, neutrophils, Tregs and monocytes, and strongly negatively correlated with M2 macrophages and γδ T cells (Fig. 8G). Our results suggest that these core genes may play a role in disease progression and may shape the immune microenvironment by regulating Tregs, monocytes and neutrophils.
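The gene-to-immune-cell correlations reported above can be illustrated with a minimal sketch of Spearman's rank correlation, the method named in the abstract for relating pivotal genes to immune cells. The function names and the toy values are hypothetical; the cell fractions stand in for CIBERSORT output and are not data from this study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed values.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-sample values: gene expression vs. estimated Treg fraction
gene_expression = [1.2, 2.5, 3.1, 4.8, 5.0]
treg_fraction = [0.30, 0.22, 0.25, 0.10, 0.05]

rho = spearman_rho(gene_expression, treg_fraction)
print(round(rho, 3))
```

A negative rho, as in this toy example, corresponds to the kind of inverse gene-cell relationship reported for BBOX1 and Tregs; a significance test would still be needed before calling a correlation significant.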
To investigate whether hallmark gene set enrichment differed between the NAFLD and control groups, we used the ssGSEA algorithm to score 50 hallmark gene sets and tested the differences between the two groups. As shown in Fig. 9A, several hallmark gene sets differed significantly, including KRAS signalling up, peroxisome, bile acid metabolism, IL2-STAT5 signalling, angiogenesis, UV response, inflammatory response, epithelial-mesenchymal transition, MYC targets V2, estrogen response early, apoptosis, IL6-JAK-STAT3 signalling, TGF-β signalling, hypoxia, and TNFα signalling via NFκB. Our results indicate that these hallmark gene sets are overactivated in the NAFLD group compared with the control group. Furthermore, the five optimal feature genes were strongly associated with most hallmark gene sets (Fig. 9B).
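The between-group comparisons in Figs. 8 and 9 are reported with the Wilcoxon rank-sum test. The sketch below shows the idea for a single hallmark gene set score, assuming a normal approximation without continuity or tie-variance correction; in practice a library routine such as `scipy.stats.ranksums` would normally be used. The function name and the score values are hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Simplifications (assumptions of this sketch): ties get average ranks,
    but no tie correction is applied to the variance and no continuity
    correction is used.
    """
    values = list(a) + list(b)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1

    n1, n2 = len(a), len(b)
    w = sum(ranks[:n1])                       # rank sum of the first group
    mu = n1 * (n1 + n2 + 1) / 2               # expected rank sum under H0
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = erfc(abs(z) / sqrt(2))                # two-sided p-value
    return z, p


# Hypothetical ssGSEA scores for one hallmark gene set
nafld_scores = [0.62, 0.71, 0.58, 0.80, 0.66]
control_scores = [0.41, 0.39, 0.52, 0.47, 0.55]

z, p = rank_sum_test(nafld_scores, control_scores)
print(round(z, 3), round(p, 4))
```

With 50 gene sets tested, some form of multiple-testing adjustment would usually be considered; whether one was applied is not stated in this section.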

Fig. 8 — Visualization of immune cell infiltration. (A) The relative proportions of 22 immune cell types in control samples and NAFLD samples. (B) Boxplot showing the differences in infiltrating immune cells between control samples and NAFLD samples. Statistical test: Wilcoxon rank-sum test (P < 0.05*; P < 0.01**; P < 0.001***; ns, no significance). (C–G) Correlation between immune cells and the optimal feature genes BBOX1 (C), FOSB (D), NR4A2 (E), RAB26 (F) and SOCS2 (G); the larger the dots, the stronger the correlation. The color of the dots represents the P-value; the greener the color, the lower the P-value.

Fig. 9 — Analysis of hallmark gene sets. (A) The specific distribution of the 50 hallmark gene sets in NAFLD and control samples. (B) Correlation analysis of the 50 hallmark gene sets with the five optimal feature genes. Statistical test: Wilcoxon rank-sum test (P < 0.2#; P < 0.05*; P < 0.01**; P < 0.001***; ns, no significance).

3.8. Western blot validation of optimal feature genes

We analyzed the protein expression levels of the five optimal feature genes in the PA and control groups. Supplementary Table 9 provides detailed baseline information for the samples included in this analysis. The protein levels of BBOX1 and RAB26 were upregulated, whereas those of FOSB, NR4A2 and SOCS2 were downregulated, compared with the control group (Fig. 10).

Fig. 10 — The expression of key genes in NAFLD models. *P < 0.05, **P < 0.01, ***P < 0.001 vs. Control.

Discussion

NAFLD is an umbrella term that encompasses a spectrum of liver diseases, ranging from fat accumulation in the liver (steatosis) to fat accumulation with inflammation (NASH) and cirrhosis, that occur in the absence of excessive alcohol consumption (that is, with intake below 20 g/day in women and below 30 g/day in men). Typically, NAFLD is associated with the metabolic syndrome, which is characterized by insulin resistance, hyperlipidemia, type 2 diabetes and obesity, and it is thought to be the hepatic manifestation of this syndrome [19]. The prevalence of NAFLD/NASH is expected to continue to rise, with associated mortality predicted to double by 2030 according to epidemiological models [20]. NAFLD is increasingly becoming a major cause of hepatocellular carcinoma (HCC), and the risk factors it shares with HCC make early detection challenging. Therefore, it is crucial to understand the pathogenesis of NAFLD, identify relevant biomarkers, develop effective treatments, and implement preventive measures [21]. Our study used gene expression data from NAFLD patients in the GEO database and applied the WGCNA algorithm to detect differentially expressed genes strongly associated with NAFLD development. We then refined the candidates using several machine-learning algorithms, namely LASSO regression, SVM-RFE and random forest. As a result, we identified five optimal feature genes (BBOX1, FOSB, NR4A2, RAB26, and SOCS2) that were significantly associated with the diagnosis and potential development of NAFLD. In addition, we performed functional enrichment analysis of these genes. We also conducted a literature review to investigate the role of these five genes in NAFLD. Of the five, FOSB and SOCS2 have been the focus of more preliminary studies in NAFLD. FOSB is a member of the Fos oncogene family, which consists of c-Fos, FosB, Fra-1 and Fra-2, and is widely recognized for its role in cancer.

FOSB dimerizes with Jun proteins (including c-Jun, JunB and JunD) to form the activator protein 1 (AP-1) complex, a basic leucine zipper (bZIP) transcription factor [22]. Previous studies have proposed that AP-1 plays a key role in promoting the progression of hepatocellular carcinoma (HCC) [23]. In a 2022 study, Bo Hu et al. showed that activation of FOSB transcription increased the transcriptional activity of HIF1α in HCC cells, suggesting upregulation of glycolysis-related gene expression and subsequent promotion of HCC [24]. Regarding lipid metabolism, AP-1 has been shown to have a major impact on hepatic lipid uptake and lipid droplet formation by activating or inhibiting transcription of peroxisome proliferator-activated receptor γ (PPARγ), depending on the dimer composition [25]. Although the contribution of FOSB in these settings remains unclear, the study by Luya Li et al. identified FOSB as a pivotal gene in the pathological process of NASH and found high enrichment of the IL-17 signalling pathway and the TNF signalling pathway, which is partially consistent with our findings [26]. SOCS2 is a substrate recognition subunit of the Cullin/RING ubiquitin ligase and an adapter protein [27]. Earlier investigations suggested that deletion of SOCS2 may protect against hepatic steatosis but also exacerbate insulin resistance in mice fed a high-fat diet [28]. In addition, SOCS2 plays a prominent role in some inflammatory conditions and has been implicated in regulating the function of human natural killer cells. Cynthia Honorato Val et al. found that SOCS2 deficiency, which reduces pro-inflammatory cytokines such as IL-6 and TNF, may be a mechanism to suppress the inflammatory response and restore homeostasis [29]. Shuo Li's study, on the other hand, proposed that SOCS2 in macrophages operates as a negative regulator of inflammation and apoptosis during NASH and contributes to the suppression of NASH. 
This role is achieved by regulating not only the NF-κB signalling pathway but also the inflammasome signalling pathway [29]. Additional investigations are therefore necessary to reconcile these findings on the function of SOCS2. Furthermore, SOCS2 is mainly associated with growth hormone (GH) and insulin-like growth factor-1 (IGF-1) signalling and plays a role in cell growth [30]. SOCS2 is also a mediator in many cancer pathways. For example, Mengnuo Chen et al. found that METTL3 suppresses SOCS2 expression through an m6A-YTHDF2-dependent mechanism and promotes hepatocellular carcinoma progression [31]. However, our study did not explore the role of SOCS2 as a biomarker for NAFLD, and more research is needed in this area. Basic and clinical experimental studies on the remaining genes (BBOX1, NR4A2 and RAB26) and the pathogenesis of NAFLD are still lacking, so their relationship with NAFLD has not been thoroughly characterized. These three genes may therefore be a feasible direction for further study.

Our immune infiltration analysis showed that regulatory T cells (Tregs), monocytes and neutrophils are closely associated with NAFLD. Tregs are a T cell subset that plays a crucial role in preventing autoreactivity to self-antigens and in restraining overactivation of effector T cells, which can otherwise cause tissue damage during the immune response to infection. This is achieved by producing the suppressor cytokines IL-10 and TGF-β [32]. Th17/Treg imbalance has been identified as an important molecular background in many liver diseases, including NAFLD [33]. Several cytokines, such as IL-6, IL-10 and TGF-β, as well as the microbiome, can influence Th17/Treg differentiation, which may be one of the ways in which our core genes affect Tregs [34]. Animal models of NAFLD have demonstrated a reduction in hepatic Treg cell numbers [35–37]. This reduction is caused by local reactive oxygen species (ROS)-induced apoptosis of Treg cells and can be reversed by concomitant use of the antioxidant MnTBAP [37]. Furthermore, the adoptive transfer of Treg cells has been shown to attenuate HFD-induced liver inflammation [37]. In contrast, studies of human hepatic steatosis have shown increased numbers of hepatic Treg cells, albeit indirectly [33–38]. Furthermore, TGF-β, an important cytokine secreted by Tregs, is associated with hepatic steatosis and fibrosis progression [39,40], suggesting that Tregs may have a dual function in NAFLD. Previous studies have highlighted an increased proportion of monocytes among the white blood cells (WBCs) of patients with NAFLD [41]. Further analysis has identified three distinct subpopulations of monocytes, namely “classical” CD14++CD16⁻, “intermediate” CD14++CD16⁺, and “non-classical” CD14⁺CD16++ monocytes, each with specific surface markers, including CD36 and CD9 [42]. 
Several studies have explored these monocyte subpopulations and observed increases in the intermediate and non-classical subpopulations, as well as increased expression of cell surface Toll-like receptors (TLRs), CD169, and CCR4, in patients with NAFLD relative to controls [43,44]. Based on monocyte lineage biology, Oeztuerk et al. concluded that inflammatory monocytes are directly implicated in the pathogenesis and progression of NAFLD, which is consistent with our findings [42]. In the case of neutrophils, the neutrophil-to-lymphocyte ratio (NLR) is a biomarker derived from the absolute counts of neutrophils and lymphocytes [45]. NLR has shown potential as a biomarker for liver diseases, including NAFLD, in which it is associated with greater disease severity [46]. Subsequent investigations found that neutrophils are important contributors of interleukin-17 (IL-17) in the human liver, especially in the later stages of liver fibrosis. This cytokine contributes to neutrophil recruitment and organ infiltration after initial injury, induces neutrophil production of cytokines and chemokines, and exacerbates injury [47]. Taken together, these studies suggest that immune cells such as neutrophils may be of value in the clinical management of NAFLD. Admittedly, our study relied on gene expression data obtained from existing databases, which may introduce some bias. Moreover, our findings on these key genes relied mainly on bioinformatic analysis. Future studies are therefore needed to elucidate the specific mechanisms of these genes in NAFLD, and our team will continue to work on this. In conclusion, this study aims to provide new ideas and, we hope, new possibilities for diagnosing or treating NAFLD.

We comprehensively analyzed two datasets (GSE63067 and GSE89632) from the GEO database to identify differential genes. We applied various analysis techniques to these genes, including WGCNA, SVM machine learning, LASSO regression, and random forest analysis. Finally, we narrowed down to five candidate genes (BBOX1, FOSB, NR4A2, RAB26, and SOCS2) and confirmed them as pivotal genes by external datasets (GSE48452, GSE66676, and GSE135251) and Western blot validation. Although we evaluated the function of these biomarkers using bioinformatics analysis and biological experiments, it is important to conduct larger prospective studies to validate our findings. We are committed to further exploring these hub genes and their role in NAFLD.
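The narrowing step summarized above can be sketched as a consensus over the gene lists returned by each method. The sketch assumes that the final candidates are the genes selected by all three algorithms; this section does not state the exact combination rule, so the intersection is an assumption. The helper name is hypothetical, and GENE_X, GENE_Y and GENE_Z are placeholder names rather than genes reported in this study.

```python
def consensus_genes(*selections):
    """Return the genes chosen by every selection method, sorted by name."""
    common = set(selections[0])
    for selected in selections[1:]:
        common &= set(selected)
    return sorted(common)


# Placeholder outputs of the three feature-selection methods (hypothetical)
lasso_genes = {"BBOX1", "FOSB", "NR4A2", "RAB26", "SOCS2", "GENE_X"}
svm_rfe_genes = {"BBOX1", "FOSB", "NR4A2", "RAB26", "SOCS2", "GENE_Y"}
random_forest_genes = {"BBOX1", "FOSB", "NR4A2", "RAB26", "SOCS2", "GENE_X", "GENE_Z"}

hub_genes = consensus_genes(lasso_genes, svm_rfe_genes, random_forest_genes)
print(hub_genes)
```

Requiring agreement across methods trades sensitivity for robustness: a gene favoured by only one algorithm is dropped, which reduces the chance that a single model's bias drives the final signature.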

4. Conclusions

Our study identified five highly efficient diagnostic genes in patients with NAFLD, highlighting their potential as new targets for diagnosis and possible control of NAFLD progression. We used multiple machine learning methods to improve the accuracy of gene selection and focused on identifying the genes that have the most significant impact on NAFLD. Our approach differs from that of similar studies and may provide new insights into clinical diagnosis and precise treatment strategies for NAFLD. We hope our findings will help advance the understanding and management of this disease. The potential for blood contamination in the liver biopsies, hepatic artery, and portal vein during sampling remains a limitation of our study despite our best efforts to disinfect and fat the hepatic veins before collection.

Funding

This work was supported by the following grants: Shanghai Outstanding Young Medical Personnel Training Program (Excellence Project of Shanghai Municipal Health Commission, 20224Z0009); Key specialized diseases construction of Huadong Hospital (ZDZB2225); Shanghai Key Supported Discipline Construction Project (General Medicine).

Data availability statement

This study utilized publicly available datasets, which can be accessed at the following links:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63067
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89632
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135251
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48452
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE66676

CRediT authorship contribution statement

Runzhi Yu: Methodology, Conceptualization. Yiqin Huang: Writing – original draft, Data curation. Xiaona Hu: Writing – review & editing, Data curation. Jie Chen: Writing – original draft, Methodology, Conceptualization.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Jie Chen reports that financial support was provided by the Shanghai Outstanding Young Medical Personnel Training Program (Excellence Project of Shanghai Municipal Health Commission: 20224Z0009); Key specialized diseases construction of Huadong Hospital (ZDZB2225); Shanghai Key Supported Discipline Construction Project (General Medicine). Jie Chen reports a relationship with Huadong Hospital Affiliated to Fudan University that includes: employment.

Jie Chen has patent pending related to the current research work.

The other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We would like to express our sincere gratitude to the GEO database for providing online resources for gene expression, and to the researchers who uploaded their valuable datasets. We also extend our heartfelt appreciation to all participants who contributed to our study.

Footnotes

^{Appendix A}

Supplementary data to this article can be found online at https://doi.org/10.1016/j.heliyon.2024.e32783.

Contributor Information

Xiaona Hu, Email: huxn06@163.com.

Jie Chen, Email: laughchen@126.com.

Appendix A. Supplementary data

The following are the Supplementary data to this article:

Multimedia component 1: mmc1.xlsx (72.6 KB, xlsx)

Multimedia component 2: mmc2.docx (948.7 KB, docx)

References

1. Murag S., Ahmed A., Kim D. Recent epidemiology of nonalcoholic fatty liver disease. Gut and Liver. 2021;15:206–216. doi: 10.5009/gnl20127.
2. Tobari M., Hashimoto X. Characteristic features of nonalcoholic fatty liver disease in Japan with a focus on the roles of age, sex and body mass index. Gut and Liver. 2020;14:537–545. doi: 10.5009/gnl19236.
3. Zhou J.H., Zhou F., Wang W.X., Zhang X.J., Ji Y.X., Zhang P. Epidemiological features of NAFLD from 1999 to 2018 in China. Hepatology. 2020;71:1851–1864. doi: 10.1002/hep.31150.
4. Powell E.E., Wong V.W.S., Rinella M. Non-alcoholic fatty liver disease. Lancet. 2021;397:2212–2224. doi: 10.1016/S0140-6736(20)32511-3.
5. Ye J.Z., Lin Y.S., Wang Q., Li Y.T., Zhao Y.J., Chen L.J. Integrated multichip analysis identifies potential key genes in the pathogenesis of nonalcoholic steatohepatitis. Front. Endocrinol. 2020;11. doi: 10.3389/fendo.2020.601745.
6. Liu L.Q., Lin B.G., Chen Z.Q., Deng M.X., Wang Y., Wang J.S. Identification of key pathways and genes in nonalcoholic fatty liver disease using bioinformatics analysis. Arch. Med. Sci. 2020;16:374–385. doi: 10.5114/aoms.2020.93343.
7. Feng G., Li X.P., Niu C.Y., Liu M.L., Yan Q.Q., Fan L.P. Bioinformatics analysis reveals novel core genes associated with nonalcoholic fatty liver disease and nonalcoholic steatohepatitis. Gene. 2020;742. doi: 10.1016/j.gene.2020.144549.
8. Grobman W.A., Stamilio D.M. vol. 194. 2006. pp. 888–894. (Iconography: Methods of Clinical Prediction).
9. Meng X.W., Cheng Z.L., Lu Z.Y., Tan Y.N., Jia X.Y., Zhang M. MX2: identification and systematic mechanistic analysis of a novel immune-related biomarker for systemic lupus erythematosus. Front. Immunol. 2022;13. doi: 10.3389/fimmu.2022.978851.
10. Langfelder P., Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 2008;9. doi: 10.1186/1471-2105-9-559.
11. Godec J., Tan Y., Liberzon A., Tamayo P., Bhattacharya S., Butte A.J. Compendium of immune signatures identifies conserved and species-specific biology in response to inflammation. Immunity. 2016;44:194–206. doi: 10.1016/j.immuni.2015.12.006.
12. Sepulveda J.L. Using R and bioconductor in clinical genomics and transcriptomics. J. Mol. Diagn. 2020;22:3–20. doi: 10.1016/j.jmoldx.2019.08.006.
13. Kanehisa M., Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.
14. Yu G., Wang L., He Q.Y. DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics. 2015;31:608–609. doi: 10.1093/bioinformatics/btu684.
15. Li H., Lai L., Shen J. Development of a susceptibility gene based novel predictive model for the diagnosis of ulcerative colitis using random forest and artificial neural network. Aging (Albany NY). 2020;12:20471–20482. doi: 10.18632/aging.103861.
16. Tian Y., Yang J., Lan M., Zou T. Construction and analysis of a joint diagnosis model of random forest and artificial neural network for heart failure. Aging (Albany NY). 2020;12:26221–26235. doi: 10.18632/aging.202405.
17. Newman A.M., Steen C., Liu C., Gentles A., Chaudhuri A., Scherer F. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 2019;37:773–782. doi: 10.1038/s41587-019-0114-2.
18. Leek J.T., Johnson W.E., Parker H.S., Jaffe A.E., Storey J.D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–883. doi: 10.1093/bioinformatics/bts034.
19. Anstee Q.M., Mcpherson S., Day C.P. How big a problem is non-alcoholic fatty liver disease? BMJ. 2011:343. doi: 10.1136/bmj.d3897.
20. Estes C., Anstee Q.M., Arias-Loste M.T., Bantel H., Bellentani S., Caballeria J. Modeling NAFLD disease burden in China, France, Germany, Italy, Japan, Spain, United Kingdom, and United States for the period 2016-2030. J. Hepatol. 2018;69:896–904. doi: 10.1016/j.jhep.2018.05.036.
21. Ioannou G.N. Epidemiology and risk-stratification of NAFLD-associated HCC. J. Hepatol. 2021;75:1476–1484. doi: 10.1016/j.jhep.2021.08.012.
22. Halazonetis T.D., Georgopoulos K., Greenberg M.E., Leder P. C-Jun dimerizes with itself and with C-Fos, forming complexes of different DNA binding affinities. Cell. 1989;55:917–924. doi: 10.1016/0092-8674(88)90147-x.
23. Cubero F.J., Zhao G., Trautwein C. JNK: a double-edged sword in tumorigenesis. Hepatology. 2011;54:1470–1472. doi: 10.1002/hep.24532.
24. Hu B., Yu M., Ma X. IFNα potentiates anti-PD-1 efficacy by remodeling glucose metabolism in the hepatocellular carcinoma microenvironment. Cancer Discov. 2022;12:1718–1741. doi: 10.1158/2159-8290.CD-21-1022.
25. Hasenfuss S.C., Bakiri L., Thomsen M.K., Williams E.G., Auwerx J., Wagner E.F. Regulation of steatohepatitis and PPARγ signaling by distinct AP-1 dimers. Cell Metabol. 2014;19:84–95. doi: 10.1016/j.cmet.2013.11.018.
26. Li L.Y., Wu J.X. Analysis of hub genes and molecular mechanisms in non-alcoholic steatohepatitis based on the gene expression omnibus database. Zhonghua Yixue Zazhi. 2021;101:3317–3322. doi: 10.3760/cma.j.cn112137-20210416-00913.
27. Kung W.W., Ramachandran S., Makukhin N., Bruno E., Ciulli A. Structural insights into substrate recognition by the SOCS2 E3 ubiquitin ligase. Nat. Commun. 2019;10:2534. doi: 10.1038/s41467-019-10190-4.
28. Zadjali F., Santana-Farre R., Vesterlund M., Carow B., Flores-Morales A. SOCS2 deletion protects against hepatic steatosis but worsens insulin resistance in high-fat-diet-fed mice. Faseb Journal Official Publication of the Federation of American Societies for Experimental Biology. 2012;26:3282–3291. doi: 10.1096/fj.12-205583.
29. Val C.H., Oliveira M., Lacerda D.R., Barroso A., Machado F.S. SOCS2 modulates adipose tissue inflammation and expansion in mice. J. Nutr. Biochem. 2019;76. doi: 10.1016/j.jnutbio.2019.108304.
30. Greenhalgh C.J., Rico-Bautista E., Lorentzon M., Thaus A.L., Morgan P.O., Willson T.A., et al. SOCS2 negatively regulates growth hormone action in vitro and in vivo. J. Clin. Invest. 2005;115:397–406. doi: 10.1172/JCI22710.
31. He L.E., Li H.Y., Wu A.Q., Peng Y.L., Shu G., Yin G. Functions of N6-methyladenosine and its role in cancer. Mol. Cancer. 2019;18. doi: 10.1186/s12943-019-1109-9.
32. Mangodt T.C., Van Herck M.A., Nullens S., Ramet J., De Dooy J.J., Jorens P.G. The role of Th17 and Treg responses in the pathogenesis of RSV infection. Pediatr. Res. 2015;78:483–491. doi: 10.1038/pr.2015.143.
33. Rau M., Schilling A.K., Meertens J. Progression from nonalcoholic fatty liver to nonalcoholic steatohepatitis is marked by a higher frequency of Th17 cells in the liver and an increased Th17/resting regulatory T cell ratio in peripheral blood and in the liver. J. Immunol. 2016;196:97–105. doi: 10.4049/jimmunol.1501175.
34. Chanana P.F. Immune imbalances in non-alcoholic fatty liver disease: from general biomarkers and neutrophils to interleukin-17 axis activation and new therapeutic targets. Front. Immunol. 2016;7:490. doi: 10.3389/fimmu.2016.00490.
35. Kesarwala M.A.A.H., et al. NAFLD causes selective CD4(+) T lymphocyte loss and promotes hepatocarcinogenesis. Nature. 2016;531:253–257. doi: 10.1038/nature16969.
36. He B., Wu L., Wei X., Shao Y., Jiang J. The imbalance of Th17/Treg cells is involved in the progression of nonalcoholic fatty liver disease in mice. BMC Immunol. 2017;18:33. doi: 10.1186/s12865-017-0215-y.
37. Ma X., Hua J., Mohamood A.R., Hamad A., Ravi R., Li Z. A high-fat diet and regulatory T cells influence susceptibility to endotoxin-induced liver injury. Hepatology. 2010;46:1519–1529. doi: 10.1002/hep.21823.
38. fat Microvesicular. Inter cellular adhesion molecule-1 and regulatory T-lymphocytes are of importance for the inflammatory process in livers with non-alcoholic steatohepatitis. Apmis. 2011;119:412–420. doi: 10.1111/j.1600-0463.2011.02746.x.
39. Philippe Lefebvr, Fanny Lalloyer, Eric Baugé. Interspecies NASH disease activity whole-genome profiling identifies a fibrogenic role of PPARα-regulated dermatopontin. JCI Insight. 2017;2. doi: 10.1172/jci.insight.92264.
40. Yadav H., Quijano C., Kamaraju A., Gavrilova O., Malek R., Chen W., et al. Protection from obesity and diabetes by blockade of TGF-β/smad3 signaling. Cell Metabol. 2011;14:67–79. doi: 10.1016/j.cmet.2011.04.013.
41. Kim H.L., Chung G.E., Park I.Y., Choi J.M., Hwang S.M., Lee J.H. Elevated peripheral blood monocyte fraction in nonalcoholic fatty liver disease. Tohoku J. Exp. Med. 2011;223:227–233. doi: 10.1620/tjem.223.227.
42. Oeztuerk S., Boehm B.O. Kratzer. A nonclassical monocyte phenotype in peripheral blood is associated with nonalcoholic fatty liver disease: a report from an emil subcohort. Horm. Metab. Res. 2016;48:54–61. doi: 10.1055/s-0035-1547233.
43. Zhang J., Chen W., Fang L., Li Q., Jing F. Increased intermediate monocyte fraction in peripheral blood is associated with nonalcoholic fatty liver disease. Wien Klin. Wochenschr. 2018;130:1–8. doi: 10.1007/s00508-018-1348-6.
44. Lambrecht J., Tacke F. Controversies and opportunities in the use of inflammatory markers for diagnosis or risk prediction in fatty liver disease. Front. Immunol. 2020;11. doi: 10.3389/fimmu.2020.634409.
45. Xu R., Huang H., Zheng Z., Wang F.S. The role of neutrophils in the development of liver diseases. Chinese Journal of Immunology. 2014;11(8). doi: 10.1038/cmi.2014.2.
46. Rafat M.N., Sherief A., Fetough A., Yousof M.A., Shahawy M.E. Neutrophil to lymphocyte ratio as a new marker for predicting steatohepatitis in patients with nonalcoholic fatty liver disease. Int. J. of Advanced Research. 2015;32:297–302.
47. Macek Jilkova Z., SamiaMarche HeleneDecaens, ThomasSturm NathalieJouvin-Marche, EvelyneHuard BertrandMarche, Patrice N. Progression of fibrosis in patients with chronic viral hepatitis is associated with IL-17(+) neutrophils. Liver Int. 2016:36. doi: 10.1111/liv.13060.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Multimedia component 1

mmc1.xlsx^{(72.6KB, xlsx)}

Multimedia component 2

mmc2.docx^{(948.7KB, docx)}

Data Availability Statement

[bib1] 1.Murag S., Ahmed A., Kim D. Recent epidemiology of nonalcoholic fatty liver disease. Gut and Liver. 2021;15:206–216. doi: 10.5009/gnl20127. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] 2.Tobari M., Hashimoto X. Characteristic features of nonalcoholic fatty liver disease in Japan with a focus on the roles of age, sex and body mass index. Gut and Liver. 2020;14:537–545. doi: 10.5009/gnl19236. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] 3.Zhou J.H., Zhou F., Wang W.X., Zhang X.J., Ji Y.X., Zhang P. Epidemiological features of NAFLD from 1999 to 2018 in China. Hepatology. 2020;71:1851–1864. doi: 10.1002/hep.31150. [DOI] [PubMed] [Google Scholar]

[bib4] 4.Powell E.E., Wong V.W.S., Rinella M. Non-alcoholic fatty liver disease. Lancet. 2021;397:2212–2224. doi: 10.1016/S0140-6736(20)32511-3. [DOI] [PubMed] [Google Scholar]

[bib5] 5.Ye J.Z., Lin Y.S., Wang Q., Li Y.T., Zhao Y.J., Chen L.J. Integrated multichip analysis identifies potential key genes in the pathogenesis of nonalcoholic steatohepatitis. Front. Endocrinol. 2020;11 doi: 10.3389/fendo.2020.601745. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] 6.Liu L.Q., Lin B.G., Chen Z.Q., Deng M.X., Wang Y., Wang J.S. Identification of key pathways and genes in nonalcoholic fatty liver disease using bioinformatics analysis. Arch. Med. Sci. 2020;16:374–385. doi: 10.5114/aoms.2020.93343. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] 7.Feng G., Li X.P., Niu C.Y., Liu M.L., Yan Q.Q., Fan L.P. Bioinformatics analysis reveals novel core genes associated with nonalcoholic fatty liver disease and nonalcoholic steatohepatitis. Gene. 2020;742 doi: 10.1016/j.gene.2020.144549. [DOI] [PubMed] [Google Scholar]

[bib8] 8.Grobman W.A., Stamilio D.M. vol. 194. 2006. pp. 888–894. (Iconography: Methods of Clinical Prediction). [DOI] [PubMed] [Google Scholar]

[bib9] 9. Meng X.W., Cheng Z.L., Lu Z.Y., Tan Y.N., Jia X.Y., Zhang M. MX2: identification and systematic mechanistic analysis of a novel immune-related biomarker for systemic lupus erythematosus. Front. Immunol. 2022;13. doi: 10.3389/fimmu.2022.978851.

[bib10] 10. Langfelder P., Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 2008;9. doi: 10.1186/1471-2105-9-559.

[bib11] 11. Godec J., Tan Y., Liberzon A., Tamayo P., Bhattacharya S., Butte A.J. Compendium of immune signatures identifies conserved and species-specific biology in response to inflammation. Immunity. 2016;44:194–206. doi: 10.1016/j.immuni.2015.12.006.

[bib12] 12. Sepulveda J.L. Using R and Bioconductor in clinical genomics and transcriptomics. J. Mol. Diagn. 2020;22:3–20. doi: 10.1016/j.jmoldx.2019.08.006.

[bib13] 13. Kanehisa M., Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.

[bib14] 14. Yu G., Wang L., He Q.Y. DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics. 2015;31:608–609. doi: 10.1093/bioinformatics/btu684.

[bib15] 15. Li H., Lai L., Shen J. Development of a susceptibility gene based novel predictive model for the diagnosis of ulcerative colitis using random forest and artificial neural network. Aging (Albany NY) 2020;12:20471–20482. doi: 10.18632/aging.103861.

[bib16] 16. Tian Y., Yang J., Lan M., Zou T. Construction and analysis of a joint diagnosis model of random forest and artificial neural network for heart failure. Aging (Albany NY) 2020;12:26221–26235. doi: 10.18632/aging.202405.

[bib17] 17. Newman A.M., Steen C., Liu C., Gentles A., Chaudhuri A., Scherer F. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 2019;37:773–782. doi: 10.1038/s41587-019-0114-2.

[bib18] 18. Leek J.T., Johnson W.E., Parker H.S., Jaffe A.E., Storey J.D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–883. doi: 10.1093/bioinformatics/bts034.

[bib19] 19. Anstee Q.M., McPherson S., Day C.P. How big a problem is non-alcoholic fatty liver disease? BMJ. 2011;343. doi: 10.1136/bmj.d3897.

[bib20] 20. Estes C., Anstee Q.M., Arias-Loste M.T., Bantel H., Bellentani S., Caballeria J. Modeling NAFLD disease burden in China, France, Germany, Italy, Japan, Spain, United Kingdom, and United States for the period 2016-2030. J. Hepatol. 2018;69:896–904. doi: 10.1016/j.jhep.2018.05.036.

[bib21] 21. Ioannou G.N. Epidemiology and risk-stratification of NAFLD-associated HCC. J. Hepatol. 2021;75:1476–1484. doi: 10.1016/j.jhep.2021.08.012.

[bib22] 22. Halazonetis T.D., Georgopoulos K., Greenberg M.E., Leder P. c-Jun dimerizes with itself and with c-Fos, forming complexes of different DNA binding affinities. Cell. 1988;55:917–924. doi: 10.1016/0092-8674(88)90147-x.

[bib23] 23. Cubero F.J., Zhao G., Trautwein C. JNK: a double-edged sword in tumorigenesis. Hepatology. 2011;54:1470–1472. doi: 10.1002/hep.24532.

[bib24] 24. Hu B., Yu M., Ma X. IFNα potentiates anti-PD-1 efficacy by remodeling glucose metabolism in the hepatocellular carcinoma microenvironment. Cancer Discov. 2022;12:1718–1741. doi: 10.1158/2159-8290.CD-21-1022.

[bib25] 25. Hasenfuss S.C., Bakiri L., Thomsen M.K., Williams E.G., Auwerx J., Wagner E.F. Regulation of steatohepatitis and PPARγ signaling by distinct AP-1 dimers. Cell Metabol. 2014;19:84–95. doi: 10.1016/j.cmet.2013.11.018.

[bib26] 26. Li L.Y., Wu J.X. Analysis of hub genes and molecular mechanisms in non-alcoholic steatohepatitis based on the gene expression omnibus database. Zhonghua Yixue Zazhi. 2021;101:3317–3322. doi: 10.3760/cma.j.cn112137-20210416-00913.

[bib27] 27. Kung W.W., Ramachandran S., Makukhin N., Bruno E., Ciulli A. Structural insights into substrate recognition by the SOCS2 E3 ubiquitin ligase. Nat. Commun. 2019;10:2534. doi: 10.1038/s41467-019-10190-4.

[bib28] 28. Zadjali F., Santana-Farre R., Vesterlund M., Carow B., Flores-Morales A. SOCS2 deletion protects against hepatic steatosis but worsens insulin resistance in high-fat-diet-fed mice. FASEB J. 2012;26:3282–3291. doi: 10.1096/fj.12-205583.

[bib29] 29. Val C.H., Oliveira M., Lacerda D.R., Barroso A., Machado F.S. SOCS2 modulates adipose tissue inflammation and expansion in mice. J. Nutr. Biochem. 2019;76. doi: 10.1016/j.jnutbio.2019.108304.

[bib30] 30. Greenhalgh C.J., Rico-Bautista E., Lorentzon M., Thaus A.L., Morgan P.O., Willson T.A., et al. SOCS2 negatively regulates growth hormone action in vitro and in vivo. J. Clin. Invest. 2005;115:397–406. doi: 10.1172/JCI22710.

[bib31] 31. He L.E., Li H.Y., Wu A.Q., Peng Y.L., Shu G., Yin G. Functions of N6-methyladenosine and its role in cancer. Mol. Cancer. 2019;18. doi: 10.1186/s12943-019-1109-9.

[bib32] 32. Mangodt T.C., Van Herck M.A., Nullens S., Ramet J., De Dooy J.J., Jorens P.G. The role of Th17 and Treg responses in the pathogenesis of RSV infection. Pediatr. Res. 2015;78:483–491. doi: 10.1038/pr.2015.143.

[bib33] 33. Rau M., Schilling A.K., Meertens J. Progression from nonalcoholic fatty liver to nonalcoholic steatohepatitis is marked by a higher frequency of Th17 cells in the liver and an increased Th17/resting regulatory T cell ratio in peripheral blood and in the liver. J. Immunol. 2016;196:97–105. doi: 10.4049/jimmunol.1501175.

[bib34] 34. Paquissi F.C. Immune imbalances in non-alcoholic fatty liver disease: from general biomarkers and neutrophils to interleukin-17 axis activation and new therapeutic targets. Front. Immunol. 2016;7:490. doi: 10.3389/fimmu.2016.00490.

[bib35] 35. Ma C., Kesarwala A.H., et al. NAFLD causes selective CD4(+) T lymphocyte loss and promotes hepatocarcinogenesis. Nature. 2016;531:253–257. doi: 10.1038/nature16969.

[bib36] 36. He B., Wu L., Wei X., Shao Y., Jiang J. The imbalance of Th17/Treg cells is involved in the progression of nonalcoholic fatty liver disease in mice. BMC Immunol. 2017;18:33. doi: 10.1186/s12865-017-0215-y.

[bib37] 37. Ma X., Hua J., Mohamood A.R., Hamad A., Ravi R., Li Z. High-fat diet and regulatory T cells influence susceptibility to endotoxin-induced liver injury. Hepatology. 2007;46:1519–1529. doi: 10.1002/hep.21823.

[bib38] 38. Microvesicular fat, inter cellular adhesion molecule-1 and regulatory T-lymphocytes are of importance for the inflammatory process in livers with non-alcoholic steatohepatitis. APMIS. 2011;119:412–420. doi: 10.1111/j.1600-0463.2011.02746.x.

[bib39] 39. Lefebvre P., Lalloyer F., Baugé E. Interspecies NASH disease activity whole-genome profiling identifies a fibrogenic role of PPARα-regulated dermatopontin. JCI Insight. 2017;2. doi: 10.1172/jci.insight.92264.

[bib40] 40. Yadav H., Quijano C., Kamaraju A., Gavrilova O., Malek R., Chen W., et al. Protection from obesity and diabetes by blockade of TGF-β/Smad3 signaling. Cell Metabol. 2011;14:67–79. doi: 10.1016/j.cmet.2011.04.013.

[bib41] 41. Kim H.L., Chung G.E., Park I.Y., Choi J.M., Hwang S.M., Lee J.H. Elevated peripheral blood monocyte fraction in nonalcoholic fatty liver disease. Tohoku J. Exp. Med. 2011;223:227–233. doi: 10.1620/tjem.223.227.

[bib42] 42. Oeztuerk S., Boehm B.O., Kratzer W. A nonclassical monocyte phenotype in peripheral blood is associated with nonalcoholic fatty liver disease: a report from an EMIL subcohort. Horm. Metab. Res. 2016;48:54–61. doi: 10.1055/s-0035-1547233.

[bib43] 43. Zhang J., Chen W., Fang L., Li Q., Jing F. Increased intermediate monocyte fraction in peripheral blood is associated with nonalcoholic fatty liver disease. Wien Klin. Wochenschr. 2018;130:1–8. doi: 10.1007/s00508-018-1348-6.

[bib44] 44. Lambrecht J., Tacke F. Controversies and opportunities in the use of inflammatory markers for diagnosis or risk prediction in fatty liver disease. Front. Immunol. 2020;11. doi: 10.3389/fimmu.2020.634409.

[bib45] 45. Xu R., Huang H., Zheng Z., Wang F.S. The role of neutrophils in the development of liver diseases. Cell. Mol. Immunol. 2014;11(8). doi: 10.1038/cmi.2014.2.

[bib46] 46. Rafat M.N., Sherief A., Fetough A., Yousof M.A., Shahawy M.E. Neutrophil to lymphocyte ratio as a new marker for predicting steatohepatitis in patients with nonalcoholic fatty liver disease. Int. J. Adv. Res. 2015;32:297–302.

[bib47] 47. Macek Jilkova Z., Afzal S., Marche H., Decaens T., Sturm N., Jouvin-Marche E., Huard B., Marche P.N. Progression of fibrosis in patients with chronic viral hepatitis is associated with IL-17(+) neutrophils. Liver Int. 2016;36. doi: 10.1111/liv.13060.
