Abstract
Background
Susceptibility to metabolic dysfunction-associated fatty liver disease (MAFLD) shows large inter-ethnic variability. Large-scale genome-wide association studies (GWAS) on MAFLD in Korean populations are currently limited. This study aimed to investigate genes underlying MAFLD in a Korean population.
Methods
A total of 13,457 Korean adults (4061 cases and 9396 controls) who underwent abdominal ultrasonography, biochemical testing, and genetic studies at a comprehensive health promotion center from 2019 to 2023 were included. Genome-wide genotyping was conducted using the Infinium Asian Screening Array and an iSCAN system (Illumina, San Diego, CA, USA). Gene-based analysis was conducted with Multi-marker Analysis of GenoMic Annotation (MAGMA) and Functional Mapping and Annotation (FUMA). Expression quantitative trait loci (eQTL) mapping was done using GTEx v8 data.
Results
The 22q13.3, 19p13.11, and 2p23.3 loci were associated with MAFLD after adjusting for age, sex, and body mass index (p < 5 × 10⁻⁸). Of the 173 genome-wide significant variants at these loci, 154 (89%) were identified as eQTLs (FDR < 0.05). The gene-based approach showed that PNPLA3, SAMM50, and PARVB were significantly associated with MAFLD (Bonferroni-corrected p < 2.99 × 10⁻⁶), followed by PDLIM4, GCKR, APOB, GPAM, HMGA1, C5orf56, and APOC1.
Conclusions
This is the largest-scale GWAS of MAFLD in a Korean adult population. Genotyping PARVB as well as PNPLA3 might help us identify individuals with the highest risk of MAFLD in Korean adults. These findings would contribute to our understanding of genetic pathogenesis of MAFLD in the Korean population.
Supplementary Information
The online version contains supplementary material available at 10.1186/s40001-025-02576-6.
Keywords: Metabolic dysfunction-associated fatty liver disease (MAFLD), Genome-wide association studies (GWAS), Genetic variants, PNPLA3, SAMM50, PARVB
Background
Non-alcoholic fatty liver disease (NAFLD), one of the leading causes of chronic liver disease, has a global prevalence of 25% [1–3]. NAFLD is defined by the presence of hepatic steatosis in the absence of secondary causes such as excessive alcohol consumption, viral infection, or autoimmune hepatitis. NAFLD is strongly associated with obesity, diabetes, and metabolic dysfunction. There are significant inter-ethnic differences in the development and progression of NAFLD [4, 5]. NAFLD is most common in Hispanics and East Asians, intermediate in Caucasians, and less common in individuals of African ancestry. These differences reflect a combination of genetic background and environmental factors such as caloric intake and physical inactivity [2].
Epidemiological, twin, and familial studies have shown that NAFLD is a heritable disease with a heritability ranging from 20 to 70% [6]. The first genome-wide association study (GWAS) revealed that rs738409 in patatin-like phospholipase domain-containing 3 (PNPLA3) is significantly associated with NAFLD [7]. Subsequently, several genes including transmembrane 6 superfamily member 2 (TM6SF2), glucokinase regulator (GCKR), SAMM50 sorting and assembly machinery component (SAMM50), membrane-bound O-acyltransferase domain-containing 7 (MBOAT7), and hydroxysteroid 17β-dehydrogenase 13 (HSD17B13) have been reported to be linked to NAFLD, and these associations are reproducible and robust across multiple ethnic groups [8]. Other genes such as Lipin 1 (LPIN1), Apolipoprotein C3 (APOC3), Apolipoprotein B (APOB), Ectonucleotide Pyrophosphatase/Phosphodiesterase 1 (ENPP1), PPARG Coactivator 1 Alpha (PPARGC1A), and Lysophospholipase Like 1 (LYPLAL1) have also been suggested to have potential associations with NAFLD [9, 10]. Susceptibility to NAFLD shows large inter-ethnic variability, with some reported genes not validated in other ethnic groups [11]. Currently, large-scale GWAS in Korean populations are still limited [11]. There have been four previous GWAS on Korean adults, revealing that PNPLA3, SAMM50, and/or TM6SF2 are significantly associated with NAFLD and/or lean NAFLD [12–15]. One study showed that rs7996045 in the intron of CLDN10 was associated with NAFL without comorbidities, which is a type of NAFLD [14]. This suggests that there are more genes associated with fatty liver yet to be discovered in a Korean population.
Recently, the term metabolic dysfunction-associated fatty liver disease (MAFLD) was proposed by the Asian Pacific Association for the Study of the Liver (APASL) in 2020 to emphasize the metabolic contribution and to replace the nomenclature of NAFLD [1]. Although there is a very high correlation between NAFLD and MAFLD, MAFLD patients show higher mortality and poorer clinical outcomes than NAFLD patients [16]. Recent epidemiologic studies showed that the prevalence of MAFLD is higher than that of NAFLD. Furthermore, persons with MAFLD were more likely to be older males with metabolic components such as obesity, diabetes, hypertension, and dyslipidemia [17]. A recent study on a Han population has reported that NAFLD risk variants such as rs738409 in PNPLA3 are not associated with MAFLD [6]. Another recent study has reported that rs738409 in PNPLA3 and rs59148799 in GATA zinc finger domain-containing 2A (GATAD2A) are significantly associated with MAFLD in a Korean population [18]. Currently, there are fewer studies on MAFLD than on NAFLD. Furthermore, large-scale GWAS conducted in Korean populations are very limited. This suggests that additional genetic variants for MAFLD remain to be identified. This study aimed to investigate genes and variants underlying MAFLD in a Korean adult population at a genome-wide level.
Methods
Study population and data collection
This study included 15,004 Korean adults who underwent abdominal ultrasonography, biochemical testing, and genetic studies at a comprehensive health promotion center from 2019 to 2023. Individuals without abnormalities in abdominal ultrasonography were included as controls. Individuals with hepatic steatosis on abdominal ultrasonography and overweight/obesity, type 2 diabetes, or evidence of metabolic dysregulation according to the APASL guideline were categorized as cases [1]. Cases with MAFLD were categorized into two groups: (1) obese MAFLD with body mass index (BMI) ≥ 25 kg/m²; (2) non-obese MAFLD with BMI < 25 kg/m² who had either diabetes or at least two metabolic risk abnormalities. Metabolic risk abnormalities were defined according to APASL guidelines: waist circumference ≥ 90 cm in Asian men and ≥ 80 cm in Asian women, blood pressure ≥ 130/85 mmHg, plasma triglycerides ≥ 150 mg/dL, plasma high-density lipoprotein (HDL)-cholesterol < 40 mg/dL for men and < 50 mg/dL for women, fasting glucose levels of 100 to 125 mg/dL or hemoglobin A1c (HbA1c) level of 5.7% to 6.4%, homeostasis model assessment of insulin resistance [HOMA-IR; (insulin × glucose)/405] score ≥ 2.5, and plasma high-sensitivity C-reactive protein (hs-CRP) > 2 mg/L [1].
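The case definition above can be sketched as a small rule set. This is a minimal illustration, not the study's actual pipeline: the `Participant` record and its field names are hypothetical, the blood pressure criterion is assumed to mean systolic ≥ 130 or diastolic ≥ 85 mmHg, and insulin is assumed to be in µU/mL (the unit implied by the constant 405).

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical record; field names are illustrative only."""
    sex: str                # "M" or "F"
    steatosis: bool         # hepatic steatosis on abdominal ultrasonography
    bmi: float              # kg/m^2
    diabetes: bool
    waist_cm: float
    systolic: float         # mmHg
    diastolic: float        # mmHg
    triglycerides: float    # mg/dL
    hdl: float              # mg/dL
    fasting_glucose: float  # mg/dL
    hba1c: float            # %
    fasting_insulin: float  # uU/mL (assumed unit)
    hs_crp: float           # mg/L


def homa_ir(insulin: float, glucose: float) -> float:
    """HOMA-IR = (insulin x glucose) / 405, as given in the text."""
    return insulin * glucose / 405.0


def count_metabolic_abnormalities(p: Participant) -> int:
    """Count how many of the seven listed risk abnormalities are present."""
    male = p.sex == "M"
    flags = [
        p.waist_cm >= (90 if male else 80),
        p.systolic >= 130 or p.diastolic >= 85,  # assumed reading of >= 130/85
        p.triglycerides >= 150,
        p.hdl < (40 if male else 50),
        (100 <= p.fasting_glucose <= 125) or (5.7 <= p.hba1c <= 6.4),
        homa_ir(p.fasting_insulin, p.fasting_glucose) >= 2.5,
        p.hs_crp > 2,
    ]
    return sum(flags)


def classify(p: Participant) -> str:
    """Assign a participant to one of the groups described in the text."""
    if not p.steatosis:
        return "no steatosis"
    if p.bmi >= 25:
        return "obese MAFLD"
    if p.diabetes or count_metabolic_abnormalities(p) >= 2:
        return "non-obese MAFLD"
    return "steatosis without MAFLD criteria"
```

For example, a man with steatosis, BMI 23 kg/m², waist circumference 92 cm, and triglycerides 160 mg/dL meets two risk abnormalities and would be classified as non-obese MAFLD under these assumptions.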
We excluded 1547 samples due to missing clinical information, inconsistencies between genotype-estimated sex and clinically reported sex, and the presence of cryptically related individuals with a kinship coefficient greater than 0.2 calculated by the KING software [19]. A total of 4061 cases (N = 3453 for obese MAFLD and N = 608 for non-obese MAFLD) and 9396 controls were analyzed. Samples enrolled from January 2023 to December 2023 were used as a discovery set (N = 8299, 2462 cases and 5837 controls) and samples enrolled from January 2019 to December 2022 were used as a validation set (N = 5158, 1599 cases and 3559 controls) (Fig. 1). Clinical characteristics of study samples are shown in Supplementary Table 1. This study was approved by the Institutional Review Board of Samsung Changwon Hospital (2023-03-002).
Fig. 1.
Overall workflow of this study
Anthropometric measurements and laboratory data included age, sex, systolic blood pressure, diastolic blood pressure, BMI, waist circumference, total protein, albumin, aspartate transaminase (AST), alanine aminotransferase (ALT), gamma glutamyltransferase (GGT), fasting glucose, insulin, HbA1c, total cholesterol, HDL-cholesterol, low-density lipoprotein–cholesterol (LDL-cholesterol), triglyceride, and high-sensitivity C-reactive protein (hs-CRP).
Diabetes was diagnosed based on the following criteria: HbA1c ≥ 6.5% or fasting plasma glucose level ≥ 126 mg/dL [20]. Metabolic syndrome was diagnosed if three or more of the following were met: waist circumference ≥ 90 cm in men or ≥ 80 cm in women; triglycerides ≥ 150 mg/dL; HDL-cholesterol < 40 mg/dL in men or < 50 mg/dL in women; blood pressure ≥ 130/85 mmHg; and fasting blood glucose ≥ 100 mg/dL [21, 22].
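The two diagnostic rules above can be written as simple predicates. This is a minimal sketch: the function and parameter names are hypothetical, and the blood pressure criterion is again assumed to mean systolic ≥ 130 or diastolic ≥ 85 mmHg.

```python
def has_diabetes(hba1c: float, fasting_glucose: float) -> bool:
    """HbA1c >= 6.5% or fasting plasma glucose >= 126 mg/dL."""
    return hba1c >= 6.5 or fasting_glucose >= 126


def has_metabolic_syndrome(sex: str, waist_cm: float, triglycerides: float,
                           hdl: float, systolic: float, diastolic: float,
                           fasting_glucose: float) -> bool:
    """True when at least three of the five listed criteria are met."""
    male = sex == "M"
    criteria = [
        waist_cm >= (90 if male else 80),
        triglycerides >= 150,
        hdl < (40 if male else 50),
        systolic >= 130 or diastolic >= 85,  # assumed reading of >= 130/85
        fasting_glucose >= 100,
    ]
    return sum(criteria) >= 3
```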
Genotyping and quality control
A total of 650,296 variants were genotyped using the Infinium Asian Screening Array (ASA) chip with an iSCAN system and GenomeStudio Software according to the manufacturer’s instructions (Illumina, San Diego, CA, USA). We included only autosomal variants and excluded variants with more than 10% missing genotypes (n = 4836) and/or variants with a minor allele frequency less than 1% (n = 159,331) (Supplementary Fig. 1). After data preparation using the method provided by Will Rayner (https://www.chg.ox.ac.uk/~wrayner/tools/), a total of 464,192 variants from 13,486 samples were included for imputation. Imputation was performed using the 1000 Genomes Project Phase 3 version 5 reference panel on the Michigan Imputation Server (Supplementary Fig. 1) [23]. After excluding variants with an imputation quality score (rsq) less than 0.3 and removing duplicated markers present at the same genomic positions, 9,478,995 variants were obtained from the imputed data (Supplementary Fig. 1). For the final association analysis, a total of 6,254,348 variants were included after excluding variants that met any of the following conditions: (1) minor allele frequency less than 1%; (2) more than 10% missing genotypes; or (3) significant deviation from Hardy–Weinberg equilibrium with p < 1 × 10⁻⁶ in controls (Supplementary Fig. 1).
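The three post-imputation variant filters can be illustrated from genotype counts. This is a sketch under stated assumptions, not the pipeline used in the study: the function names are hypothetical, and the Hardy–Weinberg test here is a one-degree-of-freedom chi-square approximation, whereas dedicated tools typically use an exact test.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from genotype counts AA, AB, BB."""
    n = n_aa + n_ab + n_bb
    freq_b = (n_ab + 2 * n_bb) / (2 * n)
    return min(freq_b, 1 - freq_b)


def missing_rate(n_called: int, n_missing: int) -> float:
    """Fraction of samples with no genotype call."""
    return n_missing / (n_called + n_missing)


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) test of Hardy-Weinberg proportions (approximation)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(all_counts, n_missing, control_counts,
              maf_min=0.01, miss_max=0.10, hwe_min=1e-6) -> bool:
    """Keep a variant unless it meets any exclusion condition in the text."""
    return (minor_allele_frequency(*all_counts) >= maf_min
            and missing_rate(sum(all_counts), n_missing) <= miss_max
            and hwe_chi2_pvalue(*control_counts) >= hwe_min)
```

Note that the Hardy–Weinberg check is applied to control genotype counts only, matching the description above, while frequency and missingness are evaluated on all samples.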
GWAS and gene-based approach
We conducted an association analysis using logistic regression after adjusting for age, sex, and body mass index (BMI) through a two-stage analysis: a discovery stage (p < 5 × 10⁻⁵) with 2462 MAFLD patients and 5837 controls, followed by a replication stage (p < 0.05) with 1599 MAFLD patients and 3559 controls and a combined stage (p < 5 × 10⁻⁸) with 4061 MAFLD patients and 9396 controls.
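The staged thresholds can be expressed as a simple filter over per-variant p values. The sketch below is illustrative only: the `VariantResult` record, the variant identifiers, and the p values are made up for demonstration and do not come from the study.

```python
from typing import NamedTuple


class VariantResult(NamedTuple):
    snp: str             # hypothetical identifier
    p_discovery: float
    p_replication: float
    p_combined: float


def passes_all_stages(v: VariantResult,
                      discovery_alpha: float = 5e-5,
                      replication_alpha: float = 0.05,
                      combined_alpha: float = 5e-8) -> bool:
    """Apply the discovery, replication, and combined thresholds in turn."""
    return (v.p_discovery < discovery_alpha
            and v.p_replication < replication_alpha
            and v.p_combined < combined_alpha)


# Made-up p values, for illustration only.
results = [
    VariantResult("snp_a", 1e-9, 1e-4, 1e-12),  # passes every stage
    VariantResult("snp_b", 1e-6, 0.20, 1e-6),   # fails replication
    VariantResult("snp_c", 1e-3, 1e-3, 1e-9),   # fails discovery
]
significant = [v.snp for v in results if passes_all_stages(v)]
```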
The gene-based analysis was performed on the combined set. To apply a gene-based approach, we analyzed 6,254,348 variants from 13,457 subjects after excluding variants that met previously established criteria in the discovery set (Fig. 1). Analysis with the gene-based approach was conducted using Multi-marker Analysis of GenoMic Annotation (MAGMA) [24]. Functional Mapping and Annotation (FUMA)’s SNP2GENE and GENE2FUNC were applied with default options to annotate and prioritize variants and genes from association results [25]. We mapped input SNPs to 16,703 genes using a window extending 2 kb upstream and 1 kb downstream of annotated gene boundaries, and conducted gene-based tests with a Bonferroni-corrected p value threshold of 2.99 × 10⁻⁶. Gene set analysis was conducted using MAGMA with Gene Ontology biological processes and molecular functions (MSigDB c5), employing default parameters. Expression quantitative trait loci (eQTL) analysis was done to investigate whether MAFLD variants regulated expression levels of genes. Genotype-Tissue Expression (GTEx) v8 data on the FUMA platform were used for eQTL analysis.
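The gene-level threshold follows from dividing the conventional family-wise error rate by the number of genes tested. The short sketch below assumes α = 0.05 (not stated explicitly in the text, but it reproduces the reported threshold) and applies it to five of the gene-level p values reported for the total cohort in Table 3.

```python
ALPHA = 0.05        # assumed family-wise error rate
N_GENES = 16_703    # genes tested, from the text

threshold = ALPHA / N_GENES
print(f"{threshold:.3g}")  # 2.99e-06

# Gene-level p values for the total cohort, taken from Table 3.
gene_p = {
    "PNPLA3": 1.50e-14,
    "SAMM50": 2.15e-14,
    "PARVB": 2.46e-11,
    "PDLIM4": 8.06e-06,
    "GCKR": 1.46e-05,
}
significant = [gene for gene, p in gene_p.items() if p < threshold]
```

Under this threshold only PNPLA3, SAMM50, and PARVB are retained; PDLIM4 and GCKR fall just short.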
Statistical analysis
Data quality control and association testing were performed in PLINK 1.9 [26]; associations were tested using logistic regression with an additive genetic model after adjusting for age, sex, and BMI. LocusZoom software was used to depict candidate regions in detail [27]. We calculated the intercept value and Lambda GC using Linkage Disequilibrium Score Regression [28]. Clinical and laboratory data were compared using Student’s t-test, chi-squared test, Wilcoxon rank-sum test, and Kruskal–Wallis test, as appropriate. All statistical analyses except for the association test were conducted using R software (The R Foundation for Statistical Computing, Vienna, Austria).
Results
GWAS of MAFLD
A total of 639 variants with p < 5 × 10⁻⁵ were identified in the discovery set and 236 variants were validated in the replication set. In the combined set, we identified 173 variants significantly associated with MAFLD at genome-wide significance (p < 5 × 10⁻⁸) after adjusting for age, sex, and BMI (Supplementary Table 2). Our overall workflow is schematized in Fig. 1. These variants are located in the 22q13.3, 19p13.11, and 2p23.3 loci, spanning PNPLA3 (84 variants), SAMM50 (71 variants), PARVB (16 variants), TM6SF2 (1 variant), and GCKR (1 variant) (Figs. 2 and 3).
Fig. 2.
Manhattan plot showing genome-wide p values in the combined set consisting of 4061 MAFLD patients and 9396 controls. Three loci (2p23.3, 19p13.11, and 22q13.3) were identified as candidate loci with a minimum p value of less than 5.0 × 10⁻⁸
Fig. 3.
Regional plot of candidate loci in the combined set. Figures show regional association plots of significant representative loci: a 2p23.3, b 19p13.11, and c 22q13.3. Purple shaded circles represent the most significant SNPs: rs1260326, rs58542926, and rs738409 located in 2p23.3, 19p13.11, and 22q13.3, respectively. The blue line indicates the recombination rate, while the fill color represents linkage disequilibrium based on r² values estimated from the 1000 Genomes Nov 2014 ASN data
Among them, 8 MAFLD variants in exonic regions are described in Table 1. As the most significant variants, rs738409 (C > G) and rs738408 (C > T) in PNPLA3 showed a higher risk allele frequency in the MAFLD case group than in the non-MAFLD group (0.46 vs. 0.39) (p = 1.11 × 10⁻⁴³, odds ratio (OR) = 1.62) (Table 1). The rs738409 variant was identified as the top lead SNP in the genomic risk loci through FUMA analysis. The frequency of the risk T allele of rs1260326 (C > T) in GCKR was higher in the MAFLD case group than in the non-MAFLD group (0.57 vs. 0.55) in the combined set (p = 1.44 × 10⁻⁸, OR = 1.21, Table 1). In addition, the risk alleles of rs58542926 (C > T) in TM6SF2 and rs1007863 (T > C) in PARVB were significantly associated with MAFLD (p = 3.35 × 10⁻⁹ and OR = 1.45 for rs58542926, and p = 1.69 × 10⁻²⁹ and OR = 1.47 for rs1007863, Table 1).
Table 1.
Variants in exonic regions significantly associated with MAFLD
| Chr | Position† | SNP | Gene symbol | Effect/other allele | EAF in cases | EAF in controls | p value‡ | OR (CI 95%) |
|---|---|---|---|---|---|---|---|---|
| 2 | 27,730,940 | rs1260326 | GCKR | T/C | 0.57 | 0.55 | 1.44E−08 | 1.21 (1.13–1.30) |
| 19 | 19,379,549 | rs58542926 | TM6SF2 | T/C | 0.09 | 0.07 | 3.35E−09 | 1.45 (1.28–1.63) |
| 22 | 44,324,727 | rs738409 | PNPLA3 | G/C | 0.46 | 0.39 | 1.11E−43 | 1.62 (1.51–1.73) |
| 22 | 44,324,730 | rs738408 | PNPLA3 | T/C | 0.46 | 0.39 | 1.11E−43 | 1.62 (1.51–1.73) |
| 22 | 44,368,122 | rs3761472 | SAMM50 | G/A | 0.45 | 0.38 | 8.99E−38 | 1.56 (1.46–1.67) |
| 22 | 44,372,632 | rs14315 | SAMM50 | T/C | 0.54 | 0.48 | 4.08E−31 | 1.49 (1.39–1.59) |
| 22 | 44,386,281 | rs7587 | SAMM50 | T/C | 0.31 | 0.36 | 9.51E−18 | 0.73 (0.68–0.79) |
| 22 | 44,395,451 | rs1007863 | PARVB | C/T | 0.54 | 0.48 | 1.69E−29 | 1.47 (1.38–1.57) |
Chr chromosome, MAFLD metabolic dysfunction-associated fatty liver disease, EAF effect allele frequency, OR odds ratio, CI confidence interval, SNP single nucleotide polymorphism
†Physical position based on human reference genome build 19
‡P value was calculated using logistic regression, adjusting for age, sex, and body mass index
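The columns of Table 1 are linked by the usual Wald relationships for logistic regression: the OR is the exponential of the regression coefficient, and the 95% CI is the exponential of the coefficient plus or minus about 1.96 standard errors. As a rough consistency check, the sketch below recovers the coefficient, standard error, z statistic, and two-sided p value from a reported OR and CI. It assumes Wald intervals (the text does not state how the intervals were computed), and because the table values are rounded to two decimals, the recovered p value only approximates the reported one.

```python
import math


def wald_from_or(odds_ratio: float, ci_low: float, ci_high: float,
                 z_crit: float = 1.959964):
    """Recover beta, SE, z, and two-sided p from an OR and its 95% Wald CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# rs738409 (PNPLA3) from Table 1: OR 1.62 (1.51-1.73), reported p = 1.11E-43.
beta, se, z, p = wald_from_or(1.62, 1.51, 1.73)
print(f"beta={beta:.3f} se={se:.4f} z={z:.1f} p={p:.1e}")
```

The recovered p value for rs738409 lands within about one order of magnitude of the reported value, which is as close as two-decimal rounding of the interval allows.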
Among 173 MAFLD variants, 168 variants were associated with obese MAFLD (p < 5 × 10⁻⁸), including 82 variants of PNPLA3, 69 variants of SAMM50, 16 variants of PARVB, and 1 variant of TM6SF2. A total of 126 variants were associated with non-obese MAFLD (p < 5 × 10⁻⁸), including 68 variants of PNPLA3, 50 variants of SAMM50, and 8 variants of PARVB. A total of 42 MAFLD variants were only associated with obese MAFLD. Non-obese MAFLD-specific variants were not identified. All 8 MAFLD variants in exonic regions demonstrated significant associations with both obese MAFLD and non-obese MAFLD when compared to the control group (Table 2).
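The subgroup counts above can be checked with simple arithmetic. The sketch below uses only the per-gene counts reported in the text; it assumes that every non-obese variant is also an obese variant, which follows from the statement that no non-obese-specific variants were identified.

```python
# Per-gene counts of genome-wide significant variants, from the text.
combined = {"PNPLA3": 84, "SAMM50": 71, "PARVB": 16, "TM6SF2": 1, "GCKR": 1}
obese = {"PNPLA3": 82, "SAMM50": 69, "PARVB": 16, "TM6SF2": 1}
non_obese = {"PNPLA3": 68, "SAMM50": 50, "PARVB": 8}

total_combined = sum(combined.values())    # 173
total_obese = sum(obese.values())          # 168
total_non_obese = sum(non_obese.values())  # 126

# With no non-obese-specific variants, the obese-only count is the difference.
obese_only = {gene: n - non_obese.get(gene, 0) for gene, n in obese.items()}
total_obese_only = sum(obese_only.values())  # 42
```

The per-gene differences (14 for PNPLA3, 19 for SAMM50, 8 for PARVB, and 1 for TM6SF2) sum to the 42 obese-only variants reported.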
Table 2.
Examples of significant variants according to MAFLD subgroup
| Chr | Position† | SNP | Gene symbol | Effect/other allele | Cohort | EAF in cases | EAF in controls | p value‡ | OR (CI 95%) |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 27,730,940 | rs1260326 | GCKR | T/C | Obese MAFLD | 0.57 | 0.55 | 1.10E−07 | 1.23 (1.14–1.32) |
| 2 | 27,730,940 | rs1260326 | GCKR | T/C | Non-obese MAFLD | 0.59 | 0.55 | 2.84E−03 | 1.20 (1.06–1.35) |
| 19 | 19,379,549 | rs58542926 | TM6SF2 | T/C | Obese MAFLD | 0.09 | 0.07 | 3.52E−09 | 1.52 (1.32–1.74) |
| 19 | 19,379,549 | rs58542926 | TM6SF2 | T/C | Non-obese MAFLD | 0.08 | 0.07 | 4.85E−02 | 1.24 (1.00–1.54) |
| 22 | 44,324,727 | rs738409 | PNPLA3 | G/C | Obese MAFLD | 0.46 | 0.39 | 5.41E−36 | 1.64 (1.52–1.77) |
| 22 | 44,324,727 | rs738409 | PNPLA3 | G/C | Non-obese MAFLD | 0.49 | 0.39 | 2.43E−13 | 1.56 (1.39–1.76) |
| 22 | 44,324,730 | rs738408 | PNPLA3 | T/C | Obese MAFLD | 0.46 | 0.39 | 5.41E−36 | 1.64 (1.52–1.77) |
| 22 | 44,324,730 | rs738408 | PNPLA3 | T/C | Non-obese MAFLD | 0.49 | 0.39 | 2.43E−13 | 1.56 (1.39–1.76) |
| 22 | 44,368,122 | rs3761472 | SAMM50 | G/A | Obese MAFLD | 0.45 | 0.38 | 1.73E−30 | 1.57 (1.45–1.70) |
| 22 | 44,368,122 | rs3761472 | SAMM50 | G/A | Non-obese MAFLD | 0.48 | 0.38 | 6.86E−13 | 1.55 (1.37–1.74) |
| 22 | 44,372,632 | rs14315 | SAMM50 | T/C | Obese MAFLD | 0.53 | 0.48 | 6.13E−25 | 1.49 (1.38–1.61) |
| 22 | 44,372,632 | rs14315 | SAMM50 | T/C | Non-obese MAFLD | 0.57 | 0.48 | 2.56E−11 | 1.51 (1.34–1.70) |
| 22 | 44,386,281 | rs7587 | SAMM50 | T/C | Obese MAFLD | 0.32 | 0.36 | 4.68E−15 | 0.73 (0.67–0.79) |
| 22 | 44,386,281 | rs7587 | SAMM50 | T/C | Non-obese MAFLD | 0.30 | 0.36 | 3.72E−06 | 0.74 (0.65–0.84) |
| 22 | 44,395,451 | rs1007863 | PARVB | C/T | Obese MAFLD | 0.53 | 0.48 | 7.40E−24 | 1.48 (1.37–1.60) |
| 22 | 44,395,451 | rs1007863 | PARVB | C/T | Non-obese MAFLD | 0.57 | 0.48 | 1.07E−10 | 1.48 (1.32–1.67) |
Chr chromosome, MAFLD metabolic dysfunction-associated fatty liver disease, EAF effect allele frequency, OR odds ratio, CI confidence interval, SNP single nucleotide polymorphism
†Physical position based on human reference genome build 19
‡P values were calculated using logistic regression, adjusting for age, sex, and body mass index
The LDSC analysis yielded an intercept value of 1.0274 (SE = 0.0098), suggesting that GWAS results were not significantly influenced by confounding factors such as population structure or relatedness. Additionally, the Lambda GC value of 1.0466 indicated a relatively low overall inflation of GWAS results, further supporting the reliability of our findings (Supplementary Fig. 2).
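For readers unfamiliar with the genomic inflation factor, Lambda GC is conventionally defined as the median observed chi-square statistic divided by the median of a chi-square distribution with one degree of freedom (about 0.4549). The sketch below computes it directly from two-sided p values using only the Python standard library. It illustrates the conventional definition under that assumption and is not the LDSC implementation used in this study; the function name is hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df, approximately 0.4549.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def lambda_gc(p_values) -> float:
    """Genomic inflation factor from two-sided p values in (0, 1)."""
    # A two-sided p value maps to a 1-df chi-square via the squared
    # normal quantile at p / 2 (lower tail, numerically stable for small p).
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p values mimic a null distribution, so lambda is close to 1.
null_like = [(i + 0.5) / 1001 for i in range(1001)]
print(round(lambda_gc(null_like), 4))
```

A value near 1, such as the 1.0466 reported here, indicates that the bulk of the test statistics follows the null distribution.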
eQTL analysis
Among 173 MAFLD variants, 154 (89%) variants were identified as eQTLs across 32 tissues, resulting in 1263 variant-tissue pairs (False Discovery Rate p < 0.05, Table S1). Significant eQTL associations were identified in a number of tissues including adipose subcutaneous tissue (Supplementary Table 3). A number of MAFLD variants including rs1260326, rs58542926, rs738409, and rs738408 exhibited significant common eQTL associations in subcutaneous adipose tissue (Supplementary Table 3). For example, individuals carrying MAFLD-associated risk alleles such as G of rs738409, T of rs738408, and G of rs3761472 exhibited increased SAMM50 gene expression in subcutaneous adipose tissue. In addition, we found that the rs734561 variant located in the 22q13.3 region influenced SAMM50 expression across 26 tissues.
Gene-based analysis
We utilized MAGMA to identify genes associated with MAFLD. We found that three genes, PNPLA3, SAMM50, and PARVB, were significantly associated with MAFLD at the genome-wide level (Bonferroni-corrected p < 2.99 × 10⁻⁶, Table 3). Seven additional genes showed suggestive associations (p < 0.0001): PDZ and LIM Domain 4 (PDLIM4; p = 8.06 × 10⁻⁶), GCKR (p = 1.46 × 10⁻⁵), APOB (p = 2.74 × 10⁻⁵), Glycerol-3-Phosphate Acyltransferase, Mitochondrial (GPAM; p = 2.90 × 10⁻⁵), High Mobility Group AT-Hook 1 (HMGA1; p = 4.13 × 10⁻⁵), Colitis Associated IRF1 Antisense Regulator of Intestinal Homeostasis (CARINH; C5orf56; p = 4.40 × 10⁻⁵), and APOC1 (p = 5.45 × 10⁻⁵), although these associations did not reach genome-wide significance (Table 3).
Table 3.
The top 10 significant genes identified from gene-based approach
| Gene symbol | Chromosome | Start position† | End position† | Number of variants | p value‡ |
|---|---|---|---|---|---|
| Total cohort | | | | | |
| PNPLA3 | 22 | 44,317,619 | 44,362,368 | 150 | 1.50E−14 |
| SAMM50 | 22 | 44,349,301 | 44,408,411 | 175 | 2.15E−14 |
| PARVB | 22 | 44,393,091 | 44,570,829 | 657 | 2.46E−11 |
| PDLIM4 | 5 | 131,591,364 | 131,610,147 | 58 | 8.06E−06 |
| GCKR | 2 | 27,717,709 | 27,747,554 | 49 | 1.46E−05 |
| APOB | 2 | 21,223,301 | 21,268,945 | 84 | 2.74E−05 |
| GPAM | 10 | 113,908,624 | 113,977,135 | 141 | 2.90E−05 |
| HMGA1 | 6 | 34,202,650 | 34,215,008 | 21 | 4.13E−05 |
| C5orf56 | 5 | 131,744,328 | 131,812,736 | 232 | 4.40E−05 |
| APOC1 | 19 | 45,415,504 | 45,423,606 | 21 | 5.45E−05 |
| Obese MAFLD | | | | | |
| SAMM50 | 22 | 44,349,301 | 44,407,411 | 175 | 2.76E−14 |
| PNPLA3 | 22 | 44,317,619 | 44,361,368 | 150 | 5.00E−10 |
| PARVB | 22 | 44,393,091 | 44,569,829 | 657 | 6.45E−09 |
| PAPPA-AS1 | 9 | 119,159,439 | 119,164,885 | 17 | 1.47E−05 |
| DPYSL5 | 2 | 27,068,615 | 27,174,219 | 164 | 2.42E−05 |
| C2orf16 | 2 | 27,797,389 | 27,806,588 | 8 | 6.48E−05 |
| AC109829.1 | 2 | 27,758,253 | 27,791,011 | 75 | 8.30E−05 |
| FRK | 6 | 116,251,312 | 116,383,921 | 227 | 8.58E−05 |
| HEMGN | 9 | 100,688,073 | 100,709,138 | 37 | 8.59E−05 |
| GCKR | 2 | 27,717,709 | 27,747,554 | 49 | 1.12E−04 |
| Non-obese MAFLD | | | | | |
| PNPLA3 | 22 | 44,317,619 | 44,361,368 | 150 | 4.90E−13 |
| SAMM50 | 22 | 44,349,301 | 44,407,411 | 175 | 7.82E−12 |
| C5orf56 | 5 | 131,744,328 | 131,812,736 | 215 | 1.59E−05 |
| HRH2 | 5 | 175,083,033 | 175,114,245 | 52 | 2.26E−05 |
| PDLIM4 | 5 | 131,591,364 | 131,610,147 | 58 | 4.58E−05 |
| ANKRD61 | 7 | 6,069,007 | 6,077,017 | 31 | 8.84E−05 |
| EIF2AK1 | 7 | 6,060,881 | 6,100,861 | 168 | 1.14E−04 |
| C10orf25 | 10 | 45,492,146 | 45,498,336 | 17 | 1.14E−04 |
| AIDA | 1 | 222,840,355 | 222,888,552 | 71 | 2.68E−04 |
| ROMO1 | 20 | 34,285,194 | 34,289,906 | 14 | 2.78E−04 |
MAFLD metabolic dysfunction-associated fatty liver disease
†Physical position based on human reference genome build 19
‡Genome-wide significance was defined as a Bonferroni-corrected p < 2.99 × 10⁻⁶
Subgroup analysis showed that SAMM50 (p = 2.76 × 10⁻¹⁴), PNPLA3 (p = 5.00 × 10⁻¹⁰), and PARVB (p = 6.45 × 10⁻⁹) were significantly associated with obese MAFLD, while PNPLA3 (p = 4.90 × 10⁻¹³) and SAMM50 (p = 7.82 × 10⁻¹²) were significantly associated with non-obese MAFLD (Table 3).
Gene set analysis using Gene Ontology biological processes and molecular functions identified two significant gene sets related to lipid homeostasis and phosphatidylcholine floppase activity, with adjusted p values of 0.0048 and 0.011, respectively (Supplementary Table 4). Subsequent MAGMA analysis of GTEx v8 data did not detect any tissue-specific expression patterns associated with MAFLD (data not shown).
Discussion
This study represents the largest GWAS of MAFLD conducted in a Korean adult population to date. We demonstrated that a total of 173 variants located in PNPLA3, SAMM50, PARVB, TM6SF2, and GCKR genes were significantly associated with MAFLD in a Korean adult population after adjusting for age, sex, and BMI. We not only confirmed the association between PNPLA3 and MAFLD but also demonstrated similar associations between other genes and MAFLD. The association between rs738409 in PNPLA3 and MAFLD has been reported in one recent GWAS on a Korean population [18]. Although associations of SAMM50 and GCKR with NAFLD have been well established, relationships of SAMM50 and GCKR with MAFLD have not been reported yet in a Korean population [12, 13]. Here, we found that SAMM50 and GCKR were associated with MAFLD, suggesting that similar genetic factors could contribute to the pathogenesis of both NAFLD and MAFLD.
In addition, we uncovered a novel association between PARVB and MAFLD in a Korean adult population. The relationship between PARVB and NAFLD has been reported in Asian populations [29–32]. To the best of our knowledge, this is the first report that PARVB is associated with MAFLD in a Korean adult population. PARVB encodes parvin-β, which forms the integrin-linked kinase–PINCH–parvin complex. Overexpression of parvin-β can promote lipogenic gene expression and apoptosis [33, 34], which have been reported as key mechanisms of NAFLD progression [33]. The identification of PARVB in addition to the previously known PNPLA3 might have important implications for clinical practice. Polygenic risk scores using PARVB as well as PNPLA3 could potentially enhance our ability to identify individuals at the highest risk of developing MAFLD among Korean adults. Further translational studies are needed for genetic prediction and surveillance using these findings.
The gene-based approach also showed that PNPLA3, SAMM50, and PARVB were significantly associated with MAFLD. In addition, we demonstrated associations of MAFLD with other genes such as GPAM, one of the validated NAFLD-associated genes [35]. Furthermore, a number of MAFLD variants in PNPLA3, SAMM50, PARVB, TM6SF2, and GCKR were eQTLs in subcutaneous adipose tissue. These genes are involved in lipid metabolism: PNPLA3 in lipid droplet remodeling, SAMM50 in lipid accumulation, PARVB in lipogenic gene expression, TM6SF2 in hepatic lipid export (VLDL secretion), and GCKR in de novo lipogenesis [4, 5]. Gene expression levels in subcutaneous adipose tissue have also been linked to hepatic steatosis, steatohepatitis, and fibrosis. Gene set analysis using Gene Ontology biological processes identified lipid homeostasis genes as significant in the Korean adult population. This implies that abnormal lipid metabolism is a main contributor to MAFLD predisposition in a Korean adult population.
Here, we identified 126 variants in PNPLA3, SAMM50, and PARVB that were associated with non-obese MAFLD in a Korean adult population. Previous studies have reported that PNPLA3 and TM6SF2 are mainly associated with increased NAFLD in a lean population [36]. To date, variants in SAMM50 (including rs3761472) and PARVB (including rs1007863) have not been reported in lean MAFLD or lean NAFLD [36]. Although the criteria for lean MAFLD include BMI < 23 kg/m² and the criteria for non-obese MAFLD include BMI < 25 kg/m², genetic characteristics of lean MAFLD and non-obese MAFLD are likely to be similar. To the best of our knowledge, this is the first report that SAMM50 and PARVB are also associated with non-obese MAFLD. These results were consistent with a gene-based approach.
There are some strengths and limitations in this study. Compared to the previous study, we further applied two key methods: a gene-based approach and eQTL analysis. We implemented a comprehensive gene-based approach in addition to individual SNP-based analysis. Compared to SNP-based analysis, this methodology provides a more sophisticated interpretation of genetic complexity, enabling a broader understanding of the genetic mechanisms underlying MAFLD. Gene-based approaches address key limitations of single-variant GWAS by capturing the cumulative effects of multiple SNPs within genes, providing a more comprehensive understanding of genetic associations. By focusing on gene-level analysis, this approach offers a more nuanced method for identifying functionally relevant candidate variants and delivers deeper insights into disease etiology beyond the limited SNP-level interpretation of traditional GWAS. Furthermore, we conducted in silico eQTL mapping to integrate genetic variation with transcriptional changes using the GTEx database and FUMA platform. This approach allows for an exploration of the mechanisms of gene expression regulation associated with MAFLD, offering insights into the molecular basis of the disease. Although this is the largest study conducted in a Korean adult population, we did not discover a completely novel gene and did not conduct in vitro functional experiments. This study could serve as an external validation of the previous study [18]. Further large-scale GWAS will be needed to discover novel genes underlying MAFLD in a Korean population.
Using comprehensive methods such as GWAS, gene-based approaches, and eQTL studies, we confirmed that PNPLA3, SAMM50, and PARVB were significantly associated with MAFLD in a Korean adult population. To the best of our knowledge, we are the first to report that SAMM50 and PNPLA3 are also associated with non-obese MAFLD in a Korean adult population. By elucidating these novel genetic associations and confirming previously identified risk factors, this study provides a more comprehensive picture of MAFLD genetics in a Korean population. These findings offer valuable insights into the pathophysiology of MAFLD and highlight potential targets for risk assessment and therapeutic interventions.
Conclusion
This study demonstrated that PNPLA3, SAMM50, and PARVB were key genetic contributors to MAFLD in a Korean adult population. These findings contribute to our understanding of the genetic pathogenesis of MAFLD in the Korean population. Further research is needed to fully elucidate the functional roles of these genetic variants and to translate these findings into clinical applications.
Acknowledgements
Not applicable.
Abbreviations
- NAFLD
Non-alcoholic fatty liver disease
- GWAS
Genome-wide association study
- BMI
Body mass index
- HDL-cholesterol
High-density lipoprotein cholesterol
- LDL-cholesterol
Low-density lipoprotein cholesterol
- HbA1c
Hemoglobin A1c
- HOMA-IR
Homeostasis model assessment of insulin resistance
- hs-CRP
High-sensitivity C-reactive protein
- AST
Aspartate transaminase
- ALT
Alanine aminotransferase
- GGT
Gamma glutamyltransferase
- eQTLs
Expression quantitative trait loci
- GTEx
Genotype-Tissue Expression
- OR
Odds ratio
- PNPLA3
Patatin-like phospholipase domain-containing 3
- TM6SF2
Transmembrane 6 superfamily member 2
- GCKR
Glucokinase regulator
- SAMM50
SAMM50 sorting and assembly machinery component
- MBOAT7
Membrane-bound O-acyltransferase domain-containing 7
- HSD17B13
Hydroxysteroid 17β-dehydrogenase 13
- LPIN1
Lipin 1
- APOC3
Apolipoprotein C3
- APOB
Apolipoprotein B
- ENPP1
Ectonucleotide pyrophosphatase/phosphodiesterase 1
- PPARGC1A
PPARG coactivator 1 alpha
- LYPLAL1
Lysophospholipase like 1
- PDLIM4
PDZ and LIM domain 4
- GPAM
Glycerol-3-phosphate acyltransferase, mitochondrial
- HMGA1
High mobility group AT-hook 1
- CARINH
Colitis associated IRF1 antisense regulator of intestinal homeostasis
Author contributions
J.P. and K.P. had full access to all the data of this study and made the decision to submit this article for publication. J.P. and K.P. contributed to study concept and design, formal data analyses and interpretation, and drafting of the manuscript. K.P. was involved in acquisition of clinical data, acquisition of biochemical data, study supervision, and critical revision of the manuscript for important intellectual content. J.P. performed statistical and bioinformatics analysis.
Funding
This study was supported by a research grant from SD Medical Research Institute in 2023 and the National Research Foundation of Korea grant funded by the Korea government (RS-2023-00211468).
Availability of data and materials
The complete dataset will not be made publicly available because of restrictions imposed by the ethics committees due to the sensitive nature of the personal data collected. Requests for data can be made to the corresponding author.
Declarations
Ethics approval and consent to participate
This study was approved by the Institutional Review Board of Samsung Changwon Hospital (2023-03-002).
Consent for publication
Not applicable.
Competing interests
The authors declare no conflicts of interest for this article.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Eslam M, Newsome PN, Sarin SK, Anstee QM, Targher G, Romero-Gomez M, et al. A new definition for metabolic dysfunction-associated fatty liver disease: an international expert consensus statement. J Hepatol. 2020;73(1):202–9.
- 2. Powell EE, Wong VW, Rinella M. Non-alcoholic fatty liver disease. Lancet. 2021;397(10290):2212–24.
- 3. Younossi ZM, Koenig AB, Abdelatif D, Fazel Y, Henry L, Wymer M. Global epidemiology of nonalcoholic fatty liver disease-meta-analytic assessment of prevalence, incidence, and outcomes. Hepatology. 2016;64(1):73–84.
- 4. Eslam M, George J. Genetic contributions to NAFLD: leveraging shared genetics to uncover systems biology. Nat Rev Gastroenterol Hepatol. 2020;17(1):40–52.
- 5. Mahmoudi SK, Tarzemani S, Aghajanzadeh T, Kasravi M, Hatami B, Zali MR, et al. Exploring the role of genetic variations in NAFLD: implications for disease pathogenesis and precision medicine approaches. Eur J Med Res. 2024;29(1):190.
- 6. Liao S, An K, Liu Z, He H, An Z, Su Q, et al. Genetic variants associated with metabolic dysfunction-associated fatty liver disease in western China. J Clin Lab Anal. 2022;36(9):e24626.
- 7. Romeo S, Kozlitina J, Xing C, Pertsemlidis A, Cox D, Pennacchio LA, et al. Genetic variation in PNPLA3 confers susceptibility to nonalcoholic fatty liver disease. Nat Genet. 2008;40(12):1461–5.
- 8. Eslam M, Valenti L, Romeo S. Genetics and epigenetics of NAFLD and NASH: clinical impact. J Hepatol. 2018;68(2):268–79.
- 9. Macaluso FS, Maida M, Petta S. Genetic background in nonalcoholic fatty liver disease: a comprehensive review. World J Gastroenterol. 2015;21(39):11088–111.
- 10. Seko Y, Yamaguchi K, Itoh Y. The genetic backgrounds in nonalcoholic fatty liver disease. Clin J Gastroenterol. 2018;11(2):97–102.
- 11. Kumar A, Shalimar WGK, Gupta V, Sachdeva MP. Genetics of nonalcoholic fatty liver disease in Asian populations. J Genet. 2019. 10.1007/s12041-019-1071-8.
- 12. Chung GE, Lee Y, Yim JY, Choe EK, Kwak MS, Yang JI, et al. Genetic polymorphisms of PNPLA3 and SAMM50 are associated with nonalcoholic fatty liver disease in a Korean population. Gut Liver. 2018;12(3):316–23.
- 13. Koo BK, Joo SK, Kim D, Bae JM, Park JH, Kim JH, et al. Additive effects of PNPLA3 and TM6SF2 on the histological severity of non-alcoholic fatty liver disease. J Gastroenterol Hepatol. 2018;33(6):1277–85.
- 14. Kim YJ, Cho YS. Genetic association study identifies genetic variants for non-alcoholic fatty liver without comorbidities in the Korean population. Genes Genomics. 2023;45(7):847–54.
- 15. Park H, Yoon EL, Chung GE, Choe EK, Bae JH, Choi SH, et al. Genetic and metabolic characteristics of lean nonalcoholic fatty liver disease in a Korean health examinee cohort. Gut Liver. 2024;18(2):316–27.
- 16. Nguyen VH, Le MH, Cheung RC, Nguyen MH. Differential clinical characteristics and mortality outcomes in persons with NAFLD and/or MAFLD. Clin Gastroenterol Hepatol. 2021;19(10):2172–81.
- 17. Vaz K, Clayton-Chubb D, Majeed A, Lubel J, Simmons D, Kemp W, et al. Current understanding and future perspectives on the impact of changing NAFLD to MAFLD on global epidemiology and clinical outcomes. Hepatol Int. 2023;17(5):1082–97.
- 18. Lee Y, Cho EJ, Choe EK, Kwak MS, Yang JI, Oh SW, et al. Genome-wide association study of metabolic dysfunction-associated fatty liver disease in a Korean population. Sci Rep. 2024;14(1):9753.
- 19. Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26(22):2867–73.
- 20. International Expert Committee. International expert committee report on the role of the A1C assay in the diagnosis of diabetes. Diabetes Care. 2009;32(7):1327–34.
- 21. Alberti KG, Eckel RH, Grundy SM, Zimmet PZ, Cleeman JI, Donato KA, et al. Harmonizing the metabolic syndrome: a joint interim statement of the international diabetes federation task force on epidemiology and prevention; national heart, lung, and blood institute; American heart association; world heart federation; international atherosclerosis society; and international association for the study of obesity. Circulation. 2009;120(16):1640–5.
- 22. Lim S, Shin H, Song JH, Kwak SH, Kang SM, Won Yoon J, et al. Increasing prevalence of metabolic syndrome in Korea: the Korean national health and nutrition examination survey for 1998–2007. Diabetes Care. 2011;34(6):1323–8.
- 23. Das S, Forer L, Schonherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nat Genet. 2016;48(10):1284–7.
- 24. de Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol. 2015;11(4):e1004219.
- 25. Watanabe K, Taskesen E, van Bochoven A, Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun. 2017;8(1):1826.
- 26. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
- 27. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2010;26(18):2336–7.
- 28. Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J, Schizophrenia Working Group of the Psychiatric Genomics Consortium, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47(3):291–5.
- 29. Kitamoto T, Kitamoto A, Yoneda M, Hyogo H, Ochi H, Nakamura T, et al. Genome-wide scan revealed that polymorphisms in the PNPLA3, SAMM50, and PARVB genes are associated with development and progression of nonalcoholic fatty liver disease in Japan. Hum Genet. 2013;132(7):783–92.
- 30. Lee KJ, Moon JS, Lim JG, Huh H, Ahn JE, Kim L, et al. PARVB and HSD17B13 variants are associated with nonalcoholic fatty liver disease in children. J Gastroenterol Hepatol. 2024;39(6):1172–82.
- 31. Wu G, Wang K, Xue Y, Song G, Wang Y, Sun X, et al. Association of rs5764455 and rs6006473 polymorphisms in PARVB with liver damage of nonalcoholic fatty liver disease in Han Chinese population. Gene. 2016;575(2 Pt 1):270–5.
- 32. Xu K, Zheng KI, Zhu PW, Liu WY, Ma HL, Li G, et al. Interaction of SAMM50-rs738491, PARVB-rs5764455 and PNPLA3-rs738409 increases susceptibility to nonalcoholic steatohepatitis. J Clin Transl Hepatol. 2022;10(2):219–29.
- 33. Fujii H, Kawada N. Inflammation and fibrogenesis in steatohepatitis. J Gastroenterol. 2012;47(3):215–25.
- 34. Johnstone CN, Mongroo PS, Rich AS, Schupp M, Bowser MJ, Delemos AS, et al. Parvin-beta inhibits breast cancer tumorigenicity and promotes CDK9-mediated peroxisome proliferator-activated receptor gamma 1 phosphorylation. Mol Cell Biol. 2008;28(2):687–704.
- 35. Stefan N, Schick F, Birkenfeld AL, Haring HU, White MF. The role of hepatokines in NAFLD. Cell Metab. 2023;35(2):236–52.
- 36. Njei B, Al-Ajlouni YA, Ugwendum D, Abdu M, Forjindam A, Mohamed MF. Genetic and epigenetic determinants of non-alcoholic fatty liver disease (NAFLD) in lean individuals: a systematic review. Transl Gastroenterol Hepatol. 2024;9:11.