Abstract
Nonalcoholic fatty liver disease (NAFLD) is a rising cause of hepatocellular carcinoma (HCC). We examined whether inherited pathogenic variants in candidate genes (n = 181) were enriched in patients with NAFLD-HCC. To this end, we resequenced peripheral blood DNA of 142 NAFLD-HCC, 59 NAFLD with advanced fibrosis, and 50 controls, and considered 404 healthy individuals from 1000 G. Pathogenic variants were defined according to ClinVar, likely pathogenic as rare variants predicted to alter protein activity. In NAFLD-HCC patients, we detected an enrichment in pathogenic (p = 0.024), and likely pathogenic variants (p = 1.9*10−6), particularly in APOB (p = 0.047). APOB variants were associated with lower circulating triglycerides and higher HDL cholesterol (p < 0.01). A genetic risk score predicted NAFLD-HCC (OR 4.96, 3.29–7.55; p = 5.1*10−16), outperforming the diagnostic accuracy of common genetic risk variants, and of clinical risk factors (p < 0.05). In conclusion, rare pathogenic variants in genes involved in liver disease and cancer predisposition are associated with NAFLD-HCC development.
Introduction
Nonalcoholic fatty liver disease (NAFLD) is now the leading cause of hepatic damage worldwide1. In a minority of affected individuals, NAFLD can progress to cirrhosis and hepatocellular carcinoma (NAFLD-HCC), which is emerging as a leading cause of liver-related mortality2–4.
Aging, male sex, presence of type 2 diabetes (T2D) and cirrhosis are clinical risk factors for NAFLD-HCC2. However, because of the high prevalence of NAFLD in the general population, the low incidence of NAFLD-HCC, the frequent occurrence of this cancer in non-cirrhotic individuals, and the competition with metabolic comorbidities as a cause of death, mass screening for NAFLD-HCC is currently unfeasible. As a result, most cancers are diagnosed at advanced stages, leading to a dismal prognosis for those affected5,6. Therefore, the identification of new biomarkers able to improve NAFLD-HCC risk stratification is of paramount clinical importance for the development of targeted and cost-effective surveillance programs.
Heritability is involved in HCC predisposition7, and NAFLD has a strong genetic component8, as does its progression to advanced disease9. A common genetic variant encoding the I148M substitution in PNPLA3, the major inherited determinant of hepatic fat accumulation, predisposes to NAFLD-HCC independently of the effect of fibrosis10,11. We have also shown that other common genetic variants influencing hepatic fat accumulation, namely those in TM6SF2 and MBOAT7, may improve the ability to discriminate NAFLD patients at risk of HCC11. However, possibly because of the still large fraction of missing heritability, carriage of these variants was not specific enough in identifying patients at risk of NAFLD-HCC to be implemented in clinical practice12,13.
Part of the missing heritability in NAFLD may be accounted for by carriage of rare genetic variants with a large effect size. Indeed, rare germline genetic variants in TERT and other loci are associated with occurrence of NAFLD-cirrhosis and familial HCC14,15, supporting the hypothesis that rare genetic variants contribute to NAFLD-HCC risk and phenotype variability. However, a systematic evaluation of candidate genes in a large number of affected individuals has not been performed so far.
Taking advantage of next generation whole exome sequencing (WES), the aim of this study was therefore to examine whether variants in candidate genes involved in liver disease and cancer predisposition, that are either pathogenic (that is already linked to a pathological phenotype) or rare and predicted with stringent criteria to alter protein activity (likely pathogenic), are enriched in NAFLD-HCC as compared to healthy individuals and patients with advanced NAFLD. The main outcome was the overall enrichment in pathogenic/likely pathogenic variants, hypothesizing that their identification might be useful in discriminating disease risk and ultimately inform a stratified approach to HCC screening or surveillance in the ever-increasing numbers of patients with NAFLD.
Such an approach enabled us to evaluate a large panel of genes, which would not have been possible using targeted panels, while at the same time restricting the analysis to those genes for which there is already solid evidence of a causal relationship with liver disease/cancer.
Methods
Study cohorts and design
The evaluated cohorts and study flow chart are presented in Fig. 1.
The NAFLD-HCC cohort included 72 Italian patients (discovery cohort) and 70 UK patients (replication cohort), who were enrolled between January 2010 and 2016. All were of Caucasian ancestry.
The diagnosis of HCC was based on the EASL-EORTC clinical practice guidelines for management of hepatocellular carcinoma16. Secondary causes of steatosis were excluded on history, including alcohol abuse (≥30 g/day in M/F) and the use of drugs known to precipitate steatosis. Viral and autoimmune hepatitis, hereditary hemochromatosis, Wilson’s disease, overt alpha-1-antitrypsin deficiency and present or previous infection with HBV (HBsAg) and HCV were ruled out using standard clinical and laboratory evaluation as well as liver biopsy features.
Fifty-nine patients with advanced fibrosis due to NAFLD (histological stage F3–F4 or clinically overt cirrhosis), recruited at the Italian institutions during the same period, were used as controls. Further controls were a local ethnically matched group of comparable sex distribution, including 50 Italian healthy blood donors without clinical or biochemical evidence of liver disease, NAFLD, metabolic abnormalities or alcohol abuse14, and the 404 non-Finnish European (NFE) healthy individuals included in the 1000 Genomes database (http://www.internationalgenome.org), for whom complete exome data were publicly available (including 91 Italian and 107 UK individuals).
The study protocol conformed to the ethical guidelines of the 1975 Declaration of Helsinki, was approved by the Ethical committee of the involved Institutions (Fondazione IRCCS Ca’ Granda Policlinico and University of Newcastle upon Tyne), and was performed according to the recommendations of the hospitals involved. Informed consent was obtained from each patient or responsible guardian.
The clinical features of individuals included in the study are presented in Table 1.
Table 1.
Characteristic | HCC discovery (n = 72) | HCC replication (n = 70) | p value (discovery vs. replication) | Advanced fibrosis (n = 59) | Healthy individuals (n = 50) | p value (HCC vs. no-HCC) |
---|---|---|---|---|---|---|
Age, years | 68 ± 9 | 74 ± 7 | <0.0001 | 59 ± 10 | 49 ± 12 | <0.0001 |
Sex, F | 17 (24%) | 10 (20%) | 0.15 | 21 (36%) | 17 (34%) | 0.048 |
BMI, kg/m2 | 29.8 ± 5.8 (n = 58) | 32.7 ± 8.2 (n = 45) | 0.12 | 31.3 ± 4.9 (n = 44) | 24.6 ± 2.5 (n = 50) | <0.0001 |
Type 2 diabetes, yes | 43 (61%) | 36 (62%) | 0.72 | 33 (56%) | 0 | <0.0001 |
PNPLA3, I148M | | | | | | |
I/I | 18 (25%) | 14 (20%) | 0.2 | 13 (22%) | 28 (56%) | 0.13 |
I/M | 30 (42%) | 32 (46%) | | 26 (44%) | 21 (42%) | |
M/M | 24 (33%) | 24 (34%) | | 20 (34%) | 1 (2%) | |
TM6SF2, E167K | | | | | | |
E/E | 57 (79%) | 53 (76%) | 0.84 | 47 (80%) | 38 (76%) | 0.59 |
E/K | 14 (20%) | 14 (20%) | | 12 (20%) | 12 (24%) | |
K/K | 1 (1%) | 3 (4%) | | 0 | 0 | |
MBOAT7, rs641738 C > T | | | | | | |
C/C | 18 (25%) | 24 (34%) | 0.032 | 20 (34%) | 19 (38%) | 0.32 |
C/T | 31 (43%) | 38 (54%) | | 26 (44%) | 23 (46%) | |
T/T | 23 (32%) | 8 (12%) | | 13 (22%) | 8 (16%) | |
BMI: body mass index; HCC: hepatocellular carcinoma; PNPLA3: patatin-like phospholipase domain-containing protein 3; TM6SF2: transmembrane 6 superfamily member 2; MBOAT7: membrane bound O-acyltransferase domain containing 7; GCKR: glucokinase regulatory protein. Data were compared by univariate generalized linear models.
There were four sequential steps to the study (Fig. 1). The first step consisted in whole exome sequencing, variant analysis, identification and prioritization. The second addressed the possibility of enrichment in already known pathogenic variants in candidate genes in NAFLD-HCC cases vs. controls, and identified the most mutated genes and the diagnostic yield for Mendelian monogenic disorders. The third step encompassed the identification of rare variants predicted to alter protein function in the same candidate genes that might be associated with disease predisposition. Finally, a genetic risk score (GRS) for NAFLD-HCC was developed and assessed for its diagnostic accuracy.
Whole exome sequencing, variants identification, annotation and prioritization
The WES sequencing and analytical pipeline is presented in Supplementary Fig. 1.
Briefly, for the 258 samples included in the EPIDEMIC-NAFLD cohort (Exome sequencing for the Identification of Inherited Variants Involved in HCC development in NAFLD), which were resequenced for this project, DNA was extracted from peripheral blood mononuclear cells and quantified by a Qubit 2.0 analyzer using the Qubit dsDNA BR Assay Kit (Thermo-Fisher Scientific, Waltham, MA, USA). Sample purity was evaluated using a Nanodrop 1000 spectrophotometer (Thermo-Fisher, Waltham, MA, USA) and integrity was assessed by gel electrophoresis.
DNA libraries were enriched for exome sequencing using the SureSelect Human All Exon v5 kit (Agilent, Cernusco sul Naviglio, Milan, Italy). Sequencing was subsequently performed on the HiSeq4000 platform (Illumina). Raw read quality control was performed using FastQC software (Babraham Bioinformatics, Cambridge, UK). Reads were mapped to the human GRCh37 genome using the MEM algorithm of the Burrows-Wheeler Aligner (BWA, version 0.7.10)17. Reads with low-quality alignments and duplicate reads were filtered out using Samtools18 to generate high-quality BAM files. Mapping quality control was performed using Picard-tools (http://broadinstitute.github.io/picard) and Bedtools19. The mean sequencing depth was 73x, and no sample had a mean depth lower than 50x (Supplementary Fig. 2 panels a,b). Sequencing resulted in good target coverage: almost all samples exhibited more than 90% coverage of the target at 20x depth. Importantly, sequencing statistics in terms of input reads, high-quality mapped reads, mean depth and coverage did not vary among the different cohorts (Supplementary Fig. 2 panel c).
Variant calling was performed following GATK best practices20. Briefly, indel local realignment, base quality recalibration and variant calling (HaplotypeCaller algorithm) were performed using GATK (version 3.3.0)21. Joint genotyping of GVCF files and variant filtering by the variant quality score recalibration (VQSR) method were then performed. Variants with a variant quality score log-odds (VQSLOD) above the 99% tranche were considered true positives. To avoid calling somatic variants due to the presence of circulating tumor DNA, variants present in <20% of total reads were discarded. Indel left normalization was performed using BCFtools software22. Variant annotation was performed using both the Variant Effect Predictor (VEP)23 and ANNOVAR24 tools.
Variant filtering was performed using VCFtools25 to exclude variants failing the VQSLOD threshold and variants called in less than 95% of samples. All intronic and synonymous variants according to VEP prediction were excluded from the analyses. Multidimensional scaling analysis of identity-by-state distances was conducted on the EPIDEMIC study samples using the SNPRelate R Bioconductor package26. As shown in Supplementary Fig. 3, in the EPIDEMIC project samples, the first component of variability was explained by the geographic origin of the patients (Italy vs. UK). Whole exome sequencing reads from the 1000 Genomes Project phase 3 (1000 G)27 non-Finnish Europeans (NFE; 404 samples) were processed using the same pipeline described for the EPIDEMIC samples.
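The read-fraction filter used to guard against somatic calls can be illustrated with a minimal sketch. The `AlleleCounts` container and the function names are hypothetical and are not part of the GATK pipeline used in the study; only the 20% threshold comes from the text.

```python
from dataclasses import dataclass

# Threshold stated in the Methods: variants seen in <20% of total reads are discarded.
MIN_ALT_READ_FRACTION = 0.20


@dataclass
class AlleleCounts:
    """Read counts at one variant site in one sample (hypothetical container)."""
    ref_reads: int
    alt_reads: int


def alt_read_fraction(counts: AlleleCounts) -> float:
    """Fraction of reads supporting the alternate allele (0.0 if no reads)."""
    total = counts.ref_reads + counts.alt_reads
    return counts.alt_reads / total if total > 0 else 0.0


def passes_germline_filter(counts: AlleleCounts) -> bool:
    """Keep a call only if the alternate allele reaches the minimum read fraction."""
    return alt_read_fraction(counts) >= MIN_ALT_READ_FRACTION
```

A heterozygous germline variant is expected at roughly half of the reads, so a call supported by only a small minority of reads is more plausibly somatic or artefactual and is dropped.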
Candidate genes selection and classification of variants
Candidate genes were selected according to the literature as of January 2016, among those whose variants were robustly linked with cancer predisposition syndromes28, or mutated in HCC29, or predisposing to hereditary liver diseases30, or involved in predisposition to telomere diseases31, or in iron and lipid metabolism and NAFLD32. The complete list of 181 candidate genes and their classification is presented in the Supplementary File, sheet “Candidate genes”.
For Step 2 (enrichment in rare pathogenic variants in candidate genes and diagnostic rate), variants reported as “pathogenic” or “likely pathogenic” in the ClinVar (https://www.ncbi.nlm.nih.gov) database33, located in candidate genes, and with a minor allele frequency (MAF) <0.05 in 1000 G NFE, ExAC databases and in local healthy controls, were selected.
For Step 3 (enrichment in rare variants predicted to alter protein function, novel likely pathogenic), we used stringent criteria, that is selection of variants determining an alteration of protein sequence (missense, nonsense, splice sites), located in candidate genes, and with a MAF <0.001 in ExAC NFE, MAF <0.005 in the EPIDEMIC project samples, and a CADD Phred >10 (Top 10% of damaging variants)34.
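The Step 3 criteria can be read as a simple conjunction of filters, sketched below. This is an illustrative sketch rather than the authors' pipeline: the `Variant` fields and consequence labels are hypothetical names, and `cadd_phred_from_rank` assumes the usual PHRED-like scaling of CADD scores, under which a score of 10 corresponds to the top 10% of ranked variants and 20 to the top 1%.

```python
import math
from dataclasses import dataclass

# Consequence labels are hypothetical; the text lists missense, nonsense and splice-site variants.
PROTEIN_ALTERING = {"missense", "nonsense", "splice_site"}
MAX_MAF_EXAC_NFE = 0.001   # MAF threshold in ExAC NFE (from the text)
MAX_MAF_COHORT = 0.005     # MAF threshold in the EPIDEMIC samples (from the text)
MIN_CADD_PHRED = 10.0      # CADD Phred threshold (from the text)


def cadd_phred_from_rank(rank: int, total: int) -> float:
    """PHRED-like scaling of a deleteriousness rank (assumed: -10 * log10(rank / total))."""
    return -10.0 * math.log10(rank / total)


@dataclass
class Variant:
    gene: str
    consequence: str
    maf_exac_nfe: float
    maf_cohort: float
    cadd_phred: float


def is_likely_pathogenic(v: Variant, candidate_genes: set) -> bool:
    """Apply all Step 3 criteria; every condition must hold."""
    return (
        v.gene in candidate_genes
        and v.consequence in PROTEIN_ALTERING
        and v.maf_exac_nfe < MAX_MAF_EXAC_NFE
        and v.maf_cohort < MAX_MAF_COHORT
        and v.cadd_phred > MIN_CADD_PHRED
    )
```

Under this scaling, a variant ranked in the top tenth of all scored variants sits exactly at the threshold of 10, which is why the criterion is described as selecting the top 10% of damaging variants.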
Statistical analysis
For descriptive statistics, continuous variables are shown as mean and standard deviation or median and interquartile range for highly skewed biological variables. Variables with skewed distributions were logarithmically or inverse normally transformed before analyses. All genetic analyses were calculated by using an additive model.
Fisher’s exact test and multivariate or univariate generalized linear models were used when appropriate. Models were adjusted for clinically relevant covariates, as specified in the Results section. Gene enrichment in rare variants was assessed using the cohort allelic sum test (CAST) approach35,36. The association between the frequency of variants in genes significantly enriched in NAFLD-HCC vs. healthy controls was next validated against the cumulative frequency observed in NFE individuals included in the ExAC project (n = 33370) by Fisher’s exact test, adjusted for the number of comparisons. A NAFLD-HCC risk score was developed as previously described11,37. The GRS for HCC was calculated by regressing the number of pathogenic/likely pathogenic variants, collapsed at the level of single candidate genes, and the common genetic risk variants in PNPLA3, TM6SF2 and MBOAT7 against the presence of HCC. To internally validate the GRS, β coefficients were adjusted using the jackknife resampling method. The diagnostic accuracy of different models for NAFLD-HCC prediction was compared by the two-sided Venkatraman test38. The population attributable risk (PAR) of the GRS for NAFLD-HCC was estimated as previously described for case-control studies39.
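As an illustration of the CAST approach, a gene-level comparison can be sketched as follows: each individual is collapsed to carrier or non-carrier of at least one qualifying variant in the gene, and carrier counts in cases and controls are compared with Fisher's exact test. This is a minimal sketch, not the authors' R code; the function names are hypothetical, the p value is one-sided (enrichment in cases) by assumption, and the odds ratio is the simple sample cross-product ratio.

```python
from math import comb


def count_carriers(variants_per_person):
    """CAST collapsing: a person counts once if they carry >= 1 qualifying variant."""
    return sum(1 for n_variants in variants_per_person if n_variants > 0)


def cast_fisher(case_carriers, n_cases, control_carriers, n_controls):
    """Return (sample odds ratio, one-sided Fisher exact p value for enrichment in cases)."""
    a, b = case_carriers, n_cases - case_carriers
    c, d = control_carriers, n_controls - control_carriers
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")

    total = n_cases + n_controls
    carriers = a + c
    max_in_cases = min(carriers, n_cases)
    # Hypergeometric upper tail: probability of observing >= a carriers among cases.
    tail = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(a, max_in_cases + 1)
    )
    p_value = tail / comb(total, n_cases)
    return odds_ratio, p_value
```

For example, `cast_fisher(9, 72, 9, 513)`, using the RTEL1 carrier counts listed for the discovery cohort in Table 2, gives a sample odds ratio of 8.0, close to the reported 7.9; small differences are expected if the reported estimate was obtained with a different estimator.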
GRS gene functions were explored by pathway enrichment analysis exploiting Ingenuity Pathway Analysis software (Qiagen, Valencia, USA) with default parameters.
Protein features were obtained from the UniProt database (www.uniprot.org), and coding variants in the genes of interest were mapped onto the reported protein domains and regions. Variant enrichment in protein domains was evaluated by the CAST burden test approach. Lollipop diagrams were generated using Mutation Mapper software (http://www.cbioportal.org/mutation_mapper.jsp).
Statistical analyses were carried out using R statistical analysis software version 3.3.2 (http://www.R-project.org/). P values < 0.05 were considered statistically significant.
Results
Pathogenic variants in candidate genes are enriched in NAFLD-HCC
We first examined pathogenic variants in HCC cases and controls. The list of pathogenic variants identified in candidate genes is presented in the Supplementary File, sheet “Pathogenic”. We identified 68 variants previously linked to pathological phenotypes, which met the inclusion criteria. There was an enrichment in the number of pathogenic variants per individual in candidate genes in NAFLD-HCC patients, as compared to patients with advanced fibrosis, healthy individuals and the 1000 G cohort (Fig. 2a and Supplementary Table 2; OR 1.4, 95% c.i. 1.1-infinite, p = 0.024).
A comutation plot reporting the genes affected by pathogenic variants in the newly characterized EPIDEMIC cohorts (excluding 1000 G), as well as common variants previously associated with NAFLD-HCC, is shown in Fig. 3. Among the single genes, we found a significant enrichment of variants in the APOB gene predisposing to familial hypobetalipoproteinemia (two variants in cases and none in controls, p = 0.047). Furthermore, we confirmed a strong association with the common PNPLA3 I148M variant (OR 2.49, 95% c.i. 1.89–3.30), and detected an association with the TM6SF2 E167K variant (OR 1.72, 95% c.i. 1.10–1.24) regulating hepatic lipid compartmentalization. The rs641738 MBOAT7 variant was associated with NAFLD-HCC in the discovery cohort (OR 1.49, 95% c.i. 1.03–2.15, p = 0.031), but not in the validation cohort (OR 0.81, 95% c.i. 0.56–1.19, p = NS).
After resequencing of candidate genes, according to ClinVar, 19/142 (13.4%) NAFLD-HCC patients vs. 3/59 (5.1%) NAFLD with advanced fibrosis, 3/50 (6.0%) local controls, and 28/404 (6.9%) NFE individuals from 1000 G had a genetic picture consistent with a diagnosis of Mendelian disease predisposing to advanced liver disease or cancer (HCC: 19/142, 13.4%, vs. no-HCC: 34/513, 4.7%; OR 3.15, 95% c.i. 1.57–5.93, p = 0.0005).
Role of rare variants likely determining an alteration in protein activity
The list of rare variants in candidate genes predicted to alter protein activity (likely pathogenic variants) is provided in the Supplementary File, sheet “likely pathogenic”. We observed an enrichment in rare likely pathogenic variants in NAFLD-HCC cases vs. controls (Fig. 2b and Supplementary Table 2; OR 3.5, 95% c.i. 2.2-inf, p = 1.9*10−6). Genes significantly enriched in likely pathogenic variants in the NAFLD-HCC cohorts are presented in Table 2. In the overall series or national cohorts, we found a significant enrichment in variants in telomerase complex genes (RTEL1, TERF2) and in DNA and oxidative damage response genes (RB1), and we also highlighted genes involved in the regulation of cell growth and proliferation (STK11, TSC1, TSC2, NF2, SMAD4). Genes involved in the regulation of hepatic lipid metabolism, including APOB and MBOAT7, were also enriched in rare likely pathogenic variants, and we detected an enrichment in variants of SQSTM1. Most of the associations remained consistent when the cumulative frequency of variants in the candidate genes was compared to that observed in NFE individuals included in the ExAC database. A comutation plot reporting genes significantly enriched in likely pathogenic variants in the EPIDEMIC cohorts is shown in Supplementary Fig. 4.
Table 2.
Gene | NAFLD-HCC (n = 72) | Controls (n = 513) | OR | (95% c.i.) | p value* | % carriers NAFLD-HCC | ExAC | OR (95% c.i.) | Adjusted p value** |
---|---|---|---|---|---|---|---|---|---|
Discovery cohort (Italy) | | | | | | Replication vs. ExAC NFE | | | |
RTEL1 | 9 | 9 | 7.9 | 2.7–23.5 | 7.8*10–5 | 12.5 | 2.1 | 5.9 (2.6–11.8) | 0.0003 |
SQSTM1 | 3 | 1 | 22.0 | 1.7–1161 | 0.0065 | 4.2 | 0.7 | 5.8 (1.2–18.0) | 0.050 |
TSC2 | 6 | 10 | 4.6 | 1.3–14.4 | 0.0083 | 8.3 | 3.1 | 2.7 (0.9–6.1) | 0.064 |
APOB | 11 | 31 | 2.8 | 1.2–6.1 | 0.012 | 15.3 | 4.6 | 3.3 (1.6–6.2) | 0.005 |
TERF2 | 3 | 3 | 7.4 | 1–55.6 | 0.027 | 4.2 | 0.2 | 17.4 (3.4–55.0) | 0.004 |
SMAD4 | 2 | 1 | 14.5 | 1–859 | 0.041 | 2.8 | 0.4 | 7.4 (0.9–28.3) | 0.064 |
Validation cohort (UK) | | | | | | | | | |
STK11 | 2 | 1 | 14.9 | 1–884 | 0.039 | 2.9 | 0.3 | 8.4 (1.0–32) | 0.051 |
MBOAT7 | 2 | 1 | 14.9 | 1–884 | 0.039 | 2.9 | 0.6 | 4.5 (0.5–17.1) | 0.076 |
NF2 | 3 | 4 | 5.7 | 1–34.2 | 0.041 | 4.3 | 0.3 | 12.1 (2.4–37.2) | 0.009 |
RB1 | 4 | 8 | 3.8 | 1–14.7 | 0.0201 | 5.7 | 1.0 | 5.9 (1.6–16) | 0.018 |
Overall | | | | | | | | | |
RTEL1 | 10 | 9 | 4.2 | 1.5–12 | 0.0026 | 7.0 | 2.1 | 3.3 (1.5–6.3) | 0.006 |
RB1 | 7 | 8 | 3.3 | 1–10.5 | 0.026 | 4.9 | 1.0 | 5.1 (2.0–11) | 0.003 |
TSC1 | 6 | 6 | 3.7 | 1–14.1 | 0.027 | 4.2 | 1.7 | 2.5 (0.9–5.6) | 0.077 |
SMAD4 | 3 | 1 | 11.0 | 1–579 | 0.034 | 2.1 | 0.4 | 5.6 (1.1–17.2) | 0.054 |
SQSTM1 | 3 | 1 | 11.0 | 1–579 | 0.034 | 2.1 | 0.7 | 3.0 (0.6–8.9) | 0.086 |
Significant associations were replicated against NFE individuals included in the ExAC project (n = 33370).
*Evaluated by Burden test. **Evaluated by Fisher’s test adjusted for multiple comparisons. RTEL1: regulator of telomere elongation helicase 1, SQSTM1: sequestosome-1 (p62), TSC2: tuberous sclerosis complex 2, APOB: apolipoprotein B, TERF2: telomere repeat binding factor 2, SMAD4: SMAD (suppressor of mothers against decapentaplegic) family member 4, STK11: serine/threonine kinase 11 (LKB1), MBOAT7: membrane bound O-acyltransferase domain containing 7, NF2: neurofibromin 2, RB1: retinoblastoma 1. The control group comprised n = 404 NFE individuals from the 1000 G project, n = 59 NAFLD patients with advanced fibrosis/cirrhosis, and n = 50 local healthy individuals.
In order to check the efficacy of the criteria adopted (frequency and predicted impact) for the identification of likely pathogenic variants, we assessed whether variants in APOB (pathogenic/likely pathogenic), which are associated with a clear phenotype that can be assessed by common biochemical tests, influence circulating lipid levels. Results are shown in Supplementary Table 3. Supporting the validity of our selection algorithm, in patients for whom data were available carriage of APOB variants was associated with 46% higher HDL cholesterol and 44% lower triglycerides (p = 0.008 and p = 0.001, respectively), consistent with a hypobetalipoproteinemia phenotype.
All in all, these data suggest that rare genetic variants with a very high likelihood of impacting protein activity are associated with the predisposition to develop NAFLD-HCC.
Genetic risk score development and validation
To examine whether evaluation of common and rare germline genetic variants may be clinically helpful in the stratification of NAFLD-HCC risk, we developed a weighted GRS for this condition and tested its diagnostic accuracy. The GRS coefficients are shown in Supplementary Table 4. In the total cohort of 655 individuals, the GRS was associated with HCC risk (OR 435, 95% c.i. 111–1903; p < 2*10−16; OR 4.96, 95% c.i. 3.29–7.55; p = 5.1*10−16 for high vs. low GRS). The GRS had an AUROC of 0.74 (95% c.i. 0.69–0.79) for predicting NAFLD-HCC in the total cohort, as compared to 0.79 (95% c.i. 0.73–0.85) and 0.69 (95% c.i. 0.62–0.75) in the discovery and validation cohorts, respectively (Fig. 4a). The diagnostic thresholds, sensitivity and specificity values are reported in Supplementary Table 5. The best GRS threshold had a 61% sensitivity and 72% specificity to detect NAFLD-HCC.
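A weighted score of this kind, and the sensitivity and specificity obtained at a given threshold, can be sketched as follows. The feature names and β values in the usage note are purely hypothetical placeholders; the actual coefficients are those in Supplementary Table 4, and the sketch assumes that both cases and non-cases are present in the evaluated sample.

```python
def genetic_risk_score(burden: dict, betas: dict) -> float:
    """Weighted sum of per-feature variant counts; features absent from `burden` count as 0."""
    return sum(beta * burden.get(feature, 0) for feature, beta in betas.items())


def sensitivity_specificity(scores, has_hcc, threshold):
    """Classify score > threshold as high risk and compare with the true HCC status."""
    pairs = list(zip(scores, has_hcc))
    true_pos = sum(1 for s, y in pairs if y and s > threshold)
    false_neg = sum(1 for s, y in pairs if y and s <= threshold)
    true_neg = sum(1 for s, y in pairs if not y and s <= threshold)
    false_pos = sum(1 for s, y in pairs if not y and s > threshold)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity
```

With hypothetical weights such as `{"PNPLA3_I148M": 0.3, "APOB_rare": 0.8}`, an individual homozygous for the common variant and carrying one rare APOB variant would score 0.3 × 2 + 0.8 × 1 = 1.4; sweeping the threshold over the observed scores yields the sensitivity and specificity pairs from which an ROC curve is drawn.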
The GRS improved the ability to discriminate NAFLD-HCC risk as compared to evaluation of the PNPLA3 I148M variant alone and of a combination of the PNPLA3 I148M and TM6SF2 E167K variants (p < 2*10−16; Fig. 4b and Supplementary Table 5).
In the EPIDEMIC cohort with complete data (n = 251), the GRS was associated with NAFLD-HCC (OR 4.96, 95% c.i. 3.29–7.55; p = 5.1*10−16), independently of classical risk factors (OR 2.28, 95% c.i. 1.06–4.97, p = 0.04; Table 3). The addition of the GRS to a model based on acquired risk factors improved the ability to discriminate NAFLD-HCC (p = 0.03 for comparison of AUC; shown in Fig. 4c). This was mainly due to a slight increase in sensitivity (83% vs. 78%, Supplementary Table 6). In the EPIDEMIC cohort, the clinical risk score misclassified 46 patients (19%). Addition of the GRS to risk prediction led to a net improvement in classification in 8 individuals (3% of the overall cohort, 17% of misclassified; p = 0.004).
Table 3.
Risk factors | OR | 95% c.i. | p value* |
---|---|---|---|
GRS, range | 59.4 | 5.3–952 | 0.002 |
GRS, high | 2.28 | 1.06–4.97 | 0.04 |
Sex, M | 1.56 | 0.64–3.82 | 0.33 |
Age, years | 1.17 | 1.12–1.24 | 4.4*10–10 |
Type 2 diabetes, yes | 1.58 | 0.67–3.67 | 0.29 |
Advanced fibrosis, yes | 1.60 | 0.63–4.00 | 0.31 |
GRS: Genetic risk score; Range: OR for highest vs. lowest GRS value in the cohort; High: OR for GRS > 0.22 (median value). *Evaluated by a generalized linear model considering the covariates shown in the table.
The PAR of the overall panel of genetic variants considered in the GRS for the risk of NAFLD-HCC is presented in Supplementary Table 7. In the overall cohort, the genetic variants considered accounted for 48% (95% c.i. 38–59%) of NAFLD-HCC phenotype variability.
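For orientation, one commonly used estimator of the population attributable risk in case-control data multiplies the proportion of exposed cases by (OR − 1)/OR. The sketch below implements that formula as an assumption; the text does not state which estimator was applied here, and the function name is hypothetical.

```python
def population_attributable_risk(exposed_fraction_in_cases: float, odds_ratio: float) -> float:
    """Case-control PAR estimate (assumed formula): p_cases_exposed * (OR - 1) / OR."""
    if not 0.0 <= exposed_fraction_in_cases <= 1.0:
        raise ValueError("exposed_fraction_in_cases must lie between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    return exposed_fraction_in_cases * (odds_ratio - 1.0) / odds_ratio
```

For instance, if half of the cases carried a risk factor with an odds ratio of 2, this estimator would attribute 25% of cases to it.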
Characterization of specific variants
The localization of likely pathogenic variants in selected genes (RTEL1, APOB, SQSTM1) in individuals with (n = 142) and without (n = 513) HCC is shown in Supplementary Fig. 5a–c. For APOB, the rate of truncating variants, which carry a higher likelihood of severe impairment in protein function, was higher in HCC vs. non-HCC individuals (p = 0.009). Furthermore, variants in SQSTM1 tended to localize in a protein region reported to interact with NTRK1 (p = NS).
The summary output of the Ingenuity pathway analysis of genes included in the GRS (which were mutated in NAFLD-HCC cases) is presented in Supplementary Table 8. As expected, there was a significant enrichment in pathways related to cancer, hepatocellular carcinoma, liver disease, lipid metabolism, and hereditary disorders. Interestingly, there was an over-representation of genes involved in FXR/RXR activation and in the regulation of the cell cycle at the G1/S checkpoint.
Discussion
NAFLD-HCC is an emerging complication of metabolic conditions such as obesity and T2D2. Because the population at risk is very large, classic screening strategies are presently unfeasible. Therefore, novel noninvasive biomarkers are urgently needed to improve disease risk stratification. Indeed, although carriage of the common PNPLA3 I148M variant is strongly associated with NAFLD-HCC development, taken by itself it is not sufficiently accurate to stratify the risk of this condition12.
Here we show that in patients with NAFLD-HCC, pathogenic and likely pathogenic variants in genes linked to liver disease and cancer predisposition are enriched as compared to healthy individuals. Furthermore, we have replicated this result in two independent cohorts.
Although further validation is required before target gene resequencing can be recommended in clinical practice, these findings have potential clinical implications. In the present cohort, resequencing of the candidate genes panel led to the detection of likely predisposing genetic conditions in a large fraction of patients presenting with NAFLD-HCC. This may also aid in the identification of family members for whom screening would be cost effective, or specific preventive treatments may be considered. Furthermore, evaluation of a comprehensive GRS, which takes into consideration rare variants, in individuals with NAFLD may allow a more accurate HCC risk stratification and the implementation of targeted surveillance. Indeed, the comprehensive GRS showed superior diagnostic accuracy as compared to the evaluation of common genetic risk factors, including the PNPLA3 I148M variant alone12, or a combination of PNPLA3 I148M and TM6SF2 E167K variants40. Furthermore, the GRS improved patient stratification, when considered together with classical risk factors for NAFLD-HCC. The clinical utility of GRS assessment should however be tested in familial and prospective studies evaluating patients with NAFLD and other liver diseases.
Furthermore, the present findings may also have pathophysiological implications worthy of exploration. Firstly, although PNPLA3 I148M is the major genetic determinant of NAFLD-HCC, no other rare loss-of-function mutations were identified in this gene among the affected patients. This would support the notion that the I148M variant acquires new functions able to trigger hepatic fat accumulation, alteration of retinol metabolism, and carcinogenesis41–43. Secondly, consistent with previous data11, other variants favoring hepatocellular fat retention were associated with NAFLD-HCC, including common and rare variants in the TM6SF2 and MBOAT7 genes. Thirdly, variants in APOB, responsible for hypobetalipoproteinemia, were collectively observed in a high proportion of Italian patients (15%), and there was a significant enrichment in pathogenic and truncating mutations in this gene in the overall cohort of NAFLD-HCC patients. APOB genetic variants leading to the synthesis of a dysfunctional ApoB100 protein, and to a consequent impairment in the export of lipids from hepatocytes within very low-density lipoproteins, are responsible for the development of severe hepatic steatosis (hypobetalipoproteinemia, an autosomal dominant disease). At the same time, some APOB variants that alter the first portion of the protein also result in altered activity of ApoB48, the protein isoform expressed by enterocytes. This results in retention of chylomicrons, malabsorption of fat and liposoluble vitamins (retinol - vitamin A, vitamin E and vitamin D), known to play a protective role in liver disease progression, and possibly in the alteration of the intestinal barrier. Most importantly, individuals carrying APOB mutations had a circulating lipid profile consistent with hypobetalipoproteinemia, providing functional validation of the pathogenicity of the genetic mutations identified.
Notably, in line with a causal role of hepatocellular lipid retention in promoting NAFLD-HCC, somatic mutations in APOB also frequently occur during hepatic carcinogenesis44. The mechanism connecting APOB mutations with carcinogenesis is still not completely understood. Induction of hepatocellular lipid accumulation, oxidative stress, and the loss of a possible tumor suppressive activity of APOB are some of the hypotheses that have been raised45,46. The identification of APOB mutations in subjects with NAFLD-HCC would therefore allow the diagnosis of familial hypobetalipoproteinemia, which in these cases is mostly unrecognized, in first-degree relatives, making it possible to establish adequate HCC surveillance.
An additional finding was the novel association between variants in SQSTM1 and NAFLD-HCC. SQSTM1 encodes p62, a component of Mallory-Denk bodies and hyaline granules47. Protein p62 aggregates accumulate in the cytoplasm of damaged liver cells in NASH and HCC47, and may promote hepatocyte transformation through the activation of antioxidant and mTOR pathways48,49. In keeping with this, we also identified variants in genes regulating cell growth via the insulin signaling and mTOR pathways, and, in line with previous findings from our group14,31, in the telomere regulation machinery. However, as the present study was not designed for this aim, enrichment in variants in specific genes will have to be confirmed in larger cohorts.
This study has some limitations. First, the design was cross-sectional with retrospective data collection, so that the GRS for NAFLD-HCC will need to be validated in future prospective studies including individuals with NAFLD and other liver diseases at high baseline risk. However, since HCC remains a relatively rare complication of NAFLD (which affects almost one in three individuals in the general population), the risk of misclassifying healthy controls was minimal, and would rather have led to a reduction of the study power. Furthermore, the sample size was relatively limited, and therefore we focused our attention on pathogenic variants in candidate genes, which are already known to cause disease. As we considered healthy individuals from the general population as controls, results are potentially applicable to NAFLD-HCC genetic screening at population level without prior knowledge of liver disease severity status. If proven cost-effective and ethically acceptable, this approach may assist in stratifying the risk of liver disease and HCC, as well as of other chronic degenerative diseases. Additional studies are required to discover new variants predisposing to NAFLD-HCC, which were not examined in this study. Moreover, we could not evaluate a control group with advanced fibrosis due to NAFLD for the UK NAFLD-HCC validation cohort, in which a different pattern of mutations was observed as compared to the Italian cohort. Finally, findings may only be applicable to individuals of European descent.
In conclusion, rare pathogenic variants in candidate genes involved in the predisposition to liver disease or cancer are associated with an increased risk of developing NAFLD-HCC.
Acknowledgements
The study was supported by the Fondazione IRCCS Ca’ Granda – Istituto Nazionale di Genetica Molecolare (INGM) Molecular Medicine Grant 2014–2016, and the myFIRST AIRC grant n.16888 for the EPIDEMIC-NAFLD project, Ricerca Finalizzata 2016 Ministero della Salute - RF-2016-02364358 (LV), Ricerca Corrente Fondazione IRCCS Ca’ Granda, Associazione Malattie Metaboliche del Fegato ONLUS (non-profit organization) (LV and SF), Associazione Italiana Studio Fegato (AISF: MM and LV). HR and the recruitment of patients to the Newcastle Academic Health Partners Bioresource were supported by the European Community’s Seventh Framework Programme (FP7/2007–2013) under grant agreement HEALTH-F2-2009-241762 for the project FLIP.
Author Contributions
S.P. study design, data collection and analysis, important intellectual contribution and manuscript drafting. G.B. data analysis, manuscript drafting and important intellectual contribution. A.P. data analysis. P.D. data collection. B.D. data collection. M.V.M.C. data collection. M.M. data collection. A.L.F. data collection. R.R. data collection. S.P. data collection. A.G. data collection. L.M. data collection. G.S. data collection and important intellectual contribution. E.B. data collection and important intellectual contribution. S.F. study oversight. A.A. data collection. R.D.A. data collection. C.X. data analysis and important intellectual contribution. S.R. important intellectual contribution. R.D.F. study oversight. H.R. data collection, funding, study oversight and important intellectual contribution. L.V.C.V. study design, funding, oversight, data analysis, data collection and interpretation, manuscript drafting, and important intellectual contribution.
Competing Interests
The authors declare that they do not have any conflict of interest or competing interests relevant to the present manuscript. L.V. received speaking fees from MSD, Gilead, AlfaSigma, and AbbVie; participated in consulting or advisory boards for Gilead, Pfizer, Astra Zeneca, and Novo Nordisk; and received a research grant from Gilead.
Footnotes
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Serena Pelusi and Guido Baselli contributed equally.
Supplementary information
Supplementary information accompanies this paper at 10.1038/s41598-019-39998-2.
References
- 1. Younossi Z, Henry L. Contribution of Alcoholic and Nonalcoholic Fatty Liver Disease to the Burden of Liver-Related Morbidity and Mortality. Gastroenterology. 2016;150:1778–1785. doi: 10.1053/j.gastro.2016.03.005.
- 2. Dongiovanni P, Romeo S, Valenti L. Hepatocellular carcinoma in nonalcoholic fatty liver: role of environmental and genetic factors. World J Gastroenterol. 2014;20:12945–12955. doi: 10.3748/wjg.v20.i36.12945.
- 3. Baffy G, Brunt EM, Caldwell SH. Hepatocellular carcinoma in non-alcoholic fatty liver disease: An emerging menace. J Hepatol. 2012;56:1384–1391. doi: 10.1016/j.jhep.2011.10.027.
- 4. Dyson J, et al. Hepatocellular cancer: The impact of obesity, type 2 diabetes and a multidisciplinary team. J Hepatol. 2014;60:110–117. doi: 10.1016/j.jhep.2013.08.011.
- 5. Piscaglia F, et al. Clinical patterns of hepatocellular carcinoma in nonalcoholic fatty liver disease: A multicenter prospective study. Hepatology. 2016;63:827–838. doi: 10.1002/hep.28368.
- 6. Younossi ZM, et al. Association of nonalcoholic fatty liver disease (NAFLD) with hepatocellular carcinoma (HCC) in the United States from 2004 to 2009. Hepatology. 2015;62:1723–1730. doi: 10.1002/hep.28123.
- 7. Turati F, et al. Family history of liver cancer and hepatocellular carcinoma. Hepatology. 2012;55:1416–1425. doi: 10.1002/hep.24794.
- 8. Dongiovanni P, Romeo S, Valenti L. Genetic Factors in the Pathogenesis of Nonalcoholic Fatty Liver and Steatohepatitis. BioMed research international. 2015;2015:460190. doi: 10.1155/2015/460190.
- 9. Caussy C, et al. Nonalcoholic fatty liver disease with cirrhosis increases familial risk for advanced fibrosis. J Clin Invest. 2017;127:2697–2704. doi: 10.1172/JCI93465.
- 10. Liu YL, et al. Carriage of the PNPLA3 rs738409 C > G polymorphism confers an increased risk of non-alcoholic fatty liver disease associated hepatocellular carcinoma. J Hepatol. 2014;61:75–81. doi: 10.1016/j.jhep.2014.02.030.
- 11. Donati B, et al. MBOAT7 rs641738 variant and hepatocellular carcinoma in non-cirrhotic individuals. Sci Rep. 2017;7:4492. doi: 10.1038/s41598-017-04991-0.
- 12. Anstee QM, Liu YL, Day CP, Reeves HL. Reply to: HCC and liver disease risk in homozygous PNPLA3 p.I148M carriers approach monogenic inheritance. J Hepatol. 2015;62:982–983. doi: 10.1016/j.jhep.2014.12.028.
- 13. European Association for the Study of the Liver (EASL), European Association for the Study of Diabetes (EASD), European Association for the Study of Obesity (EASO). EASL-EASD-EASO Clinical Practice Guidelines for the management of non-alcoholic fatty liver disease. J Hepatol. 2016;64:1388–1402.
- 14. Donati B, et al. Telomerase reverse transcriptase germline mutations and hepatocellular carcinoma in patients with nonalcoholic fatty liver disease. Cancer Med. 2017;6:1930–1940. doi: 10.1002/cam4.1078.
- 15. Caruso S, et al. Germline and somatic DICER1 mutations in familial and sporadic liver tumors. J Hepatol. 2017;66:734–742. doi: 10.1016/j.jhep.2016.12.010.
- 16. EASL-EORTC clinical practice guidelines: management of hepatocellular carcinoma. J Hepatol. 2012;56:908–943.
- 17. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589–595. doi: 10.1093/bioinformatics/btp698.
- 18. Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 19. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 20. Van der Auwera GA, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;43:11.10.1–11.10.33. doi: 10.1002/0471250953.bi1110s43.
- 21. McKenna A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
- 22. Danecek P, McCarthy SA. BCFtools/csq: haplotype-aware variant consequences. Bioinformatics. 2017;33:2037–2039. doi: 10.1093/bioinformatics/btx100.
- 23. McLaren W, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17:122. doi: 10.1186/s13059-016-0974-4.
- 24. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164. doi: 10.1093/nar/gkq603.
- 25. Danecek P, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
- 26. Zheng X, et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28:3326–3328. doi: 10.1093/bioinformatics/bts606.
- 27. 1000 Genomes Project Consortium, et al. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
- 28. Zhang J, et al. Germline Mutations in Predisposition Genes in Pediatric Cancer. N Engl J Med. 2015;373:2336–2346. doi: 10.1056/NEJMoa1508054.
- 29. Zucman-Rossi J, Villanueva A, Nault JC, Llovet JM. Genetic Landscape and Biomarkers of Hepatocellular Carcinoma. Gastroenterology. 2015;149:1226–1239 e1224. doi: 10.1053/j.gastro.2015.05.061.
- 30. Scorza M, et al. Genetic diseases that predispose to early liver cirrhosis. Int J Hepatol. 2014;2014:713754. doi: 10.1155/2014/713754.
- 31. Donati B, Valenti L. Telomeres, NAFLD and Chronic Liver Disease. International Journal of Molecular Sciences. 2016;17.
- 32. Dongiovanni P, Valenti L. Genetics of nonalcoholic fatty liver disease. Metabolism: clinical and experimental. 2016;65:1026–1037. doi: 10.1016/j.metabol.2015.08.018.
- 33. Rehm HL, et al. ClinGen–the Clinical Genome Resource. N Engl J Med. 2015;372:2235–2242. doi: 10.1056/NEJMsr1406261.
- 34. Kircher M, et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46:310–315. doi: 10.1038/ng.2892.
- 35. Cohen JC, et al. Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science. 2004;305:869–872. doi: 10.1126/science.1099870.
- 36. Morgenthaler S, Thilly WG. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res. 2007;615:28–56. doi: 10.1016/j.mrfmmm.2006.09.003.
- 37. Angulo P, et al. The NAFLD fibrosis score: a noninvasive system that identifies liver fibrosis in patients with NAFLD. Hepatology. 2007;45:846–854. doi: 10.1002/hep.21496.
- 38. Venkatraman ES, Begg CB. A distribution-free procedure for comparing receiver operating characteristics curves from a paired experiment. Biometrika. 1996;83:835–848. doi: 10.1093/biomet/83.4.835.
- 39. Bruzzi P, Green SB, Byar DP, Brinton LA, Schairer C. Estimating the population attributable risk for multiple risk factors using case-control data. Am J Epidemiol. 1985;122:904–914. doi: 10.1093/oxfordjournals.aje.a114174.
- 40. Stickel F, et al. Genetic variants in PNPLA3 and TM6SF2 predispose to the development of hepatocellular carcinoma in individuals with alcohol-related cirrhosis. Am J Gastroenterol. 2018.
- 41. Donati B, et al. The rs2294918 E434K Variant Modulates Patatin-Like Phospholipase Domain-Containing 3 Expression and Liver Damage. Hepatology. 2016;63:787–798. doi: 10.1002/hep.28370.
- 42. Valenti L, Dongiovanni P. Mutant PNPLA3 I148M protein as pharmacological target for liver disease. Hepatology. 2017;66:1026–1028. doi: 10.1002/hep.29298.
- 43. Pirazzi C, et al. PNPLA3 has retinyl-palmitate lipase activity in human hepatic stellate cells. Hum Mol Genet. 2014;23:4077–4085. doi: 10.1093/hmg/ddu121.
- 44. Cancer Genome Atlas Research Network. Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma. Cell. 2017;169:1327–1341 e1323.
- 45. Valenti L, Romeo S. Editorial: new insights into the relationship between the intestine and non-alcoholic fatty liver-is “fatty gut” involved in disease progression? Aliment Pharmacol Ther. 2017;46:377–378. doi: 10.1111/apt.14154.
- 46. Lee G, et al. Clinical significance of APOB inactivation in hepatocellular carcinoma. Exp Mol Med. 2018;50:147. doi: 10.1038/s12276-018-0174-2.
- 47. Stumptner C, Fuchsbichler A, Zatloukal K, Denk H. In vitro production of Mallory bodies and intracellular hyaline bodies: the central role of sequestosome 1/p62. Hepatology. 2007;46:851–860. doi: 10.1002/hep.21744.
- 48. Inami Y, et al. Persistent activation of Nrf2 through p62 in hepatocellular carcinoma cells. J Cell Biol. 2011;193:275–284.
- 49. Umemura A, et al. p62, Upregulated during Preneoplasia, Induces Hepatocellular Carcinogenesis by Maintaining Survival of Stressed HCC-Initiating Cells. Cancer Cell. 2016;29:935–948. doi: 10.1016/j.ccell.2016.04.006.