Summary
Large-scale genetic association studies have identified multiple susceptibility loci for nasopharyngeal carcinoma (NPC), but the underlying biological mechanisms remain to be explored. To gain insights into the genetic etiology of NPC, we conducted a follow-up study encompassing 6,907 cases and 10,472 controls and identified two additional NPC susceptibility loci, 9q22.33 (rs1867277; OR = 0.74, 95% CI = 0.68–0.81, p = 3.08 × 10⁻¹¹) and 17q12 (rs226241; OR = 1.42, 95% CI = 1.26–1.60, p = 1.62 × 10⁻⁸). These two loci, together with two previously reported genome-wide significant loci, 5p15.33 and 9p21.3, were investigated by high-throughput sequencing-based profiling of chromatin accessibility, histone modifications, and promoter contacts (promoter capture Hi-C [PCHi-C]). Using luciferase reporter assays and CRISPR interference (CRISPRi) to validate the functional profiling, we identified PHF2 at locus 9q22.33 as a susceptibility gene. PHF2 encodes a histone demethylase and acts as a tumor suppressor. The risk alleles of the functional SNPs reduced the expression of the target gene PHF2 by inhibiting the enhancer activity of its long-range (4.3 Mb) cis-regulatory element, and reduced PHF2 expression promoted proliferation of NPC cells. In addition, we identified CDKN2B-AS1 as a susceptibility gene at locus 9p21.3: the NPC risk allele of the functional SNP rs2069418 increased enhancer activity and thereby promoted CDKN2B-AS1 expression, and overexpression of CDKN2B-AS1 facilitated proliferation of NPC cells. In summary, we identified functional SNPs and NPC susceptibility genes that help explain the genetic association signals and uncover the genetic etiology of NPC development.
Keywords: genome-wide association study, nasopharyngeal carcinoma, cis-regulatory elements, functional profiling
Graphical abstract

We performed a large-scale genetic association study of nasopharyngeal carcinoma (NPC), followed by high-throughput profiling of cis-regulatory elements and experimental validation. We identified two NPC susceptibility genes, PHF2 at locus 9q22.33 and CDKN2B-AS1 at locus 9p21.3, findings that help uncover the genetic etiology of NPC development.
Introduction
Nasopharyngeal carcinoma (NPC) is an aggressive malignancy that originates from the nasopharyngeal mucosa and is associated with Epstein-Barr virus (EBV) infection.1 The incidence of NPC is highest in East and Southeast Asia, which account for 77% of new NPC cases worldwide.2,3,4 Familial aggregation5,6 and the high estimated heritability of NPC7 suggest an important role for genetic factors in NPC development. We and others have discovered many NPC susceptibility loci in genome-wide association studies (GWASs)8,9,10,11,12,13,14,15 and whole-exome sequencing studies (WESs).16,17,18,19 Whereas WESs identified rare variants in protein-coding regions with a direct impact on protein structure or function, most NPC-associated common SNPs identified by GWASs lie in non-coding regions of the susceptibility loci, including the human leukocyte antigen (HLA) locus and the non-HLA loci 3q26 (MECOM), 5p15.33 (CLPTM1L/TERT), 9p21.3 (CDKN2A/CDKN2B), 13q12 (TNFRSF19), and 16p13 (CIITA).8,9,10,11,12,13,14,15 Following the discovery of these loci in GWASs, only a few susceptibility genes (e.g., TNFRSF19 at the 13q12 locus) have been experimentally confirmed,20 and the remainder await functional studies to explain the observed associations.
Explaining the mechanisms underlying observed genetic associations is important but challenging. Multi-dimensional bioinformatic profiles help prioritize functional SNPs among association signals in strong linkage disequilibrium and identify target genes that may lie far away in linear genomic distance yet are regulated by those SNPs. To expand our understanding of the genetic architecture of NPC and explain the biological role of the identified loci in NPC etiology, we performed a genetic association study combining 6,907 cases and 10,472 controls to identify additional NPC susceptibility loci. High-throughput sequencing for chromatin accessibility, histone modification, and long-range promoter contact profiling was performed in NPC cell lines to annotate the susceptibility loci functionally, followed by experiments to confirm the in silico findings and identify the susceptibility genes for NPC development.
Material and methods
Study populations
To identify additional susceptibility loci, a total of six study populations were enrolled, including four published GWAS samples consisting of 4,506 cases and 5,384 controls (EPI-NPC-2005,21,22 NPCGEE,23 SYSUNPC,24,25 and Hong Kong26,27) and two replication samples (Guangxi and Jiangsu). The Guangxi sample included a total of 1,788 NPC cases and 1,890 healthy controls recruited from Guangxi Province (an NPC endemic area). The Jiangsu sample included a total of 613 NPC cases and 3,198 healthy controls recruited from Jiangsu Province (an NPC non-endemic area).25 The diagnosis of all NPC cases was confirmed by at least two pathologists in accordance with the World Health Organization (WHO) guidelines, which defined all squamous cell carcinomas of the nasopharynx as NPC.1,28 The controls were self-reported cancer-free individuals recruited from the same geographical region as the cases. The demographic characteristics of the participants are shown in Table S1. This study was approved by the Institutional Review Board of Sun Yat-sen University Cancer Center. All the participants provided written informed consent to take part in this study.
Nasopharyngeal carcinoma tissue samples
A total of 83 NPC tissues preserved in RNAlater (Sigma, St. Louis, MO, USA) were obtained from the biobank of Sun Yat-sen University Cancer Center, Guangzhou, China. We also obtained matched peripheral blood DNA from the same patients for SNP genotyping. All NPC cases were diagnosed by at least two pathologists in accordance with the World Health Organization (WHO) guidelines. None of the patients had undergone radiotherapy or chemotherapy prior to sampling. All individuals provided informed consent to participate in this study.
Genotyping and meta-analysis
The genotypes of the tag SNPs were either extracted from SNP array data (the four study samples in the previous study and the control samples from Jiangsu Province)15,25 or newly genotyped using the iPLEX Sequenom MassARRAY platform (the case samples from Jiangsu Province and all samples from Guangxi Province). SNPs with minor allele frequency >0.01, call rate >95%, and Hardy-Weinberg equilibrium test p value >1 × 10⁻¹² were used for meta-analysis. The SNP effects were estimated by logistic regression adjusted for sex, age, and principal components in the four study samples, as previously described,15 and adjusted for sex and age in the two replication samples. Fixed-effect meta-analysis was performed using META (v.1.7) to combine the SNP effects from the four study samples of the previous NPC GWAS (4,506 cases and 5,384 controls) and the two replication samples (2,401 cases and 5,088 controls). A locus containing at least one SNP with p < 5 × 10⁻⁸ was defined as an NPC susceptibility locus.
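The QC filters and the fixed-effect combination described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the META (v.1.7) implementation: it uses a 1-df chi-square Hardy-Weinberg test (the text does not say which HWE test was applied) and standard inverse-variance weighting of per-study log odds ratios. All function and class names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpQC:
    maf: float        # minor allele frequency
    call_rate: float  # fraction of samples with a genotype call
    n_aa: int         # homozygous AA count
    n_ab: int         # heterozygous AB count
    n_bb: int         # homozygous BB count


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square test for Hardy-Weinberg equilibrium (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chisq / 2))


def passes_qc(s: SnpQC) -> bool:
    """Thresholds from the text: MAF > 0.01, call rate > 95%, HWE p > 1e-12."""
    return (s.maf > 0.01 and s.call_rate > 0.95
            and hwe_chisq_p(s.n_aa, s.n_ab, s.n_bb) > 1e-12)


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect pooling of per-study log(OR)s.

    Returns (pooled OR, (lower, upper) 95% CI, two-sided p value)."""
    weights = [1 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal p value
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return math.exp(beta), ci, p
```

For example, pooling two studies that each report log(1.5) with SE 0.1 returns a pooled OR of 1.5 with a narrower CI than either study alone, which is the expected behavior of inverse-variance weighting.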
Cell lines and culture reagents
The NPC cell line C666-1 was purchased from Guangzhou Jennio Biotech Co., Ltd. (China), and the HK1 and HK1 EBV+ cell lines were kindly provided by Professor Mu-Sheng Zeng (Sun Yat-sen University Cancer Center). These cell lines were maintained in RPMI-1640 medium (Gibco; Thermo Fisher Scientific, Inc., USA) containing 10% fetal bovine serum (ExCell Bio, China). The immortalized human nasopharyngeal epithelial cell line NP69 was cultured in keratinocyte serum-free medium (K-SFM) supplemented with epidermal growth factor (EGF) and bovine pituitary extract (Gibco; Thermo Fisher Scientific, Inc., USA). All cell lines were cultured at 37°C in an atmosphere of 5% CO₂, were authenticated, and were free of Mycoplasma contamination.
Assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) and data analysis
To describe the chromatin landscape of the NPC cell lines C666-1, HK1, and HK1 EBV+ and the nasopharyngeal epithelial cell line NP69, ATAC-seq was performed as previously described.29 In short, 50,000 viable cells were resuspended in pre-cooled lysis buffer (10 mM NaCl, 3 mM MgCl₂, 10 mM Tris-HCl [pH 7.4]) containing 0.01% digitonin, 0.1% Tween 20, and 0.1% NP40, incubated on ice for 3 min, and then centrifuged at 1,000 RCF (relative centrifugal force) for 10 min at 4°C. In the transposition reaction, the unfixed nuclei and Tn5 transposase (TruePrep DNA Library Prep Kit V2 for Illumina, Vazyme, China) were co-incubated at 37°C for 30 min. The transposed fragments were amplified for 12 PCR cycles, indexed, and sequenced on the Illumina NovaSeq platform with 2 × 150 bp paired-end reads (Novogene, China).
We analyzed the ATAC-seq data as follows. After adaptor trimming with Trimmomatic,30 reads were mapped to the reference genome (GRCh37/hg19) using Bowtie 2 with default parameters.31 Duplicate reads were removed using the Picard MarkDuplicates tool, and reads mapped to mitochondrial DNA or to the ENCODE blacklisted regions32 were discarded. The deepTools suite was used to calculate and visualize the transcription start site (TSS) enrichment of the ATAC reads.33 The ATAC-seq peak regions of each cell line were called using MACS2 and visualized using the WashU Epigenome Browser.34,35
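TSS enrichment summarizes how strongly ATAC-seq signal concentrates at promoters, and is a standard library-quality check. The sketch below computes a simplified, ENCODE-style score from Tn5 insertion positions, assuming that definition (aggregate signal within ±2 kb of each TSS, peak value divided by the mean of the outermost 100 bp on each side, no smoothing). It is not deepTools code, and the function name and input format are hypothetical.

```python
from collections import Counter


def tss_enrichment(insertions, tss_list, flank=2000, edge=100):
    """Simplified TSS enrichment score (assumed ENCODE-style definition).

    insertions: iterable of (chrom, pos) Tn5 insertion sites
    tss_list:   iterable of (chrom, pos, strand) transcription start sites
    Returns the maximum of the aggregate profile divided by the mean signal
    in the outermost `edge` bp on each side (the local background).
    """
    by_chrom = {}
    for chrom, pos in insertions:
        by_chrom.setdefault(chrom, []).append(pos)

    # Aggregate insertions by strand-aware offset from each TSS
    profile = Counter()
    for chrom, tss, strand in tss_list:
        for pos in by_chrom.get(chrom, []):
            offset = pos - tss if strand == "+" else tss - pos
            if -flank <= offset <= flank:
                profile[offset] += 1

    background = [profile[o] for o in range(-flank, -flank + edge)]
    background += [profile[o] for o in range(flank - edge + 1, flank + 1)]
    bg_mean = max(sum(background) / len(background), 1e-9)  # avoid divide-by-zero
    return max(profile[o] for o in range(-flank, flank + 1)) / bg_mean
```

The nested loop is quadratic and only meant to make the definition explicit; a real pipeline would bin insertions or use indexed lookups.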
Chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-seq) for histone markers and data analysis
To identify active regulatory regions, ChIP-seq was performed on C666-1, HK1, HK1 EBV+, and NP69 cells as previously described, with some modifications.36 Briefly, cells were cross-linked with 1% formaldehyde and quenched with 0.13 M glycine. After being scraped into 1.5 mL centrifuge tubes and centrifuged at 3,500 RCF for 5 min at 4°C, cells were resuspended in sodium dodecyl sulfate (SDS) lysis buffer and sonicated to generate DNA fragments of 200–1,000 bp using a Covaris E220 Focused Ultrasonicator (Covaris, LLC, USA).
For chromatin immunoprecipitation, cell lysates were centrifuged at 10,000 RCF for 10 min at 4°C, and the supernatant was incubated overnight at 4°C with 1 μg of antibody against H3K4me1 (ab8895, Abcam, UK), H3K4me3 (ab8580, Abcam, UK), or H3K27ac (ab4729, Abcam, UK). Next, 20 μL of ChIP-grade Protein A/G Magnetic Beads (Thermo Fisher Scientific, Inc., USA) was added, and the mixture was incubated at 4°C for 2 h with mixing. The bead-bound complexes were washed sequentially with IP Wash Buffer 1 (low salt), IP Wash Buffer 2 (high salt), LiCl wash buffer, and TE buffer for 15 min each at 4°C and then incubated with IP Elution Buffer. The purified immunoprecipitated DNA was sequenced using Illumina NovaSeq (Novogene, China).
For ChIP-seq data analysis, adaptors were removed using Trimmomatic.30 Bowtie 2, with default parameters,31 was used to map the qualified reads to the reference genome (GRCh37/hg19). After removing the duplicated reads using the Picard tools MarkDuplicates program and removing the reads mapped to the mitochondrial DNA or the ENCODE blacklisted regions,32 MACS2 was applied to call peaks, and the results were visualized by using the WashU epigenome browser.34,35
Promoter capture Hi-C (PCHi-C)
To obtain high-resolution profiles of chromosomal interactions involving promoter regions, PCHi-C of C666-1 cells was performed as described previously.37 In brief, 2 × 10⁷ cells were cross-linked with 1% formaldehyde solution and quenched with 0.2 M glycine. Cell pellets were lysed in pre-cooled PCHi-C lysis buffer. The pelleted nuclei were incubated with 0.3% SDS at 37°C for 60 min, and the SDS was then quenched with 2.5% Triton X-100. For in situ DNA digestion, nuclei were incubated with FastDigest HindIII restriction enzyme (Thermo Fisher Scientific, Inc., USA) in 1× FastDigest Buffer (Thermo Fisher Scientific, Inc., USA). Subsequently, the overhangs of the digested fragments were filled in and the DNA ends were marked with biotinylated dATP. T4 DNA ligase (Vazyme, China) was used for proximity ligation. NaCl (final 400 mM), Proteinase K (final 0.75 mg/mL), and SDS (final 0.85%) were used for DNA de-crosslinking. The purified DNA was sheared to 300–500 bp to generate a Hi-C library. Hi-C capture probes were designed using a 120-bp RNA-bait system (Agilent, USA) and hybridized with the Hi-C library. The captured library was amplified by PCR and sequenced on an Illumina HiSeq PE150 platform (Frasergen, China).
To analyze the PCHi-C data, the raw reads were truncated, aligned to the hg19 reference genome, and filtered with HiCUP (v.0.8.2).38 The unique valid di-tags were processed with CHiCAGO (v.1.18.0)39 to identify significant interactions. Interactions with CHiCAGO scores ≥5 were considered high-confidence interactions. PCHi-C interactions were visualized using the WashU Epigenome Browser.35 To assess whether promoter-interacting fragments were enriched for ATAC-seq peaks and histone modification peaks (H3K4me1, H3K4me3, and H3K27ac) associated with active regulatory elements, CHiCAGO was used to compare the observed and expected numbers of promoter-interacting fragments overlapping these features.
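The score threshold and the observed-versus-expected comparison described above can be illustrated with a simplified Python sketch. CHiCAGO performs the enrichment test itself using distance-matched random fragments; here a caller-supplied background fragment set stands in for that step (an assumption). Intervals are half-open (chrom, start, end) tuples, and all names are hypothetical.

```python
def high_confidence(interactions, min_score=5.0):
    """Keep promoter interactions with a CHiCAGO score >= min_score.

    interactions: list of dicts with keys 'bait', 'other_end', 'score'."""
    return [i for i in interactions if i["score"] >= min_score]


def overlaps(frag, features):
    """True if a half-open fragment overlaps any half-open feature interval."""
    chrom, start, end = frag
    return any(c == chrom and s < end and start < e for c, s, e in features)


def enrichment(observed_frags, background_frags, features):
    """Observed/expected ratio of fragments overlapping a feature set.

    background_frags stands in for CHiCAGO's distance-matched random
    fragments (hypothetical input); the expected count is rescaled to the
    size of the observed set."""
    obs = sum(overlaps(f, features) for f in observed_frags)
    exp = sum(overlaps(f, features) for f in background_frags)
    exp *= len(observed_frags) / len(background_frags)
    return obs / exp if exp else float("inf")
```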
Functional annotations, candidate functional SNP selection, and target gene mapping
We extracted the SNPs in the four genome-wide significant loci (9q22.33, 17q12, 5p15.33, and 9p21.3). We used bedtools to annotate SNPs located in open chromatin regions identified by ATAC-seq and in histone modification regions identified by ChIP-seq (H3K4me1, H3K4me3, and H3K27ac) with peak p values <0.05 in the C666-1, HK1 EBV+, HK1, and NP69 cell lines.40 We also used bedtools to annotate SNPs located in the anchors of PCHi-C loops in the C666-1 cell line. A cis-regulatory element (CRE) was defined by the following criteria: (1) the presence of both an ATAC-seq peak and an H3K27ac peak and (2) the presence of either an H3K4me1 or an H3K4me3 peak. We focused on CRE-overlapping SNPs that were in linkage disequilibrium (LD; R² > 0.8) with the tag SNPs and had p < 1 × 10⁻⁶ in the reported GWAS,15 and we prioritized those with CRE overlap in two of the cell lines as candidate functional SNPs. In addition, SNPs that overlapped a CRE in only one of the NPC cell lines were also prioritized as candidate functional SNPs if they overlapped a PCHi-C anchor. For loci whose CRE-overlapping SNPs did not meet the p < 1 × 10⁻⁶ criterion, the top SNP is shown. We mapped the candidate functional SNPs to their potential target genes based on the following evidence: (1) genes in spatial proximity detected by PCHi-C loops, (2) the linearly closest genes, and (3) eQTL genes from GTEx (v.8).41 Target genes mapped by any of these criteria were selected for CRISPRi validation, and genes whose expression was significantly inhibited were further examined by eQTL analysis in NPC tumor tissues. Finally, the gene(s) for follow-up functional study were prioritized.
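The CRE definition and SNP prioritization rules above can be restated as a small Python predicate. This is a sketch under two assumptions not spelled out in the text: "two of the cell lines" is read as "at least two", and the special case for loci without any SNP at p < 1 × 10⁻⁶ (where the top SNP is shown) is left out. The input field names are hypothetical.

```python
CELL_LINES = ("C666-1", "HK1 EBV+", "HK1", "NP69")
NPC_LINES = ("C666-1", "HK1 EBV+", "HK1")


def is_cre(marks: set) -> bool:
    """CRE rule from the text: ATAC-seq AND H3K27ac, plus H3K4me1 OR H3K4me3."""
    return {"ATAC", "H3K27ac"} <= marks and bool(marks & {"H3K4me1", "H3K4me3"})


def is_candidate(snp: dict) -> bool:
    """snp keys (hypothetical): 'r2' (LD with tag SNP), 'gwas_p',
    'marks' (cell line -> set of overlapping peak types), 'pchic_anchor'."""
    if snp["r2"] <= 0.8 or snp["gwas_p"] >= 1e-6:
        return False
    cre_lines = [c for c in CELL_LINES if is_cre(snp["marks"].get(c, set()))]
    if len(cre_lines) >= 2:
        return True
    # A CRE in a single NPC line suffices only if the SNP sits in a PCHi-C anchor
    return (len(cre_lines) == 1 and cre_lines[0] in NPC_LINES
            and snp["pchic_anchor"])
```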
Dual-luciferase reporter assays
Dual-luciferase reporter assays were used to explore the effects of the different alleles of the SNPs on CRE activity. DNA from the NPC cell lines C666-1 and HK1 EBV+ was used to amplify the target fragments. Fragments containing the candidate SNPs and their flanking regions were cloned into the pGL3-promoter luciferase reporter vector (ClonExpress II One Step Cloning Kit; Vazyme, China), and the SNP genotypes were determined by Sanger sequencing. Plasmids containing the alternative alleles of the SNPs were generated by site-directed mutagenesis and confirmed by Sanger sequencing. The primer pairs used for cloning and site-directed mutagenesis are listed in Table S2. C666-1 and HK1 EBV+ cells were seeded in separate 96-well plates (2 × 10⁴ cells per well). After the cells had adhered, 50 ng of Firefly luciferase reporter vector containing the inserted fragment and 10 ng of pRL-TK control plasmid were co-transfected into the cells using Lipofectamine 3000 reagent (Thermo Fisher Scientific, Inc., China). Twenty-four hours later, Firefly and Renilla luciferase signals were measured using the Dual-Luciferase Reporter Assay System (Promega, USA) according to the manufacturer’s instructions. Firefly signals were normalized to Renilla signals, and the activity of each construct was expressed relative to that of the empty vector.
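The normalization described above (Firefly/Renilla per well, then relative to the empty vector) amounts to a ratio of ratios. A minimal Python sketch, with hypothetical function and argument names:

```python
from statistics import mean, stdev


def relative_luciferase(firefly, renilla, empty_firefly, empty_renilla):
    """Per-well Firefly/Renilla ratios scaled to the mean ratio of the empty vector."""
    ratios = [f / r for f, r in zip(firefly, renilla)]
    empty_mean = mean(f / r for f, r in zip(empty_firefly, empty_renilla))
    return [x / empty_mean for x in ratios]


def summarize(values):
    """Mean and standard deviation, matching the mean ± SD reporting in the figures."""
    return mean(values), stdev(values)
```

Normalizing to Renilla first corrects for well-to-well differences in transfection efficiency; dividing by the empty-vector mean then expresses each construct as fold activity over the promoter-only baseline.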
CRISPRi silencing of CRE using dCas9-KRAB
To generate cells producing dCas9-KRAB-MeCP2, C666-1 and HK1 EBV+ cell lines were infected with lentivirus pLV[Exp]-CBh>dCas9-KRAB-MeCP2:T2A:Hygro (VectorBuilder, China). The cells producing dCas9-KRAB-MeCP2 were selected by 200 μg/mL Hygromycin B (Yeasen Biotechnology [Shanghai] Co., Ltd., China).
The plasmid lentiGuide-Hygro-dTomato vector (Addgene #99376) was used for sgRNA expression. The sgRNAs targeting CREs were designed using Benchling web tools (https://www.benchling.com/crispr). For lentivirus packaging, sgRNA plasmids, pMD2.G, and psPAX2 were co-transfected into 293T cells with PEI MAX 40K (WEST GENE, China). Supernatants containing the viral particles were collected 48 h after transfection and filtered. For CRISPRi silencing, the cells producing dCas9-KRAB-MeCP2 were treated with a lentivirus encoding two independent sgRNAs for each CRE (Table S3). Quantitative PCR (qPCR) was used to detect the transcription of candidate target genes.
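The sgRNAs themselves were designed with Benchling. As a rough illustration of the first step such tools perform, the sketch below enumerates SpCas9 protospacers (20 nt immediately 5′ of an NGG PAM) on both strands of a sequence. It assumes the dCas9 used here is SpCas9-derived with the canonical NGG PAM, performs no off-target or efficiency scoring, and uses hypothetical function names.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(seq: str, length: int = 20):
    """List (strand, start, protospacer) for SpCas9 sites: `length` nt followed by NGG.

    Start positions are 0-based coordinates on the input (forward) sequence."""
    seq = seq.upper()
    pattern = r"(?=([ACGT]{%d})[ACGT]GG)" % length  # lookahead keeps overlapping sites
    hits = [("+", m.start(), m.group(1)) for m in re.finditer(pattern, seq)]
    rc = revcomp(seq)
    for m in re.finditer(pattern, rc):
        # rc[a:a+L] corresponds to forward positions [len - a - L, len - a)
        hits.append(("-", len(seq) - (m.start() + length), m.group(1)))
    return hits
```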
Real-time qPCR
Total RNA was extracted with TRIzol (Invitrogen, USA) according to the manufacturer’s instructions. One microgram of RNA was reverse-transcribed into cDNA using the Color Reverse Transcription Kit with gDNA Remover (EZBioscience, China) following the manufacturer’s protocol. qPCR was then performed using 2×Color SYBR Green qPCR Master Mix (EZBioscience, China) on a LightCycler 480 System (Roche, Switzerland). The primers used for qPCR are listed in Table S4. Relative expression (normalized to GAPDH) was calculated using the 2^(−ΔΔCt) method. Each experiment was independently repeated three times.
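The 2^(−ΔΔCt) calculation normalizes the target gene to the reference gene (GAPDH here) and then to a control sample. A minimal sketch, assuming near-100% amplification efficiency for both primer pairs (the assumption underlying the method); the function name is hypothetical:

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Livak 2^(-ΔΔCt) fold change of a target gene versus a control sample."""
    delta_sample = ct_target - ct_ref            # ΔCt in the treated sample
    delta_control = ct_target_ctrl - ct_ref_ctrl  # ΔCt in the control sample
    return 2 ** -(delta_sample - delta_control)  # 2^(-ΔΔCt)
```

For example, a target Ct of 26 against a GAPDH Ct of 18, compared with 25 and 18 in the control, gives ΔΔCt = 1 and a relative expression of 0.5, i.e., roughly half the control level.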
Overexpression and RNA interference of target genes
The lentiviruses for PHF2 and CDKN2B-AS1 overexpression and the corresponding control viruses were purchased from VectorBuilder (China). C666-1 and HK1 EBV+ cell lines were infected with recombinant lentivirus plus 5 mg/mL polybrene (VectorBuilder, China) according to the manufacturer’s instructions and were selected using 5 μg/mL puromycin (MP Biomedicals, LLC, USA).
For PHF2 knockdown, siRNA duplexes targeting PHF2 and a negative control siRNA (si-NC) were designed and synthesized by RiboBio (Guangzhou, China). For CDKN2B-AS1 knockdown, a lncRNA smart silencer (a mixture of three siRNAs and three ASOs targeting CDKN2B-AS1) and a negative control smart silencer (ss-NC) were used (RiboBio, Guangzhou, China). Lipofectamine 3000 (Invitrogen, USA) was used for cell transfection according to the manufacturer’s instructions. The sequences of the siRNAs and the lncRNA smart silencer are shown in Table S5. To validate overexpression and knockdown of the target genes in the C666-1 and HK1 EBV+ cell lines, changes in PHF2 protein abundance were verified by western blotting, and changes in expression of the non-coding RNA CDKN2B-AS1 were measured by qPCR.
Western blotting
Total protein was extracted with a Tissue or Cell Total Protein Extraction Kit (KeyGen Biotech, China) according to the manufacturer’s instructions. Protein concentration was quantified with a BCA Protein Assay Kit (Cwbio, China). Equal amounts of protein were separated by 10% SDS-PAGE (Beyotime, China) and transferred to polyvinylidene fluoride (PVDF) membranes (EMD Millipore, USA). The membranes were blocked with Tris-buffered saline containing 0.1% Tween-20 (TBST) and 6% non-fat milk at room temperature and then incubated with rabbit anti-PHF2 (1:1,000, 24624-1-AP, Proteintech, China) and rabbit anti-GAPDH (1:1,000, 10494-1-AP, Proteintech, China) antibodies at 4°C overnight. Horseradish peroxidase (HRP)-labeled secondary antibodies (1:5,000, ZSGB-BIO, China) were used for detection. Finally, the proteins were visualized with an enhanced chemiluminescence (ECL) imaging system (Bio-Rad, USA) using ECL reagents (Advansta, USA).
Cell proliferation and colony formation assays
For cell proliferation assays, C666-1 and HK1 EBV+ cells with different treatments were seeded in 96-well plates at equal densities (4,000 cells/well for C666-1 and 1,500 cells/well for HK1 EBV+). Cell proliferation was measured using Cell Counting Kit-8 (Apexbio, China) according to the manufacturer’s instructions. Each experimental group had four replicates. For colony formation assays, equal numbers of NPC cells (4,000 cells/well for C666-1 and 1,500 cells/well for HK1 EBV+) were seeded in 6-well plates and harvested 2 weeks later. The cells were fixed with methanol for 15–20 min and then stained with 1% crystal violet for 15 min. Colonies containing more than 50 cells were counted using ImageJ. The experiment was repeated three times.
Statistical analyses
GraphPad Prism v.6.01 or R v.3.5.0 was used for statistical analysis and graph creation. An unpaired two-tailed Student’s t test was used to examine differences between two experimental groups. Two-way ANOVA was used to test differences in cell proliferation between groups. Experimental data are presented as mean ± standard deviation (SD) from at least three experiments. p < 0.05 was considered statistically significant.
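For reference, the unpaired two-tailed Student’s t test used for two-group comparisons pools the two sample variances. A minimal pure-Python sketch of the statistic follows (hypothetical function name); the p value would then come from a t distribution with na + nb − 2 degrees of freedom, for example via SciPy’s `scipy.stats.t.sf`, which this sketch does not call.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Unpaired two-sample Student's t statistic with pooled variance.

    Returns (t, degrees of freedom); two-tailed p = 2 * P(T_df > |t|)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

For example, `student_t([1, 2, 3], [4, 5, 6])` gives t ≈ −3.67 with 4 degrees of freedom.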
Results
Replication and meta-analysis identify two additional NPC susceptibility loci
The overall workflow of the study is shown in Figure 1A. In our previous NPC GWAS of four study samples (4,506 cases and 5,384 controls), two non-HLA loci, 9p21.3 (nearby gene: CDKN2B-AS1) and 5p15.33 (nearby genes: CLPTM1L/TERT), were associated with NPC risk.15 To expand these findings, we performed a replication study of two candidate loci in two additional samples recruited from Guangxi and Jiangsu provinces (2,401 cases and 5,088 controls), followed by a meta-analysis (see material and methods). SNPs at both loci reached genome-wide significance (p < 5 × 10⁻⁸) in the meta-analysis of the six samples: rs1867277 at the 9q22.33 locus (nearby gene: FOXE1; odds ratio [OR] = 0.74, 95% confidence interval [CI] = 0.68–0.81, p = 3.08 × 10⁻¹¹) and rs226241 at the 17q12 locus (nearby genes: LASP1/RPL23; OR = 1.42, 95% CI = 1.26–1.60, p = 1.62 × 10⁻⁸; Figures 1B and S1).
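As a rough consistency check, a Wald z statistic can be recovered from a reported OR and its 95% CI on the log scale, assuming the CI was computed as log(OR) ± 1.96 × SE. Because the CI bounds are rounded to two decimals, the recovered p values are only approximate and will not match the reported values exactly, but both fall below the 5 × 10⁻⁸ genome-wide threshold. The function name is hypothetical.

```python
import math


def p_from_or_ci(or_, lower, upper, z_crit=1.959964):
    """Approximate two-sided Wald p value from an OR and its 95% CI (log scale)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(or_) / se
    return math.erfc(abs(z) / math.sqrt(2))


reported = {"rs1867277": (0.74, 0.68, 0.81), "rs226241": (1.42, 1.26, 1.60)}
for snp, (or_, lo, hi) in reported.items():
    print(snp, f"approximate p = {p_from_or_ci(or_, lo, hi):.1e}")
```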
Figure 1.
Replication of GWAS meta-analysis identifies additional NPC risk loci
(A) Study design. Published NPC meta-GWAS by He et al.15
(B) Additional SNPs associated with NPC risk. The odds ratios (ORs), 95% confidence intervals (95% CIs), and corresponding p values for rs1867277 (9q22.33) and rs226241 (17q12) were calculated by fixed-effect meta-analysis.
Integrative functional profiling for CRE identification and target gene mapping of the NPC susceptibility loci
We next examined the potential functions of the variants at the NPC susceptibility loci 9q22.33 and 17q12, as well as at two loci from our previous GWAS that reached genome-wide significance (p < 5 × 10⁻⁸; 9p21.3 and 5p15.33). We profiled active regulatory regions by ATAC-seq and histone ChIP-seq in NPC cell lines (C666-1, HK1 EBV+, and HK1) and in an immortalized nasopharyngeal epithelial cell line (NP69). Three histone marks associated with gene regulation were profiled: H3K4me1, which marks enhancers; H3K4me3, which marks promoters; and H3K27ac, which marks active enhancers. The open chromatin peaks identified by ATAC-seq were enriched near the transcription start site (TSS; Figure S2A), with an average of 305,176 peaks per cell line across the four cell lines. H3K27ac and H3K4me3 peaks were enriched near the TSS, whereas H3K4me1 peaks were enriched in regions flanking the TSS (Figure S2B). On average, 186,630 H3K27ac peaks, 80,413 H3K4me3 peaks, and 202,661 H3K4me1 peaks were detected per cell line. The top significant peaks are shown in Table S6. In addition, PCHi-C sequencing in the C666-1 cell line detected 108,744 high-confidence promoter interactions. Most distances between valid interaction pairs fell between 100 kb and 1 Mb (Figure S3A), and the promoter-interacting fragments were enriched for ATAC-seq and histone ChIP-seq peaks (Figure S3B). Using these peaks to define CREs (material and methods) and to annotate the four NPC susceptibility loci, we identified 14 candidate functional SNPs located in CRE regions that may regulate gene expression. Because these 14 SNPs are in strong linkage disequilibrium (LD; R² > 0.8) with the GWAS tag SNPs, they are also associated with NPC susceptibility (Tables 1 and S7).
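The R² used to define strong (R² > 0.8) or complete (R² = 1) LD between a candidate SNP and its tag SNP is the squared correlation between alleles at the two SNPs. A minimal sketch from phased two-SNP haplotypes follows; the text does not state which reference panel or software was used to compute R², so the input format and function name are hypothetical.

```python
from collections import Counter


def ld_r2(haplotypes):
    """r² between two biallelic SNPs from phased haplotypes.

    haplotypes: list of 2-character strings, e.g. "AG" = allele A at SNP1, G at SNP2."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    a1, b1 = haplotypes[0][0], haplotypes[0][1]  # arbitrary reference alleles
    p_a = sum(v for h, v in counts.items() if h[0] == a1) / n
    p_b = sum(v for h, v in counts.items() if h[1] == b1) / n
    p_ab = counts[a1 + b1] / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom else float("nan")
```

When alleles always co-occur (as for the five CRE1 SNPs with pairwise R² = 1), r² is 1; when the two SNPs segregate independently, r² is 0.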
Table 1.
The cis-regulatory element annotations and potential target genes of the 14 candidate functional SNPs
| rsID | Minor alleleᵃ | Risk alleleᵇ | R²ᶜ | ATACᵈ | H3K27acᵈ | H3K4me1ᵈ | H3K4me3ᵈ | PCHi-C interaction gene | Linear closest gene | GTEx eQTL gene | CRE ID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 9q22.33 (tag SNP: rs1867277; p = 3.08 × 10⁻¹¹)ᵉ | | | | | | | | | | | |
| rs13302470 | A | G | 1 | +++− | +−−−+ | ++−+ | ++++ | PHF2, NCBP1, XPA, TRMO | FOXE1, PTCSC2 | FOXE1, TRMO, XPA, PTCSC2, NANS | CRE1 |
| rs1867277 | A | G | 1 | ++++ | ++++ | −+++ | ++++ | PHF2, NCBP1, XPA, TRMO | FOXE1, PTCSC2 | FOXE1, TRMO, XPA, PTCSC2, NANS | |
| rs1867278 | C | A | 1 | ++++ | ++++ | −++− | ++++ | PHF2, NCBP1, XPA, TRMO | FOXE1, PTCSC2 | FOXE1, TRMO, XPA, PTCSC2, NANS | |
| rs1867280 | G | C | 1 | ++++ | ++++ | −−+− | ++++ | PHF2, NCBP1, XPA, TRMO | FOXE1, PTCSC2 | FOXE1, TRMO, XPA, PTCSC2, NANS | |
| rs3021526 | C | T | 1 | +−−+ | ++++ | −+++ | ++++ | PHF2, NCBP1, XPA, TRMO | FOXE1, PTCSC2 | FOXE1, TRMO, XPA, PTCSC2, NANS | |
| rs1443434 | G | T | 1 | +−−− | ++−− | −+++ | +++− | PHF2, NCBP1, XPA, TRMO | FOXE1, PTCSC2 | FOXE1, TRMO, XPA, PTCSC2, NANS | |
| rs1443435 | T | C | 1 | +−−− | ++−− | −+++ | +++− | PHF2, NCBP1, XPA, TRMO | FOXE1, PTCSC2 | FOXE1, TRMO, XPA, PTCSC2, NANS | |
| 9p21.3 (tag SNPs: rs3218012, rs6475604)ᶠ | | | | | | | | | | | |
| rs2518723 | T | C | 0.93 | ++++ | ++++ | +−−− | ++++ | CDKN2A-DT | CDKN2A, CDKN2B-AS1 | N/A | CRE2 |
| rs2069418 | G | C | 0.89 | −+++ | ++++ | ++++ | ++++ | N/A | CDKN2B, CDKN2B-AS1 | CDKN2B-AS1 | CRE3 |
| 5p15.33 (tag SNP: rs31487; p = 1.52 × 10⁻¹⁰)ᶠ | | | | | | | | | | | |
| rs4975616 | G | A | 0.88 | +−+− | +−−+ | ++++ | −−−− | TERLR1 | N/A | TERT, CLPTM1L | CRE4 |
| rs11133729 | G | C | 0.97 | +−−+ | +−−+ | +−++ | +−−− | N/A | CLPTM1L | N/A | CRE5 |
| rs383009 | T | C | 0.97 | ++++ | +−++ | ++++ | −−−− | N/A | CLPTM1L | TERT, CLPTM1L | |
| rs27996 | G | A | 0.97 | ++++ | ++++ | −−−− | ++++ | TERLR1, SLC6A18, TERT, LPCAT1 | CLPTM1L | TERT, CLPTM1L | CRE6 |
| 17q12 (tag SNP: rs226241; p = 1.62 × 10⁻⁸)ᵉ | | | | | | | | | | | |
| rs1124927 | G | G | 0.88 | ++++ | ++++ | −−−− | ++++ | STAC2, CWC25, ARL5C, CACNB1 | RPL23 | LASP1 | CRE7 |

ᵃ Minor alleles of the SNPs.
ᵇ Risk alleles of the SNPs.
ᶜ R² of the corresponding SNPs with the tag SNPs.
ᵈ Chromatin state annotation: “+” indicates that the SNP overlaps an ATAC-seq or histone peak identified in a cell line, and “−” indicates no overlap; the symbols correspond to the four cell lines in this order: C666-1, HK1 EBV+, HK1, and NP69.
ᵉ p values from the meta-analysis of NPC risk in six samples: the reported meta-GWAS samples and the two replication samples (Guangxi and Jiangsu).
ᶠ p values from the reported GWAS of NPC risk in four study samples; the tag SNPs for rs2518723 and rs2069418 are rs3218012 (p = 5.68 × 10⁻⁹) and rs6475604 (p = 1.89 × 10⁻⁹), respectively.
At the 9q22.33 locus, seven candidate functional SNPs in complete LD (R² = 1) with the tag SNP rs1867277 were identified in a gene regulatory region (CRE1) marked by open chromatin and histone modifications (Table 1; Figure 2). By mapping these SNPs to genes in spatial proximity detected by PCHi-C loops, we identified four candidate target genes (PHF2, NCBP1, TRMO, and XPA). We also mapped them to the linearly closest genes (FOXE1 and PTCSC2) and to eQTL genes from GTEx (FOXE1, TRMO, XPA, PTCSC2, and NANS; Table 1). At the 17q12 locus, one candidate functional SNP (rs1124927) in strong LD (R² = 0.88) with the tag SNP rs226241 was identified in CRE7. The candidate target genes of rs1124927 captured by PCHi-C were STAC2, CWC25, ARL5C, and CACNB1 (Table 1; Figure S4), and mapping by linear position and eQTL data identified two additional target genes (RPL23 and LASP1). At the two previously reported genome-wide significant loci, two potential functional SNPs at 9p21.3, located in two different CRE regions (CRE2 and CRE3), were assigned to target genes by PCHi-C interactions (CDKN2A-DT), by linear position (CDKN2A, CDKN2B, and CDKN2B-AS1), and by eQTL data (CDKN2B-AS1; Table 1; Figure 3). In addition, four potential functional SNPs at 5p15.33 were identified in three CRE regions (CRE4–CRE6) and mapped to target genes by PCHi-C interactions (TERLR1, SLC6A18, TERT, and LPCAT1), by linear position (CLPTM1L), and by eQTL data (TERT and CLPTM1L; Table 1; Figure S5). Because no candidate functional SNP at 17q12 reached p < 1 × 10⁻⁶, and because candidate SNPs at 5p15.33, such as rs11133729 and rs383009, are located in repetitive sequences, we chose 9q22.33 and 9p21.3 for further experimental study.
Figure 2.
Integrated functional annotation for the seven SNPs at 9q22.33 locus
(A) Chromatin interaction peaks identified by PCHi-C in C666-1 cells are shown in a looping format.
(B) A partial enlarged view of the 9q22.33 locus. The SNPs are shown as red vertical lines according to their chromosomal positions (x axis). The seven potential functional SNPs at 9q22.33 co-localize with ATAC-seq peaks, histone (H3K4me1, H3K4me3, and H3K27ac) ChIP-seq peaks, and chromatin interaction peaks (yellow shading). The histone ChIP-seq peaks are shown as log likelihood ratio (called by MACS2) after subtracting the input of each cell line.
Figure 3.
Integrated functional annotation for the two SNPs at 9p21.3 locus
The SNPs are shown as red vertical lines according to their chromosomal positions (x axis). The two potential functional SNPs, rs2518723 and rs2069418, at 9p21.3 co-localize with ATAC-seq peaks, histone (H3K4me1, H3K4me3, and H3K27ac) ChIP-seq peaks, and chromatin interaction peaks (yellow shading). Chromatin interaction peaks identified by PCHi-C in C666-1 cells are shown in a looping format. The histone ChIP-seq peaks are shown as log likelihood ratio (called by MACS2) after subtracting the input of each cell line.
A risk haplotype at 9q22.33 inhibits the activity of an enhancer for the tumor suppressor PHF2 and promotes NPC carcinogenesis
To further delineate the functional roles of the candidate SNPs at locus 9q22.33, we first evaluated their allele-specific effects on transcriptional regulatory activity using dual-luciferase assays in the C666-1 and HK1 EBV+ cell lines. Among the seven potential functional SNPs in CRE1, the risk alleles of five SNPs (rs13302470 [G], rs1867277 [G], rs1867278 [A], rs1443434 [T], and rs1443435 [C]) showed lower enhancer activity in the reporter assay. These five SNPs are in complete LD with each other (pairwise R² = 1.00), and lower transcriptional activity was also observed for the risk haplotype rs13302470 (G)-rs1867277 (G)-rs1867278 (A)-rs1443434 (T)-rs1443435 (C) (Mut_GGATC) than for the protective haplotype (WT_AACGT) (Figure 4A). Taken together, CRE1 showed lower enhancer activity when harboring the NPC risk alleles or haplotype. We next examined the seven candidate target genes of CRE1 (PHF2, PTCSC2, FOXE1, XPA, TRMO, NANS, and NCBP1), mapped by any of the following evidence: (1) genes in spatial proximity detected by PCHi-C loops, (2) the linearly closest genes, and (3) eQTL genes from GTEx (v.8).41 We applied CRISPRi targeting CRE1 with two sgRNAs in the C666-1 and HK1 EBV+ cell lines and measured the expression of the seven candidate target genes. PHF2 and PTCSC2 expression was significantly inhibited by both sgRNAs in both cell lines (Figure 4B). Because the tag SNP rs1867277 was one of the five functional SNPs validated by dual-luciferase assays and is in complete LD with the other four, we used it as the representative SNP and tested whether its genotypes were associated with PHF2 and PTCSC2 expression by eQTL analysis in 83 NPC tissues. PHF2 expression was significantly lower in samples carrying the NPC risk genotype (GG) than in those carrying the AG genotype (Figure 4C). A similar trend was observed for PTCSC2 expression, but the difference was not statistically significant (Figure S6).
Figure 4.
Experimental validation of potential functional SNPs at 9q22.33 locus
(A) Luciferase activity of the DNA segments in CRE1. The flanking sequences of the SNPs in CRE1 at 9q22.33 were cloned into the pGL3-promoter luciferase reporter vector and transfected into C666-1 and HK1 EBV+ cells. Five SNPs within CRE1 were mutated from the protective to the risk alleles. Luciferase activity was normalized to that of the pGL3-promoter vector (n = 5).
(B) Two different CRISPRi single-guide RNAs (sgRNAs; sgC1-1 and sgC1-2) were used to target dCas9-KRAB to CRE1. sgCon, non-targeting control sgRNA. Gene expression was measured by qPCR. The data in (A) and (B) are presented as mean ± standard deviation (SD). Student’s t test was used for statistical analysis.
(C) eQTL analysis of the association between the genotypes of the tag SNP rs1867277 and PHF2 expression in 83 NPC tissues. The data are presented as mean and 95% confidence interval. Student’s t test was used for statistical analysis.
(D) Proliferation curves of control (vector) and PHF2-overexpressing (PHF2) C666-1 and HK1 EBV+ cells. Two-way ANOVA was used for the statistical analysis of cell viability assays.
(E) Colony formation assay for C666-1 and HK1 EBV+ cells with PHF2 overexpression and control cells (vector). Student’s t test was used for statistical analysis.
(F) Proliferation curves of control (si-NC) and PHF2-knockdown (si-PHF2-1 and si-PHF2-2) C666-1 and HK1 EBV+ cells. Two-way ANOVA was used for the statistical analysis of cell viability assays.
(G) Colony formation assay for C666-1 and HK1 EBV+ cells with PHF2 knockdown (si-PHF2-1 and si-PHF2-2) and control cells (si-NC). Student’s t test was used for statistical analysis. The data in (D)–(G) are presented as mean ± SD. ∗∗∗ indicates p < 0.001, ∗∗ indicates p < 0.01, and ∗ indicates p < 0.05.
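Figure 4A compares normalized luciferase activity between protective and risk constructs with Student’s t test. The sketch below shows one common way to do this, assuming that each well yields a firefly and a Renilla (internal control) reading, as in standard dual-luciferase assays, and that activity is expressed relative to the mean firefly/Renilla ratio of the empty pGL3-promoter vector. The pooled-variance t test uses only the standard library; the two-sided p value comes from integrating the t density numerically with Simpson's rule, which is adequate for illustration but not for extremely small p values (below roughly 1e-10). All readings are hypothetical.

```python
import math
from statistics import mean, variance


def relative_activity(firefly, renilla, empty_vector_ratio):
    """Firefly/Renilla ratio per well, relative to the empty pGL3-promoter mean ratio."""
    return [f / r / empty_vector_ratio for f, r in zip(firefly, renilla)]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=20000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = s * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


def student_t_test(x, y):
    """Two-sample Student's t test with pooled variance; returns (t, df, p)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, df, two_sided_p(t, df)


if __name__ == "__main__":
    # Hypothetical firefly and Renilla readings, n = 5 wells per construct.
    empty_ratio = mean(f / r for f, r in zip([980, 1010, 1005, 995, 1020],
                                             [500, 505, 498, 502, 510]))
    protective = relative_activity([4100, 3950, 4200, 4050, 3990],
                                   [510, 495, 505, 500, 498], empty_ratio)
    risk = relative_activity([2900, 3050, 2980, 3100, 2870],
                             [502, 498, 507, 495, 500], empty_ratio)
    t, df, p = student_t_test(protective, risk)
    print(f"t = {t:.2f}, df = {df}, p = {p:.2e}")
```

The same `student_t_test` applies to the colony counts in panels (E) and (G).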
Since the candidate gene PHF2 was validated by CRISPRi and eQTL analysis, we investigated its potential role in NPC carcinogenesis. We overexpressed or knocked down PHF2 in C666-1 and HK1 EBV+ cells (Figure S7). PHF2 overexpression decreased cell proliferation and clonogenicity in both cell lines (Figures 4D and 4E), whereas PHF2 knockdown increased them (Figures 4F and 4G). Collectively, these results suggest that the NPC risk haplotype of the functional SNPs may reduce PHF2 expression by inhibiting the enhancer activity of CRE1, and that PHF2 may act as a tumor suppressor whose downregulation promotes NPC development.
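The proliferation curves in Figures 4D and 4F are compared by two-way ANOVA, with group (for example, vector vs. PHF2) and time point as the two factors. The sketch below computes the F statistics for a balanced design with replicate wells at each time point; it assumes, as our own simplification not stated in the text, that wells are independent (an ordinary rather than a repeated-measures ANOVA). P values would then come from the upper tail of the F distribution, for example with `scipy.stats.f.sf(F, dfn, dfd)`, which is left out here to keep the example dependency-free. All readings are hypothetical.

```python
from statistics import mean


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    data[group][t] is the list of replicate readings (for example, OD values)
    for one group at time point t. Returns {effect: (F, df_effect, df_error)}.
    """
    groups = list(data)
    a = len(groups)
    b = len(data[groups[0]])
    n = len(data[groups[0]][0])
    if a < 2 or b < 2:
        raise ValueError("need at least two groups and two time points")
    if any(len(data[g]) != b or any(len(reps) != n for reps in data[g]) for g in groups):
        raise ValueError("design must be balanced (same time points and replicates per group)")
    if n < 2:
        raise ValueError("at least two replicates per cell are needed")

    cell_mean = {(g, t): mean(data[g][t]) for g in groups for t in range(b)}
    grand = mean(cell_mean.values())  # equals the overall mean in a balanced design
    group_means = [mean(cell_mean[g, t] for t in range(b)) for g in groups]
    time_means = [mean(cell_mean[g, t] for g in groups) for t in range(b)]

    ss_group = b * n * sum((m - grand) ** 2 for m in group_means)
    ss_time = a * n * sum((m - grand) ** 2 for m in time_means)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_group - ss_time
    ss_error = sum((x - cell_mean[g, t]) ** 2
                   for g in groups for t in range(b) for x in data[g][t])

    df_group, df_time = a - 1, b - 1
    df_inter, df_error = df_group * df_time, a * b * (n - 1)
    ms_error = ss_error / df_error
    return {
        "group": (ss_group / df_group / ms_error, df_group, df_error),
        "time": (ss_time / df_time / ms_error, df_time, df_error),
        "group x time": (ss_inter / df_inter / ms_error, df_inter, df_error),
    }


if __name__ == "__main__":
    # Hypothetical OD readings: 3 replicate wells at 4 time points per group.
    od = {
        "vector": [[0.20, 0.21, 0.19], [0.45, 0.47, 0.44], [0.90, 0.95, 0.88], [1.60, 1.65, 1.58]],
        "PHF2": [[0.20, 0.20, 0.21], [0.38, 0.40, 0.37], [0.70, 0.72, 0.69], [1.10, 1.15, 1.12]],
    }
    for effect, (f, df1, df2) in two_way_anova(od).items():
        print(f"{effect}: F({df1}, {df2}) = {f:.1f}")
```

A significant group × time interaction is what indicates that the curves diverge over time rather than differing by a constant offset.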
The risk allele of rs2069418 at 9p21.3 promotes the activity of an enhancer for the oncogene CDKN2B-AS1 and increases NPC risk
To investigate the functional roles of the two candidate SNPs (rs2518723 and rs2069418) at another NPC susceptibility locus, 9p21.3, we first examined whether the corresponding CREs exhibit different transcriptional regulatory activity when harboring the risk or protective allele of each SNP. In dual-luciferase assays, the NPC risk allele C of rs2069418 increased the enhancer activity of CRE3 compared with the protective allele G in C666-1 and HK1 EBV+ cells (Figure 5A). In contrast, no allelic effect on transcriptional activity was detected for rs2518723 in CRE2 (Figure S8). We therefore focused on CRE3 in subsequent experiments. Two candidate target genes, CDKN2B and CDKN2B-AS1, were mapped to CRE3 by PCHi-C interaction, linear physical proximity, and eQTL evidence. CRISPRi silencing of CRE3 significantly reduced the expression of CDKN2B-AS1 in both NPC cell lines but did not change that of CDKN2B (Figure 5B). We next tested the association of rs2069418 in CRE3 with CDKN2B-AS1 expression in NPC tissues by eQTL analysis and observed higher CDKN2B-AS1 expression in individuals with the NPC risk genotype (CC) than in those with the heterozygous genotype (GC) (Figure 5C). Hence, the risk allele of rs2069418 may promote CDKN2B-AS1 expression by increasing the enhancer activity of CRE3.
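The eQTL panels (Figures 4C and 5C) present expression by genotype as a mean with a 95% confidence interval. A minimal sketch of that summary is shown below, assuming a t-based interval, mean ± t(0.975, n − 1) × SD/√n; the t quantile is found by bisection on a numerically integrated t distribution so that only the standard library is needed. Genotype calls and expression values in the usage lines are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + s * h / 3


def t_quantile(q, df, lo=0.0, hi=100.0, iters=60):
    """Upper quantile (q > 0.5) of the t distribution by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci95(values):
    """Mean and t-based 95% confidence interval of the mean."""
    n = len(values)
    m = mean(values)
    half = t_quantile(0.975, n - 1) * stdev(values) / math.sqrt(n)
    return m, m - half, m + half


def summarize_by_genotype(genotypes, expression):
    """Return {genotype: (mean, lower, upper)}; groups with < 2 samples are skipped."""
    groups = {}
    for g, e in zip(genotypes, expression):
        groups.setdefault(g, []).append(e)
    return {g: mean_ci95(v) for g, v in groups.items() if len(v) >= 2}


if __name__ == "__main__":
    # Hypothetical rs2069418 genotypes and normalized expression values.
    geno = ["CC", "GC", "CC", "GC", "CC", "GC", "CC"]
    expr = [2.1, 1.4, 2.6, 1.2, 2.3, 1.6, 1.9]
    for g, (m, lo, hi) in summarize_by_genotype(geno, expr).items():
        print(f"{g}: mean {m:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```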
Figure 5.
Experimental validation of potential functional SNPs at 9p21.3 locus
(A) The flanking sequence of SNP rs2069418 in CRE3 was cloned into the pGL3-promoter luciferase reporter vector. Plasmids carrying each allele of rs2069418 were generated by site-directed mutagenesis and transfected into C666-1 and HK1 EBV+ cells. Luciferase activity was normalized to that of the pGL3-promoter vector (n = 5).
(B) Two different CRISPRi single-guide RNAs (sgRNAs; sgC3-1 and sgC3-2) were used to target dCas9-KRAB to CRE3 in C666-1 and HK1 EBV+ cells. sgCon, non-targeting control sgRNA. Gene expression was measured by qPCR. The data in (A) and (B) are presented as mean ± standard deviation (SD). Student’s t test was used for statistical analysis.
(C) eQTL analysis of the association between rs2069418 genotypes and CDKN2B-AS1 expression in 83 NPC tissues. The data are presented as mean and 95% confidence interval. Student’s t test was used for statistical analysis.
(D) Proliferation curves of control (vector) and CDKN2B-AS1-overexpressing (CDKN2B-AS1) C666-1 and HK1 EBV+ cells. Two-way ANOVA was used for the statistical analysis of cell viability assays.
(E) Colony formation assay for C666-1 and HK1 EBV+ cells with CDKN2B-AS1 overexpression and control cells (vector). Student’s t test was used for statistical analysis.
(F) Proliferation curves for C666-1 and HK1 EBV+ cells transfected with smart silencer of CDKN2B-AS1 (ss-CDKN2B-AS1) compared with their control groups (ss-NC). Two-way ANOVA was used for the statistical analysis of cell viability assays.
(G) Colony formation assay for C666-1 and HK1 EBV+ cells transfected with smart silencer of CDKN2B-AS1 (ss-CDKN2B-AS1) compared with their control groups (ss-NC). Student’s t test was used for statistical analysis. The data in (D)–(G) are presented as mean ± SD. ∗∗∗ indicates p < 0.001, ∗∗ indicates p < 0.01, and ∗ indicates p < 0.05.
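The CRISPRi panels (Figures 4B and 5B) report target-gene expression measured by qPCR relative to the non-targeting sgCon. The text does not state how relative expression was computed; a widely used option is the 2^−ΔΔCt (Livak) method, sketched below under the assumptions of near-100% amplification efficiency and normalization to a single reference gene. The reference gene and all Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene versus control by the 2^-ddCt method.

    Each argument is a list of replicate Ct values: target and reference gene in
    the test condition (e.g., an sgRNA against a CRE), then the same two genes in
    the control condition (e.g., sgCon). Assumes ~100% amplification efficiency.
    """
    d_ct_test = mean(ct_target) - mean(ct_ref)        # delta Ct, test condition
    d_ct_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)  # delta Ct, control
    return 2.0 ** -(d_ct_test - d_ct_ctrl)


if __name__ == "__main__":
    # Hypothetical Ct values for a target transcript and a hypothetical reference gene.
    fold = relative_expression(
        ct_target=[27.9, 28.1, 28.0], ct_ref=[17.0, 17.1, 16.9],
        ct_target_ctrl=[26.9, 27.0, 27.1], ct_ref_ctrl=[17.0, 16.9, 17.1],
    )
    print(f"relative expression vs. sgCon: {fold:.2f}")
```

Each additional cycle needed to reach threshold halves the inferred abundance, so a ΔΔCt of +1 corresponds to a fold change of 0.5.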
Since the candidate gene CDKN2B-AS1 was validated by CRISPRi and eQTL analysis, we further explored its function in NPC development. We overexpressed or knocked down CDKN2B-AS1 in C666-1 and HK1 EBV+ cells (Figure S9) and assessed the effects on cell phenotypes. CDKN2B-AS1 overexpression promoted cell proliferation and clonogenicity in both NPC cell lines (Figures 5D and 5E), whereas CDKN2B-AS1 knockdown had the opposite effects (Figures 5F and 5G). These results suggest that CDKN2B-AS1 may act as an oncogene in NPC development.
Discussion
In this study, we identified two additional NPC susceptibility loci, 9q22.33 and 17q12. Using high-throughput chromatin state and promoter interaction profiling, together with expression quantitative trait locus (eQTL) mapping in NPC cell lines and tissues, we prioritized candidate functional SNPs at the two additional loci (9q22.33 and 17q12) and at two previously reported loci (5p15.33 and 9p21.3), mapped their target genes, and followed up with experimental validation. We nominated PHF2 as a susceptibility gene regulated by functional SNPs at 9q22.33 and provided supporting evidence that CDKN2B-AS1 is a susceptibility gene regulated by the functional SNP rs2069418 at 9p21.3, which expands the current understanding of NPC genetic etiology.
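Prioritizing candidate functional SNPs starts from a simple interval question: which SNPs fall inside a CRE called from chromatin state profiling. On BED files this is typically done with a tool such as bedtools (listed under Web resources), for example its `intersect` command; the pure-Python sketch below shows the same logic, assuming CREs are 0-based, half-open BED intervals that do not overlap one another on a chromosome (as after merging peaks) and that SNP positions are 1-based. All coordinates and names are hypothetical.

```python
from bisect import bisect_right


def snps_in_cres(snps, cres):
    """Return (snp_id, cre_name) pairs for SNPs lying inside a CRE.

    snps: iterable of (snp_id, chrom, pos) with 1-based positions.
    cres: iterable of (chrom, start, end, name) as 0-based, half-open BED
    intervals, assumed non-overlapping within each chromosome.
    """
    by_chrom = {}
    for chrom, start, end, name in cres:
        by_chrom.setdefault(chrom, []).append((start, end, name))
    starts = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts[chrom] = [s for s, _, _ in ivs]

    hits = []
    for snp_id, chrom, pos in snps:
        if chrom not in by_chrom:
            continue
        x = pos - 1  # convert the 1-based SNP position to a 0-based coordinate
        k = bisect_right(starts[chrom], x) - 1  # last CRE starting at or before x
        if k >= 0 and x < by_chrom[chrom][k][1]:
            hits.append((snp_id, by_chrom[chrom][k][2]))
    return hits


if __name__ == "__main__":
    # Hypothetical CREs and SNP positions.
    cres = [("chr9", 1000, 1600, "CRE_X"), ("chr9", 5000, 5400, "CRE_Y")]
    snps = [("snp_a", "chr9", 1200), ("snp_b", "chr9", 3000), ("snp_c", "chr9", 5400)]
    print(snps_in_cres(snps, cres))
```

Sorting the intervals once and using binary search keeps each lookup logarithmic in the number of CREs per chromosome.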
The 9q22.33 locus was identified as an additional susceptibility locus for NPC. The functional SNPs overlapped a CRE detected by chromatin state profiling; their risk alleles reduced the enhancer activity of this element and the expression of the target gene PHF2. This long-range (4.3 Mb) regulation, detected by PCHi-C, was confirmed by CRISPRi experiments in NPC cell lines and by eQTL analysis in NPC tissues, thereby assigning the association signals of the functional SNPs to the NPC susceptibility gene PHF2. We found that PHF2 acts as a tumor suppressor in NPC: its overexpression reduced, and its knockdown promoted, cell proliferation and clonogenicity. PHF2 encodes a histone demethylase and has been reported as a tumor suppressor in other cancers,42,43 such as head and neck cancer,44 colon and stomach cancers,43 and acute lymphoblastic leukemia.45 Moreover, PHF2 has been shown to promote DNA repair by homologous recombination, and its depletion leads to DNA damage and genomic instability.46,47 PHF2 has also been reported to promote p53-mediated apoptosis induced by anticancer drugs and may therefore enhance the DNA damage response during chemotherapy.43 These findings support a tumor suppressor role of PHF2 and emphasize the importance of the DNA damage repair pathway in NPC genetic etiology.48,49
The 9p21.3 locus encompasses the cyclin-dependent kinase inhibitor genes CDKN2A and CDKN2B and the noncoding RNA gene CDKN2B-AS1 (ANRIL). This previously reported NPC susceptibility locus also contributes to shared susceptibility across multiple cancer types.50,51,52 However, its genetic associations remain to be interpreted biologically, which requires identifying the functional SNPs among a group of association signals in LD and elucidating their biological effects. We found that rs2069418, located in the first intron of CDKN2B-AS1 (15 kb downstream of its TSS), was the functional SNP, overlapping a region of active chromatin. Its risk allele C increased the enhancer activity of the detected regulatory element. eQTL analysis across multiple tissues in the GTEx database and expression analysis in NPC tumor tissues showed that the risk allele C was associated with higher CDKN2B-AS1 expression. Overexpression of CDKN2B-AS1 promoted NPC cell proliferation and clonogenicity, suggesting an oncogenic role in NPC, consistent with previous studies.53,54 CDKN2B-AS1 has been reported to modulate the expression of the cyclin-dependent kinase inhibitors CDKN2B and CDKN2A and has been described as an oncogenic lncRNA and prognostic biomarker in other cancer types.55,56 Furthermore, CDKN2B-AS1 has been reported to act as an oncogene by enhancing NF-κB signaling.57 Aberrant NF-κB activation, driven by genomic aberrations together with EBV-encoded LMP1 expression, has been observed ubiquitously in NPC,58 but whether and how CDKN2B-AS1 is involved in this tumorigenic pathway in NPC warrants further investigation. Taken together, these findings suggest that rs2069418 may regulate the oncogene CDKN2B-AS1, providing an additional interpretation for the association signals at this locus.
This is a post-GWAS functional study of NPC that applied multiple functional genomics approaches followed by experimental verification. The systematic profiling of CREs in NPC cell lines enables us to bridge the gap between large-scale genetic associations from epidemiologic studies and functional follow-up studies, providing a valuable resource for etiological studies of NPC and other EBV-associated malignancies. Beyond the etiological explanation, the identified functional SNPs could also be integrated as predictive biomarkers into the current polygenic risk model for NPC risk prediction. However, our study had some limitations. First, the power of the eQTL analysis in NPC tissues, and of testing the SNP effects on NPC risk in non-endemic areas, could be improved by increasing the sample size. Second, additional chromatin state and promoter contact profiling in NPC tumor tissues could expand the current catalog of functional SNPs associated with NPC susceptibility. Third, how EBV infection alters the histone modification profile and interacts with host susceptibility genes in NPC development should be investigated. Lastly, further investigation is warranted to elucidate the molecular mechanisms by which the identified susceptibility genes PHF2 and CDKN2B-AS1, as well as the target genes at the other two susceptibility loci, 5p15.33 and 17q12, contribute to NPC development.
In summary, by integrating epidemiological studies with high-throughput functional profiling and functional experiments, we discovered and verified functional SNPs at two NPC susceptibility loci and identified two NPC susceptibility genes, PHF2 and CDKN2B-AS1. Our findings provide functional evidence that helps explain the population-level associations between these SNPs and NPC susceptibility.
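The suggestion that functional SNPs could be added to a polygenic risk model can be made concrete with the standard additive score, PRS = Σ dosage_i × ln(OR_i). The sketch below is a generic illustration, not the published NPC model; it uses the two odds ratios reported in the Summary (rs1867277, OR = 0.74; rs226241, OR = 1.42) as weights and assumes that each dosage counts copies (0 to 2) of the allele to which that OR refers, which this section does not specify. The individual's genotype is hypothetical.

```python
import math


def polygenic_score(dosages, odds_ratios):
    """Additive polygenic score: sum of allele dosage x log(odds ratio).

    dosages: {snp_id: copies (0 to 2) of the OR-coded allele}
    odds_ratios: {snp_id: per-allele odds ratio}
    """
    missing = set(odds_ratios) - set(dosages)
    if missing:
        raise KeyError(f"no dosage for: {sorted(missing)}")
    for snp in odds_ratios:
        if not 0 <= dosages[snp] <= 2:
            raise ValueError(f"dosage for {snp} must lie between 0 and 2")
    return sum(dosages[snp] * math.log(orr) for snp, orr in odds_ratios.items())


if __name__ == "__main__":
    ors = {"rs1867277": 0.74, "rs226241": 1.42}  # per-allele ORs from the Summary
    person = {"rs1867277": 2, "rs226241": 1}     # hypothetical genotype dosages
    score = polygenic_score(person, ors)
    # Under a multiplicative model, exp(score) is the odds relative to a
    # carrier of zero copies of every OR-coded allele.
    print(f"PRS = {score:.3f}; relative odds = {math.exp(score):.3f}")
```

Working on the log-odds scale makes per-allele effects additive, which is why ORs below 1 lower the score and ORs above 1 raise it.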
Acknowledgments
This study was supported by the National Key Research and Development Program of China (2021YFC2500400); the Basic and Applied Basic Research Foundation of Guangdong Province, China (2021B1515420007); the National Key Research and Development Program of China (2020YFC1316902); the National Natural Science Foundation of China (82003520, 82273705, 81973131 and 81903395); and the National Institutes of Health (R01CA11587301).
Author contributions
W.-H.J.: designed and supervised research; T.-M.W., R.-W.X.: performed research, analyzed data, and wrote the paper; Y.-Q.H., W.-L.Z., H.D., M.T., Z.-M.M.: performed research and analyzed data; W.-Q.X., D.-W.Y., C.-M.D., Y.L., T.Z., D.-H.L., Y.-X.W., X.-Y.C., J.Z., X.-Z.L., P.-F.Z., X.-H.Z., S.-D.Z., Y.-Z.H., Y.C., Y. Zheng, Z.Z., Y. Zhou, G.J., J.B., H.-Q.M., Y.S., J.M., Z.H., J.L., M.L.L., H.-O.A., W.Y., T.-H.L., H.S.: performed research.
Declaration of interests
The authors declare no competing interests.
Published: June 22, 2023
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2023.06.003.
Web resources
bedtools (v.2.29.2), https://bedtools.readthedocs.io/en/latest/
Bowtie 2 (v.2.3.5.1), https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
CHiCAGO (v.1.18.0), https://www.bioconductor.org/packages/release/bioc/html/Chicago.html
deepTools (v.3.3.0), https://github.com/deeptools/deepTools
ENCODE blacklist regions, https://github.com/Boyle-Lab/Blacklist/tree/master
GraphPad Prism, https://www.graphpad.com/
GTEx (v.8), https://gtexportal.org/home/
HiCUP (v.0.8.2), https://www.bioinformatics.babraham.ac.uk/projects/hicup/
MACS2 (v.2.2.7.1), https://hbctraining.github.io/Intro-to-ChIPseq/lessons/05_peak_calling_macs.html
META (v.1.7), https://mathgen.stats.ox.ac.uk/genetics_software/meta/meta.html
Picard (v.2.20), https://github.com/broadinstitute/picard/releases/tag/2.20.4
R (v.3.5.0), https://cran.r-project.org/
Trimmomatic (v.0.39), https://github.com/usadellab/Trimmomatic
WashU epigenome browser, http://epigenomegateway.wustl.edu/
Supplemental information
Data and code availability
The summary statistics of the four published NPC GWAS samples (EPI-NPC-2005 sample, NPCGEE sample, SYSUNPC sample, and Hong Kong sample) have been deposited in the NHGRI-EBI GWAS Catalog dataset (https://www.ebi.ac.uk/gwas/) with accession number GCST90093313. The raw data of the baseline sample information, genetic information, and the functional experiments of this work have been deposited in the Research Data Deposit public platform (www.researchdata.org.cn, accession number RDDB2023245550). All analyses were performed using standard publicly available software. This study did not generate custom code.
References
- 1. Chen Y.-P., Chan A.T.C., Le Q.-T., Blanchard P., Sun Y., Ma J. Nasopharyngeal carcinoma. Lancet. 2019;394:64–80. doi: 10.1016/S0140-6736(19)30956-0.
- 2. Sung H., Ferlay J., Siegel R.L., Laversanne M., Soerjomataram I., Jemal A., Bray F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA. Cancer J. Clin. 2021;71:209–249. doi: 10.3322/caac.21660.
- 3. Ferlay J., Ervik M., Lam F., Colombet M., Mery L., Piñeros M., Znaor A., Soerjomataram I., Bray F. Global Cancer Observatory: Cancer Today. Lyon, France: International Agency for Research on Cancer; 2020.
- 4. Tang L.L., Chen W.Q., Xue W.Q., He Y.Q., Zheng R.S., Zeng Y.X., Jia W.H. Global trends in incidence and mortality of nasopharyngeal carcinoma. Cancer Lett. 2016;374:22–30. doi: 10.1016/j.canlet.2016.01.040.
- 5. Jia W.-H., Feng B.-J., Xu Z.-L., Zhang X.-S., Huang P., Huang L.-X., Yu X.-J., Feng Q.-S., Yao M.-H., Shugart Y.Y., Zeng Y.X. Familial risk and clustering of nasopharyngeal carcinoma in Guangdong, China. Cancer. 2004;101:363–369. doi: 10.1002/cncr.20372.
- 6. Liu Z., Chang E.T., Liu Q., Cai Y., Zhang Z., Chen G., Huang Q.H., Xie S.H., Cao S.M., Shao J.Y., et al. Quantification of familial risk of nasopharyngeal carcinoma in a high-incidence area. Cancer. 2017;123:2716–2725. doi: 10.1002/cncr.30643.
- 7. Huang S.-F., Hsiao J.-H., Young C.-K., Chien H.-T., Kuo C.-F., See L.-C., Luo S.-F., Huang L.-H., Liao C.-T., Chang T.-C.J. Familial aggregation of nasopharyngeal carcinoma in Taiwan. Oral Oncol. 2017;73:10–15. doi: 10.1016/j.oraloncology.2017.07.020.
- 8. Tse K.P., Su W.H., Chang K.P., Tsang N.M., Yu C.J., Tang P., See L.C., Hsueh C., Yang M.L., Hao S.P., et al. Genome-wide association study reveals multiple nasopharyngeal carcinoma-associated loci within the HLA region at chromosome 6p21.3. Am. J. Hum. Genet. 2009;85:194–203. doi: 10.1016/j.ajhg.2009.07.007.
- 9. Ng C.C., Yew P.Y., Puah S.M., Krishnan G., Yap L.F., Teo S.H., Lim P.V.H., Govindaraju S., Ratnavelu K., Sam C.K., et al. A genome-wide association study identifies ITGA9 conferring risk of nasopharyngeal carcinoma. J. Hum. Genet. 2009;54:392–397. doi: 10.1038/jhg.2009.49.
- 10. Bei J.X., Li Y., Jia W.H., Feng B.J., Zhou G., Chen L.Z., Feng Q.S., Low H.Q., Zhang H., He F., et al. A genome-wide association study of nasopharyngeal carcinoma identifies three new susceptibility loci. Nat. Genet. 2010;42:599–603. doi: 10.1038/ng.601.
- 11. Tang M., Lautenberger J.A., Gao X., Sezgin E., Hendrickson S.L., Troyer J.L., David V.A., Guan L., McIntosh C.E., Guo X., et al. The principal genetic determinants for nasopharyngeal carcinoma in China involve the HLA class I antigen recognition groove. PLoS Genet. 2012;8:e1003103. doi: 10.1371/journal.pgen.1003103.
- 12. Chin Y.M., Mushiroda T., Takahashi A., Kubo M., Krishnan G., Yap L.F., Teo S.H., Lim P.V.H., Yap Y.Y., Pua K.C., et al. HLA-A SNPs and amino acid variants are associated with nasopharyngeal carcinoma in Malaysian Chinese. Int. J. Cancer. 2015;136:678–687. doi: 10.1002/ijc.29035.
- 13. Bei J.X., Su W.H., Ng C.C., Yu K., Chin Y.M., Lou P.J., Hsu W.L., McKay J.D., Chen C.J., Chang Y.S., et al. A GWAS Meta-analysis and Replication Study Identifies a Novel Locus within CLPTM1L/TERT Associated with Nasopharyngeal Carcinoma in Individuals of Chinese Ancestry. Cancer Epidemiol. Biomarkers Prev. 2016;25:188–192. doi: 10.1158/1055-9965.EPI-15-0144.
- 14. Cui Q., Feng Q.-S., Mo H.-Y., Sun J., Xia Y.-F., Zhang H., Foo J.N., Guo Y.-M., Chen L.-Z., Li M., et al. An extended genome-wide association study identifies novel susceptibility loci for nasopharyngeal carcinoma. Hum. Mol. Genet. 2016;25:3626–3634. doi: 10.1093/hmg/ddw200.
- 15. He Y.Q., Wang T.M., Ji M., Mai Z.M., Tang M., Wang R., Zhou Y., Zheng Y., Xiao R., Yang D., et al. A polygenic risk score for nasopharyngeal carcinoma shows potential for risk stratification and personalized screening. Nat. Commun. 2022;13:1966. doi: 10.1038/s41467-022-29570-4.
- 16. Sasaki M.M., Skol A.D., Bao R., Rhodes L.V., Chambers R., Vokes E.E., Cohen E.E.W., Onel K. Integrated genomic analysis suggests MLL3 is a novel candidate susceptibility gene for familial nasopharyngeal carcinoma. Cancer Epidemiol. Biomarkers Prev. 2015;24:1222–1228. doi: 10.1158/1055-9965.EPI-15-0275.
- 17. Dai W., Zheng H., Cheung A.K.L., Tang C.S.-m., Ko J.M.Y., Wong B.W.Y., Leong M.M.L., Sham P.C., Cheung F., Kwong D.L.-W., et al. Whole-exome sequencing identifies MST1R as a genetic susceptibility gene in nasopharyngeal carcinoma. Proc. Natl. Acad. Sci. USA. 2016;113:3317–3322. doi: 10.1073/pnas.1523436113.
- 18. Yu G., Hsu W.L., Coghill A.E., Yu K.J., Wang C.P., Lou P.J., Liu Z., Jones K., Vogt A., Wang M., et al. Whole-Exome Sequencing of Nasopharyngeal Carcinoma Families Reveals Novel Variants Potentially Involved in Nasopharyngeal Carcinoma. Sci. Rep. 2019;9:9916. doi: 10.1038/s41598-019-46137-4.
- 19. Wang T.M., He Y.Q., Xue W.Q., Zhang J.B., Xia Y.F., Deng C.M., Zhang W.L., Xiao R.W., Liao Y., Yang D.W., et al. Whole-exome sequencing study of familial nasopharyngeal carcinoma and its implication for identifying high-risk individuals. J. Natl. Cancer Inst. 2022;114:1689–1697. doi: 10.1093/jnci/djac177.
- 20. Deng C., Lin Y.-X., Qi X.-K., He G.-P., Zhang Y., Zhang H.-J., Xu M., Feng Q.-S., Bei J.-X., Zeng Y.-X., Feng L. TNFRSF19 Inhibits TGFβ Signaling through Interaction with TGFβ Receptor Type I to Promote Tumorigenesis. Cancer Res. 2018;78:3469–3483. doi: 10.1158/0008-5472.CAN-17-3205.
- 21. Xu F.-H., Xiong D., Xu Y.-F., Cao S.-M., Xue W.-Q., Qin H.-D., Liu W.-S., Cao J.-Y., Zhang Y., Feng Q.-S., et al. An epidemiological and molecular study of the relationship between smoking, risk of nasopharyngeal carcinoma, and Epstein-Barr virus activation. J. Natl. Cancer Inst. 2012;104:1396–1410. doi: 10.1093/jnci/djs320.
- 22. Jia W.-H., Luo X.-Y., Feng B.-J., Ruan H.-L., Bei J.-X., Liu W.-S., Qin H.-D., Feng Q.-S., Chen L.-Z., Yao S.Y., Zeng Y.X. Traditional Cantonese diet and nasopharyngeal carcinoma risk: a large-scale case-control study in Guangdong, China. BMC Cancer. 2010;10:446. doi: 10.1186/1471-2407-10-446.
- 23. Ye W., Chang E.T., Liu Z., Liu Q., Cai Y., Zhang Z., Chen G., Huang Q.-H., Xie S.-H., Cao S.-M., et al. Development of a population-based cancer case-control study in southern china. Oncotarget. 2017;8:87073–87085. doi: 10.18632/oncotarget.19692.
- 24. Lv J.-W., Chen Y.-P., Huang X.-D., Zhou G.-Q., Chen L., Li W.-F., Tang L.-L., Mao Y.-P., Guo Y., Xu R.-H., et al. Hepatitis B virus screening and reactivation and management of patients with nasopharyngeal carcinoma: A large-scale, big-data intelligence platform-based analysis from an endemic area. Cancer. 2017;123:3540–3549. doi: 10.1002/cncr.30775.
- 25. Dai J., Lv J., Zhu M., Wang Y., Qin N., Ma H., He Y.-Q., Zhang R., Tan W., Fan J., et al. Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations. Lancet Respir. Med. 2019;7:881–891. doi: 10.1016/S2213-2600(19)30144-4.
- 26. Mai Z.-M., Lin J.-H., Chiang S.-C., Ngan R.K.-C., Kwong D.L.-W., Ng W.-T., Ng A.W.-Y., Yuen K.-T., Ip K.-M., Chan Y.-H., et al. Test-retest reliability of a computer-assisted self-administered questionnaire on early life exposure in a nasopharyngeal carcinoma case-control study. Sci. Rep. 2018;8:7052. doi: 10.1038/s41598-018-25046-y.
- 27. Mai Z.-M., Lin J.-H., Ngan R.K.-C., Kwong D.L.-W., Ng W.-T., Ng A.W.-Y., Yuen K.-T., Ip D.K.M., Chan Y.-H., Lee A.W.-M., et al. Milk Consumption Across Life Periods in Relation to Lower Risk of Nasopharyngeal Carcinoma: A Multicentre Case-Control Study. Front. Oncol. 2019;9:253. doi: 10.3389/fonc.2019.00253.
- 28. Stelow E.B., Wenig B.M. Update From The 4th Edition of the World Health Organization Classification of Head and Neck Tumours: Nasopharynx. Head Neck Pathol. 2017;11:16–22. doi: 10.1007/s12105-017-0787-0.
- 29. Corces M.R., Trevino A.E., Hamilton E.G., Greenside P.G., Sinnott-Armstrong N.A., Vesuna S., Satpathy A.T., Rubin A.J., Montine K.S., Wu B., et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods. 2017;14:959–962. doi: 10.1038/nmeth.4396.
- 30. Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 31. Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 32. Amemiya H.M., Kundaje A., Boyle A.P. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci. Rep. 2019;9:9354. doi: 10.1038/s41598-019-45839-z.
- 33. Ramírez F., Ryan D.P., Grüning B., Bhardwaj V., Kilpert F., Richter A.S., Heyne S., Dündar F., Manke T. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–W165. doi: 10.1093/nar/gkw257.
- 34. Zhang Y., Liu T., Meyer C.A., Eeckhoute J., Johnson D.S., Bernstein B.E., Nusbaum C., Myers R.M., Brown M., Li W., Liu X.S. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
- 35. Li D., Purushotham D., Harrison J.K., Hsu S., Zhuo X., Fan C., Liu S., Xu V., Chen S., Xu J., et al. WashU Epigenome Browser update 2022. Nucleic Acids Res. 2022;50:W774–W781. doi: 10.1093/nar/gkac238.
- 36. Arrigoni L., Richter A.S., Betancourt E., Bruder K., Diehl S., Manke T., Bönisch U. Standardizing chromatin research: a simple and universal method for ChIP-seq. Nucleic Acids Res. 2016;44:e67. doi: 10.1093/nar/gkv1495.
- 37. Orlando G., Kinnersley B., Houlston R.S. Capture Hi-C Library Generation and Analysis to Detect Chromatin Interactions. Curr. Protoc. Hum. Genet. 2018;98:e63. doi: 10.1002/cphg.63.
- 38. Wingett S., Ewels P., Furlan-Magaril M., Nagano T., Schoenfelder S., Fraser P., Andrews S. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 2015;4:1310. doi: 10.12688/f1000research.7334.1.
- 39. Cairns J., Freire-Pritchett P., Wingett S.W., Várnai C., Dimond A., Plagnol V., Zerbino D., Schoenfelder S., Javierre B.-M., Osborne C., et al. CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol. 2016;17:127. doi: 10.1186/s13059-016-0992-2.
- 40. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 41. Aguet F., Anand S., Ardlie K.G., Gabriel S., Getz G.A., Graubert A., Hadley K., Handsaker R.E., Huang K.H., Kashin S., et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–1330. doi: 10.1126/science.aaz1776.
- 42. Hasenpusch-Theil K., Chadwick B.P., Theil T., Heath S.K., Wilkinson D.G., Frischauf A.M. PHF2, a novel PHD finger gene located on human chromosome 9q22. Mamm. Genome. 1999;10:294–298. doi: 10.1007/s003359900989.
- 43. Lee K.H., Park J.W., Sung H.S., Choi Y.J., Kim W.H., Lee H.S., Chung H.J., Shin H.W., Cho C.H., Kim T.Y., et al. PHF2 histone demethylase acts as a tumor suppressor in association with p53 in cancer. Oncogene. 2015;34:2897–2909. doi: 10.1038/onc.2014.219.
- 44. Ghosh A., Ghosh S., Maiti G.P., Mukherjee S., Mukherjee N., Chakraborty J., Roy A., Roychoudhury S., Panda C.K. Association of FANCC and PTCH1 with the development of early dysplastic lesions of the head and neck. Ann. Surg. Oncol. 2012;19(Suppl 3):S528–S538. doi: 10.1245/s10434-011-1991-x.
- 45. Ge Z., Gu Y., Han Q., Sloane J., Ge Q., Gao G., Ma J., Song H., Hu J., Chen B., et al. Plant homeodomain finger protein 2 as a novel IKAROS target in acute lymphoblastic leukemia. Epigenomics. 2018;10:59–69. doi: 10.2217/epi-2017-0092.
- 46. Alonso-de Vega I., Paz-Cabrera M.C., Rother M.B., Wiegant W.W., Checa-Rodríguez C., Hernández-Fernaud J.R., Huertas P., Freire R., van Attikum H., Smits V.A.J. PHF2 regulates homology-directed DNA repair by controlling the resection of DNA double strand breaks. Nucleic Acids Res. 2020;48:4915–4927. doi: 10.1093/nar/gkaa196.
- 47. Pappa S., Padilla N., Iacobucci S., Vicioso M., Álvarez de la Campa E., Navarro C., Marcos E., de la Cruz X., Martínez-Balbás M.A. PHF2 histone demethylase prevents DNA damage and genome instability by controlling cell cycle progression of neural progenitors. Proc. Natl. Acad. Sci. USA. 2019;116:19464–19473. doi: 10.1073/pnas.1903188116.
- 48. Qin H.D., Shugart Y.Y., Bei J.X., Pan Q.H., Chen L., Feng Q.S., Chen L.Z., Huang W., Liu J.J., Jorgensen T.J., et al. Comprehensive pathway-based association study of DNA repair gene variants and the risk of nasopharyngeal carcinoma. Cancer Res. 2011;71:3000–3008. doi: 10.1158/0008-5472.CAN-10-0469.
- 49. Lung M.L., Cheung A.K.L., Ko J.M.Y., Lung H.L., Cheng Y., Dai W. The interplay of host genetic factors and Epstein-Barr virus in the development of nasopharyngeal carcinoma. Chin. J. Cancer. 2014;33:556–568. doi: 10.5732/cjc.014.10170.
- 50. Li W.-Q., Pfeiffer R.M., Hyland P.L., Shi J., Gu F., Wang Z., Bhattacharjee S., Luo J., Xiong X., Yeager M., et al. Genetic polymorphisms in the 9p21 region associated with risk of multiple cancers. Carcinogenesis. 2014;35:2698–2705. doi: 10.1093/carcin/bgu203.
- 51. Fehringer G., Kraft P., Pharoah P.D., Eeles R.A., Chatterjee N., Schumacher F.R., Schildkraut J.M., Lindström S., Brennan P., Bickeböller H., et al. Cross-Cancer Genome-Wide Analysis of Lung, Ovary, Breast, Prostate, and Colorectal Cancer Reveals Novel Pleiotropic Associations. Cancer Res. 2016;76:5103–5114. doi: 10.1158/0008-5472.CAN-15-2980.
- 52. Gu F., Pfeiffer R.M., Bhattacharjee S., Han S.S., Taylor P.R., Berndt S., Yang H., Sigurdson A.J., Toro J., Mirabello L., et al. Common genetic variants in the 9p21 region and their associations with multiple tumours. Br. J. Cancer. 2013;108:1378–1386. doi: 10.1038/bjc.2013.7.
- 53. Li Z., Cai X., Zou W., Zhang J. CDKN2B-AS1 promotes the proliferation, clone formation, and invasion of nasopharyngeal carcinoma cells by regulating miR-98-5p/E2F2 axis. Am. J. Transl. Res. 2021;13:13406–13422.
- 54. Hu X., Jiang H., Jiang X. Downregulation of lncRNA ANRIL inhibits proliferation, induces apoptosis, and enhances radiosensitivity in nasopharyngeal carcinoma cells through regulating miR-125a. Cancer Biol. Ther. 2017;18:331–338. doi: 10.1080/15384047.2017.1310348.
- 55. Yu W., Gius D., Onyango P., Muldoon-Jacobs K., Karp J., Feinberg A.P., Cui H. Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature. 2008;451:202–206. doi: 10.1038/nature06468.
- 56. Ma W., Qiao J., Zhou J., Gu L., Deng D. Characterization of novel LncRNA P14AS as a protector of ANRIL through AUF1 binding in human cells. Mol. Cancer. 2020;19:42. doi: 10.1186/s12943-020-01150-4.
- 57. Zhou X., Han X., Wittfeldt A., Sun J., Liu C., Wang X., Gan L.-M., Cao H., Liang Z. Long non-coding RNA ANRIL regulates inflammatory responses as a novel component of NF-κB pathway. RNA Biol. 2016;13:98–108. doi: 10.1080/15476286.2015.1122164.
- 58. Bruce J.P., To K.-F., Lui V.W.Y., Chung G.T.Y., Chan Y.-Y., Tsang C.M., Yip K.Y., Ma B.B.Y., Woo J.K.S., Hui E.P., et al. Whole-genome profiling of nasopharyngeal carcinoma reveals viral-host co-operation in inflammatory NF-κB activation and immune escape. Nat. Commun. 2021;12:4193. doi: 10.1038/s41467-021-24348-6.