Skip to main content
Medicine logoLink to Medicine
. 2016 Dec 30;95(52):e5793. doi: 10.1097/MD.0000000000005793

Strong cis-acting expression quantitative trait loci for the genes encoding SNHG5 and PEX6

Jihyeon Lee 1, Jihye Ryu 1, Chaeyoung Lee 1,
Editor: Xiaolin Zhu1
PMCID: PMC5207599  PMID: 28033303

Supplemental Digital Content is available in the text

Keywords: chromosome 6, eQTL, gene expression, PEX6, SNHG5, SNP

Abstract

Expression of quantitative trait loci (eQTLs) for the genes located in human chromosome 6 were examined. Data on RNA expression in lymphoblastoid cells of 373 unrelated Europeans were used to identify eQTLs.

Genome-wide analysis resulted in 24,447 nucleotide variants associated with gene expression (P < 2.16 × 10−10). We found 36variants with P < 10−100, which were all associated with expression levels of the genes encoding small nucleolar RNA host gene 5 (SNHG5) and peroxisomal biogenesis factor 6 (PEX6). Enhancer eQTLs downstream of theSNHG5 gene might be candidate genetic factors for susceptibility to cancer. This is because nucleotide substitutions (eg, G→T at rs6922) of the enhancer eQTLs may cause low expression of SNHG5 gene, and low expression of snoRNA U50, a product generated from introns of the SNHG5gene, can induce cancer. One presently identified eQTL for the PEX6 gene was rs10948059, which had been associated with prostate cancer from previous association studies. The results imply that variants associated with prostate cancer can be identified through expressional change in the PEX6 gene, but not in the overlapped glycine N-methyltransferase gene which had been considered as a candidate gene.

Further studies are required to understand their underlying mechanisms for the strong eQTLs for the SNHG5 and PEX6 genes.

1. Introduction

As of 2003, more than 170 million base pairs had been completely sequenced for human chromosome 6.[1] Genome-wide association studies (GWAS) have unraveled a large number of single-nucleotide variants associated with human traits. According to the GWAS Catalog, 32,768 genetic associations were reported across hundreds of diseases and quantitative traits as of October 2016 (P < 10−5; https://www.ebi.ac.uk/gwas). The GWAS signals were more frequently found in chromosome 6 than in any other chromosomes, even in chromosome 1, which is 1.5 times longer than chromosome 6. The chromosome 6 has medical significance, with about 120 causal genes reported for complex diseases, such as cancers, heart disease, diabetes, Alzheimer disease, rheumatoid arthritis, and multiple sclerosis.[24] In particular, major histocompatibility complex and related regions including the human leukocyte antigen (HLA) genes in chromosome 6 are critically related to infection, immunity, and inflammation.[2] Many nucleotide variants turned out to be important for susceptibility to complex diseases from GWAS. One example is rs9378815, an intergenic single-nucleotide polymorphism (SNP) on chromosome 6, which has been identified as a rheumatoid arthritis risk locus from a trans-ethnic GWAS meta-analysis using >100,000 subjects (P = 1.7 × 10−10).[5] Another example is rs2046210, an SNP upstream of the gene encoding estrogen receptor alpha on chromosome 6, which was associated with susceptibility to breast cancer in a study involving over 31,000 subjects (eg, P = 1.5 × 10−30 in Chinese women).[6] Nevertheless, their underlying mechanisms are unknown. Identifying nucleotide variants that influence RNA transcript expression levels is critical to understand underlying mechanisms of GWAS signals. The objective of the current study was to identify expression quantitative trait loci (eQTLs) for genes located in chromosome 6, which might have abundant potential regulatory sequences.

2. Methods

The study used RNA expression data for lymphoblastoid cells of 376 unrelated individuals from European populations of CEPH (CEU), Finns (FIN), British (GBR), and Toscani (TSI) to identify regulatory variants for expression of the 573 genes in chromosome 6, which were obtained from the study by Lappalainen et al.[7] The genes included protein-coding genes and long intergenic noncoding RNA genes. The expression level was calculated as the sum of reads per kilobase of transcript per million mapped reads values of all the transcripts of each gene for each individual. A read was counted in an exon if either start or end coordinate of the read was in the exon. To avoid overestimating gene expressions, the count was divided by the number of overlapping exons among split reads. Their corresponding genotypes were obtained from the 1000 Genomes project (phase I; http://phase1browser.1000genomes.org/) that originally conducted whole genome sequencing with an average depth of 5×, and whole exome sequencing with an average depth of 80× using 1092 individuals from 14 populations of Europeans, East Asians, Saharan Africans, and Americans. A total of 404,240 SNPs on chromosome 6 among 5,941,815 genome-wide SNPs were included in the current analysis after a filtration process (excluding SNPs with minor allele frequency [MAF] <0.05). Three individuals with missing rate >0.05 of the genotypes were excluded from the analysis. Ethical approval was not necessary because we dealt with publically available data.

Genetic associations of SNPs with RNA expression were conducted to determine the significance of regression coefficient for minor allele effect. We employed the following mixed model incorporating a genomic relationship matrix (GRM) that reflected polygenic covariance among individuals to avoid population stratification[8]:

2.

where y is the vector of gene expression levels, μ is the overall mean, 1 is the vector of 1 s, β is the fixed effect for minor allele of the SNP to be tested for association, and x is the vector of the numbers of minor alleles of the SNP, for example, 0(0, 0) for homozygote of the major allele, 1 (1, 0) for heterozygote, and 2 (1, 1)for homozygote of the minor allele in additive (dominant, recessive) model. Vector g is the vector of random polygenic effects with 

2.

, where A is GRM estimated using SNPs, and 

2.

is the polygenic variance component. Elements of A consist of the following pair-wise genomic relationship coefficients estimated using genotypes of SNPs in linkage equilibrium (r2 > 0.8): 

2.

where ajk is the genomic relationship coefficient between jth and kth individuals, NS is the number of SNPs, xij (xik) is the number (0, 1, or 2) of the minor allele at the ith SNP for the jth (kth) individual, pi is the frequency of the minor allele at the ith SNP. ε is the vector of random residuals with 

2.

, where 

2.

is the residual variance component and I is the identity matrix. All the association analyses were conducted using Genome-wide Complex Trait Analysis v1.24 freeware.[9]

Multiple testing was applied employing Bonferroni correction. A total of 231,629,520 association tests for 573 genes and 404,240 SNP markers resulted in the significance threshold value of 2.16 × 10−10. The association tests were analyzed using PLINK ver.1.04 (Free Software Foundation, Inc. Boston, MA).

Linkage disequilibrium (LD) blocks were constructed based on the algorithm of Gabriel et al[10] (Haploview version 4.2). The algorithm used a 95% confidence interval (CI) of pair-wise D’ estimate between SNPs with MAF >0.05.

Promoters/enhancers including the significant SNPs were examined using genome-wide chromatin interaction analysis with paired-end-tag sequencing (ChIA-PET) data, which resulted from the study by Li et al.[11] The paired-end-tag (PET) sequences were mapped with the human reference genome of hg19. High-confidence intrachromosomal PET clusters with PFDR <0.05were included with genomic span of 8 Kb to 1 Mb and PET count >3 for each PET cluster from the saturated libraries. Interaction anchors flanking transcription start site ±5 Kb were defined as promoters.

Enrichment analysis was conducted to analyze associations of significant SNPs with diseases and biological processes using genomic regions enrichment of annotations tool.[12] Diseases and biological processes based on the Gene Ontology (http://www.geneontology.ogr/) were used as input annotation terms. Each gene involved in input annotation terms was assigned to a regulatory domain that extends up to 1 Mb in both directions. Significance of enrichment was determined by a hypergeometric test over genes. The significance threshold was adjusted by false discovery rate (FDR) to control false-positives.

Other enrichment analyses were also performed to determine enrichment of cis versus trans regulation, promoter/enhancer versus nonpromoter/nonenhancer regions, and downstream versus upstream regions. Their significances were all determined with P < 0.01 by the Fisher exact test.

3. Results

The eQTL analysis revealed that 24,447 SNPs were associated with expression of the 88 genes located on chromosome 6 (P < 2.16 × 10−10). Most were identified using the additive model (24,213 SNPs). They were enriched (PFDR < 0.05, where PFDR is P value adjusted by FDR) with genes involved in a variety of diseases (Supplementary Table 1) and biological processes (Supplementary Table 2).The SNPs were enriched (P < 0.01) in promoter and enhancer regions resulting from a ChIA-PET study.[11] There were 18,872 cis-eQTLs within 1 Mb from target genes and 5575trans-eQTLs >1 Mb away from target genes, and they were largely cis-regulatory eQTLs (77% = 18,872/24,447) by the enrichment analysis (P < 0.01). The cis-eQTLs appeared considerably clustered with 68% (12,886/18,872) located within 100 Kb. Also, 4960 SNPs were associated with expression of multiple genes, which resulted in 18,378 unique SNPs. In particular, 36 SNPs were observed with extremely low P value (P < 10−100; Table 1). All 36 SNPs influenced RNA expression of the genes encoding small nucleolar RNA host gene 5 (SNHG5) and peroxisomal biogenesis factor 6 (PEX6). The SNPs were all located in or near the corresponding gene. On chromosome 6, the SNHG5 gene is located from 85,676,990 to 85,678,736 bp (GRCh38/hg38), and the PEX6 gene is located from 42,964,335 to 42,979,150 bp (GRCh38/hg38). Considerable numbers of SNPs flanking the genes were associated with their expression (P < 2.16 × 10−10; Fig. 1). Most of the strongly significant SNPs with P < 10−100 were found downstream of the SNHG5 gene and within the PEX6 gene. The SNPs were observed in strong linkage (r2 > 0.8), forming LD blocks (Fig. 2).

Table 1.

Genetic associations of nucleotide variants with mRNA expression of genes located in chromosome 6.

3.

Figure 1.

Figure 1

Genetic associations of nucleotide variants with mRNA expression of SNHG5 (A) and PEX6 (B).The blue line indicates the significance threshold with P < 2.16 × 10−10, and the red line indicates the significance threshold with P < 10−100. The green bar and the black arrow indicate region and direction of each gene, respectively.

Figure 2.

Figure 2

Linkage disequilibrium blocks for strongly significant SNPs (P < 10−100) in and around the SNHG5 (A) and PEX6 (B) genes.

4. Discussion

The current study identified 24,447 SNPs associated with RNA expression of genes located in chromosome 6, including cis-eQTLs (77%) that may affect transcription initiation complex on the same physical chromosome. The cis-eQTLs tended to be considerably clustered with 68% located within 100 Kb from their corresponding target gene, which concurred with the previous study where 80% of cis-eQTLs was identified within 100 Kb using 9 types of tissues.[13]

The current study showed the pleiotropy that 4960 SNPs were associated with expression of multiple genes. This implied direct or indirect functional relationships among genes simultaneously regulated by eQTLs. For example, rs2020202 was associated with expression of 9 histone genes (HIST1B, HIST1E, HIST2AH, HIST2AJ, HIST2BM, HIST3B, HIST3I, HIST3J, and HIST4B), 26 SNPs were associated with expression of 3 HLA genes (HLA-DQB2, HLA-DRB6, and HLA-DRB1), and 9 SNPs were associated with expression of 3 serpin genes (SERPINB1, SERPINB6, and SERPINB9).

In particular, 36 SNPs showed strong associations with P < 10−100, and they were all associated with expression of the gene encoding SNHG5 or PEX6. All the SNPs were located flanking the corresponding gene within 200 Kb. An enrichment analysis showed that the SNPs strongly associated with expression of the SNHG5 were enriched downstream of the gene (P < 0.01). However, SNPs with P < 2.16 × 10−10 were enriched upstream of the gene (P < 0.01).The difference in the degree of significance implied that downstream and upstream signals may act as regulatory sequences with different importance. The association signal of rs6922 (P = 1.26 × 10−54) was located far from the target gene (SNHG5; 181,402 bp apart), and thus it was suspected as an enhancer variant. This speculation led us to compare it with the sequences resulting from a ChIA-PET study.[11] As a result, we found that 1 anchor of an interacting PET cluster included rs6922,and the other anchor was the promoter of the SNHG5gene. The current studyalso showed that nucleotide substitution from G to T at rs6922 decreased the expression of SNHG5 (beta = −53.7). The SNPs with P < 2.16 × 10−10 were enriched in the promoter and enhancer regions of the SNHG5 gene (P < 0.01; 7 SNPs in the promoter and 23 SNPs in the enhancer regions).

Nucleotide substitution of such variants might influence the expression of SNHG5 gene and is candidate genetic factor for susceptibility to cancer. This is supported by previous studies in which a variety of cancers were induced by low expression of snoRNAU50, a product generated from introns 4 and 5 of the SNHG5 gene. Expression of U50 is down-regulated in prostate and breast cancer samples.[14,15] These studies also showed that the expression of U50 can inhibit colony formation of prostate and breast cancer cells. SNHG5 gene is located at the chromosomal translocation breakpoint involved in B-cell lymphoma (provided by RefSeq, July 2008). It was recently reported that the serum levels of SNHG5 were up-regulated in patients with malignant melanoma, and that SNHG5 may function in melanomagenesis or melanoma metastasis.[16] Further studies are needed to understand the underlying mechanisms of nucleotide substitutions on cancers and their subtypes.

The current study revealed many intragenic eQTLs for the PEX6 gene. The gene encodes a member of the ATPases associated with diverse cellular activities. In particular, the PEX6 plays a key role in biogenesis of peroxisome, which is a small eukaryotic organelle specialized to perform oxidative reactions. Its mutations cause peroxisome biogenesis disorders of complementation groups 4 and 6.[17] This suggests that the eQTLs detected from the current study might influence on susceptibility to the peroxisome biogenesis disorders by abnormal expression of PEX6.

Two of the eQTLs for PEX6 had been identified for association with prostate cancer from 2 previous studies which revealed rs10948059 and rs9462856 linked to it in European populations, respectively.[18,19] These findings were interpreted that the genetic associations with prostate cancer were produced by regulating the expression of the glycine N-methyltransferase (GNMT) gene that overlaps with the PEX6 gene. The authors suspected the GNMT gene because the rs10948059 and rs9462856 were their intronic variants. Since the nucleotide variants were revealed as a strong eQTL for PEX6 gene from the current study (P = 6.98 × 10−108), the variants associated with prostate cancer might be identified through expressional change in PEX6. This was supported by the results of the Human Protein Atlas in which PEX6 is down-regulated in prostate cancer cells (http://www.proteinatlas.org/ENSG00000124587-PEX6/cancer).

We found that the strong eQTLs (P < 10−100) using additive model (“AA” vs “AB” vs “BB,” where A is major allele and B is minor allele[20]) were all significant using dominance (“AA + AB” vs “BB”) and recessive (“AA” vs “AB + BB”) models (P < 2.16 × 10−10). However, they were not as strong as observed using the additive model, and all showed P > 10−100. The best fit into the additive model implied that the expressional regulation of eQTLs was correlated with the number of minor alleles.

The current study also identified eQTLs with 10−100 < P < 2.16 × 10−10. In particular, some eQTLs for PEX6 (rs9471950, rs9471951, rs7754294, rs13203402, rs7740252, rs7758576, and rs7759112; 5.63 × 10−12 < P < 1.91 × 10−10) turned out to be enhancer variants of the serum response factor (SRF) gene, which resulted from chromatin interaction analysis with paired-end-tag using RNA polymerase II.[11] The regulatory variants could be plausible because SRF is a well-known transcript factor for the PEX6 gene.

In this study, we employed a mixed model incorporating pair-wise polygenic relationship among individuals to control population stratification, which may reduce false-positive eQTLs.[8] The analysis revealed strong cis-acting eQTLs for SNHG5 and PEX6, and offered insights about their cellular mechanism of transcriptome variation. However, this study was limited to gene expression in lymphoblastoid cells. Other strong eQTLs would be identified with different cell types. In addition, this study had another limitation in understanding specific functions of eQTLs in gene expression (eg, splicing regulation and transcript stability).

In conclusion, the current study revealed novel eQTLs for SNHG5 and PEX6 genes in chromosome 6.Nucleotide substitutions of the eQTLs might be candidate factors for a variety of cancers by regulating expression of the 2 genes. Further studies are needed to understand their underlying mechanisms.

Supplementary Material

Supplemental Digital Content
medi-95-e5793-s001.docx (21.3KB, docx)

Footnotes

Abbreviations: ChIA-PET = chromatin interaction analysis with paired-end-tag sequencing, eQTL = expression quantitative trait locus, GRM = genomic relationship matrix, GWAS = genome-wide association study, HLA = human leukocyte antigen, LD = linkage disequilibrium, MAF = minor allele frequency, RPKM = reads per kilobase per million mapped reads, SNP = single-nucleotide polymorphism.

Funding: This work was funded by the National Research Foundation of Korea, the Ministry of Education, Science, and Technology (Grant No. NRF-2012M3A9D1054705).

The authors have no funding and conflicts of interest to disclose.

Supplemental Digital Content is available for this article.

References

  • [1].Mungall AJ, Palmer SA, Sims SK, et al. The DNA sequence and analysis of human chromosome 6. Nature 2003;425:805–11. [DOI] [PubMed] [Google Scholar]
  • [2].Borchers CH, Kast J, Foster LJ, et al. The Human Proteome Organization Chromosome 6 Consortium: integrating chromosome-centric and biology/disease driven strategies. J Proteomics 2014;100:60–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Flicek P, Amode MR, Barrell D, et al. Ensembl 2011. Nucl Acids Res 2011;39(Database issue):D800–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [4].Communications LHNCfB: conditions related to genes on chromosome 6. Lister Hill National Center for Biomedical Communications; 2012. [Google Scholar]
  • [5].Okada Y, Wu Di, Trynka G, et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 2014;506:376–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Cai Q, Wen W, Qu S, et al. Replication and functional genomic analyses of the breast cancer susceptibility locus at 6q25.1 generalize its importance in women of Chinese, Japanese, and European ancestry. Cancer Res 2011;71:1344–55. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Lappalainen T, Sammeth M, Friedländer MR, et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 2013;501:506–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Shin J, Lee C. Statistical power for identifying nucleotide markers associated withquantitative traits in genome-wide association analysis using a mixed model. Genomics 2013;105:1–4. [DOI] [PubMed] [Google Scholar]
  • [9].Yang J, Lee SH, Goddard ME, et al. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 2011;88:76–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Gabriel SB, Schaffner SF, Nguyen H, et al. The structure of haplotype blocks in the human genome. Science 2002;296:2225–9. [DOI] [PubMed] [Google Scholar]
  • [11].Li G, Ruan X, Auerbach RK, et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 2012;148:84–98. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].McLean CY, Bristor D, Hiller M, et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 2010;28:495–501. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].The GTEx Consortium The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 2015;348:648–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Dong XY, Rodriguez C, Guo P, et al. SnoRNA U50 is a candidate tumor suppressor gene at 6q14.3 with a mutation associated with clinically significant prostate cancer. Hum Mol Genet 2008;17:1031–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Dong XY, Guo P, Boyd J, et al. Implication of snoRNA U50 in human breast cancer. J Genet Genomics 2009;36:447–54. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [16].Ichigozaki Y, Fukushima S, Jinnin M, et al. Serum long non-coding RNA, snoRNA host gene 5 level as a new tumor marker of malignant melanoma. Exp Dermatol 2016;25:67–9. [DOI] [PubMed] [Google Scholar]
  • [17].Matsumoto N, Tamura S, Moser A, et al. The peroxin Pex6p gene is impaired in peroxisomal biogenesis disorders of complementation group 6. J Hum Genet 2001;46:273–7. [DOI] [PubMed] [Google Scholar]
  • [18].Chen M, Huang YL, Huang YC, et al. Genetic polymorphisms of the glycine-N-methyl transferase and prostate cancer risk in the health professionals follow-up study. PLoS One 2014;9:e94683. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Ianni M, Porcellini E, Carbone I, et al. Genetic factor regulating inflammation and DNA methylation associated with prostate cancer. Prostate Cancer Prostatic Dis 2013;16:56–61. [DOI] [PubMed] [Google Scholar]
  • [20].Zhao F, Song M, Wang Y, et al. Genetic model. J Cell Mol Med 2016;20:765. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental Digital Content
medi-95-e5793-s001.docx (21.3KB, docx)

Articles from Medicine are provided here courtesy of Wolters Kluwer Health

RESOURCES