SNP2APA: a database for evaluating effects of genetic variants on alternative polyadenylation in human cancers

Yanbo Yang; Qiong Zhang; Ya-Ru Miao; Jiajun Yang; Wenqian Yang; Fangda Yu; Dongyang Wang; An-Yuan Guo; Jing Gong

doi:10.1093/nar/gkz793

Nucleic Acids Res. 2019 Sep 12;48(D1):D226–D232. doi: 10.1093/nar/gkz793

PMCID: PMC6943033 PMID: 31511885

Abstract

Alternative polyadenylation (APA) is an important post-transcriptional regulatory mechanism that recognizes different polyadenylation signals (PASs), resulting in transcripts with different 3′ untranslated regions, thereby influencing a series of biological processes and functions. Recent studies have revealed that some single nucleotide polymorphisms (SNPs) could contribute to tumorigenesis and tumor development by dysregulating APA. However, the associations between SNPs and APA in human cancers remain largely unknown. Here, using genotype and APA data of 9082 samples from The Cancer Genome Atlas (TCGA) and The Cancer 3′UTR Atlas (TC3A), we systematically identified SNPs affecting APA events across 32 cancer types and defined them as APA quantitative trait loci (apaQTLs). As a result, a total of 467 942 cis-apaQTLs and 30 721 trans-apaQTLs were identified. By integrating apaQTLs with survival and genome-wide association studies (GWAS) data, we further identified 2154 apaQTLs associated with patient survival time and 151 342 apaQTLs located in GWAS loci. In addition, we designed an online tool to predict the effects of SNPs on PASs by utilizing a PAS motif prediction tool. Finally, we developed SNP2APA, a user-friendly and intuitive database (http://gong_lab.hzau.edu.cn/SNP2APA/) for data browsing, searching, and downloading. SNP2APA will significantly improve our understanding of genetic variants and APA in human cancers.

INTRODUCTION

Alternative polyadenylation (APA) is a widespread phenomenon that generates transcript isoforms with 3′ untranslated regions (3′UTRs) of different lengths by recognizing different polyadenylation signals (PASs) (1). More than 70% of human genes have multiple polyadenylation sites (2). As a common post-transcriptional modification mechanism, APA events may alter important regulatory elements, such as miRNA binding sites and RNA-binding protein sites, thus impacting the stability, localization and translation rate of mRNAs (3). APA modulation has been investigated in cells, tissues and different diseases. Previous studies have shown that APA often functions in a tissue- or cell-specific manner (4,5), and several APA dysregulations have been identified in human diseases (6–9), including cancers (10). A significant global 3′UTR shortening has been found in cancer cell lines and tumor samples, compared with normal samples (11). Another study pointed out that shortening or lengthening of the 3′UTR might lead to a worse prognosis in some cancers. For example, kidney cancer samples with the shorter isoforms of TMCO7 and PLXDC2 were found to have lower survival rates (12). However, research on the role and regulation of APA in cancer is still at an early stage.

As the most common type of genetic variant, single nucleotide polymorphisms (SNPs) are major contributors to the differences in human disease susceptibility (13). Genome-wide association studies (GWAS) have identified thousands of SNPs associated with complex traits and diseases. Currently, most studies of the disease/trait-related SNPs remain at the statistical level, and the biological mechanisms underlying them are still largely unknown (14). Quantitative trait locus (QTL) mapping, such as eQTL and meQTL analysis, is a method used to evaluate the effects of genetic variants on intermediate molecular phenotypes, and has been demonstrated to be a powerful tool to decipher the function of SNPs and prioritize genetic variants within GWAS loci (15–19). Recent studies have confirmed the associations between several APA quantitative trait loci (apaQTLs) and cancer. For example, a SNP in a canonical PAS within TP53 (AATAAA to AATACA) has been found to be highly associated with impaired 3′-end processing of TP53 transcripts and to increase the susceptibility to cancers including cutaneous basal cell carcinoma, prostate cancer, glioma and colorectal adenoma (20). However, large-scale genome-wide analyses of apaQTLs have rarely been reported, and no database for apaQTLs in cancer is available. Recently, Feng et al. used the Percentage of Distal polyA site Usage Index (PDUI) to quantify APA events for 10 537 tumor samples across 32 TCGA cancer types (21). Therefore, it is feasible to add APA as an additional dimension to the existing cancer genomic analysis.

In this study, by using the genotype and PDUI data, we developed a new computational pipeline to systematically perform apaQTL analyses across 32 cancer types. We further identified apaQTLs associated with patient overall survival time and apaQTLs located in GWAS linkage disequilibrium (LD) regions. The SNP2APA database (http://gong_lab.hzau.edu.cn/SNP2APA/) was constructed for browsing, searching and downloading the apaQTL data.

MATERIALS AND METHODS

Collection and processing of genotype data

We downloaded the genotype data across 32 cancer types from the TCGA data portal (https://portal.gdc.cancer.gov/) (22), which contained 898 620 SNPs called with the Affymetrix SNP 6.0 array. We extracted 9082 samples with both genotype data and APA data available (Figure 1A). To increase the power for apaQTL discovery, IMPUTE2 was used to impute autosomal variants of all samples in each cancer type with haplotypes of 1000 Genomes Phase 3 as the reference panel (23,24). After imputation, SNPs of each cancer type were selected according to the following criteria (25): (i) imputation confidence score, INFO ≥0.4, (ii) minor allele frequency (MAF) ≥5%, (iii) SNP missing rate <5% for best-guessed genotypes at posterior probability ≥0.9 and (iv) Hardy-Weinberg equilibrium P-value > 1 × 10⁻⁶ estimated by the HardyWeinberg R package (26).
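The four post-imputation criteria listed above can be read as a simple per-SNP filter. The following is a minimal sketch of that reading, not the authors' pipeline: the `SnpStats` record, its field names and the example SNP IDs are hypothetical, and the summary statistics are assumed to have been computed upstream (for example by IMPUTE2 and the HardyWeinberg package).

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical record layout)."""
    snp_id: str
    info: float          # imputation confidence score (INFO)
    maf: float           # minor allele frequency, as a fraction
    missing_rate: float  # fraction of samples lacking a best-guess genotype
    hwe_p: float         # Hardy-Weinberg equilibrium P-value


def passes_qc(s: SnpStats) -> bool:
    """Apply criteria (i)-(iv) from the text to one SNP."""
    return (
        s.info >= 0.4
        and s.maf >= 0.05
        and s.missing_rate < 0.05
        and s.hwe_p > 1e-6
    )


# Made-up example values, for illustration only.
snps = [
    SnpStats("rsA", 0.95, 0.21, 0.01, 0.4),
    SnpStats("rsB", 0.35, 0.30, 0.00, 0.8),   # fails INFO
    SnpStats("rsC", 0.90, 0.02, 0.00, 0.5),   # fails MAF
    SnpStats("rsD", 0.80, 0.10, 0.02, 1e-9),  # fails HWE
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```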

Figure 1. Simplified schematic showing the workflow of the SNP2APA database. (A) Collection of genotype and clinical data. (B) Collection of APA data and GWAS data. (C) Database content in SNP2APA. (D) The online PAS predict tool in SNP2APA. (E) Main functions in SNP2APA.

Collection and processing of data for APA events

To quantify dynamic APA events, we used the PDUI value as the indicator and downloaded PDUI values for 32 cancer types from the TC3A Data Portal (http://tc3a.org/) (Figure 1B) (21). The PDUI is an intuitive ratio for quantifying APA events based on RNA-Seq data (12). It is calculated as the number of transcripts using the distal polyA site divided by the total number of transcripts using either the distal or the proximal polyA site. A greater PDUI indicates that more transcripts use the distal polyA site, and vice versa. For example, a value of 1 indicates that all transcripts of the gene use the distal polyA site, while a value of 0 indicates that all transcripts of the gene use the proximal polyA site. For each cancer type, APA events were selected as follows: (i) the missing rate of PDUI data <0.1, (ii) the standard deviation of PDUI > 5%. After filtering, an average of 4143 APA events per cancer type were included for further analyses. To minimize the effects of outliers on the regression scores, the PDUI values of each gene across all samples were transformed into a standard normal distribution based on rank (25).
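As an illustration of the two computations described above, the sketch below evaluates the PDUI ratio from distal and proximal transcript counts and applies a rank-based transformation to a standard normal. The function names are hypothetical, and the exact rank offset is an assumption: the text only states that the transformation is rank based, so the common (rank − 0.5)/n convention is used here.

```python
from statistics import NormalDist


def pdui(distal_count: float, proximal_count: float):
    """PDUI = distal / (distal + proximal); None when nothing is expressed."""
    total = distal_count + proximal_count
    if total == 0:
        return None
    return distal_count / total


def rank_inverse_normal(values):
    """Map values to standard-normal quantiles by rank (ties share the mean rank).

    Assumption: quantile = (rank - 0.5) / n; other offsets are also in use.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


print(pdui(3, 1))  # 3 distal and 1 proximal transcript
z = rank_inverse_normal([0.9, 0.1, 0.5])
print(z)
```

Because only ranks enter the transformation, an extreme PDUI value has no more leverage on the regression than the next most extreme one.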

Obtaining of covariates

To improve the sensitivity in QTL analyses, we collected several known and unknown confounders as covariates for apaQTL analysis (25). We first used the smartpca in the EIGENSTRAT program (27) to perform principal component analysis (PCA) of the genotype data for each cancer type. The top five principal components in genotype data were included as covariates for correcting the ethnicity differences. We additionally used PEER software (28) to analyse the APA data and obtained the first 15 PEER factors as covariates which were used for eliminating the possible batch effects and other confounders. Finally, other common confounders such as gender, age and tumor stage (25,29,30), were also included as covariates for apaQTL analysis.

Identification of cis- and trans-apaQTL using MatrixEQTL

For each cancer type, we evaluated pairwise associations between autosomal SNPs and APA events through linear regression with MatrixEQTL (31), a software package for efficient QTL analysis. The SNP locations (hg19) were downloaded from the dbSNP database (https://www.ncbi.nlm.nih.gov/projects/SNP) and distal PAS locations were extracted from the APA datasets. SNPs with false discovery rates (FDRs) <0.05 calculated by MatrixEQTL and an absolute correlation coefficient (|r|) ≥0.3 were defined as apaQTLs (Figure 1C). Of these, we defined the apaQTLs within 1 Mb of the distal PAS as cis-apaQTLs (25), and the apaQTLs beyond that region or on another chromosome as trans-apaQTLs.
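The definition above amounts to a significance filter followed by a distance rule. A minimal sketch of that rule follows; the function name and argument layout are hypothetical, the coordinates are made up, and treating a distance of exactly 1 Mb as cis is an assumption, since the text does not say whether the boundary is inclusive.

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb around the distal PAS
FDR_MAX = 0.05
ABS_R_MIN = 0.3


def classify_apaqtl(snp_chrom, snp_pos, pas_chrom, pas_pos, fdr, r):
    """Return 'cis', 'trans', or None if the pair is not an apaQTL."""
    if fdr >= FDR_MAX or abs(r) < ABS_R_MIN:
        return None
    same_chrom = snp_chrom == pas_chrom
    if same_chrom and abs(snp_pos - pas_pos) <= CIS_WINDOW_BP:
        return "cis"
    return "trans"


# Made-up coordinates, for illustration only.
print(classify_apaqtl("chr1", 1_200_000, "chr1", 1_500_000, fdr=0.01, r=-0.42))
print(classify_apaqtl("chr1", 1_200_000, "chr7", 1_500_000, fdr=0.01, r=0.35))
print(classify_apaqtl("chr1", 1_200_000, "chr1", 1_500_000, fdr=0.20, r=0.50))
```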

Identification of survival-associated apaQTLs

To prioritize promising apaQTLs, we further examined the association between apaQTLs and patient survival time. The clinical data, including patient survival time, were downloaded from the TCGA data portal. For each apaQTL, the samples were divided into three groups by genotype: homozygous for one allele (AA), heterozygous (Aa), and homozygous for the other allele (aa). Then the log-rank test was performed to examine the differences in survival time, and Kaplan–Meier (KM) curves were plotted for intuitive visualization of the survival time for each group. Finally, apaQTLs with FDR <0.05 were designated as survival-associated apaQTLs.
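To make the three-group comparison concrete, the sketch below implements a standard log-rank test for exactly three genotype groups in plain Python. It is an illustration under stated assumptions rather than the authors' code: the function name and the toy survival data are hypothetical, and the multiple-testing (FDR) step mentioned above is not included. With three groups the statistic has two degrees of freedom, for which the chi-square tail probability is simply exp(−χ²/2).

```python
import math


def logrank_three_groups(times, events, groups):
    """Log-rank test across exactly three groups.

    times:  follow-up time per sample
    events: 1 if death was observed, 0 if censored
    groups: genotype label per sample (three distinct labels)
    Returns (chi-square statistic, P-value with 2 degrees of freedom).
    """
    labels = sorted(set(groups))
    if len(labels) != 3:
        raise ValueError("expected exactly three genotype groups")
    index = {g: k for k, g in enumerate(labels)}
    u = [0.0, 0.0]                # observed minus expected, first two groups
    v = [[0.0, 0.0], [0.0, 0.0]]  # covariance of u
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [0, 0, 0]
        deaths = [0, 0, 0]
        for ti, e, g in zip(times, events, groups):
            if ti >= t:
                at_risk[index[g]] += 1
                if e and ti == t:
                    deaths[index[g]] += 1
        n, d = sum(at_risk), sum(deaths)
        if n < 2:
            continue  # a single subject at risk contributes nothing
        factor = d * (n - d) / (n * n * (n - 1))
        for a in range(2):
            u[a] += deaths[a] - d * at_risk[a] / n
            for b in range(2):
                same = n if a == b else 0
                v[a][b] += factor * at_risk[a] * (same - at_risk[b])
    det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    if det <= 0:
        return 0.0, 1.0
    chi2 = (u[0] ** 2 * v[1][1] - 2 * u[0] * u[1] * v[0][1]
            + u[1] ** 2 * v[0][0]) / det
    return chi2, math.exp(-chi2 / 2)


# Toy data: each genotype group dies in a separate time window.
toy_times = list(range(1, 16))
toy_events = [1] * 15
toy_groups = ["AA"] * 5 + ["Aa"] * 5 + ["aa"] * 5
chi2, p = logrank_three_groups(toy_times, toy_events, toy_groups)
print(chi2, p)
```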

Identification of GWAS-associated apaQTLs

GWAS has been successfully used for identifying thousands of disease susceptibility loci, but it remains a challenge to pinpoint the causal variants and decipher their underlying mechanisms. To facilitate the interpretation of GWAS results, we integrated apaQTLs with the existing GWAS risk loci to explore trait/disease-associated apaQTLs. We downloaded all the risk tag SNPs identified in GWAS studies from GWAS catalog (http://www.ebi.ac.uk/gwas, accessed September 2018) (32). Then the SNPs in linkage disequilibrium (LD) regions with GWAS tag SNPs were extracted from SNAP (https://personal.broadinstitute.org/plin/snap/ldsearch.php) (33). The parameters were set as follows: (i) SNP dataset: 1000 Genomes, (ii) r² (the square of the Pearson correlation coefficient of LD) threshold: 0.5, (iii) population panel: CEU (Utah residents with northern and western European ancestry), (iv) distance limit: 500 kb. Finally, we defined apaQTLs that overlapped with these GWAS tag SNPs and LD SNPs as GWAS-associated apaQTLs.
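The overlap step described above can be sketched as a set intersection between apaQTL SNPs and the union of GWAS tag SNPs with their LD proxies. In this sketch the function name, the tuple layout of the LD records, the placeholder SNP IDs and the distances are hypothetical; only the thresholds (r² of 0.5, 500 kb) and the rs2303282/rs370151 pair with r² = 0.87 come from the text, and treating the thresholds as inclusive is an assumption.

```python
R2_MIN = 0.5
MAX_DISTANCE_BP = 500_000


def gwas_associated_apaqtls(apaqtl_snps, tag_snps, ld_pairs):
    """Return apaQTL SNPs that are GWAS tag SNPs or LD proxies of one.

    ld_pairs: iterable of (tag_snp, proxy_snp, r2, distance_bp) records.
    """
    tags = set(tag_snps)
    linked = set(tags)
    for tag, proxy, r2, distance_bp in ld_pairs:
        if tag in tags and r2 >= R2_MIN and distance_bp <= MAX_DISTANCE_BP:
            linked.add(proxy)
    return sorted(set(apaqtl_snps) & linked)


tag_snps = {"rs2303282"}
ld_pairs = [
    ("rs2303282", "rs370151", 0.87, 12_000),      # distance is made up
    ("rs2303282", "rsPlaceholder1", 0.31, 40_000),  # below the r2 threshold
]
apaqtl_snps = ["rs370151", "rsPlaceholder1", "rsPlaceholder2"]
hits = gwas_associated_apaqtls(apaqtl_snps, tag_snps, ld_pairs)
print(hits)
```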

DATABASE CONSTRUCTION AND CONTENT

All results mentioned above were stored in a MongoDB database (version 3.4.2) in the form of relation tables. A user-friendly web interface, SNP2APA (http://gong_lab.hzau.edu.cn/SNP2APA/), was constructed to support data browsing, searching, downloading and online PAS prediction (Figure 1D and E), based on the Flask (version 1.0.3) framework with AngularJS (version 1.6.1) as the JavaScript library. It runs on an Apache2 web server (version 2.4.18). We have tested SNP2APA on various web browsers, including Chrome (recommended), Firefox, Opera, Internet Explorer, Microsoft Edge and Safari on macOS.

Data summary of SNP2APA

In total, SNP2APA included 9082 tumor samples across 32 cancer types with both genotype data and APA data available for apaQTL analysis. The sample size per cancer type ranged from 36 in cholangiocarcinoma (CHOL) to 1091 in invasive breast carcinoma (BRCA), with a median of 221 (Table 1). After genotype imputation and quality control, 4 390 660 SNPs on average per cancer type were included for further analysis, ranging from 2 746 335 for BRCA to 5 143 663 for acute myeloid leukemia (LAML). After retaining APA events with a PDUI missing rate <0.1 and a PDUI standard deviation >0.05, we obtained an average of 4143 APA events per cancer type, ranging from 519 for thyroid carcinoma (THCA) to 6978 for stomach adenocarcinoma (STAD).

Table 1. Summary of apaQTLs in SNP2APA

				Cis			Trans
Cancer type	No. of samples	No. of genotypes	No. of APA events	Pairs	APA events	apaQTLs	Pairs	APA events	apaQTLs
ACC	77	3 567 954	3114	3026	135	2864	1566	158	1422
BLCA	408	4 190 525	3780	17 072	218	16 472	883	82	819
BRCA	1 091	2 746 335	5379	11 941	212	11 376	501	7	470
CESC	299	4 291 784	3268	14 767	211	14 358	773	114	745
CHOL	36	4 012 152	3564	1 710	54	1610	1980	34	1153
COAD	285	4 499 815	3356	15 797	231	15 264	1341	231	1280
DLBC	48	4 845 461	3658	1630	67	1580	2640	126	2171
ESCA	184	4 457 611	4510	27 484	615	26 009	665	122	644
GBM	150	4 556 998	5353	36 614	801	34 381	575	126	539
HNSC	518	4 254 665	4646	19 960	254	19 162	715	18	655
KICH	66	3 771 774	4477	3047	136	3010	1477	128	1313
KIRC	525	4 577 720	4906	20 978	240	19 596	905	25	867
KIRP	290	4 895 360	4355	19 494	280	18 258	2390	330	2156
LAML	122	5 143 663	3754	7675	159	7588	517	81	501
LGG	515	4 634 138	5251	29 267	330	27 826	1150	41	1008
LIHC	369	4 158 963	3127	10 779	159	10 511	842	131	738
LUAD	511	4 384 429	4471	19 628	241	18 763	1210	23	1160
LUSC	500	3 744 419	5126	21 804	296	20 915	718	14	673
MESO	87	4 784 882	3999	9077	237	8447	1082	120	1019
OV	291	2 963 431	6174	21 159	382	19 702	285	57	285
PAAD	178	4 996 008	4466	20 351	462	19 177	1065	178	951
PCPG	178	4 721 561	3696	25 042	571	23 185	1133	131	1130
PRAD	494	4 828 721	4704	30 998	332	29 312	1842	15	1796
SARC	258	4 088 267	3910	13 158	232	12 582	897	320	536
SKCM	103	4 854 570	4179	12 811	310	11 672	1766	144	1702
STAD	414	4 310 492	6978	23 045	334	21 499	478	97	465
TGCT	150	4 825 013	4616	20 876	487	19 369	1118	204	1068
THCA	503	4 877 853	519	2999	35	2896	10	9	9
THYM	120	4 940 146	3773	12 939	325	12 255	971	117	957
UCEC	176	4 950 486	2588	8903	288	8788	987	212	920
UCS	56	3 888 385	3733	2206	99	1999	1185	143	1112
UVM	80	4 737 552	3149	8021	186	7516	552	66	457

cis- and trans-apaQTLs in SNP2APA

SNP2APA mainly provided four kinds of datasets: cis- and trans-apaQTLs, survival apaQTLs and GWAS-associated apaQTLs (Figure 2A and B). In the cis-apaQTL analysis, a total of 467 942 cis-apaQTLs across 32 cancer types were identified at the level of FDR < 0.05 and |r| ≥ 0.3, with a median of 14 811 apaQTLs per cancer type, minimum of 1580 in lymphoid neoplasm diffuse large B-cell lymphoma (DLBC), and maximum of 34 381 in glioblastoma multiforme (GBM). In the trans-apaQTL analysis, a total of 30 721 trans-apaQTLs across 32 cancer types were identified at P-value < 1 × 10⁻⁸ and |r| ≥ 0.3, with a median of 936 apaQTLs per cancer type, minimum of nine in thyroid carcinoma (THCA), and maximum of 2171 in DLBC.

Survival and GWAS associated apaQTLs

To prioritize promising apaQTLs, we associated apaQTLs with the survival data of patients downloaded from the TCGA portal. A total of 2154 apaQTLs associated with overall survival time across 32 cancer types at FDR < 0.05 were identified and included in SNP2APA. For example, we found that rs10247994 was highly associated with patient overall survival time in kidney renal clear cell carcinoma (KIRC) (Figure 2C). Significant differences in PDUI values among the corresponding genotypes of rs10247994 were observed, indicating that this SNP might play an important role in regulating the APA event of the PUSH gene in KIRC (Figure 2C).

We further mapped apaQTL results to SNPs in GWAS regions and identified a total of 151 342 apaQTLs overlapping with GWAS LD regions associated with one or multiple traits. For example, rs2303282 was reported as a risk SNP associated with BRCA (34). In our study, we found that rs370151 was in LD with rs2303282 (LD r² = 0.87) and was highly associated with an APA event of the AMFR gene. AMFR was reported to encode a tumor motor stimulating protein receptor (35). Thus, it could be inferred that rs370151 might play an important role in breast cancer by affecting APA events (Figure 2D).

THE FUNCTION AND USAGE OF SNP2APA DATABASE

SNP2APA provided a user-friendly web interface (http://gong_lab.hzau.edu.cn/SNP2APA/) that enabled users to browse, search, and download four datasets: cis-apaQTLs, trans-apaQTLs, survival-apaQTLs, and GWAS-apaQTLs. In addition, we designed a ‘Pancan-apaQTL’ page for batch search and visualization. A ‘PAS Predict’ page was constructed for predicting online whether a SNP could destroy or create a PAS.

On the homepage, we provided a quick search option for users. After entering a SNP, gene or APA event of interest, users could obtain the corresponding results presented as four dynamic tables containing the information of cis-apaQTLs, trans-apaQTLs, survival-apaQTLs and GWAS-apaQTLs. Querying the cis/trans-apaQTL page returned a table containing the SNP ID, SNP genomic position, SNP alleles, APA events, gene symbol of APA, APA position, beta value (effect size of SNP on PDUI value), r value and P-value of apaQTL (Figure 2E). For each record, a vector diagram of the boxplot was embedded to display the association between SNP genotypes and PDUI values. Querying the survival-apaQTL page displayed the SNP ID, SNP genomic position, SNP alleles, sample size, log-rank test P-value, and median survival time of different genotypes. For each record, a vector diagram of the KM-plot was provided for visualizing the association between SNP genotypes and overall survival time. On the ‘GWAS-apaQTL’ page, the information of the SNP, related APA event, gene symbol of APA and related traits was available.

On the ‘PanCan-apaQTL’ page, users could submit multiple SNPs or gene symbols of APA events. Then they would obtain two heatmaps displaying the correlation coefficient (r) of cis-apaQTLs and trans-apaQTLs across the cancer types (Figure 2F).

PAS is the most important regulatory element during the regulation of APA events (3). To further explore the impact of SNP on PAS, we developed a web-based tool by utilizing Dragon PolyA Spotter (http://www.cbrc.kaust.edu.sa/dps/Capture.html) (36) and designed the ‘PAS Predict’ page. On this page, users could submit a wild-type sequence and the corresponding mutant sequence to predict the effect of SNP on polyadenylation signals (PAS) so as to determine whether SNP could destroy or create the PAS (Figure 2G).
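The wild-type versus mutant comparison performed on the ‘PAS Predict’ page can be illustrated with a deliberately simplified sketch. Dragon PolyA Spotter is a trained predictor, whereas the code below only scans both sequences for exact matches to two well-known PAS hexamers (AATAAA and ATTAAA); the function names and the flanking bases in the example are hypothetical, and only the AATAAA to AATACA substitution is taken from the TP53 example in the Introduction.

```python
# Assumption: only these two hexamers are treated as PAS motifs in this sketch.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def pas_hits(sequence):
    """Return the set of (position, hexamer) matches in a DNA sequence."""
    seq = sequence.upper()
    return {
        (i, seq[i:i + 6])
        for i in range(len(seq) - 5)
        if seq[i:i + 6] in PAS_HEXAMERS
    }


def pas_effect(wild_type, mutant):
    """Compare motif matches between wild-type and mutant sequences."""
    before, after = pas_hits(wild_type), pas_hits(mutant)
    return {
        "destroyed": sorted(before - after),
        "created": sorted(after - before),
    }


# Flanking bases are made up; the A>C change turns AATAAA into AATACA.
effect = pas_effect("GGCAATAAAGTC", "GGCAATACAGTC")
print(effect)
```

A motif scan of this kind ignores sequence context around the hexamer, which is one reason a dedicated predictor is used on the actual page.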

In SNP2APA, four main datasets for each cancer type are freely available from the ‘Download’ page. The ‘Help’ page provided the basic information on database, pipeline of database construction, result summary, and contact. SNP2APA was open to any feedback with email address provided at the bottom of the ‘Help’ page.

CONCLUSION AND FUTURE DIRECTIONS

We developed SNP2APA as a resource providing comprehensive apaQTLs across 32 cancer types. To the best of our knowledge, this is the first database systematically evaluating the effects of genetic variants on APA, especially in multiple cancer types with a large sample size. In recent years, a growing number of studies have suggested that APA is likely to play important roles in cancer. Therefore, it is important to add APA as an additional dimension to existing cancer genomic analysis. In this version of SNP2APA, by using genotype and APA data of 9082 tumor samples, we provided numerous apaQTLs among multiple cancer types and identified abundant apaQTLs associated with patient survival time or located in known GWAS loci. To explore the impact of SNPs on PAS, we also designed an online tool for users to predict functional apaQTLs. The SNP2APA database will greatly facilitate the interpretation of risk SNPs identified in genetic studies. In the future, with the increasing number of RNA-Seq datasets and genotype data from large consortium projects, we will continue to update the SNP2APA database. We believe that our database will be of particular interest to researchers in the field of genetic variants and APA in cancer.

FUNDING

National Natural Science Foundation of China (NSFC) [31970644 to J.G., 31822030 and 31771458 to A.Y.G.]; Huazhong Agricultural University Scientific & Technological Self-innovation Foundation [11041810351 to J.G.]. Funding for open access charge: Huazhong Agricultural University Scientific & Technological Self-innovation Foundation [11041810351].

Conflict of interest statement. None declared.

REFERENCES

1. Gruber A.J., Schmidt R., Gruber A.R., Martin G., Ghosh S., Belmadani M., Keller W., Zavolan M. A comprehensive analysis of 3′ end sequencing data sets reveals novel polyadenylation signals and the repressive role of heterogeneous ribonucleoprotein C on cleavage and polyadenylation. Genome Res. 2016; 26:1145–1159.
2. Tian B., Manley J.L. Alternative polyadenylation of mRNA precursors. Nat. Rev. Mol. Cell Biol. 2017; 18:18–30.
3. Elkon R., Ugalde A.P., Agami R. Alternative cleavage and polyadenylation: extent, regulation and function. Nat. Rev. Genet. 2013; 14:496–506.
4. MacDonald C.C. Tissue-specific mechanisms of alternative polyadenylation: Testis, brain, and beyond (2018 update). Wires RNA. 2019; 10:e1526.
5. Di Giammartino D.C., Nishida K., Manley J.L. Mechanisms and consequences of alternative polyadenylation. Mol. Cell. 2011; 43:853–866.
6. Chang J.W., Yeh H.S., Yong J. Alternative polyadenylation in human diseases. Endocrinol. Metab. (Seoul). 2017; 32:413–421.
7. Bacchetta R., Barzaghi F., Roncarolo M.G. From IPEX syndrome to FOXP3 mutation: a lesson on immune dysregulation. Ann. N. Y. Acad. Sci. 2018; 1417:5–22.
8. Bennett C.L., Brunkow M.E., Ramsdell F., O’Briant K.C., Zhu Q., Fuleihan R.L., Shigeoka A.O., Ochs H.D., Chance P.F. A rare polyadenylation signal mutation of the FOXP3 gene (AAUAAA → AAUGAA) leads to the IPEX syndrome. Immunogenetics. 2001; 53:435–439.
9. Garin I., Edghill E.L., Akerman I., Rubio-Cabezas O., Rica I., Locke J.M., Maestro M.A., Alshaikh A., Bundak R., del Castillo G. et al. Recessive mutations in the INS gene result in neonatal diabetes through reduced insulin biosynthesis. Proc. Natl. Acad. Sci. U.S.A. 2010; 107:3105–3110.
10. Erson-Bensan A.E., Can T. Alternative polyadenylation: another foe in cancer. Mol. Cancer Res. 2016; 14:507–517.
11. Xiang Y., Ye Y., Lou Y., Yang Y., Cai C., Zhang Z., Mills T., Chen N.Y., Kim Y., Muge Ozguc F. et al. Comprehensive characterization of alternative polyadenylation in human cancer. J. Natl. Cancer Inst. 2018; 110:379–389.
12. Xia Z., Donehower L.A., Cooper T.A., Neilson J.R., Wheeler D.A., Wagner E.J., Li W. Dynamic analyses of alternative polyadenylation from RNA-seq reveal a 3′-UTR landscape across seven tumour types. Nat. Commun. 2014; 5:5274.
13. Shastry B.S. SNP alleles in human disease and evolution. J. Hum. Genet. 2002; 47:561–566.
14. Do C., Shearer A., Suzuki M., Terry M.B., Gelernter J., Greally J.M., Tycko B. Genetic-epigenetic interactions in cis: a major focus in the post-GWAS era. Genome Biol. 2017; 18:120.
15. Westra H.J., Peters M.J., Esko T., Yaghootkar H., Schurmann C., Kettunen J., Christiansen M.W., Fairfax B.P., Schramm K., Powell J.E. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 2013; 45:1238–1243.
16. Xiong H.Y., Alipanahi B., Lee L.J., Bretschneider H., Merico D., Yuen R.K.C., Hua Y.M., Gueroussov S., Najafabadi H.S., Hughes T.R. et al. The human splicing code reveals new insights into the genetic determinants of disease. Science. 2015; 347:1254806.
17. Takata A., Matsumoto N., Kato T. Genome-wide identification of splicing QTLs in the human brain and their enrichment among schizophrenia-associated loci. Nat. Commun. 2017; 8:14519.
18. Gong J., Mei S.F., Liu C.J., Xiang Y., Ye Y.Q., Zhang Z., Feng J., Liu R.Y., Diao L.X., Guo A.Y. et al. PancanQTL: systematic identification of cis-eQTLs and trans-eQTLs in 33 cancer types. Nucleic Acids Res. 2018; 46:D971–D976.
19. Gong J., Wan H., Mei S.F., Ruan H., Zhang Z., Liu C.J., Guo A.Y., Diao L.X., Miao X.P., Han L. Pancan-meQTL: a database to systematically evaluate the effects of genetic variants on methylation in human cancer. Nucleic Acids Res. 2019; 47:D1066–D1072.
20. Stacey S.N., Sulem P., Jonasdottir A., Masson G., Gudmundsson J., Gudbjartsson D.F., Magnusson O.T., Gudjonsson S.A., Sigurgeirsson B., Thorisdottir K. et al. A germline variant in the TP53 polyadenylation signal confers cancer susceptibility. Nat. Genet. 2011; 43:1098–1103.
21. Feng X., Li L., Wagner E.J., Li W. TC3A: the cancer 3′ UTR atlas. Nucleic Acids Res. 2018; 46:D1027–D1030.
22. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012; 490:61–70.
23. Howie B.N., Donnelly P., Marchini J. A flexible and accurate genotype imputation method for the next generation of Genome-Wide association studies. PLoS Genet. 2009; 5:e1000529.
24. Altshuler D.M., Durbin R.M., Abecasis G.R., Bentley D.R., Chakravarti A., Clark A.G., Donnelly P., Eichler E.E., Flicek P., Gabriel S.B. et al. A global reference for human genetic variation. Nature. 2015; 526:68–74.
25. Ardlie K.G., DeLuca D.S., Segre A.V., Sullivan T.J., Young T.R., Gelfand E.T., Trowbridge C.A., Maller J.B., Tukiainen T., Lek M. et al. The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science. 2015; 348:648–660.
26. Graffelman J. Exploring diallelic genetic markers: The HardyWeinberg package. J. Stat. Softw. 2015; 64:1–23.
27. Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006; 38:904–909.
28. Stegle O., Parts L., Piipari M., Winn J., Durbin R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 2012; 7:500–507.
29. Schulz H., Ruppert A.K., Herms S., Wolf C., Mirza-Schreiber N., Stegle O., Czamara D., Forstner A.J., Sivalingam S., Schoch S. et al. Genome-wide mapping of genetic determinants influencing DNA methylation and gene expression in human hippocampus. Nat. Commun. 2017; 8:1511.
30. Ongen H., Andersen C.L., Bramsen J.B., Oster B., Rasmussen M.H., Ferreira P.G., Sandoval J., Vidal E., Whiffin N., Planchon A. et al. Putative cis-regulatory drivers in colorectal cancer. Nature. 2014; 512:87–90.
31. Shabalin A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012; 28:1353–1358.
32. MacArthur J., Bowler E., Cerezo M., Gil L., Hall P., Hastings E., Junkins H., McMahon A., Milano A., Morales J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017; 45:D896–D901.
33. Johnson A.D., Handsaker R.E., Pulit S.L., Nizzari M.M., O’Donnell C.J., de Bakker P.I.W. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008; 24:2938–2939.
34. Michailidou K., Lindstrom S., Dennis J., Beesley J., Hui S., Kar S., Lemacon A., Soucy P., Glubb D., Rostamianfar A. et al. Association analysis identifies 65 new breast cancer risk loci. Nature. 2017; 551:92–94.
35. Jiang W.G., Raz A., Douglas-Jones A., Mansel R.E. Expression of autocrine motility factor (AMF) and its receptor, AMFR, in human breast cancer. J. Histochem. Cytochem. 2006; 54:231–241.
36. Kalkatawi M., Rangkuti F., Schramm M., Jankovic B.R., Kamau A., Chowdhary R., Archer J.A.C., Bajic V.B. Dragon PolyA Spotter: predictor of poly(A) motifs within human genomic DNA sequences. Bioinformatics. 2012; 28:127–129.

[B20] 20. Stacey S.N., Sulem P., Jonasdottir A., Masson G., Gudmundsson J., Gudbjartsson D.F., Magnusson O.T., Gudjonsson S.A., Sigurgeirsson B., Thorisdottir K. et al.. A germline variant in the TP53 polyadenylation signal confers cancer susceptibility. Nat. Genet. 2011; 43:1098–1103. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B21] 21. Feng X., Li L., Wagner E.J., Li W.. TC3A: the cancer 3′ UTR atlas. Nucleic Acids Res. 2018; 46:D1027–D1030. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B22] 22. Cancer Genome Atlas, N. Comprehensive molecular portraits of human breast tumours. Nature. 2012; 490:61–70. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B23] 23. Howie B.N., Donnelly P., Marchini J.. A flexible and accurate genotype imputation method for the next generation of Genome-Wide association studies. PLoS Genet. 2009; 5:e1000529. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B24] 24. Altshuler D.M., Durbin R.M., Abecasis G.R., Bentley D.R., Chakravarti A., Clark A.G., Donnelly P., Eichler E.E., Flicek P., Gabriel S.B. et al.. A global reference for human genetic variation. Nature. 2015; 526:68–74. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B25] 25. Ardlie K.G., DeLuca D.S., Segre A.V., Sullivan T.J., Young T.R., Gelfand E.T., Trowbridge C.A., Maller J.B., Tukiainen T., Lek M. et al.. The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science. 2015; 348:648–660. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B26] 26. Graffelman J. Exploring diallelic genetic markers: The hardyweinberg package. J. Stat. Softw. 2015; 64:1–23. [Google Scholar]

[B27] 27. Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D.. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006; 38:904–909. [DOI] [PubMed] [Google Scholar]

[B28] 28. Stegle O., Parts L., Piipari M., Winn J., Durbin R.. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 2012; 7:500–507. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B29] 29. Schulz H., Ruppert A.K., Herms S., Wolf C., Mirza-Schreiber N., Stegle O., Czamara D., Forstner A.J., Sivalingam S., Schoch S. et al.. Genome-wide mapping of genetic determinants influencing DNA methylation and gene expression in human hippocampus. Nat. Commun. 2017; 8:1511. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B30] 30. Ongen H., Andersen C.L., Bramsen J.B., Oster B., Rasmussen M.H., Ferreira P.G., Sandoval J., Vidal E., Whiffin N., Planchon A. et al.. Putative cis-regulatory drivers in colorectal cancer. Nature. 2014; 512:87–90. [DOI] [PubMed] [Google Scholar]

[B31] 31. Shabalin A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012; 28:1353–1358. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B32] 32. MacArthur J., Bowler E., Cerezo M., Gil L., Hall P., Hastings E., Junkins H., McMahon A., Milano A., Morales J. et al.. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017; 45:D896–D901. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B33] 33. Johnson A.D., Handsaker R.E., Pulit S.L., Nizzari M.M., O’Donnell C.J., de Bakker P.I.W.. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008; 24:2938–2939. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B34] 34. Michailidou K., Lindstrom S., Dennis J., Beesley J., Hui S., Kar S., Lemacon A., Soucy P., Glubb D., Rostamianfar A. et al.. Association analysis identifies 65 new breast cancer risk loci. Nature. 2017; 551:92–94. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B35] 35. Jiang W.G., Raz A., Douglas-Jones A., Mansel R.E.. Expression of autocrine motility factor (AMF) and its receptor, AMFR, in human breast cancer. J. Histochem. Cytochem. 2006; 54:231–241. [DOI] [PubMed] [Google Scholar]

[B36] 36. Kalkatawi M., Rangkuti F., Schramm M., Jankovic B.R., Kamau A., Chowdhary R., Archer J.A.C., Bajic V.B.. Dragon PolyA Spotter: predictor of poly(A) motifs within human genomic DNA sequences. Bioinformatics. 2012; 28:127–129. [DOI] [PMC free article] [PubMed] [Google Scholar]
