Summary
Splicing quantitative trait loci (sQTLs) have been shown to contribute to disease etiology by affecting alternative splicing. However, the role of sQTLs in the development of non-small-cell lung cancer (NSCLC) remains unknown. We therefore performed a genome-wide sQTL study to identify genetic variants that affect alternative splicing in lung tissues from 116 individuals of Chinese ancestry. This study identified 1,385 sQTL-harboring genes (sGenes) containing 378,210 significant variant-intron pairs. A comprehensive characterization showed that these sQTLs were enriched in actively transcribed regions, genetic regulatory elements, and splicing-factor-binding sites. Moreover, sQTLs were largely distinct from expression quantitative trait loci (eQTLs) and were significantly enriched in potential risk loci of NSCLC. We also integrated sQTLs with NSCLC GWAS datasets (13,327 affected individuals and 13,328 control individuals) by using a splice-transcriptome-wide association study (spTWAS). This analysis identified alternative splicing events in 19 genes that were significantly associated with NSCLC risk. Using functional annotation and experiments, we confirmed that the sQTL variant rs35861926 reduced the risk of lung adenocarcinoma (rs35861926-T, OR = 0.88, 95% confidence interval [CI]: 0.82–0.93, p = 1.87 × 10⁻⁵). It did so by promoting FARP1 exon 20 skipping, thereby downregulating the expression of the long transcript FARP1-011. Transcript FARP1-011 promoted the migration and proliferation of lung adenocarcinoma cells. Overall, our study provides an informative lung sQTL resource and insights into the molecular mechanisms linking sQTL variants to NSCLC risk.
Keywords: splicing quantitative trait locus, non-small-cell lung cancer, splice-transcriptome-wide association study, risk loci, FARP1 exon 20 skipping
Graphical abstract
This study provided a comprehensive catalog of splicing quantitative trait loci (sQTLs) in lung tissues. Integrative sQTL analysis revealed risk loci for non-small-cell lung cancer. Further experiments confirmed that rs35861926 could reduce lung adenocarcinoma risk by promoting FARP1 exon 20 skipping to downregulate the expression level of transcript FARP1-011.
Introduction
Lung cancer is one of the most commonly diagnosed cancers and the leading cause of cancer mortality in China.1 Non-small-cell lung cancer (NSCLC) accounts for approximately 85% of all lung cancer cases.2 The development of lung cancer is driven by multiple factors, including environmental exposures (e.g., cigarette smoking3) and germline genetic variants. Since 2008, genome-wide association studies (GWASs) have identified 61 susceptibility loci for lung cancer.4,5,6,7 These loci have provided important insights into its genetic architecture. However, GWAS risk variants account for only a modest proportion of the estimated heritability of lung cancer.5 Furthermore, because most risk variants are located in non-coding regions of the genome, the target genes and downstream biological pathways that mediate these associations remain elusive.
Expression quantitative trait locus (eQTL) analyses can help uncover candidate genes at susceptibility loci identified by GWASs.8,9 However, only a moderate proportion of lung cancer susceptibility loci can be explained by eQTLs.8,9 Recent studies suggest that a substantial proportion of cancer heritability may be mediated by biological processes other than the genetic regulation of gene expression.10,11 Alternative splicing is a crucial post-transcriptional regulatory mechanism. It allows a single pre-mRNA to produce multiple mature mRNA isoforms, which can be translated into functionally diverse proteins.12 More than 95% of human multi-exon genes undergo alternative splicing.12 Furthermore, aberrant splicing patterns are frequently observed in the development and progression of diseases, including lung cancer.13,14
Increasing evidence has demonstrated that alternative splicing can be modulated by heritable genetic variants, termed splicing QTLs (sQTLs).15,16 In particular, identifying sQTLs can provide insight into the mechanisms underlying GWAS associations for a number of traits and diseases.15,17 For example, the Genotype-Tissue Expression (GTEx) project characterized sQTLs in 49 tissues from 838 donors. It reported a 1.86-fold enrichment of GWAS variants among cis-sQTLs and found that most sQTLs act independently of eQTLs.15 Gusev et al. integrated sQTLs with ovarian cancer GWASs by using a splice-transcriptome-wide association study (spTWAS) and identified 74 splicing variants associated with ovarian cancer risk. Subsequent in vitro assays showed that a risk variant in CHMP4C induced allele-specific exon inclusion.11 Another study linking sQTLs to pancreatic cancer GWASs identified an sQTL variant that contributed to disease risk by regulating ELP2 splicing and blocking the STAT3 oncogenic pathway.10 These findings highlight the role of sQTLs and alternative splicing in the development of human cancers. However, the effect of lung sQTLs on NSCLC risk remains largely unexplored.
In this study, we first performed an sQTL analysis to systematically investigate the genetic control of alternative splicing. We used genome-wide genotype and splicing data from normal lung tissues of 116 donors of Chinese ancestry. Next, we characterized the genomic properties of these sQTLs. Then, we integrated lung sQTLs with a large-scale NSCLC GWAS (13,327 affected individuals and 13,328 control individuals) by using spTWAS11 to uncover NSCLC susceptibility loci. Finally, we carried out functional experiments to confirm the biological mechanisms of the potential causal variant and target gene.
Material and methods
Study participants, DNA genotyping, and RNA sequencing
Individuals with NSCLC who had not received chemotherapy or radiotherapy before diagnosis were recruited into the Nanjing Lung Cancer Cohort (NJLCC).18 Peripheral blood and lung tissue specimens (i.e., tumor and adjacent normal tissues) were collected during surgical resection. Each sample underwent independent pathology review. This review confirmed that the tumor specimen was consistent with NSCLC (>70% tumor cells) and that the adjacent normal specimen contained no tumor cells. Written informed consent was obtained from all participants. The NJLCC study was approved by the Institutional Review Board of Nanjing Medical University.
Genotype data were obtained from whole-genome sequencing (WGS) of peripheral blood-derived DNA samples on Illumina HiSeq or NovaSeq platforms. Read alignment, variant calling, and quality control were performed as previously described.7 For this study, we applied three additional exclusion filters: call rate < 0.95, departure from Hardy-Weinberg equilibrium (HWE p < 1 × 10⁻⁶), and minor allele frequency (MAF) < 0.01. We also restricted the analysis to autosomal variants. Genotype principal components (PCs) were computed with EIGENSOFT (v.6.1.4).19
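The three variant filters above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline. It assumes genotypes are coded as alternate-allele counts (0/1/2) with NaN for missing calls. It also uses a 1-df chi-square approximation for the Hardy-Weinberg test, whereas PLINK's `--hwe` uses an exact test. The function name `variant_qc` is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def variant_qc(genotypes, min_call_rate=0.95, min_hwe_p=1e-6, min_maf=0.01):
    """Return a boolean keep-mask over variants (rows).

    genotypes: variants x samples array of alt-allele counts (0/1/2), NaN = missing.
    HWE is tested with a 1-df chi-square approximation (not an exact test).
    """
    g = np.asarray(genotypes, dtype=float)
    n_called = (~np.isnan(g)).sum(axis=1)
    call_rate = n_called / g.shape[1]
    # Comparisons with NaN evaluate to False, so missing calls are not counted.
    n_hom_ref = (g == 0).sum(axis=1)
    n_het = (g == 1).sum(axis=1)
    n_hom_alt = (g == 2).sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        alt_freq = (n_het + 2 * n_hom_alt) / (2 * n_called)
        maf = np.minimum(alt_freq, 1 - alt_freq)
        expected = np.stack([n_called * (1 - alt_freq) ** 2,
                             2 * n_called * alt_freq * (1 - alt_freq),
                             n_called * alt_freq ** 2])
        observed = np.stack([n_hom_ref, n_het, n_hom_alt])
        # Cells with zero expected count (e.g., monomorphic variants) contribute nothing.
        stat = np.where(expected > 0, (observed - expected) ** 2 / expected, 0.0).sum(axis=0)
    p_hwe = chi2.sf(stat, df=1)
    return (call_rate >= min_call_rate) & (p_hwe >= min_hwe_p) & (maf >= min_maf)


nan = np.nan
genos = np.array([
    [0] * 25 + [1] * 50 + [2] * 25,               # in HWE, MAF 0.5: kept
    [0] * 100,                                    # monomorphic: fails MAF
    [0] * 25 + [1] * 50 + [2] * 15 + [nan] * 10,  # call rate 0.90: fails
    [0] * 50 + [2] * 50,                          # no heterozygotes: fails HWE
])
print(variant_qc(genos))  # [ True False False False]
```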
RNA sequencing (RNA-seq) and quality control for the NJLCC samples have been described previously.7 Briefly, total RNA from lung tissue samples was extracted with the RNeasy Mini Kit (Qiagen, Hilden, Germany) and sequenced on the Illumina HiSeq 1500 platform. The distribution of read quality scores was assessed with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc), and low-quality reads were removed. Qualified RNA-seq reads were aligned to the human reference genome (GRCh37) with the GENCODE v.19 annotation by using STAR (v.2.5.3a).20 Expression was then quantified as transcripts per million (TPM) with RSEM (v.1.3.0).21
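TPM can be written as TPM_i = 10⁶ × (c_i / l_i) / Σ_j (c_j / l_j), where c_i is the (expected) read count and l_i is the effective length of transcript i. The sketch below shows only this final normalization step. RSEM itself estimates expected counts and effective lengths with an EM algorithm, which is not reproduced here. The function name is illustrative.

```python
import numpy as np


def tpm(counts, effective_lengths):
    """Transcripts per million from read counts and effective lengths (same length units)."""
    rate = np.asarray(counts, float) / np.asarray(effective_lengths, float)  # reads per unit length
    return rate / rate.sum() * 1e6  # rescale so the values sum to one million


# Three hypothetical transcripts with the same per-base read rate get equal TPM.
values = tpm([100, 200, 300], [1000, 2000, 3000])
```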
After standard quality control, 116 individuals with both DNA genotyping and RNA-seq data were included in further analyses.7 Baseline characteristics of the participants are summarized in Table S1.
Splicing quantification
We performed splicing quantification on adjacent normal lung tissues of the 116 NJLCC participants by using the GTEx pipeline.15 Specifically, we quantified splicing with intron usage ratios computed by LeafCutter.22 First, we used the bam2junc.sh script from LeafCutter to convert the BAM files output by STAR (v.2.5.3a) into intron junction files (.junc). The leafcutter_cluster.py script was then used for intron clustering with the following parameters: --minclureads 30, --mincluratio 0.001, and --maxintronlen 500000.15 Intron clusters were mapped to genes on the basis of exon coordinates from the GENCODE v.19 annotation. Introns were retained for further analysis if they met three criteria: (1) they were located on autosomes, (2) they were present in ≥50% of all samples, and (3) they had ≥12 (10% of the sample size) unique values of intron usage ratio. Additionally, we calculated the Z score of each intron usage ratio across all samples. We then excluded low-complexity introns, i.e., those with |Z| < 0.25 in ≥ n − 3 samples and |Z| > 6 in ≤ 3 samples, where n is the sample size. For the qualifying introns, we used the prepare_phenotype_table.py script from LeafCutter to calculate normalized intron usage ratios.
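The intron-level filters described above can be sketched with NumPy as shown below. This is an illustration under stated assumptions, not the GTEx or LeafCutter code. `counts` and `ratios` are hypothetical introns × samples matrices of junction read counts and intron usage ratios. The autosome filter (1) is assumed to have been applied upstream. Z scores use the sample standard deviation.

```python
import numpy as np


def filter_introns(counts, ratios, min_present_frac=0.5, min_unique=12):
    """Boolean mask over introns (rows) passing filters (2), (3), and the low-complexity filter.

    counts: introns x samples junction read counts (hypothetical input).
    ratios: introns x samples intron usage ratios.
    """
    counts = np.asarray(counts, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    n = ratios.shape[1]
    # (2) intron observed (non-zero reads) in at least half of the samples
    present = (counts > 0).mean(axis=1) >= min_present_frac
    # (3) enough distinct usage-ratio values (12, i.e., ~10% of n = 116, in this study)
    enough_unique = np.array([np.unique(row).size for row in ratios]) >= min_unique
    # Low-complexity filter on per-intron Z scores across samples
    mean = ratios.mean(axis=1, keepdims=True)
    sd = ratios.std(axis=1, ddof=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.where(sd > 0, (ratios - mean) / sd, 0.0)
    low_complexity = ((np.abs(z) < 0.25).sum(axis=1) >= n - 3) & (
        (np.abs(z) > 6).sum(axis=1) <= 3
    )
    return present & enough_unique & ~low_complexity


n = 20
ratios = np.vstack([
    np.linspace(0.1, 0.9, n),                    # variable: kept
    np.full(n, 0.5),                             # constant: too few unique values
    np.linspace(0.1, 0.9, n),                    # reads in only 5 of 20 samples
    np.r_[0.5 + 1e-6 * np.arange(n - 1), 0.95],  # flat except one outlier: low complexity
])
counts = np.full((4, n), 10)
counts[2, 5:] = 0
mask = filter_introns(counts, ratios)
```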
To account for the unmeasured confounders (e.g., hidden batch effects, other technical and biological sources of bias), we utilized the probabilistic estimation of expression residuals (PEER) method to compute a set of 15 PEER factors from the normalized intron usage ratios.23
sQTL mapping
sQTL mapping was performed with the GTEx pipeline.15 Specifically, we performed sQTL analyses with FastQTL.24 Linear regression was used to test for associations between intron usage ratios and SNP genotypes within a cis-region extending 1 Mb up- and downstream of the transcription start site (TSS). Age, sex, the first five genotype PCs, and the first 15 PEER factors were included as covariates. We used grouped permutations (--grp option) in adaptive permutation mode (--permute 1000 10000), as implemented in FastQTL. These permutations jointly yield a beta-approximated empirical p value over all intron clusters of a gene, with the beta distribution parameters learned by maximum likelihood estimation.24 For multiple testing correction, we used the empirical p value of each gene to calculate the gene-level false discovery rate (FDR) with the Storey and Tibshirani method.25 Genes with FDR < 0.05 were defined as sQTL-harboring genes (sGenes). To identify significant variant-intron pairs, we calculated a nominal p value threshold for each gene as F⁻¹(p_t). Here, p_t is the empirical p value of the gene closest to the 0.05 FDR threshold, and F⁻¹ is the inverse cumulative distribution function of the beta distribution fitted for that gene in the permutations.24 For each gene, variant-intron pairs with a nominal p value below this threshold were considered significant.24 Genetic variants in significant variant-intron pairs were defined as sQTL variants, and the corresponding introns were called sQTL-harboring introns (sIntrons). The most significant sQTL variant per sIntron was defined as the sSNP.
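The beta approximation and the derived nominal threshold F⁻¹(p_t) can be illustrated with SciPy. In this sketch, the per-permutation minimum p values are simulated from Beta(1, 50), which is the distribution of the minimum of 50 independent uniform p values. The value p_t = 0.01 is hypothetical. In the actual pipeline, both come from FastQTL's grouped permutations and the gene-level FDR step.

```python
import numpy as np
from scipy import stats


def fit_beta_null(perm_min_p):
    """MLE of Beta(a, b) on per-permutation minimum p values (support fixed to [0, 1])."""
    a, b, _, _ = stats.beta.fit(perm_min_p, floc=0, fscale=1)
    return a, b


def empirical_p(observed_min_p, a, b):
    """Beta-approximated empirical p value for a gene's best nominal p value."""
    return stats.beta.cdf(observed_min_p, a, b)


def nominal_threshold(p_t, a, b):
    """Gene-specific nominal threshold F^-1(p_t) under the fitted beta distribution."""
    return stats.beta.ppf(p_t, a, b)


# Toy null: the minimum of ~50 independent uniform p values follows Beta(1, 50).
rng = np.random.default_rng(0)
perm_min_p = rng.beta(1.0, 50.0, size=1000)
a, b = fit_beta_null(perm_min_p)
p_t = 0.01  # hypothetical empirical p value of the gene closest to FDR = 0.05
threshold = nominal_threshold(p_t, a, b)
```

Variant-intron pairs of this gene with nominal p < `threshold` would be called significant. By construction, `empirical_p(threshold, a, b)` maps back to p_t.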
To assess systematic inflation of test statistics,26 we randomly chose ten alternatively spliced introns and ran a trans-sQTL analysis with MatrixEQTL.27 This analysis tested the associations of all genome-wide genetic variants with these introns. We corrected for the same covariates as in the cis-sQTL analysis. The quantile-quantile (Q-Q) plot of p values for the trans-sQTLs (Figure S1) showed no systematic inflation. This indicates that our sQTL results were not confounded by population stratification.
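The Q-Q plot is a visual check. A common numerical complement, not reported in the text above, is the genomic inflation factor λGC: the median observed 1-df chi-square statistic divided by its expected null median (about 0.455). A minimal sketch, assuming two-sided p values from 1-df tests:

```python
import numpy as np
from scipy.stats import chi2


def genomic_inflation(p_values):
    """lambda_GC: median observed 1-df chi-square statistic over the null median."""
    observed_stats = chi2.isf(np.asarray(p_values, float), df=1)  # p value -> chi-square
    return np.median(observed_stats) / chi2.ppf(0.5, df=1)


# Uniformly spread p values (no inflation) give lambda_GC close to 1.
lam = genomic_inflation(np.linspace(0.001, 0.999, 999))
```

Values well above 1 would suggest inflation, for example from stratification or cryptic relatedness.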
sQTL sharing between NJLCC and GTEx
To evaluate the extent of sQTL sharing between NJLCC and GTEx lung tissues, we downloaded three data sets from the GTEx v.8 release (https://www.gtexportal.org/home/datasets):15 splice phenotype matrices (Lung.v8.leafcutter_phenotypes.bed), genetic variant calls (dbGaP: phs000424.v8.p2), and the covariates used in sQTL analysis (Lung.v8.sqtl_covariates.txt). These data correspond to 515 donors with both genetic variant calls and splice phenotypes available. We first counted the number of sGenes that were significant in both NJLCC and GTEx. Then, we performed sQTL analyses for the NJLCC sSNPs in GTEx. Specifically, genome coordinates in the GTEx data were converted from hg38 to hg19 with LiftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver). The NJLCC sSNPs were then tested for association with the corresponding introns in GTEx lung samples by using FastQTL.24 The extent of sQTL sharing was quantified with Storey's π1 statistic.25,28 This statistic estimates the proportion of true associations (π1) from the distribution of p values for the corresponding variant-intron pairs in GTEx lung tissues.
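Storey's π1 is 1 − π0, where π0 (the proportion of true null tests) is estimated from the density of large p values. The sketch below uses a single tuning value, λ = 0.5, so that π0 = #{p > λ} / (m(1 − λ)). This is a simplification: the qvalue R package by default smooths π0 estimates over a grid of λ values, so results may differ slightly from those reported.

```python
import numpy as np


def storey_pi1(p_values, lam=0.5):
    """pi1 = 1 - pi0, with pi0 estimated at a single tuning value lambda."""
    p = np.asarray(p_values, float)
    # Under the null, p values are uniform, so the tail above lambda estimates pi0.
    pi0 = np.sum(p > lam) / (p.size * (1 - lam))
    return 1.0 - min(pi0, 1.0)


# Half strong signals, half uniform p values: pi1 is estimated at 0.5.
mixed = np.concatenate([np.full(50, 1e-4), np.linspace(0.01, 0.99, 50)])
pi1 = storey_pi1(mixed)
```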
Enrichment of sQTLs in chromatin states, epigenetic marks, and splicing-factor-binding sites
We downloaded chromatin states from the Roadmap Epigenomics Project (https://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#core_15state). The 15 chromatin states were generated from five chromatin marks (H3K4me3, H3K4me1, H3K36me3, H3K27me3, and H3K9me3) assayed in the lung tissue of a 30-year-old woman with bipolar disease as part of the Roadmap Epigenomics Consortium.29
Chromatin immunoprecipitation sequencing (ChIP-seq) datasets for transcription factors and histone modification marks (H3K4me3, H3K4me1, H3K27ac, H3K36me3, H3K27me3, and H3K9me3) were downloaded from the Encyclopedia of DNA Elements (ENCODE) Project30,31 and the Roadmap Epigenomics Project.29 These datasets covered various lung-derived cell lines: lung carcinoma cells [A549], fetal lung fibroblast cell lines [IMR90 and AG04450], pulmonary fibroblasts isolated from lung tissue [HPF], embryonic lung fibroblast cells [WI-38], and normal human lung fibroblast primary cultured cells [NHLF]. Assay for transposase-accessible chromatin using sequencing (ATAC-seq) data from the A549 cell line and human lung tissue, as well as CpG sites in the A549 cell line, were downloaded from the ENCODE Project.30,31 Topologically associating domain (TAD) boundaries derived from genome-wide chromosome conformation capture (Hi-C) in the A549 cell line were downloaded from the Gene Expression Omnibus (GEO) database (GEO: GSE92819).32
We obtained the binding sites of human RNA-binding proteins (RBPs) in BED format from CLIPdb (http://111.198.139.65/RBP.html). CLIPdb is a publicly available database of cross-linking immunoprecipitation sequencing (CLIP-seq) data for 171 human RBPs.33,34 We also downloaded a list of 71 experimentally validated human RNA-binding splicing regulatory proteins from the SpliceAid-F database (http://srv00.recas.ba.infn.it/SpliceAidF/).35 The CLIPdb RBPs were then restricted to the 26 that were also included in SpliceAid-F as experimentally validated splicing regulatory proteins.
We used GREGOR (genomic regulatory elements and GWAS overlap algorithm)36 to evaluate the global enrichment of sSNPs in chromatin states, epigenetic marks, and RBP-binding sites. GREGOR tests whether sQTL index variants, or their linkage disequilibrium (LD) proxies, overlap a given regulatory feature more often than expected. The comparison uses permuted control variants matched for the number of LD proxies, allele frequency, and proximity to the nearest gene.36
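The core idea of this comparison can be sketched as follows: count how many index variants fall in a feature, then compare that count with matched control sets. This toy version is not GREGOR. It assumes the matched control sets are already provided and works on a single chromosome with sorted, non-overlapping half-open intervals. It also ignores LD proxies, which GREGOR explicitly includes. Function names are hypothetical.

```python
import bisect

import numpy as np


def count_overlaps(positions, intervals):
    """Count positions lying inside any [start, end) interval.

    intervals must be sorted by start and non-overlapping (single chromosome).
    """
    starts = [start for start, _ in intervals]
    hits = 0
    for pos in positions:
        i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits


def fold_enrichment(index_positions, control_sets, intervals):
    """Observed overlaps / mean overlaps of matched control sets, plus an empirical p value."""
    observed = count_overlaps(index_positions, intervals)
    expected = np.array([count_overlaps(c, intervals) for c in control_sets])
    fold = observed / expected.mean()  # inf if no control set overlaps the feature
    p_emp = (1 + np.sum(expected >= observed)) / (1 + len(expected))
    return fold, p_emp


features = [(100, 200), (500, 600)]  # hypothetical annotation intervals
sqtl_index = [150, 550, 120, 900]
controls = [[10, 20, 550, 900], [150, 300, 400, 700], [50, 60, 70, 80]]
fold, p_emp = fold_enrichment(sqtl_index, controls, features)
```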
Comparison of eQTLs and sQTLs
We performed eQTL mapping on adjacent normal lung tissues of the 116 NJLCC participants by using the GTEx pipeline.15 Details of the eQTL analyses have been described in a previous study.7 We counted the numbers of sGenes, eQTL-harboring genes (eGenes), and genes harboring both eQTLs and sQTLs. For each gene harboring both, we focused on the most significant (lead) sQTL variant and eQTL variant and calculated the distance between them. The LD r² between the lead sQTL variant and the lead eQTL variant of each gene was computed with PLINK 1.9 (www.cog-genomics.org/plink/1.9/).37
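As a rough illustration of the two lead-variant comparisons, the sketch below computes r² as the squared Pearson correlation of allele dosages across samples. It then checks the ≥10 kb distance and r² < 0.8 criteria reported in the Results. Dosage-based r² is an assumption here and can differ slightly from the LD estimate produced by PLINK. `distinct_lead_pair` is a hypothetical helper.

```python
import numpy as np


def dosage_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two variants' allele dosages across samples."""
    r = np.corrcoef(np.asarray(dosage_a, float), np.asarray(dosage_b, float))[0, 1]
    return float(r ** 2)


def distinct_lead_pair(pos_eqtl, pos_sqtl, r2, min_distance=10_000, max_r2=0.8):
    """Hypothetical helper: (lead variants >= 10 kb apart, LD r2 below 0.8)."""
    return abs(pos_eqtl - pos_sqtl) >= min_distance, r2 < max_r2


# Perfectly anti-correlated dosages are in complete LD (r2 = 1).
r2 = dosage_r2([0, 1, 2, 0, 1, 2], [2, 1, 0, 2, 1, 0])
```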
NSCLC GWAS datasets
We obtained the summary statistics from our previous meta-analysis of NSCLC GWASs (13,327 affected individuals and 13,328 control individuals) in Chinese populations.5 These NSCLC GWASs include three groups of studies:

(1) the Nanjing Medical University (NJMU) GSA Project, comprising three GWASs: the Nanjing GSA GWAS (4,149 affected individuals and 3,198 control individuals), the Beijing GSA GWAS (2,155 affected individuals and 2,035 control individuals), and the Guangzhou GSA GWAS (3,944 affected individuals and 4,065 control individuals);
(2) the NJMU GWAS, comprising two studies: the Nanjing GWAS (1,317 affected individuals and 1,962 control individuals) and the Beijing GWAS (809 affected individuals and 1,115 control individuals);
(3) the NJMU OncoArray GWAS (953 affected individuals and 953 control individuals).

Details of the study design, genotype calling, imputation, quality control procedures, and statistical analysis of each GWAS have been described previously.5
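The per-study counts above can be checked against the meta-analysis totals with a quick tally:

```python
# Affected/control counts per GWAS as listed above.
studies = {
    "Nanjing GSA": (4149, 3198),
    "Beijing GSA": (2155, 2035),
    "Guangzhou GSA": (3944, 4065),
    "Nanjing GWAS": (1317, 1962),
    "Beijing GWAS": (809, 1115),
    "NJMU OncoArray": (953, 953),
}
total_cases = sum(cases for cases, _ in studies.values())
total_controls = sum(controls for _, controls in studies.values())
print(total_cases, total_controls)  # 13327 13328
```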
GWAS enrichment analyses
We evaluated the enrichment of NSCLC GWAS variants5 among sQTL variants in lung samples. First, LD and MAF were computed from the 1000 Genomes Project Phase 3 reference panel (504 EAS subjects)38 with PLINK 1.9 (www.cog-genomics.org/plink/1.9/).37 We calculated LD information with the --tag-r2 0.1 --tag-kb 500 flags (and, separately, --tag-r2 0.8 --tag-kb 500). These flags find all proxies within a 1 Mb window around each variant at r² thresholds of 0.1 and 0.8, respectively. The distance of each variant to the nearest TSS was calculated with bedtools (https://bedtools.readthedocs.io/en/latest/) on the basis of GENCODE v.19 annotations. Second, we used GARFIELD (https://www.ebi.ac.uk/birney-srv/GARFIELD)39 to test for enrichment of NSCLC GWAS SNPs (p < 1 × 10⁻⁴) among sQTL variants in NJLCC lung samples. GARFIELD performs greedy LD pruning of genome-wide genetic variants (r² ≥ 0.1). It then annotates each variant with a functional annotation if either the variant or one of its proxies (r² ≥ 0.8) overlaps the feature. GARFIELD uses logistic regression to compute the enrichment (i.e., odds ratio [OR] and enrichment p value) of GWAS signals within a given functional annotation. This regression accounts for MAF, distance to the nearest TSS, and number of LD proxies (r² ≥ 0.8). An OR > 1 denotes enrichment, and an OR < 1 denotes depletion.39
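GARFIELD estimates the enrichment OR by logistic regression with MAF, TSS-distance, and LD-proxy covariates. The simpler quantity below is an unadjusted OR from a 2 × 2 table with a Woolf (log-scale) 95% CI. It is shown only to make the interpretation of OR > 1 versus OR < 1 concrete. It omits the covariate adjustment and LD pruning, so it is not a substitute for GARFIELD. The counts are hypothetical.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) 95% CI for a 2 x 2 table.

    a: GWAS-significant and in annotation   b: GWAS-significant, not in annotation
    c: not significant, in annotation       d: not significant, not in annotation
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 30 of 100 GWAS SNPs vs 10 of 100 other SNPs fall in sQTLs.
or_, lo, hi = odds_ratio_woolf(30, 70, 10, 90)
```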
Splice-transcriptome-wide association study (spTWAS)
Next, we used transcriptome and genotype data from NJLCC to impute the cis genetic component of intron usage ratios into the NSCLC GWAS. The spTWAS was performed with the FUSION suite of tools (http://gusevlab.org/projects/fusion/).40 The detailed steps were as follows. First, we used GCTA41 to estimate the cis-heritability of intron usage ratios (the cis-window was defined as ±1 Mb around the gene TSS). This estimation adjusted for age, sex, the first five PCs, and the first 15 PEER factors as covariates. As in previous studies, introns with a cis-heritability p value < 0.05 were retained for further analysis.42,43 Next, prediction models for intron usage ratios were trained on all cis-SNPs. Four models were used: the single best SNP, best linear unbiased predictor, elastic net regression, and least absolute shrinkage and selection operator (LASSO) regression. Each model was evaluated by 5-fold cross-validation, and the model with the largest cross-validation R² was chosen for downstream analyses. Cis-heritability estimation and model training were performed with transcriptome and genotype data from NJLCC. spTWAS statistics were then calculated with FUSION by integrating the weights from the prediction models with summary statistics from the NSCLC GWAS of Chinese populations.5 LD correlations between variants were based on the 1000 Genomes Project Phase 3 reference panel (504 EAS subjects).38 To account for multiple testing, we used the Benjamini-Hochberg step-up method to calculate the FDR for spTWAS associations. Introns with FDR < 0.05 were considered transcriptome-wide significant. For introns with spTWAS p value < 0.05, we tested for colocalization by using the COLOC software implemented in FUSION. Evidence for colocalization was assessed with the posterior probability for hypothesis 4 (PP4). PP4 indicates whether the association of alternative splicing with NSCLC risk is driven by the same causal variant(s). Associations with PP4 > 0.7 were considered highly likely to colocalize at a locus.
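Two quantities in this step can be sketched compactly under stated assumptions. First, the FUSION-style spTWAS statistic combines model weights w, GWAS z scores z, and the reference-panel LD matrix Σ as z = w'z / sqrt(w'Σw). Second, COLOC-style colocalization rests on Wakefield approximate Bayes factors, the default priors p1 = p2 = 10⁻⁴ and p12 = 10⁻⁵, and a single-causal-variant model. The code below is a didactic re-implementation, not FUSION or COLOC. The prior standard deviation (0.15 by default here) and the toy inputs are assumptions.

```python
import numpy as np
from scipy.special import logsumexp


def twas_z(weights, gwas_z, ld):
    """FUSION-style statistic w'z / sqrt(w' Sigma w) over the model's cis-SNPs."""
    w = np.asarray(weights, float)
    z = np.asarray(gwas_z, float)
    sigma = np.asarray(ld, float)
    return float(w @ z / np.sqrt(w @ sigma @ w))


def log_abf(z, var, prior_sd=0.15):
    """Wakefield log approximate Bayes factor per SNP (z scores, sampling variances)."""
    z = np.asarray(z, float)
    r = prior_sd ** 2 / (np.asarray(var, float) + prior_sd ** 2)
    return 0.5 * (np.log(1 - r) + r * z ** 2)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4, assuming at most one causal SNP per trait."""
    labf1, labf2 = np.asarray(labf1, float), np.asarray(labf2, float)
    l1, l2 = logsumexp(labf1), logsumexp(labf2)
    l12 = logsumexp(labf1 + labf2)
    gap = l1 + l2 - l12  # >= 0 in exact arithmetic; H3 sums over pairs of different SNPs
    if gap > 0:
        log_h3 = np.log(p1) + np.log(p2) + l1 + l2 + np.log(-np.expm1(-gap))
    else:
        log_h3 = -np.inf
    log_h = np.array([0.0, np.log(p1) + l1, np.log(p2) + l2, log_h3, np.log(p12) + l12])
    return np.exp(log_h - logsumexp(log_h))  # normalized PP0..PP4


# Toy locus: three SNPs with identical sampling variance.
var = np.full(3, 0.01)
shared = coloc_posteriors(log_abf([8, 0, 0], var, 0.2), log_abf([8, 0, 0], var, 0.2))
distinct = coloc_posteriors(log_abf([8, 0, 0], var, 0.2), log_abf([0, 8, 0], var, 0.2))
```

With these toy inputs, a shared lead SNP concentrates the posterior on H4 (PP4 near 1), whereas different lead SNPs concentrate it on H3.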
We assessed the overlap of spTWAS hits (±1 Mb) with known lung cancer susceptibility variants. A locus was considered novel if the intron (±1 Mb) did not overlap any lung cancer susceptibility variant reported by previous lung cancer GWASs (Table S2). Introns whose ±1 Mb windows overlapped each other were then merged into a single region. We used FUSION to perform summary-based conditional analyses between spTWAS and GWAS associations. Specifically, we conditioned the GWAS association of each variant on the predicted spTWAS signal (intron usage ratio). This assessed how much of the association signal remained independent of the spTWAS signal. When a locus contained multiple introns with transcriptome-wide significance, we performed stepwise selection. Introns were added to the model from most to least significant until no remaining intron was conditionally significant. Summary-based conditional analyses for QTLs were performed with GCTA conditional and joint analysis.41
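The region-merging rule (overlapping ±1 Mb windows around introns are combined into one locus) can be sketched as a standard interval merge. The tuple layout and function name are hypothetical, and coordinates are assumed to be on a common genome build.

```python
def merge_windows(introns, flank=1_000_000):
    """Merge overlapping (start - flank, end + flank) windows per chromosome.

    introns: iterable of (chrom, start, end, name) tuples (hypothetical layout).
    Returns a list of (chrom, start, end, [names]) loci.
    """
    windows = sorted((c, max(0, s - flank), e + flank, n) for c, s, e, n in introns)
    merged = []
    for chrom, start, end, name in windows:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]  # overlaps the current locus: extend it
            merged[-1] = (chrom, prev[1], max(prev[2], end), prev[3] + [name])
        else:
            merged.append((chrom, start, end, [name]))
    return merged


loci = merge_windows([
    ("chr13", 98_000_000, 98_010_000, "a"),
    ("chr13", 99_500_000, 99_510_000, "b"),    # window overlaps "a": same locus
    ("chr13", 110_000_000, 110_010_000, "c"),  # separate locus
    ("chr5", 1_000_000, 1_001_000, "d"),
])
```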
Functional annotation
For spTWAS associations that were highly likely to colocalize (PP4 > 0.7), we performed transcript expression QTL (tQTL) mapping for the corresponding genes. In the tQTL analyses, variants within a cis-region extending 1 Mb up- and downstream of the TSS were tested for association with normalized transcript expression levels. tQTL mapping was performed with FastQTL,24 adjusting for age, sex, the first five genotype PCs, and the first 15 PEER factors as covariates. For all transcripts of a single gene, we used the Benjamini-Hochberg step-up method to calculate FDR, and FDR < 0.05 was considered statistically significant.
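A minimal NumPy sketch of the Benjamini-Hochberg adjustment follows; it is intended to match R's `p.adjust(method = "BH")`. Sorted p values are scaled by m/rank, made monotone from the largest p value downward, and capped at 1. The function name is illustrative.

```python
import numpy as np


def bh_fdr(p_values):
    """Benjamini-Hochberg adjusted p values (step-up), returned in input order."""
    p = np.asarray(p_values, float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p value downward keeps adjusted values monotone.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


q = bh_fdr([0.01, 0.04, 0.03, 0.005])  # approximately [0.02, 0.04, 0.04, 0.02]
```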
The effects of sQTL variants were annotated with Ensembl Variant Effect Predictor (VEP, version 109).44 We used Human Splicing Finder (version 3.1)45 and ESEfinder (release 3.0)46 to predict the effect of variants on alternative splicing.
Cell lines
Human NSCLC cell lines (A549 and PC9) were purchased from the cell library of Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (Shanghai, China). All cells tested negative for mycoplasma contamination. Cells were cultured in RPMI 1640 (cat. C11875500BT, Gibco) or DMEM (cat. C11995500BT, Gibco). Both media were supplemented with 10% fetal bovine serum (FBS) (cat. BC-SE-FBS01, Bio-Channel) and antibiotics (100 U/mL penicillin and 100 μg/mL streptomycin) (cat. 15140122, Gibco). Cells were maintained at 37°C in an atmosphere of 5% CO2.
Construction of plasmids
FARP1 minigenes containing the genomic fragment spanning exon 19 to exon 21 were subcloned into the pSPL3 vector (Invitrogen, USA). Each minigene carried either the G or the T allele of the sQTL variant rs35861926. The candidate alternative splicing events in FARP1 involve transcripts FARP1-011 (ENST00000595437) and FARP1-001 (ENST00000319562). We separately subcloned the cDNAs of two isoforms into the pcDNA3.1 vector (Invitrogen, USA): the full-length FARP1 isoform (long transcript FARP1-011, ENST00000595437) and the truncated FARP1 isoform (short transcript FARP1-001, ENST00000319562).
Minigene splicing assay
A549 and PC9 cells were seeded in six-well plates and transfected with allele-specific FARP1 minigene plasmids by using X-tremeGENE HP DNA transfection reagent (cat. 6366236001, Roche). Cells were collected after transfection. Total RNA was isolated from the cells with TRIzol reagent (cat. 15596018, Invitrogen, USA) and reverse transcribed. To detect splicing alterations, the amplified minigene-specific cDNA products were separated by agarose gel electrophoresis and imaged with a Bio-Rad imaging system.
Quantitative reverse-transcription PCR (qRT-PCR)
Total RNA was extracted from cells with TRIzol reagent (cat. 15596018, Invitrogen, USA) according to the manufacturer's instructions. The RNA was reverse transcribed under standard conditions with a PrimeScript RT reagent Kit (cat. RR047A, Takara, Dalian). qRT-PCR was then performed with SYBR Green (cat. RR420A, Takara, Dalian). The primer sequences are listed in Table S3.
Cell proliferation assays
After transfection, cell viability was measured with an MTT kit (cat. 298-93-1, Biofroxx) according to the manufacturer's instructions. For the colony formation assay, a prespecified number of transfected cells was seeded in each well of six-well plates. Cells were maintained in medium containing 10% FBS for 2 weeks, with the medium replaced every 4 days. Colonies were then fixed with formaldehyde and stained with crystal violet (cat. C0121, Beyotime). After rinsing with PBS, visible colonies were counted. EdU (5-ethynyl-2′-deoxyuridine) incorporation was assessed with the Cell-Light™ EdU Apollo 567 In Vitro Kit (cat. C10310-1, RiboBio, Guangzhou, China) according to the manufacturer's protocol.
Cell migration assays
Migration assays were performed with Transwell inserts with an 8 μm pore size (cat. 3422, Corning, USA). Transfected cells in 400 μL of serum-free medium were seeded in the upper chamber, and medium containing 10% FBS was added to the lower chamber. After incubation, cells were fixed with methanol and stained with 0.1% crystal violet (cat. C0121, Beyotime). They were then imaged and counted under an IX71 inverted microscope (Olympus, Tokyo, Japan).
Statistical analysis
Continuous and categorical variables were compared with Student's t test and the chi-square test, respectively. Correlation between continuous variables was assessed with Spearman's rank correlation test. All reported p values are two-sided. Statistical analyses were conducted in R (v.3.3.2). Venn diagrams and circos plots were generated with the VennDiagram (version 1.7.0), circlize (version 0.4.13),47 and ComplexHeatmap (version 2.2.0)48 R packages. Other plots were generated with base R graphics and the ggplot2 package (version 2.2.1)49 in R (v.3.3.2).
Results
Identification and characterization of sQTLs in the lung tissues
We genotyped blood-derived DNA samples by WGS and quantified alternative splicing events in normal lung tissues from 116 donors in the NJLCC study. We then investigated the genetic control of alternative splicing. After quality control of genotypes and intron usage ratios, 123,274 alternatively spliced introns (in 15,514 genes) and 8,001,957 genetic variants were included in further analyses. We next performed sQTL analysis to identify cis-regulatory variants affecting alternative splicing. At FDR < 0.05, we identified 1,385 sGenes and 378,210 significant variant-intron pairs involving 3,232 sIntrons (Figure 1A; Table S4). To characterize the genomic distribution of sQTL variants, we calculated the distance from each sQTL variant to the corresponding (nearest) splice junction. sQTL variants clustered around splice junctions: 60% of sSNPs fell within 10 kb of the splice junction (Figure 1B). The majority of sSNPs (57%) lay within the body of the gene in which the corresponding splicing event occurs (Figure 1C).
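In principle, the distance summary in Figure 1B can be computed with a binary search over sorted junction coordinates. This sketch assumes that variants and junction coordinates lie on the same chromosome and genome build. It counts a variant as "within 10 kb" when its distance is at most 10,000 bp; the authors' exact convention and junction definition may differ. Coordinates below are hypothetical.

```python
import bisect


def distance_to_nearest_junction(pos, junctions):
    """Absolute distance (bp) from pos to the closest coordinate in sorted `junctions`."""
    i = bisect.bisect_left(junctions, pos)
    neighbours = junctions[max(i - 1, 0):i + 1]  # junctions just before and at/after pos
    return min(abs(pos - j) for j in neighbours)


def fraction_within(positions, junctions, window=10_000):
    """Fraction of variants whose nearest junction is at most `window` bp away."""
    distances = [distance_to_nearest_junction(p, junctions) for p in positions]
    return sum(d <= window for d in distances) / len(distances)


junctions = [1_000, 5_000, 50_000]  # hypothetical splice-junction coordinates
share = fraction_within([4_000, 60_000, 30_000], junctions)
```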
Figure 1.
Identification and characterization of sQTLs, comparison of sQTLs and eQTLs, and enrichment of NSCLC GWAS variants in lung sQTLs
(A) Results of the splicing quantitative trait locus (sQTL) analysis of 116 human lung tissues from the Nanjing Lung Cancer Cohort (NJLCC) study. For each gene, variant-intron pairs with a p value below the gene-level nominal p value threshold were considered significant, and the corresponding introns were called sQTL-harboring introns (sIntrons).
(B) Position of sQTL variants in relation to the splice junction.
(C) Percentage (%) of sSNPs (the most significant variant per sIntron) located in or outside the corresponding gene.
(D) Venn diagram showing the overlap of lung sGenes between the NJLCC and the Genotype-Tissue Expression (GTEx) project.
(E) Left, p value distribution of NJLCC sSNPs in GTEx lung tissues. Right, the direction of effect is consistent for the majority (96.3%) of NJLCC sSNPs in GTEx lung tissues.
(F) Enrichment of sSNPs in functional annotations. Bar height represents the fold change of the observed number of sSNPs overlapping a given annotation relative to the expected number, estimated from matched control variants that are not sSNPs (see Material and methods). Bar colors: 15 chromatin states (green, FDR < 0.05; gray, FDR ≥ 0.05); histone modifications (orange, FDR < 0.05); RNA-binding protein (RBP) eCLIP peaks (violet blue, FDR < 0.05).
(G) Venn diagram showing the overlap of sQTL-harboring genes (sGenes) and expression quantitative trait locus (eQTL)-harboring genes (eGenes) in NJLCC lung tissues.
(H and I) Distributions of the distance in base pairs (H) and linkage disequilibrium r² (I) between the lead eQTL (the most significant eQTL variant per eGene) and the lead sQTL (the most significant sQTL variant per gene) for genes harboring both.
(J) Enrichment of non-small-cell lung cancer (NSCLC) GWAS SNPs (GWAS p value < 10⁻⁴) among sQTL variants. The GWASs of NSCLC and its histological subtypes were conducted in Chinese populations. The points indicate enrichment log-odds ratios. The bars represent 95% confidence intervals (95% CIs). ∗1 × 10⁻⁶ ≤ p value < 0.005; ∗∗∗p value < 1 × 10⁻⁶.
The GTEx project has released lung sQTLs derived from a genome-wide genotype and splicing dataset of 515 individuals.15 We used several statistics to assess the extent of sQTL sharing between NJLCC and GTEx. First, 1,023 (73.9%) of the 1,385 sGenes in NJLCC were also sGenes in GTEx lung tissues (Figure 1D). Second, we tested the NJLCC sSNPs for association with the corresponding introns in GTEx lung tissues. Of the 3,232 sIntrons in NJLCC, 1,012 (31.3%) were not detected in GTEx lung tissues, and the sSNPs for another 308 sIntrons were not captured by GTEx. As a result, association statistics could be computed in GTEx lung tissues for 59.2% of the NJLCC sSNPs. These NJLCC sSNPs yielded a Storey's π1 of 0.896, and 96.3% of them showed the same direction of effect in GTEx lung tissues (Figure 1E).
We also performed sQTL analyses separately in normal lung tissues of smokers and non-smokers and identified 307 and 528 sGenes, respectively (FDR < 0.05). At this threshold, 239 sGenes were shared. Thus, 22.1% of the sGenes in smokers and 54.7% of those in non-smokers appeared specific to that group (Figure S2A). Using the more sensitive Storey's π1 statistic,28 we estimated that 86.3% of sSNPs in smokers were shared with non-smokers (Storey's π1 = 0.863; Figure S2B). Conversely, 81.4% of sSNPs in non-smokers were shared with smokers (Storey's π1 = 0.814; Figure S2C). The direction of effect was consistent between smokers and non-smokers for most sSNPs (96.5%; Figure S2D).
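The sharing percentages in the two paragraphs above follow directly from the reported counts. A short check:

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# NJLCC vs GTEx lung tissues
shared_sgenes = pct(1023, 1385)          # sGenes also found in GTEx
missing_introns = pct(1012, 3232)        # sIntrons not detected in GTEx
testable = pct(3232 - 1012 - 308, 3232)  # sSNPs with statistics in GTEx
# Smokers vs non-smokers (239 shared sGenes)
smoker_only = pct(307 - 239, 307)
nonsmoker_only = pct(528 - 239, 528)
print(shared_sgenes, missing_introns, testable, smoker_only, nonsmoker_only)
# 73.9 31.3 59.2 22.1 54.7
```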
sQTLs are enriched in actively transcribed regions, genetic regulatory elements, and splicing-factor-binding sites
Accumulating evidence shows that sQTLs are enriched in various genetic regulatory elements.10,15,26 We therefore evaluated the functional enrichment of sQTLs in genetic regulatory elements annotated in lung tissues or lung-derived cell lines. Lung sQTLs were significantly enriched in actively transcribed regions and enhancers. They were depleted in repressed chromatin states, namely heterochromatin, repressed Polycomb, and quiescent regions (Figure 1F). We also observed strong enrichment of lung sQTLs in several further annotations (Figures 1F and S3): transcription-factor-binding sites (TFBSs), open chromatin regions detected by ATAC-seq, CpG sites, and TAD boundaries. The same held for histone modification marks of promoters (H3K4me3), enhancers (H3K4me1 and H3K27ac), and transcribed regions (H3K36me3). By contrast, lung sQTLs were depleted in histone methylation marks associated with repressed Polycomb (H3K27me3) and heterochromatin regions (H3K9me3) (Figure S3). We then performed enrichment analysis for the binding sites of individual transcription factors. Lung sQTLs were significantly enriched in the binding sites of 24 of the 25 transcription factors assessed (Figure S4).
Pre-mRNA splicing is regulated through an extensive protein-RNA interaction network involving cis-elements within the pre-mRNA and trans-acting factors that bind to these cis-elements.16,50 To investigate whether lung sQTLs tag the binding sites of splicing regulatory proteins, we obtained publicly available CLIP-seq datasets for 26 experimentally validated human RNA-binding proteins (RBPs) with splicing regulatory functions.33,35 sSNPs were significantly enriched in the binding sites of 21 of these splicing regulatory proteins (Figure S5). The most enriched RBP was RBM5, followed by HNRNPH1, RBFOX2, ZRANB2, and HNRNPK (FDR < 0.05) (Figure 1F).
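Thresholds such as the FDR < 0.05 used for the RBP enrichment tests are often obtained with the Benjamini-Hochberg (BH) procedure. This section does not state which correction was applied (Storey’s q value, ref 25, is another common choice). The following is therefore an illustrative sketch, not the authors’ pipeline. It converts raw p values, for example one per RBP, into BH-adjusted values returned in their original order. The function names are hypothetical.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def n_significant(pvalues: Sequence[float], fdr: float = 0.05) -> int:
    """Count tests whose BH-adjusted p-value falls below the FDR threshold."""
    return sum(q < fdr for q in bh_adjust(pvalues))
```

Taking a running minimum from the largest rank downward ensures that a smaller raw p value never receives a larger adjusted value than a bigger one.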
sQTLs are mostly independent of eQTLs and enriched in NSCLC GWAS loci
To assess the relationship between sQTLs and eQTLs, we also performed eQTL analysis in 116 normal lung tissues from the NJLCC study and identified 3,438 eQTL-harboring genes (eGenes). Of the 1,385 sGenes, 425 (30.7%) were also eGenes (Figure 1G). Among these overlapping genes, the lead eQTL was located at least 10 kb away from the lead sQTL (the most significant sSNP per gene) for 268 (63.1%) genes (Figure 1H), and 284 (66.8%) genes had an r2 < 0.8 between the lead eQTL and the lead sQTL (Figure 1I). These features suggest that most sQTLs are distinct from eQTLs and that integrating sQTLs might provide additional insights into the etiology of NSCLC.
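The r2 < 0.8 criterion compares the lead eQTL and the lead sQTL by linkage disequilibrium. A simple genotype-based approximation is the squared Pearson correlation between the two variants’ allele dosages across individuals. Tools such as PLINK 1.9 (listed under Web resources) compute LD from reference panels and may differ slightly, for example by estimating haplotype-based r2. Both functions below, including the 0.8 cutoff helper, are illustrative sketches with hypothetical names.

```python
from typing import Sequence


def dosage_r2(dosage_a: Sequence[float], dosage_b: Sequence[float]) -> float:
    """Squared Pearson correlation of two variants' allele dosages (0, 1, 2)."""
    n = len(dosage_a)
    if n != len(dosage_b) or n < 2:
        raise ValueError("need two dosage vectors of equal length >= 2")
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a monomorphic variant has undefined r2")
    return cov * cov / (var_a * var_b)


def is_distinct_signal(dosage_a: Sequence[float], dosage_b: Sequence[float],
                       threshold: float = 0.8) -> bool:
    """Treat two lead variants as distinct signals when r2 is below the threshold."""
    return dosage_r2(dosage_a, dosage_b) < threshold
```

Because r2 squares the correlation, variants in perfect negative LD (opposite alleles always co-occurring) also give r2 = 1.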
Next, to evaluate the potential contribution of lung sQTLs to NSCLC risk in the Chinese population, we examined the enrichment of NSCLC GWAS variants (GWAS p < 1 × 10−4) in NJLCC lung sQTLs. Variants associated with NSCLC and with lung adenocarcinoma were significantly enriched in lung sQTLs (OR = 3.82, p = 1.51 × 10−3 for NSCLC; OR = 5.69, p = 1.96 × 10−7 for lung adenocarcinoma), whereas variants associated with lung squamous cell carcinoma were not (OR = 2.02, p = 0.33) (Figure 1J). These findings suggest that lung sQTLs play an important role in the development of NSCLC, particularly lung adenocarcinoma.
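This section reports enrichment as odds ratios but does not name the test. One common way to obtain such an OR and p value is Fisher’s exact test on a 2 × 2 table that cross-classifies variants by GWAS association (p < 1 × 10−4 or not) and by sQTL status. The sketch below implements that swapped-in technique, which is not necessarily the authors’ method. GREGOR, listed under Web resources, is an LD-aware alternative that uses matched control variants. The table layout and function name are hypothetical. For genome-scale counts, scipy.stats.fisher_exact is a faster, well-tested option.

```python
from math import comb
from typing import Tuple


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Odds ratio and two-sided Fisher's exact p value for [[a, b], [c, d]].

    Hypothetical layout: rows = GWAS-associated / not associated,
    columns = sQTL / not sQTL.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Two-sided p: sum over all tables no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p_value = sum(q for q in (prob(x) for x in range(lo, hi + 1))
                  if q <= p_obs * (1 + 1e-7))
    if b * c > 0:
        odds_ratio = (a * d) / (b * c)
    elif a * d > 0:
        odds_ratio = float("inf")
    else:
        odds_ratio = float("nan")
    return odds_ratio, min(1.0, p_value)
```

The hypergeometric formulation fixes the row and column totals. This is why the test is "exact" and remains valid for the sparse cells that arise when few variants pass a strict GWAS threshold.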
spTWAS prioritizes candidate susceptibility genes of NSCLC
To identify susceptibility genes of NSCLC, we performed spTWAS, using genotypes and intron usage ratios from NJLCC as the reference panel to reanalyze summary-level data from the NSCLC GWAS.5 We first quantified the cis-heritability of each intron usage ratio, defined as the fraction of phenotypic variance explained by variants within 1 Mb of the TSS of the gene. A total of 10,085 significantly cis-heritable introns (cis-heritability p < 0.05) were identified and retained for further analysis. spTWAS across these introns identified 23 alternative splicing events in 19 genes that were significantly associated with the risk of overall NSCLC or its histological subtypes (FDR < 0.05; Figure 2; Table S5). Of these, 18 alternative splicing events in 15 genes were located at known lung cancer susceptibility loci,4,5,6,7 including 3q28 (LEPREL1-AS1), 6p22.1–6p21.32 (the major histocompatibility complex [MHC] region), 9p21.3 (KLHL9), 11q23.3 (PHLDB1), 17q24.2 (WIPI1), and 20q11.23 (RPN2) (Figure 2; Table S5). These results help pinpoint the likely target gene of known susceptibility variants at each locus. Interestingly, the alternative splicing events of the remaining four genes did not overlap with known susceptibility loci of lung cancer (Table S2): one locus for overall NSCLC, 7q22.3 (RP11-325F22.2), and three for lung adenocarcinoma, 3q23 (XRN1), 8q23.1 (EIF3E), and 13q32.2 (FARP1) (Figure 2; Table S5). Colocalization analysis showed that the two significant alternative splicing events at 8q23.1 and the one at 13q32.2 were highly likely to colocalize with the GWAS signals (PP4 > 0.7) (Figure 2; Table S5). We therefore analyzed these two loci in detail.
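This section does not spell out the spTWAS statistic. In summary-statistic TWAS frameworks such as FUSION, the association Z score for one intron is built from three aligned inputs: the prediction weights w of the SNPs in that intron’s model, the GWAS Z scores z of those SNPs, and their LD correlation matrix R from a reference panel. The score is Z = w'z / sqrt(w'Rw). The sketch below implements that formula under the assumption that spTWAS follows the same construction. All inputs are hypothetical and must refer to the same SNPs and effect alleles.

```python
from math import erfc, sqrt

import numpy as np


def twas_z(weights, gwas_z, ld) -> float:
    """Summary-statistic TWAS Z = w'z / sqrt(w'Rw).

    weights: per-SNP prediction weights for one intron usage ratio
    gwas_z:  GWAS Z scores for the same SNPs, same effect alleles
    ld:      SNP-by-SNP LD correlation matrix from a reference panel
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    r = np.asarray(ld, dtype=float)
    if z.shape != w.shape or r.shape != (w.size, w.size):
        raise ValueError("weights, GWAS Z scores and LD matrix must be aligned")
    variance = float(w @ r @ w)
    if variance <= 0.0:
        raise ValueError("w'Rw must be positive")
    return float(w @ z) / sqrt(variance)


def two_sided_p(z: float) -> float:
    """Two-sided p value of a standard normal Z score."""
    return erfc(abs(z) / sqrt(2.0))
```

The denominator rescales the weighted sum by its variance under the null, so correlated SNPs in strong LD are not double-counted. For example, `two_sided_p(-3.94)` is roughly 8.1 × 10−5, in line with the spTWAS Z and p values reported for EIF3E at 8q23.1.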
Figure 2.
Manhattan plot for splice-transcriptome-wide association study of non-small-cell lung cancer
(A–C) Manhattan plots show −log10(p value) for associations of intron usage ratios with risk of (A) NSCLC, (B) lung adenocarcinoma, and (C) lung squamous cell carcinoma. The x axis represents chromosomal location, and the y axis represents −log10(p value). The red horizontal line denotes FDR < 0.05 (A and B). No alternative splicing event reached FDR < 0.05 for lung squamous cell carcinoma; the red horizontal line in (C) marks p = 1 × 10−5. spTWAS associations in FARP1 and EIF3E (red) were highly likely to colocalize (posterior probability for hypothesis 4 [PP4] > 0.7). The GWASs consisted of 13,327 NSCLC-affected individuals (including 8,762 individuals with lung adenocarcinoma and 3,860 individuals with lung squamous cell carcinoma) and 13,328 control individuals.
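PP4 is the posterior probability of hypothesis 4, a single causal variant shared by both traits, in the five-hypothesis colocalization framework. This section does not name the colocalization software. The sketch below assumes an approximate Bayes factor approach in the style of the coloc R package (coloc.abf). It computes Wakefield log approximate Bayes factors from per-SNP effect sizes and standard errors and combines them with the commonly used default priors p1 = p2 = 1 × 10−4 and p12 = 1 × 10−5. Two further choices are assumptions: the prior effect-size SDs (0.15 for a standardized quantitative trait such as intron usage, 0.2 for a case-control GWAS), and the function names.

```python
import numpy as np


def _logsumexp(x: np.ndarray) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = float(np.max(x))
    return m + float(np.log(np.sum(np.exp(x - m))))


def log_abf(beta, se, prior_sd: float) -> np.ndarray:
    """Wakefield log approximate Bayes factor for each SNP."""
    v = np.asarray(se, dtype=float) ** 2
    z2 = np.asarray(beta, dtype=float) ** 2 / v
    r = prior_sd ** 2 / (prior_sd ** 2 + v)
    return 0.5 * (np.log(1.0 - r) + r * z2)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5) -> np.ndarray:
    """Posterior probabilities [PP0, PP1, PP2, PP3, PP4] for SNPs in one region."""
    l1 = np.asarray(labf1, dtype=float)
    l2 = np.asarray(labf2, dtype=float)
    s1, s2, s12 = _logsumexp(l1), _logsumexp(l2), _logsumexp(l1 + l2)
    # H3 (two distinct causal SNPs) sums over all SNP pairs i != j:
    # log(exp(s1 + s2) - exp(s12)); the gap is <= 0 up to rounding.
    gap = min(s12 - (s1 + s2), 0.0)
    log_pairs = s1 + s2 + np.log(-np.expm1(gap)) if gap < 0.0 else -np.inf
    log_h = np.array([
        0.0,                                  # H0: neither trait associated
        np.log(p1) + s1,                      # H1: trait 1 only
        np.log(p2) + s2,                      # H2: trait 2 only
        np.log(p1) + np.log(p2) + log_pairs,  # H3: distinct causal variants
        np.log(p12) + s12,                    # H4: one shared causal variant
    ])
    return np.exp(log_h - _logsumexp(log_h))
```

The posteriors are normalized in log space, so the very large Bayes factors produced by strong GWAS or sQTL signals do not overflow.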
Two alternative splicing events of EIF3E at 8q23.1 were significantly associated with the risk of lung adenocarcinoma (EIF3E chr8: 109,241,424–109,247,227: spTWAS Z = 3.93, p = 8.65 × 10−5, PP4 = 0.73; EIF3E chr8: 109,245,901–109,247,227: spTWAS Z = −3.94, p = 8.13 × 10−5, PP4 = 0.73) (Table S5). These two cis-regulated alternative splicing events were highly correlated (Pearson r2 = 1) when imputed into the 1000 Genomes Project Phase 3 reference panel (504 East Asian [EAS] individuals), representing a single signal at the 8q23.1 locus. Conditional analyses showed that intron EIF3E chr8: 109,245,901–109,247,227 largely explained the GWAS signal in this region (Figure 3A). Furthermore, the GWAS signal for lung adenocarcinoma, with lead variant rs443680 (rs443680-G, OR = 1.09, 95% CI: 1.05–1.14, p = 4.62 × 10−5), colocalized with the sQTL of this splicing event (LD r2 = 0.99 between sSNP rs677031 and GWAS top SNP rs443680). After conditioning on rs677031, the associations between the intron and all other variants in this region were diminished (Figure 3B), suggesting that a single causal sQTL variant underlies this splicing event. The candidate alternative splicing events at 8q23.1 are contained in four EIF3E transcripts (EIF3E-001, EIF3E-002, EIF3E-009, and EIF3E-011) (Figure 3D). However, co-expression analysis coupled with transcript expression QTL analysis identified EIF3E-011, annotated as a protein-coding transcript in GENCODE v.19, as the potential target (FDR < 0.05) (Table S6). EIF3E-011 had a median expression level of 9.08 TPM and was the second most abundant transcript of EIF3E. The rs677031-G allele was associated with decreased usage of intron EIF3E chr8: 109,245,901–109,247,227 in normal lung tissues (β = −0.681, p = 6.69 × 10−12; Figures 3E and 3F).
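The conditional analyses above ask whether a variant’s association with intron usage survives adjustment for the lead sSNP. A minimal illustration uses ordinary least squares with the lead variant added as a covariate and reports the t statistic of the tested variant. The simulated genotypes, effect size, and function name below are hypothetical. Real sQTL models typically include additional covariates, such as genotype principal components and inferred hidden factors, which are not shown here.

```python
from typing import Optional

import numpy as np


def ols_tstat(y, x, covariates: Optional[np.ndarray] = None) -> float:
    """t statistic for x in the linear model y ~ 1 + x (+ covariates)."""
    y = np.asarray(y, dtype=float)
    columns = [np.ones_like(y), np.asarray(x, dtype=float)]
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float)
        # Accept a single covariate vector or an (n, k) matrix.
        columns.extend(cov.T if cov.ndim == 2 else [cov])
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = y.size - design.shape[1]
    sigma2 = float(resid @ resid) / dof
    var_beta = sigma2 * np.linalg.inv(design.T @ design)
    return float(beta[1] / np.sqrt(var_beta[1, 1]))


# Simulated example: intron usage driven only by the lead variant.
rng = np.random.default_rng(0)
n = 116
lead = rng.binomial(2, 0.3, size=n).astype(float)
# A nearby variant in strong but imperfect LD: re-draw ~10% of genotypes.
nearby = lead.copy()
redraw = rng.random(n) < 0.1
nearby[redraw] = rng.binomial(2, 0.3, size=int(redraw.sum()))
usage = -1.5 * lead + rng.normal(0.0, 1.0, size=n)

t_marginal = ols_tstat(usage, nearby)
t_conditional = ols_tstat(usage, nearby, covariates=lead)
```

In this setup the nearby variant shows a strong marginal association only because it tags the lead variant. Adding the lead variant as a covariate shrinks its t statistic toward zero, which is the pattern shown in the "after conditioning" Manhattan plots.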
Additionally, neither rs677031 nor any variant within a 1 Mb window around the TSS of EIF3E was associated with the overall expression of EIF3E (β = 0.037, p = 0.42 for rs677031-G; Figures 3C and 3G). Usage of intron EIF3E chr8: 109,245,901–109,247,227 was positively correlated with the expression of EIF3E-011 (ρ = 0.358, p = 8.05 × 10−5; Figure 3H), and rs677031-G was associated with decreased expression of EIF3E-011 in NJLCC lung tissues (β = −2.889, p = 3.82 × 10−15; Figure 3I), consistent in direction with the sQTL effect. These results indicate that EIF3E splicing, which modifies the expression of EIF3E-011 but not the total expression of EIF3E, might mediate the link between genetic variants at 8q23.1 and the risk of lung adenocarcinoma.
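According to the figure legends, the correlations between intron usage and transcript expression were evaluated with Spearman’s correlation test. Spearman’s ρ is the Pearson correlation of the ranks, with tied values assigned their average rank. The minimal pure-Python sketch below illustrates that definition. It omits the p value, which libraries such as scipy.stats.spearmanr also return. The function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Ranking makes ρ insensitive to monotone transformations. This suits skewed quantities such as TPM values, where a few highly expressed samples would otherwise dominate a Pearson correlation.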
Figure 3.
spTWAS associations at EIF3E implicate a target gene independent of genetic effects on total expression
(A–C) Manhattan plots of SNP-phenotype association before (gray) and after (blue) conditioning on the effect of cis-regulated intron splicing (EIF3E chr8: 109,245,901–109,247,227) or the top QTL: GWAS of lung adenocarcinoma (8,762 affected individuals and 13,328 control individuals) (A), sQTL (116 participants) (B), and eQTL (116 participants) (C). Two-sided p values were derived from the GWAS summary data (A) or calculated via linear regression (B and C).
(D) A gene-level view of EIF3E highlighting (dashed lines) the intron cluster of the lung adenocarcinoma-associated introns (EIF3E chr8: 109,245,901–109,247,227 and EIF3E chr8: 109,241,424–109,247,227) and EIF3E transcripts in this region. Transcripts with a median expression level > 0.1 transcripts per million (TPM) are shown. Protein-coding domain mappings are shown as rectangles.
(E) Differential intron usage ratios of EIF3E chr8: 109,245,901–109,247,227 and EIF3E chr8: 109,241,424–109,247,227 stratified by rs677031 genotypes.
(F and G) Boxplots of intron usage (EIF3E chr8: 109,245,901–109,247,227) (F) and overall expression of EIF3E (G), stratified by rs677031 genotypes. The thick line represents the median, the box represents the interquartile range (IQR), and the whiskers are the quartiles ± 1.5 × IQR.
(H) Scatterplots of normalized intron usage (EIF3E chr8: 109,245,901–109,247,227) and expression of transcript EIF3E-011 in 116 participants. Correlation between them was evaluated with Spearman’s correlation test.
(I) Boxplots for expression of transcript EIF3E-011, stratified by rs677031 genotypes. The thick line represents the median, the box represents the IQR, and the whiskers are the quartiles ± 1.5 × IQR.
We also identified another spTWAS association between FARP1 splicing and risk of lung adenocarcinoma at the 13q32.2 locus (FARP1 chr13: 99,090,112–99,091,058: spTWAS Z = 4.28, p = 1.91 × 10−5) (Figure 2; Table S5). The variant at 13q32.2 most significantly associated with risk of lung adenocarcinoma, rs35861926 (rs35861926-T, OR = 0.88, 95% CI: 0.82–0.93, p = 1.87 × 10−5; Figure 4A), was also the sSNP for intron FARP1 chr13: 99,090,112–99,091,058 (PP4 = 0.98). This intron is associated with the splicing of FARP1 exon 20, which might regulate the expression of the protein-coding transcript FARP1-011 (Figures 4D and S6). FARP1-011 had a median expression level of 0.98 TPM in NJLCC normal lung tissues and was the fifth most abundant transcript of FARP1. The rs35861926-T allele was associated with decreased usage of intron FARP1 chr13: 99,090,112–99,091,058 in normal lung tissues (β = −1.279, p = 1.45 × 10−18; Figures 4B, 4E, and 4F). Co-expression analysis showed a positive correlation between the usage of this intron and the expression of FARP1-011 (ρ = 0.395, p = 1.17 × 10−5; Figure 4G; Table S7). Consistently, rs35861926-T was associated with decreased expression of FARP1-011 (β = −0.504, p = 1.59 × 10−11; Figure 4H; Table S7). In contrast, rs35861926 was not associated with the overall expression of FARP1 (p = 0.95; Figures 4C and 4I). These results suggest that FARP1 splicing, which appears to control the expression of FARP1-011, might mediate the genetic effect of rs35861926 on the risk of lung adenocarcinoma.
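In annotation-free approaches such as LeafCutter (ref 22), an intron usage ratio such as the one for FARP1 chr13: 99,090,112–99,091,058 is the fraction of split reads supporting that intron out of all split reads in its intron cluster. An intron cluster is a group of introns that share splice sites. The minimal sketch below assumes that definition. The intron IDs, cluster assignments, and read counts are hypothetical, and LeafCutter’s own filtering and normalization steps are not reproduced.

```python
from collections import defaultdict
from typing import Dict


def intron_usage_ratios(junction_reads: Dict[str, int],
                        cluster_of: Dict[str, str]) -> Dict[str, float]:
    """Usage ratio per intron: its split reads / all split reads in its cluster."""
    cluster_totals: Dict[str, int] = defaultdict(int)
    for intron, reads in junction_reads.items():
        cluster_totals[cluster_of[intron]] += reads
    ratios: Dict[str, float] = {}
    for intron, reads in junction_reads.items():
        total = cluster_totals[cluster_of[intron]]
        # A cluster with no reads in this sample has an undefined ratio.
        ratios[intron] = reads / total if total > 0 else float("nan")
    return ratios


# Hypothetical sample: three introns in one cluster, one in another.
reads = {"intron_a": 30, "intron_b": 10, "intron_c": 60, "intron_d": 5}
clusters = {"intron_a": "clu_1", "intron_b": "clu_1",
            "intron_c": "clu_1", "intron_d": "clu_2"}
usage = intron_usage_ratios(reads, clusters)
```

Ratios within a cluster sum to 1. An allele that favors exon skipping therefore lowers the usage of the exon-flanking introns and raises the usage of the skipping junction, rather than changing total gene expression.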
Figure 4.
spTWAS association at FARP1 implicates a target gene independent of genetic effects on total expression
(A–C) Manhattan plots of SNP-phenotype association before (gray) and after (blue) conditioning on the effect of cis-regulated intron splicing (FARP1 chr13: 99,090,112–99,091,058) or the top QTL: GWAS of lung adenocarcinoma (8,762 affected individuals and 13,328 control individuals) (A), sQTL (116 participants) (B), and eQTL (116 participants) (C). Two-sided p values were derived from the GWAS summary data (A) or calculated with linear regression (B and C).
(D) A gene-level view of FARP1 highlighting (dashed lines) the intron cluster of the lung adenocarcinoma-associated intron (FARP1 chr13: 99,090,112–99,091,058), as well as the sQTL variant and FARP1 transcripts in this region. Transcripts with a median expression level > 0.1 TPM are shown. Protein-coding domain mappings are shown as rectangles.
(E) Differential intron usage ratio of FARP1 chr13: 99,090,112–99,091,058 stratified by rs35861926 genotypes.
(F) Boxplots of intron usage (FARP1 chr13: 99,090,112–99,091,058), stratified by rs35861926 genotypes. The thick line represents the median, the box represents the IQR, and the whiskers are the quartiles ± 1.5 × IQR.
(G) Scatterplots of normalized intron usage (FARP1 chr13: 99,090,112–99,091,058) and expression of transcript FARP1-011 in 116 participants. Correlation between them was evaluated with Spearman’s correlation test.
(H) Boxplots for expression of transcript FARP1-011, stratified by rs35861926 genotypes. The thick line represents the median, the box represents the IQR, and the whiskers are the quartiles ± 1.5 × IQR.
(I) Boxplots of overall expression of FARP1, stratified by rs35861926 genotypes. The thick line represents the median, the box represents the IQR, and the whiskers are the quartiles ± 1.5 × IQR.
The T allele of risk variant rs35861926 promotes alternative splicing of FARP1 exon 20 in lung adenocarcinoma
Because the susceptibility locus at 13q32.2 (FARP1) showed the strongest evidence for colocalization (PP4 = 0.98), we selected it for functional validation. As described above, the T allele of rs35861926 was associated with decreased usage of intron FARP1 chr13: 99,090,112–99,091,058 and decreased expression of FARP1-011. rs35861926 is a missense variant located in FARP1 exon 20 (c.2295T>G [p.Ile765Met] [GenBank: NM_001286839.2]). PolyPhen-2 and SIFT predicted this variant to be benign and tolerated, respectively (Table S8), indicating that rs35861926 may not exert its effect by altering a single amino acid. Notably, in silico functional annotation showed that rs35861926 falls within a potential exonic splicing enhancer (ESE) annotated by ESEfinder and Human Splicing Finder (Figure S7). We therefore hypothesized that the T allele of rs35861926 disrupts the ESE in FARP1 exon 20, reducing the activity of nearby splice sites and increasing the likelihood of exon skipping.
To test this hypothesis, we transfected A549 and PC9 cells with FARP1 minigene plasmids carrying either the rs35861926-G or the rs35861926-T allele (Figure 5A). Compared with rs35861926-G, rs35861926-T promoted skipping of FARP1 exon 20, resulting in decreased expression of the long transcript FARP1-011 (Figure 5B).
Figure 5.
The T allele of rs35861926 promotes alternative splicing of FARP1 exon 20 in lung adenocarcinoma
(A) FARP1 minigene vectors containing the genomic sequence from exon 19 through exon 21, carrying either the rs35861926 G or T allele, were subcloned into the pSPL3 vector.
(B) Minigene assays in A549 and PC9 cells were conducted to confirm the effects of rs35861926 on expression levels of FARP1-011 (long transcript). The experiments were independently replicated three times.
The long transcript of FARP1 promotes proliferation and migration of lung adenocarcinoma cells
To compare the functional roles of the two FARP1 transcripts, we overexpressed the long transcript (FARP1-011) and the truncated short transcript (FARP1-001) in A549 and PC9 cells. Overexpression of the long transcript FARP1-011 promoted cell viability and colony formation in both A549 and PC9 cells (Figures 6A and 6B). Similarly, EdU incorporation assays showed that FARP1-011 promoted the proliferation of A549 and PC9 cells (Figure 6C). Transwell assays revealed enhanced cell migration in the FARP1-011 group (Figure 6D).
Figure 6.
The long transcript of FARP1 promotes lung adenocarcinoma cell proliferation and migration
(A) The effect of FARP1-011 (long transcript) and FARP1-001 (short transcript) overexpression on the viability of A549 and PC9 cells. Results are shown as mean ± standard deviation (SD) from six independent experiments. Statistical significance was determined by Student’s two-sided t test, ∗p < 0.05, ∗∗p < 0.01.
(B–D) The effect of FARP1-011 and FARP1-001 transcript overexpression on colony formation abilities (B), proliferation abilities (C), and migration abilities (D) of A549 and PC9 cells. Results are shown as mean ± SD from three independent experiments. Statistical significance was determined by Student’s two-sided t test, ∗p < 0.05, ∗∗p < 0.01.
Discussion
In this study, we performed sQTL analysis in lung tissues from 116 Chinese participants using genome-wide genotype and RNA-splicing data. We identified 1,385 sGenes and 378,210 significant variant-intron pairs involving 3,232 sIntrons. Comprehensive characterization showed that these sQTLs were enriched in actively transcribed regions, genetic regulatory elements, and splicing-factor-binding sites. Moreover, sQTLs were largely independent of eQTLs and enriched in NSCLC GWAS loci. By integrating lung sQTLs with a large-scale NSCLC GWAS, we not only pinpointed the likely candidate genes at six known susceptibility loci for lung cancer but also identified four additional susceptibility loci that might exert their effects by regulating the alternative splicing of target genes. Functional experiments further revealed that the T allele of sSNP rs35861926 promoted FARP1 exon 20 skipping, reducing the expression level of the long transcript FARP1-011. Because FARP1-011 promoted the proliferation and migration of lung adenocarcinoma cells, its reduced expression could explain the decreased lung adenocarcinoma risk associated with the rs35861926-T allele.
Alternative splicing contributes to the functional diversity and complexity of proteins produced in tissues.51 The identification of sQTLs can therefore help clarify the functional effects of genetic variants on complex traits and diseases. The GTEx project has released lung sQTLs derived from a genome-wide genotype and splicing dataset comprising 515 individuals.15 We assessed the extent of sQTL sharing between NJLCC and GTEx lung tissues and found that 73.9% of NJLCC sGenes overlapped with those in GTEx lung tissues. In addition, association statistics for 59.2% of the NJLCC sSNPs could be obtained from GTEx lung tissues. These NJLCC sSNPs yielded a Storey’s π1 of 0.896 in GTEx lung samples, suggesting substantial sharing of sQTLs between the two lung collections. The remaining 40.8% of NJLCC sSNPs were not captured by GTEx lung tissues, which could result from differences in alternative splicing patterns and allele frequencies across ancestries. The fraction of sQTLs specific to NJLCC therefore deserves further evaluation to determine the extent to which they are ancestry specific.
Additionally, we found that lung sQTLs tended to be enriched in actively transcribed regions, transcription-factor-binding sites, histone modification marks (H3K4me3, H3K4me1, and H3K27ac), and open chromatin regions (detected by ATAC-seq), consistent with previous studies.15,26,51,52 We also observed significant enrichment of lung sQTLs in potential splicing regulatory elements, namely the binding sites of a number of RBPs such as RBM5, HNRNPH1, RBFOX2, HNRNPK, and PTBP1. Some of these regulators are known to be involved in NSCLC. For example, RBM5, a component of the spliceosome A complex, has been reported to inhibit the development of NSCLC by modulating the alternative splicing of apoptosis-related genes,53,54,55 and PTBP1 has been observed to promote the migration and invasion of NSCLC cell lines.56,57 These observations are consistent with the hypothesis that genetic variants in splicing regulatory elements change the likelihood that a splicing event occurs and might eventually affect disease risk.58
Although a substantial proportion of sGenes are expected to also be eGenes, most sQTLs are independent of eQTLs.15,17,26,59 We observed that 30.7% of sGenes were also eGenes. However, 66.8% of these overlapping genes had an LD r2 < 0.8 between the lead sQTL and the lead eQTL, and for 63.1% of them the lead sQTL was located at least 10 kb away from the lead eQTL. These results are consistent with previous observations that the majority of sQTLs are distinct from eQTLs and that integrating sQTLs can provide insights into disease etiology.15,26,60
Importantly, we observed significant enrichment of sQTLs among genetic variants associated with NSCLC risk (p < 1 × 10−4), suggesting that some risk variants might contribute to NSCLC risk by affecting alternative splicing. Our spTWAS identified ten loci harboring alternative splicing events associated with risk of NSCLC, four of which (7q22.3, 3q23, 8q23.1, and 13q32.2) did not overlap with known susceptibility loci of lung cancer (Table S2). Among these four loci, the alternative splicing of FARP1 (FARP1 chr13: 99,090,112–99,091,058) fully explained the GWAS signal at 13q32.2. FARP1 is a Rac guanine nucleotide exchange factor (Rac-GEF),61 and it is upregulated in EpCAM+ cells of human lung adenocarcinomas. FARP1 has also been shown to mediate Rac1-dependent migration and invasion of lung cancer cells upon activation of receptor tyrosine kinases such as EGFR and c-Met.61 However, the functions of FARP1 transcripts in the development of NSCLC remain unknown. Our results showed that the sSNP rs35861926 was associated with altered usage of intron FARP1 chr13: 99,090,112–99,091,058. Minigene splicing assays further revealed that the rs35861926-T allele promoted skipping of FARP1 exon 20, resulting in decreased expression of the long transcript FARP1-011. Compared with the truncated short transcript FARP1-001, overexpression of FARP1-011 significantly promoted the proliferation and migration of lung adenocarcinoma cells. These findings suggest that the sQTL variant rs35861926 could affect the risk of lung adenocarcinoma by promoting FARP1 exon 20 skipping and thereby downregulating the expression of the long transcript FARP1-011.
We also identified another spTWAS signal, for EIF3E at 8q23.1. EIF3E is a component of the multi-subunit eIF3 complex, which is essential for cap-dependent translation initiation.62 Decreased EIF3E expression is often observed in lung cancer and has been shown to induce epithelial-to-mesenchymal transition (EMT) through activation of the TGFβ signaling pathway in lung epithelial cells (A549).62 The lead sQTL variant rs677031 for intron EIF3E chr8: 109,245,901–109,247,227 was in high linkage disequilibrium (r2 = 0.99) with the GWAS top variant rs443680. The rs677031-G allele was associated with decreased usage of this intron and lower expression of EIF3E-011 in normal lung tissues. These results suggest that the genetic effects at 8q23.1 on the risk of lung adenocarcinoma might be mediated by EIF3E splicing that modulates the expression of EIF3E-011.
Our study has several strengths. We performed spTWAS of lung cancer and identified risk loci that could affect NSCLC risk by regulating alternative splicing. Additionally, by combining population-based analyses with functional experiments, we gained further insights into the biological mechanisms of NSCLC risk loci. However, this study also has several limitations. First, we considered only cis-sQTLs owing to the current challenges in identifying trans-sQTLs. Second, we used bulk RNA sequencing data and therefore could not detect cell-type-specific sQTLs. Third, we performed functional validation only for FARP1. The biological functions of other sGenes, such as SFTPC, a gene relevant to lung carcinogenesis,63 should be evaluated in the future. Fourth, it would be informative to determine whether the sQTLs pertain to specific NSCLC subtypes or to NSCLC overall. In recent years, different molecular typing methods have been developed to identify NSCLC subgroups, such as transcriptional subtypes,64,65,66,67,68 mutation subtypes,69 and proteome subtypes.70 Studies with larger samples would be useful to systematically analyze sQTLs in NSCLC and its subtypes. Finally, the sample size of our genotype and splicing dataset was relatively small, which limited the power of sQTL mapping and spTWAS analysis. Larger sample sizes and sQTL meta-analyses would allow the discovery of additional sQTLs and provide a deeper understanding of the genetic control of alternative splicing.
In summary, our study provides a comprehensive catalog of lung sQTLs. In addition, the combination of spTWAS analyses and functional experiments provided further insights into the molecular mechanisms underlying NSCLC risk loci. These findings suggest that alternative splicing should be considered more broadly in post-GWAS functional analyses.
Acknowledgments
The authors would like to thank all the individuals, research staff, and students who participated in this study. This work was supported by National Natural Science Foundation of China (81922061, 82003530, 81820108028, 81973123); Natural Science Foundation of Jiangsu Province (BK20200678); CAMS Innovation Fund for Medical Sciences (2019RU038); and National Science Foundation for Post-doctoral Scientists of China (2020M671545).
Declaration of interests
The authors declare no competing interests.
Published: August 9, 2023
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2023.07.008.
Contributor Information
Hongbing Shen, Email: hbshen@njmu.edu.cn.
Hongxia Ma, Email: hongxiama@njmu.edu.cn.
Web resources
1000 Genomes Project, https://www.internationalgenome.org
circlize R package, https://cran.r-project.org/web/packages/circlize/index.html
CLIPdb, http://111.198.139.65/RBP.html
ComplexHeatmap R package, https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html
ENCODE Project, https://www.encodeproject.org
Ensembl Variant Effect Predictor, https://asia.ensembl.org/info/docs/tools/vep/index.html
ESEfinder, https://esefinder.ahc.umn.edu
FastQC, http://www.bioinformatics.babraham.ac.uk/projects/fastqc
GENCODE, https://www.gencodegenes.org
ggplot2 R package, https://ggplot2.tidyverse.org
GREGOR, http://csg.sph.umich.edu/GREGOR
Human Splicing Finder, https://www.genomnis.com/access-hsf
PLINK 1.9, www.cog-genomics.org/plink/1.9
R software, https://www.r-project.org
Roadmap Epigenomics Project, https://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#core_15state
SpliceAid-F, http://srv00.recas.ba.infn.it/SpliceAidF
The Genotype-Tissue Expression (GTEx) project, https://gtexportal.org/home
VennDiagram R package, https://cran.r-project.org/web/packages/VennDiagram/index.html
Data and code availability
Genotype data and RNA-seq data from the Nanjing Lung Cancer Cohort (NJLCC) have not been deposited in a public repository because of consent restrictions. GWAS summary data and a summary of the RNA-seq data (i.e., summary statistics for lung sQTLs) are available from the corresponding author on request. Topologically associating domain (TAD) boundaries derived from genome-wide chromosome conformation capture (Hi-C) on the A549 cell line were obtained from the Gene Expression Omnibus (GEO) database (GEO: GSE92819).
References
- 1. Sung H., Ferlay J., Siegel R.L., Laversanne M., Soerjomataram I., Jemal A., Bray F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA. Cancer J. Clin. 2021;71:209–249. doi: 10.3322/caac.21660.
- 2. Bender E. Epidemiology: The dominant malignancy. Nature. 2014;513:S2–S3. doi: 10.1038/513S2a.
- 3. Chen Z.M., Peto R., Iona A., Guo Y., Chen Y.P., Bian Z., Yang L., Zhang W.Y., Lu F., Chen J.S., et al. Emerging tobacco-related cancer risks in China: A nationwide, prospective study of 0.5 million adults. Cancer. 2015;121:3097–3106. doi: 10.1002/cncr.29560.
- 4. Bossé Y., Amos C.I. A Decade of GWAS Results in Lung Cancer. Cancer Epidemiol. Biomarkers Prev. 2018;27:363–379. doi: 10.1158/1055-9965.EPI-16-0794.
- 5. Dai J., Lv J., Zhu M., Wang Y., Qin N., Ma H., He Y.Q., Zhang R., Tan W., Fan J., et al. Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations. Lancet Respir. Med. 2019;7:881–891. doi: 10.1016/S2213-2600(19)30144-4.
- 6. Byun J., Han Y., Li Y., Xia J., Long E., Choi J., Xiao X., Zhu M., Zhou W., Sun R., et al. Cross-ancestry genome-wide meta-analysis of 61,047 cases and 947,237 controls identifies new susceptibility loci contributing to lung cancer. Nat. Genet. 2022;54:1167–1177. doi: 10.1038/s41588-022-01115-x.
- 7. Wang C., Dai J., Qin N., Fan J., Ma H., Chen C., An M., Zhang J., Yan C., Gu Y., et al. Analyses of rare predisposing variants of lung cancer in 6,004 whole genomes in Chinese. Cancer Cell. 2022;40:1223–1239.e6. doi: 10.1016/j.ccell.2022.08.013.
- 8. Hao K., Bossé Y., Nickle D.C., Paré P.D., Postma D.S., Laviolette M., Sandford A., Hackett T.L., Daley D., Hogg J.C., et al. Lung eQTLs to help reveal the molecular underpinnings of asthma. PLoS Genet. 2012;8 doi: 10.1371/journal.pgen.1003029.
- 9. Bossé Y., Li Z., Xia J., Manem V., Carreras-Torres R., Gabriel A., Gaudreault N., Albanes D., Aldrich M.C., Andrew A., et al. Transcriptome-wide association study reveals candidate causal genes for lung cancer. Int. J. Cancer. 2020;146:1862–1878. doi: 10.1002/ijc.32771.
- 10. Tian J., Chen C., Rao M., Zhang M., Lu Z., Cai Y., Ying P., Li B., Wang H., Wang L., et al. Aberrant RNA Splicing Is a Primary Link between Genetic Variation and Pancreatic Cancer Risk. Cancer Res. 2022;82:2084–2096. doi: 10.1158/0008-5472.CAN-21-4367.
- 11. Gusev A., Lawrenson K., Lin X., Lyra P.C., Jr., Kar S., Vavra K.C., Segato F., Fonseca M.A.S., Lee J.M., Pejovic T., et al. A transcriptome-wide association study of high-grade serous epithelial ovarian cancer identifies new susceptibility genes and splice variants. Nat. Genet. 2019;51:815–823. doi: 10.1038/s41588-019-0395-x.
- 12. Lee Y., Rio D.C. Mechanisms and Regulation of Alternative Pre-mRNA Splicing. Annu. Rev. Biochem. 2015;84:291–323. doi: 10.1146/annurev-biochem-060614-034316.
- 13. de Figueiredo-Pontes L.L., Wong D.W.S., Tin V.P.C., Chung L.P., Yasuda H., Yamaguchi N., Nakayama S., Jänne P.A., Wong M.P., Kobayashi S.S., Costa D.B. Identification and characterization of ALK kinase splicing isoforms in non-small-cell lung cancer. J. Thorac. Oncol. 2014;9:248–253. doi: 10.1097/JTO.0000000000000050.
- 14. Ludlow A.T., Wong M.S., Robin J.D., Batten K., Yuan L., Lai T.P., Dahlson N., Zhang L., Mender I., Tedone E., et al. NOVA1 regulates hTERT splicing and cell growth in non-small cell lung cancer. Nat. Commun. 2018;9:3112. doi: 10.1038/s41467-018-05582-x.
- 15. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–1330. doi: 10.1126/science.aaz1776.
- 16. Park E., Pan Z., Zhang Z., Lin L., Xing Y. The Expanding Landscape of Alternative Splicing Variation in Human Populations. Am. J. Hum. Genet. 2018;102:11–26. doi: 10.1016/j.ajhg.2017.11.002.
- 17. Li Y.I., van de Geijn B., Raj A., Knowles D.A., Petti A.A., Golan D., Gilad Y., Pritchard J.K. RNA splicing is a primary link between genetic variation and disease. Science. 2016;352:600–604. doi: 10.1126/science.aad9417.
- 18. Wang C., Yin R., Dai J., Gu Y., Cui S., Ma H., Zhang Z., Huang J., Qin N., Jiang T., et al. Whole-genome sequencing reveals genomic signatures associated with the inflammatory microenvironments in Chinese NSCLC patients. Nat. Commun. 2018;9:2054. doi: 10.1038/s41467-018-04492-2.
- 19. Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. doi: 10.1038/ng1847.
- 20. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 21. Li B., Dewey C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinf. 2011;12:323. doi: 10.1186/1471-2105-12-323.
- 22. Li Y.I., Knowles D.A., Humphrey J., Barbeira A.N., Dickinson S.P., Im H.K., Pritchard J.K. Annotation-free quantification of RNA splicing using LeafCutter. Nat. Genet. 2018;50:151–158. doi: 10.1038/s41588-017-0004-9.
- 23. Stegle O., Parts L., Durbin R., Winn J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput. Biol. 2010;6 doi: 10.1371/journal.pcbi.1000770.
- 24. Ongen H., Buil A., Brown A.A., Dermitzakis E.T., Delaneau O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics. 2016;32:1479–1485. doi: 10.1093/bioinformatics/btv722.
- 25. Storey J.D., Tibshirani R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100.
- 26. Walker R.L., Ramaswami G., Hartl C., Mancuso N., Gandal M.J., de la Torre-Ubieta L., Pasaniuc B., Stein J.L., Geschwind D.H. Genetic Control of Expression and Splicing in Developing Human Brain Informs Disease Mechanisms. Cell. 2019;179:750–771.e22. doi: 10.1016/j.cell.2019.09.021.
- 27.Shabalin A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28:1353–1358. doi: 10.1093/bioinformatics/bts163. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Nica A.C., Parts L., Glass D., Nisbet J., Barrett A., Sekowska M., Travers M., Potter S., Grundberg E., Small K., et al. The architecture of gene regulatory variation across multiple human tissues: the MuTHER study. PLoS Genet. 2011;7 doi: 10.1371/journal.pgen.1002003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Roadmap Epigenomics Consortium. Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J., et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Rosenbloom K.R., Sloan C.A., Malladi V.S., Dreszer T.R., Learned K., Kirkup V.M., Wong M.C., Maddren M., Fang R., Heitner S.G., et al. ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res. 2013;41:D56–D63. doi: 10.1093/nar/gks1172. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.ENCODE Project Consortium A user's guide to the encyclopedia of DNA elements (ENCODE) PLoS Biol. 2011;9 doi: 10.1371/journal.pbio.1001046. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.ENCODE Project Consortium An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Yang Y.C.T., Di C., Hu B., Zhou M., Liu Y., Song N., Li Y., Umetsu J., Lu Z.J. CLIPdb: a CLIP-seq database for protein-RNA interactions. BMC Genom. 2015;16:51. doi: 10.1186/s12864-015-1273-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Zhu Y., Xu G., Yang Y.T., Xu Z., Chen X., Shi B., Xie D., Lu Z.J., Wang P. POSTAR2: deciphering the post-transcriptional regulatory logics. Nucleic Acids Res. 2019;47:D203–D211. doi: 10.1093/nar/gky830. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Giulietti M., Piva F., D'Antonio M., D'Onorio De Meo P., Paoletti D., Castrignanò T., D'Erchia A.M., Picardi E., Zambelli F., Principato G., et al. SpliceAid-F: a database of human splicing factors and their RNA-binding sites. Nucleic Acids Res. 2013;41:D125–D131. doi: 10.1093/nar/gks997. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Schmidt E.M., Zhang J., Zhou W., Chen J., Mohlke K.L., Chen Y.E., Willer C.J. GREGOR: evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach. Bioinformatics. 2015;31:2601–2606. doi: 10.1093/bioinformatics/btv201. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.1000 Genomes Project Consortium. Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., Abecasis G.R. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Iotchkova V., Ritchie G.R.S., Geihs M., Morganella S., Min J.L., Walter K., Timpson N.J., UK10K Consortium. Dunham I., Birney E., Soranzo N. GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals. Nat. Genet. 2019;51:343–353. doi: 10.1038/s41588-018-0322-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Gusev A., Ko A., Shi H., Bhatia G., Chung W., Penninx B.W.J.H., Jansen R., de Geus E.J.C., Boomsma D.I., Wright F.A., et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 2016;48:245–252. doi: 10.1038/ng.3506. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Yang J., Lee S.H., Goddard M.E., Visscher P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Dong P., Hoffman G.E., Apontes P., Bendl J., Rahman S., Fernando M.B., Zeng B., Vicari J.M., Zhang W., Girdhar K., et al. Population-level variation in enhancer expression identifies disease mechanisms in the human brain. Nat. Genet. 2022;54:1493–1503. doi: 10.1038/s41588-022-01170-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Siewert-Rocks K.M., Kim S.S., Yao D.W., Shi H., Price A.L. Leveraging gene co-regulation to identify gene sets enriched for disease heritability. Am. J. Hum. Genet. 2022;109:393–404. doi: 10.1016/j.ajhg.2022.01.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.McLaren W., Gil L., Hunt S.E., Riat H.S., Ritchie G.R.S., Thormann A., Flicek P., Cunningham F. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17:122. doi: 10.1186/s13059-016-0974-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Desmet F.O., Hamroun D., Lalande M., Collod-Béroud G., Claustres M., Béroud C. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res. 2009;37:e67. doi: 10.1093/nar/gkp215. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Cartegni L., Wang J., Zhu Z., Zhang M.Q., Krainer A.R. ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res. 2003;31:3568–3571. doi: 10.1093/nar/gkg616. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Gu Z., Gu L., Eils R., Schlesner M., Brors B. circlize Implements and enhances circular visualization in R. Bioinformatics. 2014;30:2811–2812. doi: 10.1093/bioinformatics/btu393. [DOI] [PubMed] [Google Scholar]
- 48.Gu Z., Eils R., Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32:2847–2849. doi: 10.1093/bioinformatics/btw313. [DOI] [PubMed] [Google Scholar]
- 49.Wickham H. Springer-Verlag; 2016. ggplot2: Elegant Graphics for Data Analysis. [Google Scholar]
- 50.Wang Z., Burge C.B. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA. 2008;14:802–813. doi: 10.1261/rna.876308. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Raj T., Li Y.I., Wong G., Humphrey J., Wang M., Ramdhani S., Wang Y.C., Ng B., Gupta I., Haroutunian V., et al. Integrative transcriptome analyses of the aging brain implicate altered splicing in Alzheimer's disease susceptibility. Nat. Genet. 2018;50:1584–1592. doi: 10.1038/s41588-018-0238-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Garrido-Martín D., Borsari B., Calvo M., Reverter F., Guigó R. Identification and analysis of splicing quantitative trait loci across multiple tissues in the human genome. Nat. Commun. 2021;12:727. doi: 10.1038/s41467-020-20578-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Fushimi K., Ray P., Kar A., Wang L., Sutherland L.C., Wu J.Y. Up-regulation of the proapoptotic caspase 2 splicing isoform by a candidate tumor suppressor, RBM5. Proc. Natl. Acad. Sci. USA. 2008;105:15708–15713. doi: 10.1073/pnas.0805569105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Bonnal S., Martínez C., Förch P., Bachi A., Wilm M., Valcárcel J. RBM5/Luca-15/H37 regulates Fas alternative splice site pairing after exon definition. Mol. Cell. 2008;32:81–95. doi: 10.1016/j.molcel.2008.08.008. [DOI] [PubMed] [Google Scholar]
- 55.Coomer A.O., Black F., Greystoke A., Munkley J., Elliott D.J. Alternative splicing in lung cancer. Biochim. Biophys. Acta. Gene Regul. Mech. 2019;1862 doi: 10.1016/j.bbagrm.2019.05.006. [DOI] [PubMed] [Google Scholar]
- 56.Li S., Shen L., Huang L., Lei S., Cai X., Breitzig M., Zhang B., Yang A., Ji W., Huang M., et al. PTBP1 enhances exon11a skipping in Mena pre-mRNA to promote migration and invasion in lung carcinoma cells. Biochim. Biophys. Acta. Gene Regul. Mech. 2019;1862:858–869. doi: 10.1016/j.bbagrm.2019.04.006. [DOI] [PubMed] [Google Scholar]
- 57.Sayed M.E., Yuan L., Robin J.D., Tedone E., Batten K., Dahlson N., Wright W.E., Shay J.W., Ludlow A.T. NOVA1 directs PTBP1 to hTERT pre-mRNA and promotes telomerase activity in cancer cells. Oncogene. 2019;38:2937–2952. doi: 10.1038/s41388-018-0639-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Wang G.S., Cooper T.A. Splicing in disease: disruption of the splicing code and the decoding machinery. Nat. Rev. Genet. 2007;8:749–761. doi: 10.1038/nrg2164. [DOI] [PubMed] [Google Scholar]
- 59.Qi T., Wu Y., Fang H., Zhang F., Liu S., Zeng J., Yang J. Genetic control of RNA splicing and its distinct role in complex trait variation. Nat. Genet. 2022;54:1355–1363. doi: 10.1038/s41588-022-01154-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Chen L., Ge B., Casale F.P., Vasquez L., Kwan T., Garrido-Martín D., Watt S., Yan Y., Kundu K., Ecker S., et al. Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells. Cell. 2016;167:1398–1414.e24. doi: 10.1016/j.cell.2016.10.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Cooke M., Kreider-Letterman G., Baker M.J., Zhang S., Sullivan N.T., Eruslanov E., Abba M.C., Goicoechea S.M., García-Mata R., Kazanietz M.G. FARP1, ARHGEF39, and TIAM2 are essential receptor tyrosine kinase effectors for Rac1-dependent cell motility in human lung adenocarcinoma. Cell Rep. 2021;37 doi: 10.1016/j.celrep.2021.109905. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Desnoyers G., Frost L.D., Courteau L., Wall M.L., Lewis S.M. Decreased eIF3e Expression Can Mediate Epithelial-to-Mesenchymal Transition through Activation of the TGFbeta Signaling Pathway. Mol. Cancer Res. 2015;13:1421–1430. doi: 10.1158/1541-7786.MCR-14-0645. [DOI] [PubMed] [Google Scholar]
- 63.Imielinski M., Guo G., Meyerson M. Insertions and Deletions Target Lineage-Defining Genes in Human Cancers. Cell. 2017;168:460–472.e14. doi: 10.1016/j.cell.2016.12.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Wilkerson M.D., Yin X., Hoadley K.A., Liu Y., Hayward M.C., Cabanski C.R., Muldrew K., Miller C.R., Randell S.H., Socinski M.A., et al. Lung squamous cell carcinoma mRNA expression subtypes are reproducible, clinically important, and correspond to normal cell types. Clin. Cancer Res. 2010;16:4864–4875. doi: 10.1158/1078-0432.CCR-10-0199. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Cancer Genome Atlas Research Network Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489:519–525. doi: 10.1038/nature11404. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Roh W., Geffen Y., Cha H., Miller M., Anand S., Kim J., Heiman D.I., Gainor J.F., Laird P.W., Cherniack A.D., et al. High-Resolution Profiling of Lung Adenocarcinoma Identifies Expression Subtypes with Specific Biomarkers and Clinically Relevant Vulnerabilities. Cancer Res. 2022;82:3917–3931. doi: 10.1158/0008-5472.CAN-22-0432. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Chen J., Yang H., Teo A.S.M., Amer L.B., Sherbaf F.G., Tan C.Q., Alvarez J.J.S., Lu B., Lim J.Q., Takano A., et al. Genomic landscape of lung adenocarcinoma in East Asians. Nat. Genet. 2020;52:177–186. doi: 10.1038/s41588-019-0569-6. [DOI] [PubMed] [Google Scholar]
- 68.Cancer Genome Atlas Research Network Comprehensive molecular profiling of lung adenocarcinoma. Nature. 2014;511:543–550. doi: 10.1038/nature13385. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Harada G., Yang S.R., Cocco E., Drilon A. Rare molecular subtypes of lung cancer. Nat. Rev. Clin. Oncol. 2023;20:229–249. doi: 10.1038/s41571-023-00733-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Lehtiö J., Arslan T., Siavelis I., Pan Y., Socciarelli F., Berkovska O., Umer H.M., Mermelekas G., Pirmoradian M., Jönsson M., et al. Proteogenomics of non-small cell lung cancer reveals molecular subtypes associated with specific therapeutic targets and immune evasion mechanisms. Nat. Cancer. 2021;2:1224–1242. doi: 10.1038/s43018-021-00259-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
Data Availability Statement
Genotype data and RNA-seq data from the Nanjing Lung Cancer Cohort (NJLCC) have not been deposited in a public repository because of consent restrictions. GWAS summary data and a summary of the RNA-seq data (i.e., summary statistics for lung sQTLs) are available from the corresponding author upon request. Topologically associating domain (TAD) boundaries derived from genome-wide chromosome conformation capture (Hi-C) in the A549 cell line were obtained from the Gene Expression Omnibus (GEO) database (GEO: GSE92819).