Nature Communications. 2025 Oct 30;16:9603. doi: 10.1038/s41467-025-64605-6

Long-read RNA-seq demarcates cis- and trans-directed alternative RNA splicing

Giovanni Quinones-Valdez 1, Kofi Amoah 2, Xinshu Xiao 1,2
PMCID: PMC12575695  PMID: 41168189

Abstract

Genetic regulation of alternative splicing constitutes an important link between genetic variation and disease. Nonetheless, RNA splicing is regulated by both cis-acting elements and trans-acting splicing factors. Determining splicing events that are directed primarily by the cis- or trans-acting mechanisms will greatly inform our understanding of the genetic basis of disease. Here, we show that long-read RNA-seq, combined with our new method isoLASER, enables a clear segregation of cis- and trans-directed splicing events for individual samples. The genetic linkage of splicing is largely individual-specific, in stark contrast to the tissue-specific pattern of splicing profiles. Analysis of long-read RNA-seq data from human and mouse revealed thousands of cis-directed splicing events susceptible to genetic regulation. We highlight such events in the HLA genes whose analysis was challenging with short-read data. We also highlight novel cis-directed splicing events in Alzheimer’s disease-relevant genes such as MAPT and BIN1. Together, the clear demarcation of cis- and trans-directed splicing paves the way for future studies of the genetic basis of disease.

Subject terms: Computational biology and bioinformatics, Transcriptomics


Genetic variants can influence how RNA is spliced, shaping disease risk. Here, the authors present isoLASER, a stand-alone method using long-read RNA sequencing to distinguish cis- and trans-directed splicing, revealing new insights into genetic regulation of splicing.

Introduction

Alternative splicing is an essential mechanism in eukaryotic cells that enables substantial transcriptomic diversity, playing critical roles in biology and disease1,2. Splicing is a closely regulated process, primarily governed by the interactions between cis-regulatory elements and trans-acting factors, such as RNA-binding proteins (RBPs). Disruption of cis-regulation of splicing by genetic variants is a primary link between genotypes and disease2,3. On the other hand, trans-factors are essential splicing regulators, orchestrating the substantial diversity of splicing profiles across cell types, tissues, and developmental stages1,4,5. In addition, the two aspects of splicing regulation are closely intertwined since interactions between the cis- and trans-regulators are essential for their function.

In recent years, an increasing amount of effort has been dedicated to studying the association between genetic variants and splicing. Using methods such as large-scale functional screens6–9, splicing quantitative trait loci (sQTL)10, allele-specific splicing11, machine learning12–15, or large-scale data-driven approaches16,17, many genetic variants have been uncovered that are associated with or cause splicing alterations. Such molecular associations can be further exploited to prioritize candidate causal genes for diseases or traits following genome-wide association studies (GWAS)18,19. Based on these studies, splicing is emerging as an essential molecular trait that informs the genotype-phenotype relationships.

Interestingly, recent studies showed that not all splicing events are equally susceptible to aberrant disruption by genetic variants20,21. Certain exons, named hotspot exons, were shown to be prone to exon skipping and enriched with splice-disrupting variants21. The vulnerability of an exon to splice-altering variants may depend on its sequence context and basal exon inclusion level20,22. For exons genetically regulated by cis-variants, many exhibit signs of positive selection6,23 with enrichment in specific biological processes, such as immune-related pathways11,23. Similarly, species-specific splicing events are more often cis-directed than trans-directed24–26. Together, these studies suggest that certain exons are more prone to cis-disruption than others, although every exon is controlled by both cis- and trans-acting regulators.

Here, we demonstrate the efficacy of long-read RNA-seq data in discerning cis- and trans-directed splicing events. Specifically, the cis-directed events are characterized by allele-specific alternative splicing patterns, which can be readily identified with our new method, isoLASER. In contrast, trans-directed events exhibit no linkage between haplotypes and splicing. We map the global landscape of cis-and trans-directed splicing by analyzing long-read RNA-seq data from human and mouse samples. Notably, our approach successfully uncovers cis-directed splicing in the highly polymorphic HLA system, which is difficult to achieve with short-read sequencing data. Applied to data derived from Alzheimer’s patients, we report disease-specific cis-directed events, including those in the HLA family and other genes known to contribute to the pathogenesis of the disease. This study delineates cis- and trans-directed splicing events, opening avenues for exploring disease mechanisms and potential therapeutic targets.

Results

Long-read RNA-seq uncovers cis- and trans-directed splicing events

Long-read RNA-seq possesses a unique strength in uncovering full-length isoforms of each gene and, when combined with genotype information, may unveil haplotype-specific splicing and other alternative RNA processing events. To explore this application, we examined long-read RNA-seq data generated by the ENCODE consortium in K562 cells using the PacBio Sequel II platform27 (“Methods” section). As an example, for the gene RIPK2, the long reads can be separated into two haplotypes (H1 and H2) based on the genotype of a heterozygous SNP (Fig. 1a). Strikingly, this analysis clearly uncovered two classes of alternatively spliced (AS) events. For one class of events (highlighted in blue, Fig. 1a), exon inclusion was observed almost exclusively in only one haplotype (H1). In contrast, the second class of AS events (highlighted in red, Fig. 1a) had approximately the same level (56% in H1 vs. 61% in H2) of exon inclusion in the two haplotypes. Thus, the haplotype-specific nature of the first class of exons reflects a dominating role of cis-regulatory genetic variants on splicing (with the causal variants being unknown), hereafter referred to as “cis-directed” exons. For the second class of events, genetic modulation is not the dominant factor in the splicing regulation. Hence, we call this class “trans-directed” exons to reflect the dominating role of trans-factors, rather than cis-acting genetic variants, in controlling these alternative splicing events. Note that this classification reflects the prevailing regulatory mechanisms of an exon in a specific cellular context, which does not override the fact that a combination of cis-elements and trans-acting factors controls each splicing event. In addition, we note that the above definition of cis-directed exons refers to those linked to heterozygous variants in a specific sample. However, cis-directed exons may be regulated by homozygous variants, which are not identifiable using a single sample, but can be identified using multi-sample comparisons (below).
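The haplotype comparison described above can be illustrated with a minimal sketch. It assumes each read has already been assigned to a haplotype and scored for inclusion of one exonic part; the function name `haplotype_psi` and the toy read counts are hypothetical and are not taken from isoLASER.

```python
from collections import defaultdict

def haplotype_psi(reads):
    """Compute percent-spliced-in (PSI) per haplotype.

    reads: iterable of (haplotype, includes_exon) pairs, e.g. ("H1", True).
    """
    counts = defaultdict(lambda: [0, 0])  # haplotype -> [inclusion reads, total reads]
    for hap, included in reads:
        counts[hap][0] += int(included)
        counts[hap][1] += 1
    return {hap: inc / total for hap, (inc, total) in counts.items()}

# Toy data (hypothetical): inclusion is seen almost only on H1,
# the pattern expected for a cis-directed exonic part.
reads = ([("H1", True)] * 19 + [("H1", False)] * 1
         + [("H2", True)] * 1 + [("H2", False)] * 19)
psi = haplotype_psi(reads)
delta_psi = psi["H1"] - psi["H2"]
print(psi, delta_psi)
```

A trans-directed exonic part would instead give similar PSI values on both haplotypes and a delta PSI near zero.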

Fig. 1. isoLASER and dataset overview.

a Examples of a cis- and a trans-directed exonic part in the gene RIPK2 in K562 cells. Reads were grouped by haplotype (H1 vs. H2) based on the genotype of an exonic SNP (black arrowhead). The cis- and trans-directed events are highlighted in blue and pink, respectively. b Illustration of the three major steps in isoLASER: de novo variant calling, haplotype phasing, and linkage analysis. c Overview of the ENCODE long-read RNA sequencing samples. d Heatmap of pairwise Pearson’s correlations using the PSI of all alternatively spliced exonic parts in human (left) and mouse (right) tissues. Tissue types and donor IDs are shown. Donors who only contributed one tissue are colored in shades of gray. Similarly, tissues present in only one donor are colored in shades of gray. Meila’s variation of information (VI) is given for the clustering consistency with the tissue VI(T) and donor VI(D). Lower values indicate a higher correlation. The lowest VI value is bolded. e Similar to (d), but using the adjusted mutual information (AMI) instead of the PSI for Pearson’s correlation.

Next, to systematically identify cis- and trans-directed events, we developed a new computational method called isoLASER (isoform-Level analysis of Allele-Specific processing of Exonic Regions). Although the final goal is to analyze allelic linkage of splicing, isoLASER provides a one-stop solution by performing three major tasks: de novo variant calls, gene-level phasing of variants, and linkage testing between the phased haplotypes and the alternatively spliced exonic segments (i.e., allelic linkage of splicing; Fig. 1b, “Methods” section).

Before diving into the technical details and benchmarking of our tool, we highlight a key result that further motivates the need to classify splicing events. Using isoLASER, we analyzed all long-read RNA-seq data from human and mouse tissues/cell lines generated by the ENCODE consortium27 (Fig. 1c). As an initial overview of global splicing profiles, we report the PSI values of AS exonic parts and clustered the human or mouse tissues based on the Spearman correlation of their PSI values. Consistent with previous literature, the samples were clustered primarily according to their tissue of origin28 (Fig. 1d). Next, we report the adjusted mutual information (AMI) of the same exonic parts to quantify the allelic linkage between genetic variants and splicing levels of the same set of exonic parts (“Methods” section). It should be noted that this calculation was carried out for all AS exons that resided in reads harboring heterozygous SNPs, not restricted to significant cis-directed events. Remarkably, the AMI-based clustering segregated the samples primarily based on donor identity rather than tissue of origin (Fig. 1e). The clustering and tissue/donor labels were evaluated using Meila’s Variation of Information, where smaller values indicate higher correlation. This result strongly suggests that, despite splicing being highly tissue-specific, an individual’s genetic background plays an important role in shaping their overall splicing profile. Additionally, this observation underscores the potential widespread distinction of cis- and trans-directed splicing events, as illustrated in Fig. 1a.
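To make the clustering evaluation concrete, the following is a minimal sketch of Meila’s Variation of Information between a clustering and a set of labels, computed as VI = H(A|B) + H(B|A) with natural logarithms. The cluster assignments and the donor and tissue labels are toy values, and the function name is an assumption for illustration.

```python
import math
from collections import Counter

def variation_of_information(labels_a, labels_b):
    """Meila's VI between two labelings of the same samples (0 = identical partitions)."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # Adds this cell's contribution to H(B|A) + H(A|B).
        vi -= p_ab * (math.log(n_ab / count_a[a]) + math.log(n_ab / count_b[b]))
    return vi

# Toy example (hypothetical): four samples whose clusters follow the donor, not the tissue.
clusters = [0, 0, 1, 1]
donors = ["d1", "d1", "d2", "d2"]
tissues = ["heart", "liver", "heart", "liver"]
vi_donor = variation_of_information(clusters, donors)
vi_tissue = variation_of_information(clusters, tissues)
print(vi_donor, vi_tissue)
```

In this toy case VI(D) is zero and VI(T) is larger, which is the direction reported for the AMI-based clustering.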

The isoLASER method to detect cis- and trans-directed events in long-read RNA-seq

As a first step, isoLASER conducts variant calling using the long-read RNA-seq data. It uses a local reassembly approach based on de Bruijn graphs to identify nucleotide variation at the read level, followed by a multi-layer perceptron classifier to discard false positives (see “Methods” section). We trained this classifier using RNA sequencing data from the GM12878 cell line and variant calls from the Genome In A Bottle consortium29. This classifier achieves a training performance with the AUC between 0.92 and 0.99 for the receiver operating characteristic (ROC) curve and between 0.86 and 0.99 for the precision-recall curve (“Methods” section, Supplementary Fig. 1a). To evaluate its performance on testing data, we followed the variant calling benchmark protocol in de Souza et al.30 (“Methods” section). Using genotyped long-read RNA-seq data from HG002 cells, isoLASER achieved F1 scores similar to or higher than those of GATK’s Haplotype Caller31, DeepVariant32, and Clair333 (with their corresponding preprocessing) but with superior precision (Fig. 2a, Supplementary Data 1), which is a desirable feature in typical applications.
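For reference, the benchmark metrics named above can be computed from a call set and a truth set as in the sketch below. The variant identifiers are made up, and the matching rule (exact identity of a variant key) is an assumption rather than the benchmark protocol used in the study.

```python
def precision_recall_f1(called, truth):
    """Precision, recall and F1 of a variant call set against a truth set."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)  # true positives: calls present in the truth set
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy example (hypothetical keys): a conservative caller that misses one true variant.
truth = {"chr1:100:A>G", "chr1:250:C>T", "chr2:75:G>A", "chr2:310:T>C"}
called = {"chr1:100:A>G", "chr1:250:C>T", "chr2:75:G>A"}
precision, recall, f1 = precision_recall_f1(called, truth)
print(precision, recall, f1)
```

Here every call is correct (precision 1.0) while one true variant is missed (recall 0.75), the kind of trade-off that favors precision.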

Fig. 2. isoLASER overview and benchmarking.

a Variant calling performance of isoLASER, GATK HaplotypeCaller (HC), DeepVariant, and Clair3 with the HG002 data. Precision and Recall are shown for different read coverage cutoffs. The dashed curves indicate F1 score thresholds. b Phasing comparison of heterozygous variants in HG002 between isoLASER, HapCUT2, and the diploid genome assembly-based phasing from the Human Pangenome Research Consortium (HPRC). Bar plots show the number of variants successfully phased by either or both methods. The pie charts above each bar depict the switch-error rate for isoLASER relative to HapCUT2 and HPRC, respectively. The very small orange wedge represents the low frequency of switch errors, indicating high phasing accuracy. No variants were observed that were phased by isoLASER only. c Number of genes with at least one cis-directed splicing event identified by isoLASER or with allele-specific transcript expression identified by LORALS in the ENCODE human tissue dataset. d A cis-directed event identified in the gene WASHC2C. No allele-specific transcript expression was identified for this gene using LORALS. The cis- and trans-directed exonic parts are highlighted in blue and red, respectively. The delta PSI of the cis-directed event calculated by isoLASER is shown on the side.

Following variant calling, isoLASER carries out gene-level phasing to identify haplotypes. Briefly, after read-level variant calls, an approach based on k-means read clustering is employed, using the variant alleles as values and weighted by the variant quality score (“Methods” section). This step simultaneously phases the variants and groups individual reads into their corresponding haplotypes (haplotagging). We compared the phasing performance of isoLASER with that of HapCUT234, a method designed to precisely phase variants using different sequencing protocols, including single-molecule long-read sequencing. We first used data from HG002 cells since telomere-to-telomere diploid assembly of this cell line is available35. IsoLASER and HapCUT2 were run using the same variant calls and alignment files. Over 99% of heterozygous variants phased by HapCUT2 or in the diploid assembly release were consistently phased by isoLASER (Fig. 2b), supporting the effectiveness of isoLASER in this step. Additionally, the switch-error rate (i.e., error in correctly maintaining the phase between adjacent heterozygous variants) was 0.15% and 0.1% relative to HapCUT2 and the diploid assembly, respectively, again supporting a high accuracy in haplotype phasing. We also benchmarked isoLASER in variant calling and phasing using data derived from WTC11 cells, a human-induced pluripotent stem cell line (Supplementary Fig. 1b, c), and observed similar results.
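The idea of clustering reads to phase variants can be sketched with a toy two-cluster k-means. This is an illustration under stated assumptions, not the isoLASER implementation: alleles are encoded as 0 (reference) or 1 (alternative), each variant carries one quality weight, positions a read does not cover are ignored, and the initialization (first read plus the read farthest from it) is a choice made here for determinism.

```python
def phase_reads(reads, weights, n_iter=20):
    """Toy weighted 2-means phasing.

    reads: list of dicts {variant_position: allele}, allele in {0, 1}.
    weights: dict {variant_position: quality weight}.
    Returns (haplotype tag per read, allele per position for each haplotype).
    """
    def dist(read, centroid):
        # Weighted squared distance over positions the read covers;
        # positions absent from the centroid are treated as uninformative (0.5).
        return sum(weights[p] * (a - centroid.get(p, 0.5)) ** 2 for p, a in read.items())

    first = dict(reads[0])
    farthest = dict(max(reads, key=lambda r: dist(r, first)))
    centroids = [first, farthest]
    tags = None
    for _ in range(n_iter):
        new_tags = [0 if dist(r, centroids[0]) <= dist(r, centroids[1]) else 1 for r in reads]
        if new_tags == tags:
            break  # assignments are stable
        tags = new_tags
        for k in (0, 1):
            sums, counts = {}, {}
            for read, tag in zip(reads, tags):
                if tag == k:
                    for p, a in read.items():
                        sums[p] = sums.get(p, 0) + a
                        counts[p] = counts.get(p, 0) + 1
            if counts:
                centroids[k] = {p: sums[p] / counts[p] for p in counts}
    haplotypes = [{p: round(v) for p, v in c.items()} for c in centroids]
    return tags, haplotypes

# Toy reads (hypothetical) over three heterozygous positions.
reads = [
    {100: 0, 200: 1}, {200: 1, 300: 0},
    {100: 1, 200: 0}, {200: 0, 300: 1},
    {100: 0, 200: 1, 300: 0}, {100: 1, 200: 0, 300: 1},
]
weights = {100: 1.0, 200: 1.0, 300: 0.5}
tags, haplotypes = phase_reads(reads, weights)
print(tags, haplotypes)
```

The returned tags correspond to haplotagging of reads, and the rounded centroids correspond to the phased alleles of each haplotype.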

Subsequently, the allelic linkage of splicing is analyzed for exonic segments (i.e., exonic parts) that are non-overlapping, unique exonic regions with distinct splicing patterns from each other. Exonic parts represent the basic units of exons that reflect local alternative splicing events, which enable event-specific genetic association analysis. For each gene, the allelic linkage between phased haplotypes and exonic parts was quantified by the AMI (“Methods” section). We simulated unlinked events for linkage testing to determine the AMI cutoffs at different read coverage levels to control false positives (Supplementary Fig. 2a, “Methods” section). Based on this analysis, we defined cis-directed events as those with AMI greater than 99% of the simulated background and an absolute delta Percent-Spliced-In (PSI difference between haplotypes) greater than 5%. In contrast, trans-directed events are those with AMI smaller than 95% of the simulated background at the corresponding read coverage level. Other events are defined as ambiguous. A quantile-quantile plot shows that cis-directed events identified in our dataset had a much higher AMI than expected by chance (“Methods” section) (Supplementary Fig. 2b), likely reflecting the stringency in calling cis-directed events. The higher AMI cutoff also corresponds to higher delta PSI values of the cis-directed events compared to all events (Supplementary Fig. 2c).
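The linkage test and the cis/trans/ambiguous rule can be sketched as follows. Several parts are assumptions made for illustration: the AMI is normalized by the arithmetic mean of the two entropies (one common definition), the background is simulated by drawing inclusion at random with probability 0.5 for a fixed number of reads, and all function names are hypothetical. The thresholds (99%, 95%, delta PSI of 5%) are those stated in the text.

```python
import math
import random
from collections import Counter

def _entropy(counts, n):
    return -sum(c / n * math.log(c / n) for c in counts if c)

def adjusted_mutual_information(x, y):
    """AMI between two label vectors (chance-corrected mutual information)."""
    n = len(x)
    a, b = Counter(x), Counter(y)
    joint = Counter(zip(x, y))
    mi = sum(nij / n * math.log(n * nij / (a[i] * b[j])) for (i, j), nij in joint.items())
    lf = lambda k: math.lgamma(k + 1)  # log(k!)
    emi = 0.0  # expected MI under random label assignment with fixed margins
    for ai in a.values():
        for bj in b.values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = (lf(ai) + lf(bj) + lf(n - ai) + lf(n - bj) - lf(n)
                         - lf(nij) - lf(ai - nij) - lf(bj - nij) - lf(n - ai - bj + nij))
                emi += nij / n * math.log(n * nij / (ai * bj)) * math.exp(log_p)
    denom = 0.5 * (_entropy(a.values(), n) + _entropy(b.values(), n)) - emi
    return (mi - emi) / denom if denom > 0 else 0.0

def simulate_background(n_reads, n_sim=200, seed=0):
    """AMI values for unlinked events at one coverage level (assumed simulation scheme)."""
    rng = random.Random(seed)
    hap = ["H1"] * (n_reads // 2) + ["H2"] * (n_reads - n_reads // 2)
    return [adjusted_mutual_information(hap, [rng.random() < 0.5 for _ in range(n_reads)])
            for _ in range(n_sim)]

def classify_event(ami, delta_psi, background):
    """Apply the stated rule: cis above 99% of background with |delta PSI| > 5%; trans below 95%."""
    rank = sum(bg < ami for bg in background) / len(background)
    if rank >= 0.99 and abs(delta_psi) > 0.05:
        return "cis"
    if rank < 0.95:
        return "trans"
    return "ambiguous"

# Toy events (hypothetical) with 40 reads each.
hap = ["H1"] * 20 + ["H2"] * 20
background = simulate_background(40)
linked = [True] * 20 + [False] * 20   # inclusion only on H1
unlinked = [True, False] * 20          # 50% inclusion on both haplotypes
ami_linked = adjusted_mutual_information(hap, linked)
ami_unlinked = adjusted_mutual_information(hap, unlinked)
label_linked = classify_event(ami_linked, 1.0, background)
label_unlinked = classify_event(ami_unlinked, 0.0, background)
print(ami_linked, label_linked, ami_unlinked, label_unlinked)
```

Because the cutoffs depend on read coverage, a separate background would be simulated for each coverage level before classification.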

In addition, we note that the sharing of cis-directed events between biological replicates is much higher than that between different cell lines (Supplementary Fig. 2d). Biological replicates also had significantly correlated AMI scores and delta PSI values (Supplementary Fig. 2e). These observations support a desirable reproducibility in the detection and effect sizes of cis-directed events between biological replicates. We also assessed the coverage requirement necessary to identify cis-directed events, which depends on the effect size or delta PSI of the events. Detection of those with high delta PSI (0.4 or greater) demands less coverage than that of events with low delta PSI (Supplementary Fig. 2f).

We compared sample-level allelic linkage results of isoLASER (Supplementary Data 2) to those of LORALS, a computational method for allele-specific analysis of long-read RNA-seq data36. We applied both methods to ENCODE data generated from human tissues (Fig. 1c) and data from the HG002 cell line. IsoLASER identified substantially more genes with allele-specific splicing events (i.e., cis-directed events) than LORALS in both the ENCODE (Fig. 2c) and HG002 data (Supplementary Fig. 3). Unlike isoLASER, LORALS requires previously genotyped and phased data (which are available for the HG002 data, thus incorporated in running LORALS). Using the HG002 data, we observed that, for genes with cis-events detected by both methods, the maximum absolute delta PSI from all exonic parts in each gene is relatively high (median close to 0.30), supporting the nature of cis-events (i.e., allele-specific splicing patterns) (Supplementary Fig. 3). Genes detected with cis-events by isoLASER, but not LORALS, also had relatively high delta PSI. In contrast, the delta PSI for events detected by LORALS only, or genes without any cis-events, is relatively low. Since delta PSI reflects the expected allelic bias in cis-events, the above data support the validity of isoLASER results.

A major conceptual distinction between the two methods is that LORALS focuses on isoform-level allelic linkage, whereas isoLASER analyzes individual exonic parts. Since a human gene may harbor multiple AS events, whose combinatorial inclusion may result in a large number of isoforms, it is expected that focusing on local AS events affords greater sensitivity than isoform-level approaches. To illustrate this difference, the gene WASHC2C has one exonic part spliced in an allele-specific manner identified by isoLASER (blue highlighted, Fig. 2d). In contrast, none of the 4 annotated transcripts showed significant allele-specific bias that was identifiable by LORALS. Furthermore, isoLASER quantifies the splicing difference between haplotypes using the delta PSI, providing a more interpretable metric of the splicing difference between haplotypes.

In the above description, isoLASER was applied to each sample individually. Applicability to a single sample is a major advantage of allele-specific analysis. Nonetheless, leveraging the existence of multiple samples, if available, is also important. We next extended isoLASER to jointly analyze multiple samples to achieve two objectives: (1) to prioritize likely functional variants and (2) to identify cis-events that were otherwise untestable or weak in individual samples. This approach allowed us to leverage the diverse genetic backgrounds of various donors (“Methods” section, Supplementary Fig. 4). Henceforth, we will refer to this joint analysis as isoLASER-joint and single-sample analysis as isoLASER. Notably, isoLASER-joint allows identification of cis-directed events driven by homozygous variants in a specific sample if alternative alleles are present across samples.

isoLASER analysis of human and mouse long-read RNA-seq data

We next applied isoLASER to all samples in Fig. 1c. In each tissue sample or cell line (human or mouse), we identified 2–946 cis-directed events (Fig. 3a, S5a–c, Supplementary Data 3). Across human (mouse) tissues, a total of 2047 (4679) unique cis-directed exonic parts in 1203 (2341) unique genes were identified, and for human (mouse) cell lines, a total of 3312 (1341) unique exonic parts and 1703 (778) unique genes were detected. About 13% of the cis-directed exonic parts were not annotated in GENCODE, thus denoted as novel exonic parts. As expected, samples with higher coverage had more cis-directed splicing events (Fig. 3a, S5a–c). Moreover, samples from F1 hybrid mouse strains demonstrated higher levels of cis-directed events than F0 inbred samples (Supplementary Fig. 5b, c). This observation aligns well with the larger number of heterozygous variants in hybrid mice, which facilitates the detection of cis-regulated splicing.

Fig. 3. Frequency of cis-directed events in human tissue samples.

a Number of unique cis-directed exonic events and the number of genes detected in each human tissue sample. The number of heterozygous variants, total read coverage, and donor membership are shown for each sample. b Fraction of genes with 1, 2, 3, 4, or 5+ cis-directed exons among all genes with any cis-directed events (number of genes = 42, 41, 37, 25, and 37, respectively). This fraction was calculated for each sample and visualized for all human tissue samples. All boxplots (same below) depict the median as the center line, the boxes define the interquartile range (IQR: 25th–75th percentiles), and the whiskers extend up to 1.5 times the IQR. c Sharing of cis-directed events, measured by the Jaccard index, between samples originating from different donors and different tissues (n = 691); different donors and same tissue (n = 53); and from the same donor but different tissues (n = 90). Nominal p-values were calculated using the Wilcoxon rank-sum test (two-sided). The p-values for the comparisons (top bracket to bottom) are 0.017, 2.22e-16, and 2.22e-16, respectively. d Number of variants, cis-directed exonic parts, and genes detected by merging human samples of the same tissue type using isoLASER-joint. DLPFC: dorsolateral prefrontal cortex. e Left: quantile-quantile plot showing the distribution of GTEx sQTL p-values in the ovary. Blue: splicing-associated variants (SAVs) discovered by isoLASER-joint that had sQTL p-values (regardless of significance) in similar GTEx tissues. Red: variants in trans-directed genes that had sQTL p-values. A two-sided KS test was used to calculate the difference between the two curves. Right: Correlation between delta PSI of the variants calculated by isoLASER and the genotype beta parameter from the linear regression calculated in GTEx sQTL mapping. Intron usage instead of PSI was used by GTEx as the splicing phenotype. Thus, a negative correlation is expected. The p-value and regression coefficient were calculated using Pearson’s correlation. The trendline was fit using a linear regression model. f DeepRiPe-predicted allelic binding difference between the alternative and reference alleles of all SAVs (n = 243) vs Control (n = 240). The p-value was calculated using a two-sided KS test (left). Allelic binding difference between the alternative and reference alleles of SAVs overlapping specific splicing-related RBP binding sites (right). Control variants are those in genes that do not contain a cis-directed event. Sample sizes (n) for each group are indicated above the boxes. Nominal p-values were calculated using a two-sided Wilcoxon rank-sum test. n.s. not significant; *p < 0.05; **p < 0.01; ***p < 0.001.

Among genes with cis-directed events, 30–40% had two or more such events (Fig. 3b, S5d–f), with some events in the same gene being co-regulated by the same genetic variant. Notably, cis-directed events are more frequently shared among different tissues originating from the same donor compared to those (same or different tissue types) from different donors (Fig. 3c, S5g, h). This observation is again consistent with the nature of cis-directed events, where splicing is closely dependent on the genetic background of the sample. It also aligns with the earlier observation that the samples were largely segregated by donors when clustered by AMI (Fig. 1e). In general, trans-directed events were more abundant than cis-directed events at the gene level (Supplementary Fig. 6a, top). Nonetheless, depending on the genetic background, a trans-directed event in one sample could be a cis-directed event in another. We observed that up to 2.6% of trans-directed events were cis-directed in another sample (Supplementary Fig. 6b, blue grids). Moreover, when comparing the trans-directed events in one sample to cis-directed events in all other samples, we found that between 7.6 and 12.5% of trans-directed events could be cis-directed in at least one other sample (Supplementary Fig. 6b, red column). If all human tissues are considered together, 15% of the testable events were cis-directed in at least one tissue, with 68% being trans-directed (the rest being ambiguous) (Supplementary Fig. 6a).
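The sharing of cis-directed events between two samples (Fig. 3c) is measured with the Jaccard index, which can be computed as in this minimal sketch. The event identifiers and sample groupings are hypothetical.

```python
def jaccard(events_a, events_b):
    """Jaccard index: shared events divided by all events seen in either sample."""
    a, b = set(events_a), set(events_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy example (hypothetical event IDs).
donor1_heart = {"e1", "e2", "e3", "e4"}
donor1_liver = {"e1", "e2", "e3", "e5"}
donor2_heart = {"e1", "e6", "e7", "e8"}
same_donor = jaccard(donor1_heart, donor1_liver)
diff_donor = jaccard(donor1_heart, donor2_heart)
print(same_donor, diff_donor)
```

In this toy case, two tissues from the same donor share more events than the same tissue from two donors, mirroring the trend described above.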

We next applied isoLASER-joint to each tissue/cell type (requiring ≥2 samples). The number of cis-directed events increased compared to the single-sample analysis (Fig. 3d, S7a, b), as expected. The increase in the number of cis-directed events can be attributed to isoLASER-joint’s ability to identify additional events that were missed in the single-sample analysis due to a lack of sample-level heterozygosity. Using this feature, between 4 and 29% of events that were previously found to be trans-directed at the single-sample level were detected as cis-directed at the joint level (Supplementary Fig. 7c). Further, we summarized all events detected by isoLASER-joint in human tissues to show the overall number and proportion of cis- and trans-directed events detected in each tissue. Taking all human tissues together at the joint level, 37% of all testable events were cis-directed in at least one tissue, 46% were trans-directed and the remaining 17% were ambiguous (Supplementary Fig. 7d). In addition, the genetic variants associated with cis-directed events (Supplementary Data 4) had significantly lower p-values in the sQTL analysis reported by the GTEx study10, compared to the p-values of matched controls (Fig. 3e left; Supplementary Fig. 7e). Furthermore, the effect sizes (delta PSI from isoLASER) of the genetic variants overlapping sQTLs were significantly correlated with the genotype beta parameter from the sQTL regression (Fig. 3e right, Supplementary Fig. 7f). These results indicate that isoLASER-joint achieves an sQTL-type of analysis, requiring only limited sample sizes.

Genetic variants of cis-directed events may alter RBP binding

Splicing is known to be regulated by the binding of trans-acting factors, mainly RBPs, to cis-regulatory elements within exons and introns37. We hypothesized that the cis-directed events identified with isoLASER may be regulated by RBPs binding to the associated genetic variants. Indeed, 44 variants (referred to as splicing-associated variants, SAVs, hereafter) associated with cis-directed events in human tissues (based on the joint analysis) were known to alter RBP binding in an allele-specific binding (ASB) study38. To carry out a comprehensive analysis, we first identified overrepresented hexamers surrounding SAV alleles associated with upregulated or downregulated splicing, respectively (“Methods” section). We then assigned the motifs to potential RBPs based on the RNA Bind-N-Seq (RBNS)37,39 data in which RBP binding motifs were experimentally determined (Supplementary Data 5). Among RBPs that bind to splicing motifs enriched with SAVs were the HNRNP (D/L/K) proteins40, which are known to disrupt splice site recognition or interfere with binding of splicing enhancers. Other well-known splicing regulators were detected in this analysis, such as KHSRP and SFPQ40,41, which have previously been determined to enhance intron integrity and binding of other RBPs to the intron41,42. Next, we asked whether the associated variants induced allele-specific protein-RNA interactions between RBPs and the SAVs. Using DeepRiPe43, we predicted the binding score difference between the reference and alternative alleles of each variant to the associated RBP. As controls, we sampled single-nucleotide variants from trans-directed genes. Overall, we observed that variants of cis-directed events significantly altered RBP-RNA interactions compared to the controls (Fig. 3f). These findings together help to interpret the functional relevance of variants in cis-directed alternative splicing.
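As a rough illustration of the first step of the hexamer analysis, the sketch below lists the hexamers that overlap a variant position for a given allele; counting these across many SAVs would be the input to an enrichment test. The sequence context, the 0-based offset convention, and the function name are assumptions made here, and the enrichment statistics described in the “Methods” section are not reproduced.

```python
from collections import Counter

def overlapping_kmers(context, offset, allele, k=6):
    """Return all k-mers that cover position `offset` after placing `allele` there.

    context: sequence surrounding the variant; offset: 0-based variant position in context.
    """
    seq = context[:offset] + allele + context[offset + 1:]
    first_start = max(0, offset - k + 1)
    last_start = min(offset, len(seq) - k)
    return [seq[s:s + k] for s in range(first_start, last_start + 1)]

# Toy example (hypothetical sequence): a G>A variant at offset 6.
context = "ACGTACGTACGTA"
ref_kmers = overlapping_kmers(context, 6, "G")
alt_kmers = overlapping_kmers(context, 6, "A")
alt_counts = Counter(alt_kmers)
print(ref_kmers)
print(alt_kmers)
```

Comparing hexamer counts between alleles linked to higher and lower exon inclusion is one simple way to look for candidate RBP motifs.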

isoLASER uncovers allele-specific splicing of HLA genes and enables HLA typing

Next, we conducted a Gene Ontology (GO) analysis of genes containing cis-directed splicing events in at least one human tissue sample (“Methods” section). The analysis revealed significant terms, including immune system process and innate immune response (Supplementary Fig. 7g). Among immune-related genes are PKR, CD44, LILRB3, YPEL5, and members of the immunoglobulin superfamily (CD146, BTN3A2) or the human leukocyte antigen (HLA) family (Supplementary Data 6). The enrichment of cis-directed events in immune-related genes supports the idea that genetically regulated splicing is an important mechanism underlying the genetic architecture of immune diversity44.

Among the above immune-related genes that harbored cis-directed events are multiple members of the HLA family, such as HLA-A, HLA-DPB1, and HLA-DQA1 (Fig. 4a, Supplementary Data 7). HLA molecules present peptides to immune cells, thus playing a pivotal role in T-cell activation and effective immune response45,46. As the most polymorphic genetic system in humans, HLA genes contribute significantly to heritability, surpassing all other known loci combined47,48. Although HLA typing is essential for understanding the immune response in health and disease, the roles of alternative splicing in HLA expression remain poorly understood.

Fig. 4. Allele-specific splicing of HLA genes.

a Summary of cis-directed events in the HLA gene family. Coordinates on the x-axis indicate the starting position of the exonic parts and are grouped by gene. The samples reflected by the y-axis are grouped by donor. The size of the dot indicates the effect size (delta PSI) of the events. Both cis- and trans-directed events are shown (trans-directed events have close to 0 delta PSI). The dots are colored to denote the Pfam protein domains overlapping the cis-directed events. b The gene HLA-C contains two cis-directed events, one specific to donor 1 (event labeled as 1) and the other specific to donor 2 (event labeled as 2). Note that skipping of exon 5 is present in donor 1, but it is not a significant cis-event. These donor-specific events are consistent across different tissues of the same donor. The name of each HLA-type corresponding to the most prevalent allele group is shown to the left of each haplotype. c Diagram to illustrate protein domains (Pfam) and their cellular localization of the HLA-C gene. The relative positions of the cis-directed events are shown. d AlphaFold predictions of three selected isoforms that contain or lack the cis-directed exonic regions. The first and last amino acids of the entire protein and of the trans-membrane domain were labeled as references. The position of the last amino acid (circled) indicates the length difference between the first two isoforms (6 amino acids difference).

IsoLASER allows HLA typing and a detailed analysis of allele-specific splicing patterns of each HLA gene. Notably, as a benchmark, the HLA allele prediction by isoLASER is highly concordant with the HLA allele groups identified via the diploid genome assembly (for HG002 data)49 and those detected using 70 short-read RNA-seq datasets by the arcasHLA method50 (“Methods” section, Supplementary Data 8, 9). We observed individual-specific cis-directed events across multiple HLA genes, with many of them overlapping annotated protein domains from the Pfam database51 (Fig. 4a). For example, HLA-C contains two individual-specific cis-directed events consistently observed across tissues of each individual (Fig. 4b). To gain a more in-depth view of this gene, we matched the haplotypes identified by isoLASER in human tissues with known HLA-C alleles from the IPD-IMGT database52. This analysis revealed consistent splicing patterns for each HLA-C allele irrespective of the tissue of origin (Supplementary Fig. 8). Specifically, the C*04 and C*06 alleles consistently exhibited some degree of exon 5 skipping. The C*03 allele had an alternative splice site in exon 6, with a relative usage of approximately 25%, whereas the C*07 allele exhibited no alternative splicing. Previous studies reported a lack of exon 5 skipping for the C*06 allele53. However, our data show a consistent skipping of exon 5 across different tissues, although the level of skipping is very small (around 1%) (Supplementary Fig. 8). It is possible that previous approaches failed to detect this event due to the very small effect size or that allele sub-groups from the C*06 group have different effects in different individuals.

The cis-directed events in HLA-C may have important functional implications. Specifically, they overlap with the MHC-I C-terminus domain situated in the cytoplasm (exon 5), as well as a transmembrane alpha helix (Fig. 4c). AlphaFold predictions of isoforms with or without the cis-directed events revealed substantial alterations to the resulting protein products (Fig. 4d). Specifically, event 1, yielding isoform ENST00000383329.7 through the usage of an alternative 3’ splice site on exon 6, results in the elongation of the MHC-I C-terminus by 6 amino acids, compared to ENST00000376228.9. Event 2, yielding isoform ENCODEHT000033595 through the exclusion of exon 5, results in the deletion of a large section of the transmembrane helix, causing complete loss of the helix structure.

Other genes in the HLA family also contain cis-directed events in functional domains, such as HLA-A, HLA-DMA, and HLA-DPB1 (Fig. 4a, Supplementary Fig. 9a–i). Similar to HLA-C, the haplotypes uncovered in the reads of the other genes also matched with known HLA alleles. Thus, the long-read RNA-seq data generally facilitate HLA typing in the RNA. Cis-events in these genes may also have important functional implications, such as by affecting the C-terminus of the HLA-A protein (Supplementary Fig. 9a–c), the C1-set domain of HLA-DMA that is in charge of T-cell recognition54 (Supplementary Fig. 9d–f), or the extension/shortening of the signal peptide in HLA-DPB1 (Supplementary Fig. 9g–i). Altogether, isoLASER helps to uncover a number of allele-specific splicing events in HLA genes, with potential implications for protein function, which complements the traditional methods of DNA-centric HLA typing.

Allele-specific splicing of HLA genes in Alzheimer’s disease (AD) patients

Recently, HLA genes have been increasingly recognized as important contributors to Alzheimer’s disease (AD)55. Using the data generated from the dorsolateral prefrontal cortex (DLPFC) tissue of 5 AD patients and 4 controls (Fig. 1c), we asked whether allele-specific splicing events in HLA genes were observed in AD and whether such events were specific to AD compared to controls. Combining the results of isoLASER and isoLASER-joint, two of the HLA genes demonstrated allele-specific splicing in the AD samples, including HLA-C and HLA-DMA (Supplementary Data 10). To expand this analysis, we also examined another long-read RNA-seq dataset (Oxford Nanopore) generated from the DLPFC of 6 AD patients and 6 controls56. This dataset yielded seven HLA genes with allele-specific splicing in the AD samples (Supplementary Data 10), including the two genes from the ENCODE data (Fig. 5a). It should be noted that the two datasets were derived from samples with vastly different genetic backgrounds (Supplementary Fig. 10a). As a result, many splicing events were testable in only one dataset, and only 26 genes shared the same cis-directed events between the two datasets (Supplementary Fig. 10b).

Fig. 5. Allele-specific splicing in Alzheimer’s disease.

a Summary of cis-directed events in the HLA gene family and known AD risk genes in samples from the frontal cortex (Nanopore data) and dorsolateral prefrontal cortex (ENCODE PacBio data). Events from the sample-level and joint analysis are shown. X-axis is labeled with the coordinates of the starting positions of the cis-directed events. b Quantile-quantile plot of the Alzheimer’s GWAS p-values of SAVs identified by isoLASER. The GWAS threshold for significance is shown as a dashed line (5 × 10⁻⁸). SAVs identified only in AD, only in controls, or in both groups are colored differently. c Cis-directed events ranked by their ddPSI values. Gene names are shown for those with a ddPSI value greater than 0.5 or 0.2 (only if the gene was previously identified as an AD risk gene, red). d Differential allele-specific splicing pattern is present in the Ubiquitin C gene (UBC), which has the highest ddPSI value shown in (c). The inclusion of the highlighted intronic region is strongly linked to the variant at position 124914621 on chromosome 12, but the specific linkage pattern is reversed in the two conditions. e Diagram illustrating the repeating ubiquitin-like domains and the cis-directed exonic part in the UBC gene. f Minor allele frequency (from gnomAD) of the SAVs identified by isoLASER in the AD and control samples. Variants not found in gnomAD were assigned an MAF = 0. Indels not found in gnomAD were discarded. SAVs located in HLA and AD risk genes are labeled in red.

Given the presence of cis-directed events in HLA genes, we next asked if other AD-relevant genes also harbored cis-directed events. Genes such as MAPT, PSEN1, MS4A6A, PICALM, and ZCWPW1 were found to have such events in at least one sample (Fig. 5a, Supplementary Fig. 10c). Through the isoLASER-joint analysis, we identified the SAVs associated with these events and overlapped them with the GWAS summary statistics for AD. SAVs located in the genes MS4A6A, BIN1, HLA-DRA, and HLA-DQA1, among others, passed the GWAS P-value cutoff for significance, suggesting their potential contribution to the disease (Fig. 5b). Although some events were only observed in control donors, the GWAS significance of the SAVs indicates that the risk allele is associated with AD. It is also notable that these variants were not previously found to be associated with splicing through splicing QTL studies57 for AD, underscoring the advantage of our method.

We next identified differential allele-specific splicing events between AD and controls in the HLA and AD risk genes in the above two datasets, respectively (“Methods” section). Briefly, we compared the allele-specific effect size (delta PSI) of each SAV and then quantified the difference between AD and controls as the delta-delta PSI (ddPSI). As shown in Fig. 5c (and Supplementary Fig. 10d), a number of genes showed cis-events with significant ddPSI values, indicating the potential existence of disease-specific genetic regulation. Several genes involved in neurological conditions, such as UBC, contain multiple events with high ddPSI values. An intron (highlighted in Fig. 5d) of the UBC gene showed 100% retention in the haplotype carrying the T allele of the highlighted variant in AD samples but nearly 0% retention associated with the C allele. Remarkably, this pattern was reversed in control samples, with the C allele associated with 100% intron retention but the T allele with ~50% retention. The coding region of UBC comprises nine tandemly repeated moieties of the ubiquitin domain58. The highlighted alternative splicing event removes one ubiquitin domain from the final transcript (Fig. 5e), which can potentially disturb the tightly regulated equilibrium of the cellular free-ubiquitin pool59. Accumulation of misfolded proteins is a hallmark of multiple neurological diseases, including AD. The limited number of free ubiquitin units may impair the proper tagging and later degradation of Tau and Amyloid beta aggregates60,61.
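As an illustration of the ddPSI idea described above, the following minimal Python sketch computes a per-condition delta PSI and their difference. The function names and the sign convention (delta PSI as the PSI of one allele minus the PSI of the other, ddPSI as AD minus control) are assumptions made for this example; the exact convention used by isoLASER is not restated here.

```python
def psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """Percent spliced in (as a fraction) for one haplotype."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("PSI is undefined without reads")
    return inclusion_reads / total


def delta_psi(psi_allele_a: float, psi_allele_b: float) -> float:
    """Allele-specific effect size: PSI difference between the two haplotypes."""
    return psi_allele_a - psi_allele_b


def dd_psi(delta_psi_case: float, delta_psi_control: float) -> float:
    """Difference of allele-specific effect sizes between conditions (assumed: case minus control)."""
    return delta_psi_case - delta_psi_control


# Illustrative values only, loosely following the pattern described for UBC:
# AD: T allele fully retained, C allele not retained; controls: T ~50%, C 100%.
ad_effect = delta_psi(1.0, 0.0)
control_effect = delta_psi(0.5, 1.0)
example_ddpsi = dd_psi(ad_effect, control_effect)
```

Under this signed convention, a reversal of the linkage direction between conditions yields a ddPSI whose magnitude can exceed 1; an absolute-value convention would be an equally plausible choice.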

Lastly, using the gnomAD database62 we extracted the minor allele frequency (MAF) of the SAVs identified in this analysis. Over 1201 variants (15% of the total) were classified as either rare (MAF < 10⁻²) or ultra-rare (MAF < 10⁻³) (Fig. 5f). Multiple SAVs identified in the HLA, MAPT, and PSEN2 genes fall under this category. Rare variant analysis is very challenging via genome-wide association methods, such as splicing QTL, due to the low prevalence of these variants. Here, our data show that isoLASER-joint can evaluate these variants directly, owing to the allele-specific nature of this method.

Discussion

RNA splicing is primarily determined by the contribution of cis-acting sequence elements and trans-acting splicing factors. Genetic variants may disrupt cis-regulatory motifs and alter splicing, constituting an essential link between genotypes and phenotypes. Here, we leverage long-read RNA-seq to demarcate splicing events that are primarily cis- or trans-directed, using a new method, isoLASER, that performs de novo variant calling, gene-level phasing, and allele-specific splicing analysis for long reads. As a one-stop solution for the above multiple tasks, isoLASER is advantageous in that a phased whole-genome is not a prerequisite.

IsoLASER focuses on exonic parts instead of an isoform-level analysis36,63. This approach affords a granular view of individual splicing events and the associated isoform diversity. For example, multiple cis-directed events may be present in a specific gene, with different levels of allele-specific association. Since many transcript isoforms may be needed to represent the combinatorial inclusion or exclusion of these events, isoform-level analyses may be underpowered and yield insignificant observations. Thus, isoLASER complements previous studies of allele-specific isoform expression36,63,64 from the unique perspective of exonic parts. Depending on the purpose of a study, the users may execute both types of methods to obtain complementary insights. It should be noted that some genes harbor multiple non-consecutive cis-directed events, raising the possibility of genetic co-regulation of splicing65, although whether such events share a common causal variant remains to be investigated.

Furthermore, isoLASER’s usage of the adjusted mutual information and delta PSI offers a more intuitive understanding of partial associations compared to other statistical tests. Additionally, isoLASER leverages the read-level associations representing direct evidence of genetically modulated alternative splicing. This principle allows the detection of splicing-associated variants in small cohorts, presenting a viable alternative to splicing QTL mapping. Similarly, it also allows the examination of rare variants, as the only technical requirement is that such variants have sufficient read coverage in a sample. This aspect represents a remarkable advantage over association studies that require modeling or imputation of their effect size and significance.

We demonstrated that the genetic linkage profiles of alternative splicing events are highly individual-specific and maintained across different tissues. Gene Ontology analysis revealed that genes harboring cis-directed events are enriched with immune-related functions. This finding is consistent with reports of highly individual-specific immune adaptation through transcript alterations66–68. Among genes with critical immune relevance is the HLA gene system, which has been associated with more complex diseases than any other genomic loci47,48,69. Due to their highly polymorphic nature, allele-typing and allele-specific splicing of HLA genes are challenging to resolve using short reads (although not impossible). Long-read RNA-seq combined with isoLASER allowed us to uncover cis-directed splicing events associated with the different HLA allele groups, with some events causing significant disruption to the protein structure. Despite their potential functional impact, allele-specific splicing events in HLA genes were poorly characterized. Our method systematically identifies splicing events associated with the major HLA allele groups, offering a valuable complement to HLA typing in clinical applications.

As an application, we analyzed data from Alzheimer’s disease patients using isoLASER and reported cis-directed splicing in multiple AD-relevant genes such as BIN1, MAPT, MS4A6A, and HLA-DQA1. Many of the splicing-associated variants are significantly associated with the disease based on GWAS. Disease-specific analysis unveiled differential allele-specific splicing events between AD and controls. A remarkable example is the gene UBC, where the linkage direction is almost completely reversed in AD compared to controls. Global alterations in splicing are increasingly appreciated in AD brains57,70–72,73, and here we present a unique approach to identify disease-specific regulation with a small cohort size. This notion calls for expanding the current focus of identifying disease-enriched mutations to disease-specific functions of genetic alterations. Lastly, around 15% of the splicing-associated variants were classified as rare or ultra-rare, according to the gnomAD database. This represents a significant expansion of the repertoire of splicing-related cis-acting variants.

In summary, this study represents a crucial step forward in deciphering the functional impact of genetic variants on splicing using long-read RNA sequencing and the isoLASER method, providing valuable insights that can inform future therapeutic approaches for various human diseases.

Methods

Mapping and preprocessing of long-read RNA-seq data

PacBio Circular Consensus Sequences (CCS) reads were previously generated and pre-processed by ENCODE. Briefly, CCS generation, adapter removal, and read refinement were performed using CCS (version 6.0.0), lima (version 1.10.0), and isoseq3 (version 3.2.2), respectively. These tools are part of the PacBio computational suite (https://github.com/PacificBiosciences/pbbioconda). The raw reads were downloaded from the ENCODE portal and aligned using minimap2 (v. 2.28-r1209)74. We used the splice-aware parameters: -ax splice:hq -uf against the reference human genome GRCh38 XY (excluding alternative contigs and scaffolds).

The ONT raw reads (fastq) were previously base-called using Guppy GPU base-caller v3.9 with configuration dna_r9.4.1_450bps_hac_prom.cfg as detailed in the original study56. The reads were then adapter-trimmed using Pychopper (version 2.5.0, https://github.com/nanoporetech/pychopper) and mapped with minimap2 (v. 2.28-r1209) using the same parameters above.

Splice junction correction in the raw bam files was performed using TranscriptClean75. Initial identification of canonical splice junctions utilized GENCODE annotation. Subsequently, the function talon_label_reads in TALON76 was used to identify and remove transcripts arising from internal priming artifacts. Reads mapped to A-rich regions, defined as regions containing at least 50% adenosine bases within a 20-base window, were excluded from the analysis. Additionally, reads spanning multiple genes were also discarded to exclude potential chimeric alignments to genes with homologous regions.
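The A-rich criterion above (at least 50% adenosines within a 20-base window) can be sketched as a sliding-window check. This is only an illustration under stated assumptions: the function name is hypothetical, the input is taken to be the genomic sequence of the region being examined, and sequences shorter than one window are treated as not A-rich. It does not reproduce the internals of talon_label_reads.

```python
def is_a_rich(region_seq: str, window: int = 20, min_fraction: float = 0.5) -> bool:
    """Return True if any `window`-base stretch contains >= min_fraction adenosines."""
    seq = region_seq.upper()
    if len(seq) < window:
        return False  # assumption: too short to evaluate a full window
    needed = min_fraction * window
    # Count adenosines in the first window, then slide one base at a time.
    count = seq[:window].count("A")
    if count >= needed:
        return True
    for i in range(window, len(seq)):
        count += (seq[i] == "A") - (seq[i - window] == "A")
        if count >= needed:
            return True
    return False
```

The running count avoids recounting each window, so the check is linear in the length of the region.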

Transcript structures were inferred from the sequencing reads by TALON using default parameters. Filtering steps included discarding novel transcripts supported by fewer than five reads and transcripts labeled as ‘antisense’ or ‘genomic’. Transcripts lacking splice junctions were also discarded to exclude unprocessed or short transcripts. The remaining transcripts were used for annotating individual reads and determining exonic parts using isoLASER. Specifically, exonic parts were defined as non-overlapping exonic segments resulting from collapsing all identified isoforms77. We retained only exonic parts that were not constitutively present across all transcripts, thus representing alternative splicing events. The average length of an exonic part is between 100 and 200 bases, which is consistent with the expected length of exons in the human genome.
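The definition of exonic parts given above can be illustrated with a short sketch that collapses the exons of all isoforms of a gene into non-overlapping segments and keeps those absent from at least one transcript. The data layout (a dictionary of half-open exon intervals per transcript) and the function names are assumptions for this example, not the isoLASER implementation.

```python
def exonic_parts(transcripts):
    """Split the union of all exons into non-overlapping segments.

    transcripts: dict mapping transcript name -> list of (start, end)
    half-open exon intervals. Returns (start, end, carriers) tuples, where
    carriers is the set of transcripts whose exons contain the segment.
    """
    breakpoints = sorted({p for exons in transcripts.values()
                          for start, end in exons for p in (start, end)})
    parts = []
    for seg_start, seg_end in zip(breakpoints, breakpoints[1:]):
        carriers = {name for name, exons in transcripts.items()
                    if any(s <= seg_start and seg_end <= e for s, e in exons)}
        if carriers:  # segments covered by no exon are intronic and skipped
            parts.append((seg_start, seg_end, carriers))
    return parts


def alternative_parts(transcripts):
    """Keep exonic parts that are missing from at least one transcript."""
    n = len(transcripts)
    return [(s, e) for s, e, carriers in exonic_parts(transcripts)
            if len(carriers) < n]
```

For example, with three isoforms that share the first and last exons but differ in a middle exon, only the segments of the middle exon are returned by `alternative_parts`.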

isoLASER: variant calling

isoLASER’s variant caller builds on the framework used in our previous method scAllele78. scAllele identifies small variants (up to 20 bases in length) through the local reassembly of reads, whereby the reads were segmented into overlapping 25-mers and reassembled into a de Bruijn Graph. Edges that were lowly covered were pruned to simplify the graph. The position and alleles of the variants were then identified from the ‘bubbles’ in the graph. The graph is then traversed to reconstruct the original sequence of the reads. The positions of the bubbles are labeled in the nodes, reflecting the presence of variants. This way, our method is able to detect variants at the read level.

We trained three different multi-layer perceptron models for SNPs, insertions and deletions respectively (with one input, one output, and two hidden layers) using features such as the allelic frequency in the reads, the length of the variant allele (for indels), the number of repeated k-mers (for k = 1, 2, and 3) around the variant (referred to as surrounding tandem repeats) and the haplotype consistency. This last metric refers to the total hamming distance between the observed haplotypes in the individual reads with the inferred haplotype from the k-means-based clustering (see “isoLASER: gene-level phasing” section). Thus, lower values indicate better clustering and haplotagging of the reads. To optimize model hyperparameters, we implemented the GridSearchCV function from the sklearn Python package, which employs an exhaustive “fit and score” approach across parameter combinations. The model was trained over 5000 iterations using the ReLU activation function. Performance was evaluated using a 10-fold cross-validation via shuffle-and-split sampling. The training dataset was randomly split into 70% for training and 30% for testing. The best-performing model was selected from the cross-validation. The variant quality was then calculated as the Phred-normalized probability of the False Positive classification.
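For the final step, a Phred-style transformation of the false-positive probability can be sketched as follows. The standard Phred scaling, Q = −10·log10(p), is assumed here; the cap applied when the probability is zero and the function name are choices made for this illustration and are not taken from isoLASER.

```python
import math


def phred_quality(p_false_positive: float, max_q: float = 60.0) -> float:
    """Convert a false-positive probability into a Phred-scaled quality.

    Assumes the standard definition Q = -10 * log10(p). max_q is an
    illustrative cap that avoids an infinite score when p == 0.
    """
    if not 0.0 <= p_false_positive <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_false_positive == 0.0:
        return max_q
    return min(max_q, -10.0 * math.log10(p_false_positive))
```

Under this scaling, a classifier probability of 0.001 that a call is a false positive corresponds to a quality of about 30.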

These classifiers were trained using long-read RNA-seq from the GM12878 cell line and their genotyped variants from DNA sequencing generated by the Genome In A Bottle (GIAB) Consortium29 (URL: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/NISTv3.3.2/GRCh38/) (v. 3.3.2). We used the ground truth variant set to label the isoLASER variant calls as True or False Positives, thus limiting the training to detected variants. The training performance was then assessed using Receiver Operating Characteristic (ROC) and Precision-Recall curves (Supplementary Fig. 1a). Recall was computed using variants detected by isoLASER, rather than the full set of ground truth variants, as our primary objective was to distinguish true positives from false positives among called variants. A variant was considered testable if it had a minimum total coverage of 20 reads. True sensitivity was assessed using data from the WTC11 and HG002 cell lines, independent of the GM12878 training data (Supplementary Figs. 1b, 2a).

isoLASER: gene-level phasing

Read-level variant information extracted by isoLASER facilitates variant phasing at the transcript level while organizing reads into local haplotypes (haplotagging). For every expressed gene, a v×n matrix was defined where v was the number of heterozygous variants identified in the gene, and n was the number of reads aligned to the gene. Each entry contained the detected allele in the corresponding read and the variant quality score (which is the same for all reads covering the variant). A value of 0 or 1 was assigned for the reference or alternative allele. A weighted k-means clustering-based approach was used to group the reads into haplotypes. K is set to 2, thus assuming diploid genomes. The weights of the variants were their variant quality score, calculated by the multi-layer perceptron (ranging from 0 to 1). The goal of using these weights is to reduce the effect of low-confidence variants in the clustering. Multiple rounds with random initialization were used to mitigate the potential sensitivity of the clustering outcome to the initial conditions. The best clustering was evaluated based on the average distance of the reads to the estimated centroid. The distance was defined as the hamming distance between the read-level haplotype and the centroid multiplied by the weight. Haplotype consistency was calculated as the log of the mean of the distances across reads. Given that haplotype consistency contributes to variant quality, we performed phasing and variant scoring iteratively until convergence.
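The clustering step described above can be sketched as a weighted 2-means procedure over read-level alleles. This is a simplified illustration, not the isoLASER code: centroids are kept binary (majority allele per variant), missing alleles are encoded as None and ignored, and the function names, number of restarts, and seed are assumptions.

```python
import random


def weighted_hamming(read, centroid, weights):
    """Sum of variant weights at positions where the read disagrees with the centroid."""
    return sum(w for allele, c, w in zip(read, centroid, weights)
               if allele is not None and allele != c)


def haplotag(reads, weights, n_init=10, max_iter=50, seed=0):
    """Cluster reads into two haplotypes; returns (mean distance, labels, centroids)."""
    rng = random.Random(seed)
    n_var = len(weights)
    best = None
    for _ in range(n_init):  # several random starts to reduce sensitivity to initialization
        centroids = [[rng.randint(0, 1) for _ in range(n_var)] for _ in range(2)]
        labels = None
        for _ in range(max_iter):
            new_labels = [min((0, 1), key=lambda k: weighted_hamming(r, centroids[k], weights))
                          for r in reads]
            if new_labels == labels:
                break
            labels = new_labels
            for k in (0, 1):  # update each centroid to the majority allele per variant
                members = [r for r, lab in zip(reads, labels) if lab == k]
                for v in range(n_var):
                    alleles = [r[v] for r in members if r[v] is not None]
                    if alleles:
                        centroids[k][v] = int(2 * sum(alleles) >= len(alleles))
        cost = sum(weighted_hamming(r, centroids[lab], weights)
                   for r, lab in zip(reads, labels)) / len(reads)
        if best is None or cost < best[0]:
            best = (cost, labels, [c[:] for c in centroids])
    return best
```

The returned mean distance plays the role of the clustering score used to pick the best run; taking its logarithm would give a quantity analogous to the haplotype consistency described in the text.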

Even with long-read sequencing, there are still sets of reads that do not overlap with each other within the same gene. These are often intronic reads that do not overlap with the exonic ones. Thus, we phased the reads in blocks. The v×n matrix is dot-multiplied by itself row-wise. In this way, reads that share variants have a value greater than or equal to 1, and reads that do not share any variant have a value of 0. Using this new matrix, we identified connected groups of reads and split them into “phasing groups”. Clustering was then performed separately for each group, as it is required that reads share variants for phasing. A minimum of 10 reads per phasing group was required.
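Splitting reads into phasing groups amounts to finding connected components among reads that share heterozygous variants. The sketch below uses a union-find structure instead of the matrix product described above; the two are equivalent for the purpose of identifying connected reads. The input layout and function name are assumptions for illustration.

```python
def phasing_groups(read_variants, min_reads=10):
    """Group reads that are connected through shared variants.

    read_variants: dict mapping read name -> set of variant identifiers
    covered by that read. Returns a list of sorted read-name lists, one
    per phasing group with at least min_reads reads.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link every read to each variant it covers; reads sharing a variant
    # (directly or transitively) end up under the same root.
    for read, variants in read_variants.items():
        for variant in variants:
            parent[find(("read", read))] = find(("var", variant))

    groups = {}
    for read, variants in read_variants.items():
        if variants:  # reads covering no variant cannot be phased
            groups.setdefault(find(("read", read)), []).append(read)
    return [sorted(g) for g in groups.values() if len(g) >= min_reads]
```

The default of 10 reads per group mirrors the minimum stated in the text; clustering would then be run separately on each returned group.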

Due to the known high error rate of indel calls, accurate haplotagging based solely on indels is challenging. Therefore, each phasing block was required to include at least one heterozygous SNP to ensure reliable haplotype assignment.

isoLASER: linkage test

Linkage between haplotype and local splicing at the gene level was calculated using the adjusted mutual information (AMI)79. The AMI of a given exonic part i in gene j is given as follows:

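$$\mathrm{AMI}_{ij} = \frac{\mathrm{MI}(G_j, S_i) - E\{\mathrm{MI}(G_j, S_i)\}}{\max\{H(G_j), H(S_i)\} - E\{\mathrm{MI}(G_j, S_i)\}}$$

where:

$$\mathrm{MI}(G_j, S_i) = \sum_{g \in G_j} \sum_{s \in S_i} p(g, s) \times \log \frac{p(g, s)}{p(g) \times p(s)}$$
$$G_j \sim \{\text{Haplotype 1}, \text{Haplotype 2}\}$$
$$S_i \sim \{\text{inclusion}, \text{exclusion}\}$$
$$H(G_j) = -\sum_{g \in G_j} p(g) \times \log p(g)$$
$$H(S_i) = -\sum_{s \in S_i} p(s) \times \log p(s)$$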

where Gj represents the gene-level haplotype membership of each read, and Si corresponds to the splicing status of the exonic part i in each read, i.e., included or excluded.

The conventional mutual information approach may fail to discern meaningful linkages in cases where cluster imbalance exists, i.e., haplotype ratio of genes or inclusion ratios of exonic parts that are close to 0 or 1. AMI is less sensitive to this imbalance by subtracting the expected mutual information between two random clusters, i.e., E{MI(Gj,Si)}, and normalizing by the entropy of the variables max{H(Gj),H(Si)}.
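To make the AMI definition concrete, the sketch below computes it from per-read haplotype and splicing labels using only the Python standard library. The expected mutual information is evaluated with the usual hypergeometric model for two random labelings with fixed marginals, which is an assumption about how E{MI} is obtained; the function names and the handling of the degenerate case (zero denominator returns 0) are also choices made for this example.

```python
import math
from collections import Counter


def _entropy(counts, n):
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def _expected_mi(row_counts, col_counts, n):
    """Expected MI of two random labelings with the given marginal counts."""
    def log_fact(x):
        return math.lgamma(x + 1)

    emi = 0.0
    for a in row_counts:
        for b in col_counts:
            for nij in range(max(1, a + b - n), min(a, b) + 1):
                # Hypergeometric probability of observing nij shared reads.
                log_prob = (log_fact(a) + log_fact(b) + log_fact(n - a) + log_fact(n - b)
                            - log_fact(n) - log_fact(nij) - log_fact(a - nij)
                            - log_fact(b - nij) - log_fact(n - a - b + nij))
                emi += (nij / n) * math.log(n * nij / (a * b)) * math.exp(log_prob)
    return emi


def adjusted_mutual_information(haplotypes, splicing):
    """AMI between per-read haplotype labels and inclusion/exclusion labels."""
    if len(haplotypes) != len(splicing) or not haplotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(haplotypes)
    joint = Counter(zip(haplotypes, splicing))
    row, col = Counter(haplotypes), Counter(splicing)
    mi = sum((nij / n) * math.log(n * nij / (row[g] * col[s]))
             for (g, s), nij in joint.items())
    emi = _expected_mi(list(row.values()), list(col.values()), n)
    denom = max(_entropy(row.values(), n), _entropy(col.values(), n)) - emi
    if abs(denom) < 1e-12:
        return 0.0  # assumption: no variation in either label, so no linkage evidence
    return (mi - emi) / denom
```

With perfectly linked labels the score is 1, whereas a balanced table with no association gives a value slightly below 0 because the expected mutual information is subtracted. If scikit-learn is available, `sklearn.metrics.adjusted_mutual_info_score` with `average_method="max"` should provide a comparable quantity for cross-checking.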

Simulated data with randomly assigned labels (i.e., no underlying linkage) showed that AMI variance strongly depended on the total number of observations (that is, read coverage) (Supplementary Fig. 2a). We modeled this variance by fitting a third-degree polynomial through the 95th and 99th percentile values of different coverage bins. This serves as a proxy to the AMI significance as a function of read coverage. It is also a more computationally efficient approach than bootstrapping in assigning a meaningful significance score. Based on the polynomial prediction of AMI quantiles, we categorized exonic parts into cis-directed if their AMI was above the 99th percentile, ambiguous if they were between the 95th and 99th percentile, and trans-directed if they were below the 95th percentile. Note that in analyzing actual long-read data, genes with coverage higher than 3000 reads were down-sampled to 2000 to reduce the processing time and memory. The observed gene coverage distribution in real data indicates that most genes have significantly fewer reads than this cutoff (95% genes with ≤650 reads). Furthermore, as an effect size filter, we required cis-directed events to exhibit at least a 5% difference in PSI (delta PSI) between the two haplotypes of the same gene.
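The categorization rule above can be written as a small decision function. The thresholds q95 and q99 are taken as inputs (for example, values predicted by the coverage-dependent polynomial fit), since the fitted coefficients are not given here. The label returned for events that exceed the 99th percentile but fail the 5% delta PSI filter is a hypothetical placeholder, because the text does not state how such events are reported.

```python
def classify_exonic_part(ami, q95, q99, delta_psi, min_delta_psi=0.05):
    """Categorize an exonic part from its AMI and coverage-matched null quantiles.

    q95 and q99 are the null AMI percentiles expected at the event's read
    coverage. delta_psi is the PSI difference between the two haplotypes.
    """
    if ami > q99:
        if abs(delta_psi) >= min_delta_psi:
            return "cis-directed"
        # Hypothetical label: significant linkage but below the effect-size filter.
        return "cis-candidate-small-effect"
    if ami >= q95:
        return "ambiguous"
    return "trans-directed"
```

Boundary handling (strictly above the 99th percentile for cis-directed, at or above the 95th for ambiguous) is likewise an assumption of this sketch.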

isoLASER-joint

isoLASER can perform joint linkage analysis by merging the linkage events and variant calls from multiple samples (Supplementary Fig. 4). For a specific exonic part, isoLASER merges the reads of the same phasing group from different samples and calculates a new AMI score. To account for differences in read coverage between samples, read counts were normalized prior to aggregation. This joint analysis requires a minimum of two samples, although the threshold can be adjusted by the user.

The isoLASER-joint method affords one major advantage. When analyzing multiple samples jointly, only functional variants or variants in strong linkage disequilibrium with the functional variant will maintain a high association with the splicing of the exonic parts. The association between non-functional variants and splicing is most likely inconsistent across samples. Therefore, isoLASER-joint effectively narrows down on the candidate variants that are potentially functional here. These variants are referred to as Splicing-Associated Variants (SAVs) (Supplementary Fig. 4). Note that we do not claim all SAVs are causal, but rather, they are enriched with potentially causal variants, analogous to fine-mapped variants. Samples that are not testable for linkage due to low read coverage or homozygosity can also contribute to the joint analysis, as they allow an examination of the consistency between alleles and splicing. Overall, as long as an alternative allele exists in the multi-sample cohort, isoLASER-joint can assess its allele-specific splicing in the specific sample and its association with splicing in the cohort, affording a possibility to study rare variants. The same AMI cutoffs as in the single-sample analysis were used to determine the significance of cis-linkage in isoLASER-joint. While, in principle, including more samples at this step could yield a shorter and more statistically robust list of SAVs, this was not explicitly evaluated in the current study.

It should be noted that isoLASER (single or joint mode) was applied to long-read RNA-seq data generated by both PacBio and Oxford Nanopore platforms. However, due to the elevated error rates associated with older Nanopore chemistries, we recommend focusing analyses on SNPs by excluding indels and validating detected variants against external datasets prior to performing phasing and allele-specific splicing analyses. In this study, variants identified from Nanopore data for the Alzheimer’s Disease analysis were required to be present in the gnomAD database62 (hg38, version 3.1.2) and sQTL data from the same tissue in an independent study57. For variants not directly found in these databases, we queried for linked variants in strong linkage disequilibrium (LD) (source: TopLD, v.2, URL:http://topld.genetics.unc.edu/downloads/downloads) to evaluate whether the original variant may be represented through LD associations.

Benchmarking of variant calling, phasing, allele-specific event detection, and HLA typing

Variant calling

Variant calling by isoLASER was benchmarked against other methods, such as DeepVariant, Clair3, and GATK HaplotypeCaller, following the protocol established by de Souza et al.30. In brief, we used these methods to call variants in long-read RNA sequencing data derived from the HG002 cell line, using the following parameters (other than the standard parameters for the input bam file, genome reference, threads, and output):

  • DeepVariant (version 1.6) parameters: --model_type PACBIO

  • Clair3 (version 1.0.10) parameters: --model_path path/to/Clair3/models/hifi --platform hifi --include_all_ctgs. Following the protocol from de Souza et al.30, we used this tool with three versions of the bam file (see below for pre-processed bam files) and combined the variant calls to achieve the best results.

  • GATK HC (version 4.6.0) parameters: -ERC GVCF (and the best-practices preprocessing pipeline).

  • IsoLASER (version 0.0.1) parameters: (no additional parameters).

For DeepVariant, Clair3, and GATK, we pre-processed the bam files by splitting exon-junction reads using the N cigar, followed by flag correction, according to the protocol described by de Souza et al.30. This transformation has been shown to improve the performance of these methods. IsoLASER was applied directly to the unmodified BAM files, without any preprocessing.

To evaluate variant calling performance, we used the lrRNAseqBenchmark R package developed by de Souza et al.30 (https://github.com/vladimirsouza/lrRNAseqBenchmark). This tool compares variant calls from each method against a ground truth set, classifying each called variant as a true or false positive, and each ground truth variant as a true positive or false negative. Performance metrics, including precision, recall, and F1 score, were calculated at multiple read coverage thresholds (20, 31, 100, and 310 reads), with coverage derived directly from the alignment files rather than relying on method-reported values. To minimize potential confounding effects by densely clustered variants, regions containing three or more variants within a 201-base window were excluded from the analysis.
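The evaluation logic described above can be sketched in a few lines of Python. This is not the benchmarking package itself: the function names are hypothetical, variants are represented as hashable tuples, and the dense-cluster rule is interpreted as flagging any set of three or more variants on one chromosome that fit within a 201-base span.

```python
def dense_positions(positions, window=201, min_variants=3):
    """Return positions belonging to a run of >= min_variants within `window` bases."""
    pos = sorted(positions)
    dense = set()
    for i in range(len(pos) - min_variants + 1):
        j = i + min_variants - 1
        if pos[j] - pos[i] < window:  # all variants i..j fit in one window
            dense.update(pos[i:j + 1])
    return dense


def precision_recall_f1(called, truth):
    """Compare a set of called variants against a ground-truth set."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

In practice, positions flagged by `dense_positions` would be removed from both the call set and the truth set before computing the metrics at each coverage threshold.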

For the WTC11 data, we used the VCF files generated by de Souza et al.30 for DeepVariant, Clair3, and GATK directly, and incorporated only the isoLASER variant calls into the benchmark package ‘lrRNAseqBenchmark’. The benchmarking parameters used for WTC11 were the same as for HG002. It is worth noting that de Souza et al.30 used older versions of the tools: DeepVariant (version 1.1.0), Clair3 (version 0.1-r5), and GATK HC (version 4.1.9.0).

Gene-level phasing

We evaluated the phasing performance of isoLASER in comparison to HapCUT2, a tool designed for haplotype assembly, using the HG002 and WTC11 datasets. To accommodate the potential discontinuity of long-read alignments, both methods perform phasing in blocks, defined as groups of variants covered by overlapping reads. For isoLASER, we used phasing blocks as indicated by the “Phasing Group” (PG) tag in the output. We considered all heterozygous variants called by isoLASER and HapCUT2 for evaluation. In isoLASER output, successful phasing is denoted by the genotype separator (“|”) in the VCF genotype field. We observed that blocks containing multiple variants were more likely to be phased successfully by both methods. Variants phased uniquely by isoLASER tend to localize in blocks with fewer variants (Supplementary Fig. 1c). Additionally, to assess consistency between the two methods, we calculated the switch-error rate. This metric was defined as the proportion of adjacent heterozygous variants within a phasing block whose phased genotype differs from HapCUT2. In other words, it is the proportion of genotype switches between the two methods.
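The switch-error rate defined above can be illustrated as follows. The sketch assumes the conventional formulation in which the relative phase of each pair of adjacent heterozygous variants is compared between two methods, so that a global flip of haplotype labels counts as zero switches. The input format (ordered lists of phased genotype strings such as "0|1" for the shared variants of one block) and the function name are assumptions.

```python
def switch_error_rate(phase_a, phase_b):
    """Proportion of adjacent variant pairs whose relative phase differs.

    phase_a, phase_b: phased genotypes (e.g. "0|1" or "1|0") from two methods
    for the same heterozygous variants of one phasing block, in genomic order.
    """
    if len(phase_a) != len(phase_b):
        raise ValueError("both methods must cover the same variants")
    n_pairs = len(phase_a) - 1
    if n_pairs < 1:
        raise ValueError("at least two variants are needed")
    switches = 0
    for i in range(n_pairs):
        same_in_a = phase_a[i] == phase_a[i + 1]
        same_in_b = phase_b[i] == phase_b[i + 1]
        switches += same_in_a != same_in_b
    return switches / n_pairs
```

Note that a single variant phased differently in the middle of a block produces two switches under this definition, one on each side of the variant.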

Additionally, for the HG002 cell line, we obtained fully phased variants from the Human Pangenome Consortium (URL https://data.nist.gov/od/ds/ark:/88434/mds2-2578/assemblies-and-benchmarking_results/HG002-HPRC.tar.gz). These variants were used as an independent ground truth to further evaluate the phasing accuracy of isoLASER. As in the HapCUT2 comparison, we assessed the overlap between isoLASER and HPRC by quantifying the number of variants phased by either or both methods, as well as the corresponding switch-error rate.

Cis-directed event calling (linkage testing)

To benchmark isoLASER’s ability to identify allele-specific splicing events, we used LORALS (GitHub: https://github.com/LappalainenLab/lorals). Briefly, LORALS calculates variant read counts at the allele and isoform level, then performs a chi-squared test to identify allelic imbalances in isoform expression. Genes with significant imbalance are labeled as exhibiting Allele-Specific Transcript Structure (ASTS). For consistency, we labeled genes as cis-directed in the isoLASER results if they contained at least one exonic part identified as cis-directed.

We followed the workflow proposed in the LORALS GitHub page, performing transcriptome and genome-based alignment before calculating allele-specific expression (ASE) and ASTS. LORALS excludes isoforms supported by fewer than 10 reads and discards reads containing indels at variant sites, which considerably limits the number of testable genes compared to isoLASER. On the other hand, isoLASER requires a total of 20 reads for the gene overall. As the WTC11 dataset lacks variant and phasing information required by LORALS, we utilized the variant calls and phasing results from isoLASER. Additionally, we applied LORALS and isoLASER to the HG002 sample, for which we obtained previously published genotype and phasing data35. Note that LORALS does not support indels or structural variants in the haplotype-specific alignment. Thus, we aligned the HG002 reads to the reference genome (GRCh38) as the aligners can tolerate the presence of SNPs.

HLA typing

To evaluate the quality of HLA allele groups identified by isoLASER, we obtained 70 short-read total RNA-seq samples from the ENCODE portal, derived from 16 donors with corresponding long-read RNA-seq data. We used arcasHLA50 (genotype function) to determine HLA alleles for each donor and each gene from the short-read data. For each gene-donor pair, we selected HLA types supported by at least two short-read samples (from the genotype.json files). The HLA types identified from the short-read data using arcasHLA were then compared to those determined by isoLASER for matching genes and donors, as summarized in Supplementary Data 9.

Additional benchmarking of the HLA genes in the HG002 cell line was performed using the diploid assembly of the MHC region by Chin et al.49.

Cluster assignment consistency

Hierarchical clustering (Fig. 1d, e) of samples using the PSI and AMI metrics was performed with aheatmap, which uses Euclidean distance to measure dissimilarity between samples and applies the ‘complete linkage’ method for clustering. We then cut the clustering dendrogram into 7 clusters and measured their agreement with the ‘Tissue’ and ‘Donor’ labels using the variation of information (VI) metric from the R function ‘fpc::cluster.stats’. Smaller VI values indicate closer agreement between the two sets of labels.
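
For readers unfamiliar with the VI metric, the sketch below computes it for two labelings of the same samples as the sum of the two conditional entropies, H(A|B) + H(B|A). It is a plain Python illustration, not the fpc code; the function name and the use of natural logarithms are assumptions of this example.

```python
import math
from collections import Counter

def variation_of_information(labels_a, labels_b):
    """VI between two labelings of the same items (natural-log units)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # VI = -sum over (a, b) of p_ab * [log(p_ab / p_a) + log(p_ab / p_b)]
        vi -= p_ab * (math.log(p_ab / (count_a[a] / n))
                      + math.log(p_ab / (count_b[b] / n)))
    return vi

clusters = [1, 1, 2, 2, 3, 3]
tissue = ["heart", "heart", "liver", "liver", "lung", "lung"]
donor = ["d1", "d2", "d1", "d2", "d1", "d2"]
print(variation_of_information(clusters, tissue))  # clusters match tissue exactly
print(variation_of_information(clusters, donor))   # clusters carry no donor signal
```

In this toy example the cluster assignment reproduces the tissue labels, so VI is zero for ‘Tissue’ and positive for ‘Donor’.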

sQTL overlap

We downloaded sQTL results from the GTEx portal that included sGenes and significant variant-splicing associations based on LeafCutter’s intron excision phenotypes. Matching similar tissues between the GTEx and ENCODE data, we overlapped the significant sQTL pairs with the SAVs and exonic parts obtained from isoLASER-joint. Variants that were not significantly associated with splicing from the isoLASER analysis were included as matched controls.

LeafCutter (used in the sQTL mapping) and isoLASER quantify splicing in substantially different ways. LeafCutter calculates junction usage within intron clusters, while isoLASER calculates PSI for each exonic part. To make the splicing metrics and events comparable between the two methods, we focused on intron clusters corresponding to exon skipping and alternative splice site usage, while excluding more complex splicing patterns such as mutually exclusive exons and multi-exon skipping. For these selected scenarios, the usage of the longest junction is inversely related to the PSI of the skipped exon or exonic part. Thus, for Fig. 3e, we expect an inverse correlation between these two metrics.
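
The inverse relationship between the two metrics can be seen in a small worked example for a single skipped exon. The sketch below is a simplification made for illustration only: it assumes every inclusion read spans both flanking junctions and every skipping read spans the one long junction, and the function name is hypothetical. It does not reproduce how either LeafCutter or isoLASER counts reads.

```python
def exon_skipping_metrics(n_inclusion_reads, n_skipping_reads):
    """Return (PSI of the exon, usage of the long skipping junction)."""
    total_reads = n_inclusion_reads + n_skipping_reads
    if total_reads == 0:
        raise ValueError("at least one read is required")
    # Read-level PSI: fraction of reads that include the exon.
    psi = n_inclusion_reads / total_reads
    # Junction-level usage within the cluster: each inclusion read contributes
    # two short junctions, each skipping read contributes one long junction.
    junction_total = 2 * n_inclusion_reads + n_skipping_reads
    long_junction_usage = n_skipping_reads / junction_total
    return psi, long_junction_usage

for inclusion in (10, 50, 90):
    psi, usage = exon_skipping_metrics(inclusion, 100 - inclusion)
    print(f"PSI={psi:.2f}  long-junction usage={usage:.2f}")
```

As PSI rises, the usage of the long junction falls monotonically, although the two quantities are not simple complements of each other because the junction-level denominator counts inclusion reads twice.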

Furthermore, for matching events, we compared the effect size of genetic elements on splicing by extracting the regression parameter beta from the sQTL mapping and the delta PSI from isoLASER.

Gene ontology (GO) analysis

GO terms were obtained from the Ensembl database using the R package biomaRt80,81. To obtain the best GO term enrichment for cis-directed genes, we only used genes harboring events with a delta PSI greater than or equal to 0.25. The overall set of testable genes was used as background genes for each of the query gene sets. A testable gene was defined as a gene covered by 20 or more reads that contained a heterozygous variant but no cis-directed event. Then, for each GO term, we determined the significance of the enrichment among the query genes compared to the background using a hypergeometric test to obtain a p-value. After FDR correction using the Benjamini–Hochberg approach, only terms with FDR ≤ 0.05 and containing at least five query genes were considered.
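
The enrichment procedure above can be sketched in a few lines. This is an illustration under stated assumptions rather than the analysis code: the function names and the input layout are hypothetical, and the gene universe is taken to be the union of the query and background genes, which is one possible reading of the description.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def enriched_terms(term_to_genes, query, background, fdr=0.05, min_query=5):
    """Terms passing the FDR and minimum-query-gene filters."""
    query = set(query)
    universe = query | set(background)  # assumption: universe = query + background
    terms, hits, pvals = [], [], []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & universe
        k = len(annotated & query)
        terms.append(term)
        hits.append(k)
        pvals.append(hypergeom_sf(k, len(universe), len(annotated), len(query)))
    qvals = benjamini_hochberg(pvals)
    return [(t, q) for t, k, q in zip(terms, hits, qvals)
            if q <= fdr and k >= min_query]
```

The two filters are applied after correction, so a term with a small adjusted p-value is still dropped when fewer than five query genes carry it.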

Motif analysis

First, we classified the SAVs from isoLASER-joint into two groups depending on the observed change in splicing levels. If the variant allele was linked to a lower PSI than the reference allele, we assigned the variant allele to the set of alleles that downregulate splicing and the reference allele to the upregulating set, and vice versa. For each reference and alternative allele, we obtained an 11-nt sequence around the variant position (5 nt on each side). Then, we conducted a de novo RNA motif search with HOMER82 on the downregulating and upregulating sets of sequences independently, using one set as the background for the other.
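
The construction of the two sequence sets can be sketched as follows. The example handles single-nucleotide variants only and uses a 0-based variant position; the function names and these conventions are assumptions for illustration, and the motif search itself is not shown.

```python
def allele_windows(sequence, pos, ref, alt, flank=5):
    """Return the (reference, alternative) windows centered on a SNV.

    sequence: transcribed-strand sequence containing the reference allele;
    pos: 0-based index of the variant (hypothetical convention).
    """
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch handles single-nucleotide variants only")
    if sequence[pos] != ref:
        raise ValueError("reference allele does not match the sequence")
    if pos - flank < 0 or pos + flank >= len(sequence):
        raise ValueError("not enough flanking sequence")
    left = sequence[pos - flank:pos]
    right = sequence[pos + 1:pos + 1 + flank]
    return left + ref + right, left + alt + right

def assign_groups(ref_window, alt_window, psi_ref, psi_alt):
    """Place each allele's window in the up- or downregulating set."""
    if psi_alt < psi_ref:
        return {"down": alt_window, "up": ref_window}
    return {"down": ref_window, "up": alt_window}

ref_w, alt_w = allele_windows("GGAAAAACGGGGGTT", 7, "C", "T")
print(assign_groups(ref_w, alt_w, psi_ref=0.9, psi_alt=0.4))
```

With flank=5 each window is 11 nt long, and the two windows for a variant differ only at the central position.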

RBP overlap analysis

We determined which RBPs may play a role in cis-directed splicing in humans by matching motifs identified by HOMER with motifs from RBP-motif pairs that were previously reported using the RBNS approach39,83. Then, we used DeepRiPe43 to predict differences in RBP binding preferences between the reference and variant alleles. Also, for each RBP, we randomly sampled a matching number of variants in trans-directed genes as controls for comparison and predicted changes in RBP binding. Lastly, we compared the distribution of changes in RBP binding caused by our set of putative functional variants and the randomly sampled controls.

Delta-delta PSI

The delta-delta PSI (ddPSI) captures the difference in PSI between haplotypes and between conditions and is defined as follows:

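$$\Delta\mathrm{PSI} = \mathrm{PSI}_{\mathrm{REF}} - \mathrm{PSI}_{\mathrm{ALT}}$$
$$\mathrm{ddPSI} = \left|\Delta\mathrm{PSI}_{\mathrm{AD}} - \Delta\mathrm{PSI}_{\mathrm{CTRL}}\right|$$

where $\mathrm{PSI}_{\mathrm{REF}}$ represents the PSI value of the exonic part linked to the reference allele, and $\mathrm{PSI}_{\mathrm{ALT}}$ represents that for the alternative allele. ddPSI denotes the absolute difference in delta PSI between the AD and control samples.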

The ddPSI values range from 0 to 2: a value of 0 indicates no difference in linkage between the two conditions, whereas a value of 2 indicates perfect linkage in both conditions but to opposite alleles.
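
A short worked example makes the two endpoints of this range concrete. The sketch takes delta PSI as the signed difference between the reference-linked and alternative-linked PSI, and ddPSI as the absolute difference between the two conditions; the absolute value is an assumption inferred from the stated 0 to 2 range, and the function names are hypothetical.

```python
def delta_psi(psi_ref, psi_alt):
    """Signed difference in PSI between the reference and alternative haplotypes."""
    return psi_ref - psi_alt

def dd_psi(ad_ref, ad_alt, ctrl_ref, ctrl_alt):
    """Absolute difference in delta PSI between AD and control samples."""
    return abs(delta_psi(ad_ref, ad_alt) - delta_psi(ctrl_ref, ctrl_alt))

# Same linkage in both conditions: no difference.
print(dd_psi(0.8, 0.2, 0.8, 0.2))
# Perfect linkage in both conditions, but to opposite alleles: the maximum.
print(dd_psi(1.0, 0.0, 0.0, 1.0))
```

Because each delta PSI lies between -1 and 1, the absolute difference between two of them cannot exceed 2.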

HLA typing

To annotate the HLA alleles present in different samples, we mapped raw sequencing reads to the HLA allele database IPD-IMGT. The alignment was performed using minimap2 (v. 2.28-r1209)74 with splice-aware parameters: -ax splice:hq -uf to allow for split alignment. Supplementary and secondary alignments were excluded. We then identified the most prevalent allele groups among reads mapped to each HLA gene according to the standard nomenclature (https://hla.alleles.org/pages/nomenclature/naming_alleles/).
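
The final step, tallying allele groups among the aligned reads, can be sketched as follows. Under the standard nomenclature the allele group is the first field after the asterisk (for example, A*02 in HLA-A*02:01:01:01). The input here is a hypothetical mapping from read name to the allele of its primary alignment; parsing of the alignment file is not shown, and reporting the two most frequent groups per gene is an assumption of this example.

```python
from collections import Counter, defaultdict

def allele_group(allele_name):
    """'HLA-A*02:01:01:01' -> ('A', 'A*02')."""
    name = allele_name[4:] if allele_name.startswith("HLA-") else allele_name
    gene, fields = name.split("*", 1)
    return gene, f"{gene}*{fields.split(':')[0]}"

def top_allele_groups(read_to_allele, top_n=2):
    """Most prevalent allele groups per HLA gene among the mapped reads."""
    counts = defaultdict(Counter)
    for allele in read_to_allele.values():
        gene, group = allele_group(allele)
        counts[gene][group] += 1
    return {gene: [g for g, _ in c.most_common(top_n)]
            for gene, c in counts.items()}

reads = {
    "read1": "HLA-A*02:01:01:01",
    "read2": "HLA-A*02:01:01:02",
    "read3": "HLA-A*02:06:01",
    "read4": "HLA-A*24:02:01",
    "read5": "HLA-A*24:02:01",
    "read6": "HLA-A*03:01:01",
    "read7": "HLA-B*07:02:01",
}
print(top_allele_groups(reads))
```

Collapsing alleles to their first field pools reads that differ only in the later fields, which makes the tally less sensitive to sequencing errors than allele-level counts.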

AlphaFold

We followed the pipeline established in the Google Colab Notebook (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) to predict the structures of the proteins encoded by the transcripts identified by TALON. This pipeline utilizes AlphaFold84 v. 2.3.2. To assess the impact of cis-directed events on the resulting peptides, we specifically chose a subset of transcripts that differed solely in the inclusion or exclusion of the relevant exonic parts.

Supplementary information

Supporting Information (7.8MB, pdf)
41467_2025_64605_MOESM2_ESM.pdf (106KB, pdf)

Description of Additional Supplementary Files

Supplementary Data 1 (2.6MB, xlsx)
Supplementary Data 2 (1.8MB, xlsx)
Supplementary Data 3 (1.8MB, xlsx)
Supplementary Data 4 (3.6MB, xlsx)
Supplementary Data 5 (11.3KB, xlsx)
Supplementary Data 6 (33.7KB, xlsx)
Supplementary Data 7 (106.1KB, xlsx)
Supplementary Data 8 (10.8KB, xlsx)
Supplementary Data 9 (14KB, xlsx)
Supplementary Data 10 (22.6KB, xlsx)

Acknowledgements

We thank the Mortazavi laboratory at UC Irvine and the World laboratory at Caltech for producing the ENCODE dataset. We appreciate the helpful discussions with Dr. Michael R. Sawaya at UCLA. We thank members of the Xiao laboratory for helpful comments and discussions. This work was supported in part by grants from the National Institutes of Health (U01HG009417, R01AG056476, R01AG075206 to X.X.). G.Q.V. was supported by the UCLA Quantitative and Computational Biosciences Collaboratory Fellowship. K.A. was supported by the University of California-Historically Black Colleges and Universities (UC-HBCU) Fellowship. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author contributions

G.Q.V. and X.X. conceptualized the study. G.Q.V. and K.A. performed formal bioinformatic analyses. G.Q.V. wrote the isoLASER software. X.X. provided supervision. All authors contributed to the writing of the paper and approved the final manuscript.

Peer review

Peer review information

Nature Communications thanks Karine Choquet and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Data availability

The data supporting the findings of this study are available from the corresponding authors upon request. PacBio long-read RNA-seq data from the human and mouse tissues and cell lines were obtained from the ENCODE portal (ENCODE 4 release) at https://www.encodeproject.org. In this study, we only included data generated by the PacBio Sequel II or later platforms. The Alzheimer’s ONT data were obtained from https://www.synapse.org/#!Synapse:syn52047893 (ref. 56). The GTEx sQTL data were obtained from the GTEx portal at https://www.gtexportal.org. Annotation of functional protein domains was obtained using InterPro at https://www.ebi.ac.uk/interpro.

Code availability

The source code for isoLASER is publicly available at https://github.com/gxiaolab/isoLASER. All steps involved in detecting cis-directed events in this study are also packaged in a Snakemake workflow that is publicly available at https://github.com/gxiaolab/isoLASER_paper.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-025-64605-6.

References

1. Baralle, F. E. & Giudice, J. Alternative splicing as a regulator of development and tissue identity. Nat. Rev. Mol. Cell Biol. 18, 437–451 (2017).
2. Scotti, M. M. & Swanson, M. S. RNA mis-splicing in disease. Nat. Rev. Genet. 17, 19–32 (2016).
3. Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).
4. Matlin, A. J., Clark, F. & Smith, C. W. J. Understanding alternative splicing: towards a cellular code. Nat. Rev. Mol. Cell Biol. 6, 386–398 (2005).
5. Ladd, A. N. & Cooper, T. A. Finding signals that regulate alternative splicing in the post-genomic era. Genome Biol. 3, reviews0008.1 (2002).
6. Rong, S. et al. Large-scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Proc. Natl. Acad. Sci. USA 120, e2218308120 (2023).
7. Cheung, R. et al. A multiplexed assay for exon recognition reveals that an unappreciated fraction of rare genetic variants cause large-effect splicing disruptions. Mol. Cell 73, 183–194.e8 (2019).
8. Adamson, S. I., Zhan, L. & Graveley, B. R. Vex-seq: high-throughput identification of the impact of genetic variation on pre-mRNA splicing efficiency. Genome Biol. 19, 71 (2018).
9. Rosenberg, A. B., Patwardhan, R. P., Shendure, J. & Seelig, G. Learning the sequence determinants of alternative splicing from millions of random sequences. Cell 163, 698–711 (2015).
10. Garrido-Martín, D., Borsari, B., Calvo, M., Reverter, F. & Guigó, R. Identification and analysis of splicing quantitative trait loci across multiple tissues in the human genome. Nat. Commun. 12, 727 (2021).
11. Amoah, K. et al. Allele-specific alternative splicing and its functional genetic variants in human tissues. Genome Res. 31, 359–371 (2021).
12. Xiong, H. Y. et al. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 1254806 (2015).
13. Bretschneider, H., Gandhi, S., Deshwar, A. G., Zuberi, K. & Frey, B. J. COSSMO: predicting competitive alternative splice site selection using deep learning. Bioinformatics 34, i429–i437 (2018).
14. Cheng, J., Çelik, M. H., Kundaje, A. & Gagneur, J. MTSplice predicts effects of genetic variants on tissue-specific splicing. Genome Biol. 22, 94 (2021).
15. Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548.e24 (2019).
16. Dawes, R. et al. SpliceVault predicts the precise nature of variant-associated mis-splicing. Nat. Genet. 55, 324–332 (2023).
17. Wagner, N. et al. Aberrant splicing prediction across human tissues. Nat. Genet. 55, 861–870 (2023).
18. Bhattacharya, A. et al. Isoform-level transcriptome-wide association uncovers genetic risk mechanisms for neuropsychiatric disorders in the human brain. Nat. Genet. 55, 2117–2128 (2023).
19. Gandal, M. J. et al. Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science 362, eaat8127 (2018).
20. Holm, L. L. et al. All exons are not created equal—exon vulnerability determines the effect of exonic mutations on splicing. Nucleic Acids Res. 52, 4588–4603 (2024).
21. Glidden, D. T., Buerer, J. L., Saueressig, C. F. & Fairbrother, W. G. Hotspot exons are common targets of splicing perturbations. Nat. Commun. 12, 2756 (2021).
22. Baeza-Centurion, P., Miñana, B., Valcárcel, J. & Lehner, B. Mutations primarily alter the inclusion of alternatively spliced exons. eLife 9, e59959 (2020).
23. Hsiao, Y.-H. E. et al. Alternative splicing modulated by genetic variants demonstrates accelerated evolution regulated by highly conserved proteins. Genome Res. 26, 440–450 (2016).
24. Wright, C. J., Smith, C. W. J. & Jiggins, C. D. Alternative splicing as a source of phenotypic diversity. Nat. Rev. Genet. 23, 697–710 (2022).
25. Jelen, N., Ule, J., Živin, M. & Darnell, R. B. Evolution of nova-dependent splicing regulation in the brain. PLoS Genet. 3, e173 (2007).
26. Barbosa-Morais, N. L. et al. The evolutionary landscape of alternative splicing in vertebrate species. Science 338, 1587–1593 (2012).
27. Reese, F. et al. The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity. Preprint at bioRxiv https://doi.org/10.1101/2023.05.15.540865 (2023).
28. The GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
29. Wagner, J. et al. Benchmarking challenging small variants with linked and long reads. Cell Genom. 2, 100128 (2022).
30. de Souza, V. B. C. et al. Transformation of alignment files improves performance of variant callers for long-read RNA sequencing data. Genome Biol. 24, 1–14 (2023).
31. Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at bioRxiv https://doi.org/10.1101/201178 (2018).
32. Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).
33. Zheng, Z. et al. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nat. Comput. Sci. 2, 797–803 (2022).
34. Edge, P., Bafna, V. & Bansal, V. HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. 27, 801–812 (2017).
35. Rautiainen, M. et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat. Biotechnol. 41, 1474–1482 (2023).
36. Glinos, D. A. et al. Transcriptome variation in human tissues revealed by long-read sequencing. Nature 608, 353–359 (2022).
37. Wang, Z. & Burge, C. B. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA 14, 802–813 (2008).
38. Yang, E. W. et al. Allele-specific binding of RNA-binding proteins reveals functional genetic variants in the RNA. Nat. Commun. 10, 396275 (2019).
39. Van Nostrand, E. L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711–719 (2020).
40. Martinez-Contreras, R. et al. hnRNP proteins and splicing control. Adv. Exp. Med. Biol. 623, 123–147 (2007).
41. Stagsted, L. V. W., O’leary, E. T., Ebbesen, K. K. & Hansen, T. B. The RNA-binding protein SFPQ preserves long-intron splicing and regulates circRNA biogenesis in mammals. eLife 10, 1–26 (2021).
42. Li, M. et al. KHSRP ameliorates acute liver failure by regulating pre-mRNA splicing through its interaction with SF3B1. Cell Death Dis. 15, 1–15 (2024).
43. Ghanbari, M. & Ohler, U. Deep neural networks for interpreting RNA-binding protein target preferences. Genome Res. 30, 214–226 (2020).
44. Liston, A., Humblet-Baron, S., Duffy, D. & Goris, A. Human immune diversity: from evolution to modernity. Nat. Immunol. 22, 1479–1489 (2021).
45. Traherne, J. A. Human MHC architecture and evolution: implications for disease association studies. Int. J. Immunogenet. 35, 179–192 (2008).
46. Wieczorek, M. et al. Major histocompatibility complex (MHC) class I and MHC class II proteins: conformational plasticity in antigen presentation. Front. Immunol. 8, 292 (2017).
47. Tamouza, R., Krishnamoorthy, R. & Leboyer, M. Understanding the genetic contribution of the human leukocyte antigen system to common major psychiatric disorders in a world pandemic context. Brain Behav. Immun. 91, 731–739 (2021).
48. Furukawa, H., Oka, S., Shimada, K., Hashimoto, A. & Tohma, S. Human leukocyte antigen polymorphisms and personalized medicine for rheumatoid arthritis. J. Hum. Genet. 60, 691–696 (2015).
49. Chin, C. S. et al. A diploid assembly-based benchmark for variants in the major histocompatibility complex. Nat. Commun. 11, 1–9 (2020).
50. Orenbuch, R. et al. arcasHLA: high-resolution HLA typing from RNAseq. Bioinformatics 36, 33–40 (2020).
51. Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
52. Robinson, J., Barker, D. J. & Marsh, S. G. E. 25 years of the IPD-IMGT/HLA database. HLA 103, e15549 (2024).
53. Ehlers, F. A. I. et al. Polymorphic differences within HLA-C alleles contribute to alternatively spliced transcripts lacking exon 5. HLA 100, 232 (2022).
54. Salter, R. D. et al. Polymorphism in the α3 domain of HLA-A molecules affects binding to CD8. Nature 338, 345–347 (1989).
55. Wang, Z.-X., Wan, Q. & Xing, A. HLA in Alzheimer’s disease: genetic association and possible pathogenic roles. Neuromol. Med. 22, 464–473 (2020).
56. Aguzzoli Heberle, B. et al. Mapping medically relevant RNA isoform diversity in the aged human frontal cortex with deep long-read RNA-seq. Nat. Biotechnol. 43, 635–646 (2025).
57. Raj, T. et al. Integrative transcriptome analyses of the aging brain implicate altered splicing in Alzheimer’s disease susceptibility. Nat. Genet. 50, 1584–1592 (2018).
58. Radici, L., Bianchi, M., Crinelli, R. & Magnani, M. Ubiquitin C gene: structure, function, and transcriptional regulation. Adv. Biosci. Biotechnol. 4, 1057–1062 (2013).
59. Park, C.-W. & Ryu, K.-Y. Cellular ubiquitin pool dynamics and homeostasis. BMB Rep. 47, 475–482 (2014).
60. Hegde, A. N., Smith, S. G., Duke, L. M., Pourquoi, A. & Vaz, S. Perturbations of ubiquitin-proteasome-mediated proteolysis in aging and Alzheimer’s disease. Front. Aging Neurosci. 11, 475441 (2019).
61. Ciechanover, A. & Brundin, P. The ubiquitin proteasome system in neurodegenerative diseases: sometimes the chicken, sometimes the egg. Neuron 40, 427–446 (2003).
62. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
63. Wang, B. et al. Variant phasing and haplotypic expression from long-read sequencing in maize. Commun. Biol. 3, 78 (2020).
64. Deonovic, B., Wang, Y., Weirather, J., Wang, X.-J. & Au, K. F. IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing. Nucleic Acids Res. 45, e32–e32 (2017).
65. Hardwick, S. A. et al. Single-nuclei isoform RNA sequencing unlocks barcoded exon connectivity in frozen brain tissue. Nat. Biotechnol. 40, 1082–1092 (2022).
66. Peng, Q. et al. Impacts and mechanisms of alternative mRNA splicing in cancer metabolism, immune response, and therapeutics. Mol. Ther. 30, 1018–1035 (2022).
67. Schaub, A. & Glasmacher, E. Splicing in immune cells—mechanistic insights and emerging topics. Int. Immunol. 29, 173–181 (2017).
68. Ren, P. et al. Alternative splicing: a new cause and potential therapeutic target in autoimmune disease. Front. Immunol. 12, 713540 (2021).
69. Sakaue, S. et al. Tutorial: a statistical genetics guide to identifying HLA alleles driving complex disease. Nat. Protoc. https://doi.org/10.1038/s41596-023-00853-4 (2023).
70. Marques-Coelho, D. et al. Differential transcript usage unravels gene expression alterations in Alzheimer’s disease human brains. NPJ Aging Mech. Dis. 7, 2 (2021).
71. Biamonti, G. et al. Alternative splicing in Alzheimer’s disease. Aging Clin. Exp. Res. 33, 747–758 (2021).
72. Sun, Y. et al. A splicing transcriptome-wide association study identifies novel altered splicing for Alzheimer’s disease susceptibility. Neurobiol. Dis. 184, 106209 (2023).
73. Farhadieh, M.-E. & Ghaedi, K. Analyzing alternative splicing in Alzheimer’s disease postmortem brain: a cell-level perspective. Front. Mol. Neurosci. 16, 1237874 (2023).
74. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
75. Wyman, D. & Mortazavi, A. TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts. Bioinformatics 35, 340–342 (2019).
76. Wyman, D. et al. A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification. Preprint at bioRxiv https://doi.org/10.1101/672931 (2020).
77. Schafer, S. et al. Alternative splicing signatures in RNA-seq data: percent spliced in (PSI). Curr. Protoc. Hum. Genet. 87, 11.16.1–11.16.14 (2015).
78. Quinones-Valdez, G., Fu, T., Chan, T. W. & Xiao, X. scAllele: a versatile tool for the detection and analysis of variants in scRNA-seq. Sci. Adv. 8, eabn6398 (2023).
79. Vinh, N. X., Epps, J. & Bailey, J. Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11, 2837–2854 (2010).
80. Smedley, D. et al. The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Res. 43, W589–W598 (2015).
81. Howe, K. L. et al. Ensembl 2021. Nucleic Acids Res. 49, D884–D891 (2021).
82. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
83. Lambert, N. et al. RNA Bind-n-Seq: quantitative assessment of the sequence and structural binding specificity of RNA binding proteins. Mol. Cell 54, 887–900 (2014).
84. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

Articles from Nature Communications are provided here courtesy of Nature Publishing Group
