Abstract
Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.
Introduction
The link between genetics and splicing in complex disease
One of the themes of the evolution of multicellular organisms is the development of managed complexity, with alternative splicing serving as a prime example. As genes are partitioned into exons separated by intronic sequences of increasing length and sequence content, the complexity of gene protein products increases tremendously, with splicing serving as the control mechanism for this combinatorial protein complexity. Naturally occurring genetic variants can perturb this splicing control mechanism, and it is now clear that an appreciable proportion of genome-wide association studies (GWAS) loci contain variants that alter splicing (1–4).
GWAS has provided a wealth of novel insights into human biology, identifying genetic variants in over 55 000 loci associated with 5000 complex traits and diseases (5). In addition to GWAS, association studies across multiple tissues and cell types have mapped expression quantitative trait loci (eQTLs), whereby common genetic variants alter total gene expression levels (6). Colocalization studies combining GWAS and eQTL results have successfully identified functional mechanisms for many GWAS loci (7), indicating that the function of these GWAS loci is to alter gene regulation. However, the function of most GWAS loci remains uncharacterized, suggesting that other important regulatory mechanisms are involved (and that eQTL discovery is incomplete). It is increasingly clear that one of these other mechanisms is genetically influenced alternative splicing, measured by splicing quantitative trait loci (sQTLs, Fig. 1), making the study of alternative splicing a priority for GWAS functional characterization of these loci (8–10).
Figure 1.
Conceptually, eQTLs do not affect sequence content but only affect the amount of RNA and protein. In reality, sQTLs can sometimes be observed as eQTL effects. sQTLs alter RNA and potentially protein content, leading to qualitative changes in protein function. Of note, sQTLs can also have quantitative effects on RNA and protein levels as shown in Figure 3.
The first observations of splicing in human immunoglobulin genes were made decades ago (11,12), but it was through the use of RNA-seq that the nearly ubiquitous nature of splicing in the human transcriptome was demonstrated (13). Alternative splicing (AS) of mRNA molecules to produce distinct isoforms is a mechanism of gene regulation inherent to nearly every protein-coding gene (92–94%) (13,14). Specific splicing events arise from the interplay of core splice factors, which are mandatory for splicing, and auxiliary splice factors, which regulate splicing (15,16) to form the ‘splicing network’ (10,17).
Aberrant splicing leads to a host of pathologies, from neurodegeneration to cancer (18–20). Genetic variation can affect splice factors, their target binding sites or other regulatory elements to disrupt the balance of the splicing network. Splice-altering genetic variation is consistent with other quantitative traits whereby the effect size of variants on splicing is inversely correlated with their minor allele frequency (21). Rare variants tend to have more dramatic effects on protein function, such as mis-splicing that leads to a truncated protein. On the other hand, common variants contribute to complex genetic diseases through a continuum of effects on splicing, from dramatic loss of multiple exons to subtle shifts of splicing ratios (3,4). Functional genomics approaches to map and functionally characterize sQTLs are rapidly advancing, and, as we discuss below, are ripe for integration with emerging long-read RNA sequencing approaches. Past excellent reviews cover the topic of genetically regulated alternative splicing, with a focus on insights derivable from the short-read RNA-seq data available at the time (8–10). This review comments on the state of the field in terms of the study of splicing in the context of complex human disease (GWASs) with a focus on long-read sequencing.
Genome-wide insights into genetically regulated splicing through sQTL analysis
While sQTLs are understudied relative to eQTLs, certain facts about sQTLs have been repeatedly demonstrated. First, sQTLs are highly prevalent across the genome, which may not be surprising given the multitude of ways genetic variation can impact splicing (22,23). On a per-sample basis, the number of sQTL-containing genes discovered has been smaller than the number of eQTL genes, but this may reflect lower power for sQTL discovery. Second, sQTL effects are highly shared across tissues, though there are many examples of tissue- or cell-type-specific sQTLs (3,4).
While both eQTLs and sQTLs can be found throughout the gene body and outside of genes, the distribution of these variants within gene bodies is different (1), with sQTLs showing a pronounced enrichment in splice donor and acceptor sites, among missense and synonymous exonic variants, and in 5′ untranslated regions (UTRs) (3,4). In terms of the biological mechanisms implicated by sQTLs, there are allelic effects on canonical splice site donors and acceptors, and there is also clear evidence that sQTLs disrupt binding of core and auxiliary splice factors (i.e. RNA-binding proteins) (1,4).
Methods for sQTL discovery: central role of short-read RNA-seq-based splicing quantification
The earliest sQTL studies made creative use of exon arrays to identify sQTLs by comparing genetic effects on exon and gene level expression (24–26); however, RNA-seq has revolutionized the identification of sQTLs, largely due to the fact that RNA-seq provides direct measurement of splicing through junctional reads (27,28). A review of recent studies using sQTLs to functionally characterize GWAS loci may be found in Table 1. Previous reviews give an excellent overview of sQTL studies before 2018 (9,10).
Table 1.
An overview of sQTL studies that examine GWAS loci in the context of complex traits (published 2018–2022)
Disease/Trait | Year | Sample size | Tissue | Junction/isoform Quantification | sQTL calling | Study |
---|---|---|---|---|---|---|
Various traits | 2022 | GTEx | GTEx (49 tissues) | LeafCutter* | FastQTL + PEER* | Rouhana et al. (121) |
Various neurological and psychiatric disorders | 2022 | 100 | 255 primary human microglial samples from multiple brain regions | LeafCutter | TensorQTL | Lopes et al. (122) |
Alzheimer’s disease and related dementias | 2022 | Various including GTEx | Various including GTEx | Previously reported | Previously reported | Bellenguez et al. (123) |
Pancreatic Cancer | 2022 | TCGASpliceSeq (N = 176) | Pancreatic ductal adenocarcinoma (PDAC) | SpliceSeq | MatrixeQTL + regression analysis | Tian et al. (124) |
Various complex traits | 2022 | N/A | Various immune cells | In-house package | N/A | Yamaguchi et al. (125) |
BD (Bipolar Disorder) | 2022 | 511 total samples from 295 unique donors | Subgenual anterior cingulate cortex and amygdala samples | LeafCutter | FastQTL + PEER | Zandi et al. (126) |
Cardiometabolic traits | 2022 | 426 Finnish men from the METSIM study | Subcutaneous adipose tissue | LeafCutter | QTLtools + PEER | Brotman et al. (127) |
Prostate Cancer (PrCa) | 2022 | 467 | Benign prostate tissue | RSEM + sQTLseekeR | sQTLseekeR | Tian et al. (128) |
Coronary Artery Disease (CAD) | 2022 | 151 | Cultured smooth muscle cells | LeafCutter | FastQTL + PEER | Aherrahrou et al. (129) |
Various brain related traits | 2021 | PsyENCODE cohort (N = 1073) | Brain | THISTLE + LeafCutter | FastQTL + PEER | Yang et al. (50) |
Meta-analysis of Various traits | 2021 | Varied | Varied | LeafCutter | QTLtools + PEER | Kerimov et al. (115) |
Developing cortical wall or adult cortex | 2021 | Primary human neural progenitors (n = 85) and their sorted neuronal progeny (n = 74) | Primary human neural progenitors and their sorted neuronal progeny | LeafCutter | EMMAX | Aygun et al. (130) |
Meta-analysis of multiple cohorts for human immune traits | 2021 | Multiple cohorts: 1) DGN: N = 922, 2) BLUEPRINT: N = 197, 3) GEUVADIS N = 462 | Various immune cells | LeafCutter | FastQTL + PEER | Mu et al. (131) |
Various traits | 2021 | GTEx | GTEx (All tissues) | sQTLseekeR2 | sQTLseekeR2 | Garrido-Martín et al. (4) |
Various traits | 2021 | GTEx | GTEx (49 tissues) | LeafCutter* | FastQTL + PEER* | Barbeira et al. (2) |
Type 2 diabetes (T2D) | 2021 | GTEx | GTEx (48 tissues) | LeafCutter* | FastQTL + PEER* | Chen et al. (132) |
Kidney function | 2021 | GTEx | GTEx (Kidney) | LeafCutter* | FastQTL + PEER | Stanzick et al. (133) |
Type 1 Diabetes (T1D) | 2021 | Genotype-Tissue Expression (GTEx) | GTEx (All tissues) | LeafCutter* | FastQTL + PEER | Gao et al. (134) |
Glioma | 2021 | CommonMind Consortium (CMC) and GTEx | Multiple brain tissues | LeafCutter | MatrixeQTL + PEER | Patro et al. (135) |
Amyotrophic lateral sclerosis (ALS) | 2021 | 154 ALS cases and 49 control individuals | Cervical, thoracic, and lumbar spinal cord segments | LeafCutter | TensorQTL + PEER | Humphrey et al. (136) |
Complex disease in Colon | 2021 | 485 | Colonic mucosal biopsy | LeafCutter | FastQTL + PEER | Díez-Obrero et al. (137) |
Parkinson’s disease (PD) | 2021 | 230 | Monocytes | LeafCutter | QTLtools + PEER | Navarro et al. (138) |
Mental illness (bipolar disorder, schizophrenia, major depression) | 2021 | 200 | Postmortem subgenual anterior cingulate cortex (sgACC) | sQTLseekeR | sQTLseekeR | Akula et al. (139) |
Schizophrenia | 2021 | 151 | Prefrontal cortical samples | LeafCutter | QTLtools + PEER | Liu et al. (140) |
Melanoma | 2021 | 106 | Human primary melanocytes | LeafCutter | FastQTL + PEER | Zhang et al. (141) |
Aging human brain | 2020 | Religious Order Study (ROS) and Memory and Aging Project (MAP) cohorts (N = 450) | Brain | LeafCutter* | FastQTL + PEER* | Yang et al. (142) |
Chronic obstructive pulmonary disease (COPD) | 2020 | GTEx + Lung Tissue Research Consortium (LTRC) | GTEx (Lung) + LTRC | LeafCutter | FastQTL + PEER | Saferali et al. (73) |
Bladder cancer | 2020 | 580 cases/1101 controls (GTEx, TCGA, GEO, CancerSplicingQTL, 1000 Genomes Project) | Bladder | LeafCutter and SpliceSeq* | FastQTL + PEER + sQTLSeekeR* | Guo et al. (143) |
Cancer | 2020 | 19 257 cases and 30 208 controls (71 studies from 52 publications) | Various tissues | LeafCutter* | FastQTL + PEER* | Yuan et al. (144) |
CAD, stroke, migraine, abdominal aortic aneurysm | 2020 | 19 paired primary human coronary artery smooth muscle and endothelial cells | HCASMCs and HCAECs | MAJIQ | In-house regression analysis | Nurnberg et al. (145) |
Various traits | 2020 | 838 | Various tissues | LeafCutter | FastQTL + PEER | GTEx consortium (3) |
Parkinson’s disease (PD) | 2019 | ROS + MAP+CMC (N = 902) | Brain | LeafCutter* | FastQTL + PEER* | Li et al. (146) |
Immune activation | 2019 | 970 RNA-seq from 200 individuals of African- and European-descent | Resting and stimulated monocytes | LeafCutter | MatrixeQTL + PEER | Rotival et al. (147) |
Chronic obstructive pulmonary disease (COPD) | 2019 | 376 | Whole Blood | LeafCutter | MatrixeQTL + PEER | Saferali et al. (148) |
Schizophrenia | 2019 | 201 | Mid-gestational human brains | LeafCutter | FastQTL + PEER | Walker et al. (34) |
Cardiovascular disease | 2019 | 83 | Induced pluripotent stem cell (iPSC), hepatocyte-like cell (HLC), primary liver tissues | LeafCutter | QTLtools + PEER | Gawronski et al. (149) |
Alzheimer’s disease (AD) | 2018 | 450 | Dorsolateral prefrontal cortex (DLPFC) | LeafCutter | FastQTL + PEER | Raj et al. (150) |
Coronary Artery Disease (CAD) | 2018 | 52 | HCASMC | LeafCutter | FastQTL + PEER | Liu et al. (151) |
*Previously reported sQTL dataset
In many cases, sQTL detection methods utilize the same regression-based software programs used for eQTL detection, such as MatrixeQTL (29), FastQTL (30), tensorQTL (31) and EMMAX (32), though other approaches such as transcriptome-wide association studies (TWAS) have been developed (33,34). For eQTL studies, quantification of gene expression is fairly straightforward, but splicing quantification differs in that alternative splicing is typically expressed as a ratio in which the numerator is the count of a particular splice event, such as inclusion of an exon, and the denominator is the sum of the counts of that event and all other linked splicing events. The most common metric used to quantify splice events is Ψ, ‘percent spliced in’ (PSI), which represents the rate at which a genic feature (such as an exon) is included in mature RNA transcripts. Thus, the foundation for most sQTL analyses depends on the concept of a splicing event, which leads naturally to an ‘event-based’ approach to splicing quantification.
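As a minimal sketch of the event-based idea, the Python snippet below computes percent spliced in (PSI) for a single exon-skipping event from junction read counts and then regresses PSI on genotype dosage by ordinary least squares. The counts, genotypes and function names are hypothetical, and the PSI definition used here (inclusion reads over inclusion plus exclusion reads, without length normalization) is a simplifying assumption. Real sQTL callers such as those listed in Table 1 additionally model covariates and correct for multiple testing.

```python
def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in: fraction of informative reads supporting inclusion."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # event not covered in this sample
    return inclusion_reads / total


def ols_fit(x, y):
    """Slope and intercept of a simple least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: (inclusion reads, exclusion reads) per sample,
# and genotype dosage (0, 1 or 2 copies of the alternate allele).
counts = [(90, 10), (80, 20), (60, 40), (50, 50), (30, 70), (20, 80)]
dosage = [0, 0, 1, 1, 2, 2]

psi_values = [psi(inc, exc) for inc, exc in counts]
slope, intercept = ols_fit(dosage, psi_values)
print(f"PSI per sample: {psi_values}")
print(f"Estimated change in PSI per alternate allele: {slope:.3f}")
```

In this toy example each additional copy of the alternate allele lowers exon inclusion, which is the kind of additive effect that regression-based QTL tools test for.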
A wide variety of splice quantification methods have been developed (35–38). Most event-based approaches take a ‘local’ approach in which the numerator and denominator values used to compute Ψ are calculated from directly observed short-read quantities, such as junctional reads or exon counts. Two widely used programs, rMATS (39) and LeafCutter (28), provide useful illustration of the central challenge in the event-based approach, namely the lack of a clear correct approach to calculating the denominator for Ψ. This denominator is meant to capture the set of splicing alternatives that is relevant for any given splicing event, but splicing often shows a complex pattern of dependency in which splice events in different parts of the gene body have a strong pattern of co-occurrence (40,41). This complexity can make it challenging to unambiguously define the set of linked events that should constitute the denominator for calculating the Ψ of any specific event, as described in detail in the MISO publication (35). rMATS addresses this issue by limiting its analysis to five well-defined classes of splicing events (exon skipping, retained intron, etc.), but it does not account for more complex splicing patterns. The txRevise approach addresses more complex splicing, but still limits analysis to events occurring in known isoforms (i.e. present in reference databases) (42). In contrast, LeafCutter uses a data-driven approach to identify novel splicing events and to ‘learn’ patterns of splicing that may be quite complex, but this can come at the cost of shifting definitions of splicing events across or even within datasets. In addition, as LeafCutter relies solely on junction-spanning reads, it is unable to quantify changes in gene coverage such as intron retention or changes in UTRs.
Overall, because no single event-based approach captures all possible transcript variation, the choice of tool amounts to a choice of which splicing features to prioritize for sensitive and accurate detection.
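To make the denominator problem concrete, the sketch below groups junctions that share a splice site into clusters and expresses each junction's read count as a fraction of its cluster total. This is written in the spirit of data-driven intron clustering and is not the implementation of any published tool. The coordinates, counts and function names are hypothetical, and strand and chromosome handling are omitted for brevity.

```python
from collections import defaultdict


def cluster_junctions(junctions):
    """Group junctions that share a donor or acceptor coordinate (union-find)."""
    parent = list(range(len(junctions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    first_seen = {}  # splice-site coordinate -> index of first junction using it
    for idx, (start, end, _count) in enumerate(junctions):
        for site in (("start", start), ("end", end)):
            if site in first_seen:
                parent[find(idx)] = find(first_seen[site])
            else:
                first_seen[site] = idx

    clusters = defaultdict(list)
    for idx in range(len(junctions)):
        clusters[find(idx)].append(idx)
    return list(clusters.values())


def junction_ratios(junctions):
    """Return {(start, end): count / total count of its cluster}."""
    ratios = {}
    for members in cluster_junctions(junctions):
        total = sum(junctions[i][2] for i in members)
        for i in members:
            start, end, count = junctions[i]
            ratios[(start, end)] = count / total if total else 0.0
    return ratios


# Hypothetical junctions in one gene: (intron start, intron end, reads).
junctions = [
    (100, 200, 30),  # exon 1 -> exon 2
    (300, 400, 30),  # exon 2 -> exon 3
    (100, 400, 40),  # exon 1 -> exon 3 (skips exon 2)
    (500, 600, 25),  # unrelated intron further downstream
]
for junction, ratio in sorted(junction_ratios(junctions).items()):
    print(junction, round(ratio, 2))
```

Because cluster membership is learned from the observed junctions, adding or removing a single junction that shares a splice site changes the denominator for every other junction in that cluster, which is one way event definitions can shift between datasets.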
An alternative to local, event-based quantification is isoform-based quantification, in which the abundances of full-length transcripts are first estimated from short reads, an approach used by tools such as kallisto (43), RSEM (44), Salmon (45) and StringTie (46). A common approach is to calculate isoform ratios (count of one isoform/total isoform counts for the gene), which can be tested for association with genetic variants in transcript ratio QTL (trQTL) analyses, of which there are many examples (47,48). Detection of changes in isoform usage is a multivariate problem, and methods like sQTLseekeR/sQTLseekeR2 (4), DRIMSeq (49) and THISTLE (50) implement statistical models that specifically account for the multivariate nature of isoform analysis. The advantage of isoform-based analysis is that by explicitly representing isoforms, which encode and are defined by a specific series of splice events, greater clarity can be achieved in the characterization of potentially complex splicing changes. The main drawback is the underlying inaccuracy of isoform estimation (51). Even with state-of-the-art isoform inference methods, accuracy varies by expression level (52), is reduced for genes with many exons and a large number of expressed isoforms (53) and may reduce performance in detecting differential isoform expression between conditions (54).
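The isoform ratio described above can be sketched in a few lines of Python. The gene and isoform names and the abundance values are hypothetical, and in practice the abundances would come from an isoform quantification tool rather than being typed in by hand.

```python
def isoform_ratios(abundances):
    """Convert {gene: {isoform: abundance}} into within-gene isoform ratios."""
    ratios = {}
    for gene, isoforms in abundances.items():
        total = sum(isoforms.values())
        if total == 0:
            continue  # gene not expressed; ratios are undefined
        ratios[gene] = {iso: value / total for iso, value in isoforms.items()}
    return ratios


# Hypothetical abundance estimates (e.g. TPM) for one sample.
sample = {
    "GENE_A": {"A-201": 30.0, "A-202": 10.0},
    "GENE_B": {"B-201": 0.0, "B-202": 0.0},
}
print(isoform_ratios(sample))
```

Since the ratios within a gene sum to one, they are not independent of each other, which is why the multivariate methods mentioned above test all isoforms of a gene jointly.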
Because neither event-based nor isoform inference-based approaches can fully recover missing information about the true isoform ratios, splicing quantification shows appreciable variability (51), which in turn produces variability in the results of different sQTL-calling algorithms. A recent sQTL study using both an event-based (LeafCutter) and an isoform-based (THISTLE) approach found that the two methods produced overlapping but complementary results (50). Another study found differences between sQTLseekeR and LeafCutter in the GTEx dataset (4).
Long-read RNA-seq to interpret genetically regulated splicing
By providing direct identification and quantification of full-length transcript isoforms, long-read sequencing—from technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT)—can improve sQTL interpretation to provide a more direct link between genetic variants and their impact on transcript abundances (41,55–61).
One obvious way that long-read sequencing informs sQTL interpretation is to reveal effects of sQTLs on novel isoforms that would be undetectable or misinterpreted by analyses dependent entirely on reference transcriptome annotation (Fig. 2). A significant number of sQTLs discovered through short-read RNA-seq are associated with novel junctions or exons (28). These events are difficult to interpret, because they cannot be conclusively linked to a reference transcript. Long-read RNA-seq data from human cells or tissues routinely uncover tens of thousands of novel isoforms, indicating that human transcript isoform annotations are incomplete, estimated to only include roughly 33% of true isoforms (62–65).
Figure 2.
Long-read sequencing provides isoform-level characterization of sQTL effects. To illustrate, a hypothetical example may be considered. Panel A shows the event-based characterization of two exon inclusion/exclusion events, one of which involves a cryptic exon (represented by the dashed lines). The first event can be explained by the reference transcript annotation, but the second event indicates the presence of a novel isoform and the identity of this isoform cannot be defined from short-read data alone. Panel B shows the results of long-read sequencing which identify the novel isoform. The pattern of isoform usage by genotype confirms that this pattern of exon/inclusion events is driven by increased usage of the novel isoform in subjects with the G allele of the causal sQTL variant. Proper characterization of the isoform also provides more accurate information on protein sequence and functional potential (as seen in different shapes of the pink proteins).
Another way long-read sequencing improves sQTL interpretation is through resolution of sQTL effects on complex splicing phenomena, which we refer to here as simpler splicing events that tend to co-occur, such as distant exon inclusion events that occur within the same isoform (40,41). In many cases, the identification of ‘local’ sQTL events is not sufficient to map the effect of a genetic variant to a specific isoform and then to its downstream effect on protein function. By providing accurate information on the isoform content within each sample, long-read sequencing can clarify genetic effects on novel splicing events and complex patterns of interrelated splicing. For example, even when all junctions are present in the reference, novel isoforms often arise from new combinations of known splicing events (i.e. junctions, exons), which would require long-read sequencing to resolve (62,66). Though allele-specific expression of particular splicing events can be detected using short-read data (67,68), the use of long reads can reveal allele-specific expression of entirely full-length isoforms, even revealing variants that result in dramatic changes in isoform length (69–71). Recently, tools have been developed to process long-read sequence reads to extract both splicing and allele information to trace the parental origin of isoforms (70–72).
Two illustrative examples of long-read sequencing to clarify genetically influenced splicing are the COPD and lung function GWAS association in NPNT and the body fat percentage GWAS association in DUSP13.
In the case of NPNT, a genome-wide comparison of COPD GWAS peaks and LeafCutter sQTLs from GTEx lung tissue identified NPNT as a locus harboring nearly identical genetic association signals for COPD and alternative splicing of multiple exons in NPNT. The A allele of the lead SNP rs34712979 introduces a novel splice acceptor site at the second exon, creating a NAGNAG motif. Analysis of short-read RNA-seq confirmed that the proximal acceptor site created by the A allele is strongly preferred to the canonical site. However, this variant also has sQTL associations with splicing in the second, third and fourth exons of NPNT, an observation that had no clear explanation in light of the reference isoforms. Targeted long-read sequencing in lung tissue from 10 subjects selected by rs34712979 genotype revealed the presence of multiple novel, truncated NPNT isoforms which were highly expressed. There were marked genotype-specific differences in the usage of these novel short isoforms that account for the variant’s pattern of sQTL associations (73).
In the case of DUSP13, Bayesian colocalization analysis between body fat percentage GWAS and sQTLs identified in muscle tissue from GTEx implicated three sQTL intron excision events (70). Long-read sequencing coupled with allele-specific transcript structure (ASTS) analysis using LORALS (70) showed that transcript ENST00000372700 (DUSP13-202), which lacks four middle exons, was more highly expressed from the risk (ALT) allele.
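The logic of testing for allele-specific transcript structure can be illustrated with a simple contingency-table test: in a heterozygous individual, long reads are assigned to the REF or ALT haplotype and to an isoform, and one asks whether isoform usage differs between haplotypes. The sketch below uses a two-sided Fisher's exact test on hypothetical read counts. It is an illustration under these assumptions and not the statistical procedure implemented in LORALS.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= observed * (1 + 1e-9))
    return min(p, 1.0)


# Hypothetical long-read counts in one heterozygous sample.
#                 full-length isoform, exon-skipping isoform
ref_haplotype = (18, 2)
alt_haplotype = (8, 12)

p_value = fisher_exact_two_sided(*ref_haplotype, *alt_haplotype)
print(f"Allele-specific isoform usage, p = {p_value:.4f}")
```

Because both haplotypes are measured within the same sample, this kind of comparison is internally controlled for sample-level confounders, though in practice counts would be combined across individuals and corrected for multiple testing.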
Incorporation of long-read RNA-seq in genetic studies
Given the benefits of long-read sequencing for sQTL characterization, specific applications of long-read RNA-seq in disease genetics studies have recently emerged, and long-read RNA-seq has already been performed in GTEx and ENCODE samples (70,74). The cost and throughput of long-read sequencing are improving (e.g. MAS-Iso-Seq for PacBio (75), PromethION for ONT (76)), which has enabled such consortium-scale isoform surveys, but cost may still be prohibitive for large human cohorts or individual laboratories (77). In response to these limitations, most applications in human cohorts have pursued a targeted approach or a hybrid long-read/short-read approach.
The targeted approach is the most common scheme for long-read integration, and here a small number of genes with evidence of disease-associated splicing is analyzed by long-read transcript sequencing, through either targeted amplicon sequencing or hybridization-based enrichment of isoforms from target genes. The long-read data serve to validate initial sQTL associations and identify novel transcripts. For example, CACNA1C has multiple alternative splicing events with genetic associations with schizophrenia, and targeted Nanopore sequencing was used to elucidate novel isoforms in brain tissue (78). Also, a single mutation in the intronic region of the MYBPC3 gene led to aberrant splicing that was elucidated through targeted PacBio sequencing (79). Several other examples may be found in the literature (73,80–83).
More recently, a hybrid approach using both long-read and short-read data has been developed, which represents a solution to the moderate throughput of long-read sequencing (71,84). Here, long-read sequencing data are collected from a subset of subjects to derive a set of context-specific isoform models, in effect a disease-, population- or cell-type-specific map of isoforms. These models then serve as a reference annotation for aligning short-read RNA-seq data and estimating isoform abundances in the entire cohort, and thus guide sQTL analysis. This long-read/short-read integrative strategy has been demonstrated for microglia in Alzheimer’s disease (Dr Jack Humphrey, unpublished).
Technical considerations of long-read sequencing for genetic studies in population cohorts
A typical long-read RNA-seq workflow involves several steps, the technical aspects of which have been extensively reviewed (27,55,56,58,85–91). Some of the key considerations for a long-read approach as opposed to a short-read approach relate to sample preparation, read length and accuracy, throughput and bioinformatic analysis, which are further described in Table 2.
Table 2.
Analysis considerations for use of long-read data in functional genomics studies
Technical/analysis aspect | Consideration for functional genomics |
---|---|
Importance of RNA integrity | Long-read RNA-seq requires high RNA integrity (e.g. RIN value 9), since degraded RNA can be difficult to distinguish from true isoforms. Such considerations are important for patient samples, such as FFPE, OCT-embedded, blood draws or post-mortem tissue. |
Length ranges | Long-read platforms currently can sequence molecules up to 10–15 kb, which covers most transcripts, but particularly long RNA isoforms may not be well captured by current technologies. |
Per base accuracy | The raw read accuracy of the PacBio and ONT platforms is lower than that of short-read Illumina sequencing, with accuracies of ~90%. However, PacBio’s circular consensus sequencing (CCS) and the ONT R2C2 method yield consensus accuracies approaching that of Illumina (Phred 30, or 99.9% accuracy) (85,86). Comparisons between platforms, including a recent comprehensive comparison of the same barcoded library sequenced on the PacBio and ONT platforms, reported sources of systematic error that are characteristic of each technology (152,153). |
Read throughput | A typical Sequel II (PacBio) 8 M flow cell returns ~ 2–4 million reads, while the MinION (ONT) returns 10–15 million reads (154). Though no standard guideline exists, obtaining a million or more reads per human transcriptome is recommended. Currently, long-read sequencing is low- to mid-throughput, potentially limiting transcriptome-wide discovery studies in population cohorts. |
Clustering algorithms to infer transcripts from raw long reads; consensus calling (read ‘polishing’) | This step involves clustering or collapsing long reads that represent the same originating transcript isoform and obtaining the relative abundances of such transcripts based on the number of mapping reads. The clustering step is critical in all long-read pipelines because overclustering masks true splicing variability, whereas underclustering reports artifactual isoforms. A common approach infers distinct isoforms by collapsing genome-aligned reads based on the similarity of the candidate isoforms’ junction chains (63,155,156). An open question in the field is the extent to which novel transcript sequences represent true biological transcription rather than sample preparation or sequencing artifacts (157,158). It remains challenging to interpret transcripts with shorter 5′ ends that are an exact subset of a longer transcript, as they can represent true novel 5′ transcriptional start sites (159) or 5′ truncation artifacts. |
Isoform quantification | The set of distinct transcript isoforms and their associated long reads should indicate the relative abundances of such transcripts. A few studies have reported isoform quantification levels from long-read data (84,160–162), and the quantitative figures of merit are actively being assessed (163). Tools such as tappAS allow for testing of gene-level versus isoform-level quantitative changes between conditions (99). |
Isoform visualization and annotation | To visualize and analyze long-read-derived isoforms, tools have been developed, such as SQANTI, tappAS and Isotopes, which compare known and novel isoforms to gene annotations. Isoforms can be visualized in genome browser tracks (164,165). However, the much greater length of introns compared with exons makes it difficult to easily view and compare the differences in isoform structures. New tools have been developed that enable visual analysis of distinctive and common features between isoforms for biological interpretation, such as visual ‘condensing’ of intronic regions and overlay of functional protein features and expression levels across samples (166–170). Tools such as tappAS and Biosurfer enable visualization of additional layers of functional information, such as how alternative splicing can alter protein structural domains or enzymatic binding sites (99). |
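
The junction-chain collapsing step described in Table 2 can be sketched as follows. Each aligned read is reduced to the ordered list of introns it spans, and reads with identical junction chains are counted as the same candidate isoform. The read coordinates and function names are hypothetical, and production pipelines additionally tolerate small alignment errors at junctions and apply filters for artifacts.

```python
from collections import Counter


def junction_chain(exons):
    """Return the ordered intron (junction) coordinates implied by exon blocks."""
    exons = sorted(exons)
    return tuple((left_end, right_start)
                 for (_, left_end), (right_start, _) in zip(exons, exons[1:]))


def collapse_reads(reads):
    """Count reads per distinct junction chain; unspliced reads are skipped."""
    counts = Counter()
    for exons in reads:
        chain = junction_chain(exons)
        if chain:
            counts[chain] += 1
    return counts


# Hypothetical aligned reads, each a list of (exon start, exon end) blocks.
reads = [
    [(10, 100), (200, 300), (400, 520)],
    [(25, 100), (200, 300), (400, 500)],  # same junctions, different read ends
    [(10, 100), (400, 510)],              # middle exon skipped
    [(210, 300), (400, 505)],             # shorter 5' end: truncation or true start site?
]
isoform_counts = collapse_reads(reads)
total = sum(isoform_counts.values())
for chain, n in isoform_counts.most_common():
    print(chain, n, round(n / total, 2))
```

The last read illustrates the 5′ ambiguity noted in the table: its junction chain is an exact subset of the first isoform's chain, so exact-match collapsing reports it as a separate isoform even though it may be a truncated copy of the longer transcript.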
Using long-read RNA-seq to understand how genetic variants affect protein isoform functions through splicing
While the major functional consequence of eQTLs is to change RNA and protein expression levels, sQTLs can alter both the expression and sequence content of the resulting proteins. Using long-read data, one can predict the full-length encoded protein isoform product associated with an sQTL (92,93), bridging the gap between genetic variants and their functional consequences on proteins. Though protein QTLs may be measured through aptamer or mass spectrometry-based approaches, these modalities do not provide isoform-level protein quantification (94–96).
Knowledge of long-read-predicted protein isoforms opens up new possibilities for interpreting and prioritizing the function of disease-associated sQTLs (97,98). Bioinformatic and experimental approaches can be used to relate protein isoform changes to protein functional changes. For example, isoforms associated with sQTLs can be bioinformatically analyzed to determine how splicing changes lead to disruption or modulation of protein functional features, such as structural domains (99–101) or other protein functional features (102,103). Other approaches leverage isoform-specific expression correlation to derive isoform-specific networks (104,105) or to propagate gene-level annotations to the most likely functional isoform (106–108). Knowledge of the predicted protein isoforms can also be a valuable guide to design experiments for functional validation (109), which can include high-throughput phenotypic screening of isoforms (110,111) and isoform-specific assays such as protein–protein interaction profiling (112–114).
At the heart of understanding the molecular basis of sQTL associations is the need to quantify the functional differences between the alternative protein isoforms associated with sQTL genetic variants (i.e. the genetic ‘risk’ isoform). There are two possibilities here: (1) the alternative isoform has reduced stability or molecular activity, relative to the wild-type isoform, or (2) the alternative isoform is capable of a different set of molecular activities, relative to the wild-type isoform. Once the pairwise isoform functional effects are defined, one should then consider the cumulative protein functional capacity of the gene, which is directly computable from the collective quantities and functional activities of the individual protein isoforms. This cumulative gene-level protein output could make conceptualization of sQTL effects more tractable in systems-scale analysis. A full description of various eQTL and sQTL relationships to protein consequences may be found in Figure 3.
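One simple way to formalize this cumulative capacity, offered here only as an illustrative assumption, is an abundance-weighted sum over the protein isoforms of a gene. The symbols below are introduced for this sketch: $q_i$ is the abundance of isoform $i$, $a_i$ is its activity for a given molecular function relative to the wild-type isoform, and $n$ is the number of expressed isoforms.

```latex
F_g = \sum_{i=1}^{n} q_i \, a_i
```

Under this additive view, an eQTL scales every $q_i$ without changing the mix of isoforms, whereas an sQTL shifts abundance between isoforms with different $a_i$. For example, if a risk allele moves half of a gene's output from a fully active isoform ($a = 1$) to an inactive one ($a = 0$), $F_g$ halves even though total expression is unchanged. Isoforms with distinct activities, as in possibility (2), would require a separate $a_i$ for each function considered.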
Figure 3.
Protein molecular consequences of eQTLs and sQTLs.
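The cumulative protein functional capacity mentioned above can be made concrete with a small worked example. The sketch below assumes the simplest case, possibility (1), in which each isoform has a single scalar activity relative to the wild-type isoform, so that gene-level capacity is the abundance-weighted sum of isoform activities. The function name and all abundance and activity values are hypothetical; possibility (2), where isoforms differ in the set of activities they perform, would require one such sum per activity.

```python
def gene_functional_capacity(isoforms):
    """Sum of (abundance x relative activity) over all protein isoforms of a gene.

    `isoforms` maps an isoform label to a tuple (abundance, relative_activity),
    where relative_activity is scaled so that the wild-type isoform equals 1.0.
    """
    return sum(abundance * activity for abundance, activity in isoforms.values())


# Hypothetical gene in which the sQTL shifts isoform usage toward a
# less active alternative isoform without changing total abundance.
reference_genotype = {"wild_type": (90.0, 1.0), "alternative": (10.0, 0.2)}
risk_genotype = {"wild_type": (50.0, 1.0), "alternative": (50.0, 0.2)}

capacity_reference = gene_functional_capacity(reference_genotype)
capacity_risk = gene_functional_capacity(risk_genotype)

print(f"reference genotype capacity: {capacity_reference:.1f}")
print(f"risk genotype capacity:      {capacity_risk:.1f}")
print(f"ratio (risk / reference):    {capacity_risk / capacity_reference:.2f}")
```

In this toy case total protein abundance is identical between genotypes, so a gene-level eQTL analysis would see no difference, whereas the capacity summary drops because usage shifts toward the less active isoform.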
Future directions for applications of long-read sequencing to human genetics
As long-read technologies rapidly improve in throughput and cost, their use in population genetics studies, either alone or in an integrative strategy, will likely increase. In the near term, integrative strategies that combine long-read and short-read data will likely remain popular. One new strategy is to leverage resources such as GTEx or sQTL Compendia (115), which provide candidate sQTLs in normal tissues, and to determine how such sQTLs map onto full-length isoforms expressed in representative disease models (cell line or tissue). This integrative in silico/in vitro approach could allow for inference of the effects of sQTLs that are detected in an independent population or meta-analysis (in which there is sufficient sample size and power), but are placed within the isoform-relevant context of the disease model.
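The mapping step in this integrative strategy can be sketched as a junction lookup: each candidate sQTL is represented by the intron (splice junction) it is associated with, and each long-read isoform by its set of introns. The minimal example below assumes that both data sets use the same genome build and coordinate convention and that matching is exact; the function names, variant labels, isoform identifiers and coordinates are all hypothetical.

```python
from collections import defaultdict


def isoforms_by_junction(isoform_introns):
    """Index isoform IDs by each intron (chrom, strand, start, end) they contain."""
    index = defaultdict(set)
    for isoform_id, introns in isoform_introns.items():
        for intron in introns:
            index[intron].add(isoform_id)
    return index


def map_sqtls_to_isoforms(sqtl_junctions, isoform_introns):
    """Return, for each sQTL, the sorted long-read isoforms containing its junction."""
    index = isoforms_by_junction(isoform_introns)
    return {
        variant: sorted(index.get(junction, set()))
        for variant, junction in sqtl_junctions.items()
    }


# Hypothetical candidate sQTLs (e.g. from a short-read resource), keyed by variant.
sqtl_junctions = {
    "variantA": ("chr1", "+", 1000, 2000),
    "variantB": ("chr1", "+", 1000, 3000),
    "variantC": ("chr2", "-", 500, 900),
}

# Hypothetical full-length isoforms from long-read data in a disease model.
isoform_introns = {
    "iso1": [("chr1", "+", 1000, 2000), ("chr1", "+", 2100, 3000)],
    "iso2": [("chr1", "+", 1000, 3000)],
    "iso3": [("chr1", "+", 1000, 2000)],
}

mapping = map_sqtls_to_isoforms(sqtl_junctions, isoform_introns)
for variant, isoforms in mapping.items():
    print(variant, isoforms if isoforms else "no supporting isoform")
```

An sQTL junction with no supporting isoform (here `variantC`) may reflect a splicing event that is not expressed in the chosen disease model, or insufficient long-read depth.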
Currently, short-read sequencing is approximately 10-fold higher in throughput at the same cost, but due to developments such as MAS-Iso-Seq (PacBio) and the PromethION system (ONT), long-read data may become comparable in throughput and cost in the next 5 years. Long-read pipelines and long-read/short-read integrative strategies are maturing, with recent consortia such as the Long Read Genome Annotation Consortium conducting comprehensive comparisons across model organisms, library preparations, platforms and bioinformatic pipelines. Strategies to integrate long-read RNA-seq with proteomics are also emerging, pointing to the potential to obtain pQTL data that are protein isoform-resolved (92). Long-read RNA-seq at the single-cell level could also open the door toward isoform-resolved sQTLs that are specific to certain cellular contexts (59,75,116).
Knowledge of the repertoire of expressed protein isoforms could also guide drug development. Drugs whose protein targets have underlying genetic evidence are twice as likely to succeed in clinical trials (117,118). Indeed, for sQTLs, the link between the variant and the target gene may be more obvious, and could enable new opportunities for targeted therapies. For example, aberrant isoform expression may call for small-molecule or antisense oligonucleotide (ASO) blocking therapies (119), whereas gain-of-function isoforms that mediate harmful interactions may call for protein–protein interaction inhibitors (120).
While much has been learned about genetic effects on splicing in recent years, the future is even more promising because the mechanisms of splicing, so critical for producing protein complexity in humans, are now ripe for high-throughput characterization through long-read RNA-seq and its integration with GWAS and evolving methods of protein functional characterization.
Abbreviations
- AS: alternative splicing
- ASTS: allele-specific transcript structure
- GTEx: Genotype-Tissue Expression
- GWAS: genome-wide association studies
- ONT: Oxford Nanopore
- PacBio: Pacific Biosciences
- PSI: percent spliced in
- sQTL: splicing quantitative trait loci
- trQTL: transcript ratio QTL
- TWAS: transcriptome-wide association studies
- UTR: untranslated regions
Acknowledgements
We would like to thank Drs. Jack Humphrey and Elizabeth Tseng for feedback on the review.
Contributor Information
Peter J Castaldi, Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA; Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA.
Abdullah Abood, Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA; Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA.
Charles R Farber, Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA; Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA; Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA.
Gloria M Sheynkman, Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA; Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA; Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA; UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22903, USA.
Conflict of Interest statement
The authors do not have a conflict of interest.
Funding
This work was funded by the National Institutes of Health through the following grants: R01HL124233 and R01HL147326 to P.J.C., 5R01AR077992 and 5R01AR071657 to C.R.F., R01LM014017 to G.M.S. and T32LM012416 training grant award to A.A.
References
- 1. Li, Y.I., van de Geijn, B., Raj, A., Knowles, D.A., Petti, A.A., Golan, D., Gilad, Y. and Pritchard, J.K. (2016) RNA splicing is a primary link between genetic variation and disease. Science, 352, 600–604.
- 2. Barbeira, A.N., Bonazzola, R., Gamazon, E.R., Liang, Y., Park, Y., Kim-Hellmuth, S., Wang, G., Jiang, Z., Zhou, D., Hormozdiari, F. et al. (2021) Exploiting the GTEx resources to decipher the mechanisms at GWAS loci. Genome Biol., 22, 49.
- 3. GTEx Consortium (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.
- 4. Garrido-Martín, D., Borsari, B., Calvo, M., Reverter, F. and Guigó, R. (2021) Identification and analysis of splicing quantitative trait loci across multiple tissues in the human genome. Nat. Commun., 12, 1–16.
- 5. Loos, R.J.F. (2020) 15 years of genome-wide association studies and no signs of slowing down. Nat. Commun., 11, 5900.
- 6. Nicolae, D.L., Gamazon, E., Zhang, W., Duan, S., Dolan, M.E. and Cox, N.J. (2010) Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet., 6, e1000888.
- 7. Hukku, A., Pividori, M., Luca, F., Pique-Regi, R., Im, H.K. and Wen, X. (2021) Probabilistic colocalization of genetic variants from complex and molecular traits: promise and limitations. Am. J. Hum. Genet., 108, 25–35.
- 8. Gamazon, E.R. and Stranger, B.E. (2014) Genomics of alternative splicing: evolution, development and pathophysiology. Hum. Genet., 133, 679–687.
- 9. Lu, Z.-X., Jiang, P. and Xing, Y. (2012) Genetic variation of pre-mRNA alternative splicing in human populations. Wiley Interdiscip. Rev. RNA, 3, 581–592.
- 10. Park, E., Pan, Z., Zhang, Z., Lin, L. and Xing, Y. (2018) The Expanding Landscape of Alternative Splicing Variation in Human Populations. Am. J. Hum. Genet., 102, 11–26.
- 11. Early, P., Rogers, J., Davis, M., Calame, K., Bond, M., Wall, R. and Hood, L. (1980) Two mRNAs can be produced from a single immunoglobulin mu gene by alternative RNA processing pathways. Cell, 20, 313–319.
- 12. Choi, E., Kuehl, M. and Wall, R. (1980) RNA splicing generates a variant light chain from an aberrantly rearranged kappa gene. Nature, 286, 776–779.
- 13. Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P. and Burge, C.B. (2008) Alternative isoform regulation in human tissue transcriptomes. Nature, 456, 470–476.
- 14. Berget, S.M., Moore, C. and Sharp, P.A. (1977) Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc. Natl. Acad. Sci. U. S. A., 74, 3171–3175.
- 15. Nilsen, T.W. and Graveley, B.R. (2010) Expansion of the eukaryotic proteome by alternative splicing. Nature, 463, 457–463.
- 16. Kalsotra, A. and Cooper, T.A. (2011) Functional consequences of developmentally regulated alternative splicing. Nat. Rev. Genet., 12, 715–729.
- 17. Barash, Y., Calarco, J.A., Gao, W., Pan, Q., Wang, X., Shai, O., Blencowe, B.J. and Frey, B.J. (2010) Deciphering the splicing code. Nature, 465, 53–59.
- 18. Scotti, M.M. and Swanson, M.S. (2016) RNA mis-splicing in disease. Nat. Rev. Genet., 17, 19–32.
- 19. Cooper, T.A., Wan, L. and Dreyfuss, G. (2009) RNA and disease. Cell, 136, 777–793.
- 20. Cieply, B. and Carstens, R.P. (2015) Functional roles of alternative splicing factors in human disease. Wiley Interdiscip. Rev. RNA, 6, 311–326.
- 21. Ferraro, N.M., Strober, B.J., Einson, J., Abell, N.S., Aguet, F., Barbeira, A.N., Brandt, M., Bucan, M., Castel, S.E., Davis, J.R. et al. (2020) Transcriptomic signatures across human tissues identify functional rare genetic variation. Science, 369, eaaz5900.
- 22. Julien, P., Miñana, B., Baeza-Centurion, P., Valcárcel, J. and Lehner, B. (2016) The complete local genotype–phenotype landscape for the alternative splicing of a human exon. Nat. Commun., 7, 1–8.
- 23. Ke, S., Anquetil, V., Zamalloa, J.R., Maity, A., Yang, A., Arias, M.A., Kalachikov, S., Russo, J.J., Ju, J. and Chasin, L.A. (2018) Saturation mutagenesis reveals manifold determinants of exon definition. Genome Res., 28, 11–24.
- 24. Coulombe-Huntington, J., Lam, K.C.L., Dias, C. and Majewski, J. (2009) Fine-scale variation and genetic determinants of alternative splicing across individuals. PLoS Genet., 5, e1000766.
- 25. Zhang, X., Joehanes, R., Chen, B.H., Huan, T., Ying, S., Munson, P.J., Johnson, A.D., Levy, D. and O’Donnell, C.J. (2015) Identification of common genetic variants controlling transcript isoform variation in human whole blood. Nat. Genet., 47, 345–352.
- 26. Kwan, T., Benovoy, D., Dias, C., Gurd, S., Provencher, C., Beaulieu, P., Hudson, T.J., Sladek, R. and Majewski, J. (2008) Genome-wide analysis of transcript isoform variation in humans. Nat. Genet., 40, 225–231.
- 27. Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., McPherson, A., Szcześniak, M.W., Gaffney, D.J., Elo, L.L., Zhang, X. et al. (2016) A survey of best practices for RNA-seq data analysis. Genome Biol., 17, 13.
- 28. Li, Y.I., Knowles, D.A., Humphrey, J., Barbeira, A.N., Dickinson, S.P., Im, H.K. and Pritchard, J.K. (2018) Annotation-free quantification of RNA splicing using LeafCutter. Nat. Genet., 50, 151–158.
- 29. Shabalin, A.A. (2012) Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics, 28, 1353–1358.
- 30. Ongen, H., Buil, A., Brown, A.A., Dermitzakis, E.T. and Delaneau, O. (2016) Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics, 32, 1479–1485.
- 31. Taylor-Weiner, A., Aguet, F., Haradhvala, N.J., Gosai, S., Anand, S., Kim, J., Ardlie, K., Van Allen, E.M. and Getz, G. (2019) Scaling computational genomics to millions of individuals with GPUs. Genome Biol., 20, 228.
- 32. Kang, H.M., Sul, J.H., Service, S.K., Zaitlen, N.A., Kong, S.-Y., Freimer, N.B., Sabatti, C. and Eskin, E. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat. Genet., 42, 348–354.
- 33. Gusev, A., Lawrenson, K., Lin, X., Lyra, P.C., Jr., Kar, S., Vavra, K.C., Segato, F., Fonseca, M.A.S., Lee, J.M., Pejovic, T. et al. (2019) A transcriptome-wide association study of high-grade serous epithelial ovarian cancer identifies new susceptibility genes and splice variants. Nat. Genet., 51, 815–823.
- 34. Walker, R.L., Ramaswami, G., Hartl, C., Mancuso, N., Gandal, M.J., de la Torre-Ubieta, L., Pasaniuc, B., Stein, J.L. and Geschwind, D.H. (2019) Genetic Control of Expression and Splicing in Developing Human Brain Informs Disease Mechanisms. Cell, 179, 750–771.e22.
- 35. Katz, Y., Wang, E.T., Airoldi, E.M. and Burge, C.B. (2010) Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods, 7, 1009–1015.
- 36. Trincado, J.L., Entizne, J.C., Hysenaj, G., Singh, B., Skalic, M., Elliott, D.J. and Eyras, E. (2018) SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol., 19, 40.
- 37. Sebestyén, E., Zawisza, M. and Eyras, E. (2015) Detection of recurrent alternative splicing switches in tumor samples reveals novel signatures of cancer. Nucleic Acids Res., 43, 1345–1356.
- 38. Mehmood, A., Laiho, A., Venäläinen, M.S., McGlinchey, A.J., Wang, N. and Elo, L.L. (2020) Systematic evaluation of differential splicing tools for RNA-seq studies. Brief. Bioinform., 21, 2052–2065.
- 39. Shen, S., Park, J.W., Lu, Z.X., Lin, L., Henry, M.D., Wu, Y.N., Zhou, Q. and Xing, Y. (2014) rMATS: Robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc. Natl. Acad. Sci. U. S. A., 111, E5593–601.
- 40. Anvar, S.Y., Allard, G., Tseng, E., Sheynkman, G.M., de Klerk, E., Vermaat, M., Yin, R.H., Johansson, H.E., Ariyurek, Y., den Dunnen, J.T., Turner, S.W. and t Hoen, P.A.C. (2018) Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing. Genome Biol., 19, 46.
- 41. Tilgner, H., Jahanbani, F., Blauwkamp, T., Moshrefi, A., Jaeger, E., Chen, F., Harel, I., Bustamante, C.D., Rasmussen, M. and Snyder, M.P. (2015) Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events. Nat. Biotechnol., 33, 736–742.
- 42. Alasoo, K., Rodrigues, J., Danesh, J., Freitag, D.F., Paul, D.S. and Gaffney, D.J. (2019) Genetic effects on promoter usage are highly context-specific and contribute to complex traits. elife, 8, e41673.
- 43. Bray, N.L., Pimentel, H., Melsted, P. and Pachter, L. (2016) Erratum: Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol., 34, 888.
- 44. Li, B. and Dewey, C.N. (2011) RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics, 12, 323.
- 45. Patro, R., Duggal, G., Love, M.I., Irizarry, R.A. and Kingsford, C. (2017) Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods, 14, 417–419.
- 46. Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.-C., Mendell, J.T. and Salzberg, S.L. (2015) StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol., 33, 290–295.
- 47. Lappalainen, T., Sammeth, M., Friedländer, M.R., t Hoen, P.A., Monlong, J., Rivas, M.A., Gonzàlez-Porta, M., Kurbatova, N., Griebel, T., Ferreira, P.G. et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501, 506–511.
- 48. Ye, C.J., Chen, J., Villani, A.-C., Gate, R.E., Subramaniam, M., Bhangale, T., Lee, M.N., Raj, T., Raychowdhury, R., Li, W. et al. (2018) Genetic analysis of isoform usage in the human anti-viral response reveals influenza-specific regulation of ERAP2 transcripts under balancing selection. Genome Res., 28, 1812–1825.
- 49. Nowicka, M. and Robinson, M.D. (2016) DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Res, 5, 1356.
- 50. Yang, J., Qi, T., Wu, Y., Zhang, F. and Zeng, J. (2021) Genetic control of RNA splicing and its distinctive role in complex trait variation. Research Square.
- 51. Steijger, T., Abril, J.F., Engström, P.G., Kokocinski, F., Consortium, R.G.A.S.P., Hubbard, T.J., Guigó, R., Harrow, J. and Bertone, P. (2013) Assessment of transcript reconstruction methods for RNA-seq. Nat. Methods, 10, 1177–1184.
- 52. Kanitz, A., Gypas, F., Gruber, A.J., Gruber, A.R., Martin, G. and Zavolan, M. (2015) Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data. Genome Biol., 16, 150.
- 53. Zhang, C., Zhang, B., Lin, L.-L. and Zhao, S. (2017) Evaluation and comparison of computational tools for RNA-seq isoform quantification. BMC Genomics, 18, 583.
- 54. Teng, M., Love, M.I., Davis, C.A., Djebali, S., Dobin, A., Graveley, B.R., Li, S., Mason, C.E., Olson, S., Pervouchine, D. et al. (2016) A benchmark for RNA-seq quantification pipelines. Genome Biol., 17, 74.
- 55. Amarasinghe, S.L., Su, S., Dong, X., Zappia, L., Ritchie, M.E. and Gouil, Q. (2020) Opportunities and challenges in long-read sequencing data analysis. Genome Biol., 21, 30.
- 56. Logsdon, G.A., Vollger, M.R. and Eichler, E.E. (2020) Long-read human genome sequencing and its applications. Nat. Rev. Genet., 21, 597–614.
- 57. Mantere, T., Kersten, S. and Hoischen, A. (2019) Long-Read Sequencing Emerging in Medical Genetics. Front. Genet., 10, 426.
- 58. Stark, R., Grzelak, M. and Hadfield, J. (2019) RNA sequencing: the teenage years. Nat. Rev. Genet., 20, 631–656.
- 59. Volden, R., Palmer, T., Byrne, A., Cole, C., Schmitz, R.J., Green, R.E. and Vollmers, C. (2018) Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA. Proc. Natl. Acad. Sci., 115, 9726–9731.
- 60. Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., Peluso, P., Rank, D., Baybayan, P., Bettman, B. et al. (2009) Real-time DNA sequencing from single polymerase molecules. Science, 323, 133–138.
- 61. Liu, S., Wu, I., Yu, Y.-P., Balamotis, M., Ren, B., Ben Yehezkel, T. and Luo, J.-H. (2021) Targeted transcriptome analysis using synthetic long read sequencing uncovers isoform reprograming in the progression of colon cancer. Commun Biol, 4, 506.
- 62. Sharon, D., Tilgner, H., Grubert, F. and Snyder, M. (2013) A single-molecule long-read survey of the human transcriptome. Nat. Biotechnol., 31, 1009–1014.
- 63. Workman, R.E., Tang, A.D., Tang, P.S., Jain, M., Tyson, J.R., Razaghi, R., Zuzarte, P.C., Gilpatrick, T., Payne, A., Quick, J. et al. (2019) Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat. Methods, 16, 1297–1305.
- 64. Mudge, J.M. and Harrow, J. (2016) The state of play in higher eukaryote gene annotation. Nat. Rev. Genet., 17, 758–772.
- 65. Deveson, I.W., Brunck, M.E., Blackburn, J., Tseng, E., Hon, T., Clark, T.A., Clark, M.B., Crawford, J., Dinger, M.E., Nielsen, L.K. et al. (2018) Universal Alternative Splicing of Noncoding Exons. Cell Syst, 6, 245–255.e5.
- 66. Sheynkman, G.M., Tuttle, K.S., Laval, F., Tseng, E., Underwood, J.G., Yu, L., Dong, D., Smith, M.L., Sebra, R., Willems, L. et al. (2020) ORF Capture-Seq as a versatile method for targeted identification of full-length isoforms. Nat. Commun., 11, 2326.
- 67. Amoah, K., Hsiao, Y.-H.E., Bahn, J.H., Sun, Y., Burghard, C., Tan, B.X., Yang, E.-W. and Xiao, X. (2021) Allele-specific alternative splicing and its functional genetic variants in human tissues. Genome Res., 31, 359–371.
- 68. Li, G., Bahn, J.H., Lee, J.-H., Peng, G., Chen, Z., Nelson, S.F. and Xiao, X. (2012) Identification of allele-specific alternative mRNA processing via transcriptome sequencing. Nucleic Acids Res., 40, e104.
- 69. Tilgner, H., Grubert, F., Sharon, D. and Snyder, M.P. (2014) Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc. Natl. Acad. Sci. U. S. A., 111, 9869–9874.
- 70. Glinos, D.A., Garborcauskas, G., Hoffman, P., Ehsan, N., Jiang, L., Gokden, A., Dai, X., Aguet, F., Brown, K.L., Garimella, K. et al. (2021) Transcriptome variation in human tissues revealed by long-read sequencing. Nature, 608, 353–359.
- 71. Deonovic, B., Wang, Y., Weirather, J., Wang, X.-J. and Au, K.F. (2017) IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing. Nucleic Acids Res., 45, e32.
- 72. de Souza, V.B.C., Jordan, B.T., Tseng, E., Nelson, E.A., Hirschi, K.K., Sheynkman, G. and Robinson, M.D. (2022) Transformation of alignment files improves performance of variant callers for long-read RNA sequencing data. bioRxiv, 2022.02.08.479579.
- 73. Saferali, A., Xu, Z., Sheynkman, G.M., Hersh, C.P., Cho, M.H., Silverman, E.K., Laederach, A., Vollmers, C. and Castaldi, P.J. (2020) Characterization of a COPD-Associated NPNT Functional Splicing Genetic Variant in Human Lung Tissue via Long-Read Sequencing. medRxiv.
- 74. Wyman, D., Balderrama-Gutierrez, G., Reese, F., Jiang, S., Rahmanian, S., Forner, S., Matheos, D., Zeng, W., Williams, B., Trout, D. et al. (2020) A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification. bioRxiv, 672931.
- 75. Al’Khafaji, A.M., Smith, J.T., Garimella, K.V., Babadi, M., Sade-Feldman, M., Gatzen, M., Sarkizova, S., Schwartz, M.A., Popic, V., Blaum, E.M. et al. (2021) High-throughput RNA isoform sequencing using programmable cDNA concatenation. bioRxiv, https://www.biorxiv.org/content/10.1101/2021.10.01.462818v1.ful.
- 76. Volden, R. and Vollmers, C. (2022) Single-cell isoform analysis in human immune cells. Genome Biol., 23, 47.
- 77. Schliekelman, P. (2008) Statistical power of expression quantitative trait loci for mapping of complex trait loci in natural populations. Genetics, 178, 2201–2216.
- 78. Clark, M.B., Wrzesinski, T., Garcia, A.B., Hall, N.A.L., Kleinman, J.E., Hyde, T., Weinberger, D.R., Harrison, P.J., Haerty, W. and Tunbridge, E.M. (2020) Long-read sequencing reveals the complex splicing profile of the psychiatric risk gene CACNA1C in human brain. Mol. Psychiatry, 25, 37–47.
- 79. Dainis, A., Tseng, E., Clark, T.A., Hon, T., Wheeler, M. and Ashley, E. (2019) Targeted Long-Read RNA Sequencing Demonstrates Transcriptional Diversity Driven by Splice-Site Variation in MYBPC3. Circ Genom Precis Med., 12, e002464.
- 80. Aneichyk, T., Hendriks, W.T., Yadav, R., Shin, D., Gao, D., Vaine, C.A., Collins, R.L., Domingo, A., Currall, B., Stortchevoi, A. et al. (2018) Dissecting the Causal Mechanism of X-Linked Dystonia-Parkinsonism by Integrating Genome and Transcriptome Assembly. Cell, 172, 897–909.e21.
- 81. Flaherty, E., Zhu, S., Barretto, N., Cheng, E., Deans, P.J.M., Fernando, M.B., Schrode, N., Francoeur, N., Antoine, A., Alganem, K. et al. (2019) Neuronal impact of patient-specific aberrant NRXN1α splicing. Nat. Genet., 51, 1679–1690.
- 82. Kohli, M., Ho, Y., Hillman, D.W., Van Etten, J.L., Henzler, C., Yang, R., Sperger, J.M., Li, Y., Tseng, E., Hon, T. et al. (2017) Androgen Receptor Variant AR-V9 Is Coexpressed with AR-V7 in Prostate Cancer Metastases and Predicts Abiraterone Resistance. Clin. Cancer Res., 23, 4704–4715.
- 83. Tian, L., Shao, Y., Nance, S., Dang, J., Xu, B., Ma, X., Li, Y., Ju, B., Dong, L., Newman, S. et al. (2019) Long-read sequencing unveils IGH-DUX4 translocation into the silenced IGH allele in B-cell acute lymphoblastic leukemia. Nat. Commun., 10, 2789.
- 84. Hu, Y., Fang, L., Chen, X., Zhong, J.F., Li, M. and Wang, K. (2021) LIQA: long-read isoform quantification and analysis. Genome Biol., 22, 182.
- 85. van Dijk, E.L., Jaszczyszyn, Y., Naquin, D. and Thermes, C. (2018) The Third Revolution in Sequencing Technology. Trends Genet., 34, 666–681.
- 86. Byrne, A., Cole, C., Volden, R. and Vollmers, C. (2019) Realizing the potential of full-length transcriptome sequencing. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci., 374, 20190097.
- 87. Ardui, S., Ameur, A., Vermeesch, J.R. and Hestand, M.S. (2018) Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics. Nucleic Acids Res., 46, 2159–2168.
- 88. Sedlazeck, F.J., Lee, H., Darby, C.A. and Schatz, M.C. (2018) Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat. Rev. Genet., 19, 329–346.
- 89. Hardwick, S.A., Joglekar, A., Flicek, P., Frankish, A. and Tilgner, H.U. (2019) Getting the Entire Message: Progress in Isoform Sequencing. Front. Genet., 10, 709.
- 90. Wang, B., Kumar, V., Olson, A. and Ware, D. (2019) Reviving the Transcriptome Studies: An Insight Into the Emergence of Single-Molecule Transcriptome Sequencing. Front. Genet., 10, 384.
- 91. Oikonomopoulos, S., Bayega, A., Fahiminiya, S., Djambazian, H., Berube, P. and Ragoussis, J. (2020) Methodologies for Transcript Profiling Using Long-Read Technologies. Front. Genet., 11, 606.
- 92. Miller, R.M., Jordan, B.T., Mehlferber, M.M., Jeffery, E.D., Chatzipantsiou, C., Kaur, S., Millikin, R.J., Dai, Y., Tiberi, S., Castaldi, P.J. et al. (2022) Enhanced protein isoform characterization through long-read proteogenomics. Genome Biol., 23, 1–28.
- 93. Deslattes Mays, A., Schmidt, M., Graham, G., Tseng, E., Baybayan, P., Sebra, R., Sanda, M., Mazarati, J.-B., Riegel, A. and Wellstein, A. (2019) Single-Molecule Real-Time (SMRT) Full-Length RNA-Sequencing Reveals Novel and Distinct mRNA Isoforms in Human Bone Marrow Cell Subpopulations. Genes, 10, 253.
- 94. Pietzner, M., Wheeler, E., Carrasco-Zanini, J., Cortes, A., Koprulu, M., Wörheide, M.A., Oerton, E., Cook, J., Stewart, I.D., Kerrison, N.D. et al. (2021) Mapping the proteo-genomic convergence of human diseases. Science, 374, eabj1541.
- 95. Chick, J.M., Munger, S.C., Simecek, P., Huttlin, E.L., Choi, K., Gatti, D.M., Raghupathy, N., Svenson, K.L., Churchill, G.A. and Gygi, S.P. (2016) Defining the consequences of genetic variation on a proteome-wide scale. Nature, 534, 500–505.
- 96. Wu, L., Candille, S.I., Choi, Y., Xie, D., Jiang, L., Li-Pook-Than, J., Tang, H. and Snyder, M. (2013) Variation and genetic control of protein abundance in humans. Nature, 499, 79–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 97. Reixachs-Solé, M. and Eyras, E. (2022) Uncovering the impacts of alternative splicing on the proteome with current omics techniques. Wiley Interdiscip. Rev. RNA, e1707.
- 98. Li, H.-D., Menon, R., Omenn, G.S. and Guan, Y. (2014) The emerging era of genomic data integration for analyzing splice isoform function. Trends Genet., 30, 340–347.
- 99. de la Fuente, L., Arzalluz-Luque, Á., Tardáguila, M., Del Risco, H., Martí, C., Tarazona, S., Salguero, P., Scott, R., Lerma, A., Alastrue-Agudo, A. et al. (2020) tappAS: a comprehensive computational framework for the analysis of the functional impact of differential splicing. Genome Biol., 21, 119.
- 100. Tapial, J., Ha, K.C.H., Sterne-Weiler, T., Gohr, A., Braunschweig, U., Hermoso-Pulido, A., Quesnel-Vallières, M., Permanyer, J., Sodaei, R., Marquez, Y. et al. (2017) An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms. Genome Res., 27, 1759–1768.
- 101. Tranchevent, L.-C., Aubé, F., Dulaurier, L., Benoit-Pilven, C., Rey, A., Poret, A., Chautard, E., Mortada, H., Desmet, F.-O., Chakrama, F.Z. et al. (2017) Identification of protein features encoded by alternative exons using Exon Ontology. Genome Res., 27, 1087–1097.
- 102. Ghadie, M.A., Lambourne, L., Vidal, M. and Xia, Y. (2017) Domain-based prediction of the human isoform interactome provides insights into the functional impact of alternative splicing. PLoS Comput. Biol., 13, e1005717.
- 103. Narykov, O., Johnson, N.T. and Korkin, D. (2021) Predicting protein interaction network perturbation by alternative splicing with semi-supervised learning. Cell Rep., 37, 110045.
- 104. Iancu, O.D., Colville, A., Oberbeck, D., Darakjian, P., McWeeney, S.K. and Hitzemann, R. (2015) Cosplicing network analysis of mammalian brain RNA-Seq data utilizing WGCNA and Mantel correlations. Front. Genet., 6, 174.
- 105. Saha, A., Kim, Y., Gewirtz, A.D.H., Jo, B., Gao, C., McDowell, I.C., GTEx Consortium, Engelhardt, B.E. and Battle, A. (2017) Co-expression networks reveal the tissue-specific regulation of transcription and splicing. Genome Res., 27, 1843–1858.
- 106. Eksi, R., Li, H.-D., Menon, R., Wen, Y., Omenn, G.S., Kretzler, M. and Guan, Y. (2013) Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data. PLoS Comput. Biol., 9, e1003314.
- 107. Li, W., Kang, S., Liu, C.-C., Zhang, S., Shi, Y., Liu, Y. and Zhou, X.J. (2014) High-resolution functional annotation of human transcriptome: predicting isoform functions by a novel multiple instance-based label propagation method. Nucleic Acids Res., 42, e39.
- 108. Li, W., Liu, C.-C., Kang, S., Li, J.-R., Tseng, Y.-T. and Zhou, X.J. (2016) Pushing the annotation of cellular activities to a higher resolution: Predicting functions at the isoform level. Methods, 93, 110–118.
- 109. Möröy, T. and Heyd, F. (2007) The impact of alternative splicing in vivo: mouse models show the way. RNA, 13, 1155–1171.
- 110. Bertomeu, T., Coulombe-Huntington, J., Chatr-Aryamontri, A., Bourdages, K.G., Coyaud, E., Raught, B., Xia, Y. and Tyers, M. (2018) A High-Resolution Genome-Wide CRISPR/Cas9 Viability Screen Reveals Structural Features and Contextual Diversity of the Human Cell-Essential Proteome. Mol. Cell. Biol., 38, e00302–17.
- 111. Prinos, P., Garneau, D., Lucier, J.-F., Gendron, D., Couture, S., Boivin, M., Brosseau, J.-P., Lapointe, E., Thibault, P., Durand, M. et al. (2011) Alternative splicing of SYK regulates mitosis and cell survival. Nat. Struct. Mol. Biol., 18, 673–679.
- 112. Buljan, M., Chalancon, G., Eustermann, S., Wagner, G.P., Fuxreiter, M., Bateman, A. and Babu, M.M. (2012) Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks. Mol. Cell, 46, 871–883.
- 113. Ellis, J.D., Barrios-Rodiles, M., Colak, R., Irimia, M., Kim, T., Calarco, J.A., Wang, X., Pan, Q., O’Hanlon, D., Kim, P.M. et al. (2012) Tissue-specific alternative splicing remodels protein-protein interaction networks. Mol. Cell, 46, 884–892.
- 114. Yang, X., Coulombe-Huntington, J., Kang, S., Sheynkman, G.M., Hao, T., Richardson, A., Sun, S., Yang, F., Shen, Y.A., Murray, R.R. et al. (2016) Widespread Expansion of Protein Interaction Capabilities by Alternative Splicing. Cell, 164, 805–817.
- 115. Kerimov, N., Hayhurst, J.D., Peikova, K., Manning, J.R., Walter, P., Kolberg, L., Samoviča, M., Sakthivel, M.P., Kuzmin, I., Trevanion, S.J. et al. (2021) A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat. Genet., 53, 1290–1299.
- 116. Singh, M., Al-Eryani, G., Carswell, S., Ferguson, J.M., Blackburn, J., Barton, K., Roden, D., Luciani, F., Giang Phan, T., Junankar, S. et al. (2019) High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes. Nat. Commun., 10, 3120.
- 117. Nelson, M.R., Tipney, H., Painter, J.L., Shen, J., Nicoletti, P., Shen, Y., Floratos, A., Sham, P.C., Li, M.J., Wang, J. et al. (2015) The support of human genetic evidence for approved drug indications. Nat. Genet., 47, 856–860.
- 118. King, E.A., Wade Davis, J. and Degner, J.F. (2019) Are drug targets with genetic support twice as likely to be approved? Revised estimates of the impact of genetic support for drug mechanisms on the probability of drug approval. PLoS Genet., 15, e1008489.
- 119. Bennett, C.F. and Swayze, E.E. (2010) RNA targeting therapeutics: molecular mechanisms of antisense oligonucleotides as a therapeutic platform. Annu. Rev. Pharmacol. Toxicol., 50, 259–293.
- 120. Petta, I., Lievens, S., Libert, C., Tavernier, J. and De Bosscher, K. (2016) Modulation of protein–protein interactions for the development of novel therapeutics. Mol. Ther., 24, 707–718.
- 121. Rouhana, J.M., Wang, J., Eraslan, G., Anand, S., Hamel, A.R., Cole, B., Regev, A., Aguet, F., Ardlie, K.G. and Segrè, A.V. (2021) ECLIPSER: identifying causal cell types and genes for complex traits through single cell enrichment of e/sQTL-mapped genes in GWAS loci. bioRxiv, https://www.biorxiv.org/content/10.1101/2021.11.24.469720v1.
- 122. Lopes, K.d.P., Snijders, G.J.L., Humphrey, J., Allan, A., Sneeboer, M.A.M., Navarro, E., Schilder, B.M., Vialle, R.A., Parks, M., Missall, R. et al. (2022) Genetic analysis of the human microglial transcriptome across brain regions, aging and disease pathologies. Nat. Genet., 54, 4–17.
- 123. Bellenguez, C., Küçükali, F., Jansen, I.E., Kleineidam, L., Moreno-Grau, S., Amin, N., Naj, A.C., Campos-Martin, R., Grenier-Boley, B., Andrade, V. et al. (2022) New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat. Genet., 54, 412–436.
- 124. Tian, J., Chen, C., Rao, M., Zhang, M., Lu, Z., Cai, Y., Ying, P., Li, B., Wang, H., Wang, L. et al. (2022) Aberrant RNA splicing is a primary link between genetic variation and pancreatic cancer risk. Cancer Res., 82, 2084–2096.
- 125. Yamaguchi, K., Ishigaki, K., Suzuki, A., Tsuchida, Y., Tsuchiya, H., Sumitomo, S., Nagafuchi, Y., Miya, F., Tsunoda, T., Hirofumi, S. et al. (2022) Splicing QTL analysis focusing on coding sequences reveals pathogenicity of disease susceptibility loci. bioRxiv, 2021.12.30.474578.
- 126. Zandi, P.P., Jaffe, A.E., Goes, F.S., Burke, E.E., Collado-Torres, L., Huuki-Myers, L., Seyedian, A., Lin, Y., Seifuddin, F., Pirooznia, M. et al. (2022) Amygdala and anterior cingulate transcriptomes from individuals with bipolar disorder reveal downregulated neuroimmune and synaptic pathways. Nat. Neurosci., 25, 381–389.
- 127. Brotman, S.M., Raulerson, C.K., Vadlamudi, S., Currin, K.W., Shen, Q., Parsons, V.A., Iyengar, A.K., Roman, T.S., Furey, T.S., Kuusisto, J. et al. (2022) Subcutaneous adipose tissue splice quantitative trait loci reveal differences in isoform usage associated with cardiometabolic traits. Am. J. Hum. Genet., 109, 66–80.
- 128. Tian, Y., Soupir, A., Liu, Q., Wu, L., Huang, C.-C., Park, J.Y. and Wang, L. (2022) Novel role of prostate cancer risk variant rs7247241 on PPP1R14A isoform transition through allelic TF binding and CpG methylation. Hum. Mol. Genet., 31, 1610–1621.
- 129. Aherrahrou, R., Lue, D., Noah Perry, R., Aberra, Y.T., Khan, M.D., Soh, J.Y., Örd, T., Singha, P., Gilani, H., Benavente, E.D. et al. (2022) Genetic regulation of human aortic smooth muscle cell gene expression and splicing predict causal coronary artery disease genes. bioRxiv, 2022.01.24.477536.
- 130. Aygün, N., Elwell, A.L., Liang, D., Lafferty, M.J., Cheek, K.E., Courtney, K.P., Mory, J., Hadden-Ford, E., Krupa, O., de la Torre-Ubieta, L. et al. (2021) Brain-trait-associated variants impact cell-type-specific gene regulation during neurogenesis. Am. J. Hum. Genet., 108, 1647–1668.
- 131. Mu, Z., Wei, W., Fair, B., Miao, J., Zhu, P. and Li, Y.I. (2021) The impact of cell type and context-dependent regulatory variants on human immune traits. Genome Biol., 22, 122.
- 132. Chen, B.Y., Bone, W.P., Lorenz, K., Levin, M., Ritchie, M.D. and Voight, B.F. (2021) ColocQuiaL: A QTL-GWAS colocalization pipeline. Bioinformatics, 27, 10.1093/bioinformatics/btac512.
- 133. Stanzick, K.J., Li, Y., Schlosser, P., Gorski, M., Wuttke, M., Thomas, L.F., Rasheed, H., Rowan, B.X., Graham, S.E., Vanderwerff, B.R. et al. (2021) Discovery and prioritization of variants and genes for kidney function in >1.2 million individuals. Nat. Commun., 12, 4350.
- 134. Gao, Y., Chen, S., Gu, W.-Y., Fang, C., Huang, Y.-T., Gao, Y., Lu, Y., Su, J., Wu, M., Zhang, J. et al. (2021) Genome-wide association study reveals novel loci for adult type 1 diabetes in a 5-year nested case-control study. World J. Diabetes, 12, 2073–2086.
- 135. Patro, C.P.K., Nousome, D., Glioma International Case Control Study (GICC) and Lai, R.K. (2021) Meta-Analyses of Splicing and Expression Quantitative Trait Loci Identified Susceptibility Genes of Glioma. Front. Genet., 12, 609657.
- 136. Humphrey, J., Venkatesh, S., Hasan, R., Herb, J.T., de Paiva Lopes, K., Küçükali, F., Byrska-Bishop, M., Evani, U.S., Narzisi, G., Fagegaltier, D. et al. (2021) Integrative genetic analysis of the amyotrophic lateral sclerosis spinal cord implicates glial activation and suggests new risk genes. medRxiv, https://www.medrxiv.org/content/10.1101/2021.08.31.21262682v1.
- 137. Díez-Obrero, V., Dampier, C.H., Moratalla-Navarro, F., Devall, M., Plummer, S.J., Díez-Villanueva, A., Peters, U., Bien, S., Huyghe, J.R., Kundaje, A. et al. (2021) Genetic Effects on Transcriptome Profiles in Colon Epithelium Provide Functional Insights for Genetic Risk Loci. Cell. Mol. Gastroenterol. Hepatol., 12, 181–197.
- 138. Navarro, E., Udine, E., de Paiva Lopes, K., Parks, M., Riboldi, G., Schilder, B.M., Humphrey, J., Snijders, G.J.L., Vialle, R.A., Zhuang, M. et al. (2021) Dysregulation of mitochondrial and proteolysosomal genes in Parkinson’s disease myeloid cells. Nat. Aging, 1, 850–863.
- 139. Akula, N., Marenco, S., Johnson, K., Feng, N., Zhu, K., Schulmann, A., Corona, W., Jiang, X., Cross, J., England, B. et al. (2021) Deep transcriptome sequencing of subgenual anterior cingulate cortex reveals cross-diagnostic and diagnosis-specific RNA expression changes in major psychiatric disorders. Neuropsychopharmacology, 46, 1364–1372.
- 140. Liu, S., Chen, Y., Wang, F., Jiang, Y., Duan, F., Xia, Y., Ning, Z., Li, M., Qiu, W., Ma, C. et al. (2021) Brain transcriptional regulatory architecture and schizophrenia etiology converge between East Asian and European ancestral populations. bioRxiv, https://www.biorxiv.org/content/10.1101/2021.02.04.922880v1.
- 141. Zhang, T., Choi, J., Dilshat, R., Einarsdóttir, B.Ó., Kovacs, M.A., Xu, M., Malasky, M., Chowdhury, S., Jones, K., Bishop, D.T. et al. (2021) Cell-type-specific meQTLs extend melanoma GWAS annotation beyond eQTLs and inform melanocyte gene-regulatory mechanisms. Am. J. Hum. Genet., 108, 1631–1646.
- 142. Yang, H.-S., White, C.C., Klein, H.-U., Yu, L., Gaiteri, C., Ma, Y., Felsky, D., Mostafavi, S., Petyuk, V.A., Sperling, R.A. et al. (2020) Genetics of Gene Expression in the Aging Human Brain Reveal TDP-43 Proteinopathy Pathophysiology. Neuron, 107, 496–508.e6.
- 143. Guo, Z., Zhu, H., Xu, W., Wang, X., Liu, H., Wu, Y., Wang, M., Chu, H. and Zhang, Z. (2020) Alternative splicing related genetic variants contribute to bladder cancer risk. Mol. Carcinog., 59, 923–929.
- 144. Yuan, M., Yu, C. and Yu, K. (2020) Association of human XPA rs1800975 polymorphism and cancer susceptibility: an integrative analysis of 71 case-control studies. Cancer Cell Int., 20, 164.
- 145. Nurnberg, S.T., Guerraty, M.A., Wirka, R.C., Rao, H.S., Pjanic, M., Norton, S., Serrano, F., Perisic, L., Elwyn, S., Pluta, J. et al. (2020) Genomic profiling of human vascular cells identifies TWIST1 as a causal gene for common vascular diseases. PLoS Genet., 16, e1008538.
- 146. Li, Y.I., Wong, G., Humphrey, J. and Raj, T. (2019) Prioritizing Parkinson’s disease genes using population-scale transcriptomic data. Nat. Commun., 10, 994.
- 147. Rotival, M., Quach, H. and Quintana-Murci, L. (2019) Defining the genetic and evolutionary architecture of alternative splicing in response to infection. Nat. Commun., 10, 1671.
- 148. Saferali, A., Yun, J.H., Parker, M.M., Sakornsakolpat, P., Chase, R.P., Lamb, A., Hobbs, B.D., Boezen, M.H., Dai, X., de Jong, K. et al. (2019) Analysis of genetically driven alternative splicing identifies FBXO38 as a novel COPD susceptibility gene. PLoS Genet., 15, e1008229.
- 149. Gawronski, K.A.B., Bone, W., Park, Y., Pashos, E., Wang, X., Yang, W., Rader, D., Musunuru, K., Voight, B. and Brown, C. (2019) Evaluating the contribution of cell-type specific alternative splicing to variation in lipid levels. bioRxiv, 659326.
- 150. Raj, T., Li, Y.I., Wong, G., Humphrey, J., Wang, M., Ramdhani, S., Wang, Y.-C., Ng, B., Gupta, I., Haroutunian, V. et al. (2018) Integrative transcriptome analyses of the aging brain implicate altered splicing in Alzheimer’s disease susceptibility. Nat. Genet., 50, 1584–1592.
- 151. Liu, B., Pjanic, M., Wang, T., Nguyen, T., Gloudemans, M., Rao, A., Castano, V.G., Nurnberg, S., Rader, D.J., Elwyn, S. et al. (2018) Genetic Regulatory Mechanisms of Smooth Muscle Cells Map to Coronary Artery Disease Risk Loci. Am. J. Hum. Genet., 103, 377–388.
- 152. Mikheenko, A., Prjibelski, A.D., Joglekar, A. and Tilgner, H.U. (2022) Sequencing of individual barcoded cDNAs using Pacific Biosciences and Oxford Nanopore Technologies reveals platform-specific error patterns. Genome Res., 32, 726–737.
- 153. Weirather, J.L., de Cesare, M., Wang, Y., Piazza, P., Sebastiano, V., Wang, X.-J., Buck, D. and Au, K.F. (2017) Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Res., 6, 100.
- 154. Massaiu, I., Songia, P., Chiesa, M., Valerio, V., Moschetta, D., Alfieri, V., Myasoedova, V.A., Schmid, M., Cassetta, L., Colombo, G.I. et al. (2021) Evaluation of Oxford Nanopore MinION RNA-Seq Performance for Human Primary Cells. Int. J. Mol. Sci., 22, 6317.
- 155. Chen, Y., Davidson, N.M., Wan, Y.K., Patel, H., Yao, F., Low, H.M., Hendra, C., Watten, L., Sim, A., Sawyer, C. et al. (2021) A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines. bioRxiv.
- 156. Gordon, S.P., Tseng, E., Salamov, A., Zhang, J., Meng, X., Zhao, Z., Kang, D., Underwood, J., Grigoriev, I.V., Figueroa, M. et al. (2015) Widespread Polycistronic Transcripts in Fungi Revealed by Single-Molecule mRNA Sequencing. PLoS One, 10, e0132628.
- 157. Tardaguila, M., de la Fuente, L., Marti, C., Pereira, C., Pardo-Palacios, F.J., Del Risco, H., Ferrell, M., Mellado, M., Macchietto, M., Verheggen, K. et al. (2018) SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Res., 28, 396–411.
- 158. Schulz, L., Torres-Diz, M., Cortés-López, M., Hayer, K.E., Asnani, M., Tasian, S.K., Barash, Y., Sotillo, E., Zarnack, K., König, J. et al. (2021) Direct long-read RNA sequencing identifies a subset of questionable exitrons likely arising from reverse transcription artifacts. Genome Biol., 22, 190.
- 159. Mulroney, L., Wulf, M.G., Schildkraut, I., Tzertzinis, G., Buswell, J., Jain, M., Olsen, H., Diekhans, M., Corrêa, I.R., Akeson, M. et al. (2022) Identification of high-confidence human poly(A) RNA isoform scaffolds using nanopore sequencing. RNA, 28, 162–176.
- 160. Gleeson, J., Leger, A., Prawer, Y.D.J., Lane, T.A., Harrison, P.J., Haerty, W. and Clark, M.B. (2022) Accurate expression quantification from nanopore direct RNA sequencing with NanoCount. Nucleic Acids Res., 50, e19.
- 161. Tang, A.D., Soulette, C.M., van Baren, M.J., Hart, K., Hrabeta-Robinson, E., Wu, C.J. and Brooks, A.N. (2020) Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat. Commun., 11, 1438.
- 162. Arzalluz-Luque, A., Salguero, P., Tarazona, S. and Conesa, A. (2022) acorde unravels functionally interpretable networks of isoform co-usage from single cell data. Nat. Commun., 13, 1828.
- 163. Pardo-Palacios, F.J., Wang, D., Reese, F., Diekhans, M., Carbonell-Sala, S., Williams, B., Loveland, J.E., Adams, M.S., Balderrama-Gutierrez, G., Behera, A.K. et al. (2022) Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. 10.6084/m9.figshare.19642383.v1.
- 164. Lee, B.T., Barber, G.P., Benet-Pagès, A., Casper, J., Clawson, H., Diekhans, M., Fischer, C., Gonzalez, J.N., Hinrichs, A.S., Lee, C.M. et al. (2022) The UCSC Genome Browser database: 2022 update. Nucleic Acids Res., 50, D1115–D1122.
- 165. Robinson, J.T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G. and Mesirov, J.P. (2011) Integrative genomics viewer. Nat. Biotechnol., 29, 24–26.
- 166. Reese, F. and Mortazavi, A. (2021) Swan: a library for the analysis and visualization of long-read transcriptomes. Bioinformatics, 37, 1322–1323.
- 167. Katz, Y., Wang, E.T., Silterra, J., Schwartz, S., Wong, B., Thorvaldsdóttir, H., Robinson, J.T., Mesirov, J.P., Airoldi, E.M. and Burge, C.B. (2014) Sashimi plots: Quantitative visualization of alternative isoform expression from RNA-seq data. Bioinformatics, 31, 2400–2402.
- 168. Gustavsson, E.K., Zhang, D., Reynolds, R.H., Garcia-Ruiz, S. and Ryten, M. (2022) ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2. Bioinformatics, 38, 3844–3846.
- 169. Barann, M., Zimmer, R. and Birzele, F. (2017) Manananggal - a novel viewer for alternative splicing events. BMC Bioinformatics, 18, 120.
- 170. Stein, A.N., Joglekar, A., Poon, C.-L. and Tilgner, H.U. (2022) ScisorWiz: Visualizing Differential Isoform Expression in Single-Cell Long-Read Data. Bioinformatics, 38, 3474–3476.