Summary
The American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) framework for classifying variants uses six evidence categories related to the splicing potential of variants: PVS1, PS3, PP3, BS3, BP4, and BP7. However, the lack of guidance on how to apply such codes has contributed to variation in the specifications developed by different Clinical Genome Resource (ClinGen) Variant Curation Expert Panels. The ClinGen Sequence Variant Interpretation Splicing Subgroup was established to refine recommendations for applying ACMG/AMP codes relating to splicing data and computational predictions. We utilized empirically derived splicing evidence to (1) determine the evidence weighting of splicing-related data and appropriate criteria code selection for general use, (2) outline a process for integrating splicing-related considerations when developing a gene-specific PVS1 decision tree, and (3) exemplify methodology to calibrate splice prediction tools. We propose repurposing the PVS1_Strength code to capture splicing assay data that provide experimental evidence for variants resulting in RNA transcript(s) with loss of function. Conversely, BP7 may be used to capture RNA results demonstrating no splicing impact for intronic and synonymous variants. We propose that the PS3/BS3 codes are applied only for well-established assays that measure functional impact not directly captured by RNA-splicing assays. We recommend the application of PS1 based on similarity of predicted RNA-splicing effects for a variant under assessment in comparison with a known pathogenic variant. The recommendations and approaches for consideration and evaluation of RNA-assay evidence described aim to help standardize variant pathogenicity classification processes when interpreting splicing-based evidence.
Keywords: RNA splicing, variant classification, ClinGen, ACMG/AMP codes, PVS1, PS1, PP3, BP4, BP7
Graphical abstract
The ClinGen SVI Splicing Subgroup provides recommendations for the application of existing splicing-related ACMG/AMP codes and re-purposing of other codes to capture splicing-related evidence. This study outlines a process for developing a gene-specific PVS1 decision tree and provides methodology to calibrate bioinformatic splice prediction tools.
Introduction
In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) reported a framework for classifying variants using multiple evidence categories.1 Since then, the Clinical Genome Resource (ClinGen) Sequence Variant Interpretation (SVI) Working Group has developed further guidance on applying specific codes; for example, PVS1 for loss-of-function (LoF) variants,2 PS3/BS3 for variants impacting gene function,3 and the stand-alone rule BA1 based on variant allele frequency.4
Although the ACMG/AMP guidelines recommend assessing whether gene variants could have an impact on natural RNA splicing, there is no standardized approach to interpreting this molecular information. Moreover, the complexities of the splicing process lead to challenges interpreting data generated from different computational predictions and laboratory assays. To date, the level of information in gene-specific ACMG/AMP guidelines provided by different ClinGen Variant Curation Expert Panels (VCEPs) for relevant codes (Table S1) has differed significantly, likely increasing uncertainty for diagnosticians. For example, when considering the functional assay codes (PS3/BS3) for splicing assays, the level of information provided by VCEPs ranges from no change to the original ACMG/AMP rules to detailed guidance for determining the appropriate strength of evidence based on assay type, assay outcome, and/or gene-specific variant location. The recommendations of Brnich et al. for the application of functional PS3/BS3 criteria noted that splicing assays could be used to strengthen support for computational predictions for “variants outside the canonical ±1,2 splice sites.”3 However, there is a lack of guidance for applying and combining evidence codes based on splicing predictions, splicing assay data, and other functional data.
Bioinformatic tools that predict variant impact on splicing play an important role in the assessment of variants of uncertain clinical significance. However, computational analysis of potential spliceogenic variants (defined as variants causing an altered mRNA transcript profile compared with control samples5) is challenging for variant curators due to the increasing number of tools available, each with bespoke settings that may not have been clinically validated. For the application of the PP3/BP4 codes, VCEPs differ significantly in their rules defining the level of computational evidence required to indicate a deleterious splicing effect (Table S1). Such differences include the number of splicing prediction tools used, the type of tool(s) to be used, and the thresholds to be applied for each. Indeed, some VCEPs require multiple splice predictors to agree to apply codes, which may be inappropriate if the specified tools are designed to assess variant impact on different splicing motifs. Although a number of studies have compared the sensitivity and specificity of different splicing prediction tools,6,7,8,9,10,11 there has been a lack of guidance on how to apply existing and future tools within the ACMG/AMP framework. Moreover, as there is such a high correlation between presence of variation at specific motif positions and likelihood to alter splicing, use of both position/prediction information and splicing assay data for interpretation might be considered redundant or prone to overweighting.
Two other codes from the 2015 guidelines that are, or may be, used to interpret variants with potential splicing impact are PVS1 (null variant in a loss of function gene) and BP7 (synonymous change with no predicted impact on splicing). Additionally, PS1 (same amino acid change as a previously established pathogenic variant) could be adapted for application to splicing-based evidence.
In response to groups seeking guidance on variant interpretation using splicing related evidence, the ClinGen SVI Working Group established the SVI Splicing Subgroup. In this manuscript, we detail evidence-based recommendations regarding the application of computational splicing prediction tools and in vitro splicing assays using a refined version of the ACMG/AMP sequence variant-interpretation framework. We also provide recommendations on how to develop a gene-specific PVS1 decision tree and how to combine evidence codes for splicing and protein function derived from computational predictions and experimental assay data. These recommendations have relevance for all users of the ACMG/AMP variant classification guidelines, both within and outside of the VCEP context.
Material and methods
Establishment of the ClinGen SVI Splicing Subgroup
The remit of the ClinGen SVI Splicing Subgroup was to refine recommendations for applying ACMG/AMP codes relating to splicing data and computational predictions. ACMG/AMP criteria specifically evaluated for application (or adaptation) to impact on splicing included PVS1, PS1, PS3, PM5, PP3, BS3, BP4, and BP7. The goals of the subgroup were to (1) determine the appropriate strength of evidence that can be applied to experimental splicing data for variant interpretation and appropriate criteria code selection for general use, (2) develop a framework for applying gene-specific PVS1 criteria, and (3) provide guidance on approaches to select and implement computational tools. Throughout this process we exchanged knowledge with several ClinGen VCEPs as they applied or developed ACMG/AMP specifications for different genes and incorporated relevant information from these parallel efforts into our recommendations.
A review of the 53 established ClinGen VCEPs revealed that 23 panels had publicly available classification rules for one or more disease-associated genes (Table S1). VCEP specifications reviewed and referred to in the text are shown in Table S1 and were as documented on February 12, 2023; please refer to the ClinGen Criteria Specification Repository for the most up-to-date versions of VCEP specifications (https://cspec.genome.network/cspec/ui/svi).
Terminology for sequence variants considering location with respect to splicing motifs
Descriptions reported in Richards et al.1 as relevant for variant predicted impact on splicing include the following: PVS1, predictive code for (presumed) LoF variants, including “…canonical ± 1 or 2 splice sites…” and BP7, synonymous (silent) variant for which splicing prediction algorithms predict neither an impact to the “splice consensus sequence nor the creation of a new splice site…” No further description was provided to define the terms used. Figure 1 shows sequence motifs most relevant for native splicing and describes the rationale for selection of nucleotide sequence ranges for various analyses. The terms below were used to describe different nucleotide ranges and positions for application of codes relevant for splicing impact.
Figure 1.
Schematic showing sequence motifs at intron-exon junctions and terminology applied in this report
Pre-mRNA splicing, the process of intron removal from nascent pre-mRNA followed by exon ligation, is principally reliant on the following cis-acting regulatory elements: the splice donor +1,2 dinucleotide flanking the 5′ end of the intron; the splice acceptor −1,2 dinucleotide flanking the 3′ end of the intron; and the polypyrimidine tract and branch point site (usually at positions −18 to −44)58 upstream of the 3′ splice acceptor.59 So-called consensus sequences for different splicing regulatory elements have been defined by comparing position weight matrices for surrounding sequence. The vast majority of introns (>98%) are recognized by highly conserved dinucleotides at the 5′ boundary (GT) and 3′ boundary (AG).57,60 Intron categories may variously be designated by the boundary dinucleotide sequence, by the spliceosome (likely) excising the intron (U2-type spliceosomes for most GT-AG introns, U12-type spliceosomes for most AT-AC introns, with some exceptions), and/or by comparing position weight matrices for surrounding sequence.60,61 Exon-intron boundaries for consensus donor and acceptor site motifs (including the polypyrimidine tract) defined by Burge et al.59 are as follows: 11 bases for the donor splice site motif (from the last 3 exonic to the first 8 intronic bases) and 14 bases for the acceptor splice site motif (from the last 12 intronic to the first 2 exonic bases). More detailed examination of the position weight matrix plot for the most prevalent U2-type intron17 reveals degeneracy at some positions in these motifs; thus, impact on splicing is most likely for variants located in the donor splice region (the last 3 bases of the exon and 3–6 nucleotides of intronic sequence adjacent to the exon) and the acceptor splice region (the first base of the exon and from 3 to 20 nucleotides upstream from the exon boundary).
For secondary variant analysis, a minimal splice region was defined as above for the donor splice region, with the acceptor splice region designated as only the first base of the exon and the third nucleotide upstream from the exon boundary. Exonic and intronic splicing enhancer and silencer elements also play an important role in exon definition (not shown here).62
Splice donor/acceptor ±1,2 dinucleotide variant
A variant that occurs in the 2-base region at the 5′ or 3′ end of an intron.
Splice region variant
A sequence variant in which a change has occurred within the region of the donor motif (last 3 bases of the exon and 3–6 nucleotides of intronic sequence adjacent to the exon) or acceptor motif (first base of the exon and 3–20 nucleotides upstream from the exon boundary).
Minimal splice region variant
A sequence variant in which a change has occurred within the region of the donor, as defined above, or within a minimized highly conserved region for the acceptor motif (first base of the exon and the third nucleotide upstream from the exon boundary).
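The position categories defined above can be expressed as a small lookup. The sketch below is illustrative only and assumes a hypothetical signed-offset convention that loosely mirrors HGVS notation: at a donor site, exonic bases are −1 (last exon base), −2, … and intronic bases are +1, +2, …; at an acceptor site, intronic bases are −1 (last intron base), −2, … and exonic bases are +1 (first exon base), +2, …. The function and enum names are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class SpliceCategory(Enum):
    DINUCLEOTIDE = "splice donor/acceptor ±1,2 dinucleotide"
    SPLICE_REGION = "splice region"
    OUTSIDE = "outside splice region"


def classify_position(site: str, offset: int) -> tuple[SpliceCategory, bool]:
    """Classify one position relative to a single exon-intron boundary.

    Hypothetical convention (see lead-in): donor exonic -1,-2,... / intronic +1,+2,...;
    acceptor intronic -1,-2,... / exonic +1,+2,...
    Returns (category, in_minimal_splice_region).
    """
    if offset == 0:
        raise ValueError("offset 0 is undefined in this convention")
    if site == "donor":
        if offset in (1, 2):
            return SpliceCategory.DINUCLEOTIDE, False
        # Donor splice region: last 3 exonic bases and intronic +3 to +6.
        in_region = -3 <= offset <= -1 or 3 <= offset <= 6
        # The minimal donor region is defined identically to the donor splice region.
        return (SpliceCategory.SPLICE_REGION, True) if in_region else (SpliceCategory.OUTSIDE, False)
    if site == "acceptor":
        if offset in (-1, -2):
            return SpliceCategory.DINUCLEOTIDE, False
        # Acceptor splice region: first exonic base and intronic -3 to -20.
        in_region = offset == 1 or -20 <= offset <= -3
        # Minimal acceptor region: first exonic base and intronic -3 only.
        in_minimal = offset in (1, -3)
        return (SpliceCategory.SPLICE_REGION, in_minimal) if in_region else (SpliceCategory.OUTSIDE, False)
    raise ValueError(f"unknown site: {site!r}")
```

For example, an acceptor-side variant at −10 falls in the splice region but outside the minimal splice region, whereas one at −3 falls in both.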
Dataset of annotated splice donor/acceptor ±1,2 dinucleotide variants identified in the diagnostic setting
A dataset comprising all individuals from a clinical (disease-affected) cohort referred to GeneDx (https://www.genedx.com/) for diagnostic testing over a 2-year time period was queried to extract information for all splice donor/acceptor ±1,2 dinucleotide variants associated with known clinically relevant transcripts. A total of 1,447 splice donor/acceptor ±1,2 dinucleotide variants were identified in 1,043 genes currently associated with disease through an LoF mechanism (Table S2). Based on results from the tool comparison (see “Splicing prediction tool comparison” below), SpliceAI12 was selected for computing variant scores using default settings. The position of any predicted splice donor or acceptor loss was manually verified to occur at the actual natural splice donor/acceptor for the transcript being evaluated. Predicted donor or acceptor gains were also checked to ensure they occurred within the appropriate sequence context. Variant classification had previously been determined following GeneDx clinical protocols in line with application of ACMG/AMP criteria.1 These protocols included review of in silico prediction data (to assess predicted impact on transcript[s] and PVS1 strength), published literature pertaining to the variant, and clinical data for the proband.
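Part of the position check described above can be automated before manual review. The sketch below assumes SpliceAI annotations in the pipe-delimited VCF INFO layout documented in the SpliceAI repository (ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL, where DP_* are signed offsets from the variant); verify this against the version in use. The function names and the `expected_offset` argument (the distance from the variant to the annotated natural site, which the curator must supply) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

FIELDS = ("ALLELE", "SYMBOL", "DS_AG", "DS_AL", "DS_DG", "DS_DL",
          "DP_AG", "DP_AL", "DP_DG", "DP_DL")


@dataclass
class SpliceAIPrediction:
    allele: str
    symbol: str
    scores: dict[str, float]   # DS_AG, DS_AL, DS_DG, DS_DL
    positions: dict[str, int]  # DP_*: signed offset (nt) of each predicted event from the variant

    @property
    def max_delta(self) -> float:
        """Maximum of the four delta scores (acceptor/donor gain/loss)."""
        return max(self.scores.values())


def parse_spliceai(annotation: str) -> SpliceAIPrediction:
    """Parse one allele/gene annotation (one comma-separated entry of the INFO value)."""
    values = annotation.split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    rec = dict(zip(FIELDS, values))
    return SpliceAIPrediction(
        allele=rec["ALLELE"],
        symbol=rec["SYMBOL"],
        scores={k: float(rec[k]) for k in FIELDS[2:6]},
        positions={k: int(rec[k]) for k in FIELDS[6:]},
    )


def loss_at_natural_site(pred: SpliceAIPrediction, event: str,
                         expected_offset: int, min_score: float = 0.0) -> bool:
    """True if a predicted donor ('DL') or acceptor ('AL') loss maps to the natural site."""
    if event not in ("DL", "AL"):
        raise ValueError("event must be 'DL' or 'AL'")
    return (pred.scores[f"DS_{event}"] > min_score
            and pred.positions[f"DP_{event}"] == expected_offset)
```

The example annotation `T|GENE1|0.00|0.01|0.02|0.97|-12|40|3|-1` (made up for illustration) parses to a maximum Δ score of 0.97, and the donor-loss check passes only if the natural donor lies 1 nt upstream of the variant.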
Computational, splicing assay, and functional datasets
Two large truth datasets were used to demonstrate an approach to (1) compare different computational prediction tools and (2) calibrate a selected tool to establish thresholds for predicting variant impact on splicing. The first dataset contained cell survival results (phenotypes: LoF, intermediate function, and functional) from 414 BRCA1 variants13 supplemented with computational splicing prediction results (this study). While the cell survival functional assay did not provide specific results about splicing events, the results were used to infer whether variants were functionally benign or pathogenic, which for intronic variants was presumed to be due to impact via RNA effects. The dataset included information for variants at the donor and acceptor splice regions based on The Sequence Ontology14 definition used by Ensembl (https://asia.ensembl.org/info/genome/variation/prediction/predicted_data.html), which for this analysis includes 3 exonic nucleotides (synonymous substitutions only) and 8 intronic nucleotides at both the donor and acceptor sites.
The second dataset contained in vitro splicing assay and computational splicing prediction results across a range of disease susceptibility genes that have been curated from the literature: 1,008 BRCA1/BRCA2 variants (MIM: 113705; 600185), 659 mismatch repair gene variants (MLH1 [MIM: 120436], MSH2 [MIM: 609309], MSH6 [MIM: 600678], PMS2 [MIM: 600259]),15 284 NF1 (MIM: 613113) variants,6 and 1,070 POU1F1 (MIM: 173110) variants.16 The majority of these variants were identified in variant carriers. The POU1F1 dataset differed in that it represented results from a high-throughput assay designed to test the effect of variants on upregulation of a minor isoform with transcriptional repressor activity. Of the 3,021 variants collated for this study, 767 were reported to be spliceogenic (associated with one or more aberrant splicing events). All variant data are provided in Table S3 and available for download from a web tool we developed to facilitate calculation of likelihood ratios for calibration (https://gwiggins.shinyapps.io/lr_shiny/).17 The main analysis considered predictions for variants in the following categories, as defined above and illustrated in Figure 1: (1) splice donor/acceptor ±1,2 dinucleotide variants, (2) splice region, and (3) outside splice region (and splice dinucleotides). Additionally, sensitivity and specificity were assessed for variants within the minimal splice region and then separately for intronic and exonic nucleotide positions outside of the splice donor/acceptor ±1,2 dinucleotides and the minimal splice region.
Splicing prediction tool comparison
The performance of 11 splicing prediction tools (listed in Table S4) was compared using the BRCA1 variant survival functional dataset from Findlay et al.,13 which consisted of single nucleotide variants at or near 13 BRCA1 exons that encode functionally critical domains. Each variant had been assessed and categorized as functional (FUNC) or non-functional (LoF) based on the outcome of a survival assay using HAP1 cells. Variants with intermediate function were excluded from the analysis. A total of 414 variants (n = 312 FUNC and n = 102 LoF) were used for the evaluation of prediction tools.
The relative performance of the splicing prediction tools was evaluated using receiver operating characteristic (ROC) curve analysis. Specificity was defined as the total true negatives out of all negatives (true negatives + false positives). Sensitivity was defined as the total true positives detected out of all positives (true positives + false negatives). The prediction tool SpliceScan II was modified to include (1) a Bayesian splice region sensor,18 (2) evolutionary conservation, and (3) the Bayesian classification framework as introduced by Tavtigian et al. to score splice region motifs based on both splicing biology and variant pathogenicity.19 The modified framework is termed SpliceScan III. Only SpliceAI scores were generated for the literature-curated dataset of variants.
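The sensitivity, specificity, and ROC definitions above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study: it assumes each variant is given as a (score, class) pair with classes "LoF", "FUNC", or an intermediate label, excludes intermediate variants as described, treats LoF as the positive class, and calls a variant positive when its score is at or above the threshold. The function names are hypothetical.

```python
def binary_labels(records):
    """Keep only LoF and FUNC variants (intermediate-function variants are excluded).

    records: iterable of (score, functional_class); returns (scores, labels),
    with label True for LoF (positive class) and False for FUNC.
    """
    kept = [(score, cls) for score, cls in records if cls in ("LoF", "FUNC")]
    return [s for s, _ in kept], [cls == "LoF" for _, cls in kept]


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP); score >= threshold is a positive call."""
    tp = fn = tn = fp = 0
    for score, is_lof in zip(scores, labels):
        called_positive = score >= threshold
        if is_lof:
            tp += int(called_positive)
            fn += int(not called_positive)
        else:
            fp += int(called_positive)
            tn += int(not called_positive)
    return tp / (tp + fn), tn / (tn + fp)


def roc_curve(scores, labels):
    """(1 - specificity, sensitivity) points, sweeping every observed score as a threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, labels, t)
        points.append((1 - spec, sens))
    return points


def roc_auc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_curve(scores, labels)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

Because the lowest observed score is always used as a threshold, the curve ends at (1, 1), and the trapezoidal area equals the probability that a randomly chosen LoF variant outscores a randomly chosen FUNC variant (ties counted as one half).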
Calibration of code weights based on odds of spliceogenicity
Based on the tool comparison results, the deep learning-based splice variant software SpliceAI12 (https://github.com/Illumina/SpliceAI, version 1.3.1) was selected as the prediction tool to demonstrate a model for establishing thresholds for splicing prediction. We used the maximum raw SpliceAI Δ score, defined as the maximum probability of altered splicing across the 4 output probabilities (designated by the tool as loss of acceptor or donor sites and gain of acceptor or donor sites) within a maximum window of 10,000 nucleotides (±4,999 nucleotides from the variant of interest). Likelihood ratios (LRs) for prediction of spliceogenicity, as a measure of inferred pathogenicity, were estimated for different SpliceAI score categories using our custom-built web tool (https://gwiggins.shinyapps.io/lr_shiny/; see Table S5 for a worked example). An iterative approach was used to select the cutoffs that minimized apparent false-negative and false-positive predictions and the overall proportion of variants within the uninformative zone. Sensitivity analysis was conducted to assess the robustness of LR estimates for a narrower band of scores approaching the selected cutoffs. Assignment of LRs to ACMG/AMP code strengths was based on LR ranges recently proposed as consistent with ACMG/AMP qualitative rule strengths for future classification in a Bayesian framework.19 The benign (or non-spliceogenic) category intervals were calculated as the inverse odds of the pathogenic (or spliceogenic) category intervals. These odds ranges assume a global prior probability of pathogenicity of 0.10. Bayesian pathogenicity criteria thresholds are shown in Table S6.
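The LR calculation and strength assignment described above can be sketched as follows. Several parts are assumptions: the confidence interval uses the log-scale (Katz) method, the strength cut-offs are taken from the Tavtigian et al. Bayesian adaptation with prior 0.10 (very strong odds of 350, exponent 2, giving about 18.7 for strong, 4.33 for moderate, and 2.08 for supporting), and strength is assigned from the point estimate. Table S6 holds the thresholds actually used; the function names are hypothetical.

```python
import math

# Assumed odds thresholds (Tavtigian et al. Bayesian adaptation, prior 0.10).
O_PVST = 350.0
PATHOGENIC_CUTS = [
    ("very strong", O_PVST),
    ("strong", O_PVST ** 0.5),      # ~18.7
    ("moderate", O_PVST ** 0.25),   # ~4.33
    ("supporting", O_PVST ** 0.125),  # ~2.08
]


def likelihood_ratio(k_splice, n_splice, k_nonsplice, n_nonsplice, z=1.96):
    """LR for one score bin: P(bin | spliceogenic) / P(bin | non-spliceogenic).

    Returns (LR, lower, upper) with a log-scale (Katz) confidence interval.
    """
    lr = (k_splice / n_splice) / (k_nonsplice / n_nonsplice)
    se = math.sqrt(1 / k_splice - 1 / n_splice + 1 / k_nonsplice - 1 / n_nonsplice)
    return lr, lr * math.exp(-z * se), lr * math.exp(z * se)


def evidence_strength(lr):
    """Map an LR point estimate to a strength; benign cut-offs are the inverse odds."""
    for label, cut in PATHOGENIC_CUTS:
        if lr >= cut:
            return f"{label} (spliceogenicity)"
        if lr <= 1 / cut:
            return f"{label} (non-spliceogenicity)"
    return "uninformative"
```

With the counts in Table 1 for the ≥0.2 bin (381 of 491 spliceogenic versus 109 of 2,246 non-spliceogenic variants), this sketch returns an LR of about 15.99 with an interval of about 13.23 to 19.32 and a moderate strength, which agrees with the reported row; we have not confirmed that the web tool uses the same interval method.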
Results and discussion
To evaluate evidence types and codes relevant for variant impact on splicing, we addressed (1) comparisons of different VCEP specifications for (potential) splicing-related codes, (2) development of gene-specific PVS1 decision trees, (3) rationale and approach to capture experimental splicing results outside of codes PS3 and BS3, (4) considerations for weighting of experimental splicing data considering splicing assay design and other factors, (5) approaches to calibrate and apply splicing prediction tool output for bioinformatic codes PP3 and BP4, (6) application of the BP7 code originally designated to capture only synonymous (silent) variation, (7) application of the PS1 code originally designated to capture only information relating to non-synonymous (missense) variants, and (8) recommendations on how to combine evidence codes for splicing and/or protein function derived from all available evidence.
Review of existing VCEP guidelines
The ACMG/AMP guidelines for splicing-related results provide a framework for classifying variants in disease susceptibility genes, most relevant to those genes where LoF is the mechanism of pathogenesis. Adaptations of the original guidelines reported by Richards et al.1 have been described by ClinGen VCEPs, which enable curators to apply modifications to the relevant strength of each evidence type and gene-specific considerations. A review of the 53 established ClinGen VCEPs revealed that 23 panels had publicly available classification rules (Table S1). We also included pilot rules from the ENIGMA BRCA1 and BRCA2 VCEPs; the Hereditary Breast, Ovarian, and Pancreatic Cancer VCEP (PALB2); and InSiGHT Hereditary Colorectal Cancer/Polyposis VCEP (mismatch repair genes), which developed their specifications for splicing-related codes in parallel with the activities of the ClinGen SVI Splicing Subgroup. In addition, code specifications and detailed advice from the Cancer Variant Interpretation Group UK (CanVIG),20 a major clinical network operating independently of ClinGen, were also reviewed. Comparisons of these accumulated specifications demonstrate between-panel variation in application of ACMG/AMP codes (and code strengths) pertaining to splicing-related evidence.
For interpretation of variants impacting the splice donor/acceptor ±1,2 dinucleotides, rules from 6 ClinGen VCEPs and CanVIG use the original version of the PVS1 decision tree published by Abou Tayoun et al.2 to assign a final weight of evidence. As detailed in Table S1, 15 VCEPs developed gene-specific PVS1 guidelines by modifying the branches of the original decision tree. In addition, some VCEPs (e.g., CDH1; ENIGMA; Familial Hypercholesterolemia; Hereditary Breast, Ovarian, and Pancreatic Cancer; InSiGHT Hereditary Colorectal Cancer/Polyposis; Lysosomal Storage Disorders; and Rett and Angelman-like Disorders) and CanVIG provide scope for adapting other codes that address splicing, such as PS3/BS3, based on the level of evidence strength (supporting, moderate, strong, very strong). Importantly, there is marked variability between the different VCEP guidelines for applying ACMG/AMP splicing-related evidence types, variability which cannot be solely explained by gene-related and disease-specific factors. For example, guidelines on the number of splicing prediction tools required to warrant the PP3 code (computational evidence to support splicing) ranged from 1 to 3 tools. Notably, a limited number of VCEPs incorporated the interpretation of splicing prediction data into modifications of the missense-based codes PS1 (DICER1 and miRNA-Processing Gene; ENIGMA; Hearing Loss; Hereditary Breast, Ovarian, and Pancreatic Cancer; InSiGHT Hereditary Colorectal Cancer/Polyposis; Monogenic Diabetes; and PTEN) or PM5 (CDH1 and InSiGHT Hereditary Colorectal Cancer/Polyposis).
The following sections outline processes for establishing a gene-specific framework to derive strength for ACMG/AMP splicing-related evidence types for use in clinical variant interpretation.
Development and application of a gene-specific PVS1 decision tree for splice donor/acceptor ±1,2 dinucleotide variants
Assessing relevance of the PVS1 code for a given gene
PVS1 is a predictive code for (presumed) LoF variants, including nonsense, frameshift, splice donor/acceptor ±1,2 dinucleotide, single or multi-exon deletions/duplications, and initiation codon variants.1 This code is applicable to a large proportion of genes where LoF is a known mechanism of disease. Establishing LoF as a disease mechanism can be subjective; however, helpful resources for assessing LoF include the ClinGen haploinsufficiency (HI) score,21 the probability of LoF intolerance (pLI22) score, and/or the “LoF observed/expected upper bound fraction” (LOEUF23). HI score availability is limited because it is based on manual curation of evidence, with output divided into 6 tiers, where a score of “3” indicates “sufficient evidence suggesting dosage sensitivity is associated with clinical phenotype” (curations available at https://search.clinicalgenome.org/kb/gene-dosage). By contrast, pLI scores are computationally derived and measure the intolerance of a given gene to LoF variants in the general population; scores are available in gnomAD (https://gnomad.broadinstitute.org/), with a pLI > 0.9 suggesting a significantly lower-than-expected rate of LoF variants in the gene.22 Because the pLI value can be strongly influenced by sample size (i.e., a gene with low expected allele counts cannot have a high pLI), LOEUF (available from the gnomAD v2.1 release) is now recommended over pLI for LoF intolerance assessments; a LOEUF < 0.35 indicates LoF intolerance comparable to the pLI > 0.9 threshold. It is important to note that both pLI and LOEUF depend on transcript selection and measure intolerance relative to reproductive fitness; thus, age of onset and disease severity must be considered when assessing whether LoF variants are expected to be observed in population datasets.
If a gene can predispose to disease via both LoF and gain-of-function (GoF) mechanisms and a PVS1-eligible variant leads to a predicted (or experimentally observed) aberration expected to result in GoF, we recommend use of the PM4 code (described as “protein length changes due to in-frame deletions/insertions in a non-repeat region or stop-loss variants”) as opposed to PVS1 because PVS1 is specific to variants where LoF is the known disease mechanism.24
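The LoF-mechanism screen and the PVS1 versus PM4 choice above can be summarized as two small helpers. This is a hypothetical sketch that only encodes the thresholds quoted in the text (HI score of 3, pLI > 0.9, LOEUF < 0.35, with LOEUF preferred when available); it does not account for transcript choice, age of onset, or disease severity, which still require expert review.

```python
from __future__ import annotations

from typing import Optional


def lof_intolerance_flags(hi_score: Optional[int] = None,
                          pli: Optional[float] = None,
                          loeuf: Optional[float] = None) -> dict:
    """Summarize LoF-intolerance evidence; None means the metric is unavailable."""
    flags: dict = {"hi_sufficient": (hi_score == 3) if hi_score is not None else None}
    if loeuf is not None:          # LOEUF is preferred over pLI when both exist
        flags["constraint"] = ("LOEUF", loeuf < 0.35)
    elif pli is not None:
        flags["constraint"] = ("pLI", pli > 0.9)
    else:
        flags["constraint"] = None
    return flags


def null_variant_code(gene_lof: bool, gene_gof: bool, predicted_gof: bool) -> Optional[str]:
    """Pick the code for a PVS1-eligible variant, following the rule stated in the text."""
    if gene_lof and gene_gof and predicted_gof:
        return "PM4"   # predicted (or observed) GoF aberration in a gene with both mechanisms
    if gene_lof:
        return "PVS1"  # final strength still set by the (gene-specific) decision tree
    return None        # PVS1 requires LoF as a known disease mechanism
```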
Construction of a gene-specific PVS1 decision tree
While most genetic alterations considered under the PVS1 code are bona fide LoF variants, it is critical to evaluate variants in the context of gene structure and expression, including the impact of the variant on alternative splicing, mRNA stability, and function of resultant protein products. For that reason, the ClinGen SVI previously developed recommendations for PVS1 application2 that specifically addressed 4 “rescue” mechanisms that may modulate the functional and clinical impact of assumed LoF variants: (1) premature termination codons (PTCs) at the 3′ end of the coding sequence leading to a shorter yet still functional protein, (2) stop gains (nonsense and frameshift variants) or deletions located in or encompassing non-constitutive exons, (3) in-frame splicing or genomic deletions/duplications leading to a shorter or longer yet still functional protein, and (4) rescue of initiation codon variants by use of an alternative in-frame ATG.
With regard to splice donor/acceptor ±1,2 dinucleotides, the final PVS1 code strength depends on which of these 4 rescue mechanisms may be relevant after review of the predicted consequence(s) of the splice donor/acceptor change. Here, we highlight key considerations and a process for developing a gene-specific PVS1 decision tree.
Characterizing expression and structure of the reference transcript
Differences in transcript structure resulting from alternative splicing can influence cellular function and contribute to disease. Furthermore, outcomes of abnormal splicing are typically varied with respect to transcript levels and the resulting protein-reading frame. Accurate variant interpretation for PVS1 requires scientific knowledge of the structure and function of transcripts expressed by each gene being examined. As noted previously by Richards et al.,1 a reference transcript for each gene should be identified, utilized, and reported when describing variants being investigated. To promote consistency for clinical relevance, the National Center for Biotechnology Information (NCBI) and the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) collaborated on the release of reference transcripts through matched annotation from NCBI and EMBL-EBI (MANE, https://www.ncbi.nlm.nih.gov/refseq/MANE/).25 One transcript for each protein-coding locus, deemed the MANE Select, is annotated as the “default” reference across genomic resources, including the University of California, Santa Cruz (UCSC) Browser (GRCh38; https://genome.ucsc.edu/), gnomAD v3.1.1, and ClinVar. While MANE Select is supported by experimental data, a single reference transcript does not necessarily capture the biological complexity of disease causation. A second initiative, MANE Plus Clinical, captures secondary transcripts that contain pathogenic variants not found in the MANE Select transcripts or with more impactful effects (exonic versus intronic).
Mapping regions critical to protein function
Interpreting splicing effects of each gene transcript requires knowledge of regions critical to protein function. Identifying protein functional domains utilizes homology-based predictive analyses (e.g., InterPro, Pfam), supportive functional data (e.g., deletion of the protein domain has a measurable impact on a functional assay), and/or structural data (e.g., crystal structure of a protein complex shows residues directly interacting with a critical partner). Prior knowledge of pathogenic missense variants may also highlight regions critical to protein function and clinical relevance. However, the overall quality and granularity of the protein mapping depends on the level of information available for each specific gene. We propose updates to the Abou Tayoun et al. PVS1 decision tree2 (see Table 1) including assigning very strong evidence of pathogenicity to in-frame RNA skipping events encompassing undisputed clinically relevant residues. Note: the final PVS1 weighting may be reduced depending on the structural features of the critical region and size/location of the in-frame alteration (see Box S1). Further, final PVS1 weighting may be strengthened when variant(s) at complementary splice donor and acceptor dinucleotides for the same exon with the same predictions (e.g., in-frame exon skipping) have been classified as pathogenic. (Note: the term classified is defined here as variant curation following ACMG/AMP guidelines and preferably following VCEP gene-specific recommendations if available.)
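As an illustration of the proposed very strong branch, the check below asks whether an in-frame exon-skipping event removes any residue from a list of critical residues. It is a minimal sketch under stated assumptions: exon boundaries are given as 1-based coding (c.) positions, the critical-residue set is supplied by gene experts, and the function names are hypothetical. It does not model the weight reductions described in Box S1.

```python
def codons_spanned(cds_start: int, cds_end: int) -> range:
    """Residue numbers overlapped by CDS positions cds_start..cds_end (1-based, inclusive)."""
    return range((cds_start - 1) // 3 + 1, (cds_end - 1) // 3 + 2)


def in_frame_skip_removes_critical(cds_start: int, cds_end: int, critical_residues) -> bool:
    """True if skipping c.cds_start..c.cds_end is in frame and spans a critical residue.

    Simplification: codons split by the exon boundaries are counted as removed,
    although in reality their remaining bases are fused into a new junction codon.
    """
    if cds_end < cds_start:
        raise ValueError("cds_end must be >= cds_start")
    if (cds_end - cds_start + 1) % 3 != 0:
        return False  # out-of-frame events are assessed on other decision-tree branches
    spanned = codons_spanned(cds_start, cds_end)
    return any(residue in spanned for residue in critical_residues)
```

For a hypothetical exon spanning c.301 to c.420 (120 nt), skipping is in frame and removes residues 101 to 140, so a critical residue at position 120 would be captured, whereas a 121-nt exon would be routed to the out-of-frame branches.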
Table 1.
Likelihood ratio analysis of the maximum SpliceAI Δ score for variants located outside of donor/acceptor ±1,2 dinucleotide positions using optimal cut offs
SpliceAI Δ score | Splice event = no, n | Splice event = no, proportion | Splice event = yes, n | Splice event = yes, proportion | LR | Low CI | High CI | Evidence strengthᵃ
---|---|---|---|---|---|---|---|---
≤0.1 | 1,962 | 0.87 | 72 | 0.15 | 0.17 | 0.14 | 0.21 | moderate evidence (non-spliceogenicity)
>0.1 and <0.2 | 175 | 0.08 | 38 | 0.08 | 1.00 | 0.71 | 1.39 | uninformative
≥0.2 | 109 | 0.05 | 381 | 0.78 | 15.99 | 13.23 | 19.32 | moderate evidence (spliceogenicity)

Abbreviations: CI, confidence interval; LR, likelihood ratio; n, number.
ᵃSee Table S6 for criteria thresholds.
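Applying Table 1 to a new variant amounts to binning its maximum SpliceAI Δ score with the two cutoffs (0.1 and 0.2). The sketch below is a hypothetical lookup that copies the reported LRs and strengths; it refuses splice donor/acceptor ±1,2 dinucleotide variants because the table was derived only for variants outside those positions.

```python
# (bin label, reported LR, reported evidence strength), copied from Table 1.
TABLE1_BINS = [
    ("<=0.1", 0.17, "moderate evidence (non-spliceogenicity)"),
    (">0.1 and <0.2", 1.00, "uninformative"),
    (">=0.2", 15.99, "moderate evidence (spliceogenicity)"),
]


def table1_bin(max_delta: float, at_dinucleotide: bool = False):
    """Return the Table 1 row for a maximum SpliceAI delta score (hypothetical helper)."""
    if at_dinucleotide:
        raise ValueError("Table 1 covers only variants outside the ±1,2 dinucleotides")
    if not 0.0 <= max_delta <= 1.0:
        raise ValueError("SpliceAI delta scores lie in [0, 1]")
    if max_delta <= 0.1:
        return TABLE1_BINS[0]
    if max_delta < 0.2:
        return TABLE1_BINS[1]
    return TABLE1_BINS[2]
```

Note that the cutoffs are asymmetric: a score of exactly 0.1 falls in the lowest bin, and a score of exactly 0.2 falls in the highest.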
Building reference gene-specific splicing catalogs
Annotating comprehensive catalogs of naturally occurring splicing events across different tissue types is critical for building reference datasets to guide the application of gene-specific ACMG/AMP codes, and such catalogs have been (or are being) used by several VCEPs.26,27,28,29,30,31,32,33,34 The profile can be derived from publicly available curated databases (e.g., GENCODE Basic), in-house analysis of RNA sequencing (RNA-seq) repositories, and/or a dedicated experimental approach. The quantity and quality of data might vary between genes, thus affecting the body of knowledge available for gene-specific adaptations of the PVS1 decision tree. Most technologies (conventional RT-PCR approaches and targeted and whole-transcriptome RNA-seq) analyze partial transcript sequences, providing information on alternative splicing events rather than on the complete exon structure of alternative mRNA isoforms. Long-read sequencing technologies may also be utilized to resolve complex exon structures of full-length transcripts without the need to bioinformatically reconstruct sequences.35 This approach has previously shown that some alternative transcripts contain multiple alternative splicing events,36,37 which has potential implications for determining the coding frame, interpreting the clinical significance of spliceogenic variants, and identifying naturally occurring candidate rescue transcripts (see next section). Comprehensive long-read RNA-seq data are currently lacking for many disease susceptibility genes. These alternative splicing data also provide a useful diagnostic resource for the design and interpretation of in vitro splicing assays.
Identifying naturally occurring candidate rescue transcripts
Some splice donor/acceptor ±1,2 dinucleotide variants affect non-constitutive exons in alternatively spliced transcripts. Splicing of naturally occurring transcript(s) that exclude that exon will not be affected, and such transcripts, if predicted to encode a functional protein, are considered candidate rescue transcript(s). The same rescue model applies to the classification of stop-gain variants (nonsense and frameshift variants) in that exon and to deletions of that exon, because the relevant variants would be absent from the rescue transcripts. (See Box S1 and Figures S1–S3 for further description of the rescue transcript model and the identification of transcripts with potential to contribute to a rescue mechanism.)
The initial step in identifying candidate rescue transcripts is to review alternative transcripts against a gene-specific map of critical protein domains; transcripts that preserve the reading frame and encode the critical protein domains are candidate rescue transcripts. We recommend annotating physiological alternative splicing events as candidate rescue transcripts even if tissue-specific expression is uncertain or unknown (e.g., due to lack of data). The appropriate baseline expression threshold for designating a candidate rescue transcript will likely be gene specific and should ideally be based on experimental data. We anticipate that such data will be scarce for many clinically relevant genes and propose 10% of overall gene expression as a conservative operational threshold (Figure S3), with the expectation that gene-disease experts may provide empirically derived modifications to this threshold, including consideration of tissue-specific effects. We propose use of PVS1_N/A for variants for which there is a plausible rescue model based on observation of naturally occurring alternatively spliced transcripts.
Recommendations to improve gene-specific PVS1 decision trees
The main role of bioinformatic predictions for splice donor/acceptor ±1,2 dinucleotide variants is to determine whether the expected predominant splicing alteration will be in frame or out of frame. We therefore recommend using a tool that can predict not only spliceogenicity but also the nature of the transcript produced by the splicing alteration (i.e., exon skipping, use of a cryptic/de novo splice motif, and/or intron retention). To supplement the recommendations in the original PVS1 decision tree publication, we recommend that the following generic issues be considered.
- Bioinformatic score thresholds indicating abrogation of a GT splice donor dinucleotide are likely to differ from those for other splice donor dinucleotides (e.g., GC).
- Some alterations at splice donor ±1,2 dinucleotides are not predicted to alter splicing and should not be assigned a PVS1 evidence strength (PVS1_N/A). For example, a subset of IVS+2T>C variants at a native GT splice donor will result in a functional GC splice donor, and the bioinformatic score for a native GC splice donor may be improved by an IVS+2C>T variant.38
- The range of evidence strengths applicable to predicted in-frame alterations could be increased if justified by functional and/or clinical evidence, for example by substituting the generic 10% protein size threshold with protein-specific thresholds based on protein structure considerations or knowledge of a pathogenic missense variant located within a functional domain, if available from the literature. This update will affect not only splice donor/acceptor ±1,2 dinucleotide variants predicted to cause in-frame alterations (as noted above, this may include complementary acceptor and donor sites for the same exon) but also other in-frame alterations such as exon deletions. A gene-specific example is the PVS1 decision tree for ATM developed by the Hereditary Breast, Ovarian, and Pancreatic Cancer VCEP (https://clinicalgenome.org/site/assets/files/7451/clingen_hbop_acmg_specifications_atm_v1_1.pdf).
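The frame and protein-size checks that these considerations modify can be sketched as below. This is a simplified illustration under stated assumptions: `exon_skipping_effect` is a hypothetical helper, exon lengths are coding lengths in nucleotides, and stop codons that may be created at the new exon junction are ignored. The `max_fraction` parameter holds the generic 10% threshold, which a VCEP might replace with a protein-specific value.

```python
def exon_skipping_effect(coding_exon_lengths, skipped, protein_length_aa,
                         max_fraction=0.10):
    """Summarize the effect of skipping the exons at indices `skipped`.

    coding_exon_lengths: coding nucleotides per exon, in transcript order.
    skipped: 0-based indices of the skipped exons.
    protein_length_aa: length of the reference protein in amino acids.
    """
    removed_nt = sum(coding_exon_lengths[i] for i in skipped)
    if removed_nt % 3 != 0:
        # Reading frame disrupted; NMD prediction is handled elsewhere in the tree.
        return {"in_frame": False, "fraction_removed": None,
                "exceeds_threshold": None}
    fraction = (removed_nt // 3) / protein_length_aa
    return {"in_frame": True, "fraction_removed": fraction,
            "exceeds_threshold": fraction > max_fraction}


# Hypothetical gene: 1,203 coding nucleotides (400 codons plus a stop codon).
lengths = [150, 90, 120, 121, 722]
print(exon_skipping_effect(lengths, [1], 400))     # 30 aa removed (7.5%)
print(exon_skipping_effect(lengths, [1, 2], 400))  # multi-exon skip: 70 aa (17.5%)
print(exon_skipping_effect(lengths, [3], 400))     # 121 nt removed: out of frame
```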
Given these considerations, our recommended modifications to the original decision tree published by Abou Tayoun et al.2 are summarized in Table 1 in conjunction with examples of PVS1 strength modifications for splice donor/acceptor ±1,2 dinucleotide variant effects within different gene-specific contexts (Figure 2).
Figure 2.
Schematic demonstrating assignment of gene-specific codes to splice donor/acceptor ±1,2 dinucleotide variants based on a modified version of the original ClinGen SVI PVS1 framework
Original framework refers to recommendations as published.2 It is important to note that each assigned PVS1 weight may be reduced if there is evidence of potential rescue mechanisms. For example, skipping of either exon 4 or 7 may lead to a protein that retains partial function. Annotating gene-specific lists of naturally occurring splicing events can provide greater evidence of potential “rescue” isoforms. Also see Box S1.
Re-purposing of the PVS1 and BP7 codes to capture observed in vitro splicing data irrespective of variant location
To date, application of the PVS1 decision tree for splicing has focused on splice donor/acceptor ±1,2 dinucleotide variants. We raise 2 points for consideration. The first is that variation at these dinucleotides has such a high probability of affecting splicing that confirmation of the variant effect by RNA assay data is generally unlikely to alter the starting PVS1 weight. The second is that if a variant located outside of the splice donor/acceptor ±1,2 positions is proven to result in an aberrant splicing profile that is interpretable via the PVS1 decision process, it can be assumed to have clinical impact equivalent to that of a splice donor/acceptor ±1,2 dinucleotide variant assigned a PVS1 code (either predictive or based on observed experimental data) using the same PVS1 decision process. We note that these considerations assume curation of variants in the context of Mendelian disease, the baseline assumption of the ACMG/AMP guidelines as originally published. For further guidance on interpretation of splicing profiles, see the section below (“adaptive weighting of evidence based on splicing assay type, design, and complexity of transcript profile”).
High correlation of splice donor/acceptor PVS1 weights with clinical findings and use of the PVS1 code to capture splice assay results
Experimental confirmation of a splicing event does not necessarily increase confidence in the clinical significance of that event. For example, a variant at a splice donor/acceptor ±1,2 dinucleotide position that is predicted to lead to an in-frame splicing event could be downweighted (PVS1_Strong or PVS1_Moderate), depending on knowledge of functional relevance and/or the extent of protein lost (Figure 2). Confirmation of such an in-frame splicing event would not change confidence in its clinical significance, and applying PS3 for splicing assay results in addition to the relevant downweighted PVS1 code would overweight the evidence toward pathogenicity. For splice donor/acceptor ±1,2 dinucleotide variants, splicing assay data are most valuable for resolving impact where predictions are less confident (e.g., IVS+2T>C variants39) or suggest multiple possible transcripts, and for establishing impact and weighting where multi-exon skipping (beyond the scope of predictions) is suspected. Where the predicted impact of a variant on the transcript profile differs from the transcript(s) identified using RNA assays, particularly in terms of designated weight, this information should be used to upgrade or downgrade the PVS1 code weight for that splice donor/acceptor ±1,2 dinucleotide variant in the context of a gene-specific decision tree.
Indeed, a review of clinical laboratory data on splice donor/acceptor ±1,2 dinucleotide variants identified in 1,043 genes currently associated with disease due to LoF (listed in Table S2) supports the a priori assumption that, while the vast majority of such variants are disease causing, additional curation is needed to refine PVS1 weighting. Of 3,400 splice donor/acceptor dinucleotide variants in these 1,043 genes, 3,031 (89%) were internally classified as (likely) pathogenic, 347 (10%) as variants of uncertain significance (VUSs), and 22 (0.6%) as (likely) benign. Bioinformatically predicted impact was consistent with the location of (likely) pathogenic variants at either acceptor or donor motifs; i.e., acceptor variants were enriched for high acceptor loss scores and donor variants for high donor loss scores (Figure S4). As detailed in Table S8, classification of the (likely) benign variants was largely based on population and/or benign clinical evidence. More detailed examination of these (likely) benign variants revealed that five variants reported commonly in gnomAD were likely sequencing artifacts. Assessment of predicted splicing effects indicated that another 16 variants were unlikely to lead to functional impact owing to location in the 5′ UTR, location in the last coding exon, predicted minimal effect on encoded protein length or readthrough, a predicted in-frame event, or no predicted effect on splicing. The remaining variant (c.594−2A>C [GenBank: NM_007294.4] [BRCA1]) is known to be benign due to a rescue transcript mechanism (see Box S1). That is, after excluding sequencing artifacts and considering rescue transcripts, these (likely) benign variants would not be assigned PVS1 under thorough application of a PVS1 decision tree (Figure 2).
Moreover, 1,670 of the 3,400 splice donor/acceptor dinucleotide variants were reported in publications; 404 of these had functional information available, of which 392 (97%) were classified as (likely) pathogenic. The remaining publications comprised cohort analyses, case studies, and review papers. Among all variants with publication data, 1,576/1,670 (94%) were (likely) pathogenic, indicating that the vast majority of splice donor/acceptor dinucleotide variants that have been investigated are damaging. Closer review of the published variants revealed substantial evidence in favor of pathogenicity, including in silico evidence (assessment of predicted null effect, presence at the same position as another pathogenic variant, presence in clinical databases), laboratory data (functional assays), and/or clinical information (de novo status in affected individuals, in trans with a pathogenic variant, identification in an individual with phenotypic fit, segregation with disease in families). To determine how often classification depended on the application of in silico criteria alone, we evaluated how many (likely) pathogenic variants had clinical evidence (PS2, PM3, PM6, PP1, PP4) or laboratory evidence (PS3) versus only in silico evidence. For this comparison, PM2 was not considered clinical evidence. PVS1 or PVS1_Strong was applied to 2,894 (likely) pathogenic variants. At least 1 (and up to 7) clinical or laboratory criteria were applied to 1,781/2,894 (62%) of these variants in addition to PVS1 or PVS1_Strong. That is, only 38% relied on absence from controls, an indirect form of clinical data, to reach a likely pathogenic classification. These data further support the proposition that predicted disruption of the canonical splice donor/acceptor ±1,2 dinucleotide that meets at least a strong PVS1 code assignment leads to LoF.
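The proportions quoted in the two preceding paragraphs follow directly from the reported counts. The short check below recomputes them; the counts are those reported in this section, while the dictionary labels and the `percent` helper are ours.

```python
# Counts as reported above: (numerator, denominator).
reported = {
    "(likely) pathogenic, all variants": (3031, 3400),
    "VUS, all variants": (347, 3400),
    "(likely) benign, all variants": (22, 3400),
    "(likely) pathogenic, published with functional data": (392, 404),
    "(likely) pathogenic, all published": (1576, 1670),
    "clinical/laboratory criteria, PVS1 or PVS1_Strong": (1781, 2894),
}


def percent(numerator, denominator, digits=0):
    """Percentage rounded to `digits` decimal places."""
    return round(100 * numerator / denominator, digits)


# The three classification groups account for every variant reviewed.
assert sum(n for n, _ in list(reported.values())[:3]) == 3400

for label, (n, d) in reported.items():
    print(f"{label}: {n}/{d} = {percent(n, d, 1)}%")
```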
To distinguish when PVS1 is applied to splice donor/acceptor ±1,2 dinucleotide variants on the basis of splicing assay data rather than predictions alone, we recommend adding an annotation to the PVS1 code, such as PVS1_Strength (RNA). If the “RNA” annotation of PVS1 is not available or possible in relevant curation systems, curators should note that PVS1 was applied because of splicing assay data, both in the explanation for criteria application and in the overall variant evidence summary. Accordingly, we recommend that the PS3 (or BS3) code be applied only for well-established assays assessing functional impact that is not directly captured by RNA-splicing assays (e.g., in vitro assays that by design measure only effects on protein function, or cellular assays that capture impact on protein function as well as on mRNA stability or processing).
Using PVS1 decision trees to apply PVS1 for confirmed spliceogenic variants outside of splice donor/acceptor ±1,2 dinucleotides
The second point for consideration relates to variants located outside of the splice donor/acceptor ±1,2 dinucleotides. If in vitro analysis of a variant under investigation shows an aberrant splicing profile that is interpretable via the PVS1 decision process, the variant can be assumed to have clinical impact equivalent to that of a splice donor/acceptor dinucleotide variant assigned a PVS1 code (predictive or based on experimental data) using the same PVS1 decision tree. This allows a variant originally assigned only a supporting bioinformatic code (PP3, or perhaps even BP4 for poorly predicted splicing events such as exonic splicing enhancer alterations) to be assigned a PVS1 code based on the LoF effect of the experimentally observed splicing event. That is, PP3 (or BP4) would be replaced with an appropriately weighted PVS1 code designation based on experimental data. As a result, variants observed to lead to exactly the same splicing aberration (type and level) will receive the same experimentally derived PVS1 weight irrespective of their location relative to the canonical dinucleotide positions.
Use of BP7 code to annotate absence of experimentally observed splicing impacts for silent substitution and intronic variants
As noted above, we recommend that PS3 (or BS3) be applied only for well-established assays assessing functional impact that is not directly captured by RNA-splicing assays. To indicate when a silent substitution is confirmed to have no impact on splicing in vitro, we recommend upweighting BP7 with an additional annotation, namely BP7_Strong (RNA). Further, consistent with the extended use of BP7 to capture the low prior probability of pathogenicity for intronic variants with no predicted impact on splicing (see section “application of the BP7 computational code”), we recommend that the same annotation may be used to capture in vitro evidence of no splicing impact for intronic variants irrespective of position and predicted impact on splicing. We anticipate that this will be relevant almost exclusively for intronic variants outside of the donor/acceptor splice regions, but it would also apply in the unlikely event that a splice donor/acceptor ±1,2 dinucleotide variant does not affect splicing, e.g., as has been reported for +2C>T changes.38 If annotating BP7 to reflect RNA assay data is not possible in relevant curation systems, curators should note that BP7_Strong was applied because of splicing assay data, both in the explanation for criteria application and in the overall variant evidence summary.
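The re-purposing rules in this and the preceding subsections can be summarized as a small lookup. The sketch below is a hypothetical illustration rather than a curation-system API: the function name, the outcome strings, and the strength labels are our assumptions, and the PVS1 strength itself must come from a gene-specific PVS1 decision tree applied to the observed transcript(s), adjusted for assay confidence as discussed in the next section.

```python
# Suffixes following the usual ACMG/AMP naming (no suffix = very strong PVS1).
PVS1_SUFFIX = {"VeryStrong": "", "Strong": "_Strong",
               "Moderate": "_Moderate", "Supporting": "_Supporting"}


def rna_splicing_code(variant_type, rna_outcome, pvs1_strength=None):
    """Return the RNA-based code that replaces splicing-based PP3/BP4/BP7, or None.

    variant_type: e.g. "synonymous", "intronic", "missense".
    rna_outcome: "aberrant" (splicing profile interpretable via the PVS1 tree)
                 or "no_impact" (no splicing impact observed).
    pvs1_strength: strength from the gene-specific PVS1 tree for the observed
                   transcript(s), or None if PVS1 is not met (e.g., rescue).
    """
    if rna_outcome == "aberrant":
        if pvs1_strength is None:
            return None  # e.g., PVS1_N/A because of a plausible rescue transcript
        return f"PVS1{PVS1_SUFFIX[pvs1_strength]} (RNA)"
    if rna_outcome == "no_impact":
        if variant_type in {"synonymous", "intronic"}:
            return "BP7_Strong (RNA)"
        return None  # BP7 (RNA) is described only for synonymous and intronic variants
    raise ValueError(f"unknown RNA outcome: {rna_outcome!r}")
```

The function never returns PS3 or BS3, reflecting the recommendation to reserve those codes for assays whose readout is not directly captured by RNA-splicing assays.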
Adaptive weighting of evidence based on splicing assay type, design, and complexity of transcript profile
Many factors can influence results from splicing assays. Examples of considerations for both the design of splicing assays and the interpretation of their results are listed in Table S9; these include RNA source, assay design, methodology and technology, and quantitative measurement of variant-impacted transcripts. The weight applied to PVS1 or BP7 codes based on experimental data should reflect confidence in the RNA findings in light of such factors. Where possible, gene-specific information should also be incorporated when determining the most appropriate weights.
Conservatively, PVS1 and BP7 codes based on experimental RNA data may be considered applicable at full weight only for results from assays conducted using non-tumor tissue samples from individuals (e.g., fresh blood, cultured lymphocytes, and lymphoblastoid cell lines, or relevant tissue types where gene expression is tissue specific). Because expression of spliced transcripts can be tissue specific,40 it is important that evidence from splicing assays used for variant interpretation considers potential relationships between the impact of spliceogenic variants and tissue-specific expression of any gene transcript.41 The Human Protein Atlas (https://www.proteinatlas.org/) provides a data resource of RNA expression from 40 non-diseased human tissue types and incorporates 3 large searchable datasets: (1) Human Protein Atlas data based on RNA-seq,42 (2) data from the Genotype-Tissue Expression (GTEx) project,43 and (3) data from the FANTOM consortium generated using cap analysis gene expression (CAGE) technology.44 Such datasets provide a basis for categorizing gene transcripts for splicing assays based on known expression levels and tissue distribution. Furthermore, they provide key information for considering how closely a splicing assay from a clinically accessible tissue (e.g., blood or fibroblasts) reflects biological activity in the relevant disease tissue (e.g., breast or brain), and thereby for assessing the validity of the splicing assay. To avoid misinterpreting the clinical relevance of spliceogenic variants in samples from variant carriers, naturally occurring alternative splicing events must be established using control samples from matching tissue.
While assays using relevant material from variant carriers are generally considered preferable for assessing variant effect on RNA splicing, minigene constructs (as a hemizygous system) are useful for providing allele-specific quantitative measurements of the impact of exonic and particularly intronic variants on splicing. More recently, massively parallel reporter assays (MPRAs) have been applied to screen for spliceogenic effects of thousands of variants in a single experiment.45 Such data could be a rich source of information, both as a truth dataset for assessing the sensitivity and specificity of computational splicing prediction tools and as a measure of splicing impact in the context of variant interpretation. Considerations for interpreting splicing results from these construct-based assays include (1) size constraints of constructs, which prevent analysis of very large exons and may exclude important regulatory sequences by limiting the extent of intronic sequence captured; (2) dependency on both the cell type used for assays and the promoter used in the reporter construct; and (3) the fact that measurement of observed changes in splicing (e.g., using percent spliced in [ψ] from RNA-seq data) requires careful consideration when defining data thresholds for “impact on splicing” in different tissues. In light of these, a conservative approach would be to apply information from construct data alone at lower weight in the absence of calibration of an experimental system against clinical data for proven spliceogenic and non-spliceogenic variants.
Classification of incomplete (“leaky”) splicing variants, which reduce but do not abolish normal splicing of the reference transcript, remains a challenge. The proportion of alternatively spliced gene transcript(s) arising from a variant allele may have a significant impact on the severity of an individual’s clinical phenotype (variable expressivity) and/or be associated with reduced penetrance of the expected clinical phenotypes. It is therefore important to establish what proportion of alternatively spliced gene transcript arising from a variant allele will or will not confer pathogenicity for a given gene-disease relationship. For example, BRCA1 variants resulting in 20%–30% expression of a functional transcript should not be considered pathogenic for hereditary breast-ovarian cancer (MIM: 604370).46 By comparison, partial penetrance of autosomal recessive cystic fibrosis (MIM: 219700) has been correlated with the level of full-length CFTR transcripts in respiratory epithelial cells from individuals with spliceogenic variants.47 Observations in individuals harboring pathogenic CFTR alleles who have normal lung function and minimal or no lung disease indicate that >25% of full-length CFTR transcript is sufficient to maintain function.
Defining thresholds for incomplete splicing is complex and requires expert knowledge of the disease susceptibility gene. Quantitative assays, such as RNA-seq, are needed to ascertain the level of incomplete splicing in variant carriers with or without disease. The extent of aberration induced by leaky variants may be evaluated by directly detecting allele-specific expression for variants that are present in the transcript (exonic variants, or intronic variants that activate a cryptic splice motif within the intron), or otherwise by measuring a common exonic variant in cis with the variant under assessment. Another metric commonly used with RNA-seq is the relative expression of splicing events measured by percent splicing index (PSI).48 The PSI value is defined as the number of reads supporting the alternative splicing event (i.e., the non-canonical/abnormal event) divided by the number of all reads in the region covering the splicing event. As noted above, it is important to consider factors such as the gene- and tissue-specific nature of the splicing result, the experimental approach (e.g., use of nonsense-mediated decay inhibitors, potential for PCR bias), and the technology used to assay splicing. It is therefore important to calibrate thresholds for specific RNA sources and assay protocols using known pathogenic and benign variants, both to interpret variants as spliceogenic or non-spliceogenic and to assess their value for providing quantitative information. Further, combined evidence across studies of different designs can provide complementary information and more confidence in result interpretation, e.g., assay data from a variant carrier without allele-specific quantification, accompanied by a minigene result providing evidence of complete impact leading to the same splicing aberration.
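The PSI definition above, together with a comparison against matched-tissue controls and an allele-specific estimate from a heterozygous marker in cis, can be sketched as follows. The function names are hypothetical; the control comparison (carrier PSI minus mean control PSI) is one common convention rather than a prescribed method; and the allele-specific estimate assumes that both alleles are otherwise expressed equally and ignores nonsense-mediated decay. Calibrated, gene-specific thresholds would still be needed to interpret the resulting numbers.

```python
from statistics import mean


def psi(alternative_reads, total_reads):
    """Reads supporting the alternative (non-canonical) event / all reads covering it."""
    if total_reads <= 0 or not 0 <= alternative_reads <= total_reads:
        raise ValueError("need total_reads > 0 and 0 <= alternative_reads <= total_reads")
    return alternative_reads / total_reads


def delta_psi(carrier_alt, carrier_total, controls):
    """Carrier PSI minus mean PSI of matched-tissue controls [(alt, total), ...]."""
    return psi(carrier_alt, carrier_total) - mean([psi(a, t) for a, t in controls])


def cis_allele_full_length_fraction(cis_reads, trans_reads):
    """Full-length reads carrying the marker allele in cis with the variant,
    relative to reads carrying the trans allele (assumes equal allelic expression)."""
    if trans_reads <= 0:
        raise ValueError("trans_reads must be positive")
    return cis_reads / trans_reads


print(psi(30, 120))                              # 0.25
print(delta_psi(30, 120, [(2, 100), (4, 100)]))  # about 0.22
print(cis_allele_full_length_fraction(20, 80))   # 0.25: variant allele keeps ~25% of normal output
```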
In addition to the considerations highlighted above, another important aspect of result interpretation and weighting is how to assign PVS1/BP7 evidence strength to experimental data showing complex readouts (2 or more alternative transcripts from the same allele). A logical approach is to assign a PVS1 evidence strength to each individual transcript, pool transcripts with the same evidence strength, and then apply an appropriately conservative “overall” evidence strength that considers the relative contribution of the different transcripts to overall expression (including the full-length transcript).
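One way to make this pooling explicit is sketched below. It is a hypothetical illustration, not a published rule: each readout entry is (transcript, fraction of allele output, PVS1 strength, or None for full-length or otherwise non-LoF transcripts), and the conservative overall strength is taken as the weakest strength among the contributing LoF groups, withheld entirely if non-LoF transcripts exceed a gene-specific tolerance (`max_non_lof_fraction`, an assumed parameter).

```python
from collections import defaultdict

# Weakest to strongest PVS1 strength labels (hypothetical naming).
STRENGTH_ORDER = ["Supporting", "Moderate", "Strong", "VeryStrong"]


def pool_by_strength(readout):
    """Sum allele-output fractions of transcripts sharing the same strength."""
    pooled = defaultdict(float)
    for _name, fraction, strength in readout:
        pooled[strength] += fraction
    return dict(pooled)


def overall_strength(readout, max_non_lof_fraction=0.0):
    """Conservative overall strength, or None if PVS1 should not be applied."""
    pooled = pool_by_strength(readout)
    if pooled.get(None, 0.0) > max_non_lof_fraction:
        return None  # too much full-length/non-LoF transcript remains
    lof_groups = [s for s in pooled if s is not None and pooled[s] > 0]
    if not lof_groups:
        return None
    return min(lof_groups, key=STRENGTH_ORDER.index)  # weakest contributing group


# Hypothetical readout: out-of-frame skip (60%) plus in-frame skip (40%).
readout = [("delta_exon4", 0.6, "VeryStrong"), ("delta_exon7", 0.4, "Moderate")]
print(pool_by_strength(readout))
print(overall_strength(readout))  # Moderate
```

Taking the weakest contributing group is only one conservative choice; weighting groups by their fractions would be an alternative that gene-disease experts could calibrate.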
Application of PP3 and BP4 predictive codes for impact at the RNA-splicing level
Computational splicing prediction tools provide key information for variant spliceogenicity (impact on splicing profile) and related pathogenicity (association with disease risk). To date, numerous splicing prediction tools have been used to inform application of computational codes PP3 and BP4 (and the dependent BP7 code). For examples, see Table S4. However, there has been no standardized approach for deriving and applying prediction score thresholds for each tool. Here, we show several examples of approaches to compare sensitivity and specificity of different prediction tools and how to apply computational codes based on tool outputs. Consistent with ClinGen recommendations for tool use by VCEPs, we only assessed tools that were publicly available.
Comparison of splicing prediction tools
Eleven splicing prediction tools covering a wide variety of algorithmic approaches (Table S4) were compared with functional data from a saturation genome editing assay of 13 BRCA1 exons13 that measured cell survival to infer whether splicing variants are benign or pathogenic. Importantly, most variants in this dataset are not present in repositories such as ClinVar or the Human Gene Mutation Database (http://www.hgmd.cf.ac.uk/ac/index.php), which several tools use to train their models. Receiver operating characteristic (ROC) analysis of each splicing tool demonstrated that SpliceAI outperformed all other methods as assessed by the area under the curve (AUC = 0.959) for variants located at conserved positions in the donor and acceptor regions (Figure S5). These results are consistent with those reported in other publications.6,10,49
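The AUC can be computed without plotting the curve, as the probability that a randomly chosen positive variant (here, one inferred pathogenic from cell survival) scores higher than a randomly chosen negative one, counting ties as one half. The sketch below is a plain illustration of that definition with toy data; it is not the analysis code used for Figure S5.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    scores: prediction scores (e.g., maximum SpliceAI delta scores).
    labels: 1 for positive (e.g., inferred pathogenic), 0 for negative.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: one positive (0.2) scores below one negative (0.5).
print(roc_auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic, which is adequate for a few hundred variants; rank-based implementations such as scikit-learn's `roc_auc_score` scale better.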
The strength of evidence associated with SpliceAI prediction score categories was also assessed using the same cell survival dataset.13 As shown in Table S10, a SpliceAI score ≥ 0.5 (the cutoff recommended in the original SpliceAI paper12) yielded strong evidence for variant impact on cell survival (i.e., inferred pathogenic; lower 95% confidence interval [CI] LR = 18.58), while a SpliceAI score < 0.1 provided moderate evidence for no variant impact on cell survival (i.e., inferred benign; lower 95% CI LR = 0.03). Only a supporting level of evidence is required for application of PP3, but these data suggest that many spliceogenic variants would not receive a predictive code if PP3 were based on a SpliceAI score ≥ 0.5. Notably, this assessment used a single preselected SpliceAI cut point and inferred pathogenicity based on analysis of multiplexed assay of variant effect (MAVE) data and a relatively low number of variants (414 total) from selected functional domains of a single gene.
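The LRs and confidence bounds quoted here compare the proportions of positive and negative variants falling in a score category. A minimal sketch follows; the function name is hypothetical, and the 95% CI uses the standard log method for a ratio of two proportions, which we assume here rather than take from Table S5. The toy counts are invented for illustration.

```python
import math


def likelihood_ratio(pos_in_cat, pos_total, neg_in_cat, neg_total, z=1.96):
    """LR = P(category | positive) / P(category | negative), with a log-method CI.

    Positives are, e.g., pathogenic or spliceogenic variants; negatives are
    benign or non-spliceogenic variants. Counts in the category must be > 0.
    """
    if min(pos_in_cat, neg_in_cat) <= 0:
        raise ValueError("log-method CI needs non-zero counts in the category")
    lr = (pos_in_cat / pos_total) / (neg_in_cat / neg_total)
    se_log = math.sqrt(1 / pos_in_cat - 1 / pos_total + 1 / neg_in_cat - 1 / neg_total)
    lower = math.exp(math.log(lr) - z * se_log)
    upper = math.exp(math.log(lr) + z * se_log)
    return lr, lower, upper


# Toy example: 80/100 positives and 5/100 negatives score in the category.
lr, lo, hi = likelihood_ratio(80, 100, 5, 100)
print(f"LR = {lr:.1f} (95% CI {lo:.2f}-{hi:.2f})")
```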
Model for establishing thresholds for splicing prediction tools
We demonstrate below a process for calibrating thresholds for any computational tool using large variant datasets with splicing assay results (Figure 3A). As an example, we investigated optimal threshold values for SpliceAI (a tool trained to predict splicing events, not pathogenicity) by comparing the Δ score output for 2,736 variants located outside splice donor/acceptor ±1,2 dinucleotides across 8 genes (BRCA1, BRCA2, MLH1, MSH2, MSH6, PMS2, NF1, and POU1F1) with associated in vitro splicing assay data curated from the literature. Output from our web tool (https://gwiggins.shinyapps.io/lr_shiny/) was used to compare the sensitivity and specificity of different SpliceAI cut points across 3 categories (see methods). This approach intentionally designated a middle category to represent a range of uninformative spliceogenicity scores for which bioinformatic codes should not be applied (i.e., neither PP3 nor BP4 is met). Based on this analysis, the optimal threshold for assigning PP3 to variants outside the donor/acceptor ±1,2 dinucleotides was determined to be ≥0.2, which equated to a moderate level of evidence for spliceogenicity (Table 1 and Figures 3B and 4). This threshold provided 78% sensitivity for true spliceogenic variants. Sensitivity increased to 91% for the subset of variants located within the donor/acceptor splice region (Figure 1; donor splice region: the last 3 bases of the exon and intronic nucleotides +3 to +6; acceptor splice region: the first base of the exon and intronic nucleotides −3 to −20 upstream of the exon boundary) (Table S11). The optimal threshold for assigning BP4 was determined to be ≤0.1 for a moderate level of evidence for non-spliceogenicity, with 87% specificity for all variants outside the donor/acceptor ±1,2 dinucleotides (Table 1) and 73% specificity for variants within the donor/acceptor splice region (Table S11).
LRs for narrower bands of scores around the selected cut points were consistent with moderate evidence against spliceogenicity (e.g., SpliceAI score range 0.08–0.1, LR = 0.122) and supporting evidence for spliceogenicity (e.g., score range 0.20–0.22, LR = 2.52).
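The three-category scheme and the way sensitivity and specificity are read off it can be sketched as follows. The function names and toy data are hypothetical; the defaults (≥0.2 for PP3, ≤0.1 for BP4) are the calibrated SpliceAI cut points reported above. As in the text, sensitivity is the share of spliceogenic variants at or above the upper threshold and specificity is the share of non-spliceogenic variants at or below the lower threshold, so variants in the middle band count against both.

```python
def splice_category(score, low=0.1, high=0.2):
    """Assign a maximum delta score to one of three categories."""
    if score >= high:
        return "PP3"            # predicted spliceogenic
    if score <= low:
        return "BP4"            # predicted non-spliceogenic
    return "uninformative"      # neither PP3 nor BP4 is met


def sensitivity_specificity(scores, spliceogenic, low=0.1, high=0.2):
    """Sensitivity and specificity of the three-category scheme.

    spliceogenic: True/False per variant, from splicing assay data.
    """
    pos = [s for s, y in zip(scores, spliceogenic) if y]
    neg = [s for s, y in zip(scores, spliceogenic) if not y]
    sensitivity = sum(s >= high for s in pos) / len(pos)
    specificity = sum(s <= low for s in neg) / len(neg)
    return sensitivity, specificity


# Toy data (not the study dataset).
scores = [0.05, 0.15, 0.30, 0.90, 0.02, 0.25]
labels = [False, False, True, True, True, False]
print([splice_category(s) for s in scores])
print(sensitivity_specificity(scores, labels))
```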
Figure 3.
Model for optimizing thresholds for prediction algorithms of alternative splicing
(A) Schematic demonstrating how collation of 3 variant datasets (in vitro splicing data, splicing prediction scores, and clinical classification data) enables calibration of splicing prediction algorithms for pathogenicity. While clinically classified variant data are preferable, splicing assay data can be used as an imperfect surrogate for pathogenicity. More extensive annotation of alternative splicing events and level of aberration will lead to improved correlation of splicing events with variant pathogenicity. The distribution of hypothetical computationally predicted splice scores is illustrated, showing substantial overlap of non-spliceogenic/spliceogenic datasets (left side) and benign/pathogenic datasets (right side). The low, intermediate, and high prediction score ranges used to assign ACMG/AMP code weighting can be determined by calculating likelihood ratios for different score categories and obtaining consensus on the score thresholds to be applied.
(B) Process for calibrating splicing prediction score thresholds for computational tools. A worked example of a likelihood ratio calculation is shown in Table S5. Note: truth datasets were filtered to exclude splice donor/acceptor ±1,2 dinucleotide variants, which are captured by the PVS1 decision tree process.
Figure 4.
An exemplar PP3, BP4, and BP7 decision tree for maximum SpliceAI splicing prediction scores and calibrated cutoff scores
The analytical process is shown in Figure 3B and data shown in Table 1. BP7 should not be applied for donor/acceptor splice regions given their higher prior for harboring spliceogenic variants. This may be defined as the splice region (a conservative application already implemented by several VCEPs) or the minimal splice region. PP3 may still be applied for missense or insertion-deletion variants that show computational evidence of a deleterious effect of the change in protein sequence.
The SpliceAI calibration results were not materially different when large-scale construct-based datasets were excluded (data not shown). Although the results are based on a limited number of genes (reflecting the limited availability of highly curated information on variant-associated splicing impact), they are expected to be widely applicable because the mechanism of RNA splicing is not gene specific. Notably, the threshold of 0.2 for predicting spliceogenicity is consistent with the high-recall threshold defined by the developers of SpliceAI.12
The moderate strength of evidence for spliceogenicity and non-spliceogenicity associated with the upper (≥0.2) and lower (≤0.1) SpliceAI thresholds, respectively, did not change when assessing variants within the donor/acceptor splice region and was at least supporting for variants (exonic or intronic) outside the splice region (and donor/acceptor dinucleotides) (Table S11). Analysis restricted to variants within the minimal splice region (Table S12) showed similar positive prediction (92%) but more false-positive predictions of spliceogenicity (28%). Additional analysis separating exonic and intronic variants outside the minimal splice region (Table S12) showed good negative prediction (84%) but lower positive prediction for exonic variants (58%), presumably due to poorer prediction of variants affecting exonic splicing enhancers (ESEs). Both positive and negative predictions were good for intronic variants (>84%).
These observations are consistent with overall higher prior for spliceogenicity for variants located at highly conserved splice motif positions and justify the conservative approach that BP7 should not be applied for variants within a designated splice region (standard or minimal). Findings also justify that application of BP7 is warranted for both synonymous exonic and intronic variants outside designated splice regions and with no predicted splice impact (BP4 met).
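An illustrative reading of the Figure 4 logic is sketched below. It is a simplification under stated assumptions: the `Location` fields and function names are hypothetical; the splice region follows the definition given earlier in this section (donor: last 3 exonic bases and intronic +3 to +6; acceptor: first exonic base and intronic −3 to −20), and the minimal splice region is not modeled; conservation is not modeled; BP7 is returned together with BP4 (reading “BP4 met” as a prerequisite); and all predictive codes are returned at the default supporting weight. Protein-level PP3 for missense or insertion-deletion variants is a separate assessment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    """Hypothetical variant position relative to the nearest exon."""
    intron_offset: int = 0                       # HGVS-style: +k donor side, -k acceptor side, 0 if exonic
    bases_from_exon_start: Optional[int] = None  # 1 = first base of the exon (exonic only)
    bases_from_exon_end: Optional[int] = None    # 1 = last base of the exon (exonic only)


def in_splice_region(loc):
    """Donor: last 3 exonic bases or intronic +3..+6; acceptor: first exonic base or intronic -3..-20."""
    if loc.intron_offset > 0:
        return 3 <= loc.intron_offset <= 6
    if loc.intron_offset < 0:
        return 3 <= -loc.intron_offset <= 20
    return loc.bases_from_exon_end in (1, 2, 3) or loc.bases_from_exon_start == 1


def predictive_splice_codes(variant_type, loc, max_delta, low=0.1, high=0.2):
    """Return splicing-based predictive codes, or None for +/-1,2 dinucleotide variants."""
    if abs(loc.intron_offset) in (1, 2):
        return None  # handled by the (gene-specific) PVS1 decision tree
    if max_delta >= high:
        return ["PP3"]
    if max_delta > low:
        return []    # uninformative band: neither PP3 nor BP4
    codes = ["BP4"]
    if variant_type in {"synonymous", "intronic"} and not in_splice_region(loc):
        codes.append("BP7")
    return codes


print(predictive_splice_codes("synonymous", Location(bases_from_exon_start=40, bases_from_exon_end=25), 0.03))  # ['BP4', 'BP7']
print(predictive_splice_codes("synonymous", Location(bases_from_exon_start=64, bases_from_exon_end=1), 0.03))   # ['BP4'] (donor splice region)
```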
We recognize that not all spliceogenic variants will be pathogenic, owing to incomplete effect (level of aberration) and/or the type of resulting aberration(s) (in-frame or rescue transcripts); however, current tools are not yet sufficiently developed to accurately predict these relationships. Further, there are many reasons why a specific aberration type may not be captured by existing tools (e.g., large intron retention or multi-exon skipping). For these reasons, and given the results of the sensitivity analysis of LRs for narrow bands around the selected cut points, a conservative approach is to calibrate codes to reach a moderate level of evidence but to apply predictive PP3/BP4 codes only at a supporting weight.
We provide a model for evaluating and calibrating individual bioinformatic tools. We demonstrate that prediction using a single tool, trained in a data-driven way to detect splicing impact via multiple mechanisms, can be sufficient to provide at least a supporting level of evidence for application of BP4 or PP3. Future analysis with much larger datasets that allow more extensive stratification by variant location, and also considering more detailed examination of mRNA transcript impact on encoded protein,50 will provide more clarity on the applicability of higher weights, as has recently been recommended in the context of missense prediction.51 There will be a continuing need to train, test, and calibrate tools to improve prediction for variants resulting in pseudoexonization (influenced by a range of factors) and/or variants that impact atypical (non-GT-AG) splice donor/acceptor ±1,2 positions or splicing elements with relatively poor conservation and/or redundancy, such as exonic and intronic splicing enhancers/silencers and branch points.10,52,53,54 Where a single tool cannot provide adequate prediction across multiple mechanisms, there may be value in using complementary tools to detect splicing aberrations due to different mechanisms following a decision-tree process.55 Where possible, gene-specific knowledge (e.g., naturally occurring isoform patterns) should be considered in assessing and applying bioinformatic scores for predicted splicing as part of variant curation protocols.
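One way to evaluate an individual tool, and to run the narrow-band sensitivity analyses mentioned above, is to estimate a likelihood ratio for a score band as the proportion of spliceogenic variants scoring in the band divided by the proportion of non-spliceogenic variants scoring in the same band. This is a sketch using that standard definition; the function name, the closed band edges, and the toy data are assumptions, and it omits the confidence intervals a real calibration would need.

```python
from typing import Sequence


def band_lr(
    scores: Sequence[float],
    spliceogenic: Sequence[bool],
    low: float,
    high: float,
) -> float:
    """Likelihood ratio for scores in the closed band [low, high] (hypothetical helper).

    LR = P(score in band | spliceogenic) / P(score in band | not spliceogenic).
    Values > 1 favour spliceogenicity; values < 1 favour no splicing impact.
    """
    if len(scores) != len(spliceogenic):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, spliceogenic) if y]
    neg = [s for s, y in zip(scores, spliceogenic) if not y]
    if not pos or not neg:
        raise ValueError("need both spliceogenic and non-spliceogenic variants")
    p_pos = sum(low <= s <= high for s in pos) / len(pos)
    p_neg = sum(low <= s <= high for s in neg) / len(neg)
    if p_neg == 0:
        # No non-spliceogenic variants in the band: LR is unbounded or undefined.
        return float("inf") if p_pos > 0 else float("nan")
    return p_pos / p_neg
```

With a toy set of 5 spliceogenic and 10 non-spliceogenic variants in which 4 of 5 and 1 of 10, respectively, score between 0.2 and 1.0, the LR for that band is 0.8/0.1 = 8; if 1 of 5 and 9 of 10 score between 0 and 0.1, the LR for that band is about 0.22 (reciprocal 4.5).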
Application of the BP7 computational code
The description for BP7, as drawn from the original ACMG/AMP publication,1 is as follows: “A synonymous (silent) variant for which splicing prediction algorithms predict neither an impact to the splice consensus sequence nor the creation of a new splice site AND the nucleotide is not highly conserved.” Review of existing VCEP definitions for this code indicates variability in application, particularly in relation to consideration of conservation and relevance to intronic variation. We highlight the following issues to consider regarding use of this code to capture bioinformatic predictions.
First, Richards et al.1 recommend that BP7 is applicable after assignment of BP4 for no adverse splicing predictions in order to capture the low prior probability of pathogenicity of silent variants. We further recommend that BP7 should not be applied for synonymous substitutions in certain locations, specifically those located at the first base or the last 3 bases of the exon, with higher likelihood to impact splicing due to disruption of the acceptor or donor motif. Motif range may be altered as appropriate for non-GT-AG introns.
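The positional exclusion for synonymous variants can be sketched as a simple check. It assumes the variant's 1-based position within the exon and the exon length are already known from the curation transcript; the default motif widths (first exonic base for the acceptor, last 3 exonic bases for the donor) follow the GT-AG ranges described here and are exposed as parameters so they can be altered for non-GT-AG introns. The function name is hypothetical.

```python
def bp7_eligible_synonymous(
    pos_in_exon: int,
    exon_length: int,
    bp4_met: bool,
    acceptor_exonic_bases: int = 1,
    donor_exonic_bases: int = 3,
) -> bool:
    """Whether computational BP7 may be considered for a synonymous variant.

    pos_in_exon is 1-based from the first exonic base. Variants in the
    exonic part of the acceptor motif (default: first base) or donor motif
    (default: last 3 bases) are excluded; BP4 must also be met.
    """
    if not 1 <= pos_in_exon <= exon_length:
        raise ValueError("position must lie within the exon")
    in_acceptor_motif = pos_in_exon <= acceptor_exonic_bases
    in_donor_motif = pos_in_exon > exon_length - donor_exonic_bases
    return bp4_met and not (in_acceptor_motif or in_donor_motif)
```

For a 100-nt exon, positions 1 and 98 to 100 are excluded, while position 97 remains eligible provided BP4 is met.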
Second, we caution against inclusion of evolutionary conservation in assessment of silent or intronic variants for application of code BP7 without empirically derived justification for the choice of tool, evolutionary depth, and cut point used to define a nucleotide as “not highly conserved.” Published evidence from an analysis of 27,733 variants within or adjacent to 2,198 human exons shows that while mean conservation score is higher for splice-disrupting variants (as expected, given that nucleotide conservation is a feature of functional motifs), conservation has limited predictive power to detect non-splice-disrupting variants.45 Further, secondary analysis of our curated dataset indicates that, for silent and intronic variants for which BP4 might be applied on the basis of a SpliceAI score ≤ 0.1, applying an additional filter of “not highly conserved” would reduce the number of variants with BP7 applied, with no improvement in negative predictive value (Table S13). This likely reflects that splicing prediction tools implicitly capture conservation in the context of predicted variant impact via abrogation of function of splicing motifs, while position weight matrices also allow for some nucleotide sequence variation within the motif.
Third, we acknowledge and concur with the protocols of many diagnostic testing laboratories and VCEPs that the BP7 code may be applied for intronic variants, assuming that the mechanism of any possible effect will be on mRNA processing, as invoked for silent variants. Currently, 10 VCEPs (Brain Malformations; CDH1; DICER and miRNA-Processing Gene; ENIGMA BRCA1 and BRCA2; Glaucoma; Hereditary Breast, Ovarian, and Pancreatic Cancer; InSiGHT Hereditary Colorectal Cancer/Polyposis; Myeloid Malignancy; PTEN; and RASopathy) refer to the use of BP7 to include intronic variant classification (Table S1). Guidelines from 6 VCEPs (CDH1; DICER and miRNA-Processing Gene; ENIGMA; Hereditary Breast, Ovarian, and Pancreatic Cancer [PALB2]; InSiGHT Hereditary Colorectal Cancer/Polyposis [APC]; and PTEN) specifically note that BP7 is only applicable for intronic/non-coding variants at or beyond positions +7/−21 with no predicted effect on splicing, and that this code is applied in addition to BP4. The Hereditary Breast, Ovarian, and Pancreatic Cancer VCEP has designated BP7 application for variants at or beyond positions +7/−40 in ATM (MIM: 607585). The ENIGMA BRCA1 and BRCA2 VCEP specifications justify the application of BP7 for variants outside donor/acceptor splice regions based on maximum likelihood estimation analysis of breast cancer (MIM: 114480) case-control data,56 with results indicating that location of an intronic variant at or beyond positions +7/−21 provides moderate evidence against pathogenicity, even without applying a bioinformatic prediction filter.
Reassessment over time of the intronic nucleotide boundaries and of the strength of evidence for BP7 application (in addition to BP4) for intronic variants will be important as more information accrues from sequencing of clinical cohorts. Until then, we consider it appropriate to apply BP7 conservatively as a predictive computational code for intronic variants that are located outside the donor/acceptor splice regions designated here (at or beyond positions +7/−21) and that have also met BP4, consistent with application by several VCEPs to date (as noted above). However, we acknowledge, based on the analysis presented here (Table S12), that there is rationale to expand application of BP7 to intronic variants located outside the minimal splice region (Figure 1). (Note: this restriction with respect to intronic variant location is not relevant to the application of BP7_Strong to capture RNA results demonstrating no impact on splicing.)
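A sketch of the intronic location rule follows, assuming simple HGVS c. intronic notation such as `c.123+7A>G` or `c.124-21T>C`. The parser is deliberately minimal and returns `None` for exonic, UTR, or range notation it does not handle. The default limits correspond to the conservative +7/−21 boundaries; passing `acceptor_limit=-4` gives the less conservative minimal-region designation. Both function names are hypothetical.

```python
import re
from typing import Optional

# Minimal parser for intronic HGVS c. positions such as c.123+7A>G or
# c.124-21T>C; UTR (c.-15, c.*5), exonic, and range notations return None.
_INTRONIC = re.compile(r"^c\.\d+([+-])(\d+)")


def intronic_offset(hgvs_c: str) -> Optional[int]:
    """Signed distance from the nearest exon boundary, or None if not parsed."""
    match = _INTRONIC.match(hgvs_c)
    if match is None:
        return None
    sign, distance = match.groups()
    return int(distance) if sign == "+" else -int(distance)


def bp7_eligible_intronic(
    hgvs_c: str,
    bp4_met: bool,
    donor_limit: int = 7,
    acceptor_limit: int = -21,
) -> bool:
    """BP7 location check for an intronic variant (BP4 must also be met).

    Defaults are the conservative +7/-21 boundaries; acceptor_limit=-4
    corresponds to the less conservative minimal splice region.
    """
    offset = intronic_offset(hgvs_c)
    if offset is None:
        raise ValueError(f"not a simple intronic c. position: {hgvs_c}")
    outside_region = offset >= donor_limit or offset <= acceptor_limit
    return bp4_met and outside_region
```

For example, `c.124-10T>C` with BP4 met is not eligible under the conservative boundaries but would be under the minimal-region designation.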
Adaptation of the PS1 code to incorporate splicing predictions
The ACMG-AMP code PS1 is strong pathogenic evidence defined as “same amino acid change as a previously established pathogenic variant regardless of nucleotide change.” The underlying premise for the PS1 code is that the clinical evidence for a previously classified missense variant can be applied to infer pathogenicity for the same predicted molecular change at the protein level. Although both the pathogenic variant and variant under assessment (VUA) should have no predicted/confirmed effect on RNA splicing for PS1 to be applied for a missense variant, there is no specific requirement for experimental validation of the predicted molecular effect of the amino acid change encoded by the pathogenic variant or the VUA.
There is thus rationale to apply code PS1 in the context of exonic and intronic variants based on similarity of predicted RNA effects for a VUA in comparison with a known pathogenic variant. This follows the logic that the PS1 code captures existing evidence of pathogenicity for a variant with an identical mechanism of pathogenicity. Indeed, this concept has been used in specifications (or pilot specifications) for four different ClinGen VCEPs (Hearing Loss; ENIGMA BRCA1 and BRCA2; Hereditary Breast, Ovarian, and Pancreatic Cancer; and PTEN) and also by CanVIG (Table S1). However, the recommendations differ among 4 of the 5 expert groups in relation to the relative location of the 2 nucleotide changes being compared, applicability to variants at the splice donor/acceptor ±1,2 dinucleotide positions, applicability to exonic variants, code weight, and/or applicability when the previously classified variant has a likely pathogenic assertion versus a pathogenic assertion. Further, the current specifications for the DNA mismatch repair (MMR) gene VCEP use ACMG-AMP code PM5 (originally defined as “missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before”) in the context of predicted splicing for a variant under assessment, compared with both prediction and splice assay data for a comparison variant, namely “variants affecting the same splice site as a confirmed splice variant with similar or worse splicing in silico predictions.”
Here, we provide more guidance on the application of a PS1 code with justification for differences in weights considering the relative locations of the variant being assessed and the previously classified (likely) pathogenic variant (Table 2). The concept of PM5 for splicing prediction is not considered directly applicable, because the recommendation is to apply PS1 only when the predicted event of the known (likely) pathogenic variant precisely matches the predicted event of the VUA, e.g., where the effect is via impact on the same motif, such as 2 different variants within the same splice donor or acceptor dinucleotide that are both predicted with similar scores to disrupt function of that splice site, or 2 different variants in the same donor or acceptor splice region that are both predicted with similar scores to lead to use of the same cryptic splice motif.
Table 2.
PS1 code weights for variants with same predicted splicing event as a known (likely) pathogenic variant
Variant under assessment (VUA) | Baseline computational/predictive code applicable to VUA | Position of comparison variant relative to VUA | PS1 code applicable to VUA with P comparison variant | PS1 code applicable to VUA with LP comparison variant |
---|---|---|---|---|
Located outside splice donor/acceptor ±1,2 dinucleotide positions | PP3 | same nucleotide | PS1 | PS1_Moderate |
Located outside splice donor/acceptor ±1,2 dinucleotide positions | PP3 | within same splice donor/acceptor motif (including at ±1,2 positions) | PS1_Moderate | PS1_Supporting |
Located at splice donor/acceptor ±1,2 dinucleotide positions | PVS1 | within same splice donor/acceptor ±1,2 dinucleotide | PS1_Supporting | N/A |
Located at splice donor/acceptor ±1,2 dinucleotide positions | PVS1 | within same splice donor/acceptor region, but outside ±1,2 dinucleotide^a | PS1_Supporting | PS1_Supporting |
Located at splice donor/acceptor ±1,2 dinucleotide positions | PVS1_Strong, PVS1_Moderate, or PVS1_Supporting | within same splice donor/acceptor ±1,2 dinucleotide | PS1 | N/A |
Located at splice donor/acceptor ±1,2 dinucleotide positions | PVS1_Strong, PVS1_Moderate, or PVS1_Supporting | within same splice donor/acceptor motif, but outside ±1,2 dinucleotide^a | PS1_Moderate | PS1_Supporting |
Prerequisite for all: the predicted event of the VUA must precisely match the predicted event of the comparison (likely) pathogenic variant (e.g., both predicted to lead to exon skipping, or both to lead to enhanced use of a cryptic splice motif), AND the strength of the prediction for the VUA must be similar to or higher than the strength of the prediction for the comparison (likely) pathogenic variant. For an exonic variant, the predicted or proven functional effect of missense substitution(s) encoded by the VUA and the (likely) pathogenic variant should also be considered before application of this code. Dinucleotide positions refer to donor and acceptor dinucleotides in the reference transcript(s) used for curation. Designated donor and acceptor motif ranges should be based on position weight matrices for the intron category (see methods). For GT-AG introns these are defined as follows: donor motif, the last 3 bases of the exon and 6 nucleotides of intronic sequence adjacent to the exon; acceptor motif, the first base of the exon and 20 nucleotides upstream from the exon boundary. Consider other motif ranges for non-GT-AG introns.
^a If relevant, splicing assay data for a pathogenic variant outside a ±1,2 dinucleotide position may be used to update a PVS1 decision tree and hence the applicable PVS1 code for a ±1,2 dinucleotide variant.
Application of the code varies depending on whether the VUA and the comparison known (likely) pathogenic variant are located outside or inside a splice donor/acceptor ±1,2 dinucleotide (see Table 2) and VUA location relative to the (likely) pathogenic variant. PS1 is applied at full strength for a variant located outside the splice donor/acceptor ±1,2 dinucleotide with a predicted RNA event that matches (with similar or higher strength of prediction) that of another pathogenic variant at the same nucleotide. PS1 may be applied at moderate weight for a VUA in the event that the comparison variant is classified as likely pathogenic under the assumption that some clinical information would have been required to apply this classification for a donor/acceptor splice region variant. The predicted splicing event could be the result of any combination of loss of the native splice donor/acceptor motif or gain/strengthening of a splice donor/acceptor motif, but as noted above, the predicted event for the known (likely) pathogenic variant must match the predicted event of the VUA.
There is also allowance for comparison of a VUA to a (likely) pathogenic variant at different nucleotide positions outside the donor/acceptor dinucleotide positions. We recommend limiting the comparisons to variants located within the same splice donor or acceptor motif and reducing criterion weight by one strength level. These restrictions recognize the chance that, despite similarity in predicted impact, the variant position may possibly be associated with differences in type(s) and level of RNA aberration(s) produced. We suggest that the designation of donor and acceptor motif ranges be based on position weight matrices (see Table 2 for more information).
PS1 may also be applied for a variant located at the donor/acceptor splice dinucleotides and for which there is another (likely) pathogenic variant located within the same splice motif (including at the splice dinucleotide positions). However, in most instances, the proposed strength levels are reduced to prevent overweighting of the VUA compared with the original (likely) pathogenic comparison variant. Suggested weights consider whether the comparison variant lies within the same splice donor/acceptor ±1,2 dinucleotide or at other positions within the same motif, the PVS1 weight applicable to the VUA, and whether the comparison variant is classified as pathogenic or likely pathogenic.
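Table 2 can be encoded as a lookup, which may help keep PS1 weights consistent across curators. In the sketch below, the position labels (`same_nucleotide`, `same_motif`, and so on), the grouping of reduced PVS1 strengths, and the function name are hypothetical encodings of the table rows. The `events_match` flag stands in for the curator's judgment that the predicted events match with similar or higher prediction strength; a return value of `None` corresponds to N/A or to no PS1 code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding of Table 2: (VUA baseline group, comparison position)
# -> PS1 weight by comparison classification; None stands for "N/A".
PS1_TABLE: Dict[Tuple[str, str], Dict[str, Optional[str]]] = {
    # VUA outside the +/-1,2 dinucleotide (baseline PP3)
    ("PP3", "same_nucleotide"): {"P": "PS1", "LP": "PS1_Moderate"},
    ("PP3", "same_motif"): {"P": "PS1_Moderate", "LP": "PS1_Supporting"},
    # VUA at the +/-1,2 dinucleotide with full-strength PVS1
    ("PVS1", "same_dinucleotide"): {"P": "PS1_Supporting", "LP": None},
    ("PVS1", "same_region_outside_dinucleotide"): {
        "P": "PS1_Supporting", "LP": "PS1_Supporting"},
    # VUA at the +/-1,2 dinucleotide with PVS1_Strong/_Moderate/_Supporting
    ("PVS1_reduced", "same_dinucleotide"): {"P": "PS1", "LP": None},
    ("PVS1_reduced", "same_motif_outside_dinucleotide"): {
        "P": "PS1_Moderate", "LP": "PS1_Supporting"},
}
REDUCED_PVS1 = {"PVS1_Strong", "PVS1_Moderate", "PVS1_Supporting"}


def ps1_weight(
    vua_baseline: str,
    comparison_position: str,
    comparison_class: str,
    events_match: bool,
) -> Optional[str]:
    """PS1 weight for a VUA per the Table 2 encoding above, or None."""
    if comparison_class not in {"P", "LP"}:
        raise ValueError("comparison_class must be 'P' or 'LP'")
    if not events_match:  # prerequisite for all rows of Table 2
        return None
    group = "PVS1_reduced" if vua_baseline in REDUCED_PVS1 else vua_baseline
    row = PS1_TABLE.get((group, comparison_position))
    return None if row is None else row[comparison_class]
```

For example, a VUA at a ±1,2 dinucleotide with a baseline of PVS1_Moderate, compared with a pathogenic variant in the same dinucleotide, would receive full-strength PS1, whereas the same comparison for a VUA with full-strength PVS1 yields only PS1_Supporting.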
Combined application of ACMG/AMP codes that capture evidence relating to variant location, splicing predictions, splicing assay data, and variant type
As noted, multiple ACMG/AMP criteria may be suitable to record structured evidence relevant to a variant’s (predicted) impact on splicing. We propose re-purposing existing evidence codes (PVS1, BP7) to capture mRNA-splicing assay data separately from other protein or cell-based functional assay data (recorded under PS3/BS3), and removing bioinformatic prediction codes when splicing assay data are used to designate a variant as LoF following the PVS1 decision process.
Figure 5 provides a general scheme of code application and strength based on variant position, bioinformatic predictions, other (likely) pathogenic variants with similar predicted impact on splicing, and how to add, replace, or modify codes based on RNA/splicing assay data. We anticipate that the most challenging aspects of implementing these splicing interpretation recommendations will be (1) decisions on similarity in predicted effect when comparing predictions for a VUA against a known (likely) pathogenic variant, (2) assigning baseline PVS1 code weights, (3) consideration of factors that are relevant to determining and/or downweighting of PVS1 or BP7 codes based on splice assay results, including partial splicing impact or complex aberration profiles, (4) removal of PP3/BP4 prediction codes upon upgrading of a variant to PVS1 on the basis of splicing data (consistent with recommendations to not use PP3 together with PVS1), and (5) recording, but not applying a code for, splicing data for missense or in-frame insertion/deletion variants in the absence of the protein functional data required to fully account for all potential mechanism(s) of variant impact.
Figure 5.
Decision tree for application of bioinformatic codes and RNA-splicing assay results for variant interpretation
(A) Alternative prediction tools/thresholds may be appropriate for variants that impact sites other than GT-AG donor-acceptor motifs.
(B) LP variants at the canonical positions should only be used as evidence if additional supporting clinical evidence is present.
(C) Silent (excluding last 3 nucleotides of exon and first nucleotide of exon) and intronic variants at or beyond the +7 and −21 positions (conservative designation for donor/acceptor splice region) or otherwise at or beyond the +7 and −4 positions (less conservative designation for the minimal donor/acceptor splice region).
(D) If multiple impacts are observed from a splicing assay, use flowchart for the most conservative application of PVS1 based on experimental data.
(E) We recommend that these thresholds be refined and applied in a disease- and gene-specific manner, including advice from VCEPs. Categorization as complete or near complete needs to consider multiple factors, including assay/technique, RNA source, and validation of assay weights using established controls. Examples of laboratory-specific approaches and suggested operational thresholds have been reported previously.27,63,64,65,66 See Table S9 for additional considerations for interpretation of mRNA assay data.
Accounting for all potential disease mechanisms of other coding variant types (e.g., missense, in-frame insertions/deletions) can be exceptionally challenging during variant classification. We generally recommend that the effects on the protein with and without splicing impact be evaluated independently as separate variant classifications based on these different mechanisms, with the most deleterious classification taken as the final classification. For variants with experimental splicing assay data indicating no impact, we conservatively recommend retaining the most deleterious effect of the bioinformatic prediction (i.e., the most severe impact of splicing or other variant effect, such as a missense alteration). Splicing results should be recorded as explanatory text, but BP7_Strong (RNA) should not be applied in the final classification of such coding variants without evidence of lack of impact for the missense alteration at the protein level. In cases where experimental data indicate no functional impact for the other coding variant effects and experimental splicing assay data indicate no impact on splicing, BP7_Strong (RNA) may be applied to allow for optimal tracking of experimental data. Rarely, if relevant gene-specific considerations obviate the need for functional assays for other coding variant effects (e.g., variants outside known clinically important functional domains), BP7_Strong (RNA) may be applied in the absence of protein functional data.
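Some of the code adjustments described in this section can be sketched as a small update function over a set of code labels. It removes PP3/BP4 when splicing assay data support a PVS1 (RNA) code, and adds BP7_Strong (RNA) for a no-impact result only for synonymous or intronic variants, or when protein-level data show no impact for other coding variants. Replacing any bioinformatic PVS1 code with the RNA-based code is an assumption of this sketch, as are the function name and string labels; it does not implement the full Figure 5 decision tree.

```python
from typing import Optional, Set

PREDICTIVE_SPLICE_CODES = {"PP3", "BP4"}
NO_PROTEIN_CHANGE_TYPES = {"synonymous", "intronic"}


def apply_rna_result(
    codes: Set[str],
    rna_result: str,
    variant_type: str,
    pvs1_rna_strength: Optional[str] = None,
    protein_no_impact: bool = False,
) -> Set[str]:
    """Update code labels after an RNA-splicing assay (hypothetical helper).

    rna_result: "lof" (aberration categorized via a PVS1 decision tree,
    with pvs1_rna_strength such as "PVS1_Strong") or "no_impact".
    """
    updated = set(codes)
    if rna_result == "lof":
        if pvs1_rna_strength is None:
            raise ValueError("a PVS1 strength from the decision tree is required")
        updated -= PREDICTIVE_SPLICE_CODES  # PP3/BP4 are not combined with PVS1
        # Assumption: the RNA-based code replaces any bioinformatic PVS1 code.
        updated = {c for c in updated if not c.startswith("PVS1")}
        updated.add(f"{pvs1_rna_strength} (RNA)")
    elif rna_result == "no_impact":
        if variant_type in NO_PROTEIN_CHANGE_TYPES or protein_no_impact:
            updated.add("BP7_Strong (RNA)")
        # Otherwise the splicing result is recorded as text only; no code added.
    else:
        raise ValueError("rna_result must be 'lof' or 'no_impact'")
    return updated
```

For a missense variant with a no-impact splicing result and no protein functional data, the code set is returned unchanged, reflecting the recommendation to record the RNA result as explanatory text only.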
Conclusions
This manuscript provides recommendations regarding use of ACMG/AMP codes to better categorize splicing prediction and/or laboratory (splicing assay) evidence, including clarification on the application of existing splicing-related codes and re-purposing of other codes to capture splicing-related evidence (summarized in Table 3). We describe generic protocols useful for assigning strength levels to different evidence criteria to facilitate recalibration of code strengths as new information accrues. We also provide a generic decision tree to guide variant assessment with combinations of evidence codes relating to variant location, splicing predictions, splicing assay data, and variant type.
Table 3.
ACMG/AMP codes recommended for recording evidence relevant to variant position and predicted and experimentally observed impact on splicing
ACMG/AMP Code | Original definition1 | (Re)definition for RNA impact | Notes |
---|---|---|---|
PVS1 | PVS1. Null variant (nonsense, frameshift, “canonical ±1 or 2 splice sites,” initiation codon, single or multi-exon deletion) in a gene where LoF is a known mechanism of disease. | bioinformatic data only–variants impacting splice donor/acceptor ±1,2 dinucleotides in a gene with established LoF as a disease mechanism | |
PVS1_Strength (RNA) | – | splicing assay data–assays demonstrating that a variant leads to an aberrant splicing profile that can be categorized against a PVS1 decision tree | |
PS1 | PS1. Same amino acid change as a previously established pathogenic variant regardless of nucleotide change | same predicted splicing impact as a previously classified (likely) pathogenic variant | |
PS3 | PS3. Well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product | not applicable for splicing effects | |
PP3 | PP3. Multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact, etc.). Caveat: Because many in silico algorithms use the same or very similar input for their predictions, each algorithm should not be counted as an independent criterion. PP3 can be used only once in any evaluation of a variant. | Computational evidence from calibrated prediction tool(s) supports impact on splicing. | |
BS3 | BS3. Well-established in vitro or in vivo functional studies show no damaging effect on protein function or splicing. | not applicable for splicing effects | |
BP4 | BP4. Multiple lines of computational evidence suggest no impact on gene or gene product (conservation, evolutionary, splicing impact, etc.). Caveat: Because many in silico algorithms use the same or very similar input for their predictions, each algorithm cannot be counted as an independent criterion. BP4 can be used only once in any evaluation of a variant. | Computational evidence from calibrated prediction tool(s) supports no impact on splicing. | |
BP7 | BP7. A synonymous (silent) variant for which splicing prediction algorithms predict no impact to the “splice consensus sequence or the creation of a new splice site” AND the nucleotide is not highly conserved | synonymous (silent) variant or intronic variant with low prior probability of pathogenicity if no predicted impact on splicing | |
BP7_Strong (RNA) | – | Splicing assay data demonstrating that a variant is not associated with aberrantly spliced transcript(s) relative to transcript profiles in controls. | |
|
The framework presented herein is generally applicable to the majority of splice donor/acceptor motifs in mammalian genomes, with 98% harboring GT-AG dinucleotides and so-called consensus splice motifs.57 Variants affecting other splice donor-acceptor dinucleotide pairs used for splicing (e.g., GC-AG), or variation in genomic regions with low prior probability of altering splicing (e.g., branch points, or deep intronic variants leading to pseudoexonization), may also result in splicing aberrations. It is anticipated that these additional mechanisms underlying splicing aberrations will most likely be identified through detailed evaluation of unexplained clinical cases using experimental RNA assay data rather than genomic DNA analysis. As more data accrue, it will be important to assess and further refine the predictive capacity of bioinformatic tools. Such efforts will benefit from large-scale high-throughput studies that are not directed toward clinically detected variants, to mitigate potential bias toward alterations at more conserved sites.
We provide a web-based tool that can be used to recalibrate splice prediction tools using new datasets or subsets of datasets (https://gwiggins.shinyapps.io/lr_shiny/). To ensure that gene- and disease-specific requirements can be assessed, we recommend that ClinGen VCEPs and other expert groups continually refine these recommendations as their knowledge of clinically relevant transcripts and their variation in expression evolves through the use of new tools and molecular techniques.
Acknowledgments
A.B.S. and M.T.P. were supported by Australian NHMRC funding (APP177524). L.C.W. and G.A.R.W. were supported by Health Research Council of New Zealand funding (19/460 and 22/187) and the Mackenzie Charitable Foundation. We thank Hadley Northcott for his contribution to the development of the BRCA1/BRCA2 splicing dataset. M.d.l.H. is supported by a grant from the Spanish Ministry of Science and Innovation, Plan Nacional de I+D+I 2013-2016, ISCIII (PI20/00110) co-funded by FEDER from Regional Development European Funds (European Union). A.B.B. and S.M.H. were supported by NIH grant U24 HG006834. ClinGen is primarily funded by the National Human Genome Research Institute (NHGRI) with co-funding from the National Cancer Institute (NCI) through the following grants: U24 HG009649 (to Baylor/Stanford), U24 HG006834 (to Broad/Geisinger), and U24 HG009650 (to UNC/Kaiser). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funders had no role in the design of the study, the collection, analysis, or interpretation of the data, the writing of the manuscript, or the decision to submit the manuscript for publication.
Author contributions
L.C.W., A.L., L.M.V., T.P., R.K., S.H., and A.B.S. conceived the study design. The complete writing group consisted of L.C.W., M.d.l.H., A.L., L.M.V., A.B.B., T.P., R.K., S.H., and A.B.S. A.L. provided a dataset comprising individuals from a clinical cohort. Data analysis was performed by L.C.W., G.A.R.W., A.L., M.T.P., D.M.C., D.B.-B., A.C., A.T., and H.Z. S.H. chaired the ClinGen SVI Splicing Working Group and presented the data to the ClinGen SVI Working Group. All authors read and approved the final manuscript.
Declaration of interests
A.L., L.M.V., S.H., H.Z., R.K., D.B.-B., A.C., A.T., and T.P. are employed by fee-for-service laboratories performing clinical sequencing services.
Published: June 22, 2023
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2023.06.002.
Contributor Information
Steven M. Harrison, Email: sharriso@broadinstitute.org.
ClinGen Sequence Variant Interpretation Working Group:
Leslie G. Biesecker, Steven M. Harrison, Ahmad A. Tayoun, Jonathan S. Berg, Steven E. Brenner, Garry R. Cutting, Sian Ellard, Marc S. Greenblatt, Peter Kang, Izabela Karbassi, Rachel Karchin, Jessica Mester, Anne O’Donnell-Luria, Tina Pesaran, Sharon E. Plon, Heidi L. Rehm, Natasha T. Strande, Sean V. Tavtigian, and Scott Topper
Web resources
Likelihood ratio for spliceogenicity, https://gwiggins.shinyapps.io/lr_shiny/
Supplemental information
Data and code availability
GeneDx has contributed deidentified data to this study to improve clinical interpretation of genomic data in accordance with individual consent and in conformance with the ACMG position statement on genomic data sharing. The availability of the GeneDx dataset is restricted due to the nature of consent for clinical testing. GeneDx-referred individuals are consented for aggregate, deidentified research and subject to US HIPAA privacy protection. As such, GeneDx is not able to share individual-level BAM or VCF data, which are potentially identifiable, without a HIPAA business associate agreement. Access to the deidentified aggregate data used in this analysis is available upon request to GeneDx. The cell survival assay data used to compare performance of different splicing prediction tools are available from the original published article.13 The variant and splicing assay data collated from multiple publications for calibration of SpliceAI have been included in Table S3.
References
- 1. Richards S., Aziz N., Bale S., Bick D., Das S., Gastier-Foster J., Grody W.W., Hegde M., Lyon E., Spector E., et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 2015;17:405–424. doi: 10.1038/gim.2015.30.
- 2. Abou Tayoun A.N., Pesaran T., DiStefano M.T., Oza A., Rehm H.L., Biesecker L.G., Harrison S.M., ClinGen Sequence Variant Interpretation Working Group (ClinGen SVI). Recommendations for interpreting the loss of function PVS1 ACMG/AMP variant criterion. Hum. Mutat. 2018;39:1517–1524. doi: 10.1002/humu.23626.
- 3. Brnich S.E., Abou Tayoun A.N., Couch F.J., Cutting G.R., Greenblatt M.S., Heinen C.D., Kanavy D.M., Luo X., McNulty S.M., Starita L.M., et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med. 2019;12:3. doi: 10.1186/s13073-019-0690-2.
- 4. Ghosh R., Harrison S.M., Rehm H.L., Plon S.E., Biesecker L.G., ClinGen Sequence Variant Interpretation Working Group. Updated recommendation for the benign stand-alone ACMG/AMP criterion. Hum. Mutat. 2018;39:1525–1530. doi: 10.1002/humu.23642.
- 5. Spurdle A.B., Greville-Heygate S., Antoniou A.C., Brown M., Burke L., de la Hoya M., Domchek S., Dörk T., Firth H.V., Monteiro A.N., et al. Towards controlled terminology for reporting germline cancer susceptibility variants: an ENIGMA report. J. Med. Genet. 2019;56:347–357. doi: 10.1136/jmedgenet-2018-105872.
- 6. Ha C., Kim J.W., Jang J.H. Performance Evaluation of SpliceAI for the Prediction of Splicing of NF1 Variants. Genes. 2021;12:1308. doi: 10.3390/genes12091308.
- 7. Halperin R.F., Hegde A., Lang J.D., Raupach E.A., C4RCD Research Group. Legendre C., Liang W.S., LoRusso P.M., Sekulic A., Sosman J.A., et al. Improved methods for RNAseq-based alternative splicing analysis. Sci. Rep. 2021;11:10740. doi: 10.1038/s41598-021-89938-2.
- 8. Houdayer C., Caux-Moncoutier V., Krieger S., Barrois M., Bonnet F., Bourdon V., Bronner M., Buisson M., Coulet F., Gaildrat P., et al. Guidelines for splicing analysis in molecular diagnosis derived from a set of 327 combined in silico/in vitro studies on BRCA1 and BRCA2 variants. Hum. Mutat. 2012;33:1228–1238. doi: 10.1002/humu.22101.
- 9. Moles-Fernández A., Duran-Lozano L., Montalban G., Bonache S., López-Perolio I., Menéndez M., Santamariña M., Behar R., Blanco A., Carrasco E., et al. Computational Tools for Splicing Defect Prediction in Breast/Ovarian Cancer Genes: How Efficient Are They at Predicting RNA Alterations? Front. Genet. 2018;9:366. doi: 10.3389/fgene.2018.00366.
- 10. Riepe T.V., Khan M., Roosing S., Cremers F.P.M., 't Hoen P.A.C. Benchmarking deep learning splice prediction tools using functional splice assays. Hum. Mutat. 2021;42:799–810. doi: 10.1002/humu.24212.
- 11. Tamana S., Xenophontos M., Minaidou A., Stephanou C., Harteveld C.L., Bento C., Traeger-Synodinos J., Fylaktou I., Yasin N.M., Abdul Hamid F.S., et al. Evaluation of in silico predictors on short nucleotide variants in HBA1, HBA2, and HBB associated with haemoglobinopathies. Elife. 2022;11:e79713. doi: 10.7554/eLife.79713.
- 12. Jaganathan K., Kyriazopoulou Panagiotopoulou S., McRae J.F., Darbandi S.F., Knowles D., Li Y.I., Kosmicki J.A., Arbelaez J., Cui W., Schwartz G.B., et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell. 2019;176:535–548.e24. doi: 10.1016/j.cell.2018.12.015.
- 13. Findlay G.M., Daza R.M., Martin B., Zhang M.D., Leith A.P., Gasperini M., Janizek J.D., Huang X., Starita L.M., Shendure J. Accurate classification of BRCA1 variants with saturation genome editing. Nature. 2018;562:217–222. doi: 10.1038/s41586-018-0461-z.
- 14. Eilbeck K., Lewis S.E., Mungall C.J., Yandell M., Stein L., Durbin R., Ashburner M. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 2005;6:R44. doi: 10.1186/gb-2005-6-5-r44.
- 15. Thompson B.A., Walters R., Parsons M.T., Dumenil T., Drost M., Tiersma Y., Lindor N.M., Tavtigian S.V., de Wind N., Spurdle A.B., InSiGHT Variant Interpretation Committee. Contribution of mRNA Splicing to Mismatch Repair Gene Sequence Variant Interpretation. Front. Genet. 2020;11:798. doi: 10.3389/fgene.2020.00798.
- 16. Gergics P., Smith C., Bando H., Jorge A.A.L., Rockstroh-Lippold D., Vishnopolska S.A., Castinetti F., Maksutova M., Carvalho L.R.S., Hoppmann J., et al. High-throughput splicing assays identify missense and silent splice-disruptive POU1F1 variants underlying pituitary hormone deficiency. Am. J. Hum. Genet. 2021;108:1526–1539. doi: 10.1016/j.ajhg.2021.06.013.
- 17. Cartegni L., Chew S.L., Krainer A.R. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat. Rev. Genet. 2002;3:285–298. doi: 10.1038/nrg775.
- 18. Churbanov A., Rogozin I.B., Deogun J.S., Ali H. Method of predicting splice sites based on signal interactions. Biol. Direct. 2006;1:10. doi: 10.1186/1745-6150-1-10.
- 19. Tavtigian S.V., Greenblatt M.S., Harrison S.M., Nussbaum R.L., Prabhu S.A., Boucher K.M., Biesecker L.G., ClinGen Sequence Variant Interpretation Working Group (ClinGen SVI). Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet. Med. 2018;20:1054–1060. doi: 10.1038/gim.2017.210.
- 20. Garrett A., Callaway A., Durkie M., Cubuk C., Alikian M., Burghel G.J., Robinson R., Izatt L., Talukdar S., Side L., et al. Cancer Variant Interpretation Group UK (CanVIG-UK): an exemplar national subspecialty multidisciplinary network. J. Med. Genet. 2020;57:829–834. doi: 10.1136/jmedgenet-2019-106759.
- 21. Riggs E.R., Church D.M., Hanson K., Horner V.L., Kaminsky E.B., Kuhn R.M., Wain K.E., Williams E.S., Aradhya S., Kearney H.M., et al. Towards an evidence-based process for the clinical interpretation of copy number variation. Clin. Genet. 2012;81:403–412. doi: 10.1111/j.1399-0004.2011.01818.x.
- 22. Lek M., Karczewski K.J., Minikel E.V., Samocha K.E., Banks E., Fennell T., O'Donnell-Luria A.H., Ware J.S., Hill A.J., Cummings B.B., et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291. doi: 10.1038/nature19057.
- 23. Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P., et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
- 24. Burdon K.P., Graham P., Hadler J., Hulleman J.D., Pasutto F., Boese E.A., Craig J.E., Fingert J.H., Hewitt A.W., Siggs O.M., et al. Specifications of the ACMG/AMP variant curation guidelines for myocilin: Recommendations from the clingen glaucoma expert panel. Hum. Mutat. 2022;43:2170–2186. doi: 10.1002/humu.24482.
- 25. Morales J., Pujar S., Loveland J.E., Astashyn A., Bennett R., Berry A., Cox E., Davidson C., Ermolaeva O., Farrell C.M., et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature. 2022;604:310–315. doi: 10.1038/s41586-022-04558-8.
- 26. Brandão R.D., Mensaert K., López-Perolio I., Tserpelis D., Xenakis M., Lattimore V., Walker L.C., Kvist A., Vega A., Gutiérrez-Enríquez S., et al. Targeted RNA-seq successfully identifies normal and pathogenic splicing events in breast/ovarian cancer susceptibility and Lynch syndrome genes. Int. J. Cancer. 2019;145:401–414. doi: 10.1002/ijc.32114.
- 27. Landrith T., Li B., Cass A.A., Conner B.R., LaDuca H., McKenna D.B., Maxwell K.N., Domchek S., Morman N.A., Heinlen C., et al. Splicing profile by capture RNA-seq identifies pathogenic germline variants in tumor suppressor genes. NPJ Precis. Oncol. 2020;4:4. doi: 10.1038/s41698-020-0109-y.
- 28. Lopez-Perolio I., Leman R., Behar R., Lattimore V., Pearson J.F., Castéra L., Martins A., Vaur D., Goardon N., Davy G., et al. Alternative splicing and ACMG-AMP-2015-based classification of PALB2 genetic variants: an ENIGMA report. J. Med. Genet. 2019;56:453–460. doi: 10.1136/jmedgenet-2018-105834.
- 29. Walker L.C., Lattimore V.L., Kvist A., Kleiblova P., Zemankova P., de Jong L., Wiggins G.A.R., Hakkaart C., Cree S.L., Behar R., et al. Comprehensive Assessment of BARD1 Messenger Ribonucleic Acid Splicing With Implications for Variant Classification. Front. Genet. 2019;10:1139. doi: 10.3389/fgene.2019.01139.
- 30. Davy G., Rousselin A., Goardon N., Castéra L., Harter V., Legros A., Muller E., Fouillet R., Brault B., Smirnova A.S., et al. Detecting splicing patterns in genes involved in hereditary breast and ovarian cancer. Eur. J. Hum. Genet. 2017;25:1147–1154. doi: 10.1038/ejhg.2017.116.
- 31. Schoch K., Tan Q.K.G., Stong N., Deak K.L., McConkie-Rosell A., McDonald M.T., Undiagnosed Diseases Network. Goldstein D.B., Jiang Y.H., Shashi V. Alternative transcripts in variant interpretation: the potential for missed diagnoses and misdiagnoses. Genet. Med. 2020;22:1269–1275. doi: 10.1038/s41436-020-0781-x.
- 32. Cummings B.B., Karczewski K.J., Kosmicki J.A., Seaby E.G., Watts N.A., Singer-Berk M., Mudge J.M., Karjalainen J., Satterstrom F.K., O'Donnell-Luria A.H., et al. Transcript expression-aware annotation improves rare variant interpretation. Nature. 2020;581:452–458. doi: 10.1038/s41586-020-2329-2.
- 33. Colombo M., Blok M.J., Whiley P., Santamariña M., Gutiérrez-Enríquez S., Romero A., Garre P., Becker A., Smith L.D., De Vecchi G., et al. Comprehensive annotation of splice junctions supports pervasive alternative splicing at the BRCA1 locus: a report from the ENIGMA consortium. Hum. Mol. Genet. 2014;23:3666–3680. doi: 10.1093/hmg/ddu075.
- 34. Fackenthal J.D., Yoshimatsu T., Zhang B., de Garibay G.R., Colombo M., De Vecchi G., Ayoub S.C., Lal K., Olopade O.I., Vega A., et al. Naturally occurring BRCA2 alternative mRNA splicing events in clinically relevant samples. J. Med. Genet. 2016;53:548–558. doi: 10.1136/jmedgenet-2015-103570.
- 35. Glinos D.A., Garborcauskas G., Hoffman P., Ehsan N., Jiang L., Gokden A., Dai X., Aguet F., Brown K.L., Garimella K., et al. Transcriptome variation in human tissues revealed by long-read sequencing. Nature. 2022;608:353–359. doi: 10.1038/s41586-022-05035-y.
- 36. Clark M.B., Wrzesinski T., Garcia A.B., Hall N.A.L., Kleinman J.E., Hyde T., Weinberger D.R., Harrison P.J., Haerty W., Tunbridge E.M. Long-read sequencing reveals the complex splicing profile of the psychiatric risk gene CACNA1C in human brain. Mol. Psychiatry. 2020;25:37–47. doi: 10.1038/s41380-019-0583-1.
- 37. de Jong L.C., Cree S., Lattimore V., Wiggins G.A.R., Spurdle A.B., kConFab Investigators. Miller A., Kennedy M.A., Walker L.C. Nanopore sequencing of full-length BRCA1 mRNA transcripts reveals co-occurrence of known exon skipping events. Breast Cancer Res. 2017;19:127. doi: 10.1186/s13058-017-0919-1.
- 38. Lin J.H., Masson E., Boulling A., Hayden M., Cooper D.N., Férec C., Liao Z., Chen J.M. 5' splice site GC>GT and GT>GC variants differ markedly in terms of their functionality and pathogenicity. Hum. Mutat. 2020;41:1358–1364. doi: 10.1002/humu.24029.
- 39. Chen J.M., Lin J.H., Masson E., Liao Z., Férec C., Cooper D.N., Hayden M. The Experimentally Obtained Functional Impact Assessments of 5' Splice Site GT>GC Variants Differ Markedly from Those Predicted. Curr. Genomics. 2020;21:56–66. doi: 10.2174/1389202921666200210141701.
- 40. Castle J.C., Zhang C., Shah J.K., Kulkarni A.V., Kalsotra A., Cooper T.A., Johnson J.M. Expression of 24,426 human alternative splicing events and predicted cis regulation in 48 tissues and cell lines. Nat. Genet. 2008;40:1416–1425. doi: 10.1038/ng.264.
- 41. Frésard L., Smail C., Ferraro N.M., Teran N.A., Li X., Smith K.S., Bonner D., Kernohan K.D., Marwaha S., Zappala Z., et al. Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nat. Med. 2019;25:911–919. doi: 10.1038/s41591-019-0457-8.
- 42. Fagerberg L., Hallström B.M., Oksvold P., Kampf C., Djureinovic D., Odeberg J., Habuka M., Tahmasebpoor S., Danielsson A., Edlund K., et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol. Cell. Proteomics. 2014;13:397–406. doi: 10.1074/mcp.M113.035600.
- 43. GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348:648–660. doi: 10.1126/science.1262110.
- 44. Yu N.Y.L., Hallström B.M., Fagerberg L., Ponten F., Kawaji H., Carninci P., Forrest A.R.R., Fantom Consortium. Hayashizaki Y., Uhlén M., Daub C.O. Complementing tissue characterization by integrating transcriptome profiling from the Human Protein Atlas and from the FANTOM5 consortium. Nucleic Acids Res. 2015;43:6787–6798. doi: 10.1093/nar/gkv608.
- 45. Cheung R., Insigne K.D., Yao D., Burghard C.P., Wang J., Hsiao Y.H.E., Jones E.M., Goodman D.B., Xiao X., Kosuri S. A Multiplexed Assay for Exon Recognition Reveals that an Unappreciated Fraction of Rare Genetic Variants Cause Large-Effect Splicing Disruptions. Mol. Cell. 2019;73:183–194.e8. doi: 10.1016/j.molcel.2018.10.037.
- 46. de la Hoya M., Soukarieh O., López-Perolio I., Vega A., Walker L.C., van Ierland Y., Baralle D., Santamariña M., Lattimore V., Wijnen J., et al. Combined genetic and splicing analysis of BRCA1 c.[594-2A>C; 641A>G] highlights the relevance of naturally occurring in-frame transcripts for developing disease gene variant classification algorithms. Hum. Mol. Genet. 2016;25:2256–2268. doi: 10.1093/hmg/ddw094.
- 47. Rave-Harel N., Kerem E., Nissim-Rafinia M., Madjar I., Goshen R., Augarten A., Rahat A., Hurwitz A., Darvasi A., Kerem B. The molecular basis of partial penetrance of splicing mutations in cystic fibrosis. Am. J. Hum. Genet. 1997;60:87–94.
- 48. Schafer S., Miao K., Benson C.C., Heinig M., Cook S.A., Hubner N. Alternative Splicing Signatures in RNA-seq Data: Percent Spliced in (PSI). Curr. Protoc. Hum. Genet. 2015;87:11.16.1–11.16.14. doi: 10.1002/0471142905.hg1116s87.
- 49. Jang W., Park J., Chae H., Kim M. Comparison of In Silico Tools for Splice-Altering Variant Prediction Using Established Spliceogenic Variants: An End-User's Point of View. Int. J. Genomics. 2022;2022:5265686. doi: 10.1155/2022/5265686.
- 50. Canson D.M., Davidson A.L., de la Hoya M., Parsons M.T., Glubb D.M., Kondrashova O., Spurdle A.B. SpliceAI-10k calculator for the prediction of pseudoexonization, intron retention, and exon deletion. Bioinformatics. 2023;39:btad179. doi: 10.1093/bioinformatics/btad179.
- 51. Pejaver V., Byrne A.B., Feng B.J., Pagel K.A., Mooney S.D., Karchin R., O'Donnell-Luria A., Harrison S.M., Tavtigian S.V., Greenblatt M.S., et al. Calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for PP3/BP4 criteria. Am. J. Hum. Genet. 2022;109:2163–2177. doi: 10.1016/j.ajhg.2022.10.013.
- 52. Dawes R., Joshi H., Cooper S.T. Empirical prediction of variant-activated cryptic splice donors using population-based RNA-Seq data. Nat. Commun. 2022;13:1655. doi: 10.1038/s41467-022-29271-y.
- 53. Canson D., Glubb D., Spurdle A.B. Variant effect on splicing regulatory elements, branchpoint usage, and pseudoexonization: Strategies to enhance bioinformatic prediction using hereditary cancer genes as exemplars. Hum. Mutat. 2020;41:1705–1721. doi: 10.1002/humu.24074.
- 54. Canson D.M., Dumenil T., Parsons M.T., O'Mara T.A., Davidson A.L., Okano S., Signal B., Mercer T.R., Glubb D.M., Spurdle A.B. The splicing effect of variants at branchpoint elements in cancer genes. Genet. Med. 2022;24:398–409. doi: 10.1016/j.gim.2021.09.020.
- 55. Leman R., Parfait B., Vidaud D., Girodon E., Pacot L., Le Gac G., Ka C., Ferec C., Fichou Y., Quesnelle C., et al. SPiP: Splicing Prediction Pipeline, a machine learning tool for massive detection of exonic and intronic variant effects on mRNA splicing. Hum. Mutat. 2022;43:2308–2323. doi: 10.1002/humu.24491.
- 56. James P.A., Fortuno C., Li N., Lim B.W.X., Campbell I.G., Spurdle A.B. Estimating the proportion of pathogenic variants from breast cancer case-control data: Application to calibration of ACMG/AMP variant classification criteria. Hum. Mutat. 2022;43:882–888. doi: 10.1002/humu.24357.
- 57. Burset M., Seledtsov I.A., Solovyev V.V. Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res. 2000;28:4364–4375. doi: 10.1093/nar/28.21.4364.
- 58. Mercer T.R., Clark M.B., Andersen S.B., Brunck M.E., Haerty W., Crawford J., Taft R.J., Nielsen L.K., Dinger M.E., Mattick J.S. Genome-wide discovery of human splicing branchpoints. Genome Res. 2015;25:290–303. doi: 10.1101/gr.182899.114.
- 59. Burge C.B., Tuschl T., Sharp P.A. Splicing of precursors to mRNAs by the spliceosomes. Cold Spring Harb. Monogr. Ser. 1999;37:525–560.
- 60. Sibley C.R., Blazquez L., Ule J. Lessons from non-canonical splicing. Nat. Rev. Genet. 2016;17:407–421. doi: 10.1038/nrg.2016.46.
- 61. Sharp P.A., Burge C.B. Classification of introns: U2-type or U12-type. Cell. 1997;91:875–879. doi: 10.1016/s0092-8674(00)80479-1.
- 62. Wang Z., Burge C.B. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA. 2008;14:802–813. doi: 10.1261/rna.876308.
- 63. Horton C., Cass A., Conner B.R., Hoang L., Zimmermann H., Abualkheir N., Burks D., Qian D., Molparia B., Vuong H., et al. Mutational and splicing landscape in a cohort of 43,000 patients tested for hereditary cancer. NPJ Genom. Med. 2022;7:49. doi: 10.1038/s41525-022-00323-y.
- 64. Karam R., Conner B., LaDuca H., McGoldrick K., Krempely K., Richardson M.E., Zimmermann H., Gutierrez S., Reineke P., Hoang L., et al. Assessment of Diagnostic Outcomes of RNA Genetic Testing for Hereditary Cancer. JAMA Netw. Open. 2019;2:e1913900. doi: 10.1001/jamanetworkopen.2019.13900.
- 65. Kamps-Hughes N., Carlton V., Fresard L., Osazuwa S., Starks E., Vincent J.J., Albritton S., Nussbaum R.L., Nykamp K. An algorithm to detect abnormal mRNA splicing and assess its clinical impact in individuals undergoing genetic testing for hereditary cancer syndromes. bioRxiv. 2022. Preprint. doi: 10.1101/2022.07.12.499782.
- 66. Bournazos A.M., Riley L.G., Bommireddipalli S., Ades L., Akesson L.S., Al-Shinnag M., Alexander S.I., Archibald A.D., Balasubramaniam S., Berman Y., et al. Standardized practices for RNA diagnostics using clinically accessible specimens reclassifies 75% of putative splicing variants. Genet. Med. 2022;24:130–145. doi: 10.1016/j.gim.2021.09.001.
Data Availability Statement
GeneDx has contributed deidentified data to this study to improve clinical interpretation of genomic data, in accordance with individual consent and in conformance with the ACMG position statement on genomic data sharing. The availability of the GeneDx dataset is restricted owing to the nature of consent for clinical testing. GeneDx-referred individuals are consented for aggregate, deidentified research and are subject to US HIPAA privacy protection. As such, GeneDx is not able to share individual-level BAM or VCF data, which are potentially identifiable, without a HIPAA business associate agreement. Access to the deidentified aggregate data used in this analysis is available upon request to GeneDx. The cell survival assay data used to compare the performance of different splicing prediction tools are available from the original published article.13 The variant and splicing assay data collated from multiple publications for calibration of SpliceAI have been included in Table S3.