Abstract
A precise genetic diagnosis is the single most important step in enabling personalized and preventive medicine for families with genetic disorders. In addition to coding (exonic) variants that change a protein sequence, abnormal pre-mRNA splicing can be devastating for the encoded protein, inducing a frameshift or an in-frame deletion/insertion of multiple residues. Non-coding variants that disrupt splicing are extremely challenging to identify. Stemming from an initial clinical discovery in two index Australian families, we define 25 families with genetic disorders caused by a class of pathogenic non-coding splice variants: intronic deletions. These pathogenic intronic deletions spare all consensus splice motifs but critically shorten the distance between the 5′ splice site (5′SS) and the branchpoint. Abnormal splicing results from a biophysical constraint that precludes U1/U2 spliceosome assembly, which stalls in A complexes (complexes that bridge the 5′SS and branchpoint). Substituting non-specific sequences for the deleted nucleotides restores spliceosome assembly and normal splicing, arguing against loss of an intronic element as the primary cause. Incrementally lengthening the 5′SS-branchpoint distance in our index EMD case subject defines 45–47 nt as the critical length enabling (inefficient) spliceosome assembly for EMD intron 5. The 5′SS-branchpoint space-constraint mechanism, not currently factored into genomic informatics pipelines, is relevant to diagnosis and precision medicine across the breadth of Mendelian disorders and cancer genomics.
Keywords: pre-mRNA splicing, spliceosome assembly, non-coding variant, intronic deletion, abnormal splicing, 5′ splice site, branchpoint, pathogenic splice variant
Introduction
Massively parallel sequencing is transforming diagnosis of rare genetic conditions, providing health-economic cost savings1 and shortening the diagnostic odyssey for affected families.2, 3 A precise genetic diagnosis greatly benefits affected families, informing clinical management and enabling prenatal counseling for disease prevention. Despite advances in parallel sequencing, interpretation of splice variants remains a great challenge.4 Splicing occurs at the level of pre-mRNA, which contains both exon and intron sequences. The spliceosome is a multi-megadalton complex composed of five small nuclear ribonucleoproteins (U1, U2, U4, U5, and U6), which work synergistically with more than one hundred accessory and regulatory factors to splice exons together into mRNAs encoding protein isoforms.5, 6
Consensus splice sites recognized by the spliceosome are highly conserved across yeast, plants, Drosophila, and vertebrates,7 and spliceosomal complexes from yeast,8, 9, 10 Drosophila, and humans11, 12, 13 have similar constituents and three-dimensional structures. Translational tools that predict splicing defects rely on evolutionary conservation of consensus splice-site sequences and effectively predict the adverse consequences of substitutions affecting the essential splice sites (the almost invariant GT and AG at either end of the intron),14, 15, 16 which are increasingly recognized as pathogenic variants in genetic disorders.4 However, exonic variants that create cryptic splice sites, and variants in extended splice sites or deeper intronic regions, remain challenging to interpret.17, 18, 19 Informatics tools that predict splicing outcomes urgently need refinement to keep pace with the clinical translation of parallel sequencing.
There are important differences in intron features between humans and model organisms.20 Human introns vary greatly in length, with a median of 1,455 nucleotides (nt; 90% of introns 148–11,098 nt) (Figure 1A). In contrast, Drosophila introns are generally shorter (median 72 nt; 90% of introns 56–2,374 nt), and C. elegans introns shorter still (median 63 nt; 90% of introns 45–764 nt). In particular, short introns <71 nt are extremely rare in the human genome (0.1th percentile, Figure 1B), although they represent the majority of Drosophila and C. elegans introns (Figure 1A). Among the shortest human introns (<200 nt), intron length correlates positively with the distance between the 5′ splice site and the branchpoint (5′SS-branchpoint): the shorter the intron, the shorter the 5′SS-branchpoint distance. Three genome-wide branchpoint studies21, 22, 23 define 59 nt as the 0.1th percentile of 5′SS-branchpoint distance among human introns recruiting the U1/U2 spliceosomal machinery (GT-AG) (Figure 1C). Herein we demonstrate that intronic deletions that unnaturally shorten the 5′SS-branchpoint distance below a critical minimum represent a class of pathogenic splice variant currently overlooked by informatics pipelines.
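The space-constraint argument can be made concrete with a small calculation. The following is a minimal Python sketch, not a tool described in this paper: it assumes that a deletion lying entirely between the 5′SS and the branchpoint shortens the 5′SS-branchpoint distance by its own length, measures that distance as the 1-based intron position of the branchpoint A, and leaves the threshold to the caller (for example the 59 nt genome-wide 0.1th percentile, or a gene-specific value). The names `Intron`, `shortened_distance`, and `flag_space_constraint` are hypothetical; the example values are taken from the EMD case described in the Results.

```python
"""Hypothetical sketch: flag intronic deletions that shorten the 5'SS-branchpoint distance.

Assumptions (not from a published tool): positions are 1-based within the intron,
+1 is the first intronic nucleotide, and the 5'SS-branchpoint distance is the
intron position of the branchpoint A.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Intron:
    length: int       # intron length (nt)
    branchpoint: int  # 1-based intron position of the branchpoint A


def shortened_distance(intron: Intron, del_start: int, del_end: int) -> int:
    """5'SS-branchpoint distance after deleting intron positions del_start..del_end (inclusive)."""
    if not 1 <= del_start <= del_end <= intron.length:
        raise ValueError("deletion must lie within the intron")
    if del_start <= intron.branchpoint <= del_end:
        raise ValueError("deletion removes the branchpoint itself")
    removed = del_end - del_start + 1
    if del_end < intron.branchpoint:
        # Deletion sits between the 5'SS and the branchpoint: the distance shrinks.
        return intron.branchpoint - removed
    # Deletion sits downstream of the branchpoint: the distance is unchanged.
    return intron.branchpoint


def flag_space_constraint(intron: Intron, del_start: int, del_end: int, threshold: int) -> bool:
    """True if the deletion leaves a 5'SS-branchpoint distance below `threshold` nt."""
    return shortened_distance(intron, del_start, del_end) < threshold


if __name__ == "__main__":
    # EMD intron 5 (79 nt) with its branchpoint at c.450-24, i.e. intron position 79 - 24 + 1 = 56.
    emd_intron5 = Intron(length=79, branchpoint=56)
    # c.449+23_450-35del removes intron positions 23..45 (23 nt).
    print(shortened_distance(emd_intron5, 23, 45))                   # 33
    print(flag_space_constraint(emd_intron5, 23, 45, threshold=45))  # True
```

The sketch deliberately ignores deletions that touch the consensus motifs or the branchpoint itself, since those are already handled by conventional splice-site predictors.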
Figure 1.
Exon and Intron Length across Species and Clinical Details for Families A and B with Pathogenic Intronic Deletions
(A) Exon and intron length in worms (C. elegans), flies (D. melanogaster), fish (D. rerio), and humans (H. sapiens). Left and middle: Violin plots depicting the genome-wide distribution of exon and intron lengths. Right: Histogram showing the relative abundance of small introns <100 nt across species.
(B) Minimal intron length in humans. Abundance of human introns <200 nt. Dashed line: 99.9th percentile among all GT-AG human introns. All introns <66 nt in length were curated manually (see Material and Methods and Table S1 for curation).
(C) Correlation between 5′SS-branchpoint length and intron length among the shortest human introns. Data points shown reflect branchpoints identified concordantly by at least two of the three genome-wide branchpoint studies.21, 22, 23 See Material and Methods and Table S2 for curation of introns with reported 5′SS-branchpoint lengths ≤50 nt. Dashed line: 99.9th percentile among all GT-AG human introns.
(D) Clinical details for family A. (i) Family A pedigree. (ii) AII:1 with severe scoliosis at age 25 years. (iii) Facial weakness of AII:2 at age 20 years when asked to close the eyes tightly. (iv) Haematoxylin and eosin (H&E) staining of AII:1 deltoid (biopsy at age 15 years) shows mild myopathic changes, with occasional internal nuclei (arrow) and evidence of fiber splitting (asterisks). (v) Western blot of skeletal muscle from AII:1 (deltoid, age 15 years) and two control subjects (C1 and C2, malignant hyperthermia negative) shows marked reduction/near absence of normal-sized DOK7 protein. C1: female, vastus lateralis, 18 years. C2: female, vastus lateralis, 14 years.
(E) Clinical details for family B. (i) Family B pedigree. (ii) H&E staining of BII:1 (gastrocnemius) shows marked variation in fiber size, abundant internal nuclei (arrows), and fiber splitting (asterisks). (iii) Staining for fast skeletal muscle myosin shows some evidence of fiber type grouping. (iv) Western blot of skeletal muscle from BII:1 (deltoid, age 10 years) shows absence of normal-length emerin. Control 1 (C1): male, malignant hyperthermia positive, quadriceps, age 14 years. Control 2 (C2): female, malignant hyperthermia negative, quadriceps, age 60 years. C2 was run on the same blot but was not loaded adjacent to BII:1; samples not relevant to this study were removed. Scale bars, 200 μm.
Material and Methods
Ethics and Consent
Ethical approval was obtained from the Human Research Ethics Committee of the Children’s Hospital at Westmead, Australia (10/CHW/45), and written informed consent was obtained from all participants.
Parallel Sequencing
Parallel sequencing of known neuromuscular disease-associated genes was performed using a commercial gene panel (v2) offered by PathWest Laboratory, Australia. Whole-exome sequencing (WES)2, 3 and RNA sequencing (RNA-seq)24 were performed by the Broad Institute of MIT and Harvard, USA, as described previously. EMD (MIM: 300384) and DOK7 (MIM: 610285) variant numbering refers to transcripts GenBank: NM_000117.2 and NM_173660.4, respectively.
Exon/Intron Species Data
RefSeq browser extensible data (BED) files of exon/intron regions were obtained from the UCSC Table Browser. Exon/intron datasets were further filtered by selecting, for each gene, the single transcript with the greatest length and number of exons. Data (and scripts) are hosted online at GitHub (see Web Resources).
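A minimal Python sketch of the per-gene transcript selection and intron-length summary described above, assuming exon coordinates have already been parsed from a UCSC BED file into 0-based, half-open intervals. It is not the authors' GitHub script: the tie-breaking order (genomic span first, then exon count), the nearest-rank percentile definition, and the reading of the reported 90% length ranges as the 5th to 95th percentiles are all assumptions, and the function names are hypothetical.

```python
"""Hypothetical sketch of the exon/intron length summary (not the authors' scripts)."""
import math
import statistics

# A transcript is (gene, transcript_id, exons); exons are (start, end) tuples in
# 0-based, half-open BED coordinates, sorted by genomic position.


def intron_lengths(exons):
    """Lengths of the gaps between consecutive exons, i.e. the introns."""
    return [nxt[0] - prev[1] for prev, nxt in zip(exons, exons[1:])]


def pick_representative(transcripts):
    """One transcript per gene: largest genomic span, then most exons (assumed ordering)."""
    best = {}
    for gene, tx_id, exons in transcripts:
        key = (exons[-1][1] - exons[0][0], len(exons))
        if gene not in best or key > best[gene][0]:
            best[gene] = (key, tx_id, exons)
    return {gene: (tx_id, exons) for gene, (_, tx_id, exons) in best.items()}


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile for 0 < pct <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(transcripts):
    """Median and 5th/95th percentiles of intron length across representative transcripts."""
    lengths = [
        length
        for _, exons in pick_representative(transcripts).values()
        for length in intron_lengths(exons)
    ]
    return {
        "median": statistics.median(lengths),
        "p5": nearest_rank_percentile(lengths, 5),
        "p95": nearest_rank_percentile(lengths, 95),
    }


if __name__ == "__main__":
    toy = [
        ("G1", "t1", [(0, 100), (200, 300), (450, 500)]),  # span 500, introns of 100 and 150 nt
        ("G1", "t2", [(0, 100), (200, 300)]),              # shorter isoform, discarded
        ("G2", "t3", [(0, 50), (110, 160)]),               # one 60 nt intron
    ]
    print(summarize(toy))  # {'median': 100, 'p5': 60, 'p95': 150}
```

On a genome-wide input the same three functions apply unchanged; only the parsing of the UCSC BED rows into `(gene, transcript_id, exons)` tuples would need to be added.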
Analysis of 5′SS-Branchpoint Length among Human Introns with Canonical Splice Sites
Human introns were extracted from NCBI reference sequences.25 Introns less than 70 nt in length (n = 121) were extracted, identifying 32 introns <66 nt in length (8/10 pathogenic intronic deletions identified in this paper rendered the intron <66 nt in length). Of these 32 introns, 20 were judged unlikely to be true introns and excluded for the following reasons: (1) cross-referencing annotated exon/intron boundaries between the GRCh37 and GRCh38 genome assemblies and ENCODE RNA-seq data provided convincing evidence of mis-annotation/mis-alignment of sequences; (2) the introns lacked support in ENCODE RNA-seq data and lacked identifiable splice sites; or (3) Ensembl and RefSeq annotations disagreed. Branchpoint datasets from Mercer et al.,21 Pineda and Bradley,22 and Taggart et al.23 were combined and filtered to include only GT-AG introns (n = 181,139). Data points in Figure 1C represent a high-confidence dataset of branchpoints concordantly identified in 2/3 studies (n = 39,628). From these data, 16 introns had a 5′SS-branchpoint distance of ≤50 nt (all pathogenic intronic deletions identified in this paper rendered the 5′SS-branchpoint distance <50 nt). Seven of the 16 branchpoints were excluded because of annotation issues (i.e., alignment to an irrelevant transcript), leaving 40 nt as the smallest plausible 5′SS-branchpoint distance. Manual curation data are shown in Tables S1 and S2.
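The concordance filter and distance measurement above can be sketched in Python as follows. This is an illustrative sketch, not the curation code behind Tables S1 and S2: it assumes the three studies' calls have been lifted to one assembly and match at exactly the same genomic coordinate, and it measures the 5′SS-branchpoint distance as the branchpoint's 1-based position counted from the first intronic nucleotide (the convention under which a 70 nt EMD intron 5 with its branchpoint at c.450−24 gives 47 nt, as in Figure 3). All function names are hypothetical.

```python
"""Hypothetical sketch: >=2-of-3 branchpoint concordance and strand-aware 5'SS-branchpoint distance."""
from collections import Counter

# Branchpoint call: (chrom, strand, pos), pos = 0-based genomic coordinate of the branchpoint A.
# Intron: (chrom, strand, start, end), 0-based half-open genomic coordinates.


def concordant_calls(*studies, min_support=2):
    """Calls reported by at least `min_support` of the studies (exact coordinate match)."""
    counts = Counter(call for study in studies for call in set(study))
    return {call for call, n in counts.items() if n >= min_support}


def five_ss_to_branchpoint(intron, bp_pos):
    """1-based position of the branchpoint A counted from the first intronic nucleotide."""
    _, strand, start, end = intron
    if not start <= bp_pos < end:
        raise ValueError("branchpoint lies outside the intron")
    # Plus strand: the 5'SS is at `start`; minus strand: the 5'SS is at `end - 1`.
    return bp_pos - start + 1 if strand == "+" else end - bp_pos


def short_distance_introns(introns, calls, max_distance=50):
    """(intron, distance) pairs whose concordant branchpoint lies <= max_distance nt from the 5'SS."""
    hits = []
    for intron in introns:
        chrom, strand, start, end = intron
        for c_chrom, c_strand, pos in calls:
            if (c_chrom, c_strand) == (chrom, strand) and start <= pos < end:
                distance = five_ss_to_branchpoint(intron, pos)
                if distance <= max_distance:
                    hits.append((intron, distance))
    return hits
```

Introns returned by `short_distance_introns` would still need the manual review described above, since mis-assigned transcripts produce spuriously short distances.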
EMD Expression Construct
A pCMV6-Entry vector containing the EMD genomic locus GRCh37:chrX:153607583_153609881 was purchased from Blue Heron Biotech. The native EMD stop codon precedes the vector epitope tags, which are therefore not encoded within the EMD pre-mRNA. The EMD genomic sequence ordered carried two synonymous substitutions, GRCh37:chrX:153609413G>T (c.621G>T [p.Arg207(=)]) and GRCh37:chrX:153609416T>G (c.624T>G [p.Pro208(=)]), introducing a unique BspEI restriction site for molecular manipulation of EMD intron 5. Gene fragments (gBlocks) with the sequences described in Figure 3A were supplied by Integrated DNA Technologies and subcloned into pCMV6-EMD via PstI and BspEI restriction digestion. Constructs were verified by Sanger sequencing.
Figure 3.
Incremental Restoration of EMD Intron 5 5′SS-Branchpoint Length
(A) Top: Schematic of EMD genomic locus subcloned into pCMV6, with six numbered exons, and indicative locations of PCR primers used for RT-PCR. Bottom: WT (79 nt): intron 5 with sequences deleted in BII:1 shown in blue font. Schematics below depict the specific sequences deleted (shown in gray font) in each expression construct. RC, reverse complement sequences shown in teal. Branchpoint-A shown in red font (c.450−24A).
(B and C) Transfection studies in primary myoblasts from an individual with a hemizygous exon 6 duplication (c.651_655dupGGGCC [p.Gln219ArgfsTer20]). Untransfected (UnT) myoblasts express low levels of abnormal truncated p.Gln219ArgfsTer20 endogenous emerin (eEmerin) and no normal sized emerin protein. Replicate plates from each set of transfections were harvested simultaneously for RT-PCR and western blot. The entire experiment was repeated twice, with identical results.
(B) RT-PCR of cDNA derived from oligo-dT reverse transcription of mRNA isolated from transfected primary myoblasts. Forward primers were positioned in exon 4 or bridging the exon 5/6 junction (see Material and Methods). Reverse primer 6R2 is positioned at the EMD exon 6 GGGCC duplication and preferentially amplifies EMD transcripts from the transfected expression construct (optimization data not shown). Asterisk: Sanger sequencing identifies this band as containing normally spliced exon 4-5-6 sequences as well as intron 5 retention (perhaps due to heteroduplex formation or contamination from the higher-migrating band). (i) Branchpoint mutagenesis experiments define c.450−24A as the predominant branchpoint A used by the spliceosome for splicing of EMD intron 5. Substitution of c.450−24A>T potently impairs intron 5 splicing, whereas substitution of c.450−37A>T does not impair splicing efficiency. (ii) Incremental restoration of the 23 nucleotides deleted in BII:1 shows that intron 5 splicing is enabled when intron 5 is elongated to 70 nt (5′SS-branchpoint length of 47 nt).
(C) Western blot of 10 μg total protein probed with NCL-Emerin and HRP-conjugated secondary antibody. Membranes were reprobed with anti-β-tubulin as loading control.
(D) Sanger sequencing of gel-purified RT-PCR amplicons as marked. WT-Band 1: mixture of normal exon 5–6 splicing (sequence shown underneath in capital letters) and intron 5 retention (sequence shown above in lower case). WT-Bands 2 and 5: normal exon 5–6 splicing, confirmed to be derived from the EMD construct through detection of the downstream engineered substitutions (c.621G>T [p.Arg207(=)] and c.624T>G [p.Pro208(=)]; see Material and Methods). 56-Band 3: intron 5 retention. 56-Band 4: exon 5 skipping.
Primary Myoblast Transfection
Primary myoblasts derived from a male proband with a pathogenic 5 nt duplication in EMD exon 6 (GRCh37:chrX:153609443_153609447dupGGGCC, c.651_655dupGGGCC [p.Gln219ArgfsTer20]) were transfected with Lipofectamine 3000 reagent according to the manufacturer’s instructions. Cells were harvested 72 h after transfection for western blot and RT-PCR.
RNA Isolation, cDNA Synthesis, and RT-PCR
RNA was isolated from 30 × 8 μm thick muscle cryosections (10 mm² surface area) or from 20 cm² of transfected primary myoblasts using Invitrogen TRIzol Reagent according to the product user guide, and was purified using the QIAGEN RNeasy Mini Kit according to the kit protocol. cDNA was synthesized from 1 μg of total skeletal muscle RNA with oligo-dT and/or random hexamers using the Invitrogen SuperScript IV First-Strand Synthesis System according to the manufacturer’s protocol. DOK7 RT-PCR used primers 5′UTR-F1 5′-CGCGGAACCATGACAGAAG-3′ or 5′UTR-F2 5′-TTTTGAAAGTGACCCTGGGC-3′ with exon-3R 5′-TGGGACAGGCAGACAATGG-3′. EMD RT-PCR used primers exon-3F 5′-CTTCCCAAGAAAGAGGACGC-3′ and exon-6R1 5′-GTGAGCCATGAAGAGGAAGATG-3′; exon-6R2 5′-CCTGGCGATCCTGGCCCA-3′ (preferentially amplifies EMD cDNA derived from the construct); and exon-5/6F 5′-GAGTGCAAGGATAGGGAACG-3′ (bridging primer specific for normally spliced EMD transcripts).
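For readers re-using these primers, a few quick sanity checks can be scripted. The Python sketch below is illustrative only and is not how the primers were designed: it computes GC content, a rough Wallace-rule melting temperature (2 °C per A/T plus 4 °C per G/C, a crude estimate intended for short oligonucleotides), and the reverse complement, which for exon-6R2 contains the GGGCC motif, consistent with its stated placement at the exon 6 duplication site. The helper names are hypothetical.

```python
"""Hypothetical primer sanity checks; rule-of-thumb estimates only."""
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

PRIMERS = {
    "DOK7 5'UTR-F1": "CGCGGAACCATGACAGAAG",
    "DOK7 5'UTR-F2": "TTTTGAAAGTGACCCTGGGC",
    "DOK7 exon-3R": "TGGGACAGGCAGACAATGG",
    "EMD exon-3F": "CTTCCCAAGAAAGAGGACGC",
    "EMD exon-6R1": "GTGAGCCATGAAGAGGAAGATG",
    "EMD exon-6R2": "CCTGGCGATCCTGGCCCA",
    "EMD exon-5/6F": "GAGTGCAAGGATAGGGAACG",
}


def reverse_complement(seq: str) -> str:
    """Reverse complement; for a reverse primer this equals the sense-strand target."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G + C nucleotides."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Wallace rule: Tm ~ 2*(A+T) + 4*(G+C) degrees C; crude, for short oligos only."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name:15s} {len(seq):2d} nt  GC {gc_fraction(seq):.0%}  Tm~{wallace_tm(seq)} C")
    print(reverse_complement(PRIMERS["EMD exon-6R2"]))  # TGGGCCAGGATCGCCAGG
```

Nearest-neighbor Tm models are more accurate than the Wallace rule; it is used here only because it needs no external parameters.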
Western Blot
Western blotting of skeletal muscle and transfected myoblasts was carried out as described previously.26 Primary antibodies were anti-DOK7 (AF6398, 1:1,000; R&D Systems), NCL-Emerin (1:1,000) and NCL-β-DG (1:250) from Leica Biosystems, anti-caveolin-3 (610421, 1:1,000; BD Transduction Laboratories), anti-α-actinin-2 (4A3, 1:250,000; kind gift from A. Beggs, Children’s Hospital Boston, USA), and anti-β-tubulin (clone E7, 1:5,000; Developmental Studies Hybridoma Bank). HRP-conjugated anti-mouse light chain (1:5,000) and anti-rabbit light chain (1:3,000) secondary antibodies were used, followed by detection with ECL chemiluminescent reagents (GE Healthcare).
Human Intron Splice-Site Pictograms
BEDTools and hg19 FASTA sequences were used to extract, for all human introns, the sequence spanning each splice site (25 nt of intron plus 5 nt of flanking exon). Splice-site consensus sequences were visualized using BioPerl’s pictogram module. Truncated FASTA files and pictograms are hosted online at GitHub.
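The window extraction and per-position tallies behind the pictograms can be sketched in Python. This is an illustrative sketch, not the BEDTools/BioPerl pipeline itself: it assumes intron coordinates in 0-based, half-open BED convention, and that minus-strand sequences are reverse-complemented after extraction (for example with `bedtools getfasta -s`) so that every sequence reads 5′ to 3′ on the transcript. Function names are hypothetical.

```python
"""Hypothetical sketch: splice-site windows (5 nt exon + 25 nt intron) and per-position counts."""
from collections import Counter

INTRON_NT = 25  # intronic nucleotides shown at each splice site
EXON_NT = 5     # flanking exonic nucleotides


def splice_site_windows(start: int, end: int, strand: str) -> dict:
    """BED-style windows around each end of an intron spanning [start, end)."""
    left = (start - EXON_NT, start + INTRON_NT)  # upstream exon/intron boundary in genome order
    right = (end - INTRON_NT, end + EXON_NT)     # downstream intron/exon boundary in genome order
    # On the minus strand the donor (5'SS) is the right-hand boundary in genome coordinates.
    return {"5ss": left, "3ss": right} if strand == "+" else {"5ss": right, "3ss": left}


def position_counts(seqs):
    """Nucleotide counts per column for equal-length, transcript-oriented sequences."""
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [Counter(s[i].upper() for s in seqs) for i in range(width)]


if __name__ == "__main__":
    print(splice_site_windows(1000, 1079, "+"))  # {'5ss': (995, 1025), '3ss': (1054, 1084)}
    # Toy 9 nt donor windows (3 nt exon + 6 nt intron); columns 3 and 4 hold the intronic GT.
    cols = position_counts(["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"])
    print(cols[3], cols[4])
```

The per-column counts are the input a pictogram or sequence-logo renderer draws; for very short introns the 5′ and 3′ windows overlap, which this sketch does not special-case.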
Extraction of Submissions Involving Intronic Deletions from ClinVar and LOVD
ClinVar variants whose molecular consequence contained “intron” and whose variant type was denoted “del/indel” were extracted from a transformed set of ClinVar variants available at GitHub (see Web Resources).27 Leiden Open Variation Database (LOVD) variants were extracted using its application programming interface (API) and then processed with the HGVS Python module available at GitHub28 to obtain genomic coordinates. ClinVar and LOVD variants were cross-referenced with UCSC to refine a short list of confirmed intronic variants for manual curation (Table S3).
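The selection criteria above reduce to a simple record filter. The Python sketch below is hypothetical: the field names (`molecular_consequence`, `variant_type`, `chrom`, `start`, `end`) stand in for whatever columns the transformed ClinVar table and the LOVD API responses actually expose, and de-duplication on genomic coordinates is an assumption about how the two sources were merged before manual curation. The example records are toy data.

```python
"""Hypothetical filter for intronic deletion/indel records; field names are placeholders."""
DELETION_TYPES = {"deletion", "indel"}


def is_intronic_deletion(record: dict) -> bool:
    """Molecular consequence mentions 'intron' and the variant type is a deletion or indel."""
    consequence = record.get("molecular_consequence", "").lower()
    variant_type = record.get("variant_type", "").lower()
    return "intron" in consequence and variant_type in DELETION_TYPES


def shortlist(*sources):
    """Merge record lists, keep intronic deletions, and drop duplicates by genomic coordinates."""
    seen, kept = set(), []
    for records in sources:
        for rec in records:
            key = (rec.get("chrom"), rec.get("start"), rec.get("end"))
            if is_intronic_deletion(rec) and key not in seen:
                seen.add(key)
                kept.append(rec)
    return kept


if __name__ == "__main__":
    clinvar = [  # toy records, not real submissions
        {"chrom": "1", "start": 1000, "end": 1009,
         "molecular_consequence": "intron_variant", "variant_type": "Deletion"},
        {"chrom": "1", "start": 2000, "end": 2000,
         "molecular_consequence": "missense_variant", "variant_type": "single nucleotide variant"},
    ]
    lovd = [
        {"chrom": "1", "start": 1000, "end": 1009,
         "molecular_consequence": "intron variant", "variant_type": "deletion"},
    ]
    print(len(shortlist(clinvar, lovd)))  # 1
```

Coordinate-based de-duplication only works once both sources are on the same assembly, which is why the genomic coordinates were obtained before cross-referencing.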
In Vitro Spliceosome Assembly and Splicing Studies
Splicing reactions contained 40% (v/v) HeLa nuclear extract prepared according to Dignam et al.,29 65 mM KCl, 3 mM MgCl₂, 2 mM ATP, 20 mM creatine phosphate, and 10 nM ³²P-labeled, m⁷G-capped COL6A2 (MIM: 120240) pre-mRNA, and were incubated at 30°C for the indicated times. Uniformly ³²P-labeled, m⁷G(5′)ppp(5′)G-capped pre-mRNA was synthesized in vitro by incorporation of [³²P]UTP (3,000 Ci/mmol; PerkinElmer) in a T7 run-off transcription. The antisense DNA oligonucleotide used to block the cryptic 5′SS (5′-CCAAATTCACCCTGTGTAGG-3′) was added to the splicing reaction at 1 μM final concentration. Spliceosomal complexes were analyzed on 2% native agarose gels30 after addition of heparin (final concentration 0.1 μg/μL). RNA was recovered at the indicated time points by phenol:chloroform:isoamyl alcohol (PCI) extraction, ethanol precipitated, and analyzed on a 10% polyacrylamide gel containing 6 M urea. Unspliced pre-mRNA, splicing intermediates, and products were detected using a Typhoon phosphorimager (GE Healthcare).
WT-COL6A2 and Δ28-COL6A2 DNA sequences used for in vitro splicing reactions were synthesized by Blue Heron Biotech (USA) and amplified by PCR with primers COL6A2-T7-F1 5′-ACCTAATACGACTCACTATAgggtgcccatgatgctttgagg-3′ and COL6A2-R1 5′-atgcctctgtgagaccagtcc-3′; COL6A2-T7-F1 includes a T7 promoter. PCR products were gel purified and used as templates for in vitro transcription. WT-COL6A2mut, Δ28-COL6A2mut, and AA-COL6A2mut constructs, inserted in pUC18, were synthesized by GenScript Inc. (USA). Each construct contained an upstream T7 promoter and a downstream KpnI restriction site. Vectors were linearized with KpnI, gel purified, and used as templates for in vitro transcription.
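A quick check that a forward primer carries a T7 promoter, and which nucleotides will start the transcript, can be scripted as below. This is a minimal Python sketch assuming the consensus T7 promoter core TAATACGACTCACTATA, with transcription initiating at the G immediately downstream; the function name is hypothetical. For COL6A2-T7-F1, the gene-specific (lowercase) portion begins with ggg, a favorable T7 initiation sequence.

```python
"""Hypothetical check: locate the T7 promoter in a primer and report the transcript start."""
T7_PROMOTER = "TAATACGACTCACTATA"  # consensus core; T7 RNA polymerase starts at the next G (+1)


def t7_transcript_start(primer: str) -> str:
    """Return the primer sequence downstream of the T7 promoter (the 5' end of the RNA)."""
    idx = primer.upper().find(T7_PROMOTER)
    if idx < 0:
        raise ValueError("no T7 promoter core found")
    downstream = primer[idx + len(T7_PROMOTER):]
    if not downstream.upper().startswith("G"):
        raise ValueError("T7 transcription normally initiates with G at +1")
    return downstream


if __name__ == "__main__":
    col6a2_t7_f1 = "ACCTAATACGACTCACTATAgggtgcccatgatgctttgagg"
    print(t7_transcript_start(col6a2_t7_f1))  # gggtgcccatgatgctttgagg
```

The same check applies to the pUC18 constructs once their upstream sequence is known; the KpnI linearization then defines the 3′ end of the run-off transcript.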
Results
Clinical and Genetic Findings
Family A
Family A consisted of two affected siblings, born at term to non-consanguineous parents of European descent, with a clinical history of fluctuating, severe limb-girdle muscle weakness (Figure 1D). Both siblings presented in the neonatal period, becoming floppy and weak after vaccination; the female proband required hospitalization (at age 2 months). Both siblings showed persistent fluctuations in muscle strength throughout infancy and early childhood, with facial weakness and delayed motor milestones but no speech delay or intellectual impairment. The female sibling (AII:1) required BiPAP (bilevel positive airway pressure) at 8 years of age for respiratory weakness and had recurrent infections. She lost ambulation at age 10 years and required scoliosis surgery at 15 years. Muscle biopsy at age 15 years showed mild myopathic changes (Figure 1Div). Creatine kinase (CK) levels were normal to mildly elevated (37 and 700 U/L; normal <200 U/L). On re-examination at age 25 years, she had bilateral mild ptosis, fatigable limb-girdle weakness, severe scoliosis (Figure 1Dii), and a severe restrictive ventilatory defect. Electrocardiogram and echocardiogram were normal. The male sibling (AII:2) developed lumbar lordosis at age 4 years, required a power wheelchair at 8 years, and had scoliosis surgery at age 14 years. At age 18 years, he required BiPAP for respiratory weakness, and respiratory function tests at age 20 years revealed a severe reduction in forced vital capacity (2.25 L; 45% of predicted). Serum CK levels and echocardiogram were normal.
Whole-exome sequencing of AII:1 and AII:2 revealed compound heterozygous variants in DOK7 (Figures 1Di and 2A), a gene associated with congenital myasthenic syndrome (CMS [MIM: 254300]).31 The maternal allele carried the common DOK7 exon 7 duplication32 (GRCh37:chr4:3494837_3494840dupTGCC, c.1124_1127dup [p.Ala378SerfsTer30]), reported in >140 case subjects with recessive CMS in the Leiden Open Variation Database (LOVD). Both siblings also carried a 10 base-pair (bp) deletion within DOK7 intron 1 (GRCh37:chr4:3465164_3465173del, c.54+8_54+17del), beginning at position +8 of the extended 5′ splice site, a position without significant base preference (see Figure 6A). This variant was absent from the gnomAD, Exome Variant Server (EVS), ClinVar, and LOVD databases. Paternal DNA was unavailable to confirm inheritance. The intron 1 deletion was not predicted to cause abnormal splicing of the DOK7 pre-mRNA by Alamut Visual (Interactive Biosoftware), which incorporates five splicing algorithms (SpliceSiteFinder-Like,33 MaxEntScan,34 GeneSplicer,35 Human Splicing Finder,14 and NNSPLICE).36
Figure 2.
RT-PCR Analyses of DOK7 and EMD Intronic Deletions
(A) (i) Schematic of family A DOK7 intron 1 deletion (c.54+8_54+17del), with flanking exons (colored cylinders), intervening intron sequence, and consensus splicing predictions from Alamut Visual biosoftware. Blue font: intronic deletion. Lariat branchpoint A is shown in red font. Polypyrimidine tract is italicized. (ii) Agarose gels of RT-PCR with adjacent schematic of splicing consequences and effect on encoded DOK7 protein. (iii) Sanger sequencing of gel purified RT-PCR amplicons.
(B) (i) Schematic of family B EMD intron 5 deletion (c.449+23_450−35del). (ii) Agarose gels of RT-PCR with adjacent schematic of splicing consequences and effect on the encoded emerin protein. (iii) Sanger sequencing of gel purified RT-PCR amplicons.
Figure 6.
Pictograms of Nucleotide Conservation Surrounding the 5′ and 3′ Splice Site for Introns of Different Lengths and Structural Model of the Spliceosomal B Complex
(A) Pictograms of consensus residues encompassing the 5′ and 3′ splice sites, subgrouped by intron length. Residues shown include 25 nt intron sequence (annotated as +1, 2, 3… and −1, −2, −3…) plus 5 nt flanking exon (gray shaded regions).
(B) 3D model of a human spliceosomal B complex formed on a pre-mRNA with a 120 nt intron, determined by cryo-electron microscopy; image adapted from Bertram et al.11 RNA helices formed between U6 and intron nucleotides near the 5′SS or between U2 and intron nucleotides adjacent to the branchpoint are circled. At the 5′ end of the intron, a 17 nt extended helix is formed between the U6 snRNA (via its ACAGAG box and adjacent nucleotides) and intron nucleotides downstream of the 5′SS GU. The branchpoint and upstream nucleotides form a 14 nt helix with the U2 snRNA. Within spliceosomal B complexes assembled on a 120 nt intron, these two extended helices are separated by 15 nm, which corresponds to ∼21 nt of RNA in an extended conformation.11 Thus, minimally 52 intron nucleotides are required to span the 5′SS and the branchpoint in the human B complex without altering its structure.
(C) Line drawing of the tertiary conformation of the human and Drosophila spliceosomes derived from cryo-electron microscopy and published in Bertram et al.12 Left, green: human spliceosome assembled on a MINX-M3 pre-mRNA with a 120 nt intron expressed in HeLa cells. Middle, blue: Drosophila spliceosome assembled on a Ftz-M3 pre-mRNA with a 147 nt intron expressed in Kc cells. Right, blue: Drosophila spliceosome assembled on a Zeste62 pre-mRNA with a 62 nt intron expressed in Kc cells. Note the physical repositioning of the Drosophila spliceosome head region when assembled on a short intron (Zeste62, right).
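The minimum-span estimate in Figure 6B is an addition of the three segments quoted in its legend. Written out below; the per-nucleotide rise is only what the stated 15 nm and ∼21 nt imply, not an independently reported value:

```latex
\begin{aligned}
\text{implied rise} &= \frac{15\ \text{nm}}{21\ \text{nt}} \approx 0.71\ \text{nm per nt},\\
n_{\min} &= \underbrace{17}_{\text{U6 helix}} + \underbrace{21}_{\text{extended linker}} + \underbrace{14}_{\text{U2 helix}} = 52\ \text{nt}.
\end{aligned}
```

Here the 17 nt term is the U6-intron helix downstream of the 5′SS GU and the 14 nt term is the U2-branchpoint helix, both as described for panel B.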
Family B
In family B, BII:1 was born at term to non-consanguineous parents with no family history of neuromuscular disease (Figure 1E). He presented at age 3 years with distal lower limb weakness and required surgery for ankle contractures at 9 years. Examination at 9 years showed scapulo-peroneal muscle weakness with bilateral scapular winging and reduced muscle bulk for his age. Muscle biopsy at age 10 years showed marked variation in fiber size with fiber splitting and abundant internal nuclei (Figure 1Eii). Serum CK levels were mildly elevated (585 U/L). Neuromuscular gene panel screening revealed a previously unreported hemizygous 23 bp deletion within intron 5 of EMD, beginning at position +23 (GRCh37:chrX:153609185_153609207del, c.449+23_450−35del); EMD is associated with X-linked Emery-Dreifuss muscular dystrophy (MIM: 310300) (Figures 1Ei and 2B). The hemizygous deletion was maternally inherited and was not predicted to cause abnormal splicing by Alamut Visual. This variant was absent from the gnomAD, EVS, LOVD, and ClinVar databases.
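Converting HGVS intronic positions into positions within the intron makes the size of such deletions, and the intron length that remains, explicit. A minimal Python sketch, assuming simple deletions written as c.N+a_N+b del or c.N+a_(N+1)−b del within a single intron: donor-side offsets (+) count from the first intronic nucleotide, while acceptor-side offsets (−) require the intron length (79 nt for EMD intron 5, per Figure 3A). Exon anchors are not validated, and the function name is hypothetical.

```python
"""Hypothetical sketch: map an HGVS c. intronic deletion onto 1-based intron positions."""
import re
from typing import Optional, Tuple

# c.<pos><+|-><offset>_<pos><+|-><offset>del ; accepts ASCII '-' and the Unicode minus sign.
_POS = r"(\d+)([+\-\u2212])(\d+)"
_DEL = re.compile(rf"^c\.{_POS}_{_POS}del$")


def intronic_deletion(hgvs: str, intron_length: Optional[int] = None) -> Tuple[int, int, int, Optional[int]]:
    """Return (start, end, deleted_nt, remaining_intron_nt); positions count from intron +1."""
    match = _DEL.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs}")
    positions = []
    for _, sign, offset in (match.group(1, 2, 3), match.group(4, 5, 6)):
        if sign == "+":
            positions.append(int(offset))  # donor side: +a is intron position a
        elif intron_length is None:
            raise ValueError("acceptor-side (-) offsets need the intron length")
        else:
            positions.append(intron_length - int(offset) + 1)  # acceptor side: -b
    start, end = positions
    deleted = end - start + 1
    remaining = None if intron_length is None else intron_length - deleted
    return start, end, deleted, remaining


if __name__ == "__main__":
    print(intronic_deletion("c.449+23_450\u221235del", intron_length=79))  # (23, 45, 23, 56)
    print(intronic_deletion("c.54+8_54+17del"))                            # (8, 17, 10, None)
```

Both results agree with the deletion sizes reported for families A and B (10 bp and 23 bp), and the 56 nt remainder matches the length of the BII:1 construct in Figure 3.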
Small Intronic Deletions Ablate Normal Splicing of DOK7 and EMD Genes
Reverse transcription PCR (RT-PCR) of mRNA extracted from skeletal muscle of affected individuals AII:1 (DOK7) and BII:1 (EMD) showed clear evidence of pathogenic splicing abnormalities (Figures 2A and 2B).
For AII:1, a cDNA amplicon encompassing exons 1–3 of DOK7 showed two bands (Figure 2Aii). Sanger sequencing confirmed that the upper band represents normal splicing, whereas the lower band represents an abnormally spliced mRNA using a cryptic 5′SS in exon 1 (Figure 2Aiii), derived only from the paternal allele carrying the 10 bp intron 1 deletion (determined using an informative SNP in exon 1). Use of the exon 1 cryptic 5′SS removes 24 nt from the DOK7 mRNA, causing in-frame loss of eight conserved residues within the encoded DOK7 pleckstrin homology domain (Figure 2Aii). Western blot analyses of skeletal muscle show marked reduction/near absence of DOK7 protein in AII:1 relative to age-matched control subjects (Figure 1Dv).
RT-PCR of BII:1 cDNA amplifying EMD exons 3–6 revealed absence of normally spliced mRNA (Figure 2Bii). Sanger sequencing of amplicons showed that the hemizygous 23 nt EMD intron 5 deletion primarily induced intron 5 retention or use of a cryptic 3′ splice site (3′SS) within exon 6, with exon 5 skipping as a minor species (asterisk, Figures 2Bii and 2Biii). Each abnormally spliced EMD transcript shifts the emerin reading frame, resulting in C-terminal missense amino acids and a premature stop codon. The encoded mutant forms of emerin have an abnormal lamin-binding domain and lack a transmembrane anchor. Western blot analyses confirmed deficiency of normal-sized emerin protein in muscle of the affected proband BII:1 (Figure 1Eiv).
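Whether an aberrant transcript preserves the reading frame follows from whether the net change in mRNA length is a multiple of three. A minimal Python sketch (the helper name is hypothetical), applied to the two transcripts whose lengths are stated in the text: the 24 nt DOK7 exon 1 loss, and retention of the 56 nt of EMD intron 5 that remain after the 23 nt deletion (79 − 23 nt). Exon 5 skipping and the exon 6 cryptic 3′SS are omitted because the corresponding lengths are not given here.

```python
"""Hypothetical helper: classify an mRNA length change as in-frame or frameshifting."""


def frame_effect(delta_nt: int) -> str:
    """delta_nt > 0 for inserted/retained nucleotides, < 0 for removed nucleotides."""
    if delta_nt % 3:
        return "frameshift"
    codons = abs(delta_nt) // 3
    return f"in-frame, {codons} codon(s) {'gained' if delta_nt > 0 else 'lost'}"


if __name__ == "__main__":
    print("DOK7 cryptic exon 1 5'SS:", frame_effect(-24))      # in-frame, 8 codon(s) lost
    print("EMD intron 5 retention (56 nt):", frame_effect(56))  # frameshift
```

The DOK7 result matches the eight-residue in-frame loss, and the EMD result matches the frameshift described above.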
Having confirmed abnormal splicing of DOK7 and EMD in muscle biospecimens from families A and B, but with the precise cause still unclear, we investigated the mechanistic basis for abnormal splicing in detail.
EMD Partial Splicing Is Enabled with 5′SS-Branchpoint Length of 47 nt
We derived a panel of full-length EMD genomic expression constructs modeling the 23 bp deletion identified in BII:1, then incrementally restored the deleted residues (Figure 3). Correctly spliced mRNA derived from the wild-type (WT) EMD construct (Figure 3B, lower band) is translated into full-length emerin protein (Figure 3C, WT), which is readily distinguished from the abnormal truncated emerin expressed endogenously in the primary myoblasts used for this study (Figure 3C, UnT, lower band; these myoblasts bear a hemizygous duplication within EMD exon 6, c.651_655dupGGGCC [p.Gln219ArgfsTer20]). Substantial intron 5 retention observed by RT-PCR after transfection of the WT EMD construct (Figure 3B, upper band) implies that the 79 nt intron 5 is inherently inefficiently spliced when overexpressed. Nevertheless, recapitulating the 23 nt deletion in BII:1 ablates normal splicing of EMD (Figure 3B, lane 56 nt), with concordant absence of full-length emerin on western blot (Figure 3C, lane 56 nt).
There are two plausible branchpoints within EMD intron 5: c.450−24A and c.450−37A (see Figure 3A). Our informatics analyses revealed that thymine is the least favored branchpoint nucleotide among short introns, so we mutagenized each candidate to thymine (c.450−24A>T and c.450−37A>T). Mutagenesis of c.450−24A>T potently impairs intron 5 splicing (Figure 3Bi, lane −24A>T). Low levels of residual intron 5 splicing with c.450−24A>T can be detected using a forward primer bridging exons 5 and 6 to selectively amplify normal exon 5–6 splicing, with concordant low levels of translated full-length emerin protein on western blot (Figures 3Bi and 3Ci, lane −24A>T). In contrast, mutagenesis of c.450−37A>T consistently improved intron 5 splicing efficiency slightly relative to WT, with a concordant increase in translated full-length emerin protein detected by western blot (see Figures 3Bi and 3Ci, lane −37A>T). Only extremely low levels of intron 5 splicing are observed following mutagenesis of both c.450−24A>T and c.450−37A>T, which may be enabled by inefficient use of the mutagenized branchpoint T or by use of an alternative low-efficiency branchpoint. Collectively, these data establish c.450−24A as the major branchpoint used by the spliceosome for splicing of EMD intron 5. Our evidence further suggests that use of the c.450−24A branchpoint is enhanced slightly by mutagenesis of c.450−37A>T, perhaps owing to loss of competitive binding of spliceosomal components at c.450−37 once it is mutated to T.
Incremental restoration of the 23 residues deleted in family B abruptly restores splicing and emerin protein production when intron 5 is elongated to 70 nt (Figures 3Bii and 3Cii, lane 70 nt; 5′SS-branchpoint length of 47 nt). Higher-migrating abnormal emerin species detected with intron 5 lengths of 66 and 72 nt likely correspond to proteins translated from EMD mRNA in which the retained intron 5 is in frame (Figure 3Cii, upper bands). The EMD construct in which a reverse complement 23 nt sequence replaces the residues deleted in BII:1 (Figures 3Bii and 3Cii, lane RC) was consistently spliced less efficiently than the wild-type EMD construct, perhaps due to loss of triplet G motifs implicated as intronic splicing enhancers for short introns.37 Although substitution with reverse complement sequence impairs splicing, the abrupt restoration of splicing at an intron 5 length of 70 nt argues that the primary basis for failed splicing of EMD-56, -64a/b, -66, and -68 may relate to biophysics, namely a space constraint on spliceosome assembly.
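The intron lengths and 5′SS-branchpoint lengths quoted for the Figure 3 constructs are linked by simple arithmetic once the branchpoint is fixed at c.450−24A: the branchpoint A sits at intron position L − 24 + 1, so the distance is L − 23 nt, which reproduces 47 nt for the 70 nt construct. A minimal Python sketch, assuming the constructs differ only in the restored nucleotides upstream of the branchpoint (the 64a and 64b constructs share one length); the function name is hypothetical.

```python
"""Hypothetical sketch: 5'SS-branchpoint length of the EMD intron 5 constructs."""
BRANCHPOINT_OFFSET = 24  # branchpoint A at c.450-24: 24th nucleotide upstream of exon 6


def five_ss_bp_distance(intron_length: int, offset: int = BRANCHPOINT_OFFSET) -> int:
    """Intron position of the branchpoint A, counting the first intronic nucleotide as 1."""
    return intron_length - offset + 1


if __name__ == "__main__":
    # Intron 5 lengths (nt) of the constructs in Figure 3; 64a and 64b share one length.
    for length in (56, 64, 66, 68, 70, 72, 79):
        print(f"intron {length} nt -> 5'SS-branchpoint {five_ss_bp_distance(length)} nt")
```

Under this convention the 56 nt BII:1 construct leaves 33 nt, and the step from 68 to 70 nt spans the 45 to 47 nt range quoted in the Abstract.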
The Human Spliceosome Is Unable to Assemble within, or Splice, Critically Shortened Introns
In vitro splicing assays were used to test whether a biophysical constraint precluding spliceosome assembly is the mechanistic basis for abnormal splicing. Small minigenes do not always splice in vitro, and unfortunately the spliceosome was unable to correctly assemble within, or splice, wild-type DOK7 (exons 1–2) and EMD (exons 5–6) minigenes. However, modeling a previously reported 28 nt deletion in COL6A2 intron 9,38 we found that splicing and excision of the intron 9 lariat occur efficiently for a wild-type (WT) COL6A2 pre-mRNA (exons 9–10) but fail for the Δ28-COL6A2 pre-mRNA (Figure 4Ai), with spliceosome assembly stalling in A complexes that bridge the 5′SS and branchpoint (Figure 4Bi, asterisk). As shown in Figure 4Ai, deletion of 28 nt results in formation of an abnormal 56 nt 5′ exon cleavage product for the Δ28-COL6A2 pre-mRNA (lower right, black rectangle). This appears to result from abnormal spliceosome assembly on a weak cryptic 5′SS in exon 9, 23 nt upstream of the natural 5′SS at the exon 9/intron 9 junction (see Figure 4C), as masking the exon 9 cryptic 5′SS with an antisense DNA oligonucleotide potently blocks C complex assembly (Figure 4Bi, hash). Despite detectable C complex assembly for Δ28-COL6A2 using the cryptic 5′SS (Figure 4Bi, hash), the spliceosome appears unable to excise an intron lariat (no excised splicing product is detected, Figure 4Ai).
Figure 4.
In Vitro Splicing Studies and Temporal Progression of Spliceosome Assembly within a COL6A2 Pre-mRNA Bearing a Pathogenic 28 nt Intronic Deletion
(A) Polyacrylamide gel electrophoresis of 32P-labeled COL6A2 pre-mRNAs incubated for various times with a HeLa nuclear extract containing spliceosomal components. Schematics illustrating the nature of the spliced products migrating at different molecular weights are shown. (i) WT-COL6A2 and Δ28-COL6A2 pre-mRNAs (with the native exon 9 cryptic 5′SS). The Δ28-COL6A2 deletion within intron 9 reduces the 5′SS-branchpoint length to 36 nt. (ii) WT-COL6A2mut and Δ28-COL6A2mut pre-mRNAs, with the exon 9 cryptic 5′SS mutated, and AA-COL6A2mut, in which intron 9 has been lengthened with poly(A) nucleotides, restoring the 5′SS-branchpoint length to 61 nt.
(B) Native agarose gel electrophoresis showing temporal progression of spliceosome complex assembly on COL6A2 pre-mRNAs. The migration of E/H, A, B, C, and B-activated complexes is indicated. (i) WT-COL6A2 and Δ28-COL6A2 pre-mRNAs with and without the antisense oligonucleotide masking the exon 9 cryptic 5′SS. (ii) Temporal assembly of spliceosomal complexes on WT-COL6A2mut, Δ28-COL6A2mut, and AA-COL6A2mut pre-mRNAs.
(C) Schematic of COL6A2 mini-genes employed in the in vitro splicing assays.
Mutation of the cryptic 5′SS prevents its use by the spliceosome and results in normal spliceosome assembly and splicing for the WT-COL6A2mut pre-mRNA (Figures 4Aii and 4Bii). In contrast, no splicing of Δ28-COL6A2mut is observed (and the abnormal 56 nt cleavage product is concomitantly absent; Figure 4Aii, middle), with spliceosome assembly stalled in A complexes (Figure 4Bii, middle, asterisk). Restoring the 5′SS-branchpoint length within Δ28-COL6A2mut intron 9 to 61 nt with a non-specific poly(A) sequence (AA-COL6A2mut; intron 9 length 89 nt) restores normal splicing (Figure 4Aii, right) and temporal progression of spliceosome assembly (Figure 4Bii, right). We varied the position of the adenine stretch to demonstrate position-independent rescue of spliceosome assembly. These data provide compelling evidence that failed spliceosome assembly is due primarily to distance and not to specific nucleotide composition.
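The logic of the poly(A) rescue can be illustrated with a toy model. From the numbers above, AA-COL6A2mut carries 61 − 36 = 25 added adenines, implying a 64 nt Δ28 intron (89 − 25). The sketch inserts such a spacer at several positions between the 5′SS and the branchpoint and shows that the resulting span is the same wherever the spacer goes. The intron sequence is a hypothetical placeholder, not the COL6A2 sequence, and measuring the span as the 0-based index of the branchpoint adenine (counting from the G of the 5′ GU) is an assumed convention.

```python
def insert_spacer(intron: str, bp_index: int, pos: int, spacer: str) -> tuple[str, int]:
    """Insert `spacer` before index `pos`; return (new intron, new branchpoint index).

    Index 0 is the first intronic nucleotide (the G of GU). The 5'SS-branchpoint
    span is taken to be the branchpoint's 0-based index (an assumed convention).
    """
    if not 0 < pos <= bp_index:
        raise ValueError("spacer must be inserted between the 5'SS and the branchpoint")
    return intron[:pos] + spacer + intron[pos:], bp_index + len(spacer)


# HYPOTHETICAL Δ28-like intron: 5'SS motif, 30 nt filler, branchpoint A at index 36,
# a pyrimidine tract and a CAG 3'SS, 64 nt in total.
toy_intron = "GTAAGT" + "C" * 30 + "A" + "T" * 24 + "CAG"
bp = 36
assert toy_intron[bp] == "A" and len(toy_intron) == 64

for pos in (6, 20, 35):  # spacer placed at different positions upstream of the branchpoint
    rescued, new_bp = insert_spacer(toy_intron, bp, pos, "A" * 25)
    print(pos, len(rescued), new_bp)
```

Every placement yields an 89 nt intron with the branchpoint at a 61 nt span, which is the sense in which a length-only model predicts position-independent rescue.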
ClinVar Data-Mining Identifies 23 Additional Families with Pathogenicity Likely due to Minimal 5′SS-Branchpoint Deletions
We performed informatics analyses of intronic deletions submitted to ClinVar or the Leiden Open Variation Database (LOVD) and identified ten deletions classified as pathogenic or likely pathogenic that spare the consensus extended splice sites but reduce the predicted 5′SS-branchpoint length to less than 47 nt (Figure 5). These affect ten families with DOK7 CMS,32, 37, 40, 41, 42 three families with ROGDI (MIM: 614574) Kohlschütter-Tönz syndrome (MIM: 226750),44 seven families with AMN (MIM: 605799) Imerslund-Gräsbeck syndrome (MIM: 261100),39 two families with MUTYH (MIM: 604933) colorectal adenomatous polyposis (MIM: 608456),43 and one family with COL6A2 Ullrich congenital muscular dystrophy (MIM: 254090).38 Although splicing algorithms predicted no adverse consequences, phenotypic fit and clinical suspicion prompted splicing analyses of the variants in 21 of the 23 families. In all 21 cases, aberrant splicing was confirmed to be associated with the intronic deletion.32, 38, 39, 40, 43, 44 We further identified an intronic deletion in one case of suspected MYBPC3 familial hypertrophic cardiomyopathy (RCV000151135.3, not formally classified) that reduces the 5′SS-branchpoint length below 47 nt and is likely to be abnormally spliced by this mechanism (Figure 5F). All identified intronic deletions resulting in an intron length <100 nt (including those classified as benign or VUS), together with our analyses of predicted 5′SS-branchpoint length(s), are provided in Table S3.
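The variant curation described above amounts to a simple filter. The sketch below is hypothetical: the record layout, field names, and example records are illustrative placeholders rather than the ClinVar or LOVD schemas or the variants in Figure 5, and predicted 5′SS-branchpoint lengths are assumed to have been computed upstream (for example, from branchpoint predictions). Only the per-gene family counts come from the text.

```python
from dataclasses import dataclass

# Families per gene, as enumerated in the text.
FAMILIES = {"DOK7": 10, "ROGDI": 3, "AMN": 7, "MUTYH": 2, "COL6A2": 1}
assert sum(FAMILIES.values()) == 23


@dataclass
class IntronicDeletion:
    # HYPOTHETICAL record layout; not the ClinVar or LOVD schema.
    variant_id: str
    classification: str         # e.g. "pathogenic", "likely pathogenic", "VUS", "benign"
    spares_splice_motifs: bool  # consensus extended splice sites left intact
    intron_len_after: int       # intron length after the deletion (nt)
    ss5_bp_after: int           # predicted 5'SS-branchpoint length after the deletion (nt)


def flag_space_constraint(variants, max_span=47):
    """(Likely) pathogenic deletions that spare splice motifs but shorten the
    predicted 5'SS-branchpoint length below `max_span` nt."""
    keep = {"pathogenic", "likely pathogenic"}
    return [v for v in variants
            if v.classification in keep and v.spares_splice_motifs and v.ss5_bp_after < max_span]


def short_intron_deletions(variants, max_len=100):
    """All deletions leaving an intron shorter than `max_len` nt, whatever the classification."""
    return [v for v in variants if v.intron_len_after < max_len]


# Placeholder records for illustration only.
demo = [
    IntronicDeletion("var1", "pathogenic", True, 60, 38),
    IntronicDeletion("var2", "likely pathogenic", True, 120, 96),
    IntronicDeletion("var3", "VUS", True, 58, 35),
    IntronicDeletion("var4", "pathogenic", False, 55, 30),
]
print([v.variant_id for v in flag_space_constraint(demo)])
print([v.variant_id for v in short_intron_deletions(demo)])
```

Keeping the two filters separate mirrors the text: a strict list of classified pathogenic candidates, and a broader listing of every deletion yielding a short intron regardless of classification.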
Figure 5.
Pathogenic Intronic Deletions Extracted from ClinVar or LOVD Proposed due to the 5′SS-Branchpoint Space Constraint Mechanism
Schematics depict flanking exons (colored cylinders) and intervening intron sequence. Polypyrimidine tracts are italicized, and potential lariat branchpoint adenines (predicted by Alamut Visual with scores >50) are shown in red font. Reported intronic deletions are in blue font, and splicing outcomes are depicted.
(A) AMN Imerslund-Gräsbeck syndrome, RT-PCR studies described in Tanner et al.39
(B) COL6A2 Ullrich congenital muscular dystrophy, RT-PCR studies described in Gualandi et al.38
(C) DOK7 CMS, a recurrent 15 bp intron 1 deletion identified in ten families.32, 37, 40, 41, 42
(D) MUTYH colorectal adenomatous polyposis.43 Two branchpoints with scores of 93.7 and 70.3 were confirmed to be used by the spliceosome in Mercer et al.,21 whereas the third branchpoint, shown in gray (score 84.9), was not detected in Mercer et al.21
(E) ROGDI Kohlschütter-Tönz syndrome with RT-PCR studies performed in Tucci et al.44
(F) MYBPC3 familial hypertrophic cardiomyopathy; one case without a formal classification in ClinVar (RCV000151135.3). Two branchpoints with scores of 57.0 and 50.6 were predicted, with the weaker branchpoint shown in gray. The deletion affects the +6 position of the 5′SS; however, the reduction of 5′SS-branchpoint length below the critical minimum implies a strong likelihood of abnormal splicing due to this mechanism.
Clinical Impact of Defining Space Constraint Variants
Since salbutamol treatment is known to benefit individuals with DOK7 CMS,45, 46, 47 salbutamol was initiated and titrated to 6 mg twice a day in both siblings from family A. After 6 months of salbutamol treatment, proband AII:1, who had been dependent on a motorized wheelchair for the previous 15 years, could walk 40 m independently and ∼100 m with a guided frame. On examination, her dysphonia and ptosis had improved, and lung infections were less frequent. Following salbutamol treatment, AII:2 could stand without using his hands and walk a few steps. He could mobilize with a guided frame for 40 m and was able to drive a car. Transfers between chair, bed, and shower became more feasible, and he recently managed to climb a flight of stairs.
Family B, with two affected children, now has a confirmed genetic diagnosis that has enabled prenatal genetic counseling.
Discussion
A previous study clearly established the importance of intron length for normal splicing of the rabbit β-globin gene.48 Despite preservation of critical splicing motifs, a deletion reducing intron 2 of rabbit β-globin to 30 nt ablated normal splicing, which was restored by extending intron length to ≥80 nt with polyoma or pBR322 sequences.48 However, this important observation has not been synthesized, or precisely defined, in the context of genomic medicine as a primary basis for splicing abnormalities in human genetic disorders.
Single particle cryo-electron microscopy has revealed that the human and Drosophila pre-catalytic B complexes show similar size and structural features at the level of 2-dimensional (2D) class averages when assembled on mini-genes bearing a ∼100–150 nt intron.13 However, the human spliceosome was unable to assemble on13 (or splice)49 a Drosophila mini-gene with a short 62 nt intron. Interestingly, this study revealed that the Drosophila spliceosomal B complex adopts a different conformation on short (62 nt) versus longer (147 nt) intron mini-genes, with repositioning of its head region relative to the body of the B complex.13 The inability of the human U1/U2 spliceosome to assemble a B complex within short Drosophila introns13, 49 or critically shortened human introns (EMD intron 5 in Figure 3 and COL6A2 intron 9 in Figure 4) may therefore reflect an inability to reposition the head region, as the Drosophila spliceosome does when assembled within short introns13 (see Figure 6C).
Ultra-short introns within the human genome commonly bear non-canonical essential splice sites that recruit the minor U11/U12 spliceosome50, 51, 52 and lack a defined polypyrimidine tract (analogous to Drosophila short introns, which also lack a polypyrimidine tract).49 In contrast, the ten pathogenic 5′SS-branchpoint deletions we collate herein affect introns with canonical U1/U2 splice sites. Extensive curation of three genome-wide studies of human branchpoints21, 22, 23 identifies only nine canonical GU-AG introns with plausible 5′SS-branchpoint lengths of ≤50 nt (Figure 1C and Table S2); this extreme rarity suggests atypical or specialized splicing. Figure 6A shows that the shortest human introns (60–87 nt) have several features distinct from “typical introns” (201–2,500 nt): a G-C gradient from 5′SS to 3′SS, a preference for G over A at the +3 position, and a preference for C rather than T within the polypyrimidine tract. These features, and potentially other exonic or intronic motifs or structural features unique to short introns and their flanking exons, may recruit specialized splicing co-factors to aid splicing of short introns.
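Compositional features such as those in Figure 6A can be measured per intron with a few lines of code. The sketch is hypothetical and does not reproduce the published analysis: the text does not say whether “G-C gradient” refers to combined GC content or to a shift from G to C, so G and C fractions are reported separately across a chosen number of windows from 5′SS to 3′SS, and the +3 base is read directly. The toy intron and the function names are arbitrary.

```python
def base_profile(seq: str, bases: str, n_windows: int = 3) -> list[float]:
    """Fraction of `bases` in `n_windows` consecutive, near-equal windows, 5' to 3'."""
    seq = seq.upper()
    bounds = [round(i * len(seq) / n_windows) for i in range(n_windows + 1)]
    profile = []
    for start, end in zip(bounds, bounds[1:]):
        window = seq[start:end]
        count = sum(base in bases for base in window)
        profile.append(count / len(window) if window else 0.0)
    return profile


def base_at_plus(seq: str, position: int) -> str:
    """Intronic base at 5'SS position +n, where the G of GU is +1."""
    return seq[position - 1].upper()


# Arbitrary 30 nt toy intron (not a real human intron).
toy = "GTGGGTAAGG" + "AGCTAGCTAG" + "TCCCTCCCAG"
print("G:", base_profile(toy, "G"))
print("C:", base_profile(toy, "C"))
print("+3:", base_at_plus(toy, 3))
```

Applied across many introns grouped by length, per-window fractions like these are one way to compare short and typical introns; the window count is a free choice.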
Figure 3 establishes that an EMD pre-mRNA with a 5′SS-branchpoint length of 47 nt can be spliced, albeit inefficiently, whereas one with a 5′SS-branchpoint length of 45 nt cannot. Further, our in vitro splicing studies using a COL6A2 Δ28 pre-mRNA, which models the previously reported pathogenic intronic deletion,38 suggest that the spliceosome cannot transition from A to B complexes when assembling within a critically shortened intron (Figure 4B), implying that a minimal 5′SS-branchpoint distance is required for efficient spliceosomal B complex formation. The critical distance between the 5′SS and branchpoint is determined by the 3-dimensional space between two helices formed within the spliceosomal B complex (see Figure 6B, adapted from Bertram et al.,11 helices circled). At the 5′ end of the intron, a 17 nt extended helix is formed between U6 (via its ACAGAG box and adjacent nucleotides) and intronic nucleotides downstream of the 5′SS GU, while the branchpoint and upstream nucleotides form a 14 nt helix with U2 snRNA. In B complexes formed on a pre-mRNA with a 120 nt intron, these two extended helices are separated by 15 nm (see Figure 6B), which corresponds to ∼21 nt of RNA in an extended conformation.11 Summing these three measurements (17 nt intron/U6 helix + 14 nt intron/U2 helix + 21 nt span between helices) therefore identifies 52 intronic nucleotides as the minimal span between the 5′SS and branchpoint that can encompass and bridge the two helices without altering the structure of the spliceosome. However, slightly shorter 5′SS-branchpoint lengths may be tolerated for B complex assembly (as observed for EMD at 47 nt in Figure 3), via minimal movement of the head domain with respect to the main body of the B complex and/or if the U6/intron helix were shortened by a few base pairs.
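The three structural measurements combine additively. The sketch below reproduces the 52 nt figure and makes explicit two derived quantities that the text implies but does not state: the ∼0.71 nm per nucleotide rise assumed for extended RNA (15 nm / 21 nt), and the 5 nt shortfall at the 47 nt EMD construct that would have to be accommodated by head-domain movement or a shorter U6/intron helix. The constant names are illustrative.

```python
# Measurements quoted in the text (B complex on a pre-mRNA with a 120 nt intron).
U6_INTRON_HELIX_NT = 17  # extended U6/intron helix downstream of the 5'SS GU
U2_BP_HELIX_NT = 14      # U2/branchpoint helix
INTER_HELIX_NM = 15.0    # separation between the two helices
INTER_HELIX_NT = 21      # extended RNA needed to span that separation

rise_nm_per_nt = INTER_HELIX_NM / INTER_HELIX_NT           # derived, ~0.71 nm per nt
min_span_nt = U6_INTRON_HELIX_NT + U2_BP_HELIX_NT + INTER_HELIX_NT
shortfall_at_47 = min_span_nt - 47  # nt that structural flexibility must absorb for EMD @ 47 nt

print(f"{rise_nm_per_nt:.2f} nm/nt; minimal span {min_span_nt} nt; "
      f"EMD 47 nt is {shortfall_at_47} nt short")
```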
Unfortunately, short introns were notably under-represented in three recent transcriptomic and/or informatics studies of human branchpoints.21, 22, 23 Lariat branch sites show high sequence diversity, making it very difficult to predict the precise lariat branchpoint adenine accurately, and more than one branchpoint may be used. Thus, until methods for lariat branchpoint determination improve, detection of putative minimal 5′SS-branchpoint deletions will likely rely on expert variant curators. Abnormal splicing due to biophysical constraint on spliceosome assembly may occur at different 5′SS-branchpoint distances in different genomic contexts. For example, even low levels of biophysical constraint on spliceosome assembly may enhance abnormal use of a competing cryptic 5′SS, and this may occur at 5′SS-branchpoint distances significantly longer than the 45–47 nt critical elongation that enables splicing of EMD intron 5 (Figure 3). We therefore advocate scrutiny of any deletion in a phenotypically consistent gene that reduces overall intron length to <71 nt (0.1st percentile among human introns) or 5′SS-branchpoint length to <59 nt (0.1st percentile among human introns); our data indicate extreme risk of splicing abnormalities for introns with a 5′SS-branchpoint length reduced to <50 nt. We invite submission of suspected minimal 5′SS-branchpoint deletions to our laboratory (see Web Resources for a submission link), so that we can collate them and/or experimentally confirm their splicing consequences, and prospectively define how minimal 5′SS-branchpoint length(s) may vary among different OMIM genes.
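The scrutiny thresholds above can be expressed as a small triage rule. In this minimal sketch the cutoffs (71 nt, 59 nt, 50 nt) are taken from the text, while the function names, the labels, and the nearest-rank percentile definition are assumptions made for illustration; the text does not state how the 0.1st percentiles were computed. Predicted 5′SS-branchpoint lengths are taken as inputs, so the caveats above about branchpoint prediction apply to any real use.

```python
import math


def nearest_rank_percentile(values, fraction):
    """Nearest-rank percentile; fraction=0.001 gives the 0.1st percentile."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


# Cutoffs quoted in the text (genome-wide 0.1st percentiles and the extreme-risk span).
INTRON_LEN_CUTOFF = 71
SPAN_CUTOFF = 59
EXTREME_RISK_SPAN = 50


def triage(intron_len_after: int, ss5_bp_after: int) -> str:
    """Hypothetical label for a deletion in a phenotypically consistent gene."""
    if ss5_bp_after < EXTREME_RISK_SPAN:
        return "extreme risk"
    if intron_len_after < INTRON_LEN_CUTOFF or ss5_bp_after < SPAN_CUTOFF:
        return "scrutinize"
    return "no space-constraint flag"


print(triage(64, 36))   # a Δ28-COL6A2-like span
print(triage(80, 55))
print(triage(300, 120))
```

Ordering the checks from most to least severe means a deletion is reported under the strongest criterion it meets.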
In summary, we define critical shortening of the minimal 5′SS-branchpoint length as a mechanistic basis and primary determinant of abnormal splicing in human genetic conditions. Genomic informatics pipelines currently overlook non-coding intronic deletions. Exome-sequencing pipelines capture only ∼50 nt of intron flanking each exon, so only short introns (<100 nt) can be interrogated in their entirety. However, whole-genome sequencing informatics pipelines enable genome-wide screening for potential 5′SS-branchpoint deletions, which must also be considered in the context of structural rearrangements. The 5′SS-branchpoint space constraint mechanism applies to all human introns bearing canonical splice sites that recruit the U1/U2 spliceosome (>99% of all introns) and is thus relevant across the breadth of Mendelian disorders and cancer genomics.
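The exome limitation can be made concrete with a simplified coverage model. This sketch assumes uniform capture of ∼50 nt of intron on each side of every exon, which real capture designs only approximate; the function names are hypothetical. Under this simplification an intron is visible end to end only when its length is at most twice the flank, and any longer intron leaves a central segment in which a deletion would be missed.

```python
def fully_captured_by_exome(intron_len: int, flank_nt: int = 50) -> bool:
    """True if flanking capture from both exons spans the entire intron."""
    return intron_len <= 2 * flank_nt


def uncaptured_nt(intron_len: int, flank_nt: int = 50) -> int:
    """Length of the central intronic segment reached by neither flank."""
    return max(0, intron_len - 2 * flank_nt)


for length in (64, 89, 150, 1200):
    print(length, fully_captured_by_exome(length), uncaptured_nt(length))
```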
Declaration of Interests
Professor S.T.C. is director of Frontier Genomics Pty Ltd (Australia). Frontier Genomics has not yet traded and has no current financial interests that will benefit from publication of these data. Methods to detect pathogenic intronic variants are the subject of a provisional patent application owned jointly by the University of Sydney and Sydney Children’s Hospital Network with Professor S.T.C. as named inventor. The remaining authors declare no competing interests.
Acknowledgments
This study was supported by the National Health and Medical Research Council of Australia (APP1048816 and APP1136197 S.T.C., APP1117510 N.G.L., 1080587 N.G.L., S.T.C., D.G.M.). S.J.B. is supported by a Muscular Dystrophy New South Wales PhD scholarship. WES and RNA-seq were provided by the Broad Institute of MIT and Harvard Center for Mendelian Genomics (Broad CMG) and were funded by the National Human Genome Research Institute, the National Eye Institute, and the National Heart, Lung and Blood Institute grant UM1 HG008900 to D.G.M. and Heidi Rehm. We thank the families for their invaluable contributions to this research, and the clinicians involved in their assessment and management.
Published: August 22, 2019
Footnotes
Supplemental Data can be found online at https://doi.org/10.1016/j.ajhg.2019.07.013.
Web Resources
BioPerl’s pictogram module, https://search.cpan.org/dist/BioPerl/Bio/Draw/Pictogram.pm
Exon/intron datasets, truncated fasta files, pictograms and supplemental tables, https://github.com/kidsneuro-lab/minimal_introns
hg19 fasta sequences, http://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/
HGVS python module, https://github.com/biocommons/hgvs
Leiden muscular dystrophy, http://www.dmd.nl/
Minimal intron cases submission, http://www.kidsneuroscience.org.au/minimal_intron_cases
NHLBI Exome Sequencing Project (ESP) Exome Variant Server, https://evs.gs.washington.edu/EVS/
OMIM, https://www.omim.org/
Transformed set of ClinVar variants, https://github.com/macarthur-lab/clinvar
UCSC table browser, https://genome.ucsc.edu/cgi-bin/hgTables
Supplemental Data
Table S1. Curation of 32 introns <66 nt in length extracted from NCBI RefSeq.
Table S2. Curation of human introns with 5′SS-branchpoint distance ≤50 nt.
Table S3. Curation of intronic insertions and deletions (indels) extracted from ClinVar and LOVD with resultant intron length of ≤100 bp.
References
- 1. Schofield D., Alam K., Douglas L., Shrestha R., MacArthur D.G., Davis M., Laing N.G., Clarke N.F., Burns J., Cooper S.T. Cost-effectiveness of massively parallel sequencing for diagnosis of paediatric muscle diseases. NPJ Genom. Med. 2017. doi: 10.1038/s41525-017-0006-7. Published online March 3, 2017.
- 2. Ghaoui R., Cooper S.T., Lek M., Jones K., Corbett A., Reddel S.W., Needham M., Liang C., Waddell L.B., Nicholson G. Use of Whole-Exome Sequencing for Diagnosis of Limb-Girdle Muscular Dystrophy: Outcomes and Lessons Learned. JAMA Neurol. 2015;72:1424–1432. doi: 10.1001/jamaneurol.2015.2274.
- 3. O’Grady G.L., Lek M., Lamande S.R., Waddell L., Oates E.C., Punetha J., Ghaoui R., Sandaradura S.A., Best H., Kaur S. Diagnosis and etiology of congenital muscular dystrophy: We are halfway there. Ann. Neurol. 2016;80:101–111. doi: 10.1002/ana.24687.
- 4. Scotti M.M., Swanson M.S. RNA mis-splicing in disease. Nat. Rev. Genet. 2016;17:19–32. doi: 10.1038/nrg.2015.3.
- 5. Matera A.G., Wang Z. A day in the life of the spliceosome. Nat. Rev. Mol. Cell Biol. 2014;15:108–121. doi: 10.1038/nrm3742.
- 6. Will C.L., Lührmann R. Spliceosome structure and function. Cold Spring Harb. Perspect. Biol. 2011;3:3. doi: 10.1101/cshperspect.a003707.
- 7. Sheth N., Roca X., Hastings M.L., Roeder T., Krainer A.R., Sachidanandam R. Comprehensive splice-site analysis using comparative genomics. Nucleic Acids Res. 2006;34:3955–3967. doi: 10.1093/nar/gkl556.
- 8. Yan C., Hang J., Wan R., Huang M., Wong C.C., Shi Y. Structure of a yeast spliceosome at 3.6-angstrom resolution. Science. 2015;349:1182–1191. doi: 10.1126/science.aac7629.
- 9. Yan C., Wan R., Bai R., Huang G., Shi Y. Structure of a yeast step II catalytically activated spliceosome. Science. 2017;355:149–155. doi: 10.1126/science.aak9979.
- 10. Fica S.M., Oubridge C., Galej W.P., Wilkinson M.E., Bai X.C., Newman A.J., Nagai K. Structure of a spliceosome remodelled for exon ligation. Nature. 2017;542:377–380. doi: 10.1038/nature21078.
- 11. Bertram K., Agafonov D.E., Dybkov O., Haselbach D., Leelaram M.N., Will C.L., Urlaub H., Kastner B., Lührmann R., Stark H. Cryo-EM Structure of a Pre-catalytic Human Spliceosome Primed for Activation. Cell. 2017;170:701–713.e11. doi: 10.1016/j.cell.2017.07.011.
- 12. Bertram K., Agafonov D.E., Liu W.T., Dybkov O., Will C.L., Hartmuth K., Urlaub H., Kastner B., Stark H., Lührmann R. Cryo-EM structure of a human spliceosome activated for step 2 of splicing. Nature. 2017;542:318–323. doi: 10.1038/nature21079.
- 13. Herold N., Will C.L., Wolf E., Kastner B., Urlaub H., Lührmann R. Conservation of the protein composition and electron microscopy structure of Drosophila melanogaster and human spliceosomal complexes. Mol. Cell. Biol. 2009;29:281–301. doi: 10.1128/MCB.01415-08.
- 14. Desmet F.O., Hamroun D., Lalande M., Collod-Béroud G., Claustres M., Béroud C. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res. 2009;37:e67. doi: 10.1093/nar/gkp215.
- 15. Houdayer C. In silico prediction of splice-affecting nucleotide variants. Methods Mol. Biol. 2011;760:269–281. doi: 10.1007/978-1-61779-176-5_17.
- 16. Jian X., Boerwinkle E., Liu X. In silico tools for splicing defect prediction: a survey from the viewpoint of end users. Genet. Med. 2014;16:497–503. doi: 10.1038/gim.2013.176.
- 17. Soemedi R., Cygan K.J., Rhine C.L., Wang J., Bulacan C., Yang J., Bayrak-Toydemir P., McDonald J., Fairbrother W.G. Pathogenic variants that alter protein code often disrupt splicing. Nat. Genet. 2017;49:848–855. doi: 10.1038/ng.3837.
- 18. Soukarieh O., Gaildrat P., Hamieh M., Drouet A., Baert-Desurmont S., Frébourg T., Tosi M., Martins A. Exonic Splicing Mutations Are More Prevalent than Currently Estimated and Can Be Predicted by Using In Silico Tools. PLoS Genet. 2016;12:e1005756. doi: 10.1371/journal.pgen.1005756.
- 19. Vaz-Drago R., Custódio N., Carmo-Fonseca M. Deep intronic mutations and human disease. Hum. Genet. 2017;136:1093–1111. doi: 10.1007/s00439-017-1809-4.
- 20. Zhu L., Zhang Y., Zhang W., Yang S., Chen J.Q., Tian D. Patterns of exon-intron architecture variation of genes in eukaryotic genomes. BMC Genomics. 2009;10:47. doi: 10.1186/1471-2164-10-47.
- 21. Mercer T.R., Clark M.B., Andersen S.B., Brunck M.E., Haerty W., Crawford J., Taft R.J., Nielsen L.K., Dinger M.E., Mattick J.S. Genome-wide discovery of human splicing branchpoints. Genome Res. 2015;25:290–303. doi: 10.1101/gr.182899.114.
- 22. Pineda J.M.B., Bradley R.K. Most human introns are recognized via multiple and tissue-specific branchpoints. Genes Dev. 2018;32:577–591. doi: 10.1101/gad.312058.118.
- 23. Taggart A.J., Lin C.L., Shrestha B., Heintzelman C., Kim S., Fairbrother W.G. Large-scale analysis of branchpoint usage across species and cell lines. Genome Res. 2017;27:639–649. doi: 10.1101/gr.202820.115.
- 24. Cummings B.B., Marshall J.L., Tukiainen T., Lek M., Donkervoort S., Foley A.R., Bolduc V., Waddell L.B., Sandaradura S.A., O’Grady G.L., Genotype-Tissue Expression Consortium. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. 2017;9:9. doi: 10.1126/scitranslmed.aal5209.
- 25. O’Leary N.A., Wright M.W., Brister J.R., Ciufo S., Haddad D., McVeigh R., Rajput B., Robbertse B., Smith-White B., Ako-Adjei D. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2016;44(D1):D733–D745. doi: 10.1093/nar/gkv1189.
- 26. Cooper S.T., Lo H.P., North K.N. Single section Western blot: improving the molecular diagnosis of the muscular dystrophies. Neurology. 2003;61:93–97. doi: 10.1212/01.wnl.0000069460.53438.38.
- 27. Zhang X., Minikel E.V., O’Donnell-Luria A.H., MacArthur D.G., Ware J.S., Weisburd B. ClinVar data parsing. Wellcome Open Res. 2017;2:33. doi: 10.12688/wellcomeopenres.11640.1.
- 28. Hart R.K., Rico R., Hare E., Garcia J., Westbrook J., Fusaro V.A. A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature. Bioinformatics. 2015;31:268–270. doi: 10.1093/bioinformatics/btu630.
- 29. Dignam J.D., Lebovitz R.M., Roeder R.G. Accurate transcription initiation by RNA polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res. 1983;11:1475–1489. doi: 10.1093/nar/11.5.1475.
- 30. Behzadnia N., Hartmuth K., Will C.L., Lührmann R. Functional spliceosomal A complexes can be assembled in vitro in the absence of a penta-snRNP. RNA. 2006;12:1738–1746. doi: 10.1261/rna.120606.
- 31. Beeson D., Higuchi O., Palace J., Cossins J., Spearman H., Maxwell S., Newsom-Davis J., Burke G., Fawcett P., Motomura M. Dok-7 mutations underlie a neuromuscular junction synaptopathy. Science. 2006;313:1975–1978. doi: 10.1126/science.1130837.
- 32. Selcen D., Milone M., Shen X.M., Harper C.M., Stans A.A., Wieben E.D., Engel A.G. Dok-7 myasthenia: phenotypic and molecular genetic studies in 16 patients. Ann. Neurol. 2008;64:71–87. doi: 10.1002/ana.21408.
- 33. Zhang M.Q. Statistical features of human exons and their flanking regions. Hum. Mol. Genet. 1998;7:919–932. doi: 10.1093/hmg/7.5.919.
- 34. Yeo G., Burge C.B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 2004;11:377–394. doi: 10.1089/1066527041410418.
- 35. Pertea M., Lin X., Salzberg S.L. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res. 2001;29:1185–1190. doi: 10.1093/nar/29.5.1185.
- 36. Reese M.G., Eeckman F.H., Kulp D., Haussler D. Improved splice site detection in Genie. J. Comput. Biol. 1997;4:311–323. doi: 10.1089/cmb.1997.4.311.
- 37. Schara U., Barisic N., Deschauer M., Lindberg C., Straub V., Strigl-Pill N., Wendt M., Abicht A., Müller J.S., Lochmüller H. Ephedrine therapy in eight patients with congenital myasthenic syndrome due to DOK7 mutations. Neuromuscul. Disord. 2009;19:828–832. doi: 10.1016/j.nmd.2009.09.008.
- 38. Gualandi F., Manzati E., Sabatelli P., Passarelli C., Bovolenta M., Pellegrini C., Perrone D., Squarzoni S., Pegoraro E., Bonaldo P., Ferlini A. Antisense-induced messenger depletion corrects a COL6A2 dominant mutation in Ullrich myopathy. Hum. Gene Ther. 2012;23:1313–1318. doi: 10.1089/hum.2012.109.
- 39. Tanner S.M., Sturm A.C., Baack E.C., Liyanarachchi S., de la Chapelle A. Inherited cobalamin malabsorption. Mutations in three genes reveal functional and ethnic patterns. Orphanet J. Rare Dis. 2012;7:56. doi: 10.1186/1750-1172-7-56.
- 40. Ben Ammar A., Petit F., Alexandri N., Gaudon K., Bauché S., Rouche A., Gras D., Fournier E., Koenig J., Stojkovic T. Phenotype genotype analysis in 15 patients presenting a congenital myasthenic syndrome due to mutations in DOK7. J. Neurol. 2010;257:754–766. doi: 10.1007/s00415-009-5405-y.
- 41. Cossins J., Liu W.W., Belaya K., Maxwell S., Oldridge M., Lester T., Robb S., Beeson D. The spectrum of mutations that underlie the neuromuscular junction synaptopathy in DOK7 congenital myasthenic syndrome. Hum. Mol. Genet. 2012;21:3765–3775. doi: 10.1093/hmg/dds198.
- 42. Lashley D., Palace J., Jayawant S., Robb S., Beeson D. Ephedrine treatment in congenital myasthenic syndrome due to mutations in DOK7. Neurology. 2010;74:1517–1523. doi: 10.1212/WNL.0b013e3181dd43bf.
- 43. Fostira F., Papademitriou C., Efremidis A., Yannoukakos D. An in-frame exon-skipping MUTYH mutation is associated with early-onset colorectal cancer. Dis. Colon Rectum. 2010;53:1197–1201. doi: 10.1007/DCR.0b013e3181dcf0c1.
- 44. Tucci A., Kara E., Schossig A., Wolf N.I., Plagnol V., Fawcett K., Paisán-Ruiz C., Moore M., Hernandez D., Musumeci S. Kohlschütter-Tönz syndrome: mutations in ROGDI and evidence of genetic heterogeneity. Hum. Mutat. 2013;34:296–300. doi: 10.1002/humu.22241.
- 45. Burke G., Hiscock A., Klein A., Niks E.H., Main M., Manzur A.Y., Ng J., de Vile C., Muntoni F., Beeson D., Robb S. Salbutamol benefits children with congenital myasthenic syndrome due to DOK7 mutations. Neuromuscul. Disord. 2013;23:170–175. doi: 10.1016/j.nmd.2012.11.004.
- 46. Gallenmüller C., Müller-Felber W., Dusl M., Stucka R., Guergueltcheva V., Blaschek A., von der Hagen M., Huebner A., Müller J.S., Lochmüller H., Abicht A. Salbutamol-responsive limb-girdle congenital myasthenic syndrome due to a novel missense mutation and heteroallelic deletion in MUSK. Neuromuscul. Disord. 2014;24:31–35. doi: 10.1016/j.nmd.2013.08.002.
- 47. Lorenzoni P.J., Scola R.H., Kay C.S., Filla L., Miranda A.P., Pinheiro J.M., Chaouch A., Lochmüller H., Werneck L.C. Salbutamol therapy in congenital myasthenic syndrome due to DOK7 mutation. J. Neurol. Sci. 2013;331:155–157. doi: 10.1016/j.jns.2013.05.017.
- 48. Wieringa B., Hofer E., Weissmann C. A minimal intron length but no specific internal sequence is required for splicing the large rabbit beta-globin intron. Cell. 1984;37:915–925. doi: 10.1016/0092-8674(84)90426-4.
- 49. Guo M., Lo P.C., Mount S.M. Species-specific signals for the splicing of a short Drosophila intron in vitro. Mol. Cell. Biol. 1993;13:1104–1118. doi: 10.1128/mcb.13.2.1104.
- 50. Shimada M.K., Sasaki-Haraguchi N., Mayeda A. Identification and Validation of Evolutionarily Conserved Unusually Short Pre-mRNA Introns in the Human Genome. Int. J. Mol. Sci. 2015;16:10376–10388. doi: 10.3390/ijms160510376.
- 51. Abebrese E.L., Ali S.H., Arnold Z.R., Andrews V.M., Armstrong K., Burns L., Crowder H.R., Day R.T., Jr., Hsu D.G., Jarrell K. Identification of human short introns. PLoS ONE. 2017;12:e0175393. doi: 10.1371/journal.pone.0175393.
- 52. McCullough A.J., Berget S.M. G triplets located throughout a class of small vertebrate introns enforce intron borders and regulate splice site selection. Mol. Cell. Biol. 1997;17:4562–4571. doi: 10.1128/mcb.17.8.4562.