PLOS ONE. 2020 Jul 6;15(7):e0235655. doi: 10.1371/journal.pone.0235655

Clinical interpretation of variants identified in RNU4ATAC, a non-coding spliceosomal gene

Clara Benoit-Pilven 1,2,¤,#, Alicia Besson 1,#, Audrey Putoux 1,3, Claire Benetollo 4, Clément Saccaro 1, Justine Guguin 1, Gabriel Sala 1, Audric Cologne 1,2, Marion Delous 1, Gaetan Lesca 1,3, Richard A Padgett 5, Anne-Louise Leutenegger 6, Vincent Lacroix 2, Patrick Edery 1,3,, Sylvie Mazoyer 1,‡,*
Editor: Klaus Brusgaard 7
PMCID: PMC7337319  PMID: 32628740

Abstract

Biallelic variants in RNU4ATAC, a non-coding gene transcribed into the minor spliceosome component U4atac snRNA, are responsible for three rare recessive developmental diseases, namely Taybi-Linder/MOPD1, Roifman and Lowry-Wood syndromes. Next-generation sequencing of clinically heterogeneous cohorts (children with either a suspected genetic disorder or congenital microcephaly) recently identified mutations in this gene, illustrating how profoundly these technologies are changing genetic testing and assessment. As RNU4ATAC has a single non-coding exon, the bioinformatic algorithms that predict the effect of sequence variants on splicing or protein function do not apply, which makes variant interpretation challenging for molecular diagnostic laboratories. To facilitate and improve clinical diagnostic assessment and genetic counseling, we present i) an update of the previously reported RNU4ATAC mutations and an analysis of the genetic variation affecting this gene using the Genome Aggregation Database (gnomAD) resource; ii) the performance, for predicting pathogenicity, of scores computed with an RNA structure prediction tool and of those produced by the Combined Annotation Dependent Depletion (CADD) tool for the 285 RNU4ATAC variants identified in patients or in large-scale sequencing projects; iii) a cellular assay that measures the effect of RNU4ATAC variants on the splicing efficiency of a minor (U12-type) reporter intron. Lastly, we investigated the concordance between bioinformatic predictions and cellular assay results.

Introduction

The sequences of thousands of genes involved in the aetiology of one or several Mendelian genetic diseases are routinely evaluated in patients in order to provide or confirm their diagnosis and help manage their health care. DNA variant interpretation is currently one of the major challenges of genetic testing, as diagnostic laboratories have seen a massive increase in the number of variants identified following the widespread implementation of next-generation sequencing. Recommendations for harmonising variant classification pipelines have been published: they take into account variant characteristics (i.e. the type of variant: missense, nonsense, indel, splice site; sequence conservation among species; in silico predicted consequence), epidemiological and segregation data, and functional evaluation [1]. Most genes involved in Mendelian diseases are protein-coding genes, and the main features of these recommendations therefore apply to them, despite their many different biological properties. However, for a few rare diseases, pathogenic variants have been identified in a handful of non-coding genes, one of which is the snRNA gene RNU4ATAC [2].

RNU4ATAC was first found mutated in an autosomal recessive disorder named microcephalic osteodysplastic primordial dwarfism type 1 (MOPD1, OMIM 210710) or Taybi-Linder syndrome (TALS) [3,4]. This very rare (~50 reported cases worldwide) and severe disorder is characterized by intellectual disability and multiple malformations, including severe microcephaly, cortical brain malformations (neuronal migration defects), corpus callosum agenesis/dysgenesis, dysmorphic features, dwarfism and bone anomalies. It leads to unexplained death within the first two years of life in more than 70% of published cases. Two other very rare congenital disorders, Roifman syndrome (RFMN, OMIM 616651) [5] and Lowry-Wood syndrome (LWS, OMIM 226960) [6], have subsequently been attributed to biallelic RNU4ATAC mutations. Both RFMN and LWS have features overlapping with TALS (i.e. microcephaly, growth retardation, skeletal dysplasia, intellectual disability). However, severe structural brain anomalies and early death are not observed in these two disorders, and microcephaly and growth retardation are less pronounced [7]. Conversely, because the parents of RFMN patients usually first seek medical advice for their child’s recurrent infections, immune defects have been thoroughly investigated and are well documented in this syndrome, whereas this is not the case for TALS and LWS.

Most small nuclear RNAs (snRNAs) are components of the major and/or the minor spliceosome, which remove major (also called U2-type) and minor (U12-type) introns from pre-mRNAs, respectively. Pre-mRNA splicing is a crucial step in the expression of eukaryotic genes, especially in humans, where about 97% of the ~20,000 protein-coding genes contain at least one intron. Major introns represent more than 99% of all introns (~220,000), whereas there are only ~850 minor introns, present in about 700 genes [8–10]. The two spliceosomes are highly homologous and owe their specificity to different consensus splice-site sequences as well as to different sets of snRNAs, i.e. U1, U2, U4 and U6 for the major spliceosome and U11, U12, U4atac and U6atac for the minor one [11]. In contrast, U5 snRNA and the protein components are common to both spliceosomes, apart from seven proteins specific to the minor spliceosome. Homologous snRNAs with equivalent functions in the two spliceosomes, namely U1 and U11, U2 and U12, U4 and U4atac, and U6 and U6atac, share a common secondary structure formed by intra- and/or intermolecular base-pairing despite their divergent nucleotide sequences. The U4atac/U6atac small nuclear ribonucleoprotein particle (U4atac/U6atac di-snRNP) is composed of U4atac snRNA stably base-paired with U6atac snRNA, seven Sm proteins and other particle-specific proteins. This di-snRNP then associates with the U5 snRNP to form the U4atac/U6atac.U5 tri-snRNP, a component of the pre-catalytic complex, which becomes catalytically active and able to excise the intron upon release of U4atac and pairing of U6atac with U12 [11]. Several RNU4ATAC mutations identified in TALS patients have been shown to impair minor tri-snRNP formation [12]. Furthermore, transcriptomic analyses of cells from RFMN and TALS patients revealed widespread retention of U12-type introns [5,9,13,14].

Even though TALS, RFMN and LWS are extremely rare, biallelic RNU4ATAC mutations have recently been identified during the screening of clinically heterogeneous cohorts. Indeed, one patient carrying biallelic variants was found by whole-genome sequencing of 103 undiagnosed patients from pediatric non-genetic subspecialty clinics, each with a clinical phenotype suggestive of an underlying genetic disorder [15]. Another, less unexpectedly, was found by gene panel or exome sequencing of 150 patients (104 families) with Mendelian forms of congenital microcephaly (occipitofrontal circumference below -2 SD or a reported history of microcephaly at birth) [16]. The genotypes and phenotypes of these two cases were compatible with RFMN and TALS, respectively. With the widespread use of exome and whole-genome sequencing to diagnose disorders with a suspected genetic cause, new RNU4ATAC variants will undoubtedly be identified in laboratories without expertise in this gene. Yet classifying RNU4ATAC variants to provide accurate genetic counselling is difficult, because the criteria used to predict the impact of variants in coding genes are only partially appropriate. Furthermore, functional assays are lacking and no guidelines have yet been published. To facilitate variant interpretation by diagnostic laboratories, we first reviewed the literature and compiled the RNU4ATAC variants reported as pathogenic in patients in the homozygous or compound heterozygous state. We next analysed the extent of genetic variation in this gene using the Genome Aggregation Database (gnomAD) resource. We then predicted the impact of all of these variants on the secondary structure of the U4atac/U6atac bimolecule using an available bioinformatic tool, RNAstructure, and using the widely used Combined Annotation Dependent Depletion (CADD) tool. In addition, we assessed the splicing efficiency of U4atac molecules carrying one of 24 variants using a cellular model. Finally, we compared the RNAstructure and CADD scores with the cellular assay results and with the known pathogenic status of the variants.

Methods

Variant information extraction from gnomAD

RNU4ATAC variant information was extracted from the Genome Aggregation Database v2.1 (gnomAD, http://gnomad.broadinstitute.org/) [17]. We kept the variants whose coordinates fell within the interval chr2:122,288,456–122,288,585 (GRCh37). Because the RNU4ATAC sequence is misannotated in gnomAD, we corrected the variant names by adding 1 to the RNU4ATAC nucleotide coordinate (e.g. n.50G>A was renamed n.51G>A).
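The interval filter and coordinate correction described above can be sketched in Python. This is a minimal illustration, not the authors’ script: the input (a chromosome, a GRCh37 position and a gnomAD-style n. name) is an assumed format, and shifting both ends of a range (e.g. for duplications) by +1 is our reading of the correction rule.

```python
import re

# RNU4ATAC interval from the Methods (GRCh37, 1-based, inclusive): 130 positions.
RNU4ATAC_CHROM = "2"
RNU4ATAC_START = 122_288_456
RNU4ATAC_END = 122_288_585

# Matches simple HGVS n. names such as "n.50G>A", "n.91dupT" or "n.15_99dup".
_HGVS_N = re.compile(r"^n\.(\d+)(?:_(\d+))?(.*)$")


def in_rnu4atac(chrom: str, pos: int) -> bool:
    """True if a GRCh37 position lies inside the RNU4ATAC interval."""
    return chrom.removeprefix("chr") == RNU4ATAC_CHROM and RNU4ATAC_START <= pos <= RNU4ATAC_END


def correct_name(name: str, offset: int = 1) -> str:
    """Shift every n. coordinate of a variant name by `offset` (+1 for gnomAD names)."""
    m = _HGVS_N.match(name)
    if m is None:
        raise ValueError(f"unsupported variant name: {name}")
    start, end, rest = m.groups()
    shifted = str(int(start) + offset)
    if end is not None:  # ranges (duplications, multi-base deletions): shift both ends
        shifted += "_" + str(int(end) + offset)
    return f"n.{shifted}{rest}"


# Example: the renaming quoted in the Methods.
print(correct_name("n.50G>A"))
```

`str.removeprefix` requires Python 3.9 or later.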

Bioinformatics model for large-scale U4atac/U6atac secondary structure predictions

Three variants found in patients but absent from gnomAD were added to the list, giving a total of 285 different RNU4ATAC variants for the 130 nucleotides of the RNU4ATAC gene. Our bioinformatics pipeline has three steps. First, we generated the mutated RNU4ATAC sequences for the 285 variants. Second, for each of these sequences, we used the bifold function of the RNAstructure package (version 6.1) [18] to predict the secondary structure of the bimolecule formed by wild-type U6atac and mutant U4atac. This tool predicts the bimolecular structure of two sequences folded into their lowest hybrid free energy conformation. All default parameters were used to run bifold except the "-p" parameter, which we set to 2. This option allowed us to keep all suboptimal structures with a free energy close to that of the minimum free energy (MFE) structure. Finally, the predicted bimolecules were compared with the wild-type MFE structure to compute a score based on the number of base-pairing changes caused by the variant and on the known importance for splicing of the regions whose structure is altered: score = A × 3 + B × 1 + C × 0.5, where A, B and C are the numbers of modified base pairings in regions of major, variable or limited/null importance for splicing, respectively, as defined by Merico et al. [5]. This score was computed for all suboptimal structures of each mutant U4atac, allowing us to select, among them, the structure closest to that of the wild-type U4atac/wild-type U6atac bimolecule. The final output is the list of RNU4ATAC variants with their associated scores. The scripts used for this analysis are available on GitHub: https://github.com/cbenoitp/RNU4atac_variants.
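The scoring step can be illustrated with a short Python sketch. It is not the published GitHub script and makes several assumptions: structures are given as sets of base-paired index pairs (parsing bifold output is not shown), U4atac is numbered first so the lower index of a pair is taken as the U4atac nucleotide, "modified base pairings" are counted as pairs gained or lost relative to wild type, the `region_of` mapping is a hypothetical stand-in for the Merico et al. region boundaries, and "closest" suboptimal structure is taken to mean the one with the lowest score.

```python
from typing import Iterable, Mapping

Pair = tuple[int, int]  # (i, j) indices of two paired nucleotides, i < j

# Weights from the Methods: major = 3, variable = 1, limited/null = 0.5.
WEIGHTS = {"major": 3.0, "variable": 1.0, "limited": 0.5}


def structure_score(wt: set[Pair], mutant: set[Pair], region_of: Mapping[int, str]) -> float:
    """Weighted count of base pairs gained or lost relative to the wild-type structure."""
    changed = wt ^ mutant  # symmetric difference: pairs present in only one structure
    counts = {"major": 0, "variable": 0, "limited": 0}
    for i, j in changed:
        # Assumption: the region is taken from the lower (U4atac) index of the pair;
        # positions missing from the mapping default to limited/null importance.
        counts[region_of.get(min(i, j), "limited")] += 1
    return sum(WEIGHTS[region] * n for region, n in counts.items())


def variant_score(wt: set[Pair], suboptimals: Iterable[set[Pair]], region_of: Mapping[int, str]) -> float:
    """Score of the suboptimal structure closest to wild type (lowest score)."""
    return min(structure_score(wt, s, region_of) for s in suboptimals)


# Toy example with a three-pair stem and a hypothetical region map.
wt = {(1, 10), (2, 9), (3, 8)}
regions = {3: "major", 4: "variable"}
print(variant_score(wt, [{(1, 10), (2, 9)}, wt | {(4, 7)}], regions))
```

In the toy example, losing a pair in a "major" region costs 3, whereas gaining one in a "variable" region costs 1, so the second suboptimal structure is retained.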

Combined Annotation Dependent Depletion (CADD) predictions

The CADD score combines more than 60 variant annotations into a single metric by contrasting variants that survived natural selection with simulated mutations; it therefore provides a measure of deleteriousness for single nucleotide variants and small indels (version v1.5, https://cadd.gs.washington.edu/). A low score indicates that a variant resembles commonly occurring genetic variation that poses no apparent disadvantage to an organism, whereas a high score indicates that a variant is more likely to be deleterious [19]. In this study, we used "raw" C-scores rather than "scaled" ones, as a comparison with the scores obtained for variants in protein-coding genes appeared to be of little value.

Cellular model

Plasmids

The P120 minigene reporter plasmid was constructed by R. Padgett and colleagues [20]. Briefly, it derives from the pCB6 expression vector, into which a portion of the human NOP2 (NOP2 nucleolar protein) gene has been inserted downstream of the CMV promoter. This portion consists of parts of exons 5 and 8, all of exons 6 and 7, and introns E, F (a U12-type intron) and G. In the U4atac expression vector, also constructed by R. Padgett and colleagues, the human U4atac snRNA sequence replaces the U1 snRNA sequence of a functional U1 snRNA gene cloned into the pUC13 vector [21].

Cells

Primary fibroblasts were derived from skin biopsies of a control child or of a TALS patient carrying the RNU4ATAC n.51G>A pathogenic variant in the homozygous state (TALS2 in [3]). Informed written consent for the use of these samples in research was obtained from the parents of the TALS patient and of the control child. Cells were collected, processed and stored in the Lyon University Hospital biobank (CBC Biotec). Authorisation for their collection and their use in research has been granted by the Ministry of Research, by the Comité de protection des Personnes Sud-Est IV and by the Regional Agency for Hospital Services under the number DC-2015-2566. The project has been approved by the local ethics committee of the Hospices Civils de Lyon.

Transfection

Transient transfection of the P120 minigene and U4atac snRNA expression plasmids into cultured n.51G>A TALS patient fibroblasts was performed using Lipofectamine® 2000 Reagent (Thermo Fisher Scientific) according to the manufacturer’s protocol. For these experiments, 0.06 μg of P120, 1.25 μg of the U4atac snRNA expression plasmid and 1.19 μg of empty pUC19 plasmid (added to reach the total of 2.5 μg of plasmid DNA required for optimal transfection) were added to 250,000 cells per well of a 6-well plate. To test the joint effect of two different variants, 0.625 μg of each version of the U4atac snRNA expression plasmid was co-transfected. Every transfection experiment performed to test a batch of RNU4ATAC variants included a set of cells transfected with the WT U4atac snRNA expression plasmid.
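The per-well DNA amounts above follow a simple mass balance, sketched here in Python. The helper name and plasmid labels are hypothetical; the only assumption beyond the text is that the total U4atac plasmid (1.25 μg) is split equally between versions when several are co-transfected, which reproduces the 0.625 μg stated for two variants.

```python
TOTAL_UG = 2.5    # total plasmid DNA per well (µg), as stated in the Methods
P120_UG = 0.06    # P120 reporter minigene (µg)
U4ATAC_UG = 1.25  # total U4atac expression plasmid (µg)


def transfection_mix(u4atac_versions: list[str]) -> dict[str, float]:
    """Hypothetical helper: µg of each plasmid for one well of a 6-well plate."""
    each = U4ATAC_UG / len(u4atac_versions)  # equal split between co-transfected versions
    mix = {"P120": P120_UG}
    for version in u4atac_versions:
        mix[f"pUC-U4atac {version}"] = each
    # Empty pUC19 makes up the difference: 2.5 - 0.06 - 1.25 = 1.19 µg.
    mix["pUC19 (empty)"] = round(TOTAL_UG - P120_UG - U4ATAC_UG, 2)
    return mix


print(transfection_mix(["WT"]))
print(transfection_mix(["n.51G>A", "n.55G>A"]))
```

Because the U4atac total is fixed, the filler amount is the same whether one or two U4atac versions are transfected.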

RNA extraction

Total RNA was isolated from cells 48 h post-transfection using the NucleoSpin RNA kit (Macherey-Nagel) according to the manufacturer’s instructions. An RNA Qualified DNase treatment (RQ1 DNase, Promega) was performed according to the manufacturer’s protocol to remove contaminating genomic DNA from RNA samples.

Reverse transcription

A total of 200 ng of RNA was reverse transcribed using the GoScript Reverse Transcriptase kit (Promega) and a reverse primer specific to the P120 minigene construct (R: 5’-GGA TCC TCT AGA GTC GAC C-3’), which targets the transcripts produced from the P120 plasmid specifically, without amplifying the endogenous NOP2 transcripts.

Semi-quantitative RT-PCR

Semi-quantitative RT-PCR was performed using Platinum Taq DNA polymerase (Thermo Fisher Scientific) according to the manufacturer’s instructions and a primer pair flanking the P120 U12-type intron (intron F) (F: 5’-TGA GGA ACC ATT TGT GCT GC-3’, R: 5’-ATC CGC TTG TGA ACT CGT TG-3’). The PCR products were separated by 2% agarose gel electrophoresis and stained with GelRed nucleic acid gel stain (Biotium). PCR product intensities were quantified using ImageJ. Splicing efficiency was calculated as the ratio of spliced RNA to total (spliced plus unspliced) RNA, expressed as a percentage. In addition, RT-PCR was performed with cDNA made from RNA of untransfected cells to confirm the absence of endogenous NOP2 amplification. A negative control (RT-PCR without reverse transcriptase) was also performed to confirm the absence of genomic DNA in RNA samples.
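The splicing efficiency defined above reduces to one line of arithmetic on the two band intensities, sketched here in Python. The function name is illustrative; as written, it applies no correction for the different lengths of the spliced and unspliced products, since the Methods do not mention one.

```python
def splicing_efficiency(spliced: float, unspliced: float) -> float:
    """Percent spliced = spliced / (spliced + unspliced) * 100, from ImageJ band intensities."""
    total = spliced + unspliced
    if total <= 0:
        raise ValueError("no signal in either band")
    return 100.0 * spliced / total


# Example with arbitrary intensity units.
print(splicing_efficiency(80.0, 20.0))
```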

Quantitative RT-PCR

qRT-PCR was performed in triplicate using the Rotor-Gene SYBR Green PCR kit (Qiagen) according to the manufacturer’s instructions. For these experiments, three primer pairs were used to measure relative splicing efficiency: i) a primer pair targeting P120 transcripts with unspliced intron F (F: 5’-TGA GGA ACC ATT TGT GCT GC-3’, R: 5’-GGA AAT CCC TCT CCC AAC C-3’); ii) a primer pair targeting P120 transcripts with spliced intron F (F: 5’-GGA GAT GGA GCA GGA TGC-3’, R: 5’-TCC CGC TGA GCC CCA AAA-3’); iii) a primer pair targeting all P120 transcripts (F: 5’-CAG ACC TGC AAC GAG TTC AC-3’, R: 5’-GTA TTC AGA ACG AGA CCG CC-3’). The 2^(–ΔΔCt) method was used to measure relative splicing efficiency by qPCR. We computed the fold change in the number of unspliced and spliced P120 transcripts in the mutant U4atac snRNA condition relative to the wild-type U4atac snRNA condition, normalized to the number of total P120 transcripts. The relative splicing efficiency was then calculated as the ratio of spliced RNA to unspliced plus spliced RNA, with the relative splicing efficiency in the wild-type U4atac snRNA condition set at 100%. Relative quantification of U4atac snRNA expression was performed by qRT-PCR using the primer pair F: 5’-AAC CAT CCT TTT CTT GGG GTT GC-3’, R: 5’-ATT TTT CCA AAA ATT GCA CCA AAA TAA AGC-3’. For each variant, the significance of the difference between the relative splicing efficiency and 100% was assessed using a one-tailed t test.
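The qPCR calculation above can be sketched in Python as one possible reading of the Methods, not as the authors’ exact procedure. Assumptions: each fold change is 2^(–ΔΔCt) with total P120 transcripts as the reference amplicon; because wild type then gives fold changes of 1 for both amplicons (a raw ratio of 1/(1 + 1) = 0.5), the raw ratio is divided by 0.5 so that wild type equals 100%; and only the one-sample t statistic is computed here. With SciPy, `scipy.stats.ttest_1samp(values, 100, alternative="less")` would give the corresponding one-tailed p-value.

```python
import math
import statistics


def fold_change(ct_target_mut: float, ct_total_mut: float,
                ct_target_wt: float, ct_total_wt: float) -> float:
    """2^(-ddCt) for one P120 amplicon, normalised to total P120 transcripts."""
    delta_ct_mut = ct_target_mut - ct_total_mut
    delta_ct_wt = ct_target_wt - ct_total_wt
    return 2.0 ** -(delta_ct_mut - delta_ct_wt)


def relative_splicing_efficiency(fc_spliced: float, fc_unspliced: float) -> float:
    """Spliced / (spliced + unspliced), rescaled so that wild type equals 100%.

    Assumption: wild type gives fold changes of 1 and 1, i.e. a raw ratio of 0.5.
    """
    ratio = fc_spliced / (fc_spliced + fc_unspliced)
    return 100.0 * ratio / 0.5


def one_sample_t(values: list[float], mu: float = 100.0) -> float:
    """t statistic for H0: mean relative efficiency == mu (df = n - 1)."""
    n = len(values)
    return (statistics.mean(values) - mu) / (statistics.stdev(values) / math.sqrt(n))


# Hypothetical Ct values for one mutant compared with the WT control.
fc_s = fold_change(25.0, 20.0, 24.0, 20.0)  # spliced amplicon decreases 2-fold
fc_u = fold_change(23.0, 20.0, 24.0, 20.0)  # unspliced amplicon increases 2-fold
print(relative_splicing_efficiency(fc_s, fc_u))
```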

Results

Inventory of the disease-associated RNU4ATAC variants

To date, biallelic RNU4ATAC pathogenic variants have been reported in the literature in 46 families: 31 TALS (46 patients and 8 foetuses), 11 RFMN (15 patients) and 4 LWS families (5 patients) (S1 Table). Half of these independent occurrences (23/46) are due to homozygous RNU4ATAC variants, which result (or are suspected to result) from consanguineous unions. Of the 30 different pathogenic variants identified, 29 are substitutions affecting 23 of the 131 U4atac nucleotides, and the remaining one is an 85-nt duplication (Table 1). As shown in Fig 1, most of the substitutions (26/29) are located in three regions of the U4atac/U6atac bimolecule: the intramolecular 5’ Stem-Loop structure (12 substitutions), the region forming the Stem II structure through interaction with U6atac (7 substitutions), and the single-stranded Sm protein-binding region (7 substitutions). Unsurprisingly, these regions have been characterised as the most important for U4atac maturation and function [22,23]. The 5’ Stem-Loop contains a structural K-turn motif that interacts with the 15.5K protein, which in turn recruits PRPF31 to the U4atac/U6atac-15.5K complex, followed by binding of the PRPF3/PRPF4/PPIH heterotrimer to Stem II [22–25]. The resulting snRNP complex is essential for the activation of the spliceosome prior to catalysis. The Sm protein-binding region, present in every snRNA except U6 and U6atac, allows the interaction with the seven Sm proteins required for snRNA maturation [26]. Three variants are located outside these three functionally important regions: one substitution lies just 3’ to Stem I and two lie in the intramolecular 3’ Stem-Loop. The 3’ Stem-Loop is not known to interact with any protein and has been shown in cellular assays to be less important for splicing than the 5’ Stem-Loop, but it is nevertheless suspected to function as part of the Sm protein-binding signal [22].

Table 1. Summary of the published disease-associated RNU4ATAC variants.

Variant RNA domain Number of independent patient(s) (homozygous + compound heterozygous states) Origin of the patients (number of independent cases) References
n.5A>C Stem II 1 LWS (0+1) ? (1) [6]
n.8C>A Stem II 1 LWS (0+1) Italian ? (1) [7]
n.8C>T Stem II 1 RFMN (0+1) Albanian (1) [5]
n.13C>T Stem II 4 RFMN (0+4) English (1), Italian (1), Belgian (1), ? (1) [5,37,38]
n.13C>G Stem II 1 RFMN (0+1) ? (1) [15]
n.16_100dup 1 TALS (0+1) Danish (1) [34]
n.16G>A Stem II 6 RFMN (4+2) Lebanese (1), Pakistani (1), Belgian (2) [5,13,14]
n.17G>A Stem II 1 RFMN (0+1) Tamil (1) [13]
n.29T>C 5' Stem-Loop critical region 1 TALS (0+1) Chinese (1) [15]
1 RFMN (0+1) ? (1) [39]
n.30G>A 5' Stem loop critical region 1 TALS (0+1) German (1) [4]
n.37G>A 5' Stem loop critical region 1 RFMN (0+1) English (1), Italian (1) [5]
n.40C>T NOP domain binding region 2 TALS (0+2) Danish (1), French (1) [34,40]
n.46G>A 1 TALS (1+0) Turkish (1) [41]
1 LWS (0+1, in cis with n.123G>A) ? (1) [6]
1 RFMN (0+1) Belgian (1) [14]
n.48G>A 5' Stem loop critical region 1 RFMN (0+1) Italian (1) [5]
n.50G>C 5' Stem loop critical region 1 TALS (0+1) North American (1) [3]
n.50G>A 5' Stem loop critical region 1 TALS (0+1) North American (1) [3]
n.51G>A 5' Stem loop critical region 20 TALS (14+6) Algerian (1), Turkish (2), Moroccan (2), Indian (2), North American (4), Norwegian (1), Maltese (2), Rwandan (1), French (1), Dutch (1), Chinese (1), Egyptian (2) [3,4,39,40,42,43]
1 RFMN (0+1) Lebanese (1) [5]
1 LWS (0+1) ? (1) [6]
n.53C>G 5' Stem loop critical region 1 TALS (0+1) Norwegian (1) [3]
n.53C>T 5' Stem loop critical region 1 LWS (0+1) Italian ? (1) [7]
n.55G>A 5' Stem loop critical region 6 TALS (4+2) German (1), Egyptian (2), Yemeni (1), Indian (1), ? (1) [4,16,35,44, 45]
n.66G>C 1 TALS (0+1) Egyptian (1) [45]
n.111G>A 3' Stem loop 1 TALS (0+1) German (1) [4]
1 LWS (0+1) ? (1) [6]
n.114G>C 3' Stem loop 1 LWS (0+1) Italian ? (1) [7]
n.116A>T Sm protein-binding site 1 TALS (0+1) Belgian (1), ?(1) [16,37]
1 RFMN (0+1)
n.116A>G Sm protein-binding site 1 RFMN (0+1) Tamil (1) [13]
n.116A>C Sm protein-binding site 1 RFMN (0+1) German (1) [38]
n.118T>C Sm protein-binding site 1 RFMN (0+1) Albanian (1) [5]
n.120T>G Sm protein-binding site 1 LWS (0+1) Italian ? (1) [7]
n.123G>A Sm protein-binding site 1 LWS (0+1, in cis with n.46G>A) ? (1) [6]
n.124G>A Sm protein-binding site 3 TALS (0+3) Egyptian (1), French (1), Dutch (1) [40,45]

LWS: Lowry-Wood syndrome; RFMN: Roifman syndrome; TALS: Taybi-Linder syndrome
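As a consistency check on the counts given in the text (30 variants, 29 substitutions at 23 positions, five positions carrying more than one substitution, and an 85-nt duplication), Table 1 can be tallied programmatically. This is a minimal Python sketch; the regular expressions cover only the simple HGVS forms used in the table and are not a general HGVS parser.

```python
import re
from collections import Counter

# The 30 disease-associated variants listed in Table 1.
PATHOGENIC = [
    "n.5A>C", "n.8C>A", "n.8C>T", "n.13C>T", "n.13C>G", "n.16_100dup",
    "n.16G>A", "n.17G>A", "n.29T>C", "n.30G>A", "n.37G>A", "n.40C>T",
    "n.46G>A", "n.48G>A", "n.50G>C", "n.50G>A", "n.51G>A", "n.53C>G",
    "n.53C>T", "n.55G>A", "n.66G>C", "n.111G>A", "n.114G>C", "n.116A>T",
    "n.116A>G", "n.116A>C", "n.118T>C", "n.120T>G", "n.123G>A", "n.124G>A",
]

SUBSTITUTION = re.compile(r"^n\.(\d+)[ACGT]>[ACGT]$")
DUPLICATION = re.compile(r"^n\.(\d+)_(\d+)dup$")


def summarise(names: list[str]) -> dict:
    """Count substitutions, mutated positions and positions hit more than once."""
    positions = [int(m.group(1)) for name in names if (m := SUBSTITUTION.match(name))]
    per_position = Counter(positions)
    return {
        "variants": len(names),
        "substitutions": len(positions),
        "positions": len(per_position),
        "multiply_hit": sorted(p for p, n in per_position.items() if n > 1),
    }


def duplication_length(name: str) -> int:
    """Length of an n.start_enddup duplication (both ends included)."""
    m = DUPLICATION.match(name)
    if m is None:
        raise ValueError(f"not a simple duplication: {name}")
    start, end = map(int, m.groups())
    return end - start + 1


print(summarise(PATHOGENIC))
print(duplication_length("n.16_100dup"))
```

The positions hit more than once (8, 13, 50, 53 and 116) correspond to the five nucleotides with several arrowheads in Fig 1.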

Fig 1. U4atac nucleotides mutated in TALS, RFMN and LWS patients.

Fig 1

The U4atac/U6atac bimolecule is represented, as well as its main interacting proteins (15.5K, PRPF3, PRPF4, PRPF31, PPIH, and the Sm proteins). Arrowheads point to the mutated nucleotides observed in TALS, RFMN and/or LWS. There are five nucleotides for which more than one substitution was identified (more than one arrowhead in the figure). The 85 nt-duplication (n.16_100dup) is not shown.

The most frequent pathogenic variant, n.51G>A, was identified in 22 of the 46 RNU4ATAC families, particularly in TALS families (20/31), whether they reside in Africa, Europe, the Middle East, North America or Asia. Fourteen of these TALS families, presenting the most severe form of TALS (death before 2.5 years of age), carry n.51G>A in the homozygous state, while the remaining six carry various variants on the other allele. The n.51G>A variant was also found in the compound heterozygous state in one RFMN and one LWS family. Only one other variant was found in all three RNU4ATAC-associated syndromes, namely n.46G>A (one TALS, one RFMN and one LWS family). Three additional variants were found in two syndromes: n.29T>C and n.116A>T in TALS and RFMN, and n.111G>A in TALS and LWS. The other variants are so far restricted to a single phenotype (S1 Table).

Attempts at correlating RNU4ATAC genotype with disease phenotype are hampered by the small number of patients, especially for RFMN and LWS. It is nevertheless striking that 25 of the 31 TALS families carry homozygous or compound heterozygous mutations in the 5’ Stem-Loop (20 and 5 families, respectively), while another 5 families with compound heterozygous mutations carry one mutation in this region and the other either in the Sm protein-binding site (3 families) or in the 3’ Stem-Loop (1 family), or, in one family, a large duplication. The only TALS family without a mutation in the 5’ Stem-Loop carries one mutation just one nucleotide 3’ to Stem I and the other in the 3’ Stem-Loop (S1 Table). Early lethality appears to be associated with n.51G>A, although not systematically (one n.51G>A;n.55G>A child was still alive at 6 years of age), and to a lesser extent with n.55G>A, n.50G>C and n.50G>A. Among RFMN families, it is also striking that all 11 carry one mutation in Stem II, a region never found mutated in TALS patients, as noted in the original report [5]. Two of them are homozygous for a Stem II mutation, n.16G>A, while the others carry a second mutation either in the 5’ Stem-Loop (5 families) or in the Sm protein-binding site (4 families) (S1 Table). No clear pattern emerges for the 4 LWS families, which carry diverse combinations of RNU4ATAC mutation locations (S1 Table).

RNU4ATAC variants identified in large-scale sequencing projects

To gain insight into the extent of RNU4ATAC genetic variability, we took advantage of the gnomAD resource (http://gnomad.broadinstitute.org/), using the data of version v2.1.1, derived from 15,708 whole-genome and 65,258 exome sequences [17,27]. Of note, RNU4ATAC has only about half the exome-sequencing coverage of most coding genes, because some commercially available exome capture kits target only a fraction of non-coding genes.

We listed 282 variants in RNU4ATAC: 239 substitutions affecting 123 of the 130 nucleotides of its genomic sequence, as well as 19 insertions, 16 duplications and 8 deletions ranging in size from 1 to 107 nucleotides (S2 Table). Nearly half of these variants (129/282; 46%) were found in only one or two of the ~81,000 individuals screened (allele count: 1–465; median = 3). The most frequent variant, n.23C>T (never identified in patients), is present in only 0.29% of the screened alleles, suggesting strong selective pressure against variation in this gene. This is consistent with the observation that genes encoding components of complexes involved in core biological processes, such as the spliceosome, ribosome and proteasome, are the most constrained [27]. However, when sub-population allele frequencies are considered, 23 variants have a frequency > 0.1% in at least one sub-population (S2 Table), and three have a frequency > 1% and are found in the homozygous state, namely n.58C>T (1.88% in Africans; two homozygous individuals), n.87C>T (2.99% in Ashkenazi Jews; one homozygous individual) and n.93G>A (1.19% in South Asians; six homozygous individuals) (S2 Table). Despite lower allele frequencies, three more variants were found in the homozygous state: n.110delT (0.23% in South Asians; two homozygotes), n.91dupT (0.29% in Latinos; one homozygote) and n.45A>G (0.01% in South Asians; one homozygote) (S2 Table). Of these six variants found in the homozygous state, which are the least likely to be pathogenic, four are located in the 3’ Stem-Loop, in a region considered of either limited or null importance for splicing (n.93G>A) or of variable importance (n.87C>T, n.91dupT and n.110delT); n.58C>T lies between the 5’ Stem-Loop and Stem I, and n.45A>G lies in the loop of the 5’ Stem-Loop, both regions also considered of limited or null importance for splicing [22].
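The sub-population screen described above can be expressed as a simple filter, sketched in Python. The `PopVariant` record is a hypothetical layout (gnomAD exports are structured differently), and it holds only the highest sub-population frequency and homozygote count quoted in the text for each of the six variants.

```python
from typing import NamedTuple


class PopVariant(NamedTuple):
    """Hypothetical record: a variant's highest sub-population allele frequency."""
    name: str
    subpopulation: str
    max_af: float     # allele frequency as a fraction (0.0188 == 1.88%)
    homozygotes: int


# Values quoted in the text for the six variants seen in the homozygous state.
HOMOZYGOUS_IN_GNOMAD = [
    PopVariant("n.58C>T", "African", 0.0188, 2),
    PopVariant("n.87C>T", "Ashkenazi Jewish", 0.0299, 1),
    PopVariant("n.93G>A", "South Asian", 0.0119, 6),
    PopVariant("n.110delT", "South Asian", 0.0023, 2),
    PopVariant("n.91dupT", "Latino", 0.0029, 1),
    PopVariant("n.45A>G", "South Asian", 0.0001, 1),
]


def flag(variants: list[PopVariant], min_af: float = 0.01) -> list[str]:
    """Names of variants above `min_af` in some sub-population and seen as homozygotes."""
    return [v.name for v in variants if v.max_af > min_af and v.homozygotes > 0]


print(flag(HOMOZYGOUS_IN_GNOMAD))                # the three variants above 1%
print(flag(HOMOZYGOUS_IN_GNOMAD, min_af=0.001))  # lowering the cutoff to 0.1%
```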

Of the 30 variants reported as pathogenic, 27 were identified in large-scale sequencing projects, with allele frequencies ranging from 0.0008% to 0.04% for the most frequent pathogenic variant, n.51G>A. Notably, n.51G>A is the 20th most frequent RNU4ATAC variant in gnomAD and has been identified in all sub-populations but one (Ashkenazi Jewish), at frequencies ranging from 0.03% to 0.06% (S2 Table).

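In summary, population data show that although variation in RNU4ATAC is extensive, all variants are rare at the global level (the most frequent, n.23C>T, is present in only 0.29% of alleles), even though a few exceed 1% in individual sub-populations.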

Bioinformatics predictions

RNU4ATAC is a single-exon non-coding gene, and the functional consequences of its variants cannot be predicted using traditional tools, which focus on amino acid substitutions or splicing variants. It is, however, possible to predict the impact of variants on the secondary structure of the U4atac/U6atac bimolecule using dedicated software, RNAstructure [18], which is to date the only tool that predicts the correct structure of this bimolecule. Using the bifold function of RNAstructure (http://rna.urmc.rochester.edu/RNAstructure.html) in an automated in-house pipeline (Fig 2A), we modelled the effect of the 282 variants identified in large-scale sequencing projects as well as the three pathogenic variants absent from gnomAD, namely n.8C>T, n.120T>G and n.16_100dup. To assess the structural modifications of the U4atac/U6atac bimolecule resulting from U4atac variants, we defined a scoring system that takes into account the number of base-pairing changes and the importance for splicing of the U4atac snRNA domain in which they occur (Fig 2B). The distribution of the resulting scores is shown in Fig 3A. Of the 30 disease-associated RNU4ATAC variants (S3 Table), 13 had a score ≥ 10 (43%) and 20 a score ≥ 1 (67%); of the 255 other variants, 50 had a score ≥ 10 (20%) and 138 a score ≥ 1 (54%). The 10 disease-associated variants with a null score are located in the Sm protein-binding region (6 variants), in Stem II (2 variants), and in the non-canonical base pair formed across the pentaloop of the 5’ Stem-Loop, which allows U4atac to establish more intimate interactions with PRPF31 (2 variants). Of the 6 variants found in the homozygous state in large-scale sequencing projects, only one, n.110delT, modifies the structure of the bimolecule according to RNAstructure; it has a score of 5 and resides within the central loop of the 3’ Stem-Loop, a region shown in a cellular assay to be of variable importance for splicing [22].
In summary, these findings show that predictions of structural changes in the bimolecule made with RNAstructure alone lack sensitivity for predicting RNU4ATAC variant pathogenicity, as 10 of the 30 variants identified in patients are not predicted to have structural consequences.
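The percentages above follow directly from the reported counts, as this short Python sketch shows. The first fraction is the sensitivity at each score threshold; the second is only the proportion of the other 255 variants reaching the threshold, not a false-positive rate, since those variants are not known to be benign.

```python
# Counts quoted in the text: threshold -> (disease-associated hits, other hits).
N_DISEASE, N_OTHER = 30, 255
HITS = {10: (13, 50), 1: (20, 138)}


def rates(hits: dict = HITS, n_disease: int = N_DISEASE, n_other: int = N_OTHER) -> dict:
    """Fraction of each variant group reaching each RNAstructure score threshold."""
    return {t: (d / n_disease, o / n_other) for t, (d, o) in hits.items()}


for threshold, (sens, other) in rates().items():
    print(f"score >= {threshold}: {sens:.0%} of disease-associated, {other:.0%} of other variants")
# score >= 10: 43% of disease-associated, 20% of other variants
# score >= 1: 67% of disease-associated, 54% of other variants
```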

Fig 2. Prediction of the alteration of the U4atac/U6atac bimolecule bi-dimensional structure.

Fig 2

A. Schematic diagram of the bioinformatic pipeline used to compute the scores. After the mutated RNU4ATAC sequences were generated for the 285 variants, the secondary structure of each bimolecule formed by wild-type U6atac and mutant U4atac was obtained with the bifold function of the RNAstructure package and compared with that of the wild-type bimolecule, and the differences were scored. B. Schematic diagram of the importance of U4atac regions for splicing (adapted from [5]), used to calculate the score. Score = A × 3 + B × 1 + C × 0.5, where A, B and C are the numbers of modified base pairings in the red, grey and blue regions, respectively.

Fig 3.

Fig 3

Distribution of the scores obtained with A. the bioinformatic pipeline presented in this work and B. the Combined Annotation Dependent Depletion (CADD) tool. CADD combines more than 60 variant annotations into one metric reflecting the likelihood of deleteriousness.

We also tested the Combined Annotation Dependent Depletion (CADD) tool, which can score both coding and non-coding variants, unlike most high-performing tools, which are protein-based and focus on non-synonymous variants [19,28]. CADD scores ranged from -0.44 to 2.52 across all 285 RNU4ATAC variants (mean: 1.64; median: 1.96), and from 1.60 to 2.39 for the 30 variants identified in patients (S3 Table and Fig 3B). Four variants found in the homozygous state in large-scale sequencing projects have scores lower than those of the disease-associated variants (-0.13 to 1.42), whereas the other two have scores within (1.66 for n.45A>G) or slightly above (2.42 for n.58C>T) the range of the disease-associated variants. Overall, the CADD tool appears sensitive for predicting RNU4ATAC variant pathogenicity.
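The comparison above amounts to placing a raw CADD C-score relative to the range spanned by the disease-associated variants, sketched here in Python. The function name is illustrative; the range bounds and the two example scores are those quoted in the text. Treating any score of at least 1.60 as suspicious would capture all 30 disease-associated variants, but it would also flag these two homozygous population variants.

```python
# Range of raw CADD C-scores among the 30 disease-associated variants (from the text).
PATHOGENIC_MIN, PATHOGENIC_MAX = 1.60, 2.39


def relative_to_pathogenic_range(raw_score: float) -> str:
    """Place a raw CADD C-score below, within or above the disease-associated range."""
    if raw_score < PATHOGENIC_MIN:
        return "below"
    if raw_score > PATHOGENIC_MAX:
        return "above"
    return "within"


# The two homozygous gnomAD variants whose scores are quoted in the text.
for name, score in [("n.45A>G", 1.66), ("n.58C>T", 2.42)]:
    print(name, relative_to_pathogenic_range(score))
```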

Cellular assay

To set up a functional assay that could be useful for testing RNU4ATAC variants in the diagnostic setting, we adapted a system developed to study the functional features of the U6atac and U4atac snRNAs [21] and extensively used since [4,22,29–32]. The original assay is based on the co-transfection of two minigenes into rodent cells (CHO cells). The first contains the human RNU4ATAC coding region (pUC-U4atac), while the second contains exons 5–8 as well as introns E, F (a U12-type intron) and G of the human NOP2 gene, encoding the nucleolar protein P120 (P120 minigene plasmid). To prevent the use of endogenous rodent U4atac molecules, splicing of the U12-type reporter intron in this system depends on the expression of exogenous U4atac snRNA, achieved through complementary mutations in the sequences of intron F and of human U4atac. For our cellular model, we aimed to test the functionality of RNU4ATAC variants in the context of the native sequence. To reduce interference from endogenous U4atac, we used primary fibroblasts derived from a TALS patient homozygous for n.51G>A. We transiently co-transfected these cells with the native P120 vector and pUC-U4atac constructs containing either the WT or a mutated U4atac sequence. The splicing efficiency of the U12-type intron F was then quantified by quantitative RT-PCR (qRT-PCR) (Fig 4A). We thus tested 23 substitutions, 7 of which were found in the heterozygous state in large-scale sequencing projects only and 16 of which were identified in patients (Fig 4B), as well as the pathogenic 85-nt duplication, each introduced separately into pUC-U4atac. For the seven variants found only in large-scale sequencing projects, CADD and RNAstructure gave concordant predictions for five and divergent predictions for two (S3 Table).
Three of these seven variants are localised in the 5’ Stem-Loop, one is in Stem I, one is in the single-stranded region between Stem I and the 3’ Stem-Loop, and the last two are in the 3’ Stem-Loop (Fig 4B).
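The text does not spell out how qRT-PCR signals are converted into a splicing efficiency. One common approach, shown here as a Python sketch under stated assumptions (separate qPCR assays for spliced and unspliced P120 transcripts, identical amplification factor of 2 per cycle for both), computes the spliced fraction from Ct values and expresses each variant relative to WT, which is set at 100%. The function names and Ct values are hypothetical, and this is not necessarily the procedure used by the authors.

```python
def relative_quantity(ct: float, amplification_factor: float = 2.0) -> float:
    """Template amount up to a constant, assuming a fixed amplification factor per cycle."""
    return amplification_factor ** (-ct)


def fraction_spliced(ct_spliced: float, ct_unspliced: float) -> float:
    """Spliced / (spliced + unspliced) for one sample, assuming equal primer efficiencies."""
    spliced = relative_quantity(ct_spliced)
    unspliced = relative_quantity(ct_unspliced)
    return spliced / (spliced + unspliced)


def relative_splicing_efficiency(variant_fraction: float, wt_fraction: float) -> float:
    """Variant fraction spliced as a percentage of the WT value (WT set at 100%)."""
    return 100.0 * variant_fraction / wt_fraction


# Hypothetical Ct values, for illustration only.
wt = fraction_spliced(ct_spliced=20.0, ct_unspliced=22.0)       # 0.8
variant = fraction_spliced(ct_spliced=22.0, ct_unspliced=22.0)  # 0.5
print(round(relative_splicing_efficiency(variant, wt), 1))      # 62.5
```

Normalising to WT within the same experiment, rather than reporting raw spliced fractions, removes between-experiment differences in transfection efficiency to the extent that they affect WT and variants equally.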

Fig 4.

A. Schematic representation of the cellular assay. The pUC13 vector containing the mutant or wild-type RNU4ATAC sequence and the pCB6 plasmid containing the P120 minigene reporter are transiently co-transfected into fibroblasts derived from a TALS patient homozygous for n.51G>A. The P120 minigene reporter consists of exons 5–8 and introns E, F (U12-type intron) and G of the human NOP2 gene (NOP2 nucleolar protein). Sizes are given in base pairs. The relative splicing efficiency of the F intron was determined by quantitative RT-PCR. B. Localisation of the 23 substitutions tested in the cellular assay. The RNU4ATAC genomic sequence is 130 nucleotides long, whereas the U4atac snRNA is 131 nucleotides long owing to a post-transcriptionally added adenylic acid residue at its 3′ end.

We first validated our cellular assay by comparing the splicing efficiency of the U12-type reporter intron in control and TALS fibroblasts in the absence or presence of exogenous WT U4atac. As expected, splicing efficiency was reduced in TALS fibroblasts compared with control fibroblasts (~30% versus ~80%). While splicing efficiency in control cells remained unchanged upon transfection of WT RNU4ATAC, it was partially restored in TALS fibroblasts, confirming that this cellular model is suitable for testing the effect of RNU4ATAC variants on splicing (S1A Fig). Of note, transfection of pUC-U4atac into TALS fibroblasts led to a 6- to 13-fold increase in the amount of U4atac, as shown by qRT-PCR (S1B Fig).

We next compared the splicing efficiency of the U12-type reporter intron following transfection of mutated and WT versions of pUC-U4atac, the splicing efficiency obtained with the WT sequence being set at 100% (see S3 Table and Fig 5). Sixteen of the 17 disease-associated variants led to a splicing efficiency below that obtained with WT U4atac, ranging from the level seen in untransfected cells, i.e. ~37% (n.16_100dup, n.37G>A and n.48G>A), to 88% (n.111G>A). The splicing efficiency for the most frequent pathogenic variant, n.51G>A, which is also associated with the most severe phenotypes, was 71%. Although less pronounced, these effects were consistent with those observed with the original cellular assay for three of these variants, n.51G>A, n.55G>A and n.111G>A [4]. Considering the diseases with which the studied variants are associated, variants found in RFMN patients globally had stronger effects than those identified in TALS patients, while those identified in LWS patients had the weakest effects (Fig 5A). By contrast, none of the seven variants identified only in large-scale sequencing projects affected splicing efficiency. With respect to location, the strongest effects were found for the large duplication and for variants residing in the 15.5K protein-binding site of the 5’ Stem-Loop, in the internal single-stranded region just 3’ to Stem I and in the Sm protein-binding site just 3’ to the 3’ Stem-Loop, followed by variants in Stem II, whose structure allows the binding of PRPF4 (Fig 5B and 5C). Variants located in the first stem of the 3’ and 5’ Stem-Loops, outside the 15.5K binding site, have more moderate effects. The only disease-associated variant with full splicing efficiency is n.124G>A, located in the Sm protein-binding site.
The variants identified only in large-scale sequencing projects, which had no effect in our cellular assay, are located in the loops of the 5’ and 3’ Stem-Loops, in Stem I, in the single-stranded region just 5’ to the 3’ Stem-Loop or in the first stem of the 5’ Stem-Loop.

Fig 5. Results of the cellular assay obtained with the 24 tested variants.

Variants are organised by A. associated pathology (the same variant may appear several times when found in more than one syndrome) or B. mutated region of the U4atac/U6atac bimolecule, and then by relative splicing efficiency. The strength of the effect on splicing is colour-coded using a gradient from yellow (lowest effect, high splicing efficiency) to dark red (highest effect, low splicing efficiency). C. Magnitude of the impact of variants on splicing, mapped onto the U4atac/U6atac bimolecule using the same colour code. For clarity, variant names were simplified; variants identified only in the compound heterozygous state are marked with # (all others have been identified in both homozygous and compound heterozygous states). Error bars represent the standard error of the mean (SEM) of at least three independent experiments. Statistically significant differences in splicing efficiency are indicated by asterisks: * P < 0.05; ** P < 0.01; *** P < 0.001 (one-tailed t-test). The red horizontal line indicates 100% splicing efficiency (i.e. splicing efficiency when transfecting WT RNU4ATAC).
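The caption reports mean ± SEM over at least three independent experiments and one-tailed t-tests, but does not state which t-test variant was used. The Python sketch below assumes an unpaired Welch comparison of variant versus WT replicates and computes the mean, the SEM and the Welch t statistic with the standard library only; the replicate values are hypothetical. A one-tailed P-value would then come from the t distribution with Welch-Satterthwaite degrees of freedom, for example via `scipy.stats.ttest_ind(variant, reference, equal_var=False, alternative="less")` (SciPy 1.6 or later). If each experiment is normalised so that WT equals exactly 100%, a one-sample test against 100 would be the more natural choice.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def welch_t(variant: Sequence[float], reference: Sequence[float]) -> float:
    """Welch's t statistic; negative values mean lower splicing than the reference."""
    m_var, se_var = mean_and_sem(variant)
    m_ref, se_ref = mean_and_sem(reference)
    return (m_var - m_ref) / math.sqrt(se_var ** 2 + se_ref ** 2)


# Hypothetical relative splicing efficiencies (%) from three independent experiments.
wt_runs = [100.0, 98.0, 102.0]
variant_runs = [70.0, 72.0, 71.0]

mean, sem = mean_and_sem(variant_runs)
print(f"variant: {mean:.1f} +/- {sem:.2f} % (SEM)")
print(f"Welch t = {welch_t(variant_runs, wt_runs):.2f}")
```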

Because half of the families with an RNU4ATAC-associated pathology, notably the LWS families and nearly all RFMN families, carry compound heterozygous variants, we also tested the effect of co-transfecting two RNU4ATAC minigenes carrying different variants, which allowed us to compare the splicing efficiency of combinations mimicking three RFMN and seven TALS genotypes (S2 Fig). This confirmed that the splicing deficiency of the P120 U12-type intron did not correlate with disease severity. In particular, the most severe presentation, n.51G>A homozygosity, produced intermediate splicing efficiency values.

Lastly, we investigated the concordance between bioinformatic predictions and cellular assay results by plotting P120 U12-type intron splicing efficiency against the RNAstructure-based score or the CADD score of each tested variant. No correlation was observed for the former (r² = 0.006, Fig 6A), whereas a modest correlation was found for the latter (r² = 0.303, Fig 6B).
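For readers reproducing this kind of concordance check, the sketch below computes Pearson's r and r² between per-variant prediction scores and splicing efficiencies; for a least-squares line with an intercept, r² equals the coefficient of determination of the fit. Keeping r alongside r² preserves the direction of the relationship (a higher CADD score would be expected to go with lower splicing efficiency). The function name and data points are hypothetical, not values from S3 Table.

```python
import math
from typing import Sequence, Tuple


def pearson_r_and_r2(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Pearson correlation coefficient r and its square r^2."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    r = s_xy / math.sqrt(s_xx * s_yy)
    return r, r * r


# Hypothetical (score, splicing efficiency %) pairs, for illustration only.
scores = [1.7, 1.9, 2.1, 2.3]
efficiency = [95.0, 80.0, 85.0, 50.0]
r, r2 = pearson_r_and_r2(scores, efficiency)
print(f"r = {r:.2f}, r^2 = {r2:.2f}")
```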

Fig 6.

Splicing efficiency of the U12-type reporter intron measured after transfection of RNU4ATAC variants, as a function of A. the score obtained from bioinformatic predictions of the U4atac/U6atac bimolecule structure or B. the Combined Annotation Dependent Depletion (CADD) raw score. For clarity, variant names were simplified.

Discussion

Non-coding RNAs, whose large number has been revealed by comprehensive transcriptomic studies in recent years, are generally categorised by size: long non-coding RNAs (longer than 200 nucleotides), whose functions are still poorly characterised, and short RNAs ranging from 19 to 140 nt. These short RNAs include the long-known snRNAs, rRNAs and tRNAs, which play essential roles in splicing (snRNAs) and in translation (rRNAs and tRNAs). Among the few non-coding genes found so far to be responsible for Mendelian genetic diseases, two are transcribed into snRNA components of the minor spliceosome, namely RNU4ATAC and RNU12. While most snRNA, rRNA and tRNA genes are present in the human genome in multiple copies, a genomic configuration that protects cells against the damaging effect of mutations, RNU4ATAC and RNU12 are both single-copy genes. RNU12 biallelic variants have so far been found in a single large consanguineous family whose members displayed autosomal recessive early-onset cerebellar ataxia [33], whereas RNU4ATAC biallelic variants have been reported in 46 families to date. Some of these were identified by diagnostic laboratories without specific expertise in this gene, a situation expected to arise increasingly often as diagnostic next-generation sequencing becomes widespread. Yet interpreting RNU4ATAC variants is challenging, as classical guidelines do not fully apply.

Difficulties in variant interpretation begin at the initial and crucial step of describing the identified variant, for laboratories that rely for nomenclature on variant-calling pipelines using reference genome annotation. Most non-coding RNA genes are annotated by aligning genomic sequences against entries of the Rfam database (the database of RNA families), so that numbering starts from the first nucleotide of the RNA. Yet, for unknown reasons, Homo sapiens RNU4ATAC nucleotide coordinates are inconsistent and erroneous in Rfam, Ensembl and gnomAD, with a +1 or +2 shift [chr2:122,288,457–122,288,583 or chr2:122,288,458–122,288,584 instead of chr2:122,288,456–122,288,585 (genome build GRCh37/hg19); chr2:121,530,881–121,531,007 instead of chr2:121,530,880–121,531,009 (GRCh38/hg38)]. These small errors matter because they lead to mistakes in sequence variant description. Furthermore, they shift by -1 the nomenclature of the inferred consequences of all RNU4ATAC variants listed in gnomAD (for example, variant ID “2-122288506-G-A”, rs188343279, is indicated as n.50G>A when it should be n.51G>A), which can lead to confusion or errors when variant frequency is taken into account for clinical interpretation.
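The off-by-one problem can be made concrete with a short coordinate conversion. The Python sketch below maps a chr2 genomic position to an RNU4ATAC n. position using the correct gene starts given above (122,288,456 in GRCh37; 121,530,880 in GRCh38; 130 nt in both builds). It assumes plus-strand orientation, which is consistent with the gnomAD example in the text (2-122288506-G-A corresponds to n.51G>A). The names are hypothetical; passing the shifted start 122,288,457 reproduces the erroneous n.50 label.

```python
from typing import Optional

# First transcribed nucleotide (n.1) of RNU4ATAC, as given in the text.
# Assumes plus-strand orientation, consistent with the gnomAD example
# (2-122288506-G-A = n.51G>A), so n. positions increase with genomic position.
N1_POSITION = {"GRCh37": 122_288_456, "GRCh38": 121_530_880}
GENE_LENGTH_NT = 130


def genomic_to_n(pos: int, build: str = "GRCh37", n1: Optional[int] = None) -> int:
    """Convert a chr2 genomic position to an RNU4ATAC n. coordinate.

    `n1` overrides the gene start, e.g. to reproduce an off-by-one annotation.
    """
    start = N1_POSITION[build] if n1 is None else n1
    n = pos - start + 1
    if not 1 <= n <= GENE_LENGTH_NT:
        raise ValueError(f"position {pos} lies outside RNU4ATAC ({build})")
    return n


print(genomic_to_n(122_288_506))                  # 51 -> n.51G>A (correct)
print(genomic_to_n(122_288_506, n1=122_288_457))  # 50 -> the shifted gnomAD label
```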

Knowledge of population genetic variation at a specific locus increases our ability to understand the contribution of that locus to health and disease. Allele frequency filtering, which rests on established frequency cutoffs (i.e. thresholds above which the allele frequency is considered incompatible with pathogenicity), is very helpful for appropriate genetic counselling, as it allows variants to be removed from consideration. The conservative cutoff of 1% is classically recommended for rare recessive diseases whose prevalence is ill-defined and for which genetic and allelic heterogeneity, as well as penetrance, are mostly unknown. This filtering step is of little use for RNU4ATAC, as the highest overall allele frequency among the 282 variants reported in gnomAD (0.29%) is well below this cutoff, which is not surprising given the strong sequence and secondary structure constraints on small RNAs. On closer inspection, three variants have allele frequencies > 1% in one subpopulation (n.58C>T, n.87C>T and n.93G>A in African, South Asian and Ashkenazi Jewish individuals, respectively) and have been identified in the homozygous state in large-scale sequencing projects. They should therefore be considered unlikely to be disease-causing.
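The frequency reasoning above can be written as a small filter. The Python sketch below flags a variant as unlikely to be disease-causing if its overall allele frequency exceeds the 1% cutoff, or if it exceeds the cutoff in at least one subpopulation and has been observed in the homozygous state. The record fields are hypothetical names (modelled loosely on gnomAD's per-population summaries), and the records in the example are placeholders, not real gnomAD values.

```python
from dataclasses import dataclass

AF_CUTOFF = 0.01  # conservative 1% cutoff for rare recessive disorders


@dataclass
class PopulationData:
    name: str
    overall_af: float  # allele frequency across all samples
    popmax_af: float   # highest allele frequency in any single subpopulation
    homozygotes: int   # number of homozygous individuals observed


def unlikely_disease_causing(v: PopulationData, cutoff: float = AF_CUTOFF) -> bool:
    """Flag a variant using the frequency reasoning described in the text.

    True if the overall frequency exceeds the cutoff, or if the frequency exceeds
    the cutoff in at least one subpopulation AND homozygotes have been observed.
    """
    if v.overall_af > cutoff:
        return True
    return v.popmax_af > cutoff and v.homozygotes > 0


# Placeholder records (not real gnomAD values) illustrating the three outcomes.
records = [
    PopulationData("variant_A", overall_af=0.002, popmax_af=0.015, homozygotes=3),
    PopulationData("variant_B", overall_af=0.002, popmax_af=0.015, homozygotes=0),
    PopulationData("variant_C", overall_af=0.0001, popmax_af=0.0004, homozygotes=0),
]
for rec in records:
    print(rec.name, unlikely_disease_causing(rec))
```

Requiring observed homozygotes in addition to a high subpopulation frequency mirrors the recessive inheritance of RNU4ATAC-associated diseases: a frequent variant seen only in heterozygotes remains compatible with pathogenicity.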

Whether the clinical phenotype of the carrier of a putative pathogenic variant is compatible with the disease associated with the gene is a major determinant of variant pathogenicity. However, in the era of exome and genome sequencing, molecular genetic diagnosis frequently uncovers previously unsuspected clinical heterogeneity. In the case of RNU4ATAC-associated diseases, the patients originally diagnosed with TALS had a highly homogeneous phenotypic presentation. They were found to carry the same homozygous mutation, n.51G>A, or mutations in its close vicinity in the 5’ Stem-Loop. However, the identification of new RNU4ATAC variants revealed more diverse associated phenotypes, both in terms of severity within the TALS spectrum [34,35] and through the discovery of new RNU4ATAC-associated syndromes, RFMN and LWS. In this context, new RNU4ATAC variants should not be dismissed merely because the phenotype of the carrier(s) does not strictly fit previous descriptions. Along the same lines, the delineation of the RNU4ATAC-associated syndromes will probably need to be reconsidered on the basis of RNU4ATAC genotypes as a growing number of biallelic carriers are identified.

We used two approaches to assess the consequences of RNU4ATAC variants: bioinformatic predictions and functional testing. First, we investigated the capacity of U4atac/U6atac structural predictions, made with the bifold function of the RNAstructure software, to predict the pathogenicity of RNU4ATAC variants. This tool, which is used in some but not all publications reporting pathogenic biallelic RNU4ATAC variants, had not yet been evaluated for its utility in variant classification. We found that ten of the 30 variants identified in patients are not predicted to have structural consequences: not only the six variants located in the Sm protein-binding site, a single-stranded region, but also two variants located in Stem II and two others in the 5’ Stem-Loop. This approach therefore lacks sensitivity, reflecting the fact that the structure of the U4atac/U6atac bimolecule is not the only parameter governing the stability and function of U4atac. Consequently, this structure prediction tool may help to understand why a variant is pathogenic, but cannot be used on its own to predict pathogenicity. By contrast, CADD, a widely used integrative annotation built from more than 60 genomic features, appears quite effective at identifying pathogenic variants, as all 30 variants identified in patients had high scores. The specificity of these two bioinformatic tools is difficult to assess, given that only three variants, n.58C>T, n.87C>T and n.93G>A, can be assumed to be non-pathogenic on the basis of their allele frequency > 1% in some subpopulations and their detection in the homozygous state in large-scale sequencing studies.
As for the other three variants found in the homozygous state, n.45A>G, n.91dupT and n.110delT, it may be unwise to rule out deleteriousness, given that only one or two homozygotes were identified among the ~81,000 or ~65,000 individuals screened, respectively, and that some gnomAD participants may actually have a disease (although it can be assumed that these variants in the homozygous state are not associated with the most severe TALS phenotype). These three probably non-pathogenic variants have null RNAstructure scores, but one of them, n.45A>G, has a high CADD score, suggesting high specificity for the former tool and lower specificity for the latter. It is premature at this stage to determine whether the fact that nearly two thirds of the variants identified to date have a CADD score in the same range as the pathogenic ones reflects a lack of specificity of this tool or a very high gene constraint.
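Why specificity cannot be assessed with so few presumably benign variants can be quantified with a binomial confidence interval. The Python sketch below uses the Wilson score interval (our choice of method, not one used in the text) on counts stated above: 30/30 patient variants with high CADD scores and 20/30 predicted by RNAstructure to alter the structure. The 3/3 case is a best-case illustration for a three-variant benign set. The function name is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (default ~95% coverage)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Counts from the text (sensitivity) plus a best-case 3/3 specificity illustration.
for label, k, n in [("CADD sensitivity", 30, 30),
                    ("RNAstructure sensitivity", 20, 30),
                    ("specificity, 3 of 3 correct", 3, 3)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.2f} (95% CI {low:.2f}-{high:.2f})")
```

With only three variants, even a perfect 3/3 gives a lower 95% bound of roughly 0.44, whereas 30/30 still bounds sensitivity above roughly 0.89.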

Second, we developed a functional cellular test that assays the splicing activity of mutant U4atac snRNA on a U12-type test intron, the 99-bp intron F of the NOP2 gene encoding the human nucleolar protein P120, in fibroblasts from a TALS patient homozygous for the n.51G>A mutation. The NOP2 minor intron appeared to be a suitable test intron, as we found that its splicing is systematically impaired, albeit to a small extent, in fibroblasts derived from TALS patients, unlike other, less sensitive minor introns (http://lbbe-shiny.univ-lyon1.fr/TALS-RNAseq/) [9]. All but one of the variants identified in patients were associated with lower splicing efficiency than that obtained with transfected WT U4atac snRNA, the exception being n.124G>A, a variant in the Sm protein-binding site. Binding of the Sm protein complex has been shown to be an essential step in snRNP assembly and is required for snRNA stability. Indeed, n.124G>A leads to a large reduction in the amount of U4atac snRNA in transfected CHO cells [12], but this reduction may be counteracted in our system by the marked increase in U4atac levels (6- to 13-fold) following transfection of RNU4ATAC plasmids (S1 Fig). Interestingly, the extent of minor splicing impairment of the test intron in our assay did not reflect the severity of the disease observed in carriers of the tested variants, which is not surprising given the various degrees of retention of the hundreds of minor introns we recorded in cells derived from TALS patients [9]. On the other hand, the seven RNU4ATAC variants never associated with a phenotype did not affect splicing efficiency in our functional test, including the four predicted by RNAstructure to have a strong impact in regions important for splicing, three of which also had a high CADD score.
Again, this absence of effect could stem from the fact that we test the splicing of only one U12-type intron, whereas we showed that the level of minor intron retention among the 699 human genes containing a U12-type intron varies with the gene, the cell type and the RNU4ATAC genotype of TALS patients [9]. Therefore, a negative result in our cellular assay does not necessarily mean that the tested variant has no effect on minor splicing. In addition to the previously discussed limitations of U4atac/U6atac secondary structure predictions, the poor concordance between these predictions and the cellular test results could also be due to the fact that an important parameter, the stability of the U4atac/U6atac di-snRNP, is not taken into account by the RNAstructure software. Stability is nevertheless a very important factor, which likely explains the strong effect of the n.66G>C variant located just 3’ to Stem I. Indeed, this variant lengthens and therefore stabilises Stem I, which most probably impairs the U4atac ejection needed for the activation of U6atac [36].

In this work, we 1) provided all the available information relevant to RNU4ATAC sequence variant classification in clinical molecular diagnostics, 2) evaluated for the first time the clinical utility of a structure-based prediction tool and of CADD, and 3) presented a cellular assay that functionally assesses the variants identified in patients. This functional test can help move variants from the “likely pathogenic” to the “pathogenic” class, leading to greater confidence in patient reporting and clinical management. Taken together, these data and tools should help diagnostic laboratories ensure consistent genetic counselling.

Supporting information

S1 Fig. Cellular assay validation.

A. Splicing efficiency of the U12-type reporter intron in control or TALS fibroblasts, non-transfected (-) or transfected (+) with the P120 and WT pUC-U4atac plasmids, as indicated. Error bars represent the standard error of the mean (SEM) of at least three independent experiments. Statistically significant differences in splicing efficiency are indicated by an asterisk (P < 0.05). PCR products amplified from unspliced and spliced transcripts are shown. B. Relative amount of U4atac snRNA in TALS fibroblasts, non-transfected (-) or transfected (+) with the P120 and WT pUC-U4atac plasmids, as indicated.

(TIF)

S2 Fig. Results of the cellular assay obtained after transfection of a single variant or cotransfection of two variants mimicking respectively the homozygous or compound heterozygous genotypes of RFMN and TALS patients.

Error-bars represent standard error of the mean (SEM) of at least three independent experiments. Statistically significant differences in splicing efficiency are indicated by asterisks: * P-values < 0.05; ** P-values < 0.01 and *** P-values < 0.001 (one-tailed t-test). The red horizontal line indicates 100% splicing efficiency.

(TIF)

S1 Table. Census of all patients with biallelic RNU4ATAC pathogenic variants published in the literature (May 2020).

(XLSX)

S2 Table. Census of all RNU4ATAC variants present in the gnomAD resource (v2.1, January 2020).

(XLSX)

S3 Table. Scores computed based on an RNA structure prediction tool and CADD scores for the variants identified in patients and/or in large-scale sequencing projects; results from the cellular assay for the 24 RNU4ATAC variants tested.

(XLSX)

Acknowledgments

We thank the CBC Biotec biobank for biosample management (Emilie Chopin, Isabelle Rouvet), and the GENDEV team members for stimulating discussions.

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

This work was supported by CNRS, Inserm, Université Paris 7 and Université Lyon 1 through recurrent funding, the ANR Aster (no. ANR-16-CE23-0001) and U4ATAC-BRAIN (no. ANR-18CE12-0007-01) grants and an Inserm/Hospices Civils de Lyon grant to P.E. (Contrat d’Interface pour Hospitaliers). A.C. was supported by a grant from Inria (Thèse Inria-Inserm “Médecine Numérique” - 2016) and C.B.P. by a grant from the Fondation pour la Recherche Médicale to P.E. (Financement d’un ingénieur - ING20160435660). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17: 405–424. 10.1038/gim.2015.30 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.de Almeida RA, Fraczek MG, Parker S, Delneri D, O’Keefe RT. Non-coding RNAs and disease: the classical ncRNAs make a comeback. Biochem Soc Trans. 2016;44: 1073–1078. 10.1042/BST20160089 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Edery P, Marcaillou C, Sahbatou M, Labalme A, Chastang J, Touraine R, et al. Association of TALS developmental disorder with defect in minor splicing component U4atac snRNA. Science. 2011;332: 240–243. 10.1126/science.1202205 [DOI] [PubMed] [Google Scholar]
  • 4.He H, Liyanarachchi S, Akagi K, Nagy R, Li J, Dietrich RC, et al. Mutations in U4atac snRNA, a component of the minor spliceosome, in the developmental disorder MOPD I. Science. 2011;332: 238–240. 10.1126/science.1200587 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Merico D, Roifman M, Braunschweig U, Yuen RKC, Alexandrova R, Bates A, et al. Compound heterozygous mutations in the noncoding RNU4ATAC cause Roifman Syndrome by disrupting minor intron splicing. Nat Commun. 2015;6: 8718 10.1038/ncomms9718 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Farach LS, Little ME, Duker AL, Logan CV, Jackson A, Hecht JT, et al. The expanding phenotype of RNU4ATAC pathogenic variants to Lowry Wood syndrome. Am J Med Genet A. 2018;176: 465–469. 10.1002/ajmg.a.38581 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Shelihan I, Ehresmann S, Magnani C, Forzano F, Baldo C, Brunetti-Pierri N, et al. Lowry-Wood syndrome: further evidence of association with RNU4ATAC, and correlation between genotype and phenotype. Hum Genet. 2018;137: 905–909. 10.1007/s00439-018-1950-8 [DOI] [PubMed] [Google Scholar]
  • 8.Alioto TS. U12DB: a database of orthologous U12-type spliceosomal introns. Nucleic Acids Res. 2007;35: D110–5. 10.1093/nar/gkl796 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Cologne A, Benoit-Pilven C, Besson A, Putoux A, Campan-Fournier A, Bober MB, et al. New insights into minor splicing-a transcriptomic analysis of cells derived from TALS patients. RNA. 2019;25: 1130–1149. 10.1261/rna.071423.119 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Olthof AM, Hyatt KC, Kanadia RN. Minor intron splicing revisited: identification of new minor intron-containing genes and tissue-dependent retention and alternative splicing of minor introns. BMC Genomics. 2019;20: 686 10.1186/s12864-019-6046-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Turunen JJ, Niemelä EH, Verma B, Frilander MJ. The significant other: splicing by the minor spliceosome. Wiley Interdiscip Rev RNA. 2013;4: 61–76. 10.1002/wrna.1141 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Jafarifar F, Dietrich RC, Hiznay JM, Padgett RA. Biochemical defects in minor spliceosome function in the developmental disorder MOPD I. RNA. 2014;20: 1078–1089. 10.1261/rna.045187.114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Dinur Schejter Y, Ovadia A, Alexandrova R, Thiruvahindrapuram B, Pereira SL, Manson DE, et al. A homozygous mutation in the stem II domain of causes typical Roifman syndrome. NPJ Genom Med. 2017;2: 23 10.1038/s41525-017-0024-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Heremans J, Garcia-Perez JE, Turro E, Schlenner SM, Casteels I, Collin R, et al. Abnormal differentiation of B cells and megakaryocytes in patients with Roifman syndrome. J Allergy Clin Immunol. 2018;142: 630–646. 10.1016/j.jaci.2017.11.061 [DOI] [PubMed] [Google Scholar]
  • 15.Lionel AC, Costain G, Monfared N, Walker S, Reuter MS, Hosseini SM, et al. Improved diagnostic yield compared with targeted gene sequencing panels suggests a role for whole-genome sequencing as a first-tier genetic test. Genet Med. 2018;20: 435–443. 10.1038/gim.2017.119 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Shaheen R, Maddirevula S, Ewida N, Alsahli S, Abdel-Salam GMH, Zaki MS, et al. Genomic and phenotypic delineation of congenital microcephaly. Genet Med. 2019;21: 545–552. 10.1038/s41436-018-0140-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. 10.1038/s41586-020-2308-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Reuter JS, Mathews DH. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics. 2010;11: 129 10.1186/1471-2105-11-129 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46: 310–315. 10.1038/ng.2892 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Hall SL, Padgett RA. Requirement of U12 snRNA for in vivo splicing of a minor class of eukaryotic nuclear pre-mRNA introns. Science. 1996;271: 1716–1718. 10.1126/science.271.5256.1716 [DOI] [PubMed] [Google Scholar]
  • 21.Shukla GC, Padgett RA. Conservation of functional features of U6atac and U12 snRNAs between vertebrates and higher plants. RNA. 1999;5: 525–538. 10.1017/s1355838299982213 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Shukla GC, Cole AJ, Dietrich RC, Padgett RA. Domains of human U4atac snRNA required for U12-dependent splicing in vivo. Nucleic Acids Res. 2002;30: 4650–4657. 10.1093/nar/gkf609 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Nottrott S, Hartmuth K, Fabrizio P, Urlaub H, Vidovic I, Ficner R, et al. Functional interaction of a novel 15.5kD [U4/U6.U5] tri-snRNP protein with the 5’ stem-loop of U4 snRNA. EMBO J. 1999;18: 6119–6133. 10.1093/emboj/18.21.6119 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Nottrott S, Urlaub H, Lührmann R. Hierarchical, clustered protein interactions with U4/U6 snRNA: a biochemical role for U4/U6 proteins. EMBO J. 2002;21: 5527–5538. 10.1093/emboj/cdf544 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Liu S, Ghalei H, Lührmann R, Wahl MC. Structural basis for the dual U4 and U4atac snRNA-binding specificity of spliceosomal protein hPrp31. RNA. 2011;17: 1655–1663. 10.1261/rna.2690611 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Gruss OJ, Meduri R, Schilling M, Fischer U. UsnRNP biogenesis: mechanisms and regulation. Chromosoma. 2017;126: 577–593. 10.1007/s00412-017-0637-6 [DOI] [PubMed] [Google Scholar]
  • 27.Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536: 285–291. 10.1038/nature19057 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Research. 2019. pp. D886–D894. 10.1093/nar/gky1016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Shukla GC, Padgett RA. The intramolecular stem-loop structure of U6 snRNA can functionally replace the U6atac snRNA stem-loop. RNA. 2001;7: 94–105. 10.1017/s1355838201000218 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Shukla GC, Padgett RA. A catalytically active group II intron domain 5 can function in the U12-dependent spliceosome. Mol Cell. 2002;9: 1145–1150. 10.1016/s1097-2765(02)00505-1 [DOI] [PubMed] [Google Scholar]
  • 31.Shukla GC, Padgett RA. U4 small nuclear RNA can function in both the major and minor spliceosomes. Proc Natl Acad Sci U S A. 2004;101: 93–98. 10.1073/pnas.0304919101 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Dietrich RC, Padgett RA, Shukla GC. The conserved 3’ end domain of U6atac snRNA can direct U6 snRNA to the minor spliceosome. RNA. 2009;15: 1198–1207. 10.1261/rna.1505709 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Elsaid MF, Chalhoub N, Ben-Omran T, Kumar P, Kamel H, Ibrahim K, et al. Mutation in noncoding RNA RNU12 causes early onset cerebellar ataxia. Ann Neurol. 2017;81: 68–78. 10.1002/ana.24826 [DOI] [PubMed] [Google Scholar]
  • 34.Krøigård AB, Jackson AP, Bicknell LS, Baple E, Brusgaard K, Hansen LK, et al. Two novel mutations in RNU4ATAC in two siblings with an atypical mild phenotype of microcephalic osteodysplastic primordial dwarfism type 1. Clin Dysmorphol. 2016;25: 68–72. 10.1097/MCD.0000000000000110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Abdel-Salam GMH, Emam BA, Khalil YM, Abdel-Hamid MS. Long-term survival in microcephalic osteodysplastic primordial dwarfism type I: Evaluation of an 18-year-old male with g.55G>A homozygous mutation in RNU4ATAC. Am J Med Genet A. 2016;170A: 277–282. 10.1002/ajmg.a.37409 [DOI] [PubMed] [Google Scholar]
  • 36.Didychuk AL, Butcher SE, Brow DA. The life of U6 small nuclear RNA, from cradle to grave. RNA. 2018;24: 437–460. 10.1261/rna.065136.117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Bogaert DJ, Dullaers M, Kuehn HS, Leroy BP, Niemela JE, De Wilde H, et al. Early-onset primary antibody deficiency resembling common variable immunodeficiency challenges the diagnosis of Wiedeman-Steiner and Roifman syndromes. Sci Rep. 2017;7: 3702 10.1038/s41598-017-02434-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Hallermayr A, Graf J, Koehler U, Laner A, Schönfeld B, Benet-Pagès A, et al. Extending the critical regions for mutations in the non-coding gene in another patient with Roifman Syndrome. Clin Case Rep. 2018;6: 2224–2228. 10.1002/ccr3.1830 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Wang Y, Wu X, Du L, Zheng J, Deng S, Bi X, et al. Identification of compound heterozygous variants in the noncoding RNU4ATAC gene in a Chinese family with two successive foetuses with severe microcephaly. Hum Genomics. 2018;12: 3 10.1186/s40246-018-0135-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Putoux A, Alqahtani A, Pinson L, Paulussen ADC, Michel J, Besson A, et al. Refining the phenotypical and mutational spectrum of Taybi-Linder syndrome. Clin Genet. 2016;90: 550–555. 10.1111/cge.12781 [DOI] [PubMed] [Google Scholar]
  • 41.Kilic E, Yigit G, Utine GE, Wollnik B, Mihci E, Nur BG, et al. A novel mutation in RNU4ATAC in a patient with microcephalic osteodysplastic primordial dwarfism type I. Am J Med Genet A. 2015;167A: 919–921. 10.1002/ajmg.a.36955 [DOI] [PubMed] [Google Scholar]
  • 42.Abdel-Salam GMH, Abdel-Hamid MS, Hassan NA, Issa MY, Effat L, Ismail S, et al. Further delineation of the clinical spectrum in RNU4ATAC related microcephalic osteodysplastic primordial dwarfism type I. Am J Med Genet A. 2013;161A: 1875–1881. 10.1002/ajmg.a.36009 [DOI] [PubMed] [Google Scholar]
  • 43.Ferrell S, Johnson A, Pearson W. Microcephalic osteodysplastic primordial dwarfism type 1. BMJ Case Rep. 2016;2016. 10.1136/bcr-2016-215502 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Abdel-Salam GMH, Miyake N, Eid MM, Abdel-Hamid MS, Hassan NA, Eid OM, et al. A homozygous mutation in RNU4ATAC as a cause of microcephalic osteodysplastic primordial dwarfism type I (MOPD I) with associated pigmentary disorder. Am J Med Genet A. 2011;155A: 2885–2896. 10.1002/ajmg.a.34299 [DOI] [PubMed] [Google Scholar]
  • 45.Abdel-Salam GMH, Abdel-Hamid MS, Issa M, Magdy A, El-Kotoury A, Amr K. Expanding the phenotypic and mutational spectrum in microcephalic osteodysplastic primordial dwarfism type I. Am J Med Genet A. 2012: 1455–1461. doi: 10.1002/ajmg.a.35356

Decision Letter 0

Klaus Brusgaard

21 Apr 2020

PONE-D-20-06355

Clinical interpretation of variants identified in RNU4ATAC, a non-coding spliceosomal gene

PLOS ONE

Dear Dr. Sylvie Mazoyer,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The manuscript is interesting and may be of value to the scientific community, but the figures and the readability of the manuscript require serious attention.

Please carefully address the reviewers' concerns, and pay particular attention to possible endogenous background expression of RNU4ATAC.

We would appreciate receiving your revised manuscript by Jun 05 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Klaus Brusgaard

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for including the following ethics statement on the submission details page:

'Authorisation for their collection and their use in research has been granted by the Ministry of Research, by the Comité de protection des Personnes Sud-Est IV and the Regional Agency for Hospital Services under the number DC-2015-2566. The project has been approved by the local ethics committee of the Hospices Civils de Lyon.'

Please also include this information in the ethics statement in the Methods section of your manuscript.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

'This work was supported by CNRS, Inserm, Université Paris7 and Université Lyon 1 through recurrent funding, the ANR Aster (no. ANR-16-CE23-0001) and U4ATAC-BRAIN (no. ANR-18CE12-0007-01) grants and an Inserm/Hospices Civils de Lyon grant to P.E. (Contrat d'Interface pour Hospitaliers). A.C. was supported by a grant from Inria (Thèse Inria-Inserm “Médecine Numérique” - 2016) and C.B.P. by a grant from the Fondation pour la Recherche Médicale to P.E. (Financement d’un ingénieur - ING20160435660).'

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

'The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.'

  1. Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.

  2. State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

  3. If any authors received a salary from any of your funders, please state which authors and which funders.

  4. If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

  5. Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. Please amend your list of authors on the manuscript to ensure that each author is linked to an affiliation. Authors’ affiliations should reflect the institution where the work was done (if authors moved subsequently, you can also list the new affiliation stating “current affiliation:….” as necessary).

5. Your ethics statement must appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please also ensure that your ethics statement is included in your manuscript, as the ethics section of your online submission will not be published alongside your manuscript.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper looks at a very rare, deadly disease that is caused by variants in a non-coding spliceosome gene. Variant interpretation is difficult because existing methods of variant interpretation have focused on coding genes. This paper describes a survey of known variants from patients, an evaluation of allele frequencies from gnomAD, and splice efficiency assays to evaluate a subset of the variants. The results and approach should be of interest for other non-coding disease gene scenarios.

Figure 1 is impressive -- the vast majority of all variants are in known domains.

Figure 2B -- The authors have a link between "Mutate sequence" and "RNU4ATAC WT sequence" -- presumably the flow of data is from "RNU4ATAC WT sequence" to the "Mutate sequences" step. If that link is not bidirectional, it should have an arrow. The same applies to the other links, since you are using arrows for the other links.

Figure 4 fonts are difficult to read. Maybe do not use "bold."

Figure 5A/B labels are impossible to read. Figure 5B caption says "Variants are further organised by relative splicing efficiency, the stronger the effect on splicing is, the more intense the colour." Given that I can't read the labels, I don't know whether red or yellow is more intense. How do you define an intense color? The range of effect appears to be <0.4% (red) and >= 1% (yellow). Presumably yellow is the greatest effect (has the highest splicing efficiency) on splicing efficiency.

Am I able to distinguish which variants in Figure 5C correspond to variants in 5A/5B? I can't tell because I can't read the labels.

What does the red horizontal line in Figure 5A/B mean? I can see from Supp Figure 2 that the red horizontal line is at 1% efficiency. Why is the threshold important? Whoever made Supp Figure 2 needs to redo figure 5A/5B.

Supp Table 3.xlsx -> Scores for all variants -> This tab contains the "CADD phredScore."

Supp Table 3.xlsx -> Scores for disease-associated -> This tab only has the "CADD rawScore." I believe readers will be interested in the "CADD phredScore" scores for the "disease-associated" variants in the event CADD scores are updated. Can you add that column to this table?

Figure 3 and Figure 6 legend should really say you used the "raw" CADD scores.

It looks like they assayed 35 cells for splicing efficiency, against 2 cells for controls (Supp Table 3.xlsx -> Results cell.assay tab) -- each cell is a different variant -- performed in triplicate according to the methods.

Reviewer #2: Review: Clinical interpretation of variants identified in RNU4ATAC, a non-coding spliceosomal gene

Summary: In the manuscript by Mazoyer et al., the authors describe the spectrum of variants reported in the literature and then proceed to bioinformatic and cell-based assessment of variant effects. The prioritization of noncoding variants is an understudied and important area of human genetics, so this study has the potential to add significantly to the field. The authors have modified a cell-based assay and use this modified assay to validate computationally identified variants as likely pathogenic. However, there are significant concerns. The cell-based assay measures splicing of RNU4ATAC. However, the sensitivity of this assay is not established here.

Major concerns:

- What is the sensitivity of the cell assay? A) The authors have focused on a 'positive' signal (i.e. impaired splicing) in cells transfected with mutant RNU4ATAC RNA. Splicing was demonstrated to be variable (line 370), but this variability was not accounted for in the calculation of splicing efficiency. Further, there was background expression of RNU4ATAC, wildtype or mutant, which could affect some variants more than others if there are any mixed heterodimers forming. Why was an RNU4ATAC null line not used for the assay? Or specific knockdown of endogenous U4atac to ensure that the majority of splicing activity was from the transfected RNU4ATAC and not endogenous activity, as there could be compensatory upregulation of the endogenous gene expression? Similarly, is the effect of heterodimers just due to the increased amount of protein expressed in dual-transfected cells (line 396)? Why were those 23 variants selected? That information is necessary to interpret any results. And does this assay have any relevance for downstream function? What percentage decrease would actually harm a cell? The tolerance of the system to decreased splicing efficiency has not been determined in this assay, so the disease relevance is limited.

B) False positives: The authors claim that essentially all tested variants from affected individuals perturb splicing. However, a number of common variants that should not be pathogenic should also be tested. Does any variant that perturbs 'secondary structure' perturb RNA splicing? Do any loop variants affect splicing?

- Statistical analysis is needed throughout. Currently the results are largely descriptive. For example, is the difference in age at death truly different for n.51G>A (line 246)? Simple comparison of age or proportions below a certain age is necessary to support this claim. Similarly for the bioinformatic predictions, what is the enrichment for “pathogenic” vs population variants at each score threshold? The proportions must be evaluated statistically and not simply described. The CADD analysis similarly needs burden evaluation. None of Figure 5 has any statistical evaluation – which variants are truly different from reference? Were they different in all 3 replicates? This needs to be re-calculated after protein expression levels are accounted for.

- What is the scientific basis for assaying U4atac/U6atac interaction as a disease-relevant measure of variant effect? This is never substantiated and therefore has limited applicability to human disease.

- Carefully review genomic coordinates before claiming that a reference genome is incorrect. Some resources are 0-based while others are 1-based. The authors should submit firm evidence of their claims that references are truly off by 1 and it is not an expected difference in data structure. This is especially true as much of the discussion is devoted to denouncing the references (line 424-437).

- The bioinformatic approach needs to be more fully described. Why was this RNA structure tool selected? Were alternatives tried? Since the bioinformatic prioritization seems not to have shown any correlation, this null result needs to be more fully explored. A description of why the bioinformatic score was selected needs to be included. How successful and relevant is the scoring system by Merico? Were other approaches tried?

- Selective pressure described in line 283 should be substantiated. How low is the frequency compared to variation across other noncoding RNAs? What is the z-score for these variants compared to other noncoding RNA bases? Selective pressure should not be determined by the raw value alone.

- The population frequencies described in line 286-297 are highly misleading because the sub-populations mentioned are too small and thus the allele frequency should have a large confidence interval. Given that a large proportion of cases are compound heterozygous genotypes, this also seems unnecessary to dwell on.

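The 0-based versus 1-based point raised in the comment on genomic coordinates can be illustrated with a short Python sketch. It assumes the two most common conventions: BED-style intervals are 0-based and half-open, [start, end), while VCF positions are 1-based and closed. Under these conventions, the same nucleotide carries numbers that differ by one, so an apparent off-by-one between two resources is expected until the convention of each has been checked. The function names are hypothetical.

```python
def bed_to_one_based(start0: int, end0: int) -> tuple[int, int]:
    """Convert a 0-based, half-open interval [start0, end0) (BED convention)
    to a 1-based, closed interval [start1, end1] (VCF convention)."""
    if end0 <= start0:
        raise ValueError("empty or inverted interval")
    return start0 + 1, end0


def one_based_to_bed(start1: int, end1: int) -> tuple[int, int]:
    """Inverse conversion: 1-based, closed [start1, end1] to 0-based, half-open."""
    if end1 < start1:
        raise ValueError("inverted interval")
    return start1 - 1, end1


def same_base(pos0: int, pos1: int) -> bool:
    """True if a 0-based coordinate and a 1-based coordinate name the same nucleotide."""
    return pos0 + 1 == pos1


if __name__ == "__main__":
    # A single nucleotide stored in BED as chromStart=99, chromEnd=100 ...
    print(bed_to_one_based(99, 100))  # (100, 100)
    # ... is position 100 in a VCF record: a difference of one, not an error.
    print(same_base(99, 100))  # True
```

The conversion changes only the start coordinate. In the half-open 0-based form, the end already equals the 1-based position of the last base.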

Minor concerns:

- The text suffers from long sentences with many subordinate clauses. The clarity of the scientific message would benefit from rewriting. This is particularly true of the first and last paragraphs of the introduction, and all descriptions of the cellular assay.

- In all reported cases, can it be confirmed that no other potentially pathogenic variants were identified? More description is needed in the methods of the process for assessing the validity of the reported variants before proceeding to bioinformatic or functional assessment

- It is stated that all data are available, but the VCFs for the large cohorts described in the studies should be clearly available for assessment of other potentially pathogenic variants. Are those available from other publications? If so, that information should be provided in the methods

- It seems that the preferred term is MOPD1 instead of TALS, to prioritize descriptions of the pathology over names of the describing team

- A better summary of the unique and overlapping features of the three syndromes could be provided as a supplemental table

- Line 68 references unexplained deaths but there seemed to be neurological etiologies described in the other papers

- The third paragraph of the introduction needs far more references to support the scientific statements. It would be helpful to specify which tri-snRNPs are disrupted in MOPD1 (line 94-95)

- Line 123 should not have an apostrophe and gnomAD should not be capitalized (here and throughout)

- Line 133 totalising is not a word

- A specific link to the github mentioned in line 150 should be provided

- Detailed discussion of the domains of U4atac in lines 222-243 would be better placed in the introduction than in the results

- Similarly, much of the variant description in line 251-269 is highly redundant and could be summarized in a much shorter form

- Half as less is not a phrase, in line 275

- Figure 2A needs to be significantly reformatted to be linear. The zigzag structure makes it as confusing as the text

- Figure 3A would be more clear as a log-scale x-axis to minimize the white space

- Figures 4A/5C are very poor quality and cannot be read

- Figure 5 text is too small to be read

- In the discussion: noncoding genes are not only transcribed into long or short RNAs. Some encode micropeptides or have other functions. Therefore, the word "either" in line 408 is not appropriate.

- For Figure 6, are there common features of the variants with high vs low correlation? Does that show that the cellular assay has important information about some of the very low CADD scores? If those two low CADD scores are excluded, what is the new R² value? Also, the figure does not need so many decimal places.

- In Figure 1, there is an extra triangle pointing to the U on the right side of the 5-stem loop. What does this signify? Also, the colors in Figure 1 are too similar, especially the purple and dark blue

- Abstract the phrase: "… that allows to measure the splicing efficiency of RNU4ATAC variants on a minor (U12-type) reporter intron" needs modification.
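The Figure 6 question above amounts to recomputing R² on a subset of the points. Below is a minimal Python sketch. It assumes that R² means the squared Pearson correlation between CADD raw score and relative splicing efficiency; the figure itself defines the metric, and this is an assumption here. The function names are hypothetical and the values in the example are placeholders, not the study's data.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """Square of Pearson's correlation coefficient between x and y."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3 points)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no correlation")
    return sxy * sxy / (sxx * syy)


def r_squared_excluding(x: list[float], y: list[float], exclude: set[int]) -> float:
    """R-squared recomputed after dropping the points at the given indices."""
    keep = [i for i in range(len(x)) if i not in exclude]
    return r_squared([x[i] for i in keep], [y[i] for i in keep])


if __name__ == "__main__":
    # Placeholder values only: four points, the last one an outlier.
    cadd = [1.0, 2.0, 3.0, 10.0]
    efficiency = [2.0, 4.0, 6.0, 0.0]
    print(r_squared(cadd, efficiency))                 # 0.4
    print(r_squared_excluding(cadd, efficiency, {3}))  # 1.0
```

With so few points, R² is very sensitive to individual variants, as the placeholder example shows. Reporting the value both with and without the excluded points is therefore more informative than either value alone.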

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Jul 6;15(7):e0235655. doi: 10.1371/journal.pone.0235655.r002

Author response to Decision Letter 0


2 Jun 2020

Reviewer #1: This paper looks at a very rare, deadly disease that is caused by variants in a non-coding spliceosome gene. Variant interpretation is difficult because existing methods of variant interpretation have focused on coding genes. This paper describes a survey of known variants from patients, an evaluation of allele frequencies from gnomAD, and splice efficiency assays to evaluate a subset of the variants. The results and approach should be of interest for other non-coding disease gene scenarios.

Figure 1 is impressive -- the vast majority of all variants are in known domains.

Indeed!

Figure 2B -- The authors have a link between "Mutate sequence" and "RNU4ATAC WT sequence" -- presumably the flow of data is from "RNU4ATAC WT sequence" to the "Mutate sequences" step. If that link is not bidirectional, it should have an arrow. The same applies to the other links, since you are using arrows for the other links.

We have changed the figure according to the recommendations.

Figure 4 fonts are difficult to read. Maybe do not use "bold."

We apologize for the poor quality of the figure. We improved it.

Figure 5A/B labels are impossible to read. Figure 5B caption says "Variants are further organised by relative splicing efficiency, the stronger the effect on splicing is, the more intense the colour." Given that I can't read the labels, I don't know whether red or yellow is more intense. How do you define an intense color? The range of effect appears to be <0.4% (red) and >= 1% (yellow). Presumably yellow is the greatest effect (has the highest splicing efficiency) on splicing efficiency.

We apologize for the lack of clarity of the legend and labels in Figure 5A/B. We increased the size of the labels and rephrased the legend: “The strength of the effect on splicing is colour-coded using a gradient from yellow showing the lowest effect (high splicing efficiency) to dark red showing the highest effect (low splicing efficiency).” (lines 823-825)

Am I able to distinguish which variants in Figure 5C correspond to variants in 5A/5B? I can't tell because I can't read the labels.

We increased the size of the labels in Figure 5 to improve its readability.

What does the red horizontal line in Figure 5A/B mean? I can see from Supp Figure 2 that the red horizontal line is at 1% efficiency. Why is the threshold important? Whoever made Supp Figure 2 needs to redo figure 5A/5B.

The red horizontal line in Figure 5A/B and Supp Figure 2, at 100% efficiency, represents splicing efficiency when transfecting wild-type RNU4ATAC sequence. This information was added in the figures’ legends (lines 832-833).

Figure 5A/B and Supp Figure 2 were harmonised.

Supp Table 3.xlsx -> Scores for all variants -> This tab contains the "CADD phredScore."

Supp Table 3.xlsx -> Scores for disease-associated -> This tab only has the "CADD rawScore." I believe readers will be interested in the "CADD phredScore" scores for the "disease-associated" variants in the event CADD scores are updated. Can you add that column to this table?

We added the CADD phredScore in the 3 tabs that did not contain it, i.e. “Scores for disease-associated”, “Scores gnomAD hmz variants” and “Scores variants cell. assay”.

Figure 3 and Figure 6 legend should really say you used the "raw" CADD scores.

We changed the label “CADD score” in Figure 3 and Figure 6 to “CADD raw score” as suggested.

It looks like they assayed 35 cells for splicing efficiency, against 2 cells for controls (Supp Table 3.xlsx -> Results cell.assay tab) -- each cell is a different variant -- performed in triplicate according to the methods.

That is correct. Control fibroblasts were not used in our assay because transfecting these cells with WT RNU4ATAC did not increase the splicing efficiency of our test minor intron, contrary to what happened in patient fibroblasts. We therefore used only fibroblasts derived from a TALS patient.

Reviewer #2: Review: Clinical interpretation of variants identified in RNU4ATAC, a non-coding spliceosomal gene

Summary: In the manuscript by Mazoyer et al., the authors describe the spectrum of variants reported in the literature and then proceed to bioinformatic and cell-based assessment of variant effects. The prioritization of noncoding variants is an understudied and important area of human genetics, so this study has the potential to add significantly to the field. The authors have modified a cell-based assay and use this modified assay to validate computationally identified variants as likely pathogenic. However, there are significant concerns. The cell-based assay measures splicing of RNU4ATAC. However, the sensitivity of this assay is not established here.

Major concerns:

- What is the sensitivity of the cell assay?

A) The authors have focused on a 'positive' signal (i.e. impaired splicing) in cells transfected with mutant RNU4ATAC RNA. Splicing was demonstrated to be variable (line 370), but this variability was not accounted for in the calculation of splicing efficiency.

Indeed, we found that the increase in U4atac levels following transfection of pUC-U4atac into TALS fibroblasts varied from one transfection to another (6- to 13-fold increase), most probably because of variable transfection efficiency. We circumvented this well-known technical bias by calculating the splicing efficiency of each variant relative to that of the WT in the same experiment, as explained in Materials and Methods (lines 258-263). To make this clearer, we added the following sentence: “Every transfection experiment performed to test a batch of RNU4ATAC variants included a set of cells transfected with the WT U4atac snRNA expression plasmid.” (lines 225-227).
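This within-experiment normalisation can be sketched in a few lines of Python. The sketch assumes that each transfection batch yields one raw splicing-efficiency value per replicate for the WT plasmid and for each variant. How that raw value is measured is described in Materials and Methods and is not reproduced here. Each variant is expressed as a percentage of the WT mean from the same batch, so 100% corresponds to the WT level (the red line in Figure 5A/B). The data layout and function name are hypothetical, and the numbers are placeholders.

```python
from statistics import mean


def relative_efficiency(batches: dict[str, dict[str, list[float]]]) -> dict[str, list[float]]:
    """Express each variant's raw splicing efficiency as a percentage of the
    mean WT value measured in the same transfection batch (WT = 100%).

    `batches` maps a batch id to {construct name: [raw value per replicate]};
    every batch must contain a "WT" entry.
    """
    result: dict[str, list[float]] = {}
    for batch_id, constructs in batches.items():
        if "WT" not in constructs:
            raise ValueError(f"batch {batch_id!r} has no WT control")
        wt_mean = mean(constructs["WT"])
        for name, values in constructs.items():
            if name == "WT":
                continue
            # Dividing by the WT of the same batch removes any factor that scales
            # all constructs in that batch equally (e.g. transfection efficiency).
            result.setdefault(name, []).extend(100 * v / wt_mean for v in values)
    return result


if __name__ == "__main__":
    # Placeholder numbers: batch2 was transfected about twice as efficiently as batch1.
    data = {
        "batch1": {"WT": [0.40, 0.42, 0.38], "varA": [0.20, 0.22, 0.18]},
        "batch2": {"WT": [0.80, 0.84, 0.76], "varB": [0.80, 0.78, 0.82]},
    }
    for variant, values in relative_efficiency(data).items():
        print(variant, f"{mean(values):.1f}% of WT")
```

Because every batch carries its own WT control, a variant tested in a poorly transfected batch is not penalised relative to one tested in a well-transfected batch.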

Further, there was background expression of RNU4ATAC, wildtype or mutant, which could affect some variants more than others if there are any mixed heterodimers forming.

There is no background expression of wild-type RNU4ATAC, as the transfected cells are homozygous for the n.51G>A variant. Because the level of the endogenous mutated U4atac snRNA is much lower than that of the transfected U4atac (see Fig S1B), we do not believe that it could have an effect. Furthermore, bias from "mixed heterodimers" is impossible, as U4atac does not homodimerize: heterodimers form only between U4atac and U6atac.

Why was an RNU4ATAC null line not used for the assay?

In the era of genome editing, the question of a null cell line is a reasonable one. Nevertheless, given the essential role of U4atac in splicing, we assumed that its complete loss of function would impede cell survival, preventing us from performing experiments. For this reason, we chose instead to use a hypomorphic mutation (n.51G>A) that allows cell survival, and took advantage of the patient cells we had been able to collect.

Supporting the assumption that a null cell line would not grow, we generated a u4atac knockout zebrafish line and observed that mutant embryos do not survive beyond 24 hours post-fertilisation (hpf). They exhibit growth arrest followed by necrosis soon after the maternally contributed spliced transcripts have been consumed (unpublished results).

Or specific knockdown of endogenous U4atac to ensure that the majority of splicing activity was from the transfected RNU4ATAC and not endogenous activity, as there could be compensatory upregulation of the endogenous gene expression?

Specific knockdown of endogenous U4atac, to ensure that most of the splicing activity came from the transfected RNU4ATAC, was not necessary: the endogenous activity of U4atac carrying the n.51G>A variant (the only U4atac molecule present in these cells) is low, as shown in Figure S1. Even if compensatory upregulation of the endogenous gene occurs, it still does not allow efficient splicing of the transfected P120 test intron. Upon transfection of WT U4atac, we observe a 2-fold increase in splicing efficiency of the minor intron of the P120 transcript.

Similarly, is the effect of heterodimers just due to the increased amount of protein expressed in dual-transfected cells (line 396)?

To test the joint effect of two different variants, we co-transfected half the amount of each U4atac snRNA expression plasmid compared with a single transfection, so that the total amount of plasmid was the same as in single transfections (lines 223-225).

Why were those 23 variants selected? That information is necessary to interpret any results.

We tested more than half of the variants identified in patients worldwide (17 out of 30). They were chosen according to the order in which they were identified and their location, to make sure that all regions were represented. The seven variants not (as yet) associated with a disease (found in large-scale sequencing projects) were chosen on the basis of their predicted impact on the U4atac/U6atac bimolecule according to CADD and RNAstructure, and of their location, again to make sure that all the important regions were represented (lines 424-426).

And does this assay have any relevance for downstream function?

The only known function of U4atac is in minor splicing. RNA-seq analyses of cells from patients with TALS and RFMN syndromes have shown impaired splicing efficiency of all transcripts containing U12 introns (Merico et al 2015; Dinur Schejter et al 2017; Heremans et al 2018; Cologne et al 2019) (this information was added in lines 117-118). In particular, our own analysis (Cologne et al 2019) found that splicing of the NOP2 minor intron was systematically impaired in fibroblasts derived from 5 TALS patients, compared with controls (lines 587-590).

What percentage decrease would actually harm a cell?

This is a very interesting question that we are tackling in the laboratory, but this is altogether another project.

The tolerance of the system to decreased splicing efficiency has not been determined in this assay so the disease relevance is limited.

Our assay neither measures nor depends on cell viability or functional capacity; its goal is to measure splicing efficiency, nothing more. The role of impaired minor splicing in the etiology of diseases due to biallelic RNU4ATAC variants is certain (see our answer above on the relevance of the assay for downstream function).

B) False positives: The authors claim that essentially all tested variants from affected individuals perturb splicing. However, a number of common variants that should not be pathogenic should also be tested.

There are no common RNU4ATAC variants (minor allele frequency > 5%) and no RNU4ATAC polymorphisms (minor allele frequency > 1%). We tested 7 variants that had never been found in patients (minor allele frequencies ranging from 0.01% to 0.06%) and found that six of them did not affect the splicing efficiency of our test minor intron. The remaining one, n.31T>A, which was associated with a slight decrease in splicing efficiency, is located at the border of the 15.5K protein binding site.

Does any variant that perturbs 'secondary structure' perturb RNA splicing?

We showed in our analysis that variants predicted to perturb U4atac/U6atac secondary structure do not always perturb P120 minor intron splicing (see Fig 6A). However, we cannot be sure that these variants indeed perturb the secondary structure and even if they do, their effect on splicing efficiency is likely to depend on their location in the U4atac molecule.

Do any loop variants affect splicing?

We tested three “loop variants” (n.42C>G, n.45A>C and n.99T>C), and none had an effect on splicing efficiency.

- Statistical analysis is needed throughout. Currently the results are largely descriptive. For example, is the difference in age at death truly different for n.51G>A (line 246)? Simple comparison of age or proportions below a certain age is necessary to support this claim.

A difference in age at death among 17 TALS cases was reported as early as 2012, one year after RNU4ATAC was identified as the TALS gene. Nagy et al. (2012) found a significant difference in survival between patients carrying two copies of the n.51G>A mutation (mean survival 10.4 months) and those carrying zero or one copy (mean survival 78.75 months) (p = 0.02, log-rank test for differences in survival curves).

Here is the survival curve comparing the 22 n.51G>A homozygotes with the 16 TALS patients with other genotypes published to date.
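Nagy et al. compared survival curves with a log-rank test. The sketch below is a minimal two-group log-rank test in pure Python, showing what that test computes. It is not the authors' or Nagy et al.'s code, and the survival times are invented toy numbers, not the published patient data.

```python
import math
from typing import List, Sequence, Tuple


def log_rank_test(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-group log-rank test.

    times_*: follow-up time per patient (e.g. months).
    events_*: 1 if death was observed at that time, 0 if censored (still alive).
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    # Pool both groups, tagging each record with its group (0 = A, 1 = B).
    records: List[Tuple[float, int, int]] = [
        (t, e, 0) for t, e in zip(times_a, events_a)
    ] + [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in records if e})

    obs_minus_exp = 0.0  # sum of (observed - expected) deaths in group A
    variance = 0.0       # sum of hypergeometric variances
    for t in death_times:
        n = sum(1 for ti, _, _ in records if ti >= t)               # at risk overall
        n_a = sum(1 for ti, _, g in records if ti >= t and g == 0)  # at risk in A
        d = sum(1 for ti, e, _ in records if ti == t and e)         # deaths overall
        d_a = sum(1 for ti, e, g in records if ti == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no informative death times; test is undefined")
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy, hypothetical survival times in months (NOT the published patient data).
group_a_times, group_a_events = [3, 6, 8, 12, 15], [1, 1, 1, 1, 1]
group_b_times, group_b_events = [24, 40, 60, 90, 120], [1, 1, 0, 0, 0]
chi2, p = log_rank_test(group_a_times, group_a_events, group_b_times, group_b_events)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Censored patients (event = 0) still count in the at-risk sets up to their last follow-up. This lets the log-rank test use patients who are still alive, which a simple comparison of mean ages at death cannot do.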

Similarly for the bioinformatic predictions, what is the enrichment for “pathogenic” vs population variants at each score threshold? The proportions must be evaluated statistically and not simply described.

We did not provide statistical analyses because we consider these proportions only indicative: variants found in large-scale sequencing studies cannot be assumed to be non-pathogenic. Indeed, the large majority of variants found in the heterozygous state in population studies could potentially cause disease if associated in trans with another variant or if present in the homozygous state. Only three variants can be assumed to be non-pathogenic, on the basis of their allele frequency > 1% in some sub-populations and their detection in the homozygous state in large-scale sequencing studies (n.58C>T, n.87C>T and n.93G>A).
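The two criteria used here to call a population variant non-pathogenic (allele frequency > 1% in at least one sub-population, and at least one homozygous carrier) can be expressed as a filter. This is a hedged sketch: the record layout, field names and toy values are hypothetical and do not reproduce gnomAD's schema or data. As the answer stresses, a variant that fails the filter is not thereby shown to be pathogenic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PopulationVariant:
    name: str                    # identifier, e.g. an HGVS n. description
    subpop_af: Dict[str, float]  # allele frequency (fraction) per sub-population
    homozygotes: int             # homozygous individuals observed in the cohort


def benign_by_frequency(variants: List[PopulationVariant],
                        af_threshold: float = 0.01) -> List[str]:
    """Names of variants with AF > af_threshold in at least one sub-population
    AND at least one homozygous carrier (the two criteria stated above)."""
    return [
        v.name
        for v in variants
        if max(v.subpop_af.values(), default=0.0) > af_threshold and v.homozygotes > 0
    ]


# Hypothetical records; names, sub-population labels and frequencies are invented.
toy = [
    PopulationVariant("variant_1", {"popX": 0.015, "popY": 0.002}, homozygotes=3),
    PopulationVariant("variant_2", {"popX": 0.020, "popY": 0.001}, homozygotes=0),
    PopulationVariant("variant_3", {"popX": 0.0004}, homozygotes=0),
]
print(benign_by_frequency(toy))
```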

The CADD analysis similarly needs burden evaluation.

The answer is the same as for structural conformation predictions.

None of Figure 5 has any statistical evaluation – which variants are truly different from reference? Were they different in all 3 replicates? This needs to be re-calculated after protein expression levels are accounted for.

Variability between replicates is shown by the error bars. As explained earlier, differences in transfection efficiency are taken into account.

Statistically significant differences in splicing efficiency are now indicated by asterisks (* P-values<0.05; ** P-values<0.01 and *** P-values<0.001).
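The asterisk convention used in the revised figures can be written as a short lookup. This is an illustrative sketch that maps an already computed p-value to its label. It does not reproduce the statistical test itself (the S2 Fig legend specifies a one-tailed t-test).

```python
def significance_stars(p: float) -> str:
    """Map a p-value to the convention used in the figures:
    * P < 0.05, ** P < 0.01, *** P < 0.001; empty string if not significant."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Check the strictest threshold first so each p-value gets its highest label.
    for threshold, stars in ((0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return stars
    return ""


for p in (0.0005, 0.004, 0.03, 0.2):
    print(p, significance_stars(p) or "n.s.")
```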

- What is the scientific basis for assaying U4atac/U6atac interaction as a disease-relevant measure of variant effect? This is never substantiated and therefore has limited applicability to human disease.

We clarified in the text the importance of the U4atac/U6atac bimolecule for minor splicing by adding more information in the introduction (lines 110-115).

« The U4atac/U6atac small nuclear ribonucleoprotein particle (U4atac/U6atac di-snRNP) is composed of U4atac snRNA stably base-paired with U6atac snRNA, together with seven Sm proteins and other particle-specific proteins. This di-snRNP then associates with the U5 snRNP to form the U4atac/U6atac.U5 tri-snRNP, a component of the pre-catalytic complex. This complex becomes catalytically active, and thus able to excise the intron, after U4atac dissociates and U6atac pairs with U12. Several RNU4ATAC mutations identified in TALS patients have been shown to cause defects in minor tri-snRNP formation [11]. Further, transcriptomic analyses of cells from RFMN and TALS patients revealed massive U12 intron retention [5, 9, 36, 37]. »

- Carefully review genomic coordinates before claiming that a reference genome is incorrect. Some resources are 0-based while others are 1-based. The authors should submit firm evidence of their claims that references are truly off by 1 and it is not an expected difference in data structure. This is especially true as much of the discussion is devoted to denouncing the references (line 424-437).

We are well aware that some resources are 0-based while others are 1-based. However, whatever the coordinate system used, the HGVS variant nomenclature should be respected, and this is not the case for the RNU4ATAC variants described in gnomAD (n.51G>A is reported as n.50G>A). We can therefore state that there is an error in the RNU4ATAC genomic coordinates in some databases. We have reported this to gnomAD and Rfam.
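To illustrate why a one-base discrepancy shows up directly in HGVS n. numbering, here is a minimal sketch converting a genomic position into an n. position in a single-exon non-coding transcript. The transcript boundaries (1001 to 1130) are hypothetical and are not the real RNU4ATAC coordinates. The sketch only shows that either mechanism, a 0-based variant position treated as 1-based or a transcript start annotated one base off, turns n.51 into n.50. It does not identify which step caused the error in any given database.

```python
def genomic_to_n_position(pos: int, tx_start: int, tx_end: int,
                          strand: str, zero_based: bool = False) -> int:
    """HGVS n. position of a variant in a single-exon non-coding transcript.

    tx_start, tx_end: 1-based, inclusive transcript boundaries (hypothetical here).
    pos: variant position; pass zero_based=True if it comes from a 0-based source.
    """
    pos_1based = pos + 1 if zero_based else pos
    if not tx_start <= pos_1based <= tx_end:
        raise ValueError("position lies outside the transcript")
    if strand == "+":
        return pos_1based - tx_start + 1
    if strand == "-":
        return tx_end - pos_1based + 1
    raise ValueError("strand must be '+' or '-'")


TX_START, TX_END = 1001, 1130  # hypothetical coordinates, for illustration only
print(genomic_to_n_position(1051, TX_START, TX_END, "+"))                   # correct input
print(genomic_to_n_position(1050, TX_START, TX_END, "+"))                   # 0-based pos read as 1-based
print(genomic_to_n_position(1050, TX_START, TX_END, "+", zero_based=True))  # converted properly
print(genomic_to_n_position(1051, TX_START + 1, TX_END, "+"))               # start annotated 1 nt off
```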

- The bioinformatic approach needs to be more fully described. Why was this RNA structure tool selected? Were alternatives tried?

At present, only two bioinformatic tools predict the secondary structure of RNA bimolecules. RNAstructure is the only one we could use, because the other, from the ViennaRNA package, yields a U4atac/U6atac structure that differs from the published one. This has been added to the text (line 369).

Since the bioinformatic prioritization seems to have not had any correlations, this null result needs to be more fully explored. A description of why the bioinformatic score was selected needs to be included. How successful and relevant is the scoring system by Merico? Were other approaches tried?

The problem with bioinformatic prioritization based on structural predictions is that ten of the 30 variants identified in patients are not predicted by RNAstructure to have structural consequences, as we discuss in the manuscript (lines 559-566). We expect a different scoring system to give similar results. We would have liked to add other data to the structural predictions, for example the nucleotides involved in protein binding, but the available information is not precise enough.

- Selective pressure described in line 283 should be substantiated. How low is the frequency compared to variation across other noncoding RNAs?

The cited sentence reads as follows: “The most frequent variant identified, n.23C>T (never identified in patients), is present in only 0.29% of the screened alleles, suggesting a strong selective pressure against variations in this gene”. We do not say that selective pressure is more important for RNU4ATAC than for other short noncoding genes.

What is the z-score for these variants compared to other noncoding RNA bases? Selective pressure should not be determined by the raw value alone.

We believe that this type of analysis falls beyond the scope of our publication.

- The population frequencies described in line 286-297 are highly misleading because the sub-populations mentioned are too small and thus the allele frequency should have a large confidence interval. Given that a large proportion of cases are compound heterozygous genotypes, this also seems unnecessary to dwell on.

We mainly discuss, in the cited paragraph, variants which have been found in the homozygous state, as they are the most unlikely to be pathogenic.
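The reviewer's point about wide confidence intervals in small sub-populations can be made concrete with a Wilson score interval for an allele frequency computed as allele count divided by allele number. The sketch below is illustrative: the counts are invented, and the Wilson interval is one common choice among several (for example, Clopper-Pearson).

```python
import math


def allele_frequency_wilson(allele_count: int, allele_number: int, z: float = 1.96):
    """Allele frequency with a Wilson score interval (about 95% for z = 1.96).

    Returns (frequency, lower bound, upper bound), all as fractions.
    """
    if allele_number <= 0 or not 0 <= allele_count <= allele_number:
        raise ValueError("need allele_number > 0 and 0 <= allele_count <= allele_number")
    p = allele_count / allele_number
    denom = 1 + z ** 2 / allele_number
    centre = (p + z ** 2 / (2 * allele_number)) / denom
    half_width = z * math.sqrt(p * (1 - p) / allele_number
                               + z ** 2 / (4 * allele_number ** 2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Same 2% point estimate in a small and a large (hypothetical) sub-population.
for ac, an in ((2, 100), (2000, 100000)):
    p, lo, hi = allele_frequency_wilson(ac, an)
    print(f"AC={ac}, AN={an}: AF={p:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```

With only 100 screened alleles, a 2% estimate is compatible with frequencies from roughly 0.5% to 7%. The same estimate from 100,000 alleles is tightly constrained around 2%.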

In half of the 46 TALS, RFMN or LWS families, the disease is due to homozygous RNU4ATAC variants (20 of the 31 TALS families, if TALS families are considered alone).

Minor concerns:

- The text suffers from long sentences with many subordinate clauses. The clarity of the scientific message would benefit from rewriting. This is particularly true of the first and last paragraphs of the introduction, and all descriptions of the cellular assay.

Long sentences were shortened or rephrased in the first and last paragraphs of the introduction and in the result section concerning the cellular assay.

- In all reported cases, can it be confirmed that no other potentially pathogenic variants were identified? More description in the methods of the process for assessing the validity of the reported variants is necessary before proceeding to bioinformatic or functional assessment.

Whole-genome or exome sequencing has seldom been performed, so it is not known whether other potentially pathogenic variants could be identified in some of the reported cases. However, the homogeneity of the clinical signs in carriers of bi-allelic RNU4ATAC variants and the rarity of these variants in all screened populations give no reason to doubt any of the published RNU4ATAC cases.

- It is stated that all data are available, but the VCFs for the large cohorts described in the studies should be clearly available for assessment of other potentially pathogenic variants. Are those available from other publications? If so, that information should be provided in the methods

This information is available in the two original publications (Lionel et al, 2018; Shaheen et al, 2019).

- It seems that the preferred term is MOPD1 instead of TALS, to prioritize descriptions of pathology rather than named after the describing team

The two terms are widely used; in the two papers published in Science in 2011, one used TALS and the other MOPD I. For our part, we’ve been using TALS in our publications since the beginning.

- A better summary of the unique and overlapping features of the three syndromes could be provided as a supplemental table

Such a summary can be found in the publication by Farach et al (2017).

- Line 68 references unexplained deaths but there seemed to be neurological etiologies described in the other papers

In most cases death follows a benign infection, but so far there is no explanation for why these children with TALS die so suddenly.

- The third paragraph of the introduction needs far more references to support the scientific statements. It would be helpful to specify which tri-snRNPs are disrupted in MOPD1 (line 94-95)

A reference to a review providing all the relevant information was added.

- Line 123 should not have an apostrophe and gnomAD should not be capitalized (here and throughout)

The apostrophe has been removed and the two occurrences of GnomAD corrected.

- Line 133 totalising is not a word

We wrote instead “leading to a total number of”

- A specific link to the github mentioned in line 150 should be provided

The github link (https://github.com/cbenoitp/RNU4atac_variants) has been added to the methods.

- Detailed discussion of the domains of U4atac in 222-243 would be better for the introduction than results

We think that the paragraph in question (now lines 271-293) fits better in its present location than in the introduction, because the first section of the results consists of an inventory of all RNU4ATAC variants associated with a disease and is illustrated with a detailed figure. In this paragraph we also discuss the location of the variants and summarize the patients’ genotypes.

- Similarly, much of the variant description in line 251-269 is highly redundant and could be summarized in a much shorter form

In the first part of the section we discuss all RNU4ATAC variants associated with a disease, whereas in the second part we discuss them according to whether they were found in TALS, RFMN and/or LWS. We acknowledge that some parts may appear redundant, but we believe that discussing genotype/phenotype correlations is expected and important.

- Half as less is not a phrase, in line 275

We replaced “half as less” with “half as much”.

- Figure 2A needs to be significantly reformatted to be linear. The zig zag structure makes it as confusing as the text

We have changed the figure according to the recommendations.

- Figure 3A would be more clear as a log-scale x-axis to minimize the white space

We have changed the figure according to the recommendations.

- Figures 4A/5C are very poor quality and cannot be read

We apologize for the poor quality of the figures displayed in the pdf version of our submitted manuscript. The submitted figures were of much better quality. We further increased the quality and readability of Figures 4A and 5C.

- Figure 5 text is too small to be read

The size of the text in Figure 5 was increased to improve readability.

- In discussion, noncoding genes are not only transcribed long RNAs or short RNAs. Some encode micropeptides, or have other functions. Therefore the word either in line 408 is not appropriate.

We agree that noncoding genes are not restricted to those transcribed into long and short noncoding RNAs, as there is also the pseudogene category. We have replaced “noncoding genes” with “noncoding RNAs” in the discussion.

- For Figure 6, are there common features of the variants with high vs low correlation? Does that show that the cellular assay has important information about some of the very low CADD scores? If those two low CADD scores are excluded, what is the new R2 value?

The two variants with low CADD scores, n.42C>G and n.99T>C, are associated with high splicing efficiency, so for them the correlation between CADD raw score and splicing efficiency is very good. The variants for which this correlation is poor (n.45A>C, n.65C>T, n.72A>G, n.83G>A and n.124G>A) all combine a high CADD score with high splicing efficiency. We discuss the likely reason for the high splicing efficiency of n.124G>A in lines 503-508. The other variants were all found in the heterozygous state in population studies, so we cannot tell whether our assay is not sensitive enough to detect their effect or whether the CADD predictions are unreliable in these cases.
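The reviewer's question (what R² remains once two variants are excluded) is a one-line recomputation once R² is written as a function. The sketch assumes the Figure 6 R² comes from a simple linear fit with intercept, in which case it equals the squared Pearson correlation. The labels and numbers below are hypothetical toy values, not the CADD scores or splicing efficiencies from S3 Table.

```python
from typing import Iterable, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation (equals R^2 of a simple linear fit with intercept)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def r_squared_excluding(labels: Sequence[str], x: Sequence[float],
                        y: Sequence[float], excluded: Iterable[str]) -> float:
    """Recompute R^2 after dropping the points whose label is in `excluded`."""
    drop = set(excluded)
    kept = [(a, b) for lab, a, b in zip(labels, x, y) if lab not in drop]
    xs, ys = zip(*kept)
    return r_squared(xs, ys)


# Hypothetical toy values: score (x) vs splicing efficiency in % (y).
labels = ["v1", "v2", "v3", "v4", "v5", "v6"]
score = [1.0, 1.5, 3.0, 3.5, 4.0, 4.5]
efficiency = [100.0, 95.0, 60.0, 50.0, 45.0, 98.0]
print(round(r_squared(score, efficiency), 2))
print(round(r_squared_excluding(labels, score, efficiency, {"v1", "v2"}), 2))
```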

Also that figure does not need as many decimal points displayed in the figure.

For the r2 values, we reduced the number of decimal places displayed in Figure 6.

- In Figure 1, there is an extra triangle pointing to the U on the right side of the 5′ stem loop. What does this signify?

The triangle on the right side of the 5’ stem loop corresponds to the end of the arrow pointing to the 15.5K protein. We improved the quality of the figure to avoid any confusion.

Also the colors are too similar in Figure 1 – especially the purple and dark blue.

We changed the color of the dark purple triangle to a lighter color to make it more distinguishable from the dark blue triangles.

- Abstract the phrase: "… that allows to measure the splicing efficiency of RNU4ATAC variants on a minor (U12-type) reporter intron" needs modification.

We changed the phrase to “… that allows to measure the effect of RNU4ATAC variants on splicing efficiency of a minor (U12-type) reporter intron.”

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Klaus Brusgaard

22 Jun 2020

Clinical interpretation of variants identified in RNU4ATAC, a non-coding spliceosomal gene

PONE-D-20-06355R1

Dear Dr. Sylvie Mazoyer,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Klaus Brusgaard

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The authors have addressed all of my concerns.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Acceptance letter

Klaus Brusgaard

25 Jun 2020

PONE-D-20-06355R1

Clinical interpretation of variants identified in RNU4ATAC, a non-coding spliceosomal gene

Dear Dr. Mazoyer:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Klaus Brusgaard

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Cellular assay validation.

    A. Splicing efficiency of the U12-type reporter intron in control or TALS fibroblasts non transfected (-) or transfected (+) with the P120 and WT pUC-U4atac plasmids, as indicated. Error-bars represent standard error of the mean (SEM) of at least three independent experiments. Statistically significant differences in splicing efficiency are indicated by an asterisk (P-values < 0.05). PCR products amplified from unspliced and spliced transcripts are shown. B. Relative amount of U4atac snRNA in TALS fibroblasts non transfected (-) or transfected (+) with the P120 and WT pUC-U4atac plasmids, as indicated.

    (TIF)

    S2 Fig. Results of the cellular assay obtained after transfection of a single variant or cotransfection of two variants mimicking respectively the homozygous or compound heterozygous genotypes of RFMN and TALS patients.

    Error-bars represent standard error of the mean (SEM) of at least three independent experiments. Statistically significant differences in splicing efficiency are indicated by asterisks: * P-values < 0.05; ** P-values < 0.01 and *** P-values < 0.001 (one-tailed t-test). The red horizontal line indicates 100% splicing efficiency.

    (TIF)

    S1 Table. Census of all patients with biallelic RNU4ATAC pathogenic variants published in the literature (May 2020).

    (XLSX)

    S2 Table. Census of all RNU4ATAC variants present in the gnomAD resource (v2.1, January 2020).

    (XLSX)

    S3 Table. Scores computed based on an RNA structure prediction tool and CADD scores for the variants identified in patients and/or in large-scale sequencing projects; results from the cellular assay for the 24 RNU4ATAC variants tested.

    (XLSX)


    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files.

