Abstract
Spinal muscular atrophy (SMA) is a genetic disorder that causes progressive degeneration of lower motor neurons and the subsequent loss of muscle function throughout the body. It is the second most common recessive disorder in individuals of European descent and is present in all populations. Accurate tools exist for diagnosing SMA from genome sequencing data. However, there are no publicly available tools for GRCh38-aligned data from panel or exome sequencing assays which continue to be used as first line tests for neuromuscular disorders. This deficiency creates a critical gap in our ability to diagnose SMA in large existing rare disease cohorts, as well as newly sequenced exome and panel datasets. We therefore developed and extensively validated a new tool - SMA Finder - that can diagnose SMA not only in genome, but also exome and panel sequencing samples aligned to GRCh37, GRCh38, or T2T-CHM13. It works by evaluating aligned reads that overlap the c.840 position of SMN1 and SMN2 in order to detect the most common molecular causes of SMA. We applied SMA Finder to 16,626 exomes and 3,911 genomes from heterogeneous rare disease cohorts sequenced at the Broad Institute Center for Mendelian Genomics as well as 1,157 exomes and 8,762 panel sequencing samples from Tartu University Hospital. SMA Finder correctly identified all 16 known SMA cases and reported nine novel diagnoses which have since been confirmed by clinical testing, with another four novel diagnoses undergoing validation. Notably, out of the 29 total SMA positive cases, 23 had an initial clinical diagnosis of muscular dystrophy, congenital myasthenic syndrome, or myopathy. This underscored the frequency with which SMA can be misdiagnosed as other neuromuscular disorders and confirmed the utility of using SMA Finder to reanalyze phenotypically diverse neuromuscular disease cohorts. Finally, we evaluated SMA Finder on 198,868 individuals that had both exome and genome sequencing data within the UK Biobank (UKBB) and found that SMA Finder’s overall false positive rate was less than 1 / 200,000 exome samples, and its positive predictive value (PPV) was 97%. We also observed 100% concordance between UKBB exome and genome calls. This analysis showed that, even though it is located within a segmental duplication, the most common causal variant for SMA can be detected with comparable accuracy to monogenic disease variants in non-repetitive regions. Additionally, the high PPV demonstrated by SMA Finder, the existence of treatment options for SMA in which early diagnosis is imperative for therapeutic benefit, as well as widespread availability of clinical confirmatory testing for SMA, warrants the addition of SMN1 to the ACMG list of genes with reportable secondary findings after genome and exome sequencing.
Introduction
Spinal muscular atrophy (SMA) is a rare genetic condition characterized by progressive loss of muscle function due to the death of lower motor neurons. It is one of the most common recessive disorders, particularly in individuals of European descent where it affects between 1 in 6,000–12,000,1,2 while in other populations it occurs at lower but appreciable rates.3 In 2016, the US Food and Drug Administration (FDA) approved nusinersen, the first drug to slow disease progression, followed by the approval of onasemnogene abeparvovec-xioi in 2019, and risdiplam in 2020.4 The American Health Resources & Services Administration (HRSA) added SMA testing to the recommended uniform newborn screening panel (RUSP) in 2018, and it is now included in all 50 states. An increasing number of countries have also included SMA within their national screening programs. However, affected individuals may remain undiagnosed if they were born before testing became available in their region, if their parents declined newborn screening, or due to false-negative test reports.
The molecular etiology of SMA is linked to disruption of the SMN locus,5 most commonly due to deletions or gene conversions resulting in SMN1 deficiency. The SMN locus consists of two nearly identical paralogs, SMN1 and SMN2. The ‘C’ nucleotide at the c.840 position of SMN1 facilitates proper splicing, while the ‘T’ at that position in SMN2 causes exon 7 skipping in 85% of SMN2 transcripts (Figure 1A). As a result, SMN2 alone does not produce a sufficient amount of functional SMN protein, causing neurons that have zero functional copies of SMN1 to die prematurely. In the general population, individuals typically inherit two intact copies of each paralog, though genomes with between zero and five copies of each have been observed.3 Since SMN2 can still produce a small amount of SMN protein, disease severity and age of onset in affected individuals is influenced by the copy number of SMN2.
The near-perfect sequence homology between SMN1 and SMN2 causes read alignment algorithms like BWA6 to either mismap or ambiguously map reads at this locus, often resulting in reads having a mapping quality of 0. This, in turn, confounds standard variant calling pipelines such as GATK7, and prevents them from being directly useful for SMA diagnosis. To address these issues, multiple specialized tools have been developed to diagnose SMA from genome and long read sequencing data, including SMNCopyNumberCaller3 and Paraphase.8 Additionally, the MYO-SEQ study9 published an SMA calling pipeline for exome data on GRCh37 and used it to identify 5 novel SMA diagnoses in a cohort of 1001 cases with limb-girdle weakness. Most recently, the Chameleolyser tool was described in a study involving GRCh37-aligned exome samples from a cohort of 17,650 undiagnosed patients.10 There, Chameleolyser successfully identified 15 novel SMA cases. However, to our knowledge, no publicly available tool exists for detecting SMA in exome or panel sequencing samples on GRCh38 or the new telomere-to-telomere (T2T-CHM13)11 reference. We therefore created SMA Finder for use with exome, genome, or panel sequencing data aligned to GRCh37, GRCh38, or T2T-CHM13, and applied it to diagnose unsolved cases within large rare disease cohorts from the Broad Institute Center for Mendelian Genomics (CMG) and Tartu University Hospital.
Materials and Methods
SMA Finder algorithm
SMA Finder works by computing two numbers from the read data, r and N, where N is the total number of aligned reads that overlap the c.840 position of SMN1 and SMN2, and r is the subset of these reads that have a ‘C’ at the c.840 position. When there is sufficient read coverage (N ≥ 14), SMA Finder interprets the lack of reads with a ‘C’ at c.840 as evidence that the individual has zero functional copies of SMN1 (Figure 1B, Supplementary Methods), and so reports a positive call. Figure 3 shows how r and N are used to distinguish between true positive and true negative samples.
Cohort summary
The Broad Institute Center for Mendelian Genomics (CMG) cohort consisted of 20,537 individuals from 10,754 families with a wide spectrum of medical conditions. Families sequenced through the CMG were enrolled at the Broad Institute by the Rare Genomes Project or enrolled in research studies with local regulatory approval through the participating collaborators, including for sharing de-identified samples for sequencing and analysis. This project was approved by the Mass General Brigham IRB (protocols #2016P001422 and #2013P001477). In this cohort, 12,045 (59%) individuals were affected with a variety of suspected monogenic disorders, while 8,401 (41%) were unaffected parents or relatives (Figure 2A). We performed ancestry inference for 20,205 individuals by computing principal components for high-quality bi-allelic autosomal SNVs using the gnomAD v2 method.12 We found that 13,240 (66%) individuals were European and the remaining 34% came from various other populations (Figure 2B). Among the 12,045 affected individuals, 10,125 (84%) had phenotype descriptions encoded in Human Phenotype Ontology (HPO) terms.13 Of these, 5,138 (51%) had at least one HPO term in the “Abnormality of the nervous system” category, and 4,035 (40%) had at least one HPO term in the “Abnormality of the musculoskeletal system” category (Figure 2C).
Genome sequencing used PCR-free library preparation and Illumina HiSeq X Ten v2 chemistry to generate 150bp paired-end reads. The mean target coverage was >30x. Exome sequencing used Illumina Nextera or Twist exome capture (~38 Mb target) and similarly generated 150bp paired-end reads. It aimed to cover >80% of targets at 20x and a mean target coverage of >60x. Both genome and exome data were processed using GATK best practices, starting with BWA alignment to GRCh38 and GATK base quality score recalibration (BQSR). We used Hail Batch14 to run SMA Finder on all samples in parallel. The average number of reads overlapping the c.840 position of SMN1 + SMN2 was 77.9 across CMG genomes and 110.7 across CMG exomes. These numbers reflect the higher coverage in exomes compared to genomes at this locus, as well as the total copy number of SMN paralogs per individual.
The Tartu University Hospital cohort consisted of 9,919 pseudonymised samples sent for molecular diagnostics for various suspected medical conditions to Tartu University Hospital between 2015 – 2022. The cohort analysis was approved by the University of Tartu Research Ethics Committee (protocol # 374M-6). Out of the 9,919 samples, 1,157 (11.7%) were exomes and 8,762 (88.3%) were panel sequencing panels generated using the Illumina TruSight™ One or the Illumina TruSight™ One Expanded panel of 4,811 to 6,794 genes. Guaranteed mean coverage for both panels and exomes was >100x for panel regions. The data was aligned to reference genome build GRCh37 using BWA. Samples were annotated with summary clinical characteristics, and de-pseudonymized data was accessed for positive SMA Finder results only.
The UK Biobank (UKBB)15 dataset included de-identified genetic and medical records from 198,868 individuals as well as both exome and genome samples for each individual. These samples were aligned to GRCh38 using BWA. The UKBB is a large population-based prospective study that recruited UK residents between 40 and 69 years of age during the years 2006–2010. UKBB genotypes had a mean depth of coverage of 32.5x and a minimum of 23.5x.16 For UKBB exomes, the depth of coverage, on average, exceeded 20x at 95% of panel bases.17 The average number of reads overlapping the c.840 position of SMN1 + SMN2 was 56.8 in UKBB genomes and 75.1 in UKBB exomes, again reflecting the higher coverage in exomes at this locus, as well as the total copy number of SMN paralogs per individual.
Our cohorts did not include any samples aligned to the T2T-CHM13 reference. Therefore, to test SMA Finder’s performance on T2T-CHM13, we realigned the 21 confirmed-positive exome samples from the CMG cohort to the T2T-CHM13 reference using BWA MEM and ran SMA Finder on the realigned samples. We then repeated these steps on 21 randomly selected exomes from the CMG cohort that SMA Finder had previously called as SMA-negative. After realignment to T2T-CHM13, SMA Finder still reported the 21 positive samples as SMA-positive, and the 21 negative samples as SMA-negative, confirming its utility for T2T-aligned samples.
Results
We applied SMA Finder to 16,626 exomes and 3,911 genomes from phenotypically heterogeneous rare disease cohorts within the Broad CMG,18 part of the GREGoR Consortium. SMA Finder identified all 13 known SMA-positive samples (10 exomes and 3 genomes), and flagged 10 previously undiagnosed exome samples as candidate SMA cases, of which 8 have now been validated by gold standard methods such as MLPA. For the remaining novel diagnoses (n=2), confirmatory testing is pending in one case, and is not possible in the other due to loss of contact with the patient (Table 1). SMA Finder was negative for 8,401 unaffected samples (6,500 exomes and 1,901 genomes). SMA Finder reported insufficient read coverage to make a call in 112 exomes (0.7%) and 6 genomes (0.2%) due to these samples having fewer than 14 total reads overlapping c.840 in SMN1 + SMN2. Strikingly, out of the 23 total SMA-positive cases in this heterogeneous rare disease cohort, 20 had an initial clinical diagnosis of muscular dystrophy or myopathy while one had a clinical diagnosis of congenital myasthenic syndrome.
Table 1:
Table 1 lists all samples called as SMA-positive by SMA Finder within the Broad Institute Center for Mendelian Genomics (CMG) and Tartu University Hospital cohorts. The Original Clinical Diagnosis column lists the clinical diagnosis prior to SMA testing. The SMA Confirmation Status column shows whether the diagnosis was already known at the time of the SMA Finder result as well as the status of confirmatory testing for novel diagnoses. Age of Onset and SMA Type columns are based on physician reports, while the SMN2 Copies column is based on the SMN2 copy number as measured by MLPA genetic testing. The next three columns display the confidence score, r, and N counts computed by SMA Finder (see Figure1, Methods), while the last column shows the imputed ancestry as NFE (Non-Finnish European), MDE (Middle Eastern), AFR (African/African American), SAS (South Asian), or UNC (unclassified).
To evaluate its performance beyond the CMG dataset, we also ran SMA Finder on 9,919 samples (1,157 exomes and 8,762 panel sequencing samples) from another heterogeneous rare disease cohort at the Tartu University Hospital. SMA Finder detected three previously known cases and three new cases which have since been confirmed by clinical testing (Table 1). Of the three new cases, two presented as SMA type III, and one had symptoms that did not include obvious SMA features and thus may be a presymptomatic case of SMA type IV.
Finally, to measure SMA Finder’s positive predictive value (PPV) in a large population cohort as well as compare its performance to other tools and across sample types, we applied SMA Finder to 198,868 individuals from the UK Biobank (UKBB).15 Given the availability of an exome and genome sample for each individual, we ran SMA Finder on all exomes and genomes, while also running SMNCopyNumberCaller3 on the genomes. We found SMA Finder calls to be identical between exomes and genomes, with the only difference being that SMA Finder reported 85 out of 198,868 (0.042%) genome samples and 14 out of 198,868 (0.007%) exome samples (Fisher’s exact p=1.4×10−13) as having insufficient read coverage to make a call (Table S3). SMA Finder flagged two individuals (i1 and i2) as SMA-positive, with their exome and genome samples yielding the same result. It reported all other exomes and genomes that had sufficient coverage as SMA-negative (Figure 3E, F). Concordance between SMA Finder and SMNCopyNumberCaller was also nearly 100%. Like SMA Finder, SMNCopyNumberCaller (a genome-only tool) reported individual i1 to be SMA-positive, while reporting no-call for 35 genomes including i2 (Table S2).
The UKBB phenotype records for i1 included ICD-10 code G121 “Other inherited spinal muscular atrophy”, suggesting this to be a true positive call for both SMA Finder and SMNCopyNumberCaller. Additionally, i1’s records indicated that they died between the age of 65 and 70, with the cause of death listed as “Spinal muscular atrophy”. In contrast, i2 had few phenotype records, making it difficult to tell whether this was a false positive call by SMA Finder, or a missed call by SMNCopyNumberCaller. The fact that SMA Finder made the same call for both the exome and genome of i2 increases the likelihood that this is a true positive. However, even if we conservatively count i2 as a potential false positive call by SMA Finder based on the absence of phenotype records and the no-call result from SMNCopyNumberCaller, SMA Finder’s overall false positive rate would still be < 1 / 200,000 exomes. Additionally, this would yield a PPV of 97% based on 28 true positive samples (21 from CMG cohorts, 6 from Tartu University Hospital, and 1 from UKBB) and at most 1 hypothetical false positive (counting sample i2 from UKBB).
Discussion
We developed and validated SMA Finder, a new tool for detecting SMA-positive samples within short read exome, genome, and panel sequencing data. After testing this tool on multiple heterogeneous rare disease cohorts, as well as nearly 200,000 individuals from the UKBB, we found its false positive rate to be less than 1 in 200,000. Our results showed that SMA Finder is a robust and accurate tool for detecting the most common molecular causes of SMA using exome, genome, and panel sequencing samples.
However, the analysis had several limitations related to these cohorts. It was predominantly based on samples of European ancestry. A precise estimate of SMA Finder’s accuracy in other populations will require testing on cohorts with larger numbers of non-European samples. Similarly, since most samples used DNA extracted from blood, further testing is needed to measure SMA Finder’s accuracy for other sample sources such as saliva or dried blood spots.
The SMA Finder algorithm in its current form also has several important limitations. First, it only determines whether an individual has zero or more than zero functional copies of SMN1, and thus does not provide information on carrier status or SMN2 copy number. In the future, we may extend SMA Finder to estimate the exact copy numbers of SMN1 and SMN2. Second, SMA Finder only tests for variants that disrupt the c.840 position of SMN1. Studies have shown that, depending on ancestry, between 6% and 49% of SMA cases19 may be caused by other variants in regions of the SMN1 gene that don’t overlap the c.840 position. SMA Finder cannot currently detect these variants, and so would yield false negative results in those cases. Furthermore, SMA Finder was unable to make a call in 112 exomes (0.7%) and 6 genomes (0.2%) from the CMG as well as 14 exomes (0.007%) and 85 genomes (0.04%) from UKBB due to insufficient read coverage at the c.840 position. These no-call percentages may be useful for confirming sequencing data quality when applying SMA Finder to new cohorts where a significantly higher proportion of no-calls would indicate problems with target capture or sequencing at the SMN locus. Finally, SMA Finder was designed and tested on samples generated using Illumina sequencers and aligned using BWA.6 Samples generated using a different sequencing technology, or aligned using a different, non-functionally-equivalent aligner may not work as expected, particularly when aligning to the GRCh38 reference where the ALT contigs include additional copies of SMN1 and SMN2 (Supplementary Methods).
Despite these limitations, SMA Finder’s exceptionally high accuracy in detecting homozygous loss of SMN1 due to deletions or gene conversions at the c.840 position allowed us to shed new light on open questions about the genetic architecture of SMA. Over the years, reports have described cases of unaffected individuals with biallelic loss of SMN1 due to deletions that would be detectable by SMA Finder.20,21,22 However, the frequency of such cases in the general population has not been well-characterized. Our analysis of UKBB exomes and genomes using SMA Finder and SMNCopyNumberCaller shows that such cases are extremely rare in the European population, occurring at a frequency below 1 in 200,000.
Rare disease diagnostic pipelines based on short read sequencing data have, until recently, needed to exclude highly repetitive regions such as segmental duplications due to technical limitations of existing variant calling algorithms. However, our analysis adds to the growing body of research3,10,23 which demonstrates that, with focused tool development, it is possible to detect variants in these regions with sufficient accuracy for rare disease diagnosis. Furthermore, at some loci such as the SMN c.840 position, our ability to accurately detect the causal variant from exome or panel sequencing data can be comparable to the accuracy of variant calling in non-repetitive regions.
A striking observation from our cohorts was that 11 out of the 13 new SMA diagnoses were made in individuals carrying a clinical diagnosis of muscular dystrophy, congenital myasthenia, or myopathy, and 12 of the 16 known SMA cases carried an initial clinical misdiagnosis of muscular dystrophy, reinforcing similar observations by other groups.24 Although it is known that the differential diagnosis of an infant or adolescent with progressive weakness includes SMA in addition to muscular dystrophies, myopathies, and congenital myasthenic syndromes25, clinical gene panels for muscular dystrophy and myopathy may not include SMA testing. Moreover, clinicians and researchers may be misled by the presence of variants of unknown significance (VUS) in other neuromuscular disease genes, by complementary diagnostic tools such as electromyography (EMG) and muscle MRI, and by the highly elevated creatine kinase (CK) levels sometimes observed in ambulant patients with SMA type III. Prior to the inclusion of SMA testing in newborn screening, there was commonly a diagnostic delay in SMA, particularly for the milder types II and III which can present with clinical features that are similar to later-onset Mendelian myopathies.26 Now, with newly-available SMA treatments where early detection is imperative for the best therapeutic effect,29 there is increased urgency to address diagnostic delays. Our analysis demonstrates that the inclusion of an SMA detection tool within existing rare disease analysis pipelines - particularly for cohorts that include neuromuscular phenotypes - can successfully identify missed cases.
In 2021, the American College of Medical Genetics (ACMG) published an updated set of recommendations for reporting of secondary findings (SF) in clinical exome and genome sequencing.28 In the section titled “Poor candidates for secondary findings, due to concerns about analytical validity”, the authors listed reasons why genes with homologous sequences should be excluded from the SF list. These included concerns about the accuracy of variant detection in these regions as well as challenges in orthogonal validation. Given the clinically overlooked diagnoses presented in this paper, the demonstrated PPV of SMA Finder, as well as the widespread availability of confirmatory molecular testing for SMA, we propose that SMN1 should now be considered for inclusion in the ACMG SF list.
Supplementary Material
Acknowledgments
This paper is dedicated to Clara and Dmitry Bogomolny. We thank the many families who participate in these rare disease studies and the Broad CMG collaborators for sharing the samples and medical data. The Broad CMG sequencing and analysis was funded by the National Human Genome Research Institute (NHGRI), the National Eye Institute, the National Heart, Lung and Blood Institute grant UM1HG008900 and NHGRI grants U01HG011755 and R01HG009141. GH is supported by the GREGoR Consortium, and research in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number U24HG011746. This publication has also been made possible in part by CZI grants 2019-19927, 2020-224274, and 2022-309464 https://doi.org/10.37921/236582yuakxy, from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation (funder DOI 10.13039/100014989). The Tartu University Hospital cohort analysis was funded by the Estonian Research Council grants PSG774 and PRG471. VSG was supported by NIH NHGRI grant T32HG010464. Work in CGB’s laboratory is supported by intramural funds of NINDS/NIH. This research has been conducted using the UK Biobank Resource under application number 48511. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the other organizations that have funded the work. HL receives support from the Canadian Institutes of Health Research (CIHR) for Foundation Grant FDN-167281 (Precision Health for Neuromuscular Diseases), Transnational Team Grant ERT-174211 (ProDGNE) and Network Grant OR2-189333 (NMD4C), from the Canada Foundation for Innovation (CFI-JELF 38412), the Canada Research Chairs program (Canada Research Chair in Neuromuscular Genomics and Health, 950-232279), the European Commission (Grant # 101080249) and the Canada Research Coordinating Committee New Frontiers in Research Fund (NFRFG-2022-00033) for SIMPATHIC, and from the Government of Canada Canada First Research Excellence Fund (CFREF) for the Brain-Heart Interconnectome (CFREF-2022-00007). KP is a recipient of CIHR Postdoctoral fellowship (202210MFE-491707-404816). RH is supported by the Wellcome Discovery Award (226653/Z/22/Z), the Medical Research Council (UK) (MR/V009346/1), Ataxia UK, Action for AT, the Muscular Dystrophy UK, the LifeArc Centre to Treat Mitochondrial Diseases (LAC-TreatMito), the UKRI/Horizon Europe MSCA Doctoral Network Programme (Project 101120256 — MMM) and an MRC strategic award to establish an International Centre for Genomic Medicine in Neuromuscular Diseases (ICGNMD) MR/S005021/1. This research was supported by the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.
Declaration of interests
HLR receives research funding from Microsoft and previously received funding from Illumina to support rare disease gene discovery and diagnosis. AODL has consulted for Tome Biosciences, Ono Pharma USA Inc, and Addition Therapeutics, and is member of the scientific advisory board for Congenica Inc and the Simons Foundation SPARK for Autism study. AL received honoraria for speaking at educational events for Biogen, PTC and Roche, is a subinvestigator in clinical trials by Roche and PTC, and is involved in a project supported by Biogen (POL-SMA-17-11166). PBK has received research support from ML Bio and Sarepta Therapeutics, and has consulted for Lupin, Neurogene, NS Pharma, and Teneofour.
Data and code availability
SMA Finder is publicly available as a stand-alone tool at https://github.com/broadinstitute/sma_finder under the open-source MIT license.
Also, it is available as a WDL workflow on Terra at https://portal.firecloud.org/?return=terra#methods/translational-genomics-group/sma-finder/4
References
- 1.Sarv S. et al. The Birth Prevalence of Spinal Muscular Atrophy: A Population Specific Approach in Estonia. Front. Genet. 12, 796862 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Verhaart I. E. C. et al. Prevalence, incidence and carrier frequency of 5q-linked spinal muscular atrophy - a literature review. Orphanet J. Rare Dis. 12, 124 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Chen X. et al. Spinal muscular atrophy diagnosis and carrier screening from genome sequencing data. Genet. Med. 22, 945–953 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Schorling D. C., Pechmann A. & Kirschner J. Advances in Treatment of Spinal Muscular Atrophy - New Phenotypes, New Challenges, New Implications for Care. J Neuromuscul Dis 7, 1–13 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Lefebvre S. et al. Identification and characterization of a spinal muscular atrophy-determining gene. Cell 80, 155–165 (1995). [DOI] [PubMed] [Google Scholar]
- 6.Li H. & Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Poplin R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. bioRxiv 201178 (2018) doi: 10.1101/201178. [DOI] [Google Scholar]
- 8.Chen X. et al. Comprehensive SMN1 and SMN2 profiling for spinal muscular atrophy analysis using long-read PacBio HiFi sequencing. Am. J. Hum. Genet. 110, 240–250 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Töpf A. et al. Sequential targeted exome sequencing of 1001 patients affected by unexplained limb-girdle weakness. Genet. Med. 22, 1478–1488 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Steyaert W. et al. Systematic analysis of paralogous regions in 41,755 exomes uncovers clinically relevant variation. Nat. Commun. 14, 6845 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Nurk S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Karczewski K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Robinson P. N. et al. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am. J. Hum. Genet. 83, 610–615 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Poterba T. et al. The Scalable Variant Call Representation: Enabling Genetic Analysis Beyond One Million Genomes. bioRxiv (2024) doi: 10.1101/2024.01.09.574205. [DOI] [PubMed] [Google Scholar]
- 15.Sudlow C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Halldorsson B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Van Hout C. V. et al. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature 586, 749–756 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Baxter S. M. et al. Centers for Mendelian Genomics: A decade of facilitating gene discovery. Genet. Med. 24, 784–797 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Vorster E., Essop F. B., Rodda J. L. & Krause A. Spinal Muscular Atrophy in the Black South African Population: A Matter of Rearrangement? Front. Genet. 11, 54 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Cobben J. M. et al. Deletions of the survival motor neuron gene in unaffected siblings of patients with spinal muscular atrophy. Am. J. Hum. Genet. 57, (1995). [PMC free article] [PubMed] [Google Scholar]
- 21.Oprea G. E. et al. Plastin 3 Is a Protective Modifier of Autosomal Recessive Spinal Muscular Atrophy. Science 320, 524 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Chong J. X. et al. A common spinal muscular atrophy deletion mutation is present on a single founder haplotype in the US Hutterites. Eur. J. Hum. Genet. 19, 1045–1051 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Peng X. et al. Overcoming the Pitfalls of Next-Generation Sequencing-Based Molecular Diagnosis of Shwachman-Diamond Syndrome. J. Mol. Diagn. 24, 1240–1253 (2022). [DOI] [PubMed] [Google Scholar]
- 24.Krenn M., Jengojan S. & Grisold W. Spinal muscular atrophy presenting with mild limb-girdle weakness in adulthood: Diagnostic pitfalls in the era of disease-modifying therapies. J. Neurol. Sci. 440, 120347 (2022). [DOI] [PubMed] [Google Scholar]
- 25.D’Amico A., Mercuri E., Tiziano F. D. & Bertini E. Spinal muscular atrophy. Orphanet J. Rare Dis. 6, 71 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Lin C.-W., Kalb S. J. & Yeh W.-S. Delay in Diagnosis of Spinal Muscular Atrophy: A Systematic Literature Review. Pediatr. Neurol. 53, 293–300 (2015). [DOI] [PubMed] [Google Scholar]
- 27.Schwartz O. et al. Clinical Effectiveness of Newborn Screening for Spinal Muscular Atrophy: A Nonrandomized Controlled Trial. JAMA pediatrics vol. 178 540–547 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Miller D. T. et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2021 update: a policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 23, 1391–1398 (2021). [DOI] [PubMed] [Google Scholar]
- 29.Hail Team. Hail 0.2. https://github.com/hail-is/hail
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
SMA Finder is publicly available as a stand-alone tool at https://github.com/broadinstitute/sma_finder under the open-source MIT license.
Also, it is available as a WDL workflow on Terra at https://portal.firecloud.org/?return=terra#methods/translational-genomics-group/sma-finder/4