Abstract
Background and Objectives
Canavan disease (CD) is a neurodegenerative disorder in which biallelic pathogenic variants in ASPA result in spongiform degeneration of the cerebral white matter, leading to progressive and irreversible motor and cognitive decline. Despite comprehensive genetic testing, many individuals with clinical and biochemical diagnoses of CD remain without a definitive molecular diagnosis. This gap hinders access to emerging gene-targeted therapies and limits participation in clinical trials. Our objective was to understand the genetic etiology of 8 unsolved cases of CD.
Methods
We used long-read sequencing (LRS) to investigate 8 individuals who were clinically and biochemically diagnosed with CD but had negative genetic testing results. We performed targeted LRS using the Oxford Nanopore Technologies platform for 3 unrelated individuals and PacBio HiFi sequencing for an additional individual from our cohort. The remaining affected individuals were sequenced by targeted LRS as barcoded, pooled samples. To investigate the impact on gene function, we performed RNA sequencing (RNA-seq) with and without cycloheximide on fibroblasts. We then evaluated the allele frequency of the variant in the population using gnomAD.
Results
We identified an ∼2,600-bp SVA_E retrotransposon intronic insertion in ASPA in all 8 individuals. In every individual, the insertion was either homozygous or compound heterozygous, in trans with a known pathogenic variant. RNA-seq indicated that the SVA_E insertion creates a novel splice acceptor site within intron 4 of ASPA that causes aberrant splicing and transcript degradation. Surprisingly, the frequency of this variant in population databases suggests that it is the most common pathogenic variant in ASPA and that it is present across ancestry groups.
Discussion
Our study identified the most common pathogenic variant in ASPA, which has been overlooked in 25 years of CD research. Considering this, it is important to ensure that all testing laboratories can detect this variant through diagnostic testing and carrier screening. Our study highlights a substantial blind spot in standard short-read diagnostic pipelines, which historically have missed or overlooked these types of insertions. It also shows the power of emerging technologies, such as LRS and RNA-seq, to identify new classes of variants for genetic disorders, including CD.
Introduction
Canavan disease (CD) (OMIM# 271900) is an autosomal recessive disorder caused by biallelic loss-of-function variants in the ASPA gene. Children affected with CD present with progressive and irreversible decline of previously acquired motor and cognitive milestones. Symptoms typically appear after a period of normal development in the first months of life and include macrocephaly, hypotonia, loss of muscle control, feeding difficulties, developmental delay (including motor and verbal skills), optic atrophy, and seizures. Clinical severity and disease progression are likely associated with the enzyme's residual activity and conformational stability.1 In addition to suggestive clinical findings, the diagnosis is established by elevated N-acetylaspartate (NAA) in urine through gas chromatography-mass spectrometry or in the brain through proton magnetic resonance spectroscopy.2
The ASPA gene is located on chromosome 17p13.2, contains 6 exons, and is approximately 30 kb long. Although the Ashkenazi Jewish population has a higher incidence of CD mainly due to the founder ASPA variants p.Glu285Ala and p.Tyr231Ter,3 individuals from all ancestry groups can be affected, and over 100 pathogenic variants have been submitted to variant databases such as Leiden Open Variation Database, ClinVar, and Human Gene Mutation Database (HGMD). These variants include missense/nonsense, splicing, deletions, and insertions. In the context of emerging therapeutic approaches for CD (NCT04833907, NCT04998396), it is becoming increasingly important to define molecular causes in addition to biochemical testing because genetic confirmation may be a prerequisite for participation in clinical trials or future approved therapies.
In this study, we report 8 individuals from 7 families (Figure 1A, eAppendix 1) with confirmed CD diagnoses based on clinical, biochemical, and neuroimaging evidence (Figure 1, B and C) in whom genetic testing, including the use of short-read genome sequencing (srGS), did not identify pathogenic variants on one or both alleles of the ASPA gene. Further examination through long-read sequencing (LRS) revealed the presence of an ∼2,600-bp SVA_E insertion in intron 4 of 5 (NM_000049.4) in all individuals (Figure 2A). Short interspersed element-variable number of tandem repeat-Alu (SINE-VNTR-Alu), subfamily E retrotransposons (SVA_E), are evolutionarily young hominid-specific transposable elements. SVA_Es are active in the human lineage and are known to cause disease through several mechanisms including insertional mutagenesis, exon shuffling, alternative splicing, and the generation of differentially methylated regions.4 SVA_E insertions have been identified and associated with a wide variety of genetic conditions including cancer,5 choroideremia,6 and Pompe disease.7 Detecting SVA_E elements using srGS is challenging because of their variability in size as they average approximately 2 kilobases but can be up to 4 kilobases or longer in length.8 Based on these findings, we propose retrospectively analyzing previously unsolved CD cases to evaluate for this SVA_E insertion and prospectively ensuring that all genetic testing laboratories can detect this SVA_E insertion both in diagnostic testing and in prenatal carrier screening.
Figure 1. Pedigrees and MRI Scans of Individuals With Canavan Disease.
(A) Pedigrees showing segregation of the SVA_E insertion (+) in the ASPA gene. In individuals FI:1, FII:1, and FIII:1, the SVA_E insertion is homozygous. Among the other affected individuals, the insertion is in trans with the previously identified pathogenic variant. (B) Brain MRI scans corresponding to each patient (except FVII:1 in whom MRI was not performed and FVI:1 whose neuroimaging was not available) grouped in panels of 4; axial T2-weighted images through the centrum semiovale (top left), basal ganglia (top right), and posterior fossa (bottom left) and axial DWI through the centrum semiovale (bottom right). There is symmetric diffuse white matter signal abnormality with variable areas of reduced diffusivity, notably involving the globus pallidus, thalamus, and dentate nuclei. There is relative sparing of the corpus striatum, corpus callosum, and small focal regions of the posterior limb of the internal capsule. These imaging features are consistent with Canavan disease. (C) Brain MR spectroscopy in each patient (except FIV:2 and FVII:1, in whom MRS was not performed, and FVI:1, whose neuroimaging was unavailable) revealed elevated NAA at 3 ppm, characteristic of Canavan disease.
Figure 2. An SVA_E Insertion in ASPA Is Associated With Canavan Disease.
(A) IGV9 view of LRS data (top) and srGS data (bottom) from individual FI:1 showing the SVA_E insertion (arrow). A target-site duplication is apparent in both the LRS and srGS data (dashed box). Colored reads in the srGS track indicate discordant read pairs, which occur when the mate of a read maps to another location in the genome. An increased number of these reads, as well as soft-clipped sequences (eFigure 1), is seen at the site of a retrotransposon insertion. (B) Analysis of phased heterozygous and homozygous SNVs for FI:1, FIV:2, FV:1, and FVI:1 demonstrates that SNVs from 1 haplotype in each affected individual within the region that includes ASPA are highly similar to SNVs from FI:1, suggesting a shared founder event. Only SNVs homozygous in FI:1 are plotted for FIV:2, FV:1, and FVI:1.
Methods
Patient Ascertainment and Clinical Studies
Eight individuals from 7 unrelated families of self-reported European-Uruguayan (FI:1, FII:1, FIII:1), European-American (FIV:1, FIV:2, FV:1), and European (FVI:1, FVII:1) ancestry were recruited and studied (Table 1). Unless otherwise noted, patients provided consent and were enrolled under the Myelin Disorders Biorepository Project at the Children's Hospital of Philadelphia ([CHOP], IRB #14–011236), within the regulatory framework for the Global Leukodystrophy Initiative Clinical Trial Network. The study was approved by the institutional ethics committees of the CHOP, and written informed consent was obtained from all 5 families in accordance with the Declaration of Helsinki. For each individual enrolled, the medical records were collected from institutions where they received care. Family FVI provided consent and was enrolled under the Rare Diseases Now: Genomic Diagnoses and Personalised Care for Children with Undiagnosed Rare Diseases program (HREC Reference Number HREC/67401/RCHM-2020) approved by the Royal Children's Hospital Human Research Ethics Committee. Family FVII provided consent and was enrolled under the Leukodystrophy Research Program at the Murdoch Children's Research Institute (HREC#641943), approved by the Royal Children's Hospital Human Research Ethics Committee. All individuals had a clinical diagnosis of CD based on biochemical testing and neuroimaging, but genetic testing had identified one or no known pathogenic or likely pathogenic variants in the ASPA gene. Detailed clinical features, brain MRI, family history, and clinical notes were reviewed by a group of pediatric neurologists and a genetic counselor. Brain MRI scans were reviewed by experienced pediatric neuroradiologists.
Table 1.
Summary of Genotypes and Phenotypes of Individuals Included in This Report
| Individual | FI:1 | FII:1 | FIII:1 | FIV:1 | FIV:2 | FV:1 | FVI:1 | FVII:1 |
| Zygosity | HOM | HOM | HOM | CHET | CHET | CHET | CHET | CHET |
| Variant identified by clinical testing | None | None | None | c.914C>A p.A305E | c.914C>A p.A305E | c.634+1G>T | c.820G>A p.G274R | c.914C>A p.A305E |
| Variant identified by LRS | SVA_E | SVA_E | SVA_E | SVA_E and c.914C>A p.A305E | SVA_E and c.914C>A p.A305E | SVA_E and c.634+1G>T | SVA_E and c.820G>A p.G274R | SVA_E and c.914C>A p.A305E |
| Self-reported ancestry | European-Uruguayan | European-Uruguayan | European-Uruguayan | European-American | European-American | European-American | European | European |
| Consanguinity | No | No | Yes | No | No | No | No | No |
| Initial symptom | Nystagmus | GDD | GDD | GDD | Hypotonia | GDD | Hypotonia and macrocephaly | Hypotonia and macrocephaly |
| Progressive macrocephaly | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| GDD | Yes | Yes | Yes | Yes | Yes | Yes | Yes (mild) | Yes |
| Cerebral demyelination | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No MRI |
| Nystagmus | Yes | Not reported | Not reported | Yes | Yes | Yes | Yes | Yes |
| Hypotonia | Yes | Not reported | Not reported | Yes | Yes | Yes | Yes | Yes |
| Spasticity | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
| Seizures | Yes | Not reported | Yes | Yes | Not reported | Yes | Not reported | Yes |
| Increased NAA in urine | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Increased NAA in MRS | Yes | Yes | Yes | Yes | No MRS | Yes | Yes | No MRS |
Abbreviations: CHET = compound heterozygous; GDD = global developmental delay; HOM = homozygous; LRS = long-read sequencing.
Short-Read Sequencing of Individual FI:1
srGS (2 × 150 bp) was performed as a trio for patient FI:1 at the CHOP High Throughput Sequencing Core using a PCR-free protocol at a targeted mean coverage of >60x. The raw data (FASTQ) were aligned to the hg38 human reference genome, and variants (single-nucleotide variants or SNVs, small insertions and deletions or indels, copy number variants or CNVs, and regions of homozygosity or ROHs) were called using Illumina DRAGEN, v3.9.5.10 The quality control protocol included the determination of sex from the genetic data and the estimation of kinship coefficients to confirm the relationships within the family. Sequence variants (SNVs and indels) were annotated using the Variant Effect Predictor (VEP v106)11 and filtered against the gnomAD population database (v4)12 to remove variants that are observed in the general population (minor allele frequency [AF] threshold at 0.5%). Sequence variants were further prioritized using an in-house workflow to rank variants based on several factors, including prior evidence for pathogenicity using annotations from ClinVar13 and HGMD (Qiagen Inc., Germantown, MD)14 and several computational prediction scores, including CADD,15 REVEL,16 and SpliceAI.17 CNVs were annotated using AnnotSV18 and filtered for variants involving the protein-coding regions of the genome.
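The population-frequency filter described above can be sketched as follows. This is a minimal illustration and not the in-house workflow: the `Variant` record, its field names, and the example variants are hypothetical, and only the 0.5% threshold comes from the text. Treating variants that are absent from gnomAD as rare is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

AF_THRESHOLD = 0.005  # 0.5% minor allele frequency, as stated in the text


@dataclass
class Variant:
    """Hypothetical minimal record; real pipelines carry many more annotations."""
    name: str
    gnomad_af: Optional[float]  # None when the variant is absent from gnomAD


def is_rare(variant: Variant, threshold: float = AF_THRESHOLD) -> bool:
    # Assumption: a variant missing from the database is kept as rare.
    return variant.gnomad_af is None or variant.gnomad_af < threshold


def filter_rare(variants: List[Variant]) -> List[Variant]:
    return [v for v in variants if is_rare(v)]


# Made-up example variants, for illustration only.
example = [
    Variant("common_snv", 0.12),
    Variant("rare_snv", 0.0003),
    Variant("novel_indel", None),
]
kept = [v.name for v in filter_rare(example)]
```

A filter of this kind operates on annotated small variants, so an insertion that is never called in the first place would not reach this step at all.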
Targeted LRS of Affected Individuals and Their Parents
Targeted LRS was performed using the Oxford Nanopore Technologies (ONT) platform for 3 unrelated individuals from our cohort. In brief, HMW DNA was isolated from whole blood using the Puregene DNA Purification Kit (Qiagen) for patients FI:1 and FV:1. Residual DNA for FIV:2 was obtained from an outside laboratory. Extracted DNA was quantified on a Qubit fluorometer (Invitrogen), and quality was assessed using a NanoDrop spectrophotometer (ThermoFisher) and an Agilent Femto Pulse system. The Short Read Eliminator kit (PacBio) was used to remove smaller fragments according to the manufacturer's instructions for FV:1 and FIV:2. Libraries were prepared using the ONT Ligation Sequencing Kit (SQK-LSK114), loaded onto an R10.4.1 flow cell, and run using adaptive sampling on a PromethION 24. Targets were enriched for regions of approximately 2 Mbp surrounding the genes HEPACAM, GALC, HEXA, GPRC5B, GCSH, ASPA, GFAP, AQP4, GCDH, MLC1, ARSA, AMT, HEXB, and GLDC, and for approximately 200 kbp surrounding FMR1 and COL1A1 (eTable 1). After sequencing, libraries were base-called using Dorado version 0.5 (ONT) with the super-accurate model, incorporating 5mCG and 5hmCG modifications. Run performance was evaluated using cramino (v0.14.1). Base-called reads were aligned to GRCh38 using minimap219; small variant calling and phasing were performed using Clair3.20 SVs were called using Sniffles221 and cuteSV.21,22 Reads were visualized using Integrative Genomics Viewer (IGV).9
Targeted LRS was also performed on barcoded and pooled samples from the remaining affected individuals. In brief, extracted DNA was quantified as described above. Libraries were prepared using the ONT Native Barcoding Kit (SQK-NBD114.24), and the pool was loaded onto an R10.4.1 flow cell and run using adaptive sampling on a PromethION 24. The flow cell was washed and reloaded twice during a 66-hour run. Targets were enriched using the same regions as given above (eTable 1). After sequencing, libraries were base-called using Dorado version 0.8.2 (ONT) with the super-accurate model, incorporating 5mCG and 5hmCG modifications, and then demultiplexed using Dorado. Run performance was evaluated using cramino (v0.14.1).23 Base-called reads were aligned to GRCh38 using minimap2.19 Reads were visualized using IGV.9
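For illustration, adaptive-sampling targets of roughly 2 Mbp around a gene can be expressed as padded intervals in BED format. The sketch below is an assumption about how such windows might be constructed, not the procedure used here (the actual regions are listed in eTable 1). The function names are hypothetical, and the chromosome label and coordinates in the example are placeholders rather than real annotations.

```python
def padded_window(chrom, gene_start, gene_end, window_bp=2_000_000):
    """Return a (chrom, start, end) interval of about `window_bp` centered on a gene.

    Coordinates are 0-based, half-open, as in BED. The start is clamped at 0;
    clamping to the chromosome length is omitted for brevity.
    """
    if gene_end <= gene_start:
        raise ValueError("gene_end must be greater than gene_start")
    gene_len = gene_end - gene_start
    # Split whatever remains of the window evenly on both sides of the gene.
    flank = max(window_bp - gene_len, 0) // 2
    return chrom, max(gene_start - flank, 0), gene_end + flank


def to_bed_line(interval, name):
    chrom, start, end = interval
    return f"{chrom}\t{start}\t{end}\t{name}"


# Placeholder coordinates for a hypothetical 30-kb gene (not a real annotation).
window = padded_window("chrN", 5_000_000, 5_030_000)
bed_line = to_bed_line(window, "example_gene")
```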
Illumina Whole-Genome Sequencing of Families VI and VII
WGS data generation and clinical analysis were performed using diagnostically accredited methods by the Victorian Clinical Genetics Services in Melbourne, Australia. DNA was extracted manually from blood collected in EDTA vacutainers using the QIAamp DNA Blood Mini Kit. DNA quantity and quality were assessed using the Qubit dsDNA BR (broad-range) Assay kit (Thermo Fisher) and TapeStation genomic DNA kit (Agilent), respectively. Whole-genome DNA libraries were created using Nextera DNA Flex Library Prep Kit/Illumina DNA Prep Kit (Illumina), followed by 2 × 150-bp paired-end DNA sequencing on a NovaSeq 6000 instrument (Illumina), variably using S2 or S4 flow cells. The targeted mean sequencing depth was 30×, with a minimum of 90% of bases sequenced to at least 10× for nuclear DNA and a mean coverage of 800× for mitochondrial DNA.
PacBio HiFi Sequencing and Alignment of Individual FVI:1
DNA for sequencing was homogenized using a Diagenode Megaruptor 3 with the DNAFluid+ Kit (E07020001, Diagenode) using the following parameters: volume, 150 µL; speed, 40; and concentration, 50 ng/µL. After homogenization, 3 µg of material was diluted in low TE to a volume of 130 µL. Samples were sheared with the Megaruptor Shearing Kit (E07010003, Diagenode) at a speed of 30 or 31 depending on the extraction fragment length, with the goal of recovering average fragment lengths of 15–24 kb. Clean-up and concentration of the sheared material were performed using SMRTbell clean-up beads (102-158-300, PacBio), and the sample was eluted in a volume of 47 µL. Sheared average fragment lengths were determined by Femto Pulse using the Genomic DNA 165 kb Analysis Kit (FP-1002-0275, Agilent).
SMRTbell libraries were prepared using SMRTbell prep kit 3.0 (102-141-700, PacBio) following the standard procedure, with unique barcoding of each sample using the SMRTbell adapter index plate 96A (102-009-200, PacBio). Size selection was performed using the AMPure PB beads size selection kit (102-182-500, PacBio) at a ratio of 2.9x (i.e., 50 µL of sample: 145 µL 35% beads). Size determination of the final SMRTbell library was performed using the Femto Pulse as described above. SMRTbell library average fragment lengths ranged between 12.895 and 26.001 kb, and final SMRTbell libraries were diluted to below 60 ng/µL before loading. Samples were sequenced with a movie time of 30 hours and using SMRT Link version 13.1.0.221970, chemistry bundle 13.1.0.217683, and parameter version 13.1.0. The following PacBio products were used for the sequencing reaction: Revio Sequencing Plate (102-587-400), Revio Polymerase Kit (102-739-100), and Revio SMRT Cell Tray (102-202-200). The on-plate loading concentration was set to 250 pM for all sequencing runs. Any sample that did not generate 90 Gb using a single SMRT cell was re-sequenced in pooled “top-up” runs where multiple SMRTbell libraries were combined.
After sequencing, HiFi FASTQ files were aligned to the hg38 reference genome using minimap2 (v2.14-r883).19 SNVs and indels were called using Clair3 (v1.0.4)20 and phased using WhatsHap (v2.1).24 SVs (including CNVs) were called with Sniffles221 (v2.07). SNVs and indels were functionally annotated using the Variant Effect Predictor (VEP)25 (v110), and SVs were annotated using the AnnotSV (v3.3.4).18
RNA Sequencing
Patient fibroblast cell lines were established, grown to 7 × 10⁵ cells/mL, and then treated with or without cycloheximide (CHX) for 22–24 hours. Cells were centrifuged, and the resulting cell pellet was washed with PBS. Total RNA was extracted from the cell pellets, and the Illumina Stranded mRNA protocol was used for library prep, which was sequenced to a depth of 80 million reads. RNA sequencing (RNA-seq) data were mapped to the hg38 reference genome using STAR v2.7.3a.26 The 2-pass method within STAR was used with gencode.v35 gene annotations to enhance mapping and enable the detection of unique splicing events. Data quality was assessed with FastQC, and data were visualized using IGV.9
Establishing the Maximum Credible AF for ASPA
The maximum credible AF was calculated using an established statistically robust framework for assessing whether a variant is “too common” to be causative for a Mendelian disorder of interest.27 To calculate the maximum allele contribution in ASPA, we used data from gnomAD (v4)12 (eTable 2). The allele count (487) of the most common previously described pathogenic allele (17-3499060-C-A) was divided by the total number of pathogenic or likely pathogenic alleles (1,021). The prevalence of CD used for the calculation was 1 in 10,000; penetrance was determined to be 1 because CD is fully penetrant, and genetic heterogeneity was also 1 because it is a monogenic disease. The Frequency Filter App was used to perform the calculation.
Data Availability
Additional data that support the findings of this study are available. Deidentified genomic and associated data from this study are available on request from the corresponding author through formal data sharing agreements. All variants reported in this study have been deposited in ClinVar (SCV005407762).
The web resources used are as follows:
• ClinVar: ncbi.nlm.nih.gov/clinvar.
• gnomAD: gnomad.broadinstitute.org.
• IGV: software.broadinstitute.org/software/igv.
• NCBI RefSeq: ncbi.nlm.nih.gov/refseq.
• OMIM: omim.org.
• Frequency Filter: cardiodb.org/allelefrequencyapp/.
• UCSC: genome.ucsc.edu/.
Results
Identification and Validation of an SVA_E Insertion by LRS
The reported SVA_E insertion was identified independently by 2 groups simultaneously. One team performed targeted LRS of ASPA and a panel of genes associated with neurodegenerative disorders in infancy for individual FI:1. Filtering for allelic variants and SVs with allele frequencies less than 1% in ASPA revealed a single homozygous ∼2,600-bp SVA_E insertion in intron 4 of 5 (NM_000049.4) (Figure 2A). Separately, a second team performed manual analysis of srGS data in IGV from individual FVI:1, who had a single pathogenic variant identified by diagnostic testing. The candidate insertion was detected by identifying and analyzing the discordant read pairs and the soft-clipped sequence present on either side of the target site duplication, which matched the SVA_E sequence (eFigure 1). Subsequent LRS of individual FVI:1 confirmed an SVA_E insertion. Once identified, the insertion was found in 6 additional individuals who had biochemical and clinical evidence of disease but did not have a confirmed genetic diagnosis of CD. The insertion was found to be homozygous in all individuals with no candidate variant previously identified by prior clinical testing (FI:1, FII:1, FIII:1) and in trans with a known pathogenic variant in all individuals with a single candidate variant identified by standard clinical testing (FIV:1, FIV:2, FV:1, FVI:1, FVII:1). Insertions were confirmed by LRS in the probands and their family members (eFigure 2).
RNA-Seq Confirms the Impact of the SVA_E on Splicing
SVA insertions may disrupt transcription through various mechanisms,28 including direct disruption of coding sequence or induction of aberrant splicing patterns that include exon skipping, activation of cryptic splice sites around the SVA insertion, and the inclusion of the SVA sequence in gene transcripts. To investigate the impact of this SVA_E insertion on gene function, we performed RNA-seq, with and without CHX, on fibroblasts derived from a parental SVA carrier (family VI) and a proband (family VII). The untreated samples from both individuals lacked heterozygous variant calls that were present in the srGS data, indicating expression of only 1 allele (Figure 3B). Both CHX-treated samples were observed to have soft clipping in sequencing reads spanning the exon 4-intron 4 boundary that did not map to any ASPA genomic reference sequence but instead matched the sequence of the SVA_E present in intron 4 (Figure 3B). These data indicated that the SVA_E insertion creates a novel splice acceptor site within intron 4 of ASPA, resulting in the disruption of canonical splicing and the addition of an SVA_E-containing exon (Figure 3A). No heterozygous variants were observed downstream of the SVA_E-containing exon in the CHX-treated samples, suggesting that transcription ends within the SVA_E insertion. The transcript incorporating the SVA_E was only observed in CHX-treated samples, indicating that the products from this allele are targeted for degradation. The RNA-seq data provided evidence that the deep intronic SVA insertion causes aberrant splicing that leads to transcript degradation and, consequently, loss of function of ASPA. Given that intronic retrotransposon insertions do not always cause aberrant splicing, performing RNA-seq or RT-PCR analyses is essential to confirm the functional consequences of such insertions.
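The allelic-expression reasoning above can be illustrated with a small sketch. It is not the analysis code used in this study: the function name, the read-count inputs, and the depth and minor-allele cutoffs are assumptions chosen for illustration. The idea is that a site that is heterozygous in DNA but shows reads from only one allele in RNA points to loss of transcript from the other allele.

```python
def allelic_expression(ref_reads, alt_reads, min_depth=10, minor_cutoff=0.05):
    """Classify RNA-seq support at a site known to be heterozygous in DNA.

    Returns "insufficient_depth", "monoallelic", or "biallelic".
    `min_depth` and `minor_cutoff` are illustrative assumptions.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "insufficient_depth"
    minor_fraction = min(ref_reads, alt_reads) / depth
    return "monoallelic" if minor_fraction < minor_cutoff else "biallelic"


# Made-up read counts, for illustration only.
untreated = allelic_expression(ref_reads=48, alt_reads=0)
balanced = allelic_expression(ref_reads=26, alt_reads=22)
shallow = allelic_expression(ref_reads=3, alt_reads=1)
```

In this toy example, `untreated` is classified as monoallelic, `balanced` as biallelic, and `shallow` is left uncalled because too few reads cover the site.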
Figure 3. An Intronic SVA_E Insertion Results in Aberrant ASPA Splicing.
(A) The position of ASPA on chr17 is shown, along with the gene structure and location of the SVA_E insertion in intron 4. Gray indicates intronic sequence while blue represents exonic sequence, including the 5′ and 3′ UTRs. The structure of the canonical ASPA transcript is shown, along with an aberrant transcript that includes sequence from the SVA_E insertion. (B) IGV screenshots showing phased DNA sequencing data from individual FVI:1 and RNA-seq of fibroblast-derived RNA from the father of FVI:1, who is heterozygous for the SVA_E insertion. Haplotype 2 from individual FVI:1 carries the SVA_E insertion and was paternally inherited. The fibroblast sample was treated with CHX to inhibit nonsense-mediated decay. Soft-clipped sequence is observed at the ends of RNA-seq reads in exon 4 (dashed box) and matches sequence from the SVA_E insertion, indicating splicing from exon 4 into the SVA_E. A heterozygous SNV in exon 5 is observed on haplotype 2 (the same haplotype that carries the SVA_E insertion) but is not observed in RNA-seq data (dashed box), suggesting that transcription ends within the SVA_E insertion.
SVA_E Insertion Is the Most Common Pathogenic Variant in ASPA
Given that the SVA_E insertion was identified in 7 unrelated families with a shared haplotype, we presumed that the variant would be present in population databases. We evaluated the AF of the SVA_E insertion using gnomAD v4.12 Of approximately 126,000 haplotypes, 66 carried an insertion at this position annotated as an SVA, all in heterozygous individuals (no homozygotes were observed), yielding an AF of 0.0005 (1/1,909 haplotypes carry the insertion) (eTable 2). In gnomAD, the SVA insertion was identified from srGS data using MELT,29 which estimated the insertion length to be 461 bp. Because of the discrepancy in length between the SVA reported in gnomAD and our sequencing data, we obtained confirmation from gnomAD that the ends of the SVA_E insertion in our cohort matched the gnomAD data, providing evidence that they are the same variant. This variant is observed in all genetic ancestry groups in gnomAD except individuals of Middle Eastern descent, where fewer than 100 haplotypes are represented (eTable 3).
Surprisingly, the most common previously reported pathogenic variant in ASPA (17-3499060-C-A) has a lower AF of 0.0003 (present in 1/3,314 haplotypes) than the SVA_E insertion. However, the AF of the SVA_E insertion is below the maximum credible AF of 0.0013 for ASPA (eFigure 3). In addition, several variants including the p.Glu285Ala and p.Tyr231Ter (Ashkenazi Jewish founder variants) have higher allele frequencies in the individual ancestry group in which they are most common (eTable 2). Together, these 2 pieces of evidence indicate that AF of the SVA_E insertion is not high enough to put into question its pathogenicity. Instead, this evidence suggests that the SVA_E insertion is the most common pathogenic variant in ASPA because of its prevalence across genetic ancestry groups rather than resulting from a founder effect in a single genetic ancestry group, as seen in the previously described most common ASPA variants. The ubiquitous population distribution of the SVA_E insertion makes this variant an appropriate addition to all pan-ethnic and ethnic-specific carrier screening tests that cover ASPA.
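The frequency comparison in this section reduces to simple arithmetic, sketched below using only figures quoted in the text. The function names are illustrative, and the haplotype total is the approximate value given above, so the result is itself approximate.

```python
def allele_frequency(allele_count, allele_number):
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


def one_in_n(af):
    """Express an allele frequency as '1 in N' haplotypes."""
    return round(1 / af)


# Figures quoted in the text (the haplotype total is approximate).
sva_af = allele_frequency(66, 126_000)
sva_one_in = one_in_n(sva_af)

# The previously most common variant is reported as 1 in 3,314 haplotypes.
snv_af = 1 / 3_314

MAX_CREDIBLE_AF = 0.0013  # value reported in the text for ASPA

sva_more_common = sva_af > snv_af
sva_below_ceiling = sva_af < MAX_CREDIBLE_AF
```

With these inputs, the insertion is carried on about 1 in 1,909 haplotypes, which is more common than 1 in 3,314 yet still below the reported maximum credible AF.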
SVA_E Insertion Occurs on a Shared Haplotype
The unexpected identification of the same SVA insertion in 7 unrelated families suggested that either the insertion is a recurring event or that the variant is derived from a single past event. Analysis of heterozygous and homozygous SNVs within the target region (chr17:2,450,000–4,550,000) in individual FI:1 revealed no heterozygous SNVs, suggesting that this is a large region of homozygosity in this individual (Figure 2B). Thus, we hypothesized that all individuals with the SVA_E insertion would share a similar pattern of SNVs on the haplotype containing the insertion, as observed in FI:1. To evaluate this, we compared the homozygous SNVs from FI:1 with phased SNVs from FIV:2, FV:1, and FVI:1, who were all heterozygous for the SVA_E insertion by LRS. The analysis demonstrated that all 3 individuals carried SNVs with more than 99% identity to those in FI:1 in genes within the target region, confirming that this haplotype is indeed shared among these 4 individuals and that this pathogenic variant likely originated from a single insertion event (Figure 2B, eFigure 4). Of interest, given the presence of this variant across genetic ancestry groups, this single insertion event likely occurred early in human evolution, before the divergence of these ancestry groups. It is important to note that the presence of this shared haplotype could be used to infer the presence of the SVA_E insertion in individuals with a CD diagnosis in whom traditional testing did not identify the insertion.
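The haplotype comparison can be sketched as follows. This is a toy illustration under stated assumptions, not the analysis behind Figure 2B: haplotypes are represented as hypothetical position-to-allele dictionaries, the function names are invented, and the positions and alleles are made up. The homozygous SNVs of one individual serve as the reference, and each phased haplotype of a carrier is scored by the fraction of shared positions with an identical allele.

```python
def haplotype_identity(reference_alleles, haplotype_alleles):
    """Fraction of shared positions at which two haplotypes carry the same allele.

    Both arguments map genomic position -> allele (e.g. {100: "A"}).
    Returns None when the haplotypes have no positions in common.
    """
    shared = reference_alleles.keys() & haplotype_alleles.keys()
    if not shared:
        return None
    matches = sum(reference_alleles[p] == haplotype_alleles[p] for p in shared)
    return matches / len(shared)


def best_matching_haplotype(reference_alleles, phased):
    """Return (haplotype_name, identity) for the haplotype closest to the reference.

    `phased` maps a haplotype label to its position -> allele dictionary.
    """
    scored = {name: haplotype_identity(reference_alleles, alleles)
              for name, alleles in phased.items()}
    scored = {name: s for name, s in scored.items() if s is not None}
    if not scored:
        return None
    return max(scored.items(), key=lambda item: item[1])


# Toy data: positions and alleles are invented for illustration.
homozygous_ref = {100: "A", 200: "G", 300: "T", 400: "C"}
phased_carrier = {
    "hap1": {100: "C", 200: "G", 300: "A", 400: "T"},
    "hap2": {100: "A", 200: "G", 300: "T", 400: "C"},
}
name, identity = best_matching_haplotype(homozygous_ref, phased_carrier)
```

Here `hap2` matches the reference at every shared position while `hap1` matches at only one of four, so `hap2` would be the candidate haplotype carrying the insertion.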
SVA_E Insertion Meets Criteria for Classification as a Pathogenic Variant
We applied the American College of Medical Genetics and Genomics criteria30 to the SVA_E insertion, and it met criteria to be classified as pathogenic. We identified the variant in trans with a known pathogenic variant in 5 patients (1 point each) and in the homozygous state in an additional 3 patients (1 point total), for a total of 6 points, thereby qualifying for PM3_VeryStrong modification. Criteria applied included PM3_VeryStrong (for recessive disorders, detected in trans with a pathogenic or likely pathogenic variant in an affected patient), PS3 (well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product), PM2 (absent from controls or present at extremely low frequency in population databases), PP1 (the insertion segregates with the disease in multiple families), and PP4 (the phenotype is highly specific for CD). The variant therefore meets 1 very strong criterion (PM3_VeryStrong), 1 strong criterion (PS3), 1 moderate criterion (PM2), and 2 supporting criteria (PP1 and PP4), which together support classification as pathogenic.
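The PM3 point tally described above can be written out as a short sketch. The scoring values are assumptions made for illustration: 1 point per in-trans observation and a 1-point total for the homozygous observations follow the tally in the text, whereas the 0.5-point value per homozygous observation and the strength thresholds reflect our reading of ClinGen guidance on PM3 and should be checked against the current recommendation. The function names are hypothetical.

```python
def pm3_points(n_in_trans, n_homozygous,
               trans_points=1.0, hom_points=0.5, hom_cap=1.0):
    """Tally PM3 points from affected individuals.

    Assumptions (illustrative): each in-trans observation scores `trans_points`;
    homozygous observations score `hom_points` each, capped at `hom_cap` in total.
    """
    return n_in_trans * trans_points + min(n_homozygous * hom_points, hom_cap)


def pm3_strength(points):
    """Map a point total to an evidence strength (assumed thresholds)."""
    for threshold, label in ((4.0, "VeryStrong"), (2.0, "Strong"),
                             (1.0, "Moderate"), (0.5, "Supporting")):
        if points >= threshold:
            return label
    return "NotMet"


# Counts taken from the text: 5 in-trans and 3 homozygous individuals.
points = pm3_points(n_in_trans=5, n_homozygous=3)
strength = pm3_strength(points)
```

With 5 in-trans and 3 homozygous observations, the tally is 6 points, which exceeds the assumed very strong threshold.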
Discussion
While LRS was instrumental in defining the SVA_E insertion, analysis of srGS data can detect the insertion through the presence of the target-site duplication and the identification of split or discordant reads at the insertion breakpoints, either by inspection in IGV or using software packages such as MELT29 and xTea31 (Figure 2A). The ability of TE software to detect the SVA_E insertion with srGS data was confirmed using xTea on sample FVI:1. The discordant read pairs exhibit a distinct pattern, with their mates aligning to multiple chromosomes because of the reads mapping to various SVA-containing regions in the human reference genome, as well as sequences from the SVA elements themselves. However, the repetitive nature and size of the SVA_E element necessitate LRS or PCR to fully characterize the insertion. This underscores a significant limitation of srGS: its inability to detect or fully resolve complex SVs, particularly those involving TEs.
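The soft-clip signal described above can be illustrated with a minimal, pure-Python sketch that works on SAM-style `(POS, CIGAR)` pairs. It is not how MELT or xTea operate and is not the analysis performed here: it ignores discordant mates and the clipped sequence itself, the function names are hypothetical, the `min_support` cutoff is an arbitrary assumption, and the example reads are invented. It only shows how soft-clip positions that pile up at the same coordinate can flag a candidate breakpoint.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def soft_clip_breakpoints(pos, cigar):
    """Return reference positions (1-based) where a read is soft-clipped.

    `pos` is the 1-based leftmost aligned position, as in the SAM POS field.
    A leading soft clip is reported at `pos`; a trailing soft clip is reported
    at the last aligned reference base.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may flank soft clips; ignore them when looking at the ends.
    core = [(n, op) for n, op in ops if op != "H"]
    if not core:
        return []
    breakpoints = []
    if core[0][1] == "S":
        breakpoints.append(pos)
    if len(core) > 1 and core[-1][1] == "S":
        ref_len = sum(n for n, op in core if op in REF_CONSUMING)
        breakpoints.append(pos + ref_len - 1)
    return breakpoints


def candidate_sites(reads, min_support=3):
    """Positions supported by at least `min_support` soft-clipped reads."""
    counts = Counter(bp for pos, cigar in reads
                     for bp in soft_clip_breakpoints(pos, cigar))
    return sorted(p for p, n in counts.items() if n >= min_support)


# Invented (POS, CIGAR) pairs for illustration.
reads = [
    (1000, "100M50S"),   # clipped on the right at 1099
    (1020, "80M70S"),    # clipped on the right at 1099
    (1050, "50M100S"),   # clipped on the right at 1099
    (1085, "60S90M"),    # clipped on the left at 1085
    (900, "150M"),       # fully aligned, no clip
]
sites = candidate_sites(reads)
```

In the toy data, three reads are clipped at position 1099 and one at 1085; clip positions on opposite sides that are offset by a few bases in this way are what a target-site duplication would be expected to produce.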
Several reported individuals with clinical evidence of CD lack a definitive genetic diagnosis, with only 1 or no pathogenic variants identified in ASPA.32-34 The number of molecularly unsolved CD cases is likely to be underreported because commercial genetic testing laboratories may not report single heterozygous pathogenic variants in autosomal recessive genes such as ASPA and because the absence of genetic confirmation of disease can be an exclusion criterion for case publication. Given this, revisiting previously unsolved CD cases to evaluate for this SVA_E insertion is vital. It is also important that genetic testing laboratories ensure that they can detect this SVA_E insertion both in diagnostic testing and in prenatal carrier screening. Such clinical implementation may require the use of LRS, PCR, or TE-detecting software as described above to identify the presence of this variant from short-read data. Alternatively, the described haplotype linked to the SVA_E insertion could be used to infer its presence. This finding underscores the need to implement clinical genome sequencing in cases of unsolved leukodystrophy, or when there is biochemical evidence of CD with only 1 or no identified variants. Finally, it highlights the importance of orthogonal diagnostic methods such as biochemical testing in cases where there is clinical suspicion for a specific condition such as CD, even in the context of nondiagnostic first-tier genetic testing.
This study identifies what may be the most common pathogenic variant in ASPA, which has been overlooked in 25 years of CD research. This finding matters to the CD community because it directly affects families receiving a genetic diagnosis, as well as potential future therapeutic applications. It also has implications for the broader rare disease community because it exposes a substantial blind spot in standard diagnostic pipelines, which either miss or overlook these types of insertions. Specific software to detect TE insertions should be used on rare disease short-read data sets, or ideally, LRS should be deployed, especially where there is a strong clinical suspicion of a particular genetic disorder but traditional testing is nondiagnostic. As LRS technology continues to improve and becomes more accessible, it will likely help resolve the genetic etiology of other unsolved individuals and conditions. The growing number of studies analogous to the one presented here positions TE-related variants as an exciting avenue to explore in unsolved cases and shows that RNA-seq and LRS are powerful tools to solve them.
Acknowledgment
The authors thank Angela Miller for help with manuscript and figure preparation.
Glossary
- AF
allele frequency
- BR
broad-range
- CADD
Combined Annotation Dependent Depletion
- CD
Canavan disease
- CHOP
Children's Hospital of Philadelphia
- CHX
cycloheximide
- DWI
diffusion-weighted imaging
- HGMD
Human Gene Mutation Database
- HOM
homozygous
- HREC
Human Research Ethics Committee
- HMW
high molecular weight
- IGV
Integrative Genomics Viewer
- IRB
Institutional Review Board
- LRS
long-read sequencing
- MELT
Mobile Element Locator Tool
- MRS
magnetic resonance spectroscopy
- NAA
N-acetylaspartate
- NCBI
National Center for Biotechnology Information
- OMIM
Online Mendelian Inheritance in Man
- ONT
Oxford Nanopore Technologies
- PBS
phosphate-buffered saline
- RDCRN
Rare Diseases Clinical Research Network
- RNA-seq
RNA sequencing
- ROH
region of homozygosity
- RT-PCR
reverse transcription polymerase chain reaction
- SNV
single nucleotide variant
- srGS
short-read genome sequencing
- SRS
short-read sequencing
- STAR
Spliced Transcripts Alignment to a Reference
- SVA
SINE-VNTR-Alu
- TE
transposable element
- UCSC
University of California Santa Cruz
- WGS
whole-genome sequencing
Footnotes
Editorial, page e200300
Author Contributions
C.A. Dominguez Gonzalez: drafting/revision of the manuscript for content, including medical writing for content; major role in the acquisition of data; study concept or design; analysis or interpretation of data. K.M. Bell: drafting/revision of the manuscript for content, including medical writing for content; major role in the acquisition of data; analysis or interpretation of data. R. Rajagopalan: drafting/revision of the manuscript for content, including medical writing for content; analysis or interpretation of data. M.G. de Silva: drafting/revision of the manuscript for content, including medical writing for content. A. Lemes: drafting/revision of the manuscript for content, including medical writing for content. C. Zabala: drafting/revision of the manuscript for content, including medical writing for content. F. Pérez: drafting/revision of the manuscript for content, including medical writing for content. A. Cerisola: drafting/revision of the manuscript for content, including medical writing for content. A. Vossough: drafting/revision of the manuscript for content, including medical writing for content; analysis or interpretation of data. M.T. Whitehead: drafting/revision of the manuscript for content, including medical writing for content; analysis or interpretation of data. C. Cunningham: drafting/revision of the manuscript for content, including medical writing for content. N.J. Brown: drafting/revision of the manuscript for content, including medical writing for content. R. Quin: drafting/revision of the manuscript for content, including medical writing for content. C. Simons: drafting/revision of the manuscript for content, including medical writing for content. T. Conway: drafting/revision of the manuscript for content, including medical writing for content. E. Uebergang: drafting/revision of the manuscript for content, including medical writing for content. R. Rius: drafting/revision of the manuscript for content, including medical writing for content. 
M.A. Kumaheri: drafting/revision of the manuscript for content, including medical writing for content. E.R. Kotes: drafting/revision of the manuscript for content, including medical writing for content. A. Vohra: drafting/revision of the manuscript for content, including medical writing for content. M.P.G. Zalusky: drafting/revision of the manuscript for content, including medical writing for content. Z.B. Anderson: drafting/revision of the manuscript for content, including medical writing for content. S.H.R. Storz: drafting/revision of the manuscript for content, including medical writing for content. S.A. Ward: drafting/revision of the manuscript for content, including medical writing for content. J. Goffena: drafting/revision of the manuscript for content, including medical writing for content. J.A. Gustafson: drafting/revision of the manuscript for content, including medical writing for content. S.M. White: drafting/revision of the manuscript for content, including medical writing for content. A. Vanderver: drafting/revision of the manuscript for content, including medical writing for content; major role in the acquisition of data; analysis or interpretation of data. D.E. Miller: drafting/revision of the manuscript for content, including medical writing for content; major role in the acquisition of data; study concept or design; analysis or interpretation of data.
Study Funding
The Rare Disease Flagship acknowledges financial support from the Royal Children's Hospital Foundation and the Murdoch Children's Research Institute, Melbourne, Australia. The research conducted at the Murdoch Children's Research Institute was supported by the Victorian Government's Operational Infrastructure Support Program. Massimo's Mission acknowledges financial support from the Australian Government Department of Health and Aged Care (EPCD000034). A. Vanderver is funded by the RDCRN (U54TR002823) and by 5U24NS131172. D.E. Miller is supported by the NIH through the NIH Director's Early Independence Award, DP5-OD033357.
Disclosure
M.P.G. Zalusky and J. Goffena have received travel support from ONT. A. Vanderver receives grant and in-kind support for translational research without personal compensation from Affinia, Biogen, Boehringer Ingelheim, Eli Lilly, Illumina, Ionis, Homology, Myrtelle, Orchard Therapeutics, Passage Bio, Sana, Sanofi, Synaptixbio, and Takeda. D.E. Miller holds stock options in MyOme and Basis Genetics, is on a scientific advisory board at ONT and Basis Genetics, is engaged in a research agreement with ONT, and has received travel support from ONT and PacBio. Go to Neurology.org/NG for full disclosures.
References
- 1. Mendes MI, Smith DE, Pop A, et al. Clinically distinct phenotypes of Canavan disease correlate with residual aspartoacylase enzyme activity. Hum Mutat. 2017;38(5):524-531. doi: 10.1002/humu.23181
- 2. Matalon R. Canavan disease: diagnosis and molecular analysis. Genet Test. 1997;1(1):21-25. doi: 10.1089/gte.1997.1.21
- 3. Fares F, Badarneh K, Abosaleh M, Harari-Shaham A, Diukman R, David M. Carrier frequency of autosomal-recessive disorders in the Ashkenazi Jewish population: should the rationale for mutation choice for screening be reevaluated? Prenat Diagn. 2008;28(3):236-241. doi: 10.1002/pd.1943
- 4. Hancks DC, Kazazian HH Jr. SVA retrotransposons: evolution and genetic instability. Semin Cancer Biol. 2010;20(4):234-245. doi: 10.1016/j.semcancer.2010.04.001
- 5. Walsh T, Casadei S, Munson KM, et al. CRISPR-Cas9/long-read sequencing approach to identify cryptic mutations in BRCA1 and other tumour suppressor genes. J Med Genet. 2021;58(12):850-852. doi: 10.1136/jmedgenet-2020-107320
- 6. Jones KD, Radziwon A, Birch DG, MacDonald IM. A novel SVA retrotransposon insertion in the CHM gene results in loss of REP-1 causing choroideremia. Ophthalmic Genet. 2020;41(4):341-344. doi: 10.1080/13816810.2020.1768557
- 7. Bychkov I, Baydakova G, Filatova A, et al. Complex transposon insertion as a novel cause of Pompe disease. Int J Mol Sci. 2021;22(19):10887. doi: 10.3390/ijms221910887
- 8. Hancks DC, Ewing AD, Chen JE, Tokunaga K, Kazazian HH Jr. Exon-trapping mediated by the human retrotransposon SVA. Genome Res. 2009;19(11):1983-1991. doi: 10.1101/gr.093153.109
- 9. Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14(2):178-192. doi: 10.1093/bib/bbs017
- 10. Illumina. DRAGEN Secondary Analysis (Version 3.9.5). 2021. illumina.com/products/by-type/informatics-products/dragen-secondary-analysis.html. illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/dragen-bio-it/200010784_00_DRAGEN-3.9.5-Customer-Release-Notes.pdf
- 11. Cunningham F, Allen JE, Allen J, et al. Ensembl 2022. Nucleic Acids Res. 2022;50(D1):D988-D995. doi: 10.1093/nar/gkab1049
- 12. Chen S, Francioli LC, Goodrich JK, et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024;625(7993):92-100. doi: 10.1038/s41586-023-06045-0
- 13. Landrum MJ, Lee JM, Riley GR, et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014;42(Database issue):D980-D985. doi: 10.1093/nar/gkt1113
- 14. Stenson PD, Mort M, Ball EV, et al. The Human Gene Mutation Database (HGMD®): optimizing its use in a clinical diagnostic or research setting. Hum Genet. 2020;139(10):1197-1207. doi: 10.1007/s00439-020-02199-3
- 15. Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 2019;47(D1):D886-D894. doi: 10.1093/nar/gky1016
- 16. Ioannidis NM, Rothstein JH, Pejaver V, et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am J Hum Genet. 2016;99(4):877-885. doi: 10.1016/j.ajhg.2016.08.016
- 17. Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, et al. Predicting splicing from primary sequence with deep learning. Cell. 2019;176(3):535-548.e24. doi: 10.1016/j.cell.2018.12.015
- 18. Geoffroy V, Herenger Y, Kress A, et al. AnnotSV: an integrated tool for structural variations annotation. Bioinformatics. 2018;34(20):3572-3574. doi: 10.1093/bioinformatics/bty304
- 19. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191
- 20. Zheng Z, Li S, Su J, Leung AW, Lam TW, Luo R. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nat Comput Sci. 2022;2(12):797-803. doi: 10.1038/s43588-022-00387-x
- 21. Smolka M, Paulin LF, Grochowski CM, et al. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol. 2024;42(10):1571-1580. doi: 10.1038/s41587-023-02024-y
- 22. Jiang T, Liu Y, Jiang Y, et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 2020;21(1):189. doi: 10.1186/s13059-020-02107-y
- 23. De Coster W, Rademakers R. NanoPack2: population-scale evaluation of long-read sequencing data. Bioinformatics. 2023;39(5):btad311. doi: 10.1093/bioinformatics/btad311
- 24. Martin M, Ebert P, Marschall T. Read-based phasing and analysis of phased variants with WhatsHap. Methods Mol Biol. 2023;2590:127-138. doi: 10.1007/978-1-0716-2819-5_8
- 25. McLaren W, Gil L, Hunt SE, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17(1):122. doi: 10.1186/s13059-016-0974-4
- 26. Dobin A, Davis CA, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15-21. doi: 10.1093/bioinformatics/bts635
- 27. Whiffin N, Minikel E, Walsh R, et al. Using high-resolution variant frequencies to empower clinical genome interpretation. Genet Med. 2017;19(10):1151-1158. doi: 10.1038/gim.2017.26
- 28. Savage AL, Schumann GG, Breen G, Bubb VJ, Al-Chalabi A, Quinn JP. Retrotransposons in the development and progression of amyotrophic lateral sclerosis. J Neurol Neurosurg Psychiatry. 2019;90(3):284-293. doi: 10.1136/jnnp-2018-319210
- 29. Gardner EJ, Lam VK, Harris DN, et al. The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology. Genome Res. 2017;27(11):1916-1929. doi: 10.1101/gr.218032.116
- 30. Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405-424. doi: 10.1038/gim.2015.30
- 31. Chu C, Borges-Monroy R, Viswanadham VV, et al. Comprehensive identification of transposable element insertions using multiple sequencing technologies. Nat Commun. 2021;12(1):3836. doi: 10.1038/s41467-021-24041-8
- 32. Zeng BJ, Wang ZH, Torres PA, et al. Rapid detection of three large novel deletions of the aspartoacylase gene in non-Jewish patients with Canavan disease. Mol Genet Metab. 2006;89(1-2):156-163. doi: 10.1016/j.ymgme.2006.05.014
- 33. Bley A, Denecke J, Kohlschütter A, et al. The natural history of Canavan disease: 23 new cases and comparison with patients from literature. Orphanet J Rare Dis. 2021;16(1):227. doi: 10.1186/s13023-020-01659-3
- 34. Sistermans EA, de Coo RF, van Beerendonk HM, Poll-The BT, Kleijer WJ, van Oost BA. Mutation detection in the aspartoacylase gene in 17 patients with Canavan disease: four new mutations in the non-Jewish population. Eur J Hum Genet. 2000;8(7):557-560. doi: 10.1038/sj.ejhg.5200477
Data Availability Statement
Additional data that support the findings of this study are available. Deidentified genomic and associated data from this study are available on request from the corresponding author through formal data sharing agreements. All variants reported in this study have been deposited in ClinVar (SCV005407762).
The web resources used are as follows:
• ClinVar: ncbi.nlm.nih.gov/clinvar.
• gnomAD: gnomad.broadinstitute.org.
• IGV: software.broadinstitute.org/software/igv.
• NCBI RefSeq: ncbi.nlm.nih.gov/refseq.
• OMIM: omim.org.
• Frequency Filter: cardiodb.org/allelefrequencyapp/.
• UCSC: genome.ucsc.edu/.



