BMC Medical Genomics. 2025 Aug 20;18:131. doi: 10.1186/s12920-025-02204-6

Long read whole genome sequencing-based discovery of structural variants and their role in aetiology of non-syndromic autism spectrum disorder in India

Jhanvi Shah 1, Debasrija Mondal 2, Deepika Jain 3, Priti Mhatre 4, Ketan Patel 5, Anand Iyer 6, Manoj Pandya 7, Bhargavi Menghani 8, Gayatri Dave 9, Jayesh Sheth 1, Frenny Sheth 1, Shweta Ramdas 2,✉,#, Harsh Sheth 1,✉,#
PMCID: PMC12366170  PMID: 40835948

Abstract

Background

Despite heritability estimates of 80%, ~50% of autism spectrum disorder (ASD) cases remain without a genetic diagnosis. Structural variants (SVs) detected using long-read whole genome sequencing (lrWGS) are a relatively new class of variants implicated in neurodevelopmental disorders. Short read sequencing (SRS) and chromosomal microarray (CMA) are unable to resolve these SVs due to their inherent technological limitations. This study aimed to use lrWGS to detect SVs and delineate their role in children with non-syndromic ASD in whom prior traditional genetic tests did not yield a definitive genetic diagnosis.

Methods

A total of 23 patients with no prior genetic diagnosis from karyotyping, Fragile-X analysis, CMA and short read whole exome sequencing (srWES) were selected for lrWGS using an Oxford Nanopore based sequencing platform. Samples were sequenced at an average coverage of ~7x. Reads generated from high accuracy base calling were aligned against the GRCh38/hg38 human reference genome build. SVs were called using five variant callers (Sniffles2, cuteSV, NanoVar, SVIM, and npInv) and annotated using AnnotSV. Calls from cuteSV were used as the benchmark to identify concordant calls across at least three variant callers.

Results

An average whole genome coverage of ~7x and an N50 read length of 6.65 ± 3.3 kb were obtained across 46 runs (two runs per sample). On average, approximately 235,163 calls were made across all callers for each sample. The average numbers of deletions, duplications, insertions, inversions and translocations detected across all callers per sample were 54,787, 3,335, 62,459, 1,286, and 113,296, respectively. In one of the 23 cases, a candidate SV, an inversion of approximately 2.7 Mb in size encompassing the SNAP25-AS1 gene, was observed. This gene is likely to be involved in the synaptic pathway and has previously been associated with autism.

Conclusion

This is the first study from India to assess the role of SVs in the aetiology of non-syndromic ASDs. Despite the small sample size, low-pass genome coverage, and modest N50 read length, the study indicates a modest contribution of SVs to the aetiology of non-syndromic ASD. The dearth of data supporting the role of SVs in non-syndromic ASDs in other cohorts from around the world further supports our conclusion.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12920-025-02204-6.

Keywords: Autism spectrum disorder, Long read whole genome sequencing, Structural variants, Oxford nanopore technology, Copy number variants, Inversion

Background

Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders (NDD) characterized by impaired social communication/interaction along with repetitive behaviour and/or restrictive interests [1, 2]. Affecting more than 1% of individuals worldwide, ASD has an estimated heritability of approximately 80% [3–5]. However, in approximately 50% of the cases, a genetic diagnosis is not obtained [6]. This is primarily due to the disorder’s complex and heterogeneous genetic underpinnings, which can involve variant(s) in a single gene, the cumulative effect of variants across multiple genes, or interactions between genetic variants and environmental factors [7, 8]. In our previous study, which aimed to explore the genetic architecture of ASD using current clinically used technologies such as chromosomal microarray (CMA) and exome sequencing (ES), we achieved a combined diagnostic yield of 30%, with over 70% of the variants being de novo in origin [6]. Additionally, rare inherited variants, often classified as variants of uncertain significance, contributed to another ~20% of cases [6]. Similar diagnostic yields and genetic landscapes have been observed across multiple cohorts from different population groups using these technologies [9–12]. CMA and ES are widely used in clinical practice, primarily to detect single nucleotide variants (SNVs), small insertion-deletions (indels), and copy number variants (CNVs). In 20–25% of non-syndromic ASD cases, an SNV is identified, while CNVs are found in less than 5% of cases [6, 13]. However, beyond SNVs, indels, and CNVs, copy-neutral structural variants (SVs) represent an important class of genetic variants that make up the genetic landscape. SVs are genomic rearrangements that either alter the copy number or shift the genomic location of sequences of at least 50 nucleotides in size. These variants include CNVs (deletions and duplications), insertions, inversions, translocations, complex rearrangements, mobile element insertions, and expansions of repetitive sequences [14, 15]. However, due to inherent technical limitations, CMA and ES often fail to detect these SVs, particularly balanced rearrangements such as inversions and translocations, as well as cryptic gains or losses in the non-coding regions.

The advent of long-read sequencing (LRS) platforms, such as those from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has made it possible to detect SVs that were previously difficult to identify and characterise [16, 17]. Long reads, spanning thousands of bases, enable detection of SVs in both coding and non-coding regions. These reads can capture SVs that are too large for short-read sequencing (SRS) (> 50 bp) and too small for cytogenomic arrays (< 200–400 kb). The application of long read whole genome sequencing (lrWGS) in rare disease cohorts has led to improvements in genetic testing and diagnostic outcomes [18, 19]. For example, the first successful application of lrWGS identified a pathogenic 2.2 kb deletion in the PRKAR1A gene in a patient diagnosed with Carney complex at 4x sequencing coverage [18]. Additionally, population-based screenings using LRS have characterized SVs in large cohorts, linking them to multiple diseases and traits. In the Icelandic population, a multiallelic SV in the ACAN gene was found to be linked to height [20], while deletions in the HBA1 and HBA2 genes, which are rare and often population-specific, were identified in anemic individuals from China [21]. More recently, a European founder origin of an intragenic MSH2 gene inversion was confirmed, and individuals with complex SVs involving MECP2 mutational hotspots were reported [19]. While some studies have retrospectively used targeted LRS in cases with known candidate CNVs [22], relatively few have prospectively employed LRS for genome wide variant detection and diagnosis. For example, lrWGS identified a 5.9 Mb inversion disrupting the DMD gene in a familial case of Duchenne muscular dystrophy, which was previously undetected by SRS [23]. Furthermore, in a cohort of 34 cases suspected of autosomal recessive genetic disorders, lrWGS resolved 13 cases, nearly half of which harbored SVs [24].

Beyond Mendelian diseases, SVs have been implicated in the aetiology of NDDs, such as autism, through the use of lrWGS technologies [25]. For instance, PacBio HiFi technology was recently employed to study 96 individuals with idiopathic NDD. This approach uncovered genetic causes in seven cases, with SVs identified in four (4.17%) of them [26]. Studies also suggest that SVs affecting non-coding genomic regions, particularly those altering the copy number or position of regulatory elements, play a role in the aetiology of autism [27]. While lrWGS is increasingly being used in the research domain to detect SVs in patients with ASD, particularly in those with negative molecular diagnoses, there is a paucity of data on the distribution of these variants. Therefore, we aimed to assess the distribution of SVs and their role in the aetiology of non-syndromic ASD in probands using Oxford Nanopore based lrWGS.

Methods

Selection of patients

A prior study to decipher the genetic landscape of non-syndromic ASD in India was conducted in 101 probands of Indian origin by evaluating them using karyotyping, Fragile-X testing, CMA and ES. These probands, aged 2 years 6 months to 15 years, were diagnosed with ASD as per the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5) criteria [2]. Children with syndromic features, isolated language delay or isolated sensory processing disorders were not recruited. In nearly 50% (n = 49) of cases, no causative variant was identified [6]. Of these, parents of 23 probands gave consent for further genetic analysis using lrWGS in order to detect SVs. The parents or guardians of all probands provided written informed consent as per the Helsinki Declaration, and the study was approved by the research ethics committee at the Foundation for Research in Genetics and Endocrinology, Ahmedabad (ID: FRIGE/IEC/19/2020). The family history and clinical phenotypes of the patients are provided in Supplementary Figure S1 and Table S1, respectively. Of note, two consultands had twin siblings who were also clinically diagnosed with autism. However, the co-twin siblings were excluded from lrWGS analysis, as it was postulated a priori that any variant detected in the consultand could be tested for in the sibling using orthogonal methods.

Sample requirements

High molecular weight genomic DNA (gDNA) was extracted from peripheral blood samples using the salting-out method [28]. Genomic DNA quality was assessed using QIAxpert (Qiagen, Hilden, Germany), which measured A260/280 (ideal value: 1.8-2.0) and A260/230 (ideal value: 2.0-2.2), and dsDNA was quantified using a Qubit fluorometer (Invitrogen, Massachusetts, USA). A total of 1 µg gDNA from each of the 23 selected cases was used per sequencing run.

Long read whole genome sequencing (lrWGS)

lrWGS was carried out on the Oxford Nanopore’s MinION Mk1C platform (Oxford Nanopore, Oxford, UK) using compatible flow-cells (R9.4.1 and R10.4) (Oxford Nanopore, Oxford, UK). Two sequencing runs using two flow-cells for each patient were performed to obtain a combined theoretical genome coverage of ~ 10x (Supplementary Table S2). For library preparation, a compatible amplification-free ligation sequencing kit (SQK-LSK112 and SQK-LSK114) (Oxford Nanopore, Oxford, UK) was used that allowed long reads to be sequenced in their native form, thereby eliminating PCR bias. The entire process was followed according to the manufacturer’s recommendations.

Detection and annotation of SVs

The raw data in the form of FAST5 or POD5 files generated at the end of the two sequencing runs for a given sample were used as input for downstream processing. The raw signals were converted to base calls in high accuracy mode using Dorado base caller v0.3.4 (https://github.com/nanoporetech/dorado), generating long reads in FASTQ format. FASTQ files from both sequencing runs were merged to form a single combined FASTQ file for each sample, which was subsequently subjected to a quality check using FastQC [29]. Additionally, pycoQC [30] was used to assess the sequencing text file generated by the sequencer to create an overview of the run output. The reads were aligned against the GRCh38/hg38 human reference genome build using minimap2 v0.2-r123 [31] to generate BAM and BAI files. BAM files were analysed by Qualimap2 [32] to assess mapping quality and coverage of the genome. Structural variants (SVs) were called using five variant callers, Sniffles2 [33], cuteSV [34], SVIM [35], NanoVar [36], and npInv [37], to generate individual variant call format (VCF) files. The first four variant callers call all types of SVs (deletions, duplications, inversions, insertions and translocations), whereas npInv is specifically designed to call inversions. Annotation of these variants was carried out using the AnnotSV v3.4 programme [38], which also ranked and interpreted SV pathogenicity using compiled functional, regulatory and clinical data. Phenotype data of each patient were coded in Human Phenotype Ontology (HPO) [39] terms and used as an input to AnnotSV. AnnotSV leverages Exomiser [40, 41] to assess gene-disease relationships based on the SV and the given HPO terms, and ranks each SV into one of the five ACMG classes. The annotated files were then merged and sorted based on SV type into five categories, namely, deletions (del), duplications (dup), insertions (ins), inversions (inv) and translocations (tra).
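The final step above, sorting annotated calls into five SV-type categories, can be illustrated with a minimal Python sketch. It assumes each call is a standard VCF data line carrying an `SVTYPE` INFO key and that translocations may be reported as breakend (`BND`) records; the function names, the category mapping and the example records are hypothetical and are not taken from the study's pipeline.

```python
from collections import defaultdict

# Assumed mapping from VCF SVTYPE values to the five categories named in the text.
# Treating breakend (BND) records as translocations is an assumption of this sketch.
CATEGORY = {"DEL": "del", "DUP": "dup", "INS": "ins", "INV": "inv", "BND": "tra", "TRA": "tra"}


def parse_info(info_field):
    """Turn a VCF INFO string such as 'SVTYPE=DEL;END=200' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flags such as PRECISE carry no value
    return info


def sort_by_sv_type(vcf_lines):
    """Group VCF data lines by SV category; header lines and unmapped types are skipped."""
    groups = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])  # INFO is the eighth VCF column
        # Keep only the main type, so that a value like 'DUP:TANDEM' maps to 'dup'.
        svtype = str(info.get("SVTYPE", "")).split(":")[0]
        category = CATEGORY.get(svtype)
        if category is not None:
            groups[category].append(line)
    return dict(groups)


# Hypothetical records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "chr1\t1000\tsv1\tN\t<DEL>\t30\tPASS\tSVTYPE=DEL;END=1300;SVLEN=-300",
    "chr2\t5000\tsv2\tN\t<INV>\t25\tPASS\tPRECISE;SVTYPE=INV;END=9000",
    "chr3\t7000\tsv3\tN\t<DUP:TANDEM>\t40\tPASS\tSVTYPE=DUP:TANDEM;END=7600",
]
groups = sort_by_sv_type(example)
```

Grouping on the INFO field rather than on the ALT column keeps the sketch independent of how each caller writes symbolic alleles.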

Filtering strategy of SVs

Filtration of called and annotated SVs involved: (a) the quality of the calls, with a QUAL score of > 20 for SVs called by cuteSV, Sniffles2, npInv and SVIM and a QUAL score of > 1 for variants called by NanoVar; (b) the presence of the SV in the general population using the gnomAD [42, 43] and DGV [44] databases, with a minor allele frequency (MAF) cutoff of 1%; (c) the number of genes overlapping the SV, requiring at least one gene, either protein coding or non-coding; and (d) the effect of the SV on gene regulation using the ENCODE database [45] for variants disrupting the non-coding regions of the genome (Fig. 1).
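Criteria (a) to (c) can be expressed as a simple predicate. The sketch below is illustrative only: the `AnnotatedSV` record and its field names are hypothetical, and it assumes that the 1% cutoff means variants with a population frequency of 1% or more are removed. Criterion (d) depends on a regulatory annotation lookup and is left out.

```python
from dataclasses import dataclass

# QUAL thresholds quoted in the text: > 20 for cuteSV, Sniffles2, npInv and SVIM; > 1 for NanoVar.
MIN_QUAL = {"cuteSV": 20, "Sniffles2": 20, "npInv": 20, "SVIM": 20, "NanoVar": 1}
MAX_POPULATION_AF = 0.01  # 1% cutoff; treating it as an exclusive upper bound is an assumption


@dataclass
class AnnotatedSV:
    """Hypothetical record for one annotated call; field names are assumptions."""
    caller: str
    qual: float
    population_af: float  # highest frequency seen in gnomAD or DGV, 0.0 if absent
    genes: tuple          # overlapping gene symbols, protein coding or non-coding


def passes_filters(sv):
    """Apply criteria (a) to (c) from the text to a single call."""
    if sv.qual <= MIN_QUAL[sv.caller]:          # (a) caller-specific quality threshold
        return False
    if sv.population_af >= MAX_POPULATION_AF:   # (b) too common in the general population
        return False
    return len(sv.genes) >= 1                   # (c) at least one overlapping gene
```

For example, `passes_filters(AnnotatedSV("NanoVar", 2.0, 0.0, ("SNAP25-AS1",)))` evaluates to `True`, whereas the same call with an empty `genes` tuple is rejected.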

Fig. 1

Schematic representation of the analysis pathway used for SV discovery, filtration, and prioritisation

Post filtration, SVs were manually curated to obtain calls concordant across three or four variant callers, which were considered for prioritization. Previous SV call benchmarking across a range of sequence coverage has shown cuteSV to have a robust SV call performance compared to other SV callers [46]. Therefore, variants called by cuteSV were used as “benchmarks” to carry out concordant variant assessment, due to its high recall [46] for detecting SVs at low coverage while using minimap2 as an aligner [47]. The only exception was the calling of inversions, where npInv [37] was preferred due to its specificity for detecting inversions. Calls were deemed concordant when their genomic coordinates differed by no more than 100 bp. For downstream analysis and candidate variant discovery, concordant calls across three variant callers with cuteSV as the benchmark caller were included (Supplementary Tables S3 and S4).
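Although concordance was assessed manually in this study, the rule can be sketched programmatically. The example below is a hedged illustration: the `Call` record is hypothetical, and it assumes that "coordinates differed by no more than 100 bp" applies to both the start and the end position and that matching calls must share SV type and chromosome.

```python
from dataclasses import dataclass

MAX_COORD_DIFF = 100  # bp, from the text
MIN_CALLERS = 3       # benchmark caller plus at least two others


@dataclass(frozen=True)
class Call:
    """Hypothetical minimal representation of one SV call."""
    caller: str
    chrom: str
    start: int
    end: int
    svtype: str


def is_match(a, b, tol=MAX_COORD_DIFF):
    """Two calls match if type and chromosome agree and both coordinates lie within tol bp."""
    return (a.svtype == b.svtype and a.chrom == b.chrom
            and abs(a.start - b.start) <= tol and abs(a.end - b.end) <= tol)


def concordant_calls(calls, benchmark="cuteSV", min_callers=MIN_CALLERS):
    """Return benchmark calls supported by at least min_callers distinct callers in total."""
    kept = []
    for ref in (c for c in calls if c.caller == benchmark):
        supporters = {c.caller for c in calls if c.caller != benchmark and is_match(ref, c)}
        if 1 + len(supporters) >= min_callers:
            kept.append(ref)
    return kept


# Hypothetical calls: the deletion is seen by three callers, the insertion by two.
calls = [
    Call("cuteSV", "chr18", 27044290, 27044572, "DEL"),
    Call("Sniffles2", "chr18", 27044295, 27044570, "DEL"),
    Call("SVIM", "chr18", 27044250, 27044600, "DEL"),
    Call("cuteSV", "chr3", 46227604, 46227605, "INS"),
    Call("NanoVar", "chr3", 46227610, 46227611, "INS"),
]
supported = concordant_calls(calls)
```

For inversions, the same function could be called with `benchmark="npInv"`, mirroring the exception described in the text.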

Prioritisation and identification of candidate variants

Prioritisation of the putative SVs was carried out manually using the following strategy: (a) the variant was checked across all samples using IGV v2.17.4 [48], and was discarded if present in three or more cases, or if present in a repetitive region (specifically for insertions and deletions) or a region with poor mapping quality; (b) variants that passed the above criteria but fell in poorly covered regions as observed in the DECIPHER database (e.g. genomic regions with high GC or repetitive content leading to reduction in alignment quality and coverage) [49] were filtered out; and (c) variants that passed criterion (b) were checked for encompassing coding regions, non-coding and/or regulatory elements using the UCSC genome browser [50] and ENCODE database [45]. In addition, variant pathogenicity (specifically for dels and dups) was checked based on the ACMG-AMP guidelines [51] and ClinGen [52, 53] framework. Lastly, additional evidence to support the functional impact of any prioritized variant on a coding region was obtained from the following: (a) the Human Protein Atlas (HPA) [54], which provided the tissue-specific expression profile of a gene. Only SVs encompassing genes that had enriched expression in the brain were retained for further analysis. (b) The MGI database [55, 56], to assess phenotypes in murine models based on spontaneous/induced genetically engineered mutations. SVs encompassing genes for which murine models had shown a neurological phenotype were retained for further evaluation. (c) The SFARI database [57], which consists of a catalogue of genes associated with or implicated in autism. (d) The STRING database [58, 59], which was used to prioritise genes known to interact with known autism-causing genes through protein-protein interactions. And lastly, (e) the GeneCards [60–62] and ENCODE [45] databases, which were queried to assess the impact of SVs encompassing regulatory elements or non-coding genes.
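The recurrence rule in step (a), discarding a variant seen in three or more cases, was applied by visual inspection in IGV. A minimal sketch of the same rule is given below; it assumes, for simplicity, that the same SV carries an identical identifier in every case, which real calls with slightly shifting breakpoints would not guarantee. The function name and the example identifiers are hypothetical.

```python
from collections import defaultdict

MAX_CASES = 2  # variants seen in three or more cases are discarded


def drop_recurrent(calls_by_case, max_cases=MAX_CASES):
    """calls_by_case maps a case ID to a set of SV identifiers; returns the filtered mapping."""
    cases_per_sv = defaultdict(set)
    for case_id, svs in calls_by_case.items():
        for sv in svs:
            cases_per_sv[sv].add(case_id)
    return {
        case_id: {sv for sv in svs if len(cases_per_sv[sv]) <= max_cases}
        for case_id, svs in calls_by_case.items()
    }


# Hypothetical identifiers: "sv_common" occurs in three cases and is removed everywhere.
cohort = {
    "case_A": {"sv_common", "sv_private_A"},
    "case_B": {"sv_common", "sv_shared"},
    "case_C": {"sv_common", "sv_shared"},
}
filtered = drop_recurrent(cohort)
```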

Validation of candidate variants and delineation of inheritance pattern

Putative candidate variant(s) were validated in the proband and the inheritance pattern was delineated by performing parental segregation analysis using orthogonal approaches. The choice of orthogonal method depended on the type of SV identified. Thereafter, the variant was re-evaluated for pathogenicity and was classified as unlikely to be a causative variant, a causative variant or a potential risk factor.

Results

Data generation

A total of 46 low pass lrWGS runs were carried out for 23 cases (two runs per case). The details of all the sequencing runs are provided in Supplementary Table S2. The average whole genome coverage across all samples was ~7x. Data processing and quality assessment followed by primary analysis of the FASTQ files of all samples showed that, on average, 3,644,614 ± 2,078,615 reads, an N50 read length of 6.66 ± 3.32 kb and an output of 11.56 ± 3.7 Gb per flow-cell per sample were obtained. On aligning the reads against the GRCh38/hg38 human reference genome build, all samples had a read mapping rate of > 95%.
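The two summary metrics reported here follow from simple definitions, sketched below. N50 is the read length at which reads of that length or longer contain at least half of all sequenced bases, and mean coverage is sequenced bases divided by genome size. The genome size of 3.1 Gb is an assumed round figure, and the estimate ignores unmapped or filtered reads, so it is an upper bound on the mapped coverage rather than the study's own calculation.

```python
def n50(read_lengths):
    """Length L such that reads of length >= L contain at least half of all sequenced bases."""
    total = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("read_lengths must not be empty")


def mean_coverage(total_bases, genome_size=3.1e9):
    """Average depth = sequenced bases / genome size (3.1 Gb is an assumed round figure)."""
    return total_bases / genome_size


# Two flow cells per sample at the reported mean output of 11.56 Gb each.
theoretical = mean_coverage(2 * 11.56e9)
```

With these inputs `theoretical` comes to roughly 7.5, which is in line with the ~7x average coverage reported once unmapped reads are taken into account.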

Outcome of variant calling and annotation

Whilst several groups have developed strategies for calling SVs from lrWGS data, there is currently no standardised approach to SV calling in lrWGS datasets, and different pipelines perform well for different datasets. To ensure maximum recall of true positive SV calls, five callers were used for each sample. Supplementary Table S3 describes the number of SV calls stratified by SV type and SV caller for each sample (includes autosomes only). Across all callers, an average of approximately 235,163 calls were made per sample. Individually, the average calls per sample were 1,440 for cuteSV, 21,046 for Sniffles2, 183,759 for SVIM, 18,764 for NanoVar, and 20 for npInv. In terms of SV type, the average number of calls per sample from all callers were 54,787 for deletion, 3,335 for duplication, 62,459 for insertion, 1,286 for inversion, and 113,296 for translocation. The highest number of SV calls was made by SVIM, while cuteSV made the fewest among the four callers that report all SV types. Translocations were the most frequently called SV type, followed by insertions and deletions, with inversions showing the lowest number of calls per sample. To minimise false positive and false negative SV calls, SVs that were called by at least three SV callers were prioritized. Figure 1 and Supplementary Table S4 show the steps and filters applied during variant prioritization for each SV type.
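As a quick arithmetic check, the per-type averages quoted above add up to the overall per-sample average. The short sketch below uses only the figures from the text; the variable names are illustrative.

```python
# Average calls per sample by SV type, as quoted in the text.
mean_calls_by_type = {
    "deletion": 54_787,
    "duplication": 3_335,
    "insertion": 62_459,
    "inversion": 1_286,
    "translocation": 113_296,
}

total = sum(mean_calls_by_type.values())  # 235,163, matching the reported overall average

# Percentage share of each SV type among all calls.
shares = {svtype: 100 * n / total for svtype, n in mean_calls_by_type.items()}
most_common = max(mean_calls_by_type, key=mean_calls_by_type.get)
```

Translocation calls account for a little under half of all calls, which illustrates why concordance across callers is needed before any call is interpreted.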

Outcomes of filtration and prioritisation of the concordant variant set

Supplementary Table S5 describes the distribution of concordant calls made across each SV type for all samples. Concordant calls were manually analysed using SVs called by cuteSV as the “truth set” [46] for deletion, duplication, insertion and translocation calls, whereas the npInv caller was used as the basis for inversion calls. The case-wise prioritised variant list is provided in Table 1. Prioritisation of these variants was carried out as explained in the Methods, with all the listed variants passing criterion (c) (see Methods section and Fig. 1). However, based on the available literature evidence, consisting of disease associations and functional impact, only the inversion variant was considered a potential candidate and taken forward for further investigation. The remaining variants were not analysed further due to lack of supporting information.

Table 1.

Prioritised variant list

| Case ID | SV type | Product size (bp) | Zygosity | SV |
|---|---|---|---|---|
| ASD-090 | Inversion | 2,662,540 | Heterozygous | 20_10183741_12846281_INV_1 |
| ASD-015 | Insertion | 90 | ? | 10_7868326_7868348_INS_1 |
| ASD-100 | Deletion | 282 | Homozygous | 18_27044290_27044572_DEL_1 |
| ASD-068 | Insertion | 248 | ? | 3_46227604_46227605_INS_1 |

SV: structural variant; bp: base pair
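The SV identifiers in Table 1 appear to follow a chromosome_start_end_type_index pattern, which is an inference from the table rather than a documented format. Under that assumption, the sketch below parses an identifier and shows that the reported size of the inversion and the deletion equals end minus start. For insertions the reference interval is short and does not reflect the length of the inserted sequence, so the same subtraction does not apply.

```python
def parse_sv_id(sv_id):
    """Split an identifier such as '20_10183741_12846281_INV_1' into its assumed parts."""
    chrom, start, end, svtype, index = sv_id.split("_")
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "svtype": svtype,
        "index": int(index),
    }


def reference_span(sv_id):
    """Length of the reference interval covered by the SV (end minus start)."""
    sv = parse_sv_id(sv_id)
    return sv["end"] - sv["start"]


inversion_span = reference_span("20_10183741_12846281_INV_1")  # 2,662,540 bp
deletion_span = reference_span("18_27044290_27044572_DEL_1")   # 282 bp
```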

Candidate variant(s) discovery

A putative candidate SV associated with the ASD phenotype was identified in case ASD-090 (Fig. 2). This SV is an inversion of approximately 2.7 Mb in size, detected at a depth of 10x. It is located on chromosome 20, with breakpoint 1 (BP1) disrupting the SNAP25-AS1 gene at the locus 20p12.2 and breakpoint 2 (BP2) in an intergenic region at the locus 20p12.1. The SNAP25-AS1 gene was disrupted between its second and third exons. This is likely to affect its transcription, resulting in aberrant synthesis of the SNAP25-AS1 long non-coding RNA (lncRNA). Although the SNAP25-AS1 gene transcribes a lncRNA, its specific function remains unknown. The BP1-induced disruption of the SNAP25-AS1 sequence likely generates an aberrant RNA that interferes with SNAP25 gene function, due to the tail-on genomic orientation of the two genes, implying a cis-regulatory effect.

Fig. 2

IGV v2.17.4 view showing split reads at both breakpoints of the inversion (left: breakpoint 1; right: breakpoint 2). The blue and red reads indicate forward/positive and reverse/negative read orientation, respectively

The following rationale provided supporting evidence for this variant being a potential aetiology of autism in the proband: (a) no inversions that disrupt the sequence of these genes have been observed in the gnomAD SVs v4.1.0 database [42]; (b) the SNAP25 gene has a pLOEUF score of 0.23, suggesting intolerance towards loss of function variants; (c) the SNAP25 and SNAP25-AS1 genes are predominantly and specifically expressed in the adult brain (dbGaP Accession phs000424.v10.p2 on 01/28/2025) with a primary role in synaptic signal transduction of excitatory and inhibitory neurons; (d) this gene is homologous to the mouse Snap25 gene with preferential expression in the hippocampus [63] and mice with a modified Snap25 gene present with various behavioural and neurological symptoms (https://www.informatics.jax.org/marker/MGI:98331); (e) the human SNAP25 protein also interacts with multiple other synaptic proteins associated with autism [64]; (f) SNVs in the SNAP25 gene have previously been associated with autism in mice as well as humans [65] and are also reported in the AutDB [66] and SFARI [57] databases; (g) the SNAP25-AS1 gene, which transcribes an antisense RNA, has been observed to be upregulated and associated with the synaptic vesicle cycling pathway in patients with ASD [67, 68]; (h) both genes have been previously associated with autism, with the SNAP25 gene playing a role in the synaptic vesicle membrane docking and fusion pathway [63] and the SNAP25-AS1 gene regulating the synaptic vesicle cycling pathway [68]; and (i) studies conducted on large ASD cohorts have repeatedly shown the majority of the identified genes to encode proteins involved in synaptic formation, transcriptional regulation/ubiquitination and chromatin remodelling [69, 70]. Even with a genetically heterogeneous aetiology, they converge on a common pathway that intersects at the synapse [71]. Considering that the SNAP25 and SNAP25-AS1 genes are involved in the synaptic pathway, the inversion identified in the proband is likely to increase susceptibility to autism, if not be the sole cause of it. Of note, a recent extensive analysis of whole genome sequencing data from 33,924 families with rare disease suggests a notable role of inversions in rare disease aetiology [19].

Validation of candidate SVs by orthogonal methods

To validate the inversion detected by lrWGS, two primer pairs (Supplementary Table S6) were designed flanking the BP1 and BP2 loci (Fig. 3A) to carry out break-point PCR (Fig. 3B). The variant was validated in the affected proband, and segregation analysis was performed in his affected twin sibling, an elder unaffected brother, and his healthy parents. The variant was observed in both twins affected with autism and their mother, but was absent in the unaffected elder brother and father (Fig. 3B). Moreover, Sanger sequencing was also performed on the PCR products to confirm and visualise the difference in the wildtype and inverted sequences (Fig. 3C). Based on the segregation analysis and the aforementioned biological evidence, the variant was classified as a variant of uncertain significance in accordance with the ACMG-AMP classification system.

Fig. 3

A Figure depicting orientation of primer pairs on wildtype (Ref) and inverted (Alt) alleles flanking both breakpoints (BP1: breakpoint 1; BP2: breakpoint 2; FP: forward primer and RP: reverse primer). B Validation and parental segregation of the identified inversion by end-point PCR in the family members: Primer pair 1: BP1 FP + BP1 RP; Primer pair 2: BP2 FP + BP2 RP, Primer pair 3: BP1 FP + BP2 FP, and Primer pair 4: BP1 RP + BP2 RP; Primer pairs 1 and 2 and pairs 3 and 4 will amplify in the presence of the wildtype allele (Ref) and variant allele (Alt), respectively. Based on the gel image, the twins and their mother show amplification with all four primer pairs, consistent with heterozygosity or carrier status. In contrast, the father and the elder brother show amplification only with primer pairs 1 and 2, indicating they are homozygous for the wildtype allele. C Sanger chromatograms of wildtype and inverted sequences flanking both breakpoints in heterozygotes (both twins and mother)
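The interpretation of the four-primer-pair assay described in the Fig. 3 legend can be written as a small decision rule. The sketch below is illustrative: the function name and the "inconclusive" outcome for inconsistent patterns are assumptions, while the mapping of primer pairs 1 and 2 to the wildtype allele and pairs 3 and 4 to the inverted allele follows the legend.

```python
def infer_genotype(pair1, pair2, pair3, pair4):
    """Infer inversion genotype from presence (True) or absence (False) of each PCR product.

    Pairs 1 and 2 amplify only the wildtype (Ref) allele; pairs 3 and 4 only the inverted (Alt) allele.
    """
    # Both breakpoint assays for an allele should agree; otherwise the result is not interpretable.
    if pair1 != pair2 or pair3 != pair4:
        return "inconclusive"
    ref_present, alt_present = pair1, pair3
    if ref_present and alt_present:
        return "heterozygous"
    if ref_present:
        return "homozygous wildtype"
    if alt_present:
        return "homozygous inversion"
    return "inconclusive"


# Amplification patterns as described in the legend.
family = {
    "proband": (True, True, True, True),
    "affected twin": (True, True, True, True),
    "mother": (True, True, True, True),
    "father": (True, True, False, False),
    "elder brother": (True, True, False, False),
}
genotypes = {member: infer_genotype(*bands) for member, bands in family.items()}
```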

Discussion

ASD has been attributed to high genetic heterogeneity in addition to its multifactorial inheritance and an estimated 3:1 male to female sex bias [72]. Genetic variants observed in prior ASD cohorts range from chromosomal aberrations and CNVs to variations at the gene level that include SNVs, indels, and small tandem duplications. SVs are another class of genetic variants that have recently been implicated in autism [68]. These include balanced copy-neutral rearrangements or any other complex DNA rearrangements apart from CNVs across the genome that involve at least 50 nucleotides [9, 10]. They include rearrangements that affect the non-coding regions of the genome, such as microRNAs (miRNA) and lncRNAs, which until now were neglected as they were considered translationally silent [11, 12]. Thousands of such rare SVs that were previously undetected by karyotyping, microarray and exome sequencing have been observed using lrWGS [13, 14]. SVs, mainly deletions and duplications, have commonly been reported in the literature due to their ease of detection compared to inversions, translocations and other complex SVs [15]. At present, lrWGS is widely used in the research domain in patients diagnosed with ASD in whom previous testing of protein-coding regions did not yield a molecular diagnosis [16]. Additionally, there is limited data on the proportion of ASD cases that harbour these complex SVs.

Autism is considered to be a complex multifactorial disorder with genetic heterogeneity. Multiple loci have been identified, some or all of which together may contribute to the phenotype [73]. Despite decades of research and an array of tests, a diagnostic yield of only approximately 30% is observed in the clinic [6]. The majority of these variants are in the coding region and follow a de novo mode of inheritance. Compared to SNVs, however, SVs are more likely to disrupt or rearrange functional elements in the genome due to their size and location and hence are more likely to affect gene regulation [27, 74, 75]. Disruption of regulatory elements may cause or add to the risk of various complex diseases including autism. Advancement in technology has resulted in the development of LRS and its application to identify SVs in cases where prior genetic tests remain inconclusive [76].

SRS based exome sequencing has become a primary diagnostic tool in clinical practice, boosting diagnostic yields and contributing to the discovery of new disease-related genes over the past decade. This reflects the primary hypothesis that the majority of disease-causing variants are likely to be present within the coding region of the genome. However, recent exploration of the non-coding region of the genome using short read whole genome sequencing (srWGS) has facilitated the identification of causative SNVs within deep intronic regions, promoters, untranslated regions (UTRs), and other non-coding genomic loci. For instance, srWGS recently uncovered a recurrent de novo SNV in the RNU4-2 gene, which encodes a small nuclear RNA, in approximately 0.4% of NDD cases, suggesting a significant role for non-coding RNAs in these conditions. This finding shifted the focus from solely protein-coding genes and emphasized the need to explore non-coding regions for diagnosing NDDs [77]. Nonetheless, a definitive diagnosis can only be achieved in up to 70% of cases, depending on the genetic complexity of the disorder [78, 79]. This highlights that interpretation challenges continue, and increasing coverage with the same SRS technology may not fully address the detection of missing variants. These limitations emphasize the difficulties SRS encounters, especially in interpreting complex genomic regions [24].

In the present study, we report the utilisation of Oxford Nanopore based lrWGS for delineation of the role of SVs in non-syndromic ASD. Of the 23 cases analysed, we observed a putative inversion variant involving the SNAP25-AS1 gene in one case with non-syndromic ASD, which was previously undetected by CMA and srWES. The variant being small in size and copy neutral makes it undetectable by karyotyping and CMA, thereby suggesting the need for lrWGS. In congruence with this, a prior study using lrWGS detected a homozygous deletion on chromosome 19 in triplets with autism. This variant disrupted the ZNF826P gene that overlapped with a 1.4 kb lncRNA and was reported as a potential cause of autism in them [76]. The BP1 of the inversion in the present case disrupts the SNAP25-AS1 gene that transcribes a lncRNA as well as the 5’ upstream region of the SNAP25 gene located downstream to it. This disruption may affect the transcription of the SNAP25-AS1 lncRNA or affect a regulatory element that governs the SNAP25 gene. However, assessment of the variant’s functional impact was beyond the scope of the current study. The SNAP25 gene encodes synaptosomal-associated protein 25, a presynaptic plasma membrane protein that plays an important role in the synaptic vesicle membrane docking and fusion pathway required for delivery of the cargo. SNVs in the SNAP25 gene have previously been associated with autism [65, 67, 80, 81]. On the other hand, SNAP25-AS1 is a four-exon gene that transcribes an antisense lncRNA. Antisense transcription regulates gene expression and genome integrity [82]. SNAP25-AS1 has also been associated with the synaptic vesicle cycling pathway and found to be up-regulated in ASD patients [67, 68]. This further supports the role of non-coding elements in the aetiology of autism. Previously reported studies suggest that inversions are associated with mental retardation and autism [19, 83, 84]. The absence of the inversion in the unaffected brother further increases the possibility of the inversion being associated with the autistic phenotype in the twins in the present study. Often, rare SVs involving regulatory elements have been considered to increase the risk rather than be the sole cause of the disease [76]. These also tend to be inherited from a parent. The presence of the SV in an apparently healthy mother could be explained by the sex bias reported in autism, which has recently been attributed to the female protective effect wherein mothers of children with ASD carry more genetic risks [85]. The proportion of non-syndromic ASD cases that harbour such SVs presently remains unknown. Another critical observation from our study is the modest role of SVs in the aetiology of non-syndromic ASD. This inference is in congruence with previous studies that have investigated the role of lrWGS in NDDs, particularly ASD. SVs, often involving large genomic rearrangements that impact multiple genes, are commonly linked to various phenotypes and are less likely to be found in non-syndromic ASD cases [86]. The majority of SVs detected in non-syndromic ASD cases through lrWGS do not appear to be causative.

LRS has emerged as a valuable tool for identifying a wide range of variants in hard-to-access genomic regions, helping to bridge the diagnostic gap, particularly in detecting SVs that SRS often misses. Although SVs are less common than SNVs, they can have significant functional implications due to their size and the extensive genomic regions they impact [15]. SVs in non-coding regions can remove or replicate kilobases of DNA or even rearrange entire chromosome sections, making them more likely than SNVs to influence gene regulation [25]. It is estimated that, on average, an individual has ~ 22,500 SVs [20, 87]. lrWGS has shown potential for detecting SVs and copy number variations (CNVs), with recent advances in Oxford Nanopore sequencing enhancing SNV detection. However, lrWGS still faces challenges in identifying smaller variants; for SV detection, a read depth of 10-20x is sufficient, while SNV identification is more reliable at 20-30x [17]. lrWGS has achieved 100% concordance with exome sequencing (ES) and chromosomal microarray analysis (CMA) for detecting CNVs at an average whole genome coverage of 1.13x. However, that study used adaptive sampling on 30 samples with a total of 50 replicates, achieving an on-target average depth of 9.5x [88]. Indeed, recent benchmarking data on long-read aligners and SV callers at various depths of coverage from Oxford Nanopore lrWGS data suggest that, while a reduction in F1 score, precision and recall is observed at 10x coverage, the reduction likely stems from SV calls smaller than 100 bp [46]. In the present study, an average genome-wide coverage of ~ 7x was obtained and, based on the prior benchmarking data, SV calls from the cuteSV caller were used as “benchmark” data during variant filtering. Furthermore, combining four other SV callers with cuteSV as the “benchmark” likely led to high SV call precision, but also to an increase in false negative SV calls.
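The filtering idea described above, in which cuteSV calls are retained only when enough other callers agree, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the matching rule (same chromosome, same SV type, both breakpoints within a tolerance), the 500 bp tolerance, the `SV` record and the function names are all assumptions made for illustration, and the coordinates in the demo are invented toy values.

```python
from typing import Dict, List, NamedTuple


class SV(NamedTuple):
    """A simplified structural variant call (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INS", "INV", "BND"


def matches(a: SV, b: SV, max_dist: int = 500) -> bool:
    """Assumed matching rule: same chromosome and type, and both
    breakpoints within max_dist bp of each other."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def concordant_calls(
    benchmark: List[SV],
    others: Dict[str, List[SV]],
    min_callers: int = 3,
    max_dist: int = 500,
) -> List[SV]:
    """Keep benchmark calls supported by at least min_callers callers,
    counting the benchmark caller itself as one supporter."""
    kept = []
    for call in benchmark:
        support = 1  # the benchmark caller
        for caller_calls in others.values():
            if any(matches(call, other, max_dist) for other in caller_calls):
                support += 1
        if support >= min_callers:
            kept.append(call)
    return kept


if __name__ == "__main__":
    # Toy coordinates, invented for the example.
    cutesv = [SV("chr20", 1000, 5000, "INV"), SV("chr1", 100, 200, "DEL")]
    other_callers = {
        "sniffles2": [SV("chr20", 1100, 5100, "INV")],
        "svim": [SV("chr20", 900, 4950, "INV"), SV("chr1", 100, 200, "DEL")],
        "nanovar": [],
        "npinv": [],
    }
    for sv in concordant_calls(cutesv, other_callers):
        print(sv)
```

Raising `min_callers` or tightening `max_dist` in such a scheme trades recall for precision, which is the trade-off noted above for multi-caller concordance at low coverage.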
Additionally, SV calls are affected by the read length distribution (N50). Indeed, SV recall and F-measure indicate that optimal performance is reached with reads of 15-20 kb in length [17]. Lastly, a recent comparison of alignment-based versus assembly-based methods for SV detection suggests that, while assembly-based tools excel at detecting large SVs and are robust to coverage fluctuations, alignment-based methods show superior genotyping accuracy and calling of SVs such as translocations and inversions, especially at low sequence coverage (5-10x) [22]. Therefore, given the average coverage of ~ 7x and N50 of ~ 6.65 kb, the current study utilised an alignment-based SV calling approach to achieve high SV call precision and genotype accuracy, with the possibility of sub-optimal recall.
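For readers less familiar with the two run metrics discussed above, the following Python sketch shows the standard definitions: N50 is the read length at which reads of that length or longer contain at least half of all sequenced bases, and a rough depth estimate is total sequenced bases divided by genome size. The function names are hypothetical, the genome size is left as a caller-supplied parameter, and this depth figure ignores unmapped reads, so it only approximates the alignment-based coverage reported by quality-control tools.

```python
from typing import Iterable


def n50(read_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read lengths (in bases)."""
    lengths = sorted(read_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one read length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # First length at which the longest reads hold >= 50% of bases.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def approx_depth(read_lengths: Iterable[int], genome_size: int) -> float:
    """Rough sequencing depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size
```

Because N50 is weighted by bases rather than by read count, a handful of very long reads can raise it noticeably even when most reads are short.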

In the landscape of lrWGS, computational methods have become indispensable for annotating SVs and interpreting their functional relevance and clinical significance. Two computational approaches have been proposed to identify pathogenic germline SVs, distinguished by their underlying paradigm: knowledge-driven versus data-driven [89]. Knowledge-driven techniques leverage existing knowledge and the ACMG framework, whereas data-driven techniques are beneficial in identifying novel SVs whose mechanism or functional impact has not yet been fully understood. In the present study, we utilised the AnnotSV tool, a knowledge-driven technique that incorporates the ACMG framework for variant annotation and interpretation. Recent benchmarking data have shown AnnotSV to deliver superior performance across various metrics, including SV types, lengths, gene content, and evaluation of biological mechanisms [89]. While guidelines for clinical interpretation of variants found in the non-coding regions of the genome are available, they are primarily intended to cover SNVs and indels of < 50 bp in size [90], though the authors posit that the principles are applicable to CNVs and SVs in non-coding regions of the genome. Despite this, challenges persist, as most annotation and interpretation techniques concentrate on deletions and duplications, often overlooking other SV types. This may be due in part to the lack of ACMG guidelines and the scarcity of “gold standard” datasets for certain SV types.

There are certain limitations of the present study that need to be addressed. (1) The study included a relatively small sample size and a proband-only sequencing approach, which could limit detection of de novo SVs. (2) Given the inherent limitations of base-calling quality (< Q20 calls), SNV calling could not be carried out reliably at 7x coverage. This could preclude identification of putative causative SNVs in the non-coding regions of the genome. (3) Whilst data from Oxford Nanopore lrWGS could be used to interrogate the role of tandem repeat expansion [91] and native DNA methylation [92] in autism susceptibility, such analysis was beyond the scope of the current study.
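As context for the Q20 threshold mentioned above, a Phred quality score Q corresponds to a per-base error probability of 10^(-Q/10), so Q20 means roughly one expected error per 100 bases. The short Python sketch below shows this standard conversion; the function names are illustrative and are not taken from the study's pipeline.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability (0 < p <= 1) to a Phred score."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)
```

At shallow depth, few reads cover any given site, so per-base errors at this rate are hard to distinguish from true single nucleotide variants, which is consistent with the limitation stated above.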

Conclusion

Data from large-scale genomic and transcriptomic studies have helped to delineate the genetic architecture of non-syndromic ASD in European/non-Hispanic white as well as Indian populations. To the best of our knowledge, this is the first study from India to present the spectrum of SVs detected in non-syndromic ASD patients and the overall utility of lrWGS. Whilst this study focuses on the application of lrWGS in previously screened ASD cases, the combined output from the present study and our previous study utilising CMA and WES [6] suggests that SVs play a modest role in the aetiology of ASD. In congruence with data from prior studies, the current study provides evidence of minimal utility of lrWGS for the clinical genetic diagnosis of non-syndromic ASD.

Supplementary Information

12920_2025_2204_MOESM1_ESM.docx (440.9KB, docx)

Supplementary Material 1: Supplementary Figure S1. Pedigree charts of the selected 23 ASD patient-parent trios in the current cohort.

12920_2025_2204_MOESM2_ESM.xlsx (78.8KB, xlsx)

Supplementary Material 2: Supplementary Table S1. Phenotypic description of 23 ASD probands in the present study cohort.

12920_2025_2204_MOESM3_ESM.docx (26.5KB, docx)

Supplementary Material 3: Supplementary Table S2. Pre and post sequencing run details including quality of the input template DNA.

12920_2025_2204_MOESM4_ESM.docx (42.1KB, docx)

Supplementary Material 4: Supplementary Table S3. Number of variants called for each sample across different SV types by each variant caller.

12920_2025_2204_MOESM5_ESM.xlsx (20.2KB, xlsx)

Supplementary Material 5: Supplementary Table S4. Steps used for filtering and prioritization of SVs.

12920_2025_2204_MOESM6_ESM.docx (21.8KB, docx)

Supplementary Material 6: Supplementary Table S5. Number of concordant calls and prioritized variants for each SV type for all cases.

12920_2025_2204_MOESM7_ESM.docx (15.1KB, docx)

Supplementary Material 7: Supplementary Table S6. Primer sequences and orthogonal methods used for validation of SVs.

Acknowledgements

We thank the families for their support and participation in the study.

Abbreviations

ASD: Autism spectrum disorder
NDD: Neurodevelopmental disorder
CMA: Chromosomal microarray
ES: Exome sequencing
WES: Whole exome sequencing
WGS: Whole genome sequencing
SRS: Short read sequencing
LRS: Long read sequencing
lrWGS: Long read-whole genome sequencing
srWGS: Short read-whole genome sequencing
SNV: Single nucleotide variant
Indel: Insertion and deletion
CNV: Copy number variant
SV: Structural variant
VUS: Variant of uncertain significance
ACMG-AMP: American College of Medical Genetics-Association of Molecular Pathology
ClinGen: Clinical Genome Resource
lncRNA: Long non-coding RNA
miRNA: MicroRNA
DSM-5: Diagnostic and statistical manual, 5th edition
HPA: Human Protein Atlas
MGI: Mouse Genome Informatics
SFARI: Simons Foundation Autism Research Initiative
STRING: Search Tool for Retrieval of Interacting Genes/Proteins
ENCODE: Encyclopaedia of DNA Elements

Authors’ contributions

FS, SR and HS have full access to the data in the study and take responsibility for its integrity and the accuracy of the data analysis. Study design was carried out by HS, FS, SR and DJ. Patient acquisition and recruitment were carried out by DJ, PM, KP, AI, MP, BM and FS. Data generation, analysis and interpretation were carried out by JS, DM, SR and HS. Drafting of the manuscript was carried out by JS, SR, JaS, GD and HS. Statistical analysis was carried out by JS, SR and HS. Critical evaluation of the manuscript was carried out by all authors. Study supervision and funding acquisition were carried out by FS, DJ, SR and HS. All authors read and approved the final manuscript.

Funding

This work was supported by the Gujarat State Biotechnology Mission grant GSBTM/JDR&D/618/21–22/00003681 (Frenny Sheth, Deepika Jain, Shweta Ramdas and Harsh Sheth). Furthermore, Jhanvi Shah was supported by the CSIR-NET fellowship (09/1331(0001)/2021-EMR-I). The funder had no role in the design and conduct of the study; collection, management, analysis and interpretation of data; preparation, review or approval of the manuscript; and decision to submit the manuscript for publication.

Data availability

Datasets supporting the conclusions of this article are available on the EGA website (European Genome-Phenome Archive) under the title “Long read sequencing to detect structural variants in Indian patients with non-syndromic autism spectrum disorder”. To access Oxford Nanopore long-read whole genome sequencing data, the study ID is “EGAS50000000842”. Datasets can be accessed from the EGA website using the following weblink: https://ega-archive.org/.

Declarations

Ethics approval and consent to participate

This study was approved by the research ethics committee at the Foundation for Research in Genetics and Endocrinology, Ahmedabad (ID: FRIGE/IEC/19/2020). Written informed consent as per the Helsinki Declaration was obtained from the parents and guardians of all probands. All the methods in the study were carried out as per the Helsinki Declaration. The dataset presented here does not contain any identifiable information.

Consent for publication

Written informed consent as per the Helsinki Declaration was obtained from the parents and guardians of all probands. All the methods in the study were carried out as per the Helsinki Declaration. The dataset presented here does not contain any identifiable information.

Competing interests

DJ, PM, KP, AI, MP, and BM are employed at a for-profit organisation. The remaining authors declare that they have no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Shweta Ramdas and Harsh Sheth contributed equally to this work.

Contributor Information

Shweta Ramdas, Email: shwetaramdas@iisc.ac.in.

Harsh Sheth, Email: harsh.sheth@frige.co.in.

References

  • 1. Vahia VN. Diagnostic and statistical manual of mental disorders 5: a quick glance. Indian J Psychiatry. 2013;55(3):220–3.
  • 2. DSM Library. 2023. Diagnostic and statistical manual of mental disorders. Available from: https://dsm.psychiatryonline.org/doi/book/10.1176/appi.books.9780890425596. [cited 2023 Jun 24].
  • 3. Tick B, Bolton P, Happé F, Rutter M, Rijsdijk F. Heritability of autism spectrum disorders: a meta-analysis of twin studies. J Child Psychol Psychiatry. 2016;57(5):585–95.
  • 4. Koko M, Satterstrom FK, Aleksic B, Artomov M, Barbosa M, Benetti E, et al. Contribution of autosomal rare and de novo variants to sex differences in autism. Am J Hum Genet. 2025;112(3):599–614.
  • 5. Bai D, Yip BHK, Windham GC, Sourander A, Francis R, Yoffe R, et al. Association of genetic and environmental factors with autism in a 5-country cohort. JAMA Psychiatr. 2019;76(10):1035–43.
  • 6. Sheth F, Shah J, Jain D, Shah S, Patel H, Patel K, et al. Comparative yield of molecular diagnostic algorithms for autism spectrum disorder diagnosis in India: evidence supporting whole exome sequencing as first tier test. BMC Neurol. 2023;23(1):292.
  • 7. Wilfert AB, Turner TN, Murali SC, Hsieh P, Sulovari A, Wang T, et al. Recent ultra-rare inherited variants implicate new autism candidate risk genes. Nat Genet. 2021;53(8):1125–34.
  • 8. Weiner DJ, Wigdor EM, Ripke S, Walters RK, Kosmicki JA, Grove J, et al. Polygenic transmission disequilibrium confirms that common and rare variation act additively to create risk for autism spectrum disorders. Nat Genet. 2017;49(7):978–85.
  • 9. Arteche-López A, Gómez Rodríguez MJ, Sánchez Calvin MT, Quesada-Espinosa JF, Lezana Rosales JM, Palma Milla C, et al. Towards a change in the diagnostic algorithm of autism spectrum disorders: evidence supporting whole exome sequencing as a first-tier test. Genes. 2021. 10.3390/genes12040560.
  • 10. Tammimies K, Marshall CR, Walker S, Kaur G, Thiruvahindrapuram B, Lionel AC, et al. Molecular diagnostic yield of chromosomal microarray analysis and whole-exome sequencing in children with autism spectrum disorder. JAMA. 2015;314(9):895–903.
  • 11. Rossi M, El-Khechen D, Black MH, Farwell Hagman KD, Tang S, Powis Z. Outcomes of diagnostic exome sequencing in patients with diagnosed or suspected autism spectrum disorders. Pediatr Neurol. 2017;70:34-e432.
  • 12. Gibitova EA, Dobrynin PV, Pomerantseva EA, Musatova EV, Kostareva A, Evsyukov I, et al. A study of the genomic variations associated with autistic spectrum disorders in a Russian cohort of patients using whole-exome sequencing. Genes. 2022. 10.3390/genes13050920.
  • 13. Srivastava S, Love-Nichols JA, Dies KA, Ledbetter DH, Martin CL, Chung WK, et al. Meta-analysis and multidisciplinary consensus statement: exome sequencing is a first-tier clinical diagnostic test for individuals with neurodevelopmental disorders. Genet Med. 2019;21(11):2413–21.
  • 14. Escaramís G, Docampo E, Rabionet R. A decade of structural variants: description, history and methods to detect structural variation. Brief Funct Genomics. 2015;14(5):305–14.
  • 15. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015;526(7571):75–81.
  • 16. Chaisson MJP, Sanders AD, Zhao X, Malhotra A, Porubsky D, Rausch T, et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat Commun. 2019;10(1):1784.
  • 17. De Coster W, Strazisar M, De Rijk P. Critical length in long-read resequencing. NAR Genomics Bioinform. 2020;2(1):lqz027.
  • 18. Merker JD, Wenger AM, Sneddon T, Grove M, Zappala Z, Fresard L, et al. Long-read genome sequencing identifies causal structural variation in a Mendelian disease. Genet Med. 2018;20(1):159–63.
  • 19. Pagnamenta AT, Yu J, Walker S, Noble AJ, Lord J, Dutta P, et al. The impact of inversions across 33,924 families with rare disease from a national genome sequencing project. Am J Hum Genet. 2024;111(6):1140–64.
  • 20. Beyter D, Ingimundardottir H, Oddsson A, Eggertsson HP, Bjornsson E, Jonsson H, et al. Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat Genet. 2021;53(6):779–86.
  • 21. Wu Z, Jiang Z, Li T, Xie C, Zhao L, Yang J, et al. Structural variants in the Chinese population and their impact on phenotypes, diseases and population adaptation. Nat Commun. 2021;12(1):6501.
  • 22. Liu YH, Luo C, Golding SG, Ioffe JB, Zhou XM. Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nat Commun. 2024;15(1):2447.
  • 23. Bruels CC, Littel HR, Daugherty AL, Stafki S, Estrella EA, McGaughy ES, et al. Diagnostic capabilities of nanopore long-read sequencing in muscular dystrophy. 2024. Available from: https://onlinelibrary.wiley.com/doi/10.1002/acn3.51612. [cited 2024 Dec 7].
  • 24. AlAbdi L, Shamseldin HE, Khouj E, Helaby R, Aljamal B, Alqahtani M, et al. Beyond the exome: utility of long-read whole genome sequencing in exome-negative autosomal recessive diseases. Genome Med. 2023;15(1):114.
  • 25. D’haene E, Vergult S. Interpreting the impact of noncoding structural variation in neurodevelopmental disorders. Genet Med. 2021;23(1):34–46.
  • 26. Hiatt SM, Lawlor JMJ, Handley LH, Latner DR, Bonnstetter ZT, Finnila CR, et al. Long-read genome sequencing and variant reanalysis increase diagnostic yield in neurodevelopmental disorders. Genome Res. 2024;34(11):1747–62.
  • 27. Brandler WM, Antaki D, Gujral M, Kleiber ML, Whitney J, Maile MS, et al. Paternally inherited cis-regulatory structural variants are associated with autism. Science. 2018;360(6386):327–31.
  • 28. Miller SA, Dykes DD, Polesky HF. A simple salting out procedure for extracting DNA from human nucleated cells. Nucleic Acids Res. 1988;16(3):1215.
  • 29. Babraham Bioinformatics. FastQC: a quality control tool for high throughput sequence data. 2024. Available from: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/. [cited 2024 Sep 13].
  • 30. Leger A, Leonardi T. pycoQC, interactive quality control for Oxford nanopore sequencing. J Open Source Softw. 2019;4(34):1236.
  • 31. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100.
  • 32. Okonechnikov K, Conesa A, García-Alcalde F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics. 2016;32(2):292–4.
  • 33. Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, von Haeseler A, et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 2018;15(6):461–8.
  • 34. Jiang T, Liu Y, Jiang Y, Li J, Gao Y, Cui Z, et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 2020;21(1):189.
  • 35. Heller D, Vingron M. SVIM: structural variant identification using mapped long reads. Bioinformatics. 2019;35(17):2907–15.
  • 36. Tham CY, Tirado-Magallanes R, Goh Y, Fullwood MJ, Koh BTH, Wang W, et al. NanoVar: accurate characterization of patients’ genomic structural variants using low-depth nanopore sequencing. Genome Biol. 2020;21(1):56.
  • 37. Shao H, Ganesamoorthy D, Duarte T, Cao MD, Hoggart CJ, Coin LJM. npInv: accurate detection and genotyping of inversions using long read sub-alignment. BMC Bioinformatics. 2018;19(1):261.
  • 38. Geoffroy V, Herenger Y, Kress A, Stoetzel C, Piton A, Dollfus H, et al. AnnotSV: an integrated tool for structural variations annotation. Bioinformatics. 2018;34(20):3572–4.
  • 39. Köhler S, Gargano M, Matentzoglu N, Carmody LC, Lewis-Smith D, Vasilevsky NA, et al. The human phenotype ontology in 2021. Nucleic Acids Res. 2020;49(D1):D1207–17.
  • 40. Smedley D, Jacobsen JOB, Jager M, Köhler S, Holtgrewe M, Schubach M, et al. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat Protoc. 2015;10(12):2004–15.
  • 41. Vestito L, Jacobsen JOB, Walker S, Cipriani V, Harris NL, Haendel MA, et al. Efficient reinterpretation of rare disease cases using Exomiser. NPJ Genom Med. 2024;9(1):65.
  • 42. Collins RL, Brand H, Karczewski KJ, Zhao X, Alföldi J, Francioli LC, et al. A structural variation reference for medical and population genetics. Nature. 2020;581(7809):444–51.
  • 43. Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024;625(7993):92–100.
  • 44. MacDonald JR, Ziman R, Yuen RKC, Feuk L, Scherer SW. The database of genomic variants: a curated collection of structural variation in the human genome. Nucleic Acids Res. 2014;42(D1):D986-92.
  • 45. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74.
  • 46. Helal AA, Saad BT, Saad MT, Mosaad GS, Aboshanab KM. Benchmarking long-read aligners and SV callers for structural variation detection in Oxford nanopore sequencing data. Sci Rep. 2024;14(1):6160.
  • 47. Jiang T, Liu S, Cao S, Liu Y, Cui Z, Wang Y, et al. Long-read sequencing settings for efficient structural variation detection based on comprehensive evaluation. BMC Bioinformatics. 2021;22:552.
  • 48. Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14(2):178–92.
  • 49. Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, Rajan D, et al. DECIPHER: database of chromosomal imbalance and phenotype in humans using Ensembl resources. Am J Hum Genet. 2009;84(4):524–33.
  • 50. Nassar LR, Barber GP, Benet-Pagès A, Casper J, Clawson H, Diekhans M, et al. The UCSC genome browser database: 2023 update. Nucleic Acids Res. 2023;51(D1):D1188–95.
  • 51. Riggs ER, Andersen EF, Cherry AM, Kantarci S, Kearney H, Patel A, et al. Technical standards for the interpretation and reporting of constitutional copy number variants: a joint consensus recommendation of the American college of medical genetics and genomics (ACMG) and the clinical genome resource (ClinGen). Genet Med. 2020;22(2):245–57.
  • 52. Thaxton C, Good ME, DiStefano MT, Luo X, Andersen EF, Thorland E, et al. Utilizing ClinGen gene-disease validity and dosage sensitivity curations to inform variant classification. Hum Mutat. 2022;43(8):1031–40.
  • 53. Preston CG, Wright MW, Madhavrao R, Harrison SM, Goldstein JL, Luo X, et al. ClinGen variant curation interface: a variant classification platform for the application of evidence criteria from ACMG/AMP guidelines. Genome Med. 2022;14(1):6.
  • 54. Uhlén M, Fagerberg L, Hallström BM, Lindskog C, Oksvold P, Mardinoglu A, et al. Proteomics. Tissue-based map of the human proteome. Science. 2015;347(6220):1260419.
  • 55. Baldarelli RM, Smith CM, Finger JH, Hayamizu TF, McCright IJ, Xu J, et al. The mouse gene expression database (GXD): 2021 update. Nucleic Acids Res. 2021;49(D1):D924–31.
  • 56. Baldarelli RM, Smith CL, Ringwald M, Richardson JE, Bult CJ. Mouse genome informatics: an integrated knowledgebase system for the laboratory mouse. Genetics. 2024;227(1):iyae031.
  • 57. Banerjee-Basu S, Packer A. SFARI gene: an evolving database for the autism research community. Dis Model Mech. 2010;3(3–4):133–5.
  • 58. Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019;47(D1):D607–13.
  • 59. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023;51(D1):D638–46.
  • 60. Barshir R, Fishilevich S, Iny-Stein T, Zelig O, Mazor Y, Guan-Golan Y, et al. GeneCaRNA: a comprehensive gene-centric database of human non-coding RNAs in the GeneCards suite. J Mol Biol. 2021;433(11):166913.
  • 61. Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards Suite: from gene data mining to disease genome sequence analyses. 2024. Available from: https://currentprotocols.onlinelibrary.wiley.com/doi/10.1002/cpbi.5. [cited 2024 Dec 7].
  • 62. Aggarwal S, Rosenblum C, Gould M, Ziman S, Barshir R, Zelig O, et al. Expanding and enriching the lncRNA gene–disease landscape using the GeneCaRNA database. Biomedicines. 2024;12(6):1305.
  • 63. Zhao N, Hashida H, Takahashi N, Sakaki Y. Cloning and sequence analysis of the human SNAP25 cDNA. Gene. 1994;145(2):313–4.
  • 64. SNAP25 protein (human) - STRING interaction network. 2025. Available from: https://string-db.org/cgi/network?taskId=bh7qwOunm5sd&sessionId=b84iWKN1ga5f. [cited 2025 Jan 28].
  • 65. Braida D, Guerini FR, Ponzoni L, Corradini I, De Astis S, Pattini L, et al. Association between SNAP-25 gene polymorphisms and cognition in autism: functional consequences and potential therapeutic strategies. Transl Psychiatry. 2015;5(1):e500.
  • 66. Basu SN, Kollu R, Banerjee-Basu S. AutDB: a gene reference resource for autism research. Nucleic Acids Res. 2009;37(Database issue):D832-836.
  • 67. Wang Q, Wang Y, Ji W, Zhou G, He K, Li Z, et al. SNAP25 is associated with schizophrenia and major depressive disorder in the Han Chinese population. J Clin Psychiatry. 2015;76(1):e76–82.
  • 68. Barros II, Leão V, Santis JO, Rosa RCA, Brotto DB, Storti CB, et al. Non-syndromic intellectual disability and its pathways: a long noncoding RNA perspective. Non-Coding RNA. 2021;7(1):22.
  • 69. De Rubeis S, He X, Goldberg AP, Poultney CS, Samocha K, Cicek AE, et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature. 2014;515(7526):209–15.
  • 70. Rylaarsdam L, Guemez-Gamboa A. Genetic causes and modifiers of autism spectrum disorder. Front Cell Neurosci. 2019;13:385.
  • 71. Guang S, Pang N, Deng X, Yang L, He F, Wu L, et al. Synaptopathology involved in autism spectrum disorder. Front Cell Neurosci. 2018;12:470.
  • 72. Genovese A, Butler MG. The autism spectrum: behavioral, psychiatric and genetic associations. Genes. 2023;14(3):677.
  • 73. Schellenberg GD, Dawson G, Sung YJ, Estes A, Munson J, Rosenthal E, et al. Evidence for multiple loci from a genome scan of autism kindreds. Mol Psychiatry. 2006;11(11):1049–60.
  • 74. Kainer D, Templeton AR, Prates ET, Jacboson D, Allan ERO, Climer S, et al. Structural variants identified using non-Mendelian inheritance patterns advance the mechanistic understanding of autism spectrum disorder. Hum Genet Genomics Adv. 2022;4(1):100150.
  • 75. Feuk L, Carson AR, Scherer SW. Structural variation in the human genome. Nat Rev Genet. 2006;7(2):85–97.
  • 76. Begum G, Albanna A, Bankapur A, Nassir N, Tambi R, Berdiev BK, et al. Long-read sequencing improves the detection of structural variations impacting complex non-coding elements of the genome. Int J Mol Sci. 2021;22(4):2060.
  • 77. Chen Y, Dawes R, Kim HC, Stenton SL, Walker S, Ljungdahl A, et al. De novo variants in the non-coding spliceosomal snRNA gene RNU4-2 are a frequent cause of syndromic neurodevelopmental disorders. medRxiv. 2024;2024.04.07.24305438.
  • 78. Shickh S, Mighton C, Uleryk E, Pechlivanoglou P, Bombard Y. The clinical utility of exome and genome sequencing across clinical indications: a systematic review. Hum Genet. 2021;140(10):1403–16.
  • 79. Clark MM, Stark Z, Farnaes L, Tan TY, White SM, Dimmock D, et al. Meta-analysis of the diagnostic and clinical utility of genome and exome sequencing and chromosomal microarray in children with suspected genetic diseases. NPJ Genom Med. 2018;3:16.
  • 80. Guerini FR, Bolognesi E, Chiappedi M, Manca S, Ghezzo A, Agliardi C, et al. SNAP-25 single nucleotide polymorphisms are associated with hyperactivity in autism spectrum disorders. Pharmacol Res. 2011;64(3):283–8.
  • 81. Safari Mreza, Omrani MD, Noroozi R, Sayad A, Sarrafzadeh S, Komaki A, et al. Synaptosome-associated protein 25 (SNAP25) gene association analysis revealed risk variants for ASD, in Iranian population. J Mol Neurosci. 2017;61(3):305–11.
  • 82. Barman P, Reddy D, Bhaumik SR. Mechanisms of antisense transcription initiation with implications in gene expression, genomic integrity and disease pathogenesis. Non-coding RNA. 2019;5(1):11.
  • 83. Osborne LR, Li M, Pober B, Chitayat D, Bodurtha J, Mandel A, et al. A 1.5 million–base pair inversion polymorphism in families with Williams-Beuren syndrome. Nat Genet. 2001;29(3):321–5.
  • 84. Cukier HN, Skaar DA, Rayner-Evans MY, Konidari I, Whitehead PL, Jaworski JM, et al. Identification of chromosome 7 inversion breakpoints in an autistic family narrows candidate region for autism susceptibility. Autism Res. 2009;2(5):258–66.
  • 85. Wigdor EM, Weiner DJ, Grove J, Fu JM, Thompson WK, Carey CE, et al. The female protective effect against autism spectrum disorder. Cell Genom. 2022;2(6):100134.
  • 86. Eisfeldt J, Higginbotham EJ, Lenner F, Howe J, Fernandez BA, Lindstrand A, et al. Resolving complex duplication variants in autism spectrum disorder using long-read genome sequencing. Genome Res. 2024;34(11):1763–73.
  • 87. Negi S, Stenton SL, Berger SI, Canigiula P, McNulty B, Violich I, et al. Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection. Am J Hum Genet. 2025;112(2):428–49.
  • 88. Greer SU, Botello J, Hongo D, Levy B, Shah P, Rabinowitz M, et al. Implementation of nanopore sequencing as a pragmatic workflow for copy number variant confirmation in the clinic. J Transl Med. 2023;21(1):378.
  • 89. Liu X, Gu L, Hao C, Xu W, Leng F, Zhang P, et al. Systematic assessment of structural variant annotation tools for genomic interpretation. Life Sci Alliance. 2025;8(3):e202402949.
  • 90. Ellingford JM, Ahn JW, Bagnall RD, Baralle D, Barton S, Campbell C, et al. Recommendations for clinical interpretation of variants found in non-coding regions of the genome. Genome Med. 2022;14(1):73.
  • 91. Trost B, Engchuan W, Nguyen CM, Thiruvahindrapuram B, Dolzhenko E, Backstrom I, et al. Genome-wide detection of tandem DNA repeats that are expanded in autism. Nature. 2020;586(7827):80–6.
  • 92. Tremblay MW, Jiang YH. DNA methylation and susceptibility to autism spectrum disorder. Annu Rev Med. 2019;70:151–66.

Associated Data


Supplementary Materials

12920_2025_2204_MOESM1_ESM.docx (440.9KB, docx)

Supplementary Material 1: Supplementary Figure S1. Pedigree charts of the selected 23 ASD patient-parent trios in the current cohort.

12920_2025_2204_MOESM2_ESM.xlsx (78.8KB, xlsx)

Supplementary Material 2: Supplementary Table S1. Phenotypic description of 23 ASD probands in the present study cohort.

12920_2025_2204_MOESM3_ESM.docx (26.5KB, docx)

Supplementary Material 3: Supplementary Table S2. Pre- and post-sequencing run details, including the quality of the input template DNA.

12920_2025_2204_MOESM4_ESM.docx (42.1KB, docx)

Supplementary Material 4: Supplementary Table S3. Number of variants called for each sample across different SV types by each variant caller.

12920_2025_2204_MOESM5_ESM.xlsx (20.2KB, xlsx)

Supplementary Material 5: Supplementary Table S4. Steps used for filtering and prioritization of SVs.

12920_2025_2204_MOESM6_ESM.docx (21.8KB, docx)

Supplementary Material 6: Supplementary Table S5. Number of concordant calls and prioritized variants for each SV type for all cases.

12920_2025_2204_MOESM7_ESM.docx (15.1KB, docx)

Supplementary Material 7: Supplementary Table S6. Primer sequences and orthogonal methods used for validation of SVs.

Data Availability Statement

The datasets supporting the conclusions of this article are available from the European Genome-Phenome Archive (EGA) under the title “Long read sequencing to detect structural variants in Indian patients with non-syndromic autism spectrum disorder”. The Oxford Nanopore long-read whole genome sequencing data can be accessed under the study ID “EGAS50000000842” via the EGA website: https://ega-archive.org/.


Articles from BMC Medical Genomics are provided here courtesy of BMC
