Abstract
Autism spectrum disorder (ASD) represents a set of childhood neurodevelopmental disorders characterized by impairments in social communication and by restricted, repetitive, and stereotyped patterns of behavior. Here, based on whole-genome sequencing (WGS) data from three pairs of monozygotic twins discordant for ASD, we explored multiple patient-specific genetic variations and prioritized a list of ASD risk genes. We identified discordant variations in monozygotic twins (DVMTs) observed in at least two twin pairs, including 14,310 SNPs, 2,425 indels, and 16,735 CNVs, involving a total of 2,174 genes, 37 of which were covered by all three types of variation. Gene ontology (GO) enrichment analysis of biological processes for the 2,174 genes showed that the majority were related to neurodevelopmental processes. In addition, functional network analysis showed strong functional relevance among the 37 genes covered by all three types of variation. In conclusion, we conducted, for the first time, a comprehensive scan of genomic differences between monozygotic twins discordant for ASD, providing researchers with directions for in-depth study. It may also inform strategies for the clinical treatment of individuals affected by ASD.
Keywords: genomic variations, autism spectrum disorder, monozygotic twins, whole-genome sequencing
Introduction
Autism spectrum disorder (ASD) represents a set of childhood neurodevelopmental disorders with impairments in social communication and restricted, repetitive, and stereotyped patterns of behavior.3 It is a lifelong developmental condition with a great degree of heritability and heterogeneity. A recent nationwide population-based study estimated ASD prevalence at 2.47% among U.S. children and adolescents in 2014–2016.4 Notably, the prevalence among boys is nearly five times higher than that among girls.1 Although ASD typically emerges in childhood, patients may need lifelong care, which places a heavy burden on society. ASD is now considered one of the world's most urgent public health issues. Family studies confirm the strong genetic contribution to ASD, establishing it as one of the most heritable neurodevelopmental and neuropsychiatric disorders.2, 5 Growing evidence shows the importance of complex genetic factors in ASD (single-nucleotide variants [SNVs], insertions or deletions [indels], and copy number variations [CNVs]),6, 7 and suggests the possible existence of hundreds of autism risk genes.8, 9, 10, 11 Whole-exome sequencing (WES)9, 12, 13, 14 and whole-genome sequencing (WGS)15, 16, 17 have been used to detect de novo as well as inherited mutations involved in ASD and other neuropsychiatric disorders, facilitating the discovery of rare pathogenic variations and risk genes. The curation of large genomic datasets has produced a collection of many genes implicated in ASD risk and ASD-linked genes.8, 10, 18 However, even considering all of the data from these complementary approaches, the picture remains incomplete, and to date only about 20% of ASD cases have been explained.6, 19, 20
Monozygotic or “identical” twins have been widely studied to dissect the relative contributions of genetics and environment to human diseases, and they are widely used in traditional experimental research. Recent high-throughput research, however, has largely concerned fraternal twins, and few studies have empirically tested for genetic differences between monozygotic (MZ) twins discordant for ASD. MZ twins originate from a single fertilized zygote and have long been thought to be genetically identical. However, some MZ twins share the same genetic foundation but present different phenotypes; in the case studied here, the twins are discordant for ASD. To uncover possible causes of this difference, we conducted a comprehensive scan of genomic differences between MZ twins discordant for ASD based on WGS (Figure 1). In this study, we first collected three pairs of MZ twins discordant for autism and captured all types of genetic variation, including SNVs, small insertions or deletions (indels), and copy number variations (CNVs). Then, for each twin pair, variations present only in the individual with ASD were retained as discordant variations in MZ twins (DVMTs), and DVMTs found in at least two twin pairs were selected for further analysis. This yielded a set of 2,174 genes involved in all DVMTs, and gene ontology (GO) enrichment analysis found that many of these genes were related to the development of autism. In addition, functional network analysis showed strong functional relevance among the 37 genes covered by all three types of variation, suggesting their potential relevance to the diagnosis of autism.
Results
Characteristics of WGS Data
After filtering of the raw data, the basic statistics of the WGS data for the three pairs of MZ discordant twins were summarized (Table S1). The average number of raw reads per sample was 1,151,097,111. After stringent quality control at the sample level, the mean ratio of clean data to raw data was 95.11%. Mapping the clean data from the six subjects to hg38 with the Burrows-Wheeler Aligner (BWA)21 gave an average genome coverage of 99.53% and an average sequencing depth of 30.85× (Table 1). Likewise, 99.25% of the genome was covered at a depth of at least 4×. On average, 1,040,389,962 reads per sample were mapped, corresponding to a mapping rate of 98.30%. The high genome coverage of WGS allows reliable variant calling and potentially captures many clinically relevant mutations.
Table 1.
Twin Pairs | Total Reads | Mapped Reads | Mapping Rate (%) | Average Depth (×) | Coverage ≥ 1× (%) | Coverage ≥ 4× (%) |
---|---|---|---|---|---|---|
Twin Pair 1 | | | | | | |
Twin 1ᵃ | 1,051,026,934 | 1,034,220,852 | 98.40 | 30.63 | 99.42 | 99.15 |
Twin 2 | 1,079,153,824 | 1,062,219,892 | 98.43 | 31.34 | 99.40 | 99.11 |
Twin Pair 2 | | | | | | |
Twin 1ᵃ | 1,042,955,285 | 1,025,555,250 | 98.33 | 30.35 | 99.41 | 99.12 |
Twin 2 | 1,094,044,728 | 1,070,511,132 | 97.85 | 31.87 | 99.43 | 99.17 |
Twin Pair 3 | | | | | | |
Twin 1ᵃ | 1,074,131,972 | 1,057,584,856 | 98.46 | 31.38 | 99.75 | 99.48 |
Twin 2 | 1,009,031,528 | 992,247,792 | 98.34 | 29.57 | 99.76 | 99.49 |
Mean | 1,058,390,712 | 1,040,389,962 | 98.30 | 30.85 | 99.53 | 99.25 |
ᵃ The twin with autism in the twin pair.
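As a quick arithmetic check, the mapping rates and the "Mean" row of Table 1 can be recomputed from the per-sample read counts. The sketch below is only an illustration: the counts are transcribed from Table 1, and the helper names (`mapping_rate`, `summarize`) are hypothetical, not part of the authors' pipeline.

```python
# Read counts transcribed from Table 1: (sample, total reads, mapped reads).
TABLE_1 = [
    ("Pair 1, Twin 1", 1_051_026_934, 1_034_220_852),
    ("Pair 1, Twin 2", 1_079_153_824, 1_062_219_892),
    ("Pair 2, Twin 1", 1_042_955_285, 1_025_555_250),
    ("Pair 2, Twin 2", 1_094_044_728, 1_070_511_132),
    ("Pair 3, Twin 1", 1_074_131_972, 1_057_584_856),
    ("Pair 3, Twin 2", 1_009_031_528, 992_247_792),
]


def mapping_rate(total_reads, mapped_reads):
    """Percentage of reads that aligned to the reference genome."""
    return 100.0 * mapped_reads / total_reads


def summarize(rows):
    """Return (mean total reads, mean mapped reads, mean mapping rate in %)."""
    n = len(rows)
    mean_total = sum(total for _, total, _ in rows) / n
    mean_mapped = sum(mapped for _, _, mapped in rows) / n
    mean_rate = sum(mapping_rate(total, mapped) for _, total, mapped in rows) / n
    return mean_total, mean_mapped, mean_rate


for name, total, mapped in TABLE_1:
    print(f"{name}: {mapping_rate(total, mapped):.2f}%")

mean_total, mean_mapped, mean_rate = summarize(TABLE_1)
print(f"Mean: {mean_total:,.0f} total, {mean_mapped:,.0f} mapped, {mean_rate:.2f}%")
```

The mean mapping rate here is the average of the six per-sample rates; dividing the summed mapped reads by the summed total reads is an equally reasonable convention and gives nearly the same value for these data.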
Identifying DVMTs in Three MZ Discordant Twins
For SNVs and short indels (Figure 2), all SNVs and indels of each individual were first called with the Genome Analysis Toolkit (GATK),22 and variations present in the Single Nucleotide Polymorphism Database (dbSNP) were filtered out. Second, for each twin pair, variations present only in the individual with ASD were identified, yielding 50,644, 42,387, and 49,869 SNVs and 11,118, 9,373, and 10,141 indels for the three pairs, respectively. For CNVs (Figure 2), DELLY23 was used to identify duplications and deletions separately. CNVs with lengths between 100 bp and 1 million bp were retained, including 50,644, 42,387, and 49,869 duplications and 11,118, 9,373, and 10,141 deletions for the three twin pairs, respectively. All of these variations were observed only in individuals with ASD and were defined as DVMTs.
Further, DVMTs that appeared in at least two twin pairs were identified (Tables S2 and S4). In particular, two CNVs with at least 50% reciprocal size overlap were considered the same, and the overlapping region was extracted as a copy number variation region (CNVR). On this basis, 14,310 CNVRs for duplication and 2,425 CNVRs for deletion were extracted (Table S6). Finally, these DVMTs were annotated using the ANNOVAR tool24 (Figures 3A–3D and 4A–4C; Tables S3, S5, and S7). For both SNVs and indels, variations located in intergenic regions accounted for the largest proportion, and among the non-coding RNA (ncRNA) annotations, ncRNA_intronic was the largest category. In contrast to SNVs, no indels were located in exons or the 5′ UTR. For the final set of SNVs, we computed the mutation type distribution and found that C > T (G > A) accounted for the largest proportion (Figure 3E), and transitions were more frequent than transversions (Figure 3F). For CNVs, we showed the distribution of duplications and deletions in the three twin pairs (Figure 4B). Finally, to illustrate all kinds of variations, we displayed them in a Circos25 plot (Figure 5).
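The DVMT definition amounts to a set difference within each twin pair, followed by a recurrence count across pairs. The following is a minimal sketch of that logic on in-memory sets, assuming each variant is represented as a `(chrom, pos, ref, alt)` tuple; the function names and the toy variants are hypothetical, and the authors report performing the set operations on VCF files with GATK SelectVariants rather than with code like this.

```python
from collections import Counter


def pair_dvmts(affected, unaffected, known_sites=frozenset()):
    """Variants seen only in the affected twin, excluding known (e.g., dbSNP) sites."""
    return set(affected) - set(unaffected) - set(known_sites)


def recurrent_dvmts(per_pair_dvmts, min_pairs=2):
    """Variants that are DVMTs in at least `min_pairs` twin pairs."""
    counts = Counter(v for dvmts in per_pair_dvmts for v in set(dvmts))
    return {v for v, n in counts.items() if n >= min_pairs}


# Toy data (hypothetical variants, for illustration only).
v1 = ("chr1", 1000, "C", "T")
v2 = ("chr2", 2000, "G", "A")
v3 = ("chr3", 3000, "A", "AT")
known = {v2}  # stands in for a dbSNP lookup

pairs = [
    ({v1, v2, v3}, {v3}),   # pair 1: (affected twin, unaffected co-twin)
    ({v1, v2}, set()),      # pair 2
    ({v3}, set()),          # pair 3
]
per_pair = [pair_dvmts(aff, unaff, known) for aff, unaff in pairs]
print(recurrent_dvmts(per_pair))
```

In the toy data, `v1` is kept because it is specific to the affected twin in two pairs; `v2` is dropped as a known site, and `v3` is a DVMT in only one pair.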
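The mutation-type distribution and the transition/transversion comparison can be illustrated with a short sketch. It assumes the common convention of collapsing complementary substitutions into six classes (so that G > A is counted with C > T, as in the notation above); the function names and the toy SNV list are hypothetical and do not reproduce the authors' data.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITIONS = {"C>T", "T>C"}  # purine<->purine or pyrimidine<->pyrimidine, after collapsing


def collapse(ref, alt):
    """Map a substitution to one of six classes with a pyrimidine (C or T) reference."""
    ref, alt = ref.upper(), alt.upper()
    if ref in ("A", "G"):  # purine reference: describe the change on the other strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(snvs):
    """Count SNVs per collapsed substitution class."""
    return Counter(collapse(ref, alt) for ref, alt in snvs)


def ti_tv_ratio(counts):
    """Ratio of transitions to transversions for a collapsed spectrum."""
    ti = sum(n for cls, n in counts.items() if cls in TRANSITIONS)
    tv = sum(n for cls, n in counts.items() if cls not in TRANSITIONS)
    return ti / tv if tv else float("inf")


# Toy SNVs as (ref, alt) pairs, for illustration only.
toy = [("C", "T"), ("G", "A"), ("A", "G"), ("A", "C"), ("G", "T")]
counts = spectrum(toy)
print(dict(counts), ti_tv_ratio(counts))
```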
Prioritization of ASD Risk Genes Based on DVMT
While SNVs and indels usually affect a single gene, CNVs are typically large and may contain multiple genes, which complicates the identification of the actual pathogenic gene. To compile all possible candidate ASD risk genes and evaluate the influence of all variations on autism, we focused on the union of 2,174 DVMT-related genes (Table S8), including 823 genes for SNVs, 557 genes for indels, and 1,174 genes for CNVRs (Figure 2). According to the ANNOVAR annotation,24 239 of these genes carried variations located in exons. Functional enrichment analysis of the 2,174 candidate genes showed that the majority were involved in the regulation of cell morphogenesis, the development of dendrites and synapses, the regulation of synapses, and the modulation of chemical synaptic transmission (Figure 6A).
To further explore the candidate genes and prioritize important ones, we focused on the 37 genes that were covered by all three types of variation (Table S9). We analyzed the functional network of these genes with GeneMANIA.26 The results indicated that 26 of the 37 genes were contained in the network (Figure 6B), which comprised co-expression, co-localization, shared protein domain, and genetic interaction networks, accounting for 97.41%, 1.93%, 0.45%, and 0.21% of the network, respectively, indicating a functional association among these genes. Within the network, several genes are related to important functions and also act as core connectors. Some of these genes are directly associated with ASD. One example is DMD, the dystrophin gene mutated in Duchenne muscular dystrophy, a widely studied molecular factor that may be useful in the early diagnosis of autism; it contributes to the organization of autism-associated trans-synaptic neurexin-neuroligin complexes and to the clustering of synaptic GABAA receptors.27 Likewise, NRXN3 encodes a neurexin, a presynaptic cell-adhesion molecule that affects synaptic function and mediates the conduction of nerve signals, and it plays an important role in normal brain development.28 ASMT might be a susceptibility gene for autism, as indicated by a large-scale targeted sequencing study of Chinese Han individuals.29 ZNF713 and PTPRN2 have been identified as new candidate susceptibility genes.30, 31 GPC5 may play a role in growth regulation and cognitive development.32, 33 In addition, these genes are related to a series of biological processes, such as growth regulation, cognitive development, normal brain development, and conduction of nerve signals.28, 29, 30, 31, 32, 33, 34 These functions suggest potential links to the development of ASD.
Discussion
Growing evidence shows the importance of complex genetic factors in ASD (SNVs, indels, and CNVs),6, 7 suggesting the possible existence of hundreds of autism risk genes.8, 9, 10, 11 Several databases curate general autism risk genes, such as AutDB,35 autismKB,36 and NPdenovo.37 However, more studies are still needed to refine and extend the known genetic architecture of ASD,38, 39, 40 because known genetic factors explain less than 20% of cases. In this study, we reported a full scan of genomic differences between MZ twins discordant for autism and aimed to test whether somatic mutations play a role in autism etiology. Three types of genomic variation (SNVs, indels, and CNVs) specific to the affected individuals were identified by deep sequencing in MZ discordant twin pairs. Our goal was to identify mutations present only in the affected twin, which point to novel candidate autism susceptibility loci.
Careful clinical characterization has revealed that ASD is extremely heterogeneous; variants may be linked to ASD and to other related disabilities, such as intellectual disability, or to medical comorbidities, such as seizures. Further investigation of the clinical relevance of these variants is ongoing. To place our findings in the broader context of next-generation sequencing (NGS) studies of autism and other neuropsychiatric disorders, we integrated protein-protein interaction (PPI) data curated from the STRING database with known neurodevelopmental disease risk genes collected from public databases (NPdenovo, AutDB, and autismKB) to find highly interactive genes. This set included a total of 4,434 genes, 574 of which overlapped with our union gene set (2,174 genes), a significant enrichment (p = 0.014, hypergeometric test). Based on the PPI data, 20 of the 37 genes had a PPI with public risk genes. Evidence has shown that genes implicated in ASD also affect several other neurodevelopmental disorders by disrupting a wide spectrum of pathways. For example, RBFOX1 plays a crucial role in the regulation of splicing and transcriptional networks in human neuronal development.41
Genes that act as core linkers in the network are also closely related to neurodevelopmental diseases (Figure 6B). A systematic search of PubMed showed that several of these genes have been reported to be related to autism. Among them, DMD is widely studied in autism and may be informative for its diagnosis; it contributes to the organization of autism-associated trans-synaptic neurexin-neuroligin complexes and to the clustering of synaptic GABAA receptors.27 In particular, males with Duchenne muscular dystrophy are at very high risk of neuropsychiatric disturbance.34 Likewise, NRXN3 is a neurexin gene; neurexins are presynaptic cell-adhesion molecules that affect synaptic function, mediate the conduction of nerve signals, and play an important role in normal brain development, which makes them candidate genes for autism.28 Wang et al. found, in a large-scale targeted sequencing study of Chinese Han individuals, that ASMT might be a susceptibility gene for autism.29 ZNF713 and PTPRN2 have been identified as new candidate susceptibility genes.30, 31 GPC5 and DPP6 have been implicated in neurodevelopmental phenotypes, with a possible role for GPC5 in growth regulation and cognitive development.32, 33
However, we have not yet shown whether these mutations, singly or in combination, contribute to the development of ASD. Given that this is only a pilot WGS study in ASD based on a relatively small number of twins, replication of our findings is needed to better describe the genetic architecture of ASD. The role of the other mutations in the condition needs to be followed closely in the medical literature. The reported genetic variants also highlight potential molecular targets for pharmacologic manipulation and open the way for personalized therapeutics in ASD. Promising results thus far indicate that WGS could eventually help direct personalized approaches to clinical management in individuals and families affected by ASD. In the future, WGS results might allow earlier diagnosis of ASD, especially in siblings, for whom recurrence rates are approximately 18%.42
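The enrichment test mentioned above can be illustrated with the upper tail of a hypergeometric distribution, that is, the probability of seeing at least the observed overlap when the candidate genes are drawn at random from a background. The sketch below uses only the Python standard library. The background (population) size is not stated in the text, so it is left as a parameter that must be supplied as an assumption, and the function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` genes are sampled without replacement from
    `population` genes, of which `successes` are known risk genes."""
    upper = min(successes, draws)
    if k > upper:
        return 0.0
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / comb(population, draws)


# Toy check: 10 genes, 4 of them risk genes, 3 drawn, at least 2 risk genes drawn.
toy_p = hypergeom_upper_tail(2, population=10, successes=4, draws=3)
print(f"toy p = {toy_p:.4f}")
```

For the counts reported here the call would take the form `hypergeom_upper_tail(574, population=N, successes=4434, draws=2174)`, where `N` is the assumed number of background genes; the resulting p value depends strongly on that choice, so it should be reported alongside the test.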
Materials and Methods
Sample Selection
Three pairs of ASD-discordant MZ twins were recruited from the Child Development and Behavior Research Center (CDBRC), Harbin Medical University, Heilongjiang Province, China. More than two experienced psychiatrists independently diagnosed ASD according to the criteria of the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5). All probands were also assessed with the Autism Diagnostic Observation Schedule-Generic (ADOS)25 and were all found to meet the criteria for autism. For each patient, we applied the ADOS module appropriate to the level of expressive language. Cases with Rett syndrome, tuberous sclerosis, fragile-X syndrome, or any other neurological condition suspected to be associated with autism were excluded by clinical examination and molecular genetic tests of the FMR1 gene.25 Approval from the Ethics Committee of Harbin Medical University (HMUIRB2012008) was obtained before initiating the study.
WGS
WGS was performed with HiSeq 2000 technology (Illumina) at the Beijing Genomics Institute (BGI) in Shenzhen.
Genomic DNA from the three MZ twin pairs discordant for ASD was isolated from peripheral blood and used to construct genomic DNA libraries for WGS. DNA was extracted and sheared into fragments, which were then purified by gel electrophoresis. DNA fragments were ligated with adaptor oligonucleotides to form paired-end DNA libraries with an insert size of 500 bp. To enrich the libraries for sequences with 5′ and 3′ adaptors, we used ligation-mediated PCR amplification with primers complementary to the adaptor sequences. The DNA libraries were sequenced to generate 90-bp paired-end reads with at least 30× average genome coverage per sample.
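The relationship between read count, read length, and average depth can be sketched with the usual back-of-the-envelope formula, depth ≈ (number of reads × read length) / genome size. The genome size used below (3.1 Gb) is an assumption chosen for illustration, not a value taken from the text, and the estimate ignores unmapped reads, duplicates, and uneven coverage.

```python
import math

# Assumption: approximate length of the human reference genome in base pairs.
ASSUMED_GENOME_SIZE = 3_100_000_000


def expected_depth(n_reads, read_length, genome_size=ASSUMED_GENOME_SIZE):
    """Rough average depth if all reads map uniformly across the genome."""
    return n_reads * read_length / genome_size


def reads_needed(target_depth, read_length, genome_size=ASSUMED_GENOME_SIZE):
    """Number of reads required to reach `target_depth` on average."""
    return math.ceil(target_depth * genome_size / read_length)


# Reads of 90 bp needed for 30x average depth under the assumed genome size.
print(reads_needed(30, 90))
```

Under this assumption, roughly one billion 90-bp reads are needed per sample for 30× depth, which is the same order of magnitude as the per-sample read counts reported in Table 1.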
Discovery and Characterization of Various Variants
Raw data were preprocessed with SOAPnuke43 to remove adaptor sequences, low-quality reads, and reads with unknown bases, yielding clean data. Both raw and clean data were quality checked with FastQC.44 Filtered reads were then mapped to the human reference genome (hg38) with BWA (v0.7.17).21 The resulting BAM files were sorted by genomic position and indexed using Sequence Alignment/Map tools (SAMtools).45 BAM files from the same sample were then merged into a single BAM file using MergeSamFiles in Picard tools. To prevent PCR artifacts from influencing the downstream analysis, we used MarkDuplicates in Picard tools to mark duplicate reads, which were then ignored. Qualimap46 was used to calculate the basic statistics of the BAM files.
Local realignment and quality recalibration were performed with the Genome Analysis Toolkit (GATK, v3.8).22 HaplotypeCaller (HC) was used to detect SNVs and small indels (< 65 nt). We then used GATK Variant Quality Score Recalibration (VQSR) to filter spurious SNVs and indels caused by sequencing errors and mapping artifacts. We performed CNV analysis with DELLY.23 CNV calls with a p value < 0.05 and calls with >50% of q0 (zero mapping quality) reads within the CNV regions were removed (q0 filter). All variations are displayed in a Circos plot.25
Variant Filtration and Risk Gene Prioritization
When assessing SNVs and indels, we removed variants common in the population (present in dbSNP142), which were likely to be false positives. For each twin pair, we removed from the proband's variants those also found in the unaffected co-twin, and we then retained variants found in at least two twin pairs. GATK SelectVariants was used to obtain the intersection or union of Variant Call Format (VCF) files. For CNVs, those with sizes between 100 bp and 1,000,000 bp were retained. Because a CNV is an interval on the genome, we considered CNVs with at least 50% reciprocal size overlap to be the same and retained only the overlapping region, defined as the CNVR, for the following analysis. We used BEDtools47 for this step.
The effects (e.g., missense, nonsense, or frameshift mutations) and classifications (e.g., in exonic, intronic, or intergenic regions) of variants across the genome were annotated with ANNOVAR.24 Finally, we obtained the set of genes annotated for each type of variation and computed the union and intersection of these sets.
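The 50% reciprocal-overlap rule can be written down explicitly: two CNVs on the same chromosome are treated as the same event when their shared segment covers at least half of each, and the shared segment becomes the CNVR. The sketch below is a pure-Python illustration of that rule, assuming CNVs are given as `(chrom, start, end)` tuples with half-open coordinates; the function names are hypothetical, and the authors performed this step with BEDtools, whose exact invocation is not given in the text.

```python
def size_ok(cnv, min_len=100, max_len=1_000_000):
    """Keep CNVs whose length lies within the stated 100 bp to 1 Mb range."""
    _, start, end = cnv
    return min_len <= end - start <= max_len


def reciprocal_overlap_region(a, b, min_frac=0.5):
    """Return the shared interval if it covers >= min_frac of both CNVs, else None."""
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return None
    start, end = max(start_a, start_b), min(end_a, end_b)
    shared = end - start
    if shared <= 0:
        return None
    if shared >= min_frac * (end_a - start_a) and shared >= min_frac * (end_b - start_b):
        return (chrom_a, start, end)
    return None


def cnvrs_between(cnvs_x, cnvs_y):
    """All CNVRs shared between the CNV calls of two twin pairs."""
    regions = []
    for a in filter(size_ok, cnvs_x):
        for b in filter(size_ok, cnvs_y):
            region = reciprocal_overlap_region(a, b)
            if region is not None:
                regions.append(region)
    return regions


# Toy intervals, for illustration only.
pair_1 = [("chr1", 1000, 3000), ("chr2", 500, 550)]
pair_2 = [("chr1", 2000, 3500), ("chr1", 2900, 6000)]
print(cnvrs_between(pair_1, pair_2))
```

The nested loop is quadratic in the number of calls, which is acceptable for a toy example; for call sets of the size reported here, a sorted sweep or an interval-tree approach (as implemented in dedicated interval tools) would be the more practical choice.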
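The union and intersection of the per-type gene sets are plain set operations. Note that the per-type counts reported in the Results (823 + 557 + 1,174 = 2,554) exceed the union of 2,174 genes, which is what one expects when some genes are hit by more than one variant type. The sketch below illustrates the computation with placeholder gene names; the function name and the toy sets are hypothetical.

```python
def union_and_intersection(genes_by_type):
    """Return (union, intersection) of the gene sets for each variant type."""
    sets = [set(genes) for genes in genes_by_type.values()]
    union = set().union(*sets)
    shared = set.intersection(*sets) if sets else set()
    return union, shared


# Placeholder gene names, for illustration only.
toy_genes = {
    "SNV": {"GENE_A", "GENE_B", "GENE_C"},
    "indel": {"GENE_A", "GENE_C", "GENE_D"},
    "CNVR": {"GENE_A", "GENE_E"},
}
union, shared = union_and_intersection(toy_genes)
print(sorted(union), sorted(shared))
```

In the study, the union corresponds to the 2,174 candidate genes used for enrichment analysis and the intersection to the 37 genes covered by all three variant types.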
Functional Analysis
For the union gene set from the three types of variation, we performed a functional enrichment analysis of biological processes with clusterProfiler.48 For the intersection gene set from the three types of variation, we explored the functional network of these genes with GeneMANIA.26
Data and Software Availability
The sequencing data reported in this paper have been deposited at the Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra/; SRA: PRJNA471799).
Author Contributions
Y.H., Y.Z., Y.R., Y.Y., and J.Y. performed the analysis of sequencing data. Y.H., X.L., Z.G., and X.Z. collected the samples and performed the clinical examination and molecular genetic tests. Y.H., Y.Z., and Y.Y. wrote the initial manuscript. S.L. and D.W. improved the thesis of the section of the paper. L.W. designed the experiment. All authors reviewed the manuscript and provided comments that were incorporated in the final version of the manuscript.
Conflicts of Interest
The authors declare no competing interests.
Acknowledgments
We thank all the patients and their parents for their support and participation. This study was funded by grants from the National Natural Science Foundation of China (81202221 and 81273094), the China Postdoctoral Science Foundation (2014M561375), the Scientific Research Fund of Heilongjiang Provincial Education Department (12521232), and the Scientific Research Fund of Heilongjiang Postdoctoral Program (LBH-Z14153).
Footnotes
Supplemental Information includes nine tables and can be found with this article online at https://doi.org/10.1016/j.omtn.2018.11.015.
Contributor Information
Shuang Liang, Email: liangyouyou2004@163.com.
Lijie Wu, Email: wulijiehyd@126.com.
References
- 1. Autism and Developmental Disabilities Monitoring Network Surveillance Year 2008 Principal Investigators. Centers for Disease Control and Prevention. Prevalence of autism spectrum disorders—Autism and Developmental Disabilities Monitoring Network, 14 sites, United States, 2008. MMWR Surveill. Summ. 2012;61:1–19.
- 2. Bailey A., Le Couteur A., Gottesman I., Bolton P., Simonoff E., Yuzda E., Rutter M. Autism as a strongly genetic disorder: evidence from a British twin study. Psychol. Med. 1995;25:63–77. doi: 10.1017/s0033291700028099.
- 3. de la Torre-Ubieta L., Won H., Stein J.L., Geschwind D.H. Advancing the understanding of autism disease mechanisms through genetics. Nat. Med. 2016;22:345–361. doi: 10.1038/nm.4071.
- 4. Xu G., Strathearn L., Liu B., Bao W. Prevalence of autism spectrum disorder among US children and adolescents, 2014-2016. JAMA. 2018;319:81–82. doi: 10.1001/jama.2017.17812.
- 5. Hallmayer J., Cleveland S., Torres A., Phillips J., Cohen B., Torigoe T., Miller J., Fedele A., Collins J., Smith K. Genetic heritability and shared environmental factors among twin pairs with autism. Arch. Gen. Psychiatry. 2011;68:1095–1102. doi: 10.1001/archgenpsychiatry.2011.76.
- 6. Devlin B., Scherer S.W. Genetic architecture in autism spectrum disorder. Curr. Opin. Genet. Dev. 2012;22:229–237. doi: 10.1016/j.gde.2012.03.002.
- 7. Miller D.T., Adam M.P., Aradhya S., Biesecker L.G., Brothman A.R., Carter N.P., Church D.M., Crolla J.A., Eichler E.E., Epstein C.J. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am. J. Hum. Genet. 2010;86:749–764. doi: 10.1016/j.ajhg.2010.04.006.
- 8. Betancur C. Etiological heterogeneity in autism spectrum disorders: more than 100 genetic and genomic disorders and still counting. Brain Res. 2011;1380:42–77. doi: 10.1016/j.brainres.2010.11.078.
- 9. Neale B.M., Kou Y., Liu L., Ma’ayan A., Samocha K.E., Sabo A., Lin C.F., Stevens C., Wang L.S., Makarov V. Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature. 2012;485:242–245. doi: 10.1038/nature11011.
- 10. Pinto D., Pagnamenta A.T., Klei L., Anney R., Merico D., Regan R., Conroy J., Magalhaes T.R., Correia C., Abrahams B.S. Functional impact of global rare copy number variation in autism spectrum disorders. Nature. 2010;466:368–372. doi: 10.1038/nature09146.
- 11. Sanders S.J., Ercan-Sencicek A.G., Hus V., Luo R., Murtha M.T., Moreno-De-Luca D., Chu S.H., Moreau M.P., Gupta A.R., Thomson S.A. Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron. 2011;70:863–885. doi: 10.1016/j.neuron.2011.05.002.
- 12. Sanders S.J., Murtha M.T., Gupta A.R., Murdoch J.D., Raubeson M.J., Willsey A.J., Ercan-Sencicek A.G., DiLullo N.M., Parikshak N.N., Stein J.L. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature. 2012;485:237–241. doi: 10.1038/nature10945.
- 13. O’Roak B.J., Vives L., Girirajan S., Karakoc E., Krumm N., Coe B.P., Levy R., Ko A., Lee C., Smith J.D. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. 2012;485:246–250. doi: 10.1038/nature10989.
- 14. Iossifov I., Ronemus M., Levy D., Wang Z., Hakker I., Rosenbaum J., Yamrom B., Lee Y.H., Narzisi G., Leotta A. De novo gene disruptions in children on the autistic spectrum. Neuron. 2012;74:285–299. doi: 10.1016/j.neuron.2012.04.009.
- 15. Yuen R.K.C., Merico D., Bookman M., Howe J.L., Thiruvahindrapuram B., Patel R.V., Whitney J., Deflaux N., Bingham J., Wang Z. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder. Nat. Neurosci. 2017;20:602–611. doi: 10.1038/nn.4524.
- 16. Jiang Y.H., Yuen R.K.C., Jin X., Wang M., Chen N., Wu X., Ju J., Mei J., Shi Y., He M. Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. Am. J. Hum. Genet. 2013;93:249–263. doi: 10.1016/j.ajhg.2013.06.012.
- 17. Yuen R.K.C., Thiruvahindrapuram B., Merico D., Walker S., Tammimies K., Hoang N., Chrysler C., Nalpathamkalam T., Pellecchia G., Liu Y. Whole-genome sequencing of quartet families with autism spectrum disorder. Nat. Med. 2015;21:185–191. doi: 10.1038/nm.3792.
- 18. Basu S.N., Kollu R., Banerjee-Basu S. AutDB: a gene reference resource for autism research. Nucleic Acids Res. 2009;37:D832–D836. doi: 10.1093/nar/gkn835.
- 19. Christian S.L., Brune C.W., Sudi J., Kumar R.A., Liu S., Karamohamed S., Badner J.A., Matsui S., Conroy J., McQuaid D. Novel submicroscopic chromosomal abnormalities detected in autism spectrum disorder. Biol. Psychiatry. 2008;63:1111–1117. doi: 10.1016/j.biopsych.2008.01.009.
- 20. Ma D., Salyakina D., Jaworski J.M., Konidari I., Whitehead P.L., Andersen A.N., Hoffman J.D., Slifer S.H., Hedges D.J., Cukier H.N. A genome-wide association study of autism reveals a common novel risk locus at 5p14.1. Ann. Hum. Genet. 2009;73:263–273. doi: 10.1111/j.1469-1809.2009.00523.x.
- 21. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 22. McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., DePristo M.A. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
- 23. Rausch T., Zichner T., Schlattl A., Stütz A.M., Benes V., Korbel J.O. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28:i333–i339. doi: 10.1093/bioinformatics/bts378.
- 24. Wang K., Li M., Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164. doi: 10.1093/nar/gkq603.
- 25. Lord C., Risi S., Lambrecht L., Cook E.H., Jr., Leventhal B.L., DiLavore P.C., Pickles A., Rutter M. The autism diagnostic observation schedule-generic: a standard measure of social and communication deficits associated with the spectrum of autism. J. Autism Dev. Disord. 2000;30:205–223.
- 26. Zuberi K., Franz M., Rodriguez H., Montojo J., Lopes C.T., Bader G.D., Morris Q. GeneMANIA prediction server 2013 update. Nucleic Acids Res. 2013;41:W115–W122. doi: 10.1093/nar/gkt533.
- 27. Miranda R., Nagapin F., Bozon B., Laroche S., Aubin T., Vaillend C. Altered social behavior and ultrasonic communication in the dystrophin-deficient mdx mouse model of Duchenne muscular dystrophy. Mol. Autism. 2015;6:60. doi: 10.1186/s13229-015-0053-9.
- 28. Wang J., Gong J., Li L., Chen Y., Liu L., Gu H., Luo X., Hou F., Zhang J., Song R. Neurexin gene family variants as risk factors for autism spectrum disorder. Autism Res. 2018;11:37–43. doi: 10.1002/aur.1881.
- 29. Wang L., Li J., Ruan Y., Lu T., Liu C., Jia M., Yue W., Liu J., Bourgeron T., Zhang D. Sequencing ASMT identifies rare mutations in Chinese Han patients with autism. PLoS One. 2013;8:e53727. doi: 10.1371/journal.pone.0053727.
- 30. Lionel A.C., Crosbie J., Barbosa N., Goodale T., Thiruvahindrapuram B., Rickaby J., Gazzellone M., Carson A.R., Howe J.L., Wang Z. Rare copy number variation discovery and cross-disorder comparisons identify risk genes for ADHD. Sci. Transl. Med. 2011;3:95ra75. doi: 10.1126/scitranslmed.3002464.
- 31. Metsu S., Rainger J.K., Debacker K., Bernhard B., Rooms L., Grafodatskaya D., Weksberg R., Fombonne E., Taylor M.S., Scherer S.W. A CGG-repeat expansion mutation in ZNF713 causes FRA7A: association with autistic spectrum disorder in two families. Hum. Mutat. 2014;35:1295–1300. doi: 10.1002/humu.22683.
- 32. Kannu P., Campos-Xavier A.B., Hull D., Martinet D., Ballhausen D., Bonafé L. Post-axial polydactyly type A2, overgrowth and autistic traits associated with a chromosome 13q31.3 microduplication encompassing miR-17-92 and GPC5. Eur. J. Med. Genet. 2013;56:452–457. doi: 10.1016/j.ejmg.2013.06.001.
- 33. Maussion G., Cruceanu C., Rosenfeld J.A., Bell S.C., Jollant F., Szatkiewicz J., Collins R.L., Hanscom C., Kolobova I., de Champfleur N.M. Implication of LRRC4C and DPP6 in neurodevelopmental disorders. Am. J. Med. Genet. A. 2017;173:395–406. doi: 10.1002/ajmg.a.38021.
- 34. Ricotti V., Mandy W.P.L., Scoto M., Pane M., Deconinck N., Messina S., Mercuri E., Skuse D.H., Muntoni F. Neurodevelopmental, emotional, and behavioural problems in Duchenne muscular dystrophy in relation to underlying dystrophin gene mutations. Dev. Med. Child Neurol. 2016;58:77–84. doi: 10.1111/dmcn.12922.
- 35. Pereanu W., Larsen E.C., Das I., Estévez M.A., Sarkar A.A., Spring-Pearson S., Kollu R., Basu S.N., Banerjee-Basu S. AutDB: a platform to decode the genetic architecture of autism. Nucleic Acids Res. 2018;46(D1):D1049–D1054. doi: 10.1093/nar/gkx1093.
- 36. Xu L.-M., Li J.-R., Huang Y., Zhao M., Tang X., Wei L. AutismKB: an evidence-based knowledgebase of autism genetics. Nucleic Acids Res. 2012;40:D1016–D1022. doi: 10.1093/nar/gkr1145.
- 37. Li J., Cai T., Jiang Y., Chen H., He X., Chen C., Li X., Shao Q., Ran X., Li Z. Genes with de novo mutations are shared by four neuropsychiatric disorders discovered from NPdenovo database. Mol. Psychiatry. 2016;21:290–297. doi: 10.1038/mp.2015.40.
- 38. Cui T., Zhang L., Huang Y., Yi Y., Tan P., Zhao Y., Hu Y., Xu L., Li E., Wang D. MNDR v2.0: an updated resource of ncRNA-disease associations in mammals. Nucleic Acids Res. 2018;46(D1):D371–D374. doi: 10.1093/nar/gkx1025.
- 39. Yi Y., Zhao Y., Li C., Zhang L., Huang H., Li Y., Liu L., Hou P., Cui T., Tan P. RAID v2.0: an updated resource of RNA-associated interactions across organisms. Nucleic Acids Res. 2017;45(D1):D115–D118. doi: 10.1093/nar/gkw1052.
- 40. Zhang T., Tan P., Wang L., Jin N., Li Y., Zhang L., Yang H., Hu Z., Zhang L., Hu C. RNALocate: a resource for RNA subcellular localizations. Nucleic Acids Res. 2017;45(D1):D135–D138. doi: 10.1093/nar/gkw728.
- 41. Lee H., Deignan J.L., Dorrani N., Strom S.P., Kantarci S., Quintero-Rivera F., Das K., Toy T., Harry B., Yourshaw M. Clinical exome sequencing for genetic identification of rare Mendelian disorders. JAMA. 2014;312:1880–1887. doi: 10.1001/jama.2014.14604.
- 42. Ozonoff S., Young G.S., Carter A., Messinger D., Yirmiya N., Zwaigenbaum L., Bryson S., Carver L.J., Constantino J.N., Dobkins K. Recurrence risk for autism spectrum disorders: a Baby Siblings Research Consortium study. Pediatrics. 2011;128:e488–e495. doi: 10.1542/peds.2010-2825.
- 43. Chen Y., Chen Y., Shi C., Huang Z., Zhang Y., Li S., Li Y., Ye J., Yu C., Li Z. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience. 2018;7:1–6. doi: 10.1093/gigascience/gix120.
- 44. Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
- 45. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 46. Okonechnikov K., Conesa A., García-Alcalde F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics. 2016;32:292–294. doi: 10.1093/bioinformatics/btv566.
- 47. Quinlan A.R. BEDTools: the Swiss-Army tool for genome feature analysis. Curr. Protoc. Bioinformatics. 2014;47:11.12.1–11.12.34. doi: 10.1002/0471250953.bi1112s47.
- 48. Yu G., Wang L.-G., Han Y., He Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16:284–287. doi: 10.1089/omi.2011.0118.