Genome Medicine. 2025 Oct 24;17:129. doi: 10.1186/s13073-025-01563-0

Whole-genome sequencing reveals individual and cohort level insights into chromosome 9p syndromes

Yingxi Wang 1,#, Eleanor I Sams 1,#, Rachel Slaugh 2,#, Sandra Crocker 3,#, Emily Cordova Hurtado 1, Sophia Tracy 2, Ying-Chen Claire Hou 3, Christopher Markovic 4, Kostandin Valle 1, Victoria Tate 2, Khadija Belhassan 3, Elizabeth Appelbaum 4, Titilope Akinwe 1,5,6, Rodrigo T Starosta 2, Yang Cao 3, Amber Neilson 4, Yu Liu 1, Nathaniel Jensen 2, Reza Ghasemi 3, Tina Lindsay 4, Juana Manuel 1, Sophia Couteranis 2, Milinn Kremitzki 4, Jack Ustanik 1, Thomas Antonacci 4, Jeffrey K Ng 1, Andrew Emory 4, Laura Metz 1, Tracie DeLuca 4, Katherine N Lyons 1, Toni Sinnwell 4, Brianne Thomeczek 4, Kymme Wang 7, Nick Sisneros 8, Megha Muraleedharan 8, Anantha Kethireddy 8, Marco Corbo 8, Harsha Gowda 8, Katherine A King 2, Christina A Gurnett 9, Susan K Dutcher 1, Catherine Gooch 2, Yang E Li 1,10, Matthew W Mitchell 11, Kevin A Peterson 12, Amjad Horani 2, Jill A Rosenfeld 13, Weimin Bi 13,14, Pawel Stankiewicz 13, Hsiao-Tuan Chao 13,15,16,17, Jennifer E Posey 13, Christopher M Grochowski 13,18, Zain Dardas 13, Erik G Puffenberger 19, Christopher E Pearson 20, Frank Kooy 21, Dale Annear 21, A Micheil Innes 22, Michael Heinz 4, Richard Head 4, Robert Fulton 4, Stephan Toutain 23; 9P-ARCH, Lucinda Antonacci-Fulton 4, Xiaoxia Cui 4, Robi D Mitra 1,4, F Sessions Cole 2, Julie Neidich 2,3,#, Patricia I Dickson 2,#, Jeffrey Milbrandt 1,4,24,#, Tychele N Turner 1,✉,#
PMCID: PMC12551315  PMID: 41137173

Abstract

Background

Previous genomic efforts on chromosome 9p deletion and duplication syndromes have utilized low-resolution strategies (i.e., karyotypes, chromosome microarrays). These studies have provided important initial insights into these syndromes. This current study is the first large-scale whole-genome sequencing (WGS) study of 100 individuals from families with chromosome 9p syndromes.

Methods

Through the newly formed 9P-ARCH (Advanced Research in Chromosomal Health: Genomic, Phenotypic, and Functional Aspects of 9p-Related syndromes) research network, we assembled a cohort of individuals from families with chromosome 9p syndromes. WGS was applied to 100 individuals, and other genomic technologies were applied to a subset of individuals. To prioritize genes on 9p, we utilized two independent approaches: statistical analyses of genomic data and spatial transcriptomic profiling of embryonic mouse tissue. To assess the enrichment of DNVs within genomic regions, we developed a computational tool, DiamondsDenovo (https://github.com/TNTurnerLab/DiamondsDenovo).

Results

Unlike previous low-resolution studies, we analyzed the genomic architecture of chromosome 9p syndromes, highlighting fundamental features and their commonalities and differences across individuals. A machine-learning model was developed to predict 9p deletion syndrome based on gene copy number estimates using WGS data. We identified two late-replicating regions containing most structural variant breakpoints in 9p deletion syndrome, pointing to replication-based issues as a potential cause of structural variant formation in most individuals and structural rearrangements in some individuals. Genes on 9p were prioritized based on statistical assessment of human genomic variation and through spatial transcriptomics, with 24 genes (AK3, BRD10, CD274, CDC37L1, DMRT1, DMRT2, DMRT3, DOCK8, GLIS3, JAK2, KANK1, KDM4C, PLPP6, PTPRD, PUM3, RANBP6, RCL1, RFX3, RIC1, SLC1A1, SMARCA2, UHRF2, VLDLR, and ZNG1A) identified as important for the majority (83%) of individuals with 9p deletion syndrome. Testing of the mitochondrial genome revealed excess copy number in individuals with 9p deletion syndrome.

Conclusions

This study introduces the 9P-ARCH research network that is actively pursuing genomic, phenotypic, and functional aspects of 9p-related syndromes. We advanced the study of 9p-related syndromes both at the individual level and across the cohort through the largest, most comprehensive genomic analysis of 9p-related syndromes to date.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13073-025-01563-0.

Keywords: Deletion, Duplication, 9p, Chromosome, Syndrome

Background

Chromosome 9p syndromes involve the deletion (loss) or duplication (gain) of DNA sequence on the p-arm of chromosome 9. Neurodevelopmental disorders (NDDs) are seen in nearly all individuals. However, other phenotypes in these syndromes are variable, for example, abnormal auricles, abnormal genitals, anteverted nostrils, cardiac murmurs or defects, flat feet, frequent colds and infections, high or narrow palate, hypotonia, increased internipple distance, long philtrum, low-set ears, midface hypoplasia, single palmar crease, short or broad neck, short palpebral fissures, thin upper lip, and trigonocephaly [1–4]. Recently, we examined, in detail, phenotype data from 48 individuals with 9p deletion syndrome, highlighting several shared physical characteristics (e.g., NDDs, dysmorphic features) [5] and advancing previous phenotype-related analyses in this area [3, 6–13]. Most genetic and genomic studies of chromosome 9p syndromes have relied primarily on karyotypes or chromosome microarray data, which offer limited resolution for detecting structural variation [2, 9, 10, 12–20]. These studies have revealed which chromosome bands are most often deleted in these syndromes. In our previous work, we analyzed karyotype data from more than 700 individuals from the international Chromosome 9P Minus Network and found that the most common deletions in 9p deletion syndrome extend either from the telomere through 9p24 or from the telomere to 9p22 [1]. This result was consistent with other earlier reports [2, 9, 10, 13, 14, 17, 19, 21]. Conversely, few individuals have been identified with deletions near the centromere of 9p. While chromosome band-level information is informative, modern genomic technologies can detect variation at single-nucleotide resolution (“precision genomics” [1]).
For chromosome 9p syndromes, a few studies have taken that approach, revealing the exact genes deleted and identifying more complex structural variation than can be detected with karyotypes or chromosome microarrays [1, 5, 22]. Genes located on chromosome 9p are associated with a wide range of phenotypes relevant to chromosome 9p syndrome. These include immune (e.g., DOCK8) and blood-related traits (e.g., JAK2), urogenital anomalies (e.g., DMRT1, DMRT2), musculoskeletal features (e.g., KLHL9), craniofacial characteristics (e.g., FREM1), and neurodevelopmental (e.g., RFX3, SMARCA2) and neurodegenerative disorders (C9ORF72) [1]. Only a limited number of direct genotype–phenotype associations have been identified in chromosome 9p syndrome, making comprehensive genomic sequencing of affected individuals an essential next step. Overall, previous studies have established a foundation and set a precedent for the work presented in this current study. Several unique factors make the present study particularly timely for generating critical insights into chromosome 9p syndromes. Through collaboration with the family-based Chromosome 9P Minus Network, we initiated rigorous research efforts focused on these syndromes. In parallel, we engaged the broader rare disease community to identify additional affected individuals, leading to the establishment of the 9p research network that we call 9P-ARCH (Advanced Research in Chromosomal Health: Genomic, Phenotypic, and Functional Aspects of 9p-Related syndromes). This collaborative outreach allowed us to gain access to the largest 9p cohort, along with their family members, analyzed to date.

This study focuses on two main areas of genomic investigation in chromosome 9p syndromes: individual-level and cross-cohort analyses. At the individual level, the primary objective is to resolve structural variation down to the precise nucleotide, enabling the identification of the specific genes disrupted within these regions. Furthermore, since reports have suggested that 50% of individuals harbor a 9p deletion and a related translocation, it is critical to identify the translocations since they may also affect gene expression. Cross-cohort analyses, in turn, are essential for finding the genes that are most important across all individuals with this syndrome to lead to insights for diagnostics and therapeutics. A unique aspect of our work is the implementation of advanced genomic sequencing technologies with capabilities to perform short-read whole-genome sequencing on 100 individuals with chromosome 9p syndromes and their family members. This approach allows for the simultaneous analysis of both nuclear and mitochondrial genomes. To complement these data, additional genomic technologies are applied in select cases to provide deeper insights. Through short-read whole-genome sequencing combined with sophisticated computational analyses, we pursue the first detailed gene-level analyses on the p-arm of chromosome 9 in these syndromes and also benchmark a machine-learning model to identify chromosome 9p syndromes on gene dosage profiles. We also extend our analyses by integrating large-scale genomic datasets of individuals with neurodevelopmental disorders and those without obvious 9p-related phenotypes. Specifically, we leverage denovo-db [23], which now contains over one million de novo variants (DNVs) across ~70,000 parent–child sequenced trios [24], focusing on 62,181 individuals with relevant phenotypes to assess the contribution of 9p genes through statistical enrichment models. 
In addition, we analyze whole-genome sequencing data from 5824 individuals with autism to evaluate the enrichment of DNVs in noncoding regulatory regions using both established and new statistical approaches. Complementary analyses of large control datasets of individuals without neurodevelopmental disorders allow us to assess gene dosage sensitivity and constraint [25, 26]. These datasets provide critical information about genes on 9p, which we then directly apply to the analysis of genomes of individuals with chromosome 9p syndromes. Finally, by applying modern spatial expression techniques, we examine the embryonic expression of the top 9p-related genes in the developing mouse face and brain, offering new insights into their functional relevance. Together, these efforts constitute the most comprehensive genomic analysis of chromosome 9p syndromes to date.

Methods

WashU 9p deletion and duplication cohort

To establish a collection of individuals with 9p deletions and/or duplications, we assembled a team of clinical, genomic, and functional researchers (Additional file 1: Fig. S1). In collaboration with the Chromosome 9P Minus Network, which is a family network, we recruited 87 individuals from families with 9p deletion and/or duplication syndrome. Included were 70 unrelated probands, 6 fathers, 7 mothers, 1 grandfather, 1 grandmother, and 2 affected siblings (Additional file 2: Tables S1, S2). Each individual was consented into the study using an age-appropriate form of consent/assent and in accordance with the Human Research Protection Office at Washington University in St. Louis (IRB no. 201706062). Blood samples were obtained for karyotype, DNA isolation for genomic analyses, and PBMC purification for downstream cell analyses.

All participants in the WashU 9p Deletion and Duplication Cohort underwent at least one of the following genomic assessments: clinical genomic record review, karyotype, Illumina short-read whole-genome sequencing, Bionano optical mapping, PacBio HiFi long-read whole-genome sequencing, SNP microarray, or Illumina CLR whole-genome sequencing. For the clinical genomic record review, at least two individuals reviewed the records to derive variant details. Karyotype generation and assessment were performed at the Washington University in St. Louis Clinical Genomics/Cytogenetics and Molecular Pathology Core Lab. Cells were hypotonic treated, fixed, and then stained with GTG banding and examined with a microscope. Cells were analyzed using the GTW banding method, and 20 metaphases were examined for each individual. The other genomic technologies were carried out at the Washington University in St. Louis McDonnell Genome Institute. Illumina short-read whole-genome sequencing was performed to an average coverage of 30× using an Illumina NovaSeq 6000 sequencer on a total of 84 individuals and is the predominant data source underlying the analyses in this study. Bionano optical mapping was performed using a Bionano Saphyr system. PacBio HiFi long-read whole-genome sequencing was performed to an average coverage of 20× on a PacBio Sequel II sequencer. SNP microarray analysis was performed using a Thermo Fisher CytoScan HD array. Illumina CLR whole-genome sequencing was performed to an average coverage of 30× using an Illumina NovaSeq 6000 sequencer.

Cohort of individuals with 9p deletion and/or duplication syndromes from NIGMS repository

The NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research (https://www.coriell.org/1/NIGMS/Collections/Chromosomal-Abnormalities) was queried for individuals with genomic variation on the p-arm of chromosome 9. Candidate samples were identified and checked for available cells. In total, there were 16 samples (Additional file 2: Table S2) with available cells for performing DNA extraction. Both standard DNA extraction [27] and high molecular weight DNA extraction [28] were performed on these samples in the Coriell Institute’s Molecular Biology Laboratory. All 16 samples underwent Illumina short-read whole-genome sequencing to average coverage of 30× at the Washington University in St. Louis McDonnell Genome Institute. Additionally, eight of the samples were sequenced on a PacBio Revio sequencer to an average coverage of 30× at MedGenome. All sequence data was deposited in dbGaP [29].

Aggregation of additional individuals with 9p deletion and/or duplication syndromes

We contacted several genomics researchers across the USA to identify additional individuals with 9p deletion and/or duplication syndromes and identified 26 individuals (Additional file 2: Table S3). These individuals had previously been assessed using SNP microarray technologies.

Bioinformatic assessment of genomic data

Illumina short-read whole-genome sequencing

Post-sequencing, the fastq files were aligned to GRCh38_full_analysis_set_plus_decoy_hla.fa (build 38 of the human genome) using bwa mem [30] version 0.7.17-r1188, followed by sorting with SAMtools [31] version 1.9 sort and indexing with SAMtools version 1.9 index. Coverage metrics were calculated using Picard [32] 2.25.4 CollectWgsMetrics. Structural variants were called using manta [33] version 1.6.0. Mitochondrial genome assessment was performed using EGP [34] version 1.3. Windowed copy number estimates across the genome were generated using QuicK-mer2 [35]. SNV/indel calls were generated using DeepVariant [36] version 1.0.0. STRs were detected using ExpansionHunter [37] and GangSTR [38]. Copy number genotyping was performed using CNPI [39]. For parent–child sequenced trios, DNVs were detected with HAT [40, 41] and quality checked with acorn [42].

PacBio long-read whole-genome sequencing

Post-sequencing, the bam or fastq files were either submitted to the primrose program or directly aligned to GRCh38_full_analysis_set_plus_decoy_hla.fa (build 38 of the human genome) using pbmm2 version 1.10.0, followed by sorting with SAMtools [31] version 1.9 sort and indexing with SAMtools version 1.9 index. Copy number assessment was performed using hificnv. SNV/indel calls were generated using DeepVariant [36] version 1.0.0.

Bionano optical mapping was analyzed using the Bionano Genomics software platform, SNP microarray was analyzed using Thermo Fisher CytoScan HD Suite, and Illumina CLR whole-genome sequencing was analyzed in Illumina DRAGEN.

Dosage of genes on 9p

Per-gene copy number estimates were generated from whole-genome sequencing data from the 100 individuals with 9p-related syndromes and their family members in this study and 2504 unrelated individuals in the 1000 Genomes Project [43]. The individuals with 9p deletion syndrome and the individuals from the 1000 Genomes Project were compared gene by gene for differences in copy number using a Mann–Whitney U-test. These data were also used in two distinct machine-learning strategies: random forest and Naïve Bayes. R packages used in the machine-learning analyses included caret [44], e1071 (https://CRAN.R-project.org/package=e1071), and randomForest (https://cran.r-project.org/web/packages/randomForest). For machine learning, the 9p deletion syndrome individuals consisted of the 68 individuals that had deletions on 9p in our analyses, and the control individuals consisted of the 2504 unrelated individuals from the 1000 Genomes Project. For each of these datasets, 70% of the data was used for training, and 30% was used for testing. Several metrics were calculated to benchmark the models, including accuracy, sensitivity, specificity, positive predictive value, and negative predictive value.
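The 70%/30% partition described above can be sketched in a few lines. This is only an illustration in Python, not the caret-based R workflow used in the study: the function name `split_70_30`, the sample identifiers, the seed, and the rounding rule are all assumptions, and only the group sizes (68 and 2504) come from the text.

```python
import random

def split_70_30(sample_ids, train_fraction=0.7, seed=0):
    """Shuffle one class of samples and split it into training and testing sets."""
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder identifiers; only the group sizes come from the text.
cases = [f"case_{i}" for i in range(68)]
controls = [f"control_{i}" for i in range(2504)]

# Each dataset is split separately so both classes keep the 70/30 proportion.
case_train, case_test = split_70_30(cases)
control_train, control_test = split_70_30(controls)

# Label 1 = 9p deletion syndrome, label 0 = 1000 Genomes Project control.
train = [(s, 1) for s in case_train] + [(s, 0) for s in control_train]
test = [(s, 1) for s in case_test] + [(s, 0) for s in control_test]
print(len(case_train), len(case_test), len(control_train), len(control_test))
```

Splitting each class on its own, rather than splitting the pooled data, keeps the rare case group represented in both sets. The exact set sizes depend on the rounding rule, which may differ from the one applied in the original R analysis.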

Updated denovo-db

For this study, we updated denovo-db, a database of DNVs aggregated from the published literature, to version 1.8. In this version [24], there are 1,131,762 DNVs from 72,794 parent–child sequenced trios. These DNVs were compiled from 81 studies and include data on the following phenotypes: 9p minus syndrome, acromelic frontonasal dysostosis, amyotrophic lateral sclerosis, anophthalmia, microphthalmia, attention-deficit/hyperactivity disorder, autism, bipolar disorder, Cantu syndrome, cerebral palsy, congenital diaphragmatic hernia, congenital heart disease, controls, developmental disorder, early onset Alzheimer’s disease, early onset Parkinson’s disease, epilepsy, intellectual disability, isolated biliary atresia, mixed, neural tube defects, neurodevelopmental disorder, nonsyndromic cleft lip/palate, obsessive-compulsive disorder, orofacial cleft, preterm birth, periventricular nodular heterotopia, schizophrenia, sporadic congenital hydrocephalus, sporadic infantile spasm syndrome, syndromic craniosynostosis, Tourette syndrome, and vein of Galen malformations. DNVs were extracted from each publication and, if necessary, lifted to build 38 of the human genome. All variants were annotated using Variant Effect Predictor [45]. The VCF files, the annotated files, and a README file are on Zenodo (DOI 10.5281/zenodo.13901295).

To assess regions on the p-arm of chromosome 9 that have enrichment of DNVs, we retained only variants on the p-arm (b38 chr9:1–42668912). Since there are many variable phenotypes in 9p deletion and duplication syndromes [1, 5], we retained DNVs from individuals with the following phenotypes for analyses in this study: acromelic frontonasal dysostosis, amyotrophic lateral sclerosis, anophthalmia, microphthalmia, attention-deficit/hyperactivity disorder, autism, bipolar disorder, cerebral palsy, congenital diaphragmatic hernia, congenital heart disease, developmental disorder, early onset Alzheimer’s disease, early onset Parkinson’s disease, epilepsy, intellectual disability, isolated biliary atresia, neural tube defects, neurodevelopmental disorder, nonsyndromic cleft lip/palate, obsessive-compulsive disorder, orofacial cleft, preterm birth, periventricular nodular heterotopia, schizophrenia, sporadic congenital hydrocephalus, sporadic infantile spasm syndrome, syndromic craniosynostosis, Tourette syndrome, and vein of Galen malformations. There was a total number of 62,181 individuals within these phenotypes.
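The two filters described above (position on the p-arm and phenotype) can be sketched as follows. This is a minimal Python illustration, not the authors' code. The record layout (`chrom`, `pos`, `phenotype` keys) and the example records are hypothetical. The excluded phenotype set is inferred by comparing the full denovo-db phenotype list with the retained list given in the text.

```python
P_ARM_END = 42_668_912  # b38 chr9:1-42668912, as stated in the text

# Phenotypes present in denovo-db v1.8 but absent from the retained list.
EXCLUDED_PHENOTYPES = {"9p minus syndrome", "Cantu syndrome", "controls", "mixed"}

def keep_dnv(record):
    """Return True if a DNV lies on the p-arm of chr9 and has a retained phenotype."""
    on_p_arm = record["chrom"] == "chr9" and 1 <= record["pos"] <= P_ARM_END
    return on_p_arm and record["phenotype"] not in EXCLUDED_PHENOTYPES

# Hypothetical records for illustration only.
dnvs = [
    {"chrom": "chr9", "pos": 2_100_000, "phenotype": "autism"},     # kept
    {"chrom": "chr9", "pos": 2_100_500, "phenotype": "controls"},   # excluded phenotype
    {"chrom": "chr9", "pos": 90_000_000, "phenotype": "epilepsy"},  # q-arm
    {"chrom": "chr1", "pos": 2_100_000, "phenotype": "autism"},     # other chromosome
]
retained = [d for d in dnvs if keep_dnv(d)]
print(len(retained))
```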

DNVs from WGS from NDDs

We analyzed genomes from 5824 individuals with autism and their parents from the Simons Simplex Collection [46] and SPARK [47]. DNVs were called using HAT [41] and quality checked using acorn [42]. Beyond standard DNV filtering in HAT, DNVs were additionally filtered and removed if they were in the following genomic regions: simple repeats and sdust that contain low complexity genomic regions [48].

Genomic regions for testing DNV enrichment

Different genomic annotations were used for testing enrichment of DNVs. These included 3′ UTRs, 5′ UTRs, promoters, and coding exons of RefSeq protein-coding genes. These annotations were downloaded from the UCSC Genome Browser [49] using the Table Browser tool. To assess putative, noncoding regulatory regions, we examined candidate cis-regulatory elements (CREs) from the human fetal brain [50] using data from the CATlas database [51], CREs from all of ENCODE [52] (GRCh38-cCREs.bed) available at https://screen.wenglab.org/downloads, and the VISTA Enhancer Browser [53, 54] CREs available at https://enhancer.lbl.gov/vista/downloads.
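Testing enrichment in any of these annotation sets starts from a count of DNVs per region. The sketch below shows one simple way to obtain such counts in Python. It is an illustration under stated assumptions, not the approach used by any of the tools in this study: the function name `count_dnvs_per_region`, the region names, and the coordinates are hypothetical, all regions are assumed to lie on one chromosome, and DNV positions are assumed to be 0-based so that they match the half-open BED convention.

```python
from bisect import bisect_left

def count_dnvs_per_region(regions, dnv_positions):
    """Count DNVs falling in each BED-style region [start, end) on one chromosome."""
    positions = sorted(dnv_positions)
    counts = {}
    for name, start, end in regions:
        # bisect_left gives the number of positions strictly below a coordinate,
        # so the difference counts positions with start <= pos < end.
        counts[name] = bisect_left(positions, end) - bisect_left(positions, start)
    return counts

# Hypothetical regions and 0-based DNV positions for illustration.
regions = [("promoter_A", 100, 200), ("exon_B", 200, 300)]
dnv_positions = [99, 100, 150, 199, 200, 250, 300]
counts = count_dnvs_per_region(regions, dnv_positions)
print(counts)
```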

Development of DiamondsDenovo

In this study, we developed a tool called DiamondsDenovo [55] to assess the enrichment of DNVs within genomic regions. DiamondsDenovo is comprised of two subtools. The first subtool is a strategy to generate mutation rates for specified regions by comparing a reference genome from a species of interest to the human reference genome. Input data files to this program include the regions of interest in a bed file, the human reference genome, the other species reference genome, the divergence time between the two species, the chain file between the two reference genomes [49], and a bigWig [49] file of CADD scores [56]. For each region, the tool identifies the region (if present) in the other species genome that corresponds to its location in the human genome. The sequence from each species is then extracted from both reference genomes. The sequences are then aligned using MUSCLE [57]. From the sequence alignment, the aligned length, number of mutations, and mutation rate are calculated from the data. A weighted mutation rate is also calculated using the following formula: (mutation rate × (1/average CADD score in the region)). A table containing this data is output by the program and is an input for the second part of DiamondsDenovo. The second subtool in DiamondsDenovo calculates a p-value for DNV enrichment in regions of interest. The p-value is calculated using the following formula in R: binom.test(x = DNV count, n = number of individuals × 70 DNVs per genome, p = mutation rate). It takes as input a bed file of DNVs, a bed file of regions of interest, a text file of mutation rates (calculated in subtool 1), CADD score bigWig, number of samples, and an option to use the CADD weights or not. The output gives the p-value for all the regions, a Manhattan plot, and a QQ plot. 
The Manhattan plot uses a genome-wide significance line of 8.33 × 10⁻¹² (calculated as 0.05/(3,000,000,000 nucleotides × 2)) to correct for all nucleotides in a diploid genome and a suggestive line of 5 × 10⁻⁸ (calculated as 0.05/1,000,000). For conservative calculations in this study, we use only the unweighted mutation rates.
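The enrichment test and the two significance lines can be illustrated with a short Python sketch. It is not the DiamondsDenovo implementation. R's binom.test is two-sided by default, whereas this sketch computes only the upper tail (the probability of at least the observed count), which is a simplification. The region mutation rate, average CADD score, and observed count are hypothetical values. The genome-wide line assumes a haploid genome of roughly 3 billion nucleotides, which reproduces the stated value of 8.33 × 10⁻¹².

```python
import math

def log_binom_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), computed in log space for large n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def upper_tail_p(x, n, p):
    """P(X >= x): chance of seeing at least x DNVs in n trials at rate p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if x <= 0:
        return 1.0
    total = 0.0
    for k in range(x, n + 1):
        term = math.exp(log_binom_pmf(k, n, p))
        total += term
        # Past the mean the terms shrink; stop once they no longer matter.
        if k > n * p and term <= total * 1e-17:
            break
    return min(total, 1.0)

n_individuals = 62_181
n_trials = n_individuals * 70        # individuals x 70 DNVs per genome (from the text)
region_rate = 1e-6                   # hypothetical unweighted mutation rate
avg_cadd = 12.5                      # hypothetical average CADD score in the region
weighted_rate = region_rate * (1 / avg_cadd)  # weighting formula from the text
observed = 20                        # hypothetical DNV count in the region

p_value = upper_tail_p(observed, n_trials, region_rate)  # unweighted, as in the study

genome_wide = 0.05 / (3_000_000_000 * 2)  # assumes ~3e9 nucleotides, diploid
suggestive = 0.05 / 1_000_000
print(p_value, p_value < suggestive, p_value < genome_wide)
```

Working in log space avoids overflow in the binomial coefficient when the number of trials is in the millions.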

We chose two species for testing in this study and those include Pan troglodytes (chimpanzee) and Takifugu rubripes (fugu). Chimpanzee is a closely related species to humans and has a divergence time of 6 million years. Fugu is a distant species with a divergence time of 450 million years, but it also has a condensed genome with the most critical genomic regions included in the noncoding genome [58]. The names of each of the genomes used from the UCSC Genome Browser include hg38, panTro6, and fr3. For chimp-based and fugu-based testing of promoters, 3′ UTRs, 5′ UTRs, and coding exons, we performed DiamondsDenovo using the 62,181 individuals from our updated denovo-db. For the CREs, we used the DNVs called from 5824 individuals from the WGS datasets analyzed in this study.

denovolyzeR

To test for enrichment of genes with an excess of DNVs in the 62,181 individuals from our updated denovo-db, we utilized the denovolyzeR tool [59] and tested the following categories: loss of function (lof), missense (mis), synonymous (syn), and lof + mis (prot). Genome-wide significance for a gene was set at 2.6 × 10⁻⁶.

fitDNM

Another test to look for enrichment of DNVs within genomic regions was fitDNM [60, 61]. We ran fitDNM on the 62,181 individuals from our updated denovo-db for the promoters, 3′ UTRs, 5′ UTRs, and coding exons genomic regions. For the CREs, we used the DNVs called from 5824 individuals from the WGS datasets analyzed in this study.

Spatial expression in E13.5 mouse head

A C57BL/6J wild-type embryonic day 13.5 (E13.5) mouse head was collected from a wild-type mouse at the Jackson Laboratory and processed through their in vivo services. Procedures were approved by the Institutional Animal Care and Use Committee of the Jackson Laboratory. Steps followed by the Jackson Laboratory services team included the following: (1) The head was isolated and fixed in 4% PFA/PBS on ice for 50 ± 15 min, (2) washed 3 × 5 min in PBS on ice, (3) cryoprotected overnight in 30% sucrose/PBS at 2–8 °C, (4) heads bisected median sagittally and cryoembedded cut side down in OCT, and (5) cryoblocks stored at −80 °C until shipment in dry ice. The tissue was then sent to MedGenome for sectioning and processing through the 10x Visium HD Spatial Gene Expression platform. Processing of the data was performed using a series of steps. First, the Space Ranger (version 2.0.1) mkfastq tool from 10x Genomics was used to generate fastq files. Second, the Space Ranger count tool was used to analyze the microscope slide image and the fastq files. Specifically, the steps include alignment, tissue detection, fiducial detection, and counting of barcodes/UMIs. Third, the barcodes are used to generate feature-barcode matrices, identify clusters, and analyze gene expression. Several output files are generated including an HDF5 file containing the feature barcode matrix, QC images, downsampled images, and barcode locations. Fourth, Seurat (version 4.1.0) was utilized to perform QC and to further analyze gene expression. Cell QC was conducted using several metrics to ensure data integrity. The number of genes expressed (nFeature_Spatial) was assessed by counting genes with a UMI count greater than zero in each cell or spot, while the number of UMI reads (nCount_Spatial) represented the total UMI count from genes with at least one detected transcript per cell or spot. 
Mitochondrial contamination was evaluated by calculating the percentage of UMI counts originating from mitochondrial genes for each cell or spot within a sample. Similarly, ribosomal contamination was measured as the fraction of UMI counts attributed to ribosomal protein genes, which reflect translational activity rather than rRNA contamination. Since ribosomal gene abundance varies across cell types, this metric provides insight into data composition. Additionally, hemoglobin gene contamination was assessed to detect potential blood contamination. Fifth, once the data passed QC, the subsequent analyses were to normalize the data using SCTransform. A regularized negative binomial regression is used by this function to normalize the UMI count data. Sixth, the SCTransform function looks for features contributing to large cell-to-cell variation in the dataset. Seventh, the data is scaled using a linear transformation. Eighth, PCA is performed on the scaled data. Ninth, a K-nearest neighbor graph is the clustering approach used in Seurat to identify “communities” using the first 14 principal components from the PCA. The FindClusters function optimizes to find the best clusters. Tenth, a UMAP is generated to explore the data. Eleventh, the FindAllMarkers function is used to find gene markers specific to each cluster and it uses the Wilcoxon rank-sum test.
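The per-spot QC metrics described above can be illustrated outside Seurat with a small Python sketch. This is not the study's R code. The function name `spot_qc`, the example counts, and the gene-name prefixes used to recognize mouse mitochondrial (`mt-`), ribosomal protein (`Rpl`, `Rps`), and hemoglobin (`Hba-`, `Hbb-`) genes are assumptions based on common mouse gene naming, not details given in the text.

```python
# Assumed mouse gene-name prefixes; adjust to the annotation actually used.
MITO_PREFIXES = ("mt-",)
RIBO_PREFIXES = ("Rpl", "Rps")
HEMO_PREFIXES = ("Hba-", "Hbb-")

def spot_qc(counts):
    """Compute QC metrics for one spot from a {gene: UMI count} mapping."""
    n_count = sum(counts.values())                        # total UMIs (nCount_Spatial)
    n_feature = sum(1 for c in counts.values() if c > 0)  # genes with UMI > 0 (nFeature_Spatial)

    def percent(prefixes):
        # Percentage of the spot's UMIs that come from genes matching the prefixes.
        if n_count == 0:
            return 0.0
        hits = sum(c for gene, c in counts.items() if gene.startswith(prefixes))
        return 100.0 * hits / n_count

    return {
        "nCount_Spatial": n_count,
        "nFeature_Spatial": n_feature,
        "percent_mito": percent(MITO_PREFIXES),
        "percent_ribo": percent(RIBO_PREFIXES),
        "percent_hemo": percent(HEMO_PREFIXES),
    }

# Hypothetical spot for illustration.
spot = {"mt-Co1": 5, "Rpl3": 10, "Hbb-bs": 5, "Sox2": 30, "Pax6": 0}
qc = spot_qc(spot)
print(qc)
```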

Assessment of published mouse phenotypes

We utilized MouseMine [62] to query for known phenotypes in mouse models associated with 24 genes that we identified in this study as relevant to 9p deletion syndrome. Twenty-one genes had known phenotypes.

Results

9p-related syndrome genomic cohort

Although karyotype and chromosome microarray are often utilized in the identification of chromosome 9p syndromes, these approaches provide only a limited viewpoint on the full range of variation occurring on 9p. Whole-genome sequencing enables comprehensive detection of genomic variation in both the nuclear and mitochondrial genomes. For most structural variants, it provides nucleotide-level resolution, including precise characterization of their breakpoints. This level of resolution surpasses that of karyotypes and chromosome microarrays, where even a small deviation in breakpoint localization on the order of a few nucleotides can determine whether a gene is involved in the structural variant. For this reason, we performed whole-genome sequencing on 100 individuals with 9p-related syndromes and family members (Additional file 2: Tables S1, S2, S4, S5, S6). Most individuals sequenced were independent probands (n = 85), two were affected siblings, five were fathers, six were mothers, one was a grandfather, and one was a grandmother (Additional file 2: Table S2). To our knowledge, this is the largest cohort of individuals with whole-genome sequencing data ever assembled for 9p-related syndromes, and it provides a unique resource to delineate core features of these syndromes.

Genomic architecture of 9p-related syndromes and implications for individuals

The genomic architecture of 9p genes in 9p-related syndromes was analyzed by examining their copy number, using an approach with paralog specificity, in all sequenced individuals in this study (Additional file 2: Tables S7, S8). This strategy improves resolution in regions with high sequence similarity, enabling more accurate characterization of genes on 9p. Subsequently, data from the 1000 Genomes Project (controls) were processed in the exact same manner (Additional file 2: Table S9). The two datasets were combined, and we used principal component analysis to perform unsupervised learning on the data. The results are shown in Fig. 1 (scree plot in Additional file 1: Fig. S2).
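To make the principal component step concrete, the sketch below projects a tiny per-gene copy number matrix onto its first principal component using power iteration. It is an illustration only: the study does not state which PCA implementation was used, the function name `first_pc_scores` is hypothetical, and the five-individual, four-gene matrix is made up (controls at copy number 2, two individuals with terminal deletions of different sizes).

```python
import math

def first_pc_scores(matrix, iters=200):
    """Project individuals (rows) onto the first principal component of gene copy number."""
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in matrix]
    # Deterministic, non-uniform start; power iteration would stall only if this
    # vector were exactly orthogonal to the leading component.
    v = [float(j + 1) for j in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        scores = [sum(r[j] * v[j] for j in range(m)) for r in centered]            # X v
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(m)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:        # no variance in the data
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(m)) for r in centered], v

# Hypothetical copy numbers for four genes ordered from telomere to centromere.
copy_number = [
    [2, 2, 2, 2],   # control
    [2, 2, 2, 2],   # control
    [2, 2, 2, 2],   # control
    [1, 1, 2, 2],   # smaller terminal deletion
    [1, 1, 1, 2],   # larger terminal deletion
]
scores, loadings = first_pc_scores(copy_number)
print([round(s, 3) for s in scores])
```

In this toy example the controls share one score and the two deletion carriers fall on the opposite side, ordered by deletion size. The sign of a principal component is arbitrary, so only the relative arrangement is meaningful.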

Fig. 1.

Genomic architecture of 9p-related syndromes. Each point, in this principal component analysis of genic copy number on 9p, represents an individual either sequenced in this study as part of a family with 9p-related syndromes or from the 1000 Genomes Project (controls). The individuals from the 1000 Genomes Project cluster in the center of this plot. Small schematics of the p-arm are shown in rectangles with tel indicating telomere and cen indicating centromere. Orange represents deletion, and blue represents duplication. This PCA reveals the complexity of what is called a 9p deletion syndrome (all the individuals with orange) and what is called a 9p duplication syndrome (all the individuals with blue). This kind of map is critical for researchers and families to connect with each other based on similar genomic events and for future refined care, beyond research, and into the clinic

In the center of the PCA are individuals from the 1000 Genomes Project who have copy numbers of two for most genes on 9p. This PCA reveals the complexity of what is called a 9p deletion syndrome and what is called a 9p duplication syndrome. On the left side of the plot and moving up the y-axis are individuals ordered by increasing deletion size starting from the telomere and getting larger, as shown in orange. On the bottom left is the one individual in our study who has a deletion near the centromere, shown in orange. On the top right are three unrelated individuals who have a large deletion, shown in orange, followed by a large duplication, shown in blue, on the p-arm. Viewing the middle/bottom of the plot and moving to the right are individuals with increasing duplication size, shown in blue, starting with duplications that are approximately half the p-arm in size and moving to those with full duplication of 9p.

By using copy number estimates with paralog-level specificity, which are only derivable from whole-genome sequencing data, this study provides the first complete view of the genomic architecture of 9p genes in 9p-related syndromes. Future work and implementation of these results will enable more refined matching of individuals with 9p-related syndromes to other families. For example, the three unrelated individuals at the top right of the plot containing similar deletion/duplication events likely share more phenotypes with each other than they do with the individual at the bottom left who has a deletion near the centromere. The general clinical information for these four specific individuals states they have 9p deletion syndrome. However, whole-genome sequencing clearly shows these are not the same genomic events at all. Further genotype–phenotype analyses are required to fully determine the links of the events on this genomic map to specific phenotypes.
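As a rough illustration of the principal component analysis of genic copy number described in this section, the following Python sketch projects a small, made-up individual-by-gene copy number matrix onto its first principal component. The toy matrix, the sample labels, the function names, and the use of power iteration are all assumptions chosen for the example; this is not the pipeline used in this study, and a real analysis would cover all 9p genes and would normally rely on an established PCA implementation.

```python
import random


def center_columns(matrix):
    """Subtract the per-gene (column) mean from every entry."""
    n_rows = len(matrix)
    means = [sum(column) / n_rows for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def first_principal_component(matrix, iterations=200, seed=0):
    """Estimate the leading principal component by power iteration.

    Returns (loadings, scores): one loading per gene and one score per
    individual. The overall sign of a principal component is arbitrary.
    """
    centered = center_columns(matrix)
    n_features = len(centered[0])
    rng = random.Random(seed)
    vector = [rng.random() for _ in range(n_features)]
    for _ in range(iterations):
        # Multiply by X^T X without forming the covariance matrix explicitly.
        projections = [sum(a * b for a, b in zip(row, vector)) for row in centered]
        updated = [
            sum(row[j] * proj for row, proj in zip(centered, projections))
            for j in range(n_features)
        ]
        norm = sum(value * value for value in updated) ** 0.5
        vector = [value / norm for value in updated]
    scores = [sum(a * b for a, b in zip(row, vector)) for row in centered]
    return vector, scores


# Hypothetical copy numbers for four genes ordered from telomere to centromere.
labels = ["control_1", "control_2", "control_3",
          "deletion_small", "deletion_large", "duplication"]
copy_number = [
    [2, 2, 2, 2],  # control
    [2, 2, 2, 2],  # control
    [2, 2, 2, 2],  # control
    [1, 1, 2, 2],  # smaller terminal deletion
    [1, 1, 1, 2],  # larger terminal deletion
    [3, 3, 3, 3],  # duplication across all four genes
]

loadings, scores = first_principal_component(copy_number)
for label, score in zip(labels, scores):
    print(f"{label}: PC1 score = {score:.3f}")
```

In this toy example the controls share a single score, while the deletion and duplication individuals fall on opposite sides of them, which mirrors the qualitative layout described for Fig. 1.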

Machine-learning model to predict 9p deletion syndrome

Our next analysis focused on building a machine-learning model for 9p deletion syndrome. This model provides valuable insights into the contributions of specific genes to 9p deletion syndrome and serves as a useful tool for identifying individuals in other large sequencing-based research cohorts who may not yet have a formal diagnosis. Importantly, this work is conducted solely in a research context, and the model is not intended for clinical use. For this approach, we examined the genic copy number of 9p genes. Specifically, the copy number of the 488 genes on 9p was calculated (as described above) in individuals with 9p deletion syndrome and in individuals from the 1000 Genomes Project (who do not have 9p deletion syndrome). A random forest machine-learning approach was used to train a model using 70% of each dataset in the training set and 30% of each dataset in the testing set. Using fivefold cross-validation on the training set, we found that our model had high accuracy of 0.997, sensitivity of 0.9, specificity of 1, positive predictive value of 1, and a negative predictive value of 0.997. The top 10 features (genes) contributing to the signal were RFX3_DT (mean decrease in Gini [MDG] = 5.67), RCL1 (MDG = 4.81), LOC105375956 (MDG = 4.57), MIR101_2 (MDG = 3.30), UHRF2 (MDG = 3.16), SMARCA2 (MDG = 3.14), RFX3 (MDG = 3.13), RLN2 (MDG = 3.08), LOC124902112 (MDG = 3.07), and GLIS3 (MDG = 2.99). The full set of features ordered by importance is shown in Additional file 1: Fig. S3. This gives us insights into relevant genes for this syndrome. We also used a Naïve-Bayes strategy to examine the 488 genes on 9p with the training and testing set up in the same manner as in the random forest. The result of the Naïve-Bayes strategy was an accuracy of 0.996, sensitivity of 0.95, specificity of 0.997, positive predictive value of 0.904, and negative predictive value of 0.999. 
The accuracy of this model indicates that we have good ability to identify individuals with 9p deletion syndrome using whole-genome sequencing data, and this model could be applied to other large sequencing-based research cohorts to identify additional individuals with 9p deletion syndrome.
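The performance measures quoted above all derive from the four cells of a confusion matrix. The short Python sketch below shows how they are computed; the counts are hypothetical and are not the study's confusion matrix, and the function name is an assumption made for illustration. The example is chosen to show how a sensitivity of 0.9 can coexist with an accuracy near 0.997 when controls greatly outnumber affected individuals.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts (not from the study): 10 affected individuals,
# 300 controls, one affected individual missed, no false positives.
example = classification_metrics(tp=9, fp=0, tn=300, fn=1)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```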

Precision genomics in 9p deletion syndrome

Most individuals sequenced in this study have 9p deletion syndrome. These individuals were assessed by a minimum of one sequencing technology and karyotype. Characterization of the structural variants in each individual revealed a total of 184 breakpoints (Additional file 2: Table S10). Precise nucleotide resolution was achieved in 159 (86.4%) of these events. Furthermore, we predicted 10 translocation events from the sequencing data, and all were confirmed to be real by FISH (Additional file 2: Table S11). Eleven individuals contained structural variation involving chromosome 9 and at least one other chromosome. In one individual, the variation was predicted to be quite complex based on the karyotype and sequencing results (Fig. 2, Additional file 2: Table S12) and is likely the result of chromothripsis [63–65].

Fig. 2.


Complex structural variation detected in 9p.123.p1. A Shown is the karyotype of individual 9p.123.p1 where abnormalities were detected involving chromosome 8, chromosome 9, chromosome 11, and chromosome 13. B Shown is the detailed characterization of the complex variation involving chromosome 8, chromosome 9, and chromosome 3 as detected using short-read whole-genome sequencing. A, 8q del 1 breakpoint 1 (chr8:105028001; linked to B_right). B_right, right (centromere) side of region B (chr9:8384167; linked to A). B_left, left (telomere) side of region B; inverted reads at chr9:8384164; linked to I_left. C, 8q del 2 breakpoint 2 (chr8:107839946; linked to D_left). D_right, chr3:166346923 (linked to K_left). D_left, chr3:166346922 (linked to C). E, 8q del 2 breakpoint 1 (chr8:114357399; linked to F_right). F_right, right (telomere) side of region F (chr3:167475935; linked to E). F_left, left (centromere) side of region F (chr3:167475865; linked to H_right). G, 8q del 2 breakpoint 2 (chr8:114697049; linked to H_left). H_right, right (telomere) side of region H (chr8:111368737; linked to F_left). H_left, left (centromere) side of region H (chr8:111368715; linked to G). I_right, inverted reads at chr9:8430297 (linked to J_right). I_left, inverted reads at chr9:8430297 (linked to B_left). J_right, inverted reads at chr9:8719693 (linked to I_right). J_left, chr9:8719701 (linked to K_right). K_right, chr3:157869331 (linked to J_left). K_left, chr3:157869330 (linked to D_right). Not to scale. C Shown are the FISH results for chromosome 8 and chromosome 9. The green probe is to 14q32, the blue probe is to the centromere of chromosome 8, and the red probe is to 8q24. D Shown is the updated schematic of chromosome 9 based on the karyotype, WGS, and FISH experiments. E Shown is the updated schematic of chromosome 8 based on the karyotype, WGS, and FISH experiments. F Shown are the FISH results for chromosome 11 and chromosome 13.
The red probe is to 13q14, the green probe is to 13q34, and the yellow probe is to 11p15. G Shown is the updated schematic of chromosome 11 based on the karyotype, WGS, and FISH experiments. H Shown is the updated schematic of chromosome 13 based on the karyotype, WGS, and FISH experiments. Structural variation in this individual is likely due to chromothripsis

Using breakpoint information from sequencing data, FISH probes were designed, and all predictions were confirmed by the FISH experiments (Fig. 2, Additional file 1: Figs. S4–S13). This emphasizes the importance of sequencing data and its accuracy in the identification of fully resolved structural variation. The chromosome 9 schematics for all the other individuals assessed in this study are shown in Additional file 1: Fig. S14. Reports containing these schematics and other related information were conveyed back to the research participants.

Overall, 53 individuals had deletion alone (75.7%), 4 had deletion and duplication on chromosome 9 (5.7%), 11 individuals had more complex variation also involving a translocation of sequence from another chromosome (15.7%), and 2 individuals had no large genic deletions (2.9%).

We previously published DNVs detected in individual 9p.100.p1 using sequencing data generated in the father, mother, and child. In this study, we recruited three new trios and detected DNVs in these trios. In each case, the 9p structural variant event was de novo. Examining single-nucleotide variants and small insertions/deletions, we did not find any missense or loss-of-function DNVs on 9p (Additional file 2: Table S13).

Preferential copy number breakpoints in two late-replicating regions of 9p

To understand the mechanism underlying the structural variants formed in 9p deletion syndromes, we examined published Repli-seq data [66] on the p-arm of chromosome 9 from ENCODE. Through this search, we identified four regions on 9p that are late-replicating regions (i.e., highest signal during S4 and G2) (Fig. 3). We named these regions late-replicating region 1 (LRR1, b38: chr9:7302516–14625487), late-replicating region 2 (LRR2, b38: chr9:16012189–18667816), late-replicating region 3 (LRR3, b38: chr9:22193299–26926755), and late-replicating region 4 (LRR4, b38: chr9:27450126–32408022). These regions are shared across several cell types (Additional file 1: Fig. S15). Further review of these regions and comparison to the HumCFS fragile site database [67] identified LRR2 as the known human fragile site FRA9G that is on 9p22 and includes the gene CNTLN [67, 68], LRR3 as the known human fragile site FRA9C and includes the gene ELAVL2 [67, 69], and LRR4 as the recently molecularly mapped fragile site FRA9A with breakpoints spanning all 8.2 Mb of 9p21.1–9p21.3, caused by the (GGGGCC)N repeat expansion in C9orf72 [67, 70]. LRR1, including PTPRD, resides in 9p24, a region associated with an unnamed chromosomal fragile site [71] and a known recurrent DNA double-strand break cluster [72]. Visual inspection revealed several breakpoints in LRR1 and LRR2 (Fig. 3). There were no breakpoints in LRR3 or LRR4 identified in any individual with 9p deletion syndrome. We hypothesized that a replication-based mechanism may be critical for structural variant formation in 9p deletion syndrome. 
Of the 70 probands in our study, 3 had structural variants with breakpoints in both LRR1 and LRR2 (4.3%), 29 had structural variants with a breakpoint in LRR1 only (41.4%), 11 had structural variants with a breakpoint in LRR2 only (15.7%), 25 did not have breakpoints in LRR1 or LRR2 (35.7%), and, as mentioned above, 0 individuals had breakpoints in either LRR3 or LRR4, while 2 individuals had no 9p deletions (2.9%) (Fig. 3). We further tested whether there was an enrichment of individuals with breakpoints at LRR1 or LRR2. For this, we calculated the proportion of 9p the regions encompass. LRR1 is 17.2% of the 9p region, and LRR2 is 6.2%. In total, 32 individuals had a breakpoint in LRR1, and this was enriched given the size of the region (binomial test: n = 70, x = 32, p = 0.172, p-value = 3.0 × 10⁻⁸), and 14 individuals had a breakpoint in LRR2 that was also enriched given the size of the region (binomial test: n = 70, x = 11, p = 0.062, p-value = 3.8 × 10⁻³). These two regions are of critical importance in 9p deletion syndrome. One potential defined mechanism underlying the structural variants we identify in individuals with 9p deletion syndrome is chromothripsis [73–76]. This mechanism has been identified in some other congenital diseases [77–80].
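The enrichment tests above compare the number of individuals with a breakpoint in a region against the fraction of 9p that the region occupies. A minimal Python sketch of such a test follows, assuming a one-sided (upper-tail) exact binomial test; the sidedness is an assumption, since it is not stated in the text, and the function name is hypothetical. The inputs are the n, x, and p values quoted in the test statements above.

```python
from math import comb


def binomial_upper_tail(n, x, p):
    """Exact P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


# Values as quoted in the binomial test statements (n = 70 probands).
p_lrr1 = binomial_upper_tail(n=70, x=32, p=0.172)
p_lrr2 = binomial_upper_tail(n=70, x=11, p=0.062)

print(f"LRR1 upper-tail p-value: {p_lrr1:.2e}")
print(f"LRR2 upper-tail p-value: {p_lrr2:.2e}")
```

Under this null model, every breakpoint-bearing individual is treated as an independent draw whose probability of falling in a region equals that region's share of 9p, so a small upper-tail probability indicates more individuals with breakpoints in the region than its size alone would predict.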

Fig. 3.


9P-ARCH copy number variant breakpoints preferentially localize to two late-replicating regions on 9p. A Chromosome bands on the p-arm of chromosome 9. B Deletions (orange) and duplications (blue) identified by WGS in 9P-ARCH. C The previously published Repli-seq tracks [66] are shown from ENCODE for GM12878 cells. These are consistent with replication timings in several other cell lines shown in Additional file 1: Fig. S15. We identified four late-replicating regions with the highest signal intensity in the S4 phase or G2 phase of the cell cycle. They are as follows: late-replicating region 1 (LRR1, b38: chr9:7302516–14625487), late-replicating region 2 (LRR2, b38: chr9:16012189–18667816), late-replicating region 3 (LRR3, b38: chr9:22193299–26926755), and late-replicating region 4 (LRR4, b38: chr9:27450126–32408022). D and E The majority of breakpoints identified in individuals with 9p deletion syndrome resided in LRR1 on 9p. This region contains the long gene PTPRD and is known to be an unnamed fragile site and recurrent double-strand break cluster region [72]. Several of the other events have breakpoints in LRR2, which is the known human fragile site FRA9G. There are no breakpoints in LRR3 or LRR4, which are known human fragile sites (FRA9C, FRA9A). The data in this figure support replication-based issues in LRR1 and LRR2 underlying the variation in 9p deletion syndromes. F Shown is the percent of individuals with 9p deletion syndrome that have a breakpoint in one of the late-replicating regions or in another part of 9p (orange) and the percent of genomic space that region takes up on 9p (gray). There is a significant excess of breakpoints in LRR1 (p = 3.0 × 10⁻⁸) and LRR2 (p = 3.8 × 10⁻³)

Identification of relevant 9p genes in 9p deletion syndrome

Focused analyses of the copy number of genes on 9p were pursued in the individuals diagnosed with 9p deletion syndrome (n = 70) in comparison to control individuals from the 1000 Genomes Project (n = 2504) who do not have a diagnosis of 9p deletion syndrome. For each gene on 9p, a Mann–Whitney U-test was performed to identify the genes with a significant difference in copy number between the two groups. Of the 488 genes, 287 (58.8%) showed a significant difference in copy number (Additional file 2: Table S14). Based on the identification of 287 genes, 2 possibilities emerge regarding the relevance of genes on 9p: (1) the critical factor may be the sheer number of genes deleted, or (2) it may be the importance of specific genes. While recognizing that both possibilities are valid, our subsequent analyses prioritized the latter, examining the contribution of specific genes. By adding information on constraint from gnomAD [26] and probability of haploinsufficiency from a previous publication [25] (Fig. 4), we found 60 of the 287 genes met some criteria for constraint or haploinsufficiency. It should be noted that 161 significant genes (most are LOC genes) on 9p do not have any constraint or dosage metrics available in the files we downloaded for constraint and probability of haploinsufficiency (Additional file 2: Table S15). For this reason, we examined additional evidence of potential relevance of significant 9p genes. Next, we looked for genes in this region with enrichment of DNVs on 9p, in two large cohorts, as another layer of information to prioritize genes.
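To make the per-gene comparison concrete, the sketch below implements a Mann–Whitney U-test in plain Python for one gene's copy numbers in cases versus controls. It assumes a two-sided test using the normal approximation with a tie correction and no continuity correction; the exact settings used in this study are not stated, so these choices, the function names, and the toy copy numbers are all illustrative assumptions.

```python
import math


def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p-value) via the normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for value in combined:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value


# Hypothetical copy numbers for a single gene (not data from the study).
cases = [1, 1, 1, 2, 1, 1]
controls = [2, 2, 2, 2, 3, 2, 2, 2]
u_stat, p_value = mann_whitney_u(cases, controls)
print(f"U = {u_stat}, two-sided p = {p_value:.4f}")
```

In a gene-by-gene screen, a function like this would be applied once per 9p gene, with the resulting p-values then judged against whatever significance threshold the analysis adopts.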

Fig. 4.


Prioritization of genes in 9p deletion syndrome. Shown is the gene prioritization approach used in this study to reduce the total number of 9p genes from 488 down to 24 genes of interest. This approach required comparison of our sequencing data with control sequencing data. Integration of other genomic datasets, including our expanded denovo-db and detection of DNVs in WGS data from families with autism, was necessary. Next, we developed a new statistical method called DiamondsDenovo and used this in parallel with other tools (fitDNM, denovolyzeR) to find genes and genomic regions with excess DNVs in individuals with relevant phenotypes. Finally, the list was narrowed down further to 24 genes with mean copy number less than 1.5 across individuals with 9p deletion syndrome sequenced in this study

Two cohorts were assessed for enrichment of DNVs on 9p (Fig. 4). The first cohort was assembled through extensive searching of the literature, and we identified 40,198 unique DNV sites on the p-arm of chromosome 9 in 72,794 individuals. These DNVs (Additional file 2: Table S16) are included in our newest release (version 1.8) of denovo-db. Testing for enrichment did not include individuals who did not have a phenotype of interest, and the final dataset for testing consisted of 62,181 individuals. Since individuals underwent either whole-exome sequencing or whole-genome sequencing, we focused our enrichment of de novo variants in this dataset only on regions accessible by whole-exome sequencing. These included 5′ UTR, 3′ UTR, promoters, coding exons, or the entire gene (by category: synonymous, missense, LOF, missense + LOF). Within coding exons, testing for enrichment of loss of function and missense DNVs was considered separately from testing for enrichment of synonymous DNVs. No genes had enrichment of DNVs in their 5′ UTR (Additional file 2: Tables S17, S18, S19), 1 gene had enrichment in the 3′ UTR (AK3) (Additional file 2: Tables S20, S21, S22), no genes had enrichment in their promoter (Additional file 2: Tables S23, S24, S25), 14 genes contained at least one coding exon with an excess of missense and LOF variants (Additional file 2: Tables S26, S27, S28) (BRD10, CCIN, GLIS3, KANK1, KIF24, MOB3B, MYORG, RANBP6, RIC1, RUSC2, SMARCA2, TAF1L, TOPORS, UBAP1), and no coding exons had an excess of synonymous variants (Additional file 2: Tables S28, S29, S30). 
For the entire gene analyses (Additional file 2: Table S31), 8 genes had an excess of synonymous DNVs (ATOSB, BRD10, CIMIP2B, PUM3, RIC1, SAXO1, SPATA31F1, ZNG1A), 16 genes had an excess of missense DNVs (ATOSB, BAG1, BRD10, MYORG, PHF24, PLPP6, PUM3, RIC1, RIGI, SMARCA2, SPATA31F1, SPATA31F3, SPATA31G1, SPMIP6, WASHC1, ZNG1A), 7 genes had an excess of LOF DNVs (BRD10, HACD4, MYORG, PHF24, PLPP6, RIC1, SAXO1), and 19 genes had an excess of LOF + missense DNVs (ATOSB, BAG1, BRD10, HACD4, MYORG, PHF24, PLPP6, PUM3, RFX3, RIC1, RIGI, SAXO1, SMARCA2, SPATA31F1, SPATA31F3, SPATA31G1, SPMIP6, WASHC1, ZNG1A). The second cohort of individuals assessed included 5824 individuals with autism with available whole-genome sequencing data from SPARK or SSC. We detected 4066 DNVs on the p-arm of chromosome 9 in this data (Additional file 2: Table S32). We focused our analyses of this data on regions of the noncoding genome, including candidate cis-regulatory elements in fetal brain, generally in all ENCODE, and in VISTA enhancers. We did not identify regions meeting genome-wide significance in this data (Additional file 2: Tables S33, S34, S35, S36, S37, S38, S39, S40). It should be noted that a cohort of 5824 individuals is substantially smaller than the 62,181 individuals available for the coding region analyses. Future increase in sample size would help with detection of relevant noncoding regions. For example, the candidate cis-regulatory element with the lowest p-value in the data (b38: chr9:12904581–12904931) contained three different DNVs and had a p-value of 8.80 × 10⁻¹⁰ but did not reach our conservative cutoff of p < 8.33 × 10⁻¹².

By adding information regarding significant genes from the de novo variant analyses, we expanded our list of 9p genes, with potential relevance, to 71 genes. Since many deletion cases are within the range of the telomere to 9p22 cytobands, we focused specifically on the subset of these genes within these chromosome bands. Through these analyses, the set of genes for consideration was 28 (RCL1, BRD10, GLIS3, SLC1A1, KDM4C, RIC1, UHRF2, JAK2, AK3, CD274, CDC37L1, RFX3, DOCK8, RANBP6, DENND4C, DMRT1, PSIP1, DMRT3, KANK1, ZNG1A, PLPP6, SMARCA2, PUM3, VLDLR, DMRT2, SLC24A2, PTPRD, BNC2). Twenty-four of these genes exhibited an average copy number less than 1.5 for the 9p deletion cohort (RCL1, BRD10, GLIS3, SLC1A1, KDM4C, RIC1, UHRF2, JAK2, AK3, CD274, CDC37L1, RFX3, DOCK8, RANBP6, DMRT1, DMRT3, KANK1, ZNG1A, PLPP6, SMARCA2, PUM3, VLDLR, DMRT2, PTPRD) (Additional file 2: Table S41). Based on genomic analyses alone, this list of genes is the best candidate gene list attainable at this time for 9p deletion syndrome as they are deleted in up to 83% of individuals with this syndrome.

Identification of expression patterns of 9p genes

Since neurodevelopmental disorders are a shared feature across 9p deletion syndrome, a spatial transcriptomic experiment of an embryonic mouse head at E13.5 was performed as this is a critical time in embryonic brain development, including the expansion of the forebrain and refinement of the midbrain and hindbrain. By examining the whole head, we also were able to gain information regarding craniofacial development as this is a time when the upper jaw, nose, and lower jaw are being formed. In Fig. 5, we show the expression patterns in the mouse head with several genes showing expression in both the face and the brain. Both regions are relevant in relation to 9p-related syndromes. While several genes exhibit expression in the face, two were in very specific regions of the face, and those included Glis3 and Bnc2. Biallelic loss of function of GLIS3 is known to underlie a syndrome in humans with several facial dysmorphic features [81]. In the brain, some genes have restricted expression to one area (e.g., Dmrt3 in radial glial cells), whereas there are several genes with expression in numerous regions of the brain (Uhrf2, Cdc37l1, Rfx3, Psip1, Ptprd, Ak3, Smarca2, Pum3).

Fig. 5.


Spatial expression in E13.5 mouse head of genes relevant to 9p deletion syndrome. A Histology showing sagittal section of E13.5 mouse head. B Cell type assignment of cells in 10× Visium HD experiment generated in this study. C through AB show the expression of the gene listed below each image. To the left of each image is the scale bar, and below the gene name is the mean copy number (CN) of the gene in individuals with 9p deletion syndrome. Note, we originally looked to assess the full set of 28 genes, but 2 were not represented in the Visium HD probe set (Brd10, Zng1a). The most interesting genes are the 24 genes where the mean 9p CN is < 1.5 across the 9p deletion syndrome cohort

Excess mitochondrial genome copy number in 9p

Two of the genes we identified as relevant in 9p deletion syndrome have consequences on mitochondrial function (AK3, ZNG1A). Therefore, assessment of the mitochondrial genome was pursued. For each individual, the mitochondrial genome was derived from whole-genome sequencing data. The mitochondrial genomes displayed expected relationships between mothers and their children (Fig. 6, Additional file 2: Table S6). Testing of mitochondrial genome copy number was also pursued in both the individuals with 9p deletion syndrome in this study (n = 69 individuals) and individuals who are not affected by 9p deletion syndrome from a previous study [82] (n = 1801 individuals). Only individuals with whole-genome sequencing data generated on DNA derived from blood were considered in this analysis. Individuals with 9p deletion syndrome had mitochondrial genome copy numbers of 271.4 ± 58.7, and individuals without 9p deletion syndrome had mitochondrial genome copy numbers of 219.5 ± 32.4. This was a significant difference in copy number (Mann-Whitney U-test p = 3.9 × 10⁻¹⁵) (Fig. 6).

Fig. 6.


Mitochondrial genome assessment using EGP on Illumina short-read whole-genome sequencing data. A Maximum likelihood tree of mitochondrial genomes for all individuals sequenced in this study. The evolutionary history was inferred by using the maximum likelihood method and Tamura-Nei model. The tree with the highest log likelihood (−29,132.73) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying neighbor-joining and BioNJ algorithms to a matrix of pairwise distances estimated using the Tamura-Nei model and then selecting the topology with superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. This analysis involved 98 nucleotide sequences. Codon positions included were 1st + 2nd + 3rd + noncoding. There was a total of 16,569 positions in the final dataset. Analyses were conducted in MEGA X. All familial relationships are confirmed in this analysis, and individuals NA05067 and NA03226 are identified as the same individual. B Mitochondrial genome copy number estimates specifically for individuals with WGS on blood-derived DNA. The data for the 1801 controls is from 1801 unaffected children that had previously been analyzed with EGP [82]. There is a significantly higher mitochondrial genome copy number in individuals with 9p deletion syndrome (271.4 ± 58.7) than in controls (220.7 ± 32.4) (Mann-Whitney U p = 3.9 × 10⁻¹⁵)

Discussion

The field of genomics is shifting towards whole-genome sequencing as a first-line approach to assess human phenotypes. However, syndromes due to large structural variants are often still assessed using lower resolution strategies, including karyotypes and chromosome microarrays. Whole-genome sequencing provides fine-scale resolution of the genome and is a needed next step as a standard for characterizing these syndromes. This study provides a first-of-its-kind look at 9p-related syndromes by performing whole-genome sequencing on 100 individuals in families with 9p-related syndromes, from both the WashU cohort collection and Coriell Repository, including 85 total unrelated probands with 9p-related syndromes to generate the clearest evaluation of variation in these syndromes (Additional file 2: Table S2). In Fig. 1, a map of these syndromes shows the diversity at the level of the variant types in different individuals, providing a glimpse into the potential reason for high phenotypic variability in these syndromes. This map will be used to link families to the nearest other families in 9p gene space. Currently, strategies like GeneMatcher [83] enable researchers to link based on shared small variants in single genes. 9p-related syndromes now can have their own type of “CNVmatcher” within the syndrome’s genomic space. Furthermore, we generated a machine learning model to classify 9p deletion syndrome with high accuracy.

Whole-genome sequencing has enabled the resolution of complex structural variation in 9p-related syndromes, including the complex case shown in Fig. 2, which is likely due to chromothripsis. Gene-level resolution is not attainable without whole-genome sequencing. We advocate for whole-genome sequencing as a first test for the assessment of these syndromes, with the addition of other orthogonal technologies (e.g., FISH, karyotype) as needed for confirmation analyses.

For future work at the level of the individual, 26 additional individuals with 9p-related syndromes were found with array data from various clinics (Additional file 2: Table S3). Future studies will include whole-genome sequencing of these individuals. Another future step is to sequence full trios, especially for examining balanced translocation carrier parents with requisite consequences for genetic counseling in those families. In total, we have sequenced four complete trios, each including a child with a 9p deletion. One of these was previously published [41], and three were new to this study. Calling of DNVs was performed in this study (Additional file 2: Table S13), and while no single-nucleotide or small insertion/deletion DNVs of obvious functional consequence on chromosome 9 were identified in these individuals, this set is a first-of-its-kind trio-based analysis of DNVs in whole-genome sequencing data in 9p-related syndromes. This is only the first step of what can be achieved using family-based WGS in these syndromes.

Cross-cohort analyses of 9p-related syndromes in relation to gene discovery were focused specifically on individuals with 9p deletion syndrome since most individuals sequenced in our study had this specific syndrome. Several strategies were used to prioritize the relevant genes for this syndrome (Fig. 4) including major expansion of our database of DNVs and development of a new statistical method (i.e., DiamondsDenovo) to test for DNV enrichment on 9p in related phenotypes. In total, we identified 28 genes of interest and further characterized these by spatial expression experiments examining their expression in the developing mouse face and brain (Fig. 5). Twenty-four of the genes (Fig. 7, Additional file 2: Table S41) have an average copy number less than 1.5 across all individuals with 9p deletion syndrome, and we next describe each of them further. RCL1 is RNA terminal phosphate cyclase like 1. Copy number variants in this gene have been detected in multiple individuals with neurodevelopmental or psychiatric disorders [84, 85]. Some aspects of its expression pattern in the brain are unique to higher-order primates [85]. It forms a complex with BMS1 to process 18S rRNA [86]. BRD10 is bromodomain containing 10 and is a gene that was formerly known as KIAA2026. Little is known about this specific protein, but it is likely, based on its predicted function, that it works as a regulator of transcription. GLIS3 is GLIS family zinc finger 3. This gene has been identified as causal in autosomal recessive neonatal syndrome of diabetes mellitus and congenital hypothyroidism. It localizes to the nucleus and the primary cilia of cells [87]. Expression of this gene was identified in the face and the brain in the present study. 
Individuals with the human syndrome have shared common facial dysmorphic features, including bilateral low-set ears, a depressed nasal bridge with an overhanging columella, elongated and upslanted palpebral fissures, a long philtrum, and a thin vermilion border of the upper lip [81]. Many of these features are also observed in individuals with 9p deletion syndrome [1, 5]. However, the syndrome associated with this gene is autosomal recessive, so more detailed genotype–phenotype studies are required to explore this further. SLC1A1 is solute carrier family 1 member 1. In a five-generation family, the genetic variant responsible for schizophrenia and bipolar schizoaffective disorder was identified as a deletion specifically in this gene [88]. Copy number variants in this gene have also been associated with schizophrenia [89]. This gene also underlies autosomal recessive dicarboxylic aminoaciduria [90]. It has also been associated with obsessive compulsive disorder [91]. Specific missense variants in the gene underlie hot water epilepsy [92]. It has also been suggested as a gene for autism based on linkage to 9p24 [93]. Mice completely lacking this gene have been found to have age-dependent changes in behavior and in the brain (Additional file 2: Table S42). The effects include neurodegeneration and neuronal oxidative stress and have been suggested to be involved in Parkinson’s disease [94, 95]. KDM4C is lysine demethylase 4C (previously known as JMJD2C), which has been associated with autism, obsessive compulsive disorder [96], and cancer [97]. RIC1 is RIC1 homolog, RAB6A GEF complex partner 1. It underlies the autosomal recessive condition called CATIFA syndrome [98]. This syndrome is a rare neurodevelopmental disorder that is typified by distinctive facial features, including long philtrum, elongated face, and small ears. There are also often vision issues and oral malformations, including cleft lip and/or palate [98, 99].
UHRF2 is ubiquitin-like with PHD and ring finger domains 2. The gene is known for its role in cancer [100]. In our spatial transcriptomic analyses, we found expression of this gene in many regions of the brain, including the radial glial cells, and in the face. A recent study of mice lacking this gene found several consequences on brain function and learning and memory [101] (Additional file 2: Table S42). Another study of knockout mice showed they developed frequent seizures in adulthood with accompanying electrical activity abnormalities. They also found alterations in 5-methylcytosine levels in the brains of these mice [102, 103]. JAK2 is Janus kinase 2. It is involved in several cancers and hematopoietic disorders [104]. AK3 is adenylate kinase 3. It has been found to localize to the mitochondrial matrix within cells [105]. AK3 phosphorylates adenosine monophosphate using guanosine-5′-triphosphate (GTP) and is critical for the citric acid cycle [106]. CD274 is CD274 molecule. It is a gene implicated in cancer [107] and in the immune system and inflammatory response [108]. CDC37L1 is cell division cycle 37-like 1, Hsp90 cochaperone. It has expression throughout the brain in our spatial transcriptomics experiment. It has been indicated as a gene involved in cancer [109] and in regulating tau as part of tauopathies [110]. RFX3 is regulatory factor X3. In our spatial transcriptomics experiment, it is expressed in many places in the brain. It is well-established as a gene involved in autism [111] and is a regulator of cilia [112–115]. DOCK8 is dedicator of cytokinesis 8. It has been found as the gene involved in the autosomal recessive hyper-IgE syndrome with recurrent infections [116], for which hematopoietic stem cell transplantation has been recommended as a potential therapy [117]. It has also been identified as a regulator of microglia and potentially involved in neurodegeneration [118].
RANBP6 is RAN-binding protein 6, and little is known about this gene. DMRT1, DMRT2, and DMRT3 are part of a gene family and are called doublesex and mab-3 related transcription factors 1, 2, and 3, respectively. They are genes involved in sexual development [119]. In our spatial transcriptomic analyses, Dmrt1 and Dmrt3 were expressed in the radial glial cells. The DMRT gene family has been implicated in brain development [120]. KANK1 is KN motif and ankyrin repeat domains 1. Variants in this gene have been identified in individuals with cerebral palsy [121], autism [122], nephrotic syndrome [123], or male congenital genitourinary anomalies [124]. ZNG1A is Zn-regulated GTPase metalloprotein activator 1A. It is a gene involved in mitochondrial function and in the utilization of zinc. In mice lacking Zng1, zinc homeostasis is impaired, with effects related to zinc in the diet and on mitochondrial function [125]. PLPP6 is phospholipid phosphatase 6. Little is known about this gene beyond the observation that knockout mice exhibit effects related to allergies and lung function [126]. SMARCA2 is SWI/SNF-related BAF chromatin remodeling complex subunit ATPase 2, and mutations result in autosomal dominant conditions: blepharophimosis-impaired intellectual development syndrome [127] and Nicolaides-Baraitser syndrome [128]. Since this gene underlies a dominant disorder and some features overlap with 9p deletion syndrome, it is likely a contributor to this syndrome. PUM3 is pumilio RNA-binding family member 3, and little is known about its function. VLDLR is very low-density lipoprotein receptor. It is the gene involved in autosomal recessive cerebellar hypoplasia, impaired intellectual development, and dysequilibrium syndrome 1 [129]. It is a receptor of reelin and important for brain development [130]. Early work on VLDLR highlighted its role in lipid metabolism [131]. 
PTPRD is protein tyrosine phosphatase receptor type D, with specific expression in the brain [132]. Mice completely lacking Ptprd (homozygous null) have learning problems [132] (Additional file 2: Table S42). Furthermore, this gene is the orexigenic receptor for the asprosin hormone, and mice completely lacking the gene are lean, exhibiting appetite loss because they cannot respond to asprosin [133]. Many of these genes have phenotypes relevant to 9p-related syndromes. Because two of them have links to mitochondrial dysfunction, we tested the mitochondrial genome and found that individuals with 9p deletion syndrome have excess mitochondrial genome copy number in comparison with unaffected individuals.

Fig. 7.

Copy number status of 24 top priority genes in 9p deletion syndrome. Each box represents the copy number across the entire gene (rows) for a given individual (columns). Orange means the individual has a deletion of that gene, and white means they do not. The genes are sorted from the gene deleted in the most individuals to the gene deleted in the fewest. Note that smaller deletions within these genes are not represented in these estimates.
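
As an illustration of how a gene-by-individual deletion matrix like the one in Fig. 7 could be assembled and ordered, the following is a minimal sketch and not the authors' pipeline. The gene and individual identifiers, the toy copy number values, and the deletion cutoff of 1.5 are all assumptions made for the example.

```python
from typing import Dict, List

# Assumed cutoff: a whole-gene copy number estimate below this is called a deletion.
DELETION_THRESHOLD = 1.5


def deletion_matrix(
    copy_number: Dict[str, Dict[str, float]],
    threshold: float = DELETION_THRESHOLD,
) -> Dict[str, Dict[str, bool]]:
    """Map gene -> individual -> True if the whole-gene estimate looks deleted."""
    return {
        gene: {ind: cn < threshold for ind, cn in per_individual.items()}
        for gene, per_individual in copy_number.items()
    }


def sort_genes_by_deletion_count(matrix: Dict[str, Dict[str, bool]]) -> List[str]:
    """Order genes from most to fewest individuals with a deletion (ties by name)."""
    return sorted(matrix, key=lambda gene: (-sum(matrix[gene].values()), gene))


# Hypothetical toy data: three genes (rows) by three individuals (columns).
toy_copy_number = {
    "GENE_A": {"ind1": 1.0, "ind2": 1.1, "ind3": 2.0},
    "GENE_B": {"ind1": 0.9, "ind2": 2.0, "ind3": 2.1},
    "GENE_C": {"ind1": 1.0, "ind2": 1.0, "ind3": 0.95},
}

matrix = deletion_matrix(toy_copy_number)
gene_order = sort_genes_by_deletion_count(matrix)

for gene in gene_order:
    row = "".join("X" if matrix[gene][ind] else "." for ind in sorted(matrix[gene]))
    print(f"{gene}\t{row}")
```

Because each cell summarizes the whole gene, a small deletion within a gene would leave the estimate near two copies and would not be flagged here, which matches the caveat in the caption.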

Conclusions

Whole-genome sequencing is an important tool for understanding the genomics of human phenotypes. While it has not traditionally been a first-line test for genomic syndromes, including 9p-related syndromes, we show its utility through this research study. First, we built a comprehensive map of the genomic architecture of these syndromes using highly accurate gene copy number estimates from sophisticated analyses of whole-genome sequencing data (Fig. 1). Our first use of these data is to connect interested families with other families who have similar events, based on these estimates. We hypothesize that individuals close together in the map will share more phenotypes than those who are distant from each other. Future studies will integrate phenotypic data with information from this genomic map. As we show, 9p-related syndromes can be quite heterogeneous, with a minority of individuals having quite different genomic events than the majority. A CNVMatcher (akin to GeneMatcher) would be useful for researchers. Next, we built a machine learning model to classify 9p deletion syndromes using copy number estimates for genes on 9p. These models work well on our data, and future studies will expand on this work. We highlight a particularly challenging case from the genomic perspective in Fig. 2. While confirmations were performed using FISH, the predictions made by analyzing the sequencing data were all correct and resolved the variation in this individual. The second half of the paper focused on which critical genes underlie 9p deletion syndromes. We focused on individuals with a diagnosis of 9p deletion syndrome because the majority of our probands were from this group (n = 70). By employing several techniques, we centered on 24 genes of critical importance underlying 83% of individuals with 9p deletion syndrome. Some are well known and make sense regarding 9p-related phenotypes (e.g., RFX3, SMARCA2), while others are relatively enigmatic as to their function and the phenotypic consequences of their disruption (BRD10, GLIS3, RANBP6, PLPP6, PUM3). Future work should address these 24 genes to bring further clarity to these syndromes.
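
To make the idea of matching individuals with similar events concrete, the sketch below ranks individuals by how close their 9p gene copy number profiles are. It is a hypothetical illustration of what a CNVMatcher-style lookup might do, not an existing tool and not the method used in this study: it uses plain Euclidean distance on the raw estimates rather than the PCA-based map, and all identifiers and values are invented.

```python
import math
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # gene symbol -> estimated copy number


def profile_distance(a: Profile, b: Profile) -> float:
    """Euclidean distance between two profiles over the genes they share."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("profiles share no genes")
    return math.sqrt(sum((a[g] - b[g]) ** 2 for g in shared))


def closest_matches(
    query_id: str, profiles: Dict[str, Profile], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k individuals whose profiles are nearest to the query."""
    query = profiles[query_id]
    scored = [
        (other_id, profile_distance(query, profile))
        for other_id, profile in profiles.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]


# Hypothetical toy profiles for three individuals.
toy_profiles = {
    "fam1": {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 2.0},
    "fam2": {"GENE_A": 1.0, "GENE_B": 1.1, "GENE_C": 2.0},
    "fam3": {"GENE_A": 2.0, "GENE_B": 2.0, "GENE_C": 3.0},
}

matches = closest_matches("fam1", toy_profiles, k=1)
print(matches)
```

In this toy case fam2 is the nearest match to fam1 because their profiles differ only slightly at one gene. A real matcher would also need consent handling and a decision about how to weight genes, neither of which is addressed here.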

Supplementary Information

13073_2025_1563_MOESM1_ESM.pdf (5.4MB, pdf)

Additional file 1: Fig S1. Schematic of the 9p-ARCH Study. Fig S2. Scree Plot of PCA Shown in Main Fig. 1. Fig S3. Feature Importance of Random Forest Model. Fig S4. FISH Confirmation of Translocation in 9p.102.p1 Detected by Short-Read WGS. Fig S5. FISH Confirmation of Translocation in 9p.105.p1 Detected by Short-Read WGS. Fig S6. FISH Confirmation of Translocation in 9p.109.p1 Detected by Short-Read WGS. Fig S7. FISH Confirmation of Translocation in 9p.114.p1 Detected by Short-Read WGS. Fig S8. FISH Confirmation of Translocation in 9p.116.p1 Detected by Short-Read WGS. Fig S9. FISH Confirmation of Translocation in 9p.119.p1 Detected by Short-Read WGS. Fig S10. FISH Confirmation of Translocation in 9p.125.p1 Detected by Short-Read WGS. Fig S11. FISH Confirmation of Translocation in 9p.127.p1 Detected by Short-Read WGS. Fig S12. FISH Confirmation of Translocation in 9p.128.p1 Detected by Short-Read WGS. Fig S13. FISH Confirmation of Translocation in 9p.132.p1 Detected by Short-Read WGS. Fig S14. Chromosome Schematics of the 70 Individuals with 9p Deletion Syndrome Sequenced in this Study. Fig S15. Repli-Seq Data from ENCODE Across 9p.

13073_2025_1563_MOESM2_ESM.xlsx (13.5MB, xlsx)

Additional file 2: Table S1. Summary of Individuals with 9p-Related Syndrome and Family Members Analyzed by Whole Genome Sequencing. Table S2. Summary Information for the 9P-ARCH Cohort Collection at Washington University in St. Louis with 9p-Related Syndromes and Individuals from the NIGMS with 9p-Related Syndromes. Table S3. Summary Information for the 9p-ARCH External Data on Individuals with 9p-Related Syndromes. Table S4. Coverage of Illumina WGS Data for Individuals Sequenced in this Study. Table S5. CNPI Calculations on Illumina WGS Data for Individuals Sequenced in this Study. Table S6. Mitochondrial Genome Distance Calculated with EGP on Illumina WGS Data for Individuals Sequenced in this Study. Table S7. Calculated Gene Dosage on 9p in WashU 9p-Related Syndromes Samples Sequenced with Illumina WGS in this Study. Table S8. Calculated Gene Dosage on 9p in Coriell 9p-Related Syndromes Sequenced with Illumina WGS in this Study. Table S9. Calculated Gene Dosage on 9p in 1000 Genomes Project Samples Sequenced with Illumina WGS. Table S10. Final 9p-Related Events in WashU 9p Cohort Based on Illumina Short-Read Whole-Genome Sequencing. Table S11. Candidate Translocation Events Detected by Short-Read WGS and Tested by FISH. Table S12. Breakpoint Sequences and Repeat Information in 9p.123.p1. Table S13. De Novo Variants Detected with HAT in Illumina WGS Data for Trios Sequenced in this Study. Table S14. Comparison of 9p Copy Number for Genes on 9p in WashU 9p and 1000 Genomes WGS Data. Table S15. Genes that Do Not Have Constraint or Probability of Haploinsufficiency in GnomAD or Published Studies. Table S16. VCF file of denovo-db 1.8 variants on 9p in 72,794 individuals or overlap phenotype with the present study. Table S17. Diamonds 5' UTR Results Using Chimp on 62,181 individuals in denovo-db 1.8. Table S18. Diamonds 5' UTR Results Using Fugu on 62,181 individuals in denovo-db 1.8. Table S19. fitDNM DNV Enrichment in 5' UTR on 62,181 individuals in denovo-db 1.8. 
Table S20. Diamonds 3' UTR Results Using Chimp with Blue Highlight Multi-Test Correction Significant Regions on 62,181 individuals in denovo-db 1.8. Table S21. Diamonds 3' UTR Results Using Fugu with Blue Highlight Multi-Test Correction Significant Regions on 62,181 individuals in denovo-db 1.8. Table S22. fitDNM DNV Enrichment in 3' UTR on 62,181 individuals in denovo-db 1.8. Table S23. Diamonds Promoter Results Using Chimp with Blue Highlight Multi-Test Correction Significant Regions on 62,181 individuals in denovo-db 1.8. Table S24. Diamonds Promoter Results Using Fugu with Blue Highlight Multi-Test Correction Significant Regions on 62,181 individuals in denovo-db 1.8. Table S25. fitDNM DNV Enrichment in Promoters on 62,181 individuals in denovo-db 1.8. Table S26. Diamonds Loss-of-Function and Missense DNVs in Coding Exons Results Using Chimp with Blue Highlight Multi-Test Correction Significant Regions on 62,181 individuals in denovo-db 1.8. Table S27. Diamonds Loss-of-Function and Missense DNVs in Coding Exons Results Using Fugu with Blue Highlight Multi-Test Correction Significant Regions on 62,181 individuals in denovo-db 1.8. Table S28. fitDNM DNV Enrichment in Coding Exons on 62,181 individuals in denovo-db 1.8. Table S29. Diamonds Synonymous DNVs in Coding Exons Results Using Chimp on 62,181 individuals in denovo-db 1.8. Table S30. Diamonds Synonymous DNVs in Coding Exons Results Using Fugu on 62,181 individuals in denovo-db 1.8. Table S31. DenovolyzeR Results for Genes on 9p with Blue Highlight Multi-Test Correction Significant Genes on 62,181 individuals in denovo-db 1.8. Table S32. DNVs Detected with HAT in 5,824 Individuals with Autism on 9p in SPARK and SSC WGS Datasets. Table S33. fitDNM DNV Enrichment in Fetal Brain Candidate Cis-Regulatory Elements on HAT WGS Data from 5,824 with Autism. Table S34. fitDNM DNV Enrichment in All ENCODE Candidate Cis-Regulatory Elements on HAT WGS Data from 5,824 with Autism. Table S35. 
fitDNM DNV Enrichment in VISTA Candidate Cis-Regulatory Elements on HAT WGS Data from 5,824 with Autism. Table S36. Diamonds DNVs in Fetal Brain Candidate Regulatory Elements Chimp on HAT WGS from 5,824 with Autism. Table S37. Diamonds DNVs in Fetal Brain Candidate Regulatory Elements Fugu on HAT WGS from 5,824 with Autism. Table S38. Diamonds DNVs in All ENCODE Candidate Regulatory Elements Chimp on HAT WGS from 5,824 with Autism. Table S39. Diamonds DNVs in All ENCODE Candidate Regulatory Elements Fugu on HAT WGS from 5,824 with Autism. Table S40. Diamonds DNVs in All ENCODE Candidate Regulatory Elements Chimp on HAT WGS from 5,824 with Autism. Table S41. Top Priority 9p Genes in 9p Deletion Syndrome Based on Genomic Datasets. Table S42. MouseMine Results for the Top 24 Genes in 9p Deletion Syndrome.

Acknowledgements

Thank you to the Chromosome 9P Minus Network and all the research participants who participated in this study. This work was supported by grants from the National Institutes of Health (R00MH117165 to T.N.T., R01MH126933 to T.N.T., R03HD116062 to T.N.T., P50HD103525 to T.N.T.), the Washington University in St. Louis Just-In-Time Core Usage Funding Program (JIT896 to T.N.T., JIT1174 to T.N.T.), the Simons Foundation (734069 to T.N.T.), Washington University in St. Louis laboratory startup funds to T.N.T., and a gift from Leon Eidelman and Sara Israel to the McDonnell Genome Institute. Thank you to Dr. Nara Sobreira, Dr. Joon-Yong An, Dr. Hee Jeong Yoo, Dr. Maria Chahrour, and Dr. Bryn Dionna Webb for searching your databases for additional individuals with 9p-related syndromes. The following cell lines/DNA samples were obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research: GM00705, GM00870, GM01388, GM01750, GM01751, GM02820, GM03226, GM03369, GM03563, GM05067, GM05347, GM05394, GM09132, GM10994, GM11520, GM12722. Thank you to Eunice Horton at the Coriell Institute for Medical Research for help with obtaining the DNA for these Coriell samples. Thank you to the New York Genome Center for generating the SSC and SPARK whole-genome sequencing data through the CCDG National Institutes of Health grant (UM1HG008901). We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren, E. Wijsman). We appreciate obtaining access to SSC whole-genome sequencing data on SFARI Base. 
Approved researchers can obtain the SSC population dataset described in this study by applying at https://base.sfari.org. We are grateful to all the families in SPARK, the SPARK clinical sites and SPARK staff. We appreciate obtaining access to SPARK whole-genome sequencing data on SFARI Base. Approved researchers can obtain the SPARK population dataset described in this study by applying at https://base.sfari.org. Thank you to Dr. Tristan Hayeck and Dr. Andrew Allen for previously developing and publishing fitDNM. Thank you to The Jackson Laboratory Model Generation Services and Breeding Services for creating and breeding the mice, and The Jackson Laboratory Preclinical Services for harvesting and processing the mouse samples.

Abbreviations

9P-ARCH

Advanced Research in Chromosomal Health: Genomic, Phenotypic, and Functional Aspects of 9p-Related syndromes

BAF

Brahma-associated factor

CADD

Combined annotation-dependent depletion

CCDG

Centers for Common Disease Genomics

CLR

Complete Long Read

CNPI

Copy Number Private Investigator

CNV

Copy number variant

CRE

Cis-regulatory element

DNA

Deoxyribonucleic acid

DNV

De novo variant

E13.5

Embryonic day 13.5

EGP

El genoma pequeño

ENCODE

The Encyclopedia of DNA Elements

FISH

Fluorescent in situ hybridization

GTG banding

G-banding using trypsin and Giemsa

GTW banding

G-banding using trypsin and Wright

HAT

Hare And Tortoise

HDF5

Hierarchical Data Format version 5

Indel

Insertion/deletion

IRB

Institutional Review Board

Lof

Loss of function

LRR

Late-replicating regions

MEGA

Molecular Evolutionary Genetics Analysis

mis

Missense

MUSCLE

Multiple sequence comparison by log-expectation

NDD

Neurodevelopmental disorder

NIGMS

National Institute of General Medical Sciences

PBMC

Peripheral blood mononuclear cell

PBS

Phosphate-buffered saline

PCA

Principal component analysis

PFA

Paraformaldehyde

Prot

Protein-altering

QC

Quality check

QQ

Quantile–quantile

RNA

Ribonucleic acid

SNP

Single-nucleotide polymorphism

SNV

Single-nucleotide variant

SPARK

Simons Foundation Powering Autism Research

SSC

Simons Simplex Collection

STR

Short-tandem repeat

Syn

Synonymous

UCSC

University of California, Santa Cruz

UMI

Unique molecular identifiers

UTR

Untranslated region

VCF

Variant Call Format

WashU

Washington University in St. Louis

WGS

Whole-genome sequencing

Authors’ contributions

Y.W.: Formal analysis, Investigation, Writing – Review & Editing, Visualization. E.I.S.: Formal analysis, Investigation, Writing – Review & Editing, Visualization. R.S.: Formal analysis, Investigation, Writing – Review & Editing, Visualization. S.C.: Formal analysis, Investigation, Writing – Review & Editing, Visualization. E.C.H.: Formal analysis. S.T.: Investigation. Y.C.H.: Investigation. C.M.: Investigation. K.V.: Investigation. V.T.: Investigation. K.B.: Investigation. E.A.: Investigation. T.A.: Investigation. R.S.T.: Investigation. Y.C.: Investigation. A.N.: Investigation. Y.L.: Formal Analysis. N.J.: Investigation. R.G.: Investigation. T.L.: Investigation. J.M.: Investigation. S.C.: Investigation. M.K.: Investigation. J.U.: Formal Analysis. T.A.: Investigation. J.K.N.: Formal Analysis. A.E.: Investigation. L.M.: Investigation. T.D.: Investigation. K.N.L.: Formal Analysis. T.S.: Investigation. B.T.: Investigation. K.W.: Formal Analysis. N.S.: Investigation. M.M.: Investigation, Formal Analysis. A.K.: Investigation, Formal Analysis. M.C.: Investigation, Formal Analysis. H.G.: Investigation, Formal Analysis. K.K.: Resources. C.G.: Consultant, Writing – Review & Editing. S.K.D.: Consultant. C.G.: Resources. Y.E.L.: Resources, Data Curation. M.W.M.: Resources, Investigation, Writing – Review & Editing. K.A.P.: Resources, Investigation. A.H.: Investigation. J.A.R.: Resources, Investigation, Writing – Review & Editing. W.B.: Investigation, Resources. P.S.: Investigation, Resources. H.C.: Investigation, Resources. J.P.: Investigation, Resources. C.M.G.: Investigation, Resources. Z.D.: Investigation, Resources. E.P.: Investigation, Resources. C.E.P.: Consultant, Writing – Review & Editing. F.K.: Consultant. D.A.: Formal Analysis. A.I.: Investigation, Resources. M.H.: Investigation. R.H.: Investigation. R.F.: Investigation. S.T.: Consultant, Resources. L.A.: Investigation. X.C.: Investigation. R.D.M.: Investigation, Writing – Review & Editing. F.S.C.: Investigation. 
J.N.: Methodology, Validation, Formal Analysis, Investigation, Resources, Data Curation, Writing – Review & Editing, Visualization, Supervision, Project Administration. P.I.D.: Methodology, Validation, Formal Analysis, Investigation, Resources, Data Curation, Writing – Review & Editing, Visualization, Supervision, Project Administration. J.M.: Methodology, Validation, Formal Analysis, Investigation, Resources, Data Curation, Writing – Review & Editing, Visualization, Supervision, Project Administration, Funding Acquisition. T.N.T.: Conceptualization, Methodology, Software, Validation, Formal Analysis, Investigation, Resources, Data Curation, Writing – Original Draft, Visualization, Supervision, Project Administration, Funding Acquisition. All authors read and approved the final manuscript.

Funding

This work was supported by grants from the National Institutes of Health (R00MH117165 to T. N. T., R01MH126933 to T. N. T., R03HD116062 to T. N. T., P50HD103525 to T. N. T., UM1HG008901 for sequencing of the SSC and SPARK whole-genome sequencing data at the New York Genome Center), the Washington University in St. Louis Just-In-Time Core Usage Funding Program (JIT896 to T. N. T., JIT1174 to T. N. T.), the Simons Foundation (734069 to T. N. T.), Washington University in St. Louis laboratory startup funds to T. N. T., and a gift from Leon Eidelman and Sara Israel to the McDonnell Genome Institute.

Data availability

Code for DiamondsDenovo is available at https://github.com/TNTurnerLab/DiamondsDenovo. Denovo-db version 1.8 is available at DOI 10.5281/zenodo.13901295 and https://zenodo.org/records/13901296. Sequencing data for a subset of samples are available through dbGaP (accession: phs004000.v1.p1, https://dbgap.ncbi.nlm.nih.gov/study/phs004000.v1.p1/). Data supporting this study are kept in an institutional repository and are available from the corresponding author upon reasonable request.

Declarations

Ethics approval and consent to participate

Each individual was consented into the study using an age-appropriate form of consent/assent and in accordance with the Human Research Protection Office at Washington University in St. Louis (IRB no. 201706062). All research was conducted in accordance with the Declaration of Helsinki. The following cell lines/DNA samples were obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research: GM00705, GM00870, GM01388, GM01750, GM01751, GM02820, GM03226, GM03369, GM03563, GM05067, GM05347, GM05394, GM09132, GM10994, GM11520, and GM12722.

Consent for publication

Each individual was consented to the study using an age-appropriate form of consent/assent and in accordance with the Human Research Protection Office at Washington University in St. Louis (IRB no. 201706062). All participants gave written informed consent to have data published.

Competing interests

The Department of Molecular and Human Genetics at Baylor College of Medicine receives revenue from clinical genetic testing conducted at Baylor Genetics Laboratory. N.S., M.M., A.K., M.C., and H.G. are employees of Medgenome Laboratory and K.W. is an employee of Illumina. The affiliation of authors with these entities did not influence the design of the study, data collection, analysis, or interpretation of results. All findings and conclusions presented in this study are solely the responsibility of the authors and do not reflect the views or interests of these affiliations. The remaining authors declare that they do not have conflicting interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yingxi Wang, Eleanor I. Sams, Rachel Slaugh, and Sandra Crocker are co-first authors.

Julie Neidich, Patricia I. Dickson, Jeffrey Milbrandt, and Tychele N. Turner are senior authors.

References

  • 1. Sams EI, Ng JK, Tate V, Claire Hou YC, Cao Y, Antonacci-Fulton L, et al. From karyotypes to precision genomics in 9p deletion and duplication syndromes. HGG Adv. 2022;3(1):100081.
  • 2. Swinkels ME, Simons A, Smeets DF, Vissers LE, Veltman JA, Pfundt R, et al. Clinical and cytogenetic characterization of 13 Dutch patients with deletion 9p syndrome: delineation of the critical region for a consensus phenotype. Am J Med Genet A. 2008;146a(11):1430–8.
  • 3. Huret JL, Leonard C, Forestier B, Rethore MO, Lejeune J. Eleven new cases of del(9p) and features from 80 cases. J Med Genet. 1988;25(11):741–9.
  • 4. Young RS, Reed T, Hodes ME, Palmer CG. The dermatoglyphic and clinical features of the 9p trisomy and partial 9p monosomy syndromes. Hum Genet. 1982;62(1):31–9.
  • 5. Starosta RT, Jensen N, Couteranis S, Slaugh R, Easterlin D, Tate V, et al. Using a new analytic approach for genotyping and phenotyping chromosome 9p deletion syndrome. Eur J Hum Genet. 2024;32(9):1095–105.
  • 6. Alfi O, Donnell GN, Crandall BF, Derencsenyi A, Menon R. Deletion of the short arm of chromosome no.9 (46,9p-): a new deletion syndrome. Ann Genet. 1973;16(1):17–22.
  • 7. Alfi OS, Donnell GN, Allderdice PW, Derencsenyi A. The 9p- syndrome. Ann Genet. 1976;19(1):11–6.
  • 8. Banerjee I, Senniappan S, Laver TW, Caswell R, Zenker M, Mohnike K, et al. Refinement of the critical genomic region for congenital hyperinsulinism in the chromosome 9p deletion syndrome. Wellcome Open Res. 2019;4:149.
  • 9. Christ LA, Crowe CA, Micale MA, Conroy JM, Schwartz S. Chromosome breakage hotspots and delineation of the critical region for the 9p-deletion syndrome. Am J Hum Genet. 1999;65(5):1387–95.
  • 10. Faas BH, de Leeuw N, Mieloo H, Bruinenberg J, de Vries BB. Further refinement of the candidate region for monosomy 9p syndrome. Am J Med Genet A. 2007;143a(19):2353–6.
  • 11. Mitsui N, Shimizu K, Nishimoto H, Mochizuki H, Iida M, Ohashi H. Patient with terminal 9 Mb deletion of chromosome 9p: refining the critical region for 9p monosomy syndrome with trigonocephaly. Congenit Anom (Kyoto). 2013;53(1):49–53.
  • 12. Mohamed AM, Kamel AK, Eid MM, Eid OM, Mekkawy M, Hussein SH, et al. Chromosome 9p terminal deletion in nine Egyptian patients and narrowing of the critical region for trigonocephaly. Mol Genet Genomic Med. 2021;9:e1829.
  • 13. Schwartz SBS, Christ L, Eichenmiller M, Graf M, Vance H, Crowe C, editor. Delineation of chromosome 9p deletions: a model of phenotype and chromosomal mechanisms for terminal deletions. Rockville: American Society of Human Genetics; 2005.
  • 14. Kawara H, Yamamoto T, Harada N, Yoshiura K, Niikawa N, Nishimura A, et al. Narrowing candidate region for monosomy 9p syndrome to a 4.7-Mb segment at 9p22.2-p23. Am J Med Genet A. 2006;140(4):373–7.
  • 15. Hauge X, Raca G, Cooper S, May K, Spiro R, Adam M, et al. Detailed characterization of, and clinical correlations in, 10 patients with distal deletions of chromosome 9p. Genet Med. 2008;10(8):599–611.
  • 16. Barbaro M, Balsamo A, Anderlid BM, Myhre AG, Gennari M, Nicoletti A, et al. Characterization of deletions at 9p affecting the candidate regions for sex reversal and deletion 9p syndrome by MLPA. Eur J Hum Genet. 2009;17(11):1439–47.
  • 17. Recalcati MP, Bellini M, Norsa L, Ballarati L, Caselli R, Russo S, et al. Complex rearrangement involving 9p deletion and duplication in a syndromic patient: genotype/phenotype correlation and review of the literature. Gene. 2012;502(1):40–5.
  • 18. Di Bartolo DL, El Naggar M, Owen R, Sahoo T, Gilbert F, Pulijaal VR, et al. Characterization of a complex rearrangement involving duplication and deletion of 9p in an infant with craniofacial dysmorphism and cardiac anomalies. Mol Cytogenet. 2012;5(1):31.
  • 19. Onesimo R, Orteschi D, Scalzone M, Rossodivita A, Nanni L, Zannoni GF, et al. Chromosome 9p deletion syndrome and sex reversal: novel findings and redefinition of the critically deleted regions. Am J Med Genet A. 2012;158a(9):2266–71.
  • 20. Helbig J, Kunz J, Mannhardt A, Gandaputra E, Kling C. Distal 1q duplication and distal 9p deletion: a follow-up case report and literature review on candidate genes for 9p deletion syndrome. Am J Med Genet A. 2025;197(8):e64066.
  • 21. Kowalczyk M, Tomaszewska A, Podbioł-Palenta A, Constantinou M, Wawrzkiewicz-Witkowska A, Kowalski J, et al. Another rare case of a child with de novo terminal 9p deletion and co-existing interstitial 9p duplication: clinical findings and molecular cytogenetic study by array-CGH. Cytogenet Genome Res. 2013;139(1):9–16.
  • 22. Ng J, Sams E, Baldridge D, Kremitzki M, Wegner DJ, Lindsay T, et al. Precise breakpoint detection in a patient with 9p- syndrome. Cold Spring Harbor Mol Case Stud. 2020;6(3):a005348.
  • 23. Turner TN, Yi Q, Krumm N, Huddleston J, Hoekzema K, F Stessman HA, Doebley AL, Bernier RA, Nickerson DA, Eichler EE. denovo-db: a compendium of human de novo variants. Nucleic Acids Res. 2017;45(D1):D804–11. doi:10.1093/nar/gkw865. Epub 5 Oct 2016. PMID: 27907889; PMCID: PMC5210614.
  • 24. Turner TN. denovo-db. Zenodo; 2024. doi:10.5281/zenodo.13901295. Available from: https://zenodo.org/records/13901296.
  • 25. Collins RL, Brand H, Karczewski KJ, Zhao X, Alföldi J, Francioli LC, et al. A structural variation reference for medical and population genetics. Nature. 2020;581(7809):444–51.
  • 26. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581(7809):434–43.
  • 27. Miller SA, Dykes DD, Polesky HF. A simple salting out procedure for extracting DNA from human nucleated cells. Nucleic Acids Res. 1988;16(3):1215.
  • 28. Zhang Y, Zhang Y, Burke JM, Gleitsman K, Friedrich SM, Liu KJ, et al. A simple thermoplastic substrate containing hierarchical silica lamellae for high-molecular-weight DNA extraction. Adv Mater. 2016;28(48):10630–6.
  • 29. Wang Y, Turner TN. Assessment of complex chromosomal changes in de-identified cell lines. dbGaP; 2025. Available from: https://dbgap.ncbi.nlm.nih.gov/study/phs004000.v1.p1/. Accession: phs004000.v1.p1.
  • 30. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv e-prints. 2013. Available from: https://ui.adsabs.harvard.edu/abs/2013arXiv1303.3997L.
  • 31. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
  • 32. Picard. https://broadinstitute.github.io/picard/.
  • 33. Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Källberg M, et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2015;32(8):1220–2.
  • 34. Turner TN. EGP. 2024.
  • 35. Shen F, Kidd JM. Rapid, paralog-sensitive CNV analysis of 2457 human genomes using QuicK-mer2. Genes (Basel). 2020;11(2):141.
  • 36. Poplin R, Chang PC, Alexander D, Schwartz S, Colthurst T, Ku A, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018;36(10):983–7.
  • 37. Dolzhenko E, Deshpande V, Schlesinger F, Krusche P, Petrovski R, Chen S, et al. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Bioinformatics. 2019;35(22):4754–6.
  • 38. Mousavi N, Shleizer-Burko S, Yanicky R, Gymrek M. Profiling the genome-wide landscape of tandem repeat expansions. Nucleic Acids Res. 2019;47(15):e90.
  • 39. Ustanik J, Turner TN. CNPI: Rapid Analyses of Human Copy Number Data. J Mol Biol. 2025;437(19):169313. doi:10.1016/j.jmb.2025.169313. Epub 28 Jun 2025. PMID: 40588120.
  • 40. Ng JK, Vats P, Fritz-Waters E, Sarkar S, Sams EI, Padhi EM, Payne ZL, Leonard S, West MA, Prince C, Trani L, Jansen M, Vacek G, Samadi M, Harkins TT, Pohl C, Turner TN. de novo variant calling identifies cancer mutation signatures in the 1000 Genomes Project. Hum Mutat. 2022;43(12):1979–93. doi:10.1002/humu.24455. Epub 10 Sep 2022. PMID: 36054329; PMCID: PMC9771978.
  • 41. Ng JK, Turner TN. HAT: de novo variant calling for highly accurate short-read and long-read sequencing data. Bioinformatics. 2024;40(1):btad775.
  • 42. Turner TN. Acorn: an R package for de novo variant analysis. BMC Bioinformatics. 2023;24(1):330.
  • 43. Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, et al. High-coverage whole-genome sequencing of the expanded 1000 genomes project cohort including 602 trios. Cell. 2022;185(18):3426-40.e19.
  • 44. Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28(5):1–26.
  • 45. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl variant effect predictor. Genome Biol. 2016;17(1):122.
  • 46. Fischbach GD, Lord C. The simons simplex collection: a resource for identification of autism genetic risk factors. Neuron. 2010;68(2):192–5.
  • 47. SPARK. SPARK: a US cohort of 50,000 families to accelerate autism research. Neuron. 2018;97(3):488–93.
  • 48. Morgulis A, Gertz EM, Schäffer AA, Agarwala R. A fast and symmetric DUST implementation to mask low-complexity DNA sequences. J Comput Biol. 2006;13(5):1028–40.
  • 49. Kuhn RM, Haussler D, Kent WJ. The UCSC genome browser and associated tools. Brief Bioinform. 2013;14(2):144–61.
  • 50. Mannens CCA, Hu L, Lönnerberg P, Schipper M, Reagor CC, Li X, He X, Barker RA, Sundström E, Posthuma D, Linnarsson S. Chromatin accessibility during human first-trimester neurodevelopment. Nature. 2024. doi:10.1038/s41586-024-07234-1. Epub ahead of print. PMID: 38693260.
  • 51.Preissl S, Fang R, Huang H, Zhao Y, Raviram R, Gorkin DU, et al. Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation. Nat Neurosci. 2018;21(3):432–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.ENCODE. The ENCODE (ENCyclopedia of DNA elements) project. Science. 2004;306(5696):636–40. [DOI] [PubMed] [Google Scholar]
  • 53.Kosicki M, Baltoumas FA, Kelman G, Boverhof J, Ong Y, Cook LE, et al. Vista enhancer browser: an updated database of tissue-specific developmental enhancers. Nucleic Acids Res. 2025;53(D1):D324–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Visel A, Minovitsky S, Dubchak I, Pennacchio LA. Vista enhancer browser–a database of tissue-specific human enhancers. Nucleic Acids Res. 2007;35(Database issue):D88-92. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Turner TN. DiamondsDenovo. Zenodo; 2025. 10.5281/zenodo.17260099. Available from: https://zenodo.org/records/17260100.
  • 56.Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Edgar RC. Muscle: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5(1):113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science. 2002;297(5585):1301–10. [DOI] [PubMed] [Google Scholar]
  • 59.Ware JS, Samocha KE, Homsy J, Daly MJ. Interpreting de novo variation in human disease using denovolyzeR. Curr Protoc Hum Genet. 2015;87(1):7.25.1-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Jiang Y, Han Y, Petrovski S, Owzar K, Goldstein DB, Allen AS. Incorporating functional information in tests of excess de novo mutational load. Am J Hum Genet. 2015;97(2):272–83. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Padhi EM, Hayeck TJ, Cheng Z, Chatterjee S, Mannion BJ, Byrska-Bishop M, et al. Coding and noncoding variants in EBF3 are involved in HADDS and simplex autism. Hum Genomics. 2021;15(1):44. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Motenko H, Neuhauser SB, O’Keefe M, Richardson JE. MouseMine: a new data warehouse for MGI. Mamm Genome. 2015;26(7–8):325–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Kloosterman WP, Guryev V, van Roosmalen M, Duran KJ, de Bruijn E, Bakker SC, et al. Chromothripsis as a mechanism driving complex de novo structural rearrangements in the germline. Hum Mol Genet. 2011;20(10):1916–24. [DOI] [PubMed] [Google Scholar]
  • 64.Leibowitz ML, Zhang CZ, Pellman D. Chromothripsis: a new mechanism for rapid karyotype evolution. Annu Rev Genet. 2015;49:183–211. [DOI] [PubMed] [Google Scholar]
  • 65.Poot M. Genes, proteins, and biological pathways preventing chromothripsis. Methods Mol Biol. 2018;1769:231–51. [DOI] [PubMed] [Google Scholar]
  • 66.Hansen RS, Thomas S, Sandstrom R, Canfield TK, Thurman RE, Weaver M, et al. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc Natl Acad Sci U S A. 2010;107(1):139–44. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Kumar R, Nagpal G, Kumar V, Usmani SS, Agrawal P, Raghava GPS. Humcfs: a database of fragile sites in human chromosomes. BMC Genomics. 2019;19(Suppl 9):985. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Sawińska M, Schmitt JG, Sagulenko E, Westermann F, Schwab M, Savelyeva L. Novel aphidicolin-inducible common fragile site FRA9G maps to 9p22.2, within the C9orf39 gene. Genes Chromosomes Cancer. 2007;46(11):991–9. [DOI] [PubMed] [Google Scholar]
  • 69.Sutherland GR, Parslow MI, Baker E. New classes of common fragile sites induced by 5-azacytidine and bromodeoxyuridine. Hum Genet. 1985;69(3):233–7. [DOI] [PubMed] [Google Scholar]
  • 70.Mirceta M, Schmidt MHM, Shum N, Prasolava TK, Meikle B, Lanni S, et al. C9orf72 repeat expansion creates the unstable folate-sensitive fragile site FRA9A. NAR Mol Med. 2024;1(4):ugae019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Rocchi A, Pelliccia F. Synergistic effect of DAPI and thymidylate stress conditions on the induction of common fragile sites. Cytogenet Cell Genet. 1988;48(1):51–4. [DOI] [PubMed] [Google Scholar]
  • 72.Wei PC, Lee CS, Du Z, Schwer B, Zhang Y, Kao J, et al. Three classes of recurrent DNA break clusters in brain progenitors identified by 3D proximity-based break joining assay. Proc Natl Acad Sci U S A. 2018;115(8):1919–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Korbel JO, Campbell PJ. Criteria for inference of chromothripsis in cancer genomes. Cell. 2013;152(6):1226–36. [DOI] [PubMed] [Google Scholar]
  • 74.Kinsella M, Patel A, Bafna V. The elusive evidence for chromothripsis. Nucleic Acids Res. 2014;42(13):8231–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Krupina K, Goginashvili A, Cleveland DW. Scrambling the genome in cancer: causes and consequences of complex chromosome rearrangements. Nat Rev Genet. 2024;25(3):196–210. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Leibowitz ML, Papathanasiou S, Doerfler PA, Blaine LJ, Sun L, Yao Y, et al. Chromothripsis as an on-target consequence of CRISPR-Cas9 genome editing. Nat Genet. 2021;53(6):895–905. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Arya P, Hodge JC, Matlock PA, Vance GH, Breman AM. Two patients with complex rearrangements suggestive of germline chromoanagenesis. Cytogenet Genome Res. 2020;160(11–12):671–9. [DOI] [PubMed] [Google Scholar]
  • 78.Zepeda-Mendoza CJ, Morton CC. The iceberg under water: unexplored complexity of chromoanagenesis in congenital disorders. Am J Hum Genet. 2019;104(4):565–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Storchová Z, Kloosterman WP. The genomic characteristics and cellular origin of chromothripsis. Curr Opin Cell Biol. 2016;40:106–13. [DOI] [PubMed] [Google Scholar]
  • 80.Kloosterman WP, Cuppen E. Chromothripsis in congenital disorders and cancer: similarities and differences. Curr Opin Cell Biol. 2013;25(3):341–8. [DOI] [PubMed] [Google Scholar]
  • 81.Dimitri P, De Franco E, Habeb AM, Gurbuz F, Moussa K, Taha D, et al. An emerging, recognizable facial phenotype in association with mutations in GLI-similar 3 (GLIS3). Am J Med Genet A. 2016;170(7):1918–23. [DOI] [PubMed] [Google Scholar]
  • 82.Hall AN, Turner TN, Queitsch C. Thousands of high-quality sequencing samples fail to show meaningful correlation between 5S and 45S ribosomal DNA arrays in humans. Sci Rep. 2021;11(1):449. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Sobreira N, Schiettecatte F, Valle D, Hamosh A. GeneMatcher: a matching tool for connecting investigators with an interest in the same gene. Hum Mutat. 2015;36(10):928–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Brownstein CA, Smith RS, Rodan LH, Gorman MP, Hojlo MA, Garvey EA, et al. RCL1 copy number variants are associated with a range of neuropsychiatric phenotypes. Mol Psychiatry. 2021;26(5):1706–18. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Amin N, de Vrij FMS, Baghdadi M, Brouwer RWW, van Rooij JGJ, Jovanova O, et al. A rare missense variant in RCL1 segregates with depression in extended families. Mol Psychiatry. 2018;23(5):1120–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Wang Y, Zhao Z, Yu H, Shi H, Tao B, He Y, et al. Stability and function of RCL1 are dependent on the interaction with BMS1. J Mol Cell Biol. 2024;15(7):mjad046. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Kang HS, Beak JY, Kim YS, Herbert R, Jetten AM. Glis3 is associated with primary cilia and Wwtr1/TAZ and implicated in polycystic kidney disease. Mol Cell Biol. 2009;29(10):2556–69. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Myles-Worsley M, Tiobech J, Browning SR, Korn J, Goodman S, Gentile K, et al. Deletion at the SLC1A1 glutamate transporter gene co-segregates with schizophrenia and bipolar schizoaffective disorder in a 5-generation family. Am J Med Genet B Neuropsychiatr Genet. 2013;162b(2):87–95. [DOI] [PubMed] [Google Scholar]
  • 89.Rees E, Walters JT, Chambert KD, O’Dushlaine C, Szatkiewicz J, Richards AL, et al. CNV analysis in a large schizophrenia sample implicates deletions at 16p12.1 and SLC1A1 and duplications at 1p36.33 and CGNL1. Hum Mol Genet. 2014;23(6):1669–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Bailey CG, Ryan RM, Thoeng AD, Ng C, King K, Vanslambrouck JM, et al. Loss-of-function mutations in the glutamate transporter SLC1A1 cause human dicarboxylic aminoaciduria. J Clin Invest. 2011;121(1):446–53. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Stewart SE, Fagerness JA, Platko J, Smoller JW, Scharf JM, Illmann C, et al. Association of the SLC1A1 glutamate transporter gene and obsessive-compulsive disorder. Am J Med Genet B Neuropsychiatr Genet. 2007;144b(8):1027–33. [DOI] [PubMed] [Google Scholar]
  • 92.Karan KR, Satishchandra P, Sinha S, Anand A. Rare SLC1A1 variants in hot water epilepsy. Hum Genet. 2017;136(6):693–703. [DOI] [PubMed] [Google Scholar]
  • 93.Szatmari P, Paterson AD, Zwaigenbaum L, Roberts W, Brian J, Liu XQ, et al. Mapping autism risk loci using genetic linkage and chromosomal rearrangements. Nat Genet. 2007;39(3):319–28. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Aoyama K, Suh SW, Hamby AM, Liu J, Chan WY, Chen Y, et al. Neuronal glutathione deficiency and age-dependent neurodegeneration in the EAAC1 deficient mouse. Nat Neurosci. 2006;9(1):119–26. [DOI] [PubMed] [Google Scholar]
  • 95.Berman AE, Chan WY, Brennan AM, Reyes RC, Adler BL, Suh SW, et al. N-acetylcysteine prevents loss of dopaminergic neurons in the EAAC1-/- mouse. Ann Neurol. 2011;69(3):509–20. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Kantojärvi K, Onkamo P, Vanhala R, Alen R, Hedman M, Sajantila A, et al. Analysis of 9p24 and 11p12-13 regions in autism spectrum disorders: rs1340513 in the JMJD2C gene is associated with ASDs in Finnish sample. Psychiatr Genet. 2010;20(3):102–8. [DOI] [PubMed] [Google Scholar]
  • 97.Berry WL, Janknecht R. KDM4/JMJD2 histone demethylases: epigenetic regulators in cancer cells. Cancer Res. 2013;73(10):2936–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Unlu G, Qi X, Gamazon ER, Melville DB, Patel N, Rushing AR, et al. Phenome-based approach identifies RIC1-linked Mendelian syndrome through zebrafish models, biobank associations and clinical studies. Nat Med. 2020;26(1):98–109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Lansdon LA, Dickinson A, Arlis S, Liu H, Hlas A, Hahn A, et al. Genome-wide analysis of copy-number variation in humans with cleft lip and/or cleft palate identifies COBLL1, RIC1, and ARHGEF38 as clefting genes. Am J Hum Genet. 2023;110(1):71–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Zhang Y, Wu K, Liu Y, Sun S, Shao Y, Li Q, et al. UHRF2 promotes the malignancy of hepatocellular carcinoma by PARP1 mediated autophagy. Cell Signal. 2023;109:110782. [DOI] [PubMed] [Google Scholar]
  • 101.Zhuang Y, Li C, Zhao F, Yan Y, Pan H, Zhan J, et al. E3 ubiquitin ligase Uhrf2 knockout reveals a critical role in social behavior and synaptic plasticity in the hippocampus. Int J Mol Sci. 2024;25(3):1543. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Liu Y, Zhang B, Meng X, Korn MJ, Parent JM, Lu LY, et al. UHRF2 regulates local 5-methylcytosine and suppresses spontaneous seizures. Epigenetics. 2017;12(7):551–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Chen R, Zhang Q, Duan X, York P, Chen GD, Yin P, et al. The 5-hydroxymethylcytosine (5hmC) reader UHRF2 is required for normal levels of 5hmC in mouse adult brain and spatial learning and memory. J Biol Chem. 2017;292(11):4533–43. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Levine RL, Pardanani A, Tefferi A, Gilliland DG. Role of JAK2 in the pathogenesis and therapy of myeloproliferative disorders. Nat Rev Cancer. 2007;7(9):673–83. [DOI] [PubMed] [Google Scholar]
  • 105.Noma T, Fujisawa K, Yamashiro Y, Shinohara M, Nakazawa A, Gondo T, et al. Structure and expression of human mitochondrial adenylate kinase targeted to the mitochondrial matrix. Biochem J. 2001;358(Pt 1):225–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Rogne P, Dulko-Smith B, Goodman J, Rosselin M, Grundström C, Hedberg C, et al. Structural basis for GTP versus ATP selectivity in the NMP kinase AK3. Biochemistry. 2020;59(38):3570–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Inaguma S, Lasota J, Wang Z, Felisiak-Golabek A, Ikeda H, Miettinen M. Clinicopathologic profile, immunophenotype, and genotype of CD274 (PD-L1)-positive colorectal carcinomas. Mod Pathol. 2017;30(2):278–85. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Huang G, Wen Q, Zhao Y, Gao Q, Bai Y. NF-κB plays a key role in inducing CD274 expression in human monocytes after lipopolysaccharide treatment. PLoS One. 2013;8(4):e61602. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Li L, Tao X, Li Y, Gao Y, Li Q. CDC37L1 acts as a suppressor of migration and proliferation in gastric cancer by down-regulating CDK6. J Cancer. 2021;12(11):3145–53. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Peak SL, Gracia L, Lora G, Jinwal UK. Hsp90-interacting co-chaperones and their family proteins in tau regulation: introducing a novel role for Cdc37L1. Neuroscience. 2021;453:312–23. [DOI] [PubMed] [Google Scholar]
  • 111.Harris HK, Nakayama T, Lai J, Zhao B, Argyrou N, Gubbels CS, et al. Disruption of RFX family transcription factors causes autism, attention-deficit/hyperactivity disorder, intellectual disability, and dysregulated behavior. Genet Med. 2021;23(6):1028–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Didon L, Zwick RK, Chao IW, Walters MS, Wang R, Hackett NR, et al. RFX3 modulation of FOXJ1 regulation of cilia genes in the human airway epithelium. Respir Res. 2013;14(1):70. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.El Zein L, Ait-Lounis A, Morlé L, Thomas J, Chhin B, Spassky N, et al. RFX3 governs growth and beating efficiency of motile cilia in mouse and controls the expression of genes involved in human ciliopathies. J Cell Sci. 2009;122(Pt 17):3180–9. [DOI] [PubMed] [Google Scholar]
  • 114.Bonnafe E, Touka M, AitLounis A, Baas D, Barras E, Ucla C, et al. The transcription factor RFX3 directs nodal cilium development and left-right asymmetry specification. Mol Cell Biol. 2004;24(10):4417–27. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Benadiba C, Magnani D, Niquille M, Morlé L, Valloton D, Nawabi H, et al. The ciliogenic transcription factor RFX3 regulates early midline distribution of guidepost neurons required for corpus callosum development. PLoS Genet. 2012;8(3):e1002606. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Zhang Q, Davis JC, Lamborn IT, Freeman AF, Jing H, Favreau AJ, et al. Combined immunodeficiency associated with DOCK8 mutations. N Engl J Med. 2009;361(21):2046–55. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.Aydin SE, Kilic SS, Aytekin C, Kumar A, Porras O, Kainulainen L, et al. DOCK8 deficiency: clinical and immunological phenotype and treatment options - a review of 136 patients. J Clin Immunol. 2015;35(2):189–98. [DOI] [PubMed] [Google Scholar]
  • 118.Namekata K, Guo X, Kimura A, Arai N, Harada C, Harada T. DOCK8 is expressed in microglia, and it regulates microglial activity during neurodegeneration in murine disease models. J Biol Chem. 2019;294(36):13421–33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119.Kopp A. Dmrt genes in the development and evolution of sexual dimorphism. Trends Genet. 2012;28(4):175–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120.Kikkawa T, Osumi N. Multiple functions of the Dmrt genes in the development of the central nervous system. Front Neurosci. 2021;15:789583. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Lerer I, Sagi M, Meiner V, Cohen T, Zlotogora J, Abeliovich D. Deletion of the ANKRD15 gene at 9p24.3 causes parent-of-origin-dependent inheritance of familial cerebral palsy. Hum Mol Genet. 2005;14(24):3911–20. [DOI] [PubMed] [Google Scholar]
  • 122.Vanzo RJ, Twede H, Ho KS, Prasad A, Martin MM, South ST, et al. Clinical significance of copy number variants involving KANK1 in patients with neurodevelopmental disorders. Eur J Med Genet. 2019;62(1):15–20. [DOI] [PubMed] [Google Scholar]
  • 123.Gee HY, Zhang F, Ashraf S, Kohl S, Sadowski CE, Vega-Warner V, et al. KANK deficiency leads to podocyte dysfunction and nephrotic syndrome. J Clin Invest. 2015;125(6):2375–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Lamb DJ, O'Neill MA. OR04-02 KANK1 Deficiencies Underlie A Subset of Male Congenital Genitourinary Anomalies. J Endocr Soc. 2023;7(Suppl 1):bvad114.1559. 10.1210/jendso/bvad114.1559. PMCID: PMC10554062.
  • 125.Weiss A, Murdoch CC, Edmonds KA, Jordan MR, Monteith AJ, Perera YR, et al. Zn-regulated GTPase metalloprotein activator 1 modulates vertebrate zinc homeostasis. Cell. 2022;185(12):2148-63.e27. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126.Brüggemann TR, Carlo T, Krishnamoorthy N, Duvall MG, Abdulnour RE, Nijmeh J, et al. Mouse phospholipid phosphatase 6 regulates dendritic cell cholesterol, macropinocytosis, and allergen sensitization. iScience. 2022;25(10):105185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.Cappuccio G, Sayou C, Tanno PL, Tisserant E, Bruel AL, Kennani SE, et al. De novo SMARCA2 variants clustered outside the helicase domain cause a new recognizable syndrome with intellectual disability and blepharophimosis distinct from Nicolaides-Baraitser syndrome. Genet Med. 2020;22(11):1838–50. [DOI] [PubMed] [Google Scholar]
  • 128.Van Houdt JK, Nowakowska BA, Sousa SB, van Schaik BD, Seuntjens E, Avonce N, et al. Heterozygous missense mutations in SMARCA2 cause Nicolaides-Baraitser syndrome. Nat Genet. 2012;44(4):445–9, s1. [DOI] [PubMed] [Google Scholar]
  • 129.Boycott KM, Flavelle S, Bureau A, Glass HC, Fujiwara TM, Wirrell E, et al. Homozygous deletion of the very low density lipoprotein receptor gene causes autosomal recessive cerebellar hypoplasia with cerebral gyral simplification. Am J Hum Genet. 2005;77(3):477–83. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Hirota Y, Kubo K, Katayama K, Honda T, Fujino T, Yamamoto TT, et al. Reelin receptors ApoER2 and VLDLR are expressed in distinct spatiotemporal patterns in developing mouse cerebral cortex. J Comp Neurol. 2015;523(3):463–78. [DOI] [PubMed] [Google Scholar]
  • 131.Tiebel O, Oka K, Robinson K, Sullivan M, Martinez J, Nakamuta M, et al. Mouse very low-density lipoprotein receptor (VLDLR): gene structure, tissue-specific expression and dietary and developmental regulation. Atherosclerosis. 1999;145(2):239–51. [DOI] [PubMed] [Google Scholar]
  • 132.Uetani N, Kato K, Ogura H, Mizuno K, Kawano K, Mikoshiba K, et al. Impaired learning with enhanced hippocampal long-term potentiation in PTPdelta-deficient mice. EMBO J. 2000;19(12):2775–85. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133.Mishra I, Xie WR, Bournat JC, He Y, Wang C, Silva ES, et al. Protein tyrosine phosphatase receptor δ serves as the orexigenic asprosin receptor. Cell Metab. 2022;34(4):549-63.e8. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data


Supplementary Materials

13073_2025_1563_MOESM1_ESM.pdf (5.4MB, pdf)

Additional file 1: Fig S1. Schematic of the 9p-ARCH Study. Fig S2. Scree Plot of PCA Shown in Main Fig. 1. Fig S3. Feature Importance of Random Forest Model. Fig S4. FISH Confirmation of Translocation in 9p.102.p1 Detected by Short-Read WGS. Fig S5. FISH Confirmation of Translocation in 9p.105.p1 Detected by Short-Read WGS. Fig S6. FISH Confirmation of Translocation in 9p.109.p1 Detected by Short-Read WGS. Fig S7. FISH Confirmation of Translocation in 9p.114.p1 Detected by Short-Read WGS. Fig S8. FISH Confirmation of Translocation in 9p.116.p1 Detected by Short-Read WGS. Fig S9. FISH Confirmation of Translocation in 9p.119.p1 Detected by Short-Read WGS. Fig S10. FISH Confirmation of Translocation in 9p.125.p1 Detected by Short-Read WGS. Fig S11. FISH Confirmation of Translocation in 9p.127.p1 Detected by Short-Read WGS. Fig S12. FISH Confirmation of Translocation in 9p.128.p1 Detected by Short-Read WGS. Fig S13. FISH Confirmation of Translocation in 9p.132.p1 Detected by Short-Read WGS. Fig S14. Chromosome Schematics of the 70 Individuals with 9p Deletion Syndrome Sequenced in this Study. Fig S15. Repli-Seq Data from ENCODE Across 9p.

13073_2025_1563_MOESM2_ESM.xlsx (13.5MB, xlsx)

Additional file 2: Table S1. Summary of Individuals with 9p-Related Syndrome and Family Members Analyzed by Whole Genome Sequencing. Table S2. Summary Information for the 9P-ARCH Cohort Collection at Washington University in St. Louis with 9p-Related Syndromes and Individuals from the NIGMS with 9p-Related Syndromes. Table S3. Summary Information for the 9p-ARCH External Data on Individuals with 9p-Related Syndromes. Table S4. Coverage of Illumina WGS Data for Individuals Sequenced in this Study. Table S5. CNPI Calculations on Illumina WGS Data for Individuals Sequenced in this Study. Table S6. Mitochondrial Genome Distance Calculated with EGP on Illumina WGS Data for Individuals Sequenced in this Study. Table S7. Calculated Gene Dosage on 9p in WashU 9p-Related Syndromes Samples Sequenced with Illumina WGS in this Study. Table S8. Calculated Gene Dosage on 9p in Coriell 9p-Related Syndromes Sequenced with Illumina WGS in this Study. Table S9. Calculated Gene Dosage on 9p in 1000 Genomes Project Samples Sequenced with Illumina WGS. Table S10. Final 9p-Related Events in WashU 9p Cohort Based on Illumina Short-Read Whole-Genome Sequencing. Table S11. Candidate Translocation Events Detected by Short-Read WGS and Tested by FISH. Table S12. Breakpoint Sequences and Repeat Information in 9p.123.p1. Table S13. De Novo Variants Detected with HAT in Illumina WGS Data for Trios Sequenced in this Study. Table S14. Comparison of 9p Copy Number for Genes on 9p in WashU 9p and 1000 Genomes WGS Data. Table S15. Genes that Do Not Have Constraint or Probability of Haploinsufficiency in GnomAD or Published Studies. Table S16. VCF file of denovo-db 1.8 variants on 9p in 72,794 individuals or overlap phenotype with the present study. Table S17. Diamonds 5' UTR Results Using Chimp on 62,181 individuals in denovo-db 1.8. Table S18. Diamonds 5' UTR Results Using Fugu on 62,181 individuals in denovo-db 1.8. Table S19. fitDNM DNV Enrichment in 5' UTR on 62,181 individuals in denovo-db 1.8. Table S20. Diamonds 3' UTR Results Using Chimp with Blue Highlight Multi-Test Correction Significant Regions on 62,181 individuals in denovo-db 1.8. Table S21. Diamonds 3' UTR Results Using Fugu with Blue Highlight Multi-Test Correction Significant Regions on 62,181 individuals in denovo-db 1.8. Table S22. fitDNM DNV Enrichment in 3' UTR on 62,181 individuals in denovo-db 1.8. Table S23. Diamonds Promoter Results Using Chimp with Blue Highlight Multi-Test Correction Significant Regions on 62,181 individuals in denovo-db 1.8. Table S24. Diamonds Promoter Results Using Fugu with Blue Highlight Multi-Test Correction Significant Regions on 62,181 individuals in denovo-db 1.8. Table S25. fitDNM DNV Enrichment in Promoters on 62,181 individuals in denovo-db 1.8. Table S26. Diamonds Loss-of-Function and Missense DNVs in Coding Exons Results Using Chimp with Blue Highlight Multi-Test Correction Significant Regions on 62,181 individuals in denovo-db 1.8. Table S27. Diamonds Loss-of-Function and Missense DNVs in Coding Exons Results Using Fugu with Blue Highlight Multi-Test Correction Significant Regions on 62,181 individuals in denovo-db 1.8. Table S28. fitDNM DNV Enrichment in Coding Exons on 62,181 individuals in denovo-db 1.8. Table S29. Diamonds Synonymous DNVs in Coding Exons Results Using Chimp on 62,181 individuals in denovo-db 1.8. Table S30. Diamonds Synonymous DNVs in Coding Exons Results Using Fugu on 62,181 individuals in denovo-db 1.8. Table S31. DenovolyzeR Results for Genes on 9p with Blue Highlight Multi-Test Correction Significant Genes on 62,181 individuals in denovo-db 1.8. Table S32. DNVs Detected with HAT in 5,824 Individuals with Autism on 9p in SPARK and SSC WGS Datasets. Table S33. fitDNM DNV Enrichment in Fetal Brain Candidate Cis-Regulatory Elements on HAT WGS Data from 5,824 with Autism. Table S34. fitDNM DNV Enrichment in All ENCODE Candidate Cis-Regulatory Elements on HAT WGS Data from 5,824 with Autism. Table S35. fitDNM DNV Enrichment in VISTA Candidate Cis-Regulatory Elements on HAT WGS Data from 5,824 with Autism. Table S36. Diamonds DNVs in Fetal Brain Candidate Regulatory Elements Chimp on HAT WGS from 5,824 with Autism. Table S37. Diamonds DNVs in Fetal Brain Candidate Regulatory Elements Fugu on HAT WGS from 5,824 with Autism. Table S38. Diamonds DNVs in All ENCODE Candidate Regulatory Elements Chimp on HAT WGS from 5,824 with Autism. Table S39. Diamonds DNVs in All ENCODE Candidate Regulatory Elements Fugu on HAT WGS from 5,824 with Autism. Table S40. Diamonds DNVs in All ENCODE Candidate Regulatory Elements Chimp on HAT WGS from 5,824 with Autism. Table S41. Top Priority 9p Genes in 9p Deletion Syndrome Based on Genomic Datasets. Table S42. MouseMine Results for the Top 24 Genes in 9p Deletion Syndrome.

Data Availability Statement

Code for DiamondsDenovo is available at https://github.com/TNTurnerLab/DiamondsDenovo. Denovo-db version 1.8 is available at DOI 10.5281/zenodo.13901295 and https://zenodo.org/records/13901296. Sequencing data for a subset of samples are available through dbGaP (accession: phs004000.v1.p1, https://dbgap.ncbi.nlm.nih.gov/study/phs004000.v1.p1/). Data supporting this study are kept in an institutional repository and are available from the corresponding author upon reasonable request.


Articles from Genome Medicine are provided here courtesy of BMC
