Abstract
Childhood apraxia of speech (CAS) is a severe and rare form of speech sound disorder (SSD). CAS is typically sporadic, but may segregate in families with broader speech and language deficits. We hypothesize that genetic changes may be involved in the etiology of CAS. We conduct whole-genome sequencing in 27 families with CAS, 101 individuals in all. We identify 17 genomic regions including 19 unique copy number variants (CNVs). Three variants are shared across families, but the rest are unique; three events are de novo. In four families, siblings with milder phenotypes co-inherited the same CNVs, demonstrating variable expressivity. We independently validate eight CNVs using microarray technology and find many of these CNVs are present in children with milder forms of SSD. Bioinformatic investigation reveals four CNVs with substantial functional consequences (cytobands 2q24.3, 6p12.3-6p12.2, 11q23.2-11q23.3, and 16p11.2). These discoveries show that CNVs are a heterogeneous, but prevalent, cause of CAS.
Subject terms: Risk factors, Next-generation sequencing
Copy number variations identified in families with childhood apraxia of speech and other speech sound disorders provide a deeper understanding of the genetic basis for naturally acquired speech.
Introduction
Human speech is a complex phenomenon engaging >72 muscles with synchronized control, modulated by the nervous system. Speech, the most common form of human communication, is naturally acquired by babies and toddlers during development. Defined developmental language milestones link acquisition of individual speech sounds to connected, mature speech, and eventually, literacy (see Fig. 1). In some children this natural cascade of developmental events is disrupted leading to communication disorders, a heterogeneous grouping of conditions that are “an impairment in the ability to receive, send, process and comprehend concepts of verbal, nonverbal and graphic symbol systems”1.
Fig. 1. Model of speech and language development.
The model illustrates the relationship between speech and language development and reading acquisition. As depicted in the model, speech and oral language precede literacy acquisition. The green boxes represent stages in speech sound development3, where some speech sounds are acquired earlier than others. Preliteracy skills develop during preschool, including phonemic awareness and letter-sound correspondences that draw from early speech sound development. Linguistic skills (in the light blue boxes) are shared by spoken and written language. Literacy development is represented by the dark blue and gray boxes. Double arrows represent the reciprocal relationship for spoken and written language. Based on a simple view of reading113, there are two components to skilled reading, decoding (dark blue) and comprehension (gray).
Communication disorders are highly prevalent in the United States, with approximately one in twelve children ages 3-17 years exhibiting any disorder2. The most common types of difficulty are a speech problem (5%)2 and/or a language problem (3.3%)2, present in children both with and without intellectual disabilities. While early work showed that the neural basis of speech and language lies in two critical regions of the left hemisphere of the brain, Wernicke’s area and Broca’s area, recent advances have shown that many other neural pathways as well as other regions of the brain are also involved3–5. Therefore, babies and children with cognitive deficits and delays due to brain malformations and other frank developmental problems are often nonverbal, as the basic neural systems for speech are disrupted. After omitting children with intellectual disabilities, cleft palate and hearing loss, a subtler class of deficits emerges in a broad grouping of children with speech sound disorder (SSD), who have normal IQ.
SSD include errors of articulation or phonetic structure6 (i.e., errors due to poor motor abilities associated with the production of speech-sounds) and phonological errors (i.e., errors in applying linguistic rules to combine sounds to form words). SSD are highly prevalent in preschool children, approximately 16% at three years of age7, with an estimated 3.8% of children continuing to present with speech delay at six years of age8. One of the most severe forms of SSD, excluding those with known developmental syndromes, is childhood apraxia of speech (CAS). The American Speech-Language-Hearing Association (ASHA) defined CAS as a “…speech disorder in which the precision and consistency of movements underlying speech are impaired in the absence of neuromuscular deficits (e.g., abnormal reflexes, abnormal tone)”1,9. CAS shows a range of severity10 and is rare, with a prevalence of 1 to 2 per 1000 (refs. 11,12). Recent literature13–16 has demonstrated the wide variability of CAS in terms of severity, persistence, and presence of neuropsychiatric comorbidities. Along with the observation that CAS is often comorbid with language impairment, CAS in isolation is uncommon.
CAS typically presents in a single child in a family or in families with less severely affected siblings with SSD; large cohorts of this diagnostic entity are not available for genetic analysis. Despite attempts to identify sib pairs and families with CAS, too few such families have been reported to permit heritability calculations, as this method requires that multiple family members be affected (or that a quantitative trait be available). The KE family, which led to identification of FOXP2 as an important locus for CAS, was exceptional in having multiple affected family members17–19. This is not the norm: CAS is generally present in the population as a sporadic event, not inherited by multiple siblings/family members in a recessive or dominant manner. In our full cohort of 115 CAS families spanning a 20+ year recruitment time period (data not shown), only two families with a proband and another sib affected with CAS were ever reported.
An alternate genetic model that has gained significant traction in the genetic literature is the emergence of de novo mutations as a major cause for several syndromes and diseases20. The de novo mutation model has been applicable because of advancement in whole exome and whole genome sequencing methods, leading to de novo mutations being reported for Kabuki, Schinzel–Giedion, Bohring–Opitz, Baraitser–Winter and Coffin–Siris syndromes20. CNVs were first reported for CAS by Raca et al.21 and Fedorenko et al. on 16p11.2 (ref. 22). Further extending this concept, Morgan et al.23 recently reviewed a large number of case reports and noted the sporadic nature of CAS. In addition, variable expressivity is known to occur with some genetic variants24–26, leading to more severe consequences in certain individuals.
A number of recent studies of CAS performed whole genome sequencing (WGS) or whole exome sequencing (WES) to identify associated structural, copy number, and rare coding variants16,21,27–33. Other studies examined individuals with a well-characterized deletion on chromosome 16p11.2, and found that the majority of subjects had a highly penetrant and severe form of CAS22,34, with one study showing 77% of individuals with this deletion to have CAS34, although other neurodevelopmental conditions are also associated with this deletion. Another recent study identified CNVs in children with CAS, but focused its search on variants ≤ 5 Mb in size35. Infrequently, CAS has been reported in cytogenetic case reports36–38, but most of the cases remain unresolved, and the majority of these children are not referred for genetic testing. A similar approach has been taken in dyslexia and language impairment39. Many of these reports illustrate varying severity and variability in clinical manifestations in children with CAS who carry these copy number variants. Given the considerable phenotypic variability associated with copy number variants like 16p11.2 (refs. 40,41), additional clinical characterization of these variants is important for clinical care.
Approximately 1 in 7000 live births produces a cytogenetically visible deletion42 with larger deletions causing more severe consequences. In general, deletions lead to greater morbidity and mortality than duplications43, and larger CNVs are often associated with higher burden of developmental disorders44, but the effect of CNVs on SSD phenotypes has not been well described. As data from whole genome sequence or exome sequencing becomes more widely available, it will be feasible to quantify the penetrance and phenotypic correlations of CNVs that are not associated with particular syndromes, but do lead to subtle phenotypes. Recently this effort has been bolstered by bioinformatic methods that predict the effect of the deletion/duplication based on the gene content, regulatory profiles and size or position of the event on the chromosome using large cohorts45,46.
The Cleveland Family Speech and Reading Study (CFSRS) has accrued a large number of children with CAS, providing us a unique opportunity to identify variants associated with CAS and other communication disorders. We hypothesize that typical clinical caseloads of CAS/SSD seen by speech and language pathologists are genetically heterogeneous in pathology, although they likely share common pathways that perturb normal speech and language phenotypes. Prior studies have discovered some structural variants associated with CAS16,21,27–33, but we posit that many more remain to be discovered as the evidence thus far suggests that rare variants predominate in CAS. One possibility is that some CAS cases are actually milder forms of known neurodevelopmental conditions that were not ascertained via routine medical exams because their symptoms did not warrant clinical genetic or neurologist consults. Very likely, pediatricians referred these children to speech-language pathology (SLP), occupational therapy (OT) and physical therapy (PT) care following normal care procedures in the US. In a population, individuals who bear the same genetic variant may not express an identical phenotype due to alterations in a range of factors24. The best chance of seeing common phenotypes with the same genetic variant is in a family, particularly siblings, as they frequently share background genetics (i.e., the rest of the genome) as well as exposures to similar demographic and lifestyle factors. Many neurodevelopmental genes display considerable variable expressivity in phenotype, and we hypothesize that some of the variants identified in CAS probands will be shared in children with SSD even if they do not show as severe a phenotype.
Additional discovery efforts in this area are important for a number of reasons: 1) As we hypothesize above, there are likely as yet undiscovered rare genetic variants associated with CAS; 2) To translate these findings to clinical care, extensive clinical characterization of the implications of these variants is needed; and 3) As causal variants are discovered and clinically characterized, future efforts can focus on personalized medicine47,48.
We performed WGS in 27 families containing a child with CAS. Discovery of CNVs was performed using WGS, followed by validation using B-allele frequencies (BAF), and Log R Ratio (LRR) from high dimensional microarray data. We focused on families where multiple children were affected with SSD (but not necessarily CAS), to increase the likelihood of finding CNVs that segregated with these traits. We discovered 19 large copy number variants across ~50% of the CAS probands, most of which were not previously reported. These discoveries could give us greater insight into the mechanisms of this rare disorder and SSD in general.
Material and methods
Subject ascertainment
This study was approved by the Institutional Review Board of Case Medical Center and University Hospitals. All parents provided written informed consent for their children to participate and children older than five years provided assent. All ethical regulations relevant to human research participants were followed. Families were ascertained through a proband identified from caseloads of speech-language pathologists (SLPs) in the Greater Cleveland area and referred to this study. Initial binary trait classifications were based on parental report of a diagnosis from a referring SLP. These diagnoses were confirmed within our study by direct testing by an SLP, using an extensive battery of standardized communication measures (Fig. 1, Supplementary Fig. 1) as well as noting specific diagnostic criteria of CAS from recordings of conversational speech, as described below (encapsulated in Supplementary Table 1).
All probands in the overall study met inclusion criteria based on direct assessment and information provided by a parent in an interview, or via questionnaire, including: a score ≤ 16th percentile on the Goldman-Fristoe Test of Articulation49 upon entry into therapy, at least three phonological process errors, normal hearing and fewer than six middle ear infections prior to age three, intact oral structures, monolingual English speaker, no neurological diagnosis such as autism or developmental delays other than speech and language, performance IQ ≥ 80 on the Wechsler Preschool and Primary Scale of Intelligence (WPPSI)50, and a diagnosis of a SSD, or suspected CAS, by a local SLP.
Detailed CAS Phenotyping
Diagnostic criteria for CAS were based on the speech characteristics reported in the literature at the time of diagnosis or the diagnostic criteria described in the CAS Technical Report of the American Speech Language and Hearing Association51 (Supplementary Table 1). While children could be directly tested because of the age-appropriate nature of these communication measures, the majority of SSD resolves by adulthood and self-report lacks reliability. Thus, we did not include parent historical self-report of SSD in our data because our SLPs could not confirm the diagnosis. We examined measures that were appropriate to the age of the child (Supplementary Fig. 1, Supplementary Table 2) covering the domains of oral motor control, speech, language, and literacy. In sum, from the CFSRS52–55, we examined 101 individuals from 27 CAS families who had both DNA for WGS and endophenotype data available (Table 1). We also examined whether the child demonstrated persistent speech errors after age 8.5 years13.
Table 1.
Sample characteristics
Characteristic | Total sample | Children only |
---|---|---|
Total sample size | 101 | |
Total number of CAS families | 27 | |
Parents not directly tested | 43 | |
Sample size | 101 | 58 |
Sex N (proportion Female) | 37 (0.37) | 15 (0.26) |
Binary Traits N (proportion) | | |
CAS | 28 (0.28) | 28 (0.48) |
SSD + LI | 8 (0.08) | 6 (0.10) |
SSD only | 25 (0.25) | 11 (0.19) |
LI only | 5 (0.05) | 5 (0.09) |
Unaffected | 35 (0.35) | 8 (0.14) |
Other phenotype data
Language impairment (LI), recently described as Developmental Language Disorder (DLD)56, was also reported based on diagnosis by an SLP, and confirmed by a score more than 1 SD below the mean on the Clinical Evaluation of Language Fundamentals (CELF-3) (Supplementary Table 2). In addition to classifying children in the study according to the presence or absence of CAS, SSD, and LI as described above, we also conducted a wide range of age-appropriate assessments of articulation, phonological awareness, speeded naming, vocabulary, receptive and expressive language, and literacy (Supplementary Table 2, Supplementary Fig. 1), as in our previous work10,52,53,57–59. These quantitative endophenotypes were used to characterize the deficits in children identified with CNVs, as described below. Additional clinical manifestations (issues with feeding, delayed language onset, gross/fine motor incoordination, and ADHD) were recorded in the database based on parent interview. In addition, we characterized severity of CAS based on membership in clusters identified through our earlier analysis10; in that analysis, we utilized measures of articulation, vocabulary, and reading to group children with CAS into subgroups of varying severity. Cluster membership is indicated in Supplementary Table 3.
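The score-based deficit criterion can be illustrated with a minimal Python sketch. It assumes standard scores normed to a mean of 100 and an SD of 15, so that "more than 1 SD below the mean" corresponds to a standard score below 85; the function name, the default norms, and the example scores are hypothetical and are not taken from the study data.

```python
def is_deficit(standard_score, mean=100.0, sd=15.0, cutoff_sd=1.0):
    """Return True when a score falls more than `cutoff_sd` SDs below the norm mean.

    The default mean and SD are an assumption (conventional standard-score norms).
    """
    return standard_score < mean - cutoff_sd * sd


# Hypothetical scores for one child, for illustration only.
scores = {"articulation": 78, "receptive_language": 92, "expressive_language": 84}
deficits = sorted(name for name, score in scores.items() if is_deficit(score))
print(deficits)
```

A score of exactly 85 is not flagged under this reading, because the criterion is a strict inequality.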
WGS methods
DNA was extracted from buffy coats or saliva samples as previously described60. Purified DNA was sent to the Broad Institute or the University of Michigan for sequencing on the Illumina HiSeq platform, where DNA was processed into Illumina sequencing libraries using the TruSeq DNA PCR-Free prep kit. DNA was sheared using a Covaris ultrasonicator to approximately 300-500 base pairs. Sheared DNA fragments were blunt end repaired and the 3’ ends were adenylated. Illumina indexed paired-end adapters were ligated to the fragments, purified, and pooled for sequencing. Library preparation quality and fragment size were assessed using an Agilent Bioanalyzer.
Raw sequence data were transferred to the High Performance Computing (HPC) cluster at Case Western Reserve University (CWRU) for analysis. The reads were trimmed for quality (phred score > 20) and for any residual Illumina adapter sequences at the ends of reads using TrimGalore!61. Reads that passed quality control were then aligned to the human reference genome (GRCh37/hg19) using bwa-mem (v0.7.17) [https://bio-bwa.sourceforge.net]. Because the raw sequence data for part of the dataset were not available for realignment to the newer reference (GRCh38), all reads were aligned to GRCh37 to accommodate the legacy data. Aligned reads were reported by bwa, formatted to BAM, and processed as individual sample files using the GATK (v4) pipeline following “Best Practice” guidelines62. Average coverage across the genome was assessed using Mosdepth on the aligned BAM files63. The average coverage was 33X across all samples (range 17.2-49.1X). The GATK process included deduplication using Picard tools, local realignment using GATK:RealignerTargetCreator and IndelRealigner, and recalibration of quality scores using GATK:BaseRecalibrator. Optimized BAM files were used as input into GenomeSTRIP64 (v2.0) and processed using SVPreprocess followed by CNVDiscovery and SVDiscovery. Variants were reported by GenomeSTRIP in vcf format and subsequently filtered and annotated using VCFTools (v0.1.12b) [https://vcftools.sourceforge.net] and ANNOVAR (v2017Jun01) [https://annovar.openbioinformatics.org].
Filtering and cross-validation of copy number variants
We used GenomeSTRIP to discover 10,185 copy number variants across the entire cohort. Two main filters were used to winnow the CNVs for further examination: size of the deletion/duplication and its presence in multiple affected individuals. 5,361 CNVs were identified in at least one proband. We focused on CNVs >50 kb since these could be replicated in our independent microarray dataset, as described below. Filtering for CNVs that spanned >50 kb resulted in 48 CNVs, with 3 CNVs falling into annotated genomic regions with low mappability or poor reference genome assembly. These annotations included regions compiled by ENCODE and 10X Genomics65,66. One additional CNV was found in 77/101 sequenced individuals and was removed from consideration. After merging the remaining 44 CNVs for overlap, we were left with 21 candidate CNVs. These 21 CNV regions were pulled from the BAM files and manually examined using the Integrative Genomics Viewer (IGV)67–69. Regions were examined for stretches showing increased or decreased coverage, and for reads spanning deletion or duplication junctions.
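The filtering cascade described above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the pipeline used in the study: the `CNV` record, the `max_carrier_frac` cutoff used to drop very common calls (the text reports only that a CNV seen in 77/101 individuals was removed), and the choice to merge only overlapping calls of the same type are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int
    end: int
    kind: str             # "DEL" or "DUP"
    carriers: frozenset   # IDs of samples carrying the call

    @property
    def size(self):
        return self.end - self.start


def intervals_overlap(chrom_a, start_a, end_a, chrom_b, start_b, end_b):
    return chrom_a == chrom_b and start_a < end_b and start_b < end_a


def filter_cnvs(cnvs, probands, excluded_regions, n_samples,
                min_size=50_000, max_carrier_frac=0.5):
    """Keep proband CNVs >50 kb that avoid excluded regions and are not common.

    `max_carrier_frac` is an assumed cutoff; the source gives no threshold.
    """
    kept = []
    for cnv in cnvs:
        if not (cnv.carriers & probands):
            continue  # not seen in any proband
        if cnv.size <= min_size:
            continue  # too small to replicate on the array
        if any(intervals_overlap(cnv.chrom, cnv.start, cnv.end, *region)
               for region in excluded_regions):
            continue  # low mappability or poorly assembled region
        if len(cnv.carriers) / n_samples > max_carrier_frac:
            continue  # present in most of the cohort
        kept.append(cnv)
    return kept


def merge_overlapping(cnvs):
    """Collapse overlapping calls of the same type into (chrom, start, end, kind)."""
    regions = []
    for cnv in sorted(cnvs, key=lambda c: (c.chrom, c.kind, c.start)):
        if (regions and regions[-1][0] == cnv.chrom
                and regions[-1][3] == cnv.kind and cnv.start <= regions[-1][2]):
            chrom, start, end, kind = regions[-1]
            regions[-1] = (chrom, start, max(end, cnv.end), kind)
        else:
            regions.append((cnv.chrom, cnv.start, cnv.end, cnv.kind))
    return regions


# Toy calls; sample IDs and all coordinates except the 16p11.2 interval are made up.
calls = [
    CNV("chr16", 29_581_236, 30_197_928, "DEL", frozenset({"P1"})),
    CNV("chr16", 29_600_000, 30_100_000, "DEL", frozenset({"P2"})),
    CNV("chr2", 1_000, 6_000, "DEL", frozenset({"P1"})),
    CNV("chr1", 100_000, 300_000, "DUP", frozenset({"P1", "S1", "S2", "M1"})),
    CNV("chr3", 500_000, 700_000, "DEL", frozenset({"S1"})),
    CNV("chr5", 1_000_000, 1_200_000, "DUP", frozenset({"P2"})),
]
kept = filter_cnvs(calls, probands={"P1", "P2"},
                   excluded_regions=[("chr5", 1_100_000, 1_150_000)], n_samples=5)
candidates = merge_overlapping(kept)
print(candidates)
```

In this toy input only the two overlapping chr16 deletions survive filtering, and merging collapses them into a single candidate region.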
We cross-referenced the WGS data with already available microarray data58. CNV characterization was performed using GenomeStudio data analysis software, which uses the signals generated from fluorescence intensity of both polymorphic and nonpolymorphic markers, and the cnvPartition Plug-in. The goal of the cnvPartition algorithm is to identify regions of the genome that are aberrant in copy number using two outputs: the LRR, a measure of normalized total signal intensity, and the BAF, a measure of allelic intensity ratios. The expected LRR is 0 for every SNP, while the expected BAF is 0, 0.5, or 1 (ref. 70). There is evidence that CNVs are present when samples deviate from the expected LRR and BAF. Copy numbers with confidence scores are generated to produce a map of the CNV regions. Data were loaded into GenomeStudio [www.illumina.com/genomestudio] from the raw .idat files, a sample sheet, the Illumina-provided manifest file for the Infinium Omni2.5 BeadChip, and the cluster file. Plots for LRR and BAF were generated for every autosomal SNP calculated for each sample. For CNV calling, a minimum of 3 consecutive SNPs was required with a minimum region size of 1 Mb and confidence >35. All CNVs that were confirmed using both methods (IGV and GenomeStudio) were then examined for segregation with phenotype.
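To make the LRR/BAF logic concrete, the sketch below applies a simple rule of thumb to the SNPs inside a candidate region. It is not the cnvPartition algorithm, whose internals are not described here; the function name and every numeric threshold except the three-SNP minimum are assumptions chosen for illustration. The idea is that a one-copy deletion lowers LRR and removes heterozygous BAF values, while a one-copy duplication raises LRR and shifts heterozygous BAF away from 0.5 (towards roughly 1/3 and 2/3).

```python
from statistics import mean


def classify_region(lrr, baf, min_snps=3, lrr_loss=-0.3, lrr_gain=0.2):
    """Rough copy-number evidence for one region from per-SNP LRR and BAF values.

    Thresholds are illustrative assumptions, not values used in the study.
    """
    if len(lrr) != len(baf):
        raise ValueError("lrr and baf must have one value per SNP")
    if len(lrr) < min_snps:
        return "no-call"
    avg_lrr = mean(lrr)
    # BAF values that are clearly not homozygous (not near 0 or 1).
    intermediate = [b for b in baf if 0.1 < b < 0.9]
    if avg_lrr <= lrr_loss and not intermediate:
        return "deletion"
    if (avg_lrr >= lrr_gain and intermediate
            and all(abs(b - 0.5) > 0.1 for b in intermediate)):
        return "duplication"
    return "no evidence"


# Made-up probe values for three regions.
print(classify_region([-0.5, -0.6, -0.45, -0.55], [0.0, 1.0, 0.02, 0.98]))
print(classify_region([0.3, 0.35, 0.28, 0.4], [0.33, 0.67, 0.0, 1.0]))
print(classify_region([0.01, -0.02, 0.0], [0.5, 0.0, 1.0]))
```

A production caller would also segment the genome and attach a confidence score; this sketch only classifies a region whose boundaries are already known.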
Inheritance of CNVs in families
First, we examined whether CNVs were inherited by determining if the CNV was present in either parent as well as the proband’s siblings. Table 2 shows the presence/absence of the CNV in all individuals in the family who had WGS data. In some cases, parents did not have sufficient DNA available for WGS, but were part of our GWAS study58.
Table 2.
List of families with copy number variations in 17 unique genomic regions
If parent not listed, parent was not genotyped (DNA unavailable).
aPredicted duplication in this family, was confirmed in another family.
bPhenotypes determined by direct testing of children. Parents were not directly tested. CAS childhood apraxia of speech, Lang language impairment.
Colors indicate overlapping variant across families (one per each color).
Inheritance pattern: I = inherited from one parent and/or the other, U = unknown (one parent missing DNA), and DN = de novo. Inheritance determined by examination of microarray data.
Second, we considered a CNV to be inherited with the phenotype if it was present in the proband and in siblings with CAS, speech, and/or LI, and if the variant was absent in siblings who were not affected with any of these 3 disorders. There were three variants that were excluded from further consideration based on these criteria (Supplementary Table 4). If the variant was present in children with speech or LI, we closely examined the children’s respective values on speech, language, and literacy measures given as part of the study, in order to further characterize the potential implications of these variants. We considered a child to have a deficit in these quantitative measures (Supplementary Table 2) if they demonstrated a standard score of < 85 on measures of articulation, motor speech ability, receptive and expressive language, vocabulary knowledge, single word decoding, phonological processing, reading comprehension and spelling. Articulation was additionally evaluated by previously recorded audio samples (Supplementary Tables 1, 3, and 5).
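The two family-level checks described in this section can be sketched as small Python functions. The inheritance codes follow the Table 2 legend (I, U, DN); the function names and the data layout are hypothetical. One interpretive assumption is made in the segregation check: a CNV is treated as consistent with the phenotype when the proband carries it and no unaffected sibling carries it, without requiring that every affected sibling be a carrier.

```python
def inheritance_pattern(child_has, mother_has, father_has):
    """Classify a child's CNV as "I", "U", or "DN".

    Parent status is True, False, or None when no DNA was available.
    """
    if not child_has:
        return None
    if mother_has or father_has:
        return "I"   # present in at least one tested parent
    if mother_has is None or father_has is None:
        return "U"   # a parent is untested, so de novo cannot be established
    return "DN"      # absent from both tested parents


def segregates_with_phenotype(proband_has, siblings):
    """`siblings` is a list of (has_cnv, affected) pairs.

    Assumption: carriers must all be affected; affected non-carriers are allowed.
    """
    if not proband_has:
        return False
    return all(affected for has_cnv, affected in siblings if has_cnv)


print(inheritance_pattern(True, False, False))
print(inheritance_pattern(True, False, None))
print(segregates_with_phenotype(True, [(True, True), (False, False)]))
```

Under the stricter reading, in which all affected siblings must also carry the variant, the second function would additionally need to reject families with affected non-carriers.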
Functional annotation
We used multiple sources of information to functionally annotate the deletions and duplications, synthesizing evidence when possible. We examined regulatory disruption, population frequencies, evolutionary constraint, as well as literature reviews and validation in our microarray analysis.
The regulatory disruption score is the sum of the predicted expression scores for each gene affected by the CNV, weighted by the gene’s intolerance metric (Supplementary Methods). A CNV regulatory disruption score annotation was generated using the CNV_FunctionalAnnotation program and methodology45 that relies on a simple linear model to predict expression effects of rare CNVs. The model was developed using brain samples from the Common Mind Consortium (CMC) study and incorporates information about genes contained in, or affected by, the CNV, including exon overlap, promoter overlap, enhancer overlap, topologically-associating domain (TAD)-gene overlap71, CNV length and CNV type. The overall regulatory disruption of a CNV is determined by weighting the predicted expression consequences by the relative deleteriousness of the gene affected and summing across all genes affected. The mean regulatory disruption scores from the normative data reported by Han et al.45 were used as a point of reference for predicting the pathogenicity of our CNVs.
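As a minimal sketch of the weighted sum described above, the Python snippet below multiplies each gene's predicted expression effect by an intolerance weight and adds the products. It does not reproduce the CNV_FunctionalAnnotation model: how the per-gene effects are predicted and how the intolerance metric is scaled are not specified here, so the function name, gene labels, and numbers are hypothetical placeholders.

```python
def regulatory_disruption_score(genes):
    """Sum of per-gene predicted expression effects weighted by gene intolerance.

    `genes` maps a gene label to (predicted_expression_effect, intolerance_weight).
    """
    return sum(effect * weight for effect, weight in genes.values())


# Hypothetical CNV affecting three genes; values are for illustration only.
example_cnv = {
    "GENE_A": (0.8, 0.9),   # strong predicted effect on an intolerant gene
    "GENE_B": (0.2, 0.1),   # weak effect on a tolerant gene
    "GENE_C": (0.5, 0.0),   # contributes nothing when the weight is zero
}
score = regulatory_disruption_score(example_cnv)
print(round(score, 2))
```

Because the score is a sum over genes, a large CNV spanning many intolerant genes will tend to score higher than a small CNV affecting one tolerant gene.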
As an estimate of population frequency for our CNVs, we report the maximum frequency for CNVs observed in the Database of Genomic Variants (DGV)71, focusing on the DGV gold standard variants. The DGV gold standard variants are a curated set of variants from a select number of studies in DGV. We considered DGV CNVs that showed a 30% or greater overlap with our CNV and were of the same type regarding copy number gain or loss.
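The overlap rule can be sketched as follows. The snippet assumes that "30% or greater overlap" means the fraction of the study CNV covered by the database CNV; a reciprocal-overlap definition would be an equally plausible reading. The record layout, function names, and the database entries and frequencies in the example are hypothetical, and `None` stands in for a CNV with no qualifying match.

```python
def overlap_fraction(query, ref):
    """Fraction of `query` covered by `ref`; both are (chrom, start, end) tuples."""
    if query[0] != ref[0]:
        return 0.0
    shared = min(query[2], ref[2]) - max(query[1], ref[1])
    return max(shared, 0) / (query[2] - query[1])


def max_population_frequency(query, kind, db_records, min_overlap=0.30):
    """Highest frequency among same-type database CNVs meeting the overlap cutoff."""
    freqs = [
        rec["frequency"]
        for rec in db_records
        if rec["kind"] == kind
        and overlap_fraction(query, (rec["chrom"], rec["start"], rec["end"])) >= min_overlap
    ]
    return max(freqs) if freqs else None


# Made-up database records around the 16p11.2 deletion interval.
records = [
    {"chrom": "chr16", "start": 29_500_000, "end": 30_000_000, "kind": "DEL", "frequency": 0.0005},
    {"chrom": "chr16", "start": 30_100_000, "end": 30_500_000, "kind": "DEL", "frequency": 0.02},
    {"chrom": "chr16", "start": 29_500_000, "end": 30_300_000, "kind": "DUP", "frequency": 0.01},
]
query = ("chr16", 29_581_236, 30_197_928)
print(max_population_frequency(query, "DEL", records))
```

Here the first record covers about 68% of the query and is kept, the second covers about 16% and is dropped, and the third is ignored because it is a gain rather than a loss.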
As an estimate of evolutionary constraint for the genes affected by our CNVs, we supply gene specific probability of loss-of-function intolerance (pLI) scores72, extracted from University of California at Santa Cruz genome browser tables and the gnomAD database. We report gene and corresponding pLI if the metric was greater than 0.9. The pLI score reflects the tolerance of a given gene to the loss of function on the basis of the number of protein truncating variants, that is, the frameshift, splice donor, splice acceptor, and stop-gained variants referenced for this gene in gnomAD database weighted by the size of the gene and the sequencing coverage72. Lastly, we incorporated data on dosage sensitivity73 from recent data by Collins et al.30 and CNV-ClinViewer46.
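The pLI reporting rule amounts to a simple threshold filter, sketched below in Python. The function name is hypothetical, and the gene labels and pLI values are placeholders rather than gnomAD data.

```python
def lof_intolerant_genes(pli_by_gene, threshold=0.9):
    """Return genes whose pLI exceeds the reporting threshold, sorted by name."""
    return {gene: pli for gene, pli in sorted(pli_by_gene.items()) if pli > threshold}


# Placeholder values for genes inside one hypothetical CNV.
example = {"GENE_C": 0.99, "GENE_A": 0.95, "GENE_B": 0.42, "GENE_D": 0.90}
print(lof_intolerant_genes(example))
```

A gene with pLI exactly equal to 0.9 is not reported under this reading, since the stated criterion is "greater than 0.9".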
Additional details on all annotation methods are provided in the Supplementary Methods, and the annotation described above is provided in Supplementary Data 1 and Table 3.
Table 3.
Detailed summary of CNVs
Variant location (hg19) | Size (bp) | Cytoband | Families with variant | Children carrying variant with persistent CAS | Number of children (probands and siblings) with variant | Replication in our independent SSD sample | LoF intolerant | Likely pathogenic or causal based on gene regulation disruption score | pHaplo(max) | pTriplo(max) | CNV ClinViewer (based on region) | Case reports or support from prior literature for CAS, SSD or other neuro diseases | Frequency |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr11:113059924-115271926 | 2212002 | 11q23.2-11q23.3 (E) | 8139 | 1394 | 1 | | X | X | 0.97 | 0.91 | Uncertain | X | NR |
chr6:49976443-52091444 | 2115001 | 6p12.3-6p12.2 (E) | 33 | 135 | 1 | | X | X | 0.99 | 0.99 | Pathogenic | X | NR |
chr2:163855968-165518209 | 1662241 | 2q24.3 (E) | 8047 | 1124 | 1 | | X | X | 0.96 | 0.96 | Uncertain | X | NR |
chr16:29581236-30197928 | 616692 | 16p11.2 (E) | 12, 25 | 50, 107 | 2 | E | X | X | 0.6 | 0.99 | Pathogenic | X | 0.05% |
chr22:16340236-17295468 | 955232 | 22q11.1 (E) | 8107 | 1302a, 1304a | 3 | | | | 0.16 | 0.49 | Uncertain | | 0.03% |
chr2:97735774-97804498 | 68724 | 2q11.2 (E) | 8055 | 1159 | 2 | E | | | 0.12 | 0.33 | Uncertain | X | 0.53% |
chr16:20545972-20596528 | 50556 | 16p12.3 (E) | 8049 | 1128 | 1 | | | | 0.08 | 0.18 | Uncertain | | 0.10% |
chr4:3892602-4168684 | 276082 | 4p16.3 (E) | 8060 | | 1 | | | | - | - | Uncertain | X | 0.44% |
chr4:189241227-189512452 | 271225 | 4q35.2 (E) | 74 | 410 | 2 | E | | | - | - | Uncertain | X | 0.02% |
chr4:168808644-168993082 | 184438 | 4q32.3 (E) | 8008 | | 1 | U | | | - | - | Uncertain | X | 0.40% |
chr8:137680264-137735693 | 55429 | 8q24.23 (E) | 8055 | 1159 | 2 | E | | | - | - | Benign | | 5.42% |
chr2:164641142-164646533 | 5391 | 2q24.3 (E) | 8107 | 1302a, 1304a | 3 | | | | - | - | Benign | X | 7.01% |
chr12:85939000-86285088 | 346088 | 12q21.31 (U) | 8055 | 1159 | 2 | | | | 0.16 | 0.11 | Uncertain | X | NR |
chr10:45208524-45336137 | 127613 | 10q11.21 (U) | 8066 | 1188 | 1 | U | | | 0.28 | 0.17 | Uncertain | X | 1.11% |
chr2:133176490-133188129 | 11639 | 2q21.2 (U) | 8061 | 1174 | 1 | | | | 0.19 | 0.07 | Uncertain | X | 0.05% |
chr5:59710570-59769304 | 58734 | 5q12.1 (U) | 33 | 135 | 3 | U | | | - | - | Uncertain | | 0.44% |
chr7:17165709-17222842 | 57133 | 7p21.1 (U) | 8130 | | 1 | | | | - | - | Uncertain | X | 0.05% |
chr2:132718104-132979733 | 261629 | 2q21.2 (U) | 8055 | 1159 | 1 | E,U | | | - | - | Benign | X | 0.98% |
E Deletion, U Duplication, * Unknown persistence, NR Not reported in DGV; - under pHaplo and pTriplo indicates unavailable from Collins paper.
Results
Group characteristics
Our study included 27 families with at least one affected child with CAS, totaling 101 individuals. The average family size was 3.7 individuals, with a range of 1 (singleton) to 6. Affected children presented with CAS (28, including one family with two cases of CAS) as well as other communication disorders (six had SSD + LI, 11 had SSD alone, and five had LI alone); eight siblings were unaffected for CAS, SSD, and LI (Table 1). One family was African-American; the rest were Caucasian. As expected, the majority of the children were male (74%), as CAS and SSD predominantly affect males.
Children with an early diagnosis of CAS were followed and assessed from preschool to adolescence (Fig. 1). Direct testing of participants at the preschool assessment revealed that scores more than 1 SD below the test mean were observed in the majority of CAS subjects on speech sound measures of single word articulation (75%), multisyllabic real word repetition (93%), multisyllabic nonword repetition (82%), and oral motor function (82%) (Supplementary Table 3). All participants scored more than 1 SD below the mean on at least one speech sound or motor-sequencing measure. The following developmental difficulties were reported by parents on a questionnaire: 14 participants had fine motor difficulties, 10 participants had gross motor difficulties, 10 participants had oral motor/feeding difficulties, and three participants were born prematurely (Supplementary Table 6). Comorbid ADHD was reported for 16 participants, with five of these participants reportedly on medication for ADHD.
Children with CAS were also directly assessed for comorbid LI at preschool and reading disorders (RD) at school-age. Six participants did not demonstrate LI based on test scores. The remaining participants scored < 1 SD below the mean on standardized language measures of receptive language (43%), expressive language (68%), total language composite (46%), receptive vocabulary (25%), and expressive vocabulary (29%). On literacy measures, nine participants did not demonstrate literacy difficulties based on test scores. The remaining participants scored < 1 SD below the mean on standardized measures of single real word decoding (46%), single nonword decoding (39%), reading comprehension (43%), spelling (54%), and measures of phonological awareness (39%) and rapid automatized naming (36%) (Supplementary Table 3).
Identification of CNVs
Using GenomeSTRIP, we identified 5,361 CNVs within the 28 children with CAS from the 27 families. We focused on CNVs that were >50 kb in size, and additionally excluded genomic regions with low mappability or poor reference genome assembly, leaving 48 CNVs. Among these, 19 were validated using both microarray data and IGV manual inspection (Table 2). The deletions/duplications spanned 17 unique genomic regions. Two of the CNVs occurred in different families with different breakpoints, and one of these overlapping CNVs was much shorter (~5 kb). The largest CNV identified was ~2.2 Mb. In our replication analysis, we discovered large CNVs (13 deletions and six duplications) present in 14 unrelated families. Some regions did not contain genes, but the 16p11.2 locus borne by two probands encompassed 47 genes (Supplementary Data 1). In total, of the 27 probands examined, 14 (51.9%) had at least one CNV that was associated with CAS and/or other speech sound disorders.
Three CNV regions were each shared by two families. The first, on chromosome 16p11.2 at 29-30 Mb, indicated with green in Table 2, is the aforementioned locus that has been previously observed in CAS. In both cases, this is a de novo deletion, which is consistent with the literature21,34. The child in family 12 had a wealth of deficits, including on all speech and language measures and some literacy measures, as well as oral motor issues, fine motor incoordination, and ADHD; this child's difficulties persisted beyond age 8.5, and the child fell in the most severe cluster from our earlier cluster analysis10. The child in family 25 had difficulties with multisyllabic word repetition, expressive and receptive language, reading, and fine motor incoordination, had ADHD, had problems persisting beyond age 8.5, and was identified as being moderately severe in the earlier cluster analysis10. Second, the deletion on chromosome 2q24.3, indicated with blue in Table 2, is inherited in two families. The child in family 8047 had some difficulties in articulation and literacy, as well as oral motor problems and ADHD, was also persistent, and was a member of the high severity cluster10. The child in family 8107 had issues with all speech measures but was too young for the literacy assessment and did not have data on persistence and severity, but did have oral motor problems and delayed language onset. Finally, the duplication on chromosome 2q21.2 is inherited in both families. The child with CAS in family 8055 had difficulties with all speech measures as well as expressive language, expressive vocabulary, and most literacy measures, as well as oral motor problems, delayed language onset, gross motor incoordination, and ADHD, and was also in the most severe cluster10 and had persistent errors after age 8.5.
The child with CAS in family 8061 also had difficulties with all speech measures, most language measures, and had delayed language onset and ADHD, as well as persistence after age 8.5, but was in the mildest severity cluster10; this CNV in family 8061 was smaller (~5 kb) than all of the others, but overlapped with the large CNV in family 8055. In children affected with CAS with deletions or duplications, difficulties with speech measures were present in all children, difficulty with language and literacy measures were present in most, and other clinical manifestations varied (Supplementary Table 3 and 6).
Functional implications of CNV regions
We have provided detailed annotation of each of these CNV regions in Supplementary Data 1, including the regulatory disruption score described in the Methods, whether there was potential intolerance due to loss of function in the region according to gnomAD, and links to these regions in Decipher and the Database of Genomic Variants. In addition, Table 3 contains data on dosage sensitivity30,46. There were four regions that showed potentially significant effects on function based on their regulatory disruption score: chromosome 16p11.2, chromosome 11q23.2-11q23.3, chromosome 6p12.3-6p12.2, and chromosome 2q24.3, and all four of these were likely pathogenic based on their pHaplo scores and/or CNV-ClinViewer (Table 3). The deletion on chromosome 16 has been associated with CAS in several studies21,22,29,34,74 and there is potential loss of function of the CORO1A, TAOK2, and MAZ genes within this region (Supplementary Data 1). The deletion on chromosome 11 has been associated with autism, schizophrenia, neocortical development, and other social cognitive behaviors75–78. Moreover, two genes within the chromosome 11 deletion boundaries, ZBTB16 and NCAM1, are loss-of-function intolerant (Supplementary Data 1). The region on chromosome 6 has been associated with brain-face shape, gross and fine motor control, lack of expressive speech, as well as sleep quality79,80. Finally, the deletion on chromosome 2 has been previously associated with speech development, lack of social interactions, and intellectual disability in a case report81.
Replication via prior microarray data
To reproduce our results in an independent cohort, we additionally examined 131 cases of SSD for whom we had high-dimensional microarray data but not WGS data, using the same bioinformatic approach we previously applied to validate WGS results. This analysis was limited to regions where there was sufficient marker density, thus excluding the chromosome 2 locus at ~164 Mb and two additional regions. In the extended SSD cohort, we identified 19 additional subjects carrying 8 CNVs in the same physical locations as those described in our discovery cohort (Supplementary Table 7), a rate of 47.1% (8 CNVs / 17 unique CNV regions from Table 2). Two children had CAS, but the majority had SSD and LI or SSD alone. While most loci had either duplications or deletions, one locus showed both. One child with SSD and LI had two duplications. The deletion on chromosome 8 is fairly common in the population at ~5%, and the replication cohort showed a prevalence of 4.6% (6/131) (Table 3, Supplementary Data 1). These results demonstrate that variants discovered in children with CAS are also present in children with less severe SSDs, suggesting that these CNVs have variable expressivity and that these findings are reproducible.
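As a rough illustration of how a replication rate of this kind can be tallied, the sketch below counts discovery regions that are overlapped by at least one CNV call in a second cohort. It is an assumption-laden toy: the `Region` type, the `replicated_regions` helper, and the coordinates are hypothetical, and simple interval overlap stands in for whatever matching rule the microarray analysis actually applied.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Half-open genomic interval [start, end) (hypothetical type)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def replicated_regions(discovery, replication_calls):
    """Return the discovery regions hit by at least one replication-cohort call."""
    return [
        region
        for region in discovery
        if any(overlaps(region, call) for call in replication_calls)
    ]


# Toy coordinates, invented for illustration only.
discovery = [
    Region("chr16", 29_600_000, 30_200_000),
    Region("chr8", 1_000_000, 1_100_000),
    Region("chr4", 500_000, 700_000),
]
replication_calls = [
    Region("chr16", 29_700_000, 30_100_000),
    Region("chr8", 1_050_000, 1_200_000),
    Region("chr5", 100_000, 200_000),
]

hits = replicated_regions(discovery, replication_calls)
rate = len(hits) / len(discovery)
print(f"{len(hits)}/{len(discovery)} regions replicated ({rate:.1%})")
```

With the counts reported in the text, the same ratio is 8/17. Regions without adequate marker density would need to be handled explicitly, for example by removing them from the denominator or by reporting them separately.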
Determination of likely causal variants
We synthesized the results from functional annotation, replication analyses, severity of CAS based on persistence, and the literature in Table 3, in order to posit which CNVs among those discovered were likely causal. First, the top of Table 3 lists the four aforementioned CNVs with likely deleterious effects. The fourth of these, on chromosome 2q24.3, was observed in one family as a large deletion and in another family as a deletion of a smaller, overlapping region. The family with the smaller deletion, observed in the population at ~7%, also carries a large CNV on chromosome 22q11.1, which we posit to be the causal variant in this family. Two additional variants, on chromosomes 22q11.1 and 2q11.2, are likely causal based on their nonzero pHaplo scores and literature suggesting a link to speech phenotypes (Table 3, Supplementary Data 1).
Two other variants, on chromosomes 2q11.2 and 4q35.2, are also potentially causal based on their existence in children with persistent CAS, replication in our independent dataset, and previous case studies in the literature in children with communication and/or other neurological traits15,16,82. CNVs on chromosomes 4q32.3, 4p16.3, 10q11.21, 2q21.2 (~133 Mb), 7p21.1, 5q12.1, and 12q21.31 are not as well characterized, but are potentially of interest based on two or more of the following criteria: their existence in children with persistent CAS, replication in our independent sample, or previous case reports in the literature36,82–91. The CNV on 8q24.24.3 was replicated in our independent cohort; however, the high frequency of this variant in the population weakens support for causality. The CNV on chromosome 2q21.3 (~132 Mb) is of unknown significance, as both deletions and duplications observed in our data have also been reported in the literature92,93. One family in the WGS discovery cohort has four inherited variants, three of which are present in a sibling with LI. The 2q21.2 duplication restricted to the child with CAS in this family is of unknown significance, because duplications as well as deletions were observed in our replication dataset. However, other studies of learning disabilities have demonstrated that genetic interactions between multiple CNVs can have causal effects29,94,95, so it is possible that the 2q21.2 duplication is additive in the case of CAS, while the other CNVs are only associated with severe LI in this family.
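The "two or more criteria" rule used above for the less well characterized loci can be written down as a simple tally. The sketch below is only an illustration of that rule: the criterion keys, the `of_interest` helper, and the placeholder loci are hypothetical and do not encode the actual evidence in Table 3.

```python
# The three lines of evidence named in the text, as hypothetical dictionary keys.
CRITERIA = ("persistent_cas", "replicated", "prior_reports")


def criteria_met(evidence):
    """List the criteria that are satisfied for one locus."""
    return [name for name in CRITERIA if evidence.get(name, False)]


def of_interest(evidence, minimum=2):
    """True if at least `minimum` of the criteria are satisfied."""
    return len(criteria_met(evidence)) >= minimum


# Placeholder loci with invented evidence flags, for illustration only.
loci = {
    "locus_A": {"persistent_cas": True, "replicated": True, "prior_reports": False},
    "locus_B": {"persistent_cas": False, "replicated": False, "prior_reports": True},
    "locus_C": {"persistent_cas": True, "replicated": True, "prior_reports": True},
}

flagged = sorted(name for name, evidence in loci.items() if of_interest(evidence))
print(flagged)
```

A tally like this treats every criterion as equally weighted, which is a simplification; the synthesis in the text also draws on dosage sensitivity scores and population frequency.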
Phenotypic heterogeneity
By comparing families where children without CAS but with SSD or LI carried CNVs, we were able to characterize the other phenotypic implications of the variants using our extensive database of communication measures (Supplementary Tables 3, 5, and 6). In family 33, the variant on chromosome 5 was present in children with poor scores in articulation, fluency, and oral-motor skills. The CAS-affected child in this family had a variety of other clinical manifestations as well (Supplementary Table 6). In family 74, children with the variant demonstrated poor scores in reading, elision, and rapid naming; the child with CAS in this family also had problems with feeding and a limited repertoire of sounds. In family 8055, there were three variants, all present in children with severe language difficulties, and the child with CAS had broad clinical manifestations. In family 8107, children with the variant on chromosome 22 demonstrated poor scores on articulation measures, and the CAS-affected child demonstrated delayed language onset.
Discussion
CAS is a rare SSD, reportedly found in only 3.4%–4.3% of children who are clinically referred for SSD96. Despite the infrequent occurrence of the disorder, we successfully ascertained a relatively large group of individuals who presented with CAS in early childhood, and whom we followed longitudinally into adolescence and early adulthood. The CFSRS included participants whose primary impairment was CAS, eliminating participants with syndromes and other known neurodevelopmental disorders that can affect speech, such as autism spectrum disorder, sensorineural hearing impairment, and intellectual disability. Nonetheless, nearly 72% of the current participants demonstrated oral language difficulties related to language processing or verbal expression, congruent with previous reports of deficits in these areas55,97,98. Furthermore, 55% of the participants demonstrated difficulties with single real words, or nonword decoding and spelling, signaling that a diagnosis of CAS confers a higher risk for literacy difficulties than other forms of idiopathic SSD99. A diagnosis of CAS is also associated with persistent speech sound errors. This was apparent in our study, as 62% of the participants continued to demonstrate speech errors beyond the age of typical speech normalization of 8.5 to 9 years11,100–102. Put together with our genetic data, these observations suggest that the loci discovered have an effect on multiple phenotypes across an individual’s lifespan.
We identified 17 genomic regions containing 19 unique CNVs associated with CAS, speech, and language. This high rate of families where CAS is inherited with CNVs (51.9%) is consistent with other studies29. Most of these loci were unique across families, demonstrating significant genetic heterogeneity in CAS and speech disorder. We were able to replicate many (47.1%) of these loci in an independent sample with microarray data, and through that analysis, demonstrate that these variants are not exclusively associated with CAS but are also present in children with less severe speech disorders. Using bioinformatic resources and the extant literature, we have strong evidence for an association between CAS and deletions on chromosomes 2q24.3, 6p12.3-p12.2, 11q23.2-q23.3, and 16p11.2, and probable evidence for deletions on 2q11.2 and 22q11.1.
CNVs in several regions have been previously associated with CAS, as well as speech and language. The deletion on chromosome 2q11.2 is associated with speech delay, ADHD, and developmental disability82. We observed a deletion on chromosome 4q35.2, and deletions and duplications at this locus in our replication study. This genomic region has been previously associated with an unbalanced translocation and duplication in children with CAS15,38. A deletion on chromosome 7p21.1 has been previously associated with severe speech and language disorder, autism, and developmental delay87; here, we observed a duplication of this region. Finally, the duplication on chromosome 2q21.2, which we observed in two families and replicated in our independent microarray data, has been previously associated with reading and language39. While we did not replicate all of the previously identified CNVs that have been associated with CAS in other WGS and/or microarray studies21,27–29,35, this further suggests that CAS is genetically heterogeneous and that there may be additional, as yet undiscovered, causal variants. This in turn implies that multiple genes controlling naturally acquired speech and language are distributed across many locations in the human genome, and that evolution of speech and language did not occur in a single evolutionary step.
Additional CNVs in the literature have been associated with other neurological traits, such as autism, ADHD, and intellectual and developmental disability, as well as syndromes. Given that the CAS-associated CNVs are also present in children with developmental phenotypes, these findings are relevant for our work. The deleted region on chromosome 22q11.1 has been associated with autism, intellectual disability, ADHD, and DiGeorge syndrome103,104. While we observed a duplication on chromosome 10q11.21, other studies reported microduplications in this region associated with developmental delay and seizure85 and facial dysmorphism84. A deletion on chromosome 4p16.3 has been associated with Wolf-Hirschhorn syndrome83. We observed a duplication on chromosome 12q21.31, while another study found a deletion associated with intellectual disability88. These developmental disabilities are often associated with delays/disorders of speech and language105–107.
Our work builds on previous literature21,22,29,34,74 demonstrating that the 16p11.2 deletion is a major cause of CAS. Previous literature also asserts that this deletion demonstrates variable expressivity21,40,41,104, with variable deficits seen in language21, speech production21, gross motor issues or delay21,22,74, severe articulation issues22,34, and speech delay74. The severity of these traits varies dramatically, even within a single study when evaluated by the same research team. Our work adds to these phenotypes by reporting that all but one of the children in our cohort with this deletion have persistent speech errors beyond age 8.5. Notably, the one nonpersistent child demonstrated relatively mild speech difficulties at baseline. Consistent with previous work34, these findings imply that children with a 16p11.2 deletion should be evaluated for CAS and other communication disorders, especially since some of the comorbidities of CAS could be mistaken for autism spectrum disorder34.
Three CNVs were seen in multiple families, but the rest were unique. Replication of these CNVs in our independent microarray data, consisting of children manifesting speech sound disorders other than CAS, suggests there are multiple CNVs that are common causes of CAS and other communication disorders. For example, the deletion on chromosome 16p11.2 was previously associated with both CAS and autism; our study extended this clinical characterization to milder forms of SSD. SSDs are heterogeneous with regard to etiology and clinical presentation, and the diversity of CNVs identified supports that conclusion.
As in previous studies of CAS29, we also observed a subset of families with multiple CNVs inherited with CAS. Other authors have proposed a “two-hit hypothesis” among pathogenic CNVs with variable penetrance and expressivity29,94,95. Our data support this hypothesis: among the three children with CAS and multiple CNVs, we observed variability in persistence, severity, deficits in language and literacy, and the presence of other clinical manifestations, and one of the three children is “mild” based on our metrics.
This work is not without limitations. Only a portion of the CFSRS sample had WGS data available, so we were unable to identify whether these CNVs were present in a larger cohort of children affected with CAS or SSD. Since some children were lost to follow-up, we do not have complete data on literacy measures for some children. Clinical manifestation data were based on self-report and were not verified in all cases with medical records, so they may be incomplete or inaccurate. Despite the strict inclusion criteria, our participants presented with a range of co-occurring developmental delays, LI, and RD, in addition to persistent speech difficulties. These findings are consistent with the view of CAS as a multi-domain motor-speech disorder, affecting processes related to auditory-perceptual encoding, memory, and motor planning108. Comparison of our findings with the literature is challenging because of variable expressivity seen with CNVs109, so some of our observed CNVs may not have been previously reported in their association with speech phenotypes. DNA changes like CNVs may be caused by a variety of environmental factors110–112, but unfortunately, we did not collect data on such factors, so we are unable to explore this hypothesis. Lastly, we focused our search on CNVs ≥ 50 kb in size, so we were unable to replicate recent findings35. Smaller CNVs are more common in the population, necessitating a different study design and analytical approach, for which this sample was underpowered. Thus, this will be the focus of future research, in an independent cohort.
In sum, our analysis of WGS data in 28 children with CAS revealed 17 CNV regions associated with CAS or other SSDs. The children with these CNVs demonstrated a wide range of severity and of speech, language, and literacy deficits, as well as other clinical manifestations. These data support CNVs as a major cause of CAS and other communication disorders in our sample, and indicate that these CNVs have variable expressivity. Additional work is needed in new cohorts to discover novel potentially causal CNVs, as well as to identify whether some of these CNVs are shared in other CAS subjects. Such future work will be invaluable for assisting practitioners in diagnosing disorders, screening potentially affected siblings, and allowing patients to seek targeted therapies47,48.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We would like to thank the families who have so generously participated in this study for many years. We would also like to acknowledge Dr. H. Gerry Taylor for his valuable contributions as consulting neuropsychologist during the enrollment and diagnosis phase of this study. This research was supported by the Genomics Core Facility of the CWRU School of Medicine’s Genetics and Genome Sciences Department. This work made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University. This work was supported by NIH grant R01DC012380 awarded to Dr. Iyengar and R01DC000528 awarded to Dr. Lewis.
Author contributions
ERC, CMS, and SKI conceptualized and designed the study, drafted the initial manuscript, and reviewed and revised the manuscript. ERC, PB, KB, and BT conducted the statistical analyses. AS and BAL helped conceptualize the study and critically reviewed the manuscript for important intellectual content. GM, LF, JT, and BAL collected the data and revised and reviewed the manuscript. All authors approved the final manuscript as submitted.
Peer review
Peer review information
Communications Biology thanks Beate Peter and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Julio Hechavarría and Benjamin Bessieres.
Data availability
The datasets generated and/or analyzed during the current study are not publicly available due to Institutional Review Board (IRB) restrictions on the data, but are available from the corresponding author (Sudha Iyengar, ski@case.edu) on reasonable request and will require an IRB application.
Code availability
All software versions are identified within the Methods. If there is no version number, then that software package only has one (current) version. There were no custom scripts created for the analyses conducted in this paper.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Catherine M. Stein, Email: cmj7@case.edu
Sudha K. Iyengar, Email: ski@case.edu
Supplementary information
The online version contains supplementary material available at 10.1038/s42003-024-06968-y.
References
- 1. American Speech-Language-Hearing Association. Definitions of communication disorders and variations (1993).
- 2. Black, L. I., Vahratian, A. & Hoffman, H. J. Communication Disorders and Use of Intervention Services Among Children Aged 3-17 Years: United States, 2012. NCHS Data Brief, 1–8 (2015).
- 3. Tourville, J. A. & Guenther, F. H. The DIVA model: A neural theory of speech acquisition and production. Lang. Cogn. Process. 26, 952–981 (2011).
- 4. Chang, E. F., Raygor, K. P. & Berger, M. S. Contemporary model of language organization: an overview for neurosurgeons. J. Neurosurg. 122, 250–261 (2015).
- 5. Abbasi, O., Steingräber, N., Chalas, N., Kluger, D. S. & Gross, J. Spatiotemporal dynamics characterise spectral connectivity profiles of continuous speaking and listening. PLoS Biol. 21, e3002178 (2023).
- 6. Bauman-Wängler, J. A. (Pearson Education, United Kingdom, 2020).
- 7. Catts, H. W., Adlof, S. M., Hogan, T. P. & Weismer, S. E. Are specific language impairment and dyslexia distinct disorders? J. Speech Lang. Hear. Res. 48, 1378–1396 (2005).
- 8. Shriberg, L., Tomblin, J. & McSweeny, J. Prevalence of speech delay in 6-year-old children and comorbidity with language impairment. J. Speech Lang. Hear. Res. 42, 1461–1481 (1999).
- 9. American Speech-Language-Hearing Association. Childhood Apraxia of Speech (Practice Portal) (2007).
- 10. Stein, C. M. et al. Feature-driven classification reveals potential comorbid subtypes within childhood apraxia of speech. BMC Pediatrics 20, 519 (2020).
- 11. Shriberg, L. D., Aram, D. M. & Kwiatkowski, J. Developmental apraxia of speech: I. Descriptive and theoretical perspectives. J. Speech Lang. Hear. Res. 40, 273–285 (1997).
- 12. Shriberg, L. D., Kwiatkowski, J. & Mabie, H. L. Estimates of the prevalence of motor speech disorders in children with idiopathic speech delay. Clin. Linguist. Phonetics 33, 679–706 (2019).
- 13. Lewis, B. A., Miller, G., Iyengar, S. K., Stein, C. M. & Benchek, P. Long-term Outcomes for Individuals with Childhood Apraxia of Speech. J. Speech Lang. Hear. Res. (in press).
- 14. Iuzzini-Seigel, J., Delaney, A. L. & Kent, R. D. Retrospective case-control study of communication and motor abilities in 143 children with suspected childhood apraxia of speech: effect of concomitant diagnosis. Perspect. ASHA Spec. Interest Groups 7, 45–55 (2022).
- 15. Chilosi, A. M. et al. Behavioral and neurobiological correlates of childhood apraxia of speech in Italian children. Brain Lang. 150, 177–185 (2015).
- 16. Chilosi, A. M. et al. Differences and commonalities in children with childhood apraxia of speech and comorbid neurodevelopmental disorders: a multidimensional perspective. J. Personalized Med. 12, 313 (2022).
- 17. Fisher, S., Vargha-Khadem, F., Watkins, K., Monaco, A. & Pembrey, M. Localisation of a gene implicated in a severe speech and language disorder. Nat. Genet. 18, 168–170 (1998).
- 18. Lai, C. S. et al. The SPCH1 region on human 7q31: genomic characterization of the critical interval and localization of translocations associated with speech and language disorder. Am. J. Hum. Genet. 67, 357–368 (2000).
- 19. Lai, C. S., Fisher, S. E., Hurst, J. A., Vargha-Khadem, F. & Monaco, A. P. A forkhead-domain gene is mutated in a severe speech and language disorder. Nature 413, 519–523 (2001).
- 20. Veltman, J. A. & Brunner, H. G. De novo mutations in human genetic disease. Nat. Rev. Genet. 13, 565–575 (2012).
- 21. Raca, G. et al. Childhood Apraxia of Speech (CAS) in two patients with 16p11.2 microdeletion syndrome. Eur. J. Hum. Genet. 21, 455–459 (2013).
- 22. Fedorenko, E. et al. A highly penetrant form of childhood apraxia of speech due to deletion of 16p11.2. Eur. J. Hum. Genet. 24, 302–306 (2016).
- 23. Morgan, A. T., Amor, D. J., St John, M. D., Scheffer, I. E. & Hildebrand, M. S. Genetic architecture of childhood speech disorder: a review. Mol. Psychiatry 29, 1281–1292 (2024).
- 24. Kingdom, R. & Wright, C. F. Incomplete penetrance and variable expressivity: from clinical studies to population cohorts. Front. Genet. 13, 920390 (2022).
- 25. Weischenfeldt, J., Symmons, O., Spitz, F. & Korbel, J. O. Phenotypic impact of genomic structural variation: insights from and for human disease. Nat. Rev. Genet. 14, 125–138 (2013).
- 26. Grimes, K. et al. Cell-type-specific consequences of mosaic structural variants in hematopoietic stem and progenitor cells. Nat. Genet. 56, 1134–1146 (2024).
- 27. Worthey, E. A. et al. Whole-exome sequencing supports genetic heterogeneity in childhood apraxia of speech. J. Neurodev. Disord. 5, 29 (2013).
- 28. Eising, E. et al. A set of regulatory genes co-expressed in embryonic human brain is implicated in disrupted speech development. Mol. Psychiatry 24, 1065–1078 (2019).
- 29. Laffin, J. J. et al. Novel candidate genes and regions for childhood apraxia of speech identified by array comparative genomic hybridization. Genet. Med. 14, 928–936 (2012).
- 30. Collins, R. L. et al. A cross-disorder dosage sensitivity map of the human genome. Cell 185, 3041–3055.e3025 (2022).
- 31. Peter, B. A case with cardiac, skeletal, speech, and motor traits narrows the subtelomeric 19p13.3 microdeletion region to 46 kb. Am. J. Med. Genet. A 191, 120–129 (2023).
- 32. Peter, B. et al. Two unrelated children with overlapping 6q25.3 deletions, motor speech disorders, and language delays. Am. J. Med. Genet. A 173, 2659–2669 (2017).
- 33. Peter, B., Matsushita, M., Oda, K. & Raskind, W. De novo microdeletion of BCL11A is associated with severe speech sound disorder. Am. J. Med. Genet. A 164A, 2091–2096 (2014).
- 34. Mei, C. et al. Deep phenotyping of speech and language skills in individuals with 16p11.2 deletion. Eur. J. Hum. Genet. 26, 676–686 (2018).
- 35. Kaspi, A. et al. Genetic aetiologies for childhood speech disorder: novel pathways co-expressed during brain development. Mol. Psychiatry, 10.1038/s41380-022-01764-8 (2022).
- 36. Fanizza, I. et al. Genotype-phenotype relationship in a child with 2.3 Mb de novo interstitial 12p13.33-p13.32 deletion. Eur. J. Med. Genet. 57, 334–338 (2014).
- 37. Boyar, F. et al. A family with a grand-maternally derived interstitial duplication of proximal 15q. Clin. Genet. 60, 421–430 (2001).
- 38. Shriberg, L. D., Jakielski, K. J. & El-Shanti, H. Breakpoint localization using array-CGH in three siblings with an unbalanced 4q;16q translocation and childhood apraxia of speech (CAS). Am. J. Med. Genet. A 146A, 2227–2233 (2008).
- 39. Gialluisi, A. et al. Investigating the effects of copy number variants on reading and language performance. J. Neurodev. Disord. 8, 17 (2016).
- 40. Duyzend, M. H. et al. Maternal modifiers and parent-of-origin bias of the autism-associated 16p11.2 CNV. Am. J. Hum. Genet. 98, 45–57 (2016).
- 41. Tai, D. J. C. et al. Tissue- and cell-type-specific molecular and functional signatures of 16p11.2 reciprocal genomic disorder across mouse brain and human neuronal models. Am. J. Hum. Genet. 109, 1789–1813 (2022).
- 42. Jacobs, P. A., Browne, C., Gregson, N., Joyce, C. & White, H. Estimates of the frequency of chromosome abnormalities detectable in unselected newborns using moderate levels of banding. J. Med. Genet. 29, 103–108 (1992).
- 43. Brewer, C., Holloway, S., Zawalnyski, P., Schinzel, A. & FitzPatrick, D. A chromosomal deletion map of human malformations. Am. J. Hum. Genet. 63, 1153–1159 (1998).
- 44. Girirajan, S. et al. Relative burden of large CNVs on a range of neurodevelopmental phenotypes. PLoS Genet. 7, e1002334 (2011).
- 45. Han, L. et al. Functional annotation of rare structural variation in the human brain. Nat. Commun. 11, 2990 (2020).
- 46. Macnee, M. et al. CNV-ClinViewer: enhancing the clinical interpretation of large copy-number variants online. Bioinformatics 39, 10.1093/bioinformatics/btad290 (2023).
- 47. Peter, B. et al. Translating principles of precision medicine into speech-language pathology: Clinical trial of a proactive speech and language intervention for infants with classic galactosemia. HGG Adv. 3, 100119 (2022).
- 48. Bruce, L. & Peter, B. Three children with different de novo BCL11A variants and diverse developmental phenotypes, but shared global motor discoordination and apraxic speech: Evidence for a functional gene network influencing the developing cerebellum and motor and auditory cortices. Am. J. Med. Genet. A 188, 3401–3415 (2022).
- 49. Goldman, R. & Fristoe, M. (American Guidance Service, Circle Pines, MN, 1986).
- 50. Wechsler, D. (The Psychological Corporation, San Antonio, TX, 1991).
- 51. American Speech-Language-Hearing Association. Childhood Apraxia of Speech [Technical Report] (2007).
- 52. Lewis, B. A. et al. Literacy outcomes of children with early childhood speech sound disorders: impact of endophenotypes. J. Speech Lang. Hear. Res. 54, 1628–1643 (2011).
- 53. Lewis, B. A. et al. Subtyping children with speech sound disorders by endophenotypes. Top. Lang. Disord. 31, 112–127 (2011).
- 54. Lewis, B. A. et al. Adolescent outcomes of children with early speech sound disorders with and without language impairment. Am. J. Speech Lang. Pathol. 24, 150–163 (2015).
- 55. Lewis, B. A., Freebairn, L. A., Hansen, A. J., Iyengar, S. K. & Taylor, H. G. School-age follow-up of children with childhood apraxia of speech. Lang. Speech Hear. Serv. Sch. 35, 122–140 (2004).
- 56. American Speech-Language-Hearing Association. Latest Forum From Perspectives Tackles SLI/DLD Terminology Discussion (2020).
- 57. Anthoni, H. et al. The aromatase gene CYP19A1: several genetic and functional lines of evidence supporting a role in reading, speech and language. Behav. Genet. 42, 509–527 (2012).
- 58. Benchek, P. et al. Association between genes regulating neural pathways for quantitative traits of speech and language disorders. NPJ Genom. Med. 6, 64 (2021).
- 59. Lewis, B., Freebairn, L. & Taylor, H. Follow-up of children with early expressive phonology disorders. J. Learn. Disabilities 33, 433–444 (2000).
- 60. Stein, C. M. et al. Pleiotropic effects of a chromosome 3 locus on speech-sound disorder and reading. Am. J. Hum. Genet. 74, 283–297 (2004).
- 61. Trim Galore! https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/.
- 62. GATK. https://gatk.broadinstitute.org/.
- 63. Pedersen, B. S. & Quinlan, A. R. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868 (2018).
- 64. Handsaker, R. E. et al. Large multiallelic copy number variations in humans. Nat. Genet. 47, 296–303 (2015).
- 65. Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).
- 66. Kundaje, A. A comprehensive collection of signal artifact blacklist regions in the human genome (2021).
- 67. Robinson, J. T., Thorvaldsdóttir, H., Wenger, A. M., Zehir, A. & Mesirov, J. P. Variant review with the integrative genomics viewer. Cancer Res. 77, e31–e34 (2017).
- 68. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
- 69. Robinson, J. T., Thorvaldsdóttir, H., Turner, D. & Mesirov, J. P. igv.js: an embeddable JavaScript implementation of the Integrative Genomics Viewer (IGV). bioRxiv 2020.05.03.075499, doi:10.1101/2020.05.03.075499 (2020).
- 70. Illumina. www.illumina.com/documents/products/technotes/technote_cnv_algorithms.pdf.
- 71. MacDonald, J. R., Ziman, R., Yuen, R. K., Feuk, L. & Scherer, S. W. The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res. 42, D986–D992 (2014).
- 72. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
- 73. Rice, A. M. & McLysaght, A. Dosage-sensitive genes in evolution and disease. BMC Biol. 15, 78 (2017).
- 74. Fetit, R., Price, D. J., Lawrie, S. M. & Johnstone, M. Understanding the clinical manifestations of 16p11.2 deletion syndrome: a series of developmental case reports in children. Psychiatr. Genet. 30, 136–140 (2020).
- 75. Usui, N. et al. Zbtb16 regulates social cognitive behaviors and neocortical development. Transl. Psychiatry 11, 242 (2021).
- 76. Boeckx, C. & Benítez-Burraco, A. Globularity and language-readiness: generating new predictions by expanding the set of genes of interest. Front. Psychol. 5, 1324 (2014).
- 77. Yang, X. et al. The association between NCAM1 levels and behavioral phenotypes in children with autism spectrum disorder. Behav. Brain Res. 359, 234–238 (2019).
- 78. Vukojevic, V. et al. Evolutionary conserved role of neural cell adhesion molecule-1 in memory. Transl. Psychiatry 10, 217 (2020).
- 79. Hu, Y. et al. Functional divergence of mammalian TFAP2a and TFAP2b transcription factors for bidirectional sleep control. Genetics 216, 735–752 (2020).
- 80. Poot, M. et al. Three de novo losses and one insertion within a pericentric inversion of chromosome 6 in a patient with complete absence of expressive speech and reduced pain perception. Eur. J. Med. Genet. 52, 27–30 (2009).
- 81. Belengeanu, V. et al. A de novo 2.3 Mb deletion in 2q24.2q24.3 in a 20-month-old developmentally delayed girl. Gene 539, 168–172 (2014).
- 82. Riley, K. N. et al. Recurrent deletions and duplications of chromosome 2q11.2 and 2q13 are associated with variable outcomes. Am. J. Med. Genet. A 167A, 2664–2673 (2015).
- 83. Rauch, A. et al. First known microdeletion within the Wolf-Hirschhorn syndrome critical region refines genotype-phenotype correlation. Am. J. Med. Genet. 99, 338–342 (2001).
- 84. Chen, C. P. et al. Prenatal diagnosis of a 4.9-Mb deletion of 10q11.21 -> q11.23 by array comparative genomic hybridization. Taiwan. J. Obstet. Gynecol. 49, 117–119 (2010).
- 85. Stankiewicz, P. et al. Recurrent deletions and reciprocal duplications of 10q11.21q11.23 including CHAT and SLC18A3 are likely mediated by complex low-copy repeats. Hum. Mutat. 33, 165–179 (2012).
- 86. Moralli, D. et al. Language impairment in a case of a complex chromosomal rearrangement with a breakpoint downstream of FOXP2. Mol. Cytogenet. 8, 36 (2015).
- 87. Udayakumar, A. M., Al-Mamari, W., Al-Sayegh, A. & Al-Kindy, A. De novo duplication of 7p21.1p22.2 in a child with autism spectrum disorder and craniofacial dysmorphism. Sultan Qaboos Univ. Med. J. 15, e415–e419 (2015).
- 88. Akilapa, R. S., Smith, K. & Balasubramanian, M. Clinical report: inherited deletion of chromosome 12q21.31q21.32 associated with a distinct phenotype and intellectual disability. Clin. Dysmorphol. 24, 151–155 (2015).
- 89. Chernus, J. et al. GWAS reveals loci associated with velopharyngeal dysfunction. Sci. Rep. 8, 8470 (2018).
- 90. Dallapiccola, B., Bernardini, L., Novelli, A. & Mingarelli, R. Phenocopy of Wolf-Hirschhorn syndrome in a patient with duplication 12q13.3q14.1. Am. J. Med. Genet. A 149A, 546–548 (2009).
- 91. Bertoli, M. et al. Another patient with 12q13 microduplication. Am. J. Med. Genet. A 161A, 2004–2008 (2013).
- 92. Dharmadhikari, A. V. et al. Small rare recurrent deletions and reciprocal duplications in 2q21.1, including brain-specific ARHGEF4 and GPR148. Hum. Mol. Genet. 21, 3345–3355 (2012).
- 93. Gimelli, S. et al. Recurrent microdeletion 2q21.1: report on a new patient with neurological disorders. Am. J. Med. Genet. A 164A, 801–805 (2014).
- 94. Girirajan, S. et al. A recurrent 16p12.1 microdeletion supports a two-hit model for severe developmental delay. Nat. Genet. 42, 203–209 (2010).
- 95. Desachy, G. et al. Increased female autosomal burden of rare copy number variants in human populations and in autism families. Mol. Psychiatry 20, 170–175 (2015).
- 96. Delaney, A. L. K., R.D. Developmental profiles of children diagnosed with apraxia of speech. Paper presented at the American Speech-Language-Hearing Association Convention, Philadelphia, PA (2004).
- 97. Ekelman, B. & Aram, D. Syntactic findings in developmental verbal apraxia. J. Commun. Disord. 16, 237–250 (1983).
- 98. Thoonen, G., Maassen, B., Gabreëls, F., Schreuder, R. & de Swart, B. Towards a standardised assessment procedure for developmental apraxia of speech. Eur. J. Disord. Commun.: J. Coll. Speech Lang. Therapists, Lond. 32, 37–60 (1997).
- 99. Miller, G. J. et al. Reading outcomes for individuals with histories of suspected childhood apraxia of speech. Am. J. Speech Lang. Pathol. 28, 1432–1447 (2019).
- 100. Preston, J. L., Hull, M. & Edwards, M. L. Preschool speech error patterns predict articulation and phonological awareness outcomes in children with histories of speech sound disorders. Am. J. Speech Lang. Pathol. 22, 173–184 (2013).
- 101. Shriberg, L. & Kwiatkowski, J. Developmental phonological disorders I: A clinical profile. J. Speech Hearing Res. 37, 1100–1126 (1994).
- 102. Wren, Y., Miller, L. L., Peters, T. J., Emond, A. & Roulstone, S. Prevalence and predictors of persistent speech sound disorder at eight years old: findings from a population cohort study. J. Speech Lang. Hear. Res. 59, 647–673 (2016).
- 103. Moreno-De-Luca, D. & Martin, C. L. All for one and one for all: heterogeneity of genetic etiologies in neurodevelopmental psychiatric disorders. Curr. Opin. Genet. Dev. 68, 71–78 (2021).
- 104. Aguirre, M., Rivas, M. A. & Priest, J. Phenome-wide burden of copy-number variation in the UK Biobank. Am. J. Hum. Genet. 105, 373–383 (2019).
- 105. Sanger, D. D., Stick, S. L., Sanger, W. G. & Dawson, K. Specific syndromes and associated communication disorders: a review. J. Commun. Disord. 17, 385–405 (1984).
- 106. Shprintzen, R. Genetics, syndromes and communication disorders. (Singular Publishing Group, 1997).
- 107. Batshaw, M. N., NJ; Pelligino, L. Children with Disabilities. Eighth edn. (Brookes Publishing, 2019).
- 108. Shriberg, L. D., Lohmeier, H. L., Strand, E. A. & Jakielski, K. J. Encoding, memory, and transcoding deficits in Childhood Apraxia of Speech. Clin. Linguist. Phonetics 26, 445–482 (2012).
- 109. Bertini, V. et al. Deletion extents are not the cause of clinical variability in 22q11.2 deletion syndrome: does the interaction between DGCR8 and miRNA-CNVs play a major role? Front. Genet. 8, 47 (2017).
- 110. Goldmann, J. M., Veltman, J. A. & Gilissen, C. De novo mutations reflect development and aging of the human germline. Trends Genet. 35, 828–839 (2019).
- 111. Goodman, C. V. et al. Sex difference of pre- and post-natal exposure to six developmental neurotoxicants on intellectual abilities: a systematic review and meta-analysis of human studies. Environ. Health 22, 80 (2023).
- 112. Dzwilewski, K. L. & Schantz, S. L. Prenatal chemical exposures and child language development. J. Commun. Disord. 57, 41–65 (2015).
- 113. Gough, P. B. & Tunmer, W. E. Decoding, reading, and reading disability. Remedial Spec. Educ. 7, 6–10 (1986).
Data Availability Statement
The datasets generated and/or analyzed during the current study are not publicly available due to Institutional Review Board (IRB) restrictions on the data, but are available from the corresponding author (Sudha Iyengar, ski@case.edu) on reasonable request and will require an IRB application.
All software versions are identified within the Methods. If there is no version number, then that software package only has one (current) version. There were no custom scripts created for the analyses conducted in this paper.