Abstract
Background/Objectives: Autism spectrum disorder (ASD) is a heritable neurodevelopmental condition that displays heterogeneity in both presentation and etiology, and often presents with concomitant communication difficulties. The hypothesis behind the New Jersey Language and Autism Genetic Study is that genetic heterogeneity for component phenotypes of ASD may be reduced relative to the disorder as a whole. We previously published an initial phase of this study with family recruitment that used very restricted inclusion/exclusion criteria for both autism and language deficits. Here, we present an expanded sample that includes a wider range of phenotypic presentations in the autism and language domains. Methods: Bioinformatics tools focusing on variant prioritization were used to identify candidate risk genes. Results: Our previous findings on 15q and 16q, connecting ASD and oral/written communication, are only relevant to the narrow ASD and language impairment phenotypes, though addition of families did reduce both critical regions. After variant and gene prioritization, we determined a set of ten and six top candidate risk genes with a strong association with language impairment and reading impairment, respectively. The top candidate genes include both genes previously implicated in neurodevelopmental disorders (e.g., ZNF774 and DNAH3) and genes not previously reported but with strong evidence of being involved in neurodevelopmental phenotypes. Conclusions: Our analysis elucidates the genetic architecture and interaction of ASD and language-related phenotypes. In addition, we reported a number of high-confidence candidate genes within the top linkage regions. These genes will provide insights into the genetic etiology of neurodevelopmental disorders.
Keywords: autism, genetics, family-based, linkage analysis, reading impairment, language impairment, DSM-5, posterior probability of linkage
1. Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental condition frequently seen in clinical practice and defined in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) by impairment in social interactions along with evidence of either restricted interests or repetitive behaviors [1]. Progress on the genetics of ASD has advanced (reviewed in [2,3]) in recent years. Large genome-wide association studies and exome sequencing studies have identified both common and rare variants contributing to ASD risk with over 100 high-confidence risk genes implicated [4,5]. Overlapping with other neurodevelopmental and psychiatric traits, these genes provide the general architecture of ASD, though more work is needed to link the genetics to measurable phenotypic heterogeneity [6,7].
ASD endophenotypes are sometimes conceptualized as subcomponents of the broader ASD diagnosis, whereas in other contexts, endophenotype analyses indicate a comorbid model. To address the challenge of how candidate phenotypes genetically intersect with ASD, two general strategies are commonly used. One strategy employs a simple additive comorbidity model, which implies ubiquitously common underlying genetics that may drive observed comorbidities. The other strategy considers traits that occur predominantly in only subsets of the ASD population, using these traits to define heterogeneity rather than assuming a simple additive model. The latter heterogeneity model has been emphasized in ASD genetics research. Aspects of language ability [8,9,10], traits related to the broader autism phenotype [2,11,12,13], and other cognitive traits [14] have been applied to the genetic analysis of ASD.
At the same time, dimensional modeling and subtyping approaches have been widely applied to ASD cohorts. Factor analytic and other models of brain connectivity and behavior have derived latent factors of ASD heterogeneity that cut across diagnostic boundaries and may offer a more unified view of how categorical and dimensional views of the condition coincide [15,16]. Mixture models on broader phenotype trait batteries in large ASD cohorts have shown that trait-first data-driven subgroups map onto partially distinct genetic regimes, underscoring the importance of modeling ASD and related phenotypic domains (i.e., social communication, restricted/repetitive behavior, and language) as continuous dimensions [17].
The New Jersey Language-Autism Genetics Study (NJLAGS) took a different approach to study the genetics of communication in ASD [18]. The goal of the project was to characterize the genetic relationship between communication skills and ASD. The DSM-5 defines ASD by social-communication deficits plus restricted/repetitive behavior [1]. Specific Language Impairment (SLI)-like language impairment, or even the absence of verbal language as an extreme limit, is neither necessary nor sufficient for diagnosis. The broader autism phenotype construct in relatives loads most strongly on social-communication, in particular pragmatic language (the social use of language) alongside aloofness and rigidity, consistent with expectations from the DSM-5. In contrast, structural language abilities (phonology and morphosyntax) are highly heterogeneous in ASD. A subgroup shows SLI/Developmental Language Disorder (DLD)-like deficits with distinct neurobiological signatures, while many individuals with ASD have age-appropriate structural language and still others are non-verbal. Psycholinguistic and clinical research studies continue to define the relevance of structural and pragmatic language domains and how these domains represent partially dissociable dimensions in ASD [19]. Considering language constructs as having distinct cognitive and neurobiological correlates, language profiles can help refine ASD subtyping by providing additional dimensions to ASD risk. These distinct patterns may imply partially separable etiologies from the social–behavioral core. The NJLAGS family-based dual ASD/language-impairment ascertainment strategy therefore compliments recent genome/exome sequencing analyses of multiplex ASD families that have demonstrated that deeply phenotyped pedigrees are particularly informative for dissecting the contributions of rare inherited variants and de novo mutations to ASD and related quantitative traits [20,21].
To study the possibility of language-related subgroups, families were ascertained only if they had both ASD and psychometrically verified language impairment (LI) within a candidate family. The co-incident diagnoses within the family presumed a shared genetics model between the two disorders. The genetic analysis model for NJLAGS also assumed a partial common etiology [18], allowing the focus to remain on genetic loci common to both disorders while giving lesser weighting to those unique to each. In our analysis, loci were found on chromosomes 15q23-26 (spoken language and ASD) and 16p12 (written language and ASD). Additional analyses indicated that the best model fit on how spoken language and ASD are related at chromosome 15 was the etiological equivalence model. On chromosome 16, the model fit for written language and ASD was equivocal; although the evidence for linkage at that locus was weaker, the etiological equivalence model could not be excluded. Consistent with these findings, a recent whole genome sequencing study of multiplex ASD families concluded that language delay and dysfunction are a related feature of ASD, consistent with our emphasis on communication and the hypothesis that language impairment is not merely a comorbid condition, though the genetic architecture mediating this proposed relationship is still unclear [20].
An additional key feature of NJLAGS goes beyond the co-incident diagnoses of language impairment and ASD within families. In the first wave of NJLAGS recruitment (WAVE1), proband criteria were conservative for both ASD and LI. ASD probands were required to meet the strict congruence of both DSM-IV Autistic Disorder (AD) and Autism Diagnostics Interview-Revised (ADI-R)/Autism Diagnostic Observation Scales (ADOS) criteria. LI probands met criteria for a definition of SLI from the language disorders literature requiring typical intelligence quotient along with traditional criteria for language impairment [22,23]. Coinciding with the publication of the DSM-5 [1] and out of concern that the narrow definitions of AD during WAVE1 of NJLAGS recruitment may not generalize to the wider population of ASD families, ASD ascertainment criteria were relaxed in the second wave of recruitment (WAVE2) to conform with the new, more inclusive definition of ASD. Our rationale was that less strict criteria would increase the ecological validity of genetic findings. Specifically, during WAVE2 recruitment, families were ascertained using DSM-5 criteria, meaning that probands with significant social impairments and evidence of restricted interests and repetitive behaviors who did not meet the AD cutoff were still classified as ASD. During WAVE2 recruitment, the LI criteria remained substantially the same, although the IQ requirement was not taken into consideration. The term Developmental Language Disorder was adapted to explain greater inclusivity (see the Section 2 for specific diagnostic criteria).
In summary, the NJLAGS is distinct from other ASD cohorts in several respects. First, it consists of multiplex extended pedigrees including both affected and unaffected relatives, useful for dimensional analysis across the distribution of a given trait, rather than primarily trios or sib-pairs. Second, the joint language–ASD ascertainment scheme increases the number of informative meioses for joint analysis of ASD, language impairment, and broader-autism-phenotype traits across generations. Third, all family members regardless of affected status underwent an extensive psychometric battery including oral language, reading, and quantitative autistic traits, allowing us to construct both categorical and dimensional phenotypes (LI*, RI*, SRS total, and SRS-RIRB). Fourth, genome-wide SNP genotyping was combined with whole-genome sequencing to interrogate single-nucleotide variants, indels, structural variants, and copy-number variants within linkage-defined regions leveraging our extensive phenotypic data. Together, these design features enable NJLAGS to dissect how differences in genetic variation contribute to language-related and broader autism traits within families. This is an extension from much of the previous literature that interrogated whether stricter versus broader criteria change apparent effect sizes, rather, we use our design to ask whether language-related and social/behavioral dimensions of ASD respond differently between our two ascertainment frames.
In this study, we identified candidate genes within the two linkage regions of interest, 15q23-26 and 16p12, originally defined in WAVE1 and further examined with WAVE2. The top candidate genes in the linkage regions indicate a shared etiology of ASD and language impairment at these two loci. We also present the results of a unified analysis of genetic heterogeneity across both waves of recruitment (WAVE1 and WAVE2) to better characterize our previous genetic findings and to assess if candidate genes generalize across narrow versus broad phenotypic criteria-based ascertainment schemes. As part of NJLAGS and keeping consistent with ASD criteria, we also collected component phenotypes for the social and restricted interests/stereotyped behavioral domains. These phenotypes, while secondary to the goal of NJLAGS, nevertheless offer insight for understanding heterogeneity in ASD and the relations among related neuropsychiatric disorders.
2. Materials and Methods
2.1. Families
Design of study. The purpose of the NJLAGS study is to find genetic variation relevant to both ASD and spoken language impairments [18]. The goal of recruiting families was to find close relatives where at least one person had ASD and at least one other person had a language impairment. The entire family was then enrolled in the study including extended family members, when available, raising the prospects of finding additional persons with ASD and/or LI. Any genetic factors that “reduce” language ability in the person with LI should also do so in persons with ASD, and possibly to greater effect through gene-by-gene or other interactions as evidenced and discussed elsewhere [24]. All subjects gave informed consent or assent conforming to the guidelines for treatment of human subjects supervised by the Institutional Review Board at Rutgers, The State University of New Jersey (IRB number: 13-112Mc).
While the NJLAGS study manifestly focused on communication in autism, the genotype–phenotype relationship among all family members was considered highly relevant. Therefore, all family members—including higher-functioning persons with ASD—received the age-appropriate version of language, reading, and other cognitive and behavioral measures used to define ascertainment criteria and phenotypes for genetic analysis. Questionnaire data related to other behaviors associated with ASD and communication were collected for all subjects. A complete list of phenotypic measures are presented in Table S1, and the summary statistics by affected status are presented in Table S2.
Ascertainment in two waves of recruitment. In our previous study sample (WAVE1) [18], we defined tiers of analysis that ranked from strict phenotypically narrow (Tier I) to phenotypically broader (Tier II). Tier I families included at least one autism proband who met criteria for DSM-IV AD and at least one proband who met strict criteria for SLI. Tier II families were multiplex for AD (including one proband with LI) but with no family member with SLI. (We note also that in the 2014 publication there was a Tier III that consisted of trios and a non-European family which we do not carry through to the analysis presented here.)
Our goal during WAVE2 was to assess the etiological validity of the relationship between ASD and communication impairments by loosening phenotypic criteria, which would better represent the larger population of ASD families. This was done by (1) using the more inclusive DSM-5 diagnosis of ASD and (2) using the less restrictive language impairment criteria for SLI, which excludes the normal IQ requirement and is more aligned with the current definition of DLD or LI.
Hypothesis. The effect of WAVE2 on social quantitative traits would be anticipated to introduce minimal heterogeneity since the Social Responsiveness Scale-2 (SRS-2), for example, is a valid metric across the full phenotypic range of scores seen in the population, including scores associated with autism.
We anticipated greater heterogeneity in language phenotypes, as our WAVE2 families included multiplex ASD families both with and without concomitant LI, some with the earlier, more stringent AD/SLI diagnoses, and others with the newer ASD/DLD (LI) diagnoses.
Characteristics of the families. The first phase of ascertainment (WAVE1) included 440 individuals from the 79 families. In WAVE1, we used an autism proband criteria whereby two of three diagnostic criteria were met: (1) Autism Diagnostic Interview—Revised [25] (ADI-R) algorithm scores meet cutoff for “autism,” (2) Autism Diagnostic Observation Schedule [26,27] (ADOS) algorithm scores meet cutoff for “autism,” and (3) DSM-IV [28], autistic disorder. We also sought to ascertain language-impaired probands in the same family using the following criteria: (1) A core standard score of ≤85 on the age-appropriate version of the Clinical Evaluation of Language Fundamentals [29,30] (CELF-4 or CELF-Preschool) or at least one standard deviation below peers on 60% of the administered oral language subtests as well as a significant history of language-learning intervention. The last criterion is based on our past research. Historical information on past language interventions has been useful in identifying adults who had language impairments in childhood, but did not meet our defined threshold for language impairment in adulthood, although many continue to present with concomitant language-based reading disorders [31,32]. (2) Performance IQ (PIQ) ≥ 80 on the Wechsler Abbreviated Scale for Intelligence (WASI) [33]. (3) Hearing within normal limits during a traditional audiological screening. (4) No motor impairments or oral structural deviations affecting speech or non-speech movement of the articulators. (5) No history of ASD or frank neurological disorders such as intellectual disability or brain injury, as determined by parental interview. (6) Native English speaker with English as the primary language spoken at home.
Phenotypic Modifications in WAVE2. The WAVE2 samples included 192 individuals from 36 families. We used broader proband criteria based on the DSM-5 ASD, which included but was not limited to AD. We also sought to ascertain at least one language-impaired proband in the same family using the same criteria as in WAVE1 without the PIQ ≥ 80 requirement. Additionally, we accepted multiplex ASD families into the study as long as at least one of the ASD probands had a concomitant oral language impairment (and if there were no LI probands elsewhere in the family, we accepted multiplex ASD families as long as both were language impaired).
Once a family was ascertained and enrolled in the WAVE2 part of the study, we applied phenotypic definitions described below for linkage analysis. The ascertainment criteria for probands outlined above was only for ascertainment purposes. For genetic analysis, a series of traits were derived to illuminate key aspects of how ASD relates to language and broader autism traits within families.
DSM-IV and DSM-5 Criteria for ASD. The previous NJLAGS studies [18,22] included only DSM-IV [28] criteria for both ascertainment and analysis. Since the DSM-5 [1] was published, it has been widely adopted in the current scientific literature; we applied the DSM-5 criteria to all new families in our study (retroactively for families ascertained prior to DSM-5) in order to have a uniform definition of affected with ASD for downstream analysis. This has the effect of changing classifications for some participants and thereby potentially including or excluding families compared with our previous research. Individuals from WAVE1 who were not considered Autism probands because they did not meet the strict cutoffs were now reclassified as ASD and included in the current analyses with their new diagnoses (n = 3). While we expected that such phenotypic change would induce differences from our previous results, results from this paper will more easily align with current and future autism genetics research in the literature.
Language Phenotypes. Two categorical phenotypes covered the range of phenotypic variables of interest from our previous studies [22,23,34]: (1) language impairment and (2) reading impairment.
Language impairment or “LI” referred to the LI requirements described previously. We defined our language impairment trait LI* that included all persons with ASD as affected, along with any person without ASD that met the LI criteria. This is an etiological equivalence phenotype model described elsewhere [18], for the purpose of finding loci relevant to both LI and ASD.
A determination of reading impairment (RI) required scores at least 1 standard deviation (SD) below the population mean on 60% of all reading tests and subtests. As with LI*, we defined a parallel RI* that included all persons with ASD as affected, along with any person without ASD that met the RI criteria. While RI is overtly defined by performance on written language tests, we include it since in our previous studies of multiplex language impairment families (without ASD) we observed many instances of semi-compensated adults with a historical childhood diagnosis of language problems and currently demonstrating weak oral language skills who did not qualify for language impairment but did meet reading impairment criteria. Use of RI has improved gene mapping in language-impaired families that have such semi-compensated adults [22,34]. Language and ASD phenotypes by adult/child and sex are presented in Table 1. Averages by phenotypic group are presented in Table S2.
Table 1.
Number of adult and child family members by affected status.
| Unaffected | ASD | LI Only | RI Only | DLD | |
|---|---|---|---|---|---|
| All Adults | 265 | 18 | 8 | 13 | 13 |
| Male | 121 | 15 | 4 | 6 | 9 |
| Female | 144 | 3 | 4 | 7 | 4 |
| All Children | 101 | 133 | 22 | 4 | 4 |
| Male | 58 | 110 | 13 | 3 | 2 |
| Female | 43 | 23 | 9 | 1 | 2 |
Non-language Phenotypes. The test battery also included representative metrics from the broader autism phenotype. For the social domain, we administered the SRS-2, a commonly used instrument to assess social functioning in the ASD spectrum, that can detect variation in the general population [35,36,37]. The SRS was administered and analyzed on all participants. The SRS Total T-score was of main interest due to its wide deployment in the field of ASD research as a general metric of social functioning. The most current version (SRS-2) contains five socially motivated subscales, one of them being Restricted Interests and Repetitive Behaviors (RIRB). During WAVE1, the SRS raw scores were analyzed as quantitative traits and as categorical traits where we chose the mild-moderate cut-off for affected status [18]. We did not analyze the scores from the RIRB subscale separately since they were not available for adults during our analysis period. Since we now were able to use both quantitative and categorical scores from the SRS-2 for an overall Social functioning score (Total T-score), as well as information from the RIRB subscale, we elected to define our RIRB phenotype using the SRS-2 in place of the Yale-Brown Obsessive Compulsive Scale (Y-BOCS, CY-BOS, and C-YBOCS ASD) [38,39,40], which was used in our earlier work.
2.2. Genotyping
Genome-wide data were obtained through single-nucleotide polymorphism (SNP) array genotyping conducted in rolling waves as families were ascertained. The WAVE1 sample SNP data were generated using the Affymetrix Axiom 1.0 array (Affymetrix, Santa Clara, CA, USA), which included 567,893 SNPs and lower frequency variants. WAVE2 sample SNP data were generated with the Illumina Infinium PsychArray-24 v1 array (Illumina, San Diego, CA, USA), which includes 593,260 SNPs and lower frequency variants. In this study, we focused on SNP data (population minor allele frequency > 1%). Quality control on these SNP genotypes was conducted by array batch and by array type, as described previously [34], including with regard to individual/SNP genotype completion, relationship checking, Mendelian errors, and ancestry. Only samples that clustered with the CEU samples from HapMap reference data, as determined by EIGENSTRAT using the recommended parameters in the documentation [41], were included in the linkage analysis. For linkage analysis, a subset of 10,899 SNPs in common across both array types was chosen that minimized marker-to-marker linkage disequilibrium while retaining a high frequency of minor alleles (>30%) so as to provide suitable genomic coverage of recombination events in the pedigrees. This map was augmented with an additional 5325 SNPs that were not in common across the datasets, mostly near the ends of the chromosomes, to increase information content as assessed by MERLIN 1.1.2. Within each family, the same array was used for genotyping so that adding SNPs to increase information content did not induce a pattern of missing data within families.
2.3. Statistical Analysis
Linkage analyses were conducted with KELVIN 2.3.3 (kelvin.mathmed.org (accessed on 1 January 2026)). KELVIN implements the posterior probability of linkage (PPL) metric to estimate the probability that a genetic location is linked with a tested trait [42]. Genetic map locations came from version 3 of the Rutgers Combined Physical-Linkage Map project [43,44]. All centiMorgan (cM) values were reported as sex-averaged Kosambi map units.
Primary linkage analysis of the two language categorical traits characterizes the genetic relationship between ASD and language (oral and written). Secondary analysis of the broader autism phenotype was also conducted on both quantitative and categorical traits derived from the same scales (the latter based on a threshold). These secondary analyses include information across the range of values seen in the general population as well as in ASD on a single scale.
Stratification to assess heterogeneity. Our chosen Bayesian analysis method allowed us to address the question of postulated genetic heterogeneity by analyzing the data stratified on phenotypic criteria. The two waves and two tiers stratified the families for analysis, while the phenotypic definitions outlined above were used to categorically define individuals for analysis. Each phenotype was analyzed twice. Our chosen statistical metric is derived from a Bayesian analysis in which the posterior probability of linkage is sequentially updated across the phenotypic subgroups of the data to output a single PPL at each locus. We present this single output as the primary linkage statistic of interest to evaluate the strength of the evidence at a given location. In practice, this procedure is performed across the four subgroups formed by the two recruitment waves of the study and the two tiers within each phase (2 Waves × 2 Tiers = 4 phenotypic subsets). The main advantage of the Bayesian sequential updating over a single pooled analysis of all the data is that a valid linkage signal in a subset of the data is less attenuated [45], and hence more likely to be detected. Importantly, retaining valid linkage signals within a subset using Bayesian updating is accomplished without causing an inflation in the test statistic since we are not maximizing over many separate analyses [46]. Once linkage is established, a subsequent post hoc inspection of each constituent dataset provides information on the structure of genetic heterogeneity at a linked locus. A secondary linkage analysis was conducted using all families jointly in a single ‘pooled’ analysis of each trait. By comparing the sequentially updated result to the pooled result, we may qualitatively infer the role of heterogeneity in the dataset. Because stratifying on an irrelevant sample characteristic will, on average, yield only slightly lower PPL results compared to a pooled analysis [47], observing appreciably higher sequentially updated PPL relative to pooled PPL suggests that stratifying variables captures true heterogeneity in the data.
The PPL is an estimated probability, in the Bayesian sense, in contrast to the more commonly used and familiar p-value. In order to assess the effect of performing genome scans using multiple correlated phenotypic traits, we performed simulation studies on 2500 unlinked genomes in the NJLAGS family collection (genotypes generated without regard to phenotypes) and analyzed each to create an empirical null distribution for estimating p-values [18]. A PPL of 0.32 or greater is consistent with a genome-wide error rate of p < 0.001, a PPL of 0.26 corresponds to p < 0.01, and a PPL of 0.11 corresponds to p < 0.05.
2.4. Association Analysis
Under the linkage peaks we performed a posterior probability of linkage disequilibrium (PPLD) analysis [48], the association analog of the PPL for family data implemented in KELVIN 2.3.3 [42]. We analyzed genotypes imputed from the array-based data on the Haplotype Reference Consortium (http://www.haplotype-reference-consortium.org/site (accessed on 1 January 2026)) [49] using an r2 threshold of 0.3. Post-imputation, we kept N = 116,675 SNPs in the linkage regions with minor allele frequency (MAF) > 5%, imputation quality score > 0.4, and Hardy–Weinberg equilibrium p-values > 1 × 10−5. Critical regions for linkage analyses were defined, as previously described [18], from the maximum PPL in both directions along the chromosome from the peak until a PPL of <5% is reached. All SNPs within these regions were analyzed for association with the phenotype appropriate for that critical region.
2.5. Candidate Gene Analysis
2.5.1. Single Nucleotide Variant (SNV) and Insertion/Deletion (Indel)
SNV/indel Call Set. The SNV/indel call sets were obtained from a previous study [50]. Briefly, whole-genome sequencing (WGS) was conducted using an Illumina sequencing platform and the variant calling was performed using the GATK v3.5.0 variant calling pipeline following the best practice recommendation [51].
SNV/indel Analysis and Candidate Gene Identification. pVAAST (pedigree Variant Annotation, Analysis, and Search Tool) [52] was used for candidate gene prioritization within the LI* and RI* linkage regions, respectively. Variants were filtered to include those that have an MAF < 5% in the ExAC dataset. For the control dataset, 635 GTEx WGS samples were obtained from the GTEx project and condensed into a single group [50]. pVAAST [52] was run under both autosomal dominant and autosomal recessive models. Under the autosomal dominant model, all family members (including the proband) descended from one pair of ancestors were analyzed. Under the autosomal recessive model, only the parents and siblings of the proband are incorporated in the analysis. As such, multiple sub-pedigrees were created for some families. In these cases, the sub-pedigree with the most affected members sequenced was selected for the final analysis. In the case of multiple sub-pedigrees having the same number of affected members sequenced, the sub-pedigree with the most family members sequenced was selected. pVAAST calculates a pVAAST score using variant linkage pattern, association evidence, MAF, and functional prediction. The significance of each gene was determined by 106 permutations per gene. The pVAAST command used is as follows:
VAAST -m pvaast -o {output_file_path} -pv_control {ctl_file} -region LI.bed {gff_file} {cdr_file} -gw 1e6 -min_suc 50 -p 22 -max_time 300
To select genes for further study, a p-value threshold was used for each of the two phenotypes. The p-value threshold corrected for multiple comparisons is calculated by dividing 0.05 by the total number of genes in the region.
| LI*: 0.05/308 = 1.62 × 10−4 |
| RI*: 0.05/103 = 4.85 × 10−4 |
2.5.2. Structural Variant (SV) and Copy Number Variant (CNV)
Candidate SVs and CNVs. The candidate SVs and CNVs were obtained from a previous study [53]. For the SV call set, SVs were generated based on the WGS data from 272 individuals across 73 families using MetaSV [54]. The mobile element insertions (MEIs) were identified using MELT [55]. The SV set and MEI set was then merged and annotated using AnnotSV [56] for functional prediction. GEMINI [57] is then used to determine the inheritance patterns [53]. Variants that were identified as benign by AnnotSV were filtered out. The final candidate SV set consisted of 1816 variants and 739 genes.
For the CNV call set, samples were genotyped using either Affymetrix Axiom 1.0 Genome-Wide CEU 1 (Affymetrix, Santa Clara, CA, USA) or Illumina Infinium PsychArray (Illumina, CA, USA) (See Section 2.2 above for details). Two CNV calling algorithms were applied to determine the call set: PennCNV [58] and QuantiSNP [59]. CNVs were then filtered to remove CNVs that were too small (<10 kb) or too large (>7.5 Mb). Additionally, CNVs that were called by less than 5 probes were filtered out. In the final CNV set, there were 2528 variants across 524 individuals.
Using this final set of candidate SVs and CNVs, variants within the linkage regions of interest were selected. The genes affected by the SVs and CNVs were used in the subsequent analyses.
2.5.3. Candidate Gene Filtration and Prioritization
Initial Filtration. The autosomal dominant and autosomal recessive gene sets from the SNV/indel analysis were combined with the prioritized gene sets from the CNV and SV analyses to obtain a final set of candidate genes. Genes were then annotated for brain expression using three databases, GTEx [60], BrainSpan [61], and Human Developmental Brain Resource [62]. Each database contains median transcripts per kilobase million (TPM) values for each gene in brain tissues or developmental stages. For each gene, the max of normalized TPM values in each database were calculated, and genes were removed if the max TPM value was less than 5 in all 3 databases. This cutoff was selected in accordance with methodology described in a previous study [63]. Genes were also annotated for predicted constraint metrics using gnomAD [64]. Metrics considered in the analysis include pLI score (probability of being loss of function), O/E LoF score (observed/expected loss of function), LOEUF (loss-of-function observed/expected upper bound fraction), and missense Z-score.
Segregation Analysis. Pedigree data and genotype information from pVAAST and SV/CNV results was then analyzed to determine the variant segregation pattern. Specifically, variants were filtered based on LOD score per family, with an LOD score threshold of >0.3. Variants segregating in at least one family after LOD score filtration were considered for the final candidate gene set. In addition to the set of families containing each variant type for each gene, a set of unique families for each gene was determined. This set of unique families represents all families that contain a variant within a given gene, regardless of variant type. Lastly, genes were annotated based on previous association with a neurodevelopmental disorder (NDD) [50].
Top Candidate Gene Selection. Selection of top candidate genes from the combined annotated set was then performed using the following criteria: (1) a gene contains variants that segregate in multiple families and (2) a gene has a brain expression max TPM value > 5. Genes were ranked higher if they possessed a max TPM value > 5 in multiple databases. Additionally, gene association with other NDDs was considered.
2.5.4. Gene Network Analysis
Gene network analysis was conducted using the functional module detection tool in HumanBase (https://hb.flatironinstitute.org/ (accessed on 1 January 2026)) using recommended parameters.
2.6. Data Availability
The raw sequencing reads, variants, and genotypes for all samples are available in the National Institute of Mental Health (NIMH) Data Archive (NDA) under collections C1932 and C2933 and NRGR under study 39. Access can be requested through the portal at this link (https://nda.nih.gov (accessed on 1 January 2026)).
3. Results
3.1. Refining the Critical Regions on Chromosomes 15 & 16 for Language-Related Traits
Genome-wide multipoint linkage analysis results are visualized in Figure 1 and specific aspects of peaks are detailed in Table 2. Figure 1 shows the primary analysis results that we use to declare linkage to a locus, which is based on Bayesian sequential updating across phenotypic subsets. The LI* phenotype on chromosome 15 (PPL = 57%) has a similar magnitude as the previous study (i.e., no change from Bartlett et al. 2014 [18]) and remains genome-wide significant with a slightly smaller critical region (0.4 Mb reduction). RI* on chromosome 16 (PPL = 33%), likewise, showed the same trend with a similar magnitude (PPL reduced 3%) as the previous study, though with a smaller critical region (1.6 Mb reduction).
Figure 1.
Genome-wide linkage analysis of language-related traits.
Table 2.
Linkage Analysis Results of Autism Spectrum Disorder and Specific Language Impairment.
| Maximizing Model | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Estimated Genotypic Effect for Locus c | |||||||||||
| Phenotype a | Chr | cM | PPL | Band Range | Width (Mb) | Fully Maximized Lod Score b | -/- | -/+ | +/+ | Disease Gene Frequency | Heterogeneity Parameter in Admixture Likelihood (α) d |
| LI* | 15 | 84 (76–113) | 0.57 | 15q23-26 | 23.8 | 4.01 | 0.0 | 0.7 | 0.9 | 0.001 | 0.9 |
| RI* | 16 | 45 (39–51) | 0.33 | 16p12.1-3 | 7.3 | 3.93 | 0.0 | 0.0 | 0.8 | 0.8 | 1.0 |
| SRS-TS | 2 | 70 (69–82) | 0.41 | 2p16-21 | 14.6 | 4.21 (W1) | 1.0 | 3.0 | 3.0 | 0.01 | 1.0 |
| SRS-TS | 3 | 23 | 0.34 | 3p25.3 | 3.9 | 4.45 (W1) | −3.0 | −3.0 | −1.0 | 0.01 | 0.8 |
| SRS-TS | 15 | 129 (114–133) | 0.93 | 15q23-26 | 23.9 | 4.38 (W1) | −1.0 | 2.0 | 3.0 | 0.01 | 0.7 |
| SRS-TS | 19 | 17 (1–25) | 0.27 | 19p13.2-3 | 8.1 | 3.48 (W2) | −1.0 | −1.0 | 1.0 | 0.01 | 1.0 |
| SRS-TS | 20 | 100 (97–103) | 0.36 | 19q32-33 | 1.3 | 2.46 (W1) | 2.0 | 2.0 | 3.0 | 0.99 | 0.8 |
| SRS-TS-DT | 14 | 115 (106–122) | 0.55 | 14q32.2-3 | 6.7 | 3.27 (W1) | 0.0 | 0.4 | 0.99 | 0.01 | 0.95 |
| SRS-TS-DT | 15 | 129 (124–133) | 0.48 | 15q26 | 2.7 | 2.33 (W1) | 0.0 | 0.4 | 0.99 | 0.1 | 1.0 |
| SRS-RIRB | 3 | 15 (10–28) | 0.71 | 3p25-26 | 7.3 | 4.23 (W12) | −3.0 | −3.0 | −2.0 | 0.01 | 1.0 |
| SRS-RIRB | 19 | 24 (1–25) | 0.42 | 19p13.2-3 | 7.8 | 3.58 (W2) | −3.0 | 3.0 | 3.0 | 0.99 | 1.0 |
a. LI*: oral language impairment and/or ASD; RI*: reading impairment and/or ASD; SRS-TS: SRS T-score; SRS-TS-DT: SRS T-score dichotomous trait; and SRS-RIBR: SRS Restricted Interests and Repetitive Behaviors. b. Sometimes referred to as a Mod score. c. For categorical analysis, these quantities are penetrance, and for quantitative traits, they are genotypic means on a z-score scale. d. Estimate of the proportion of families linked to a given locus.
No new significant peaks were noted for LI* or RI* and no new suggestive peaks were observed. The nominal (p < 0.05) RI* peaks on chromosomes 1 and 5 are notable only since both were also observed in the WAVE1 study, and in both cases those peaks have a larger magnitude when analyzed together with WAVE2. Chromosome 1 went from 9% to 14%, and chromosome 5 went from 16% to 19%.
Analysis of Heterogeneity. Compared with the pooled results in Table 3 (in the column labeled “Pooled Tiers and Waves”), which shows PPL values for LI* and RI* in a pooled analysis of both WAVE1 and WAVE2 in a single analysis, both significant peaks on chromosomes 15 and 16, respectively, were attenuated. LI* remains significant with a PPL of 44%, though RI* is reduced to nominally linked status with a PPL of 23% (i.e., less than suggestive linkage). Under heterogeneity, sequentially updated linkage signals would be expected to be larger than a pooled analysis. Our observed decrease when adding the WAVE2 data are consistent with heterogeneity across data subsets. To assess differences between subsets we compared the PPL values for WAVE1 versus WAVE2. As suspected, linkage results only come from WAVE1 families. WAVE2 contributes nothing to LI* with a PPL of 2%, which is the prior probability of linkage, indicating neither evidence for nor against linkage. WAVE2 provides evidence against linkage of RI* to chromosome 16 with a PPL of 1.9%, which is less than the prior probability of linkage (2%) and therefore indicates evidence against linkage at that location.
Table 3.
Breakdown of Linkage Results by Wave and Tier.
| Phenotype | Chr | Sequentially Updated Tiers and Waves a | Pooled Tiers and Waves | WAVE1 b | WAVE2 b | Pooled Tiers-Sequentially Updated Waves |
|---|---|---|---|---|---|---|
| LI* | 15 | 0.57 | 0.44 | 0.56 | 0.02 | 0.61 |
| RI* | 16 | 0.33 | 0.23 | 0.34 | 0.019 c | 0.32 |
| SRS-TS | 2 | 0.41 | 0.13 | 0.21 | 0.06 | 0.46 |
| SRS-TS | 3 | 0.34 | 0.26 | 0.51 | 0.012 c | 0.50 |
| SRS-TS | 15 | 0.93 | 0.80 | 0.83 | 0.05 | 0.80 |
| SRS-TS | 19 | 0.27 | 0.06 | 0.0098 c | 0.51 | 0.17 |
| SRS-TS | 20 | 0.36 | 0.11 | 0.012 c | 0.52 | 0.11 |
| SRS-TS-DT | 14 | 0.55 | 0.21 | 0.54 | 0.022 c | 0.38 |
| SRS-TS-DT | 15 | 0.48 | 0.10 | 0.33 | 0.04 | 0.10 |
| SRS-RIBR | 3 | 0.71 | 0.25 | 0.78 | 0.015 c | 0.27 |
| SRS-RIBR | 19 | 0.42 | 0.11 | 0.04 | 0.41 | 0.32 |
a. This is the primary analysis model presented in this publication. b. Results from updating over tier within a single wave (to two significant digits). c. PPLs below 2% are evidence against linkage (and by extension, neutral evidence = 2% while positive evidence > 2%).
We also performed an analysis with tiers pooled within waves, with sequential updating across those pooled waves (in Table 3, the column labeled “Pooled Tiers-Sequentially Updated Waves”). Under this secondary analysis, decreases in the PPL from our primary analysis are indicative of heterogeneity between tiers. Table 3 shows only minor changes in the linkage signal of the language traits. Whereby LI* PPL goes from 57% to 61%, as noted above, the linkage signal comes entirely from WAVE1 and the original WAVE1 study indicated essentially homogeneity across the two tiers in terms of LI*. This pattern is mirrored in RI*, where the PPL was reduced from 33% to 32%, essentially the same value. Again, the linkage evidence comes entirely from WAVE1 and the previous study of WAVE1 indicated homogeneity across the two tiers in terms of RI* [18].
3.2. Analysis of Non-Language Traits
Social domain. The SRS total score provides a quantitative overview of social reciprocity across the full continuum of behaviors that we used as traits. Genome-wide analysis results are summarized in Figure 2. Analogous to the results from the language-related traits, our primary analysis consisted of the sequentially updated PPL, updated over the data subsets for a single linkage metric. The largest peak was observed on 15q (PPL = 93%). This peak does not overlap with the LI* locus and is unlikely to be related (Table 2). Chromosomes 2 (PPL = 41%), 3 (PPL = 34%), and 20 (PPL = 36%) also had significant peaks. All four of these peaks were present in the original WAVE1 analysis, but only the chromosome 15 peak was significant. We also note that the SRS-2 T scores used in the present analysis changed WAVE1 results relative to the raw scores used in the original publication. A suggestive peak on chromosome 19 (PPL = 27%) did not have even a nominal peak in the WAVE1 data.
Figure 2.
Genome-wide linkage analysis of Social Responsiveness Scale-related traits.
Dichotomizing a quantitative trait transforms approximately continuous scores into a different phenotypic model that may provide different linkage information. In keeping with this assumption, analyzing the SRS as a dichotomous trait (SRS-DT) provided a new peak on chromosome 14 (PPL = 55%), as seen in Figure 2. A chromosome 15 peak was present at the same location as the quantitative trait but much diminished (PPL = 48%).
Heterogeneity in the Social Domain. When compared with the pooled PPL runs (in Table 3, the column labeled “Pooled Tiers and Waves”), all SRS total score quantitative peaks were greatly diminished, though 15q stayed high at 80% (and significant though somewhat lessened) and chromosome 3 dropped from 34% to 26% (suggestive). These results suggest heterogeneity for all peaks, perhaps markedly so for the non-15q peaks. Inspection of the PPL results by subset bear this pattern out on chromosome 3 where WAVE2 provides evidence against linkage, and on chromosomes 19 and 20, where WAVE1 provides evidence against linkage. For the peaks on chromosomes 2 and 15 both WAVE1 and WAVE2 provide evidence for linkage (Table 3).
For the dichotomous trait, the pooled results exhibited greatly diminished PPL results at the two peaks, and no new peaks were found. This pattern is consistent with heterogeneity. In this case, both WAVE1 and WAVE2 data are consistent with linkage at both loci, however, these linkage signals are clearly driven only by WAVE1 data (Table 3).
Restricted interests/repetitive behaviors. The SRS also includes a quantitative subscale for restricted interests and repetitive behaviors (SRS-RIRB), the second diagnostic pillar that defines autism. The sequentially updated PPL summarized in Figure 2 shows two significant peaks, on chromosomes 3 (PPL = 71%) and 19 (PPL = 42%). Both peaks overlap with the SRS total score analyses, but are higher, suggesting that restricted interests and repetitive behaviors may be driving the SRS total score linkage results at those loci. When looking at the pooled results (Table 3, the column labeled “Pooled Tiers and Waves”), those same peaks are greatly attenuated from apparent heterogeneity. Indeed, when looking at the PPL’s for WAVE1 and 2 separately, WAVE2 shows evidence against linkage to chromosome 3, and is largely equivocal at chromosome 19.
3.3. Candidate Gene Analysis
Initially, association analyses were conducted in each critical linkage region. None of the PPLDs warranted additional investigation, as the highest observed PPLD was 5%, well below thresholds for expending additional resources. Next, we identified candidate genes associated with LI* and RI* within the significant linkage regions. SNV, indel, and SV call sets based on the WGS data, as well as CNV call sets based on the genotyping array data are available for the NJLAGS cohort [50,53]. Using the genetic variants, we performed candidate gene prioritization using several criteria, such as variant impact, variant segregation pattern, gene function, and gene expression pattern in brain regions (see Methods for detail). The variant and gene counts from each step are summarized in Figure 3.
Figure 3.
Workflow for candidate gene analysis in LI* and RI* linkage regions.
SNV/indel analysis. The SNV/indel analysis was conducted for two modes of inheritance: autosomal dominant and autosomal recessive. As described in the Section 2, the autosomal dominant model analyzed all family members (including the proband) descended from one pair of ancestors. The autosomal recessive model analyzed only the parents and siblings of the proband. For LI*, 55 genes were scored under the dominant model, and 8 genes were scored under the recessive model (Table S3a,b). For RI*, 24 genes were scored under the dominant model, and 5 genes were scored under the recessive model (Table S3c,d). To determine the final candidate gene set for each phenotype, a union was performed between the dominant and recessive gene sets, and a p-value cutoff was applied (see Methods for more detail). After application of the p-value filtration, there were 29 genes in the LI* set, and 13 genes in the RI* set.
CNV analysis. The initial sets of CNVs associated with language impairment and reading impairment were obtained from a previous study [53]. Candidate CNVs were then filtered based on their location in the genome. After this filtration, there was one candidate CNV (15_90794757_90950358_<CN3>_1) located within the LI* linkage region and affecting six genes. For RI*, two overlapping CNVs (16_21538186_21731120_<CN3>_1 and 16_21608472_21747738_<CN3>_1) were identified within the RI* linkage region, and both affected the same four genes. Given that the two CNVs were identified in the same individual by different methods, it is likely that the two CNVs are one variant for which different tools reported different break points. All three CNVs were de novo duplications (Table S4).
SV analysis. Similar to the CNV analysis, the initial sets of SVs associated with LI* and RI* were obtained from a previous study [53]. Candidate SVs within the LI* and RI* linkage regions were first identified, and benign variants based on AnnotSV annotations were filtered out. After this filtration, there were 21 candidate SVs identified for LI*, affecting 18 genes. There were 12 candidate SVs identified for RI*, affecting 11 genes (Table S5). The majority of the SVs are intronic, except for one deletion (15_79220669_79233894_DEL_1) and two insertions (15_77092501_77092758_INS_1, 16_20843131_20844439_INS_1) that affect the coding regions of CTSH, SCAPER, and REXO5, respectively (Table S5).
Top Candidate Gene Selection. Analysis of SNVs/indels, CNVs, and SVs within the linkage regions yielded a total of 75 genes associated with LI*, and 36 genes associated with RI*. Genes were then selected as top candidates based on the following criteria: (1) a brain expression median TPM value > 5 in at least 1 database and (2) the gene contains variants that passed segregation filtering in multiple families. After this filtration, there were 34 candidate genes associated with LI* (Table S6a), and 13 candidate genes associated with RI* (Table S6b). To select the top candidate genes, genes were further analyzed based on segregation patterns, previous literature, and metrics such as pLI and LOUEF. Genes with an established association with NDDs were prioritized. Ten genes were identified as the top candidates for LI* (Table 4, Figure 4A), and six genes were identified as the top candidates for RI* (Table 4, Figure 4B). Specific genes of interest associated with each phenotype are further discussed below.
Table 4.
Top candidate genes for LI* and RI*.
| Gene | Gene Name | Variant Type | Variant | Function | Segregating Families | Known NDD |
|---|---|---|---|---|---|---|
| LI* | ||||||
| SCAPER | S-phase cyclin A-associated protein in the ER | SNV | 15-76726530-C-T | non-synonymous | 1 | 0 |
| SV | 15_77092501_77092758_INS_1 | CDS | 2 | |||
| CPEB1 | cytoplasmic polyadenylation element binding protein 1 | SNV | 15-83240107-G-T | non-synonymous | 1 | 0 |
| SNV | 15-83296111-G-T | non-synonymous | 1 | |||
| ADAMTSL3 | ADAMTS-like 3 | SNV | 15-84506964-A-G | non-synonymous | 2 | 0 |
| SNV | 15-84651321-C-T | non-synonymous | 1 | |||
| SV | 15_84643753_84644082_DEL_1 | intronic | 1 | |||
| AKAP13 | A kinase (PRKA) anchor protein 13 | SNV | 15-86198852-C-T | non-synonymous | 1 | 0 |
| SV | 15_86224963_86225242_INS_1 | intronic | 2 | |||
| ZNF774 | zinc finger protein 774 | SNV | 15-90904468-C-T | non-synonymous | 1 | 1 |
| CNV | 15_90794757_90950358_<CN3>_1 | CN3 | 1 | |||
| WHAMM | WAS protein homolog associated with actin golgi membranes and microtubules | SNV | 15-83478990-C-T | non-synonymous | 1 | 0 |
| SNV | 15-83499472-C-T | non-synonymous | 1 | |||
| KIAA1024 | KIAA1024 | SNV | 15-79750056-C-T | non-synonymous | 1 | 0 |
| SNV | 15-79750609-A-C | non-synonymous | 1 | |||
| ACAN | aggrecan | SNV | 15-89389050-C-T | non-synonymous | 1 | 0 |
| SNV | 15-89400204-C-T | non-synonymous | 1 | |||
| SNV | 15-89401134-C-T | non-synonymous | 1 | |||
| SNV | 15-89401362-G-T | non-synonymous | 1 | |||
| CSPG4 | chondroitin sulfate proteoglycan 4 | SNV | 15-75975211-C-T | non-synonymous | 1 | 0 |
| SNV | 15-75980780-A-G | non-synonymous | 1 | |||
| SNV | 15-75982312-C-T | non-synonymous | 2 | |||
| THSD4 | thrombospondin type 1 domain containing 4 | SV | 15_71885481_71885814_DEL_1 | intronic | 7 | 0 |
| SV | 15_71991555_71991778_DEL_1 | intronic | 5 | |||
| RI* | ||||||
| XYLT1 | xylosyltransferase I | SNV | 16-17211545-C-T | non-synonymous | 1 | 0 |
| SNV | 16-17232234-A-G | non-synonymous | 2 | |||
| SNV | 16-17353170-C-G | non-synonymous | 1 | |||
| SV | 16_17414680_17414960_INS_1 | intronic | 1 | |||
| SMG1 | smg-1 homolog phosphatidylinositol 3-kinase-related kinase (C. elegans) | SNV | 16-18907410-A-G | non-synonymous | 1 | 0 |
| SNV | 16-18908268-C-T | non-synonymous | 1 | |||
| SV | 16_18832544_18838377_DEL_1 | intronic | 7 | |||
| ARHGAP17 | Rho GTPase activating protein 17 | SNV | 16-24950845-C-T | non-synonymous | 1 | 0 |
| SNV | 16-24950880-C-T | non-synonymous | 1 | |||
| USP31 | ubiquitin specific peptidase 31 | SNV | 16-23080170-A-G | non-synonymous | 1 | 0 |
| SNV | 16-23160057-C-T | non-synonymous | 1 | |||
| DNAH3 | dynein axonemal heavy chain 3 | SNV | 16-20994160-A-G | non-synonymous | 1 | 1 |
| SNV | 16-21008629-C-T | non-synonymous | 1 | |||
| SNV | 16-21031053-C-T | non-synonymous | 1 | |||
| TNRC6A | trinucleotide repeat containing 6A | SNV | 16-24800831-A-G | non-synonymous | 1 | 0 |
| SNV | 16-24828224-A-G | non-synonymous | 1 |
Figure 4.
Linkage regions and the top candidate genes. (A) Language impairment linkage region (15q23-26). (B) Reading impairment linkage region (16p12). Linkage regions are highlighted with red boxes on the karyotype, and the top candidate genes are shown below.
LI* Genes of Interest. In the final LI* candidate gene set, the highest-ranking gene is Zinc finger protein 774 (ZNF774). On the variant level, one non-synonymous SNV and one CNV are found within ZNF774. The non-synonymous variant (15-90904468-C-T) segregates in one family with two affected members and has a gene LOD score of 0.46. The CNV, 15_90794757_90950358_<CN3>_1, segregates within another family. The CNV is a 156 kb duplication that affects 6 genes, including ZNF774. ZNF774 has a brain expression max TPM value of >5 in two databases, GTEx and HDBR. Specifically, ZNF774 has strong expression in the basal ganglion, diencephalon, and medulla oblongata during multiple developmental stages. In particular, the basal ganglion has been previously implicated in autism, and plays an essential role in motor skill acquisition and development [65]. Alterations to the basal ganglion disrupt the normal flow of feedback to the cortex, thus impacting basic functions such as higher order cognition, gross and fine motor skills, and speech [66]. The medulla oblongata has also been shown to be associated with autism—one study found that the medulla oblongata is significantly smaller in autistic children than in control children [67]. ZNF774 is categorized as a C2H2 zinc finger protein (C2H2-ZNF), which acts as an important target for pathological processes associated with NDDs, and is highly expressed in the developing brain. C2H2-ZNFs play a significant role in the regulation of brain morphogenesis and influence the proliferation and migration of neural stem cells. In particular, ZNF774 has been found to be one of the C2H2-ZNFs implicated in the pathogenesis and pathophysiology of ASDs and autistic features [68].
Another gene of interest within the LI* linkage region is the S-Phase cyclin A-associated protein in the ER (SCAPER). SCAPER has a non-synonymous variant (15-76726530-C-T) segregating in one family with two affected members and has a LOD score of 0.46. SCAPER also has one SV, 15_77092501_77092758_INS_1, which segregates in two families. This SV overlaps the coding region between intron 7 and intron 8 and it is predicted to be a frameshifting mutation. SCAPER has a brain expression max TPM value of > 5 in all three databases, strongly indicating expression in the brain. SCAPER is shown to have high expression in the pre-frontal cortex across various developmental stages. The pre-frontal cortex has been frequently implicated in autism, and is primarily responsible for deficits in higher-order functions such as cognition, language, and emotion [69]. Additionally, SCAPER has been shown to be associated with intellectual developmental disorder and speech disorder [70].
RI* Genes of Interest. In the final RI* candidate gene set, the highest-ranking gene is Xylosyltransferase 1 (XYLT1). XYLT1 has 3 non-synonymous SNVs segregating within 4 families with an overall pVAAST LOD score of 1.59, which implies a strong segregation of variants within families. XYLT1 also has one SV, 16_17414680_17414960_INS_1, which is an intronic variant that segregates in one family. XYLT1 has a brain expression max TPM value of >5 in all three databases. Specifically, XYLT1 is highly expressed in the amygdaloid complex, which has been previously implicated in autism and other NDDs. Studies have found that the amygdala is an important component of the neural network responsible for social cognition and brain function. Impairment in the amygdala has been shown to cause abnormal social behavior, as well as various NDDs [71]. Additionally, autistic children have been shown to have significantly slower right amygdala growth [72]. The high expression of XYLT1 in the amygdaloid complex suggests that variants within the gene may contribute to these NDDs. Variants in XYLT1 have also been shown to contribute to disorders associated with developmental delay, such as Baratela–Scott syndrome (BSS) [73].
Another gene of interest in the RI* linkage region is the nonsense-mediated mRNA decay associated PI3K-related kinase (SMG1). SMG1 has an overall pVAAST LOD score of 0.61. One missense variant (16-18907410-A-G) segregates within one family with one affected member. The other missense variant (16-18908268-C-T) segregates in another family with 2 affected members. SMG1 has one candidate SV, 16_18832544_18838377_DEL_1, an intronic variant that segregates within 7 families, implying strong segregation in affected members. SMG1 has a brain expression median TPM value > 5 in all three databases. In particular, SMG1 has relatively high expression in the amygdaloid complex and pre-frontal cortex. As aforementioned, alterations in these brain regions have a strong association with autism and other NDDs, therefore implying a potential role of SMG1 in the development of these phenotypes. Additionally, SMG1 has been previously identified as an ASD candidate gene, with an essential role in the Nonsense-Mediated mRNA Decay (NMD) pathway [74].
Network analysis of candidate genes. To further explore the functional implications of the identified language and reading impairment candidate genes, we conducted network analyses on a combined gene set of 46 LI and RI candidate genes to identify potential functional groups. Figure 5 represents enriched modules within the network. The M1 module is significantly enriched for the glycosaminoglycan (GAG, q = 0.0002) and carbohydrate metabolism (q = 0.001) pathways, which are crucial to extracellular matrix (ECM) remodeling in neural development. GAGs, such as heparan sulfate and chondroitin sulfate, are polysaccharides that are attached to core proteins that regular signaling gradients and cell adhesion in the developing brain [75]. Abnormal GAG metabolism has been linked to altered neuronal migration and neurodegenerative diseases such as Alzheimer’s disease [76]. Two of our top candidate genes are within the M1 module: ACAN and DNAJA4.
Figure 5.
Network analysis of 46 candidate genes between language impairment and reading impairment, across all functional groups. M1, M5, and M7 modules contain enriched pathways. Top candidate genes within enriched modules are highlighted with red outlines.
The M7 module contains genes within the 15q25 copy number variation region. Deletions occurring in this region have been associated with an increased risk of developing cognitive deficits [77]. Additionally, 15q25 microdeletions have been linked to several neurodevelopmental clinical manifestations, such as neurodevelopmental delay and autism [78,79,80,81]. Three of our top candidate genes are within the M7 module: DNAH3, ADAMTSL3, and CPEB1.
The M5 module represents genes involved in lysosomal function. Lysosomes are essential in degrading proteins, lipids, carbohydrates, and complex macromolecules. Due to its role in the maintenance of cell homeostasis and viability, dysregulation in lysosomal function can impact several fundamental processes such as membrane repair, energy metabolism, and inflammatory pathways [82,83]. Such dysregulation has been linked to both neurodevelopmental and neurodegenerative disorders such as Alzheimer’s disease and Parkinson’s disease [84,85,86,87,88].
4. Discussion
This research aimed to delineate candidate genes in the previously identified linkage peaks for language and reading on chromosomes 15 and 16. It refined the implicated regions by adding more families to the analysis, reducing the critical regions for analyzing variants in those regions. We further sought to assess the generalization of these findings across two family ascertainment criteria. In total, our linkage results illustrate the value of treating ASD-related phenotypes as quantitative or dimensional traits rather than relying solely on diagnostic categories. Our findings are consistent with data-driven work showing that social communication and restricted/repetitive behavior constructs captured by instruments such as the SRS-2 are multidimensional with partially distinct genetic correlates [16,22,89].
To identify candidate risk genes within the linkage peaks, we examined different types of genomic variants, including SNVs, indels, SVs, and CNVs. Through analysis of the variant sets, we determined a set of 34 genes associated with LI*, and 13 genes associated with RI*. Within each phenotype, there were genes with variants that segregate within multiple families, as well as previously established associations with other NDDs. Additionally, several genes are highly expressed in brain tissues associated with neurodevelopmental processes. We applied a set of criteria to further prioritize genes based on factors such as brain expression and segregation patterns, as well as an extensive literature search. Cross-validating our final set of top candidate genes with brain tissue expression data further strengthens the implication that these genes are involved in NDDs. Our final results include 10 top candidate genes for LI*, and 6 top candidate genes for RI* within the linkage regions (Table 4). In addition to genes previously implicated in NDDs (e.g., ZNF774 and DNAH3) [66,67,68,90], we also identified genes that were not previously associated with NDDs but showed strong evidence of being involved in ASD and language impairment in our cohort. These discoveries provide new insights into the genetic etiology of language-related traits.
The second wave of NJLAGS recruitment (WAVE2) was designed to investigate how genetic heterogeneity in ASD and LI is reflected in phenotypic heterogeneity, driven here by two sets of ascertainment criteria: the stricter WAVE1 criteria and the more relaxed WAVE2 criteria. Interpretation of the results is based on the dichotomy that either the magnitude of the genetic findings is increased or decreased by the inclusion of WAVE2. In the results, both trends occurred, indicating that this design can be informative regarding genetic heterogeneity. We note that our design concerns phenotypic components of ASD rather than ASD itself. As our ascertainment scheme did not require multiplex ASD families, the present findings may not capture ASD susceptibility as a unitary construct, in contrast to affected sib-pair studies for linkage or trio- and case–control designs for association analyses. Instead, based on our requirement of both ASD and LI probands, our findings relate to (1) the relationship between ASD and LI and (2) dimensional aspects of ASD manifested by quantitative instruments such as the SRS. To this end, the NJLAGS study offers unique information regarding ASD.
Based on the main research question, we found that the WAVE2 sample provides no evidence for linkage to LI* on chromosome 15 or RI* on chromosome 16. Both of these loci relate to ASD in addition to oral and written language, respectively. However, the relationship only holds for strict ASD and strict LI ascertainment (i.e., WAVE1). WAVE2 included cases of ASD and LI that are of clinical interest but are further from the classic definitions of their respective diagnostic categories. Specifically, WAVE1 ASD probands had a primary diagnosis of AD, with the majority being non-verbal or minimally verbal, highlighting both the potential overlap between language impairment and ASD and the greater severity of social and repetitive behaviors in WAVE1. The WAVE2 LI diagnosis (removal of the IQ criteria) might not be expected to affect the linkage results appreciably. Indeed, while clinicians may see more patients typical in WAVE2 ascertainment, our results indicate that strict clinical definitions may indicate a more genetically homogeneous group to consider—at least in regard to how language relates to ASD in those patients.
For the SRS, we expected homogeneity since quantitative traits are not typically modeled with heterogeneity parameters. However, in our samples, the SRS shows heterogeneity across WAVE1 and WAVE2. WAVE2 strengthens the two previous WAVE1 SRS peaks on 14q (for the categorical trait) and 15q (for the quantitative trait) [18]. The peak on 14q went from a PPL of 37% to a PPL of 55% and decreased the critical region by 1 Mb. The peak at 15q increased from a PPL of 52% to a PPL of 93% in the present study. In joint analysis of WAVE1 and WAVE2, 15q also showed a PPL of 48% for the categorical SRS trait, which had contributions from both waves. Taken together, these results show a consistency in the genetics of strictly and broadly defined ASD. However, when looking at chromosomes 3 (WAVE 1 only), as well as 19 and 20 (both WAVE2 only), the linkage findings were clearly discordant across WAVE1 and WAVE2. This level of heterogeneity was not expected for a quantitative trait. It may be caused by differences in ascertainment that effectively create different underlying populations for quantitative trait analysis.
The difference in how ASD relates to LI versus how ASD relates to general social constructs may have implications for meta-analysis methods. Based on our data, we might posit that meta-analysis of the SRS could be highly productive at some but not all loci. However, combining ASD samples for a joint analysis of language traits would require a model that incorporates differences in ASD ascertainment and LI ascertainment. As our study has strict ASD confounded with strict LI, we cannot know if the strictness of ASD and/or the strictness of language impairment criteria are responsible for this result. However, one or both would need to be included in meta-analysis models to retain true results, rather than wash out true effects due to the genetic heterogeneity across the studies included.
We chose to examine and report the SRS Restricted Interests and Repetitive Behavior Treatment subscale in this follow-up study—a departure from our use of the Yale-Brown Obsessive Compulsive Scale (YBOCS) in the original study—for two reasons. (1) As mentioned previously, when the study first began in 2003, and in the initial analysis period in 2010, adult T-scores were not available for the SRS even though specific behaviors were available as a sub-category (“autistic traits”). At that time, the YBOCS offered a scale of behaviors that best represented a range of OCD-like behaviors comparable to restricted interests and repetitive behaviors (RIRB) characterizing ASD. (2) Once the SRS-2 was published, it offered a standardized (RRB) sub-score and contained questions specifically addressing behaviors related to ASD. While the YBOCS focuses on OCD behaviors specific to that disorder, administration requires a degree of interpretation both by the clinician and the respondent. The decision to use the SRS-RIRB was based on issues with YBOCS coding and making it a quantitative trait for genetic analysis. It was not designed for that purpose, while the SRS was. However, one aspect in favor of the YBOCS is that it fundamentally measures repetitive behaviors and obsessive compulsive disorder, while the SRS is more social in its design space. Yet, the two may be much more closely aligned, which is sometimes not appreciated. A recent paper by Gulisano and colleagues [91], examined the overlap of OCD-like behaviors in children initially diagnosed with Tourette Syndrome using the YBOCS while also administering the SRS-2. They suggested that some of the OCD-like behaviors associated with Tourette Syndrome might be confused with stereotypies associated with ASD.
Every linkage peak in our study contained at least one SFARI [92,93] autism candidate gene (Table 5), even though the linkage peaks only implicate 3.8% of the genome. If our observed linkage signals were random noise, we would expect very low concordance between the two approaches. The presence of numerous SFARI genes in our findings suggests a convergence of different investigative approaches. Although our ascertainment scheme differs from typical ASD studies, the genetic findings show a striking degree of overlap. This finding both validates our approach and adds further evidence for the genes listed in Table 4 as having a role in ASD.
Table 5.
Overlap of current findings with SFARI ASD gene list.
| Trait | Chromosome | SFARI Genes |
|---|---|---|
| SRS-Total Score | 2 | FBXO11, NRXN1 |
| SRS-Repetitive | 3 | CX3CR1, CTNNB1, SETD2, DHX30, PLXNB1, QRICH1, GPX1, AMT, ACY1, CACNA1D, CACNA2D3, ASB14, FHIT, FEZF2, CADPS, PRICKLE2, FOXP1, CNTN3, ROBO2, CADM2, TBC1D23 |
| SRS-Total Score | 3 | FOXP1, CNTN3, ROBO2, CADM2 |
| SRS-Total Score Categorical | 14 | CCNK, YY1, DYNC1H1, CDC42BPB |
| LI* | 15 | CELF6, BBS4, NEO1, CD276, SIN3A, CIB2, ARNT2, NTRK3, ZNF774, ST8SIA2, CHD2 |
| SRS-Total Score Categorical | 15 | ALDH1A3 |
| SRS-Total Score | 15 | ALDH1A3 |
| RI* | 16 | SYT17, DNAH3, PRKCB |
| SRS-Total Score | 19 | KDM4B |
| SRS-Total Score | 20 | GNAS |
Our candidate gene study has some limitations. One is that the candidate risk genes’ functional impact is determined primarily through annotations. While gene annotations provide a detailed description of the functional impact, the true impact on the phenotype of interest usually needs additional confirmation. Therefore, one potential expansion of this study is to focus on functionally validating the candidate genes through experiments, such as testing the genes/variants in human iPSCs, mice, or biobanked brain tissue [94]. Another limitation is the relatively small sample size of the NJLAGS cohort due to the detailed phenotypic battery, which necessitates follow-up studies in larger studies based on the implicated phenotypes in this study. Large SVs and CNVs are primarily impacted by this small sample size, as some variants may falsely be given higher prioritization. Because of the highly heterogeneous nature of ASD, a future study with a larger sample size can further increase the statistical power for candidate gene prioritization. Furthermore, the phenotypic data include a large amount of psychometric (quantitative) data that could be analyzed as quantitative traits using a similar Bayesian paradigm [95], though the small sample would limit the analysis. Additionally, a whole genome analysis beyond the linkage regions can be conducted to examine variants and genes in other parts of the genome, and gather a more comprehensive understanding of the genetic architecture of ASD and language impairments.
In conclusion, our analysis of the NJLAGS cohort provides further elucidation of the genetic architecture and interaction of ASD and language-related phenotypes. In addition, we reported a number of high-confidence candidate genes within the linkage regions on chromosomes 15 and 16. In the future, these genes will require functional validation to define their role in neurodevelopment.
Acknowledgments
We sincerely thank the families who generously gave their time and effort to participate in this study. We acknowledge the computing support from High Performance Computing Center in the Abigail Research Institute at Nationwide Children’s Hospital.
Abbreviations
The following abbreviations are used in this manuscript:
| ASD | Autism Spectrum Disorder |
| DSM-5 | Diagnostic and Statistical Manual of Mental Disorders |
| NDD | Neurodevelopmental Disorder |
| NJLAGS | New Jersey Language Autism Genetics Study |
| LI | Language Impairment |
| RI | Reading Impairment |
| AD | DSM-IV Autistic Disorder |
| ADI-R | Autism Diagnostics Interview-Revised |
| ADOS | Autism Diagnostic Observation Scales |
| SLI | Specific Language Impairment |
| DLD | Developmental Language Disorder |
| SRS-2 | Social Responsiveness Scale-2 |
| CELF-4 | Clinical Evaluation of Language Fundamentals |
| PIQ | Performance IQ |
| WASI | Wechsler Abbreviated Scale for Intelligence |
| RIRB | Restricted Interest and Repetitive Behaviors |
| PPLD | Posterior Probability of Linkage Disequilibrium |
| MAF | Minor Allele Frequency |
| SNV | Single Nucleotide Variant |
| Indel | Insertion/Deletion |
| WGS | Whole-genome Sequencing |
| pVAAST | pedigree Variant Annotation, Analysis, and Search Tool |
| SV | Structural Variant |
| CNV | Copy Number Variant |
| MEI | Mobile Element Insertion |
| NIMH | National Institute of Mental Health |
| NDA | National Institute of Mental Health Data Archive |
| SRS-DT | SRS as a dichotomous trait |
| SRS-RIRB | SRS for restricted interests and repetitive behaviors |
| YBOCS | Yale-Brown Obsessive Compulsive Scale |
Supplementary Materials
The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/genes17020125/s1, Table S1: NJLAGS Testing Battery Including Subtest Descriptions and Age Ranges of Tests; Table S2: Mean Score and Sample Size for Each Standardized Assessment Administered; Table S3: Candidate Genes from pVAAST Analysis; Table S4: CNV Candidate Variants; Table S5: SV Candidate Variants; and Table S6: Language Impairment and Reading Impairment Candidate Genes.
Author Contributions
Conceptualization, L.M.B., J.X. and C.W.B.; formal analysis, M.K.L., A.R., J.X. and C.W.B.; data curation, M.K.L., J.F.F., C.G., S.W., A.R., S.B., L.M.B., J.X. and C.W.B.; visualization, M.K.L., J.X. and C.W.B.; writing—original draft preparation, M.K.L., J.X. and C.W.B.; writing—review and editing, J.F.F., L.M.B., J.X. and C.W.B.; supervision, L.M.B., J.X. and C.W.B.; project administration, C.G., S.W., J.F.F., L.M.B., C.W.B. and J.X.; funding acquisition, L.M.B. and J.X. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement
The study is approved by the Institutional Review Board at Rutgers, The State University of New Jersey (IRB number: 13-112Mc, approval date: 30 January 2021).
Informed Consent Statement
Written informed consent or assent were obtained from all subjects involved in the study.
Data Availability Statement
The raw sequencing reads, variants, and genotypes for all samples are available in the National Institute of Mental Health (NIMH) Data Archive (NDA) under collections C1932 and C2933 and NRGR under study 39. Access can be requested through the portal at this link (https://nda.nih.gov (accessed on 1 January 2026)). The code and workflow for the project are publicly available on GitHub: https://github.com/JXing-Lab/linkage-pipeline (accessed on 1 January 2026).
Conflicts of Interest
The authors declare no conflicts of interest.
Funding Statement
This study was supported by NIMH grants R01 MH-070366 (LMB) and RC1 MH-088288 (LMB); NIMH Center for Collaborative Genomic Studies on Mental Disorders (U24 MH-068457, LMB); and New Jersey Governor’s Council for Medical Research and Treatment of Autism grants CAUT12APS006 (LMB), CAU15APL026 (LMB), and CAUT19APL028 (JX and LMB).
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
References
- 1.American Psychiatric Association . Diagnostic and Statistical Manual of Mental Disorders. 5th ed. American Psychiatric Association; Washington, DC, USA: 2013. DSM-5. [Google Scholar]
- 2.Woodbury-Smith M., Scherer S.W. Progress in the genetics of autism spectrum disorder. Dev. Med. Child. Neurol. 2018;60:445–451. doi: 10.1111/dmcn.13717. [DOI] [PubMed] [Google Scholar]
- 3.Arnett A.B., Trinh S., Bernier R.A. The state of research on the genetics of autism spectrum disorder: Methodological, clinical and conceptual progress. Curr. Opin. Psychol. 2019;27:1–5. doi: 10.1016/j.copsyc.2018.07.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Grove J., Ripke S., Als T.D., Mattheisen M., Walters R.K., Won H., Pallesen J., Agerbo E., Andreassen O.A., Anney R. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 2019;51:431–444. doi: 10.1038/s41588-019-0344-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Satterstrom F.K., Kosmicki J.A., Wang J., Breen M.S., De Rubeis S., An J.-Y., Peng M., Collins R., Grove J., Klei L. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell. 2020;180:568–584.e23. doi: 10.1016/j.cell.2019.12.036. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Mattheisen M., Grove J., Als T.D., Martin J., Voloudakis G., Meier S., Demontis D., Bendl J., Walters R., Carey C.E. Identification of shared and differentiating genetic architecture for autism spectrum disorder, attention-deficit hyperactivity disorder and case subgroups. Nat. Genet. 2022;54:1470–1478. doi: 10.1038/s41588-022-01171-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Shimizu H., Morimoto Y., Yamamoto N., Kumazaki H., Ozawa H., Imamura A. Schizophrenia-Recent Advances and Patient-Centered Treatment Perspectives. IntechOpen; London, UK: 2022. Clinical and Biological Overlap between Schizophrenia, Autism Spectrum Disorder, and Trauma and Stress-Related Disorders: The Three-Tree Model of SCZ-ASD-TSRD. [Google Scholar]
- 8.Bradford Y., Haines J., Hutcheson H., Gardiner M., Braun T., Sheffield V., Cassavant T., Huang W., Wang K., Vieland V., et al. Incorporating language phenotypes strengthens evidence of linkage to autism. Am. J. Med. Genet. 2001;105:539–547. doi: 10.1002/ajmg.1497. [DOI] [PubMed] [Google Scholar]
- 9.Eicher J.D., Gruen J.R. Language impairment and dyslexia genes influence language skills in children with autism spectrum disorders. Autism Res. 2015;8:229–234. doi: 10.1002/aur.1436. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Spence S.J., Cantor R.M., Chung L., Kim S., Geschwind D.H., Alarcon M. Stratification based on language-related endophenotypes in autism: Attempt to replicate reported linkage. Am. J. Med. Genetics. Part B Neuropsychiatr. Genet. Publ. Int. Soc. Psychiatr. Genet. 2006;141B:591–598. doi: 10.1002/ajmg.b.30329. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Gerdts J.A., Bernier R., Dawson G., Estes A. The broader autism phenotype in simplex and multiplex families. J. Autism Dev. Disord. 2013;43:1597–1605. doi: 10.1007/s10803-012-1706-6. [DOI] [PubMed] [Google Scholar]
- 12.Woodbury-Smith M., Paterson A.D., O’Connor I., Zarrei M., Yuen R.K.C., Howe J.L., Thompson A., Parlier M., Fernandez B., Piven J., et al. A genome-wide linkage study of autism spectrum disorder and the broad autism phenotype in extended pedigrees. J. Neurodev. Disord. 2018;10:20. doi: 10.1186/s11689-018-9238-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Piven J., Vieland V.J., Parlier M., Thompson A., O’Conner I., Woodbury-Smith M., Huang Y., Walters K.A., Fernandez B., Szatmari P. A molecular genetic study of autism and related phenotypes in extended pedigrees. J. Neurodev. Disord. 2013;5:30. doi: 10.1186/1866-1955-5-30. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Clarke T.K., Lupton M.K., Fernandez-Pujals A.M., Starr J., Davies G., Cox S., Smith B.H. Common polygenic risk for autism spectrum disorder (ASD) is associated with cognitive ability in the general population. Mol. Psychiatr. 2016;21:419–425. doi: 10.1038/mp.2015.12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Tang S., Sun N., Floris D.L., Zhang X., Di Martino A., Yeo B.T. Reconciling dimensional and categorical models of autism heterogeneity: A brain connectomics and behavioral study. Biol. Psychiatry. 2020;87:1071–1082. doi: 10.1016/j.biopsych.2019.11.009. [DOI] [PubMed] [Google Scholar]
- 16.Uljarević M., Jo B., Frazier T.W., Scahill L., Youngstrom E.A., Hardan A.Y. Using the big data approach to clarify the structure of restricted and repetitive behaviors across the most commonly used autism spectrum disorder measures. Mol. Autism. 2021;12:39. doi: 10.1186/s13229-021-00419-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Litman A., Sauerwald N., Green Snyder L., Foss-Feig J., Park C.Y., Hao Y., Dinstein I., Theesfeld C.L., Troyanskaya O.G. Decomposition of phenotypic heterogeneity in autism reveals underlying genetic programs. Nat. Genet. 2025;57:1611–1619. doi: 10.1038/s41588-025-02224-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Bartlett C.W., Hou L., Flax J.F., Hare A., Cheong S.Y., Fermano Z., Zimmerman-Bier B., Cartwright C., Azaro M.A., Buyske S., et al. A genome scan for loci shared by autism spectrum disorder and language impairment. Am. J. Psychiatry. 2014;171:72–81. doi: 10.1176/appi.ajp.2013.12081103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Reindal L., Nærland T., Weidle B., Lydersen S., Andreassen O.A., Sund A.M. Structural and pragmatic language impairments in children evaluated for autism spectrum disorder (ASD) J. Autism Dev. Disord. 2023;53:701–719. doi: 10.1007/s10803-020-04853-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Cirnigliaro M., Chang T.S., Arteaga S.A., Pérez-Cano L., Ruzzo E.K., Gordon A., Bicks L.K., Jung J.-Y., Lowe J.K., Wall D.P. The contributions of rare inherited and polygenic risk to ASD in multiplex families. Proc. Natl. Acad. Sci. USA. 2023;120:e2215632120. doi: 10.1073/pnas.2215632120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.More R.P., Warrier V., Brunel H., Buckingham C., Smith P., Allison C., Holt R., Bradshaw C.R., Baron-Cohen S. Identifying rare genetic variants in 21 highly multiplex autism families: The role of diagnosis and autistic traits. Mol. Psychiatr. 2023;28:2148–2157. doi: 10.1038/s41380-022-01938-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Bartlett C.W., Flax J.F., Logue M.W., Smith B.J., Vieland V.J., Tallal P., Brzustowicz L.M. Examination of potential overlap in autism and language loci on chromosomes 2, 7, and 13 in two independent samples ascertained for specific language impairment. Hum. Hered. 2004;57:10–20. doi: 10.1159/000077385. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Bartlett C.W., Flax J.F., Logue M.W., Vieland V.J., Bassett A.S., Tallal P., Brzustowicz L.M. A major susceptibility locus for specific language impairment is located on 13q21. Am. J. Hum. Genet. 2002;71:45–55. doi: 10.1086/341095. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Bartlett C.W., Flax J.F., Fermano Z., Hare A., Hou L., Petrill S.A., Buyske S., Brzustowicz L.M. Gene x gene interaction in shared etiology of autism and specific language impairment. Biol. Psychiatry. 2012;72:692–699. doi: 10.1016/j.biopsych.2012.05.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Lord C., Rutter M., Le Couteur A. Autism Diagnostic Interview-Revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J. Autism Dev. Disord. 1994;24:659–685. doi: 10.1007/BF02172145. [DOI] [PubMed] [Google Scholar]
- 26.Lord C., Risi S., Lambrecht L., Cook E.H., Jr., Leventhal B.L., DiLavore P.C., Pickles A., Rutter M. The autism diagnostic observation schedule-generic: A standard measure of social and communication deficits associated with the spectrum of autism. J. Autism Dev. Disord. 2000;30:205–223. doi: 10.1023/A:1005592401947. [DOI] [PubMed] [Google Scholar]
- 27.Lord C., Rutter M., Goode S., Heemsbergen J., Jordan H., Mawhood L., Schopler E. Autism diagnostic observation schedule: A standardized observation of communicative and social behavior. J. Autism Dev. Disord. 1989;19:185–212. doi: 10.1007/BF02211841. [DOI] [PubMed] [Google Scholar]
- 28.American Psychiatric Association . Diagnostic and Statistical Manual of Mental Disorders. 4th ed. American Psychiatric Association; Washington, DC, USA: 2000. [Google Scholar]
- 29.Semel E., Wiig E.H., Secord W.A. Clinical Evaluation of Language Fundamentals. 4th ed. The Psychological Corporation/A Harcourt Assessment Company; Toronto, ON, Canada: 2003. CELF-4. [Google Scholar]
- 30.Wiig E.H., Secord W.A., Semel E. Clinical Evaluation of Language Fundamentals—Preschool. 2nd ed. The Psychological Corporation/A Harcourt Assessment Company; Toronto, ON, Canada: 2004. CELF Preschool-2. [Google Scholar]
- 31.Flax J.F., Realpe-Bonilla T., Hirsch L.S., Brzustowicz L.M., Bartlett C.W., Tallal P. Specific language impairment in families: Evidence for co-occurrence with reading impairments. J. Speech Lang. Hear. Res. 2003;46:530–543. doi: 10.1044/1092-4388(2003/043). [DOI] [PubMed] [Google Scholar]
- 32.Tallal P., Hirsch L.S., Realpe-Bonilla T., Miller S., Brzustowicz L.M., Bartlett C., Flax J.F. Familial aggregation in specific language impairment. J. Speech Lang. Hear. Res. 2001;44:1172–1182. doi: 10.1044/1092-4388(2001/091). [DOI] [PubMed] [Google Scholar]
- 33.Wechsler D. Wechsler Abbreviated Scale of Intelligence (WASI) Harcourt Assessment; San Antonio, TX, USA: 1999. [Google Scholar]
- 34.Simmons T.R., Flax J.F., Azaro M.A., Hayter J.E., Justice L.M., Petrill S.A., Bassett A.S., Tallal P., Brzustowicz L.M., Bartlett C.W. Increasing genotype-phenotype model determinism: Application to bivariate reading/language traits and epistatic interactions in language-impaired families. Hum. Hered. 2010;70:232–244. doi: 10.1159/000320367. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Constantino J.N., Gruber C.P., Davis S., Hayes S., Passanante N., Przybeck T. The factor structure of autistic traits. J. Child. Psychol. Psychiatry. 2004;45:719–726. doi: 10.1111/j.1469-7610.2004.00266.x. [DOI] [PubMed] [Google Scholar]
- 36.Constantino J.N., Davis S.A., Todd R.D., Schindler M.K., Gross M.M., Brophy S.L., Metzger L.M., Shoushtari C.S., Splinter R., Reich W. Validation of a brief quantitative measure of autistic traits: Comparison of the social responsiveness scale with the autism diagnostic interview-revised. J. Autism Dev. Disord. 2003;33:427–433. doi: 10.1023/A:1025014929212. [DOI] [PubMed] [Google Scholar]
- 37.Constantino J.N., Hudziak J.J., Todd R.D. Deficits in reciprocal social behavior in male twins: Evidence for a genetically independent domain of psychopathology. J. Am. Acad. Child. Adolesc. Psychiatry. 2003;42:458–467. doi: 10.1097/01.CHI.0000046811.95464.21. [DOI] [PubMed] [Google Scholar]
- 38.Scahill L., McDougle C.J., Williams S.K., Dimitropoulos A., Aman M.G., McCracken J.T., Tierney E., Arnold L.E., Cronin P., Grados M., et al. Children’s Yale-Brown Obsessive Compulsive Scale modified for pervasive developmental disorders. J. Am. Acad. Child. Adolesc. Psychiatry. 2006;45:1114–1123. doi: 10.1097/01.chi.0000220854.79144.e7. [DOI] [PubMed] [Google Scholar]
- 39.Goodman W.K., Price L.H., Rasmussen S.A., Mazure C., Delgado P., Heninger G.R., Charney D.S. The Yale-Brown Obsessive Compulsive Scale. II. Validity. Arch. Gen. Psychiat. 1989;46:1012–1016. doi: 10.1001/archpsyc.1989.01810110054008. [DOI] [PubMed] [Google Scholar]
- 40.Goodman W.K., Price L.H., Rasmussen S.A., Mazure C., Fleischmann R.L., Hill C.L., Heninger G.R., Charney D.S. The Yale-Brown Obsessive Compulsive Scale. I. Development, use, and reliability. Arch. Gen. Psychiat. 1989;46:1006–1011. doi: 10.1001/archpsyc.1989.01810110048007. [DOI] [PubMed] [Google Scholar]
- 41.Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. doi: 10.1038/ng1847. [DOI] [PubMed] [Google Scholar]
- 42.Vieland V.J., Huang Y., Seok S.C., Burian J., Catalyurek U., O’Connell J., Segre A., Valentine-Cooper W. KELVIN: A software package for rigorous measurement of statistical evidence in human genetics. Hum. Hered. 2011;72:276–288. doi: 10.1159/000330634. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Matise T.C., Chen F., Chen W., De La Vega F.M., Hansen M., He C., Hyland F.C., Kennedy G.C., Kong X., Murray S.S., et al. A second-generation combined linkage physical map of the human genome. Genome Res. 2007;17:1783–1786. doi: 10.1101/gr.7156307. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Kong X., Murphy K., Raj T., He C., White P.S., Matise T.C. A combined linkage-physical map of the human genome. Am. J. Hum. Genet. 2004;75:1143–1148. doi: 10.1086/426405. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Huang J., Vieland V.J. Comparison of ‘model-free’ and ‘model-based’ linkage statistics in the presence of locus heterogeneity: Single data set and multiple data set applications. Hum. Hered. 2001;51:217–225. doi: 10.1159/000053345. [DOI] [PubMed] [Google Scholar]
- 46.Vieland V.J. Bayesian linkage analysis, or: How I learned to stop worrying and love the posterior probability of linkage. Am. J. Hum. Genet. 1998;63:947–954. doi: 10.1086/302076. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Govil M., Vieland V.J. Practical considerations for dividing data into subsets prior to PPL analysis. Hum. Hered. 2008;66:223–237. doi: 10.1159/000143405. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Huang Y., Vieland V.J. Association statistics under the PPL framework. Genet. Epidemiol. 2010;34:835–845. doi: 10.1002/gepi.20537. [DOI] [PubMed] [Google Scholar]
- 49.McCarthy S., Das S., Kretzschmar W., Delaneau O., Wood A.R., Teumer A., Kang H.M., Fuchsberger C., Danecek P., Sharp K., et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 2016;48:1279–1283. doi: 10.1038/ng.3643. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Zhou A., Cao X., Mahaganapathy V., Azaro M., Gwin C., Wilson S., Buyske S., Bartlett C.W., Flax J.F., Brzustowicz L.M., et al. Common genetic risk factors in ASD and ADHD co-occurring families. Hum. Genet. 2023;142:217–230. doi: 10.1007/s00439-022-02496-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.DePristo M.A., Banks E., Poplin R., Garimella K.V., Maguire J.R., Hartl C., Philippakis A.A., del Angel G., Rivas M.A., Hanna M., et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 2011;43:491–498. doi: 10.1038/ng.806. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Hu H., Roach J.C., Coon H., Guthery S.L., Voelkerding K.V., Margraf R.L., Durtschi J.D., Tavtigian S.V., Shankaracharya, Wu W., et al. A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data. Nat. Biotechnol. 2014;32:663–669. doi: 10.1038/nbt.2895. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Alibutud R., Hansali S., Cao X., Zhou A., Mahaganapathy V., Azaro M., Gwin C., Wilson S., Buyske S., Bartlett C.W., et al. Structural Variations Contribute to the Genetic Etiology of Autism Spectrum Disorder and Language Impairments. Int. J. Mol. Sci. 2023;24:13248. doi: 10.3390/ijms241713248. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Mohiyuddin M., Mu J.C., Li J., Bani Asadi N., Gerstein M.B., Abyzov A., Wong W.H., Lam H.Y. MetaSV: An accurate and integrative structural-variant caller for next generation sequencing. Bioinformatics. 2015;31:2741–2744. doi: 10.1093/bioinformatics/btv204. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Gardner E.J., Lam V.K., Harris D.N., Chuang N.T., Scott E.C., Pittard W.S., Mills R.E., Devine S.E., Consortium G.P. The Mobile Element Locator Tool (MELT): Population-scale mobile element discovery and biology. Genome Res. 2017;27:1916–1929. doi: 10.1101/gr.218032.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Geoffroy V., Herenger Y., Kress A., Stoetzel C., Piton A., Dollfus H., Muller J. AnnotSV: An integrated tool for structural variations annotation. Bioinformatics. 2018;34:3572–3574. doi: 10.1093/bioinformatics/bty304. [DOI] [PubMed] [Google Scholar]
- 57.Paila U., Chapman B.A., Kirchner R., Quinlan A.R. GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations. PLoS Comput. Biol. 2013;9:1003153. doi: 10.1371/journal.pcbi.1003153. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Wang K., Li M., Hadley D., Liu R., Glessner J., Grant S.F., Hakonarson H., Bucan M. PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007;17:1665–1674. doi: 10.1101/gr.6861907. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Colella S., Yau C., Taylor J.M., Mirza G., Butler H., Clouston P., Bassett A.S., Seller A., Holmes C.C., Ragoussis J. QuantiSNP: An Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 2007;35:2013–2025. doi: 10.1093/nar/gkm076. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Aguet F., Barbeira A.N., Bonazzola R., Brown A., Castel S.E., Jo B., Kasela S., Kim-Hellmuth S., Liang Y., Oliva M., et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. bioRxiv. 2019:787903. doi: 10.1101/787903. [DOI] [Google Scholar]
- 61.Miller J.A., Ding S.L., Sunkin S.M., Smith K.A., Ng L., Szafer A., Ebbert A., Riley Z.L., Royall J.J., Aiona K., et al. Transcriptional landscape of the prenatal human brain. Nature. 2014;508:199–206. doi: 10.1038/nature13185. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Lindsay S.J., Xu Y., Lisgo S.N., Harkin L.F., Copp A.J., Gerrelli D., Clowry G.J., Talbot A., Keogh M.J., Coxhead J., et al. HDBR Expression: A Unique Resource for Global and Individual Gene Expression Studies during Early Human Brain Development. Front. Neuroanat. 2016;10:86. doi: 10.3389/fnana.2016.00086. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Cao X., Zhang Y., Abdulkadir M., Deng L., Fernandez T.V., Garcia-Delgar B., Hagstrom J., Hoekstra P.J., King R.A., Koesterich J., et al. Whole-exome sequencing identifies genes associated with Tourette’s disorder in multiplex families. Mol. Psychiatry. 2021;26:6937–6951. doi: 10.1038/s41380-021-01094-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Collins R.L., Brand H., Karczewski K.J., Zhao X., Alfoldi J., Francioli L.C., Khera A.V., Lowther C., Gauthier L.D., Wang H., et al. A structural variation reference for medical and population genetics. Nature. 2020;581:444–451. doi: 10.1038/s41586-020-2287-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Prat C.S., Stocco A., Neuhaus E., Kleinhans N.M. Basal ganglia impairments in autism spectrum disorder are related to abnormal signal gating to prefrontal cortex. Neuropsychologia. 2016;91:268–281. doi: 10.1016/j.neuropsychologia.2016.08.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Subramanian K., Brandenburg C., Orsati F., Soghomonian J.J., Hussman J.P., Blatt G.J. Basal ganglia and autism—A translational perspective. Autism Res. 2017;10:1751–1775. doi: 10.1002/aur.1837. [DOI] [PubMed] [Google Scholar]
- 67.Hashimoto T., Tayama M., Miyazaki M., Murakawa K., Shimakawa S., Yoneda Y., Kuroda Y. Brainstem involvement in high functioning autistic children. Acta Neurol. Scand. 1993;88:123–128. doi: 10.1111/j.1600-0404.1993.tb04203.x. [DOI] [PubMed] [Google Scholar]
- 68.Mackeh R., Marr A.K., Fadda A., Kino T. C2H2-Type Zinc Finger Proteins: Evolutionarily Old and New Partners of the Nuclear Hormone Receptors. Nucl. Recept. Signal. 2018;15:1550762918801071. doi: 10.1177/1550762918801071. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Rinaldi T., Perrodin C., Markram H. Hyper-connectivity and hyper-plasticity in the medial prefrontal cortex in the valproic Acid animal model of autism. Front. Neural Circuits. 2008;2:4. doi: 10.3389/neuro.04.004.2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Tatour Y., Sanchez-Navarro I., Chervinsky E., Hakonarson H., Gawi H., Tahsin-Swafiri S., Leibu R., Lopez-Molina M.I., Fernandez-Sanz G., Ayuso C., et al. Mutations in SCAPER cause autosomal recessive retinitis pigmentosa with intellectual disability. J. Med. Genet. 2017;54:698–704. doi: 10.1136/jmedgenet-2017-104632. [DOI] [PubMed] [Google Scholar]
- 71.Amaral D.G., Corbett B.A. The amygdala, autism and anxiety. Novartis Found. Symp. 2003;251:177–187; discussion 187–197, 281–297. [PubMed] [Google Scholar]
- 72.Andrews D.S., Aksman L., Kerns C.M., Lee J.K., Winder-Patel B.M., Harvey D.J., Waizbard-Bartov E., Heath B., Solomon M., Rogers S.J., et al. Association of Amygdala Development with Different Forms of Anxiety in Autism Spectrum Disorder. Biol. Psychiatry. 2022;91:977–987. doi: 10.1016/j.biopsych.2022.01.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.LaCroix A.J., Stabley D., Sahraoui R., Adam M.P., Mehaffey M., Kernan K., Myers C.T., Fagerstrom C., Anadiotis G., Akkari Y.M., et al. GGC Repeat Expansion and Exon 1 Methylation of XYLT1 Is a Common Pathogenic Variant in Baratela-Scott Syndrome. Am. J. Hum. Genet. 2019;104:35–44. doi: 10.1016/j.ajhg.2018.11.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Marques A.R., Santos J.X., Martiniano H., Vilela J., Rasga C., Romao L., Vicente A.M. Gene Variants Involved in Nonsense-Mediated mRNA Decay Suggest a Role in Autism Spectrum Disorder. Biomedicines. 2022;10:665. doi: 10.3390/biomedicines10030665. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Smith P.D., Coulson-Thomas V.J., Foscarin S., Kwok J.C., Fawcett J.W. “GAG-ing with the neuron”: The role of glycosaminoglycan patterning in the central nervous system. Exp. Neurol. 2015;274:100–114. doi: 10.1016/j.expneurol.2015.08.004. [DOI] [PubMed] [Google Scholar]
- 76.Hirano K., Egawa H., Nishihara S. The sulfation pattern of glycosaminoglycans in human brain development and neurological disorders such as Alzheimer’s disease. Am. J. Physiology. Cell Physiol. 2025;329:C801–C811. doi: 10.1152/ajpcell.00842.2024. [DOI] [PubMed] [Google Scholar]
- 77.Wat M.J., Enciso V.B., Wiszniewski W., Resnick T., Bader P., Roeder E.R., Freedenberg D., Brown C., Stankiewicz P., Cheung S.W., et al. Recurrent microdeletions of 15q25.2 are associated with increased risk of congenital diaphragmatic hernia, cognitive deficits and possibly Diamond--Blackfan anaemia. J. Med. Genet. 2010;47:777–781. doi: 10.1136/jmg.2009.075903. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Palumbo O., Palumbo P., Palladino T., Stallone R., Miroballo M., Piemontese M.R., Zelante L., Carella M. An emerging phenotype of interstitial 15q25.2 microdeletions: Clinical report and review. Am. J. Med. Genet. Part A. 2012;158A:3182–3189. doi: 10.1002/ajmg.a.35631. [DOI] [PubMed] [Google Scholar]
- 79.Chen Z., Chen H., Yuan K., Wang C. A 15q25.2 microdeletion phenotype for premature ovarian failure in a Chinese girl: A case report and review of literature. BMC Med. Genom. 2020;13:126. doi: 10.1186/s12920-020-00787-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Doelken S.C., Seeger K., Hundsdoerfer P., Weber-Ferro W., Klopocki E., Graul-Neumann L. Proximal and distal 15q25.2 microdeletions-genotype-phenotype delineation of two neurodevelopmental susceptibility loci. Am. J. Med. Genet. Part A. 2013;161A:218–224. doi: 10.1002/ajmg.a.35695. [DOI] [PubMed] [Google Scholar]
- 81.Wagenstaller J., Spranger S., Lorenz-Depiereux B., Kazmierczak B., Nathrath M., Wahl D., Heye B., Glaser D., Liebscher V., Meitinger T., et al. Copy-number variations measured by single-nucleotide-polymorphism oligonucleotide arrays in patients with mental retardation. Am. J. Hum. Genet. 2007;81:768–779. doi: 10.1086/521274. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Settembre C., Fraldi A., Medina D.L., Ballabio A. Signals from the lysosome: A control centre for cellular clearance and energy metabolism. Nat. Rev. Mol. Cell Biol. 2013;14:283–296. doi: 10.1038/nrm3565. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Lawrence R.E., Zoncu R. The lysosome as a cellular centre for signalling, metabolism and quality control. Nat. Cell Biol. 2019;21:133–142. doi: 10.1038/s41556-018-0244-7. [DOI] [PubMed] [Google Scholar]
- 84.Udayar V., Chen Y., Sidransky E., Jagasia R. Lysosomal dysfunction in neurodegeneration: Emerging concepts and methods. Trends Neurosci. 2022;45:184–199. doi: 10.1016/j.tins.2021.12.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Fassio A., Falace A., Esposito A., Aprile D., Guerrini R., Benfenati F. Emerging Role of the Autophagy/Lysosomal Degradative Pathway in Neurodevelopmental Disorders with Epilepsy. Front. Cell Neurosci. 2020;14:39. doi: 10.3389/fncel.2020.00039. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Patak J., Zhang-James Y., Faraone S.V. Endosomal system genetics and autism spectrum disorders: A literature review. Neurosci. Biobehav. Rev. 2016;65:95–112. doi: 10.1016/j.neubiorev.2016.03.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Zhou Y., Fujiwara Y., Shirazaki M., Tian X., Igarashi G., Yamauchi H., Imaizumi K., Hayakawa H., Miyoshi K., Katayama T. Impaired lysosomal proteolysis in developing neurons induces protein aggregation and disrupts morphogenesis and neuromaturation. Neurochem. Int. 2025;190:106048. doi: 10.1016/j.neuint.2025.106048. [DOI] [PubMed] [Google Scholar]
- 88.Tintos-Hernandez J.A., Santana A., Keller K.N., Ortiz-Gonzalez X.R. Lysosomal dysfunction impairs mitochondrial quality control and is associated with neurodegeneration in TBCK encephaloneuronopathy. Brain Commun. 2021;3:fcab215. doi: 10.1093/braincomms/fcab215. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Tiede G.M., Walton K.M. Social endophenotypes in autism spectrum disorder: A scoping review. Dev. Psychopathol. 2021;33:1381–1409. doi: 10.1017/S0954579420000577. [DOI] [PubMed] [Google Scholar]
- 90.Jarvela I., Maatta T., Acharya A., Leppala J., Jhangiani S.N., Arvio M., Siren A., Kankuri-Tammilehto M., Kokkonen H., Palomaki M., et al. Exome sequencing reveals predominantly de novo variants in disorders with intellectual disability (ID) in the founder population of Finland. Hum. Genet. 2021;140:1011–1029. doi: 10.1007/s00439-021-02268-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91.Gulisano M., Barone R., Alaimo S., Ferro A., Pulvirenti A., Cirnigliaro L., Di Silvestre S., Martellino S., Maugeri N., Milana M.C., et al. Disentangling Restrictive and Repetitive Behaviors and Social Impairments in Children and Adolescents with Gilles de la Tourette Syndrome and Autism Spectrum Disorder. Brain Sci. 2020;10:308. doi: 10.3390/brainsci10050308. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Abrahams B.S., Arking D.E., Campbell D.B., Mefford H.C., Morrow E.M., Weiss L.A., Menashe I., Wadkins T., Banerjee-Basu S., Packer A. SFARI Gene 2.0: A community-driven knowledgebase for the autism spectrum disorders (ASDs) Mol. Autism. 2013;4:36. doi: 10.1186/2040-2392-4-36. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Banerjee-Basu S., Packer A. SFARI Gene: An evolving database for the autism research community. Dis. Model. Mech. 2010;3:133–135. doi: 10.1242/dmm.005439. [DOI] [PubMed] [Google Scholar]
- 94.Wolock S.L., Yates A., Petrill S.A., Bohland J.W., Blair C., Li N., Machiraju R., Huang K., Bartlett C.W. Gene x smoking interactions on human brain gene expression: Finding common mechanisms in adolescents and adults. J. Child. Psychol. Psychiatry. 2013;54:1109–1119. doi: 10.1111/jcpp.12119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95.Bartlett C.W., Vieland V.J. Accumulating quantitative trait linkage evidence across multiple datasets using the posterior probability of linkage. Genet. Epidemiol. 2007;31:91–102. doi: 10.1002/gepi.20193. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The raw sequencing reads, variants, and genotypes for all samples are available in the National Institute of Mental Health (NIMH) Data Archive (NDA) under collections C1932 and C2933 and NRGR under study 39. Access can be requested through the portal at this link (https://nda.nih.gov (accessed on 1 January 2026)). The code and workflow for the project are publicly available on GitHub: https://github.com/JXing-Lab/linkage-pipeline (accessed on 1 January 2026).






