Translational Psychiatry. 2020 Aug 17;10:290. doi: 10.1038/s41398-020-00951-x

Clustering by phenotype and genome-wide association study in autism

Akira Narita 1,2,#, Masato Nagai 1,2,#, Satoshi Mizuno 1,2,#, Soichi Ogishima 1,2, Gen Tamiya 1,2,3, Masao Ueki 1,2,3, Rieko Sakurai 1,2,3, Satoshi Makino 1,2,3, Taku Obara 1,2,4, Mami Ishikuro 1,2, Chizuru Yamanaka 1,2, Hiroko Matsubara 1,2, Yasutaka Kuniyoshi 2, Keiko Murakami 1,2, Fumihiko Ueno 1,2, Aoi Noda 1,2,4, Tomoko Kobayashi 1,2,4, Mika Kobayashi 1, Takuma Usuzaki 1, Hisashi Ohseto 1, Atsushi Hozawa 1,2, Masahiro Kikuya 1,2,5, Hirohito Metoki 1,2,6, Shigeo Kure 1,2,4, Shinichi Kuriyama 1,2,7
PMCID: PMC7431539  PMID: 32807774

Abstract

Autism spectrum disorder (ASD) has phenotypically and genetically heterogeneous characteristics. A simulation study demonstrated that attempts to categorize patients with a complex disease into more homogeneous subgroups could have more power to elucidate hidden heritability. We conducted cluster analyses using the k-means algorithm with a cluster number of 15 based on phenotypic variables from the Simons Simplex Collection (SSC). As a preliminary study, we conducted a conventional genome-wide association study (GWAS) with a data set of 597 ASD cases and 370 controls. In the second step, we divided cases based on the clustering results and conducted GWAS in each of the subgroups vs controls (cluster-based GWAS). We also conducted cluster-based GWAS on another SSC data set of 712 probands and 354 controls in the replication stage. In the preliminary study, which used a conventional GWAS design, we observed no significant associations. In the second step of cluster-based GWASs, we identified 65 chromosomal loci, which included 30 intragenic loci located in 21 genes and 35 intergenic loci that satisfied the threshold of P < 5.0 × 10⁻⁸. Some of these loci were located within or near previously reported candidate genes for ASD: CDH5, CNTN5, CNTNAP5, DNAH17, DPP10, DSCAM, FOXK1, GABBR2, GRIN2A, ITPR1, NTM, SDK1, SNCA, and SRRM4. Of these 65 significant chromosomal loci, rs11064685 located within the SRRM4 gene had a significantly different distribution in the cases vs controls in the replication cohort. These findings suggest that clustering may successfully identify subgroups with relatively homogeneous disease etiologies. Further cluster validation and replication studies are warranted in larger cohorts.

Subject terms: Molecular neuroscience, Autism spectrum disorders

Introduction

Autism spectrum disorder (ASD) has heterogeneous characteristics in terms of both phenotypic features and genetics. ASD is mainly characterized by difficulties in communication and repetitive behaviors, but ASD also shows many other symptoms1. Regarding genetics, previous studies have not consistently identified genetic variants that are associated with an increased risk of ASD2, although several lines of evidence suggest that genetic factors strongly contribute to the increased risk of ASD. Monozygotic twins have higher concordance rates of ASD (92%) than dizygotic twins (10%)3. The recurrence risk ratio is 22 for ASD among siblings4. The Human Gene module of the Simons Foundation Autism Research Initiative (SFARI) Gene database provides a comprehensive, up-to-date reference for suggested human ASD-related genes5 and currently lists ~1000 genes that may have links to ASD, potentially indicating the heterogeneity of ASD. In addition to phenotype and genotype heterogeneities, ASD shows heterogeneous responses to interventions. Several kinds of pharmacological treatments are suggested, but the effects of these treatments are controversial6.

If the heterogeneous phenotypes and responses to treatment in some way correspond to differences in genotype, grouping persons with ASD according to phenotype and responses to treatment variables may increase the chances of identifying genetic susceptibility factors. Traylor and colleagues7 demonstrated in a simulation study that attempts to categorize patients with a complex disease into more homogeneous subgroups could have more power to elucidate the hidden heritability. Several studies on Alzheimer’s disease, neuroticism, or asthma indicated that items or symptoms were to some degree more useful for identifying high-impact genetic factors than broadly defined diagnoses8–10, although a study of ASD demonstrated modest effects of two-way stratification by individual symptoms11. In addition, medical researchers have begun to use machine learning methods, artificial intelligence techniques that can reveal masked patterns in data sets. In view of these circumstances, clustering algorithms of machine learning and subsequent genome-wide association studies (GWASs) could be hypothesized to reveal novel and more genetically homogeneous clusters, but a combinatorial approach of cluster analysis and GWASs, to the best of our knowledge, has not been applied to any diseases including ASD.

We therefore explored whether grouping persons with ASD using a clustering algorithm with phenotype and responses to treatment variables can be used to discriminate more genetically homogeneous persons with ASD. In the present study, we conducted GWASs within phenotype-defined clusters (named cluster-based GWASs) using real data, based on the concept of a previous simulation study7 and adopting the machine learning k-means12 algorithm for cluster analysis.

Subjects and methods

We conducted the present study in accordance with the guidelines of the Declaration of Helsinki13 and all other applicable guidelines. The protocol was reviewed and approved by the institutional review board of Tohoku University Graduate School of Medicine, and written informed consent was obtained from all participants over the age of 18 by the SFARI14. For participants under the age of 18, informed consent was obtained from a parent and/or legal guardian. In addition, for participants 10–17 years of age, informed assent was obtained from the individuals.

Data sets

We used phenotypic variables, history of treatment, and genotypic data from the Simons Simplex Collection (SSC)14. The SSC establishes a repository of phenotypic data and genetic data/samples from mainly simplex families.

The SSC data were publicly released in October 2007 and are directly available from the SFARI. From the SSC data set, we used data from 614 affected white male probands who had no missing information regarding Autism Diagnostic Interview-Revised (ADI-R) scores15 and vitamin treatment16,17 and 391 unaffected brothers for whom genotype data, generated by the Illumina Human Omni2.5 (Omni2.5) array, were available for subsequent clustering and genetic analyses. We excluded participants whose ancestries were estimated to be different from the other participants using principal component analyses (PCAs) performed by EIGENSOFT version 7.2.118 for the genotype data. Based on the PCAs, we excluded data beyond four standard deviations of principal components 1 or 2 (Supplementary Fig. 1). Therefore, we used data from 597 probands and 370 unaffected brothers.
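The outlier rule described above can be illustrated with a minimal sketch. It assumes that principal component scores have already been computed (for example by EIGENSOFT; the parsing step is not shown) and that the rule is applied once to the mean and standard deviation of all samples, which the text does not state explicitly. The function name `ancestry_outliers` and the toy data are hypothetical.

```python
from statistics import mean, stdev

def ancestry_outliers(pc_scores, n_sd=4.0):
    """Return sample IDs whose PC1 or PC2 lies more than n_sd SDs from the mean.

    pc_scores maps a sample ID to a (PC1, PC2) tuple.
    Assumption: a single pass over all samples, no iterative re-estimation.
    """
    ids = list(pc_scores)
    outliers = set()
    for axis in (0, 1):  # 0 = PC1, 1 = PC2
        values = [pc_scores[i][axis] for i in ids]
        centre, spread = mean(values), stdev(values)
        for i in ids:
            if abs(pc_scores[i][axis] - centre) > n_sd * spread:
                outliers.add(i)
    return outliers

# Toy data (hypothetical): 40 samples near the origin plus one far away on PC1.
scores = {f"s{i}": (0.01 * (i % 5 - 2), 0.01 * (i % 3 - 1)) for i in range(40)}
scores["far"] = (1.0, 0.0)

flagged = ancestry_outliers(scores)
kept = [i for i in scores if i not in flagged]
print(sorted(flagged), len(kept))
```

Only the sample that is distant on PC1 is flagged in this toy example; the remaining samples would be carried forward to clustering and association testing.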

In the replication study, we used another SSC data set genotyped using the Illumina 1Mv3 (1Mv3) array. In the data set, data from 735 affected male probands with no missing information regarding ADI-R scores or vitamin treatment and 387 unaffected brothers were available. After conducting PCA, we excluded data beyond four standard deviations of principal components 1 or 2 as outliers. In this way, we used data from 712 probands and 354 unaffected brothers in the replication study.

Clustering

We conducted cluster analyses using phenotypic variables of ADI-R scores and history of vitamin treatment16,17. We chose these variables because the ADI-R is one of the most reliable estimates of ASD and has the ability to evaluate substructure domains of ASD15. Among the ADI-R scores, “the total score for the Verbal Communication Domain of the ADI-R minus the total score for the Nonverbal Communication Domain of the ADI-R”, “the total score for the Nonverbal Communication Domain of the ADI-R”, “the total score for the Restricted, Repetitive, and Stereotyped Patterns of Behavior Domain of the ADI-R”, and “the total score for the Reciprocal Social Interaction Domain of the ADI-R” were included in the preprocessed data set.

Among the treatments, we selected the variable of history of vitamin treatment because we recently found that a cluster of persons with ASD is associated with potential responsiveness to vitamin B6 treatment16,17. The history of treatment is not always compatible with responsiveness, but we considered that continuous treatment indicates responsiveness to some degree. The SSC data set includes history of treatment but not variables of responsiveness.

We applied the machine learning k-means12 algorithm to conduct a cluster analysis that divided the data set obtained from persons with ASD into subgroups using phenotypic variables and history of treatment. The k-means algorithm requires a cluster number (k) to be determined by the researchers. We set k a priori to 5, 10, 15, and 20 under the hypothesis that ASD consists of hundreds of subgroups5,14 and considering statistical power by sample size calculations19. We performed the analyses using the scikit-learn toolkit in Python 2.7 (Supplementary Information 1).
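As an illustration of the clustering step, the following is a dependency-free sketch of Lloyd's algorithm, the standard iteration behind k-means. It is not the authors' script, which used scikit-learn and is given in Supplementary Information 1. The feature layout (verbal total minus nonverbal total, nonverbal total, restricted and repetitive behavior total, social total, vitamin history coded 0/1), the absence of feature scaling, the random initialization, and the simulated probands are all assumptions made for this example.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids

def features(verbal_total, nonverbal, rrb, social, vitamin):
    """Hypothetical feature vector following the variables named in the text."""
    return (verbal_total - nonverbal, nonverbal, rrb, social,
            1.0 if vitamin else 0.0)

# Simulated probands (not SSC data).
rng = random.Random(1)
probands = []
for _ in range(200):
    nonverbal = rng.randint(0, 14)
    probands.append(features(nonverbal + rng.randint(0, 12), nonverbal,
                             rng.randint(1, 12), rng.randint(8, 30),
                             rng.random() < 0.6))

for k in (5, 10, 15, 20):  # the cluster numbers set a priori in the text
    labels, _ = kmeans(probands, k)
    print(k, sorted(labels.count(c) for c in range(k)))
```

In practice, a library implementation such as scikit-learn's, with several random restarts, would normally be preferred; the loop above only makes the two alternating steps explicit.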

Clustering is an exploratory data analysis technique, and the validity of the clustering results may be judged by external knowledge, such as the purpose of the segmentation20. Several methods have been proposed to prespecify the cluster number k, such as visual examination of the data, and likelihood and error-based approaches; however, these methods do not necessarily provide results that are consistent with each other21. Although there are measures for evaluating the quality of the clusters22, the number of clusters should still be determined according to the research purposes. We regarded the inflation factor (λ) of quantile-quantile (Q–Q) plots of the negative base-10 logarithm of the P value (−log10P) as one of the indicators of successful clustering in the present study. We calculated λ for each cluster number.
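The text does not spell out how λ was computed. A minimal sketch under the conventional definition, the median of the observed 1-degree-of-freedom chi-square statistics divided by the median expected under the null (about 0.455), might look as follows. The function name and the simulated P values are hypothetical.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df

def inflation_factor(p_values):
    """Genomic inflation factor from P values (each must satisfy 0 < p <= 1).

    Each P value is converted back to a 1-df chi-square statistic via the
    normal quantile: chi2 = z**2 with z = Phi^-1(p / 2).
    """
    norm = NormalDist()
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Perfectly uniform P values give lambda close to 1 (no inflation).
n = 1001
uniform_p = [(i + 0.5) / n for i in range(n)]
print(round(inflation_factor(uniform_p), 3))

# P values shifted toward zero give lambda above 1 (inflation).
inflated_p = [p * p for p in uniform_p]
print(inflation_factor(inflated_p) > 1.0)
```

Values of λ near 1 indicate that the bulk of the test statistics follows the null distribution, which is why λ can serve as a check on systematic inflation within each cluster.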

When conducting clustering, we combined the two data sets of male probands, one genotyped using the Omni2.5 array and the other genotyped using the 1Mv3 array. After clustering, we redivided the combined data set according to the SNP arrays used. We used the Omni2.5 data set in the discovery stage and the 1Mv3 data set in the replication stage.

Genotype data and quality control

We used the SSC data set, in which probands and unaffected brothers had already been genotyped in other previous studies14,23. In the discovery stage, we used the data set genotyped by the Omni2.5 array, which has 2,383,385 probes. We excluded SNPs with a minor allele frequency < 0.01, call rate < 0.95, and Hardy–Weinberg equilibrium test P < 0.000001.
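The three quality-control filters can be sketched per SNP from genotype counts. This is an illustration only: the thresholds are those given above, but the Hardy–Weinberg test here is the 1-degree-of-freedom chi-square approximation rather than the exact test that PLINK applies, and the text does not say which samples were used for the test. Function names and example counts are hypothetical.

```python
import math

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_qc(n_aa, n_ab, n_bb, n_missing,
              maf_min=0.01, call_min=0.95, hwe_min=1e-6):
    """Apply the MAF, call-rate, and HWE thresholds stated in the text."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (maf >= maf_min and call_rate >= call_min
            and hwe_chi2_p(n_aa, n_ab, n_bb) >= hwe_min)

print(passes_qc(360, 480, 160, 0))    # common SNP in HWE: kept
print(passes_qc(998, 2, 0, 0))        # MAF 0.001: excluded
print(passes_qc(360, 480, 160, 100))  # call rate about 0.91: excluded
print(passes_qc(500, 0, 500, 0))      # no heterozygotes: fails HWE
```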

In the replication study, where we used the data set genotyped using the 1Mv3 array, we applied the same cutoff values for quality control as those used in the discovery stage. The 1Mv3 array includes 1,147,689 SNPs. The Omni2.5 array and the 1Mv3 array shared 675,923 SNPs.

Statistical analysis

As a preliminary study, we conducted a conventional GWAS in the whole Omni2.5 data set, with a total of 597 male probands and 370 unaffected brothers. Here, we used the brothers of the cases as controls, in contrast to many previous studies in which genetically unrelated controls were used. We thus adopted the sib transmission disequilibrium test (sib-TDT)24, a family-based association test, to take into account familial relationships among the participants. In the second step, in the discovery stage, we conducted cluster-based GWAS in each subgroup of the cases, which had been divided using the k-means12 algorithm, and the controls. As mentioned above, the controls were the brothers of the cases, and we then excluded the unaffected brothers of the cases belonging to the subgroup being analyzed. Details of the study design are shown in Supplementary Fig. 2. We applied the Cochran–Armitage trend test25, which examines the risk of disease in those who do not have the allele of interest, those who have a single copy, and those who are homozygous.
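For readers unfamiliar with the Cochran–Armitage trend test, the following sketch computes it for a single SNP from a 2 × 3 table of genotype counts, using additive weights (0, 1, 2) for zero, one, and two copies of the allele of interest. The actual analyses were run in PLINK; the function name and the counts below are hypothetical.

```python
import math

def cochran_armitage(case_counts, control_counts, weights=(0, 1, 2)):
    """Trend test on a 2x3 genotype table; returns (chi2, p) with 1 df.

    case_counts / control_counts: numbers of individuals carrying
    0, 1, and 2 copies of the allele of interest.
    """
    r, s = case_counts, control_counts
    n = [ri + si for ri, si in zip(r, s)]  # column totals
    R, S = sum(r), sum(s)                  # numbers of cases and controls
    N = R + S
    sum_wn = sum(w * ni for w, ni in zip(weights, n))
    # T = N * sum(w_i r_i) - R * sum(w_i n_i)
    t = N * sum(w * ri for w, ri in zip(weights, r)) - R * sum_wn
    # N * Var(T) / (R * S) = N * sum(w_i^2 n_i) - (sum(w_i n_i))^2
    var_term = N * sum(w * w * ni for w, ni in zip(weights, n)) - sum_wn ** 2
    if R == 0 or S == 0 or var_term == 0:
        return 0.0, 1.0
    chi2 = N * t * t / (R * S * var_term)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical counts with a clear dose-response trend.
chi2, p = cochran_armitage((10, 20, 30), (30, 20, 10))
print(round(chi2, 3), p)

# Identical genotype proportions in cases and controls give no trend.
print(cochran_armitage((10, 20, 30), (20, 40, 60)))
```

With additive weights, the test is sensitive to a monotone change in disease risk with allele dosage, which matches the description given above.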

We further tested the significantly associated loci found in the discovery studies in the replication stage. The level of significance for association was set as P < 0.05 in the replication studies.

Association analyses were performed with the PLINK software package26. The detected SNPs were subsequently annotated using ANNOVAR27. Manhattan plots and Q–Q plots were generated using the ‘qqman’ package in R version 3.0.2.

Results

Cluster-based GWAS

As a preliminary study, we conducted a conventional GWAS with the Omni2.5 data set using the sib-TDT. We observed no significant associations (Fig. 1). Although we adopted the sib-TDT here because we used the brothers of the cases as controls, we also used the Cochran–Armitage trend test and found that the −log10P values were distributed downward compared with the expected values, as shown in Supplementary Fig. 3.

Fig. 1. Manhattan plot (a) and corresponding quantile-quantile plot (b) in GWAS for all male probands vs their unaffected brothers. We conducted a GWAS in the Simons Simplex Collection data set of 597 male probands and 370 unaffected brothers genotyped by the Illumina Human Omni2.5 array using the sib transmission/disequilibrium test (sib-TDT). We observed no significant associations in this GWAS with the genome-wide threshold of P = 5.0 × 10⁻⁸. The blue horizontal line indicates the genome-wide suggestive threshold of P = 1.0 × 10⁻⁵.

We also applied the sib-TDT to cluster 1, which was obtained by dividing all the cases using k-means with k of 15, and all the controls and found that the observed −log10P values were lower than expected, as shown in Supplementary Fig. 3. As the sib-TDT works efficiently in a population consisting of a substantial number of sibs, the limited number of brothers of the analyzed probands among all the controls probably contributed to a substantial loss of power. Thus, we excluded the brothers of the probands in each subset from the controls so that each subset of probands had no genetic relationship with the remaining controls and conducted the Cochran–Armitage trend test, as in many other studies. In the present study, therefore, we applied the sib-TDT to the GWAS of the whole data set, whereas in the cluster-based GWAS, we excluded in turn the unaffected brothers of the cases belonging to the subgroup being analyzed, so that cases and controls were unrelated, and used the Cochran–Armitage trend test.
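The control selection described above can be sketched as a simple filter on family identifiers. The data structures, identifiers, and function name are hypothetical, as is the handling of brothers whose proband has no cluster label.

```python
def controls_for_cluster(proband_cluster, brother_family, cluster):
    """Unaffected brothers whose own proband is NOT in `cluster`.

    proband_cluster: family ID -> cluster label of that family's proband
    brother_family:  brother ID -> family ID
    Assumption: brothers whose proband has no cluster label (for example,
    a proband removed during quality control) are kept as controls.
    """
    return sorted(
        brother for brother, family in brother_family.items()
        if proband_cluster.get(family) != cluster
    )

# Toy example (hypothetical IDs).
proband_cluster = {"F1": 1, "F2": 1, "F3": 2, "F4": 3}
brother_family = {"B1": "F1", "B3": "F3", "B4": "F4", "B5": "F5"}

print(controls_for_cluster(proband_cluster, brother_family, 1))
print(controls_for_cluster(proband_cluster, brother_family, 2))
```

For cluster 1, brother B1 is dropped because his proband (family F1) belongs to that cluster, so the remaining controls are unrelated to the cases being tested.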

The average inflation factors λ for the cluster-based GWAS with k of 5, 10, 15, and 20 were 1.021, 1.024, 1.038, and 1.053, respectively. Empirically, a λ value <1.050 is deemed safe for avoiding false positives28. Under the hypothesis that ASD consists of hundreds of subgroups14, we gave priority to larger cluster numbers when comparing λ values. We therefore considered the cluster-based GWAS using k-means cluster analysis with k of 15, the largest cluster number with an average λ below this threshold, to be the most appropriate approach to the present data set. The characteristics of each cluster are presented in Table 1.
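The selection rule in this paragraph (the largest k whose average λ stays below 1.050) can be written out with the reported values. The variable names are illustrative.

```python
# Average inflation factors reported in the text for each cluster number k.
mean_lambda = {5: 1.021, 10: 1.024, 15: 1.038, 20: 1.053}
LAMBDA_MAX = 1.050  # empirical threshold cited in the text

# Keep cluster numbers whose average lambda is below the threshold,
# then give priority to the largest of them.
acceptable = [k for k, lam in mean_lambda.items() if lam < LAMBDA_MAX]
chosen_k = max(acceptable)
print(acceptable, chosen_k)  # [5, 10, 15] 15
```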

Table 1.

Characteristics of each of 15 k-means clusters in the Omni2.5 data set.

Score columns are ADI-R scores: Verbal, Nonverbal, Restricted and repetitive patterns of behavior (RRB), and Social. For each score, the table gives the mean (SD), median (p25–p75), minimum, and maximum.

| Cluster no. | n | Verbal mean (SD) | Verbal median (p25–p75) | Verbal min | Verbal max | Nonverbal mean (SD) | Nonverbal median (p25–p75) | Nonverbal min | Nonverbal max | RRB mean (SD) | RRB median (p25–p75) | RRB min | RRB max | Social mean (SD) | Social median (p25–p75) | Social min | Social max | Vitamin B6 treatment (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All | 597 | 7.7 (2.1) | 8.0 (6.0–9.0) | 0 | 12 | 8.9 (3.3) | 9.0 (6.0–12.0) | 0 | 14 | 6.8 (2.5) | 7.0 (5.0–8.0) | 1 | 12 | 19.8 (5.3) | 20.0 (16.0–24.0) | 8 | 30 | 59.6 |
| 1 | 33 | 7.4 (2.2) | 7.0 (6.0–10.0) | 3 | 11 | 4.4 (1.6) | 4.0 (3.0–6.0) | 1 | 7 | 8.5 (1.6) | 8.0 (7.0–10.0) | 6 | 12 | 14.0 (1.5) | 14.0 (13.0–15.0) | 11 | 17 | 60.6 |
| 2 | 49 | 8.9 (1.3) | 9.0 (8.0–10.0) | 6 | 12 | 12.3 (1.5) | 12.0 (11.0–14.0) | 9 | 14 | 6.2 (1.3) | 6.0 (6.0–7.0) | 3 | 8 | 27.1 (1.3) | 27.0 (26.0–28.0) | 24 | 30 | 79.6 |
| 3 | 45 | 6.0 (1.9) | 6.0 (5.0–7.0) | 2 | 10 | 8.8 (1.5) | 9.0 (8.0–10.0) | 6 | 12 | 5.0 (1.5) | 5.0 (4.0–6.0) | 2 | 7 | 16.8 (1.1) | 17.0 (16.0–18.0) | 15 | 19 | 64.4 |
| 4 | 59 | 9.0 (1.5) | 9.0 (8.0–10.0) | 6 | 12 | 8.1 (1.5) | 8.0 (7.0–9.0) | 4 | 10 | 8.8 (1.9) | 8.0 (8.0–10.0) | 5 | 12 | 23.8 (1.4) | 24.0 (23.0–25.0) | 21 | 27 | 57.6 |
| 5 | 28 | 7.3 (1.1) | 7.0 (6.5–8.0) | 5 | 9 | 9.1 (1.7) | 9.0 (8.0–10.0) | 7 | 13 | 6.1 (2.3) | 6.0 (5.0–7.0) | 1 | 12 | 12.7 (1.7) | 13.0 (12.0–14.0) | 9 | 15 | 60.7 |
| 6 | 29 | 7.7 (1.9) | 8.0 (7.0–9.0) | 2 | 12 | 4.6 (1.8) | 5.0 (4.0–6.0) | 0 | 8 | 4.0 (1.1) | 4.0 (3.0–5.0) | 2 | 6 | 15.8 (1.4) | 16.0 (15.0–17.0) | 14 | 19 | 44.8 |
| 7 | 37 | 6.5 (1.8) | 6.0 (5.0–8.0) | 3 | 11 | 12.5 (1.3) | 12.0 (12.0–14.0) | 10 | 14 | 5.6 (1.4) | 6.0 (5.0–7.0) | 3 | 8 | 19.4 (1.8) | 20.0 (18.0–21.0) | 15 | 22 | 56.8 |
| 8 | 23 | 8.3 (1.6) | 8.0 (7.0–10.0) | 5 | 11 | 4.2 (2.1) | 4.0 (3.0–6.0) | 0 | 8 | 5.9 (1.9) | 6.0 (4.0–8.0) | 3 | 10 | 9.7 (1.1) | 10.0 (9.0–11.0) | 8 | 12 | 60.9 |
| 9 | 46 | 9.0 (1.3) | 9.0 (8.0–10.0) | 5 | 12 | 12.4 (1.3) | 13.0 (11.0–13.0) | 10 | 14 | 9.2 (1.8) | 9.0 (8.0–10.0) | 6 | 12 | 22.7 (1.4) | 22.5 (22.0–24.0) | 20 | 25 | 69.6 |
| 10 | 43 | 6.6 (1.4) | 7.0 (6.0–7.0) | 4 | 9 | 11.7 (1.5) | 12.0 (10.0–13.0) | 9 | 14 | 5.0 (1.5) | 5.0 (4.0–6.0) | 2 | 8 | 24.1 (1.3) | 24.0 (23.0–25.0) | 22 | 26 | 55.8 |
| 11 | 34 | 4.4 (1.6) | 5.0 (3.0–6.0) | 0 | 7 | 4.9 (1.8) | 5.0 (4.0–6.0) | 1 | 9 | 4.1 (1.7) | 4.0 (3.0–5.0) | 1 | 9 | 10.9 (1.9) | 10.5 (9.0–13.0) | 8 | 14 | 55.9 |
| 12 | 38 | 8.8 (1.6) | 9.0 (8.0–10.0) | 5 | 12 | 9.7 (1.5) | 9.0 (8.0–11.0) | 8 | 13 | 9.2 (1.3) | 9.0 (8.0–10.0) | 7 | 12 | 18.1 (1.3) | 18.0 (17.0–19.0) | 15 | 20 | 65.8 |
| 13 | 52 | 7.1 (2.0) | 7.0 (5.5–8.5) | 3 | 12 | 7.4 (1.5) | 7.0 (6.0–9.0) | 4 | 10 | 4.6 (1.5) | 4.5 (3.5–6.0) | 1 | 7 | 22.0 (1.7) | 22.0 (21.0–23.0) | 19 | 27 | 44.2 |
| 14 | 46 | 7.9 (1.5) | 8.0 (7.0–9.0) | 4 | 11 | 6.2 (1.6) | 6.0 (5.0–7.0) | 1 | 9 | 8.4 (1.7) | 8.0 (7.0–10.0) | 5 | 12 | 19.4 (1.4) | 19.0 (18.0–20.0) | 17 | 22 | 58.7 |
| 15 | 35 | 9.5 (1.4) | 9.0 (8.0–11.0) | 7 | 12 | 12.7 (1.4) | 13.0 (12.0–14.0) | 9 | 14 | 9.6 (1.1) | 10.0 (9.0–10.0) | 8 | 12 | 27.5 (1.5) | 27.0 (26.0–29.0) | 25 | 30 | 54.3 |

ADI-R autism diagnostic interview-revised, SD standard deviation.

Gene interpretation

We observed 65 chromosomal loci that satisfied the threshold of P < 5.0 × 10⁻⁸ (Fig. 2); 30 out of the 65 loci were located within 21 genes, and the remaining 35 loci were intergenic (Table 2). Among them, eight loci were located within or near genes listed in the Human Gene module of the SFARI Gene scoring system5: GABBR2 (score 4, Rare Single Gene Mutation, Syndromic, Functional) in Cluster 1; CNTNAP5 (score 4, Rare Single Gene Mutation, Genetic Association) in Cluster 3; ITPR1 (score 4, Rare Single Gene Mutation) in Cluster 5; DNAH17 (score 4, Rare Single Gene Mutation) in Cluster 7; SDK1 (score none, Rare Single Gene Mutation, Genetic Association) in Cluster 13; SRRM4 (score 5, Rare Single Gene Mutation, Functional) in Cluster 13; CNTN5 (score 3, Rare Single Gene Mutation, Genetic Association) in Cluster 14; and DPP10 (score 3, Rare Single Gene Mutation) in Cluster 15.

Fig. 2. Manhattan plots (a) and corresponding quantile-quantile plots (b) in cluster-based GWASs with a cluster number of 15. We performed cluster analysis using k-means with a cluster number of 15 and conducted cluster-based GWAS. Among 15 clusters, significant associations were observed in 14 clusters. In total, we observed 65 chromosomal loci, labeled in the figure, that satisfied the threshold of P = 5.0 × 10⁻⁸. The red horizontal lines indicate the threshold for genome-wide significance (P = 5.0 × 10⁻⁸) and the blue horizontal lines indicate the genome-wide suggestive threshold (P = 1.0 × 10⁻⁵). The names of the genes in which the highlighted (circled) SNPs are located are shown in the Manhattan plots.

Table 2.

Association table of the cluster-based GWAS with 15 k-means clusters in the Omni2.5 data set.

| Cluster no. | ID | Chr | Position (hg19) | Minor/major | MAF (%) | OR | 95% CI | P | Gene symbol | Function | Power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | rs111629286 | 11 | 130,152,136 | A/G | 1.80 | 13.42 | 4.38–41.17 | 1.36 × 10⁻⁸ | ZBTB44 | Intronic | 0.997 |
| 1 | rs115140946 | 6 | 37,891,923 | C/A | 1.03 | 21.07 | 4.79–92.77 | 2.87 × 10⁻⁸ | ZFAND3 | Intronic | 0.878 |
| 1 | rs9462391 | 6 | 38,123,030 | A/G | 1.03 | 21.07 | 4.79–92.77 | 2.87 × 10⁻⁸ | ZFAND3 | Downstream | 0.878 |
| 1 | rs10217283 | 9 | 101,423,675 | A/G | 1.42 | 15.51 | 4.45–54.12 | 2.95 × 10⁻⁸ | GABBR2 | Intronic | 0.976 |
| 1 | rs114109395 | 6 | 38,005,546 | A/G | 1.03 | 21.01 | 4.77–92.51 | 3.02 × 10⁻⁸ | ZFAND3 | Intronic | 0.877 |
| 2 | rs115621412 | 9 | 74,366,033 | C/A | 7.89 | 4.42 | 2.48–7.87 | 8.13 × 10⁻⁹ | CEMIP2 | Intronic | 1.000 |
| 3 | rs77507687 | 2 | 26,939,229 | G/A | 2.00 | 12.43 | 4.37–35.36 | 6.10 × 10⁻⁹ | KCNK3 | Intronic | 1.000 |
| 3 | rs76880969 | 1 | 227,711,506 | G/A | 1.00 | 27.15 | 5.30–139.20 | 8.20 × 10⁻⁹ | CDC42BPA, ZNF678 | Intergenic | 0.865 |
| 3 | rs115483919 | 2 | 125,010,267 | A/G | 1.00 | 27.15 | 5.30–139.20 | 8.20 × 10⁻⁹ | CNTNAP5 | Intronic | 0.865 |
| 5 | rs16965293 | 16 | 9,551,490 | A/G | 2.31 | 14.04 | 5.00–39.45 | 3.83 × 10⁻¹⁰ | LINC01195, GRIN2A | Intergenic | 1.000 |
| 5 | rs77489014 | 9 | 106,962,281 | A/G | 1.41 | 19.47 | 5.51–68.82 | 6.69 × 10⁻¹⁰ | SMC2, LOC105376194 | Intergenic | 0.991 |
| 5 | rs117473168 | 9 | 106,848,270 | A/G | 1.55 | 16.90 | 5.02–56.93 | 2.64 × 10⁻⁹ | SMC2 | ncRNA exonic | 0.991 |
| 5 | rs7199670 | 16 | 22,875,238 | A/G | 11.28 | 5.28 | 2.76–10.10 | 4.98 × 10⁻⁹ | HS3ST2 | Intronic | 1.000 |
| 5 | rs73142209 | 12 | 77,859,299 | G/A | 1.54 | 16.18 | 4.82–54.31 | 5.33 × 10⁻⁹ | E2F7, NAV3 | Intergenic | 0.989 |
| 5 | rs118167078 | 15 | 65,723,796 | A/G | 1.54 | 16.18 | 4.82–54.31 | 5.33 × 10⁻⁹ | IGDCC4, DPP8 | Intergenic | 0.989 |
| 5 | rs11919513 | 3 | 4,841,384 | G/A | 3.22 | 10.18 | 3.99–26.03 | 8.92 × 10⁻⁹ | ITPR1 | Intronic | 1.000 |
| 5 | rs13332627 | 16 | 22,874,928 | G/A | 9.23 | 6.10 | 2.96–12.57 | 1.22 × 10⁻⁸ | HS3ST2 | Intronic | 1.000 |
| 5 | rs111920363 | 7 | 143,656,906 | A/G | 1.15 | 19.46 | 4.89–77.39 | 1.29 × 10⁻⁸ | OR2F1 | Upstream | 0.933 |
| 5 | rs9939816 | 16 | 22,876,408 | A/C | 9.25 | 6.08 | 2.95–12.53 | 1.30 × 10⁻⁸ | HS3ST2 | Intronic | 1.000 |
| 5 | rs76096239 | 14 | 97,193,704 | A/G | 1.67 | 13.79 | 4.27–44.54 | 3.25 × 10⁻⁸ | PAPOLA, LINC02299 | Intergenic | 0.986 |
| 5 | rs1054028 | 16 | 22,927,214 | G/A | 14.36 | 5.02 | 2.62–9.61 | 3.32 × 10⁻⁸ | HS3ST2 | UTR3 | 1.000 |
| 5 | rs78486970 | 7 | 106,127,612 | G/A | 6.87 | 5.46 | 2.67–11.18 | 3.68 × 10⁻⁸ | NAMPT, CCDC71L | Intergenic | 1.000 |
| 6 | rs148617803 | 1 | 76,136,228 | G/A | 1.32 | 22.57 | 5.95–85.64 | 2.77 × 10⁻¹⁰ | SLC44A5, ACADM | Intergenic | 0.988 |
| 6 | rs55985845 | 10 | 25,163,664 | T/A | 2.51 | 11.71 | 4.26–32.20 | 7.18 × 10⁻⁹ | PRTFDC1 | Intronic | 1.000 |
| 6 | rs73094424 | 12 | 39,840,397 | A/G | 2.11 | 12.09 | 4.12–35.52 | 2.70 × 10⁻⁸ | KIF21A, ABCD2 | Intergenic | 0.998 |
| 6 | rs58845693 | 3 | 122,804,247 | G/A | 1.18 | 18.07 | 4.55–71.72 | 4.24 × 10⁻⁸ | PDIA5 | Intronic | 0.915 |
| 6 | rs11709496 | 3 | 122,809,400 | G/A | 1.18 | 18.07 | 4.55–71.72 | 4.24 × 10⁻⁸ | PDIA5 | Intronic | 0.915 |
| 6 | rs199531954 | 12 | 95,064,359 | C/A | 1.19 | 17.92 | 4.52–71.10 | 4.92 × 10⁻⁸ | TMCC3, MIR492 | Intergenic | 0.913 |
| 7 | rs79033134 | 17 | 76,473,288 | A/G | 1.53 | 16.29 | 4.87–54.44 | 4.24 × 10⁻⁹ | DNAH17 | Intronic | 0.995 |
| 7 | rs57127555 | 17 | 76,475,811 | C/A | 1.54 | 16.24 | 4.86–54.28 | 4.49 × 10⁻⁹ | DNAH17 | Intronic | 0.995 |
| 7 | rs75382702 | 11 | 81,149,755 | A/G | 1.28 | 16.94 | 4.54–63.23 | 3.18 × 10⁻⁸ | LINC02720, MIR4300HG | Intergenic | 0.961 |
| 8 | rs73149247 | 3 | 100,864,047 | G/A | 2.21 | 11.41 | 3.88–33.54 | 5.80 × 10⁻¹¹ | ABI3BP, IMPG2 | Intergenic | 1.000 |
| 8 | rs12418400 | 11 | 131,263,123 | G/A | 1.56 | 20.88 | 6.09–71.57 | 6.68 × 10⁻¹¹ | NTM | Intronic | 0.996 |
| 8 | rs78323783 | 10 | 45,084,432 | A/G | 1.17 | 24.79 | 6.13–100.30 | 2.28 × 10⁻¹⁰ | CXCL12, TMEM72 | Intergenic | 0.997 |
| 8 | rs72991663 | 6 | 130,143,713 | A/G | 2.85 | 13.30 | 4.84–36.53 | 5.51 × 10⁻¹⁰ | ARHGAP18, TMEM244 | Intergenic | 1.000 |
| 8 | rs74922057 | 21 | 41,595,011 | A/G | 1.31 | 19.67 | 5.22–74.14 | 3.13 × 10⁻⁹ | DSCAM | Intronic | 0.962 |
| 8 | rs115035406 | 21 | 41,580,474 | G/A | 1.42 | 16.53 | 4.61–59.31 | 1.97 × 10⁻⁸ | DSCAM | Intronic | 0.957 |
| 8 | rs114994877 | 4 | 136,731,494 | A/G | 1.42 | 16.53 | 4.61–59.31 | 1.97 × 10⁻⁸ | LINC02485, LINC00613 | Intergenic | 0.957 |
| 8 | rs117008682 | 9 | 103,245,053 | G/A | 1.43 | 16.48 | 4.59–59.15 | 2.08 × 10⁻⁸ | MSANTD3 | Intronic | 0.957 |
| 8 | rs117772706 | 9 | 81,338,445 | G/A | 1.43 | 16.44 | 4.58–58.98 | 2.19 × 10⁻⁸ | PSAT1, LOC101927450 | Intergenic | 0.957 |
| 9 | rs4885429 | 13 | 77,400,673 | G/A | 2.14 | 13.69 | 4.91–38.16 | 4.67 × 10⁻¹⁰ | LMO7DN, KCTD12 | Intergenic | 1.000 |
| 9 | rs45618836 | 7 | 73,480,258 | G/A | 2.26 | 11.94 | 4.43–32.18 | 2.30 × 10⁻⁹ | ELN | Intronic | 1.000 |
| 9 | rs7299395 | 12 | 41,714,602 | A/G | 3.27 | 8.52 | 3.65–19.89 | 1.15 × 10⁻⁸ | PDZRN4 | Intronic | 1.000 |
| 9 | rs55772967 | 7 | 73,448,499 | G/A | 2.89 | 8.91 | 3.66–21.66 | 2.09 × 10⁻⁸ | ELN | Intronic | 1.000 |
| 10 | rs72799348 | 2 | 22,637,443 | A/G | 2.31 | 12.84 | 4.74–34.77 | 6.57 × 10⁻¹⁰ | LINC01822, LINC01884 | Intergenic | 1.000 |
| 10 | rs76159464 | 5 | 169,446,509 | A/G | 1.02 | 28.05 | 5.47–144.00 | 5.03 × 10⁻⁹ | DOCK2 | Intronic | 0.877 |
| 10 | rs12483301 | 21 | 28,070,591 | G/A | 1.92 | 11.89 | 3.94–35.92 | 6.74 × 10⁻⁹ | CYYR1, ADAMTS1 | Intergenic | 1.000 |
| 10 | rs72883714 | 18 | 23,987,552 | A/G | 2.17 | 11.25 | 4.08–31.06 | 1.59 × 10⁻⁸ | TAF4B, LINC01543 | Intergenic | 1.000 |
| 10 | rs1876769 | 2 | 22,678,191 | A/G | 2.17 | 11.25 | 4.08–31.06 | 1.59 × 10⁻⁸ | LINC01822, LINC01884 | Intergenic | 1.000 |
| 10 | rs17043765 | 2 | 22,656,804 | A/G | 2.17 | 11.25 | 4.08–31.06 | 1.59 × 10⁻⁸ | LINC01822, LINC01884 | Intergenic | 1.000 |
| 11 | rs74645195 | 4 | 48,330,367 | G/A | 2.71 | 10.26 | 3.95–26.66 | 1.34 × 10⁻⁸ | TEC, SLAIN2 | Intergenic | 1.000 |
| 11 | rs78513244 | 1 | 2,360,342 | A/G | 3.25 | 9.33 | 3.79–22.98 | 1.35 × 10⁻⁸ | PEX10, PLCH2 | Intergenic | 1.000 |
| 11 | rs10027938 | 4 | 90,242,059 | A/G | 16.93 | 4.48 | 2.49–8.05 | 2.29 × 10⁻⁸ | GPRIN3, SNCA | Intergenic | 1.000 |
| 12 | rs117647850 | 8 | 79,156,756 | A/G | 3.08 | 10.88 | 4.44–26.68 | 5.10 × 10⁻¹¹ | LOC102724874, PKIA | Intergenic | 1.000 |
| 12 | rs4131532 | 1 | 3,540,256 | A/G | 1.54 | 15.63 | 4.68–52.14 | 8.61 × 10⁻⁹ | MEGF6, TPRG1L | Intergenic | 0.994 |
| 12 | rs77964987 | 4 | 183,685,432 | G/A | 4.77 | 7.06 | 3.21–15.53 | 4.97 × 10⁻⁸ | TENM3 | Intronic | 1.000 |
| 13 | rs117954350 | 7 | 4,440,757 | A/G | 1.02 | 52.73 | 6.34–438.60 | 4.00 × 10⁻¹⁰ | SDK1, FOXK1 | Intergenic | 0.635 |
| 13 | rs11064685 | 12 | 119,590,881 | G/A | 6.14 | 5.15 | 2.66–9.97 | 4.46 × 10⁻⁸ | SRRM4 | Intronic | 1.000 |
| 14 | rs77983358 | 12 | 82,393,237 | G/A | 1.52 | 21.71 | 5.50–85.76 | 1.29 × 10⁻¹⁰ | LINC02426, CCDC59 | Intergenic | 0.999 |
| 14 | rs7118821 | 11 | 96,876,267 | C/A | 1.01 | 26.18 | 5.11–134.00 | 1.50 × 10⁻⁸ | LINC02737 | Intergenic | 0.847 |
| 14 | rs7122015 | 11 | 96,950,548 | G/A | 1.01 | 26.18 | 5.11–134.00 | 1.50 × 10⁻⁸ | LINC02737, CNTN5 | Intergenic | 0.847 |
| 14 | rs7106102 | 11 | 96,885,969 | A/G | 1.01 | 26.10 | 5.10–133.70 | 1.58 × 10⁻⁸ | LINC02737 | Intergenic | 0.845 |
| 14 | rs7189512 | 16 | 66,324,048 | A/G | 3.28 | 7.26 | 3.13–16.88 | 4.62 × 10⁻⁸ | LINC00922, CDH5 | Intergenic | 1.000 |
| 15 | rs77311527 | 2 | 5,516,750 | G/A | 2.45 | 11.87 | 4.44–31.79 | 2.19 × 10⁻⁹ | LINC01249, LINC01248 | Intergenic | 1.000 |
| 15 | rs276833 | 2 | 114,769,078 | A/G | 1.29 | 18.00 | 4.81–67.43 | 1.25 × 10⁻⁸ | LINC01191, DPP10 | Intergenic | 0.970 |

Chr chromosome, OR odds ratio, CI confidence interval.

Power was calculated using a method based on the results of Nam’s study19.

The SFARI Gene scoring system ranges from “Category 1”, which indicates “high confidence”, through “Category 6”, which denotes “evidence does not support a role”. Genes of a syndromic disorder (e.g., fragile X syndrome) related to ASD are categorized in a different category. Rare single-gene variants, disruptions/mutations, and submicroscopic deletions/duplications related to ASD are categorized as “Rare Single Gene Mutation”.

In addition to genes in the Human Gene module of the SFARI Gene, several important genes associated with ASD or other related disorders29 in previous reports were included in our findings as follows: CDH5 in Cluster 14, DSCAM in Cluster 8, FOXK1 in Cluster 13, GRIN2A in Cluster 5, NTM in Cluster 8, and SNCA in Cluster 11 previously reported with ASD30–35; PLCH2 in Cluster 11 previously reported with mental retardation36; ARHGAP18 in Cluster 8, CDC42BPA in Cluster 3, CXCL12 in Cluster 8, and HS3ST2 in Cluster 5 previously reported with schizophrenia37–40; KCTD12 in Cluster 9 and PSAT1 in Cluster 8 previously reported with depressive disorder41,42; and ADAMTS1 in Cluster 10, DOCK2 in Cluster 10, HS3ST2 in Cluster 5, NAMPT in Cluster 5, and NAV3 in Cluster 5 previously reported with Alzheimer’s disease43–47.

Replication study

We conducted replication studies with another independent data set that included a total of 712 male probands and 354 unaffected brothers and had been genotyped using the 1Mv3 array. As mentioned before, we had previously carried out cluster analyses in the combined data set genotyped with either Omni2.5 or 1Mv3 and then redivided it according to the SNP arrays used. The characteristics of each of the 15 clusters in the 1Mv3 data set are presented in Supplementary Table 1.

Among the 65 genome-wide significant chromosomal loci found in the discovery study, seven chromosomal loci were included in the 1Mv3 array. Of these loci, rs11064685, within SRRM4 in Cluster 13, had a significantly different distribution (p = 0.03) in cases vs controls in the replication cohort (Table 3).

Table 3.

Results of replication studies in the 1Mv3 data set for statistically significant chromosomal loci in the discovery studies.

| Cluster no. | ID | Chr | Position (hg19) | Minor/major | MAF (%) | OR | 95% CI | P | Gene symbol | Function |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5 | rs13332627 | 16 | 22,874,928 | G/A | 10.0 | 0.50 | 0.18–1.45 | 0.195 | HS3ST2 | Intronic |
| 5 | rs7199670 | 16 | 22,875,238 | A/G | 12.2 | 0.51 | 0.20–1.33 | 0.1629 | HS3ST2 | Intronic |
| 5 | rs1054028 | 16 | 22,927,214 | G/A | 15.0 | 0.51 | 0.22–1.21 | 0.121 | HS3ST2 | UTR3 |
| 10 | rs1876769 | 2 | 22,678,191 | A/G | 1.4 | NA | | 0.1822 | LINC01822, LINC01884 | Intergenic |
| 13 | rs11064685 | 12 | 119,590,881 | G/A | 8.2 | 1.89 | 1.06–3.37 | 0.02858 | SRRM4 | Intronic |
| 14 | rs7189512 | 16 | 66,324,048 | A/G | 3.5 | 2.16 | 0.83–5.67 | 0.1085 | LINC00922, CDH5 | Intergenic |
| 15 | rs276833 | 2 | 114,769,078 | A/G | 1.3 | 0.71 | 0.09–5.75 | 0.75 | LINC01191, DPP10 | Intergenic |

Chr chromosome, OR odds ratio, CI confidence interval.
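The replication criterion (P < 0.05 among the discovery loci present on the 1Mv3 array) can be applied to the values in Table 3 as a short worked example. The variable names are illustrative; the P values are transcribed from the table.

```python
# (cluster, SNP ID, replication P value) transcribed from Table 3.
replication = [
    (5, "rs13332627", 0.195),
    (5, "rs7199670", 0.1629),
    (5, "rs1054028", 0.121),
    (10, "rs1876769", 0.1822),
    (13, "rs11064685", 0.02858),
    (14, "rs7189512", 0.1085),
    (15, "rs276833", 0.75),
]
ALPHA = 0.05  # significance level set for the replication stage

replicated = [(cluster, snp) for cluster, snp, p in replication if p < ALPHA]
print(replicated)  # [(13, 'rs11064685')]
```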

Discussion

One of the most important findings of our study was that a reasoned reduction of the sample size, achieved by restricting the cases to one cluster at a time, could increase the statistical power. A plausible explanation is that our clustering may have successfully identified subgroups that are etiologically more homogeneous. At least two observations reduce the possibility that the statistically significant SNPs in the cluster-based GWAS are false positives. First, using real-world measurement data, the present study supported the usefulness and feasibility of the concept from the simulation study by Traylor and colleagues7, which indicated that homogeneous case subgroups increase power in genetic association studies. Second, a substantial number of the statistically significant SNPs in cluster-based GWAS observed in the present study were located within or near previously reported candidate genes for ASD5,30–35.

We observed many statistically significant SNPs in cluster-based GWAS, including SNPs located within or near CDH5, CNTN5, CNTNAP5, DNAH17, DPP10, DSCAM, FOXK1, GABBR2, GRIN2A, ITPR1, NTM, SDK1, SNCA, and SRRM4. In particular, a locus within the SRRM4 gene (rs11064685) had a significantly different distribution in the cases vs controls in the replication cohort. Previous studies indicate that SRRM4 is strongly associated with ASD, which supports the validity of our results to some degree. The gene regulates neural microexons. In the brains of individuals with ASD, these microexons are frequently dysregulated48. In addition, nSR100/SRRM4 haploinsufficiency in mice induced autistic features such as sensory hypersensitivity and altered social behavior and impaired synaptic transmission and excitability49.

In addition to SRRM4, we observed several loci located within or near previously reported candidate genes for ASD. The relatively high correspondence between part of our results and the SFARI Gene scoring system5 indicates that the statistically significant loci we found may be associated with ASD subgroups (Fig. 2). We also observed several important genes associated with ASD and other related disorders29 in previous reports. These findings suggest that the statistically significant SNPs might explain autistic symptoms because these diseases are suggested to share etiology, at least in part, with ASD29. Associations at the remaining significant loci, which were neither in the SFARI module nor described above, have not been previously reported, and to the best of our knowledge, some of them might be novel findings. These results might suggest that novel genetic loci of ASD could be found by identifying better defined subgroups, although further confirmation is needed in future cohorts with larger sample sizes.

Previous studies regarding Alzheimer’s disease, neuroticism, or asthma found that analyses based on items or symptoms yielded, to some degree, larger ORs than previous studies using broadly defined disease diagnoses8–10. These findings may indicate that GWAS based on a symptom or an item could identify genetically more homogeneous subgroups and let us hypothesize that a reasonable combination of symptoms or items could identify even more genetically homogeneous subgroups. In contrast, Chaste and colleagues showed that stratifying children with ASD based on the phenotype only modestly increased power in GWAS11. The discrepancy between their findings and ours might be explained by the usage of phenotype variables. Chaste and colleagues used one item or symptom alone with a limited number of subgroups, whereas we used combinations of them with a machine learning method and a potentially sufficient number of clusters. DeMichele-Sweet and colleagues reported that subgrouping only by the presence of psychosis could lead to the identification of a limited number of loci that had small effects50, but Mukherjee and colleagues found a substantial number of suggestive loci that had extreme ORs after categorizing persons with Alzheimer’s disease based on relative performance across cognitive domains by modern psychometric approaches8.

Validation of clusters is essential. In the present study, we selected the k-means algorithm, focused on ADI-R items and treatment as variables, and determined cluster numbers based on the λ of the Q–Q plots. Although we believe this is one reasonable approach, the selection of variables, algorithms, and cluster numbers remains to be examined in future mathematical and biological cluster validation studies. How best to evaluate the quality of clusters is an important and still unresolved question, and validated clusters may help to elucidate the genetic architecture of ASD7.
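As an illustration of the λ criterion mentioned above, the following minimal sketch computes a genomic inflation factor from a list of GWAS P values. It assumes that λ is defined in the usual way, as the median observed 1-df chi-square statistic divided by the median of the 1-df chi-square distribution; the function name and the two sets of P values are hypothetical and are not taken from the study data.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()


def genomic_inflation_factor(p_values):
    """Return lambda = median(observed chi-square) / median(chi-square, 1 df)."""
    # Convert each two-sided P value to a 1-df chi-square statistic (chi2 = z**2).
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of the 1-df chi-square distribution (about 0.455).
    expected_median = _STD_NORMAL.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Hypothetical P values spread evenly over (0, 1), as expected under the null.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
# Hypothetical inflated P values: systematically smaller than the null set.
inflated_p = [p ** 2 for p in null_p]

lambda_null = genomic_inflation_factor(null_p)
lambda_inflated = genomic_inflation_factor(inflated_p)
print(f"lambda (null)     = {lambda_null:.3f}")
print(f"lambda (inflated) = {lambda_inflated:.3f}")
```

A λ close to 1 suggests that the test statistics follow the null distribution for most SNPs, whereas values well above 1 suggest systematic inflation.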

The present study has a notable limitation. Substantial differences between the two genotyping platforms may have affected the results of the replication study. The Omni2.5 array includes 2,383,385 autosomal SNPs, whereas the 1Mv3 array includes 1,147,689 SNPs, with 675,923 SNPs shared between the two. Of the 65 statistically significant chromosomal loci in the discovery data, only seven were shared between the two arrays.

Our study demonstrated that if a data set consists of multiple heterogeneous subgroups, analysis of even a subgroup that includes a much smaller number of homogeneous individuals can detect high-impact genetic factors. Hypothetical examples of the concept of cluster-based GWAS are shown in Supplementary Fig. 4. As shown in Table 2, a comparison of only 30 etiologically homogeneous probands with 300 controls can achieve a statistical power of ~1.00, calculated using a method based on the results of Nam’s study19. Although the integral model, which assumes that many genetic variants each have a small effect, may contribute to the formation of some subgroups of ASD, our results indicate that clustering by specific phenotypic variables may offer one way of identifying etiologically similar cases of ASD.
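To give a feel for why a small but homogeneous subgroup can reach high power, the sketch below approximates the power of a two-sided allelic test at the genome-wide threshold. It is not the calculation based on Nam’s study that underlies Table 2; it is a simpler normal approximation for a difference in allele frequencies, and the allele frequencies used (0.8 vs 0.2, and 0.25 vs 0.20) are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, freq_cases, freq_controls, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    std_normal = NormalDist()
    # Each diploid individual contributes two alleles.
    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls
    # Standard error of the difference in allele frequencies (unpooled).
    se = sqrt(
        freq_cases * (1 - freq_cases) / alleles_cases
        + freq_controls * (1 - freq_controls) / alleles_controls
    )
    # Critical value for a two-sided test at level alpha.
    z_crit = -std_normal.inv_cdf(alpha / 2)
    z_effect = abs(freq_cases - freq_controls) / se
    # Probability of exceeding the threshold in the direction of the true
    # effect; the negligible opposite tail is ignored.
    return std_normal.cdf(z_effect - z_crit)


# Hypothetical large effect in a homogeneous subgroup: 30 cases, 300 controls.
power_strong = allelic_test_power(30, 300, freq_cases=0.8, freq_controls=0.2)
# Hypothetical small effect with the same sample sizes.
power_weak = allelic_test_power(30, 300, freq_cases=0.25, freq_controls=0.20)
print(f"power (large effect) = {power_strong:.4f}")
print(f"power (small effect) = {power_weak:.6f}")
```

Under these assumed frequencies, the contrast between the two calls shows that power at such sample sizes depends almost entirely on the size of the allele-frequency difference within the subgroup.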

Our data indicate the relevance of cluster-based GWAS as a means to identify more homogeneous subgroups of ASD than broadly defined subgroups. Future investigation of cluster validation and replication with a larger sample size is therefore warranted. Such studies will provide clues to elucidate the genetic structures and etiologies of ASD and facilitate the development of precision medicine for ASD.

Supplementary information

Supplementary Table 1 (15KB, xlsx)
Supplementary Fig. 1 (266.5KB, pptx)
Supplementary Fig. 2 (200.3KB, tif)
Supplementary Fig. 3 (96.8KB, tif)
Supplementary Fig. 4 (283.1KB, tif)

Acknowledgements

We are grateful to all of the families at the participating SSC sites, as well as the staff at the Simons Foundation Autism Research Initiative (SFARI). The present study was supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) KAKENHI grant numbers 19390171, 16H05242 and 19H03894. MEXT had no role in the design or execution of the study.

Data availability

All data used in the study are available only to those granted access by the Simons Foundation.

Conflict of interest

The authors declare that they have no conflict of interest.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Akira Narita, Masato Nagai, Satoshi Mizuno

Supplementary information

Supplementary Information accompanies this paper at (10.1038/s41398-020-00951-x).

References

  • 1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM–5). Washington: American Psychiatric Association; 2013.
  • 2. Geschwind DH, State MW. Gene hunting in autism spectrum disorder: on the path to precision medicine. Lancet Neurol. 2015;14:1109–1120. doi: 10.1016/S1474-4422(15)00044-7.
  • 3. Bailey A, et al. Autism as a strongly genetic disorder: evidence from a British twin study. Psychol. Med. 1995;25:63–77. doi: 10.1017/s0033291700028099.
  • 4. Lauritsen MB, Pedersen CB, Mortensen PB. Effects of familial risk factors and place of birth on the risk of autism: a nationwide register-based study. J. Child Psychol. Psychiatry. 2005;46:963–971. doi: 10.1111/j.1469-7610.2004.00391.x.
  • 5. SFARI Gene. Gene scoring. 2008. https://gene.sfari.org/database/gene-scoring/.
  • 6. Eissa N, et al. Current enlightenment about etiology and pharmacological treatment of autism spectrum disorder. Front. Neurosci. 2018;12:304. doi: 10.3389/fnins.2018.00304.
  • 7. Traylor M, Markus H, Lewis CM. Homogeneous case subgroups increase power in genetic association studies. Eur. J. Hum. Genet. 2015;23:863–869. doi: 10.1038/ejhg.2014.194.
  • 8. Mukherjee S, et al. Genetic data and cognitively defined late-onset Alzheimer’s disease subgroups. Mol. Psychiatry. 2018. doi: 10.1038/s41380-018-0298-8.
  • 9. Nagel M, Watanabe K, Stringer S, Posthuma D, van der Sluis S. Item-level analyses reveal genetic heterogeneity in neuroticism. Nat. Commun. 2018;9:905. doi: 10.1038/s41467-018-03242-8.
  • 10. Lavoie-Charland E, Berube JC, Boulet LP, Bosse Y. Asthma susceptibility variants are more strongly associated with clinically similar subgroups. J. Asthma. 2016;53:907–913. doi: 10.3109/02770903.2016.1165699.
  • 11. Chaste P, et al. A genome-wide association study of autism using the Simons Simplex Collection: does reducing phenotypic heterogeneity in autism increase genetic homogeneity? Biol. Psychiatry. 2015;77:775–784. doi: 10.1016/j.biopsych.2014.09.017.
  • 12. MacQueen J. Some methods for classification and analysis of multivariate observations. In: Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, 1967, pp 281–297.
  • 13. World Medical Association. World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA. 2013;310:2191–2194. doi: 10.1001/jama.2013.281053.
  • 14. Fischbach GD, Lord C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron. 2010;68:192–195. doi: 10.1016/j.neuron.2010.10.006.
  • 15. Beggiato A, et al. Gender differences in autism spectrum disorders: divergence among specific core symptoms. Autism Res. 2017;10:680–689. doi: 10.1002/aur.1715.
  • 16. Kuriyama S, et al. Pyridoxine treatment in a subgroup of children with pervasive developmental disorders. Dev. Med. Child Neurol. 2002;44:284–286.
  • 17. Obara T, et al. Potential identification of vitamin B6 responsiveness in autism spectrum disorder utilizing phenotype variables and machine learning methods. Sci. Rep. 2018;8:14840. doi: 10.1038/s41598-018-33110-w.
  • 18. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2:e190. doi: 10.1371/journal.pgen.0020190.
  • 19. Nam JM. Simple approximation for calculating sample sizes for detecting linear trend in proportions. Biometrics. 1987;43:701–705.
  • 20. Cutting DR, Karger DR, Pedersen JO, Tukey JW. Scatter/gather: a cluster-based approach to browsing large document collections. In: Proceedings of the 15th Annual ACM SIGIR Conference on Research and Development in Information Retrieval. New York: Association for Computing Machinery (ACM), 1992, pp 318–329.
  • 21. Raykov YP, Boukouvalas A, Baig F, Little MA. What to do when K-means clustering fails: a simple yet principled alternative algorithm. PLoS ONE. 2016;11:e0162259. doi: 10.1371/journal.pone.0162259.
  • 22. Guo G, Chen L, Ye Y, Jiang Q. Cluster validation method for determining the number of clusters in categorical sequences. IEEE Trans. Neural Netw. Learn Syst. 2017;28:2936–2948. doi: 10.1109/TNNLS.2016.2608354.
  • 23. Sanders SJ, et al. Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron. 2011;70:863–885. doi: 10.1016/j.neuron.2011.05.002.
  • 24. Spielman RS, Ewens WJ. A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. Am. J. Hum. Genet. 1998;62:450–458. doi: 10.1086/301714.
  • 25. Freidlin B, Zheng G, Li Z, Gastwirth JL. Trend tests for case-control studies of genetic markers: power, sample size and robustness. Hum. Hered. 2002;53:146–152. doi: 10.1159/000064976.
  • 26. Purcell S, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
  • 27. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164. doi: 10.1093/nar/gkq603.
  • 28. Wang Y, et al. Genome-wide association study of piglet uniformity and farrowing interval. Front. Genet. 2017;8:194. doi: 10.3389/fgene.2017.00194.
  • 29. Anttila V, et al. Analysis of shared heritability in common disorders of the brain. Science. 2018;360:eaap8757. doi: 10.1126/science.aap8757.
  • 30. Redies C, Hertel N, Hubner CA. Cadherins and neuropsychiatric disorders. Brain Res. 2012;1470:130–144. doi: 10.1016/j.brainres.2012.06.020.
  • 31. Varghese M, et al. Autism spectrum disorder: neuropathology and animal models. Acta Neuropathol. 2017;134:537–566. doi: 10.1007/s00401-017-1736-4.
  • 32. Atsem S, et al. Paternal age effects on sperm FOXK1 and KCNA7 methylation and transmission into the next generation. Hum. Mol. Genet. 2016;25:4996–5005. doi: 10.1093/hmg/ddw328.
  • 33. Barnby G, et al. Candidate-gene screening and association analysis at the autism-susceptibility locus on chromosome 16p: evidence of association at GRIN2A and ABAT. Am. J. Hum. Genet. 2005;76:950–966. doi: 10.1086/430454.
  • 34. Minhas HM, et al. An unbalanced translocation involving loss of 10q26.2 and gain of 11q25 in a pedigree with autism spectrum disorder and cerebellar juvenile pilocytic astrocytoma. Am. J. Med. Genet. A. 2013;161A:787–791. doi: 10.1002/ajmg.a.35841.
  • 35. Abou-Donia MB, Suliman HB, Siniscalco D, Antonucci N, ElKafrawy P. De novo blood biomarkers in autism: autoantibodies against neuronal and glial proteins. Behav. Sci. (Basel) 2019;9:E47. doi: 10.3390/bs9050047.
  • 36. Lo Vasco VR. Role of phosphoinositide-specific phospholipase C η2 in isolated and syndromic mental retardation. Eur. Neurol. 2011;65:264–269. doi: 10.1159/000327307.
  • 37. Potkin SG, et al. Gene discovery through imaging genetics: identification of two novel genes associated with schizophrenia. Mol. Psychiatry. 2009;14:416–428. doi: 10.1038/mp.2008.127.
  • 38. Konopaske GT, et al. Dysbindin-1 contributes to prefrontal cortical dendritic arbor pathology in schizophrenia. Schizophr. Res. 2018;201:270–277. doi: 10.1016/j.schres.2018.04.042.
  • 39. Openshaw RL, et al. JNK signalling mediates aspects of maternal immune activation: importance of maternal genotype in relation to schizophrenia risk. J. Neuroinflamm. 2019;16:18. doi: 10.1186/s12974-019-1408-5.
  • 40. Ikeda M, et al. Identification of novel candidate genes for treatment response to risperidone and susceptibility for schizophrenia: integrated analysis among pharmacogenomics, mouse expression, and genetic case-control association approaches. Biol. Psychiatry. 2010;67:263–269. doi: 10.1016/j.biopsych.2009.08.030.
  • 41. Teng X, et al. KCTD: a new gene family involved in neurodevelopmental and neuropsychiatric disorders. CNS Neurosci. Ther. 2019;25:887–902. doi: 10.1111/cns.13156.
  • 42. Lin CH, Huang MW, Lin CH, Huang CH, Lane HY. Altered mRNA expressions for N-methyl-D-aspartate receptor-related genes in WBC of patients with major depressive disorder. J. Affect. Disord. 2019;245:1119–1125. doi: 10.1016/j.jad.2018.12.016.
  • 43. Kunkle BW, et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Abeta, tau, immunity and lipid processing. Nat. Genet. 2019;51:414–430. doi: 10.1038/s41588-019-0358-2.
  • 44. Cimino PJ, Sokal I, Leverenz J, Fukui Y, Montine TJ. DOCK2 is a microglial specific regulator of central nervous system innate immunity found in normal and Alzheimer’s disease brain. Am. J. Pathol. 2009;175:1622–1630. doi: 10.2353/ajpath.2009.090443.
  • 45. Sepulveda-Diaz JE, et al. HS3ST2 expression is critical for the abnormal phosphorylation of tau in Alzheimer’s disease-related tau pathology. Brain. 2015;138:1339–1354. doi: 10.1093/brain/awv056.
  • 46. Ghosh D, Levault KR, Brewer GJ. Relative importance of redox buffers GSH and NAD(P)H in age-related neurodegeneration and Alzheimer disease-like mouse neurons. Aging Cell. 2014;13:631–640. doi: 10.1111/acel.12216.
  • 47. Zong Y, et al. miR-29c regulates NAV3 protein expression in a transgenic mouse model of Alzheimer’s disease. Brain Res. 2015;1624:95–102. doi: 10.1016/j.brainres.2015.07.022.
  • 48. Irimia M, et al. A highly conserved program of neuronal microexons is misregulated in autistic brains. Cell. 2014;159:1511–1523. doi: 10.1016/j.cell.2014.11.035.
  • 49. Quesnel-Vallieres M, et al. Misregulation of an activity-dependent splicing network as a common mechanism underlying autism spectrum disorders. Mol. Cell. 2016;64:1023–1034. doi: 10.1016/j.molcel.2016.11.033.
  • 50. DeMichele-Sweet MAA, et al. Genetic risk for schizophrenia and psychosis in Alzheimer disease. Mol. Psychiatry. 2018;23:963–972. doi: 10.1038/mp.2017.81.

Articles from Translational Psychiatry are provided here courtesy of Nature Publishing Group
