Abstract
Background:
Genetic studies of schizophrenia have implicated numerous risk loci including several copy number variants (CNVs) of large effect and hundreds of loci of small effect. In only a few cases has a specific gene been clearly identified. Rare CNVs affecting a single gene offer a potential avenue to discovering schizophrenia risk genes.
Methods:
CNVs were generated from exome-sequencing of 4,913 schizophrenia cases and 6,188 controls from Sweden. We integrated multiple CNV calling methods (XHMM and ExomeDepth) to expand our set of single-gene CNVs and leveraged two different approaches for validating these variants (qPCR and Nanostring).
Results:
We found a significant excess of all rare CNVs (deletions p=0.0004, duplications p=0.0006) and single-gene CNVs (deletions p=0.04, duplications p=0.03) in schizophrenia cases compared to controls. An expanded set of CNVs generated from integrating multiple approaches showed a significant burden of deletions in 11/21 gene-sets previously implicated in schizophrenia and across all genes in those sets (p=0.008), although no tests survived correction. We performed an extensive validation of all deletions in the significant set of voltage-gated calcium channels among CNVs called from both exome-sequencing and genotyping arrays. In total, 4 exonic, single-gene deletions validated in cases and none in controls (p=0.039), of which all were identified by exome-sequencing.
Conclusions:
These results point to the potential contribution of single-gene CNVs to schizophrenia, that the utility of exome-sequencing for CNV calling has yet to be maximized and single-gene CNVs should be included in gene focused studies using other classes of variation.
Keywords: schizophrenia, copy number variation, exome-sequencing, single-gene, calcium channel, genetics
Introduction
Schizophrenia (SCZ) is a heritable psychiatric disorder that causes substantial morbidity, mortality, and personal and societal costs(1-4). Identifying genetic variation influencing risk will improve our biological understanding of SCZ. Copy number variants (CNVs) are appealing as they directly alter gene dosage providing an interpretable effect on gene function. SCZ cases carry a burden of large and rare CNVs (>100 kb and <1%)(5,6) and multiple rare recurrent CNVs with substantial effects on risk (genotypic relative risks 4-20) have been identified (e.g., 16p11.2 and 22q11.21)(6-11). Most of these known CNVs are megabase-sized and affect the dosages of many genes, but if specific genes contributing to risk could be identified it would aid our understanding of the neurobiology of the disorder. Thus far, only a few individual genes from genetic studies of CNVs and SNVs have been implicated: NRXN1(12), TOP3B(13), RBM12(14) and SETD1A(15), all of which provided novel insights into SCZ pathophysiology. Therefore, gene-focused CNV evaluation in large samples with high resolution capture is needed.
The majority of CNVs have a small genomic footprint(16-19) and, due to technological limitations or cost, their contribution to SCZ remains unknown(20). Commercial microarrays are limited in resolution by probe density and are largely incapable of detecting CNVs below 10 kb while also having low specificity for CNVs between 10 and 100 kb(21). CNV detection from whole genome sequencing offers a substantial improvement, but remains expensive and is currently infeasible for large samples. Whole exome sequencing (WES) can be used to identify CNVs impacting exons(22). These data, while noisy from dependence on read depth and lacking exact breakpoints from the discrete nature of exons, can be used to identify smaller CNVs affecting single genes that may be more interpretable in their contribution to SCZ risk.
Here, we performed a comprehensive analysis of CNVs from WES data in the Swedish Schizophrenia Study of 4,978 schizophrenia cases and 6,256 controls(23). Our goals were to evaluate the impact of single-gene CNVs on SCZ risk, and to discover copy number changes in specific genes that could lead to improved mechanistic understanding of SCZ risk. All samples also have GWAS genotyping arrays and Illumina exome arrays(24,25) providing additional data to follow up and validate CNVs.
Methods
Sample description
We extracted DNA from venous blood samples from 11,234 Swedish participants (4,978 SCZ cases, 6,256 controls, mean age at sample collection: 55 years). An additional 1,172 samples were included in generating and cleaning exome-sequencing CNVs to improve estimates of copy number and frequency but were removed before analyses (total N: 12,384). All procedures were approved by ethical committees in Sweden and the US, and all subjects provided informed written consent. Genomic investigation of each subject was done using independent technologies including GWAS genotyping(24), exome array genotyping(20), and exome sequencing(23,26). Genotyping and sequencing were conducted at the Broad Institute. Rare CNVs from GWAS arrays and exome genotyping arrays had been previously generated(20,25), and is briefly described in the supplementary material. Exome-sequencing based CNVs were generated for this analysis and have not previously been reported. Individuals already known to be carrying large CNV were included in all analyses. All genomic locations are given in NCBI build 37/UCSC hg19 coordinates.
CNV calling and QC using XHMM
We ran XHMM (eXome-Hidden Markov Model) as previously described(22,27), including calculating mean per-base coverage across 189,894 targets (sequences designed for capture, predominantly exons) using GATK DepthOfCoverage. A total of 14,555 targets were excluded before CNV calling due to: mean sequencing depth <10x, low complexity sequence (as defined by RepeatMasker) in >25% of its span, GC content <10% or >90%, and spanning <10 bp or >10 kb. The resulting sample-by-target read depth matrix was scaled by mean-centering the targets, after which principal component analysis (PCA) of the matrix was performed. To normalize the data, the top 109 principal components (those with variance >70% of the mean variance across all components) were removed from the data to account for systematic biases at the target- or sample-level, such as GC content or sequencing batch effects. Additional targets (n=37) were removed if variance in read depth remained high after normalization (standard deviation >50). CNVs were called using the Viterbi hidden Markov model (HMM) with default XHMM parameters, and XHMM CNV quality scores (SQ) were calculated using the forward-backward HMM. For any CNV detected in at least one individual, we statistically genotyped all samples using the same XHMM quality scores and outputted as a single VCF. Twenty-two samples failed CNV calling due to low overall read depth, A total of 175,303 targets were used to call CNVs across 12,384 samples after all filtering. CNVs from sex chromosomes would be inaccurately called since males and females were run together and so were removed from analyses.
There were 494,403 autosomal CNVs called by XHMM before any filtering. We removed 115 individuals (56 cases, 59 controls) with > 3 standard deviations from the mean in total number of CNVs (71.5) or total genomic content affected by CNVs (6,529 kb). After sample outlier removal, 484,940 CNV (SQ > 0) were used to develop a frequency filter, and we retained only CNVs present in less than 1% of individuals (<0.5% minor allele frequency). To account for the discrete nature of exons, each target was numbered sequentially based on genomic coordinates and frequency filtering was done using the sequential target information before mapping targets to genomic positions. After frequency filtering, there were 51,812 CNVs with a per individual mean of 4.3 ranging from 1-107. After quality filtering (SQ ≥ 60), 14,243 CNVs remained (we refer to this dataset going forward as the “exome QC” dataset). The median CNV length was 22,991bp and 77% (n=10,950) were below 100kb which is a typical cutoff for array-based CNVs. We note, however, that the lengths of CNVs generated from exome-sequencing are often inaccurate due to the discrete nature in which breakpoints are determined.
Expanded single-gene CNV dataset integrating XHMM and ExomeDepth
Increasing XHMM quality scores (SQ) disproportionately removes shorter CNVs. In an effort to quantify the proportion of shorter CNVs with lower quality scores that are true, we used exome-sequencing data from 624 trios(28) and calculated transmission as a function of quality score and minimum number of targets required per CNV. These data were processed as described above. We focused on rare CNVs (< 0.1%) to avoid counting transmissions arbitrarily. At default filtering thresholds (SQ ≥ 60, >= 3 exons) we calculated a transmission rate of 0.42 (64 maternal CNVs, 26 transmitted; 71 paternal CNVs, 30 transmitted). CNVs having a single supporting exon and no minimum SQ (i.e. all CNVs) were substantially more frequent but had reduced transmission rate of 0.114 (449 maternal CNVs, 55 transmitted; 575 paternal CNVs, 62 transmitted). However, this transmission rate suggests that potentially 20+% of these “low quality” events may be real. As a method to retain the true shorter CNVs while removing as many of the false positive CNVs as possible we required additional support from an independent approach ExomeDepth(29). Briefly, ExomeDepth selects a reference set of individuals having similar sequencing properties independently for CNV inference of each sample. We called CNVs within experimental plates of 96 individuals that were processed and sequenced at the same time in order to provide the most comparable reference set for each sample and reduce batch effects. In total, we called CNVs for 12,313 samples totaling 1,915,300 CNV with a mean of 155.5 per individual and ranging from 1-811.
We retained all XHMM calls with SQ ≥ 60 and any CNV called by both ExomeDepth and XHMM regardless of quality score (referred to as the “expanded exome” dataset). For comparison, 92% of calls from the exome QC dataset were called by ExomeDepth whereas only 20% of CNVs affecting a single exon and low SQ (< 30) were called across both methods. In total, the expanded exome dataset had 24,843 CNVs (10,600 added to exome QC). To further assess quality, we compared the additional CNVs to a high confident set of CNVs (>100kb) from genotyping arrays of the same individuals(25). While the vast majority of these calls (90%, 3,254 out of 3,597) are identified in our exome QC dataset, there are still 343 that that are called by XHMM but that do not surpass the filtering threshold. Only 28% (10,600/37,569) of the possible XHMM calls were added to our expanded exome dataset yet 88% (300/343) of the remaining high-quality genotyping CNVs were included. Using the union of the two approaches allows us to expand our set of shorter CNVs while retaining only those with the most support.
CNV burden and association analyses
We performed burden and association analyses using Plink(30), employing empirical permutation (n=10,000) of case/control label where permutation was performed within sequencing batch to account for any batch effects. CNVs were considered to affect a gene if there was any overlap of the genomic coordinates of the CNVs and the gene. For gene-set tests, we used a regression framework built into Plink(31) that tests whether cases carry more CNVs in the set of genes compared to all genes after covarying for number and amount of CNVs.
Incorporating CNVs from previously run genotyping arrays of the same individuals
To maximize the sensitivity to detect gene/exon level CNVs, we constructed a union call set by combining the data from GWAS array, exome array, and our expanded exome CNV dataset. We first created a database of all non-redundant CNVs, where, for each CNV record, we indicated (1) how many platform(s) had identified the CNV; (2) which specific platform(s) had identified the CNV; (3) the coordinates of CNV from each platform. We considered two CNVs redundant if they had the same direction of the copy number change and they overlapped more than 50% of their lengths. Details for this “exome plus array” dataset are described in supplemental materials (Tables S3-S4, Figure S2).
Validation of CNV
We attempted validation of 55 deletions from the exome plus array dataset that affected any calcium channel gene (N = 26 genes) using a combination of both quantitative PCR (qPCR) and NanoString nCounter technology. First, qPCR was used to verify 6 CNVs detected in calcium channel genes CACNA2D3, CACNA1B, CACNA2D4 and CACNG2 (Table S5). Several predesigned TaqMan Copy Number Assays were run in quadruplicate along with the internal RNase P Copy Number Reference Assay according to manufacturer’s instructions (Applied Biosystems, Foster City, CA). Briefly, 20 μl reactions containing 1 μl DNA (5 ng), 10 μl of 2X Taqman Genotyping Master Mix, 1 μl of one target CNV assay and 1 μl of RNase P reference assay were mixed. All qPCR reactions were run on a Life Technologies StepOnePlus machine with the following thermal cycling conditions: 95°C for 10 min, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min. Samples included all suspected CNV carriers for each gene, regardless of case or control status, as well as four presumed two-copy controls per gene.
Second, for a larger scale validation, we used Nanostring nCounter technology. For each CNV, two probes were designed and analyses were performed according to manufacturer instructions. In brief, a spike-in plasmid of known amount was used to control for variability in DNA quantity across all samples and additional controls ensured optimal hybridization and purification efficiency. After hybridization and removal of excess probes, the probe/target complexes were aligned and immobilized in the nCounter Cartridge, and imaged in the nCounter Digital Analyzer for detection of CNVs. In a previous study, we examined nCounter’s CNV calling accuracy by testing 37 known CNVs in 384 samples and found 97% concordance. We were able to successfully attempt validation for 48 of the 55 deletions.
Results
Exome-sequencing CNVs demonstrate high concordance with genotyping array based CNVs while contributing substantial numbers of novel variants
We generated CNVs using XHMM for 4,913 SCZ cases and 6,188 controls resulting in a total of 14,243 rare (present in less than 1% of individuals) and high quality (SQ ≥ 60) CNVs (“exome QC dataset”). In a comparison to previously published CNVs from genotyping arrays on these individuals(25) (see Supplementary Methods) we identified 78% of the array-based CNVs in the exome QC dataset. More interestingly, 75% of the exome QC calls were not seen in the array-based call set. Individuals carried, on average, 2.2 times more CNVs in the exome QC dataset than in the array-based call set (1.28 versus 0.59 CNVs). This comparison is described in more detail(22). Specific to this work, 53% of exome QC CNVs overlapped a single protein coding gene (94% had length < 100kb) and, of those, only 12.6% were included in the previous work on this sample leaving 87.4% or 6,622 single-gene CNVs to be analyzed for the first time here.
Significant burden of exome-sequencing based CNVs in SCZ including among single-gene CNVs
We first assessed the burden of all CNVs in the exome QC dataset to SCZ. Utilizing empirical permutation of case/control label (see Methods) we identified a significant increase in the numbers of deletions (case rate: 0.56, control rate: 0.51, p = 0.0004) and duplications (case rate: 0.78, control rate: 0.72, p = 0.0006) in SCZ cases compared to controls as seen previously in this sample(25). To identify the contribution of the novel CNVs in our exome QC dataset, we performed the same burden test using only CNVs new to this analysis and not called by arrays in previous work. Here, we again saw significant burden in cases for both deletions (case rate: 0.48, control rate: 0.45, p = 0.0114) and duplications (case rate: 0.56, control rate: 0.51, p = 0.0003). The exome QC CNVs are substantially shorter and therefore more likely to affect only a single gene. We tested whether burden of CNVs was primarily driven by larger events affecting multiple genes or if single-gene CNVs were contributing. We identified a significant but modest burden of single-gene deletions (case rate: 0.36, control rate: 0.34, p = 0.0395) and duplications (case rate: 0.34, control rate: 0.32, p = 0.0332) in SCZ cases compared to controls (Figure 1, Table 1). These results were not driven by CNV length in deletions (p < 100kb = 0.071, p > 100kb = 0.072) or duplications (p < 100kb = 0.053, p > 100kb = 0.181).
Figure 1.
Burden tests across all high confident exome-seq CNVs (all), those not previously analyzed from genotyped arrays (new), those previously published (published) and only those CNVs affecting a single protein coding gene (single-gene). Deletions are in red (left) and duplications are in blue (right). Significance is represented as p < 0.05 (*), p < 0.001 (**).
Table 1.
CNV Burden Results Stratified by Type, Number of Genes Affected, and Novelty
| Deletions |
Duplications |
||||||||
|---|---|---|---|---|---|---|---|---|---|
| n | Case Rate | Control Rate | p | n | Case Rate | Control Rate | p | ||
| All | All | 5900 | 0.56 | 0.51 | .0004a | 8343 | 0.78 | 0.73 | .0006a |
| New | 5101 | 0.48 | 0.45 | .0114a | 5925 | 0.56 | 0.51 | .0003a | |
| Previously called by genotyping arrays | 799 | 0.08 | 0.06 | .0002a | 2418 | 0.23 | 0.21 | .1396 | |
| Single Gene | All | 3894 | 0.36 | 0.34 | .0395a | 3680 | 0.34 | 0.32 | .0332a |
| New | 3530 | 0.33 | 0.31 | .0543 | 3092 | 0.29 | 0.27 | .0162a | |
| Previously called by genotyping arrays | 364 | 0.03 | 0.03 | .1998 | 588 | 0.05 | 0.05 | .6180 | |
| Multiple Genes | All | 1773 | 0.18 | 0.14 | .0001a | 4516 | 0.42 | 0.39 | .0030a |
| New | 1339 | 0.13 | 0.11 | .0174a | 2695 | 0.25 | 0.23 | .0076a | |
| Previously called by genotyping arrays | 434 | 0.05 | 0.03 | .0001a | 1821 | 0.17 | 0.16 | .0756 | |
The copy number variant (CNV) burden results were stratified by CNV type (deletions, duplications), number of genes affected (all, single gene, or multiple genes), and whether the CNV was unique to our exome-sequencing call set or was identified in previous array-based CNV work.
p < .05.
Expanding the set of potential single-gene CNVs and testing for excess in specific genes
We next sought to test whether CNVs could implicate specific genes using both the exome QC dataset as well as an expanded exome dataset created to increase the proportion of shorter CNVs which our QC filters were disproportionately removing (see Methods). Briefly, we integrated CNVs called from both XHMM and ExomeDepth(29) retaining CNVs if detected by both methods regardless of XHMM quality scores or if detected only by XHMM at our previous filtering threshold (SQ ≥ 60). In total, our “expanded exome dataset” included an additional 10,600 CNVs (total: 24,843) substantially increasing the proportion of shorter events (Figure S1). Individual genes were tested for excess of deletions or duplications using empirical permutation. After 10,000 permutations in our exome QC dataset, 21 genes were significantly enriched for duplications and 40 genes were significantly enriched for deletions in cases compared to controls after multiple test correction. All significant genes fell into two genomic regions of already known large SCZ risk CNVs, 16p11.2 (duplications) and 22q11.2 (deletions) leaving no novel genes identified (Figure 2). Finally, using our expanded exome dataset we again tested for enrichment of deletions and duplications in specific genes. No gene was significant after correction with the most significant genes again being driven by the larger 16p11.2 or 22q11.2 CNVs.
Figure 2.
Gene-based Manhattan plot of duplications in blue (top) and deletions in red (bottom). Genes in most significant regions are labeled by known CNVs in that region.
Testing contribution of only single-gene CNVs to previously implicated SCZ gene sets
In the absence of any novel genes being identified above, we tested whether single-gene CNVs were enriched among previously implicated gene sets. In the expanded exome dataset, there were 14,091 CNVs affecting only a single protein-coding gene (7,423 deletions, 6,668 duplications) and 7,703 affecting multiple genes (2,443 deletions, 5,260 duplications). The sets tested included genes previously implicated in other SCZ studies (GWAS loci(32), de novo variants(33), CNV regions(6)), synaptic function(5) (ARC, mGluR5, NMDAR, PSD95), calcium channels(26) (CAV2, Voltage-gated), secondary sets (FMRP targets(23,26), ASD/DD/ID de novo(33), essential genes(34), constrained genes(35), RBFOX related genes(23) and antipsychotic targets(36)). The combined set of genes across all sets (n = 8,970) showed significant excess in cases for single-gene deletions (p = 0.008) but not duplications (p=0.186). We identified nominally significant enrichment of single-gene deletions in over half (11 out of 21) of the sets (Table 2), however no gene set surpassed a Bonferroni corrected p-value of 0.0005 for the 88 tests performed. For comparison, multi-gene deletions also showed nominally significant enrichment in 11 of the 21 sets, including 6 that surpassed Bonferroni correction (all genes, DD de novo, ID de novo, constrained, essential and SCZ deletion regions). Multi-gene duplications were significantly enriched in 7 of the 11 sets but none survived correction including all genes (p=0.007).
Table 2.
Gene Set CNV Results for Single-Gene and Multigene CNVs in Expanded Dataset
| Single-Gene CNV |
Multigene CNV |
|||||
|---|---|---|---|---|---|---|
| Group | Set | Genes (n) | p Del | p Dup | p Del | p Dup |
| SC2 Sets | PGC2 SCZ 108 loci | 329 | .4199 | .7021 | .6372 | .0260a |
| SCZ de novo LoF | 87 | .0296a | .0911 | .4697 | .6710 | |
| SCZ de novo NS | 611 | .2761 | .9024 | .0857 | .5107 | |
| PGC2 16 CNV | 175 | .0055a | .9989 | .0451a | .0212a | |
| PGC2 16 CNV (deletions) | 78 | .2594 | .9289 | .0001a | .9602 | |
| PGC2 16 CNV (duplications) | 111 | .0077a | .5790 | .0757 | .0082a | |
| Synaptic Sets | ARC | 28 | 1.0000 | .3543 | .4426 | .5778 |
| mGluR5 | 39 | .2425 | .8117 | .2168 | .2144 | |
| NMDAR network | 61 | .0335a | .8099 | .0180a | .0243a | |
| PSD-95 (core) | 65 | .0697 | .8984 | .0009a | .2630 | |
| Calcium Channel Sets | CAV2 | 206 | .0347a | .4065 | .1624 | .4822 |
| CAV2 ion | 44 | .1738 | .2395 | .3951 | .6500 | |
| Voltage-gated calcium channel genes | 26 | .0082a | .2792 | .9953 | .8284 | |
| Secondary Sets | FMRP targets | 788 | .0205a | .5018 | .0216a | .0123a |
| ASD de novo | 1080 | .0521 | .4664 | .0796 | .1123 | |
| DD de novo | 1271 | .0161a | .9660 | .0003a | .1308 | |
| ID de novo | 350 | .0898 | .5774 | .0001a | .5938 | |
| Antipsychotic targets | 347 | .0268a | .9338 | .0342a | .7322 | |
| Essential genes | 3915 | .0622 | .1975 | .0001a | .0987 | |
| RBFOX | 2737 | .0058a | .2056 | .0087a | .0200a | |
| LoF Intolerant (pLI > .9) | 3488 | .0163a | .2430 | .0001a | .0041a | |
ASD, autism spectrum disorder; CNV, copy number variant; DD, developmental delay; Del, deletion; Dup, duplication; ID, intellectual disability; LoF, loss of function: NS, nonsynonymous; PGC, Psychiatric Genomics Consortium; PLI, probability of being loss of function intolerant; SCZ, schizophrenia.
Pathways with p < .05.
Broad scale exploration of CNVs in calcium channel genes combining both the expanded exome dataset and array-based calls
Among the most significant gene sets, we selected voltage-gated calcium channels for a full-scale validation since it represented an approachable number of CNVs to validate comprehensively and had significant prior supporting literature. Across the 26 genes, we identified 6 deletions in cases and 0 in controls from our expanded exome dataset (Figure 3). Since validation with an independent technology is considered the gold standard for CNV work, we attempted to validate these deletions using quantitative PCR (qPCR). Four of the deletions validated, two identical single exon deletions in CACNA2D3 did not validate (these two did not surpass filtering thresholds to be included in the exome QC dataset). Since we had additional CNV data from genotyping arrays, we wanted to validate a larger set of calcium channel deletions to more comprehensively catalog the contribution of deletions in these genes to risk of SCZ in this sample. We identified a set of deletions across our exome plus array dataset (see Methods and Supplementary Methods) overlapping any voltage-gated calcium gene. In total, we identified 55 deletions in 55 different samples of which 48 could be tested using NanoString nCounter technology (see Methods). Of these, 34 were located over three common, intronic copy number polymorphisms (all of which validated). Of the 21-remaining rare-variant calls, 6 validated (see Table 3). The low validation rate is representative of our decision to take all CNVs with limited evidence and not filter on confidence. Nearly all of the CNVs that did not validate were low quality calls from the genotyping arrays. Four of the validated deletions were single-gene, in cases and identified in the expanded exome dataset. The remaining two validated deletions were a non-exonic deletion and a multi-gene deletion in a control identified from genotyping arrays. After validation, we were left with 4 single-gene deletions in cases and 0 in controls (p = 0.039).
Figure 3.
Gene model plots for each of the 4 genes and 6 deletions identified in voltage-gated calcium channel genes. Upper grey bars portray deletion in genomic space, below that is the gene model in genomic space. The bottom bars represent the exons as transcribed, red indicates exons that were deleted. All deletions replicated except the two shown in panel B.
Table 3.
List of All Rare Deletions Overlapping the 26 Voltage-Gated Calcium Channel Genes That Validated Including One That Did Not Overlap an Exon and One That Was Not a Single Gene
| Status | Chr | Start | Stop | Size (bp) | Gene | Single Gene? | Exonic? |
|---|---|---|---|---|---|---|---|
| SCZ Case | 9 | 140,866,027 | 141,004,005 | 137,978 | CACNA1B | Yes | Yes |
| SCZ Case | 9 | 140,846,726 | 141,016,451 | 169,725 | CACNA1B | Yes | Yes |
| SCZ Case | 12 | 1,949,932 | 1,965,357 | 15,425 | CACNA2D4 | Yes | Yes |
| SCZ Case | 22 | 36,960,396 | 36,960,935 | 539 | CACNG2 | Yes | Yes |
| Control | 3 | 54,262,746 | 54,316,431 | 53,685 | CACNA2D3 | Yes | No |
| Control | 7 | 79,818,265 | 82,072,777 | 2,254,512 | CACNA2D1 | No | Yes |
Four of the 6 single-gene deletions identified in the gene set analyses and in Figure 3 validated (CACNA2D3 did not). Chr, chromosome; SCZ, schizophrenia.
Discussion
This study represents an evaluation of smaller CNVs in a large SCZ sample. We found that, independent of larger events, deletions of single genes may contribute to schizophrenia risk through a number of biological pathways previously identified for SCZ. In particular, we identify and validate a small number of deletions in voltage-gated calcium channels that are enriched in SCZ cases compared to controls. We also demonstrate the utility of exome-sequencing to identify shorter, single-gene CNVs and the potential to improve the resolution of those events through combining multiple methods for further study.
To date, the contribution of CNVs to SCZ risk has been predominantly from large (>100kb) and rare CNVs both in specific loci and in aggregate across the genome(6). The ability to determine the contribution from shorter CNVs has been both technologically limited by the use of genotyping arrays but also biologically up for debate as few single genes have been implicated in SCZ risk and nearly all risk increasing CNVs affect many genes. Here, we point to the potential contribution of single-gene CNVs to risk for SCZ. This contribution can be identified both genome-wide and within genes having been previously implicated from other studies of genetic variation including synaptic genes, genes having de novo mutations in SCZ, DD, ASD or ID, conserved genes and gene targets of antipsychotics. In comparison, multi-gene CNVs showed more significant enrichment among these sets including 6 surpassing correction but the majority of significant sets were shared between single-gene and multi-gene CNVs. Many of these gene sets were discovered from large CNV analyses making these results already known. Also, many of the large CNVs implicated in SCZ also contribute to other related phenotypes such as DD and ID. More interestingly, there were 4 sets that showed significant enrichment only in single-gene CNVs that included calcium channels and SCZ LoF de novo variants pointing to potential examples of variants of large effect on SCZ risk that have not yet been seen in the larger CNVs. This work points to a confluence of evidence that these gene sets are relevant for schizophrenia biology. We did not identify any specific gene that was significantly associated after correction for multiple testing. Given other studies of rare variation in complex diseases with similar sample sizes, this is not surprising(25) but our results suggest that combining CNV data with SNV data could improve power to implicate specific genes and robust approaches to combine these classes of variation are needed. For these approaches, leveraging knowledge of how intolerant a gene is to variation, thereby weighting variants by their potential impact may also improve discovery. Further, while the overall contribution of CNVs to SCZ risk is modest and the contribution from single-gene CNVs is even less, the addition of CNV burden to measures of individual risk such as polygenic risk scores could offer improvements in risk stratification and should be fully assessed.
Calcium channel genes have been implicated in psychiatric disease risk, including in SCZ for many years. Studies to date from the genetics of SCZ have implicated both particular loci and the geneset as a whole. Here, we show an excess of single-gene CNVs in calcium channels among SCZ patients that remains after qPCR validation. Given the importance of this gene set and the relative size, we also performed a larger validation of deletions using a higher throughput method that again confirmed 4 qPCR validated single-gene deletions in cases as well as validating several common CNVs, one >2Mb deletion in a control and one deletion that did not overlap an exon in a control. Our results suggest that deleting a single calcium channel gene may be relevant for SCZ risk however substantially more data will be required to confirm this finding.
We show that exome-sequencing can identify a substantial number of novel CNVs that are not captured by genotyping arrays and are predominantly affecting only a single gene. Further, this work points to the existence of many real single-gene CNVs that are filtered out by default filtering criteria and by combining multiple currently existing approaches we can capture an expanded set of true calls. While exome-sequencing can substantially improve resolution of CNV calling it is not without its weaknesses and limitations that become even clearer as CNVs get smaller. Whole-genome sequencing will offer the best resolution to confidently identify single-gene CNVs but is still prohibitively expensive for most labs and hundreds of thousands of exome sequences currently exist, and many more are being generated, making CNV calling from exome-sequencing still important. We believe there are opportunities to improve the ability to call shorter CNVs from exome-sequencing that are more sophisticated than merging call sets from multiple approaches and there is continued effort in this area that we anticipate will provide additional value to CNV calling from exome-sequencing.
Here, we demonstrate a potential role for single-gene deletions to contribute to SCZ risk through similar pathways as previously implicated. We perform a comprehensive validation of deletions in voltage-gated calcium channel genes and show an enrichment of these deletions in SCZ cases compared to controls. Finally, we demonstrate further utility for CNV generated from exome-sequencing and the ability to improve resolution of shorter events which could improve our ability to identify biological causes of diseases like SCZ.
Supplementary Material
KEY RESOURCES TABLE
| Resource Type | Specific Reagent or Resource | Source or Reference | Identifiers | Additional Information |
|---|---|---|---|---|
| Add additional rows as needed for each resource type | Include species and sex when applicable. | Include name of manufacturer, company, repository, individual, or research lab. Include PMID or DOI for references; use “this paper” if new. | Include catalog numbers, stock numbers, database IDs or accession numbers, and/or RRIDs. RRIDs are highly encouraged; search for RRIDs at https://scicrunch.org/resources. | Include any additional information or notes if necessary. |
| Deposited Data; Public Database | Swedish Schizophrenia Population-Based Case-control Exome Sequencing | 24463508, 27694994 | phs000473/GRU | |
| Software; Algorithm | Plink | 17701901 | http://zzz.bwh.harvard.edu/plink/download.shtml#download | |
| XHMM | 23040492 | https://atgu.mgh.harvard.edu/xhmm/download.shtml | ||
| ExomeDepth | 22942019 | https://cran.r-project.org/web/packages/ExomeDepth/index.html |
Acknowledgements
This work was supported by NIMH R01 MH111776 (DMR) and R21 MH104831 (JPS, JJC). PFS gratefully acknowledges support from the Swedish Research Council (Vetenskapsrådet, award D0886501). The Sweden Schizophrenia Study was supported by NIMH R01 MH077139.
Footnotes
Declaration of Conflicts of Interest
The authors report no biomedical financial interests or potential conflicts of interest.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- 1.Saha S, Chant D, McGrath J. A Systematic Review of Mortality in Schizophrenia: Is the Differential Mortality Gap Worsening Over Time? Arch Gen Psychiatry. 2007. October 1;64(10):1123–31. [DOI] [PubMed] [Google Scholar]
- 2.WHO ∣ The global burden of disease: 2004 update [Internet]. WHO; [cited 2018 Mar 6]. Available from: http://www.who.int/healthinfo/global_burden_disease/2004_report_update/en/ [Google Scholar]
- 3.Knapp M, Mangalore R, Simon J. The global costs of schizophrenia. Schizophr Bull. 2004;30(2):279–93. [DOI] [PubMed] [Google Scholar]
- 4.Saha S, Chant D, Welham J, McGrath J. A Systematic Review of the Prevalence of Schizophrenia. PLOS Med. 2005. May 31;2(5):e141. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Kirov G, Pocklington AJ, Holmans P, Ivanov D, Ikeda M, Ruderfer D, et al. De novo CNV analysis implicates specific abnormalities of postsynaptic signalling complexes in the pathogenesis of schizophrenia. Mol Psychiatry. 2012. February;17(2):142–53. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.CNV and Schizophrenia Working Groups of the Psychiatric Genomics Consortium. Contribution of copy number variants to schizophrenia from a genome-wide study of 41,321 subjects. Nat Genet. 2017. January;49(1):27–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Sullivan PF, Daly MJ, O’Donovan M. Genetic architectures of psychiatric disorders: the emerging picture and its implications. Nat Rev Genet. 2012. August;13(8):537–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Malhotra D, Sebat J. CNVs: Harbingers of a Rare Variant Revolution in Psychiatric Genetics. Cell. 2012. March 16;148(6):1223–41. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Levinson DF, Duan J, Oh S, Wang K, Sanders AR, Shi J, et al. Copy Number Variants in Schizophrenia: Confirmation of Five Previous Findings and New Evidence for 3q29 Microdeletions and VIPR2 Duplications. Am J Psychiatry. 2011. March;168(3):302–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Guha S, Rees E, Darvasi A, Ivanov D, Ikeda M, Bergen SE, et al. Implication of a Rare Deletion at Distal 16p11.2 in Schizophrenia. JAMA Psychiatry. 2013. March 1;70(3):253–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Liao H-M, Chao Y-L, Huang A-L, Cheng M-C, Chen Y-J, Lee K-F, et al. Identification and characterization of three inherited genomic copy number variations associated with familial schizophrenia. Schizophr Res. 2012. August 1;139(1):229–36. [DOI] [PubMed] [Google Scholar]
- 12.Kirov G, Rujescu D, Ingason A, Collier DA, O’Donovan MC, Owen MJ. Neurexin 1 (NRXN1) Deletions in Schizophrenia. Schizophr Bull. 2009. September;35(5):851–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Stoll G, Pietiläinen OPH, Linder B, Suvisaari J, Brosi C, Hennah W, et al. Deletion of TOP3β, a component of FMRP-containing mRNPs, contributes to neurodevelopmental disorders. Nat Neurosci. 2013. September;16(9):1228–37. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Steinberg S, Gudmundsdottir S, Sveinbjornsson G, Suvisaari J, Paunio T, Torniainen-Holm M, et al. Truncating mutations in RBM12 are associated with psychosis. Nat Genet [Internet]. 2017. June 19 [cited 2017 Jun 20];advance online publication. Available from: https://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3894.html [DOI] [PubMed]
- 15.Singh T, Kurki MI, Curtis D, Purcell SM, Crooks L, McRae J, et al. Rare loss-of-function variants in SETD1A are associated with schizophrenia and developmental disorders. Nat Neurosci. 2016. April;19(4):571–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015. October;526(7571):75–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, et al. Mapping copy number variation by population-scale genome sequencing. Nature. 2011. February;470(7332):59–65. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Stankiewicz P, Lupski JR. Structural Variation in the Human Genome and its Role in Disease. Annu Rev Med. 2010;61(1):437–55. [DOI] [PubMed] [Google Scholar]
- 19.Sudmant PH, Kitzman JO, Antonacci F, Alkan C, Malig M, Tsalenko A, et al. Diversity of Human Copy Number Variation and Multicopy Genes. Science. 2010. October 29;330(6004):641–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Szatkiewicz JP, Neale BM, O’Dushlaine C, Fromer M, Goldstein JI, Moran JL, et al. Detecting large copy number variants using exome genotyping arrays in a large Swedish schizophrenia sample. Mol Psychiatry. 2013. November;18(11):1178–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev Genet. 2011. May;12(5):363–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Ruderfer DM, Hamamsy T, Lek M, Karczewski KJ, Kavanagh D, Samocha KE, et al. Patterns of genic intolerance of rare copy number variation in 59,898 human exomes. Nat Genet. 2016. October;48(10):1107–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Genovese G, Fromer M, Stahl EA, Ruderfer DM, Chambert K, Landén M, et al. Increased burden of ultra-rare protein-altering variants among 4,877 individuals with schizophrenia. Nat Neurosci. 2016. November;19(11):1433. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Ripke S, O’Dushlaine C, Chambert K, Moran JL, Kahler AK, Akterin S, et al. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat Genet. 2013. October;45(10):1150–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Szatkiewicz JP, O’Dushlaine C, Chen G, Chambert K, Moran JL, Neale BM, et al. Copy number variation in schizophrenia in Sweden. Mol Psychiatry. 2014. July;19(7):762–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Purcell SM, Moran JL, Fromer M, Ruderfer D, Solovieff N, Roussos P, et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature. 2014. February 13;506(7487):185–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Fromer M, Moran JL, Chambert K, Banks E, Bergen SE, Ruderfer DM, et al. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am J Hum Genet. 2012. October 5;91(4):597–607. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Fromer M, Pocklington AJ, Kavanagh DH, Williams HJ, Dwyer S, Gormley P, et al. De novo mutations in schizophrenia implicate synaptic networks. Nature. 2014. February;506(7487):179–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Plagnol V, Curtis J, Epstein M, Mok KY, Stebbings E, Grigoriadou S, et al. A robust model for read count data in exome sequencing experiments and implications for copy number variant calling. Bioinformatics. 2012. November 1;28(21):2747–54. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet. 2007. September 1;81(3):559–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Raychaudhuri S, Korn JM, McCarroll SA, Consortium TIS, Altshuler D, Sklar P, et al. Accurately Assessing the Risk of Schizophrenia Conferred by Rare Copy-Number Variation Affecting Genes with Brain Function. PLOS Genet. 2010. September 9;6(9):e1001097. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014. July 24;511(7510):421–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Nguyen HT, Bryois J, Kim A, Dobbyn A, Huckins LM, Munoz-Manchado AB, et al. Integrated Bayesian analysis of rare exonic variants to identify risk genes for schizophrenia and neurodevelopmental disorders. Genome Med. 2017. December 20;9:114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Ji X, Kember RL, Brown CD, Bućan M. Increased burden of deleterious variants in essential genes in autism spectrum disorder. Proc Natl Acad Sci. 2016. December 27;113(52):15054–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016. August;536(7616):285–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Ruderfer DM, Charney AW, Readhead B, Kidd BA, Kahler AK, Kenny PJ, et al. Polygenic overlap between schizophrenia risk and antipsychotic response: a genomic medicine approach. Lancet Psychiatry. 2016. April 1;3(4):350–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.



