Author manuscript; available in PMC: 2016 Feb 22.
Published in final edited form as: Mol Psychiatry. 2011 Feb 22;17(2):193–201. doi: 10.1038/mp.2011.11

Schizophrenia susceptibility alleles are enriched for alleles that affect gene expression in adult human brain

Alexander L Richards 1, Lesley Jones 1, Valentina Moskvina 1, George Kirov 1, Pablo V Gejman 2, Douglas F Levinson 3, Alan R Sanders 2; Molecular Genetics of Schizophrenia Collaboration (MGS); International Schizophrenia Consortium (ISC), Shaun Purcell 5,6,7,8, Peter M Visscher 9, Nick Craddock 1, Michael J Owen 1, Peter Holmans 1, Michael C O’Donovan 1,*
PMCID: PMC4761872  NIHMSID: NIHMS751248  PMID: 21339752

Abstract

It is widely thought that alleles that influence susceptibility to common diseases, including schizophrenia, will frequently do so through effects on gene expression. Since only a small proportion of the genetic variance for schizophrenia has been attributed to specific loci, this remains an unproven hypothesis. The International Schizophrenia Consortium (ISC) recently reported a substantial polygenic contribution to that disorder, and that schizophrenia risk alleles are enriched among SNPs selected for marginal evidence for association (p<0.5) from genome wide association studies (GWAS). It follows that if schizophrenia susceptibility alleles are enriched for those that affect gene expression, those marginally associated SNPs which are also eQTLs should carry more true association signals compared with SNPs which are not. To test this, we identified marginally associated (p<0.5) SNPs from two of the largest available schizophrenia GWAS datasets. We assigned eQTL status to those SNPs based upon an eQTL dataset derived from adult human brain. Using the polygenic score method of analysis reported by the ISC, we observed and replicated the observation that higher probability cis-eQTLs predicted schizophrenia better than those with a lower probability for being a cis-eQTL. Our data support the hypothesis that alleles conferring risk of schizophrenia are enriched among those that affect gene expression. Moreover, our data show that notwithstanding the likely developmental origin of schizophrenia, studies of adult brain tissue can in principle allow relevant susceptibility eQTLs to be identified.

Introduction

A high proportion of mutations for simple (Mendelian) genetic disorders exert their pathogenic effects by altering the structure of the encoded protein but this does not appear to be the case for the majority of susceptibility alleles for common phenotypes identified through genome-wide association studies (GWAS) (1). This is compatible with the hypothesis that inherited variation that impacts upon mRNA expression plays an important part in susceptibility to complex traits (2-4). Only a small proportion of the genetic variance for risk to common diseases has been attributed to specific loci (5, 6) including schizophrenia (7-10). Therefore, while it has been argued that gene expression analysis is a key component of understanding the pathogenesis of schizophrenia (11,12), the hypothesis of the involvement in that disorder of alleles that influence gene expression is unproven.

From the perspective of identifying risk alleles, the hypothesis that susceptibility variants for schizophrenia will be enriched for variants that influence mRNA expression is not merely of academic interest. We (13, 14) and others (15) have reported associations between gene expression and genetic variants whose associations with schizophrenia are controversial, the idea being that association with expression lends credibility to association with disease status. Others have used this principle in non-psychiatric disorders to localise the likely susceptibility genes or functional variants within regions of association (16). Since the effect sizes of common alleles are small (7), and most are unlikely to be reliably separated from chance findings in the full genome context in the near future (9), the ability to assign an enhanced prior probability to variants associated with gene expression may be of value in identifying novel disease associations.

Although the convergent use of expression and genetic data for informing pathophysiological theory seems intuitively reasonable (17,18), the validity of this approach for informing genetic studies depends on the assumption that true associations are enriched among variants that impact upon gene expression. Moreover, in the case of schizophrenia, attempts to relate disorder-associated variants to gene expression are generally based upon mRNA studies of adult brain, peripheral tissues, or cell lines. Whether such studies are justified for disorders like schizophrenia, whose origins are thought to be developmental, is unclear. Interestingly, however, in a recent study (19), SNPs that affected expression in lymphoblasts were enriched among the top 10,000 GWAS associations for a number of disorders including associations from a bipolar GWAS.

Here, we tested the hypothesis that polymorphisms that are associated with schizophrenia are enriched among those that show evidence for association to gene expression in adult brain. Loci that exert an effect on gene expression are often called expression quantitative trait loci (eQTLs) (20). In the present study, to identify putative eQTLs, we used the dataset originally reported by Myers and colleagues (21) and by Webster and colleagues (22), currently the largest expression dataset derived from human brain available to us that also contains genotype data for each sample.

To identify sets of variants enriched for schizophrenia susceptibility alleles, we exploited the approach of the International Schizophrenia Consortium (ISC) (7) who recently demonstrated the existence of large numbers of risk alleles for schizophrenia. They also showed that these are enriched among large sets of SNPs surpassing very liberal significance thresholds of association (e.g. P<0.5). The ISC defined sets of putative schizophrenia risk alleles in a training GWAS dataset as those that were more common in cases than controls at loci meeting the relaxed thresholds. Individuals in independent test GWAS datasets were assigned a ‘polygenic score’ based upon the number of putative risk alleles carried by that individual, and then the scores for cases and controls in those datasets were compared. In independent datasets, these ‘polygenic scores’ were significantly higher in cases than in controls, with the most significant distinction between groups occurring when the threshold for association in the training GWAS was set at p<0.5. Modeling suggested that the most plausible explanation for this finding was that there is a substantial polygenic component to schizophrenia comprising thousands of risk alleles, and that this contributes at least 30% of the overall variance in risk of the disorder at the population level.

Here, we used this general approach to test whether eQTLs are enriched among schizophrenia associated alleles. We defined schizophrenia ‘risk’ alleles according to the method reported by the ISC (7) in a subset of the ISC data and also in the European American subset of the Molecular Genetics of Schizophrenia study (10). Using the dataset of Myers and colleagues (21, 22), these SNPs were then classified as ‘top eQTL’ and ‘bottom eQTL’ sets based upon their p-value for association with expression levels of transcripts, and these sets were then tested for differences in their polygenic scores in cases and controls independent of the training sets.

Method

The eQTL dataset (21, 22) contains genotypes (Affymetrix GeneChip Human Mapping 500K Array) for 380157 SNPs and expression data (Illumina v1 Human RefSeq-8 BeadChip) for 8650 transcripts, all meeting the quality control criteria described in (22). The dataset included 176 Alzheimer’s disease cases and 188 controls; however, we restricted our analysis to controls to exclude the impact of neurodegeneration on gene expression measures. We chose this option rather than allowing for affected status in the analysis because a crude categorical adjustment would not account for a number of variables within the affected group that can be expected to have major effects, including aetiological heterogeneity, duration of illness, and rate of disease progression.

Beginning with the rank-invariant normalised expression data (22), samples with over 10% missing data were removed, as were probes with over 25% missing data in the remaining individuals. Where multiple probes mapped to the same gene, we retained only the probe with the lowest proportion of missing data (arbitrarily retaining the first to appear in the dataset file in the case of a tie). To minimize the impact of different brain regions in the dataset, we included only samples from the two most common regions represented in the study (frontal cortex and temporal cortex). Overall, we retained 163 samples and 8361 probes for analysis.
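
The two missing-data filters described above can be sketched as follows. This is a minimal illustration in Python rather than the pipeline actually used (the analyses were run in R); the function name `filter_expression` and the data layout (a dictionary mapping each sample to a list of probe values, with `None` marking a missing value) are assumptions made for the example.

```python
def filter_expression(data, max_sample_missing=0.10, max_probe_missing=0.25):
    """data: dict sample_id -> list of expression values (None = missing).

    Returns (kept_sample_ids, kept_probe_indices).
    Hypothetical helper; thresholds follow the text (10% and 25%).
    """
    n_probes = len(next(iter(data.values())))

    # Step 1: remove samples with more than 10% missing values.
    kept_samples = [
        s for s, values in data.items()
        if sum(v is None for v in values) / n_probes <= max_sample_missing
    ]

    # Step 2: among the remaining samples, remove probes with more than
    # 25% missing values.
    kept_probes = []
    for j in range(n_probes):
        missing = sum(data[s][j] is None for s in kept_samples)
        if missing / len(kept_samples) <= max_probe_missing:
            kept_probes.append(j)
    return kept_samples, kept_probes
```

The order matters: probe missingness is evaluated only in the samples that survive the first filter, as the text specifies.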

As in the primary publication, the data were log transformed to minimise the effect of departures from normality (using the statistical package R (23)). The log-transformed expression values were adjusted for a number of non-genetic covariates using linear regression. These covariates were gender, post mortem interval, brain area, age at death, institute and hybridisation date, and the expression value for Enolase 2 (ENO2). The residuals of this regression were used as covariate-adjusted expression values in all further analyses. ENO2 is a neuronal marker. Our intention in making this correction was to reduce expression variance arising from varying proportions of neurons in the samples (24, 25). We were unable to adjust our analyses for pH as those data were not available. However, we note that failure to adjust for this, or for other important variables that might lead to classification errors (false positive or negative) will bias our study towards the null. This is because false calls will blur any true differences between top and bottom eQTL groups, including differences in the extent to which they are enriched for schizophrenia susceptibility alleles.
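
The covariate adjustment above amounts to log-transforming each probe and keeping the residuals of a least-squares fit on the covariates. The sketch below illustrates that idea in plain Python; it is not the authors' R code, the function names are hypothetical, and it assumes that categorical covariates (e.g. institute, brain area) have already been dummy-coded into numeric columns.

```python
import math

def ols_residuals(y, covariates):
    """Residuals of y after least-squares regression on covariates plus an
    intercept. covariates: one list of numeric covariate values per sample."""
    X = [[1.0] + list(row) for row in covariates]  # prepend intercept column
    p = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]

def adjust_expression(raw_values, covariates):
    """Log-transform raw expression values for one probe, then return the
    covariate-adjusted residuals used in downstream analyses."""
    return ols_residuals([math.log(v) for v in raw_values], covariates)
```

In this sketch the ENO2 expression value would simply be one more numeric column in `covariates`.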

For the Myers genotype data, we used the same quality control metrics as the original publication (21). All SNPs were required to have minor allele frequency of at least 1%, a call rate of at least 90%, and an exact Hardy-Weinberg equilibrium p-value > 0.05.

The ISC (7) and MGS (10) GWAS datasets were used for the study as these are currently the largest GWAS datasets available to us. We essentially followed the study design of the ISC. The ISC dataset was divided to create training and test subsets by assigning alternate cases and alternate controls to the training and test datasets; these we call the ‘Split ISC’ datasets. To derive a set of putative risk alleles independent of the ISC, we used the p-values from the MGS European American dataset (10) and tested these in the full ISC dataset. Full descriptions of those datasets are given in the primary publications (7, 10).

eQTL determination

Linear regression of the expression values for each gene (correcting for covariates) on SNP genotypes (coded as the number of minor alleles: 0, 1 or 2) was performed using PLINK v1.05 (26). This gave p-values for association between each SNP and mRNA expression as measured by each probe-set. To test our hypothesis, we based our analysis upon cis-eQTL p-values. Cis-eQTLs are variants that are in chromosomal proximity to the transcripts they putatively regulate, and have a higher prior probability for being true eQTLs than trans-eQTLs (20), the latter being defined on the basis of association with transcripts with which they are not co-located. Moreover, trans-eQTL analysis involves a much greater degree of multiple testing (all SNPs against all probesets) than cis-eQTL analysis. These considerations suggest that sets of ‘top cis-eQTLs’ will be more greatly enriched for true eQTLs than sets of top trans-eQTLs, so restriction to cis-eQTLs should enhance the power of our analysis. Cis-eQTLs were ranked by p-value with respect to any transcript within 100kb of the SNP locus. The criterion of 100kb is to an extent arbitrary, but was based upon a previous study suggesting that cis-eQTLs are enriched within this boundary (27). If a SNP was within range of multiple transcripts, the lowest p-value for any transcript was taken as the eQTL p-value.
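
The rule for assigning a single cis-eQTL p-value to each SNP can be sketched as below. This is an illustrative Python function, not the code used in the study; the name `cis_eqtl_pvalue` and the input layout are hypothetical, and measuring distance to the nearest transcript boundary is an assumption, since the text does not state which reference point defined the 100kb window.

```python
def cis_eqtl_pvalue(snp_chrom, snp_pos, transcripts, assoc_p, window=100_000):
    """Lowest association p-value between one SNP and any transcript within
    `window` bp on the same chromosome; None if no transcript is in range.

    transcripts: dict transcript_id -> (chrom, start, end)
    assoc_p:     dict transcript_id -> p-value for this SNP vs that transcript
    """
    in_range = []
    for tid, (chrom, start, end) in transcripts.items():
        if chrom != snp_chrom or tid not in assoc_p:
            continue
        # Distance to the transcript; zero when the SNP lies inside it
        # (assumed definition of "within 100kb").
        distance = max(start - snp_pos, snp_pos - end, 0)
        if distance <= window:
            in_range.append(assoc_p[tid])
    return min(in_range) if in_range else None
```

Taking the minimum over several transcripts means SNPs near many genes get more chances at a low p-value, which is one reason the resulting ranking is best read as a relative ordering rather than as calibrated significance.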

Given the presumed lower probability for any trans-eQTL representing a true association, we expected that even if our primary hypothesis was correct, SNPs selected on this basis of trans-eQTL status would be less effective at distinguishing between cases and controls. Nevertheless, as a secondary analysis, we explored the relative ability of top and bottom eQTLs after ranking those loci by the most significant p-value for association to any transcript in the dataset.

We did not specifically exclude probes corresponding to target sequences that contain SNPs, some of which might influence the efficiency of probe hybridisation. Where this occurs, expression of the target transcript could appear correlated with the SNP in the probe sequence, which could then be falsely classified as an eQTL, and the same is true for any SNPs in high linkage disequilibrium (LD) with that SNP. Conversely, where there is a true eQTL that is in weak or low LD with a second SNP under a probe that influences hybridisation efficiency, the impact of that second SNP is likely to be to reduce the estimated correlation between the eQTL and gene expression, the result being a tendency to false negative eQTL classification. As argued above, eQTL misclassifications will bias this study towards the null. Nevertheless, for information, we present some summary information about the occurrence of known SNPs within probe target sequences.

Of 1372 probes representing the transcripts associated with the top 5% of eQTLs, only 56 (4%) contain a SNP called at high quality (less than 5% missing genotypes) with a minor allele frequency >1% in the HapMap CEU sample (HapMap Phase 2 version 23). Only a single SNP out of the 2580 SNPs that comprised our pruned list of top 5% of eQTLs was either within a probe, or in strong LD (r2>0.8) with a variant known to be within a probe. This appears to contrast with an earlier study (21) in which about 13% of significant eQTLs were to probes targeting polymorphic transcript sequences. However, that earlier study was concerned with highly significantly associated eQTLs, which might be particularly enriched for this artefact. Also, the dataset we used (22) was much more stringently filtered (reducing transcripts from 14,078 to 8650) than the other study (21), and we further reduced this by removing probes with >25% missing data. Probes binding to sequences with common SNPs that influence hybridisation may have relatively high data-failure rates, and we therefore speculate that this process would remove some of the affected transcripts. Finally, we aggressively LD pruned our SNP data, which reduces the probability of including a SNP in even moderately high LD with a SNP in a probe sequence.

Post hoc analysis confirmed that our conclusions remain the same whether or not we exclude probes corresponding to sequences with known SNPs. The average impact of variants under target probes on misclassification is uncertain (22), but in the context of this study it is likely to be a trivial source of misclassification compared with chance (see above). Since any bias is also conservative (i.e. towards the null), we present the analysis of all probes in this manuscript.

Risk allele counts

The SNPs available in the training datasets were placed into the following categories according to eQTL p-value: top 5% eQTLs (corresponding to p<0.02), top 50% eQTLs (corresponding to p<0.38), bottom 50% eQTLs, and bottom 5% eQTLs. As in the ISC study, the SNPs in all sets were LD pruned (PLINK’s --indep-pairwise option; window size=200, step=5, r2 threshold=0.25).
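
A rank-based split of this kind can be sketched as follows. The Python below is only an illustration under stated assumptions: the function name `eqtl_sets` is hypothetical, percentiles are taken by rank, and ties at the boundaries are broken by input order, none of which is specified in the text. LD pruning is not shown; it was done with PLINK.

```python
def eqtl_sets(eqtl_p):
    """eqtl_p: dict snp_id -> cis-eQTL p-value.

    Returns a dict mapping set name -> list of SNP ids, where 'top' means
    the most significant eQTL p-values.
    """
    ranked = sorted(eqtl_p, key=eqtl_p.get)  # smallest p-value first
    n = len(ranked)
    n5, n50 = n // 20, n // 2
    return {
        "top5": ranked[:n5],
        "top50": ranked[:n50],
        "bottom50": ranked[n50:],
        "bottom5": ranked[n - n5:],
    }
```

The top 5% set is nested inside the top 50% set, and likewise for the bottom sets, so the comparisons reported later are not independent of one another.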

In the randomly split ISC training datasets, as in the ISC paper (7), allelic p-values and odds ratios for association were calculated by a Cochran-Mantel-Haenszel test conditioned by country of origin using the QC-cleaned datasets provided by that group. Training on the MGS European American Sample was based upon the association results that formed the basis of the primary publication (10). SNPs that had association p<0.5 in training sets were carried through for polygenic score analysis. Alleles that were more common in cases were defined as risk alleles. PLINK (using the --score option) was then used to perform a count of the number of risk alleles for each sample in the target dataset, weighted by the odds ratio at each SNP. PLINK gives the mean risk allele score for each individual, that is, the risk allele score is divided by the number of SNPs for which there are data in that individual.
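
The scoring step can be sketched in a few lines. The Python below is an illustration of the procedure as described in the text, not a reimplementation of PLINK; the function names and data layout are hypothetical. The weight is passed through as supplied, so the caller decides whether it is the odds ratio (as stated above) or a transformation of it.

```python
def select_risk_alleles(training, p_threshold=0.5):
    """training: dict snp_id -> (p_value, odds_ratio_for_allele1).

    Returns dict snp_id -> (risk_allele_is_allele1, weight) for SNPs with
    training p-value below the threshold.
    """
    selected = {}
    for snp, (p, odds_ratio) in training.items():
        if p >= p_threshold:
            continue
        if odds_ratio >= 1.0:
            selected[snp] = (True, odds_ratio)
        else:
            # Allele 2 is more common in cases; its odds ratio is the
            # reciprocal of the allele 1 odds ratio.
            selected[snp] = (False, 1.0 / odds_ratio)
    return selected

def mean_risk_score(genotype, risk_alleles):
    """genotype: dict snp_id -> count of allele 1 (0, 1, 2), None if missing.

    Returns the weighted risk-allele count divided by the number of SNPs
    with data in this individual.
    """
    total, n_scored = 0.0, 0
    for snp, (is_allele1, weight) in risk_alleles.items():
        count = genotype.get(snp)
        if count is None:
            continue
        risk_count = count if is_allele1 else 2 - count
        total += weight * risk_count
        n_scored += 1
    return total / n_scored if n_scored else float("nan")
```

Dividing by the number of non-missing SNPs keeps scores comparable between individuals with different amounts of missing genotype data.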

Controlling for minor allele frequency and population stratification

To test whether ranking the SNPs by their most significant eQTL p-value introduced systematic differences in allele frequency between high and low eQTL SNP sets, we calculated the mean and standard deviation of MAF for each pruned cis-eQTL SNP list and compared the top and bottom sets using t-tests.

To examine whether our results might be influenced by population stratification, we obtained from a previous study (28) FST values derived from the ISC sample for each SNP. FST is a measure of population stratification and is based upon the sequence similarity of members of a subpopulation, compared to their similarity with the population as a whole (29). In a stratified population, members of the subpopulations will be more similar to each other than to the whole population, leading to a high FST score.

SNPs with as close a FST value as possible to each SNP in the smaller of the two SNP lists (top or bottom eQTL) were extracted without replacement from the larger of the two SNP lists (top or bottom) to create eQTL sets matched for FST. A small number of SNPs could not be matched (those where the closest match differed by an FST >0.0005) and were removed from the analysis. This created pairs of SNP lists with the same number of SNPs and extremely similar means and standard deviations of FST (Supplementary Table 1).
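
One way to implement matching without replacement is a greedy nearest-neighbour search, sketched below in Python. This is an assumed procedure for illustration only: the text does not say in which order SNPs were matched or how ties were resolved, and the function name `match_on_fst` is hypothetical.

```python
import bisect

def match_on_fst(smaller, larger, tolerance=0.0005):
    """smaller, larger: dict snp_id -> FST value.

    Greedily pairs each SNP in `smaller` with the closest-FST unused SNP in
    `larger`. SNPs whose closest match differs by more than `tolerance` are
    left unmatched. Returns a list of (smaller_id, larger_id) pairs.
    """
    pool = sorted((fst, snp) for snp, fst in larger.items())
    pairs = []
    for snp, fst in smaller.items():
        if not pool:
            break
        i = bisect.bisect_left(pool, (fst, ""))
        # The nearest value sits just below or just above the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pool)]
        best = min(candidates, key=lambda j: abs(pool[j][0] - fst))
        if abs(pool[best][0] - fst) <= tolerance:
            # pop() removes the match so it cannot be reused.
            pairs.append((snp, pool.pop(best)[1]))
    return pairs
```

Because every retained SNP in the smaller list has exactly one partner, the two matched lists end up the same size, as described above.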

Logistic regression

For each individual in the test ISC datasets, we calculated the difference between the polygenic score derived from the top eQTLs (5% or 50%) and that derived from the bottom eQTLs (5% or 50%), the null hypothesis being that these differences should be equal in cases and controls. We performed logistic regression of case/control status on risk allele score difference and also ISC sample country of origin to evaluate the significance of this difference. A significant positive regression coefficient indicates that the difference in risk scores between cases and controls is significantly greater for the top eQTL set.
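
The logic of this test can be illustrated with a simplified sketch. The Python below computes the per-individual score difference and fits a logistic regression with a single predictor by Newton-Raphson; it omits the country-of-origin covariate used in the actual analysis, and the function names are hypothetical. In practice a statistics package would be used.

```python
import math

def score_difference(top_scores, bottom_scores):
    """Per-individual difference: top-eQTL score minus bottom-eQTL score."""
    return [t - b for t, b in zip(top_scores, bottom_scores)]

def fit_logistic(x, y, iterations=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; returns (b0, b1).

    Simplified sketch: one predictor, no covariates, and no safeguards for
    perfectly separated data.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix entries.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

A positive fitted `b1` corresponds to the situation described above, in which the top-minus-bottom score difference is larger in cases (coded 1) than in controls (coded 0). Because the score differences are very small numbers, rescaling them before fitting is advisable for numerical stability.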

Logistic regression of disease status on risk allele score was also calculated to determine how well each individual SNP list predicted disease status. We calculated the Nagelkerke pseudo-R2 (30), which is a measure of how well the risk allele score predicts schizophrenia by subtracting the R2 of the regression without the risk allele score term included from the R2 of the regression with the risk allele score term included.
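
For reference, the Nagelkerke pseudo-R2 and the subtraction described above can be written out as follows. This Python sketch uses the standard definition in terms of model log-likelihoods; the function names are hypothetical, and the log-likelihoods are assumed to come from whatever software fitted the regressions.

```python
import math

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R2 from the log-likelihoods of the intercept-only
    model (ll_null) and the fitted model (ll_model) on n individuals."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

def score_r2(ll_null, ll_covariates, ll_full, n):
    """R2 attributed to the risk allele score: R2 of the model that includes
    the score minus R2 of the covariate-only model."""
    return (nagelkerke_r2(ll_null, ll_full, n)
            - nagelkerke_r2(ll_null, ll_covariates, n))
```

The rescaling by the maximum attainable value is what distinguishes the Nagelkerke statistic from the Cox and Snell version, which cannot reach 1 for a binary outcome.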

Results

When we defined risk alleles using half of the ISC sample as the training set (Table 1, Split ISC analyses), the difference in the scores between the top and bottom cis-eQTLs was significantly greater in the cases than in the controls for all analyses. This is consistent with the hypothesis that schizophrenia susceptibility alleles are enriched among cis-eQTLs. Similar findings were observed when the risk alleles were defined from the MGS European American dataset (entirely independent of the ISC dataset), with significant replication being obtained for two of the tests, even after correction for three replication tests (Table 1). Supplementary Table 2 lists the pruned set of SNPs comprising those that were associated in the MGS training set at P<0.5 that were both within the top 5% of eQTLs and for which the allele designated as the ‘risk’ allele in the MGS sample was associated in the ISC sample at a nominally significant level (P<0.05). We should stress that, for the reasons already discussed in this manuscript, potential sources of misclassification mean that confidence that any one of these variants is either a genuine eQTL or associated with the disorder is low. Our study was designed to test a general hypothesis using global datasets and a methodology that can tolerate low signal-to-noise ratios, rather than to identify individual findings of high significance. We also note that the information driving our analysis comes not just from those alleles that are associated at nominally significant levels; rather, it comes from the cumulative scores from all variants included in the analysis, however weakly associated they are.

Table 1.

Regression of affected status on difference in risk allele score derived from top and bottom eQTL sets.

| Trained in | Targeted in | eQTL comparison | Difference in risk allele score | Regression p-value | Mean MAF of top eQTLs | Mean MAF of bottom eQTLs | T-test significance of difference in MAF |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Split ISC | Split ISC | Top 50% versus bottom 50% | 2.56E-05 | 0.014 | 0.227 | 0.228 | 0.948 |
| Split ISC | Split ISC | Top 5% versus bottom 50% | 8.15E-05 | 0.014 | 0.241 | 0.228 | 0.001 |
| Split ISC | Split ISC | Top 5% versus bottom 5% | 9.63E-05 | 0.012 | 0.241 | 0.246 | 0.268 |
| MGS | ISC | Top 50% versus bottom 50% | 1.63E-05 | 0.298 | 0.227 | 0.228 | 0.594 |
| MGS | ISC | Top 5% versus bottom 50% | 9.27E-05 | 0.002 | 0.238 | 0.228 | 0.047 |
| MGS | ISC | Top 5% versus bottom 5% | 8.57E-05 | 0.003 | 0.238 | 0.249 | 0.054 |

Abbreviations: MAF – minor allele frequency. A positive score in the ‘Difference in risk allele score’ column indicates that the difference between the top eQTL and bottom eQTL sets is greater in cases than controls.

There were no significant differences between the scores from the top and bottom trans-eQTLs between cases and controls (data not shown) in any analysis.

Minor allele frequency and population stratification

Of the 5 tests in which the top cis-eQTLs were significantly better at discriminating case-control status, the mean MAF was slightly but significantly higher in 2 of the top cis-eQTL sets, whereas for the other three tests, any trends were for a lower MAF in the top cis-eQTLs set (Table 1). This suggests that our findings are unlikely to be due to differences in MAF between the sets.

However, for each analysis, the top cis-eQTL set had significantly higher mean FST than the bottom eQTL SNP lists, indicating that our analysis might be confounded by enhanced stratification in the top cis-eQTL set. For this to bias our results, the MGS and ISC samples would have to be ascertained such that the same alleles are similarly biased towards overrepresentation in cases in each dataset. Although we do not consider this likely (7), to evaluate whether this does influence our results, we repeated all analyses using FST matched SNP sets. After matching, there were no significant differences in mean FST between pairs of comparator groups (Supplementary Table 1). Nevertheless, for two of the three analyses in the split ISC datasets, the top cis-eQTLs discriminated significantly better between cases and controls than the bottom cis-eQTLs, both of which replicated when the MGS sample was used as the training set. Moreover, in the FST adjusted data, for two of the significant runs, the top cis-eQTL sets had lower MAF and for two of the runs, the top group had higher MAF. We therefore conclude that our findings are not driven by systematic biases in these variables.

Discussion

To date, only a minuscule proportion of genetic susceptibility to schizophrenia, or indeed any psychiatric disorder, has been explained by robustly associated DNA variants. Moreover, in no case has the functional effect of a DNA variant responsible for a robust schizophrenia association been determined. It follows that the basic mechanisms by which genetic variation contributes to this disorder are unknown. One leading hypothesis is that a substantial amount of genetic risk is conferred by common alleles that influence gene expression, that is, common cis-eQTLs. However, while the existence of many common schizophrenia risk alleles has been demonstrated (7), there has been no evidence to support the hypothesis that any of these influence gene expression. In the light of a recent rekindling of interest in the hypothesis that genetic risk for the disorder is mainly attributable to rare variants of major effect (31), which by analogy with Mendelian disorders are likely to be dominated by mutations that change the protein coding sequences of genes, the demonstration of a contribution from cis-eQTLs is of practical importance for several reasons.

The search for functional variants underpinning disease associations observed in GWAS studies is in general proving to be far from a trivial endeavour. Although it is relatively simple to scan the exonic sequences of individual genes for common non-synonymous variants, the process of scanning the full genomic context of a gene for potential cis-eQTLs, and then demonstrating that those variants impact on expression in a disease relevant manner remains arduous. To justify those endeavours, it is important to demonstrate that effects on gene expression are relevant mechanisms underpinning the influence of common susceptibility variants. Second, as discussed above, the use of gene expression data to support genetic associations or to assign higher prior probability to particular variants requires evidence that cis-eQTLs do in fact have a higher probability of being associated with disease. Finally, even if risk variants are enriched for common cis-eQTLs, it cannot be taken for granted that adult brain tissues, far less other sources of mRNA, are suitable substrates for generating eQTLs for disorders like schizophrenia whose presumed origins are developmental.

Using two independent training datasets, we now demonstrate that among the variants selected for marginal association to schizophrenia, those that additionally show evidence for being cis-eQTLs predict affection status better than those variants showing no evidence for being cis-eQTLs. Thus, we show for the first time that schizophrenia risk alleles are indeed enriched for eQTLs. As expected from the ISC study, no set of SNPs explained more than a small fraction of the variance in disease risk (Table 2), although more comprehensive genome coverage in more powerful larger samples is likely to explain a much higher proportion (7).

Table 2.

Regression of affected status on risk allele score.

| Training dataset | Target dataset | Top / bottom eQTL set | SNP count | Nagelkerke pseudo-R2 | Regression p-value | Case/control risk allele score difference |
| --- | --- | --- | --- | --- | --- | --- |
| Split ISC | Split ISC | Top 50% | 10805 | 1.59 | 1.43E-14 | 5.36E-05 |
| Split ISC | Split ISC | Top 5% | 1285 | 0.47 | 2.09E-05 | 1.10E-04 |
| Split ISC | Split ISC | Bottom 50% | 10967 | 0.63 | 9.22E-07 | 2.86E-05 |
| Split ISC | Split ISC | Bottom 5% | 2033 | 0.04 | 0.1122 | 1.38E-05 |
| MGS | ISC | Top 50% | 3903 | 0.50 | 8.78E-10 | 3.08E-05 |
| MGS | ISC | Top 5% | 435 | 0.30 | 1.47E-06 | 1.07E-04 |
| MGS | ISC | Bottom 50% | 4037 | 0.30 | 1.63E-06 | 1.45E-05 |
| MGS | ISC | Bottom 5% | 1154 | 0.11 | 0.0027 | 2.15E-05 |

Regression of affected status on risk allele score for individual cis-eQTL SNP lists. Population of origin was used as a covariate in this regression. Nagelkerke R2 is a measure of variance in disease state that is explained by the risk score. SNP count is the number of SNPs in the set.

In contrast to the findings with cis-eQTLs, SNPs classified on the basis of potential trans effects were not superior at predicting schizophrenia affection status. This may be because the much greater multiple testing burden inherent to trans-eQTL analysis means a smaller proportion of the top rated trans-eQTLs are true positives.

While top sets of cis-eQTLs perform better than bottom sets in predicting disease risk, it is evident (Table 2) that even the latter significantly predict affected status. Moreover, after training in the MGS dataset, the top 5% of eQTLs were only 1.3 times more likely than the bottom 5% of eQTLs to achieve a nominal significance level of p<0.05 in the ISC dataset. This might be because a substantial part of the true association signal is not related to variants that alter gene expression. Alternatively, it may be that virtually all true association signals are eQTLs, but that many of these were incorrectly classified. We note the sample from which we derived eQTL status is relatively small in GWAS terms, and therefore has limited power to identify weak eQTLs. Moreover, the already limited power will be further constrained by variance introduced by the many well known confounders that plague the use of post mortem expression datasets (32). Both factors are likely to result in eQTL classification errors.

Potentially pointing to an important impact of eQTL misclassification, comparisons of the most extreme cis-eQTL categories (top and bottom 5% sets) revealed considerable differences in the ability of those groups to discriminate case and control status (Table 2). Thus, the risk allele score differences between cases and controls were about 10 times greater for the top 5% of cis-eQTLs and were 3-4 orders of magnitude more significant than they were for the bottom 5% of cis-eQTLs. The former also had better predictive power as indicated by a larger Nagelkerke R2, despite greater numbers of SNPs in the bottom 5% group. Indeed the bottom 5% of cis-eQTLs were either not significant predictors at all (trained in ISC) or the statistical significance of prediction was relatively modest (trained in the MGS). Assuming the extreme top and bottom cis-eQTL groups contain SNPs that are least likely to be misclassified, we postulate that the proportion of the polygenic signal captured by eQTLs will be enhanced by more precise delineation of eQTL status. Better eQTL classification could be relatively simply achieved by 1) using larger human brain expression and SNP datasets; 2) increasing the transcriptome coverage, as the present analysis incorporates only 8361 probes representing only 25-30% of the protein encoding genes in the human genome (33); and 3) using expression datasets derived from different brain regions rather than simply cortical structures as we have done here, and from different stages of human development, as functional variants may have variable temporal and spatial influences.

In summary, we have undertaken the first large scale analysis of the hypothesis that schizophrenia risk is mediated in part by common DNA variants that influence gene expression. Our results support this hypothesis. In doing so, we provide the first systematic demonstration that gene expression studies in human adult brain are informative for genetic investigations of schizophrenia. Larger eQTL datasets with the power to achieve lower eQTL misclassification rates, representing different brain regions and developmental stages, will be required to exploit the enhanced prior probability for cis-eQTLs to identify specific susceptibility loci.

Supplementary Material

Supp Table 1

Supplementary table 1 (see Supplementary_Table_1.xls)

Regression of affected status on difference in risk allele score derived from top and bottom eQTL sets matched for FST. A positive score in the ‘Difference in risk allele score’ column indicates that the difference between the top eQTL and bottom eQTL sets is greater in cases than controls.

Supp Table 2

Supplementary table 2 (see Supplementary_Table_2.xls)

Pruned set of SNPs comprising those that were associated in the MGS training set at P<0.5 that were both within the top 5% of eQTLs and for which the allele designated as the risk allele in the MGS sample was associated in the ISC sample at a nominally significant level (P<0.05). We also provide the estimated OR as it applies to the allele designated allele 1.

Supp Table 3
Supp Table 4
Supp Table 5
Supp Table 6
Supp Table 7
Supp Table 8

Acknowledgements

This work was supported by grants from the MRC and the Wellcome Trust. AR was initially supported by an MRC PhD studentship, and subsequently by NIMH (USA) CONTE Award: 2 P50 MH066392-05A1.

The following authors are included under:

Molecular Genetics of Schizophrenia Collaboration

PV Gejman (Evanston Northwestern Healthcare and Northwestern University, IL, USA), AR Sanders (Evanston Northwestern Healthcare and Northwestern University, IL, USA), J Duan (Evanston Northwestern Healthcare and Northwestern University, IL, USA), DF Levinson (Stanford University, CA, USA), NG Buccola (Louisiana State University Health Sciences Center, LA, USA), BJ Mowry (Queensland Centre for Mental Health Research, and Queensland Institute for Medical Research, Queensland, Australia), R Freedman (University of Colorado Denver, Colorado, USA), F Amin (Atlanta Veterans Affairs Medical Center and Emory University, Atlanta, USA), DW Black (University of Iowa Carver College of Medicine, IA, USA), JM Silverman (Mount Sinai School of Medicine, New York, USA), WJ Byerley (University of California at San Francisco, California, USA), CR Cloninger (Washington University, Missouri, USA).

International Schizophrenia Consortium (ISC)

Michael C. O’Donovan (Cardiff University, Cardiff, UK), George K. Kirov (Cardiff University, Cardiff, UK), Nick J. Craddock (Cardiff University, Cardiff, UK), Peter A. Holmans (Cardiff University, Cardiff, UK), Nigel M. Williams (Cardiff University, Cardiff, UK), Lyudmila Georgieva (Cardiff University, Cardiff, UK), Ivan Nikolov (Cardiff University, Cardiff, UK), N. Norton (Cardiff University, Cardiff, UK), H. Williams (Cardiff University, Cardiff, UK), Draga Toncheva (University Hospital Maichin Dom, Sofia, Bulgaria), Vihra Milanova (Alexander University Hospital, Sofia, Bulgaria), Michael J. Owen (Cardiff University, Cardiff, UK), Christina M. Hultman (Karolinska Institutet, Stockholm, Sweden and Uppsala University, Uppsala, Sweden), Paul Lichtenstein (Karolinska Institutet, Stockholm, Sweden), Emma F. Thelander (Karolinska Institutet, Stockholm, Sweden), Patrick Sullivan (University of North Carolina at Chapel Hill, North Carolina, USA), Derek W. Morris (Trinity College Dublin, Dublin, Ireland), Colm T. O’Dushlaine (Trinity College Dublin, Dublin, Ireland), Elaine Kenny (Trinity College Dublin, Dublin, Ireland), Emma M. Quinn (Trinity College Dublin, Dublin, Ireland), Michael Gill (Trinity College Dublin, Dublin, Ireland), Aiden Corvin (Trinity College Dublin, Dublin, Ireland), Andrew McQuillin (University College London, London, UK), Khalid Choudhury (University College London, London, UK), Susmita Datta (University College London, London, UK), Jonathan Pimm (University College London, London, UK), Srinivasa Thirumalai (West Berkshire NHS Trust, Reading, UK), Vinay Puri (University College London, London, UK), Robert Krasucki (University College London, London, UK), Jacob Lawrence (University College London, London, UK), Digby Quested (University of Oxford, Oxford, UK), Nicholas Bass (University College London, London, UK), Hugh Gurling (University College London, London, UK), Caroline Crombie (University of Aberdeen, Aberdeen, UK), Gillian Fraser (University of Aberdeen, Aberdeen, UK), Soh Leh Kuan (University of Aberdeen, Aberdeen, UK), Nicholas Walker (Ravenscraig Hospital, Greenock, UK), David St Clair (University of Aberdeen, Aberdeen, UK), Douglas H. R. Blackwood (University of Edinburgh, Edinburgh, UK), Walter J. Muir (University of Edinburgh, Edinburgh, UK), Kevin A. McGhee (University of Edinburgh, Edinburgh, UK), Ben Pickard (University of Edinburgh, Edinburgh, UK), Pat Malloy (University of Edinburgh, Edinburgh, UK), Alan W. Maclean (University of Edinburgh, Edinburgh, UK), Margaret Van Beck (University of Edinburgh, Edinburgh, UK), Naomi R. Wray (Queensland Institute of Medical Research, Queensland, Australia), Stuart Macgregor (Queensland Institute of Medical Research, Queensland, Australia), Peter M. Visscher (Queensland Institute of Medical Research, Queensland, Australia), Michele T. Pato (University of Southern California, California, USA), Helena Medeiros (University of Southern California, California, USA), Frank Middleton (Upstate Medical University, New York, USA), Celia Carvalho (University of Southern California, California, USA), Christopher Morley (Upstate Medical University, New York, USA), Ayman Fanous (University of Southern California, California, USA and Washington VA Medical Center, Washington, USA and Georgetown University School of Medicine, Washington DC, USA and Virginia Commonwealth University, Virginia, USA), David Conti (University of Southern California, California, USA), James A. Knowles (University of Southern California, California, USA), Carlos Paz Ferreira (Department of Psychiatry, Azores, Portugal), Antonio Macedo (University of Coimbra, Coimbra, Portugal), M. Helena Azevedo (University of Coimbra, Coimbra, Portugal), Carlos N. Pato (University of Southern California, California, USA), Jennifer L. Stone (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Andrew N. Kirby (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Manuel A. R. Ferreira (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Mark J. Daly (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Shaun M. Purcell (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Kimberly Chambert (The Broad Institute of Harvard and MIT, Massachusetts, USA), Douglas M. Ruderfer (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA), Finny Kuruvilla (The Broad Institute of Harvard and MIT, Massachusetts, USA), Stacey B. Gabriel (The Broad Institute of Harvard and MIT, Massachusetts, USA), Kristin Ardlie (The Broad Institute of Harvard and MIT, Massachusetts, USA), Jennifer L. Moran (The Broad Institute of Harvard and MIT, Massachusetts, USA), Edward M. Scolnick (The Broad Institute of Harvard and MIT, Massachusetts, USA), Pamela Sklar (Massachusetts General Hospital, Massachusetts, USA and The Broad Institute of Harvard and MIT, Massachusetts, USA).

4. Full author details and affiliations are given in the acknowledgements section.

Footnotes

Conflicts of interest

The authors declare no competing interests.

References

  • 1. Cantor RM, Lange K, Sinsheimer JS. Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application. Am J Hum Genet. Jan 8;86(1):6–22. doi: 10.1016/j.ajhg.2009.11.017.
  • 2. Peltonen L, McKusick VA. Genomics and medicine. Dissecting human disease in the postgenomic era. Science. 2001 Feb 16;291(5507):1224–1229. doi: 10.1126/science.291.5507.1224.
  • 3. Bray NJ, Buckland PR, Owen MJ, O’Donovan MC. Cis-acting variation in the expression of a high proportion of genes in human brain. Hum Genet. 2003 Jul;113(2):149–153. doi: 10.1007/s00439-003-0956-y.
  • 4. Lander ES. The new genomics: global views of biology. Science. 1996 Oct 25;274(5287):536–539. doi: 10.1126/science.274.5287.536.
  • 5. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009 Oct 8;461(7265):747–753. doi: 10.1038/nature08494.
  • 6. Maher B. Personal genomes: The case of the missing heritability. Nature. 2008 Nov 6;456(7218):18–21. doi: 10.1038/456018a.
  • 7. Purcell SM, Wray NR, Stone JL, Visscher PM, O’Donovan MC, Sullivan PF, et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009 Aug 6;460(7256):748–752. doi: 10.1038/nature08185.
  • 8. Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D, et al. Common variants conferring risk of schizophrenia. Nature. 2009 Aug 6;460(7256):744–747. doi: 10.1038/nature08186.
  • 9. O’Donovan MC, Craddock NJ, Owen MJ. Genetics of psychosis; insights from views across the genome. Hum Genet. 2009 Jun 12; doi: 10.1007/s00439-009-0703-0.
  • 10. Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe’er I, et al. Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature. 2009 Aug 6;460(7256):753–757. doi: 10.1038/nature08192.
  • 11. Le-Niculescu H, Balaraman Y, Patel S, Tan J, Sidhu K, Jerome RE, et al. Towards understanding the schizophrenia code: an expanded convergent functional genomics approach. Am J Med Genet B Neuropsychiatr Genet. 2007;144B(2):129–158. doi: 10.1002/ajmg.b.30481.
  • 12. Kurian SM, Le-Niculescu H, Patel SD, Bertram D, Davis J, Dike C, et al. Identification of blood biomarkers for psychosis using convergent functional genomics. Mol Psychiatry. 2009 doi: 10.1038/mp.2009.117. e-pub ahead of print.
  • 13. Peirce TR, Bray NJ, Williams NM, Norton N, Moskvina V, Preece A, et al. Convergent evidence for 2′,3′-cyclic nucleotide 3′-phosphodiesterase as a possible susceptibility gene for schizophrenia. Arch Gen Psychiatry. 2006 Jan;63(1):18–24. doi: 10.1001/archpsyc.63.1.18.
  • 14. Bray NJ, Preece A, Williams NM, Moskvina V, Buckland PR, Owen MJ, et al. Haplotypes at the dystrobrevin binding protein 1 (DTNBP1) gene locus mediate risk for schizophrenia through reduced DTNBP1 expression. Hum Mol Genet. 2005 Jul 15;14(14):1947–1954. doi: 10.1093/hmg/ddi199.
  • 15. Law AJ, Lipska BK, Weickert CS, Hyde TM, Straub RE, Hashimoto R, et al. Neuregulin 1 transcripts are differentially expressed in schizophrenia and regulated by 5′ SNPs associated with the disease. Proc Natl Acad Sci U S A. 2006 Apr 25;103(17):6747–6752. doi: 10.1073/pnas.0602002103.
  • 16. Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M. Mapping complex disease traits with global gene expression. Nat Rev Genet. 2009 Mar;10(3):184–194. doi: 10.1038/nrg2537.
  • 17. Niculescu AB, Le-Niculescu H. The P-value illusion: how to improve (psychiatric) genetic studies. Am J Med Genet B Neuropsychiatr Genet. 2010;153B(4):847–849. doi: 10.1002/ajmg.b.31076.
  • 18. Patel SD, Le-Niculescu H, Koller DL, Green SD, Lahiri DK, McMahon FJ, et al. Coming to grips with complex disorders: genetic risk prediction in bipolar disorder using panels of genes identified through convergent functional genomics. Am J Med Genet B Neuropsychiatr Genet. 2010;153B(4):850–877. doi: 10.1002/ajmg.b.31087.
  • 19. Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6(4):e1000888. doi: 10.1371/journal.pgen.1000888.
  • 20. Gilad Y, Rifkin SA, Pritchard JK. Revealing the architecture of gene regulation: the promise of eQTL studies. Trends Genet. 2008 Aug;24(8):408–415. doi: 10.1016/j.tig.2008.06.001.
  • 21. Myers AJ, Gibbs JR, Webster JA, Rohrer K, Zhao A, Marlowe L, et al. A survey of genetic human cortical gene expression. Nat Genet. 2007 Dec;39(12):1494–1499. doi: 10.1038/ng.2007.16.
  • 22. Webster JA, Gibbs JR, Clarke J, Ray M, Zhang W, Holmans P, et al. Genetic control of human brain transcript expression in Alzheimer disease. Am J Hum Genet. 2009 Apr;84(4):445–458. doi: 10.1016/j.ajhg.2009.03.011.
  • 23. Ihaka R, Gentleman R. R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics. 1996;5(3):299–314.
  • 24. Marangos PJ, Schmechel DE. Neuron specific enolase, a clinically useful marker for neurons and neuroendocrine cells. Annu Rev Neurosci. 1987;10:269–295. doi: 10.1146/annurev.ne.10.030187.001413.
  • 25. Teepker M, Munk K, Mylius V, Haag A, Moller JC, Oertel WH, et al. Serum concentrations of s100b and NSE in migraine. Headache. 2009 Feb;49(2):245–252. doi: 10.1111/j.1526-4610.2008.01228.x.
  • 26. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 2007;(81) doi: 10.1086/519795.
  • 27. Schadt EE, Molony C, Chudin E, Hao K, Yang X, Lum PY, et al. Mapping the genetic architecture of gene expression in human liver. PLoS Biol. 2008 May 6;6(5):e107. doi: 10.1371/journal.pbio.0060107.
  • 28. Moskvina V, Ivanov D, Blackwood D, St Clair D, Smith AV, Hultman C, et al. Genetic differences between four European populations. Hum Hered. 2010 doi: 10.1159/000313854. In Press.
  • 29. Balding DJ. Likelihood-based inference for genetic correlation coefficients. Theor Popul Biol. 2003 May;63(3):221–230. doi: 10.1016/s0040-5809(03)00007-8.
  • 30. Nagelkerke NJD. A Note on a General Definition of the Coefficient of Determination. Biometrika. 1991 Sep;78(3):691–692.
  • 31. Mitchell KJ, Porteous DJ. Rethinking the genetic architecture of schizophrenia. Psychol Med Apr. 12:1–14. doi: 10.1017/S003329171000070X.
  • 32. Bray NJ, Buckland PR, Williams NM, Williams HJ, Norton N, Owen MJ, et al. A haplotype implicated in schizophrenia susceptibility is associated with reduced COMT expression in human brain. Am J Hum Genet. 2003 Jul;73(1):152–161. doi: 10.1086/376578.
  • 33. Southan C. Has the yo-yo stopped? An assessment of human protein-coding gene number. Proteomics. 2004 Jun;4(6):1712–1726. doi: 10.1002/pmic.200300700.
