Nat Genet. Author manuscript; available in PMC: 2017 Nov 29.
Published in final edited form as: Nat Genet. 2016 Aug 1;48(9):1031–1036. doi: 10.1038/ng.3623

Identification of 15 genetic loci associated with risk of major depression in individuals of European descent

Craig L Hyde 1, Mike W Nagle 2, Chao Tian 3, Xing Chen 1, Sara A Paciga 2, Jens R Wendland 2, Joyce Tung 3, David A Hinds 3, Roy H Perlis 4, Ashley R Winslow 2,5
PMCID: PMC5706769  NIHMSID: NIHMS797277  PMID: 27479909

Abstract

Despite strong evidence supporting the heritability of major depressive disorder (MDD), previous genome-wide studies were unable to identify risk loci among individuals of European descent. Through 23andMe, we obtained self-reported data from 75,607 individuals reporting a clinical diagnosis of depression and 231,747 individuals reporting no history of depression. We meta-analyzed these results with published MDD genome-wide association study (GWAS) results. We identified five independent variants from four regions associated with self-report of clinical diagnosis or treatment for depression. Loci with p < 1.0×10−5 in the meta-analysis were further analyzed in a replication dataset from 23andMe (45,773 cases and 106,354 controls). A total of 17 independent SNPs from 15 regions reached genome-wide significance after joint analysis over all three datasets. Some of these loci were also implicated in GWAS of related psychiatric traits. These studies provide evidence that large-scale consumer genomic data can serve as a powerful and efficient complement to traditional means of ascertainment for neuropsychiatric disease genomics.

Keywords: genome-wide, depression, GWAS, single nucleotide polymorphism, SNP, MDD


Major depressive disorder remains one of the most significant contributors to morbidity and mortality1–3. Efforts to develop novel interventions have been hindered by a limited understanding of the underlying neurobiology. There is strong evidence of heritability4,5. Even so, efforts to clarify this biology through common or rare variant association studies have been unsuccessful, a failure that has been attributed to the heterogeneity of the disease and the absence of a biological gold-standard diagnosis. One recent study of a Han Chinese population identified two risk loci, in the LHPP gene and near the SIRT1 gene. Neither was supported in European populations, where the risk alleles are extremely rare6.

One reasonable strategy, adopted by that study, is to develop more precise or refined phenotypes. Another is to efficiently identify much larger cohorts, accepting less intensive phenotyping. This second strategy has been validated in multiple non-psychiatric diseases. It has not been validated for psychiatric illnesses, which are presumed to require more detailed interviews. Here, we identified 75,607 individuals (62% female) who endorsed a prior clinical diagnosis of, or treatment for, major depression. We also identified 231,747 individuals (44% female) who reported no clinical diagnosis of depression or treatment for depression. All subjects participated in the optional research initiative of the consumer genomics company 23andMe (for population socio-demographic features, see Table 1). These individuals were genotyped on one of four custom arrays containing genome-wide content. Genotypes were imputed using the September 2013 release of the 1000 Genomes Phase 1 reference haplotypes. Research participants with >97% European ancestry, excluding close relatives, were included in the GWAS analysis. The Manhattan plot and Q-Q plot for the analysis are shown in Supplementary Figure 1a–b. P-values were adjusted for inflation using LD score regression (Supplementary Table 1).

Table 1.

Cohort Demographics for the primary and replication 23andMe datasets

Discovery Replication

MDD Controls MDD Controls
Total (n=) 75607 231747 45773 106354

Age, % of group

under 30 12.1% 11.6% 13.8% 13.4%
30–45 29.9% 27.5% 29.8% 25.4%
45–60 28.8% 27.2% 29.6% 27.7%
60+ 29.3% 33.7% 26.7% 33.3%

Sex, % of group

Male 38.0% 56.2% 33.8% 52.6%
Female 62.0% 43.8% 66.2% 47.4%

Novel major depression loci in a self-report population

In the discovery 23andMe dataset, we identified two distinct regions containing SNPs with p < 1×10−8 associated with self-report of depression. We also identified five additional loci with p < 5×10−8 (Supplementary Table 1). Because the 23andMe data comprise 15 million SNPs, we chose to treat only SNPs with p < 1×10−8 as genome-wide significant in this GWAS.

The most significant locus yielded an association at rs2806933 (adjusted p = 8.53×10−13, OR = 0.955, 95% CI = 0.943–0.968, effect allele frequency in controls = 0.61). This locus lies in a region spanning the 3′ UTR of the olfactomedin-4 gene (OLFM4). OLFM4 has not previously been implicated in neuropsychiatric disease, but it is known to be expressed in brain, including the amygdala and medial temporal lobe7. The second locus had its peak association at rs768705 (p = 2.91×10−12, OR = 1.051, 95% CI = 1.036–1.067, effect allele frequency in controls = 0.25). It spans a region containing the myocyte enhancer factor 2C (MEF2C) and transmembrane protein 161B (TMEM161B) genes. Variants in MEF2C have previously been associated with multiple CNS phenotypes, including epilepsy and intellectual disability8,9, and MEF2C has been implicated in the regulation of synaptic function10. TMEM161B is also brain-expressed. In a mouse model of depression, it exhibits decreased levels of repressive histone H3 Lys9/Lys27 dimethylation in response to social isolation11.

Schizophrenia and Alzheimer’s disease (AD) GWAS both identify the MEF2C region as a susceptibility locus. However, the peak schizophrenia- and AD-associated SNPs are not in strong LD with the MDD SNP (schizophrenia: rs18190012, r2 = 0.001; Alzheimer’s disease: rs190982, r2 = 0.016). Using a population prevalence of 15% for MDD, as estimated by the PGC13, we estimated the liability-scale heritability of this dataset with LD score regression as h2liab = 0.0528. Using instead the 25% prevalence of MDD observed in the 23andMe sample gives h2liab = 0.0612.
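The two liability-scale estimates differ only in the assumed population prevalence. A common way to convert an observed-scale SNP heritability for a case/control trait to the liability scale is the transformation of Lee et al. (2011), which LD score regression software also applies. The sketch below assumes that this transformation produced the reported values; the function name `observed_to_liability` is illustrative. Because the ratio between two conversions does not depend on the observed-scale estimate, the ratio can be checked against the reported pair of values without knowing that estimate.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, pop_prev: float, sample_prev: float) -> float:
    """Convert observed-scale SNP heritability (case/control) to the liability scale.

    Lee et al. (2011): h2_liab = h2_obs * K^2 (1 - K)^2 / (z^2 * P (1 - P)),
    where K is the population prevalence, P is the case fraction in the sample, and
    z is the standard normal density at the liability threshold t = Phi^-1(1 - K).
    """
    std_normal = NormalDist()
    z = std_normal.pdf(std_normal.inv_cdf(1.0 - pop_prev))
    k, p = pop_prev, sample_prev
    return h2_obs * k**2 * (1.0 - k) ** 2 / (z**2 * p * (1.0 - p))


# Case fraction in the discovery cohort (Table 1): roughly 0.25.
cases, controls = 75_607, 231_747
sample_prev = cases / (cases + controls)

# Ratio of the 25%-prevalence conversion to the 15%-prevalence conversion.
# This ratio does not depend on h2_obs, so any value (here 1.0) can be used.
ratio = observed_to_liability(1.0, 0.25, sample_prev) / observed_to_liability(1.0, 0.15, sample_prev)
print(f"conversion ratio: {ratio:.3f}; reported ratio: {0.0612 / 0.0528:.3f}")
```

Both printed ratios are about 1.16. This is consistent with the two reported estimates differing only through the prevalence assumption.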

We meta-analyzed the 23andMe dataset with the previously reported Psychiatric Genomics Consortium (PGC) meta-analysis of MDD, which encompassed 9,240 cases and 9,519 controls of European descent. The results are presented in Figure 1a–b (Supplementary Table 2). Only 1.22 million SNPs from the PGC cohort overlapped with the 23andMe MDD data (no results were reported for the X or Y chromosomes)14, and only these SNPs were used in downstream analyses. As a result, several lead SNPs from the discovery 23andMe GWAS are absent, including rs77741769 (SPPL3-HNF1A), rs144294997 (N6AMT1), rs1432639 (NEGR1), and rs67744457 (EP300-L3MBTL2). Each cohort was individually adjusted for inflation using LD score regression (as described in the Methods). The cohorts were then meta-analyzed using a standard fixed-effects, inverse-variance weighted approach15. Final meta-analysis results were further adjusted using the meta-analysis LD score regression intercept of 1.0025.
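The two adjustment steps and the meta-analysis can be sketched as follows. The sketch assumes that "adjusting for inflation" means dividing each SNP's Wald chi-square statistic by the LD score regression intercept. Equivalently, each standard error is inflated by the square root of the intercept. The function names are illustrative. The per-cohort numbers in the usage lines are hypothetical and are not values from this study.

```python
import math


def inflate_se(se: float, intercept: float) -> float:
    """Scale a standard error so the Wald chi-square (beta/se)^2 is divided by `intercept`."""
    return se * math.sqrt(intercept)


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided normal p-value for a Wald z = beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


def ivw_fixed_effects(betas, ses):
    """Fixed-effects, inverse-variance weighted meta-analysis of one SNP across cohorts."""
    weights = [1.0 / se**2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, two_sided_p(beta, se)


# Hypothetical per-cohort log odds ratios, standard errors, and LDSC intercepts for one SNP.
cohorts = [
    (0.048, 0.007, 1.030),  # large self-report cohort (hypothetical values)
    (0.030, 0.020, 1.010),  # smaller clinical cohort (hypothetical values)
]
betas = [b for b, _, _ in cohorts]
ses = [inflate_se(se, icpt) for _, se, icpt in cohorts]  # step 1: per-cohort correction
beta_meta, se_meta, p_meta = ivw_fixed_effects(betas, ses)  # step 2: IVW meta-analysis
se_final = inflate_se(se_meta, 1.0025)  # step 3: meta-analysis intercept quoted in the text
p_final = two_sided_p(beta_meta, se_final)
```

Because the weights are inverse variances, the larger and more precise cohort dominates the pooled estimate. The final intercept correction can only make the p-value larger.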

Figure 1.


Discovery phase meta-analysis of 23andMe self-report ascertainment of major depression (75,607 cases and 231,747 controls) and PGC MDD (9,240 cases and 9,519 controls). a) Manhattan plot of the discovery phase 23andMe GWAS. The intercept calculated by LD score regression was used for inflation correction. The purple line indicates the threshold for genome-wide significance (p < 5×10−8). Red dots represent SNPs with p-values below this threshold. Regions labeled in black denote loci that reached genome-wide significance in the joint analysis. b) Q-Q plot for the 23andMe MDD GWAS.

Of the original 23andMe lead SNPs, only the N6AMT1 locus is not represented in the meta-analysis results at p < 5×10−6. This is because the lead 23andMe SNP is absent from the meta-analysis dataset and there are no significant secondary signals in the region. SNPs in the OLFM4, TMEM161B-MEF2C (two independent SNPs), MEIS2-TMCO5A, and NEGR1 regions reached genome-wide significance in the meta-analysis (p < 5×10−8, correcting for 1.22 million SNPs) (Supplementary Table 2). Regional association plots for these regions are shown in Figure 2. Heritability for the meta-analysis was estimated at h2liab = 0.059 and 0.069, assuming population prevalences of 15% and 25%, respectively.

Figure 2.


Regional association plots for genome-wide significant regions and for secondary independent signals identified in each region. a) OLFM4 locus (rs12552), b) TMEM161B-MEF2C (rs10514299), c) MEIS2-TMCO5A locus (rs8025231), and d) NEGR1 locus (rs11209948). Secondary signals in TMEM161B-MEF2C (rs454214) and NEGR1 (rs2422321) are also shown. Purple diamonds mark the SNP with the smallest p-value at each locus.

Replication of 15 loci associated with major depression

We assessed whether the top signals from the meta-analysis (p < 1.0×10−5) replicated in a separate cohort of 45,773 cases and 106,354 controls from 23andMe (Table 1). All individuals in the replication dataset were independent of the subjects in the discovery 23andMe dataset and had similar sex and age distributions. The replication cohort provided additional support for three of the five genome-wide significant SNPs: two in the TMEM161B-MEF2C region and one in the NEGR1 locus. We then performed a joint analysis of the discovery 23andMe dataset, PGC, and the 23andMe replication dataset. In this joint analysis, a total of 15 independent loci (17 SNPs) reached genome-wide significance (p < 5×10−8) (Table 2). Of the remaining 46 SNPs with p < 1×10−5 in the meta-analysis of the 23andMe Discovery dataset and PGC, 41 had a consistent direction of effect between the meta-analysis and the replication cohort. P-values across all analyses, including the joint analysis, are shown in Supplementary Table 2 for SNPs with p < 1×10−5 in the meta-analysis.
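The text reports that 41 of 46 SNPs agreed in direction between the meta-analysis and the replication cohort. One simple way to gauge how unlikely that is by chance is a one-sided binomial sign test with a null match probability of 0.5. This test treats the pruned SNPs as independent. The sketch below is an illustration, not a test reported by the authors, and the function name is hypothetical.

```python
from math import comb


def sign_test_one_sided(matches: int, total: int) -> float:
    """P(X >= matches) for X ~ Binomial(total, 0.5).

    Under the null hypothesis, each independent SNP is equally likely to show the same
    direction of effect or the opposite direction in the replication cohort.
    """
    if not 0 <= matches <= total:
        raise ValueError("matches must lie between 0 and total")
    tail = sum(comb(total, k) for k in range(matches, total + 1))  # exact integer arithmetic
    return tail / 2**total


p_direction = sign_test_one_sided(41, 46)
print(f"P(>= 41 of 46 sign matches by chance) = {p_direction:.2e}")
```

For 41 of 46, the tail probability is on the order of 10−8. Summing the binomial coefficients as integers keeps the tail exact until the final division.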

Table 2.

Summary statistics for 17 SNPs reaching genome-wide significance (p < 5×10−8) in the joint analysis (23andMe Discovery, PGC, and 23andMe replication datasets). Corresponding p-values, effects, and standard errors are shown for each phase of analysis: 23andMe MDD (Discovery dataset), PGC MDD, meta-analysis (23andMe Discovery + PGC), 23andMe Replication (23andMe replication cohort), and joint analysis. Effects and standard errors are unadjusted. P-values are adjusted for inflation for the 23andMe Discovery, PGC, and meta-analysis results.

rs ID Gene Context 23andMe PGC Meta Replication* Joint* sign-match
rs10514299 TMEM161B–[]—MEF2C 4.35E-12 5.73E-01 8.50E-11 1.15E-04 9.99E-16 Y
rs1518395 []–VRK2 1.45E-07 5.83E-01 2.01E-07 1.50E-05 4.32E-12 Y
rs2179744 [L3MBTL2] 4.34E-08 7.74E-01 9.24E-08 7.26E-04 6.03E-11 Y
rs11209948 NEGR1–[] 4.41E-08 8.57E-02 1.01E-08 9.39E-04 8.38E-11 Y
rs454214 TMEM161B—[]–MEF2C 6.28E-08 1.62E-01 2.42E-08 6.39E-03 1.09E-09 Y
rs301806 [RERE] 3.72E-06 8.68E-01 7.28E-06 2.52E-04 1.90E-09 Y
rs1475120 HACE1–[]–LIN28B 2.32E-06 2.29E-01 1.14E-06 9.27E-04 4.17E-09 Y
rs10786831 [SORCS3] 1.75E-06 2.98E-01 1.09E-06 2.33E-03 8.11E-09 Y
rs12552 [OLFM4] 1.23E-12 1.57E-01 5.74E-13 8.70E-01 8.16E-09 N
rs6476606 [PAX5] 2.59E-05 1.52E-01 9.30E-06 1.94E-04 1.20E-08 Y
rs8025231 MEIS2—[]—TMCO5A 2.04E-08 8.49E-02 4.66E-09 7.50E-02 1.23E-08 Y
rs12065553 [] 8.53E-07 8.67E-01 2.88E-06 6.79E-03 1.32E-08 Y
rs1656369 RSRC1–[]-MLF1 8.19E-08 3.22E-01 6.05E-08 3.56E-02 1.34E-08 Y
rs4543289 [] 1.19E-06 1.15E-02 8.23E-08 5.26E-03 1.36E-08 Y
rs2125716 []—SLC6A15 4.33E-07 8.78E-01 9.58E-07 2.24E-02 3.05E-08 Y
rs2422321 NEGR1—[] 5.12E-06 6.91E-03 3.28E-07 3.13E-03 3.18E-08 Y
rs7044150 KIAA0020—[]—RFX3 3.97E-07 9.64E-01 1.24E-06 3.05E-02 4.31E-08 Y
* Meta-analysis and replication results were not adjusted for inflation in the joint analysis.

Adjustments were made using intercepts calculated by LD score regression. Sign-match indicates whether the direction of effect matches between the meta-analysis and the 23andMe replication dataset. All alleles are on the forward (+) genomic strand. All effects are reported for the second allele when the two alleles are listed in alphabetical order. Gene context gives a text representation of the SNP location relative to other genes in the region, with the SNP location denoted by []. If the SNP lies between genes, the distance from those genes is denoted by dashes (−), with ‘’ = <1kb, ‘−’ = <10kb, ‘−−’ = <100kb, ‘−−−’ = <1000kb. The UCSC hg19 release was used for mapping. EAF = effect allele frequency in controls. GWAS results are peak-pruned by distance (300 kb) and LD (r2 > 0.1).
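Peak pruning by distance and LD can be sketched as greedy, PLINK-style clumping. The procedure takes the most significant remaining SNP as a peak and discards weaker SNPs that both lie within 300 kb of it and have r2 > 0.1 with it. The text does not say whether both conditions or either one was required. This sketch assumes both, which would let independent signals in the same region survive. The `Snp` type and the `r2` lookup are hypothetical stand-ins for a real LD reference.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    pos: int  # base-pair position (hg19)
    pvalue: float


def peak_prune(
    snps: List[Snp],
    r2: Callable[[str, str], float],
    window_bp: int = 300_000,
    r2_max: float = 0.1,
) -> List[Snp]:
    """Greedy clumping: keep the most significant SNP, drop SNPs near AND correlated with it, repeat."""
    peaks: List[Snp] = []
    for snp in sorted(snps, key=lambda s: s.pvalue):
        clumped = any(
            snp.chrom == peak.chrom
            and abs(snp.pos - peak.pos) <= window_bp
            and r2(snp.rsid, peak.rsid) > r2_max
            for peak in peaks
        )
        if not clumped:
            peaks.append(snp)
    return peaks


# Hypothetical LD values; pairs not listed are treated as r2 = 0.
ld = {frozenset({"rsA", "rsB"}): 0.6, frozenset({"rsA", "rsD"}): 0.9}


def toy_r2(a: str, b: str) -> float:
    return ld.get(frozenset({a, b}), 0.0)


snps = [
    Snp("rsA", "5", 1_000_000, 1e-12),
    Snp("rsB", "5", 1_050_000, 1e-9),  # near rsA and in LD with it: pruned
    Snp("rsC", "5", 1_100_000, 1e-8),  # near rsA but not in LD: kept as a second signal
    Snp("rsD", "5", 2_000_000, 1e-7),  # in LD with rsA but outside the window: kept
]
print([s.rsid for s in peak_prune(snps, toy_r2)])
```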

To explore the biological implications of our findings, we used DEPICT to derive tissue enrichment, gene-set enrichment, and gene predictions (Supplementary Table 3) for SNPs with p < 1×10−5 in the meta-analysis. Identifying the functional variant or gene is not straightforward. Even so, many of the top associations in our dataset lie in or near transcription factors with known CNS developmental functions (for additional DEPICT gene predictions and functional annotation of each region, see Supplementary Table 4). Gene-set enrichment analysis prioritized the MEIS2 subnetwork (p = 2.30×10−6). MEIS2 is a TALE homeodomain transcription factor known to function in development. Most studies implicate MEIS2 in peripheral tissue development. However, recent studies have shown a role for MEIS2-regulated pathways in neurogenesis through interactions with Pax6, as well as with Pax3 and Pax716,17. Notably, our analysis identified associations with MDD in the MEIS2, PAX6, and PAX5 regions (p = 2.04×10−8, 3.94×10−7, and 2.59×10−5, respectively, in the 23andMe Discovery dataset). Tissue enrichment analysis showed an over-representation of central nervous system tissues: 12 of the 19 nominally associated tissues were brain regions (tissues with Nervous System as a second-level MeSH term). These associations did not pass multiple-testing correction. Nonetheless, they suggest that the top results from our MDD GWAS are enriched for CNS expression and for transcriptional functions important in CNS development or neurogenesis. Supplementary Table 4 presents further annotations for all 17 SNPs reaching genome-wide significance in the joint analysis (Table 2). These include predicted genomic/molecular function, brain tissue or monocyte eQTLs, gene predictions for each region using DEPICT18, and disease associations from publicly available GWAS datasets and the OMIM database.

After distance (300 kb) and LD (r2 > 0.1) pruning, three regions still had multiple SNPs with p < 1.0×10−6 in the meta-analysis results. We tested the SNPs in each of these regions for independence using Wald and likelihood ratio tests. We conducted this analysis in the replication dataset to avoid SNP selection bias from the original findings. We conditioned on each SNP in turn within the model for each locus. In each of the TMEM161B-MEF2C and NEGR1 regions, two SNPs are likely independent (rs10514299 and rs454214 in TMEM161B-MEF2C; rs11209948 and rs2422321 in NEGR1). In both regions, the variance is best explained by the two SNPs together. In contrast, most of the variance in the MLF1 region is explained by rs1656369 alone, and including rs4645169 provides no additional significance (Supplementary Table 5).
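For two SNPs at one locus, the conditional comparison reduces to two nested logistic models: one with the lead SNP alone and one with both SNPs. The helpers below are a minimal sketch. They convert a fitted coefficient into a Wald p-value, and a pair of model log-likelihoods into a likelihood ratio test p-value. Both assume a single added parameter (1 degree of freedom). The function names are illustrative, and the log-likelihood values in the usage lines are hypothetical.

```python
import math


def chi2_sf_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def wald_p(beta: float, se: float) -> float:
    """Wald test of one coefficient: (beta / se)^2 is chi-square(1) under H0: beta = 0."""
    return chi2_sf_1df((beta / se) ** 2)


def lrt_p(loglik_reduced: float, loglik_full: float) -> float:
    """LRT for nested models that differ by one parameter (e.g. adding a second SNP)."""
    return chi2_sf_1df(2.0 * (loglik_full - loglik_reduced))


# Hypothetical fits at one locus: lead SNP only vs. lead SNP + candidate secondary SNP.
ll_lead_only = -104_512.8
ll_lead_plus_second = -104_506.1
p_second = lrt_p(ll_lead_only, ll_lead_plus_second)
# A small p_second means the second SNP still improves the fit after conditioning on the
# lead SNP, so the two signals look independent. A large p_second means it adds little.
```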

Validity of the self-report phenotype for major depression

The PGC cohort is substantially smaller than the single 23andMe cohort. As a result, the power of the PGC MDD GWAS to detect the effect sizes of the two genome-wide significant loci from the preliminary 23andMe GWAS was less than 0.6 at a nominal significance level (p < 0.05, uncorrected). Power to replicate the remaining 23andMe loci in PGC was progressively lower19. However, for each of the top ten independent 23andMe loci that were also evaluated in PGC, the probability that PGC would show the same direction of effect as 23andMe exceeded 90%. These ten are all overlapping peak-pruned 23andMe loci with unadjusted p < 1.0×10−7 in 23andMe.

We therefore conducted a sign test of concordance between PGC and 23andMe effect directions for the top overlapping 23andMe peak loci. Nine of the top 10 loci matched in sign (Fisher’s exact test p = 0.033). The test continued to deviate significantly from chance across a range of thresholds, suggesting a consistent signal between the PGC and 23andMe results. For the 82 independent SNPs with nominal p < 1×10−5 in 23andMe, the sign-test p-value was 2×10−6, with an odds ratio for a sign match of 10.6 (95% CI = 3.5–37.1).

Furthermore, the effect sizes of the top independent 23andMe loci are correlated with the effect sizes of the same SNPs in PGC. Loci with MAF < 5% were removed from this comparison because their effect estimates are highly variable. The correlation peaks when the top 39 peak 23andMe loci are included (r = 0.68, p = 2.5×10−9). Finally, we estimated the genetic correlation between the two datasets using LD score regression20. The two major depression datasets were highly and positively correlated (rg = 0.725, SE = 0.093, p = 7.05×10−15).
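The effect-size comparison can be sketched as a running correlation, computed in four steps:

1. Order the peak 23andMe loci by discovery p-value.
2. Drop loci with MAF < 5%.
3. For each k, correlate 23andMe and PGC effect sizes over the top k loci.
4. Take the reported peak as the k that maximizes this correlation.

The text does not say which correlation coefficient was used, so this sketch assumes Pearson. It also assumes that both effect sizes refer to the same allele. The text does not say whether loci are counted before or after the MAF filter; the sketch counts them after filtering. The input values in the usage lines are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def running_correlation(loci, min_maf=0.05, min_k=3):
    """Correlation of 23andMe vs. PGC effect sizes over the top-k loci, for each k.

    loci: (beta_23andme, beta_pgc, maf) tuples sorted by discovery p-value, most
    significant first, with both betas for the same effect allele.
    min_k starts at 3 because a correlation over 2 points is always +1 or -1.
    Returns a list of (k, r) pairs for k = min_k .. number of loci kept.
    """
    kept = [(b1, b2) for b1, b2, maf in loci if maf >= min_maf]
    results = []
    for k in range(min_k, len(kept) + 1):
        xs = [b1 for b1, _ in kept[:k]]
        ys = [b2 for _, b2 in kept[:k]]
        results.append((k, pearson(xs, ys)))
    return results


# Hypothetical inputs (log odds ratios and MAF), not values from this study.
loci = [
    (0.050, 0.041, 0.39),
    (0.047, 0.030, 0.25),
    (0.044, 0.052, 0.02),  # MAF < 5%: excluded before correlating
    (0.040, 0.028, 0.31),
    (0.036, 0.020, 0.45),
    (0.033, 0.029, 0.12),
]
curve = running_correlation(loci)
best_k, best_r = max(curve, key=lambda kr: kr[1])
```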

Associations of lead SNPs with related phenotypes

To investigate the polygenic nature of this trait, we generated a weighted MDD genetic risk score (GRS). The score used the 17 SNPs (Supplementary Table 6) with p < 5×10−8 in the joint analysis (Discovery 23andMe, PGC, and replication 23andMe). We tested the GRS for association with related phenotypes, medication use, and age at onset (Table 3) in the combined discovery and replication cohort, adjusting for depression case/control status. The GRS was significantly associated (FDR < 0.05) with each of these phenotypes. Importantly, a higher MDD GRS was significantly associated with an earlier age of onset in cases (effect = −1.49 years per unit of log odds, SE = 0.37, p = 6.1×10−5).
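A weighted genetic risk score of this kind is usually built as a sum over the score SNPs. Each term is an individual's risk-allele dosage multiplied by that allele's log odds ratio. This puts the score on the log-odds scale in which the age-at-onset effect is expressed. The sketch below assumes that construction; the exact weights are in Supplementary Table 6 and are not reproduced here. The function name and the three SNP weights and dosages in the usage lines are hypothetical.

```python
import math


def weighted_grs(dosages, log_odds_ratios):
    """Weighted genetic risk score for one individual.

    dosages: expected count (0-2) of the effect allele at each SNP, e.g. imputed dosages.
    log_odds_ratios: per-allele log(OR) of the same effect alleles, used as weights.
    """
    if len(dosages) != len(log_odds_ratios):
        raise ValueError("need exactly one weight per SNP")
    return sum(d * w for d, w in zip(dosages, log_odds_ratios))


# Hypothetical per-allele odds ratios for three score SNPs, oriented to the risk allele.
weights = [math.log(1.05), math.log(1.04), math.log(1.03)]
person = [2.0, 0.9, 1.0]  # imputed risk-allele dosages for one hypothetical individual
score = weighted_grs(person, weights)
```

Suppose age at onset is then regressed on this score. A coefficient of −1.49 would then read as onset 1.49 years earlier per one-unit increase in the score, that is, per unit of predicted log odds.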

Table 3.

MDD genetic risk score association with secondary phenotypes. The genetic risk score is described in Supplementary Table 6. MDD age-at-onset associations were conducted in subjects with MDD. All other trait associations were conducted in cases and controls from the general 23andMe research community, adjusting for case/control status.

Phenotype N effect (SE) pvalue FDR
Early-onset 94891 0.283 (0.095) 2.90E-03 3.20E-03
Age-of-onset 94891 −1.49 (0.372) 6.10E-05 8.40E-05
Anxiety 250528 0.323 (0.061) 1.00E-07 2.50E-07
Panic attacks 247167 0.319 (0.072) 9.80E-06 1.50E-05
Insomnia 248576 0.272 (0.051) 1.10E-07 2.50E-07
Taking an SSRI 52698 0.448 (0.162) 5.50E-03 5.50E-03
Medication for mental health 349287 0.421 (0.057) 1.40E-13 1.50E-12
Prescription sleep aid 350119 0.184 (0.05) 2.70E-04 3.20E-04
Prescription pain medication 346989 0.236 (0.041) 5.60E-09 3.10E-08
Overweight (BMI>27) 401552 0.212 (0.038) 3.00E-08 1.10E-07
Obesity (BMI>30) 401552 0.216 (0.045) 1.50E-06 2.70E-06
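The FDR column appears consistent with the Benjamini-Hochberg procedure applied across the 11 phenotypes. Recomputing it from the rounded p-values in the table reproduces each FDR value to within a few percent. The sketch below implements that procedure (the function name is illustrative). Treat the match as an inference, not as a statement of the authors' method.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# (p-value, reported FDR) pairs from Table 3.
table3 = {
    "Early-onset": (2.90e-3, 3.20e-3),
    "Age-of-onset": (6.10e-5, 8.40e-5),
    "Anxiety": (1.00e-7, 2.50e-7),
    "Panic attacks": (9.80e-6, 1.50e-5),
    "Insomnia": (1.10e-7, 2.50e-7),
    "Taking an SSRI": (5.50e-3, 5.50e-3),
    "Medication for mental health": (1.40e-13, 1.50e-12),
    "Prescription sleep aid": (2.70e-4, 3.20e-4),
    "Prescription pain medication": (5.60e-9, 3.10e-8),
    "Overweight (BMI>27)": (3.00e-8, 1.10e-7),
    "Obesity (BMI>30)": (1.50e-6, 2.70e-6),
}
names = list(table3)
recomputed = dict(zip(names, benjamini_hochberg([table3[n][0] for n in names])))
for name in names:
    print(f"{name:30s} reported {table3[name][1]:.2e}  recomputed {recomputed[name]:.2e}")
```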

The independent effect of each GRS SNP on this set of related phenotypes is presented in Supplementary Table 7. Notably, rs12552 in the OLFM4 region was not strongly supported in the replication dataset. Even so, this SNP is associated with increased reporting of several traits: panic attacks, use of medication to treat mental health problems, prescription sleep aids, pain medication, BMI greater than 27, and earlier age of onset of MDD. Correspondingly, it is also associated with a lower continuous age of onset. Individually, rs12552 and rs4543289 had the largest effects on age at onset. A total of five SNPs reached nominal significance for age at onset (p < 0.05).

Sex effects

There are known sex disparities in the presentation and incidence of depression, and there have been suggestions of differences in underlying biology. We therefore tested our top SNPs in the 23andMe discovery cohort for sex-specific effects and for genotype-by-sex interaction (Supplementary Table 8). In the discovery cohort, four SNPs had nominal P < 0.05, but none survived multiple-testing correction. No results reached nominal P < 0.05 in the replication cohort. Our GWAS results thus provide no support for sex differences in genetic predisposition to depression.
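The authors tested genotype-by-sex interaction terms within the regression model. When only sex-stratified summary statistics are available, a closely related check is a z-test on the difference between the male and female effect estimates. This test is swapped in here and is not the authors' method. It is followed by a multiple-testing correction such as Bonferroni across the SNPs tested. The function names and numbers below are hypothetical.

```python
import math


def sex_difference_z(beta_m: float, se_m: float, beta_f: float, se_f: float):
    """z-test for a difference between male and female per-allele effects.

    Assumes the two estimates come from non-overlapping samples (independent errors).
    Returns (z, two-sided p-value).
    """
    z = (beta_m - beta_f) / math.sqrt(se_m**2 + se_f**2)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_hits(pvalues, alpha=0.05):
    """Indices of tests that remain significant after Bonferroni correction."""
    threshold = alpha / len(pvalues)
    return [i for i, p in enumerate(pvalues) if p < threshold]


# Hypothetical stratified estimates (log OR, SE) for three SNPs: (male, female).
estimates = [
    ((0.060, 0.012), (0.035, 0.010)),
    ((0.040, 0.012), (0.042, 0.010)),
    ((0.020, 0.012), (0.051, 0.010)),
]
pvals = [sex_difference_z(bm, sm, bf, sf)[1] for (bm, sm), (bf, sf) in estimates]
survivors = bonferroni_hits(pvals)
```

In this toy example, the third SNP is nominally significant (p < 0.05) but does not pass the Bonferroni threshold of 0.05/3.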

Cohort Characteristics

We further validated the novel self-report phenotype within the 23andMe self-report cohort (Supplementary Table 9). We assessed medication use, comorbid symptoms, and risk factors commonly seen in MDD. Among subjects reporting depression, reporting of anxiety, panic attacks, and insomnia was significantly increased (p < 5.0×10−243 for all traits tested). The same was true for reporting a BMI greater than 27 (overweight) or greater than 30 (obese). Reporting of current SSRI use, medication for mental health problems, prescription sleep aids, and pain medication was also increased. The highest odds ratios for any trait tested were for SSRI use (13.35) and psychotropic use (44.83), further supporting the validity of the phenotype ascertainment. Cohort characteristics were also tested separately in males and females, with no evidence of sex-specific differences (Supplementary Table 10).
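Odds ratios such as these compare the odds of reporting a trait among cases with the odds among controls. The sketch below computes an unadjusted odds ratio from a 2×2 table, with a Woolf (log-scale) 95% confidence interval. The reported values may instead come from covariate-adjusted logistic regression. The function name and the counts in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: cases reporting the trait      b: cases not reporting it
    c: controls reporting the trait   d: controls not reporting it
    All cells must be positive (no continuity correction in this sketch).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts for one medication-use trait.
or_, lo, hi = odds_ratio_ci(a=30_000, b=45_000, c=11_000, d=220_000)
```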

Studies have shown a degree of shared genetic liability across psychiatric disorders. This likely results from multiple factors, including genetic pleiotropy, diagnostic overlap, comorbid disease, or disease progression. As an initial assessment of shared genetic risk, we examined SNPs with p < 1×10−5 in the MDD meta-analysis in five other psychiatric traits: schizophrenia, bipolar disorder, neuroticism, depressive symptoms, and subjective well-being. Their p-values in each trait are presented in Supplementary Table 11 21. The MDD SNPs showed the greatest overlap (smallest p-values) in the schizophrenia dataset, followed by neuroticism. Replication was weaker for bipolar disorder, depressive symptoms, and subjective well-being. Schizophrenia and bipolar disorder GWAS results are from the publicly available PGC datasets12,22, while corresponding p-values for neuroticism, depressive symptoms, and subjective well-being were provided by the bior The lack of correlation with the SSGAC depressive symptoms self-report data may arise because that dataset considers only depressive symptoms experienced during the previous two weeks, whereas the primary cohorts capture lifetime major depression. Conversely, the trait measure of neuroticism has previously been shown to overlap with major depression, consistent with our results.

To assess genetic correlation of the MDD GWAS with other neuropsychiatric diseases more rigorously, we used LD score regression to test pairwise genetic correlation with the 23andMe MDD GWAS dataset. For this comparison, we used available PGC GWAS, including bipolar disorder and three schizophrenia GWAS (different versions of the PGC schizophrenia datasets), as well as neurodegenerative disease GWAS. Because of overlapping controls in the PGC datasets, we did not use the results of the meta-analysis between 23andMe and PGC. Among the other psychiatric disorders, the highest correlation with the primary 23andMe GWAS was observed for the PGC2 schizophrenia GWAS (rg = 0.282, SE = 0.03, p = 2.18×10−21). It was followed by bipolar disorder and the additional schizophrenia GWAS (Table 4). By contrast, we observed little to no correlation with the Parkinson’s disease and Alzheimer’s disease datasets. Additionally, we tested for correlation between 23andMe MDD and LDL cholesterol, a trait with no known epidemiological correlation with depression. We observed no genetic correlation between the two traits.

Table 4.

Cross-trait genetic correlation with 23andMe MDD (LD score regression). The observed-scale heritability for the 23andMe discovery cohort is h2 = 0.038. The table shows the genetic correlation of the 23andMe Discovery MDD dataset with three groups of traits. These are related psychiatric disorders (PGC MDD, PGC SCZ1, PGC SCZ1+SWE, PGC SCZ2, PGC Bipolar Disorder), non-psychiatric neurological disorders (IGAP AD, IPDGC PD (2012)), and a non-psychiatric, non-neurological trait (GLGC LDL). rg = genetic correlation.

phenotype rg (se) nominal p-value cohort observed h2 significant after Bonferroni correction
PGC MDD 0.725 (0.093) 7.05E-15 0.128 *
PGC SCZ1 0.23 (0.042) 4.028E-08 0.543 *
PGC SCZ1+SWE 0.261 (0.036) 8.132E-13 0.411 *
PGC SCZ2 0.282 (0.03) 2.182E-21 0.371 *
PGC Bipolar Disorder 0.264 (0.049) 7.446E-08 0.350 *
IGAP AD −0.069 (0.071) 0.3331 0.039 ns
IPDGC PD (2012) 0.185 (0.091) 0.04123 0.200 ns
GLGC LDL 0.056 (0.031) 0.072 0.191 ns
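The last column of Table 4 can be checked with a Bonferroni correction over the eight genetic correlations tested, giving a threshold of 0.05/8 = 0.00625. For the non-significant rows, the nominal p-values also agree to within a few percent with a normal approximation to rg/SE. The sketch below assumes both points; the table does not state how many tests the correction used.

```python
import math

# (rg, SE, nominal p-value) from Table 4.
table4 = {
    "PGC MDD": (0.725, 0.093, 7.05e-15),
    "PGC SCZ1": (0.23, 0.042, 4.028e-08),
    "PGC SCZ1+SWE": (0.261, 0.036, 8.132e-13),
    "PGC SCZ2": (0.282, 0.03, 2.182e-21),
    "PGC Bipolar Disorder": (0.264, 0.049, 7.446e-08),
    "IGAP AD": (-0.069, 0.071, 0.3331),
    "IPDGC PD (2012)": (0.185, 0.091, 0.04123),
    "GLGC LDL": (0.056, 0.031, 0.072),
}

threshold = 0.05 / len(table4)  # Bonferroni over the 8 correlations (assumed)


def z_test_p(rg: float, se: float) -> float:
    """Two-sided p-value for H0: rg = 0 using the normal approximation z = rg / se."""
    return math.erfc(abs(rg / se) / math.sqrt(2.0))


for name, (rg, se, p) in table4.items():
    flag = "*" if p < threshold else "ns"
    print(f"{name:22s} p={p:.3g}  approx p={z_test_p(rg, se):.3g}  {flag}")
```

For the highly significant rows, the approximation departs more, because rg and SE are rounded in the table.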

Discussion

In this study we present a complementary approach to collecting large-scale genotypic data on major depression. Using self-report data on major depression from 23andMe, we identified SNPs associated with risk for depression at a genome-wide level of significance in a cohort of European descent. We meta-analyzed the 23andMe data with the PGC MDD GWAS and performed a joint analysis with an independent 23andMe replication cohort. Together, these analyses identified 17 independent SNPs significantly associated with diagnosis of major depression. Tissue and gene-set enrichment analysis with DEPICT predicts that these SNPs are enriched in genes expressed in the CNS and in genes functioning in transcriptional regulation related to neurodevelopment. We find no robust evidence for sex-specific effects among our top results. However, this study combined both sexes and adjusted for sex only as a covariate, so it was not structured to identify sex-specific loci. That would ideally be done through a sex-stratified GWAS.

The variance explained by these SNPs is small. Even so, our cohorts identified by self-report of major depression are highly genetically correlated with cohorts identified by clinical interview. This result is further corroborated by the significant sign test and effect-size correlation between the top 23andMe SNPs (nominal p < 1×10−5) and the same SNPs in the clinically ascertained PGC dataset. To better understand the phenotypic characteristics of the 23andMe self-report subjects, we assessed reporting of medication use and comorbidities. All tested characteristics were significantly increased in subjects reporting depression, similar to what is seen in clinically ascertained subjects. Notably, many of the most significant SNPs show evidence of pleiotropy when examined in other psychiatric traits. The smallest p-values among individual MDD SNPs were seen in the PGC schizophrenia and neuroticism datasets. This finding is unsurprising given the pleiotropy reported by other GWAS and cross-psychiatric analyses13. It lends further support to the relevance of a self-report phenotype to clinical disease.

We were unable to replicate the genome-wide significant loci identified in the recent CONVERGE study6. We did identify modest associations in each region: in LHPP, rs145655839 had the minimum p = 0.0024 among 6,204 SNPs in the region, and in SIRT1, rs187810158 had the minimum p = 0.0102 among 5,111 SNPs in the region. This is unsurprising. Our study looked for genetic determinants of susceptibility in both males and females of European descent, a population whose structure likely differs substantially from that of the CONVERGE study of Han Chinese women.

Taken together, our results indicate the utility of a complementary strategy to intensive phenotyping for identifying common variant associations with phenotypically heterogeneous neuropsychiatric diseases. The inter-rater reliability of lifetime MDD diagnosis is modest even with structured interview, with a kappa of 0.32–0.57 (refs. 23,24). Conversely, the present analysis relies on treatment-seeking patients rather than volunteers responding to advertisements, which lends additional face validity to the phenotype25. Other large-scale analyses have found that cohorts ascertained on the basis of treatment rather than structured interview yield similar associations12. They have also found that such phenotypes are consistent with structured interview26. Both findings add confidence to the validity of this approach12. In light of the massive worldwide health impact of such disorders, any approach that can help elucidate their pathophysiology merits consideration. Finally, a locus previously linked to other neuropsychiatric disease also increases MDD risk. This finding adds to a burgeoning literature indicating the pleiotropy of such risk genes.

Methods

Data Access

The full GWAS summary statistics for the 23andMe Discovery dataset will be made available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Please contact David A. Hinds (dhinds@23andme.com) for more information and to apply to access the data. Information for the 10,000 most significant SNPs from the discovery 23andMe GWAS is included in Supplementary Table 12.

Population and Study Design

Participants were part of the customer base of 23andMe, Inc., a consumer genetics company. This cohort has been described in detail elsewhere27,28. Participants provided informed consent and participated in the research online. The protocol was approved by an external AAHRPP accredited IRB, Ethical and Independent Review Services. The discovery cohort was selected from participant data available in January 2015, and the replication cohort was selected in January 2016 from additional data available at that time.

Genotyping, Quality Control, and Imputation

DNA extraction and genotyping were performed on saliva samples by National Genetics Institute (NGI), a CLIA-licensed clinical laboratory and a subsidiary of Laboratory Corporation of America. Samples were genotyped on one of four genotyping platforms. The V1 and V2 platforms were variants of the Illumina HumanHap550+ BeadChip, including about 25,000 custom SNPs selected by 23andMe. The V3 platform was based on the Illumina OmniExpress+ BeadChip, with custom content to improve the overlap with the V2 array. The V4 platform, the one used most recently, is a fully custom array. It includes a lower-redundancy subset of V2 and V3 SNPs with additional coverage of lower-frequency coding variation. The platforms contained 586,916; 584,942; 1,008,948; and 570,000 SNPs, respectively. Samples that failed to reach a 98.5% call rate were re-analyzed. Individuals whose analyses failed repeatedly were re-contacted by 23andMe customer service to provide additional samples, as is done for all 23andMe customers.

Participant genotype data were imputed against the September 2013 release of the 1000 Genomes Phase 1 reference haplotypes, which were phased with ShapeIt2 (ref. 29). We phased participant genotypes using an internally developed phasing tool, Finch. Finch implements the Beagle haplotype graph-based phasing algorithm30, modified to separate the haplotype graph construction and phasing steps. Finch extends the Beagle model to accommodate genotyping error and recombination. This lets it handle cases where there are no consistent paths through the haplotype graph for the individual being phased. For each 23andMe genotyping platform, we constructed haplotype graphs for European samples from a representative sample of genotyped individuals. We then performed out-of-sample phasing of all genotyped individuals against the appropriate graph.

In preparation for imputation, we split phased chromosomes into segments of no more than 10,000 genotyped SNPs, with overlaps of 200 SNPs. We excluded SNPs with Hardy-Weinberg equilibrium P < 10−20 or call rate < 95%. We also excluded SNPs with large allele frequency discrepancies compared with the European 1000 Genomes reference data. To identify these discrepancies, we computed a 2×2 table of allele counts for European 1000 Genomes samples and 2,000 randomly sampled 23andMe customers with European ancestry, and flagged SNPs with a chi-squared P < 10−15. We imputed each phased segment against all-ethnicity 1000 Genomes haplotypes (excluding monomorphic and singleton sites) using Minimac2 (ref. 31), with 5 rounds and 200 states for parameter estimation.
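Two of the preparation steps above are simple enough to sketch. The first splits a phased chromosome into overlapping segments of at most 10,000 genotyped SNPs with 200-SNP overlaps. The second flags allele-frequency discrepancies with a 2×2 chi-squared test of allele counts at P < 10−15. This is an illustration under stated assumptions, not 23andMe's pipeline. It assumes a fixed step between segment starts and a Pearson chi-squared test without continuity correction. All function names are hypothetical.

```python
import math


def segment_bounds(n_snps: int, max_len: int = 10_000, overlap: int = 200):
    """Half-open [start, end) SNP-index windows of at most max_len SNPs.

    Consecutive windows share `overlap` SNPs; the last window ends at n_snps.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    step = max_len - overlap
    bounds, start = [], 0
    while True:
        end = min(start + max_len, n_snps)
        bounds.append((start, end))
        if end == n_snps:
            return bounds
        start += step


def allele_count_chi2_p(ref_a1: int, ref_a2: int, sample_a1: int, sample_a2: int) -> float:
    """Pearson chi-squared test (1 df) on a 2x2 table of allele counts.

    Rows: reference panel vs. study sample; columns: allele 1 vs. allele 2.
    """
    table = [[ref_a1, ref_a2], [sample_a1, sample_a2]]
    row_totals = [sum(r) for r in table]
    col_totals = [ref_a1 + sample_a1, ref_a2 + sample_a2]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)


def frequency_discrepant(ref_counts, sample_counts, p_threshold: float = 1e-15) -> bool:
    """True if the SNP's allele frequencies differ from the reference at P < p_threshold."""
    return allele_count_chi2_p(*ref_counts, *sample_counts) < p_threshold
```

With 2,000 sampled customers, the sample row of each table holds 4,000 allele counts.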

For the X chromosome, we built separate haplotype graphs for the non-pseudoautosomal region and each pseudoautosomal region, and these regions were phased separately. We then imputed males and females together using Minimac2, as with the autosomes, treating males as homozygous pseudo-diploids for the non-pseudoautosomal region.

Ancestry Determination

We restricted the analysis to individuals with >97% European ancestry, as determined through an analysis of local ancestry32. Briefly, our algorithm first partitions phased genomic data into short windows of about 100 SNPs. Within each window, we use a support vector machine (SVM) to classify individual haplotypes into one of 31 reference populations. The SVM classifications are then fed into a hidden Markov model (HMM) that accounts for switch errors and incorrect assignments and gives probabilities for each reference population in each window. Finally, we use simulated admixed individuals to recalibrate the HMM probabilities so that the reported assignments are consistent with the simulated admixture proportions. The reference population data are derived from public datasets (the Human Genome Diversity Project, HapMap, and 1000 Genomes). They also include 23andMe customers who have reported having four grandparents from the same country.

A maximal set of unrelated individuals was chosen for each analysis using a segmental identity-by-descent (IBD) estimation algorithm³³. Individuals were defined as related if they shared more than 700 cM IBD, including regions where the two individuals share either one or both genomic segments identical-by-descent. This level of relatedness (roughly 20% of the genome) corresponds approximately to the minimal expected sharing between first cousins in an outbred population. When constructing the replication cohort, we identified unrelated individuals who were also unrelated to all individuals used in the discovery analysis.
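The text does not say how the maximal unrelated set was constructed; one common approach is a greedy pass, sketched below under that assumption. The hypothetical `shared_cm` dictionary holds total IBD (one- plus two-segment sharing) in cM for each pair, pairs above 700 cM are treated as related, and anyone related to a member of `exclude_relatives_of` (for instance, the discovery cohort when building the replication cohort) is dropped first.

```python
from collections import defaultdict


def select_unrelated(candidates, shared_cm, threshold_cm=700.0, exclude_relatives_of=()):
    """Greedily pick a maximal set of mutually unrelated individuals.

    shared_cm maps frozenset({id1, id2}) to total IBD in cM; pairs sharing
    more than threshold_cm are treated as related.
    """
    relatives = defaultdict(set)
    for pair, cm in shared_cm.items():
        if cm > threshold_cm and len(pair) == 2:
            a, b = tuple(pair)
            relatives[a].add(b)
            relatives[b].add(a)

    excluded = set(exclude_relatives_of)
    pool = [c for c in candidates
            if c not in excluded and not (relatives[c] & excluded)]
    # Heuristic: visit people with fewer relatives first, which tends to keep more.
    pool.sort(key=lambda c: len(relatives[c]))

    chosen, chosen_set = [], set()
    for person in pool:
        if not (relatives[person] & chosen_set):
            chosen.append(person)
            chosen_set.add(person)
    return chosen
```

A greedy pass yields a *maximal* set (no one else can be added without introducing a related pair), which matches the wording above; it is not guaranteed to be the largest possible set, which would require solving a maximum independent set problem.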

We used principal component analysis (PCA) to characterize residual population structure in the subset of 23andMe participants with European ancestry. We computed principal components using 82,654 SNPs that were genotyped on all 23andMe array designs, with Hardy-Weinberg P > 10⁻⁴⁰, minor allele frequency > 0.01, and call rate > 99%, excluding regions of extended long-range linkage disequilibrium. We used the ARPACK library³⁴ to compute principal components from data for 519,914 individuals across all array designs; additional individuals were then projected onto this set of eigenvectors.
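The fit-then-project workflow can be sketched with NumPy. The standardization by 2p(1−p) is a common convention for genotype PCA and is assumed here, as is the use of a dense SVD in place of ARPACK; at the scale described, an iterative solver that computes only the leading components (for example, SciPy's ARPACK-backed `scipy.sparse.linalg.eigsh` or `svds`) would be used instead. Function names are hypothetical.

```python
import numpy as np


def fit_pca(genotypes, n_components=5):
    """PCA on an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    freqs = genotypes.mean(axis=0) / 2.0
    scale = np.sqrt(2.0 * freqs * (1.0 - freqs))
    scale = np.where(scale > 0, scale, 1.0)  # guard against monomorphic SNPs
    standardized = (genotypes - 2.0 * freqs) / scale
    # Dense SVD for illustration only; it computes every component.
    u, s, vt = np.linalg.svd(standardized, full_matrices=False)
    loadings = vt[:n_components].T  # SNPs x components (the eigenvectors)
    scores = u[:, :n_components] * s[:n_components]
    return freqs, scale, loadings, scores


def project(genotypes, freqs, scale, loadings):
    """Project additional individuals onto previously computed eigenvectors."""
    return ((genotypes - 2.0 * freqs) / scale) @ loadings
```

Projection reuses the reference allele frequencies and scaling, so individuals added later are placed in the same coordinate system without recomputing the decomposition.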

Supplementary Figure 2a shows the proportion of variance explained by each principal component, and Supplementary Figure 2b shows the proportion of each component’s variance that is explained by country of ancestry, for a set of individuals reporting four grandparents from a single country. The first 5 PCs are largely explained by geographic ancestry, while higher order PCs are not.
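One natural way to compute "the proportion of each component's variance explained by country of ancestry" is the R² of a one-way ANOVA of PC scores on country (between-group sum of squares over total sum of squares). The sketch below assumes that definition, which the text does not spell out; the function name is hypothetical.

```python
from collections import defaultdict


def variance_explained_by_group(values, groups):
    """R^2 of a one-way ANOVA: between-group SS divided by total SS."""
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    between_ss = sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
                     for vs in by_group.values())
    return between_ss / total_ss if total_ss > 0 else 0.0
```

Applied to each PC in turn, a value near 1 means the component mostly separates countries, and a value near 0 means it carries little geographic signal.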

GWAS and Meta-analysis

In the GWAS and replication analyses, we computed association test results by logistic regression assuming additive allelic effects. For tests using imputed data, we used imputed dosages rather than best-guess genotypes. We included covariates for age, gender, and the top 5 principal components to account for residual population structure. Although the choice of 5 PCs could be justified by the preceding ancestry analysis, we actually chose 5 for computational reasons, and others have noted this to be a reasonable choice³⁵.
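A minimal sketch of the per-SNP model described above: logistic regression of case status on an intercept, the allele dosage (additive coding), and covariates, fitted by Newton-Raphson, with a Wald test on the dosage coefficient. This is a generic textbook implementation with hypothetical names, not the 23andMe software; in practice the covariate matrix would hold age, gender, and the first five PCs.

```python
import math

import numpy as np


def additive_logistic_test(dosage, covariates, case_status, max_iter=50, tol=1e-10):
    """Fit logit P(case) ~ 1 + dosage + covariates; Wald test on dosage.

    dosage: allele dosages in [0, 2]; covariates: (n x c) array;
    case_status: 0/1 array. Returns (beta, standard error, two-sided P).
    """
    X = np.column_stack([np.ones(len(case_status)), dosage, covariates])
    y = np.asarray(case_status, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))  # Newton-Raphson update
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information matrix at the final estimate for the SE.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = math.sqrt(np.linalg.inv(info)[1, 1])
    p_value = math.erfc(abs(beta[1] / se) / math.sqrt(2.0))  # two-sided normal
    return float(beta[1]), se, p_value
```

Using dosages rather than hard calls lets uncertain imputed genotypes contribute proportionally instead of being rounded to 0, 1, or 2.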

For quality control of genotyped GWAS results, we removed SNPs that were genotyped only on our “V1” and/or “V2” platforms, owing to small sample size, and SNPs on chrM or chrY, because many of these are not genotyped reliably. Using trio data, we flagged SNPs that failed a test for parent-offspring transmission: we regressed the child’s allele count on the mean parental allele count and flagged SNPs with fitted β < 0.6 and P < 10⁻²⁰ for a test of β < 1. We removed SNPs with a Hardy-Weinberg P < 10⁻²⁰ in Europeans or a call rate below 90%. We also tested genotyped SNPs for genotyping-date effects and removed SNPs with P < 10⁻⁵⁰ by ANOVA of SNP genotypes against a factor dividing genotyping date into 20 roughly equal-sized buckets. For imputed GWAS results, we removed SNPs with average r² < 0.5 or minimum r² < 0.3 in any imputation batch, as well as SNPs with strong evidence of an imputation batch effect. The batch effect test is an F test from an ANOVA of the SNP dosages against a factor representing imputation batch; we removed results with P < 10⁻⁵⁰. Prior to GWAS, we identified, for each SNP, the largest subset of the data passing these criteria based on original genotyping platform (v2+v3+v4, v3+v4, v3, or v4 only), and computed association test results for that largest passing set. After quality control, the 23andMe discovery GWAS included results for 13,474,321 imputed variants and 60,949 genotyped variants that did not have imputed results passing our filters, for a total of 13,535,270 variants. Of these, 15,774 test results could not be computed because of logistic regression fitting problems, leaving 13,519,496 tests. HWE and batch-effect P values are presented in Supplementary Table 13.
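The parent-offspring transmission check can be sketched as an ordinary least-squares regression of child allele count on mean parental allele count, followed by a one-sided test of β < 1. The flagging thresholds (β < 0.6 and P < 10⁻²⁰) come from the text; the use of a large-sample normal approximation for the test statistic is an assumption, and the function name is hypothetical.

```python
import math


def transmission_check(child_counts, parent_mean_counts, max_slope=0.6, max_p=1e-20):
    """Regress child allele count on mean parental allele count.

    Returns (slope, one_sided_p, flagged): one_sided_p tests beta < 1 with a
    normal approximation; flagged requires slope < max_slope and p < max_p.
    """
    n = len(child_counts)
    mx = sum(parent_mean_counts) / n
    my = sum(child_counts) / n
    sxx = sum((x - mx) ** 2 for x in parent_mean_counts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(parent_mean_counts, child_counts))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((y - intercept - slope * x) ** 2
              for x, y in zip(parent_mean_counts, child_counts))
    se = math.sqrt(ssr / (n - 2) / sxx)
    if se == 0:
        p = 0.0 if slope < 1 else 1.0
    else:
        z = (slope - 1.0) / se
        p = 0.5 * math.erfc(-z / math.sqrt(2.0))  # P(Z <= z)
    return slope, p, slope < max_slope and p < max_p
```

Under Mendelian inheritance the expected child count equals the mean parental count, so the slope should be near 1; a much smaller slope suggests genotyping error rather than biology.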

Results from 23andMe were adjusted for variance inflation by multiplying the variance (i.e., the squared standard error) of each genetic effect estimate by the LD score regression intercept of 1.0598²⁰. The meta-analysis with PGC was an inverse-variance fixed-effects meta-analysis on overlapping SNPs, performed after adjusting the standard errors of each input analysis by its own LD score regression intercept (1.0598 for 23andMe and 1.0243 for PGC). Final meta-analysis results were further adjusted for the overall LD score regression intercept of 1.0025 (for details, see the LD score regression section).
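The adjustment and meta-analysis steps reduce to a few lines of arithmetic: scaling a variance by an intercept c is the same as scaling the standard error by √c, and the fixed-effects estimate weights each study by 1/SE². The intercepts below are the values reported in the text; the function names are hypothetical, and the two-sided normal P value is an assumed final step.

```python
import math


def inflate_se(se, ldsc_intercept):
    """Multiply the variance (se**2) by the LD score regression intercept."""
    return se * math.sqrt(ldsc_intercept)


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def meta_analyze_snp(beta_23andme, se_23andme, beta_pgc, se_pgc):
    b, se = fixed_effects_meta(
        [beta_23andme, beta_pgc],
        [inflate_se(se_23andme, 1.0598), inflate_se(se_pgc, 1.0243)],
    )
    se = inflate_se(se, 1.0025)  # final adjustment for the meta-analysis intercept
    return b, se, math.erfc(abs(b / se) / math.sqrt(2.0))
```

For two studies with equal standard errors, the pooled estimate is their average and the pooled SE shrinks by a factor of √2; the intercept corrections then widen each SE slightly, making the test modestly more conservative.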

LD score regression

We calculated LD scores (LD Score (LDSC) version 1.0.0) as previously described, using the European 1000 Genomes reference panel (phase 3 version 5a) and including SNPs with minor allele frequency greater than 5%. GWAS summary statistics were collected from the following resources: the Psychiatric Genomics Consortium (MDD, Bipolar Disorder, SCZ1, SCZ1+SWE, SCZ2), the International Genomics of Alzheimer’s Project (IGAP AD), the International Parkinson Disease Genomics Consortium (IPDGC PD), and the Global Lipids Genetics Consortium (GLGC LDL). GWAS data were harmonized with the munge_sumstats.py script (using the SNP list derived from the LD score calculation), and genomic inflation control intercepts were calculated for the 23andMe MDD data, the PGC MDD data, and the PGC+23andMe meta-analysis data with the ldsc.py script (using all default settings and options). Additionally, we used the same script to estimate liability-scale heritability for the meta-analysis, assuming population prevalences of 15% and 25% as previously described¹³. Finally, we calculated the cross-trait LD score regression between the 23andMe MDD GWAS and the PGC, IGAP, IPDGC, and GLGC datasets.
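The liability-scale conversion referred to above is, as we understand it, the standard observed-to-liability transformation that LDSC applies when given a sample prevalence and a population prevalence: h²_liab = h²_obs · K²(1−K)² / [P(1−P) · φ(t)²], where K is the population prevalence, P the fraction of cases in the sample, t = Φ⁻¹(1−K) the liability threshold, and φ the standard normal density. The sketch below implements that formula with a hypothetical function name; the observed-scale estimate and sample prevalence are inputs, not values from this study.

```python
from statistics import NormalDist


def observed_to_liability_h2(h2_obs, pop_prev, sample_prev):
    """Convert observed-scale SNP heritability to the liability scale.

    pop_prev (K): assumed population prevalence, e.g. 0.15 or 0.25.
    sample_prev (P): fraction of cases in the GWAS sample.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - pop_prev)
    z = std_normal.pdf(threshold)  # normal density at the liability threshold
    K, P = pop_prev, sample_prev
    return h2_obs * (K ** 2 * (1 - K) ** 2) / (P * (1 - P) * z ** 2)
```

Running the conversion at both K = 0.15 and K = 0.25 shows how sensitive the liability-scale estimate is to the assumed prevalence, which is presumably why two values were reported.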

Trait Ascertainment

Subjects with depression were identified through self-report in web-based surveys. A total of six survey data sources were used to compose the depression phenotype:

  1. (Your Medical History survey: 2009–2013) “Have you ever been diagnosed by a doctor with any of the following psychiatric conditions?” (options for Depression: Yes, No, I don’t know)

  2. (Research Snippet: 2010–2014) “Have you ever been diagnosed with clinical depression?” (answers: Yes, No, I’m not sure)

  3. (Health Intake survey, unbranched: 2014–2015) “Have you ever been diagnosed with or treated for any of the following conditions?” (options for Depression: Yes, No, I’m not sure)

  4. (Health Intake survey, branched: 2013)
    • 4a
      “Have you ever been diagnosed or treated for any of the following conditions?” (“A mental health or psychiatric condition”: Yes, No, I’m not sure)
    • 4b
      “What mental health problems have you had? Please check all that apply” (check box: Depression)
  5. (Health Profile survey: 2015–2016)
    • 5a
      “Have you ever been diagnosed with or treated for any of the following conditions? Anxiety, Attention deficit disorders, Bipolar disorder / manic depression, Depression, Eating disorder (such as anorexia or bulimia)” (answers: Yes, No, I’m not sure)
    • 5b
      “Have you ever been diagnosed with or treated for depression?” (answers: Yes, No, I’m not sure)
  6. (Health Followup survey: 2014–2015) “In the last 2 years, have you been newly diagnosed with or started treatment for any of the following conditions?” (options for Depression: Yes, No, I’m not sure)

Sources 1, 3, 4, and 5 represent four different iterations of a general medical history survey, administered over successive time periods from 2009 to 2016. Source 2 used a different mechanism for presenting individual questions to participants outside of the context of a formal survey. Source 6 was a survey administered to a subset of participants at least a year after they had completed one of the Health Intake surveys.

Sources 1 to 5 were combined by keeping, for each participant, the first non-missing response among these sources, evaluated in the specified order (the “coalesced response”). We then incorporated responses to source 6 by defining cases as the union of cases from the coalesced response and cases from source 6, and defining controls as individuals who were controls in at least one of these and cases in neither.
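The combination rule above can be written as two small functions. This sketch assumes each source's answer has already been mapped to "case", "control", or None, with "I'm not sure", "I don't know", and unanswered items treated as missing; that mapping and the function names are assumptions for illustration.

```python
CASE, CONTROL = "case", "control"


def coalesce(responses):
    """First non-missing status among sources 1 to 5, in that order."""
    for status in responses:
        if status is not None:
            return status
    return None


def final_status(sources_1_to_5, source_6):
    """Union of cases; a control must be a control in one and a case in neither."""
    coalesced = coalesce(sources_1_to_5)
    if CASE in (coalesced, source_6):
        return CASE
    if CONTROL in (coalesced, source_6):
        return CONTROL
    return None
```

Because only the first non-missing response among sources 1 to 5 is kept, a later contradictory answer in that range is ignored, whereas a case report in the follow-up survey (source 6) always overrides an earlier control.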

For the branched data sources 4 and 5, participants were first asked a screening question (4a or 5a), and if they answered affirmatively, were asked a specific follow-up question (4b or 5b). Cases were defined as positive responses to the follow-up question, and controls were the union of ‘no’ responses to either the screening or follow-up questions.
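The screening and follow-up logic for the branched sources can be sketched as follows. For illustration we assume the follow-up answer has been normalized to "Yes"/"No" (for 4b, a checked Depression box counts as "Yes" and an unchecked box as "No"), and that "I'm not sure" at the screening step is treated as missing; the function name is hypothetical.

```python
CASE, CONTROL = "case", "control"


def branched_status(screen, follow_up):
    """Case/control status for branched sources 4 and 5.

    screen: answer to 4a/5a ("Yes", "No", "I'm not sure", or None).
    follow_up: normalized answer to 4b/5b ("Yes", "No", or None if not asked).
    """
    if screen == "No":
        return CONTROL  # 'no' at the screening question
    if screen == "Yes" and follow_up == "Yes":
        return CASE
    if screen == "Yes" and follow_up == "No":
        return CONTROL  # 'no' at the follow-up question
    return None
```

This structure also explains the under-calling noted for source 4: a depressed participant who answers "No" to the broader mental-health screening question is never shown the depression follow-up and is classified as a control.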

As a result of the staging of the discovery and replication analyses, the discovery cohort did not include any responses from source 5, and the replication cohort consisted almost entirely of responses from sources 3 or 5.

In survey sources 1, 3, 4, and 5, we also asked for the age at first diagnosis of depression. These data were provided by a majority of participants, including 74% of cases in the discovery cohort and 85% of cases in the replication cohort.

We used Cohen’s kappa to assess agreement across responses to sources 1 to 5, taking advantage of participants who had responded to more than one of the survey data sources (Table 14).

Agreement was good in most comparisons (κ > 0.7), but was somewhat worse for comparisons with branched source 4 (κ between 0.5 and 0.7). Source 4 systematically under-called cases compared to the other sources, apparently due to the wording of the screening question. This tendency is partially mitigated in the logic for the combined phenotype, where we preferentially use responses to sources 1 to 3 if available.
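For reference, Cohen's kappa compares observed agreement p_o with the agreement p_e expected by chance from each source's marginal frequencies: κ = (p_o − p_e) / (1 − p_e). A minimal sketch, with a hypothetical function name, where each pair of ratings is one participant's answers to two sources:

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two paired lists of categorical responses."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1.0 - expected)
```

As a worked example, if 50 participants agree on 35 answers (p_o = 0.7) and the marginals imply p_e = 0.5, then κ = 0.2 / 0.5 = 0.4, well below the κ > 0.7 seen for most source pairs.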

The logic for composing the depression phenotype in this way was based on several considerations. For most participants (>95%), we have either just one response, or the available responses are all in agreement, so a deeper analysis of the mismatch data was unlikely to substantially affect downstream results. Our strategy of selecting one response per participant without regard for their other responses also seemed least likely to introduce bias in classification of participants who provided multiple responses.

Secondary Phenotypes

A set of common comorbidities of depression was defined, each based on responses to a single question, as follows:

  • Anxiety (Health Intake survey, unbranched, 2014–2015): “Have you ever been diagnosed with or treated for anxiety?” (Yes, No, I don’t know)

  • Panic attacks (Health Intake survey, unbranched, 2014–2015): “Have you ever been diagnosed with or treated for panic attacks?” (Yes, No, I don’t know)

  • Insomnia (Research Snippet, 2013–2016): “Do you routinely have trouble getting to sleep at night?” (Yes, No, I don’t know)

  • Taking an SSRI (Research Snippet, 2013–2016): “Are you currently taking an SSRI (selective serotonin reuptake inhibitor) for any reason?” (options: Yes, No, I don’t know)

  • Ever taken medication for a mental health condition; prescription sleep aids; or prescription pain medication (Health Intake survey, unbranched, 2014–2015): “Have you ever taken these medications?” “Medications to treat depression or anxiety or another mental health condition”, “Prescription sleep aids”, “Prescription pain medications” (checkbox for each category)

Overweight and obesity were defined as BMI > 27 and BMI > 30, respectively, with BMI computed from self-reported height and weight, which were collected using fill-in forms in multiple survey contexts.
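The BMI-based definitions amount to the following, assuming height and weight have already been converted to metres and kilograms (the text does not say which units participants entered). Function names are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def weight_categories(weight_kg, height_m):
    """Overweight (BMI > 27) and obese (BMI > 30) indicators."""
    value = bmi(weight_kg, height_m)
    return {"overweight": value > 27, "obese": value > 30}
```

Note that the categories are nested: anyone classified as obese is also classified as overweight.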

Associations with secondary phenotypes and age-of-onset

We computed genetic risk scores based on the 17 SNPs with P < 5×10⁻⁸ in the joint analysis of the 23andMe discovery, PGC, and 23andMe replication results, as a linear combination of independent single-SNP effect sizes estimated from that joint analysis (Supplementary Table 2). We tested each secondary phenotype for association with these scores in the combined 23andMe discovery and replication cohorts, and tested for effects on age of onset in depression cases only (Table 3). For age of onset, we defined “early onset” as onset before age 30 and fit this binary outcome by logistic regression; we also fit a model for continuous age of onset using linear regression. In all these tests, we included covariates for age, gender, five PCs, and depression case/control status, so that we were testing for residual association not explained by depression status; these associations are thus independent of the data used to identify the 17 variants. Separately, we tested each of the 17 SNPs individually for association with the same set of phenotypes, including the same covariates (Supplementary Table 7).
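A genetic risk score of the kind described is a weighted sum of risk-allele dosages, with each SNP weighted by its effect size (typically a log odds ratio). The sketch below uses placeholder inputs; the actual weights would be the joint-analysis estimates, which are not reproduced here. The early-onset helper mirrors the age-30 cutoff in the text. Both function names are hypothetical.

```python
def genetic_risk_score(dosages, effect_sizes):
    """Weighted sum of risk-allele dosages, one effect size per SNP."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size per SNP is required")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def early_onset(age_at_diagnosis, cutoff=30):
    """Binary outcome for cases: onset before the cutoff age."""
    return age_at_diagnosis < cutoff
```

For example, dosages `[0, 1, 2]` with illustrative weights `[0.1, 0.2, 0.3]` give a score of 0.8; the score then enters each regression as a single predictor alongside the covariates.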

DEPICT Functional Analysis

We used DEPICT¹⁸ to determine the most likely causal gene at each of the depression-associated loci, and to identify reconstituted gene sets enriched for these genes as well as tissues in which they are highly expressed. The reconstituted gene sets used in the analysis are derived from publicly available gene set annotations, which are integrated with 77,840 gene expression arrays³⁶ to predict which other genes are likely to be members of each gene set.

For the analysis, we selected SNPs associated with depression at P < 1×10⁻⁵. After clumping these SNPs using 500-kb flanking regions and an LD threshold of r² > 0.1, 63 independent SNP signals were identified from 816 top variants. These 63 SNPs were further merged into 59 non-overlapping loci containing 157 genes, which were then assessed using the DEPICT algorithm for gene set and tissue enrichment¹⁸. Results shown in Supplementary Table 3 are not corrected for multiple testing.
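The clumping step follows the usual greedy scheme, also implemented by tools such as PLINK's `--clump` (with `--clump-p1`, `--clump-kb`, and `--clump-r2`): take the most significant remaining SNP as an index, absorb every remaining SNP within 500 kb whose LD with it exceeds r² = 0.1, and repeat. The sketch below assumes that scheme and a caller-supplied `r2` lookup (for example, from a reference panel); it is not DEPICT's own code, and the merging of index SNPs into loci is not shown.

```python
def clump(snps, r2, window_bp=500_000, r2_threshold=0.1, p_threshold=1e-5):
    """Greedy LD clumping.

    snps: list of (snp_id, chrom, pos, p_value).
    r2: function (id1, id2) -> LD r^2 between two SNPs.
    Returns index SNP ids, most significant first.
    """
    candidates = sorted((s for s in snps if s[3] < p_threshold), key=lambda s: s[3])
    claimed = set()
    index_snps = []
    for snp_id, chrom, pos, _ in candidates:
        if snp_id in claimed:
            continue  # already absorbed by a more significant index SNP
        index_snps.append(snp_id)
        claimed.add(snp_id)
        for other_id, other_chrom, other_pos, _ in candidates:
            if (other_id not in claimed and other_chrom == chrom
                    and abs(other_pos - pos) <= window_bp
                    and r2(snp_id, other_id) > r2_threshold):
                claimed.add(other_id)
    return index_snps
```

Each returned index SNP stands for one approximately independent signal, which is how 816 suggestive variants can collapse to 63 signals.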

Supplementary Material

1
supp_table12
supp_tables

URL list.

Acknowledgments

We would like to thank the research participants and employees of 23andMe for making this work possible. The authors thank the investigators and patient participants of the Psychiatric Genomics Consortium Major Depressive Disorder group for making the PGC-MDD phase 1 results available for download. Dr. Perlis is supported in part by the National Institute of Mental Health and the National Human Genome Research Institute (P50 MH106933). We also thank the Social Science Genetic Association Consortium (SSGAC) for sharing results for Subjective Well-being, Depressive Symptoms, and Neuroticism.

Financial Disclosures

RHP has served on scientific advisory boards for or consulted to Genomind, Healthrageous, Perfect Health, Proteus Biomedical, Psybrain, and RID Ventures, and receives royalties through Massachusetts General Hospital from Concordant Rater Systems (now Bracket). CH, XC, MWN, and SAP are employees and stockholders of Pfizer, Inc. CT, DAH, and JYT are employees of and own stock or stock options in 23andMe, Inc. ARW is a former employee and stockholder of Pfizer, Inc., and a current employee of the Perelman School of Medicine at the University of Pennsylvania Orphan Disease Center in partnership with the Loulou Foundation. JRW is a former employee and stockholder of Pfizer, Inc., and a current employee and stockholder of Nestlé Health Science.

Footnotes

Author Contributions

ARW, CLH, and JRW conceived the meta-analysis and statistical analysis. ARW, CLH, and RHP oversaw dataset analysis and primary data interpretation. CLH designed and performed the meta-analysis and further statistical analysis of the three datasets. XC provided statistical support and data visualization for the meta-analysis. MWN provided the DEPICT functional annotation and LD score regression analyses. RHP, ARW, and CLH wrote the manuscript. ARW, RHP, CLH, DAH, SAP, and MWN provided data interpretation and revised the manuscript. JYT and DAH conceived and designed the 23andMe MDD GWAS. DAH and CT performed the GWAS of the 23andMe datasets and provided statistical support.

References

  • 1. Angst F, Stassen HH, Clayton PJ, Angst J. Mortality of patients with mood disorders: follow-up over 34–38 years. J Affect Disord. 2002;68:167–81. doi: 10.1016/s0165-0327(01)00377-9.
  • 2. Lopez AD, Mathers CD, Ezzati M, Jamison DT, Murray CJ. Global and regional burden of disease and risk factors, 2001: systematic analysis of population health data. Lancet. 2006;367:1747–57. doi: 10.1016/S0140-6736(06)68770-9.
  • 3. Wittchen HU, et al. The size and burden of mental disorders and other disorders of the brain in Europe 2010. Eur Neuropsychopharmacol. 2011;21:655–79. doi: 10.1016/j.euroneuro.2011.07.018.
  • 4. Lichtenstein P, et al. Recurrence risks for schizophrenia in a Swedish national cohort. Psychol Med. 2006;36:1417–25. doi: 10.1017/S0033291706008385.
  • 5. Sullivan PF, Kendler KS, Neale MC. Schizophrenia as a complex trait: evidence from a meta-analysis of twin studies. Arch Gen Psychiatry. 2003;60:1187–92. doi: 10.1001/archpsyc.60.12.1187.
  • 6. CONVERGE consortium. Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature. 2015;523:588–91. doi: 10.1038/nature14659.
  • 7. Hawrylycz MJ, et al. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature. 2012;489:391–9. doi: 10.1038/nature11405.
  • 8. Le Meur N, et al. MEF2C haploinsufficiency caused by either microdeletion of the 5q14.3 region or mutation is responsible for severe mental retardation with stereotypic movements, epilepsy and/or cerebral malformations. J Med Genet. 2010;47:22–9. doi: 10.1136/jmg.2009.069732.
  • 9. Paciorkowski AR, et al. MEF2C Haploinsufficiency features consistent hyperkinesis, variable epilepsy, and has a role in dorsal and ventral neuronal developmental pathways. Neurogenetics. 2013;14:99–111. doi: 10.1007/s10048-013-0356-y.
  • 10. Barbosa AC, et al. MEF2C, a transcription factor that facilitates learning and memory by negative regulation of synapse numbers and function. Proc Natl Acad Sci U S A. 2008;105:9391–6. doi: 10.1073/pnas.0802679105.
  • 11. Wilkinson MB, et al. Imipramine treatment and resiliency exhibit similar chromatin regulation in the mouse nucleus accumbens in depression models. J Neurosci. 2009;29:7820–32. doi: 10.1523/JNEUROSCI.0932-09.2009.
  • 12. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511:421–7. doi: 10.1038/nature13595.
  • 13. Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet. 2013;381:1371–9. doi: 10.1016/S0140-6736(12)62129-1.
  • 14. Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium, et al. A mega-analysis of genome-wide association studies for major depressive disorder. Mol Psychiatry. 2013;18:497–511. doi: 10.1038/mp.2012.21.
  • 15. Whitehead A. Meta-analysis of controlled clinical trials. John Wiley & Sons; Chichester; New York: 2002. p. xiv, 336.
  • 16. Agoston Z, et al. Meis2 is a Pax6 co-factor in neurogenesis and dopaminergic periglomerular fate specification in the adult olfactory bulb. Development. 2014;141:28–38. doi: 10.1242/dev.097295.
  • 17. Agoston Z, Li N, Haslinger A, Wizenmann A, Schulte D. Genetic and physical interaction of Meis2, Pax3 and Pax7 during dorsal midbrain development. BMC Dev Biol. 2012;12:10. doi: 10.1186/1471-213X-12-10.
  • 18. Pers TH, et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat Commun. 2015;6:5890. doi: 10.1038/ncomms6890.
  • 19. Purcell S, Cherny SS, Sham PC. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics. 2003;19:149–50. doi: 10.1093/bioinformatics/19.1.149.
  • 20. Bulik-Sullivan BK, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47:291–5. doi: 10.1038/ng.3211.
  • 21. Okbay A, et al. Genetic Associations with Subjective Well-Being Also Implicate Depression and Neuroticism. bioRxiv. 2015:032789.
  • 22. Psychiatric GWAS Consortium Bipolar Disorder Working Group. Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat Genet. 2011;43:977–83. doi: 10.1038/ng.943.
  • 23. Keller MB, et al. Results of the DSM-IV mood disorders field trial. Am J Psychiatry. 1995;152:843–9. doi: 10.1176/ajp.152.6.843.
  • 24. Regier DA, et al. DSM-5 field trials in the United States and Canada, Part II: test-retest reliability of selected categorical diagnoses. Am J Psychiatry. 2013;170:59–70. doi: 10.1176/appi.ajp.2012.12070999.
  • 25. Wisniewski SR, et al. Can phase III trial results of antidepressant medications be generalized to clinical practice? A STAR*D report. Am J Psychiatry. 2009;166:599–607. doi: 10.1176/appi.ajp.2008.08071027.
  • 26. Castro VM, et al. Validation of electronic health record phenotyping of bipolar disorder cases and controls. Am J Psychiatry. 2015;172:363–72. doi: 10.1176/appi.ajp.2014.14030423.

Methods-only References

  • 27. Eriksson N, et al. Web-based, participant-driven studies yield novel genetic associations for common traits. PLoS Genet. 2010;6:e1000993. doi: 10.1371/journal.pgen.1000993.
  • 28. Tung JY, et al. Efficient replication of over 180 genetic associations with self-reported medical data. PLoS One. 2011;6:e23473. doi: 10.1371/journal.pone.0023473.
  • 29. 1000 Genomes Project Consortium, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
  • 30. Browning SR, Browning BL. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 2007;81:1084–97. doi: 10.1086/521987.
  • 31. Fuchsberger C, Abecasis GR, Hinds DA. minimac2: faster genotype imputation. Bioinformatics. 2015;31:782–4. doi: 10.1093/bioinformatics/btu704.
  • 32. Durand EY, Do CB, Mountain JL, Macpherson JM. Ancestry composition: a novel, efficient pipeline for ancestry deconvolution. bioRxiv. 2014.
  • 33. Henn BM, et al. Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PLoS One. 2012;7:e34267. doi: 10.1371/journal.pone.0034267.
  • 34. Lehoucq RB, Sorensen DC, Yang C. ARPACK users’ guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods. SIAM; Philadelphia: 1998. p. xv, 142.
  • 35. Tucker G, Price AL, Berger B. Improving the power of GWAS and avoiding confounding from population stratification with PC-Select. Genetics. 2014;197:1045–9. doi: 10.1534/genetics.114.164285.
  • 36. Fehrmann RS, et al. Gene expression analysis identifies global gene dosage sensitivity in cancer. Nat Genet. 2015;47:115–25. doi: 10.1038/ng.3173.
