Author manuscript; available in PMC: 2019 Jul 14.
Published in final edited form as: Nat Genet. 2019 Jan 14;51(2):237–244. doi: 10.1038/s41588-018-0307-5

Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use

Mengzhen Liu 1,#, Yu Jiang 2,3,#, Robbee Wedow 4,5,6,#, Yue Li 7,8,#, David M Brazel 4,9,10, Fang Chen 2,3, Gargi Datta 1, Jose Davila-Velderrain 7,8, Daniel McGuire 2,3, Chao Tian 11, Xiaowei Zhan 12,13; 23andMe Research Team14; HUNT All-In Psychiatry14, Hélène Choquet 15, Anna R Docherty 16,17, Jessica D Faul 18, Johanna R Foerster 19, Lars G Fritsche 19, Maiken Elvestad Gabrielsen 20, Scott D Gordon 21, Jeffrey Haessler 22, Jouke-Jan Hottenga 23, Hongyan Huang 24,25, Seon-Kyeong Jang 1, Philip R Jansen 26,27, Yueh Ling 2,9, Reedik Mägi 28, Nana Matoba 29, George McMahon 30, Antonella Mulas 31, Valeria Orrù 31, Teemu Palviainen 32, Anita Pandit 19, Gunnar W Reginsson 33, Anne Heidi Skogholt 20, Jennifer A Smith 18,34, Amy E Taylor 30, Constance Turman 24,25, Gonneke Willemsen 23, Hannah Young 1, Kendra A Young 35, Gregory J M Zajac 19, Wei Zhao 34, Wei Zhou 36, Gyda Bjornsdottir 33, Jason D Boardman 4,5,6, Michael Boehnke 19, Dorret I Boomsma 23, Chu Chen 22, Francesco Cucca 31, Gareth E Davies 37, Charles B Eaton 38, Marissa A Ehringer 4,39, Tõnu Esko 8,28, Edoardo Fiorillo 31, Nathan A Gillespie 16,21, Daniel F Gudbjartsson 33,40, Toomas Haller 28, Kathleen Mullan Harris 41,42, Andrew C Heath 43, John K Hewitt 4,44, Ian B Hickie 45, John E Hokanson 35, Christian J Hopfer 4,46, David J Hunter 24,25,47, William G Iacono 1, Eric O Johnson 48, Yoichiro Kamatani 29, Sharon L R Kardia 34, Matthew C Keller 4,44, Manolis Kellis 7,8, Charles Kooperberg 22, Peter Kraft 24,25,49, Kenneth S Krauter 4,9, Markku Laakso 50,51, Penelope A Lind 52, Anu Loukola 32, Sharon M Lutz 53, Pamela A F Madden 43, Nicholas G Martin 21, Matt McGue 1, Matthew B McQueen 4,39, Sarah E Medland 52, Andres Metspalu 28, Karen L Mohlke 54, Jonas B Nielsen 55, Yukinori Okada 29,56, Ulrike Peters 22,57, Tinca J C Polderman 26, Danielle Posthuma 26,58, Alexander P Reiner 22,57, John P Rice 59, Eric Rimm 25,60, Richard J Rose 61, Valgerdur Runarsdottir 62, Michael C Stallings 4,44, Alena Stančáková 
50, Hreinn Stefansson 33, Khanh K Thai 15, Hilary A Tindle 63, Thorarinn Tyrfingsson 62, Tamara L Wall 64, David R Weir 18, Constance Weisner 15, John B Whitfield 21, Bendik Slagsvold Winsvold 65, Jie Yin 15, Luisa Zuccolo 30,66, Laura J Bierut 59, Kristian Hveem 20,67,68, James J Lee 1, Marcus R Munafo 66,69, Nancy L Saccone 70, Cristen J Willer 36,55,71, Marilyn C Cornelis 72, Sean P David 73, David Hinds 12, Eric Jorgenson 15, Jaakko Kaprio 32,74, Jerry A Stitzel 4,39, Kari Stefansson 33,75, Thorgeir E Thorgeirsson 33, Goncalo Abecasis 19, Dajiang J Liu 2,3,*, Scott Vrieze 1,*
PMCID: PMC6358542  NIHMSID: NIHMS1511852  PMID: 30643251

Abstract

Tobacco and alcohol use are leading causes of mortality that influence risk for many complex diseases and disorders1. They are heritable2,3 and etiologically related4,5 behaviors that have been resistant to gene discovery efforts6–11. In sample sizes up to 1.2 million individuals, we discovered 566 genetic variants in 406 loci associated with multiple stages of tobacco use (initiation, cessation, and heaviness) as well as alcohol use, with 150 loci evidencing pleiotropic association. Smoking phenotypes were positively genetically correlated with many health conditions, whereas alcohol use was negatively correlated with these conditions, such that increased genetic risk for alcohol use is associated with lower disease risk. We report evidence for the involvement of many systems in tobacco and alcohol use, including genes involved in nicotinic, dopaminergic, and glutamatergic neurotransmission. The results provide a solid starting point to evaluate the effects of these loci in model organisms and more precise substance use measures.


An analysis overview is provided in Supplementary Figure 1; all independent associated variants are in Supplementary Tables 1–5; and Quantile-Quantile (QQ), Manhattan, and LocusZoom plots are shown in Supplementary Figures 2–12. Smoking initiation phenotypes included age of initiation of regular smoking (AgeSmk; N=341,427; 10 associated variants) and a binary phenotype indicating whether an individual had ever smoked regularly (SmkInit, N=1,232,091; 378 associated variants). Heaviness of smoking was measured with cigarettes per day (CigDay; N=337,334; 55 associated variants). Smoking cessation (SmkCes, N=547,219; 24 associated variants) was a binary variable contrasting current versus former smokers. Available measures of alcohol use were simpler, with drinks per week (DrnkWk; N=941,280; 99 associated variants) widely available and similarly measured across studies. See the Supplementary Note and Supplementary Tables 6–7 for phenotype definition details.

The four smoking phenotypes were genetically correlated with one another (Figure 1; Supplementary Table 8). DrnkWk was not highly genetically correlated with the smoking phenotypes (rg~.10) except for SmkInit (rg~.34, p=6.7×10−63), suggesting that sequence variants affecting alcohol use and those affecting initiation of smoking overlap substantially. The phenotypes were highly genetically correlated across constituent studies (Supplementary Table 9), suggesting minor impact of phenotypic heterogeneity in the present results, even across Western Europe and the United States. Smoking phenotypes were genetically correlated in expected directions with many behavioral, psychiatric, and medical phenotypes (Figure 1, Supplementary Table 10). Genetic variation associated with increased alcohol use was associated with greater levels of risky behavior (rg=.20, p=1.8×10−7) and cannabis use (rg=.36, p=6.2×10−10), but with less risk of disease, for almost all diseases (Figure 1, Supplementary Table 10).

Figure 1. Genetic correlations between substance use phenotypes and phenotypes from other large genome-wide association studies.


Genetic correlations between each of the phenotypes are shown in the first 5 rows, with heritability estimates displayed down the diagonal. All genetic correlations and heritability estimates were calculated using LD Score Regression. Blue shading represents negative genetic correlations, and red shading represents positive correlations, with increasing color intensity reflecting increasing strength of a correlation. A single asterisk reflects significant genetic correlations at the p<.05 level. Double asterisks reflect significant genetic correlations at the Bonferroni-correction p<.000278 level (corrected for 180 independent tests). Note that SmkCes was oriented such that higher scores reflected current smoking, and for AgeSmk lower scores reflect earlier ages of initiation, both of which are typically associated with negative outcomes. AgeSmk=Age of Initiation of Smoking; CigDay=Cigarettes per Day; SmkInit=Smoking Initiation; SmkCes=Smoking Cessation; DrnkWk=Drinks per Week.

Using a novel method to evaluate multivariate genetic correlation at the locus (versus global) level, we observed 150 loci that affected multiple substance use phenotypes (Supplementary Table 11), summarized in Figure 2. Patterns of pleiotropy across phenotypes were highly diverse, with only three loci significantly associated with all five phenotypes. These three loci included associations implicating Phosphodiesterase 4B (PDE4B) and Cullin 3 (CUL3). PDE4B regulates the availability of the second messenger cAMP and thereby affects signal transduction, and is down-regulated by chronic nicotine administration in rats12. CUL3 has wide-ranging effects, including on ubiquitination and protein degradation, and de novo mutations in CUL3 are associated with rare diseases affecting response to the mineralocorticoid aldosterone13, which itself is affected by smoking14 and associated with alcohol use15. In addition to testing for pleiotropy, we also used MTAG16 to leverage the observed genetic correlations to increase power for locus discovery. Using this method, we discovered 1,193 independent, genome-wide significantly associated common variants (MAF > 1%; 173 for AgeSmk, 89 CigDay, 83 SmkCes, 692 SmkInit, and 156 DrnkWk) listed in Supplementary Table 12 and described further in the supplement.

Figure 2. Pleiotropy.


Depicted here are results from the multivariate analysis of pleiotropy. For each locus, the method returns the best fitting solution of which phenotypes were associated with that locus. All loci with one or more associated phenotypes are shown here. For example, every locus associated with AgeSmk was found to be pleiotropic for other phenotypes (green, blue, red, purple, and fuchsia bars), and no locus showed association with only AgeSmk (no dark grey bar for AgeSmk). When sample sizes are unequal across phenotypes, the method also improves power for those phenotypes with smaller samples. The total number of loci associated with each trait (whether pleiotropic or not) from these analyses was 40 (AgeSmk), 48 (SmkCes), 72 (CigDay), 111 (DrnkWk), and 278 (SmkInit). Full information is in Supplementary Table 11.

Phenotypic variation accounted for by our initial 566 conditionally independent genome-wide significant variants from the initial GWAS ranged from 0.1% (SmkCes) to 2.3% (SmkInit; see Figure 3). SNP heritability calculated using LD Score Regression17 ranged from 4.2% for DrnkWk to 8.0% for CigDay (Figure 3; Supplementary Table 13), consistent with estimates using individual-level data18, SNP heritabilities calculated from the largest individual contributing studies (Supplementary Table 13), and prior work19. The results suggest that these phenotypes are highly polygenic and the majority of the heritability is accounted for by variants below standard GWAS thresholds.

Figure 3. Heritability and polygenic prediction.


The light gray bars reflect SNP heritability, estimated with LD Score Regression. The light blue and gold bars reflect the predictive power of polygenic risk scores in Add Health and the Health and Retirement Study (HRS), respectively. Despite the 41-year generational gap between participants from these two studies, and major tobacco-related policy changes during that time, the polygenic scores are similarly predictive in both samples. Error bars are 95% confidence intervals estimated with 1000 bootstrapped repetitions. Dark gray bars represent the total phenotypic variance explained by only genome-wide significant SNPs. H2=heritability.

To further investigate the polygenicity, polygenic risk scores (Supplementary Table 14) were computed on the Add Health20 and Health and Retirement Study21 datasets, which are representative of their birth cohorts in the United States, and represent exposures to different tobacco policy environments. Add Health participants were born, on average, in 1979; average birth year in the Health and Retirement Study was 1938. Despite these generational differences, the polygenic score performed similarly in both samples. It accounted for approximately 1%, 4%, 1%, 4%, and 2.5% of variance in AgeSmk, CigDay, SmkCes, SmkInit, and DrnkWk, respectively, about half of the estimated SNP heritability of these traits (Figure 3). More concretely, in Add Health and the Health and Retirement study, respectively, a one SD increase in the CigDay risk score resulted in two and three additional daily cigarettes; a one SD increase on the SmkInit risk score resulted in a 12% and 10% increased risk of regularly smoking; and a one SD increase on the DrnkWk risk score reflected one additional drink per week in both datasets.
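As a rough illustration of what a polygenic score is, the sketch below computes a plain additive score (allele dosage times per-variant weight) and rescales scores to SD units. It is not the scoring pipeline used for Add Health or the Health and Retirement Study, whose details are in Supplementary Table 14 and the Supplementary Note; the variant names, weights, and dosages are hypothetical.

```python
from typing import Dict, List


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Additive score: sum of effect-allele dosage (0..2) times per-variant weight.

    Variants missing from either input are skipped.
    """
    shared = dosages.keys() & weights.keys()
    return sum(dosages[v] * weights[v] for v in shared)


def standardize(scores: List[float]) -> List[float]:
    """Rescale raw scores to SD units so 'a one SD increase' has a meaning."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / (n - 1)) ** 0.5
    return [(s - mean) / sd for s in scores]


# Hypothetical weights and dosages for three made-up variants.
weights = {"rsA": 0.075, "rsB": -0.014, "rsC": -0.012}
person = {"rsA": 2.0, "rsB": 1.0, "rsC": 0.0}
score = polygenic_score(person, weights)
```

In practice the weights would come from GWAS summary statistics estimated in samples that exclude the prediction cohort, so that the score is evaluated out of sample.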

Cell/tissue enrichment22 was observed across all five phenotypes within core histone marks from multiple central nervous system (CNS) tissues (Supplementary Figures 13–15, Supplementary Tables 15–16). Enrichment was observed in tissues from cortical and sub-cortical regions in the CNS. Structure and function of these regions have been robustly associated with individual differences in frequencies, magnitudes, and clinical characteristics of alcohol use, and substance use/misuse generally, in human imaging research. Our results include significant enrichment across phenotypes and histone marks in the hippocampus23, inferior temporal pathways24, dorsolateral and medial prefrontal cortex25, caudate, and striatum26. Consistent with gene and pathway findings described below, alcohol and nicotine use affect dopaminergic and glutamatergic neurotransmission among these brain regions, compromising reward-based learning and facilitating drug seeking behavior26. Enrichment within other cell/tissue groups and specific cell/tissue types included immune and liver cells but were less consistent across analytical approaches.

We manually reviewed all genes implicated by the GWAS or gene-based tests (see Supplementary Tables 1–5 for the full catalogue of implicated genes; Supplementary Tables 17–21 for gene and gene-set test results). We replicated known associations between multiple variants in nicotine metabolism gene CYP2A6 with CigDay (p=4.0×10−99) and SmkCes (p=1.6×10−48). We replicated an association signal in alcohol metabolism gene ADH1B associated with DrnkWk, identifying in that locus 11 conditionally independently associated variants (lowest p<2.2×10−303).

All drugs of abuse activate the mesolimbic dopamine system reward pathway27, and dopamine-related genes have long been popular candidate genes. We found that variants near the widely studied dopamine receptor D2 (DRD2)28 were associated across phenotypes, including CigDay, SmkCes, and DrnkWk (p=6.5×10−12, 1.1×10−10, and 4.9×10−11, respectively) but not with AgeSmk or SmkInit, suggesting that these variants are less relevant in early stages of nicotine use. Other specific dopamine-related genes only showed associations with smoking phenotypes, including multiple associations between CigDay and SmkCes with dopamine beta-hydroxylase (DBH, p=9.8×10−24 and 1.2×10−35, respectively)9, an enzyme necessary to convert dopamine to norepinephrine. SmkInit was associated with variation near protein phosphatase 1 regulatory subunit 1B (PPP1R1B, p=3.9×10−8), a signal transduction gene that affects synaptic plasticity and reward-based learning in the striatum29,30 and contributes to the behavioral effects of nicotine in mice31. In pathway analyses, dopamine gene sets were enriched only in SmkInit, where the exemplar ‘reactome dopamine neurotransmitter release cycle’ pathway was enriched (p=9.2×10−5; Figure 4; Supplementary Table 18).

Figure 4. Correlations among exemplary DEPICT gene sets.


There were 68 clusters available for Smoking initiation and 10 for Drinks Per Week (CigDay, AgeSmk, and SmkCes did not have > 1 exemplary sets.) Blue shading represents positive correlations, and red shading represents negative correlations, with increasing color intensity reflecting increasing strength of a correlation. Cluster names are truncated for space, with a full list of all names in Supplementary Table 18. The number after each name is the number of gene sets in each cluster. The matrix naturally falls into three blue superclusters along the diagonal. The largest supercluster contains primarily gene sets related to neurotransmitter receptors, ion channels (sodium, potassium, calcium), learning/memory, and other aspects of CNS function. The middle supercluster includes gene sets defined by regulation of transcription and translation, including RNA binding and transcription factor activity. The final supercluster is composed primarily of gene sets related to development of the nervous system.

Neuronal acetylcholine nicotinic receptors are the initial site of nicotine action in the brain and have long been implicated in nicotine use and dependence32. With the exception of CHRNA7, all CNS-expressed nicotinic receptor genes were significantly associated with one or more smoking phenotypes, many reported here for the first time. Enrichment was also noted for nicotinic receptor-related pathways and genes in smoking phenotypes (Supplementary Tables 17–21). There was no evidence of association between nicotinic receptor genes or pathways with DrnkWk, despite the use of nicotinic receptor partial agonists (e.g., varenicline) in the treatment of alcohol dependence33.

Associations with SmkInit highlighted structures and functions related to long-term potentiation and reward-related learning and memory, systems that affect reward processing and addiction28,34,35. Glutamate is an important neurotransmitter mediating these processes, and exemplar pathways related to glutamate were significantly enriched in SmkInit (e.g., ‘extracellular-glutamate-gated ion channel’, p=9.9×10−7; ‘post-NMDA receptor activation events’, p=5.5×10−5; and ‘DLG4 PPI subnetwork’, p=4.5×10−12; Supplementary Table 18). DLG4 affects NMDA receptors and potassium channel clusters, and plays a central role in glutamatergic models of reward-related learning35. Individual associated genes related to these pathways included glutamate ionotropic receptor NMDA type subunit 2A (GRIN2A, p=3.4×10−11) and homer scaffolding protein 2 (HOMER2, p=3.1×10−14), which affects addictive behavior in mice35,36 and regulates glutamate metabotropic receptor 1 (GRM1). Pathways enriched in SmkInit also included sodium, potassium, and calcium voltage-gated channels (Figure 4, Supplementary Table 18), essential to neuronal excitability and signaling.

Alcohol is known to affect glutamatergic signaling pathways37, and over half of the enriched pathways for DrnkWk clustered within the exemplar ‘glutamate ionotropic receptor kainate type subunit 2 (GRIK2) PPI subnetwork’ (Figure 4, Supplementary Table 18). Not all DrnkWk-enriched pathways involved the brain, however, as glucose and carbohydrate processing pathways were associated with DrnkWk but no smoking phenotype, perhaps suggesting that alcohol consumption is influenced by individual differences in one’s ability to process calorie-rich alcoholic beverages. Finally, we discovered variation in and around gene rich regions including corticotropin releasing hormone receptor 1 (CRHR1; p=1.6×10−17) and urocortin (UCN; p=8.1×10−45), associated with DrnkWk but not smoking. UCN encodes an endogenous ligand for CRHR1 and CRHR238. CRH affects hormones involved in the stress response, including cortisol, and has been associated with the stress response and relapse to drug taking in animals39,40.

Specific mechanisms by which implicated genes influence substance use in humans are largely unknown, even for those genes reported above involving systems such as neurotransmission, reward-related learning and memory, and the stress response. To prioritize genes for functional experimentation, we tabulated conditionally independent genome-wide significant nonsynonymous variants (Table 1). In the 406 GWAS loci, 4% of sentinel variants were nonsynonymous, representing a significant enrichment (p=2.5×10−10; 0.4% of variants with MAF>0.1% in the imputation panel41 were nonsynonymous). Several genes in Table 1 have been previously associated with substance use/addiction (see Supplementary Table 22 for a list of previous associations), and two variants have been functionally validated (rs1229984 and rs16969968)42,43. The others have not, but in some cases their genes interact with established molecular targets of addiction and may themselves be suitable targets for further investigation. For example, rs1024323 in G protein-coupled receptor (GPCR) kinase 4 (GRK4) was associated with CigDay (p=8.7×10−9) and lies within a locus associated with AgeSmk. GRK4 is involved in the regulation of GPCRs including metabotropic glutamate receptor 1 (GRM1)44, GABAB receptors45, and dopamine receptor D1 (DRD1) and D3 (DRD3) in the kidneys and cerebellum, and is involved in essential hypertension46. GRK4 is also expressed in the midbrain and forebrain46,47, but no research has evaluated its impact on substance use behavior. To take one more example, the nonsynonymous variant in SLC39A8 affects zinc and manganese transport, is highly pleiotropic for complex phenotypes, and may impair inflammation, glutamatergic neurotransmission, and regulation of various metals in the body48.
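The enrichment comparison above (about 4% of sentinel variants versus a 0.4% background rate) can be checked with a one-sided exact binomial tail. The text does not say which test produced the reported p-value, so this is only one plausible check; the count of nonsynonymous sentinels is assumed here to be 4% of 406 rounded to a whole number, not a figure taken from the paper.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly over the upper tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_loci = 406                      # number of GWAS loci reported in the text
background_rate = 0.004           # 0.4% of panel variants are nonsynonymous
observed = round(0.04 * n_loci)   # ASSUMPTION: ~4% of sentinels, rounded

p_enrich = binomial_upper_tail(observed, n_loci, background_rate)
```

With an expected count of fewer than two nonsynonymous sentinels under the background rate, any count in the mid-teens gives a very small tail probability, in line with the strong enrichment reported.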

Table 1. Nonsynonymous sentinel variants.

The sentinel variant in approximately 4% of loci was nonsynonymous. Shown here are all nonsynonymous sentinel variants, and all nonsynonymous variants in near-perfect LD with a sentinel variant. If the listed gene was also associated (through single variant or gene-based test) with another phenotype, that phenotype is listed in parentheses. Several genes have been implicated in previous studies of substance use/addiction, including CHRNA5, BDNF, GCKR, and ADH1B.

Phenotype Gene rsID Chr Position REF ALT AF Beta p N Q
CigDay (SmkCes) CHRNA5 rs16969968a 15 78,882,925 G A .34 .075 1.2×10−278 330,721 .34
CigDay HIST1H2BE rs7766641 6 26,184,102 G A .27 −.014 2.9×10−10 335,553 .78
CigDay (AgeSmk) GRK4 rs1024323 4 3,006,043 C T .38 −.012 8.7×10−9 337,334 .17
SmkInit REV3L rs462779a 6 111,695,887 G A .81 −.019 4.5×10−29 1,232,091 .67
SmkInit (DrnkWk) BDNF rs6265 11 27,679,916 C T .20 −.016 2.8×10−19 1,232,091 .13
SmkInit RHOT2 rs1139897 16 720,986 G A .23 −.012 1.8×10−15 1,232,091 .61
SmkInit (DrnkWk) ZNF789 rs6962772a 7 99,081,730 A G .15 −.015 2.1×10−14 1,232,091 .92
SmkInit BRWD1 rs4818005a 21 40,574,305 A G .58 −.010 3.9×10−14 1,232,091 .75
SmkInit ENTPD6 rs6050446 20 25,195,509 A G .97 .035 8.8×10−13 1,225,969 .33
SmkInit RPS6KA4 rs17857342a 11 64,138,905 T G .38 −.010 9.8×10−12 1,232,091 .16
SmkInit FAM163A rs147052174 1 179,783,167 G T .02 .037 2.3×10−10 1,232,091 .59
SmkInit PRRC2B rs34553878 9 134,907,263 A G .11 .016 1.2×10−9 1,232,091 .28
SmkInit ADAM15 rs45444697a 1 155,033,918 C T .21 .010 5.3×10−9 1,232,091 .46
SmkInit MMS22L rs9481410a 6 97,677,118 G A .76 .010 1.1×10−8 1,232,091 .04
SmkInit QSER1 rs62618693 11 32,956,492 C T .04 −.020 2.1×10−8 1,232,091 1.00
DrnkWk ADH1B rs1229984 4 100,239,319 T C .96 .060 2.2×10−308 941,280 .05
DrnkWk GCKR rs1260326 2 27,730,940 T C .60 .008 8.1×10−45 941,280 .10
DrnkWk SLC39A8 rs13107325 4 103,188,709 C T .07 −.009 1.5×10−22 941,280 .33
DrnkWk SERPINA1 rs28929474 14 94,844,947 C T .02 −.012 1.3×10−11 941,280 .50
DrnkWk (SmkInit) ACTR1B rs11692465 2 98,275,354 G A .09 .008 2.5×10−11 937,516 .40
DrnkWk TNFSF12–13 rs3803800 17 7,462,969 A G .79 .004 1.5×10−10 941,280 .67
DrnkWk HGFAC rs3748034 4 3,446,091 G T .14 −.005 1.7×10−8 941,280 .65

Note: Phenotype abbreviations are defined in Figure 1. Chr=Chromosome; REF=reference allele; ALT=alternate allele; AF=allele frequency of ALT allele; Q=Cochran’s Q statistic p-value.

a These variants were not themselves sentinel, but were in near-perfect LD with a sentinel variant (R2 >.99, from the 1000 Genomes European population). The scale of Beta is on the unit of the standard deviation of the phenotype. For binary phenotypes the standard deviation was calculated from the weighted average prevalence across all studies included in the meta-analysis (available in Supplementary Table 7).
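Because Beta in Table 1 is expressed in phenotypic SD units, the variance explained by a single variant can be approximated from the AF and Beta columns. The sketch below uses the standard additive-model approximation 2p(1−p)β², which assumes Hardy-Weinberg proportions; it is an illustration, not necessarily how the variance-explained figures in this paper were derived.

```python
def variance_explained(alt_freq: float, beta_sd: float) -> float:
    """Approximate fraction of phenotypic variance explained by one variant.

    Uses 2 * p * (1 - p) * beta^2, with beta in SD units of the phenotype
    and genotypes assumed to follow Hardy-Weinberg proportions.
    """
    return 2.0 * alt_freq * (1.0 - alt_freq) * beta_sd ** 2


# Table 1 row for CHRNA5 rs16969968 and CigDay: AF = .34, Beta = .075.
r2_chrna5 = variance_explained(0.34, 0.075)  # about 0.0025, i.e. ~0.25%
```

Even this variant, the strongest CigDay signal in the table, accounts for only about a quarter of one percent of variance under this approximation, which fits the highly polygenic picture described in the main text.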

Ultimately, substance use is embedded in a complex web of causal relations49 (e.g., Figure 1), and caution must be exercised in drawing strong causal conclusions. However, the present findings represent a major step forward in understanding the etiology of these complex, disease-relevant behaviors. In particular, statistical and interpretive power were both enabled by simultaneously studying multiple related substance use behaviors representing different stages of use and substances. More precise measurements, including evaluating age and environment as moderators for these dynamic phenotypes50, functional research, and complementary gene mapping approaches (e.g., sequencing) will aid in the discovery of mechanisms by which implicated genes may affect substance use and related disease risk.

METHODS

This article is accompanied by a Supplementary Note with further details, as well as the Life Sciences Reporting Summary.

Generation of summary statistics.

Participants in all studies were genotyped on genome-wide arrays. The majority of studies imputed their genotypes to the Haplotype Reference Consortium41 using the University of Michigan Imputation Server (see URLs)51. Several studies did not impute using the imputation server, due to data sharing restrictions, computational, and/or resource limitations (described in the Supplementary Note). All studies used either Minimac351 or IMPUTE252 for imputation.

GWAS summary statistics were generated in each study sample using RVTESTS53 according to a standard analysis plan. Studies composed primarily of classically related individuals (e.g., family studies) first regressed out covariates including genetic principal components under a linear model, inverse-normalized the residuals (except for 23andMe), and tested for an additive effect of each variant under a linear mixed model with a genetic kinship matrix. Family studies followed this analysis for all phenotypes, even binary phenotypes such as smoking initiation and cessation. Studies of entirely classically unrelated individuals followed the same analysis for quasi-continuous phenotypes (AgeSmk, CigDay, DrnkWk), but estimated additive genetic effects under a logistic model for binary phenotypes (SmkInit and SmkCes).
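The inverse-normalization step applied to covariate-adjusted residuals can be sketched as a rank-based transform. The analysis plan's exact rank offset and tie handling are not given in this section, so the Blom offset (0.375) and average ranks for ties used below are assumptions, and the input is taken to be residuals that have already been computed.

```python
from statistics import NormalDist
from typing import List, Sequence


def inverse_normalize(values: Sequence[float], offset: float = 0.375) -> List[float]:
    """Rank-based inverse-normal transform.

    Each value is replaced by the standard normal quantile of
    (rank - offset) / (n - 2 * offset + 1). Tied values share the
    average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


# Hypothetical residuals from a covariate regression.
transformed = inverse_normalize([10.0, 30.0, 20.0])
```

The transform preserves the ordering of individuals while forcing the trait distribution toward a standard normal, which keeps effect sizes on a comparable scale across studies.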

Quality control checks were applied to ensure quality of both the phenotypes and genotypes. For each phenotype and covariate, distribution statistics including the minimum, maximum, quartiles, median, mean, and standard deviation were examined. We ensured that these statistics were within expected limits given the phenotype definitions and any scale transformations per the analysis plan. We also evaluated simple relationships among phenotypes. When discrepancies were noted we contacted the original study for clarification or re-analysis, or the data were removed from further analysis. Phenotypic statistics are presented in Supplementary Tables 6 and 7.

Extensive genetic quality control and filtering was performed on the contributed summary statistics from each cohort. We removed imputed variants with imputation quality less than 0.3 (the estimated squared correlation between the imputed dosage and true dosage). We compared the per-study allele labels and allele frequencies with those of the imputation reference panels, and removed or reconciled mismatches. For quantitative traits, we plotted the variance of the score statistics against the sample size, and tested whether the trait residuals in each study were properly normalized and whether the trait analyzed between studies was measured and analyzed using the same unit.
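A minimal sketch of the per-variant checks described above follows. The 0.3 imputation-quality cutoff comes from the text; the allele-frequency tolerance and the rule for flagging a possible allele-label swap are hypothetical choices made for illustration only.

```python
def check_variant(rsq: float, study_af: float, ref_af: float,
                  min_rsq: float = 0.3, af_tol: float = 0.2) -> str:
    """Classify one variant from one study's summary statistics.

    min_rsq follows the text; af_tol is an ASSUMED tolerance for the
    difference between study and reference-panel allele frequency.
    """
    if rsq < min_rsq:
        return "drop_low_quality"
    if abs(study_af - ref_af) <= af_tol:
        return "keep"
    # Frequencies agree only after swapping alleles: labels may be flipped.
    if abs((1.0 - study_af) - ref_af) <= af_tol:
        return "check_allele_flip"
    return "drop_af_mismatch"


status = check_variant(rsq=0.85, study_af=0.31, ref_af=0.34)
```

Variants flagged as possible flips would be candidates for reconciliation rather than removal, mirroring the "removed or reconciled" wording in the text.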

Meta-Analysis.

Meta-analysis was performed centrally using the software package rareGWAMA (see URLs). All statistical tests in the meta-analysis or secondary analyses of the meta-analytic results (e.g., polygenic risk scoring, functional enrichment, MTAG, Genomic SEM, etc.) were two-sided. Given that rarer variants and/or behavioral phenotypes may show between-study heterogeneity in allele frequencies, imputation qualities, or genetic architecture, we extended existing methods and developed a novel fixed effects approach that accounts for between-study heterogeneity. Specifically, the methods aggregated weighted Z-score statistics, i.e. $Z_{META} = \frac{\sum_k w_k Z_k}{\left(\sum_k w_k^2\right)^{1/2}}$, where $Z_k$ is the Z-score statistic in study $k$. The weight $w_k$ is defined by $w_k = \sqrt{N_k p_k (1 - p_k) R_k^2}$, where $p_k$ is the variant allele frequency, $R_k^2$ is the imputation quality, and $N_k$ is the sample size for study $k$. Under the null and with the present sample sizes, $Z_{META}$ is normally distributed. The weights are proportional to the sample genotype variance. When the trait is uniformly measured and the allele frequency is similar, the method is approximately equivalent to meta-analysis of sample-size-weighted Z-scores. Yet, the method accounts for between-study heterogeneity in imputation accuracy and allele frequencies. The use of a fixed effects model, the most common approach in GWAS meta-analysis of single ancestry groups, appeared acceptable given the apparent lack of substantial meta-analytic effect heterogeneity (see Cochran’s Q and $I^2$ statistics in Supplementary Tables 1–5).
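The weighted Z-score combination can be sketched in a few lines. This is not the rareGWAMA implementation; it assumes the per-study weight is the square root of N·p·(1−p)·R², a reading chosen because it reduces to conventional sample-size-weighted Z-scores when allele frequency and imputation quality are equal across studies.

```python
from math import sqrt
from statistics import NormalDist
from typing import Iterable, Tuple

# One study's contribution: (z_score, sample_size, allele_freq, imputation_rsq)
Study = Tuple[float, float, float, float]


def meta_z(studies: Iterable[Study]) -> float:
    """Combine per-study Z-scores as sum(w * z) / sqrt(sum(w^2)).

    ASSUMPTION: w = sqrt(N * p * (1 - p) * rsq) for each study.
    """
    num = 0.0
    den = 0.0
    for z, n, p, rsq in studies:
        w = sqrt(n * p * (1.0 - p) * rsq)
        num += w * z
        den += w * w
    return num / sqrt(den)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * NormalDist().cdf(-abs(z))


# Hypothetical inputs: the second study is smaller and less well imputed,
# so it contributes less to the combined statistic.
z_combined = meta_z([(3.0, 50_000, 0.30, 0.95), (1.0, 5_000, 0.28, 0.40)])
p_combined = two_sided_p(z_combined)
```

Because the denominator normalizes by the root of the summed squared weights, the combined statistic keeps unit variance under the null regardless of how unequal the weights are.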

Population stratification and cryptic relatedness were addressed during the generation of summary statistics by each local study through the use of kinship-based linear mixed models54 and genetic principal components55. Residual stratification was further corrected at the meta-analytic level with study-specific genomic controls56 (calculated separately for variants with MAF ≥ 1% and .1%≤MAF<1%; Supplementary Table 23) applied to each study’s results prior to meta-analysis.
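Study-specific genomic control can be illustrated as follows: the inflation factor is the median observed chi-square statistic divided by the median of a 1-df chi-square distribution, and test statistics are deflated by it. Per the text this would be run separately within each MAF bin of each study; leaving statistics untouched when the factor is below 1 is a common convention assumed here, not something stated in the paper.

```python
from statistics import median
from typing import List, Optional

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_control_lambda(z_scores: List[float]) -> float:
    """Inflation factor: median(z^2) / median of chi-square(1)."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def apply_genomic_control(z_scores: List[float],
                          lam: Optional[float] = None) -> List[float]:
    """Divide Z-scores by sqrt(lambda); lambda below 1 is treated as 1 (ASSUMED)."""
    if lam is None:
        lam = genomic_control_lambda(z_scores)
    scale = max(lam, 1.0) ** 0.5
    return [z / scale for z in z_scores]


# Hypothetical Z-scores for one study and one MAF bin.
z_bin = [0.2, -1.1, 0.7, 2.4, -0.5, 0.9, -1.6]
lam_bin = genomic_control_lambda(z_bin)
z_corrected = apply_genomic_control(z_bin, lam_bin)
```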

A locus was defined as a 1MB region surrounding the “sentinel” variant (the variant in the locus with the lowest p-value). When any two such loci overlapped or abutted, they were collapsed into a single locus. Variants within each locus were subjected to conditional analysis using a novel partial correlation-based score statistic using cohort-level summary statistics57 implemented in a sequential forward selection framework. The method requires marginal association statistics and approximated covariance matrices among them, and performs favorably compared to existing methods57 (Supplementary Table 24). Covariances among effects were based upon the linkage disequilibrium information estimated from a subset of the Haplotype Reference Consortium41.
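The locus definition amounts to interval merging, sketched below for sentinel positions on a single chromosome. The text does not say whether the 1MB region means 500 kb on each side of the sentinel, so the flank size is an assumption exposed as a parameter, and sentinel positions are taken as given rather than selected iteratively by p-value.

```python
from typing import Iterable, List, Tuple


def build_loci(sentinel_positions: Iterable[int],
               flank: int = 500_000) -> List[Tuple[int, int]]:
    """Return merged (start, end) windows around sentinels on one chromosome.

    ASSUMPTION: each window spans `flank` base pairs on either side of the
    sentinel. Windows that overlap or abut are collapsed into one locus.
    """
    windows = sorted((max(0, pos - flank), pos + flank) for pos in sentinel_positions)
    merged: List[Tuple[int, int]] = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:  # overlapping or abutting
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical sentinel positions: the first two windows overlap and merge.
loci = build_loci([1_000_000, 1_800_000, 5_000_000])
```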

We applied multiple post-meta-analysis variant filters to ensure robustness of reported findings. To reduce artifacts arising from a small number of studies, we excluded any variant that was present in two or fewer studies. For each variant in the meta-analysis, we calculated the effective sample size $N_{eff} = \sum_k N_k r_k^2$, where $N_k$ is the sample size in study $k$ and $r_k^2$ is the imputation quality. We removed variants with effective sample sizes < 10% of the total sample size to ensure only well-imputed variants with a modicum of power were included. We also excluded all variants with minor allele frequency less than 0.001, the lower bound of moderate imputation accuracy with the currently best available imputation reference panel41. Variants with MAF > 1% are expected to be imputed with high accuracy. Results from the application of post-meta-analysis filters are displayed in Supplementary Table 25.
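The three post-meta-analysis filters can be written as a single predicate. The thresholds are those given in the text; treating "total sample size" as the total for the phenotype being analyzed, and the input layout, are assumptions of this sketch.

```python
from typing import List, Tuple


def effective_sample_size(per_study: List[Tuple[float, float]]) -> float:
    """Sum over studies of sample size times imputation quality (N_k * r_k^2)."""
    return sum(n * rsq for n, rsq in per_study)


def keep_variant(per_study: List[Tuple[float, float]], total_n: float, maf: float,
                 min_studies: int = 3, min_neff_frac: float = 0.10,
                 min_maf: float = 0.001) -> bool:
    """Apply the study-count, effective-sample-size, and MAF filters.

    per_study holds one (sample_size, imputation_rsq) pair for every study
    in which the variant was present.
    """
    if len(per_study) < min_studies:  # present in two or fewer studies
        return False
    if effective_sample_size(per_study) < min_neff_frac * total_n:
        return False
    return maf >= min_maf


# Hypothetical variant seen in three studies of a 100,000-person meta-analysis.
example = [(20_000, 0.9), (10_000, 0.5), (5_000, 0.8)]
kept = keep_variant(example, total_n=100_000, maf=0.02)
```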

After applying variant filters and obtaining our final meta-analytic results, we calculated genomic controls and maximum/median per-variant sample sizes. Sample sizes ranged from 337,334 for cigarettes per day to 1,232,091 for smoking initiation. QQ plots, LD intercept tests, and genomic control values indicate that Type I error rates were well controlled for both common and low-frequency variants (Supplementary Figure 2, Supplementary Table 26). All conditionally independent variants were plotted in LocusZoom and included in Supplementary Figures 1–12. All plots were visually inspected, and suspicious loci were identified (see Supplementary Table 27) and removed from further consideration. To ensure LD information was available between sentinel variants and others in the locus, we used surrogate variants for eight loci (Supplementary Table 28).

We estimated the extent of pleiotropy (i.e., whether a given locus is simultaneously associated with multiple phenotypes) for each genome-wide associated locus from our GWAS using an Empirical Bayes approach. Using summary association statistics from a given locus as input, the method estimated the 5×5 genetic correlation matrix of the locus and the posterior probability of association for all possible phenotype configurations, while accounting for genome-wide genetic correlations and trait residual correlations. In cases where loci associated with different phenotypes overlapped, the locus was expanded in size. Statistical details are available in the Supplementary Note, Section 3.3.

We applied MTAG16 to variants with MAF>1% from the final meta-analysis results for each phenotype, using the other four phenotypes to increase power for locus discovery. Genomic control values and LD intercept tests indicated that the MTAG results were well controlled (Supplementary Table 29), and the Manhattan and QQ plots were well behaved (Supplementary Figures 16 and 17). GCTA-COJO58 was used to identify conditionally independent variants (listed in Supplementary Table 12). All loci were plotted with LocusZoom and visually inspected, and suspicious loci (e.g., those without LD support; see Supplementary Table 30) were removed from further consideration. Additional details, including testing of MTAG model assumptions, are provided in the Supplementary Note. Finally, we also applied Genomic SEM59 to our five phenotypes to formally model and factor their correlation structure. See Supplementary Figure 18, Supplementary Table 31, and the Supplementary Note for further details.

Genome-wide significant threshold.

The primary focus was to test variants with MAF≥1%, as these will be imputed with high confidence. The statistical significance threshold applied to the meta-analysis of all variants with MAF≥1% was $p<5\times10^{-8}$, consistent with widespread convention in GWAS of European individuals. Since our imputation procedure is expected to provide some marginal level of accuracy down to a MAF of 0.1%, we also conducted an exploratory association test for low-frequency variants with 0.1%<MAF<1%, to which we applied a statistical significance threshold of $p<5\times10^{-9}$. Only two such low-frequency variants surpassed the conventional common-variant threshold of $p<5\times10^{-8}$. Of these two, one low-frequency variant, associated with SmkInit, survived the more stringent multiple testing correction (rs181508347, intergenic, MAF=0.0096, $p=5\times10^{-10}$); it is included in our count of discovered loci and in Supplementary Table 4. The more stringent threshold applies a correction for ~10 million tests, which is approximately the number of conditionally independent variants tested once the MAF lower bound was extended from 1% to 0.1%. We calculated this threshold using three existing methods60–62. These methods make use of the eigenvalues of the matrix of LD (measured in $R^2$) between SNPs, calculated with a spectral decomposition. We estimated the number of independent tests using the genotype data from a subset of the Haplotype Reference Consortium panel41. We first calculated LD blocks across the genome using the algorithm implemented in PLINK version 1.9 (ref. 63) with default settings, and then we lowered the MAF threshold to 0.1% to accommodate all low-frequency variants. Next, we calculated the effective number of independent tests within each LD block and between LD blocks using the aforementioned three methods, which we aggregated to get the total number of independent tests. The three techniques yielded estimates of 9.8–10.1 million independent tests, similar to other independent estimates64. A total of 278 sentinel variants (including the one genome-wide significant low-frequency variant) had $p<5\times10^{-9}$, out of the original 406 with $p<5\times10^{-8}$.
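The eigenvalue-based estimation of the number of independent tests can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it implements two commonly described estimators (one in the style of Li and Ji, one in the style of Gao et al.), takes a generic symmetric LD or correlation matrix per block as input, and simply sums the per-block estimates while ignoring between-block LD. The function names, the 0.05 family-wise error rate, and the rounding step are assumptions made for this example.

```python
import numpy as np

def meff_li_ji(corr):
    """Effective number of tests: sum of I(|lambda| >= 1) + (|lambda| - floor(|lambda|))."""
    eig = np.abs(np.linalg.eigvalsh(np.asarray(corr, dtype=float)))
    # Rounding guards against floating-point noise near integer eigenvalues,
    # where the estimator is discontinuous (a safeguard added for this sketch).
    eig = np.round(eig, 8)
    return float(np.sum((eig >= 1.0) + (eig - np.floor(eig))))

def meff_gao(corr, var_explained=0.995):
    """Number of leading eigenvalues needed to explain a fixed share of variance."""
    eig = np.clip(np.linalg.eigvalsh(np.asarray(corr, dtype=float)), 0.0, None)
    eig = np.sort(eig)[::-1]
    cum = np.cumsum(eig) / np.sum(eig)
    return int(np.searchsorted(cum, var_explained) + 1)

def genome_wide_threshold(block_corrs, alpha=0.05, meff=meff_li_ji):
    """Sum per-block estimates and return (Bonferroni threshold, total tests)."""
    total = sum(meff(c) for c in block_corrs)
    return alpha / total, total
```

With roughly 10 million independent tests, a Bonferroni correction at a family-wise error rate of 0.05 gives 0.05 / 10,000,000 = 5×10⁻⁹, which matches the threshold applied to low-frequency variants.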

Heritability.

We used univariate and bivariate LD Score Regression17 to assess the heritability of each phenotype and to estimate a variety of genetic correlations. Analyses included (1) LD Score Regression intercept tests to evaluate the extent to which population stratification or cryptic relatedness may artificially inflate our summary statistics; (2) estimation of genetic correlations across our five phenotypes; (3) estimation of genetic correlations computed within a phenotype but between the larger contributing studies, as an estimate of the extent to which phenotypes were measuring the same genetic risk in different studies; and (4) estimation of genetic correlation between the five phenotypes and a wide variety of other phenotypes related to smoking and alcohol behaviors, and for which GWAS have already been made publicly available.

Under standard assumptions, bivariate score regression produces unbiased estimates of genetic correlation, even in the presence of sample overlap65. Accordingly, to estimate the extent of genetic correlation between each of our phenotypes, and between our phenotypes and other phenotypes related to nicotine and alcohol use, we used standard procedures in LD Score Regression22. To be included in these analyses, variants were restricted to those present in HapMap3 with MAF>0.01. Standard errors were estimated with a block jackknife over all variants.

We estimated the proportion of variance explained by the set of all conditionally independently associated variants. The joint effects of variants in a locus were approximated by $\hat{\beta}_{JOINT} = V_{META}^{-1} U_{META}$, where $U_{META}$ is the vector of single-variant score statistics and $V_{META}$ is the covariance matrix between them. The phenotypic variance explained by the independently associated variants in a locus is given by $\hat{\beta}_{JOINT}^{T}\,\mathrm{cov}(G)\,\hat{\beta}_{JOINT}$, where $\mathrm{cov}(G)$ is the genotype covariance estimated from the Haplotype Reference Consortium panel.
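The two expressions above translate directly into a few lines of linear algebra. The sketch below is illustrative only: the function names are hypothetical, the inputs are toy values, and the result is a proportion of variance only if the phenotype is scaled to unit variance (an assumption of this example).

```python
import numpy as np

def joint_effects(u_meta, v_meta):
    """Approximate joint effects: beta_joint = V_meta^{-1} U_meta."""
    u = np.asarray(u_meta, dtype=float)
    v = np.asarray(v_meta, dtype=float)
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(v, u)

def variance_explained(beta_joint, cov_g):
    """Phenotypic variance explained in a locus: beta^T cov(G) beta."""
    b = np.asarray(beta_joint, dtype=float)
    return float(b @ np.asarray(cov_g, dtype=float) @ b)

# Toy two-variant locus (values invented for illustration).
beta = joint_effects([2.0, 4.0], [[2.0, 0.0], [0.0, 4.0]])
locus_var = variance_explained(beta, [[0.5, 0.1], [0.1, 0.5]])
```

Because the genotype covariance enters the quadratic form, LD between variants in the same locus contributes to the variance explained through the off-diagonal terms.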

Polygenic scoring.

Polygenic risk scores (PRS) were computed using LDpred66, which accounts for linkage disequilibrium between variants. Since we do not know the variance-covariance matrix of the effects in the training sample (here, the GWAS results), we replace this matrix with a block diagonal matrix estimated using LD patterns from the prediction cohorts, after dropping cryptically-related individuals and ancestry outliers.

Smoking and alcohol use rates have been influenced by secular trends and policy changes over the last half century. We therefore selected two independent prediction cohorts, the Health and Retirement Study (HRS)21 and the National Longitudinal Study of Adolescent to Adult Health (Add Health)20. The HRS is a nationally representative study of U.S. households that began in 1992; the mean birth year of respondents is 1938 (SD=9.3), and the mean age at the time of assessment is 57.6 (SD=8.9). Add Health is a nationally representative sample of U.S. adolescents enrolled in grades 7 through 12 during the 1994–1995 school year. The mean birth year of respondents was 1979 (SD=1.8), and the mean age at assessment (here, wave 4) was 29.0 (SD=1.8). In the HRS, ~57% of respondents reported ever smoking regularly, and these respondents smoked ~13 cigarettes per day. In Add Health, slightly fewer respondents (~53%) reported ever smoking regularly, and these respondents smoked ~11 cigarettes per day on average (Supplementary Table 14). For each of our five phenotype scores, we used variants that overlapped with HapMap3 (~1.1 million) to construct the scores. Prediction accuracy was estimated using ordinary least squares regression of a given phenotype (AgeSmk, CigDay, SmkInit, SmkCes, or DrnkWk) on the polygenic score and covariates including age, sex, age × sex interaction, and the first ten genetic principal components.

Prediction accuracy was estimated in a two-step process. We first regressed the phenotype on the standard set of covariates without the PRS. We then added the PRS as a predictor and calculated the difference in the coefficient of determination ($R^2$) between the two regressions. For our quantitative phenotypes, AgeSmk, CigDay, and DrnkWk, the predictive power of the PRS is this change in $R^2$. For our two binary phenotypes, SmkInit and SmkCes, we measure the incremental pseudo-$R^2$ from probit regressions. The 95% confidence intervals around all $R^2$ values were bootstrapped with 1000 repetitions each. The same polygenic scoring procedure was applied to the MTAG results (Supplementary Table 32).
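For a quantitative phenotype, the two-step procedure and the bootstrap can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, a percentile bootstrap is assumed (the text does not specify the bootstrap variant), and the probit pseudo-R² used for the binary phenotypes is not covered.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS regression of y on X, with an intercept added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / ss_tot

def incremental_r2(y, covars, prs):
    """Change in R^2 when the PRS is added to the covariate-only model."""
    base = r_squared(y, covars)
    full = r_squared(y, np.column_stack([covars, prs]))
    return full - base

def bootstrap_ci(y, covars, prs, n_boot=1000, seed=0):
    """Percentile 95% interval for the incremental R^2 (resampling individuals)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample individuals with replacement
        stats[b] = incremental_r2(y[idx], covars[idx], prs[idx])
    lo, hi = np.percentile(stats, [2.5, 97.5])
    return float(lo), float(hi)
```

Here `y`, `covars`, and `prs` are NumPy arrays with one row per individual; `covars` would hold age, sex, their interaction, and the genetic principal components.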

Epigenomic enrichment.

To detect genome-wide functional and tissue-specific epigenomic enrichments, we performed enrichment analyses by heritability stratification using Linkage Disequilibrium Score Regression, as implemented in the LDSC software (v1.0.0). Annotation-stratified LD scores were estimated using dichotomized/binary annotations, 1000 Genomes Project samples with European ancestry, and one million base-pair LD windows by default. LDSC then determines functional enrichment of the GWAS traits by partitioning heritability according to the variance explained by the LD-linked SNPs belonging to each functional category22. Statistical enrichment was defined as the ratio between the percentage of heritability explained by variants in each annotated category and the percentage of variants covered by that category. A resampling approach was used to estimate standard errors22.
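The enrichment statistic defined above is a simple ratio, illustrated here with a hypothetical helper and invented numbers (this is not a call into the LDSC software).

```python
def enrichment(h2_category, h2_total, snps_category, snps_total):
    """Share of heritability in a category divided by its share of variants."""
    prop_h2 = h2_category / h2_total
    prop_snps = snps_category / snps_total
    return prop_h2 / prop_snps

# Invented example: a category holding 10% of variants and 30% of heritability.
example = enrichment(h2_category=0.03, h2_total=0.10,
                     snps_category=100_000, snps_total=1_000_000)
```

A value above 1 indicates that the category explains more heritability than expected from the number of variants it contains; in the invented example the ratio is approximately 3.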

Following standard procedure, we trained a baseline LDSC model using the 52 non-cell-type specific functional categories (plus one category that includes all SNPs) and used the observed z-scores of HapMap3 SNPs for each trait. We tested cell-group enrichments over 10 pre-defined cell-group annotations22. The cell-group annotations are the result of aggregating 220 cell-type-specific annotations over 4 histone marks (H3K4me1, H3K4me3, H3K9ac, H3K27ac) and 100 well-defined cell types. To detect which specific epigenomes contribute to the group-level enrichment, we performed 220 tests over each individual annotation. Multiple testing was accounted for through Bonferroni correction within phenotype with 10 tests for the cell-group annotation enrichment analyses and 220 tests for the cell-specific enrichment analyses. As a complementary method to LDSC, we also applied a recently developed mixture model learning approach67, and report these results in Supplementary Figure 13.

Gene and Gene-Set Tests.

For each phenotype, we used SEQMINER68 and the UCSC genome browser annotations (refGene; retrieved December 15, 2017) to annotate all conditionally independent genome-wide significant variants. We identified all genes (all variants 5’ to 3’ UTR) harboring at least one variant in LD ($r^2>0.3$) with any conditionally independent variant. See Supplementary Tables 1–5.

We conducted a manual review of all genes implicated within each locus, overlap with the GWAS catalogue (Supplementary Table 33), and all pathways identified by PASCAL and DEPICT (described below). We considered a gene to be implicated if it harbored variation in LD with a conditionally independent genome-wide significant variant, or if it was located within the locus and was significant by the PASCAL gene-based test. PASCAL69 was used for gene-based and pathway analysis to test genes and canonical pathways from MSigDb (Supplementary Tables 20–21). Default settings were used to test all variants within all genes. DEPICT70 was used to identify enrichment within tissues/cell types and reconstituted gene sets (also known as “pathways”). For each phenotype, variants from the GWAS were clumped using 500 kb flanking regions with the LD cutoff $r^2>0.1$ (based on 1000 Genomes phase 1 release v3, the default in DEPICT). We used DEPICT to understand genetic signals beyond the genome-wide significant loci that surpass the conventional $5\times10^{-8}$ threshold, and so included all variants with $p<5\times10^{-5}$. DEPICT tissue enrichment results are displayed in Supplementary Figure 15, where enrichment relative to genes in random sets of loci is indicated by red shading. To cluster DEPICT reconstituted gene sets, we used affinity propagation clustering71 and calculated the correlation between each resulting “exemplary gene set” in Figure 4. Genes, gene sets, and tissue/cell enrichments were considered significant when their false discovery rate was below 0.05. All such significant DEPICT results are reported in Supplementary Tables 17–19. PASCAL and DEPICT were also applied in the same fashion to the MTAG summary statistics (Supplementary Tables 34–39).

Statistics.

The GWAS meta-analysis was conducted using chi-square statistics based upon an imputation-quality-aware fixed-effect meta-analysis approach. Two-sided p-values were calculated. The MTAG and Genomic SEM test statistics were computed from the GWAS meta-analysis results, and two-sided p-values were similarly calculated from the chi-square distribution. The pleiotropy analysis was conducted based upon an empirical Bayes approach. The prior distribution for the effect sizes was assumed to be a mixture of a point mass at 0 (representing the possibility that the locus is not associated with the trait) and a normal distribution (representing the possibility that the locus is associated). The hyper-parameters were estimated by maximizing the marginal likelihood. The method accounts for the local genetic correlation and residual correlation between phenotypes. The posterior probability of association (PPA) for each locus was estimated for each possible combination of the 5 phenotypes, and the combination with the highest PPA was reported for each locus.
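To make the mixture prior concrete, the sketch below computes a posterior probability of association for a single variant and a single trait. This is a deliberately simplified, univariate analogue: the method used in the study is multivariate across five phenotypes, accounts for genetic and residual correlations, and estimates its hyper-parameters by maximum marginal likelihood, none of which is reproduced here. The function names and the hyper-parameter values passed in are assumptions made for illustration.

```python
import math

def normal_pdf(x, var):
    """Density of a zero-mean normal distribution with variance `var` at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def posterior_prob_association(beta_hat, se, prior_assoc, slab_var):
    """PPA under a point-mass-at-zero plus normal ("spike-and-slab") prior.

    Under association, beta_hat ~ N(0, se^2 + slab_var);
    under no association, beta_hat ~ N(0, se^2).
    """
    like_assoc = normal_pdf(beta_hat, se ** 2 + slab_var)
    like_null = normal_pdf(beta_hat, se ** 2)
    num = prior_assoc * like_assoc
    return num / (num + (1.0 - prior_assoc) * like_null)
```

An estimate near zero lowers the posterior probability below the prior, whereas an estimate many standard errors from zero pushes it toward 1. With five phenotypes, the full method evaluates all 2⁵ = 32 association configurations rather than a single yes/no alternative.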

Editorial Summary:

Association studies of up to 1.2 million individuals identify 566 genetic variants in 406 loci associated with tobacco use and addiction (initiation, cessation, and heaviness) as well as alcohol use, with 150 loci showing pleiotropic association.

ACKNOWLEDGEMENTS

This study was designed and carried out by the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN). It was conducted by using the UK Biobank Resource under Application Number 16651. This study was supported by funding from the US National Institutes of Health (NIH) awards R01DA037904 to S. Vrieze, R01HG008983 to D.J. Liu, and R21DA040177 to D.J. Liu. Ethical review and approval were provided by the University of Minnesota IRB; all human subjects provided informed consent. A full list of acknowledgements is provided in the Supplementary Note.

Footnotes

CODE AVAILABILITY:

All software used to perform these analyses is available online.

CONTRIBUTOR LIST FOR THE 23andMe RESEARCH TEAM: Michelle Agee11, Babak Alipanahi11, Adam Auton11, Robert K. Bell11, Katarzyna Bryc11, Sarah L. Elson11, Pierre Fontanillas11, Nicholas A. Furlotte11, David A. Hinds11, Bethann S. Hromatka11, Karen E. Huber11, Aaron Kleinman11, Nadia K. Litterman11, Matthew H. McIntyre11, Joanna L. Mountain11, Carrie A.M. Northover11, J. Fah Sathirapongsasuti11, Olga V. Sazonova11, Janie F. Shelton11, Suyash Shringarpure11, Chao Tian11, Joyce Y. Tung11, Vladimir Vacic11, Catherine H. Wilson11, and Steven J. Pitts11.

CONTRIBUTOR LIST FOR HUNT ALL-IN PSYCHIATRY: Amy Mitchell65, Anne Heidi Skogholt20, Bendik S Winsvold65,76, Børge Sivertsen77,78,79, Eystein Stordal78,80, Gunnar Morken78,81, Håvard Kallestad78,81, Ingrid Heuch79, John-Anker Zwart65,76,82, Katrine Kveli Fjukstad83,84, Linda M Pedersen65, Maiken Elvestad Gabrielsen20, Marianne Bakke Johnsen65,82, Marit Skrove85, Marit Sæbø Indredavik78,85, Ole Kristian Drange78,81, Ottar Bjerkeset78,86, Sigrid Børte65,82, Synne Øien Stensland65,87

76 Department of Neurology, Oslo University Hospital, Oslo, Norway.

77 Department of Health Promotion, Norwegian Institute of Public Health, Bergen, Norway.

78 Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway.

79 Department of Research and Innovation, Helse-Fonna HF, Haugesund, Norway.

80 Department of Psychiatry, Hospital Namsos, Nord-Trøndelag Health Trust, Namsos, Norway.

81 Division of Mental Health Care, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway.

82 Institute of Clinical Medicine, University of Oslo, Oslo, Norway.

83 Department of Psychiatry, Nord-Trøndelag Hospital Trust, Levanger Hospital, Norway.

84 Department of Laboratory Medicine, Children’s and Women’s Health, Norwegian University of Science and Technology, Trondheim, Norway.

85 Regional Centre for Child and Youth Mental Health and Child Welfare, Department of Mental Health, Faculty of Medicine and Health Sciences, NTNU – Norwegian University of Science and Technology.

86 Faculty of Nursing and Health Sciences, Nord University, Levanger, Norway.

87 NKVTS, Norwegian Centre for Violence and Traumatic Stress Studies.

DATA AVAILABILITY STATEMENT

GWAS summary statistics can be downloaded from the world wide web (https://genome.psych.umn.edu/index.php/GSCAN). We provide association results for all SNPs that passed quality-control filters in a GWAS meta-analysis of each of our five substance use phenotypes that excludes the research participants from 23andMe.

COMPETING INTERESTS STATEMENT: Laura J. Bierut and the spouse of Nancy L. Saccone are listed as inventors on Issued U.S. Patent 8,080,371, “Markers for Addiction” covering the use of certain SNPs in determining the diagnosis, prognosis, and treatment of addiction. Sean David is a scientific advisor to BaseHealth, Inc. Gyda Bjornsdottir, Daniel F. Gudbjartsson, Gunnar W. Reginsson, Hreinn Stefansson, Kari Stefansson, and Thorgeir E. Thorgeirsson are employees of deCODE Genetics/AMGEN, Inc. Chao Tian and David Hinds are employees of 23andMe, Inc.

REFERENCES

  • 1. Ezzati M et al. Selected major risk factors and global and regional burden of disease. Lancet 360, 1347–1360 (2002).
  • 2. Hicks BM, Schalet BD, Malone SM, Iacono WG & McGue M Psychometric and genetic architecture of substance use disorder and behavioral disinhibition measures for gene association studies. Behavior Genetics 41, 459–75 (2011).
  • 3. Polderman TJ et al. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat Genet (2015).
  • 4. Kendler KS, Schmitt E, Aggen SH & Prescott CA Genetic and environmental influences on alcohol, caffeine, cannabis, and nicotine use from early adolescence to middle adulthood. Arch Gen Psychiatry 65, 674–82 (2008).
  • 5. Kendler KS, Prescott CA, Myers J & Neale MC The structure of genetic and environmental risk factors for common psychiatric and substance use disorders in men and women. Archives of General Psychiatry 60, 929–937 (2003).
  • 6. Bierut LJ et al. ADH1B is associated with alcohol dependence and alcohol consumption in populations of European and African ancestry. Mol Psychiatry 17, 445–50 (2012).
  • 7. Thorgeirsson TE et al. Sequence variants at CHRNB3-CHRNA6 and CYP2A6 affect smoking behavior. Nature Genetics 42, 448–U135 (2010).
  • 8. Thorgeirsson TE et al. A rare missense mutation in CHRNA4 associates with smoking behavior and its consequences. Mol Psychiatry 21, 594–600 (2016).
  • 9. Furberg H et al. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nature Genetics 42, 441–U134 (2010).
  • 10. Schumann G et al. KLB is associated with alcohol drinking, and its gene product beta-Klotho is necessary for FGF21 regulation of alcohol preference. Proceedings of the National Academy of Sciences of the United States of America 113, 14372–14377 (2016).
  • 11. Jorgenson E et al. Genetic contributors to variation in alcohol consumption vary by race/ethnicity in a large multi-ethnic genome-wide association study. Mol Psychiatry (2017).
  • 12. Polesskaya OO, Smith RF & Fryxell KJ Chronic nicotine doses down-regulate PDE4 isoforms that are targets of antidepressants in adolescent female rats. Biological Psychiatry 61, 56–64 (2007).
  • 13. Boyden LM et al. Mutations in kelch-like 3 and cullin 3 cause hypertension and electrolyte abnormalities. Nature 482, 98–102 (2012).
  • 14. Wang W et al. Forced Expiratory Volume in the First Second and Aldosterone as Mediators of Smoking Effect on Stroke in African Americans: The Jackson Heart Study. Journal of the American Heart Association 5 (2016).
  • 15. Aoun EG et al. A relationship between the aldosterone-mineralocorticoid receptor pathway and alcohol drinking: preliminary translational findings across rats, monkeys and humans. Mol Psychiatry 23, 1466–1473 (2018).
  • 16. Turley P et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nature Genetics 50, 229−+ (2018).
  • 17. Bulik-Sullivan BK et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics 47, 291−+ (2015).
  • 18. Yang JA, Lee SH, Goddard ME & Visscher PM GCTA: a tool for genome-wide complex trait analysis. American Journal of Human Genetics 88, 76–82 (2011).
  • 19. Zheng J et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 33, 272–279 (2017).
  • 20. Harris KM, Halpern CT, Haberstick BC & Smolen A The National Longitudinal Study of Adolescent Health (Add Health) Sibling Pairs Data. Twin Research and Human Genetics 16, 391–398 (2013).
  • 21. Sonnega A et al. Cohort Profile: the Health and Retirement Study (HRS). International Journal of Epidemiology 43, 576–585 (2014).
  • 22. Finucane HK et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature Genetics 47, 1228−+ (2015).
  • 23. Wilson S, Bair JL, Thomas KM & Iacono WG Problematic alcohol use and reduced hippocampal volume: a meta-analytic review. Psychological Medicine 47, 2288–2301 (2017).
  • 24. Ewing SWF, Sakhardande A & Blakemore SJ The effect of alcohol consumption on the adolescent brain: A systematic review of MRI and fMRI studies of alcohol-using youth. Neuroimage-Clinical 5, 420–437 (2014).
  • 25. Goldstein RZ & Volkow ND Dysfunction of the prefrontal cortex in addiction: neuroimaging findings and clinical implications. Nature Reviews Neuroscience 12, 652–669 (2011).
  • 26. Volkow ND & Morales M The Brain on Drugs: From Reward to Addiction. Cell 162, 712–725 (2015).
  • 27. Koob GF & Volkow ND Neurocircuitry of Addiction. Neuropsychopharmacology 35, 217–238 (2010).
  • 28. Koob GF & Volkow ND Neurobiology of addiction: a neurocircuitry analysis. Lancet Psychiatry 3, 760–773 (2016).
  • 29. Fernandez E, Schiappa R, Girault JA & Le Novere N DARPP-32 is a robust integrator of dopamine and glutamate signals. Plos Computational Biology 2, 1619–1633 (2006).
  • 30. Yagishita S et al. A critical time window for dopamine actions on the structural plasticity of dendritic spines. Science 345, 1616–1620 (2014).
  • 31. Zhu HW et al. DARPP-32 phosphorylation opposes the behavioral effects of nicotine. Biological Psychiatry 58, 981–989 (2005).
  • 32. Stoker AK & Markou A Unraveling the neurobiology of nicotine dependence using genetically engineered mice. Current Opinion in Neurobiology 23, 493–499 (2013).
  • 33. Litten RZ et al. A Double-Blind, Placebo-Controlled Trial Assessing the Efficacy of Varenicline Tartrate for Alcohol Dependence. Journal of Addiction Medicine 7, 277–286 (2013).
  • 34. Hyman SE, Malenka RC & Nestler EJ Neural mechanisms of addiction: The role of reward-related learning and memory. Annual Review of Neuroscience 29, 565–598 (2006).
  • 35. Kalivas PW The glutamate homeostasis hypothesis of addiction. Nature Reviews Neuroscience 10, 561–572 (2009).
  • 36. Szumlinski KK et al. Methamphetamine Addiction Vulnerability: The Glutamate, the Bad, and the Ugly. Biological Psychiatry 81, 959–970 (2017).
  • 37. Gass JT & Olive MF Glutamatergic substrates of drug addiction and alcoholism. Biochemical Pharmacology 75, 218–265 (2008).
  • 38. Vaughan J et al. Urocortin, a mammalian neuropeptide related to fish urotensin I and to corticotropin-releasing factor. Nature 378, 287–92 (1995).
  • 39. Logrip ML, Koob GF & Zorrilla EP Role of corticotropin-releasing factor in drug addiction: potential for pharmacological intervention. CNS Drugs 25, 271–87 (2011).
  • 40. Volkow ND, Koob GF & McLellan AT Neurobiologic Advances from the Brain Disease Model of Addiction. N Engl J Med 374, 363–71 (2016).
  • 41. McCarthy S et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet (2016).
  • 42. Lassi G et al. The CHRNA5-A3-B4 Gene Cluster and Smoking: From Discovery to Therapeutics. Trends in Neurosciences 39, 851–861 (2016).
  • 43. Edenberg HJ The genetics of alcohol metabolism: role of alcohol dehydrogenase and aldehyde dehydrogenase variants. Alcohol Res Health 30, 5–13 (2007).
  • 44. Sallese M et al. The G-protein-coupled receptor kinase GRK4 mediates homologous desensitization of metabotropic glutamate receptor 1. Faseb Journal 14, 2569–2580 (2000).
  • 45. Perroy J, Adam L, Qanbar R, Chenier S & Bouvier M Phosphorylation-independent desensitization of GABA(B) receptor by GRK4. Embo Journal 22, 3816–3824 (2003).
  • 46. Yang J, Villar VM, Armando I, Jose PA & Zeng CY G Protein-Coupled Receptor Kinases: Crucial Regulators of Blood Pressure. Journal of the American Heart Association 5 (2016).
  • 47. Consortium G Genetic effects on gene expression across human tissues (vol 550, pg 204, 2017). Nature 553 (2018).
  • 48. Costas J The highly pleiotropic gene SLC39A8 as an opportunity to gain insight into the molecular pathogenesis of schizophrenia. American Journal of Medical Genetics Part B-Neuropsychiatric Genetics 177, 274–283 (2018).
  • 49. Kong A et al. The nature of nurture: Effects of parental genotypes. Science 359, 424–428 (2018).
  • 50. Vrieze SI, Hicks BM, Iacono WG & McGue M Decline in genetic influence on the co-occurrence of alcohol, marijuana, and nicotine dependence symptoms from age 14 to 29. Am J Psychiatry 169, 1073–81 (2012).

METHODS ONLY REFERENCES

  • 51. Das S et al. Next-generation genotype imputation service and methods. Nat Genet (2016).
  • 52. Howie B, Fuchsberger C, Stephens M, Marchini J & Abecasis GR Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nature Genetics 44, 955−+ (2012).
  • 53. Zhan X, Hu Y, Li B, Abecasis GR & Liu DJ RVTESTS: an efficient and comprehensive tool for rare variant association analysis using sequence data. Bioinformatics 32, 1423–6 (2016).
  • 54. Kang HM et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet 42, 348–54 (2010).
  • 55. Price AL et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 38, 904–909 (2006).
  • 56. Devlin B & Roeder K Genomic control for association studies. Biometrics 55, 997–1004 (1999).
  • 57. Jiang Y et al. Proper Conditional Analysis in the Presence of Missing Data Identified Novel Independently Associated Low Frequency Variants in Nicotine Dependence Genes. PLoS Genetics (2018).
  • 58. Yang J et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 44, 369–75, S1–3 (2012).
  • 59. Grotzinger AD et al. Genomic SEM Provides Insights into the Multivariate Genetic Architecture of Complex Traits. bioRxiv (2018).
  • 60. Li J & Ji L Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity 95, 221–227 (2005).
  • 61. Gao XY, Becker LC, Becker DM, Starmer JD & Province MA Avoiding the High Bonferroni Penalty in Genome-Wide Association Studies. Genetic Epidemiology 34, 100–105 (2010).
  • 62. Chen ZX & Liu QZ A New Approach to Account for the Correlations among Single Nucleotide Polymorphisms in Genome-Wide Association Studies. Human Heredity 72, 1–9 (2011).
  • 63. Chang CC et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4 (2015).
  • 64. Wu Y, Zheng ZL, Visscher PM & Yang J Quantifying the mapping precision of genome-wide association studies using whole-genome sequencing data. Genome Biology 18 (2017).
  • 65. Bulik-Sullivan B et al. An atlas of genetic correlations across human diseases and traits. Nature Genetics 47, 1236−+ (2015).
  • 66. Vilhjalmsson BJ et al. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. American Journal of Human Genetics 97, 576–592 (2015).
  • 67. Li Y, Davila-Velderrain J & Kellis M A probabilistic framework to dissect functional cell-type-specific regulatory elements and risk loci underlying the genetics of complex traits. BioRxiv 059345 (2017).
  • 68. Zhan X & Liu DJ SEQMINER: An R-Package to Facilitate the Functional Interpretation of Sequence-Based Associations. Genet Epidemiol 39, 619–23 (2015).
  • 69. Lamparter D, Marbach D, Rueedi R, Kutalik Z & Bergmann S Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics. Plos Computational Biology 12 (2016).
  • 70. Pers TH et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nature Communications 6 (2015).
  • 71. Frey BJ & Dueck D Clustering by passing messages between data points. Science 315, 972–976 (2007).
