Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use


doi:10.1038/s41588-018-0307-5

Author manuscript; available in PMC: 2019 Jul 14.

Published in final edited form as: Nat Genet. 2019 Jan 14;51(2):237–244. doi: 10.1038/s41588-018-0307-5


Mengzhen Liu¹,#; Yu Jiang²,³,#; Robbee Wedow⁴,⁵,⁶,#; Yue Li⁷,⁸,#; David M Brazel⁴,⁹,¹⁰; Fang Chen²,³; Gargi Datta¹; Jose Davila-Velderrain⁷,⁸; Daniel McGuire²,³; Chao Tian¹¹; Xiaowei Zhan¹²,¹³; 23andMe Research Team¹⁴; HUNT All-In Psychiatry¹⁴; Hélène Choquet¹⁵; Anna R Docherty¹⁶,¹⁷; Jessica D Faul¹⁸; Johanna R Foerster¹⁹; Lars G Fritsche¹⁹; Maiken Elvestad Gabrielsen²⁰; Scott D Gordon²¹; Jeffrey Haessler²²; Jouke-Jan Hottenga²³; Hongyan Huang²⁴,²⁵; Seon-Kyeong Jang¹; Philip R Jansen²⁶,²⁷; Yueh Ling²,⁹; Reedik Mägi²⁸; Nana Matoba²⁹; George McMahon³⁰; Antonella Mulas³¹; Valeria Orrù³¹; Teemu Palviainen³²; Anita Pandit¹⁹; Gunnar W Reginsson³³; Anne Heidi Skogholt²⁰; Jennifer A Smith¹⁸,³⁴; Amy E Taylor³⁰; Constance Turman²⁴,²⁵; Gonneke Willemsen²³; Hannah Young¹; Kendra A Young³⁵; Gregory J M Zajac¹⁹; Wei Zhao³⁴; Wei Zhou³⁶; Gyda Bjornsdottir³³; Jason D Boardman⁴,⁵,⁶; Michael Boehnke¹⁹; Dorret I Boomsma²³; Chu Chen²²; Francesco Cucca³¹; Gareth E Davies³⁷; Charles B Eaton³⁸; Marissa A Ehringer⁴,³⁹; Tõnu Esko⁸,²⁸; Edoardo Fiorillo³¹; Nathan A Gillespie¹⁶,²¹; Daniel F Gudbjartsson³³,⁴⁰; Toomas Haller²⁸; Kathleen Mullan Harris⁴¹,⁴²; Andrew C Heath⁴³; John K Hewitt⁴,⁴⁴; Ian B Hickie⁴⁵; John E Hokanson³⁵; Christian J Hopfer⁴,⁴⁶; David J Hunter²⁴,²⁵,⁴⁷; William G Iacono¹; Eric O Johnson⁴⁸; Yoichiro Kamatani²⁹; Sharon L R Kardia³⁴; Matthew C Keller⁴,⁴⁴; Manolis Kellis⁷,⁸; Charles Kooperberg²²; Peter Kraft²⁴,²⁵,⁴⁹; Kenneth S Krauter⁴,⁹; Markku Laakso⁵⁰,⁵¹; Penelope A Lind⁵²; Anu Loukola³²; Sharon M Lutz⁵³; Pamela A F Madden⁴³; Nicholas G Martin²¹; Matt McGue¹; Matthew B McQueen⁴,³⁹; Sarah E Medland⁵²; Andres Metspalu²⁸; Karen L Mohlke⁵⁴; Jonas B Nielsen⁵⁵; Yukinori Okada²⁹,⁵⁶; Ulrike Peters²²,⁵⁷; Tinca J C Polderman²⁶; Danielle Posthuma²⁶,⁵⁸; Alexander P Reiner²²,⁵⁷; John P Rice⁵⁹; Eric Rimm²⁵,⁶⁰; Richard J Rose⁶¹; Valgerdur Runarsdottir⁶²; Michael C Stallings⁴,⁴⁴; Alena Stančáková⁵⁰; Hreinn Stefansson³³; Khanh K Thai¹⁵; Hilary A Tindle⁶³; Thorarinn Tyrfingsson⁶²; Tamara L Wall⁶⁴; David R Weir¹⁸; Constance Weisner¹⁵; John B Whitfield²¹; Bendik Slagsvold Winsvold⁶⁵; Jie Yin¹⁵; Luisa Zuccolo³⁰,⁶⁶; Laura J Bierut⁵⁹; Kristian Hveem²⁰,⁶⁷,⁶⁸; James J Lee¹; Marcus R Munafo⁶⁶,⁶⁹; Nancy L Saccone⁷⁰; Cristen J Willer³⁶,⁵⁵,⁷¹; Marilyn C Cornelis⁷²; Sean P David⁷³; David Hinds¹²; Eric Jorgenson¹⁵; Jaakko Kaprio³²,⁷⁴; Jerry A Stitzel⁴,³⁹; Kari Stefansson³³,⁷⁵; Thorgeir E Thorgeirsson³³; Goncalo Abecasis¹⁹; Dajiang J Liu²,³,*; Scott Vrieze¹,*

¹Department of Psychology, University of Minnesota Twin Cities, Minneapolis, Minnesota, USA

²Department of Public Health Sciences, College of Medicine, Pennsylvania State University, Hershey, Pennsylvania, USA

³Institute of Personalized Medicine, College of Medicine, Pennsylvania State University, Hershey, Pennsylvania, USA

⁴Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, USA

⁵Department of Sociology, University of Colorado Boulder, Boulder, Colorado, USA

⁶Institute of Behavioral Science, University of Colorado Boulder, Boulder, Colorado, USA

⁷Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

⁸The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA

⁹Department of Molecular, Cellular, and Developmental Biology, University of Colorado Boulder, Boulder, Colorado, USA

¹⁰Interdisciplinary Quantitative Biology Graduate Group, University of Colorado Boulder, Boulder, Colorado, USA

¹¹23andMe, Inc., Mountain View, California, USA

¹²Quantitative Biomedical Research Center, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, USA

¹³Center for the Genetics of Host Defense, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, USA

¹⁴A full list of members and affiliations appears at the end of the paper.

¹⁵Division of Research, Kaiser Permanente Northern California, Oakland, California, USA

¹⁶Department of Psychiatry, Virginia Institute for Psychiatric & Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, USA

¹⁷Department of Psychiatry and Human Genetics, University of Utah, Salt Lake City, Utah, USA

¹⁸Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan, USA

¹⁹Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA

²⁰K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway

²¹Genetic Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia

²²Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA

²³Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands

²⁴Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA

²⁵Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA

²⁶Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands

²⁷Department of Child and Adolescent Psychiatry, Erasmus MC Rotterdam, Rotterdam, the Netherlands

²⁸Estonian Genome Center, University of Tartu, Tartu, Estonia

²⁹Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama City, Kanagawa, Japan

³⁰Department of Population Health Science, Bristol Medical School, Oakfield Grove, Bristol, United Kingdom

³¹Consiglio Nazionale delle Ricerche, Istituto di Ricerca Genetica e Biomedica, Monserrato, Italy

³²Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland

³³deCODE Genetics/AMGEN, Inc., Reykjavik, Iceland

³⁴Department of Epidemiology, University of Michigan, Ann Arbor, Michigan, USA

³⁵Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA

³⁶Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

³⁷Avera Institute for Human Genetics, Sioux Falls, SD, USA

³⁸Department of Family Medicine & Community Health, Alpert Medical School, Brown University, Providence, RI, USA

³⁹Department of Integrative Physiology, University of Colorado Boulder, Boulder, Colorado, USA

⁴⁰School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland

⁴¹Department of Sociology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

⁴²Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

⁴³Department of Psychiatry, Washington University in St. Louis, St. Louis, Missouri, USA

⁴⁴Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, Colorado, USA

⁴⁵Brain and Mind Centre, University of Sydney, New South Wales, Australia

⁴⁶Department of Psychiatry, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA

⁴⁷Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom

⁴⁸Fellows Program, RTI International, Research Triangle Park, NC, USA

⁴⁹Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA

⁵⁰Department of Internal Medicine, Institute of Clinical Medicine, University of Eastern Finland, Finland

⁵¹Department of Medicine, Kuopio University Hospital, Finland

⁵²Psychiatric Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia

⁵³Department of Biostatistics and Bioinformatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA

⁵⁴Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

⁵⁵Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, Michigan, USA

⁵⁶Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Osaka, Japan

⁵⁷Department of Epidemiology, University of Washington, Seattle, Washington, USA

⁵⁸Department of Clinical Genetics, VU Medical Centre Amsterdam, Amsterdam, the Netherlands

⁵⁹Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA

⁶⁰Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA

⁶¹Department of Psychological and Brain Sciences, Indiana University, Bloomington, Indiana, USA

⁶²SAA - National Center of Addiction Medicine, Vogur Hospital, Reykjavik, Iceland

⁶³Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA

⁶⁴Department of Psychiatry, University of California San Diego, San Diego, California, USA

⁶⁵FORMI and Department of Neurology, Oslo University Hospital, Oslo, Norway

⁶⁶MRC Integrative Epidemiology Unit, University of Bristol, Oakfield Grove, Bristol, United Kingdom

⁶⁷HUNT Research Centre, Department of Public Health and Nursing, Norwegian University of Science and Technology, Levanger, Norway

⁶⁸Department of Medicine, Levanger Hospital, Nord-Trøndelag Hospital Trust, Levanger, Norway

⁶⁹UK Centre for Tobacco and Alcohol Studies, School of Psychological Science, University of Bristol, Bristol, United Kingdom

⁷⁰Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA

⁷¹Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, USA

⁷²Northwestern University Feinberg School of Medicine, Department of Preventative Medicine, Chicago, Illinois, USA

⁷³Department of Medicine, Stanford University School of Medicine, Stanford, California, USA

⁷⁴Department of Public Health, University of Helsinki, Helsinki, Finland

⁷⁵Faculty of Medicine, University of Iceland, Reykjavik, Iceland

*Dajiang Liu and Scott Vrieze jointly supervised the work.

AUTHOR CONTRIBUTIONS: G.A., D.J.L., and S.V. designed the study. D.J.L. and S.V. led and oversaw the study. M.L. was the study’s lead analyst. She was assisted by Y.J., D.J.L., S.V., R.W., D.M.B., and G.D. Bonferroni thresholds were calculated by D.M. Phenotype definitions were developed by L.J.B., M.C.C., D.A.H., J.K., E.J., D.J.L., M.M., M.R.M., S.V., and L.Z. Software development was carried out by Y.J., D.J.L., and X.Z. Conditional analyses were performed by Y.J. and M.L. Heritability, genetic correlation, and polygenic scoring analyses were performed by R.W. Multivariate analyses were performed by Y.J., M.L., and D.J.L. Bioinformatics analyses were performed and interpreted by F.C., J.D., J.J.L., Y.L., M.L., J.A.S., S.V., and R.W. The LocusZoom website was designed by G.D. Figures were created by M.L., R.W., Y.L., and S.V. M.A.E. and M.C.K. helped with data access. R.W. coordinated authorship and acknowledgement details. M.C.C., S.P.D., E.J., J.K., and J.A.S. provided helpful advice and feedback on study design and the manuscript. All authors contributed to and critically reviewed the manuscript. Y.L., D.J.L., M.L., S.V., and R.W. made major contributions to the writing and editing.

Correspondence to Dajiang J. Liu, dajiang.liu@psu.edu, or Scott Vrieze, vrieze@umn.edu

#These authors contributed equally.

PMCID: PMC6358542 NIHMSID: NIHMS1511852 PMID: 30643251

Abstract

Tobacco and alcohol use are leading causes of mortality that influence risk for many complex diseases and disorders¹. They are heritable²,³ and etiologically related⁴,⁵ behaviors that have been resistant to gene discovery efforts⁶–¹¹. In sample sizes up to 1.2 million individuals, we discovered 566 genetic variants in 406 loci associated with multiple stages of tobacco use (initiation, cessation, and heaviness) as well as alcohol use, with 150 loci showing evidence of pleiotropic association. Smoking phenotypes were positively genetically correlated with many health conditions, whereas alcohol use was negatively correlated with these conditions, such that increased genetic risk for alcohol use is associated with lower disease risk. We report evidence for the involvement of many systems in tobacco and alcohol use, including genes involved in nicotinic, dopaminergic, and glutamatergic neurotransmission. The results provide a solid starting point to evaluate the effects of these loci in model organisms and with more precise substance use measures.

An analysis overview is provided in Supplementary Figure 1; all independent associated variants are in Supplementary Tables 1–5; and Quantile-Quantile (QQ), Manhattan, and LocusZoom plots are shown in Supplementary Figures 2–12. Smoking initiation phenotypes included age of initiation of regular smoking (AgeSmk; N=341,427; 10 associated variants) and a binary phenotype indicating whether an individual had ever smoked regularly (SmkInit; N=1,232,091; 378 associated variants). Heaviness of smoking was measured with cigarettes per day (CigDay; N=337,334; 55 associated variants). Smoking cessation (SmkCes; N=547,219; 24 associated variants) was a binary variable contrasting current versus former smokers. Available measures of alcohol use were simpler, with drinks per week (DrnkWk; N=941,280; 99 associated variants) widely available and similarly measured across studies. See the Supplementary Note and Supplementary Tables 6–7 for phenotype definition details.
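As a quick consistency check, the sample sizes and variant counts quoted above can be tabulated. The dictionary layout and variable names below are illustrative assumptions and are not part of the study's pipeline; only the numbers come from the text.

```python
# Sample sizes and associated-variant counts as quoted in the text.
# The data structure and names are illustrative only.
phenotypes = {
    "AgeSmk":  {"n": 341_427,   "variants": 10},
    "SmkInit": {"n": 1_232_091, "variants": 378},
    "CigDay":  {"n": 337_334,   "variants": 55},
    "SmkCes":  {"n": 547_219,   "variants": 24},
    "DrnkWk":  {"n": 941_280,   "variants": 99},
}

# Total number of associated variants across the five phenotypes.
total_variants = sum(p["variants"] for p in phenotypes.values())

# Phenotype with the largest sample size.
largest = max(phenotypes, key=lambda name: phenotypes[name]["n"])

print(total_variants)  # 566
print(largest)         # SmkInit
```

The per-phenotype counts sum to the 566 variants reported in the abstract, and SmkInit supplies the "up to 1.2 million" sample size in the title.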

The four smoking phenotypes were genetically correlated with one another (Figure 1; Supplementary Table 8). DrnkWk was not highly genetically correlated with the smoking phenotypes (r_g≈.10) except for SmkInit (r_g≈.34, p=6.7×10⁻⁶³), suggesting that sequence variants affecting alcohol use and those affecting initiation of smoking overlap substantially. The phenotypes were highly genetically correlated across constituent studies (Supplementary Table 9), suggesting only a minor impact of phenotypic heterogeneity on the present results, even across Western Europe and the United States. Smoking phenotypes were genetically correlated in expected directions with many behavioral, psychiatric, and medical phenotypes (Figure 1, Supplementary Table 10). Genetic variation associated with increased alcohol use was associated with greater levels of risky behavior (r_g=.20, p=1.8×10⁻⁷) and cannabis use (r_g=.36, p=6.2×10⁻¹⁰), but with less risk of disease, for almost all diseases (Figure 1, Supplementary Table 10).

Using a novel method to evaluate multivariate genetic correlation at the locus (versus global) level, we observed 150 loci that affected multiple substance use phenotypes (Supplementary Table 11), summarized in Figure 2. Patterns of pleiotropy across phenotypes were highly diverse, with only three loci significantly associated with all five phenotypes. These three loci included associations implicating Phosphodiesterase 4B (PDE4B) and Cullin 3 (CUL3). PDE4B regulates the availability of the second messenger cAMP and thereby affects signal transduction, and is down-regulated by chronic nicotine administration in rats¹². CUL3 has wide-ranging effects, including on ubiquitination and protein degradation, and de novo mutations in CUL3 are associated with rare diseases affecting response to the mineralocorticoid aldosterone¹³, which itself is affected by smoking¹⁴ and associated with alcohol use¹⁵. In addition to testing for pleiotropy, we also used MTAG¹⁶ to leverage the observed genetic correlations to increase power for locus discovery. Using this method, we discovered 1,193 independent, genome-wide significantly associated common variants (MAF > 1%; 173 for AgeSmk, 89 for CigDay, 83 for SmkCes, 692 for SmkInit, and 156 for DrnkWk), listed in Supplementary Table 12 and described further in the supplement.

Figure 2. Depicted here are results from the multivariate analysis of pleiotropy. For each locus, the method returns the best-fitting solution of which phenotypes were associated with that locus. All loci with one or more associated phenotypes are shown here. For example, every locus associated with AgeSmk was found to be pleiotropic for other phenotypes (green, blue, red, purple, and fuchsia bars), and no locus showed association with only AgeSmk (no dark grey bar for AgeSmk). When sample sizes are unequal across phenotypes, the method also improves power for those phenotypes with smaller samples. The total number of loci associated with each trait (whether pleiotropic or not) from these analyses was 40 (AgeSmk), 48 (SmkCes), 72 (CigDay), 111 (DrnkWk), and 278 (SmkInit). Full information is in Supplementary Table 11.

The 566 conditionally independent genome-wide significant variants from the initial GWAS accounted for between 0.1% (SmkCes) and 2.3% (SmkInit) of phenotypic variation (Figure 3). SNP heritability calculated using LD Score Regression¹⁷ ranged from 4.2% for DrnkWk to 8.0% for CigDay (Figure 3; Supplementary Table 13), consistent with estimates using individual-level data¹⁸, SNP heritabilities calculated from the largest individual contributing studies (Supplementary Table 13), and prior work¹⁹. The results suggest that these phenotypes are highly polygenic and that the majority of the heritability is accounted for by variants that fall below standard GWAS significance thresholds.
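To make the heritability estimate more concrete, the following is a minimal sketch of the idea behind LD Score Regression, not the software used in the study. It assumes the standard model in which the expected association chi-square statistic of variant j is 1 + N·a + N·h²·ℓ_j/M, where ℓ_j is the variant's LD score, N the sample size, M the number of variants, and a a confounding term. All data are synthetic, the function names are hypothetical, and the regression weights and block-jackknife standard errors of real implementations are omitted.

```python
import random


def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Unweighted LD Score regression sketch.

    The slope estimates N * h2 / M, so h2 = slope * M / N.
    The intercept estimates 1 + N * a (1 when there is no confounding).
    """
    slope, intercept = ols_fit(ld_scores, chi2)
    return slope * n_snps / n_samples, intercept


# Synthetic demonstration; every value below is made up.
random.seed(0)
n_samples, n_snps, true_h2 = 300_000, 1_000_000, 0.08
ld_scores = [random.uniform(1.0, 200.0) for _ in range(5_000)]
chi2 = [
    1.0 + n_samples * true_h2 * ell / n_snps + random.gauss(0.0, 0.5)
    for ell in ld_scores
]

h2_hat, intercept_hat = ldsc_h2(chi2, ld_scores, n_samples, n_snps)
print(f"estimated h2 = {h2_hat:.3f}, intercept = {intercept_hat:.3f}")
```

The Gaussian noise is a simplification chosen to keep the sketch short; real chi-square statistics are heteroskedastic, which is one reason production implementations use weighted regression.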

Figure 3. The light gray bars reflect SNP heritability, estimated with LD Score Regression. The light blue and gold bars reflect the predictive power of polygenic risk scores in Add Health and the Health and Retirement Study (HRS), respectively. Despite the 41-year generational gap between participants from these two studies, and major tobacco-related policy changes during that time, the polygenic scores are similarly predictive in both samples. Error bars are 95% confidence intervals estimated with 1,000 bootstrapped repetitions. Dark gray bars represent the total phenotypic variance explained by only genome-wide significant SNPs. H²=heritability.

To further investigate the polygenicity, polygenic risk scores (Supplementary Table 14) were computed on the Add Health²⁰ and Health and Retirement Study²¹ datasets, which are representative of their birth cohorts in the United States and represent exposures to different tobacco policy environments. Add Health participants were born, on average, in 1979; the average birth year in the Health and Retirement Study was 1938. Despite these generational differences, the polygenic scores performed similarly in both samples. They accounted for approximately 1%, 4%, 1%, 4%, and 2.5% of variance in AgeSmk, CigDay, SmkCes, SmkInit, and DrnkWk, respectively, about half of the estimated SNP heritability of these traits (Figure 3). More concretely, in Add Health and the Health and Retirement Study, respectively, a one SD increase in the CigDay risk score corresponded to two and three additional daily cigarettes; a one SD increase in the SmkInit risk score corresponded to a 12% and 10% increased risk of regularly smoking; and a one SD increase in the DrnkWk risk score corresponded to one additional drink per week in both datasets.
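For readers unfamiliar with polygenic scoring, a generic sketch follows. It assumes the common definition of a score as the sum of allele dosages weighted by GWAS effect sizes, and measures predictive power as the squared correlation between score and phenotype. The function names, weights, genotypes, and phenotype values are all hypothetical; the scoring procedure actually used in the study is described in its supplement.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2) for one individual."""
    return sum(d * w for d, w in zip(dosages, weights))


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Toy data: 3 individuals, 4 variants (all numbers are made up).
weights = [0.075, -0.014, -0.012, 0.010]   # hypothetical per-allele effects
genotypes = [
    [0, 1, 2, 1],
    [2, 0, 1, 0],
    [1, 2, 0, 2],
]
phenotype = [9.0, 14.0, 12.0]              # hypothetical cigarettes per day

scores = [polygenic_score(g, weights) for g in genotypes]
variance_explained = r_squared(scores, phenotype)
print(scores, variance_explained)
```

In practice the scores are computed in a sample independent of the discovery GWAS, and the reported variance explained is usually incremental, that is, the gain over a model containing only covariates such as age, sex, and ancestry components.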

Cell/tissue enrichment²² was observed across all five phenotypes within core histone marks from multiple central nervous system (CNS) tissues (Supplementary Figures 13–15, Supplementary Tables 15–16). Enrichment was observed in tissues from cortical and sub-cortical regions in the CNS. Structure and function of these regions have been robustly associated with individual differences in frequencies, magnitudes, and clinical characteristics of alcohol use, and substance use/misuse generally, in human imaging research. Our results include significant enrichment across phenotypes and histone marks in the hippocampus²³, inferior temporal pathways²⁴, dorsolateral and medial prefrontal cortex²⁵, caudate, and striatum²⁶. Consistent with gene and pathway findings described below, alcohol and nicotine use affect dopaminergic and glutamatergic neurotransmission among these brain regions, compromising reward-based learning and facilitating drug-seeking behavior²⁶. Enrichment within other cell/tissue groups and specific cell/tissue types included immune and liver cells but was less consistent across analytical approaches.

We manually reviewed all genes implicated by the GWAS or gene-based tests (see Supplementary Tables 1–5 for the full catalogue of implicated genes; Supplementary Tables 17–21 for gene and gene-set test results). We replicated known associations between multiple variants in the nicotine metabolism gene CYP2A6 and CigDay (p=4.0×10⁻⁹⁹) and SmkCes (p=1.6×10⁻⁴⁸). We replicated an association signal in the alcohol metabolism gene ADH1B associated with DrnkWk, identifying in that locus 11 conditionally independent associated variants (lowest p<2.2×10⁻³⁰⁸).

All drugs of abuse activate the mesolimbic dopamine system reward pathway²⁷, and dopamine-related genes have long been popular candidate genes. We found that variants near the widely studied dopamine receptor D2 (DRD2)²⁸ were associated across phenotypes, including CigDay, SmkCes, and DrnkWk (p=6.5×10⁻¹², 1.1×10⁻¹⁰, and 4.9×10⁻¹¹, respectively) but not with AgeSmk or SmkInit, suggesting that these variants are less relevant in early stages of nicotine use. Other specific dopamine-related genes only showed associations with smoking phenotypes, including multiple associations of CigDay and SmkCes with dopamine beta-hydroxylase (DBH, p=9.8×10⁻²⁴ and 1.2×10⁻³⁵, respectively)⁹, an enzyme necessary to convert dopamine to norepinephrine. SmkInit was associated with variation near protein phosphatase 1 regulatory subunit 1B (PPP1R1B, p=3.9×10⁻⁸), a signal transduction gene that affects synaptic plasticity and reward-based learning in the striatum²⁹,³⁰ and contributes to the behavioral effects of nicotine in mice³¹. In pathway analyses, dopamine gene sets were enriched only in SmkInit, where the exemplar pathway ‘reactome dopamine neurotransmitter release cycle’ was enriched (p=9.2×10⁻⁵; Figure 4; Supplementary Table 18).

Figure 4. There were 68 clusters available for smoking initiation (SmkInit) and 10 for drinks per week (DrnkWk); CigDay, AgeSmk, and SmkCes did not have more than one exemplar gene set. Blue shading represents positive correlations, and red shading represents negative correlations, with increasing color intensity reflecting increasing strength of a correlation. Cluster names are truncated for space, with a full list of all names in Supplementary Table 18. The number after each name is the number of gene sets in each cluster. The matrix naturally falls into three blue superclusters along the diagonal. The largest supercluster contains primarily gene sets related to neurotransmitter receptors, ion channels (sodium, potassium, calcium), learning/memory, and other aspects of CNS function. The middle supercluster includes gene sets defined by regulation of transcription and translation, including RNA binding and transcription factor activity. The final supercluster is composed primarily of gene sets related to development of the nervous system.

Neuronal acetylcholine nicotinic receptors are the initial site of nicotine action in the brain and have long been implicated in nicotine use and dependence³². With the exception of CHRNA7, all CNS-expressed nicotinic receptor genes were significantly associated with one or more smoking phenotypes, many reported here for the first time. Enrichment was also noted for nicotinic receptor-related pathways and genes in smoking phenotypes (Supplementary Tables 17–21). There was no evidence of association between nicotinic receptor genes or pathways with DrnkWk, despite the use of nicotinic receptor partial agonists (e.g., varenicline) in the treatment of alcohol dependence³³.

Associations with SmkInit highlighted structures and functions related to long-term potentiation and reward-related learning and memory, systems that affect reward processing and addiction²⁸,³⁴,³⁵. Glutamate is an important neurotransmitter mediating these processes, and exemplar pathways related to glutamate were significantly enriched in SmkInit (e.g., ‘extracellular-glutamate-gated ion channel’, p=9.9×10⁻⁷; ‘post-NMDA receptor activation events’, p=5.5×10⁻⁵; and ‘DLG4 PPI subnetwork’, p=4.5×10⁻¹²; Supplementary Table 18). DLG4 affects NMDA receptors and potassium channel clusters, and plays a central role in glutamatergic models of reward-related learning³⁵. Individual associated genes related to these pathways included glutamate ionotropic receptor NMDA type subunit 2A (GRIN2A, p=3.4×10⁻¹¹) and homer scaffolding protein 2 (HOMER2, p=3.1×10⁻¹⁴), which affects addictive behavior in mice³⁵,³⁶ and regulates glutamate metabotropic receptor 1 (GRM1). Pathways enriched in SmkInit also included sodium, potassium, and calcium voltage-gated channels (Figure 4, Supplementary Table 18), essential to neuronal excitability and signaling.

Alcohol is known to affect glutamatergic signaling pathways³⁷, and over half of the enriched pathways for DrnkWk clustered within the exemplar ‘glutamate ionotropic receptor kainate type subunit 2 (GRIK2) PPI subnetwork’ (Figure 4, Supplementary Table 18). Not all DrnkWk-enriched pathways involved the brain, however, as glucose and carbohydrate processing pathways were associated with DrnkWk but with no smoking phenotype, perhaps suggesting that alcohol consumption is influenced by individual differences in one’s ability to process calorie-rich alcoholic beverages. Finally, we discovered variation in and around gene-rich regions including corticotropin releasing hormone receptor 1 (CRHR1; p=1.6×10⁻¹⁷) and urocortin (UCN; p=8.1×10⁻⁴⁵), associated with DrnkWk but not smoking. UCN encodes an endogenous ligand for CRHR1 and CRHR2³⁸. CRH affects hormones involved in the stress response, including cortisol, and has been associated with the stress response and relapse to drug taking in animals³⁹,⁴⁰.

Specific mechanisms by which implicated genes influence substance use in humans are largely unknown, even for those genes reported above involving systems such as neurotransmission, reward-related learning and memory, and the stress response. To prioritize genes for functional experimentation, we tabulated conditionally independent genome-wide significant nonsynonymous variants (Table 1). In the 406 GWAS loci, 4% of sentinel variants were nonsynonymous, representing a significant enrichment (p=2.5×10⁻¹⁰) relative to the 0.4% of variants with MAF>0.1% in the imputation panel⁴¹ that were nonsynonymous. Several genes in Table 1 have been previously associated with substance use/addiction (see Supplementary Table 22 for a list of previous associations), and two variants have been functionally validated (rs1229984 and rs16969968)⁴²,⁴³. The others have not, but in some cases their genes interact with established molecular targets of addiction and may themselves be suitable targets for further investigation. For example, rs1024323 in G protein-coupled receptor (GPCR) kinase 4 (GRK4) was associated with CigDay (p=8.7×10⁻⁹) and lies within a locus associated with AgeSmk. GRK4 is involved in the regulation of GPCRs including metabotropic glutamate receptor 1 (GRM1)⁴⁴, GABA_B receptors⁴⁵, and dopamine receptor D1 (DRD1) and D3 (DRD3) in the kidneys and cerebellum, and is involved in essential hypertension⁴⁶. GRK4 is also expressed in the midbrain and forebrain⁴⁶,⁴⁷, but no research has evaluated its impact on substance use behavior. To take one more example, the nonsynonymous variant in SLC39A8 affects zinc and manganese transport, is highly pleiotropic for complex phenotypes, and may impair inflammation, glutamatergic neurotransmission, and regulation of various metals in the body⁴⁸.
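The text does not state which test produced the enrichment p-value. As one plausible illustration, an exact one-sided binomial test asks how likely it would be to see at least the observed number of nonsynonymous sentinel variants if each of the 406 loci had the 0.4% baseline chance. The observed count of 16 below is an assumption (roughly 4% of 406), and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, i) * p ** i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )


n_loci = 406       # GWAS loci, from the text
baseline = 0.004   # 0.4% of imputation-panel variants are nonsynonymous
observed = 16      # ASSUMED count, roughly 4% of 406 loci

expected = n_loci * baseline
p_value = binomial_upper_tail(observed, n_loci, baseline)
print(f"expected = {expected:.2f}, observed = {observed}, p = {p_value:.2e}")
```

Under the baseline rate fewer than two nonsynonymous sentinel variants would be expected, so an observed count in the mid-teens yields a very small tail probability. The exact value depends on the true count and on the test the authors used, so it should not be expected to reproduce the published p-value.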

Table 1. Nonsynonymous sentinel variants.

The sentinel variant in approximately 4% of loci was nonsynonymous. Shown here are all nonsynonymous sentinel variants, and all nonsynonymous variants in near-perfect LD with a sentinel variant. If the listed gene was also associated (through single variant or gene-based test) with another phenotype, that phenotype is listed in parentheses. Several genes have been implicated in previous studies of substance use/addiction, including CHRNA5, BDNF, GCKR, and ADH1B.

Phenotype	Gene	rsID	Chr	Position	REF	ALT	AF	Beta	p	N	Q
CigDay (SmkCes)	CHRNA5	rs16969968^a	15	78,882,925	G	A	.34	.075	1.2×10⁻²⁷⁸	330,721	.34
CigDay	HIST1H2BE	rs7766641	6	26,184,102	G	A	.27	−.014	2.9×10⁻¹⁰	335,553	.78
CigDay (AgeSmk)	GRK4	rs1024323	4	3,006,043	C	T	.38	−.012	8.7×10⁻⁹	337,334	.17
SmkInit	REV3L	rs462779^a	6	111,695,887	G	A	.81	−.019	4.5×10⁻²⁹	1,232,091	.67
SmkInit (DrnkWk)	BDNF	rs6265	11	27,679,916	C	T	.20	−.016	2.8×10⁻¹⁹	1,232,091	.13
SmkInit	RHOT2	rs1139897	16	720,986	G	A	.23	−.012	1.8×10⁻¹⁵	1,232,091	.61
SmkInit (DrnkWk)	ZNF789	rs6962772^a	7	99,081,730	A	G	.15	−.015	2.1×10⁻¹⁴	1,232,091	.92
SmkInit	BRWD1	rs4818005^a	21	40,574,305	A	G	.58	−.010	3.9×10⁻¹⁴	1,232,091	.75
SmkInit	ENTPD6	rs6050446	20	25,195,509	A	G	.97	.035	8.8×10⁻¹³	1,225,969	.33
SmkInit	RPS6KA4	rs17857342^a	11	64,138,905	T	G	.38	−.010	9.8×10⁻¹²	1,232,091	.16
SmkInit	FAM163A	rs147052174	1	179,783,167	G	T	.02	.037	2.3×10⁻¹⁰	1,232,091	.59
SmkInit	PRRC2B	rs34553878	9	134,907,263	A	G	.11	.016	1.2×10⁻⁹	1,232,091	.28
SmkInit	ADAM15	rs45444697^a	1	155,033,918	C	T	.21	.010	5.3×10⁻⁹	1,232,091	.46
SmkInit	MMS22L	rs9481410^a	6	97,677,118	G	A	.76	.010	1.1×10⁻⁸	1,232,091	.04
SmkInit	QSER1	rs62618693	11	32,956,492	C	T	.04	−.020	2.1×10⁻⁸	1,232,091	1.00
DrnkWk	ADH1B	rs1229984	4	100,239,319	T	C	.96	.060	2.2×10⁻³⁰⁸	941,280	.05
DrnkWk	GCKR	rs1260326	2	27,730,940	T	C	.60	.008	8.1×10⁻⁴⁵	941,280	.10
DrnkWk	SLC39A8	rs13107325	4	103,188,709	C	T	.07	−.009	1.5×10⁻²²	941,280	.33
DrnkWk	SERPINA1	rs28929474	14	94,844,947	C	T	.02	−.012	1.3×10⁻¹¹	941,280	.50
DrnkWk (SmkInit)	ACTR1B	rs11692465	2	98,275,354	G	A	.09	.008	2.5×10⁻¹¹	937,516	.40
DrnkWk	TNFSF12–13	rs3803800	17	7,462,969	A	G	.79	.004	1.5×10⁻¹⁰	941,280	.67
DrnkWk	HGFAC	rs3748034	4	3,446,091	G	T	.14	−.005	1.7×10⁻⁸	941,280	.65


Note: Phenotype abbreviations are defined in Figure 1. Chr=Chromosome; REF=reference allele; ALT=alternate allele; AF=allele frequency of ALT allele; Q=p-value for Cochran’s Q statistic.

^a These variants were not themselves sentinel variants, but were in near-perfect LD with a sentinel variant (R² >.99, from the 1000 Genomes European population). Beta is expressed in units of the standard deviation of the phenotype. For binary phenotypes the standard deviation was calculated from the weighted average prevalence across all studies included in the meta-analysis (available in Supplementary Table 7).

Ultimately, substance use is embedded in a complex web of causal relations⁴⁹ (e.g., Figure 1), and caution must be exercised in drawing strong causal conclusions. However, the present findings represent a major step forward in understanding the etiology of these complex, disease-relevant behaviors. In particular, statistical and interpretive power were both enabled by simultaneously studying multiple related substance use behaviors representing different stages of use and substances. More precise measurements, including evaluating age and environment as moderators for these dynamic phenotypes⁵⁰, functional research, and complementary gene mapping approaches (e.g., sequencing) will aid in the discovery of mechanisms by which implicated genes may affect substance use and related disease risk.

METHODS

This article is accompanied by a Supplementary Note with further details, as well as the Life Sciences Reporting Summary.

Generation of summary statistics.

Participants in all studies were genotyped on genome-wide arrays. The majority of studies imputed their genotypes to the Haplotype Reference Consortium⁴¹ using the University of Michigan Imputation Server (see URLs)⁵¹. Several studies did not use the imputation server because of data sharing restrictions and/or computational or resource limitations (described in the Supplementary Note). All studies used either Minimac3⁵¹ or IMPUTE2⁵² for imputation.

GWAS summary statistics were generated in each study sample using RVTESTS⁵³ according to a standard analysis plan. Studies composed primarily of classically related individuals (e.g., family studies) first regressed out covariates including genetic principal components under a linear model, inverse-normalized the residuals (except for 23andMe), and tested for an additive effect of each variant under a linear mixed model with a genetic kinship matrix. Family studies followed this analysis for all phenotypes, even binary phenotypes such as smoking initiation and cessation. Studies of entirely classically unrelated individuals followed the same analysis for quasi-continuous phenotypes (AgeSmk, CigDay, DrnkWk), but estimated additive genetic effects under a logistic model for binary phenotypes (SmkInit and SmkCes).
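
The inverse-normalization step can be illustrated with a minimal sketch. The text does not state which rank-based formula was used, so the rank offset of 0.5 and the averaging of tied ranks below are assumptions, and the function name `inverse_normalize` is hypothetical. The input is taken to be the residuals left after regressing out covariates.

```python
from statistics import NormalDist


def inverse_normalize(values, offset=0.5):
    """Map values to normal quantiles by rank (rank-based inverse-normal transform).

    Assumption: ties receive their average rank, and the quantile for rank r
    among n values is (r - offset) / (n - 2 * offset + 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]
```

The transform keeps the ordering of the residuals while forcing their marginal distribution to be close to standard normal, which puts the quasi-continuous phenotypes on a common scale across studies.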

Quality control checks were applied to ensure quality of both the phenotypes and genotypes. For each phenotype and covariate, distribution statistics including the minimum, maximum, quartiles, median, mean, and standard deviation were examined. We ensured that these statistics were within expected limits given the phenotype definitions and any scale transformations per the analysis plan. We also evaluated simple relationships among phenotypes. When discrepancies were noted we contacted the original study for clarification or re-analysis, or the data were removed from further analysis. Phenotypic statistics are presented in Supplementary Tables 6 and 7.

Extensive genetic quality control and filtering were performed on the contributed summary statistics from each cohort. We removed imputed variants with imputation quality (the estimated squared correlation between the imputed dosage and true dosage) less than 0.3. We compared the per-study allele labels and allele frequencies with those of the imputation reference panels, and removed or reconciled mismatches. For quantitative traits, we plotted the variance of the score statistics against the sample size, and tested whether the trait residuals in each study were properly normalized and whether the trait was measured and analyzed in the same units across studies.

Meta-Analysis.

Meta-analysis was performed centrally using the software package rareGWAMA (see URLs). All statistical tests in the meta-analysis or secondary analyses of the meta-analytic results (e.g., polygenic risk scoring, functional enrichment, MTAG, Genomic SEM) were two-sided. Given that rarer variants and/or behavioral phenotypes may show between-study heterogeneity in allele frequencies, imputation qualities, or genetic architecture, we extended existing methods and developed a novel fixed effects approach that accounts for between-study heterogeneity. Specifically, the method aggregates weighted Z-score statistics, i.e. $Z_{META} = \frac{\sum_{k} w_{k} Z_{k}}{\left(\sum_{k} w_{k}^{2}\right)^{1/2}}$, where Z_k is the Z-score statistic in study k. The weight w_k is defined by $w_{k} = N_{k} p_{k} (1 - p_{k}) R_{k}^{2}$, where p_k is the variant allele frequency, $R_{k}^{2}$ is the imputation quality, and N_k is the sample size for study k. Under the null and with the present sample sizes, Z_META is normally distributed. The weights are proportional to the sample genotype variance. When the trait is uniformly measured and the allele frequency is similar, the method is approximately equivalent to meta-analysis of sample-size-weighted Z-scores. Yet, the method accounts for between-study heterogeneity in imputation accuracy and allele frequencies. The use of a fixed effects model, the most common approach in GWAS meta-analysis of single ancestry groups, appeared acceptable given the apparent lack of substantial meta-analytic effect heterogeneity (see Cochran’s Q and I² statistics in Supplementary Tables 1–5).
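
As an illustration of the weighted Z-score formula, the following minimal sketch combines per-study statistics for a single variant. It is not the rareGWAMA implementation; the function names and the example inputs are hypothetical, and only the weight definition and the two-sided test are taken from the text.

```python
import math


def meta_z(z, n, p, r2):
    """Combine per-study Z-scores with weights w_k = N_k * p_k * (1 - p_k) * R_k^2."""
    weights = [n_k * p_k * (1.0 - p_k) * r2_k for n_k, p_k, r2_k in zip(n, p, r2)]
    numerator = sum(w * z_k for w, z_k in zip(weights, z))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value for a Z-score under the standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical inputs for one variant observed in three studies.
z_meta = meta_z(z=[2.1, 1.4, 3.0],
                n=[50_000, 12_000, 300_000],
                p=[0.31, 0.29, 0.33],
                r2=[0.98, 0.85, 0.99])
p_value = two_sided_p(z_meta)
```

A study with poor imputation quality or a lower allele frequency contributes a smaller weight, which is how the statistic reflects between-study differences in those two quantities.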

Population stratification and cryptic relatedness were addressed during the generation of summary statistics by each local study through the use of kinship-based linear mixed models⁵⁴ and genetic principal components⁵⁵. Residual stratification was further corrected at the meta-analytic level with study-specific genomic controls⁵⁶ (calculated separately for variants with MAF ≥ 1% and 0.1% ≤ MAF < 1%; Supplementary Table 23) applied to each study’s results prior to meta-analysis.
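
A minimal sketch of per-study genomic control with separate inflation factors for the two MAF bins is given below. The function names are hypothetical, and the choice not to rescale when the inflation factor is below 1 is an assumption rather than something stated in the text.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution (about 0.455), from the normal quantile.
CHI2_MEDIAN_1DF = NormalDist().inv_cdf(0.75) ** 2


def gc_lambda(z_scores):
    """Inflation factor: median observed chi-square divided by its null median."""
    return median(z * z for z in z_scores) / CHI2_MEDIAN_1DF


def gc_correct(z_scores, mafs, cutoff=0.01):
    """Rescale Z-scores by sqrt(lambda), with lambda computed separately per MAF bin."""
    common = [z for z, f in zip(z_scores, mafs) if f >= cutoff]
    rare = [z for z, f in zip(z_scores, mafs) if f < cutoff]
    # Assumption: a lambda below 1 is treated as 1, so statistics are never inflated.
    scale_common = max(gc_lambda(common), 1.0) ** 0.5 if common else 1.0
    scale_rare = max(gc_lambda(rare), 1.0) ** 0.5 if rare else 1.0
    return [z / (scale_common if f >= cutoff else scale_rare)
            for z, f in zip(z_scores, mafs)]
```

Dividing a Z-score by the square root of lambda is equivalent to dividing its chi-square statistic by lambda.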

A locus was defined as a 1 Mb region surrounding the “sentinel” variant (the variant in the locus with the lowest p-value). When any two such loci overlapped or abutted, they were collapsed into a single locus. Variants within each locus were subjected to conditional analysis using a novel partial correlation-based score statistic using cohort-level summary statistics⁵⁷ implemented in a sequential forward selection framework. The method requires marginal association statistics and approximated covariance matrices among them, and performs favorably compared to existing methods⁵⁷ (Supplementary Table 24). Covariances among effects were based upon the linkage disequilibrium information estimated from a subset of the Haplotype Reference Consortium⁴¹.
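
The locus definition and collapsing rule can be sketched as an interval merge. This sketch assumes the 1 Mb region is centered on the sentinel (500 kb on each side) and that the sentinels have already been chosen; the function name and the positions in the usage example are hypothetical.

```python
def define_loci(sentinels, flank=500_000):
    """Build a window around each sentinel and merge windows that overlap or abut.

    sentinels: list of (chromosome, position) tuples, chromosome given as a string.
    Returns a sorted list of (chromosome, start, end) tuples.
    """
    windows = sorted((chrom, max(0, pos - flank), pos + flank)
                     for chrom, pos in sentinels)
    merged = []
    for chrom, start, end in windows:
        previous = merged[-1] if merged else None
        # "+ 1" treats windows that touch end-to-start as abutting.
        if previous and previous[0] == chrom and start <= previous[2] + 1:
            merged[-1] = (chrom, previous[1], max(previous[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical sentinel positions: the first two windows overlap and are collapsed.
loci = define_loci([("1", 1_000_000), ("1", 1_800_000), ("1", 5_000_000)])
```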

We applied multiple post-meta-analysis variant filters to ensure robustness of reported findings. To reduce artifacts arising from a small number of studies, we excluded any variant that was present in two or fewer studies. For each variant in the meta-analysis, we calculated the effective sample size $N_{eff} = \sum_{k} N_{k} R_{k}^{2}$, where N_k is the sample size in study k and $R_{k}^{2}$ is the imputation quality. We removed variants with effective sample sizes < 10% of the total sample size to ensure only well-imputed variants with a modicum of power were included. We also excluded all variants with minor allele frequency less than 0.001, the lower bound of moderate imputation accuracy with the currently best available imputation reference panel⁴¹. Variants with MAF > 1% are expected to be imputed with high accuracy. Results from the application of post-meta-analysis filters are displayed in Supplementary Table 25.
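
The three filters can be sketched for a single variant as follows. The function names are hypothetical, and `total_n` is assumed to be the total sample size of the meta-analysis for the phenotype in question.

```python
def effective_sample_size(n, r2):
    """Sum of per-study sample sizes weighted by imputation quality."""
    return sum(n_k * r2_k for n_k, r2_k in zip(n, r2))


def keep_variant(n, r2, maf, total_n, min_studies=3, min_fraction=0.10, min_maf=0.001):
    """Return True if a variant passes all three post-meta-analysis filters.

    n and r2 list the sample size and imputation quality of each study in
    which the variant was present.
    """
    if len(n) < min_studies:  # present in two or fewer studies
        return False
    if effective_sample_size(n, r2) < min_fraction * total_n:
        return False
    return maf >= min_maf
```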

After applying variant filters and obtaining our final meta-analytic results, we calculated genomic controls and maximum/median per-variant sample sizes. Sample sizes ranged from 337,334 for cigarettes per day to 1,232,091 for smoking initiation. QQ plots, LD intercept tests, and genomic control values indicate that Type I error rates were well controlled for both common and low-frequency variants (Supplementary Figure 2, Supplementary Table 26). All conditionally independent variants were plotted in LocusZoom and included in Supplementary Figures 1–12. All plots were visually inspected, and suspicious loci were identified (see Supplementary Table 27) and removed from further consideration. To ensure LD information was available between sentinel variants and others in the locus, we used surrogate variants for eight loci (Supplementary Table 28).

We estimated the extent of pleiotropy (i.e., whether a given locus is simultaneously associated with multiple phenotypes) for each genome-wide associated locus from our GWAS using an Empirical Bayes approach. Using summary association statistics from a given locus as input, the method estimated the 5×5 genetic correlation matrix of the locus and the posterior probability of association for all possible phenotype configurations, while accounting for genome-wide genetic correlations and trait residual correlations. In cases where loci associated with different phenotypes overlapped, the locus was expanded in size. Statistical details are available in the Supplementary Note, Section 3.3.

We applied MTAG¹⁶ to variants with MAF>1% from the final meta-analysis results for each phenotype, using the other four phenotypes to increase power for locus discovery. Genomic control values and LD intercept tests for the MTAG results indicated that inflation was well controlled (Supplementary Table 29), and the Manhattan and QQ plots were well-behaved (Supplementary Figures 16 and 17). GCTA-COJO⁵⁸ was used to identify conditionally independent variants (listed in Supplementary Table 12). All loci were plotted with LocusZoom and visually inspected, and suspicious loci (e.g., those without LD support; see Supplementary Table 30) were removed from further consideration. Additional details, including testing of MTAG model assumptions, are provided in the Supplementary Note. Finally, we also applied Genomic SEM⁵⁹ to our five phenotypes to formally model and factor their correlation structure. See Supplementary Figure 18, Supplementary Table 31, and the Supplementary Note for further details.

Genome-wide significant threshold.

The primary focus was to test variants with MAF≥1%, as these will be imputed with high confidence. The statistical significance threshold applied to meta-analysis of all variants with MAF≥1% was 5×10⁻⁸, consistent with widespread convention in GWAS of European individuals. Since our imputation procedure is expected to provide some marginal level of accuracy down to MAF of 0.1%, we also conducted an exploratory association test for low frequency variants with 0.1%<MAF<1%, to which we applied a statistical significance threshold of p<5×10⁻⁹. Only two such low-frequency variants surpassed the conventional common variant threshold of p<5×10⁻⁸. Of these two, one low-frequency variant, associated with SmkInit, survived the more stringent multiple testing correction (rs181508347, intergenic, MAF=.0096, p=5×10⁻¹⁰), and is included in our count of discovered loci and included in Supplementary Table 4. The more stringent threshold applies a correction for ~10 million tests, which is approximately the number of conditionally independent variants tested once the MAF lower bound was extended from 1% to 0.1%. We calculated this threshold using three existing methods^60–62. These methods make use of the eigenvalues of the matrix of LD (measured in R²) between SNPs, calculated with a spectral decomposition. We estimated the number of independent tests using the genotype data from a subset of the Haplotype Reference Consortium panel⁴¹. We first calculated LD blocks across the genome using the algorithm implemented in PLINK version 1.9⁶³ with default settings, and then we lowered the MAF threshold to 0.1% to accommodate all low frequency variants. Next, we calculated the effective number of independent tests within each LD block and between LD blocks using the aforementioned three methods, which we aggregated to get the total number of independent tests. 
The three techniques estimated the number of independent tests at 9.8–10.1 million, similar to other independent estimates⁶⁴. A total of 278 sentinel variants (including the one genome-wide significant low-frequency variant) had p < 5×10⁻⁹, out of the original 406 with p < 5×10⁻⁸.
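
The arithmetic behind the stricter threshold, and the general shape of an eigenvalue-based estimate, can be sketched as follows. The family-wise error rate of 0.05 is implied by the text rather than stated. The estimator shown is the one published by Li and Ji (2005); it is used here only as an example of the approach and may or may not be one of the three cited methods. The eigenvalues are assumed to come from the LD matrix of a single block, and the function names are hypothetical.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests


def effective_tests_li_ji(eigenvalues):
    """Effective number of tests from LD-matrix eigenvalues (Li and Ji, 2005).

    Each eigenvalue l contributes I(|l| >= 1) + (|l| - floor(|l|)).
    """
    total = 0.0
    for value in eigenvalues:
        x = abs(value)
        total += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return total


# About 10 million independent tests gives the exploratory threshold of 5e-9.
threshold = bonferroni_threshold(10_000_000)
```

Three uncorrelated variants have eigenvalues (1, 1, 1) and count as three tests, whereas three perfectly correlated variants have eigenvalues (3, 0, 0) and count as one. Summing such per-block estimates is one way to arrive at a genome-wide total.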

Heritability.

We used univariate and bivariate LD Score Regression¹⁷ to assess the heritability of each phenotype and to estimate a variety of genetic correlations. Analyses included (1) LD Score Regression intercept tests to evaluate the extent to which population stratification or cryptic relatedness may artificially inflate our summary statistics; (2) estimation of genetic correlations across our five phenotypes; (3) estimation of genetic correlations computed within a phenotype but between the larger contributing studies, as an estimate of the extent to which phenotypes were measuring the same genetic risk in different studies; and (4) estimation of genetic correlation between the five phenotypes and a wide variety of other phenotypes related to smoking and alcohol behaviors, and for which GWAS have already been made publicly available.

Under standard assumptions, bivariate score regression produces unbiased estimates of genetic correlation, even in the presence of sample overlap⁶⁵. Accordingly, to estimate the extent of genetic correlation between each of our phenotypes, and between our phenotypes and other phenotypes related to nicotine and alcohol use, we used standard procedures in LD Score Regression²². To be included in these analyses, variants were restricted to those present in HapMap3 with MAF>0.01. Standard errors were estimated with a block jackknife over all variants.

We estimated the proportion of variance explained by the set of all conditionally independently associated variants. The joint effects of variants in a locus were approximated by $\hat{\vec{\beta}}_{JOINT} = V_{META}^{-1} \vec{U}_{META}$, where $\vec{U}_{META}$ is the vector of single-variant score statistics and V_META is the covariance matrix between them. The phenotypic variance explained by the independently associated variants in a locus is given by $\hat{\vec{\beta}}_{JOINT}^{T} \, \mathrm{cov}(G) \, \hat{\vec{\beta}}_{JOINT}$, where cov(G) is the genotype covariance matrix estimated from the Haplotype Reference Consortium panel.
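
The two expressions above can be illustrated with a minimal sketch for a locus with a handful of variants. The linear solver is a plain Gaussian elimination written out to keep the example self-contained; the function names and the toy inputs are hypothetical and carry no relation to the reported results.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def joint_effects(v_meta, u_meta):
    """beta_JOINT = V_META^{-1} U_META, computed without forming the inverse."""
    return solve_linear(v_meta, u_meta)


def variance_explained(beta, cov_g):
    """Quadratic form beta^T cov(G) beta."""
    size = len(beta)
    return sum(beta[i] * cov_g[i][j] * beta[j]
               for i in range(size) for j in range(size))


# Toy two-variant locus.
beta = joint_effects([[2.0, 0.0], [0.0, 4.0]], [1.0, 2.0])
explained = variance_explained(beta, [[1.0, 0.5], [0.5, 1.0]])
```

The off-diagonal term of cov(G) matters: two variants in positive LD with effects in the same direction explain more variance jointly than the sum of their separate contributions.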

Polygenic scoring.

Polygenic risk scores (PRS) were computed using LDpred⁶⁶, which accounts for linkage disequilibrium between variants. Since we do not know the variance-covariance matrix of the effects in the training sample (here, the GWAS results), we replace this matrix with a block diagonal matrix estimated using LD patterns from the prediction cohorts, after dropping cryptically-related individuals and ancestry outliers.

Smoking and alcohol use rates have been influenced by secular trends and policy changes over the last half century. We therefore selected two independent prediction cohorts, the Health and Retirement Study (HRS)²¹ and the National Longitudinal Study of Adolescent to Adult Health (Add Health)²⁰. The HRS is a nationally representative study of U.S. households that began in 1992; the mean birth year of respondents was 1938 (SD=9.3), and the mean age at the time of assessment was 57.6 (SD=8.9). Add Health is a nationally representative sample of U.S. adolescents enrolled in grades 7 through 12 during the 1994–1995 school year. The mean birth year of respondents was 1979 (SD=1.8), and the mean age at assessment (here, wave 4) was 29.0 (SD=1.8). In the HRS, ~57% of respondents reported ever smoking regularly, and these respondents smoked ~13 cigarettes per day. In Add Health, slightly fewer respondents (~53%) reported ever smoking regularly, and these respondents smoked ~11 cigarettes per day on average (Supplementary Table 14). For each of our five phenotype scores, we used variants that overlapped with HapMap3 (~1.1 million) to construct the scores. Prediction accuracy was estimated using ordinary least squares regression of a given phenotype (AgeSmk, CigDay, SmkInit, SmkCes, or DrnkWk) on the polygenic score and covariates including age, sex, age × sex interaction, and the first ten genetic principal components.

Prediction accuracy comes from a two-step process where we first regress the phenotype on a standard set of covariates without including the PRS. Then, the PRS predictor is added and the difference in the coefficient of determination (R²) is calculated. For our quantitative phenotypes, AgeSmk, CigDay, and DrnkWk, the predictive power of the PRS is the change in the R² in going from the regression without the PRS to the regression with the PRS. For our two binary phenotypes, SmkInit and SmkCes, we measure the incremental pseudo-R² from probit regressions. 95% confidence intervals around all R² values are bootstrapped with 1000 repetitions each. The same polygenic scoring procedure was applied to the MTAG results (Supplementary Table 32).
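
For the quantitative phenotypes, the two-step procedure can be sketched as below. This is a minimal illustration using only the standard library: the function names are hypothetical, the percentile form of the bootstrap interval is an assumption (the text gives only the number of repetitions), and the probit pseudo-R² used for the binary phenotypes is not covered.

```python
import random


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [value] for row, value in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(y, columns):
    """R^2 from an OLS fit of y on an intercept plus the given predictor columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y_i for row, y_i in zip(design, y)) for a in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((y_i - f) ** 2 for y_i, f in zip(y, fitted))
    ss_tot = sum((y_i - mean_y) ** 2 for y_i in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(y, covariates, prs):
    """Change in R^2 when the polygenic score is added to the covariate-only model."""
    return r_squared(y, covariates + [prs]) - r_squared(y, covariates)


def bootstrap_ci(y, covariates, prs, reps=1000, seed=0):
    """Percentile 95% interval for the incremental R^2, resampling individuals."""
    rng = random.Random(seed)
    n = len(y)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(incremental_r2([y[i] for i in idx],
                                        [[col[i] for i in idx] for col in covariates],
                                        [prs[i] for i in idx]))
    estimates.sort()
    return estimates[int(0.025 * reps)], estimates[int(0.975 * reps) - 1]
```

Here `covariates` is a list of columns (for example age, sex, their interaction, and the principal components), each the same length as `y`. Because the covariate-only model is nested in the full model, the increment is non-negative up to rounding error.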

Epigenomic enrichment.

To detect genome-wide functional and tissue-specific epigenomic enrichments, we performed enrichment analyses by heritability stratification using Linkage Disequilibrium Score Regression as implemented in the LDSC software (v1.0.0). Annotation-stratified LD scores were estimated using dichotomized/binary annotations, 1000 Genomes Project samples with European ancestry, and one million base-pair LD windows by default. LDSC then determines functional enrichment of the GWAS traits by partitioning heritability according to the variance explained by the LD-linked SNPs belonging to each functional category²². Statistical enrichment was defined as the ratio between the percentage of heritability explained by variants in each annotated category and the percentage of variants covered by that category. A resampling approach was used to estimate standard errors²².
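
The enrichment ratio can be written out explicitly. The symbols are introduced here for illustration and do not appear in the original text: $h^{2}(C)$ is the heritability attributed to variants in category C, $h^{2}$ is the total SNP heritability, $M_{C}$ is the number of variants in category C, and M is the total number of variants.

$\mathrm{Enrichment}(C) = \frac{h^{2}(C) / h^{2}}{M_{C} / M}$

With purely illustrative numbers, a category that covers 10% of variants and accounts for 30% of heritability would have an enrichment of 0.30 / 0.10 = 3, while a value of 1 corresponds to no enrichment.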

Following standard procedure, we trained a baseline LDSC model using the 52 non-cell-type specific functional categories (plus one category that includes all SNPs) and used the observed z-scores of HapMap3 SNPs for each trait. We tested cell-group enrichments over 10 pre-defined cell-group annotations²². The cell-group annotations are the result of aggregating 220 cell-type-specific annotations over 4 histone marks (H3K4me1, H3K4me3, H3K9ac, H3K27ac) and 100 well-defined cell types. To detect which specific epigenomes contribute to the group-level enrichment, we performed 220 tests over each individual annotation. Multiple testing was accounted for through Bonferroni correction within phenotype with 10 tests for the cell-group annotation enrichment analyses and 220 tests for the cell-specific enrichment analyses. As a complementary method to LDSC, we also applied a recently developed mixture model learning approach⁶⁷, and report these results in Supplementary Figure 13.

Gene and Gene-Set Tests.

For each phenotype, we used SEQMINER⁶⁸ and the UCSC genome browser annotations (refGene; retrieved December 15, 2017) to annotate all conditionally independent genome-wide significant variants. We identified all genes (all variants 5’ to 3’ UTR) harboring at least one variant in LD (r² > 0.3) with any conditionally independent variant. See Supplementary Tables 1–5.

We conducted a manual review of all genes implicated within each locus, overlap with the GWAS catalogue (Supplementary Table 33), and all pathways identified by PASCAL and DEPICT (described below). We considered a gene to be implicated if it harbored variation in LD with a conditionally independent genome-wide significant variant, or if a gene was located within the locus and was significant by the PASCAL gene-based test. PASCAL⁶⁹ was used for gene-based and pathway analysis to test genes and canonical pathways from MSigDb (Supplementary Tables 20–21). Default settings were used to test all variants within all genes. DEPICT⁷⁰ was used to identify enrichment within tissues/cell types and reconstituted gene sets (also known as “pathways”). For each phenotype, variants from the GWAS were clumped using 500 kb flanking regions with the LD cutoff r² > 0.1 (based on 1000 Genomes phase 1 release v3, the default in DEPICT). We used DEPICT to understand genetic signals beyond the genome-wide significant loci that surpass the conventional 5×10⁻⁸ threshold, and so included all variants with p<5×10⁻⁵. DEPICT tissue enrichment results are displayed in Supplementary Figure 15, where enrichment relative to genes in random sets of loci is indicated by red shading. To cluster DEPICT reconstituted gene sets, we used affinity propagation clustering⁷¹ and calculated the correlation between each resulting “exemplary gene set” in Figure 4. Genes, gene sets, and tissue/cell enrichments were considered significant when their false discovery rate was below 0.05. All such significant DEPICT results are reported in Supplementary Tables 17–19. PASCAL and DEPICT were also applied in the same fashion to the MTAG summary statistics (Supplementary Tables 34–39).

Statistics.

The GWAS meta-analysis was conducted using chi-square statistics based upon an imputation-quality-aware fixed effects meta-analysis approach. Two-sided p-values were calculated. The MTAG and GenomicSEM test statistics were computed from the GWAS meta-analysis results, and two-sided p-values were similarly calculated from the chi-square distribution. The pleiotropic analysis was conducted based upon an empirical Bayes approach. The prior distribution for the effect sizes was assumed to be a mixture of a point mass at 0 (representing the possibility that the locus is not associated with the trait) and a normal distribution (representing the possibility that the locus is associated). The hyper-parameters were estimated by maximizing the marginal likelihood. The method properly accounts for the local genetic correlation and residual correlation between phenotypes. The posterior probability of association (PPA) for each locus was estimated for each possible combination of the 5 phenotypes, and the combination with the highest PPA was reported for each locus.

Supplementary Material

NIHMS1511852-supplement-1.pdf (27.3 MB, pdf)

NIHMS1511852-supplement-2.xlsx (2.6 MB, xlsx)

Editorial Summary:

Association studies of up to 1.2 million individuals identify 566 genetic variants in 406 loci associated with tobacco use and addiction (initiation, cessation, and heaviness) as well as alcohol use, with 150 loci showing pleiotropic association.

ACKNOWLEDGEMENTS

This study was designed and carried out by the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN). It was conducted by using the UK Biobank Resource under Application Number 16651. This study was supported by funding from the US National Institutes of Health (NIH) awards R01DA037904 to S. Vrieze, R01HG008983 to D.J. Liu, and R21DA040177 to D.J. Liu. Ethical review and approval was provided by the University of Minnesota IRB; all human subjects provided informed consent. A full list of acknowledgements is provided in the Supplementary Note.

Footnotes

CODE AVAILABILITY:

All software used to perform these analyses is available online.

URLs:

GSCAN website (with summary statistics and LocusZoom plots for MTAG loci): https://genome.psych.umn.edu/index.php/GSCAN

ANNO: https://github.com/zhanxw/anno

APIGenome: https://github.com/hyunminkang/apigenome

BCFtools: http://samtools.github.io/bcftools/

BOLT-LMM: https://data.broadinstitute.org/alkesgroup/BOLT-LMM/

DEPICT: https://data.broadinstitute.org/mpg/depict/

GCTA: http://cnsgenomics.com/software/gcta/

GenomicSEM: https://github.com/MichelNivard/GenomicSEM

LDpred: https://github.com/bvilhjal/ldpred

LDSC: https://github.com/bulik/ldsc

LocusZoom: https://github.com/statgen/locuszoom-standalone

Michigan Imputation Server: http://imputationserver.sph.umich.edu/

Minimac3: https://genome.sph.umich.edu/wiki/Minimac3

MTAG: https://github.com/omeed-maghzian/mtag

PASCAL: https://www2.unil.ch/cbg/index.php?title=Pascal

PLINK: https://www.cog-genomics.org/plink/1.9/

PriorityPruner: http://prioritypruner.sourceforge.net/

R: https://www.r-project.org/

rareGWAMA: https://github.com/dajiangliu/rareGWAMA

RiVIERA: https://github.com/yueli-compbio/RiVIERA

RVTESTS: https://github.com/zhanxw/rvtests

SEQMINER: https://github.com/zhanxw/seqminer

SHAPEIT: http://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html

CONTRIBUTOR LIST FOR THE 23andMe RESEARCH TEAM: Michelle Agee¹¹, Babak Alipanahi¹¹, Adam Auton¹¹, Robert K. Bell¹¹, Katarzyna Bryc¹¹, Sarah L. Elson¹¹, Pierre Fontanillas¹¹, Nicholas A. Furlotte¹¹, David A. Hinds¹¹, Bethann S. Hromatka¹¹, Karen E. Huber¹¹, Aaron Kleinman¹¹, Nadia K. Litterman¹¹, Matthew H. McIntyre¹¹, Joanna L. Mountain¹¹, Carrie A.M. Northover¹¹, J. Fah Sathirapongsasuti¹¹, Olga V. Sazonova¹¹, Janie F. Shelton¹¹, Suyash Shringarpure¹¹, Chao Tian¹¹, Joyce Y. Tung¹¹, Vladimir Vacic¹¹, Catherine H. Wilson¹¹, and Steven J. Pitts¹¹.

CONTRIBUTOR LIST FOR HUNT ALL-IN PSYCHIATRY: Amy Mitchell⁶⁵, Anne Heidi Skogholt²⁰, Bendik S Winsvold^65,76, Børge Sivertsen^77,78,79, Eystein Stordal^78,80, Gunnar Morken^78,81, Håvard Kallestad^78,81, Ingrid Heuch⁷⁹, John-Anker Zwart^65,76,82, Katrine Kveli Fjukstad^83,84, Linda M Pedersen⁶⁵, Maiken Elvestad Gabrielsen²⁰, Marianne Bakke Johnsen^65,82, Marit Skrove⁸⁵, Marit Sæbø Indredavik^78,85, Ole Kristian Drange^78,81, Ottar Bjerkeset^78,86, Sigrid Børte^65,82, Synne Øien Stensland^65,87

76 Department of Neurology, Oslo University Hospital, Oslo, Norway.

77 Department of Health Promotion, Norwegian Institute of Public Health, Bergen, Norway.

78 Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway.

79 Department of Research and Innovation, Helse-Fonna HF, Haugesund, Norway.

80 Department of Psychiatry, Hospital Namsos, Nord-Trøndelag Health Trust, Namsos, Norway.

81 Division of Mental Health Care, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway.

82 Institute of Clinical Medicine, University of Oslo, Oslo, Norway.

83 Department of Psychiatry, Nord-Trøndelag Hospital Trust, Levanger Hospital, Norway.

84 Department of Laboratory Medicine, Children’s and Women’s Health, Norwegian University of Science and Technology, Trondheim, Norway.

85 Regional Centre for Child and Youth Mental Health and Child Welfare, Department of Mental Health, Faculty of Medicine and Health Sciences, NTNU – Norwegian University of Science and Technology.

86 Faculty of Nursing and Health Sciences, Nord University, Levanger, Norway.

87 NKVTS, Norwegian Centre for Violence and Traumatic Stress Studies.

DATA AVAILABILITY STATEMENT

GWAS summary statistics can be downloaded from the GSCAN website (https://genome.psych.umn.edu/index.php/GSCAN). We provide association results for all SNPs that passed quality-control filters in a GWAS meta-analysis of each of our five substance use phenotypes that excludes the research participants from 23andMe.

COMPETING INTERESTS STATEMENT: Laura J. Bierut and the spouse of Nancy L. Saccone are listed as inventors on Issued U.S. Patent 8,080,371, “Markers for Addiction” covering the use of certain SNPs in determining the diagnosis, prognosis, and treatment of addiction. Sean David is a scientific advisor to BaseHealth, Inc. Gyda Bjornsdottir, Daniel F. Gudbjartsson, Gunnar W. Reginsson, Hreinn Stefansson, Kari Stefansson, and Thorgeir E. Thorgeirsson are employees of deCODE Genetics/AMGEN, Inc. Chao Tian and David Hinds are employees of 23andMe, Inc.

REFERENCES

1.Ezzati M et al. Selected major risk factors and global and regional burden of disease. Lancet 360, 1347–1360 (2002). [DOI] [PubMed] [Google Scholar]
2.Hicks BM,Schalet BD, Malone SM,Iacono WG & McGue M Psychometric and genetic architecture of substance use disorder and behavioral disinhibition measures for gene association studies. Behavior Genetics 41, 459–75 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Polderman TJ et al. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat Genet (2015). [DOI] [PubMed] [Google Scholar]
4.Kendler KS, Schmitt E, Aggen SH & Prescott CA Genetic and environmental influences on alcohol, caffeine, cannabis, and nicotine use from early adolescence to middle adulthood. Arch Gen Psychiatry 65, 674–82 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Kendler KS, Prescott CA, Myers J & Neale MC The structure of genetic and environmental risk factors for common psychiatric and substance use disorders in men and women. Archives of General Psychiatry 60, 929–937 (2003). [DOI] [PubMed] [Google Scholar]
6.Bierut LJ et al. ADH1B is associated with alcohol dependence and alcohol consumption in populations of European and African ancestry. Mol Psychiatry 17, 445–50 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Thorgeirsson TE et al. Sequence variants at CHRNB3-CHRNA6 and CYP2A6 affect smoking behavior. Nature Genetics 42, 448–U135 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Thorgeirsson TE et al. A rare missense mutation in CHRNA4 associates with smoking behavior and its consequences. Mol Psychiatry 21, 594–600 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Furberg H et al. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nature Genetics 42, 441–U134 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Schumann G et al. KLB is associated with alcohol drinking, and its gene product beta-Klotho is necessary for FGF21 regulation of alcohol preference. Proceedings of the National Academy of Sciences of the United States of America 113, 14372–14377 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Jorgenson E et al. Genetic contributors to variation in alcohol consumption vary by race/ethnicity in a large multi-ethnic genome-wide association study. Mol Psychiatry (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Polesskaya OO, Smith RF & Fryxell KJ Chronic nicotine doses down-regulate PDE4 isoforms that are targets of antidepressants in adolescent female rats. Biological Psychiatry 61, 56–64 (2007). [DOI] [PubMed] [Google Scholar]
13.Boyden LM et al. Mutations in kelch-like 3 and cullin 3 cause hypertension and electrolyte abnormalities. Nature 482, 98–102 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Wang W et al. Forced Expiratory Volume in the First Second and Aldosterone as Mediators of Smoking Effect on Stroke in African Americans: The Jackson Heart Study. Journal of the American Heart Association 5(2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Aoun EG et al. A relationship between the aldosterone-mineralocorticoid receptor pathway and alcohol drinking: preliminary translational findings across rats, monkeys and humans. Mol Psychiatry 23, 1466–1473 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Turley P et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nature Genetics 50, 229−+ (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Bulik-Sullivan BK et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics 47, 291−+ (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Yang JA, Lee SH, Goddard ME & Visscher PM GCTA: a tool for genome-wide complex trait analysis. American Journal of Human Genetics 88, 76–82 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Zheng J et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 33, 272–279 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Harris KM, Halpern CT, Haberstick BC & Smolen A The National Longitudinal Study of Adolescent Health (Add Health) Sibling Pairs Data. Twin Research and Human Genetics 16, 391–398 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Sonnega A et al. Cohort Profile: the Health and Retirement Study (HRS). International Journal of Epidemiology 43, 576–585 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Finucane HK et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature Genetics 47, 1228−+ (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Wilson S, Bair JL, Thomas KM & Iacono WG Problematic alcohol use and reduced hippocampal volume: a meta-analytic review. Psychological Medicine 47, 2288–2301 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Ewing SWF, Sakhardande A & Blakemore SJ The effect of alcohol consumption on the adolescent brain: A systematic review of MRI and fMRI studies of alcohol-using youth. Neuroimage-Clinical 5, 420–437 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Goldstein RZ & Volkow ND Dysfunction of the prefrontal cortex in addiction: neuroimaging findings and clinical implications. Nature Reviews Neuroscience 12, 652–669 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Volkow ND & Morales M The Brain on Drugs: From Reward to Addiction. Cell 162, 712–725 (2015). [DOI] [PubMed] [Google Scholar]
27.Koob GF & Volkow ND Neurocircuitry of Addiction. Neuropsychopharmacology 35, 217–238 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Koob GF & Volkow ND Neurobiology of addiction: a neurocircuitry analysis. Lancet Psychiatry 3, 760–773 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Fernandez E, Schiappa R, Girault JA & Le Novere N DARPP-32 is a robust integrator of dopamine and glutamate signals. Plos Computational Biology 2, 1619–1633 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Yagishita S et al. A critical time window for dopamine actions on the structural plasticity of dendritic spines. Science 345, 1616–1620 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Zhu HW et al. DARPP-32 phosphorylation opposes the behavioral effects of nicotine. Biological Psychiatry 58, 981–989 (2005). [DOI] [PubMed] [Google Scholar]
32.Stoker AK & Markou A Unraveling the neurobiology of nicotine dependence using genetically engineered mice. Current Opinion in Neurobiology 23, 493–499 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Litten RZ et al. A Double-Blind, Placebo-Controlled Trial Assessing the Efficacy of Varenicline Tartrate for Alcohol Dependence. Journal of Addiction Medicine 7, 277–286 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Hyman SE, Malenka RC & Nestler EJ Neural mechanisms of addiction: The role of reward-related learning and memory. Annual Review of Neuroscience 29, 565–598 (2006). [DOI] [PubMed] [Google Scholar]
35.Kalivas PW The glutamate homeostasis hypothesis of addiction. Nature Reviews Neuroscience 10, 561–572 (2009). [DOI] [PubMed] [Google Scholar]
36.Szumlinski KK et al. Methamphetamine Addiction Vulnerability: The Glutamate, the Bad, and the Ugly. Biological Psychiatry 81, 959–970 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Gass JT & Olive MF Glutamatergic substrates of drug addiction and alcoholism. Biochemical Pharmacology 75, 218–265 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Vaughan J et al. Urocortin, a mammalian neuropeptide related to fish urotensin I and to corticotropin-releasing factor. Nature 378, 287–92 (1995). [DOI] [PubMed] [Google Scholar]
39.Logrip ML, Koob GF & Zorrilla EP Role of corticotropin-releasing factor in drug addiction: potential for pharmacological intervention. CNS Drugs 25, 271–87 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Volkow ND, Koob GF & McLellan AT Neurobiologic Advances from the Brain Disease Model of Addiction. N Engl J Med 374, 363–71 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
41.McCarthy S et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Lassi G et al. The CHRNA5-A3-B4 Gene Cluster and Smoking: From Discovery to Therapeutics. Trends in Neurosciences 39, 851–861 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Edenberg HJ The genetics of alcohol metabolism: role of alcohol dehydrogenase and aldehyde dehydrogenase variants. Alcohol Res Health 30, 5–13 (2007). [PMC free article] [PubMed] [Google Scholar]
44.Sallese M et al. The G-protein-coupled receptor kinase GRK4 mediates homologous desensitization of metabotropic glutamate receptor 1. Faseb Journal 14, 2569–2580 (2000). [DOI] [PubMed] [Google Scholar]
45.Perroy J, Adam L, Qanbar R, Chenier S & Bouvier M Phosphorylation-independent desensitization of GABA(B) receptor by GRK4. Embo Journal 22, 3816–3824 (2003). [DOI] [PMC free article] [PubMed] [Google Scholar]
46.Yang J, Villar VM, Armando I, Jose PA & Zeng CY G Protein-Coupled Receptor Kinases: Crucial Regulators of Blood Pressure. Journal of the American Heart Association 5(2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
47.GTEx Consortium Genetic effects on gene expression across human tissues (vol 550, pg 204, 2017). Nature 553(2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Costas J The highly pleiotropic gene SLC39A8 as an opportunity to gain insight into the molecular pathogenesis of schizophrenia. American Journal of Medical Genetics Part B-Neuropsychiatric Genetics 177, 274–283 (2018). [DOI] [PubMed] [Google Scholar]
49.Kong A et al. The nature of nurture: Effects of parental genotypes. Science 359, 424–428 (2018). [DOI] [PubMed] [Google Scholar]
50.Vrieze SI, Hicks BM, Iacono WG & McGue M Decline in genetic influence on the co-occurrence of alcohol, marijuana, and nicotine dependence symptoms from age 14 to 29. Am J Psychiatry 169, 1073–81 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]

METHODS ONLY REFERENCES

51.Das S et al. Next-generation genotype imputation service and methods. Nat Genet (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
52.Howie B, Fuchsberger C, Stephens M, Marchini J & Abecasis GR Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nature Genetics 44, 955−+ (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Zhan X, Hu Y, Li B, Abecasis GR & Liu DJ RVTESTS: an efficient and comprehensive tool for rare variant association analysis using sequence data. Bioinformatics 32, 1423–6 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
54.Kang HM et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet 42, 348–54 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Price AL et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 38, 904–909 (2006). [DOI] [PubMed] [Google Scholar]
56.Devlin B & Roeder K Genomic control for association studies. Biometrics 55, 997–1004 (1999). [DOI] [PubMed] [Google Scholar]
57.Jiang Y et al. Proper Conditional Analysis in the Presence of Missing Data Identified Novel Independently Associated Low Frequency Variants in Nicotine Dependence Genes. PLoS Genetics (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
58.Yang J et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 44, 369–75, S1–3 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Grotzinger AD et al. Genomic SEM Provides Insights into the Multivariate Genetic Architecture of Complex Traits. bioRxiv (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Li J & Ji L Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity 95, 221–227 (2005). [DOI] [PubMed] [Google Scholar]
61.Gao XY, Becker LC, Becker DM, Starmer JD & Province MA Avoiding the High Bonferroni Penalty in Genome-Wide Association Studies. Genetic Epidemiology 34, 100–105 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
62.Chen ZX & Liu QZ A New Approach to Account for the Correlations among Single Nucleotide Polymorphisms in Genome-Wide Association Studies. Human Heredity 72, 1–9 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
63.Chang CC et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4(2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
64.Wu Y, Zheng ZL, Visscher PM & Yang J Quantifying the mapping precision of genome-wide association studies using whole-genome sequencing data. Genome Biology 18(2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
65.Bulik-Sullivan B et al. An atlas of genetic correlations across human diseases and traits. Nature Genetics 47, 1236−+ (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
66.Vilhjalmsson BJ et al. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. American Journal of Human Genetics 97, 576–592 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
67.Li Y, Davila-Velderrain J & Kellis M A probabilistic framework to dissect functional cell-type-specific regulatory elements and risk loci underlying the genetics of complex traits. BioRxiv 059345(2017). [Google Scholar]
68.Zhan X & Liu DJ SEQMINER: An R-Package to Facilitate the Functional Interpretation of Sequence-Based Associations. Genet Epidemiol 39, 619–23 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
69.Lamparter D, Marbach D, Rueedi R, Kutalik Z & Bergmann S Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics. Plos Computational Biology 12(2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
70.Pers TH et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nature Communications 6(2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
71.Frey BJ & Dueck D Clustering by passing messages between data points. Science 315, 972–976 (2007). [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

NIHMS1511852-supplement-1.pdf (27.3 MB, pdf)

NIHMS1511852-supplement-2.xlsx (2.6 MB, xlsx)
