Abstract
Introduction
Heritability estimates of nicotine dependence (ND) range from 40% to 70%, but discovery GWAS of ND are underpowered and have limited predictive utility. In this work, we leverage genetically correlated traits and diseases to increase the accuracy of polygenic risk prediction.
Methods
We employed a multi-trait model using summary statistic-based best linear unbiased predictors (SBLUP) of genetic correlates of DSM-IV diagnosis of ND in 6394 individuals of European Ancestry (prevalence = 45.3%, %female = 46.8%, mean age = 40.08 [s.d. = 10.43]) and 3061 individuals from a nationally-representative sample with Fagerström Test for Nicotine Dependence symptom count (FTND; 51.32% female, mean age = 28.9 [s.d. = 1.70]). Polygenic predictors were derived from GWASs of traits known to be phenotypically and genetically correlated with ND (i.e., Cigarettes per Day [CPD], the Alcohol Use Disorders Identification Test [AUDIT-Consumption and AUDIT-Problems], Neuroticism, Depression, Schizophrenia, Educational Attainment, Body Mass Index [BMI], and Self-Perceived Risk-Taking), with Height included as a negative control. Analyses controlled for age, gender, study site, and the first 10 ancestral principal components.
Results
The multi-trait model accounted for 3.6% of the total trait variance in DSM-IV ND. Educational Attainment (β = –0.125; 95% CI: [–0.149,–0.101]), CPD (0.071 [0.047,0.095]), and Self-Perceived Risk-Taking (0.051 [0.026,0.075]) were the most robust predictors. PGS effects on FTND were limited.
Conclusions
Risk for ND is not only polygenic, but also pleiotropic. Polygenic effects on ND that are accessible by these traits are limited in size and act additively to explain risk.
Implications
These findings enhance our understanding of inherited genetic factors for nicotine dependence. The data show that genome-wide association study (GWAS) findings across pre- and comorbid conditions of smoking are differentially associated with nicotine dependence and that when combined explain significantly more trait variance. These findings underscore the utility of multivariate approaches to understand the validity of polygenic scores for nicotine dependence, especially as the power of GWAS of broadly-defined smoking behaviors increases. Realizing the potential of GWAS to inform complex smoking behaviors will require similar theory-driven models that reflect the myriad of mechanisms that drive individual differences.
Introduction
Despite attempts to regulate tobacco use since the early 1950s and a steady decrease in smoking prevalence, tobacco consumption remains common in the United States with approximately 14% of Americans (37.8 million) reporting daily use.1 The prevalence of tobacco use poses a significant public health concern in the U.S., as roughly 500 000 Americans die from smoking or exposure to smoke each year, and 16 million Americans live with serious and costly illnesses caused by tobacco use, such as cancer, cardiovascular disease, and pulmonary disease.2 Although the predominant cause of the detrimental health effects of tobacco use lies with the toxic chemicals that are produced when consuming tobacco products, nicotine has been identified as the key addictive component that explains why so many people use tobacco, and continue to use tobacco products despite negative effects.
Research into the etiology of tobacco use and associated problems has used both population and family-based samples.3 Twin and family studies allow for the estimation of additive genetic effects, common environmental effects, and unique environmental effects on tobacco use behaviors.4 Twin studies have estimated the heritability of Nicotine Dependence (ND) (i.e., meeting the 3+ past-year/lifetime Diagnostic and Statistical Manual of Mental Disorders [version IV] criterion) between 40% and 70%, and found it to be consistent across studies. Genome-wide Association Studies (GWAS) link single nucleotide polymorphisms (SNPs) with disease status or severity.5 A recent meta-analysis of GWAS by Liu et al.6 examined genome-wide effects on smoking initiation (i.e., using age of initiation of regular smoking and a binary phenotype indicating whether an individual had ever smoked regularly) using 1.2 million individuals and identified 378 associated variants. It also identified 55 variants for heaviness of smoking (i.e., number of cigarettes/day) using 337 334 individuals.6 Of these two smoking phenotypes, cigarettes per day (CPD) most closely aligns with psychometric and clinical measures of ND as evidenced by strong phenotypic and genetic models. CPD is a key component of the two most common measures of ND, namely the Fagerström Test for Cigarette Dependence and structured interviews based on DSM-5 tobacco use disorder criteria. There have been 17 GWAS of ND and four meta-analyses that suggest several causal loci, most notably variants involved in the coding of nicotinic acetylcholine receptor subunits. Though important genes involved in the downstream biological pathways of ND have been identified (e.g., NRXN1), the application of the effects of these variants in predictive models for ND has been lacking.7
Polygenic Scores (PGSs) aggregate the estimated associations of genetic variants for a trait into a score reflecting the genetic risk of individuals for a phenotype (either the same phenotype as that studied in the GWAS used to obtain the variant effect sizes, or a genetically correlated phenotype) in an independent sample.8 We identified nine studies that have used a PGS to predict individual differences in smoking phenotypes,9–17 with just two specifically focusing on nicotine dependence (ND) or tobacco use disorder (TUD).11,17 These two studies yielded inconsistent results. Marees et al. tested whether rare variants identified from a genetic study of TUD predicted the number of cigarettes smoked per day and found no significant contribution.17 Belsky et al. developed a risk score of CPD variants from a meta-analysis of three GWAS and successfully predicted CPD and progression to ND (OR = 1.27 [1.09–1.27]); however, the risk score was not a better predictor than family history (OR = 1.53 [1.29–1.80]). With improvement of PGSs, a CPD PGS may be valuable as a clinical predictor of ND. Together, these recent studies highlight the promise of PGSs in a clinical setting, but also reveal their limitations, and the need to utilize novel polygenic approaches to improve the strength of PGSs.11,18
The majority of PGSs for smoking phenotypes (e.g., CPD and time to smoke a cigarette after waking) have been primarily derived from GWAS of smoking outcomes. This approach has failed to capture the complex and heterogeneous nature of ND/TUD that has been suggested by past genetic and behavioral studies.11,19,20 Recent GWAS studies using the Fagerström Test for Nicotine Dependence (FTND) severity score21,22 address some of these limitations, but are still based on relatively small sample sizes compared to the large GSCAN study on smoking behaviors.6 Specifically, risk for ND is associated with a wide variety of behaviors and traits, and there is accumulating evidence of genetic and phenotypic correlations between these traits and ND (e.g., depression, risk-taking).23 However, the biological bases for the relationships among these traits are unknown. A recent study leveraged a similar multi-polygenic score approach to predict various phenotypes, such as cognitive ability, BMI, and educational achievement, resulting in more variance explained than single-score models.24 We similarly believe that additional knowledge could be gained by leveraging multiple PGSs utilizing these related traits as predictors of ND to identify possible sources of genetic risk.
The primary aims of this study were to identify robust PGSs for ND and to examine whether utilizing multiple etiologically associated behaviors or traits increases predictive utility. Moreover, the current study employs genetic predictors with Best Linear Unbiased Prediction (BLUP) properties based on random effects models in large reference GWAS, which has been shown to increase prediction power.25 We compared/contrasted single-trait PGS models using summary-statistic-based BLUP (SBLUP) weight-adjusted summary statistic data from GWAS with a multi-trait PGS model that integrates SBLUP-adjusted GWAS information from known phenotypes that are genetically correlated with ND (see Figure 1). As such, this multi-trait SBLUP model accounts for pleiotropic effects (i.e., shared genetic effects between traits and risk for developing nicotine dependence that explain the phenotypic association between them)—thus providing a less biased indication of which PGS may be useful for predicting and understanding ND risk; however, much work is still needed to understand how these traits and the variants included in these PGSs influence ND. We hypothesized that GWAS SNP effects on ND are reproducible and explain individual differences across multiple populations. Additionally, we hypothesized that GWAS effect sizes used to create each SBLUP-derived PGS would be partially confounded with other behaviors related to smoking risk, meaning the predictive ability of single-trait SBLUP-derived PGSs would overlap with other single-trait SBLUP-derived PGSs (i.e., pleiotropy). Thus, we predicted that a multi-trait SBLUP model would provide more robust effects and explain more variance in ND than single trait applications to date. Additionally, we examined whether the polygenic effects observed in a sample of cases and controls would generalize to a nationally representative sample of smokers with varying degrees of dependence severity.
Figure 1.
Conceptual framework of analyses characterizing the etiology of nicotine dependence.
Methods
Identification of Traits to Predict ND
To supplement measures of tobacco use, we conducted a multi-trait analysis using measures that are phenotypically and genetically associated with ND. This approach increases the strength of predictive scores for genetically complex behavioral phenotypes.26 We conducted a literature review to identify known co-occurring indicators of ND. We identified GWAS studies that met the following criteria: (1) The GWAS of the predictor must be conducted within the last five years (to minimize the possibility of cohort effects and genotyping array differences); (2) the predictor must have both a phenotypic and a genetic relationship to ND with at least a moderate effect size (r > 0.30)27–30; and (3) the GWAS of the predictor must have used a sample of at least 10 000 individuals of European Ancestry. These studies (additional details in Supplementary Material) were used to source discovery summary statistics.
Based on these criteria, PGSs were developed for eight phenotypes in addition to a measure of nicotine consumption (CPD): Alcohol Use Disorder-Consumption Score (AUDIT-C),31 Alcohol Use Disorder-Problem Use Score (AUDIT-P),31 Neuroticism,32 Schizophrenia,33 Depression,34 Body-Mass Index (BMI),35 Self-Perceived Risk-Taking,36 and Educational Attainment.37 Additionally, we used genome-wide association data from a study on height as a negative control to compare the effectiveness of our PGS in explaining variance in ND.35 Details of the samples used to create the discovery summary statistics for all predictors, results of the GWASs, and genetic relationships of the phenotypes to ND can be found in the Supplementary Material.
Target Sample Descriptions
GWAS summary data from the discovery summary statistics were applied to two independent target samples. These analyses have been approved by the Institutional Review Board (IRB00090295) at Emory University.
The first sample consisted of 6394 unrelated (relatedness cutoff of 0.05) individuals of European ancestry (46.90% female, mean age: 40.08 [s.d. = 10.43]) from pooled public use datasets that were obtained with permission from the database of Genotypes and Phenotypes (dbGaP). None of these data are included in the aforementioned discovery studies. Of individuals in the pooled dbGAP samples, 46.72% met the criteria for DSM-IV ND. The pooled sample comprised data from four dbGAP studies on which we have previously published38: The Study of Addiction: Genetics and Environment (SAGE; study accession phs000092.v1.p1), the Alcohol Dependence GWAS in European and African Americans (Yale Study; study accession phs000425.v1.p1), the Australian twin-family study of alcohol use disorder (OZ-ALC; study accession phs000181.v1.p1), and the GWAS of Heroin Dependence (Heroin GWAS study; study accession phs000277.v1.p1). Individual study descriptions related to the independent target sample are provided in the Supplementary Material. Each study collected DSM-IV criteria (coded as present or absent) of ND by using the Semi-Structured Assessment for the Genetics of Alcoholism (SAGE study), the adapted Semi-Structured Assessment for the Genetics of Alcoholism OZ (OZ-ALC study), or the Semi-Structured Assessment for Drug Dependence and Alcoholism (Yale Study, Heroin GWAS).
The second sample comprised 3061 unrelated individuals of European ancestry (51.32% female, mean age: 28.9 [s.d. = 1.70]) with genomic and phenotype data at Wave IV of the National Longitudinal Study of Adolescent to Adult Health (Add Health), a community-based, nationally representative sample; these data were not included in the aforementioned discovery GWAS. Additional information about the Add Health sample can be found at http://www.cpc.unc.edu/projects/addhealth/design. All participants provided written, informed consent for participation in all aspects of Add Health per the University of North Carolina School of Public Health Institutional Review Board guidelines (https://www.cpc.unc.edu/projects/addhealth/faqs/index.html#Was-informed-consent-required). Of the total sample of individuals who endorsed being current smokers, 50.76% would be considered minimally dependent (FTND symptom count 0–4, N = 1554), 31.95% would be considered moderately dependent (FTND symptom count 5–6, N = 978), and 17.28% would be considered highly dependent (FTND symptom count 7–10, N = 529). Analyses focused on the FTND severity score in order to maximize statistical power already afforded by the scale.
Genetic Imputation and Quality Control
Both target samples were genetically imputed and cleaned separately. Briefly, the data were imputed after strict selection for individuals of European Ancestry using ancestral principal components and multidimensional scaling.37 Genomic imputation was conducted using the HRC r1.1 2016 EUR reference panel and Eagle v2.4 phasing via the Michigan Imputation Server.39 After imputation, PLINK (version 1.9)40 was used to exclude low-frequency variants (MAF < 0.01), multi-allelic variants, individuals with >10% missing genotypes, variants with >10% missingness, and variants that failed the Hardy-Weinberg Equilibrium test (i.e., H-W p < .001) (50). Following imputation and cleaning, Principal Component Analysis was conducted in order to derive the first 10 principal components (PCs) to control for any additional population stratification within the European Ancestry samples.
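The variant-level filters described above can be sketched in Python as follows. This is an illustrative sketch only, not the pipeline used in the study, which relied on PLINK. The function name `qc_variant_mask`, the toy genotype layout (0/1/2 allele counts with `np.nan` for missing calls), and the use of a 1-df chi-square test as a stand-in for PLINK's Hardy-Weinberg test are all assumptions made for illustration. The per-individual missingness filter is omitted for brevity.

```python
import math
import numpy as np


def qc_variant_mask(genotypes, maf_min=0.01, max_missing=0.10, hwe_p_min=1e-3):
    """Flag variants that pass missingness, MAF, and HWE filters.

    genotypes: (n_individuals, n_variants) array of 0/1/2 allele counts,
    with np.nan marking a missing call (hypothetical toy layout).
    """
    g = np.asarray(genotypes, dtype=float)
    n_ind, n_var = g.shape
    keep = np.zeros(n_var, dtype=bool)
    for j in range(n_var):
        calls = g[~np.isnan(g[:, j]), j]
        n = calls.size
        # Exclude variants with more than `max_missing` missing calls.
        if n == 0 or 1.0 - n / n_ind > max_missing:
            continue
        p = calls.sum() / (2.0 * n)  # frequency of the counted allele
        maf = min(p, 1.0 - p)
        # Exclude monomorphic and low-frequency variants.
        if maf == 0.0 or maf < maf_min:
            continue
        # Compare observed genotype counts with Hardy-Weinberg expectations.
        observed = np.array([(calls == k).sum() for k in (0, 1, 2)], dtype=float)
        expected = n * np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
        chi2 = float(((observed - expected) ** 2 / expected).sum())
        # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
        hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
        keep[j] = hwe_p >= hwe_p_min
    return keep
```

Calling `qc_variant_mask(G)` on a genotype matrix `G` returns one boolean per variant, so `G[:, qc_variant_mask(G)]` keeps only the variants that pass all three filters.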
Estimation of SNP-heritability in Target Samples
Using Genome-wide Complex Trait Analysis (GCTA),41 we estimated the SNP heritability (h2SNP) of ND and FTND using GREML-LDMS, which corrects for LD bias. Analyses controlled for age, sex, the first 10 PCs, and study origin (i.e., for the pooled dbGAP datasets). In our pooled dbGAP samples, the SNP-heritability of DSM-IV ND diagnosis was estimated at 0.311 (SE = 0.096) and, in the Add Health sample, 0.060 (SE = 0.122) for FTND symptom count.
Development and Application of PGSs to Pooled Nicotine Dependence Datasets
Analyses employed all SNP summary statistics from the aforementioned discovery GWASs (i.e., not just those that met genome-wide significance [i.e., p < 5 × 10⁻⁸]) to avoid biasing the findings by under-representing the polygenic nature of these phenotypes. Using SBLUP, we estimated linkage-disequilibrium- (LD-) adjusted effect sizes for all variants in common between our LD reference file (containing 503 individuals of European Ancestry from 1KG), our target sample, and each summary statistic file.41 Supplementary Table S1 provides the number of variants used to calculate PGS for each trait, and for each sample (i.e., pooled dbGaP sample and Add Health sample). Supplementary Table S2 provides the percentage of variants in common between each PGS developed for the pooled dbGaP sample and the Add Health Sample. On average, 99.47% of variants used for the same trait PGS are in common between pooled dbGaP PGS and Add Health PGS. Polygenic scores were then calculated using the --score function in PLINK (version 1.90).40 All PGSs were standardized and observed to be normally distributed (i.e., Anderson-Darling normality p > .05).42 Descriptive statistics of the standardized PGS for each sample (pooled dbGaP and Add Health) are available in Supplementary Table S3; inter-PGS correlations within sample are presented in Supplementary Table S4.
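As a rough illustration of the scoring step, a polygenic score is a weighted sum of an individual's allele dosages, with the SBLUP-adjusted effect sizes as weights, followed by standardization across the sample. The sketch below assumes a small in-memory dosage matrix and hypothetical helper names (`polygenic_score`, `standardize`); it is not the PLINK implementation, whose `--score` output may be scaled differently (for example, averaged over alleles rather than summed).

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages: one raw score per individual.

    dosages: (n_individuals, n_variants) array of effect-allele counts.
    weights: (n_variants,) array of per-allele effect sizes
             (e.g., LD-adjusted SBLUP weights; hypothetical values here).
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return dosages @ weights


def standardize(x):
    """Center to mean 0 and scale to unit sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


# Toy example: three individuals, three variants.
toy_dosages = [[0, 1, 2], [2, 0, 1], [1, 1, 1]]
toy_weights = [0.1, -0.2, 0.05]
raw_scores = polygenic_score(toy_dosages, toy_weights)
z_scores = standardize(raw_scores)
```

Standardizing each PGS puts the regression coefficients for different traits on a common scale, which is what makes the effect sizes in the tables below comparable across predictors.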
Analyses focused on covariate-adjusted standardized residuals of ND and FTND. We examined the joint and conditional effect of these PGSs by including them as a set of predictors in a multiple regression model (i.e., $Y_i = \beta_0 + \beta_1[\mathrm{PGS}_{\text{CPD}}] + \beta_2[\mathrm{PGS}_{\text{Height}}] + \beta_3[\mathrm{PGS}_{\text{AUDIT-C}}] + \beta_4[\mathrm{PGS}_{\text{AUDIT-P}}] + \beta_5[\mathrm{PGS}_{\text{Depression}}] + \beta_6[\mathrm{PGS}_{\text{EDU}}] + \beta_7[\mathrm{PGS}_{\text{Neuroticism}}] + \beta_8[\mathrm{PGS}_{\text{BMI}}] + \beta_9[\mathrm{PGS}_{\text{Schizophrenia}}] + \beta_{10}[\mathrm{PGS}_{\text{Self-Perceived Risk-Taking}}] + \varepsilon_i$). All models were fitted in MPlus (version 8)43 using full information maximum likelihood estimation. Variance explained was determined using the R-squared statistic.
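The multiple regression above can be sketched with ordinary least squares. This is a simplified stand-in for the Mplus full information maximum likelihood fit, under the assumption of complete data; the function name `fit_multi_pgs` and the input layout are hypothetical. The outcome is taken to be the covariate-adjusted, standardized residual of the dependence phenotype, and each column of `pgs` is one standardized PGS.

```python
import numpy as np


def fit_multi_pgs(y, pgs):
    """Regress an outcome on several PGSs jointly via least squares.

    y:   (n,) covariate-adjusted outcome.
    pgs: (n, k) matrix with one standardized PGS per column.
    Returns (intercept, betas, r_squared).
    """
    y = np.asarray(y, dtype=float)
    pgs = np.asarray(pgs, dtype=float)
    # Design matrix: a column of ones for the intercept, then the PGSs.
    design = np.column_stack([np.ones(len(y)), pgs])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    # R-squared = 1 - SS_residual / SS_total.
    r_squared = 1.0 - residuals.var() / y.var()
    return coef[0], coef[1:], r_squared
```

Because all PGSs enter the model together, each returned coefficient reflects the effect of one PGS conditional on the others, while `r_squared` reflects their joint contribution.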
Results
Description of PGS Effects on ND
Table 1 shows the partial correlations between the PGS for each trait and the respective dependence phenotype for each dataset (i.e., DSM-IV ND and FTND), when only controlling for the covariates. Correlations between PGSs and DSM-IV ND diagnosis were modest and statistically significant. The strongest partial correlations observed between PGS and DSM-IV ND were with Educational Attainment (partial correlation = –0.128, SE = 0.012), CPD (partial correlation = 0.088, SE = 0.012), Neuroticism (partial correlation = 0.064, SE = 0.012), and Depression (partial correlation = 0.064, SE = 0.012). Partial correlations observed with FTND symptom count were smaller in magnitude and similar in direction, but were not statistically significant.
Table 1.
Partial Correlations of PGS and Nicotine Dependence (ND) Phenotypes
| PGS predictors | DSM-IV ND diagnosis (pooled dbGaP samples): partial correlation | DSM-IV ND: 95% CI | DSM-IV ND: p value | FTND symptom count (Add Health sample): partial correlation | FTND: 95% CI | FTND: p value |
|---|---|---|---|---|---|---|
| AUDIT-C | –0.011 | [–0.035,0.012] | .356 | 0.005 | [–0.036,0.048] | .808 |
| AUDIT-P | 0.041 | [0.016,0.064] | .001 | 0.036 | [–0.018,0.089] | .192 |
| BMI | 0.043 | [0.019,0.063] | <.001 | 0.024 | [–0.021,0.072] | .315 |
| CPD | 0.088 | [0.064,0.112] | <.001 | 0.016 | [–0.023,0.056] | .419 |
| Depression | 0.064 | [0.041,0.087] | <.001 | 0.020 | [–0.029,0.074] | .456 |
| Educational Attainment | –0.128 | [–0.151,–0.105] | <.001 | 0.004 | [–0.041,0.050] | .858 |
| Height | –0.001 | [–0.024,0.022] | .955 | –0.016 | [–0.060,0.026] | .460 |
| Neuroticism | 0.064 | [0.041,0.088] | <.001 | 0.021 | [–0.024,0.069] | .386 |
| Risk-Taking | 0.056 | [0.031,0.079] | <.001 | 0.009 | [–0.038,0.059] | .709 |
| Schizophrenia | 0.050 | [0.025,0.077] | <.001 | 0.010 | [–0.024,0.046] | .571 |
AUDIT-C, alcohol use disorders identification test consumption score; AUDIT-P, alcohol use disorders identification test problem score; BMI, body mass index; CI, confidence interval; CPD, cigarettes smoked per day; DSM-IV, Diagnostic and Statistical Manual of Mental Disorders (Version 4); FTND, Fagerström Test for Nicotine Dependence; ND, nicotine dependence. The table shows partial correlations of each PGS with the respective Nicotine Dependence phenotype it predicts. In the pooled dbGAP samples, the models controlled for age, gender, study site, and the first 10 Principal Components (PCs). In Add Health, the models controlled for age, gender, and the first 10 PCs. Partial Correlations were the standardized estimates of the PGS predicting the ND phenotype in the model.
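For readers unfamiliar with partial correlations, the textbook definition can be sketched by residualizing both the PGS and the phenotype on the covariates and correlating the residuals. This is an assumption-laden illustration, not the study's procedure: the values in Table 1 are described as standardized estimates from covariate-adjusted models, which may differ slightly from this computation. The helper names `residualize` and `partial_correlation` are hypothetical.

```python
import numpy as np


def residualize(v, covariates):
    """Remove the linear effect of the covariates (plus intercept) from v."""
    v = np.asarray(v, dtype=float)
    design = np.column_stack([np.ones(len(v)), np.asarray(covariates, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def partial_correlation(x, y, covariates):
    """Correlation between x and y after adjusting both for the covariates."""
    rx = residualize(x, covariates)
    ry = residualize(y, covariates)
    return float(np.corrcoef(rx, ry)[0, 1])
```

In this setting, `x` would be a standardized PGS, `y` the dependence phenotype, and `covariates` a matrix holding age, gender, study site indicators, and the first 10 PCs.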
Multi-Trait Regression PGS Effects on ND
In the pooled dbGAP sample, the multivariate PGS model that included all PGSs accounted for 11.6% of the total estimated SNP heritability of DSM-IV ND diagnosis (multi-trait R² = 0.036, p value ≤ .001). In the Add Health sample, the multivariate PGS model accounted for 6.7% of the total estimated SNP heritability of FTND (multi-trait R² = 0.004, p value = .160). With the exception of Height, AUDIT-Consumption, and Broad Depression, all PGSs were significant predictors in the multi-trait model predicting DSM-IV ND (Table 2); notably, effect sizes were largely similar to those observed in the sub model that examined each PGS accounting for covariates (Table 1). This pattern of results and the modest associations across PGSs suggest that each of the PGSs adds explanatory power by capturing shared and non-shared genetic aspects of the liability of ND. Polygenic scores for CPD, Self-Perceived Risk-Taking, Schizophrenia, and AUDIT-Problems were the most robust predictors of ND with higher scores indicating greater risk. In contrast, higher Educational Attainment PGSs were associated with a lower likelihood of being diagnosed with ND (EDU: standardized β = –0.125, p < .001, 95% CI [–0.149,–0.101]; see Table 2 for full multi-trait regression results).
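The percentages of SNP heritability reported above follow from dividing the multi-trait R-squared by the SNP-heritability estimate for the same phenotype (0.311 for DSM-IV ND and 0.060 for FTND, as reported in the Methods). A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def share_of_snp_heritability(r2, h2_snp):
    """Fraction of the SNP heritability captured by a model explaining r2."""
    if h2_snp <= 0:
        raise ValueError("h2_snp must be positive")
    return r2 / h2_snp


dsm_share = share_of_snp_heritability(0.036, 0.311)
ftnd_share = share_of_snp_heritability(0.004, 0.060)
print(f"DSM-IV ND: {100 * dsm_share:.1f}%")  # prints "DSM-IV ND: 11.6%"
print(f"FTND: {100 * ftnd_share:.1f}%")      # prints "FTND: 6.7%"
```

Note that the FTND heritability estimate has a large standard error (0.122), so the corresponding ratio is imprecise.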
Table 2.
Results of Multi-Trait PGS Models Predicting DSM-IV ND and FTND
| PGS predictors | DSM-IV ND diagnosis (pooled dbGaP samples): β | DSM-IV ND: 95% CI | DSM-IV ND: p value | FTND symptom count (Add Health sample): β | FTND: 95% CI | FTND: p value |
|---|---|---|---|---|---|---|
| AUDIT-C | –0.012 | [–0.040,0.015] | .393 | –0.019 | [–0.070,0.033] | .483 |
| AUDIT-P | 0.037 | [0.010,0.065] | .008 | 0.048 | [–0.013,0.110] | .125 |
| BMI | 0.027 | [0.002,0.052] | .031 | 0.024 | [–0.028,0.075] | .373 |
| CPD | 0.071 | [0.047,0.095] | <.001 | 0.015 | [–0.024,0.054] | .442 |
| Depression | 0.027 | [0.000,0.054] | .054 | 0.004 | [–0.059,0.067] | .906 |
| Educational attainment | –0.125 | [–0.149,–0.101] | <.001 | 0.016 | [–0.036,0.067] | .554 |
| Height | 0.000 | [–0.024,0.023] | .997 | –0.022 | [–0.074,0.030] | .412 |
| Neuroticism | 0.028 | [0.000,0.055] | .049 | 0.018 | [–0.037,0.072] | .529 |
| Risk-taking | 0.051 | [0.026,0.075] | <.001 | 0.004 | [–0.056,0.064] | .898 |
| Schizophrenia | 0.036 | [0.012,0.061] | .005 | 0.012 | [–0.023,0.047] | .501 |
AUDIT-C, alcohol use disorders identification test consumption score; AUDIT-P, alcohol use disorders identification test problem score; BMI, body mass index; CI, confidence interval; CPD, cigarettes smoked per day; DSM-IV, Diagnostic and Statistical Manual of Mental Disorders (Version 4); FTND, Fagerström Test for Nicotine Dependence; ND, nicotine dependence. The table shows results from multi-trait regressions using all PGS as predictors for ND phenotypes. Standardized estimates are provided along with 95% confidence intervals (CI).
The aforementioned PGS effects and trends were not observed on FTND. A comparison of the PGS distribution of both samples (Supplementary Figures 1–20) suggested similar variability in genetic risk. Using ANOVA, we compared the unstandardized PGSs between DSM-IV ND cases and controls, as well as between Add Health and the DSM-IV ND cases and controls from dbGaP, separately (see Supplementary Tables S5 and S6, respectively). With the exception of AUDIT-C, which was not a significant predictor of ND, and height, which was our negative control, PGS means within cases were higher than controls. For the educational attainment PGS, which was the sole negative predictor of ND, PGS means within cases were lower than controls. This suggests that as a group, DSM-IV ND cases were at elevated risk along these dimensions, with the exception being Educational Attainment. Comparison of the Add Health PGS to both the case group and the control group in our target sample revealed that for AUDIT-C, Depression, Neuroticism, Risk-Taking, and Schizophrenia, the Add Health sample PGS means were significantly lower compared to the pooled dbGaP cases and control groups. For AUDIT-P and CPD PGS, Add Health scores were significantly less than the case group in our target sample, but not significantly different from the control group in our target sample. For BMI and Educational Attainment, Add Health scores were significantly greater than both pooled dbGaP case and control groups. Height scores in Add Health were not significantly different from either pooled dbGaP case or control groups.
Discussion
To date, the application of singular PGSs to understand risk for nicotine dependence in a target sample has accounted for only a small percentage of genetic variance (i.e., ~1%). The current study demonstrates the utility of leveraging multiple traits related to tobacco addiction to (1) build a better predictive model of tobacco use disorder, and (2) identify the most robust polygenic indicators of risk, whose biology may shed further light on the underpinnings of ND. Our findings show that ND is genetically complex, with evidence for polygenic, pleiotropic, and cumulative genetic liability across PGSs.
Polygenic effects from our conceptual framework shared very little in common, suggesting that each of these differentially optimized PGSs largely indexed unique genetic effects related to ND. This was evidenced by the modest differences, across our regression models, in the effect sizes relating each SBLUP-derived PGS to our dependence phenotypes. It should be noted that the total variance explained by the multi-trait models reflects both shared and non-shared effects across the PGSs on ND. By employing a multiple regression framework, we describe the relative influences of these PGSs rather than focusing on common effects alone via a dimension reduction method, such as principal components analysis, which yields ambiguous genetic scores. These findings align with recent multivariate approaches (e.g., MTAG), which emphasize the potential to increase predictive power using multiple heritable traits as risk indicators. It is plausible that, with more powerful discovery GWAS for smoking-related behaviors and pleiotropic traits, future expansions of this model could explore clusters of polygenic risk. For example, the addition of anxiety-related traits would likely provide additional associations with neuroticism as suggested by preliminary evidence.44 Likewise, the inclusion of PGS based on deep phenotypes that complement our robust PGS associations (i.e., CPD, Educational Attainment, and Risk-Taking), such as nicotine metabolism, common executive functioning, and impulsive choice, may provide insights into additional mechanisms.
Although we could not control for all forms of possible substance involvement (i.e., cannabis, cocaine, and opioid), we targeted the most predominantly co-used/problematic substance in the poly-substance literature: genetic risk for alcoholism (via AUDIT-C and AUDIT-P). In doing so, we accounted for genetic contributions to generalized substance use, which previous work has shown to be highly correlated with alcohol use disorder (AUD).45 Several other studies support our findings of a shared genetic liability for AUDIT-P and ND.6,15,46 For example, a recent study conducted by Kranzler et al.46 found a negative genetic correlation between AUD and quitting smoking, which indirectly supports our finding that AUDIT-P and ND/TUD share a genetic relationship.46 Similarly, Walters et al.’s recent GWAS of alcohol dependence found significant genetic correlations between alcohol dependence and smoking initiation (rg = 0.708, s.e. = 0.134, p = 1.3 × 10⁻⁷) and lifetime cannabis initiation (rg = 0.793, s.e. = 0.217, p = 2.5 × 10⁻⁴), also suggesting a shared genetic liability between these three substances, and that alcohol reflects shared genetic risk.47
These results should be interpreted in relation to the study’s limitations, which are attributable to the characteristics of the available large-scale data sets and reflect the compromises and biases inherent in sample collection and aggregation for GWAS. First, rather than a symptom count of ND, our analyses focused on ND diagnosis as a binary phenotype due to data access constraints across studies, but also a tendency in the psychiatric genetics field to leverage the convenience of clinical outcomes. A quantitative phenotype would have provided greater statistical power and comparability to the FTND score employed in the Add Health sample. Moreover, it is unclear to what extent phenotypic heterogeneity in DSM-IV ND impacted these findings, as our models assume that individuals with three criteria or more are equivalent to those with seven and that genetic risk is systematically greater in cases relative to controls who are often subsyndromal. Additionally, inherent differences in the measurements of nicotine dependence between the two samples (DSM-IV focusing more on affective symptoms of nicotine use and negative outcomes due to nicotine use, and FTND focusing more on compulsivity and heaviness of nicotine use) may account for differences in the predictive value of the PGS within each sample. This may suggest greater diversity in the genetic underpinnings related to the complexities of nicotine dependence as a construct and the subtleties that each dependence measurement captures. Further research is needed to investigate this relationship.
It is also important to note that the differences between the Add Health and pooled dbGAP samples may have limited our ability to detect genetic effects on FTND, which we have shown to be highly correlated with DSM-IV ND.48 First, the Add Health sample was less than half the size of the pooled set of cases and controls, resulting in reduced power to detect the marginal effect sizes, all of which were in the same direction as the effects in dbGAP samples. Second, additive genetic effects on FTND symptom count (h2SNP = 0.06) were modest compared to DSM-IV ND, which could have resulted from a number of factors, including (1) reduced genetic risk in Add Health, which we ruled out by post-hoc comparison of the PGS distributions, (2) a lower incidence of nicotine dependence, which is plausible given that Add Health participants are much younger and may not yet have had the opportunity to express their proclivities, (3) varying combinations of GxE or rGE between the two samples that could also explain why, despite similarities in PGS distributions, the presentation of smoking behaviors was lower in Add Health, and (4) the nature of recruiting individuals via community sampling rather than ascertaining cases with matched controls, which results in a large number of controls with low symptom counts. Together, these factors could have affected the power to capture genetic effects on FTND in emerging adulthood, thus necessitating further studies to confirm how each of these factors affects the application of smoking-related PGS.
Also related to ND phenotyping, we recognize that DSM-IV diagnosis does not reflect the varied expression of ND captured by a broader severity dimension, such as that in the DSM-5. Such analyses were not possible due to the absence of abuse criteria for ND in the DSM-IV and limited assessments of craving in past studies. Studies have underscored the need for multiple assessments of tobacco-related problems, such as the FTND, which assesses problems, craving, and heaviness of use. Using the FTND as a measure of ND, two recent GWASs conducted by Hancock et al. identified novel variants (rs910083-C within the DNMT3B gene and rs2273500-C within CHRNA4) associated with ND, demonstrating FTND as a valid measure to discover variants that influence response to nicotine and risk for developing ND.21 Donny and Dierker illustrated this issue with DSM-IV ND diagnosis by pointing out that 37.7% of smokers who smoke ≥10 cigarettes a day fail to meet DSM-IV criteria.49 The unidimensional measures (DSM-IV, FTND/FTCD) and the multidimensional measures (NDSS and WISDM) all tap into the many aspects of dependence.50 Similarly, we recently showed moderate genetic overlap between FTND and DSM symptom dimensions of ND (rDSM-FTND = 0.545 [95% CI: 0.50 to 0.60]; rg-SNP = 1.00 [SE = 0.084]), suggesting that other smoking phenotypes reflect sensitivity to tobacco or nicotine that may be reflected in these PGSs.48 Given the moderate heritability of ND in our sample (estimated SNP h2 = 38.6%) and the polygenic heterogeneity in pathways contributing to ND, it is likely that larger GWAS discovery samples of individual smoking behaviors will be able to further detect effect sizes of variants contributing to ND.
Lastly, it should be noted that the model in Figure 1 is limited to behaviors or traits that are well-powered for GWAS and have summary statistics available for use in individuals of European Ancestry. Furthermore, our discovery samples were composed of individuals of European descent, meaning our PGS results cannot be accurately applied to other ancestral groups. In the future, including more predictors related to ND will likely increase the variance in ND accounted for by multi-trait PGS models.
Overall, this study further supports the utility of incorporating known genetic correlates in predictive models of ND to achieve more robust risk prediction. Using a multi-trait polygenic method to predict ND explains almost four times as much phenotypic variance as a single-trait PGS, and about one tenth of the total estimated genetic liability of DSM-IV ND. Future research should consider expanding to other behaviors as their GWASs approach adequately powered sample sizes (e.g., depression and anxiety), as well as to smoking-related phenotypes (e.g., serum cotinine level).51
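As a rough check of the "one tenth" figure, the share of estimated SNP heritability captured by the multi-trait model can be computed from the 3.6% of trait variance reported in the Results and the SNP h2 of 38.6% reported earlier in the Discussion. The sketch below is illustrative only: the variable names are ours, and it assumes the two estimates are on comparable scales.

```python
# Back-of-envelope check of the proportion of SNP heritability captured
# by the multi-trait polygenic model (illustrative variable names).
r2_multi_trait = 0.036  # variance in DSM-IV ND explained by the multi-trait model (3.6%)
h2_snp = 0.386          # estimated SNP heritability of ND in this sample (38.6%)

# Assumption: both estimates are expressed on comparable scales.
share_of_h2 = r2_multi_trait / h2_snp

print(f"{share_of_h2:.3f}")  # prints 0.093, i.e., roughly one tenth
```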
Supplementary Material
A Contributorship Form detailing each author’s specific involvement with this content, as well as any supplementary data, are available online at https://academic.oup.com/ntr.
Acknowledgments
The authors would like to thank Drs. David Weinshenker (Emory University), Leslie Brick (Brown University), and Jian Yang (University of Queensland) for their intellectual contributions to the interpretation of these findings. Add Health is directed by Robert A. Hummer and funded by the National Institute on Aging cooperative agreements U01 AG071448 (Hummer) and U01 AG071450 (Aiello and Hummer) at the University of North Carolina at Chapel Hill. Waves I-V data are from the Add Health Program Project, grant P01 HD31921 (Harris) from Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), with cooperative funding from 23 other federal agencies and foundations. Add Health was designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill.
Funding
This work was supported by a grant from the National Institute on Drug Abuse (R01DA042742, awarded to Dr. Rohan Palmer). Dr. Jaakko Kaprio is supported by the Academy of Finland (grant 312073). The contents do not represent the views of the U.S. Department of Veterans Affairs or the United States Government. NIDA and Emory University had no role in the study design, collection, analysis, or interpretation of the data, writing the manuscript, or the decision to submit the paper for publication. Funding support for the SAGE study was provided by NIH and NHLBI grant R01HL117004; study enrollment was supported by the Sandler Family Foundation, the American Asthma Foundation, the RWJF Amos Medical Faculty Development Program, and the Harry Wm. and Diana V. Hind Distinguished Professor in Pharmaceutical Sciences II. Funding support for the OZ-ALC GWAS was provided through the Center for Inherited Disease Research (CIDR) and the National Institute on Alcohol Abuse and Alcoholism (NIAAA). The CIDR-OZ-ALC GWAS was funded as part of the NIAAA grant 5 R01 AA013320–04. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the CIDR-OZ-ALC GWAS. Assistance with data cleaning was provided by the National Center for Biotechnology Information. Support for collection of OZ-ALC data was provided by the MARC: Risk Mechanisms in Alcoholism and Comorbidity (MARC; P60 AA011998–11). Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research, was provided by the NIH GEI (U01HG004438), the National Institute on Alcohol Abuse and Alcoholism, and the NIH contract “High throughput genotyping for studying the genetic contributions to human disease” (HHSN268200782096C). Funding support for the Yale Study was provided through the Center for Inherited Disease Research (CIDR) and the Genetics of Alcohol Dependence in American Populations (CIDR-Gelernter Study).
The CIDR-Gelernter Study is a genome-wide association study funded as part of the Genetics of Alcohol Dependence in American Populations. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the Genetics of Alcohol Dependence in American Populations. Assistance with data cleaning was provided by the National Center for Biotechnology Information. Support for collection of datasets and samples was provided by the Genetics of Alcohol Dependence in American Populations (R01 AA011330). Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research, was provided by the NIH GEI (U01HG004438), the National Institute on Alcohol Abuse and Alcoholism, and the NIH contract “High throughput genotyping for studying the genetic contributions to human disease” (HHSN268200782096C). The datasets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gap through dbGaP accession number [phs000425]. Finally, funding support for the GWAS of Heroin Dependence was provided by R01DA17305.
Declaration of Interests
The authors have no conflicts to declare.
References
- 1. Warren GW, Alberg AJ, Kraft AS, Cummings KM. The 2014 Surgeon General’s report: “the health consequences of smoking – 50 years of progress”: a paradigm shift in cancer care. Cancer. 2014;120(13):1914–1916.
- 2. Xu X, Bishop EE, Kennedy SM, Simpson SA, Pechacek TF. Annual healthcare spending attributable to cigarette smoking: An update. Am J Prev Med. 2015;48(3):326–333.
- 3. Benowitz NL. Pharmacology of nicotine: Addiction, smoking-induced disease, and therapeutics. Annu Rev Pharmacol Toxicol. 2009;49:57–71.
- 4. Zyphur MJ, Zhang Z, Barsky AP, Li W-D. An ACE in the hole: Twin family models for applied behavioral genetics research. The Leadership Quarterly. 2013;24(4):572–594. doi: 10.1016/j.leaqua.2013.04.001.
- 5. Sullivan PF, Purcell S. Analyzing genome-wide association study data: a tutorial using PLINK. In: Neale B, Ferreira M, Medland S, Posthuma D, eds. Statistical Genetics: Gene Mapping Through Linkage and Association. New York, NY: Taylor & Francis Group; 2007:355–360.
- 6. Liu M, Jiang Y, Wedow R, et al. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat Genet. 2019;51(2):237–244.
- 7. Bierut LJ, Madden PA, Breslau N, et al. Novel genes identified in a high-density genome wide association study for nicotine dependence. Hum Mol Genet. 2007;16(1):24–35.
- 8. Bogdan R, Baranger DAA, Agrawal A. Polygenic risk scores in clinical psychology: bridging genomic risk to individual differences. Annu Rev Clin Psychol. 2018;14:119–157.
- 9. Otto JM, Gizer IR, Bizon C, Wilhelmsen KC, Ehlers CL. Polygenic risk scores for cigarettes smoked per day do not generalize to a Native American population. Drug Alcohol Depend. 2016;167:95–102.
- 10. Chen LS, Hartz SM, Baker TB, Ma Y, Saccone NL, Bierut LJ. Use of polygenic risk scores of nicotine metabolism in predicting smoking behaviors. Pharmacogenomics. 2018;19(18):1383–1394.
- 11. Belsky DW, Moffitt TE, Baker TB, et al. Polygenic risk and the developmental progression to heavy, persistent smoking and nicotine dependence: Evidence from a 4-decade longitudinal study. JAMA Psychiatry. 2013;70(5):534–542.
- 12. Chang LH, Couvy-Duchesne B, Liu M, et al.; GSCAN Consortium. Association between polygenic risk for tobacco or alcohol consumption and liability to licit and illicit substance use in young Australian adults. Drug Alcohol Depend. 2019;197:271–279.
- 13. Stevens VL, Jacobs EJ, Gapstur SM, et al. Evaluation of a novel difficulty of smoking cessation phenotype based on number of quit attempts. Nicotine Tob Res. 2017;19(4):435–441.
- 14. Allegrini AG, Verweij KJH, Abdellaoui A, et al. Genetic vulnerability for smoking and cannabis use: Associations with e-cigarette and water pipe use. Nicotine Tob Res. 2019;21(6):723–730.
- 15. Vink JM, Hottenga JJ, de Geus EJ, et al. Polygenic risk scores for smoking: predictors for alcohol and cannabis use? Addiction. 2014;109(7):1141–1151.
- 16. Musci RJ, Uhl G, Maher B, Ialongo NS. Testing gene × environment moderation of tobacco and marijuana use trajectories in adolescence and young adulthood. J Consult Clin Psychol. 2015;83(5):866–874.
- 17. Marees AT, Hammerschlag AR, Bastarache L, et al. Exploring the role of low-frequency and rare exonic variants in alcohol and tobacco use. Drug Alcohol Depend. 2018;188:94–101.
- 18. Young AI, Benonisdottir S, Przeworski M, Kong A. Deconstructing the sources of genotype-phenotype associations in humans. Science. 2019;365(6460):1396–1400.
- 19. The Brainstorm Consortium, Anttila V, Bulik-Sullivan B, et al. Analysis of shared heritability in common disorders of the brain. Science. 2018;360(6395):eaap8757.
- 20. Vrieze SI, McGue M, Iacono WG. The interplay of genes and adolescent development in substance use disorders: Leveraging findings from GWAS meta-analyses to test developmental hypotheses about nicotine consumption. Hum Genet. 2012;131(6):791–801.
- 21. Hancock DB, Guo Y, Reginsson GW, et al. Genome-wide association study across European and African American ancestries identifies a SNP in DNMT3B contributing to nicotine dependence. Mol Psychiatry. 2018;23(9):1911–1919.
- 22. Quach BC, Bray MJ, Gaddis NC, et al. Expanding the genetic architecture of nicotine dependence and its shared genetics with multiple traits. Nat Commun. 2020;11(1):5562.
- 23. Mackillop J, Obasi E, Amlung MT, McGeary JE, Knopik VS. The role of genetics in nicotine dependence: mapping the pathways from genome to syndrome. Curr Cardiovasc Risk Rep. 2010;4(6):446–453.
- 24. Krapohl E, Patel H, Newhouse S, et al. Multi-polygenic score approach to trait prediction. Mol Psychiatry. 2018;23(5):1368–1374.
- 25. Maier RM, Zhu Z, Lee SH, et al. Improving genetic prediction by leveraging genetic correlations among human diseases and traits. Nat Commun. 2018;9(1):989.
- 26. Turley P, Walters RK, Maghzian O, et al.; 23andMe Research Team; Social Science Genetic Association Consortium. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat Genet. 2018;50(2):229–237.
- 27. Zvolensky MJ, Taha F, Bono A, Goodwin RD. Big five personality factors and cigarette smoking: A 10-year study among US adults. J Psychiatr Res. 2015;63:91–96.
- 28. Ittermann T, Thamm M, Schipf S, John U, Rettig R, Volzke H. Relationship of smoking and/or passive exposure to tobacco smoke on the association between serum thyrotropin and body mass index in large groups of adolescents and children. Thyroid. 2013;23(3):262–268.
- 29. Breslau N, Kilbey MM, Andreski P. Nicotine dependence and major depression - new evidence from a prospective investigation. Arch Gen Psychiat. 1993;50(1):31–35.
- 30. Kheradmand A, Ziaaddini H, Vahabi M. Prevalence of cigarette smoking in schizophrenic patients compared to other hospital admitted psychiatric patients. Addict Health. 2011;1(1):38–42.
- 31. Sanchez-Roige S, Palmer AA, Fontanillas P, et al.; 23andMe Research Team, the Substance Use Disorder Working Group of the Psychiatric Genomics Consortium. Genome-wide association study meta-analysis of the Alcohol Use Disorders Identification Test (AUDIT) in two population-based cohorts. Am J Psychiatry. 2019;176(2):107–118.
- 32. Nagel M, Watanabe K, Stringer S, Posthuma D, van der Sluis S. Item-level analyses reveal genetic heterogeneity in neuroticism. Nat Commun. 2018;9(1):905.
- 33. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511(7510):421–427.
- 34. Howard DM, Adams MJ, Shirali M, et al. Genome-wide association study of depression phenotypes in UK Biobank identifies variants in excitatory synaptic pathways. Nat Commun. 2018;9(1):1470.
- 35. Yengo L, Sidorenko J, Kemper KE, et al.; GIANT Consortium. Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. Hum Mol Genet. 2018;27(20):3641–3649.
- 36. Karlsson Linnér R, Biroli P, Kong E, et al.; 23andMe Research Team; eQTLgen Consortium; International Cannabis Consortium; Social Science Genetic Association Consortium. Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences. Nat Genet. 2019;51(2):245–257.
- 37. Lee JJ, Wedow R, Okbay A, et al.; 23andMe Research Team; COGENT (Cognitive Genomics Consortium); Social Science Genetic Association Consortium. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat Genet. 2018;50(8):1112–1121.
- 38. Brick LA, Keller MC, Knopik VS, McGeary JE, Palmer RHC. Shared additive genetic variation for alcohol dependence among subjects of African and European ancestry. Addict Biol. 2019;24(1):132–144.
- 39. Das S, Forer L, Schönherr S, et al. Next-generation genotype imputation service and methods. Nat Genet. 2016;48(10):1284–1287.
- 40. Purcell S, Neale B, Todd-Brown K, et al. PLINK: A toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 2007;81(3):559–575.
- 41. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: A tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88(1):76–82.
- 42. Stephens M. EDF statistics for goodness of fit and some comparisons. J Am Stat Assoc. 1974;69:730–737.
- 43. Muthén LK, Muthén BO. Mplus User’s Guide. 8th ed. Los Angeles, CA: Muthén & Muthén; 2017.
- 44. Purves KL, Coleman JRI, Meier SM, et al. A major role for common genetic variation in anxiety disorders. Mol Psychiatry. 2020;25(12):3292–3303.
- 45. Palmer RH, Button TM, Rhee SH, et al. Genetic etiology of the common liability to drug dependence: Evidence of common and specific mechanisms for DSM-IV dependence symptoms. Drug Alcohol Depend. 2012;123(suppl 1):S24–S32.
- 46. Kranzler HR, Zhou H, Kember RL, et al. Genome-wide association study of alcohol consumption and use disorder in 274,424 individuals from multiple populations. Nat Commun. 2019;10(1):1499.
- 47. Walters RK, Polimanti R, Johnson EC, et al.; 23andMe Research Team. Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders. Nat Neurosci. 2018;21(12):1656–1669.
- 48. Bidwell LC, Palmer RH, Brick L, McGeary JE, Knopik VS. Genome-wide single nucleotide polymorphism heritability of nicotine dependence as a multidimensional phenotype. Psychol Med. 2016;46(10):2059–2069.
- 49. Donny EC, Dierker LC. The absence of DSM-IV nicotine dependence in moderate-to-heavy daily smokers. Drug Alcohol Depend. 2007;89(1):93–96.
- 50. Piper ME, McCarthy DE, Bolt DM, et al. Assessing dimensions of nicotine dependence: An evaluation of the Nicotine Dependence Syndrome Scale (NDSS) and the Wisconsin Inventory of Smoking Dependence Motives (WISDM). Nicotine Tob Res. 2008;10(6):1009–1020.
- 51. Ware JJ, Chen X, Vink J, et al. Genome-wide meta-analysis of cotinine levels in cigarette smokers identifies locus at 4q13.2. Sci Rep. 2016;6:20092.