Nicotine & Tobacco Research. 2020 Nov 9;23(6):1055–1063. doi: 10.1093/ntr/ntaa229

Little Evidence of Modified Genetic Effect of rs16969968 on Heavy Smoking Based on Age of Onset of Smoking

Christine Adjangba, Richard Border, Pamela N Romero Villela, Marissa A Ehringer, Luke M Evans
PMCID: PMC8150133  PMID: 33165565

Abstract

Introduction

Tobacco smoking is the leading cause of preventable death globally. Smoking quantity, measured in cigarettes per day, is influenced both by the age of onset of regular smoking (AOS) and by genetic factors, including a strong effect of the nonsynonymous single-nucleotide polymorphism rs16969968. A previous study by Hartz et al. reported an interaction between these two factors, whereby rs16969968 risk allele carriers who started smoking earlier showed increased risk for heavy smoking compared with those who started later. This finding has yet to be replicated in a large, independent sample.

Methods

We performed a preregistered, direct replication attempt of the rs16969968 × AOS interaction on smoking quantity in 128 383 unrelated individuals from the UK Biobank, meta-analyzed across ancestry groups. We fit statistical association models mirroring the original publication as well as formal interaction tests on multiple phenotypic and analytical scales.

Results

We replicated the main effects of rs16969968 and AOS on cigarettes per day but failed to replicate the interaction using previous methods. Nominal significance of the rs16969968 × AOS interaction term depended strongly on the scale of analysis and the particular phenotype, as did associations stratified by early/late AOS. No interaction tests passed genome-wide correction (α = 5e-8), and all estimated interaction effect sizes were much smaller in magnitude than previous estimates.

Conclusions

We failed to replicate the strong rs16969968 × AOS interaction effect previously reported. If such gene–moderator interactions influence complex traits, they likely depend on scale of measurement, and current biobanks lack the power to detect significant genome-wide associations given the minute effect sizes expected.

Implications

We failed to replicate the strong rs16969968 × AOS interaction effect on smoking quantity previously reported. If such gene–moderator interactions influence complex traits, current biobanks lack the power to detect significant genome-wide associations given the minute effect sizes expected. Furthermore, many potential interaction effects are likely to depend on the scale of measurement employed.

Introduction

Approximately 20% of deaths every year in the United States can be attributed to cigarette smoking, and smokers have life expectancies at least 10 years shorter than nonsmokers.1 There is strong evidence from adoption, family, and twin studies that both genetic and environmental factors contribute to risk for smoking behaviors, with heritability estimates for nicotine dependence, ever becoming a regular smoker, and smoking quantity ranging between 33% and 71%.2–6 Recently, genome-wide association studies (GWAS) have identified common variants associated with smoking.7–12 In particular, the nicotinic acetylcholine receptor subunit genes CHRNA5-CHRNA3-CHRNB4 on chromosome 15 have been implicated by well-powered GWAS of smoking behaviors.11,13,14 Within CHRNA5, which codes for the α5 receptor subunit, the nonsynonymous G/A single-nucleotide polymorphism (SNP) rs16969968 has been replicated through both large-scale GWAS7–12,15 and functional assays16–19 to influence smoking quantity, as measured by the number of cigarettes smoked per day, and nicotine dependence. The rs16969968-A risk allele has the largest estimated allelic effect on smoking quantity known to date.11 Although GWAS have identified many additional smoking-associated variants, rs16969968 remains a focus of individual functional studies and genetic epidemiological studies, with 292 publications reporting analyses of rs16969968 indexed by dbSNP (https://www.ncbi.nlm.nih.gov/snp/rs16969968#publications) and 454 publications (198 within the last 5 years) listed on LitVar (www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/LitVar), at the time of this writing.

In addition to genetic risk factors for heavy smoking, earlier age at onset of regular smoking (AOS) is well known to predict risk for later heavy use and nicotine dependence.20,21 In light of previous findings, Hartz et al.12 conducted a meta-analysis of 33 348 individuals across 43 European and American data sets to test whether genetic vulnerability to heavy smoking and nicotine dependence at rs16969968 depends on AOS and found a strong, significant interaction between early AOS and the rs16969968-A allele on heavy smoking (odds ratio [OR] = 1.16). Additional studies22,23 have focused on rs16969968 interactions with other variables, highlighting the continued interest in rs16969968 interactions on behavior. Notably, these include those evaluating rs16969968 × age of nicotine exposure,19,24 and include a report that early intervention to prevent adolescent smoking reduces the genetic risk of rs16969968 for heavy smoking later in life, a gene × intervention interaction.25 The original finding of rs16969968 × AOS has been referenced in reviews citing the need for evaluation of gene × environment (G × E) interaction effects on nicotine dependence26 and suggesting that direct replication of methods is needed to rigorously evaluate G × E interactions on smoking.27 However, despite the large rs16969968 × AOS interaction effect size originally reported, animal model evidence to support the plausibility of such an interaction,19 and continued interest in rs16969968, we are aware of no large-scale replication attempt in an independent sample. Here, we assessed whether there is an rs16969968 × age of onset of smoking interaction in a well powered (Supplementary Figure S1), independent sample, in an attempt to directly replicate the original findings.

Methods

We preregistered our analyses through the Open Science Framework (osf.io/ynh2j) after we had obtained the UK Biobank data, but before we analyzed cigarettes per day (CPD) or AOS.

Study Population

We used the UK Biobank, a large sample with rich phenotype and genome-wide genotype data.28 We included all participants with available genomic data who had reported CPD and AOS data. The participants were either current or former smokers aged 40 years or older. To avoid confounding influences of population stratification,29 we initially, and following our preregistration, performed analyses using only individuals of European (EUR) ancestry, the largest subsample within the UK Biobank, identified as those whose scores on the first four principal components (UK Biobank data field ID 22009) fell within the range of the individuals identified by the UK Biobank as having EUR ancestry (field 22006). Following this, we expanded our analysis to include all available individuals within the UK Biobank. After excluding the EUR-ancestry individuals noted above, we identified relatively genetically homogeneous groups among the remaining individuals using K-means clustering, from K = 2–10, applied to the first 10 principal component axes (field 22009). The percent variance explained plateaued at K = 10 clusters (Supplementary Figure S2). All analyses were subsequently performed within these 11 genetic clusters (EUR ancestry + K = 10 clusters). We note that the purpose of this clustering was solely to identify relatively genetically homogeneous groups of individuals within which to perform association analyses, and not to make population genetic inferences.

We included only unrelated individuals in our primary analyses to avoid possible confounding due to shared environmental factors. Relatedness was estimated within each genetic cluster using MAF- and LD-pruned array markers (plink2 [ref. 30] command: --maf 0.01 --hwe 1e-8 --indep-pairwise 50 5 0.2) after excluding those individuals with self-report and genetic sex mismatch (fields 31 and 22001), those with unusually high inbreeding coefficients (|Fhet| > 0.2), and those identified by the UK Biobank and Affymetrix as having poor-quality genomic data (fields 22010 and 22051). Unrelated individuals (estimated relatedness < 0.05) were identified with GCTA31 v1.91.3 within each cluster. After removing individuals with missing phenotype and covariate data (see below), a total of 128 383 unrelated individuals across all genetic clusters were included.

Variables

Smoking quantity, as measured by CPD, was the primary dependent variable in analyses. Data on CPD (fields 2887, 3456, and 6183) were obtained from current or former smokers by asking the following question: “About how many cigarettes do/did you smoke on average each day?” The dichotomous encoding defined smoking quantity as light smoking (CPD ≤ 10) versus heavy smoking (CPD > 20), mirroring the definition used by Hartz et al. Because this excluded intermediate-quantity smokers, the sample size for this phenotype was reduced to 61 077 individuals. The binned encoding defined smoking quantity as a linear variable consisting of 0 (CPD ≤ 10), 1 (CPD = 11–20), 2 (CPD = 21–30), or 3 (CPD > 30), also matching their secondary analysis. CPD data were highly skewed; therefore, we also analyzed log10-transformed CPD (Supplementary Figure S3). Because of observed evidence of scale dependence32 (see Results), we also analyzed heavy/light CPD on an additive scale. These two additional procedures were the only deviations from our preregistered analyses. Final analyses considered untransformed CPD, log10(CPD), heavy/light (analyzed on both the multiplicative scale, ie, logistic regression, matching the original report, and the additive scale), and binned encodings.

Age of onset of regular smoking (AOS) was determined from fields 3426 and 2867, where participants were asked the following question: “How old were you when you first started smoking on most days?” AOS was analyzed based on a dichotomous encoding, a binned encoding, and the raw AOS data, again replicating the methods of Hartz et al.12 The dichotomous encoding defined early as AOS ≤ 16 years and late as AOS > 16 years. The binned encodings were as follows: 0 (AOS ≤ 15 years), 1 (AOS = 16 years), 2 (AOS = 17–18 years), or 3 (AOS > 18 years). We note that, matching Hartz et al., the median AOS was 16 in the UK Biobank (Supplementary Figure S3), making it a reasonable and comparable age at which to separate early- from late-initiating smokers.
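The phenotype encodings described above can be written out as a few small functions. This is an illustrative Python sketch, not the authors' code (the analyses were run in R); the function names and the 0/1 coding of the dichotomous variables (heavy = 1, early = 1) are assumptions made here for illustration.

```python
def cpd_heavy_light(cpd):
    """Dichotomous CPD: 0 = light (CPD <= 10), 1 = heavy (CPD > 20).

    Returns None for intermediate smokers (CPD 11-20), who are excluded
    from this phenotype. The 0/1 coding is an assumption of this sketch.
    """
    if cpd <= 10:
        return 0
    if cpd > 20:
        return 1
    return None


def cpd_binned(cpd):
    """Binned CPD: 0 (<= 10), 1 (11-20), 2 (21-30), 3 (> 30)."""
    if cpd <= 10:
        return 0
    if cpd <= 20:
        return 1
    if cpd <= 30:
        return 2
    return 3


def aos_early(aos):
    """Dichotomous AOS: 1 = early (AOS <= 16 years), 0 = late (AOS > 16)."""
    return 1 if aos <= 16 else 0


def aos_binned(aos):
    """Binned AOS: 0 (<= 15), 1 (16), 2 (17-18), 3 (> 18)."""
    if aos <= 15:
        return 0
    if aos == 16:
        return 1
    if aos <= 18:
        return 2
    return 3
```

Returning None for intermediate smokers makes the reduction in sample size for the heavy/light phenotype explicit: those rows are dropped before model fitting.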

Covariates included were sex (field 31), age at time of assessment (field 21003), age², Townsend Deprivation Index (field 189), educational attainment (“qualification,” categorical, field 6138), genotyping batch (field 22000), assessment center (field 54), and the first 10 genetic principal components as estimated with flashpca33 applied to the MAF- and LD-pruned SNPs as described above. Townsend Deprivation Index and educational attainment were included because socioeconomic factors are correlated with smoking phenotypes.34 High collinearity of covariates within this sample resulted in a rank-deficient design matrix, which we addressed by performing a principal components analysis of the c = 141 fixed effects (the total number of columns in the regression design matrix, when including all levels of all categorical effects and continuous covariates) using the prcomp function in R v3.2.2 (ref. 35). We then estimated the rank of the resulting eigenvector matrix (rank r < c) using the Matrix R package36 and included the first r = 140 principal components as covariates in all analyses.

Statistical Analyses

All statistical analyses were performed within each genetic ancestry cluster separately.

To directly replicate previous methods, we performed AOS-stratified logistic regression of heavy/light CPD using glm (family = “binomial”) in R.35 Because the original analysis used only sex as a covariate, this direct replication test included only rs16969968 and sex as independent variables.

For dichotomized light/heavy CPD, we performed logistic regression using glm (family = “binomial”) in R35 to assess the multiplicative scale interaction. The model included the rs16969968 genotype (coded as 0, 1, or 2), AOS, and rs16969968 × AOS. All genotype × covariate and AOS × covariate interactions were included within the models to appropriately control for confounding.37 For phenotypes analyzed as continuous variables (raw, log-transformed, and binned CPD; the binned variable is not strictly normally distributed, but this treatment matches previous analyses11,12) and for the additive scale interaction model of the dichotomous heavy/light phenotype, we tested the same model using linear regression with the R lm function. Because many of the non-EUR-ancestry clusters had relatively few unrelated individuals within them, including all 140 covariates and their interactions resulted in a model that could not be fitted. For the K = 10 non-EUR-ancestry clusters, we therefore reduced the covariates to the scores on the first five principal components of the covariate design matrix.
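To make the structure of the interaction model concrete, the sketch below builds one row of a design matrix containing the intercept, genotype, AOS, the genotype × AOS term of interest, the covariates, and every genotype × covariate and AOS × covariate product. It is a minimal Python illustration under assumptions: the function name and column ordering are hypothetical, and the authors fitted these models with glm and lm in R.

```python
def interaction_design_row(g, a, covariates):
    """Build one design-matrix row for the G x AOS interaction model.

    g          : rs16969968 genotype coded 0, 1, or 2
    a          : AOS under any encoding (raw, binned, or early/late)
    covariates : list of covariate values (e.g. principal component scores)

    Column order (an assumption of this sketch):
    intercept, g, a, g*a, covariates, g*covariates, a*covariates
    """
    row = [1.0, float(g), float(a), float(g * a)]
    row += [float(c) for c in covariates]
    row += [float(g * c) for c in covariates]  # genotype x covariate terms
    row += [float(a * c) for c in covariates]  # AOS x covariate terms
    return row


# Example with two hypothetical covariate scores.
example = interaction_design_row(g=2, a=1, covariates=[0.5, -1.0])
```

With k covariates the row has 4 + 3k columns, which shows why the full set of 140 covariate components could not be fitted in the smaller clusters.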

The above interaction models differed from the approach of Hartz et al., who tested rs16969968 effects on smoking phenotypes stratified by AOS (early vs late), using logistic regression (ie, multiplicative scale). To recapitulate the stratified approach with updated methods to improve statistical power, we performed secondary association tests of rs16969968 stratified by early versus late AOS using BOLT-LMM v2.3.2,38 with 339 444 genome-wide SNPs (quality control as described above, but without LD pruning) to control for cryptic relatedness. All covariates described above were included in the BOLT-LMM models, but no interaction terms. Because BOLT-LMM is not recommended for samples of fewer than 5000 (see documentation from ref. 38), we used the GCTA leave-one-chromosome-out (--mlma-loco) approach39 for the non-EUR-ancestry genetic clusters.

We meta-analyzed the results using the inverse variance weighting approach in METAL.40 We report meta-analyzed results below, and all cluster-specific results in Supplementary Material.
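For readers unfamiliar with inverse variance weighting, the following Python sketch shows the standard fixed-effect calculation together with Cochran's Q heterogeneity statistic. It is not METAL itself, and the per-cluster estimates in the example are hypothetical values chosen only to illustrate the arithmetic.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    Returns the pooled estimate, its standard error, the Z statistic,
    a two-sided normal p-value, and Cochran's Q.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p under N(0, 1)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, z, p, q


# Hypothetical per-cluster interaction estimates and standard errors.
beta, se, z, p, q = inverse_variance_meta([0.1, 0.3], [0.1, 0.1])
```

Under the null of homogeneous effects, Q is compared with a chi-square distribution with (number of clusters − 1) degrees of freedom, which is the heterogeneity test reported alongside the meta-analyzed estimates.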

We also performed several power analyses, to determine the power to detect the previously reported effect size,12 as well as to determine the sample size needed to achieve 80% power at specified effect sizes and α. To estimate the power to detect the previously reported effect size in the UK Biobank sample under a multiplicative scale interaction model, we simulated 61 077 diploid genotypes (matching the sample size for heavy/light CPD phenotypes, as described above) and early/late AOS in R, with linear predictors simulated using the previously reported main effect sizes as follows:

$$\mathrm{lp} = 0.33 + \log(1.28)\,g + \log(2.63)\,a + \log(\mathrm{OR}_{\mathrm{AOS} \times \mathrm{rs16969968}})\,a\,g \tag{1}$$

where genotypes, g, were simulated from a binomial distribution with MAF = 0.34, the observed frequency of the A allele in the UK Biobank, and early versus late AOS status, a, was randomly assigned to individuals. We varied the interaction effect size, log(ORAOS × rs16969968), between 0.005 and 0.4, reflecting a range of plausible effect sizes and encompassing the previously reported interaction effect (OR = 1.16). Binary phenotypes, y, were then simulated in R as,

y = rbinom(61077,1,exp(lp)/(1+ exp(lp))) (2)

For each simulated interaction effect size, we performed 1000 replicate simulations, estimating the interaction effect using logistic regression as above, and recorded the number of replicates with an interaction p-value below either nominal significance, α = 0.05, or genome-wide significance, α = 5e-8. We performed similar simulations with the main AOS and rs16969968 effect sizes estimated within the UK Biobank (see below). We varied the sample size from 1e3 to 2e6, the interaction effect size (previously reported ORAOS×rs16969968 = 1.16 vs our meta-analyzed estimate ORAOS×rs16969968 = 1.004), and the significance threshold (nominal α = 0.05 vs genome-wide α = 5e-8).
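The data-generating step of Equations 1 and 2 can be sketched in Python as follows. This covers only the simulation of genotypes, AOS status, and phenotypes, not the logistic regression fit or the power tally. The function name is hypothetical, and the 0.5 probability of early AOS is an assumption of this sketch, since the text states only that AOS status was randomly assigned.

```python
import math
import random


def simulate_heavy_light(n=61077, maf=0.34, or_interaction=1.16,
                         p_early=0.5, seed=1):
    """Simulate (genotype, early_aos, heavy) triples following Eq. 1 and 2.

    p_early is an assumption of this sketch; the source does not state
    the probability used when assigning early versus late AOS.
    """
    rng = random.Random(seed)
    b0 = 0.33
    b_g = math.log(1.28)             # rs16969968 main effect (log OR)
    b_a = math.log(2.63)             # early AOS main effect (log OR)
    b_ga = math.log(or_interaction)  # interaction effect (log OR)
    rows = []
    for _ in range(n):
        # Diploid genotype: two independent draws of the A allele.
        g = int(rng.random() < maf) + int(rng.random() < maf)
        a = 1 if rng.random() < p_early else 0
        lp = b0 + b_g * g + b_a * a + b_ga * a * g
        prob = 1.0 / (1.0 + math.exp(-lp))  # equals exp(lp) / (1 + exp(lp))
        y = 1 if rng.random() < prob else 0
        rows.append((g, a, y))
    return rows


sample = simulate_heavy_light(n=20000)
allele_freq = sum(g for g, _, _ in sample) / (2 * len(sample))
```

Each replicate of a power analysis would fit the interaction model to one such data set and record whether the interaction p-value falls below the chosen threshold.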

Results

We observed significant main effects of the rs16969968 A allele and AOS on CPD (Figure 1A and B; Supplementary Figure S4; Table 1; Supplementary Tables S1 and S2). When estimated as predictors of heavy versus light smoker status, the meta-analyzed estimated genetic effect, ORrs16969968 = 1.12 (p = 4.8e-28), was similar to but lower than the previous estimate12 of 1.28. The effect of early AOS, ORAOS = 1.19 (p = 3.6e-45), was less than previously reported12 (ORAOS = 2.63). However, both main effects were significantly associated with CPD in the expected direction, regardless of the CPD or AOS encoding, and represent strong evidence that both the rs16969968 A allele and early AOS are positively associated with heavier smoking, replicating previous findings. We note that, like previous findings,11,12 rs16969968 is not associated with AOS (Supplementary Figure S3D).

Figure 1.

Main effects of rs16969968 genotype (A) and age of smoking initiation (AOS) (B) on cigarettes per day (CPD or log10[CPD]), and rs16969968 effect on CPD and log10(CPD) as a function of early or late age of initiation (C).

Table 1.

Transethnic meta-analysis estimated main and interaction effects (βᵃ) and SE for rs16969968, age of smoking initiation, and their interaction

| CPD coding | N | AOS coding | AOS β | AOS SE | AOS p | rs16969968_A β | rs16969968_A SE | rs16969968_A p | rs16969968 × AOS β | rs16969968 × AOS SE | rs16969968 × AOS p | Direction | HetChiSq | HetPVal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CPD (raw) | 128 383 | Raw | −0.296 | 1.15e-02 | 9.34e-147 | 1.271 | 2.17e-01 | 4.65e-09 | −0.019 | 1.24e-02 | 1.17e-01 | -++++--+-+- | 5.299 | 8.70e-01 |
| CPD (raw) | 128 383 | Binned | −0.830 | 3.59e-02 | 3.89e-118 | 1.042 | 6.53e-02 | 2.57e-57 | −0.063 | 3.81e-02 | 1.01e-01 | -++++--+-+- | 3.059 | 9.80e-01 |
| CPD (raw) | 128 383 | Early/late | 1.589 | 8.18e-02 | 4.99e-84 | 0.861 | 5.97e-02 | 3.41e-47 | 0.190 | 8.71e-02 | 2.87e-02* | +-+---+-+++ | 4.433 | 9.26e-01 |
| Log10(CPD) | 128 383 | Raw | −0.008 | 2.88e-04 | 6.91e-159 | 0.026 | 5.36e-03 | 1.55e-06 | 0.000 | 3.07e-04 | 7.86e-01 | -+++---+++- | 4.023 | 9.46e-01 |
| Log10(CPD) | 128 383 | Binned | −0.021 | 8.83e-04 | 1.80e-129 | 0.025 | 1.60e-03 | 4.22e-54 | 0.000 | 9.34e-04 | 8.19e-01 | -++-+-++-+- | 3.02 | 9.81e-01 |
| Log10(CPD) | 128 383 | Early/late | 0.040 | 2.01e-03 | 1.95e-86 | 0.023 | 1.46e-03 | 3.40e-56 | 0.003 | 2.13e-03 | 2.12e-01 | +--------++ | 4.396 | 9.28e-01 |
| Binned | 128 383 | Raw | −0.023 | 9.39e-04 | 1.10e-130 | 0.102 | 1.77e-02 | 9.89e-09 | −0.001 | 1.01e-03 | 2.03e-01 | -+++---+++- | 4.992 | 8.92e-01 |
| Binned | 128 383 | Binned | −0.065 | 2.93e-03 | 3.07e-109 | 0.089 | 5.32e-03 | 4.94e-63 | −0.006 | 3.11e-03 | 6.59e-02 | -++++--+-+- | 5.804 | 8.31e-01 |
| Binned | 128 383 | Early/late | 0.126 | 6.67e-03 | 2.55e-79 | 0.074 | 4.87e-03 | 1.26e-52 | 0.015 | 7.10e-03 | 3.79e-02* | +++---+--++ | 4.273 | 9.34e-01 |
| Heavy/light | 61 077 | Raw | −0.018 | 1.41e-03 | 1.15e-38 | 0.116 | 3.94e-02 | 3.16e-03 | 0.000 | 2.16e-03 | 9.00e-01 | -++----+-+- | 7.649 | 6.63e-01 |
| Heavy/light | 61 077 | Binned | −0.077 | 4.92e-03 | 1.23e-54 | 0.146 | 1.27e-02 | 2.78e-30 | −0.001 | 6.80e-03 | 8.30e-01 | -+--+-++-+- | 7.851 | 6.43e-01 |
| Heavy/light | 61 077 | Early/late | 0.171 | 1.21e-02 | 3.65e-45 | 0.116 | 1.05e-02 | 4.82e-28 | 0.004 | 1.62e-02 | 8.22e-01 | +++---+---+ | 4.666 | 9.12e-01 |
| Heavy/light (additive scale) | 61 077 | Raw | −0.013 | 7.19e-04 | 6.59e-76 | 0.094 | 1.43e-02 | 5.02e-11 | −0.002 | 8.15e-04 | 2.87e-02* | -++----+-+- | 8.097 | 6.19e-01 |
| Heavy/light (additive scale) | 61 077 | Binned | −0.044 | 2.23e-03 | 2.40e-88 | 0.070 | 4.24e-03 | 2.40e-61 | −0.005 | 2.45e-03 | 5.70e-02 | -+--+-++-+- | 8.095 | 6.20e-01 |
| Heavy/light (additive scale) | 61 077 | Early/late | 0.090 | 5.17e-03 | 4.34e-67 | 0.058 | 3.88e-03 | 7.84e-50 | 0.012 | 5.69e-03 | 3.41e-02* | +++---+---+ | 4.745 | 9.08e-01 |

Cochran’s Q-test of heterogeneity as estimated in METAL is shown for the interaction effect. Shown are estimates for each encoding of CPD and AOS. AOS = age of smoking initiation; CPD = cigarettes per day.

ᵃβ refers to the regression slope. For CPD coded as heavy/light, exp(β) is the odds ratio when analyzed on the multiplicative scale.

*p < .05.
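As the table note states, exp(β) gives the odds ratio for the heavy/light phenotype on the multiplicative scale. The short Python sketch below applies this to the heavy/light, early/late interaction row of Table 1 (β = 0.004, SE = 1.62e-02) and adds a Wald 95% confidence interval. The function name is hypothetical, and because the table values are rounded, the interval limits may differ slightly from those quoted in the text.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds slope and its SE to an OR with a Wald interval."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


# Interaction term, heavy/light CPD with early/late AOS (Table 1).
or_int, ci_low, ci_high = odds_ratio_with_ci(0.004, 1.62e-02)
```

Here or_int rounds to 1.004, and the interval spans 1, consistent with the nonsignificant interaction p-value in that row.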

Conversely, the direct replication test using the methods by Hartz et al.12 with only rs16969968 and sex as independent variables found no evidence of different allelic effects between early and late AOS smokers (p = .41; Supplementary Tables S6–S8).

The formal interaction test of rs16969968 genotype and AOS was only nominally significant (α = 0.05) and only in some combinations of CPD and AOS encodings (Figure 1C; Supplementary Figure S4; Table 1; Supplementary Tables S1 and S2). Specifically, when treating both CPD and AOS as binary phenotypes the logistic model interaction was not significant (ORrs16969968×AOS = 1.004, p = .82) and the effect was notably lower than the previously reported estimate of 1.16. Interestingly, the interaction effect was nominally significant (p < .05) for the binned CPD phenotype and dichotomized AOS, and when heavy/light CPD was analyzed on the additive scale, but not when the CPD phenotype was either heavy versus light analyzed on the multiplicative scale model or when CPD was log-transformed. Across all tests and all CPD and AOS encodings, no interaction effects reached genome-wide significance (p > .028).

Associations of rs16969968 stratified by AOS also produced mixed results. Ninety-five percent confidence intervals (α = 0.05) of the meta-analyzed effect sizes were nonoverlapping only for binned and binary CPD encodings (Figure 2; Supplementary Table S3). When examining meta-analyzed genetic effects of rs16969968 on heavy versus light CPD, OREarly/ORLate was much lower than previously reported (OREarly/ORLate = 1.016, Supplementary Table S3). Within the largest ancestry cluster (EUR ancestry), 95% confidence intervals of the rs16969968 effects were nonoverlapping in early versus late AOS individuals for all CPD encodings except log10(CPD) (Supplementary Figure S5; Supplementary Tables S3–S5). For all other ancestry clusters, we found no evidence of different rs16969968 effects between early and late AOS smokers using either a genome-wide (α = 5e-8) or nominal (α = 0.05) significance threshold (Supplementary Table S4).
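Besides checking whether confidence intervals overlap, stratified estimates can be compared with a Z-test on their difference, as reported in Figure 2. The Python sketch below shows the usual form of that test, assuming the early and late strata are independent. The function name and the example estimates are hypothetical, since the stratified values themselves appear only in Supplementary Table S3.

```python
import math


def effect_difference_z(beta_early, se_early, beta_late, se_late):
    """Z-test for a difference between two independent effect estimates."""
    z = (beta_early - beta_late) / math.sqrt(se_early ** 2 + se_late ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p under N(0, 1)
    return z, p


# Hypothetical stratified estimates, for illustration only.
z_stat, p_value = effect_difference_z(0.2, 0.03, 0.1, 0.04)
```

Nonoverlapping 95% intervals imply a significant difference at α = 0.05, but the converse does not hold, so the Z-test is the more sensitive of the two comparisons.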

Figure 2.

Trans-ethnic meta-analyzed allelic effect size estimates, β, of the rs16969968 risk allele, A, estimated in stratified association analyses by early and late age of initiation, with confidence intervals indicated using either α = 0.05 (solid line) or genome-wide Bonferroni-corrected α = 5e-8 (dashed line). For the heavy versus light smoker phenotype, allelic effects (β) were transformed to odds ratios using the transformation suggested by BOLT-LMM,38 exp(β/(u(1 − u))), where u = 0.42 is the proportion of cases. Effect size difference Z-test p-values are shown for each comparison. See Supplementary Table S3 for estimates and statistics.
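The transformation in the caption converts a linear mixed model slope for a binary trait into an approximate odds ratio. A minimal Python sketch is given below; the function name and the example β are hypothetical, while u = 0.42 is the case proportion stated in the caption.

```python
import math


def lmm_beta_to_odds_ratio(beta, case_fraction):
    """Approximate OR from a linear-scale slope: exp(beta / (u * (1 - u)))."""
    return math.exp(beta / (case_fraction * (1.0 - case_fraction)))


# Hypothetical linear-scale effect with the case proportion from the caption.
approx_or = lmm_beta_to_odds_ratio(0.0244, 0.42)
```

Dividing by u(1 − u) rescales the slope from the observed 0/1 scale to the log-odds scale, so a β of 0 maps to an odds ratio of exactly 1.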

Our power analyses yielded two main results. First, our sample was well powered (>99%) to detect an interaction effect of the size previously reported at nominal significance (Figure 3, Supplementary Figure S1), though not at genome-wide significance (power ~5%), even with over 61 000 subjects. Second, a sample drastically larger than that analyzed here would be required to detect an interaction effect of ORrs16969968×AOS = 1.004, as estimated within our sample, with 80% power at α = 0.05 (Figure 3). Even applying the upper 95% confidence interval limit of our estimate (ORrs16969968×AOS = 1.035), or the estimate within the largest genetic cluster (1.03), would require a sample of approximately 1.3–2 million participants to achieve 80% power at a genome-wide threshold (α = 5e-8) (Supplementary Figures S6 and S7).

Figure 3.

Power to detect the statistical interaction effect across a range of sample sizes for the interaction effect size estimated previously (odds ratio [OR] = 1.16) and from the current study (OR = 1.004), applying either a nominal α = 0.05 or genome-wide Bonferroni-corrected α = 5e-8.

Discussion

We replicated the substantial main effects of rs16969968 and early age of onset of smoking on CPD, across all phenotypic and analytical scales (Table 1). Estimates were in the same direction and of roughly similar magnitude as those previously reported.12

Conversely, we found limited evidence of an rs16969968 × AOS interaction effect. Notably, our attempt to directly replicate the methods of Hartz et al. failed to identify a significant difference in the rs16969968-A allele effect on heavy smoking between early and late AOS (p = .41; Supplementary Table S6). Formal interaction model results were mixed and depended heavily on measurement scale and phenotype encoding. This is similar to the results from stratified linear mixed model analyses, where the genetic effect in early AOS individuals was 1.016-fold higher than in late AOS individuals, despite greater statistical power of linear mixed models,39 as well as more control of potential confounding variables, such as genetic ancestry and geographic variation throughout the UK. Patterns within the largest genetic ancestry cluster, individuals with primarily EUR ancestry, were similar to the transethnic meta-analyzed results. Although some tests of differences in the stratified associations did reach nominal significance, the results suggest only minute differences in rs16969968 effects between early and late initiating smokers (Supplementary Table S3). Across multiple analytic frameworks and phenotype encodings, the majority of our results were incongruent with an interaction between rs16969968 and AOS.

Magnitude of Effects and Power

No interaction test, and no comparison of stratified estimates, reached genome-wide significance (α = 5e-8) despite the comparatively large sample size of our study. With genome-wide genotyping arrays and imputation commonly applied,41 and as genome-wide interaction associations and heritability studies have become more frequent,42–48 focusing on genome-wide significance thresholds is paramount to avoid false positives, even in situations where there are a priori hypotheses of interaction, as in rs16969968 × AOS. Applying sufficiently stringent significance thresholds in initial studies, whether genome-wide 5e-8 or another specified threshold, is a best practice for replication of association studies,49 and we believe that as GWAS interactions (including G × E and G × G) studies become more frequent, the question of applicable significance thresholds should be revisited.

In all tests related to interaction effects and stratified associations, the estimated interaction effect sizes were much smaller than previously reported.12 Despite substantially greater power than the original study, which had a sample size of ~30 000 (Figure 3, Supplementary Figures S6 and S7), we estimated the effect to be only 1.004 (or 1.016 in the stratified associations). The lack of replication when using the exact same methods suggests that there is no true interaction at this locus. It is important to recognize that both replicated main effects were strong, significant, and in the expected direction, reflecting the strongest single-locus genetic effect on CPD11 and a strong, consistent risk factor of heavy smoking (early age of initiation). This suggests that if the interaction were to exist, its effect would be much less than previously expected. Importantly, with an OR = 1.004, it would be insignificant for possible clinical interventions, such as targeted smoking awareness based on rs16969968 genotype.25

The discrepancy between our results and those reported by Hartz et al.12 could additionally reflect differences between the study populations and models used for analyses. The study by Hartz et al.12 exemplified a tremendous effort to collect the largest available sample size at the time. They were able to do so by meta-analyzing multiple individual studies together, a highly coordinated endeavor that must be recognized and applauded. One possible outcome of this approach is heterogeneity of effect estimates, which they found and noted. Our analysis focused on a single, relatively homogeneous dataset instead of many studies, removing potential heterogeneity that could have influenced the previous results. Our meta-analysis of relatively homogenous ancestry clusters also attempted to minimize any confounding of stratification. Second, although 33% of the data by Hartz et al. were European data sets, consistent cultural differences may exist between American and UK samples, such as general attitudes toward smoking, and any potential impact would be difficult to assess. Such differences between samples could lead to true heterogeneity in the effects50 and the different estimated effects we observed, though Hartz et al. reported no significant difference between OR estimates from American versus EU studies. A possible source of bias, in both the initial and the current study, is that of collider bias.51,52 The UK Biobank is healthier and wealthier51 than the general UK Population, leading to ascertainment bias and the potential for colliders. Genetic effects on education could lead to false negative genetic associations in the UK Biobank with smoking traits when controlling for education,51 but we view this as an unlikely explanation for our failure to replicate results, as both main effects were replicated, and because whether we controlled for education or not, we found little evidence of rs16969968 × AOS interaction. 
Selection bias in general could lead to false-positive or -negative associations. Additional methodological differences include testing a full statistical interaction model with complete covariate × AOS and covariate × genotype terms and using a linear mixed model in our stratified analyses, neither of which were previously employed. Mixed model approaches generally improve power,39 and including the covariate interaction terms should lead to unbiased estimates of the rs16969968 × AOS interaction.37 On the other hand, comparing estimates across different subsamples, as in stratified linear mixed model analysis, introduces an additional potential source of confounding. However, the respective strengths and weaknesses of these methods cannot account for our failure to directly replicate the original finding; our stratified association tests with only sex as a covariate (mirroring the approach of Hartz et al.) failed to identify significant differences in allelic effect sizes between early and late AOS individuals (p = .41; Supplementary Table S6), despite being well powered to do so.

Regardless, with respect to particular phenotype encodings and analyses (eg, stratified analyses of heavy vs light smoker status, with linear mixed models), we did find nominally significant, very small differences in allelic effect size estimates between early- and late-onset smokers. These findings are thus potentially congruent with a small interaction between rs16969968 and AOS. If there is a true rs16969968 × AOS interaction of roughly the magnitude we estimated (OR = 1.004), it would require a drastically larger sample size to detect it (Figure 3). We must therefore conclude that any such interactions specific to an individual locus are likely of very small effect, will be very difficult to identify even with the largest available biobanks, and likely contribute minimally to phenotypic variance.

Conclusions

We found limited support for the rs16969968 × AOS interaction. To the extent that AOS might moderate the effect of rs16969968, we estimate this effect to be far smaller than previously reported. We suggest that even larger sample sizes will be required to identify, with genome-wide significance, interactions at individual loci given the expected magnitude of the interaction effects. On the other hand, our unambiguous replications of the main effects of both rs16969968 and AOS on smoking quantity support epidemiological evidence that individuals who begin regularly smoking at a young age are at a higher risk for nicotine dependence later in life.20,21 This provides further evidence in support of public health interventions for adolescent smoking that could help reduce tobacco use, which would in turn lower the number of tobacco-related deaths and illnesses. Although genotype-dependent intervention would not be effective, targeting adolescents for intervention to reduce tobacco use is still advised.

Supplementary Material

A Contributorship Form detailing each author’s specific involvement with this content, as well as any supplementary data, are available online at https://academic.oup.com/ntr.

Supplementary Figure S1. Power (α = 0.05) to detect the rs16969968×AOS interaction effect, given the main effect estimates previously reported or estimated in the current study using the entire sample (left) or only EUR-ancestry individuals (right).

Supplementary Figure S2. Clustering of non-EUR-ancestry individuals within the UK Biobank. A. Percent variance explained (PVE) as a function of the number of clusters, K, in non-European-ancestry individuals within the UK Biobank. B. Clusters K = 1–10 coded by color and symbol on the first 8 PC axes. C. PCs 1–6 coded by self-reported ethnicity within the UK Biobank (data field 21000).

Supplementary Figure S3. Distributions of A) CPD, B) log10(CPD) outcomes, and C) the age of onset of regular smoking (AOS). The median AOS is indicated by the red line (at 16 years of age). D) Across the entire sample, using either logistic or linear regression and the same variables as Hartz et al., rs16969968 is not associated with AOS, matching GSCAN findings (Liu et al. 2018, Nat. Gen.).

Supplementary Figure S4. Main effects of rs16969968 genotype (A) and AOS (B) on cigarettes per day (CPD or log10(CPD)), and rs16969968 effect on CPD and log10(CPD) as a function of early or late age of initiation (C) within the EUR-ancestry group of individuals only.

Supplementary Figure S5. EUR Allelic effect size estimates, β, of the rs16969968 risk allele, A, estimated in stratified association analyses by early and late age of initiation within the EUR-ancestry set of individuals, with CIs indicated using either α = 0.05 (solid line) or genome-wide Bonferroni-corrected α = 5e-8 (dashed line). For the heavy vs light smoker phenotype, allelic effects (β) were transformed to OR using the BOLT-LMM32-suggested transformation, OR = exp(β/(u(1 − u))), where u = 0.42 is the proportion of cases. Effect size difference Z-test p-values are shown for each comparison. See Supplementary Table S1 for estimates and statistics.

Supplementary Figure S6. Power to detect the statistical interaction effect across a range of sample sizes for the interaction effect size estimated previously (OR = 1.16) and from the current study within EUR-ancestry individuals (OR = 1.03), applying either a nominal α = 0.05 or genome-wide Bonferroni-corrected α = 5e-8.

Supplementary Figure S7. Power to detect the statistical interaction effect across a range of sample sizes for the interaction effect size estimated previously (OR = 1.16) and the upper 95% CI OR estimate from the current study (OR = 1.035), applying either a nominal α = 0.05 or genome-wide Bonferroni-corrected α = 5e-8.
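The two calculations named in the caption of Supplementary Figure S5 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the β and standard error inputs are hypothetical, and only the case proportion u = 0.42 is taken from the caption. The Z-test shown assumes that the early- and late-onset strata are independent samples.

```python
from math import exp, sqrt
from statistics import NormalDist


def lmm_beta_to_or(beta, case_fraction):
    """Convert a linear-mixed-model effect on a 0/1 phenotype to an
    approximate odds ratio: OR = exp(beta / (u * (1 - u)))."""
    return exp(beta / (case_fraction * (1 - case_fraction)))


def effect_difference_z(beta_a, se_a, beta_b, se_b):
    """Z statistic and two-sided p-value for the difference between two
    effect estimates, assuming the two estimates are independent."""
    z = (beta_a - beta_b) / sqrt(se_a ** 2 + se_b ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical stratified estimates (NOT values from the study).
beta_early, se_early = 0.020, 0.004
beta_late, se_late = 0.015, 0.004

u = 0.42  # proportion of cases, from the figure caption
or_early = lmm_beta_to_or(beta_early, u)
or_late = lmm_beta_to_or(beta_late, u)
z_stat, p_value = effect_difference_z(beta_early, se_early, beta_late, se_late)
print(f"OR early = {or_early:.3f}, OR late = {or_late:.3f}, "
      f"Z = {z_stat:.2f}, p = {p_value:.3f}")
```

Because the transformation divides β by u(1 − u), a small linear-scale effect maps to a somewhat larger log odds ratio when cases and controls are roughly balanced, which is why the choice of analysis scale matters for comparisons with previously reported ORs.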

ntaa229_suppl_Supplementary_Figures
ntaa229_suppl_Supplementary_Tables
ntaa229_suppl_Supplementary_Taxonomy_Form

Acknowledgments

Ms. Adjangba was supported by the Summer Multicultural Access to Research Training Program at the University of Colorado. Drs. Evans and Border are supported by National Institute of Mental Health (R01 MH100141-06) (PI: Matthew C. Keller), and Dr. Evans is supported by National Institute on Drug Abuse (R01 DA044283-01A1) (PI: Scott I. Vrieze) and National Institute on Aging (R01 AG046938) (PI: C.A. Reynolds/S.M. Wadsworth).

Declaration of Interests

None declared.

References

  • 1. National Center for Chronic Disease Prevention and Health Promotion (US) Office on Smoking and Health. The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General. Centers for Disease Control and Prevention (US). 2014.
  • 2. Haberstick BC, Ehringer MA, Lessem JM, Hopfer CJ, Hewitt JK. Dizziness and the genetic influences on subjective experiences to initial cigarette use. Addiction. 2011;106(2):391–399.
  • 3. Haberstick BC, Zeiger JS, Corley RP, et al. Common and drug-specific genetic influences on subjective effects to alcohol, tobacco and marijuana use. Addiction. 2011;106(1):215–224.
  • 4. Kaprio J. Genetic epidemiology of smoking behavior and nicotine dependence. COPD. 2009;6(4):304–306.
  • 5. Rose RJ, Broms U, Korhonen T, Dick DM, Kaprio J. Genetics of smoking behavior. In: Kim YK, ed. Handbook of Behavior Genetics. New York, NY: Springer; 2009.
  • 6. Kendler KS, Schmitt E, Aggen SH, Prescott CA. Genetic and environmental influences on alcohol, caffeine, cannabis, and nicotine use from early adolescence to middle adulthood. Arch Gen Psychiatry. 2008;65(6):674–682.
  • 7. Hancock DB, Guo Y, Reginsson GW, et al. Genome-wide association study across European and African American ancestries identifies a SNP in DNMT3B contributing to nicotine dependence. Mol Psychiatry. 2018;23(9):1911–1919.
  • 8. Hancock DB, Reginsson GW, Gaddis NC, et al. Genome-wide meta-analysis reveals common splice site acceptor variant in CHRNA4 associated with nicotine dependence. Transl Psychiatry. 2015;5:e651.
  • 9. Saccone NL, Emery LS, Sofer T, et al. Genome-wide association study of heavy smoking and daily/nondaily smoking in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). Nicotine Tob Res. 2018;20(4):448–457.
  • 10. Wen L, Yang Z, Cui W, Li MD. Crucial roles of the CHRNB3–CHRNA6 gene cluster on chromosome 8 in nicotine dependence: update and subjects for future research. Transl Psychiatry. 2016;6(6):e843.
  • 11. Liu M, Jiang Y, Wedow R, et al.; 23andMe Research Team; HUNT All-In Psychiatry. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat Genet. 2019;51(2):237–244.
  • 12. Hartz SM, Short SE, Saccone NL, et al. Increased genetic vulnerability to smoking at CHRNA5 in early-onset smokers. Arch Gen Psychiatry. 2012;69(8):854–860.
  • 13. Bierut LJ, Stitzel JA, Wang JC, et al. Variants in nicotinic receptors and risk for nicotine dependence. Am J Psychiatry. 2008;165(9):1163–1171.
  • 14. Berrettini W, Yuan X, Tozzi F, et al. Alpha-5/alpha-3 nicotinic receptor subunit alleles increase risk for heavy smoking. Mol Psychiatry. 2008;13(4):368–373.
  • 15. Tobacco Genetics Consortium. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nat Genet. 2010;42(5):441–447.
  • 16. Bailey CD, Tian MK, Kang L, O’Reilly R, Lambe EK. Chrna5 genotype determines the long-lasting effects of developmental in vivo nicotine exposure on prefrontal attention circuitry. Neuropharmacology. 2014;77:145–155.
  • 17. Kuryatov A, Berrettini W, Lindstrom J. Acetylcholine receptor (AChR) α5 subunit variant associated with risk for nicotine dependence and lung cancer reduces (α4β2)₂α5 AChR function. Mol Pharmacol. 2011;79(1):119–125.
  • 18. George AA, Lucero LM, Damaj MI, Lukas RJ, Chen X, Whiteaker P. Function of human α3β4α5 nicotinic acetylcholine receptors is reduced by the α5(D398N) variant. J Biol Chem. 2012;287(30):25151–25162.
  • 19. O’Neill HC, Wageman CR, Sherman SE, Grady SR, Marks MJ, Stitzel JA. The interaction of the Chrna5 D398N variant with developmental nicotine exposure. Genes Brain Behav. 2018;17(7):e12474.
  • 20. Lydon DM, Wilson SJ, Child A, Geier CF. Adolescent brain maturation and smoking: what we know and where we’re headed. Neurosci Biobehav Rev. 2014;45:323–342.
  • 21. Kendler KS, Myers J, Damaj MI, Chen X. Early smoking onset and risk for subsequent nicotine dependence: a monozygotic co-twin control study. Am J Psychiatry. 2013;170(4):408–413.
  • 22. Adrian M, Kiff C, Glazner C, et al. Examining gene–environment interactions in comorbid depressive and disruptive behavior disorders using a Bayesian approach. J Psychiatr Res. 2015;68:125–133.
  • 23. Schneider KK, Hüle L, Schote AB, Meyer J, Frings C. Sex matters! Interactions of sex and polymorphisms of a cholinergic receptor gene (CHRNA5) modulate response speed. Neuroreport. 2015;26(4):186–191.
  • 24. Grucza RA, Johnson EO, Krueger RF, et al. Incorporating age at onset of smoking into genetic models for nicotine dependence: evidence for interaction with multiple genes. Addict Biol. 2010;15(3):346–357.
  • 25. Vandenbergh DJ, Schlomer GL, Cleveland HH, et al. An adolescent substance prevention model blocks the effect of CHRNA5 genotype on smoking during high school. Nicotine Tob Res. 2016;18(2):212–220.
  • 26. Dick DM, Barr PB, Cho SB, et al. Post-GWAS in psychiatric genetics: a developmental perspective on the “other” next steps. Genes Brain Behav. 2018;17(3):e12447.
  • 27. Do EK, Maes HH. Genotype × environment interaction in smoking behaviors: a systematic review. Nicotine Tob Res. 2017;19(4):387–400.
  • 28. Bycroft C, Freeman C, Petkova D, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–209.
  • 29. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38(8):904–909.
  • 30. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7.
  • 31. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88(1):76–82.
  • 32. VanderWeele TJ, Knol MJ. A tutorial on interaction. Epidemiol Methods. 2014;3(1):33–72.
  • 33. Abraham G, Inouye M. Fast principal component analysis of large-scale genome-wide data. PLoS One. 2014;9(4):e93766.
  • 34. Fagerström K. The epidemiology of smoking: health consequences and benefits of cessation. Drugs. 2002;62(Suppl 2):1–9.
  • 35. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2015.
  • 36. Bates D, Maechler M. Matrix: Sparse and Dense Matrix Classes and Methods. R package version 1.2-18; 2019. https://CRAN.R-project.org/package=Matrix.
  • 37. Keller MC. Gene × environment interaction studies have not properly controlled for potential confounders: the problem and the (simple) solution. Biol Psychiatry. 2014;75(1):18–24.
  • 38. Loh PR, Kichaev G, Gazal S, Schoech AP, Price AL. Mixed-model association for biobank-scale datasets. Nat Genet. 2018;50(7):906–908.
  • 39. Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. Advantages and pitfalls in the application of mixed-model association methods. Nat Genet. 2014;46(2):100–106.
  • 40. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–2191.
  • 41. McCarthy S, Das S, Kretzschmar W, et al.; Haplotype Reference Consortium. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016;48(10):1279–1283.
  • 42. Rawlik K, Canela-Xandri O, Tenesa A. Evidence for sex-specific genetic architectures across a spectrum of human complex traits. Genome Biol. 2016;17(1):166.
  • 43. Young AI, Wauthier FL, Donnelly P. Identifying loci affecting trait variability and detecting interactions in genome-wide association studies. Nat Genet. 2018;50(11):1608–1614.
  • 44. Dahl A, Nguyen K, Cai N, Gandal MJ, Flint J, Zaitlen N. A robust method uncovers significant context-specific heritability in diverse complex traits. Am J Hum Genet. 2020;106(1):71–91.
  • 45. Peterson RE, Cai N, Dahl AW, et al. Molecular genetic analysis subdivided by adversity exposure suggests etiologic heterogeneity in major depression. Am J Psychiatry. 2018;175(6):545–554.
  • 46. Arnau-Soler A, Adams MJ, Hayward C, Thomson PA; Generation Scotland; Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium. Genome-wide interaction study of a proxy for stress-sensitivity and its prediction of major depressive disorder. PLoS One. 2018;13(12):e0209160.
  • 47. Nivard MG, Middeldorp CM, Lubke G, et al. Detection of gene–environment interaction in pedigree data using genome-wide genotypes. Eur J Hum Genet. 2016;24(12):1803–1809.
  • 48. Robinson MR, English G, Moser G, et al.; LifeLines Cohort Study. Genotype-covariate interaction effects and the heritability of adult body mass index. Nat Genet. 2017;49(8):1174–1181.
  • 49. NCI-NHGRI Working Group on Replication in Association Studies; Chanock SJ, Manolio T, et al. Replicating genotype-phenotype associations. Nature. 2007;447(7145):655–660.
  • 50. König IR. Validation in genetic association studies. Brief Bioinform. 2011;12(3):253–258.
  • 51. Munafò MR, Tilling K, Taylor AE, Evans DM, Davey Smith G. Collider scope: when selection bias can substantially influence observed associations. Int J Epidemiol. 2018;47(1):226–235.
  • 52. Pingault JB, O’Reilly PF, Schoeler T, Ploubidis GB, Rijsdijk F, Dudbridge F. Using genetic data to strengthen causal inference in observational research. Nat Rev Genet. 2018;19(9):566–580.

Articles from Nicotine & Tobacco Research are provided here courtesy of Oxford University Press