Author manuscript; available in PMC: 2015 Oct 2.
Published in final edited form as: J Am Stat Assoc. 2014 Oct 2;109(507):905–930. doi: 10.1080/01621459.2014.901223

Identifying Genetic Variants for Addiction via Propensity Score Adjusted Generalized Kendall’s Tau

Yuan Jiang 1, Ni Li 2, Heping Zhang *
PMCID: PMC4219655  NIHMSID: NIHMS571814  PMID: 25382885

Abstract

Identifying replicable genetic variants for addiction has been extremely challenging. Besides the common difficulties with genome-wide association studies (GWAS), environmental factors are known to be critical to addiction, and comorbidity is widely observed. Despite the importance of environmental factors and comorbidity for addiction study, few GWAS analyses adequately considered them due to the limitations of the existing statistical methods. Although parametric methods have been developed to adjust for covariates in association analysis, difficulties arise when the traits are multivariate because there is no ready-to-use model for them. Recent nonparametric development includes U-statistics to measure the phenotype-genotype association weighted by a similarity score of covariates. However, it is not clear how to optimize the similarity score. Therefore, we propose a semiparametric method to measure the association adjusted by covariates. In our approach, the nonparametric U-statistic is adjusted by parametric estimates of propensity scores using the idea of inverse probability weighting. The new measurement is shown to be asymptotically unbiased under our null hypothesis while the previous non-weighted and weighted ones are not. Simulation results show that our test improves power as opposed to the non-weighted and two other weighted U-statistic methods, and it is particularly powerful for detecting gene-environment interactions. Finally, we apply our proposed test to the Study of Addiction: Genetics and Environment (SAGE) to identify genetic variants for addiction. Novel genetic variants are found from our analysis, which warrant further investigation in the future.

Keywords: Addiction, Comorbidity, Genome-wide association study, Inverse probability weighting, Substance dependence

1 INTRODUCTION

Identifying genetic risk variants for addiction (substance dependence) has drawn much attention due to the popularity of genome-wide association studies (GWAS) based on high throughput data. Many genetic signals for addiction have been discovered using GWAS in recent years. Studies focusing on nicotine dependence include Bierut et al. (2007), Uhl et al. (2007), Luo et al. (2008), Drgon et al. (2009), Rice et al. (2012), and Wang et al. (2012), among others. Similarly, there are many important discoveries for alcohol dependence, including but not limited to, Reich et al. (1998), Treutlein et al. (2009), Edenberg et al. (2010), Bierut et al. (2010), Johnson et al. (2006), Kendler et al. (2011), Heath et al. (2011), Wang et al. (2011), and Frank et al. (2012).

Despite these important findings, identifying genetic variants for addiction remains very challenging, especially in light of the following two issues. First, comorbidity of addiction is widely observed in the existing literature (National Institute on Drug Abuse, 2010). For example, Zuo et al. (2012a) and Zuo et al. (2012b) studied the risk gene regions in alcohol and nicotine co-dependence. Substance dependence can also be comorbid with other diseases such as depression (Edwards et al., 2012). Second, environmental factors (covariates) are known to play an important role in the association analysis between genetic risk factors and addiction. Examples include stress and history of violence. These factors can potentially produce confounding effects, or they can interact with genotypes, which is known as gene-environment interaction.

In this work, we aim to analyze the data from the Study of Addiction: Genetics and Environment (SAGE), which is part of the Gene Environment Association Studies initiative (GENEVA) funded by the National Human Genome Research Institute. In the SAGE data, addiction to six different substances was measured simultaneously for the subjects: alcohol, nicotine, marijuana, cocaine, opiates, and other drugs. A preliminary analysis shows that different addictions are dependent. In the data, about 45% of subjects are addicted to nicotine and 47% are addicted to alcohol. The nicotine and alcohol co-dependence rate is 32%, much higher than the roughly 21% (0.45 × 0.47) expected if these two traits were statistically independent. Moreover, information about important environmental factors was also collected. Environmental factors such as history of sexual abuse or violence and socioeconomic status have a non-negligible effect on substance dependence. To analyze the SAGE data, it remains an open question how to properly adjust for these important covariates with such a complicated constitution of phenotypes. This motivates us to develop a new statistical method to fill this gap.

Traditionally, covariates were usually adjusted in GWAS by being added into a parametric association model such as a binary or an ordinal logistic regression model (Wang et al., 2006). However, there are two major drawbacks when using a parametric model-based approach for analysis of comorbidity of multiple traits. First, it is challenging to build a parametric model for multiple traits especially with different scales. Second, it is not clear how to remove the confounding effects through the model. Therefore, nonparametric tools were recently proposed. To handle comorbidity, Zhang et al. (2010) proposed a nonparametric U-statistic to measure association, called the “generalized Kendall’s tau”, which can take any hybrid of dichotomous, ordinal and quantitative traits. The generalized Kendall’s tau is applicable to both population-based and family-based designs. It is also noteworthy that the family-based association tests (FBAT) (Laird et al., 2000; Rabinowitz and Laird, 2000) are a special case of the generalized Kendall’s tau. To further adjust for environmental factors in a nonparametric setting, Zhu et al. (2012) and Jiang and Zhang (2011) proposed weighted versions of generalized Kendall’s tau. For the weight function, Zhu et al. (2012) used covariates themselves while Jiang and Zhang (2011) used propensity scores (Rosenbaum and Rubin, 1983). The weighted nonparametric tests have shown their power for detecting genetic effects after considering environmental effects.

The weighted tests are proven useful but still face difficulties. For instance, researchers are often required to select the tuning parameters in the weight function (Jiang and Zhang, 2011; Zhu et al., 2012). Although suggestions were made, this extra step makes the tests less accessible. In this work, we propose an alternative that is more natural and convenient. Instead of directly weighting the generalized Kendall’s tau, we employ the idea of “inverse probability weighting” from the applications of propensity scores (Rosenbaum, 1987; Robins et al., 2000; Lunceford and Davidian, 2004). First, we use a parametric model to estimate the genomic propensity scores (Zhao et al., 2009) which summarize all covariates. Then, we apply the inverse probability weighting using the parametric propensity score estimates to the genotype kernel of the nonparametric U-statistic. These procedures result in our proposed semiparametric measurement of association adjusted by covariates.

In an observational study, the inverse probability weighting method aims to construct an unbiased estimator of treatment effect. Similarly, we show that our U-statistic is an asymptotically unbiased estimator of the phenotype-genotype association under the null hypothesis, while the non-weighted and other weighted U-statistics are not necessarily asymptotically unbiased. Moreover, the inverse probability weighted U-statistic is free of tuning parameters. Another contribution of this work is to provide the null distribution of our test statistic incorporating the estimation step of propensity scores. Interestingly, we find that if the propensity scores are estimated consistently (indeed, $\sqrt{n}$-consistently), the U-statistic has an even smaller variance than the one with true propensity scores. This confirms a surprising but known fact that “it is better to use the ‘estimated propensity score’ than the true propensity score even when the true score is known” (Robins et al., 1992). Nonetheless, to the best of our knowledge, this is the first rigorous formalization of this idea either from a U-statistic viewpoint or in the framework of genome-wide association tests.

To evaluate the performance of our proposed test, we perform simulation studies to compare with the generalized Kendall’s tau and its weighted versions in terms of type I error and power. The simulation results show that our test possesses a higher power in most situations we examined and is particularly powerful for detecting gene-environment interactions.

Finally, we apply our proposed test to the SAGE data, together with non-weighted and other weighted tests, for comorbidity of multiple addictions. We also compare the comorbidity based analyses with the analysis from a single addiction at a time. Interestingly, besides a few overlapped markers, novel regions have been detected using multiple phenotypes, and different approaches may be more powerful under different settings; for example, a comorbidity genetic analysis is more powerful only for shared genes. Among the tests for multiple addictions, we clearly see the advantage of adjusting for important covariates in our analysis. Without any adjustment, no SNP was identified to be genome-wide significant. With adjustment, different adjusted tests work complementarily to each other. Our proposed test, in particular, reveals SNPs/genes that are not discovered by other tests. For example, the SNP rs251133 (on chromosome 5) achieves the genome-wide significance only using our proposed test. The new findings from our analyses warrant further investigation with either a replication study or a biological verification.

2 SEMIPARAMETRIC ASSOCIATION TEST

2.1 Non-weighted and Weighted Association Measurements

Suppose we observe a vector of traits $Y_i = \{Y_i^{(1)}, \ldots, Y_i^{(p)}\}'$, a test-locus genotype $G_i$, and a vector of covariates $Z_i = \{Z_i^{(1)}, \ldots, Z_i^{(q)}\}'$ for the $i$th subject among the $n$ study subjects from a population association study. Our data are independent samples $\{(Y_i, G_i, Z_i) : i = 1, \ldots, n\}$. In the following, we denote Y = {Y1, … , Yn} and Z = {Z1, … , Zn} for all the traits and covariates, respectively. We present here a few nonparametric association statistics to measure the association between the multiple traits and the genetic marker.

The first statistic was proposed by Zhang et al. (2010). For individuals i and j, let Yi and Yj be their vectors of traits respectively. Then, a trait kernel is defined as

$$\phi_t(Y_i, Y_j) = \left[f_1\{Y_i^{(1)} - Y_j^{(1)}\}, \ldots, f_p\{Y_i^{(p)} - Y_j^{(p)}\}\right]',$$

where function fk(·) (k = 1, … , p) can be chosen as the identity function for a quantitative or binary trait (Rabinowitz, 1997), or the sign function for an ordinal trait (Zhang et al., 2006). Traditionally, a genotype kernel is chosen as

$$\phi_g(G_i, G_j) = G_i - G_j.$$

Based on these two kernels, Zhang et al. (2010) proposed a nonparametric U-statistic to measure the association between the phenotype and genotype as

$$U = \binom{n}{2}^{-1} \sum_{i < j} \phi_t(Y_i, Y_j)\, \phi_g(G_i, G_j), \tag{1}$$

which is a generalization of Kendall’s tau (Kendall, 1938). This U-statistic was used there to test the null hypothesis that there is no phenotype-genotype association.
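As an illustration, the statistic in (1) can be computed directly from its definition. The following is a minimal sketch, not the authors' implementation: the function name `generalized_kendall_tau`, the `ordinal` flag argument, and the toy data are assumptions introduced here, and the kernels are taken to be trait differences (identity or sign) and genotype differences.

```python
import numpy as np

def generalized_kendall_tau(Y, G, ordinal=None):
    """U in (1): average over pairs i < j of phi_t(Y_i, Y_j) * (G_i - G_j).

    Y: (n, p) traits; G: (n,) genotypes coded 0/1/2.
    ordinal: optional length-p boolean flags; flagged traits use the sign
    function for f_k, all other traits use the identity.
    """
    Y = np.asarray(Y, dtype=float)
    G = np.asarray(G, dtype=float)
    n = len(G)
    phi_t = Y[:, None, :] - Y[None, :, :]        # (n, n, p) pairwise trait differences
    if ordinal is not None:
        flags = np.asarray(ordinal, dtype=bool)
        phi_t[:, :, flags] = np.sign(phi_t[:, :, flags])
    phi_g = G[:, None] - G[None, :]              # (n, n) genotype kernel
    i, j = np.triu_indices(n, k=1)               # all index pairs with i < j
    total = (phi_t[i, j] * phi_g[i, j][:, None]).sum(axis=0)
    return total / (n * (n - 1) / 2)

# Toy data (hypothetical): three subjects, one binary trait.
U_toy = generalized_kendall_tau([[0], [1], [1]], [0, 1, 2])
```

The result is a length-p vector, one component per trait, which is why the later variance formulas are p × p matrices.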

For the purpose of adjusting for the covariates, Zhu et al. (2012) introduced another statistic, which is a weighted version of U in (1). Let w(Zi, Zj) be a weight function measuring the similarity between Zi and Zj. For instance, the most intuitive weight function w(Zi, Zj) can be defined as a function of the distance or similarity of the two covariate vectors Zi and Zj. Afterwards, they defined the weighted U-statistic as

$$U_{W,1} = \binom{n}{2}^{-1} \sum_{i < j} \phi_t(Y_i, Y_j)\, \phi_g(G_i, G_j)\, w(Z_i, Z_j). \tag{2}$$

This weighted U-statistic is used to measure the covariate-adjusted association between the multiple traits and the genetic marker.

Considering the fact that there exist potentially continuous (such as age) and categorical (such as gender) covariates, their distance or similarity can become arbitrary and complicated especially when we have many covariates. Therefore, Jiang and Zhang (2011) proposed to summarize all the covariates, continuous or categorical, into the propensity score (Rosenbaum and Rubin, 1983; Zhao et al., 2009). Its definition is the likelihood of an individual having a particular test-locus genotype based on that individual’s covariate makeup, which can be explicitly stated as

$$p(z_i) = \{P(G_i = g \mid Z_i = z_i) : g \in \mathcal{G}\}',$$

with 𝒢 being the set of possible values for the genotype G; while in our context, 𝒢 = {0, 1, 2} representing {aa, Aa, AA} for a SNP marker with two alleles A and a. Then the weighted U-statistic in (2) becomes

$$U_{W,2} = \binom{n}{2}^{-1} \sum_{i < j} \phi_t(Y_i, Y_j)\, \phi_g(G_i, G_j)\, w\{p(Z_i), p(Z_j)\}. \tag{3}$$

These weighted U-statistics (2) and (3) were proposed to adjust the association taking into account the covariate effects. They have been proven useful in both theory and application especially when the covariates have direct or indirect effects on the traits (Jiang and Zhang, 2011; Zhu et al., 2012).
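A propensity-score weighted statistic of the form (3) can be sketched as follows. This is only an illustration under stated assumptions: the function name `weighted_kendall_tau` is hypothetical, the trait kernel uses the identity function, the propensity vectors are supplied as an (n, 3) array `P`, and the weight is the Gaussian-type choice w(a, b) = exp(−‖a − b‖²/2) that Section 2.6 attributes to Jiang and Zhang (2011).

```python
import numpy as np

def weighted_kendall_tau(Y, G, P):
    """Weighted U-statistic with weights computed from propensity vectors.

    Y: (n, p) traits; G: (n,) genotypes; P: (n, 3) rows of
    P(G_i = g | Z_i), g = 0, 1, 2 (assumed known or already estimated).
    """
    Y = np.asarray(Y, dtype=float)
    G = np.asarray(G, dtype=float)
    P = np.asarray(P, dtype=float)
    n = len(G)
    total = np.zeros(Y.shape[1])
    for i in range(n):
        for j in range(i + 1, n):
            # Similar propensity vectors receive weights close to one.
            w = np.exp(-np.sum((P[i] - P[j]) ** 2) / 2.0)
            total += (Y[i] - Y[j]) * (G[i] - G[j]) * w
    return total / (n * (n - 1) / 2)
```

When all subjects share the same propensity vector, every weight equals one and the statistic reduces to the non-weighted U in (1).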

2.2 Inverse Probability Weighting

In the case without covariates, a natural choice of measurement of genotype-phenotype association is given by U in (1). One property of U is its unbiasedness under the null hypothesis. That is, E(U | Y) = 0 when there is no association between the genotype and phenotype (Zhang et al., 2010). It is noteworthy that conditioning on the traits is necessary to eliminate the need for assumptions about the phenotypic distribution (Laird et al., 2000).

When the covariate information is available, however, in order to remove the confounding effects of the covariates, one needs to test the conditional independence between the genotype and phenotype conditional on the covariates (Zhu et al., 2012). That is, $\mathcal{H}_0 : Y_i \perp G_i \mid Z_i$, $i = 1, \ldots, n$. Under the new null hypothesis $\mathcal{H}_0$, however, the U-statistic U in (1) is not necessarily an unbiased measure. The reason is that, under $\mathcal{H}_0$,

$$E(U \mid Y) = \binom{n}{2}^{-1} \sum_{i < j} \phi_t(Y_i, Y_j) \{E(G_i \mid Y_i) - E(G_j \mid Y_j)\},$$

which is a similar association measurement to U in (1) with the genotype Gi replaced by its conditional mean E(Gi | Yi). This implies that E(U | Y) would have a non-degenerate distribution (when Yi’s are regarded as random) unless all E(Gi | Yi)’s are equal. Therefore, E(U | Y) cannot always be zero. The same conclusion holds for the weighted U-statistics UW,1 and UW,2 in (2) and (3). They are also not necessarily unbiased under the null hypothesis ℋ0.

Therefore, we need to revise the above-mentioned U-statistics to ensure the theoretical unbiasedness. Borrowing the idea of the inverse probability weighting method for propensity scores (Rosenbaum, 1987; Robins et al., 2000; Lunceford and Davidian, 2004), we revise the genotype kernel from ϕg(Gi, Gj) = GiGj to

$$\phi_g(G_i, G_j; Z_i, Z_j) = \frac{G_i}{e(Z_i)} - \frac{G_j}{e(Z_j)},$$

where e(zi) = E(Gi | Zi = zi) is the conditional expectation of Gi given Zi = zi. In general, e(zi) can be directly obtained from the propensity score as

$$e(z_i) = \sum_{g \in \mathcal{G}} g\, P(G_i = g \mid Z_i = z_i).$$

Then we propose the propensity score-inverse probability weighted U-statistic as

$$U_{IPW} = \binom{n}{2}^{-1} \sum_{i < j} \phi_t(Y_i, Y_j)\, \phi_g(G_i, G_j; Z_i, Z_j). \tag{4}$$

From (4), we see that

$$E(U_{IPW} \mid Y) = \binom{n}{2}^{-1} \sum_{i < j} \phi_t(Y_i, Y_j)\, E\left[E\{\phi_g(G_i, G_j; Z_i, Z_j) \mid Z_i, Z_j\} \mid Y\right] = 0,$$

as E{ϕg(Gi, Gj; Zi, Zj) | Zi, Zj} = 0 under ℋ0. This shows that UIPW is an unbiased estimator of the conditional association between the genotype and phenotype under ℋ0, provided that the true values of propensity scores are known.
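The unbiasedness argument can be checked exactly on a tiny example by enumerating every genotype configuration. The sketch below is illustrative only: the helper names `u_stat` and `conditional_mean`, the three-subject data, and the propensity rows in `P` are all hypothetical, the identity trait kernel is assumed, and the expectation is taken conditional on both traits and covariates with genotypes drawn independently from their propensity scores (which is what the conditional independence null implies).

```python
import itertools
import numpy as np

def u_stat(Y, G, e=None):
    """Non-weighted U if e is None, otherwise the inverse probability weighted version."""
    Y = np.asarray(Y, dtype=float)
    G = np.asarray(G, dtype=float)
    a = G if e is None else G / np.asarray(e, dtype=float)
    n = len(a)
    total = np.zeros(Y.shape[1])
    for i in range(n):
        for j in range(i + 1, n):
            total += (Y[i] - Y[j]) * (a[i] - a[j])
    return total / (n * (n - 1) / 2)

def conditional_mean(Y, P, ipw):
    """Exact mean given traits and covariates; P[i, g] = P(G_i = g | Z_i)."""
    P = np.asarray(P, dtype=float)
    e = P @ np.array([0.0, 1.0, 2.0])            # e(z_i) = sum_g g * P(G_i = g | z_i)
    mean = np.zeros(np.asarray(Y).shape[1])
    for G in itertools.product(range(3), repeat=len(P)):
        prob = np.prod([P[i, g] for i, g in enumerate(G)])
        mean += prob * u_stat(Y, G, e if ipw else None)
    return mean

Y = np.array([[0.0], [1.0], [1.0]])
P = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
mean_u = conditional_mean(Y, P, ipw=False)       # non-zero: covariates shift E(G_i)
mean_ipw = conditional_mean(Y, P, ipw=True)      # zero up to rounding
```

In this toy configuration the subjects with trait value 1 also have larger expected genotypes, so the non-weighted statistic has a non-zero mean even though genotype and trait are conditionally independent, whereas dividing each genotype by its conditional mean removes the bias.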

2.3 Asymptotic Distribution with True Propensity Scores

As illustrated by Zhu et al. (2012), the asymptotic distribution of $U_{IPW}$ may be derived conditioning on both traits Y = y and covariates Z = z. Write $u_i = \frac{1}{n} \sum_{j=1}^{n} \phi_t(Y_i, Y_j)$, then

$$U_{IPW} = \frac{2}{n - 1} \sum_{i=1}^{n} u_i \frac{G_i}{e(Z_i)}.$$
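The step from the pairwise sum to this linear form is worth spelling out. It relies on the antisymmetry of the trait kernel, $\phi_t(Y_i, Y_j) = -\phi_t(Y_j, Y_i)$, which holds for both the identity and the sign function, and on $\phi_t(Y_i, Y_i) = 0$. Writing $a_i = G_i / e(Z_i)$ as shorthand introduced here for illustration:

$$
\begin{aligned}
\sum_{i < j} \phi_t(Y_i, Y_j)(a_i - a_j)
&= \sum_{i < j} \phi_t(Y_i, Y_j)\, a_i + \sum_{i < j} \phi_t(Y_j, Y_i)\, a_j \\
&= \sum_{i=1}^{n} a_i \sum_{j \ne i} \phi_t(Y_i, Y_j)
 = n \sum_{i=1}^{n} u_i\, a_i ,
\end{aligned}
$$

and multiplying by $\binom{n}{2}^{-1} = 2 / \{n(n-1)\}$ gives the factor $2/(n-1)$. The same antisymmetry implies $\sum_{i=1}^{n} u_i = 0$, so the statistic has mean zero whenever $E\{G_i / e(Z_i) \mid Y, Z\} = 1$ for every subject.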

Conditioning on both traits and covariates, the mean of UIPW is still zero under ℋ0. The asymptotic distribution of UIPW can be derived by applying the central limit theorem. Theorem 1 reveals that UIPW has an asymptotic normal distribution after normalization by its variance.

Theorem 1. Let $v(z_i) = \mathrm{var}(G_i \mid Z_i = z_i)$. Assume $\inf_{n,i} |e(z_i)| > 0$ and $\inf_{n,i} |v(z_i)| > 0$. Suppose $\max_{1 \le i \le n} \|u_i\|^2 = o\{\lambda_{\min}(\sum_{i=1}^{n} u_i u_i')\}$, where $\lambda_{\min}$ represents the minimum eigenvalue. Then, under the null hypothesis $\mathcal{H}_0$,

$$\sqrt{n}\, \Sigma^{-1/2} U_{IPW} \to N(0, I_p)$$

in distribution, conditioning on all the traits and covariates, where

$$\Sigma = \frac{4}{n} \sum_{i=1}^{n} u_i u_i' \frac{v(z_i)}{e^2(z_i)}.$$

$U_{IPW}$ is a linear combination of the independent genotypes $G_1, \ldots, G_n$. This observation inspires the application of Corollary 1.3 in Shao (2003) to prove Theorem 1. The conditions $\inf_{n,i} |e(z_i)| > 0$ and $\inf_{n,i} |v(z_i)| > 0$ are assumed to ensure the positive definiteness of the covariance matrix $\Sigma$. Moreover, the condition $\max_{1 \le i \le n} \|u_i\|^2 = o\{\lambda_{\min}(\sum_{i=1}^{n} u_i u_i')\}$ is used to control the contribution of each term in the linear combination so that no single term dominates the others (see the regularity condition in Corollary 1.3 in Shao (2003)).
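With known propensity scores, the quantities in Theorem 1 are simple matrix expressions. The following sketch assumes the identity trait kernel and takes the true propensity scores as an (n, 3) array; the function name `ipw_test_known_scores` and the quadratic-form statistic it returns are illustrative choices made here, not something the paper defines at this point.

```python
import numpy as np

def ipw_test_known_scores(Y, G, P):
    """U_IPW, its covariance Sigma, and n * U' Sigma^{-1} U for known propensities.

    Y: (n, p) traits; G: (n,) genotypes coded 0/1/2;
    P: (n, 3) true P(G_i = g | Z_i), g = 0, 1, 2.
    """
    Y = np.asarray(Y, dtype=float)
    G = np.asarray(G, dtype=float)
    P = np.asarray(P, dtype=float)
    n = len(G)
    g = np.array([0.0, 1.0, 2.0])
    e = P @ g                                    # conditional mean of G_i
    v = P @ g**2 - e**2                          # conditional variance of G_i
    # u_i = (1/n) * sum_j (Y_i - Y_j), one row per subject
    u = (Y[:, None, :] - Y[None, :, :]).mean(axis=1)
    U = 2.0 / (n - 1) * (u * (G / e)[:, None]).sum(axis=0)
    Sigma = 4.0 / n * (u.T * (v / e**2)) @ u     # (4/n) * sum_i u_i u_i' v_i / e_i^2
    stat = n * U @ np.linalg.solve(Sigma, U)
    return U, Sigma, stat
```

Under the null hypothesis and the conditions of Theorem 1, the returned quadratic form would be compared with a chi-square distribution with p degrees of freedom.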

2.4 Test Statistic with Estimated Propensity Scores

In Section 2.3, $U_{IPW}$ involves the true values of the propensity score $p(z_i)$ and the mean $e(z_i)$. In practice, however, the propensity scores are always estimated from the samples, i.e., by $\hat{p}(Z_i)$, and so is the mean $e(z_i)$ in the statistic $U_{IPW}$, estimated by $\hat{e}(Z_i)$. In this case, the test statistic becomes

$$\hat{U}_{IPW} = \frac{2}{n - 1} \sum_{i=1}^{n} u_i \frac{G_i}{\hat{e}(Z_i)}.$$

Therefore, we aim to find the asymptotic distribution of the test statistic $\hat{U}_{IPW}$ in this subsection. This distribution will serve as the reference distribution for our association test.

We assume a parametric model indexed by parameters $\theta \in \mathbb{R}^d$ to estimate the propensity scores. Therefore, we call $\hat{U}_{IPW}$ a semiparametric measurement given both its parametric and nonparametric components. To estimate $p(z_i)$ and further $e(z_i)$, we make use of the maximum likelihood estimator or the root of the likelihood equations $\hat{\theta}$ from this model. It is noteworthy that we do not limit ourselves to any specific form of models. Instead, we build the theory upon the following general parametric form,

$$P(G_i = g \mid Z_i = z_i) = p_g(z_i; \theta), \quad g = 0, 1, 2; \; i = 1, \ldots, n, \tag{5}$$

with $\sum_{g=0}^{2} p_g(z_i; \theta) = 1$. For clarity, $\theta_0$ is used for the true values of $\theta$. Thus, $e_{\theta_0}(z_i)$ and $v_{\theta_0}(z_i)$ denote the true values of $e(z_i)$ and $v(z_i)$, respectively.

With model (5), we observe that $\hat{U}_{IPW} = U_{IPW}(\hat{\theta})$ is a statistic with estimated parameters $\hat{\theta}$. To derive the asymptotic distribution of $\hat{U}_{IPW}$, we follow the approach suggested by Pierce (1982) and Randles (1982). The idea is to derive the asymptotic joint distribution of $\{U_{IPW}(\theta_0), \hat{\theta}\}$ and then to approximate the distribution of $\hat{U}_{IPW}$ using the mean value theorem.

Before presenting the main theoretical result, we need to introduce some necessary notation. With i = 1, … , n, the log-likelihood function log ℓi(θ) of model (5) is

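$$\log \ell_i(\theta) = \sum_{g=0}^{2} I(G_i = g) \log p_g(z_i; \theta).$$

We assume the score function $\psi_\theta(G_i, z_i)$ and information matrix $I_\theta(z_i)$ are well defined as

$$\psi_\theta(G_i, z_i) = \frac{\partial}{\partial \theta} \log \ell_i(\theta) = \sum_{g=0}^{2} I(G_i = g)\, p_g^{-1}(z_i; \theta) \frac{\partial}{\partial \theta} p_g(z_i; \theta), \tag{6}$$

$$I_\theta(z_i) = E\{\psi_\theta(G_i, z_i)\, \psi_\theta(G_i, z_i)'\} = \sum_{g=0}^{2} p_g^{-1}(z_i; \theta) \frac{\partial}{\partial \theta} p_g(z_i; \theta) \frac{\partial}{\partial \theta'} p_g(z_i; \theta). \tag{7}$$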

In addition, define the following matrices,

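$$\Sigma_{\theta_0} = \frac{4}{n} \sum_{i=1}^{n} u_i u_i' \frac{v_{\theta_0}(z_i)}{e_{\theta_0}^2(z_i)}, \qquad \Gamma_{\theta_0} = \frac{2}{n} \sum_{i=1}^{n} u_i \frac{\sum_{g=0}^{2} \left\{ g \frac{\partial}{\partial \theta'} p_g(z_i; \theta_0) \right\}}{e_{\theta_0}(z_i)}, \tag{8}$$

and vectors (for i = 1, … , n),

$$\gamma_{i1} = \left\{ \frac{u_i'}{e_{\theta_0}(z_i)},\; p_1^{-1}(z_i; \theta_0) \frac{\partial}{\partial \theta'} p_1(z_i; \theta_0) - p_0^{-1}(z_i; \theta_0) \frac{\partial}{\partial \theta'} p_0(z_i; \theta_0) \right\}',$$

$$\gamma_{i2} = \left\{ \frac{2 u_i'}{e_{\theta_0}(z_i)},\; p_2^{-1}(z_i; \theta_0) \frac{\partial}{\partial \theta'} p_2(z_i; \theta_0) - p_0^{-1}(z_i; \theta_0) \frac{\partial}{\partial \theta'} p_0(z_i; \theta_0) \right\}'.$$

Theorem 2 presents the asymptotic distribution of the test statistic $\hat{U}_{IPW}$, with the detailed derivation provided in the Appendix.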

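Theorem 2. Let the parameter space $\Theta$ be an open set. Suppose that there exist some $\delta > 0$ and $c_{\theta_0} > 0$ such that $p_g(z_i; \theta) \in [\delta, 1 - \delta]$ for all $\theta$ satisfying $\|\theta - \theta_0\| \le c_{\theta_0}$ with $g = 0, 1, 2$ and $i = 1, \ldots, n$; $\ell_i(\theta)$ is twice continuously differentiable; for each $g = 0, 1, 2$,

$$\max_{1 \le i \le n} \sup_{\|\theta - \theta_0\| \le c_{\theta_0}} \left\| \frac{\partial}{\partial \theta} p_g(z_i; \theta) \right\| = O(1), \qquad \max_{1 \le i \le n} \sup_{\|\theta - \theta_0\| \le c_{\theta_0}} \left\| \frac{\partial^2}{\partial \theta\, \partial \theta'} p_g(z_i; \theta) \right\| = O(1), \tag{9}$$

and there exist constants $C_{\theta_0} > 0$ and $\alpha > 0$ such that for all $\theta$ satisfying $\|\theta - \theta_0\| \le c_{\theta_0}$,

$$\frac{1}{n} \sum_{i=1}^{n} \left\| \frac{\partial^2}{\partial \theta\, \partial \theta'} p_g(z_i; \theta) - \frac{\partial^2}{\partial \theta\, \partial \theta'} p_g(z_i; \theta_0) \right\| \le C_{\theta_0} \|\theta - \theta_0\|^{\alpha}, \tag{10}$$

where $\|A\| = \{\mathrm{tr}(A'A)\}^{1/2}$ is the Frobenius norm for any matrix $A$; there exists a positive definite matrix $I_{\theta_0}$ such that $\frac{1}{n} \sum_{i=1}^{n} I_{\theta_0}(z_i) \to I_{\theta_0}$; $\lambda_{\max}(\sum_{i=1}^{n} u_i u_i') = O(n)$ and $\max_{1 \le i \le n} \|u_i\|^2 = o(n)$; furthermore, $\max_{1 \le i \le n} \lambda_{\max}(\gamma_{i1} \gamma_{i1}' + \gamma_{i2} \gamma_{i2}') = o[\lambda_{\min}\{\sum_{i=1}^{n} (\gamma_{i1} \gamma_{i1}' + \gamma_{i2} \gamma_{i2}')\}]$ and $\lambda_{\min}\{\sum_{i=1}^{n} (\gamma_{i1} \gamma_{i1}' + \gamma_{i2} \gamma_{i2}')\} \ge n^{\epsilon}$ for some $\epsilon > 0$, where $\lambda_{\max}$ represents the maximum eigenvalue. Let $\Lambda_{\theta_0} = \Sigma_{\theta_0} - \Gamma_{\theta_0} I_{\theta_0}^{-1} \Gamma_{\theta_0}'$. Then, under the null hypothesis $\mathcal{H}_0$,

$$\sqrt{n}\, \Lambda_{\theta_0}^{-1/2} \hat{U}_{IPW} \to N(0, I_p),$$

in distribution, conditioning on all the traits and covariates.

The condition $\max_{1 \le i \le n} \lambda_{\max}(\gamma_{i1} \gamma_{i1}' + \gamma_{i2} \gamma_{i2}') = o[\lambda_{\min}\{\sum_{i=1}^{n} (\gamma_{i1} \gamma_{i1}' + \gamma_{i2} \gamma_{i2}')\}]$ in Theorem 2 has the same role as the condition $\max_{1 \le i \le n} \|u_i\|^2 = o\{\lambda_{\min}(\sum_{i=1}^{n} u_i u_i')\}$ in Theorem 1. It is a typical requirement of the central limit theorem for a weighted sum of independent random variables. That is, none of the weights would dominate all the others in an asymptotic sense.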

Theorem 2 implies the asymptotic unbiasedness of the semiparametric statistic $\hat{U}_{IPW}$ under our null hypothesis $\mathcal{H}_0$, when the propensity scores are estimated using a parametric model. This property has not been achieved by either the non-weighted or the weighted statistics in the previous work (Zhang et al., 2010; Jiang and Zhang, 2011; Zhu et al., 2012). This agrees with our observation in Section 2.2 when the true values of propensity scores are assumed to be known.

In addition, a comparison between Theorems 1 and 2 reveals that the asymptotic variance of $\hat{U}_{IPW}$ is smaller than that of $U_{IPW}$, the U-statistic with true propensity scores. This confirms a surprising but known fact that “it is better to use the ‘estimated propensity score’ than the true propensity score even when the true score is known” (Robins et al., 1992). This phenomenon has been revealed by both theory (Rosenbaum, 1987; Robins et al., 1992) and empirical studies (Gu and Rosenbaum, 1993). Nonetheless, to the best of our knowledge, this is the first rigorous formalization of the idea either from a U-statistic viewpoint or in the framework of association tests.
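The variance comparison follows in one line from the form of the asymptotic covariance in Theorem 2, $\Lambda_{\theta_0} = \Sigma_{\theta_0} - \Gamma_{\theta_0} I_{\theta_0}^{-1} \Gamma_{\theta_0}'$. Because $I_{\theta_0}$ is positive definite, for any fixed vector $a \in \mathbb{R}^p$,

$$
a' (\Sigma_{\theta_0} - \Lambda_{\theta_0})\, a = (\Gamma_{\theta_0}' a)'\, I_{\theta_0}^{-1}\, (\Gamma_{\theta_0}' a) \ge 0 ,
$$

so $\Lambda_{\theta_0} \preceq \Sigma_{\theta_0}$ in the positive semidefinite ordering. Every linear combination of the components of the statistic therefore has asymptotic variance no larger with estimated scores than with true scores, with equality only for directions $a$ satisfying $\Gamma_{\theta_0}' a = 0$.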

2.5 A Specific Example

As a specific example of model (5), we consider the ordinal logistic regression model

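$$\mathrm{logit}\{P(G_i \le g \mid Z_i = z_i)\} = \lambda_g + \beta' z_i, \quad g = 0, 1; \; i = 1, \ldots, n, \tag{11}$$

where $\lambda_0 < \lambda_1$ are ascending level parameters, and $\beta$ reflects the association between the gene and covariates. Using the notation in Section 2.4, $\theta = (\lambda_0, \lambda_1, \beta')' \in \mathbb{R}^{q+2}$ and $d = q + 2$.

Let

$$q_g(z_i; \theta) = \frac{\exp(\lambda_g + \beta' z_i)}{1 + \exp(\lambda_g + \beta' z_i)}, \quad g = 0, 1,$$

be the cumulative probabilities with $q_g(z_i; \theta) = \sum_{g' \le g} p_{g'}(z_i; \theta)$; then the first-order derivatives in (6) can be explicitly written as follows,

$$\frac{\partial}{\partial \theta} p_0(z_i; \theta) = \pi\{q_0(z_i; \theta)\} \phi_{10i}, \quad \frac{\partial}{\partial \theta} p_1(z_i; \theta) = \pi\{q_1(z_i; \theta)\} \phi_{01i} - \pi\{q_0(z_i; \theta)\} \phi_{10i}, \quad \frac{\partial}{\partial \theta} p_2(z_i; \theta) = -\pi\{q_1(z_i; \theta)\} \phi_{01i},$$

with $\pi(x) = x(1 - x)$, $\phi_{10i} = (1, 0, z_i')'$ and $\phi_{01i} = (0, 1, z_i')'$. The second-order derivatives in (9) and (10) can also be explicitly written as

$$\frac{\partial^2}{\partial \theta\, \partial \theta'} p_0(z_i; \theta) = \varpi\{q_0(z_i; \theta)\} \phi_{10i} \phi_{10i}', \quad \frac{\partial^2}{\partial \theta\, \partial \theta'} p_1(z_i; \theta) = \varpi\{q_1(z_i; \theta)\} \phi_{01i} \phi_{01i}' - \varpi\{q_0(z_i; \theta)\} \phi_{10i} \phi_{10i}', \quad \frac{\partial^2}{\partial \theta\, \partial \theta'} p_2(z_i; \theta) = -\varpi\{q_1(z_i; \theta)\} \phi_{01i} \phi_{01i}',$$

with $\varpi(x) = x(1 - x)(1 - 2x)$. In this way, we can write the explicit form of the information matrix in (7) as

$$I_\theta(z_i) = \left[\frac{1}{p_0(z_i; \theta)} + \frac{1}{p_1(z_i; \theta)}\right] \pi^2\{q_0(z_i; \theta)\} \phi_{10i} \phi_{10i}' + \left[\frac{1}{p_1(z_i; \theta)} + \frac{1}{p_2(z_i; \theta)}\right] \pi^2\{q_1(z_i; \theta)\} \phi_{01i} \phi_{01i}' - \frac{1}{p_1(z_i; \theta)} \pi\{q_0(z_i; \theta)\} \pi\{q_1(z_i; \theta)\} (\phi_{10i} \phi_{01i}' + \phi_{01i} \phi_{10i}'), \tag{12}$$

and the matrix $\Gamma_{\theta_0}$ in (8) as

$$\Gamma_{\theta_0} = -\frac{2}{n} \sum_{i=1}^{n} u_i \frac{\left[\pi\{q_0(z_i; \theta_0)\} \phi_{10i} + \pi\{q_1(z_i; \theta_0)\} \phi_{01i}\right]'}{e_{\theta_0}(z_i)}. \tag{13}$$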
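The closed-form derivatives for the ordinal logistic example lend themselves to a quick numerical sanity check. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the model is read as a proportional-odds model for P(G ≤ g | z), and the cross term of the information matrix is taken to be the symmetric pair of outer products of the two design vectors. The closed form can then be compared against the defining sum in (7) and against finite differences.

```python
import numpy as np

def olr_probs(theta, z):
    """Return (p, q0, q1): p = (p_0, p_1, p_2) for theta = (lam0, lam1, beta)."""
    lam0, lam1 = theta[0], theta[1]
    beta = np.asarray(theta[2:], dtype=float)
    eta = float(np.dot(beta, z))
    q0 = 1.0 / (1.0 + np.exp(-(lam0 + eta)))     # P(G <= 0 | z)
    q1 = 1.0 / (1.0 + np.exp(-(lam1 + eta)))     # P(G <= 1 | z)
    return np.array([q0, q1 - q0, 1.0 - q1]), q0, q1

def olr_gradients(theta, z):
    """Rows are the gradients of p_0, p_1, p_2 with respect to theta."""
    _, q0, q1 = olr_probs(theta, z)
    phi10 = np.concatenate(([1.0, 0.0], z))
    phi01 = np.concatenate(([0.0, 1.0], z))
    d0 = q0 * (1.0 - q0) * phi10
    d2 = -q1 * (1.0 - q1) * phi01
    return np.vstack([d0, -d0 - d2, d2])         # gradients sum to zero across g

def olr_information(theta, z):
    """Closed-form per-subject information matrix for the ordinal logistic model."""
    p, q0, q1 = olr_probs(theta, z)
    phi10 = np.concatenate(([1.0, 0.0], z))
    phi01 = np.concatenate(([0.0, 1.0], z))
    a, b = q0 * (1.0 - q0), q1 * (1.0 - q1)
    cross = np.outer(phi10, phi01) + np.outer(phi01, phi10)
    return ((1 / p[0] + 1 / p[1]) * a**2 * np.outer(phi10, phi10)
            + (1 / p[1] + 1 / p[2]) * b**2 * np.outer(phi01, phi01)
            - a * b / p[1] * cross)
```

A check of this kind is cheap and catches sign or indexing slips in hand-derived gradients before they propagate into the variance estimate.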

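The main result in Theorem 2 follows as long as its conditions are satisfied. Indeed, some of the conditions become redundant in this specific example, such as the twice continuous differentiability of the likelihood function. Moreover, conditions (9) and (10) can be simplified into a simple condition $\max_{1 \le i \le n} \|z_i\| = O(1)$. In summary, we present the following corollary parallel to Theorem 2 specifically for this example.

Corollary 1. Assume model (11) holds. Suppose that there exist some $\delta > 0$ and $c_{\theta_0} > 0$ such that $p_g(z_i; \theta) \in [\delta, 1 - \delta]$ for all $\theta$ satisfying $\|\theta - \theta_0\| \le c_{\theta_0}$ with $g = 0, 1, 2$ and $i = 1, \ldots, n$; $\max_{1 \le i \le n} \|z_i\| = O(1)$, $\max_{1 \le i \le n} \|u_i\|^2 = o(n)$, and $\lambda_{\max}(\sum_{i=1}^{n} u_i u_i') = O(n)$; $\max_{1 \le i \le n} \lambda_{\max}(\gamma_{i1} \gamma_{i1}' + \gamma_{i2} \gamma_{i2}') = o[\lambda_{\min}\{\sum_{i=1}^{n} (\gamma_{i1} \gamma_{i1}' + \gamma_{i2} \gamma_{i2}')\}]$ and $\lambda_{\min}\{\sum_{i=1}^{n} (\gamma_{i1} \gamma_{i1}' + \gamma_{i2} \gamma_{i2}')\} \ge n^{\epsilon}$ for some $\epsilon > 0$, where

$$\gamma_{i1} = \left\{ u_i' e_{\theta_0}^{-1}(z_i),\; -1 - p_{i0} p_{i2} p_{i1}^{-1},\; p_{i2} + p_{i0} p_{i2} p_{i1}^{-1},\; -(p_{i0} + p_{i1}) z_i' \right\}',$$

$$\gamma_{i2} = \left\{ 2 u_i' e_{\theta_0}^{-1}(z_i),\; -p_{i1} - p_{i2},\; -p_{i0} - p_{i1},\; -(1 + p_{i1}) z_i' \right\}',$$

with the simplified notation $p_{ig} = p_g(z_i; \theta_0)$; there exists a positive definite matrix $I_{\theta_0}$ such that $\frac{1}{n} \sum_{i=1}^{n} I_{\theta_0}(z_i) \to I_{\theta_0}$ with $I_{\theta_0}(z_i)$ in (12). Then, the conclusion of Theorem 2 holds with the explicit form of $\Gamma_{\theta_0}$ given in (13).

Following the asymptotic distribution of $\hat{U}_{IPW}$ in Corollary 1, we define the test statistic

$$\hat{T}_{IPW} = n\, \hat{U}_{IPW}'\, \hat{\Lambda}^{-1}\, \hat{U}_{IPW},$$

where $\hat{\Lambda} = \Lambda_{\hat{\theta}}$ is the estimator of $\Lambda_{\theta_0}$. The consistency of $\hat{\Lambda}$ can be verified under the conditions of Corollary 1. Therefore, it is clear that

$$\hat{T}_{IPW} \to \chi_p^2,$$

in distribution, conditioning on all the traits and covariates. This serves as the reference distribution in our numerical studies.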
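Putting the pieces together, the test statistic can be assembled from plug-in versions of the matrices in Section 2.4. The sketch below is not the authors' code. It assumes the identity trait kernel, the proportional-odds model for P(G ≤ g | z), and a parameter estimate `theta_hat` obtained separately (for example from any ordinal logistic fitting routine); the function name `ipw_test_estimated` is hypothetical.

```python
import numpy as np

def ipw_test_estimated(Y, G, Z, theta_hat):
    """Plug-in U_IPW, Sigma, Lambda and the quadratic-form test statistic.

    Y: (n, p) traits; G: (n,) genotypes 0/1/2; Z: (n, q) covariates;
    theta_hat: (lam0, lam1, beta_1, ..., beta_q), assumed to be the MLE.
    """
    Y, G, Z = (np.asarray(a, dtype=float) for a in (Y, G, Z))
    n = Z.shape[0]
    lam0, lam1 = theta_hat[0], theta_hat[1]
    beta = np.asarray(theta_hat[2:], dtype=float)
    eta = Z @ beta
    q0 = 1.0 / (1.0 + np.exp(-(lam0 + eta)))
    q1 = 1.0 / (1.0 + np.exp(-(lam1 + eta)))
    P = np.column_stack([q0, q1 - q0, 1.0 - q1])         # estimated propensity scores
    e = P[:, 1] + 2.0 * P[:, 2]                          # E(G | z)
    v = P[:, 1] + 4.0 * P[:, 2] - e**2                   # var(G | z)
    phi10 = np.column_stack([np.ones(n), np.zeros(n), Z])
    phi01 = np.column_stack([np.zeros(n), np.ones(n), Z])
    d0 = (q0 * (1.0 - q0))[:, None] * phi10              # gradient of p_0
    d2 = -(q1 * (1.0 - q1))[:, None] * phi01             # gradient of p_2
    d1 = -d0 - d2                                        # gradients sum to zero
    u = Y - Y.mean(axis=0)                               # (1/n) sum_j (Y_i - Y_j)
    U = 2.0 / (n - 1) * (u * (G / e)[:, None]).sum(axis=0)
    Sigma = 4.0 / n * (u.T * (v / e**2)) @ u
    Gamma = 2.0 / n * (u.T / e) @ (d1 + 2.0 * d2)        # sum_g g * grad p_g, over e
    info = sum((d.T / P[:, g]) @ d for g, d in enumerate((d0, d1, d2))) / n
    Lam = Sigma - Gamma @ np.linalg.solve(info, Gamma.T)
    stat = n * U @ np.linalg.solve(Lam, U)
    return U, Sigma, Lam, stat
```

The returned statistic would be referred to a chi-square distribution with p degrees of freedom. Note that the sign convention of Gamma is immaterial here, because it enters only through the product with its own transpose.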

2.6 Genotype Coding

As mentioned in Section 2.1, the genotype G is coded as 0, 1, 2 representing aa, Aa, AA respectively, which record the number of a reference allele A. The choice of a different reference allele a leads to a different coding of genotype such as G′ = 2 − G. We illustrate in this subsection the effect of different genotype codings on the association measurements we studied in Sections 2.1–2.2.

Firstly, notice that the genotype kernel $\phi_g(G_i, G_j)$ in (1) is invariant, up to its sign, to the change of genotype coding from $G$ to $G'$, i.e., $\phi_g(G_i', G_j') = -\phi_g(G_i, G_j)$. Therefore, the non-weighted U-statistic $U$ in (1) and the weighted U-statistic $U_{W,1}$ in (2) are both invariant to the genotype codings up to sign.

Secondly, the propensity score vector $p(z_i) = \{P(G_i = g \mid Z_i = z_i) : g \in \mathcal{G}\}'$ in the weighted U-statistic $U_{W,2}$ in (3) is invariant except that the order of its elements is reversed. It leads to the same invariance of $U_{W,2}$, as long as the weight function $w(u_1, u_2)$ in (3) is not changed by the synchronous permutation of the elements in $u_1$ and $u_2$. This is often the case. For example, Jiang and Zhang (2011) used $w(u_1, u_2) = \exp(-\|u_1 - u_2\|^2 / 2)$, which satisfies the above condition.

Finally, we should note that our proposed measurement UIPW does not possess the invariance property under the two genotype codings. The revised genotype kernel ϕg(Gi, Gj; Zi, Zj) is not invariant under codings G and G′. Using a different genotype coding will actually change our association measurement UIPW and further change the test result. This is understandable because we apply a new weighting scheme. In the non-weighted U-statistic U, the genotypes Gi are treated equally in the genotype kernel. However, to achieve the unbiasedness under ℋ0, the new U-statistic UIPW inversely weights the genotypes by their expected values conditional on the covariates. It is the new weighting scheme that violates the invariance but achieves the unbiasedness. From the practical viewpoint, the new method can give us more flexibility to choose a genotype coding which better fits the real situation.

For clarity, we recommend a simple genotype coding: we choose the major allele as the reference allele, for practical reasons. In practice, inverse probability weighting often encounters the difficulty of small weights in the denominator. Because $G$ then counts copies of the more common allele, this choice is much less likely to result in small denominators $e(z_i)$ (or $\hat{e}(z_i)$) in $U_{IPW}$ (or $\hat{U}_{IPW}$) than the other choice. We thereby avoid the situation where the weights $e(z_i)$ (or $\hat{e}(z_i)$) in the denominator are close to zero.
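The effect of recoding can be seen on a toy example. The sketch below uses hypothetical data and a hypothetical helper `pairwise_u`, with the identity trait kernel; the conditional means in `e` are made-up values standing in for E(G | z). Recoding G as 2 − G replaces each conditional mean e by 2 − e.

```python
import numpy as np

def pairwise_u(Y, a):
    """Average over pairs i < j of (Y_i - Y_j) * (a_i - a_j)."""
    Y = np.asarray(Y, dtype=float)
    a = np.asarray(a, dtype=float)
    n = len(a)
    total = np.zeros(Y.shape[1])
    for i in range(n):
        for j in range(i + 1, n):
            total += (Y[i] - Y[j]) * (a[i] - a[j])
    return total / (n * (n - 1) / 2)

Y = np.array([[0.0], [1.0], [1.0]])
G = np.array([0.0, 1.0, 2.0])
e = np.array([0.5, 1.1, 1.4])                    # hypothetical E(G_i | z_i)

U = pairwise_u(Y, G)                             # non-weighted statistic
U_flip = pairwise_u(Y, 2 - G)                    # same statistic, other reference allele
U_ipw = pairwise_u(Y, G / e)                     # inverse probability weighted statistic
U_ipw_flip = pairwise_u(Y, (2 - G) / (2 - e))    # recoded genotype and recoded mean
```

Here the non-weighted statistic only changes sign under recoding, while the inverse probability weighted statistic changes in magnitude as well (about 0.78 versus about −0.52 in this example), which is the non-invariance described above.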

3 SIMULATION STUDIES

3.1 Settings

We conduct simulation studies to compare the performance of our semiparametric association test $\hat{T}_{IPW}$ with the three methods mentioned in Section 2.1. They are the non-weighted and weighted tests derived from the association measures (1)–(3), denoted by $T$, $T_{W,1}$ and $T_{W,2}$ respectively. We utilize the same “conditional independence” null hypothesis $\mathcal{H}_0$ (see Section 2.2) for all four tests for a fair comparison. The simulation results are obtained from samples of size 500, which are generated as follows.

Step 1: For the ith sample, a continuous covariate Zi1 is simulated from N(0, 1) distribution, and a binary covariate Zi2 is randomly sampled from {−1, 1} with equal probabilities.

Step 2: For the relationship between the covariates and the test-locus genotype Gi, we generate Gi from the ordinal logistic regression model

$$\text{OLR}: \quad \mathrm{logit}\{P(G_i \le g \mid Z_{i1}, Z_{i2})\} = \mu_g - \nu_1 Z_{i1} - \nu_2 Z_{i2}, \quad g = 0, 1,$$

where $\nu_1$ and $\nu_2$ control the association between the genotype and the covariates. An alternative genotype model is to generate $G_i$ according to a binomial distribution $\mathrm{Bin}(2, r_i)$ with probability $r_i$ satisfying

$$\text{BIN}: \quad \mathrm{logit}(r_i) = \mu + \nu_1 Z_{i1} + \nu_2 Z_{i2} + \epsilon_i,$$

where $\epsilon_i \sim N(0, 1)$ is a random error. We refer to the former model as “OLR” and the latter as “BIN”. The former model is the one we specified in Section 2.5, while the latter model is used to assess the effect of model misspecification, with $\epsilon_i$ deliberately added for additional complexity.

Step 3: Conditional on the genotype $G_i$ and the covariates $Z_{i1}$ and $Z_{i2}$, two binary traits $Y_i = (Y_i^{(1)}, Y_i^{(2)})'$ are generated according to a logistic regression phenotype model,

$$\mathrm{logit}\{P(Y_i^{(j)} = 1 \mid G_i, Z_{i1}, Z_{i2})\} = \alpha_j + \beta_G G_i + \beta_{Z1} Z_{i1} + \beta_{Z2} Z_{i2} + \beta_{GZ1} G_i Z_{i1} + \beta_{GZ2} G_i Z_{i2} + \epsilon_{ij},$$

with $i = 1, \ldots, n$; $j = 1, 2$; and $(\epsilon_{i1}, \epsilon_{i2})' \sim N(0, \Sigma_\epsilon)$.
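Steps 1 to 3 with the OLR genotype model can be sketched as a small data generator. This is an illustration only: the function name `simulate_olr_sample` is hypothetical, the covariate terms are assumed to enter the cumulative genotype logit with a negative sign, the error correlation default of 0.25 is taken from the phenotype-model settings, and the threshold values passed for `mu0` and `mu1` in any call are placeholders, since the values used to reach each minor allele frequency are not listed here. The default coefficients correspond to a main genetic effect of 0.5 with no covariate or interaction effects.

```python
import numpy as np

def simulate_olr_sample(n, mu0, mu1, nu=(1.0, 1.0), beta_g=0.5,
                        beta_z=(0.0, 0.0), beta_gz=(0.0, 0.0),
                        alpha=(-0.75, -1.0), rho=0.25, seed=0):
    """Draw (Y, G, Z) following Steps 1-3 with the OLR genotype model."""
    rng = np.random.default_rng(seed)
    # Step 1: one standard normal covariate and one balanced +/-1 covariate.
    z1 = rng.standard_normal(n)
    z2 = rng.choice([-1.0, 1.0], size=n)
    # Step 2: cumulative logits mu_g - nu1*Z1 - nu2*Z2 (sign is an assumption).
    shift = nu[0] * z1 + nu[1] * z2
    q0 = 1.0 / (1.0 + np.exp(-(mu0 - shift)))    # P(G <= 0 | Z)
    q1 = 1.0 / (1.0 + np.exp(-(mu1 - shift)))    # P(G <= 1 | Z)
    unif = rng.uniform(size=n)
    G = (unif > q0).astype(int) + (unif > q1).astype(int)
    # Step 3: two binary traits with correlated normal errors on the logit scale.
    eps = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    lin = (beta_g * G + beta_z[0] * z1 + beta_z[1] * z2
           + beta_gz[0] * G * z1 + beta_gz[1] * G * z2)
    logits = np.asarray(alpha)[None, :] + lin[:, None] + eps
    prob = 1.0 / (1.0 + np.exp(-logits))
    Y = (rng.uniform(size=(n, 2)) < prob).astype(int)
    return Y, G, np.column_stack([z1, z2])
```

Fixing the seed makes a replicate reproducible; a simulation study would loop over seeds and over the threshold values that give the target allele frequencies.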

In the two genotype models (OLR and BIN), the minor allele frequency (MAF) of the simulated genotype depends on the values of μ0, μ1, μ and ν1, ν2. To investigate the possible effect of different minor allele frequencies on our results, we fix ν1 = ν2 = 1 and select appropriate values of μ0, μ1 and μ. Their values are chosen so that the simulated minor allele frequency is equal to one of the following values: 0.05, 0.10, 0.15, … , 0.40. These choices give a broad and reasonable range for evaluating how an association test performs with different minor allele frequencies.

In the phenotype model, we set $\alpha_1 = -0.75$, $\alpha_2 = -1$, and $\Sigma_\epsilon = \begin{pmatrix} 1 & 0.25 \\ 0.25 & 1 \end{pmatrix}$. The choices of the coefficients $(\beta_G, \beta_{Z1}, \beta_{Z2}, \beta_{GZ1}, \beta_{GZ2})'$ are provided by Table 1 as different phenotype models. The models N1 and N2 are null models under $\mathcal{H}_0$ in which $Y_i$ and $G_i$ are independent conditional on $(Z_{i1}, Z_{i2})$, and the models A1–A6 are under our alternative hypothesis.

Table 1. Phenotype models

| Type | Model | βG | βZ1 = βZ2 | βGZ1 = βGZ2 |
|---|---|---|---|---|
| Null | N1 | 0 | 0 | 0 |
| Null | N2 | 0 | 0.5 | 0 |
| Alternative | A1 | 0.5 | 0 | 0 |
| Alternative | A2 | 0.5 | 0.5 | 0 |
| Alternative | A3 | 0.5 | 0 | 1 |
| Alternative | A4 | 0.5 | 0.5 | 1 |
| Alternative | A5 | 0.5 | 0 | 2 |
| Alternative | A6 | 0.5 | 0 | 2 |

3.2 Results for Bivariate Phenotypes

In this subsection, we present simulation results for the generated bivariate phenotypes. In terms of type I error, Table 2 presents the empirical type I error of the four tests based on 10,000 replications when the nominal level is set to 0.001. Table 2 also includes the type I error results when the nominal level is 5 × 10⁻⁷. To save computational time, we fix the minor allele frequency at 0.10 there. This smaller nominal level provides an additional comparison among different methods in a situation similar to the real application (Burton et al., 2007). To illustrate the necessity of utilizing the “conditional independence” null hypothesis $\mathcal{H}_0$, we also include T′, the non-weighted test under the original “unconditional independence” null hypothesis of no association between phenotype and genotype. In terms of power, Figures 1–4 present the statistical power of the four tests with respect to a wide range of minor allele frequencies. Figures 1–2 correspond to the nominal level 0.001 and Figures 3–4 correspond to the nominal level 5 × 10⁻⁷.
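As a rough guide for reading the empirical rates at the 0.001 level, the Monte Carlo standard error of an estimated rejection rate can be worked out, assuming independent replications and a true rejection probability equal to the nominal level $\alpha$, with $B$ denoting the number of replications (a symbol introduced here):

$$
\mathrm{SE} = \sqrt{\frac{\alpha (1 - \alpha)}{B}} = \sqrt{\frac{0.001 \times 0.999}{10{,}000}} \approx 3.2 \times 10^{-4} .
$$

Under these assumptions, an empirical rate within about two standard errors of the nominal level, that is, roughly between $0.4 \times 10^{-3}$ and $1.6 \times 10^{-3}$, is compatible with a correctly calibrated test.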

Table 2.

Type I error for bivariate phenotypes

MAF T T W,1 T W,2 T^IPW T T T W,1 T W,2 T^IPW T
Model OLR (Nominal Level: 0.001)
Model N1
Model N2
0.05 1.0e-3 0.9e-3 1.6e-3 1.1e-3 0.5e-3 0.5e-3 0.8e-3 0.6e-3 1.1e-3 0.2358
0.10 0.7e-3 0.7e-3 0.7e-3 1.0e-3 0.7e-3 0.4e-3 0.9e-3 1.2e-3 0.9e-3 0.4913
0.15 1.4e-3 0.8e-3 0.8e-3 1.3e-3 0.6e-3 0.5e-3 0.6e-3 0.7e-3 1.0e-3 0.6463
0.20 1.0e-3 0.7e-3 0.9e-3 1.0e-3 0.7e-3 1.0e-3 1.1e-3 1.0e-3 1.4e-3 0.7249
0.25 0.8e-3 0.9e-3 0.6e-3 0.7e-3 0.5e-3 0.8e-3 1.0e-3 1.1e-3 1.1e-3 0.7804
0.30 0.9e-3 0.9e-3 1.0e-3 1.1e-3 0.7e-3 0.7e-3 1.2e-3 0.8e-3 0.7e-3 0.8049
0.35 0.9e-3 0.6e-3 0.8e-3 1.5e-3 0.8e-3 0.5e-3 0.8e-3 1.4e-3 0.9e-3 0.8250
0.40 0.9e-3 1.3e-3 1.0e-3 1.0e-3 1.3e-3 0.5e-3 0.9e-3 0.5e-3 1.2e-3 0.8391

Model BIN (Nominal Level: 0.001)
Model N1
Model N2
0.05 0.5e-3 1.1e-3 0.5e-3 0.8e-3 0.8e-3 0.2e-3 0.1e-3 0.3e-3 0.7e-3 0.1937
0.10 1.2e-3 1.2e-3 0.7e-3 1.7e-3 1.3e-3 0.2e-3 0.4e-3 0.5e-3 0.6e-3 0.4293
0.15 0.8e-3 0.7e-3 0.4e-3 0.9e-3 0.8e-3 0.6e-3 1.1e-3 0.8e-3 1.7e-3 0.5950
0.20 1.1e-3 1.2e-3 1.0e-3 1.5e-3 1.5e-3 0.6e-3 0.6e-3 0.7e-3 1.1e-3 0.6954
0.25 0.5e-3 0.6e-3 0.7e-3 0.5e-3 1.1e-3 0.4e-3 0.6e-3 0.6e-3 0.7e-3 0.7691
0.30 1.2e-3 0.9e-3 0.8e-3 1.7e-3 1.3e-3 0.6e-3 1.1e-3 1.2e-3 0.8e-3 0.8072
0.35 0.7e-3 0.7e-3 0.5e-3 1.4e-3 0.8e-3 1.0e-3 1.6e-3 1.4e-3 0.7e-3 0.8263
0.40 1.1e-3 1.2e-3 1.4e-3 0.9e-3 1.3e-3 0.8e-3 0.9e-3 1.2e-3 0.8e-3 0.8437

Model OLR (Nominal Level: 5 × 10⁻⁷)
Model N1
Model N2
0.10 2e-7 2e-7 6e-7 3e-7 2e-7 3e-7 6e-7 7e-7 5e-7 0.0466208

Model BIN (Nominal Level: 5 × 10⁻⁷)
Model N1
Model N2
0.10 2e-7 4e-7 1e-7 5e-7 5e-7 1e-7 1e-7 1e-7 5e-7 0.0331154

Figure 1.

Power versus minor allele frequency for bivariate phenotypes. The significance level is 0.001. The genotype is simulated using model OLR. Solid line with circles: inverse probability weighted test T^IPW; dashed line with triangles: non-weighted test T; dotted line with pluses: covariate weighted test TW,1; dot-dash line with crosses: propensity score weighted test TW,2.

Figure 2.

Power versus minor allele frequency for bivariate phenotypes. The significance level is 0.001. The genotype is simulated using model BIN. Solid line with circles: inverse probability weighted test T^IPW; dashed line with triangles: non-weighted test T; dotted line with pluses: covariate weighted test TW,1; dot-dash line with crosses: propensity score weighted test TW,2.

Figure 3.

Power versus minor allele frequency for bivariate phenotypes. The significance level is 5 × 10⁻⁷. The genotype is simulated using model OLR. Solid line with circles: inverse probability weighted test T^IPW; dashed line with triangles: non-weighted test T; dotted line with pluses: covariate weighted test TW,1; dot-dash line with crosses: propensity score weighted test TW,2.

Figure 4.

Power versus minor allele frequency for bivariate phenotypes. The significance level is 5 × 10⁻⁷. The genotype is simulated using model BIN. Solid line with circles: inverse probability weighted test T^IPW; dashed line with triangles: non-weighted test T; dotted line with pluses: covariate weighted test TW,1; dot-dash line with crosses: propensity score weighted test TW,2.

From the perspective of type I error (in models N1 and N2), we find that all four tests under ℋ0 behave fairly well since they all possess reasonably accurate type I errors under both nominal levels. This is partially due to the fact that ℋ0 removes the confounding effects of covariates. By contrast, T′ cannot control its type I error in model N2. The reason is clear: T′ does not remove the confounding effect in model N2 (Jiang and Zhang, 2011; Zhu et al., 2012).
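As a rough aid for reading Table 2, and not part of the original analysis, the Monte Carlo variability of an empirical type I error can be approximated by a binomial standard error. The sketch below assumes independent replications and uses a normal approximation, which is crude when only about 10 rejections are expected; the function names are illustrative.

```python
import math


def mc_standard_error(alpha: float, n_rep: int) -> float:
    """Binomial standard error of an empirical rejection rate."""
    return math.sqrt(alpha * (1.0 - alpha) / n_rep)


def mc_interval(alpha: float, n_rep: int, z: float = 1.96):
    """Approximate 95% range of the empirical rate for a test with exact level alpha."""
    se = mc_standard_error(alpha, n_rep)
    return max(0.0, alpha - z * se), alpha + z * se


# Nominal level and number of replications stated in Section 3.2.
alpha, n_rep = 0.001, 10_000
low, high = mc_interval(alpha, n_rep)
print(f"SE = {mc_standard_error(alpha, n_rep):.2e}; range = [{low:.2e}, {high:.2e}]")
```

Under these assumptions, empirical rates of roughly 0.4e-3 to 1.6e-3 are compatible with a nominal level of 0.001, whereas the values reported for T′ in model N2 (0.19 and above) are far outside this range.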

From the perspective of power, we consider models A1–A6. Models A1–A2 are from a phenotype model without the gene-environment interaction, and A3–A6 are with an interaction. To assess situations with different gene-environment interactions, in models A5–A6, we double the interaction coefficients from models A3–A4, respectively.

In model A1 with the genetic effect only, the non-weighted test T possesses the highest power among all four methods, although the differences are quite small. This agrees with our expectation since it is not necessary to adjust for covariates in this case, but adjusting for covariates does not harm the statistical power. In model A2 with both genetic and environmental effects, the non-weighted test T performs the worst for most values of minor allele frequency. The other three methods are slightly better, indicating the importance of including covariates in the association test. It is noteworthy that the proposed inverse probability weighted test favors the region of a small minor allele frequency in both models A1 and A2. Compared to the other weighted tests, the proposed test is comparable or even better for low MAFs, but is slightly underpowered when the MAF is higher than 0.30.

Once gene-environment interactions are included (models A3–A4), the methods perform quite differently. It is fairly clear from all figures that the proposed test T^IPW outperforms all competitors for all minor allele frequencies. When the nominal level is 0.001, the proposed test has a power close to 1, which means that it can identify the genetic signal in almost every replicate of the simulated data. The covariate weighted test TW,1 ranks second in terms of power. The non-weighted test T and the propensity score weighted test TW,2 do not have comparable power for a wide range of minor allele frequencies.

A further study with stronger gene-environment interactions (models A5–A6) provides additional evidence for our conclusion drawn from models A3–A4. When the gene-environment interactions dominate both genetic and environmental effects, the semiparametric inverse probability weighted test outperforms the other tests at all minor allele frequencies we considered, showing the power of the proposed test in detecting gene-environment interactions.

Comparing the two genotype models (OLR versus BIN), we have not observed a major impact from the misspecified model on testing the associations. When the genotype is generated using the binomial distribution, our test derived from the ordinal logistic regression (Section 2.5) still has a quite accurate type I error and also a high power (even higher in some cases) to detect either genetic effects or gene-environment interactions.

Between the two nominal levels (0.001 and 5 × 10⁻⁷), the statistical power becomes smaller at the lower nominal level given the same effect sizes (β’s in Table 1), especially in models A1–A2. All methods are underpowered there: with a sample size of 500, we cannot expect reasonable power for a full GWAS scan, and simulations with a much larger sample size take a very long time to complete. Since our objective is to compare relative power, the modest sample size suffices. In fact, for models A3–A6, the power of our proposed test is only slightly affected by this small nominal level, and it still dominates all others. In a situation similar to the real application (nominal level 5 × 10⁻⁷), it is clear that some adjustment is necessary when there is a gene-environment interaction.

3.3 Results for Individual Phenotypes

In addition to the simulation results for the bivariate phenotypes in Section 3.2, we also present the results for each individual phenotype Y(1) and Y(2) separately. For simplicity, we fix the nominal level to be 0.001 throughout this subsection. In terms of type I error, Table 3 presents the empirical type I error of the tests based on 10,000 replications. In terms of power, Figures 5-8 present the statistical power of the four tests with respect to a wide range of minor allele frequencies, where Figures 5-6 correspond to the first phenotype and Figures 7-8 correspond to the second phenotype.

Table 3.

Type I error for individual phenotypes

MAF | Model N1: T, TW,1, TW,2, T^IPW, T′ | Model N2: T, TW,1, TW,2, T^IPW, T′
Phenotype Y(1), Model OLR
Model N1
Model N2
0.05 0.7e-3 0.9e-3 0.5e-3 0.9e-3 1.2e-3 0.4e-3 0.6e-3 0.8e-3 0.7e-3 0.1288
0.10 1.2e-3 1.2e-3 1.3e-3 1.1e-3 0.6e-3 0.3e-3 0.6e-3 0.6e-3 0.9e-3 0.2689
0.15 1.0e-3 0.8e-3 1.2e-3 1.0e-3 0.9e-3 0.4e-3 0.4e-3 1.1e-3 0.8e-3 0.3703
0.20 1.4e-3 1.3e-3 1.1e-3 1.2e-3 1.1e-3 0.9e-3 1.2e-3 1.0e-3 1.1e-3 0.4441
0.25 1.3e-3 1.5e-3 1.0e-3 0.6e-3 0.7e-3 1.1e-3 1.2e-3 1.2e-3 1.7e-3 0.4966
0.30 1.0e-3 1.0e-3 0.7e-3 1.0e-3 1.0e-3 0.5e-3 0.9e-3 0.8e-3 0.9e-3 0.5173
0.35 0.7e-3 0.7e-3 0.9e-3 0.8e-3 0.7e-3 0.7e-3 1.2e-3 1.0e-3 1.3e-3 0.5402
0.40 1.2e-3 1.1e-3 1.3e-3 1.6e-3 0.7e-3 0.5e-3 0.9e-3 1.1e-3 0.9e-3 0.5566

Phenotype Y(1), Model BIN
Model N1
Model N2
0.05 0.6e-3 0.2e-3 0.6e-3 0.7e-3 1.2e-3 0.3e-3 0.3e-3 0.5e-3 1.2e-3 0.1099
0.10 0.5e-3 0.6e-3 0.2e-3 1.0e-3 1.5e-3 1.0e-3 1.2e-3 0.8e-3 1.8e-3 0.2319
0.15 0.5e-3 0.8e-3 0.7e-3 1.2e-3 1.4e-3 0.6e-3 0.3e-3 0.3e-3 1.0e-3 0.3321
0.20 1.0e-3 1.3e-3 1.2e-3 1.2e-3 1.0e-3 1.0e-3 1.2e-3 1.3e-3 1.5e-3 0.4100
0.25 0.6e-3 1.1e-3 0.9e-3 1.2e-3 1.1e-3 0.7e-3 0.9e-3 0.7e-3 1.1e-3 0.4768
0.30 0.5e-3 0.4e-3 0.3e-3 0.9e-3 1.0e-3 0.3e-3 0.7e-3 0.4e-3 1.4e-3 0.5136
0.35 1.3e-3 1.4e-3 1.2e-3 1.3e-3 0.8e-3 0.5e-3 0.8e-3 0.9e-3 0.7e-3 0.5491
0.40 1.2e-3 1.1e-3 0.7e-3 0.7e-3 1.1e-3 0.8e-3 0.8e-3 1.2e-3 0.7e-3 0.5665

Phenotype Y(2), Model OLR
Model N1
Model N2
0.05 0.9e-3 0.7e-3 0.7e-3 0.9e-3 0.8e-3 0.7e-3 1.0e-3 0.9e-3 1.1e-3 0.1246
0.10 0.6e-3 0.9e-3 0.6e-3 0.4e-3 0.3e-3 0.3e-3 0.9e-3 1.0e-3 0.9e-3 0.2586
0.15 1.2e-3 1.4e-3 1.5e-3 1.3e-3 1.1e-3 0.5e-3 0.7e-3 0.6e-3 0.7e-3 0.3620
0.20 1.7e-3 1.3e-3 1.7e-3 1.3e-3 1.4e-3 0.5e-3 0.6e-3 1.5e-3 1.3e-3 0.4232
0.25 1.0e-3 0.8e-3 1.0e-3 0.9e-3 0.9e-3 0.7e-3 1.1e-3 0.9e-3 1.0e-3 0.4678
0.30 1.0e-3 0.7e-3 1.3e-3 0.8e-3 0.9e-3 0.5e-3 0.9e-3 0.5e-3 1.0e-3 0.5047
0.35 1.1e-3 0.9e-3 1.2e-3 1.3e-3 0.6e-3 0.9e-3 0.9e-3 0.9e-3 1.1e-3 0.5170
0.40 0.4e-3 0.7e-3 0.6e-3 0.7e-3 1.0e-3 0.7e-3 1.4e-3 1.5e-3 1.2e-3 0.5235

Phenotype Y(2), Model BIN
Model N1
Model N2
0.05 0.7e-3 0.7e-3 1.2e-3 0.8e-3 0.9e-3 0.5e-3 0.8e-3 0.4e-3 1.3e-3 0.1091
0.10 0.3e-3 0.4e-3 0.6e-3 0.7e-3 0.6e-3 0.7e-3 0.7e-3 0.2e-3 1.0e-3 0.2282
0.15 0.8e-3 0.9e-3 0.5e-3 0.8e-3 0.5e-3 0.2e-3 0.5e-3 0.2e-3 0.9e-3 0.3195
0.20 0.7e-3 0.7e-3 0.4e-3 1.0e-3 0.9e-3 0.9e-3 1.1e-3 0.9e-3 1.6e-3 0.4007
0.25 0.7e-3 0.6e-3 0.4e-3 1.0e-3 1.0e-3 0.5e-3 0.6e-3 1.1e-3 0.8e-3 0.4558
0.30 0.4e-3 0.6e-3 0.6e-3 0.3e-3 1.0e-3 0.7e-3 0.9e-3 1.0e-3 0.6e-3 0.4942
0.35 0.5e-3 0.7e-3 0.5e-3 1.4e-3 0.7e-3 1.1e-3 1.1e-3 1.3e-3 1.4e-3 0.5106
0.40 0.8e-3 0.9e-3 0.7e-3 1.2e-3 1.0e-3 0.6e-3 0.7e-3 1.0e-3 0.8e-3 0.5313

Figure 5.

Power versus minor allele frequency for phenotype Y(1). The significance level is 0.001. The genotype is simulated using model OLR. Solid line with circles: inverse probability weighted test T^IPW; dashed line with triangles: non-weighted test T; dotted line with pluses: covariate weighted test TW,1; dot-dash line with crosses: propensity score weighted test TW,2.

Figure 6.

Power versus minor allele frequency for phenotype Y(1). The significance level is 0.001. The genotype is simulated using model BIN. Solid line with circles: inverse probability weighted test T^IPW; dashed line with triangles: non-weighted test T; dotted line with pluses: covariate weighted test TW,1; dot-dash line with crosses: propensity score weighted test TW,2.

Figure 7.

Power versus minor allele frequency for phenotype Y(2). The significance level is 0.001. The genotype is simulated using model OLR. Solid line with circles: inverse probability weighted test T^IPW; dashed line with triangles: non-weighted test T; dotted line with pluses: covariate weighted test TW,1; dot-dash line with crosses: propensity score weighted test TW,2.

Figure 8.

Power versus minor allele frequency for phenotype Y(2). The significance level is 0.001. The genotype is simulated using model BIN. Solid line with circles: inverse probability weighted test T^IPW; dashed line with triangles: non-weighted test T; dotted line with pluses: covariate weighted test TW,1; dot-dash line with crosses: propensity score weighted test TW,2.

In our simulations, the single-trait results are very similar to the bivariate-trait results in Section 3.2. From the perspective of type I error, all four tests under ℋ0 behave fairly well since they all possess reasonably accurate type I errors. By contrast, T′ cannot control its type I error in model N2. From the perspective of power, we observe that the inverse probability weighted test is generally comparable to the others when there are only genetic and/or environmental effects, and it outperforms the others when there are gene-environment interactions.

3.4 Impact of Model Misspecification

In Sections 3.2–3.3, we observed no major impact on testing the genetic associations caused by a possibly misspecified parametric gene-environment model. To better understand how the model misspecification affects the estimation of the propensity scores, we compare the estimation results under the two genotype models (OLR and BIN) used in Section 3.1. Figure 9 provides boxplots of the mean squared errors of the estimated propensity scores p̂0, p̂1, and p̂2 from random samples of size 500 based on 1,000 replications.

Figure 9.

Mean squared error of the estimated propensity scores p̂0, p̂1, and p̂2. Each panel includes the boxplots for mean squared errors of the estimated propensity scores p̂0, p̂1, and p̂2, in that particular order, from genotype models OLR, BIN, and BIN’, respectively.

Since we use the ordinal logistic regression model to estimate the propensity scores (Section 2.5), when the genotype is simulated using model OLR, the estimation performance is the best. The mean squared errors of the estimated propensity scores are higher when the genotype data are simulated from model BIN.

We note that we deliberately added a random error ϵi in model BIN for additional complexity, which can cause spurious estimation errors. For a fairer comparison, we also simulate genotype data using model BIN without the random error (referred to as model BIN’) and present the results for BIN’ in Figure 9. The results show that the extra estimation error for model BIN is mainly caused by the random error we added. There is no significant difference between the estimation errors for models OLR and BIN’, indicating that the difference in estimation performance between the two genotype models is negligible if no additional noise is included.

4 DATA ANALYSIS

4.1 Data and Methods

The Study of Addiction: Genetics and Environment (SAGE) aims to identify susceptible genetic factors that contribute to substance dependence through three large-scale genome-wide association studies: the Collaborative Study on the Genetics of Alcoholism (COGA), the Family Study of Cocaine Dependence (FSCD), and the Collaborative Genetic Study of Nicotine Dependence (COGEND). These three studies have been reported separately in previous work (Reich et al., 1998; Hartel et al., 2006; Luo et al., 2008; Bierut et al., 2008). The SAGE data include 4,121 subjects for whom the addiction to alcohol, nicotine, marijuana, cocaine, opiates, and other drugs and genome-wide SNP data (ILLUMINA Human 1M platform) were available. Lifetime dependence on these six categories of substances was diagnosed in accordance with the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV). We hypothesize that there is a common genetic effect for the comorbidity including the addiction to the six categories of substances. We thus use a multivariate trait with six binary components, each indicating whether or not the subject is addicted to a particular substance.

In our study, we excluded 60 duplicate genotype samples and removed nine subjects with ethnic backgrounds other than African origin (black) or European origin (white). In total, we have 3,627 unrelated subjects for whom both genotype and phenotype data are available. Following Chen et al. (2011), we performed separate analyses by race (black or white) and gender (female or male), due to the complexity of substance dependence with possible environmental components. Therefore, our analysis was performed in each of the four subpopulations: 1,393 white women, 1,131 white men, 568 black women, and 535 black men (Chen et al., 2011). In addition, we filtered SNPs by setting thresholds for call rate (> 90%), minor allele frequency (MAF) in each sub-population (> 1%), and Hardy-Weinberg equilibrium in each sub-population (p-value > 0.0001).
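The SNP filters above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the text does not say which Hardy-Weinberg test was applied, so a one-degree-of-freedom Pearson chi-square test is assumed here, and the function names and genotype-count inputs are hypothetical. Only the three thresholds come from the text.

```python
import math


def hwe_chisq_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no departure can be measured
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_call_rate=0.90, min_maf=0.01, min_hwe_p=1e-4) -> bool:
    """Apply the call-rate, MAF, and HWE thresholds to one SNP's genotype counts."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate > min_call_rate
            and maf > min_maf
            and hwe_chisq_pvalue(n_aa, n_ab, n_bb) > min_hwe_p)


# Hypothetical genotype counts for one SNP in one sub-population.
print(passes_qc(n_aa=360, n_ab=480, n_bb=160, n_missing=20))
```

Since the MAF and Hardy-Weinberg thresholds are applied within each sub-population, a filter of this kind would be run four times, once per race-by-gender subset.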

As we have already split the data by race and gender, these covariates were not adjusted for in the analysis within each subset. Hence, the remaining covariates include age and several environmental risk factors: whether the subject experienced rape or sexual assault, physical assault, or non-assaultive trauma. Some other risk factors, such as childhood neglect, childhood physical abuse, and childhood sexual abuse, were not included due to their high rates of missing values.

Similar to the simulation study, we compare four association tests: non-weighted test T, covariate-weighted test TW,1, propensity score-weighted test TW,2, and our semiparametric propensity score-inverse probability weighted test T^IPW. With the above selected covariates, the weight functions w(·, ·) in both weighted tests TW,1 and TW,2 are chosen following previous work (Jiang and Zhang, 2011; Zhu et al., 2012) with default parameters. Meanwhile, we continue to use the ordinal logistic regression model for the genotype-covariate relationship in our proposed test. In addition to the above tests with multivariate traits, we also tabulate the results from analyses using a single trait at a time. For each of the six traits, we utilize two approaches to analyze them. Firstly, we fit a logistic regression model including both genotype and the selected covariates. The statistical significance is drawn from a likelihood ratio test based on the logistic regression model. Secondly, we apply the same association tests T, TW,1, TW,2, and T^IPW as above to each trait, and present the significant findings.

4.2 Summary Statistics

Table 4 provides the co-dependence information of the six substances among the 3,627 unrelated subjects included in our final analysis. The diagonal entries are the rates of each substance dependence, and the lower-diagonal entries are the co-dependence rates of each pair of substances. Comparing a lower-diagonal entry to its two corresponding diagonal entries suggests statistical dependence among the six addictions. For example, there are 1,625 subjects (45%) who are addicted to nicotine and 1,693 subjects (47%) addicted to alcohol. The co-dependence rate of nicotine and alcohol is 32% (1,154 out of 3,627), which is much higher than the rate of about 21% (0.45 × 0.47) expected if these two addictions were statistically independent. This observation supports the existence of comorbidity among the six addictions in this data set.

Table 4.

Dependence and co-dependence rate of six substances. nic: nicotine; mj: marijuana; coc: cocaine; op: opiates; alc: alcohol; oth: other drugs. The percentage in the parenthesis is the dependence or co-dependence rate in the 3,627 unrelated subjects.

Substance Dependence
nic (%) mj (%) coc (%) op (%) alc (%) oth (%)
nic 1625 (45)
mj 486 (13) 620 (17)
coc 686 (19) 464 (13) 937 (26)
op 203 (6) 145 (4) 217 (6) 258 (7)
alc 1154 (32) 577 (16) 820 (23) 238 (7) 1693 (47)
oth 332 (9) 258 (7) 335 (9) 168 (5) 406 (11) 432 (12)
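The nicotine and alcohol comparison discussed before Table 4 can be reproduced from the table counts. The sketch below is only an arithmetic check under the independence assumption; the function name is illustrative, and the counts are taken from Table 4.

```python
def codependence_check(n_total: int, n_a: int, n_b: int, n_both: int):
    """Compare the observed co-dependence rate with the rate expected under independence."""
    expected_rate = (n_a / n_total) * (n_b / n_total)
    observed_rate = n_both / n_total
    return expected_rate, observed_rate, observed_rate / expected_rate


# Counts from Table 4: nicotine, alcohol, and both, among 3,627 subjects.
expected, observed, ratio = codependence_check(3627, 1625, 1693, 1154)
print(f"expected {expected:.1%}, observed {observed:.1%}, ratio {ratio:.2f}")
```

The same function can be applied to any other pair of substances in Table 4 by substituting the two diagonal counts and the corresponding lower-diagonal count.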

Table 5 summarizes the addiction distribution in each subset of data split by race and sex. We can see that the addiction to some categories of substances is homogeneous across the four subpopulations, such as nicotine, with addiction rates 47%, 48%, 47% and 41% respectively. However, other substance dependencies differ by race (e.g., cocaine, 46% and 36% for black men and women versus 27% and 12% for white men and women) and/or sex (e.g., alcohol, 62% and 62% for black and white men versus 39% and 31% for black and white women). Throughout our analysis, the data are divided into four subsets according to sex and race of the subjects. Therefore, we focus on the subset specific analysis, removing the heterogeneity across the subpopulations.

Table 5.

Summary of substance dependence in each subpopulation. nic: nicotine; mj: marijuana; coc: cocaine; op: opiates; alc: alcohol; oth: other drugs. The percentage in the parenthesis is the substance dependence rate in each subpopulation.

Subset Total Substance Dependence
nic (%) mj (%) coc (%) op (%) alc (%) oth (%)
Black Men 535 254 (47) 136 (25) 248 (46) 44 (8) 332 (62) 61 (11)
Black Women 568 271 (48) 78 (14) 206 (36) 35 (6) 224 (39) 37 (7)
White Men 1131 528 (47) 285 (25) 309 (27) 112 (10) 704 (62) 203 (18)
White Women 1393 572 (41) 121 (9) 174 (12) 67 (5) 433 (31) 131 (9)

Total 3627 1625 (45) 620 (17) 937 (26) 258 (7) 1693 (47) 432 (12)

4.3 Single-Trait Results

Before presenting the multiple-trait results, we summarize the single-trait results from logistic regression models and the association tests in Table 6 and Table 7, respectively. The p-values in bold characters indicate that they reach the genome-wide significance level after Bonferroni correction for the number of traits (p-value < (5 × 10⁻⁷)/6) (Burton et al., 2007).
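For concreteness, the Bonferroni-corrected threshold can be computed and applied to the p-values of rs445057 reported in Table 6. This is a small illustrative check rather than part of the original analysis; the variable names are arbitrary.

```python
GENOME_WIDE_LEVEL = 5e-7  # genome-wide significance level used in the paper
N_TRAITS = 6              # six substance-dependence traits

threshold = GENOME_WIDE_LEVEL / N_TRAITS  # Bonferroni correction over traits

# Single-trait p-values for rs445057 (white women), copied from Table 6.
p_values = {
    "nic": 5.9e-1, "mj": 2.2e-2, "coc": 2.0e-4,
    "op": 1.7e-1, "alc": 4.5e-8, "oth": 1.8e-2,
}

significant = [trait for trait, p in p_values.items() if p < threshold]
print(f"threshold = {threshold:.2e}; significant traits: {significant}")
```

The corrected threshold is about 8.3 × 10⁻⁸, so only the alcohol p-value of 4.5 × 10⁻⁸ passes it.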

Table 6.

Significant SNPs in the genome-wide association study of a single substance dependence from logistic regression. nic: nicotine; mj: marijuana; coc: cocaine; op: opiates; alc: alcohol; oth: other drugs.

Chr SNP Gene MAF p-values
nic mj coc op alc oth
White Women
 3 rs445057 FHIT 0.174 5.9e-1 2.2e-2 2.0e-4 1.7e-1 4.5e-8 1.8e-2

Table 7.

Significant SNPs in the genome-wide association study of a single substance dependence from association tests. op: opiates; oth: other drugs.

Chr SNP MAF Gene p-values
T TW,1 TW,2 T^IPW
op
Black Men
2 rs2377339 0.019 NCK2 1.1e-8 1.1e-9 1.4e-9 8.2e-9
16 rs2042360 0.066 9.2e-7 6.5e-8 4.3e-7 9.6e-7
17 rs17544779 0.017 5.6e-8 6.3e-6 1.8e-6 4.6e-8
White Men
13 rs9529180 0.111 PCDH9 1.5e-7 4.6e-7 4.9e-8 1.1e-7
13 rs9540995 0.112 PCDH9 2.2e-7 7.0e-7 5.9e-8 1.5e-7
13 rs9529185 0.111 PCDH9 1.6e-7 4.7e-7 5.2e-8 1.1e-7
Black Women
5 rs2441010 0.012 1.0e-7 1.1e-4 8.2e-5 7.6e-8
7 rs2528381 0.084 UBE2D4 1.9e-5 5.1e-8 2.9e-5 1.6e-5
7 rs1182398 0.014 UBE3C 1.9e-7 5.6e-8 1.2e-6 1.1e-7
10 rs7911634 0.011 PCDH15 7.2e-5 2.7e-9 3.1e-6 6.6e-5
14 rs17197261 0.020 OR10G3 1.3e-5 4.5e-8 1.4e-3 1.0e-5
White Women
19 rs3745816 0.016 EML2 2.2e-5 4.4e-11 2.0e-5 1.3e-5
19 rs4445998 0.015 EML2 1.2e-5 1.2e-11 2.4e-5 6.7e-6
19 rs1545040 0.020 EML2 1.5e-3 5.7e-8 2.5e-3 1.1e-3

oth
Black Women
11 rs11603357 0.041 2.5e-7 2.6e-8 1.1e-8 1.5e-7
White Women
17 rs3098945 0.187 ANKRD13B 4.5e-6 1.8e-8 6.0e-7 1.1e-6

In Table 6, only one SNP achieves the genome-wide significance level (after Bonferroni correction), in the subpopulation of white women: rs445057 in gene FHIT is identified as a significant marker for addiction to alcohol. Very recently, FHIT has been reported to be associated with lifetime cigarette addiction (Antczak et al., 2013). This existing result, combined with our finding that FHIT is associated with alcohol dependence, partially supports the hypothesis that common genes underlie the comorbidity of multiple substance dependencies.

From Table 7, we have identified several significant SNP markers for each of the two phenotypes: addiction to opiates and addiction to other drugs, using the association tests T, TW,1, TW,2, and T^IPW.

For the addiction to opiates, three SNPs are identified to be genome-wide significant in black men. Among these SNPs, rs2377339 is located within gene NCK2, which has a strong association with normal tension glaucoma (Akiyama et al., 2008; Fuse, 2010). Furthermore, a meta-analysis (Bonovas et al., 2004) reported that smoking is a risk factor for glaucoma. These findings indicate some intriguing interplay between smoking and NCK2. A more recent study also verified the association of NCK2 with opiates addiction (Liu et al., 2013).

Three SNPs, all in gene PCDH9, are significantly associated with opiates dependence in white men. PCDH9 was discovered to contain variants that contribute to general addiction vulnerability (Liu et al., 2006), agreeing with our current finding.

Five additional SNPs, located in four known genes, achieve the genome-wide significance in black women. Among these genes, UBE3C has recently been discovered to be one of the four particularly promising candidate genes susceptible to cocaine dependence and major depressive episode (Yang et al., 2011); PCDH15 was also found to be associated with nicotine dependence by multiple human genome-wide association studies (Uhl et al., 2008; Lind et al., 2010). These results partially support our findings about the association between these two genes and opiates dependence.

Three SNPs in gene EML2 are discovered for addiction to opiates in white women. EML2 was found to be one of the potential candidate genes for bipolar disorder comorbid with alcoholism in mice (Le-Niculescu et al., 2008). However, no human studies have suggested the association of EML2 with substance dependence yet.

In addition to opiates, we have two more findings for addiction to other drugs, for which we have not found supporting evidence in the literature. All these single-trait findings can be potentially important for researchers to better understand the genetic components of substance dependence.

4.4 Multiple-Trait Results

The results from the analysis of multivariate traits are summarized in Table 8, with the p-values in bold characters indicating that they reach the genome-wide significance level (p-value < 5 × 10⁻⁷) (Burton et al., 2007). Comparing the four tests for multivariate traits, the advantage of adjusting for important covariates in this data set is fairly clear. Without any adjustment, no SNP can be identified at the genome-wide significance level using test T. In addition, we find that the different adjusted tests complement each other. These three tests (TW,1, TW,2 and T^IPW) have some common findings and also non-overlapping discoveries. The results of the weighted tests might depend on the strength of the genetic signals and/or gene-environment interactions, as illustrated by our simulation studies. Similar conclusions can also be drawn from the comparison among different methods for single-trait results in Table 7.

Table 8.

Significant SNPs in the genome-wide association study of multiple substance dependencies. The symbol * indicates that the same SNP is also found by single-trait analysis in Table 7.

Chr SNP MAF Gene p-values
T TW,1 TW,2 T^IPW
Black Men
2 rs2377339* 0.019 NCK2 1.1e-06 6.2e-08 1.4e-07 9.0e-07
5 rs251133 0.406 STARD4-AS1 5.3e-07 5.2e-06 2.8e-05 4.2e-07
5 rs10483285 0.037 ADCY4 2.4e-03 1.3e-07 5.0e-05 2.0e-03
White Men
3 rs4016435 0.042 CTNNB1 7.3e-07 6.2e-07 1.5e-07 2.6e-07
8 rs1477908 0.177 MMP16 1.1e-05 2.3e-05 2.3e-07 4.1e-06
Black Women
1 rs2175254 0.035 RASAL2 2.6e-05 4.1e-07 1.0e-05 1.7e-05
8 rs10504824 0.014 WWP1 1.1e-06 9.1e-09 2.7e-07 5.9e-07
8 rs17609515 0.014 CPNE3 1.1e-06 9.1e-09 2.7e-07 5.9e-07
10 rs7911634* 0.011 PCDH15 1.7e-04 1.1e-08 1.3e-05 1.6e-04
White Women
2 rs16866493 0.011 6.1e-04 1.9e-07 5.2e-04 3.3e-04
2 rs878167 0.010 1.3e-04 4.8e-08 1.0e-04 6.4e-05
2 rs6731600 0.039 2.1e-05 9.7e-06 7.1e-08 5.2e-06
2 rs6721762 0.039 MPV17 3.2e-05 1.1e-05 2.3e-07 8.7e-06
11 rs955396 0.068 TOLLIP/MUC5B 4.4e-05 1.5e-06 9.3e-08 4.4e-05
19 rs3745816* 0.016 EML2 5.2e-05 8.8e-10 1.7e-04 4.6e-05
19 rs4445998* 0.015 EML2 5.4e-05 3.8e-10 3.1e-04 4.6e-05
19 rs1545040* 0.020 EML2 6.7e-04 1.6e-07 2.4e-03 6.8e-04

Interestingly, we have several common findings between the multiple-trait results in Table 8 and the single-trait results in Table 7. These common genes, such as NCK2, PCDH15, and EML2, can be of particular interest to the addiction research. In the following, we provide a brief overview of the multiple-trait findings.

Three SNPs, rs2377339, rs251133 and rs10483285, which are located in genes NCK2, STARD4-AS1 and ADCY4 respectively, reach the genome-wide significance in black men. In addition to NCK2, previous research has also provided evidence for ADCY4: it is associated with opioid dependence (Wang et al., 2005; Li et al., 2008). All these results support NCK2 and ADCY4 as potentially relevant genes to substance dependence.

Two other SNPs, rs4016435 and rs1477908, in genes CTNNB1 and MMP16, achieve the genome-wide significance level in white men. The gene CTNNB1 has been suggested by microarray studies of nicotine exposure in rats (Sullivan et al., 2004), but this is the first time that this gene has been found to be related to substance dependence in a human study. In addition, MMP16 belongs to a family of genes (matrix metalloproteinases, i.e., MMPs) that is known to play an important role in drug addiction (Wright and Harding, 2009).

Four SNPs located in four different genes are discovered to be associated with substance dependence in black women. Similar to CTNNB1, RASAL2 is also a candidate gene for nicotine dependence from pathway analysis (Sullivan et al., 2004). Furthermore, multiple human genome-wide association studies identified PCDH15 to be associated with nicotine dependence (Uhl et al., 2008; Lind et al., 2010). These existing results provide partial support to our findings.

Eight other SNPs are identified using multiple addictions in white women. Similar to EML2, a previous microarray study in mice has provided evidence that MPV17 is associated with alcohol dependence (Li et al., 2008). However, no human studies have yet suggested an association of these two genes with substance dependence.

Besides the SNPs/genes discussed above, there are other SNPs/genes showing strong evidence of association with substance dependence in our study, and those SNPs/genes warrant further investigation.

5 DISCUSSION

Understanding comorbidity related to addictions is one of the most pressing challenges with enormous public health significance (National Institute on Drug Abuse, 2010). In this work, we studied the genetics of multiple addictions by analyzing the data from the Study of Addiction: Genetics and Environment (SAGE). To properly utilize the information collected by this study, we propose a novel statistical method to incorporate environmental factors into a nonparametric U-statistic (generalized Kendall’s tau) which can handle comorbidity of multiple traits. Compared with directly imposing a weight function on the U-statistic, the idea of inverse probability weighting is more natural and convenient. On the one hand, the inverse probability weighted U-statistic is asymptotically unbiased under the null hypothesis, while the non-weighted and other weighted tests are not necessarily so. On the other hand, the proposed test is free of tuning parameters, which makes it more convenient and accessible than other weighted tests.

A byproduct of our theoretical work is to confirm a previous finding that estimated propensity scores can be preferable to their true values in applications. It is shown that our semiparametric U-statistic has a smaller asymptotic variance with √n-consistent propensity score estimates than with true propensity scores. Although this phenomenon has been revealed before, to the best of our knowledge this is the first time it has been formalized in the areas of U-statistics and genetic association tests. Moreover, a recently proposed multiple-trait association test called “Scaled Multiple-phenotype Association Test” (SMAT) (Schifano et al., 2013) was brought to our attention by a referee. It is noteworthy that SMAT can only handle continuous phenotypes while our proposed test can take any hybrid of dichotomous, ordinal and quantitative traits. Since we focus on binary responses in our current investigation of addictions, we leave the comparison study with SMAT to future work.

We have demonstrated numerical performance of our method, and should note the topics that deserve further research. For example, a key assumption for the distribution of our statistic is that the propensity scores are estimated under the correct parametric model. We assessed the impact of model misspecification in simulation studies, and our empirical results did not reveal a major impact. Nonetheless, a deeper theoretical understanding is still important. Another issue is the choice of genotype coding in our method. As discussed in Section 2.6, our test is not invariant to the genotype coding and we provided a practical suggestion. Although it is not the focus of the current study, it warrants some future investigations.

Applying the new method (together with other methods) to the SAGE data leads to a few interesting findings. Firstly, the multiple-trait analysis reveals new markers that were not identified by the single addiction analysis. When a genetic signal is not strong enough for any single addiction and yet underlies multiple ones, it can become stronger (to a detectable level) by combining different substance dependencies.

Secondly, our analysis of the SAGE data reveals an advantage of adjusting for environmental factors. In the comorbidity analysis, the adjusted tests identified a few genetic variants for addiction, whereas the unadjusted test yielded no findings. This agrees with the observations from our simulation studies. Most of the time, the inclusion of important environmental factors can increase the power to detect either the genetic effect or the gene-environment interaction. Even when there is a genetic effect only (no environmental effects), an unnecessary adjustment for the environmental factors has little effect on the power of a test.

Lastly, tests with different adjustments behave differently. Because the truth is unknown in a real data analysis, we cannot tell which method performs best. In practice, it is rare for one method to be superior to all others in every setting. It is therefore useful that the different adjusted tests complement one another in this data set.

Acknowledgments

The authors would like to thank Zhifa Liu for his assistance in biologically interpreting the findings from the data analysis. The authors also thank the editor, the associate editor, and two anonymous referees for their comments and suggestions that led to considerable improvements of the paper. This research is supported in part by grants R01 DA016750 and R01 DA029081 from the National Institutes of Health (NIH). The dataset used for the analyses described in this manuscript was obtained from dbGaP at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000092.v1.p1 through dbGaP accession number phs000092.v1.p. The data collection was funded by NIH grants U01 HG004422, U01 HG004446, U10 AA008401, P01 CA089392, R01 DA013423, U01 HG004438, and HHSN268200782096C.

A APPENDIX

We split our derivation of Theorem 2 into three steps as follows. The first step is to obtain an asymptotic representation of $\hat{\theta}$. Under regularity conditions, there exists a $\sqrt{n}$-consistent estimator $\hat{\theta}$ of $\theta_0$. The following lemma presents the result, with its proof given in Appendix A.1.

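Lemma 1. Let the parameter space $\Theta$ be an open set. Suppose that there exist some $\delta > 0$ and $c_{\theta_0} > 0$ such that $p_g(z_i; \theta) \in [\delta, 1 - \delta]$ for all $\theta$ satisfying $\|\theta - \theta_0\| \le c_{\theta_0}$, with $g = 0, 1, 2$ and $i = 1, \dots, n$; $\ell_i(\theta)$ is twice continuously differentiable; for each $g = 0, 1, 2$, condition (9) holds, and there exist constants $C_{\theta_0} > 0$ and $\alpha > 0$ such that, for all $\theta$ satisfying $\|\theta - \theta_0\| \le c_{\theta_0}$, condition (10) holds; and there exists a positive definite matrix $I_{\theta_0}$ such that $\frac{1}{n}\sum_{i=1}^{n} I_{\theta_0}(z_i) \to I_{\theta_0}$. Then there exists a root $\hat{\theta}$ of the likelihood equations that has the following representation:

$$\sqrt{n}\,(\hat{\theta} - \theta_0) = I_{\theta_0}^{-1}\, \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi_{\theta_0}(G_i, z_i) + o_p(1). \tag{A.1}$$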

The result of Lemma 1 is fairly standard for a root $\hat{\theta}$ of the likelihood equations in the framework of maximum likelihood; see Theorem 5.21 in van der Vaart (1998) and Theorem 4.17 in Shao (2003) for similar conclusions. The distinctive feature of this lemma is that the samples are independent but not identically distributed, because the inference is conditional on all the covariates. In other words, the covariates are regarded as non-random. This is why conditions (9) and (10) involve the covariates $z_i$, in contrast to the traditional theory. We therefore provide a proof in Appendix A.1 so that the argument is clear and self-contained.
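The representation (A.1) can be checked by simulation. The following sketch is only illustrative and makes assumptions that are not in the paper: the genotype model is taken to be $G_i \mid z_i \sim \mathrm{Binomial}(2, q_i)$ with $\mathrm{logit}(q_i) = \theta_1 + \theta_2 z_i$, the covariates are drawn once and then held fixed, and $I_{\theta_0}$ is replaced by its finite-sample average. The function name and default values are hypothetical.

```python
import numpy as np

def simulate_representation(n=5000, theta0=(-0.5, 0.8), seed=0):
    """Return both sides of (A.1) under an assumed binomial-logit genotype model."""
    rng = np.random.default_rng(seed)
    theta0 = np.asarray(theta0, dtype=float)
    z = rng.normal(size=n)                         # covariates, treated as fixed
    X = np.column_stack([np.ones(n), z])           # design matrix
    q0 = 1.0 / (1.0 + np.exp(-X @ theta0))         # allele probability at theta0
    G = rng.binomial(2, q0)                        # genotypes in {0, 1, 2}

    # Newton-Raphson iterations for a root of the likelihood equations.
    theta = np.zeros(2)
    for _ in range(50):
        q = 1.0 / (1.0 + np.exp(-X @ theta))
        score = X.T @ (G - 2.0 * q)                           # sum_i psi_theta(G_i, z_i)
        info = (X * (2.0 * q * (1.0 - q))[:, None]).T @ X     # sum_i I_theta(z_i)
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < 1e-12:
            break

    # Left- and right-hand sides of (A.1), without the o_p(1) remainder.
    psi_sum = X.T @ (G - 2.0 * q0)
    info0 = (X * (2.0 * q0 * (1.0 - q0))[:, None]).T @ X / n  # average information
    lhs = np.sqrt(n) * (theta - theta0)
    rhs = np.linalg.solve(info0, psi_sum / np.sqrt(n))
    return lhs, rhs

lhs, rhs = simulate_representation()
print("sqrt(n) (theta_hat - theta0):  ", lhs)
print("I^{-1} n^{-1/2} sum_i psi_i:   ", rhs)
```

The two printed vectors should be close for large `n`; their difference plays the role of the $o_p(1)$ term. Note that with unbounded covariates the assumed model does not strictly satisfy the requirement $p_g(z_i;\theta) \in [\delta, 1-\delta]$, so this is a sketch rather than a verification of the lemma's conditions.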

The second step is to investigate the asymptotic joint distribution of $\{U_{\mathrm{IPW}}(\theta_0), \hat{\theta}\}$. The idea becomes clear with the conclusion of Lemma 1, as both $U_{\mathrm{IPW}}(\theta_0)$ and $\hat{\theta} - \theta_0$ can be written in the form of a sum of independent random vectors. Hence, $\{U_{\mathrm{IPW}}'(\theta_0), (\hat{\theta} - \theta_0)'\}'$ becomes a sum of independent random vectors, to which we can apply the central limit theorem. We defer the proof to Appendix A.2 and present the result in the following lemma.

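Lemma 2. In addition to the conditions in Lemma 1, assume that $\lambda_{\max}\left(\sum_{i=1}^{n} u_i u_i'\right) = O(n)$ and $\max_{1 \le i \le n} \lambda_{\max}(\gamma_{i1}\gamma_{i1}' + \gamma_{i2}\gamma_{i2}') = o\left[\lambda_{\min}\left\{\sum_{i=1}^{n} (\gamma_{i1}\gamma_{i1}' + \gamma_{i2}\gamma_{i2}')\right\}\right]$. Then, under the null hypothesis $H_0$,

$$\sqrt{n}\,\Omega_{\theta_0}^{-1/2} \begin{bmatrix} U_{\mathrm{IPW}}(\theta_0) \\ \hat{\theta} - \theta_0 \end{bmatrix} \to N(0, I_{p+d}), \tag{A.2}$$

in distribution, conditioning on all the traits Y = y and covariates Z = z. In (A.2),

$$\Omega_{\theta_0} = \begin{pmatrix} \Sigma_{\theta_0} & \Gamma_{\theta_0} I_{\theta_0}^{-1} \\ I_{\theta_0}^{-1} \Gamma_{\theta_0}' & I_{\theta_0}^{-1} \end{pmatrix},$$

where $\Sigma_{\theta_0}$ and $\Gamma_{\theta_0}$ are defined in Section 2.4.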

Finally, as the last step, the asymptotic distribution of $\hat{U}_{\mathrm{IPW}}$ follows from the joint asymptotic distribution of $U_{\mathrm{IPW}}(\theta_0)$ and $\hat{\theta}$, borrowing the idea from Pierce (1982) and Randles (1982). The proof of this step can be found in Appendix A.3.

A.1 Proof of Lemma 1

In this section, all probability statements and operations are conditional on the covariates. To simplify the notation, however, we still write E(·) or var(·) instead of E(· | Z = z) or var(· | Z = z).

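We first prove that $\sqrt{n}\,(\hat{\theta} - \theta_0) = O_p(1)$. This is implied by the fact that for any $\epsilon > 0$, there exist $C > 0$ and $n_0 > 1$ such that

$$P\{\log \ell(\theta) - \log \ell(\theta_0) < 0 \ \text{for all}\ \theta \in \partial B_n(C)\} \ge 1 - \epsilon, \qquad n > n_0, \tag{A.3}$$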

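where $\log \ell(\theta) = \sum_{i=1}^{n} \log \ell_i(\theta)$ and $\partial B_n(C)$ is the boundary of $B_n(C) = \{\theta : \sqrt{n}\,\|\theta - \theta_0\| \le C\}$. Let $\Psi_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \psi_{\theta}(G_i, z_i)$. A Taylor expansion gives

$$\frac{1}{n}\{\log \ell(\theta) - \log \ell(\theta_0)\} = \Psi_n'(\theta_0)(\theta - \theta_0) + \frac{1}{2}(\theta - \theta_0)' \frac{\partial \Psi_n(\theta^*)}{\partial \theta}(\theta - \theta_0), \tag{A.4}$$

where $\theta^*$ is generic notation for a vector lying between $\theta_0$ and $\theta$. We will show at the end that

$$\Psi_n(\theta_0) = O_p(n^{-1/2}), \qquad \frac{\partial \Psi_n(\theta^*)}{\partial \theta} + I_{\theta_0} = o_p(1). \tag{A.5}$$

Combining (A.4) and (A.5),

$$\frac{1}{n}\{\log \ell(\theta) - \log \ell(\theta_0)\} = \|\theta - \theta_0\|\, O_p(n^{-1/2}) - \frac{1}{2}(\theta - \theta_0)' \{I_{\theta_0} + o_p(1)\}(\theta - \theta_0),$$

therefore (A.3) holds with large enough $C$ and $n_0$. The $\sqrt{n}$-consistency of $\hat{\theta}$ is proved.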

To obtain the asymptotic representation (A.1) of $\hat{\theta}$, we consider the Taylor expansion of $\Psi_n(\hat{\theta})$ at $\theta_0$. On the one hand, $\Psi_n(\hat{\theta}) = 0$ by the definition of a root of the likelihood equations; on the other hand,

$$\Psi_n(\hat{\theta}) = \Psi_n(\theta_0) + \frac{\partial \Psi_n(\theta^*)}{\partial \theta}(\hat{\theta} - \theta_0), \tag{A.6}$$

where $\theta^*$ lies between $\theta_0$ and $\hat{\theta}$. Then the representation (A.1) in Lemma 1 holds by (A.6), $\sqrt{n}\,(\hat{\theta} - \theta_0) = O_p(1)$, and the same result as the second part of (A.5) but with $\theta^*$ denoting a vector between $\theta_0$ and $\hat{\theta}$ (which is proved next).

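At the end, we provide the proof of (A.5). For $\Psi_n(\theta_0)$, it is seen that

$$E\{\Psi_n(\theta_0)\} = 0, \qquad n\,\mathrm{var}\{\Psi_n(\theta_0)\} = \frac{1}{n}\sum_{i=1}^{n} I_{\theta_0}(z_i) \to I_{\theta_0},$$

because of the exchangeability of the partial derivative and integration with respect to a discrete measure. Then, for any $\epsilon > 0$, we can choose $C_\epsilon$ large enough such that

$$P\{\sqrt{n}\,\|\Psi_n(\theta_0)\| > C_\epsilon\} \le C_\epsilon^{-2} E\{n\|\Psi_n(\theta_0)\|^2\} = C_\epsilon^{-2}\, \mathrm{tr}[n\,\mathrm{var}\{\Psi_n(\theta_0)\}] < \epsilon.$$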

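This is the first part of (A.5). For the second part, we need to show it holds for $\theta^*$ satisfying either $\sqrt{n}\,\|\theta^* - \theta_0\| \le C$ or $\sqrt{n}\,\|\theta^* - \theta_0\| = O_p(1)$. In either case, we have that

\begin{align}
\frac{\partial \Psi_n(\theta^*)}{\partial \theta} &= \frac{\partial \Psi_n(\theta_0)}{\partial \theta} + o_p(1), \tag{A.7} \\
E\left\{\frac{\partial \Psi_n(\theta_0)}{\partial \theta}\right\} &= -\frac{1}{n}\sum_{i=1}^{n} I_{\theta_0}(z_i) \to -I_{\theta_0}, \tag{A.8} \\
\mathrm{var}\left\{\frac{\partial \Psi_n(\theta_0)}{\partial \theta}\, c\right\} &= \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}\left[\left\{\frac{\partial}{\partial \theta}\psi_{\theta_0}(G_i, z_i)\right\} c\right] \to 0, \tag{A.9}
\end{align}

for an arbitrary d-dimensional vector $c$. (A.7) follows from the following equation

$$\frac{\partial \Psi_n(\theta)}{\partial \theta} = \frac{1}{n}\sum_{i=1}^{n}\sum_{g=0}^{2} I(G_i = g)\left\{p_g^{-1}(z_i; \theta)\frac{\partial^2}{\partial \theta\,\partial \theta'}p_g(z_i; \theta) - p_g^{-2}(z_i; \theta)\frac{\partial}{\partial \theta}p_g(z_i; \theta)\frac{\partial}{\partial \theta'}p_g(z_i; \theta)\right\}$$

and the conditions (9) and (10) in Lemma 1. (A.8) follows from the exchangeability of the partial derivative and integration with respect to a discrete measure. (A.9) follows from the condition (9) in Lemma 1. By Markov’s inequality, for any $\epsilon > 0$,

\begin{align}
P\left[\left\|\left\{\frac{\partial \Psi_n(\theta_0)}{\partial \theta} + I_{\theta_0}\right\} c\right\| > \epsilon\right]
&\le \epsilon^{-2} E\left[\left\|\left\{\frac{\partial \Psi_n(\theta_0)}{\partial \theta} - E\frac{\partial \Psi_n(\theta_0)}{\partial \theta}\right\} c\right\|^2\right] + \epsilon^{-2} E\left[\left\|\left\{E\frac{\partial \Psi_n(\theta_0)}{\partial \theta} + I_{\theta_0}\right\} c\right\|^2\right] \notag \\
&= \epsilon^{-2}\, \mathrm{tr}\left[\mathrm{var}\left\{\frac{\partial \Psi_n(\theta_0)}{\partial \theta}\, c\right\}\right] + \epsilon^{-2}\left\|\left\{E\frac{\partial \Psi_n(\theta_0)}{\partial \theta} + I_{\theta_0}\right\} c\right\|^2 \to 0. \tag{A.10}
\end{align}

The second part of (A.5) is implied by (A.7) and (A.10).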

A.2 Proof of Lemma 2

In the next two subsections (Sections A.2 and A.3), all probability statements and operations are conditional on the traits and covariates. To simplify the notation, however, we still write E(·) or var(·) instead of E(· | Y = y, Z = z) or var(· | Y = y, Z = z).

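From the Cramér-Wold device, it suffices to find the asymptotic distribution of $c_1' U_{\mathrm{IPW}}(\theta_0) + c_2'(\hat{\theta} - \theta_0)$ for arbitrary p- and d-dimensional vectors $c_1$ and $c_2$. As $\sqrt{n}\, U_{\mathrm{IPW}}(\theta_0) = O_p(1)$ from Theorem 1 and the condition $\lambda_{\max}\left(\sum_{i=1}^{n} u_i u_i'\right) = O(n)$, it is seen that

$$\sqrt{n}\,\{c_1' U_{\mathrm{IPW}}(\theta_0) + c_2'(\hat{\theta} - \theta_0)\} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\frac{2 c_1' u_i G_i}{e_{\theta_0}(z_i)} + c_2' I_{\theta_0}^{-1}\psi_{\theta_0}(G_i, z_i)\right] + o_p(1). \tag{A.11}$$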

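A direct calculation gives its variance

\begin{align}
\sigma_n^2 &= \mathrm{var}\left[\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{\frac{2 c_1' u_i G_i}{e_{\theta_0}(z_i)} + c_2' I_{\theta_0}^{-1}\psi_{\theta_0}(G_i, z_i)\right\}\right] \notag \\
&= c_1'\left[\frac{4}{n}\sum_{i=1}^{n}\frac{u_i u_i'\, \nu_{\theta_0}(z_i)}{e_{\theta_0}^2(z_i)}\right] c_1 + c_2'\left[I_{\theta_0}^{-1}\frac{1}{n}\sum_{i=1}^{n} I_{\theta_0}(z_i)\, I_{\theta_0}^{-1}\right] c_2 \tag{A.12} \\
&\quad + 2 c_1'\left[\frac{2}{n}\sum_{i=1}^{n}\frac{u_i\, E\{G_i \psi_{\theta_0}'(G_i, z_i)\}\, I_{\theta_0}^{-1}}{e_{\theta_0}(z_i)}\right] c_2, \tag{A.13}
\end{align}

where we have in (A.12) that

$$c_2'\left[I_{\theta_0}^{-1}\frac{1}{n}\sum_{i=1}^{n} I_{\theta_0}(z_i)\, I_{\theta_0}^{-1}\right] c_2 \to c_2' I_{\theta_0}^{-1} c_2, \qquad n \to \infty,$$

and in (A.13) that

\begin{align*}
2 c_1'\left[\frac{2}{n}\sum_{i=1}^{n}\frac{u_i\, E\{G_i \psi_{\theta_0}'(G_i, z_i)\}\, I_{\theta_0}^{-1}}{e_{\theta_0}(z_i)}\right] c_2
&= 2 c_1'\left(\frac{2}{n}\sum_{i=1}^{n}\frac{u_i \sum_{g=0}^{2}\left[E\{G_i I(G_i = g)\}\, p_g^{-1}(z_i; \theta_0)\frac{\partial}{\partial \theta'}p_g(z_i; \theta_0)\right] I_{\theta_0}^{-1}}{e_{\theta_0}(z_i)}\right) c_2 \\
&= 2 c_1'\left[\frac{2}{n}\sum_{i=1}^{n}\frac{u_i \sum_{g=0}^{2}\left\{g\frac{\partial}{\partial \theta'}p_g(z_i; \theta_0)\right\} I_{\theta_0}^{-1}}{e_{\theta_0}(z_i)}\right] c_2.
\end{align*}

Therefore,

$$\sigma_n^2 = c_1' \Sigma_{\theta_0} c_1 + c_2' I_{\theta_0}^{-1} c_2 + 2 c_1' \Gamma_{\theta_0} I_{\theta_0}^{-1} c_2 + o(1).$$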
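The limiting expression for $\sigma_n^2$ is the quadratic form $(c_1', c_2')\,\Omega_{\theta_0}\,(c_1', c_2')'$ with $\Omega_{\theta_0}$ as in Lemma 2, which is what links the one-dimensional limit to the joint statement (A.2). A minimal numerical check of this identity is sketched below; the matrices are arbitrary illustrative values (hypothetical, not estimates from any data), and the dimensions `p` and `d` are chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
p, d = 3, 2                                   # illustrative dimensions of U_IPW and theta

# Hypothetical ingredients: symmetric Sigma, arbitrary Gamma, positive definite I.
A = rng.normal(size=(p, p))
Sigma = A @ A.T + np.eye(p)
Gamma = rng.normal(size=(p, d))
B = rng.normal(size=(d, d))
I_theta = B @ B.T + np.eye(d)
I_inv = np.linalg.inv(I_theta)

# Block matrix Omega with the layout given in Lemma 2.
Omega = np.block([[Sigma, Gamma @ I_inv],
                  [I_inv @ Gamma.T, I_inv]])

c1, c2 = rng.normal(size=p), rng.normal(size=d)
c = np.concatenate([c1, c2])

quad_form = c @ Omega @ c
term_sum = c1 @ Sigma @ c1 + c2 @ I_inv @ c2 + 2 * (c1 @ Gamma @ I_inv @ c2)

print(quad_form, term_sum)                    # the two numbers agree up to rounding
```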

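In order to apply the central limit theorem as in Corollary 1.3 in Shao (2003), we need to rewrite (A.11) into

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\frac{2 c_1' u_i G_i}{e_{\theta_0}(z_i)} + c_2' I_{\theta_0}^{-1}\psi_{\theta_0}(G_i, z_i)\right] = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} d_i'\{R_i - E(R_i)\},$$

with $d_i = (d_{i1}, d_{i2})'$, $R_i = \{I(G_i = 1), I(G_i = 2)\}'$, and

\begin{align*}
d_{i1} &= \frac{2 c_1' u_i}{e_{\theta_0}(z_i)} + c_2' I_{\theta_0}^{-1}\left\{p_1^{-1}(z_i; \theta_0)\frac{\partial}{\partial \theta}p_1(z_i; \theta_0) - p_0^{-1}(z_i; \theta_0)\frac{\partial}{\partial \theta}p_0(z_i; \theta_0)\right\} = (2 c_1', c_2' I_{\theta_0}^{-1})\,\gamma_{i1}, \\
d_{i2} &= \frac{4 c_1' u_i}{e_{\theta_0}(z_i)} + c_2' I_{\theta_0}^{-1}\left\{p_2^{-1}(z_i; \theta_0)\frac{\partial}{\partial \theta}p_2(z_i; \theta_0) - p_0^{-1}(z_i; \theta_0)\frac{\partial}{\partial \theta}p_0(z_i; \theta_0)\right\} = (2 c_1', c_2' I_{\theta_0}^{-1})\,\gamma_{i2},
\end{align*}

using the notation introduced in Lemma 2.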

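From the condition $\max_{1 \le i \le n} \lambda_{\max}(\gamma_{i1}\gamma_{i1}' + \gamma_{i2}\gamma_{i2}') = o\left[\lambda_{\min}\left\{\sum_{i=1}^{n} (\gamma_{i1}\gamma_{i1}' + \gamma_{i2}\gamma_{i2}')\right\}\right]$, we see that

$$\frac{\max_{1 \le i \le n} \|d_i\|^2}{\sum_{i=1}^{n} \|d_i\|^2} \to 0.$$

The conditions in Lemma 2 also lead to $\inf_{n,i} \lambda_{\min}\{\mathrm{var}(R_i)\} > 0$ and $\sup_{n,i} E(\|R_i\|^{2+\delta}) < \infty$ for $\delta = 2$. These regularity conditions imply that

$$\frac{1}{\sigma_n}\sqrt{n}\,\{c_1' U_{\mathrm{IPW}}(\theta_0) + c_2'(\hat{\theta} - \theta_0)\} \to N(0, 1)$$

in distribution. If $\Omega_{\theta_0}$ is positive definite, then substituting $(c_1', c_2') = (\tilde{c}_1', \tilde{c}_2')\,\Omega_{\theta_0}^{-1/2}$ for arbitrary $\tilde{c}_1$ and $\tilde{c}_2$ already leads to the result in Lemma 2.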

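The last piece to prove is the positive definiteness of $\Omega_{\theta_0}$. Let $V_i = \mathrm{var}(R_i)$ and $A_i = \mathrm{diag}(2 I_p, I_{\theta_0}^{-1})(\gamma_{i1}, \gamma_{i2})$, then

$$\Omega_{\theta_0} = \frac{1}{n}(A_1, \dots, A_n)\,\mathrm{diag}(V_1, \dots, V_n)\,(A_1, \dots, A_n)' + o(1).$$

We see that $\inf_n[\lambda_{\min}\{\mathrm{diag}(V_1, \dots, V_n)\}] > 0$. In addition, there exists some $\delta_n > 0$ such that

$$\|(x', y')(A_1, \dots, A_n)\|^2 = (2x', y' I_{\theta_0}^{-1})\left\{\sum_{i=1}^{n}(\gamma_{i1}\gamma_{i1}' + \gamma_{i2}\gamma_{i2}')\right\}(2x', y' I_{\theta_0}^{-1})' \ge \delta_n \|(x', y')\|^2,$$

for arbitrary p- and d-dimensional vectors $x$ and $y$. Therefore, for $n$ sufficiently large,

$$(x', y')\,\Omega_{\theta_0}\,(x', y')' \ge \frac{\delta_n}{2n}\,\inf_n[\lambda_{\min}\{\mathrm{diag}(V_1, \dots, V_n)\}]\,\|(x', y')\|^2, \tag{A.14}$$

which implies the positive definiteness of $\Omega_{\theta_0}$.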

A.3 Proof of Theorem 2

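The proof follows the idea of Pierce (1982) and Randles (1982), who provided general guidance for deriving the asymptotic distribution of statistics with estimated parameters. In our situation, the statistic is $\hat{U}_{\mathrm{IPW}} = U_{\mathrm{IPW}}(\hat{\theta})$, where $\hat{\theta}$ are the estimated parameters. The proof starts from the following fact,

$$\hat{U}_{\mathrm{IPW}} = U_{\mathrm{IPW}}(\hat{\theta}) = U_{\mathrm{IPW}}(\theta_0) + \frac{\partial}{\partial \theta'}U_{\mathrm{IPW}}(\theta^*)\,(\hat{\theta} - \theta_0), \tag{A.15}$$

with some $\theta^*$ lying between $\theta_0$ and $\hat{\theta}$. As

$$U_{\mathrm{IPW}}(\theta) = \frac{2}{n-1}\sum_{i=1}^{n}\frac{u_i G_i}{e_{\theta}(Z_i)},$$

it is seen that

$$\frac{\partial}{\partial \theta'}U_{\mathrm{IPW}}(\theta_0) = -\frac{2}{n-1}\sum_{i=1}^{n}\frac{u_i G_i \frac{\partial}{\partial \theta'}e_{\theta_0}(Z_i)}{e_{\theta_0}^2(Z_i)} = -\frac{2}{n-1}\sum_{i=1}^{n}\frac{u_i G_i \sum_{g=0}^{2}\left\{g\frac{\partial}{\partial \theta'}p_g(Z_i; \theta_0)\right\}}{e_{\theta_0}^2(Z_i)} = -\Gamma_{\theta_0}\{1 + o(1)\} + o_p(1). \tag{A.16}$$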
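The first equality in (A.16) can be checked against finite differences. The sketch below is an illustration under assumptions not taken from the paper: the genotype model is a hypothetical binomial-logit model, so that $e_\theta(z) = \sum_g g\,p_g(z;\theta) = 2q(z;\theta)$, and the scores `u` are random placeholders rather than kernel sums of real traits. The function names are made up for the example.

```python
import numpy as np

def expected_genotype(theta, X):
    """e_theta(z) = sum_g g * p_g(z; theta) under an assumed Binomial(2, q) model."""
    q = 1.0 / (1.0 + np.exp(-X @ theta))
    return 2.0 * q

def u_ipw(theta, u, G, X):
    """U_IPW(theta) = 2/(n-1) * sum_i u_i * G_i / e_theta(z_i); u has shape (n, p)."""
    n = len(G)
    weights = G / expected_genotype(theta, X)
    return 2.0 / (n - 1) * (u * weights[:, None]).sum(axis=0)

def u_ipw_jacobian(theta, u, G, X):
    """Analytic p x d derivative of U_IPW, following the first equality in (A.16)."""
    n = len(G)
    q = 1.0 / (1.0 + np.exp(-X @ theta))
    e = 2.0 * q
    de = (2.0 * q * (1.0 - q))[:, None] * X        # row i: d e_theta(z_i) / d theta'
    w = G / e**2                                   # G_i / e_theta(z_i)^2
    return -2.0 / (n - 1) * (u * w[:, None]).T @ de

rng = np.random.default_rng(2)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta = np.array([-0.3, 0.5])
G = rng.binomial(2, 1.0 / (1.0 + np.exp(-X @ theta)))
u = rng.normal(size=(n, p))
u -= u.mean(axis=0)                                # placeholder centred scores

# Central finite differences, one column per component of theta.
h = 1e-6
numeric = np.column_stack([
    (u_ipw(theta + h * e_k, u, G, X) - u_ipw(theta - h * e_k, u, G, X)) / (2 * h)
    for e_k in np.eye(len(theta))
])
analytic = u_ipw_jacobian(theta, u, G, X)
print(np.max(np.abs(numeric - analytic)))          # small, up to finite-difference error
```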

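The equality in (A.16) follows from the facts that

$$E\left\{\frac{\partial}{\partial \theta'}U_{\mathrm{IPW}}(\theta_0)\right\} = -\frac{n}{n-1}\Gamma_{\theta_0}, \quad \text{and} \quad \mathrm{var}\left\{\frac{\partial}{\partial \theta}U_{\mathrm{IPW}}^{(l)}(\theta_0)\right\} = \frac{4}{(n-1)^2}\sum_{i=1}^{n}\frac{\{u_i^{(l)}\}^2\, \nu_{\theta_0}(z_i)\sum_{g=0}^{2}\left\{g\frac{\partial}{\partial \theta}p_g(z_i; \theta_0)\right\}\sum_{g=0}^{2}\left\{g\frac{\partial}{\partial \theta'}p_g(z_i; \theta_0)\right\}}{e_{\theta_0}^4(z_i)} \to 0,$$

due to the condition $\max_{1 \le i \le n}\|u_i\|^2 = o(n)$ and the first part of condition (9). In addition, since the intermediate point $\theta^*$ in (A.15) satisfies $\theta^* - \theta_0 = O_p(n^{-1/2})$,

$$\frac{\partial}{\partial \theta'}U_{\mathrm{IPW}}(\theta^*) - \frac{\partial}{\partial \theta'}U_{\mathrm{IPW}}(\theta_0) = \frac{\partial^2}{\partial \theta\,\partial \theta'}U_{\mathrm{IPW}}(\theta^{**})\,(\theta^* - \theta_0) = o_p(1), \tag{A.17}$$

with $\theta^{**}$ between $\theta_0$ and $\theta^*$. The equality in (A.17) follows from the fact that for each $l = 1, \dots, p$,

$$\frac{\partial^2}{\partial \theta\,\partial \theta'}U_{\mathrm{IPW}}^{(l)}(\theta) = -\frac{2}{n-1}\sum_{i=1}^{n}\frac{u_i^{(l)} G_i \sum_{g=0}^{2}\left\{g\frac{\partial^2}{\partial \theta\,\partial \theta'}p_g(Z_i; \theta)\right\}}{e_{\theta}^2(Z_i)} + \frac{4}{n-1}\sum_{i=1}^{n}\frac{u_i^{(l)} G_i \sum_{g=0}^{2}\left\{g\frac{\partial}{\partial \theta}p_g(Z_i; \theta)\right\}\sum_{g=0}^{2}\left\{g\frac{\partial}{\partial \theta'}p_g(Z_i; \theta)\right\}}{e_{\theta}^3(Z_i)} = o_p(\sqrt{n}),$$

by the condition $\max_{1 \le i \le n}\|u_i\|^2 = o(n)$ and the condition (9). Substituting (A.16) and (A.17) into (A.15) leads to

\begin{align}
\sqrt{n}\,\Lambda_{\theta_0}^{-1/2}\hat{U}_{\mathrm{IPW}} &= \sqrt{n}\,\Lambda_{\theta_0}^{-1/2}U_{\mathrm{IPW}}(\theta_0) - \Lambda_{\theta_0}^{-1/2}\Gamma_{\theta_0}\sqrt{n}\,(\hat{\theta} - \theta_0) + o_p(1) \notag \\
&= \Lambda_{\theta_0}^{-1/2}\sqrt{n}\,\{U_{\mathrm{IPW}}(\theta_0) - \Gamma_{\theta_0}(\hat{\theta} - \theta_0)\} + o_p(1) \tag{A.18} \\
&= \{\Lambda_{\theta_0}^{-1/2}(I_p, -\Gamma_{\theta_0})\,\Omega_{\theta_0}^{1/2}\}\,\sqrt{n}\,\Omega_{\theta_0}^{-1/2}\begin{bmatrix} U_{\mathrm{IPW}}(\theta_0) \\ \hat{\theta} - \theta_0 \end{bmatrix} + o_p(1). \tag{A.19}
\end{align}

The equality in (A.18) follows if

$$\|\Gamma_{\theta_0}\|_2 = O(1) \quad \text{and} \quad \|\Lambda_{\theta_0}^{-1/2}\|_2 = O(1), \tag{A.20}$$

where $\|A\|_2 = \{\lambda_{\max}(A'A)\}^{1/2}$ is the spectral norm for any matrix $A$. We will prove (A.20) at the end. Combining Lemma 2 and the fact that

$$\{\Lambda_{\theta_0}^{-1/2}(I_p, -\Gamma_{\theta_0})\,\Omega_{\theta_0}^{1/2}\}\{\Lambda_{\theta_0}^{-1/2}(I_p, -\Gamma_{\theta_0})\,\Omega_{\theta_0}^{1/2}\}' = I_p,$$

(A.19) leads to the following convergence in distribution:

$$\sqrt{n}\,\Lambda_{\theta_0}^{-1/2}\hat{U}_{\mathrm{IPW}} \to N(0, I_p).$$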
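The variance reduction from estimating the propensity scores can be read off from this step. Taking $\Lambda_{\theta_0} = (I_p, -\Gamma_{\theta_0})\,\Omega_{\theta_0}\,(I_p, -\Gamma_{\theta_0})'$, which is the form implied by the identity used just before the limit (the formal definition is in the main text, outside this appendix), and multiplying out the block form of $\Omega_{\theta_0}$ from Lemma 2 gives the following worked expansion:

```latex
\begin{aligned}
\Lambda_{\theta_0}
  &= (I_p, -\Gamma_{\theta_0})
     \begin{pmatrix}
       \Sigma_{\theta_0} & \Gamma_{\theta_0} I_{\theta_0}^{-1} \\
       I_{\theta_0}^{-1}\Gamma_{\theta_0}' & I_{\theta_0}^{-1}
     \end{pmatrix}
     (I_p, -\Gamma_{\theta_0})' \\
  &= \Sigma_{\theta_0}
     - \Gamma_{\theta_0} I_{\theta_0}^{-1}\Gamma_{\theta_0}'
     - \Gamma_{\theta_0} I_{\theta_0}^{-1}\Gamma_{\theta_0}'
     + \Gamma_{\theta_0} I_{\theta_0}^{-1}\Gamma_{\theta_0}' \\
  &= \Sigma_{\theta_0} - \Gamma_{\theta_0} I_{\theta_0}^{-1}\Gamma_{\theta_0}'.
\end{aligned}
```

Because $\Gamma_{\theta_0} I_{\theta_0}^{-1}\Gamma_{\theta_0}'$ is positive semidefinite, $\Lambda_{\theta_0}$ is no larger than $\Sigma_{\theta_0}$ in the positive semidefinite ordering. If $\Sigma_{\theta_0}$ is the limiting variance of $\sqrt{n}\,U_{\mathrm{IPW}}(\theta_0)$ with known propensity scores, as the first term of (A.12) suggests, this is consistent with the remark in the discussion that estimated propensity scores lead to a smaller asymptotic variance.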

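At the end, we verify (A.20) to complete our proof. There exists a constant $C > 0$ such that

$$\|\Gamma_{\theta_0}\|_2 \le C\sum_{g=0}^{2}\left\|\frac{1}{n}\sum_{i=1}^{n} g\, u_i \frac{\partial}{\partial \theta'}p_g(Z_i; \theta_0)\right\|_2 \le C\sum_{g=0}^{2}\frac{2}{n}\left\|\sum_{i=1}^{n} u_i u_i'\right\|_2^{1/2}\left\|\sum_{i=1}^{n}\frac{\partial}{\partial \theta}p_g(Z_i; \theta_0)\frac{\partial}{\partial \theta'}p_g(Z_i; \theta_0)\right\|_2^{1/2} = \frac{1}{n}\,O(\sqrt{n})\,O(\sqrt{n}) = O(1).$$

Also, for an arbitrary $x \in \mathbb{R}^p$,

$$x' \Lambda_{\theta_0} x = x'(I_p, -\Gamma_{\theta_0})\,\Omega_{\theta_0}\,(I_p, -\Gamma_{\theta_0})' x = (x', -x'\Gamma_{\theta_0})\,\Omega_{\theta_0}\,(x', -x'\Gamma_{\theta_0})' \ge \inf_n\{\lambda_{\min}(\Omega_{\theta_0})\}\,\|x\|^2. \tag{A.21}$$

With the condition $\lambda_{\min}\left\{\sum_{i=1}^{n}(\gamma_{i1}\gamma_{i1}' + \gamma_{i2}\gamma_{i2}')\right\} \ge n\epsilon$ in Theorem 2, $\delta_n$ in (A.14) can be replaced with $n\delta$ for some $\delta > 0$, which in turn implies that $\inf_n\{\lambda_{\min}(\Omega_{\theta_0})\} > 0$. Then we know $\inf_n\{\lambda_{\min}(\Lambda_{\theta_0})\} > 0$ according to (A.21). So $\|\Lambda_{\theta_0}^{-1/2}\|_2 = O(1)$.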

References

  1. Akiyama M, Yatsu K, Ota M, Katsuyama Y, Kashiwagi K, Mabuchi F, Iijima H, Kawase K, Yamamoto T, Nakamura M, Negi A, Sagara T, Kumagai N, Nishida T, Inatani M, Tanihara H, Ohno S, Inoko H, Mizuki N. Microsatellite analysis of the GLC1B locus on chromosome 2 points to NCK2 as a new candidate gene for normal tension glaucoma. British Journal of Ophthalmology. 2008;92:1293–1296. doi: 10.1136/bjo.2008.139980.
  2. Antczak A, Migdalska-Sek M, Pastuszak-Lewandoska D, Czarnecka K, Nawrot E, Domańska D, Kordiak J, Górski P, Brzeziańska E. Significant frequency of allelic imbalance in 3p region covering RARβ and MLH1 loci seems to be essential in molecular non-small cell lung cancer diagnosis. Medical Oncology. 2013;30:1–10. doi: 10.1007/s12032-013-0532-9.
  3. Bierut LJ, Agrawal A, Bucholz KK, Doheny KF, Laurie C, Pugh E, Fisher S, Fox L, Howells W, Bertelsen S, et al. A genome-wide association study of alcohol dependence. Proceedings of the National Academy of Sciences. 2010;107:5082–5087. doi: 10.1073/pnas.0911109107.
  4. Bierut LJ, Madden PA, Breslau N, Johnson EO, Hatsukami D, Pomerleau OF, Swan GE, Rutter J, Bertelsen S, Fox L, et al. Novel genes identified in a high-density genome wide association study for nicotine dependence. Human Molecular Genetics. 2007;16:24–35. doi: 10.1093/hmg/ddl441.
  5. Bierut LJ, Strickland JR, Thompson JR, Afful SE, Cottler LB. Drug use and dependence in cocaine dependent subjects, community-based individuals, and their siblings. Drug and Alcohol Dependence. 2008;95:14–22. doi: 10.1016/j.drugalcdep.2007.11.023.
  6. Bonovas S, Filioussi K, Tsantes A, Peponis V. Epidemiological association between cigarette smoking and primary open-angle glaucoma: a meta-analysis. Public Health. 2004;118:256–261. doi: 10.1016/j.puhe.2003.09.009.
  7. Burton PR, Clayton DG, Cardon LR, Craddock N, Deloukas P, Duncanson A, Kwiatkowski DP, McCarthy MI, Ouwehand WH, Samani NJ, et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911.
  8. Chen X, Cho K, Singer B, Zhang H. The nuclear transcription factor PKNOX2 is a candidate gene for substance dependence in European-origin women. PLoS One. 2011;6:e16002. doi: 10.1371/journal.pone.0016002.
  9. Drgon T, Montoya I, Johnson C, Liu Q-R, Walther D, Hamer D, Uhl GR. Genome-wide association for nicotine dependence and smoking cessation success in NIH research volunteers. Molecular Medicine. 2009;15:21. doi: 10.2119/molmed.2008.00096.
  10. Edenberg HJ, Koller DL, Xuei X, Wetherill L, McClintick JN, Almasy L, Bierut LJ, Bucholz KK, Goate A, Aliev F, et al. Genome-wide association study of alcohol dependence implicates a region on chromosome 11. Alcoholism: Clinical and Experimental Research. 2010;34:840–852. doi: 10.1111/j.1530-0277.2010.01156.x.
  11. Edwards AC, Aliev F, Bierut LJ, Bucholz KK, Edenberg H, Hesselbrock V, Kramer J, Kuperman S, Nurnberger JI, Jr, Schuckit MA, et al. Genome-wide association study of comorbid depressive syndrome and alcohol dependence. Psychiatric Genetics. 2012;22:31–41. doi: 10.1097/YPG.0b013e32834acd07.
  12. Frank J, Cichon S, Treutlein J, Ridinger M, Mattheisen M, Hoffmann P, Herms S, Wodarz N, Soyka M, Zill P, et al. Genome-wide significant association between alcohol dependence and a variant in the ADH gene cluster. Addiction Biology. 2012;17:171–180. doi: 10.1111/j.1369-1600.2011.00395.x.
  13. Fuse N. Genetic bases for glaucoma. The Tohoku Journal of Experimental Medicine. 2010;221:1–10. doi: 10.1620/tjem.221.1.
  14. Gu X, Rosenbaum P. Comparison of multivariate matching methods: structures, distances, and algorithms. Journal of Computational and Graphical Statistics. 1993;2:405–420.
  15. Hartel DM, Schoenbaum EE, Lo Y, Klein RS. Gender differences in illicit substance use among middle-aged drug users with or at risk for HIV infection. Clinical Infectious Diseases. 2006;43:525–531. doi: 10.1086/505978.
  16. Heath AC, Whitfield JB, Martin NG, Pergadia ML, Goate AM, Lind PA, McEvoy BP, Schrage AJ, Grant JD, Chou Y-L, et al. A quantitative-trait genome-wide association study of alcoholism risk in the community: findings and implications. Biological Psychiatry. 2011;70:513–518. doi: 10.1016/j.biopsych.2011.02.028.
  17. Jiang Y, Zhang H. Propensity score-based nonparametric test revealing genetic variants underlying bipolar disorder. Genetic Epidemiology. 2011;35:125–132. doi: 10.1002/gepi.20558.
  18. Johnson C, Drgon T, Liu Q-R, Walther D, Edenberg H, Rice J, Foroud T, Uhl GR. Pooled association genome scanning for alcohol dependence using 104,268 SNPs: validation and use to identify alcoholism vulnerability loci in unrelated individuals from the collaborative study on the genetics of alcoholism. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. 2006;141:844–853. doi: 10.1002/ajmg.b.30346.
  19. Kendall MG. A new measure of rank correlation. Biometrika. 1938;30:81–93.
  20. Kendler KS, Kalsi G, Holmans PA, Sanders AR, Aggen SH, Dick DM, Aliev F, Shi J, Levinson DF, Gejman PV. Genomewide association analysis of symptoms of alcohol dependence in the molecular genetics of schizophrenia (MGS2) control sample. Alcoholism: Clinical and Experimental Research. 2011;35:963–975. doi: 10.1111/j.1530-0277.2010.01427.x.
  21. Laird N, Horvath S, Xu X. Implementing a unified approach to family-based tests of association. Genetic Epidemiology. 2000;19:S36–S42. doi: 10.1002/1098-2272(2000)19:1+<::AID-GEPI6>3.0.CO;2-M.
  22. Le-Niculescu H, McFarland M, Ogden C, Balaraman Y, Patel S, Tan J, Rodd Z, Paulus M, Geyer M, Edenberg H, et al. Phenomic, convergent functional genomic, and biomarker studies in a stress-reactive genetic animal model of bipolar disorder and co-morbid alcoholism. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. 2008;147:134–166. doi: 10.1002/ajmg.b.30707.
  23. Li C-Y, Mao X, Wei L. Genes and (common) pathways underlying drug addiction. PLoS Computational Biology. 2008;4 doi: 10.1371/journal.pcbi.0040002.
  24. Lind PA, Macgregor S, Vink JM, Pergadia ML, Hansell NK, de Moor MH, Smit AB, Hottenga J-J, Richter MM, Heath AC, et al. A genomewide association study of nicotine and alcohol dependence in Australian and Dutch populations. Twin Research and Human Genetics. 2010;13 doi: 10.1375/twin.13.1.10.
  25. Liu Q-R, Drgon T, Johnson C, Walther D, Hess J, Uhl GR. Addiction molecular genetics: 639,401 SNP whole genome association identifies many cell adhesion genes. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. 2006;141:918–925. doi: 10.1002/ajmg.b.30436.
  26. Liu Z, Guo X, Jiang Y, Zhang H. NCK2 is significantly associated with opiates addiction in african-origin men. The Scientific World Journal. 2013:2013. doi: 10.1155/2013/748979.
  27. Lunceford J, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine. 2004;23:2937–2960. doi: 10.1002/sim.1903.
  28. Luo Z, Alvarado GF, Hatsukami DK, Johnson EO, Bierut LJ, Breslau N. Race differences in nicotine dependence in the collaborative genetic study of nicotine dependence (COGEND) Nicotine & Tobacco Research. 2008;10:1223–1230. doi: 10.1080/14622200802163266.
  29. National Institute on Drug Abuse. Comorbidity: Addiction and other mental illnesses. Research Report Series, U.S. Department of Health and Human Services. 2010. NIH Publication Number 10-5771.
  30. Pierce D. The asymptotic effect of substituting estimators for parameters in certain types of statistics. Annals of Statistics. 1982;10:475–478.
  31. Rabinowitz D. A transmission disequilibrium test for quantitative trait loci. Human Heredity. 1997;47:342–350. doi: 10.1159/000154433.
  32. Rabinowitz D, Laird N. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Human Heredity. 2000;50:211–223. doi: 10.1159/000022918.
  33. Randles RH. On the asymptotic normality of statistics with estimated parameters. Annals of Statistics. 1982;10:462–474.
  34. Reich T, Edenberg HJ, Williams JT, Van Eerdewegh P, Foroud T, Hesselbrock V, Schuckit MA, Bucholz K, Porjesz B, Li TK, Conneally PM, Nurnberger JIJ, Tischfield JA, Crowe RR, Cloninger CR, Wu W, Shears S, Carr K, Crose C, Willig C, Begleiter H. Genome-wide search for genes affecting the risk for alcohol dependence. American Journal of Medical Genetics. 1998;81:207–215.
  35. Rice JP, Hartz SM, Agrawal A, Almasy L, Bennett S, Breslau N, Bucholz KK, Doheny KF, Edenberg HJ, Goate AM, et al. CHRNB3 is more strongly associated with Fagerström Test for Cigarette Dependence-based nicotine dependence than cigarettes per day: phenotype definition changes genome-wide association studies results. Addiction. 2012;107:2019–2028. doi: 10.1111/j.1360-0443.2012.03922.x.
  36. Robins J, Hernán M, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–560. doi: 10.1097/00001648-200009000-00011.
  37. Robins J, Mark S, Newey W. Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics. 1992;48:479–495.
  38. Rosenbaum P. Model-based direct adjustment. Journal of the American Statistical Association. 1987;82:387–394.
  39. Rosenbaum P, Rubin D. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
  40. Schifano ED, Li L, Christiani DC, Lin X. Genome-wide association analysis for multiple continuous secondary phenotypes. The American Journal of Human Genetics. 2013;92:744–759. doi: 10.1016/j.ajhg.2013.04.004.
  41. Shao J. Mathematical Statistics. 2nd ed. Springer-Verlag New York, Inc; New York: 2003.
  42. Sullivan PF, Neale BM, van den Oord E, Miles MF, Neale MC, Bulik CM, Joyce PR, Straub RE, Kendler KS. Candidate genes for nicotine dependence via linkage, epistasis, and bioinformatics. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. 2004;126:23–36. doi: 10.1002/ajmg.b.20138.
  43. Treutlein J, Cichon S, Ridinger M, Wodarz N, Soyka M, Zill P, Maier W, Moessner R, Gaebel W, Dahmen N, et al. Genome-wide association study of alcohol dependence. Archives of General Psychiatry. 2009;66:773. doi: 10.1001/archgenpsychiatry.2009.83.
  44. Uhl GR, Liu Q-R, Drgon T, Johnson C, Walther D, Rose JE. Molecular genetics of nicotine dependence and abstinence: whole genome association using 520,000 SNPs. BMC Genetics. 2007;8:10. doi: 10.1186/1471-2156-8-10.
  45. Uhl GR, Liu Q-R, Drgon T, Johnson C, Walther D, Rose JE, David SP, Niaura R, Lerman C. Molecular genetics of successful smoking cessation: convergent genome-wide association study results. Archives of General Psychiatry. 2008;65:683. doi: 10.1001/archpsyc.65.6.683.
  46. van der Vaart AW. Asymptotic Statistics. Cambridge University Press; New York: 1998.
  47. Wang H-Y, Friedman E, Olmstead M, Burns L. Ultra-low-dose naloxone suppresses opioid tolerance, dependence and associated changes in Mu opioid receptor-G protein coupling and G βγ signaling. Neuroscience. 2005;135:247–261. doi: 10.1016/j.neuroscience.2005.06.003.
  48. Wang K-S, Liu X, Zhang Q, Pan Y, Aragam N, Zeng M. A meta-analysis of two genome-wide association studies identifies 3 new loci for alcohol dependence. Journal of Psychiatric Research. 2011;45:1419–1425. doi: 10.1016/j.jpsychires.2011.06.005.
  49. Wang K-S, Liu X, Zhang Q, Zeng M. ANAPC1 and SLCO3A1 are associated with nicotine dependence: Meta-analysis of genome-wide association studies. Drug and Alcohol Dependence. 2012;124:325–332. doi: 10.1016/j.drugalcdep.2012.02.003.
  50. Wang X, Ye Y, Zhang H. Family-based association tests for ordinal traits adjusting for covariates. Genetic Epidemiology. 2006;30:728–736. doi: 10.1002/gepi.20184.
  51. Wright JW, Harding JW. Contributions of matrix metalloproteinases to neural plasticity, habituation, associative learning and drug addiction. Neural Plasticity. 2009:2009. doi: 10.1155/2009/579382.
  52. Yang B-Z, Han S, Kranzler HR, Farrer LA, Gelernter J. A genomewide linkage scan of cocaine dependence and major depressive episode in two populations. Neuropsychopharmacology. 2011;36:2422–2430. doi: 10.1038/npp.2011.122.
  53. Zhang H, Liu C-T, Wang X. An association test for multiple traits based on the generalized Kendall's tau. Journal of the American Statistical Association. 2010;105:473–481. doi: 10.1198/jasa.2009.ap08387.
  54. Zhang H, Wang X, Ye Y. Detection of genes for ordinal traits in nuclear families and a unified approach for association studies. Genetics. 2006;172:693–699. doi: 10.1534/genetics.105.049122.
  55. Zhao H, Rebbeck T, Mitra N. A propensity score approach to correction for bias due to population stratification using genetic and non-genetic factors. Genetic Epidemiology. 2009;33:679–690. doi: 10.1002/gepi.20419.
  56. Zhu W, Jiang Y, Zhang H. Nonparametric covariate-adjusted association tests based on the generalized Kendall's tau. Journal of the American Statistical Association. 2012;107:1–11. doi: 10.1080/01621459.2011.643707.
  57. Zuo L, Zhang F, Zhang H, Zhang X-Y, Wang F, Li C-SR, Lu L, Hong J, Lu L, Krystal J, et al. Genome-wide search for replicable risk gene regions in alcohol and nicotine co-dependence. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. 2012a;159:437–444. doi: 10.1002/ajmg.b.32047.
  58. Zuo L, Zhang X-Y, Wang F, Li C-SR, Lu L, Ye L, Zhang H, Krystal JH, Deng H-W, Luo X. Genome-wide significant association signals in IPO11-HTR1A region specific for alcohol and nicotine codependence. Alcoholism: Clinical and Experimental Research. 2012b doi: 10.1111/acer.12032.
