2025 Jun 17;49(5):e70013. doi: 10.1002/gepi.70013

A Robust Association Test Leveraging Unknown Genetic Interactions: Application to Cystic Fibrosis Lung Disease

Sangook Kim 1,2, Yu‐Chung Lin 1,2, Lisa J Strug 1,2,3,4,5
PMCID: PMC12172146  PMID: 40525595

ABSTRACT

For complex traits such as lung disease in Cystic Fibrosis (CF), Gene x Gene or Gene x Environment interactions can impact disease severity but these remain largely unknown. Unaccounted‐for genetic interactions introduce a distributional shift in the quantitative trait across the genotypic groups. Joint location and scale tests, or tests of full distributional differences across genotype groups, can account for unknown genetic interactions and increase power for gene identification compared with the conventional association test. Here we propose a new joint location and scale test (JLS), a quantile regression‐based JLS (qJLS), that addresses previous limitations. Specifically, qJLS is free of distributional assumptions, thus applies to non‐Gaussian traits; is as powerful as the existing JLS tests under Gaussian traits; and is computationally efficient for genome‐wide association studies (GWAS). Our simulation studies, which model unknown genetic interactions, demonstrate that qJLS is robust to skewed and heavy‐tailed error distributions and is as powerful as other JLS tests in the literature under normality. Without any unknown genetic interaction, qJLS shows a large increase in power with non‐Gaussian traits over conventional association tests and is slightly less powerful under normality. We applied the qJLS method to the Canadian CF Gene Modifier Study (n = 1,997) and identified a genome‐wide significant variant, rs9513900 on chromosome 13, that had not previously been reported to contribute to CF lung disease. qJLS provides a powerful alternative to conventional genetic association tests, where interactions may contribute to a quantitative trait.

1. Introduction

Conventional association tests in genome‐wide association studies (GWAS) aim to detect a change in the conditional mean for a quantitative phenotype across genotypic groups at a genetic polymorphism. For a bi‐allelic single‐nucleotide polymorphism (SNP), Paré et al. (2010) demonstrated that an unknown interaction of a gene with another (G x G) or with an external environmental factor (G x E) can induce a change in the conditional distribution of the phenotype across the genotypic groups, which manifests as heterogeneity in the conditional variance. Hence, an unknown genetic interaction can be detected through a variance test. Since genetic interactions are generally unknown or difficult to measure a priori, this indirect approach, which does not require specification of variables interacting with a given polymorphism, is a convenient way to detect contributing genetic variants that may be obscured by unknown genetic interactions. When the data on the interacting variables are collected, modeling the G x G and G x E interactions is a more powerful alternative (Paré et al. 2010), but it is impractical because the overwhelming number of two‐way G x G or higher‐order interaction combinations introduces a multiple hypothesis testing and computational burden. This can be avoided with indirect variance tests. As a result, there has been a renewed interest in the classical tests for variance heterogeneity, e.g. Bartlett's test (Bartlett 1937) and Levene's test (Levene 1961). These tests were originally developed to verify the underlying assumption in the analysis of variance. More recently, a class of new statistical methods test heterogeneity in both the conditional mean and variance jointly, or test the difference at higher moments or in the phenotypic distribution across the genotypic groups. Through simulation studies, Soave et al. (2015) showed that, in the presence of an unknown genetic interaction, these methods can be more powerful than the conventional location test. The improvement in power is especially important for GWAS of rarer diseases such as Cystic Fibrosis (CF) where the total number of cases available to study is limited.

Several joint location and scale tests have been developed (Cao et al. 2014; Rönnegård and Valdar 2011; Soave et al. 2015; Staley et al. 2021), and they generally jointly model the location and scale of the conditional phenotypic distribution, assuming the errors are normally distributed. The main difference between these joint location and scale tests lies in the estimation of parameters for the conditional variance and the iterative process to fit the joint models. The parameters for the conditional variance are estimated either based on squared residuals or absolute deviations, analogous to Bartlett's test and the Brown‐Forsythe test (Brown and Forsythe 1974), respectively, for discrete predictors. In joint location and scale tests such as the joint location‐scale test (Soave et al. 2015; Soave and Sun 2017) and joint location‐scale score test (Staley et al. 2021), the location and scale parameters are fitted separately whereas, in others such as the likelihood ratio test (Cao et al. 2014) and double generalized linear model (Rönnegård and Valdar 2011), the iterative process cycles between the location model and the scale model, allowing for joint estimation of the location and scale parameters. Although these methods perform well under normality, phenotypes are commonly non‐normal. Rank‐based inverse normal transformation is a popular data transformation technique in genetic epidemiology, as a remedial measure for non‐normal error (Rönnegård and Valdar 2011; Soave et al. 2015; Beasley et al. 2009; McCaw et al. 2020). However, the impact of applying this transformation on the type I error and power of the joint location and scale tests is not well understood. We provide a review of these joint location and scale tests and conduct simulation studies to assess the impact of applying rank‐based inverse normal transformations on the type I error and power, under non‐normality. We then propose an alternative.

Another category of methods aims at detecting changes in the mean, variance and beyond (Hong et al. 2017) or in the unconditional quantiles (Aschard et al. 2013) without requiring any distributional assumption. Although these methods are robust to non‐normal error, they tend to fare much worse than the joint location and scale tests that assume normality when the distribution of the error term is close to a normal distribution (Soave et al. 2015). Additionally, these methods are not designed to incorporate continuous covariates, which can lead to spurious results if these confound the genetic association. Lastly, computational inefficiency makes these methods difficult to implement genome‐wide.

Quantile regression (Koenker and Bassett 1978) provides a natural and robust framework to study the conditional distribution without making any distributional assumptions. Its applications in genomics have been growing in recent years (Briollais and Durrieu 2014). Miao et al. (2022) developed a variance test based on quantile regression to detect an unknown genetic interaction. Here, we propose a new joint location and scale test based on quantile regression that does not require a normality assumption. Our method is robust to non‐normal error and is computationally efficient, while its power under normality is comparable to other joint location and scale tests that assume normality. We compare the performance of these tests to our new test in simulation studies that include skewed and heavy‐tailed error distributions. Finally, we apply our method in a GWAS of lung disease in Canadians with CF.

2. Methods

For illustrative purposes, we consider a simple scenario with a single binary exposure variable interacting with a bi‐allelic SNP. It can trivially be generalized to scenarios with multiple exposure variables of continuous or discrete types, or other polymorphisms. We note that the exposure does not necessarily have to be environmental (i.e. G x E interaction) and can involve another SNP (i.e. G x G interaction). We assume that the unknown interacting variables do not confound the association between G and the response, since confounders that are unaccounted for may lead to spurious findings.

2.1. Notation and Genetic Interaction Model

Let $Y_i$ denote a quantitative trait, $G_i$ the number of minor alleles, $E_i$ a single binary exposure status and $Z_i$ a vector of $p$ covariates for the $i$th individual in a sample of size $N$. We use the bold character to indicate a matrix. For simplicity, we suppress the subscript $i$ whenever it is convenient. The data generating model with an interaction between $G$ and $E$ is specified as follows:

$$Y = \alpha_0 + \alpha_G G + \alpha_E E + Z^T \alpha_Z + \alpha_{GE}\, G \cdot E + \varepsilon \quad \text{(Model 1)}$$

where $\alpha_0$, $\alpha_G$, $\alpha_E$, $\alpha_Z$ and $\alpha_{GE}$ are respectively the intercept, effect of $G$, effect of $E$, effects of $Z$ and the interaction between $G$ and $E$, and $\varepsilon$ is an independent and identically distributed (IID) random error with mean 0 and variance $\sigma^2$. In practice, the exposure $E$ is generally unknown and/or unmeasured a priori. Therefore, the conventional association test ignores this genetic interaction and relies on the following working model:

$$Y = \beta_0 + \beta_G G + Z^T \beta_Z + \varepsilon \quad \text{(Model 2)}$$

where $\beta_0$, $\beta_G$ and $\beta_Z$ are the new intercept, effect of $G$ and effects of $Z$, respectively, and $\varepsilon$ is the new error, assumed to be IID. However, in the presence of an unknown genetic interaction (i.e. $\alpha_{GE} \neq 0$), the new error $\varepsilon$ is no longer identically distributed. We illustrate the effect of the unknown genetic interaction in Figure 1 by simulating data from Model 1 based on a standard normal error.

Figure 1. Illustration of the effect of a binary variable $E$ interacting with a bi‐allelic variant $G$ on a Gaussian phenotype $Y$: the data were simulated based on Model 1 with no covariates $Z$, fixing the main effects $\alpha_G = 0$ and $\alpha_E = 0.3$ with a sample size of 1,000,000. $E$ was generated from a Bernoulli random variable with a probability of success $p_E = 0.3$ and $G$ was generated based on a minor allele frequency MAF = 0.3. In (a), the interaction effect was set to $\alpha_{GE} = 0$ and in (b), the interaction effect was set to $\alpha_{GE} = 2$.

When the genetic interaction is not accounted for, the shape of the phenotypic distributions is shifted across the genotypic groups. In this particular simulation in which the exposure E was binary, the phenotypic distribution is pulled further apart from the unexposed to the exposed group when the number of minor alleles increases, resulting in a bi‐modal shape.

Based on this finding, the unknown genetic interaction can be detected by measuring the shift in the phenotypic distribution across the genotypic groups (Aschard et al. 2013). Since the shift in the phenotypic distribution induces variance heterogeneity, Paré et al. (2010) proposed to capture the unknown genetic interaction by a variance test. They provided an explicit form of the conditional phenotypic variance for a given genotypic group as follows:

$$\mathrm{Var}(Y \mid G = g, Z = z) = \sigma_E^2\,(\alpha_E + \alpha_{GE}\, g)^2 + \sigma^2$$

where $\sigma_E^2$ is the variance of $E$. This follows from Model 1 because, given $G = g$ and $Z = z$, the only random terms are $(\alpha_E + \alpha_{GE}\, g)E$ and the independent error $\varepsilon$. Since the aim of GWAS is to detect any association between SNPs and a given trait, a shift in the phenotypic distributions or variance is of interest. The conventional association test, which targets the location shift only, is not adequate for this purpose. Instead, tests targeting the location and scale shifts jointly or, more generally, distributional shifts can capture more general genotype effects. Furthermore, Soave et al. (2015) showed through simulations that these tests can achieve superior power compared with the conventional test in the presence of an unknown genetic interaction.
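As a rough numerical check of this variance heterogeneity, the sketch below simulates Model 1 without covariates and compares the empirical variance within each genotype group with the value implied by Model 1, where, given G = g, the random part of Y is (αE + αGE·g)E + ε. The parameter values mirror Figure 1(b); the function names, the seed and the per‐group sample size are illustrative assumptions, not part of the original study.

```python
import random
import statistics

def simulate_group(g, n, alpha_g, alpha_e, alpha_ge, p_e, sigma, rng):
    """Draw n phenotypes from Model 1 (no covariates) for a fixed genotype g."""
    ys = []
    for _ in range(n):
        e = 1 if rng.random() < p_e else 0      # unknown binary exposure
        eps = rng.gauss(0.0, sigma)              # IID Gaussian error
        ys.append(alpha_g * g + alpha_e * e + alpha_ge * g * e + eps)
    return ys

def implied_variance(g, alpha_e, alpha_ge, p_e, sigma):
    """Var(Y | G = g) = Var(E) * (alpha_E + alpha_GE * g)^2 + sigma^2."""
    var_e = p_e * (1.0 - p_e)                    # variance of a Bernoulli(p_E)
    return var_e * (alpha_e + alpha_ge * g) ** 2 + sigma ** 2

rng = random.Random(2025)                        # arbitrary seed (assumption)
params = dict(alpha_e=0.3, alpha_ge=2.0, p_e=0.3, sigma=1.0)
results = {}
for g in (0, 1, 2):
    ys = simulate_group(g, 50_000, alpha_g=0.0, rng=rng, **params)
    results[g] = (statistics.pvariance(ys), implied_variance(g, **params))
    print(g, round(results[g][0], 3), round(results[g][1], 3))
```

The empirical and implied variances should agree closely and grow with the number of minor alleles, which is the signal a scale test picks up.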

2.2. Joint Location and Scale Tests and Rank‐Based Inverse Normal Transformation

We provide a review of four joint location and scale tests: likelihood ratio test (LRT) (Cao et al. 2014); double generalized linear model (DGLM) (Rönnegård and Valdar 2011); joint location‐scale (JLS) test (Soave et al. 2015); and joint location‐scale score (JLSS) test (Staley et al. 2021), in Section 3 of Supporting Information. These differ generally by the estimation of scale parameters, the iterative process to fit the joint models and ability to account for the correlation between location and scale parameter estimates. These joint location and scale tests require normality of the error term. When this assumption is violated, a common remedial measure is to transform the data. Rank‐based inverse normal transformation (INT) is a popular data transformation technique in genetic epidemiology, mapping a variable into another that is perfectly normal. The idea of this data transformation is to estimate the empirical quantiles of the variable of interest through its fractional ranks and to apply a quantile function of the normal random variable. The INT is defined as follows:

$$\Phi^{-1}\left(\frac{r_i - c}{N - 2c + 1}\right)$$

where $N$ is the sample size, $r_i$ is the rank of the $i$th observation (so that $r_i/N$ is its fractional rank), $c \in [0,1]$ is an offset term and $\Phi^{-1}(\cdot)$ is the quantile function of the standard normal random variable. Since fractional ranks of 0 or 1 become $-\infty$ or $+\infty$ after the application of the quantile function, respectively, the role of the linear transformation with the offset $c$ is to avoid a value of 0 or 1. A popular offset value is $c = 3/8$ (Blom 1958) but other alternatives exist, such as $c = 0$ (Van der Waerden 1952), $c = 1/3$ (Tukey 1962) and $c = 1/2$ (Bliss 1967). The choice of the offset value is immaterial since these are approximately a linear transformation of each other and close to the expected normal scores (Tukey 1962; Beasley et al. 2009).
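A minimal sketch of the rank‐based INT is given below, using only the Python standard library. It takes r_i to be the ordinary rank (1 to N), which keeps the argument of the quantile function strictly between 0 and 1; the handling of ties by average ranks and the function names are assumptions made for illustration.

```python
from statistics import NormalDist

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1              # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks

def inverse_normal_transform(values, c=3 / 8):
    """Rank-based INT with offset c (c = 3/8 is the Blom offset)."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1))
            for r in average_ranks(values)]

skewed = [0.2, 7.9, 1.1, 0.4, 25.0]
transformed = inverse_normal_transform(skewed)
print([round(v, 3) for v in transformed])
```

The transformed values keep the ordering of the input and are symmetric around zero, whatever the skewness of the original data.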

2.3. Direct INT

Applying the INT directly to the marginal phenotype is referred to as the direct INT (D‐INT) (McCaw et al. 2020). This is perhaps the most popular transformation in GWAS among the different forms of INT since the transformed phenotype perfectly follows the standard normal distribution. The steps for analyses with D‐INT are highlighted as follows:

  1. Apply the INT to Y to obtain YT.
  2. Apply the regression model using YT as the response and G and Z as predictors.

Although the idea may appear appealing, the D‐INT does not guarantee that the normality assumption is met since this distributional assumption in regression analysis applies to the error term, not to the marginal response variable. For this reason, a second approach, indirect INT, is considered.

2.4. Indirect INT

The indirect INT (I‐INT) consists of applying the INT to the residuals of the phenotype after regressing on the covariates Z, which we denote by RY|Z. We review two variants of this type discussed in McCaw et al. (2020). The first, which we refer to as single‐adjusted INT (I‐INT1), uses the INT‐transformed residuals as the response in the regression model with G and Z as predictors. The following steps implement the I‐INT1:

  1. Regress Y on Z and obtain the residuals RY|Z.
  2. Apply the INT to RY|Z to obtain YT.
  3. Apply the regression model using YT as the response and G and Z as predictors.

We note that step 1 orthogonalizes the phenotype Y with respect to the space spanned by the columns of Z, so in principle Z could be omitted from the regression model in step 3. However, due to the nonlinear change by the INT in step 2, the transformed residuals YT are no longer perfectly orthogonal to the space spanned by the columns of Z, which explains the second adjustment for Z in step 3 (Sofer et al. 2019).
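The loss of exact orthogonality can be seen in a small sketch: the residuals from step 1 have zero sample covariance with Z up to rounding error, whereas their INT‐transformed version generally does not. The single covariate, the skewed error, the seed and all function names are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean

def residualize(y, z):
    """Residuals from least-squares regression of y on an intercept and one covariate z."""
    y_bar, z_bar = fmean(y), fmean(z)
    szz = sum((zi - z_bar) ** 2 for zi in z)
    slope = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / szz
    return [yi - y_bar - slope * (zi - z_bar) for zi, yi in zip(z, y)]

def int_transform(values, c=3 / 8):
    """Rank-based INT with the Blom offset; assumes no ties (continuous data)."""
    n = len(values)
    std = NormalDist()
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = std.inv_cdf((rank - c) / (n - 2 * c + 1))
    return out

def covariance(a, b):
    """Sample covariance (divisor n) between two equal-length sequences."""
    a_bar, b_bar = fmean(a), fmean(b)
    return fmean([(ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)])

rng = random.Random(7)                           # arbitrary seed (assumption)
n = 2000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Skewed error: standardized chi-square with 3 degrees of freedom
eps = [(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)) - 3.0) / 6.0 ** 0.5
       for _ in range(n)]
y = [0.3 * zi + ei for zi, ei in zip(z, eps)]

r_y = residualize(y, z)                          # step 1
y_t = int_transform(r_y)                         # step 2
cov_before = covariance(r_y, z)
cov_after = covariance(y_t, z)
print(f"before INT: {cov_before:.2e}, after INT: {cov_after:.2e}")
```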

A second type, referred to as double‐adjusted INT (I‐INT2), consists of orthogonalizing both the phenotype Y and genotype G with respect to the space spanned by the columns of Z, yielding RY|Z and GT, respectively. The INT is applied to the residualized phenotype RY|Z yielding YT. Then, a regression model can be applied using YT as the response and GT as the predictor, omitting Z. The steps for analyzing data with I‐INT2 are shown as follows:

  1. Regress G on Z and obtain the residuals GT.
  2. Regress Y on Z and obtain the residuals RY|Z.
  3. Apply the INT to RY|Z to obtain YT.
  4. Apply the regression model using YT as the response and GT as the only predictor (omitting Z).

Similar to the I‐INT1, the transformed phenotype YT is not perfectly orthogonal to the space spanned by the columns of Z due to the INT in step 3 and, therefore, Z may be required in the regression model in step 4. However, our simulation studies in Section 4 of Supporting Information show that the I‐INT2 without this second adjustment has a good control of type I error overall.
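The four I‐INT2 steps can be sketched as follows for a single covariate. This is only an illustration under stated assumptions: one continuous covariate, no ties in the residuals, a simulated data set with an arbitrary seed, and hypothetical function names; it is not the implementation used in the study.

```python
import random
from statistics import NormalDist, fmean

def residualize(v, z):
    """Residuals from least-squares regression of v on an intercept and one covariate z."""
    v_bar, z_bar = fmean(v), fmean(z)
    szz = sum((zi - z_bar) ** 2 for zi in z)
    slope = sum((zi - z_bar) * (vi - v_bar) for zi, vi in zip(z, v)) / szz
    return [vi - v_bar - slope * (zi - z_bar) for zi, vi in zip(z, v)]

def int_transform(values, c=3 / 8):
    """Rank-based INT with the Blom offset; assumes no ties."""
    n = len(values)
    std = NormalDist()
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = std.inv_cdf((rank - c) / (n - 2 * c + 1))
    return out

def i_int2(y, g, z):
    """Steps 1-4 of I-INT2; returns the slope of Y_T on G_T plus both vectors."""
    g_t = residualize(g, z)                      # step 1: residualize the genotype
    r_y = residualize(y, z)                      # step 2: residualize the phenotype
    y_t = int_transform(r_y)                     # step 3: INT of the residuals
    slope = sum(a * b for a, b in zip(g_t, y_t)) / sum(a * a for a in g_t)  # step 4
    return slope, g_t, y_t

rng = random.Random(11)                          # arbitrary seed (assumption)
n = 2000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
g = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]  # MAF 0.3
eps = [(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)) - 3.0) / 6.0 ** 0.5
       for _ in range(n)]
y = [0.5 * gi + 0.3 * zi + ei for gi, zi, ei in zip(g, z, eps)]

slope, g_t, y_t = i_int2(y, g, z)
print(round(slope, 3))
```

Because GT is centered and orthogonal to Z by construction, the final regression needs neither an intercept nor Z in this sketch.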

We note that the extent of the INT's impact on type I error and power is not well understood. In particular, the INT is a nonlinear transformation that changes the phenotypic scale, and, as a consequence, the performance of scale tests would be expected to be affected by INT. We conducted extensive simulation studies to investigate the effect of INT on the joint location and scale tests. The results showed that applying I‐INT2 leads to a well‐controlled type I error and can improve the power under non‐normality. The details are provided in Section 4 of Supporting Information. However, we are unsure how generalizable these results are outside of the specific simulation scenarios investigated. Thus, in the next section, we propose a new robust approach whose type I error does not depend on a prior distributional assumption and which is as powerful as other location and scale tests under normality. We used quantile regression as the test framework; hence, we refer to this test as the quantile regression‐based joint location‐scale (qJLS) test.

We note that there exist non‐parametric and semi‐parametric tests that are robust, with the type I error unaffected by non‐normal error distributions. However, when the error distributions are moderately close to a normal distribution, these show a deficit in power compared with the joint location and scale tests. In addition, these methods are computationally inefficient and cannot adjust for continuous covariates, which may be confounders such as principal components for genetic ancestry.

2.5. Quantile Regression‐Based Joint Location‐Scale Test (qJLS)

Consider the following linear quantile regression model:

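$$Q_Y(\tau \mid G, Z) = \beta_0(\tau) + \beta_G(\tau)\, G + Z^T \beta_Z(\tau)$$

where $Q_Y(\tau \mid G, Z)$ is the $\tau$th quantile of the phenotype $Y$ for a given SNP $G$ and $p$‐dimensional covariates $Z$. The parameters $\beta_0(\tau)$, $\beta_G(\tau)$ and $\beta_Z(\tau)$ are the quantile‐specific intercept, effect of $G$ and effects of $Z$, respectively. Since the unknown genetic interaction distorts the conditional phenotypic distribution across the genotypic groups, we are interested in detecting any quantile‐specific effect of $G$ across the quantiles. In other words, the hypothesis of interest is the following:

$$H_0: \beta_G(\tau) = 0 \text{ for all } \tau \in (0,1)$$
$$H_1: \beta_G(\tau) \neq 0 \text{ for some } \tau \in (0,1)$$

Under some conditions, Koenker and Bassett (1978) provided the asymptotic distribution of the $L$‐variate estimator $(\hat\beta_G(\tau_1), \hat\beta_G(\tau_2), \ldots, \hat\beta_G(\tau_L))^T$ under the null hypothesis, as follows:

$$\sqrt{n}\begin{pmatrix}\hat\beta_G(\tau_1)\\ \vdots \\ \hat\beta_G(\tau_L)\end{pmatrix} \xrightarrow{d} N\left(0,\ \Omega\, Q_0^{-1}\right)$$

where $\Omega$ is an $L \times L$ matrix with the following elements:

$$\omega_{ij} = \frac{\min(\tau_i, \tau_j) - \tau_i \tau_j}{f(F^{-1}(\tau_i))\, f(F^{-1}(\tau_j))}$$

with $F$, the distribution function of $\varepsilon$ with associated density $f$, and

$$Q_0 = \lim_{n \to \infty} n^{-1} G^T G.$$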
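To make the dependence on the error density concrete, the sketch below evaluates the elements ω_ij for a set of quantile levels when the error is assumed to be standard normal. In practice f is unknown and would have to be estimated; the Gaussian choice, the quantile levels and the function name are assumptions for illustration only.

```python
import math
from statistics import NormalDist

def omega_matrix(taus, dist=NormalDist()):
    """omega_ij = (min(t_i, t_j) - t_i * t_j) / (f(F^-1(t_i)) * f(F^-1(t_j)))."""
    dens = [dist.pdf(dist.inv_cdf(t)) for t in taus]   # f(F^{-1}(tau)) for each level
    return [[(min(ti, tj) - ti * tj) / (di * dj)
             for tj, dj in zip(taus, dens)]
            for ti, di in zip(taus, dens)]

taus = [0.25, 0.5, 0.75]
omega = omega_matrix(taus)
for row in omega:
    print([round(v, 4) for v in row])
```

For the median under a standard normal error the diagonal element equals π/2, and the elements grow toward the tails where the density is small, which is where estimating f becomes unstable.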

Based on this result, a Wald‐type test can be constructed. However, the asymptotic variance‐covariance matrix is a function of the density function of the error ε. Due to the added uncertainty in estimating this density, the Wald‐type test can show an inflated type I error, as seen in our preliminary simulation study (Supporting Information: Table S8). Robust alternatives such as kernel‐based estimators can also be considered. In our preliminary simulation studies, although these robust estimators reduced the type I error, inflation still occurred at quantiles where the data are sparse or when the sample size is relatively small, consistent with the findings by Song et al. (2017). Estimation based on resampling is another alternative but is computationally inefficient for GWAS. A second challenge with this approach is that one cannot consider all possible τ's and needs to select quantiles to test. Our preliminary simulation studies revealed that the choice of quantiles affects the power. As a trivial example, when the distributional shift occurs at specific quantiles, the test showed no power if these quantiles were not selected in the test. The joint location and scale test by Koenker and Xiao (2002) alleviates this problem of quantile selection; however, it requires estimation of the error density and the asymptotic null distribution does not have a closed form.

For these reasons, we implement the rankscore test for quantile regression (Gutenbrunner and Jureckova 1992; Gutenbrunner et al. 1993), which is free of the error density and is computationally efficient. Briefly, suppose the null hypothesis holds (i.e. $\beta_G(\tau) = 0$ for all $\tau$) and the true regression model is the following:

$$Q_Y(\tau \mid Z) = \beta_0(\tau) + Z^T \beta_Z(\tau)$$

The regression rankscore process for the $i$th observation is defined as follows:

$$\hat a_i(\tau) = I\left[Y_i > Z_i^T \hat\beta_Z(\tau)\right]$$

where $I(\cdot)$ is the usual indicator function and $\hat\beta_Z(\tau)$ are the regression parameters estimated under the null hypothesis. In other words, after fitting the model under the null hypothesis, if the $i$th residual lies above the fitted hyperplane, $\hat a_i(\tau)$ takes a value of 1 and 0 otherwise. To obtain the regression rankscore $\hat b_i$, the $\hat a_i(\tau)$ are combined over $\tau \in (0,1)$ with some weighting function $\varphi(\tau)$ called the score function, as follows:

$$\hat b_i^{\varphi} = \int_0^1 \hat a_i(\tau)\, d\varphi(\tau).$$

The score function $\varphi(\tau)$ plays a central role because the efficiency of the rankscore test is dictated by its choice. Choosing an optimal score function requires knowledge of the error distribution and the nature of the distributional shift under the alternative hypothesis. Koenker (2010) considered some useful distributional shifts and provided a list of optimal score functions for a selection of common error distributions. For example, when the error term follows a normal distribution with a constant location shift over all quantiles, the optimal score function is:

$$\varphi_L(\tau) = \Phi^{-1}(\tau)$$

where $\Phi$ is the standard normal distribution function, so that $\Phi^{-1}$ is its quantile function. Another example of interest is when the error follows a normal distribution with a constant scale shift over all quantiles. The corresponding optimal score function for this alternative is:

$$\varphi_S(\tau) = \left(\Phi^{-1}(\tau)\right)^2 - 1.$$
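For intuition, the sketch below computes rankscores for both score functions in the simplest possible setting: a null model with an intercept only and no ties. In that special case regression rankscores reduce to classical rank scores, b_i = n times the integral of φ over ((R_i − 1)/n, R_i/n), where R_i is the rank of Y_i, and both integrals have closed forms for the normal score functions. This reduction, the closed forms used and the function names are assumptions of the sketch; with covariates, the rankscores would come from fitting the null quantile regression model instead.

```python
from statistics import NormalDist

STD = NormalDist()

def _x_and_pdf(t):
    """Return (x, pdf(x)) for x = Phi^{-1}(t); at t = 0 or 1 both terms vanish in the limit."""
    if t <= 0.0 or t >= 1.0:
        return 0.0, 0.0          # pdf -> 0 and x * pdf(x) -> 0, so zeros are safe stand-ins
    x = STD.inv_cdf(t)
    return x, STD.pdf(x)

def rank_scores(y):
    """Location and scale rank scores for an intercept-only null model (no ties)."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    b_loc, b_scale = [0.0] * n, [0.0] * n
    for rank, i in enumerate(order, start=1):
        xa, fa = _x_and_pdf((rank - 1) / n)
        xb, fb = _x_and_pdf(rank / n)
        # Antiderivative of Phi^{-1}(t) is -pdf(Phi^{-1}(t))
        b_loc[i] = n * (fa - fb)
        # Antiderivative of Phi^{-1}(t)^2 - 1 is -x * pdf(x), with x = Phi^{-1}(t)
        b_scale[i] = n * (xa * fa - xb * fb)
    return b_loc, b_scale

y_example = [2.3, -0.7, 0.1, 5.0, 1.2]
b_loc, b_scale = rank_scores(y_example)
print([round(v, 3) for v in b_loc])
print([round(v, 3) for v in b_scale])
```

The location scores increase with the rank of the observation, while the scale scores are positive in both tails and negative in the middle; both sets sum to zero.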

From this, the following test statistic can be computed:

$$T_n = \frac{S_n^2}{\Sigma_n}$$

where $S_n = n^{-1/2} \sum_{i=1}^{n} (G_i - G_i^*)\, \hat b_i^{\varphi}$ and $\Sigma_n = (G - G^*)^T (G - G^*) \left(\int_0^1 \varphi^2(t)\, dt - \bar\varphi^2\right)$. Here, $G^*$ is the orthogonal projection of $G$ onto the space spanned by the columns of the covariates $Z$ and $\bar\varphi = \int_0^1 \varphi(t)\, dt$. Under the null hypothesis and conditions defined in Section 5 of Supporting Information, $T_n$ converges in distribution to a $\chi^2(1)$ distribution. Essentially, the score $S_n$ measures the linear association between $(G - G^*)$, the covariate‐adjusted genotype, and $\hat b^{\varphi}$, the rankscores under the null hypothesis that are weighted by the selected score function. When no such association exists, we expect $S_n$ to be 0, since the rank of the phenotype is not affected by the genotype.
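The factor ∫φ²(t)dt − φ̄² depends only on the score function, so it can be checked numerically. The sketch below approximates it by a midpoint rule for the normal location and scale score functions, and also evaluates their cross term, which is relevant when two scores are combined. The grid size and function name are assumptions; the values are approximations, not exact integrals.

```python
from statistics import NormalDist

def score_function_moments(n_grid=100_000):
    """Midpoint-rule integrals over tau in (0, 1) for phi_L and phi_S."""
    std = NormalDist()
    sum_l = sum_s = sum_ll = sum_ss = sum_ls = 0.0
    for k in range(n_grid):
        x = std.inv_cdf((k + 0.5) / n_grid)      # Phi^{-1}(tau) at the cell midpoint
        phi_l = x                                # location score function
        phi_s = x * x - 1.0                      # scale score function
        sum_l += phi_l
        sum_s += phi_s
        sum_ll += phi_l * phi_l
        sum_ss += phi_s * phi_s
        sum_ls += phi_l * phi_s
    mean_l, mean_s = sum_l / n_grid, sum_s / n_grid
    return {
        "A2_location": sum_ll / n_grid - mean_l ** 2,
        "A2_scale": sum_ss / n_grid - mean_s ** 2,
        "cross": sum_ls / n_grid - mean_l * mean_s,
    }

moments = score_function_moments()
print({k: round(v, 4) for k, v in moments.items()})
```

The results are close to 1 for the location score, 2 for the scale score and 0 for the cross term, matching the second and fourth moments of a standard normal variable.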

Furthermore, when considering the alternative hypothesis, one can be specific about the nature of the distributional shift. For example, when a location shift occurs over all quantiles, the corresponding alternative is $\beta_G(\tau) = \beta_G$ for all $\tau$. Under a local alternative of the type $H_n: \beta_n(\tau) = \beta_G(\tau)/\sqrt{n}$, Gutenbrunner et al. (1993) showed that $T_n$ converges in distribution to a non‐central $\chi^2$ distribution with 1 degree of freedom and the following non‐centrality parameter:

$$\eta = \frac{\int_0^1 f(F^{-1}(\tau))\, \beta_G(\tau)\, d\varphi(\tau)}{\Sigma_n}.$$

From this result, the test optimality depends on the score function, the error distribution and the local alternative that is formulated for specific shifts in the distribution. When the error distribution and the nature of the distributional shift are known, the optimal score function can be derived by maximizing the non‐centrality parameter (Koenker 2010). However, the error distribution and the type of distributional shift are generally unknown a priori. In the case of GWAS with unknown interactions, we chose to maximize the efficiency for a location and scale shift under normality, and combine both rankscore tests using their joint asymptotic distribution. We refer to this test as qJLS. The score vector of the qJLS test is:

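$$S_n = n^{-1/2} \left(\sum_{i=1}^{n} (G_i - G_i^*)\, \hat b_i^{\varphi_L},\ \sum_{i=1}^{n} (G_i - G_i^*)\, \hat b_i^{\varphi_S}\right)^T$$

where $\varphi_L$ and $\varphi_S$, as defined above, are the optimal score functions for the location shift and scale shift, respectively, under a normal error. The following is the test statistic for the qJLS test:

$$T_n = S_n^T \Sigma_n^{-1} S_n$$

where

$$\Sigma_n = \begin{pmatrix} (G - G^*)^T (G - G^*) & 0 \\ 0 & 2\,(G - G^*)^T (G - G^*) \end{pmatrix}.$$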

Under the conditions defined in Section 5 of Supporting Information, the test statistic Tn converges in distribution to χ2(2) under the null hypothesis. The proof is provided in Section 5 of Supporting Information.
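A minimal sketch of a qJLS‐type statistic is shown below for the special case of an intercept‐only null model, so that G* is simply the mean of G and the rankscores reduce to classical rank scores. Both the score sums and the variance terms are left unnormalised so that factors of n cancel in the ratio; that scaling, the closed‐form rank scores, the simulated data and all names are assumptions of this illustration and not the published qJLS package. The p‐value uses the χ²(2) survival function exp(−T/2).

```python
import math
import random
from statistics import NormalDist, fmean

STD = NormalDist()

def _x_and_pdf(t):
    """Return (x, pdf(x)) for x = Phi^{-1}(t); both boundary terms vanish at t = 0 or 1."""
    if t <= 0.0 or t >= 1.0:
        return 0.0, 0.0
    x = STD.inv_cdf(t)
    return x, STD.pdf(x)

def rank_scores(y):
    """Location and scale rank scores for an intercept-only null model (no ties)."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    b_loc, b_scale = [0.0] * n, [0.0] * n
    for rank, i in enumerate(order, start=1):
        xa, fa = _x_and_pdf((rank - 1) / n)
        xb, fb = _x_and_pdf(rank / n)
        b_loc[i] = n * (fa - fb)
        b_scale[i] = n * (xa * fa - xb * fb)
    return b_loc, b_scale

def qjls_statistic(y, g):
    """Joint location-scale rankscore statistic and its chi-square(2) p-value."""
    b_loc, b_scale = rank_scores(y)
    g_bar = fmean(g)
    g_c = [gi - g_bar for gi in g]               # G - G*, with G* = mean(G) here
    ss_g = sum(v * v for v in g_c)
    s_loc = sum(v * b for v, b in zip(g_c, b_loc))
    s_scale = sum(v * b for v, b in zip(g_c, b_scale))
    # Diagonal covariance: variance factors 1 (location) and 2 (scale)
    t_n = s_loc ** 2 / ss_g + s_scale ** 2 / (2.0 * ss_g)
    return t_n, math.exp(-t_n / 2.0)

rng = random.Random(3)                           # arbitrary seed (assumption)
n = 2000
g = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
e = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
y = [0.3 * ei + 1.0 * gi * ei + rng.gauss(0.0, 1.0) for gi, ei in zip(g, e)]

t_n, p_value = qjls_statistic(y, g)
print(round(t_n, 2), p_value)
```

Because the statistic depends on the phenotype only through its ranks, any strictly increasing transformation of Y leaves it unchanged, which is one way to see why the null distribution does not involve the error distribution.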

We note that the asymptotic distribution of the test statistic under the null hypothesis does not contain any parameters from the error distribution. Therefore, the type I error is independent of the error distribution. On the other hand, the score functions were chosen based on test optimality under a normal distribution. In other words, under a non‐normal error, the type I error of the qJLS test is still well‐controlled, but the power of the test is affected due to our choice of the score functions. In comparison, the other joint location and scale tests reviewed in Section 3 of Supporting Information assume normality and show an inflated type I error when the error distribution departs from normality (Section 4 of Supporting Information). Although I‐INT2 showed control of the type I error and/or improvements in statistical power, its statistical properties have yet to be defined formally since the simulation studies only included a limited number of scenarios. In contrast, qJLS guarantees a well‐controlled type I error regardless of the error distribution, a main advantage over the other joint location and scale tests. The qJLS software package is publicly available from the Strug Lab, along with the simulation code (https://github.com/strug-hub/qJLS).

2.6. Simulation Studies

We note that most simulation studies in the literature used linear heteroscedastic models as a data generating mechanism, which can only generate a location and/or scale shift. However, such a setting is limited since an unknown genetic interaction may modify the entire shape of the conditional distribution of the phenotype, beyond the location or scale. Here, we used the genetic interaction model (Model 1) as described in Paré et al. (2010). We extended the simulation settings by Aschard et al. (2013) to include non‐normal errors, quantile‐specific interaction effects and unknown background heterogeneity. Lastly, we used linear location and scale models to generate a pure location or scale shift without any genetic interaction, ideal for conventional location testing and scale testing, and to evaluate the loss of power by using joint location and scale tests compared with these conventional tests.

2.7. Power

2.7.1. Single Unknown Interaction

The following is the data generating model:

$$Y = \beta_G G + \beta_E E + \beta_{GE}\, G E + \beta_Z Z + \varepsilon \quad \text{(Model 3)}$$

The unknown exposure $E$ was a Bernoulli random variable with a probability of success $p_E = 0.3$. We generated the number of minor alleles $G$ with a minor allele frequency of $p = 0.3$ and the covariate $Z$ from a standard normal distribution such that these variables were weakly correlated, based on the approach in Cao et al. (2014). Briefly, we first sampled from a bivariate normal distribution $N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.1 \\ 0.1 & 1 \end{pmatrix}\right)$. One of these two variables was discretized based on the minor allele frequency $p = 0.3$, yielding $G$.
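One way to generate weakly correlated G and Z along these lines is sketched below. The text does not spell out the discretization rule, so the sketch assumes cut points on the latent normal scale that reproduce Hardy‐Weinberg genotype frequencies for the given minor allele frequency; that rule, the seed and the function names are assumptions.

```python
import random
from statistics import NormalDist, fmean

def simulate_g_z(n, maf=0.3, rho=0.1, rng=None):
    """Draw (G, Z) from a bivariate normal with correlation rho, then discretize G."""
    rng = rng or random.Random()
    std = NormalDist()
    # Assumed rule: Hardy-Weinberg genotype probabilities define the cut points
    cut1 = std.inv_cdf((1 - maf) ** 2)           # P(G = 0) = (1 - p)^2
    cut2 = std.inv_cdf(1 - maf ** 2)             # P(G <= 1) = 1 - p^2
    gs, zs = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                  # latent variable behind G
        z = rho * u + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)  # corr(u, Z) = rho
        gs.append(0 if u < cut1 else (1 if u < cut2 else 2))
        zs.append(z)
    return gs, zs

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    a_bar, b_bar = fmean(a), fmean(b)
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

g_sim, z_sim = simulate_g_z(20_000, rng=random.Random(5))
allele_freq = fmean(g_sim) / 2
corr_gz = pearson(g_sim, z_sim)
print(round(allele_freq, 3), round(corr_gz, 3))
```

Discretizing the latent variable attenuates the correlation slightly, so the correlation between G and Z ends up a little below 0.1.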

For each simulation study, we sampled the error term $\varepsilon$ from (i) standard normal $N(0,1)$; (ii) $\chi^2(3)$; and (iii) empirical error distribution from the data application in Section 4. We then standardized the error term by subtracting its expected value and dividing by its standard deviation. The selected density functions of the error are displayed in Figure 2.
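The standardization can be sketched for the two parametric error distributions as follows; a χ²(3) variable has mean 3 and variance 6, so it is centered by 3 and scaled by √6. The empirical error distribution from the data application is not reproduced here, and the function name, seed and sample size are assumptions.

```python
import random
from statistics import fmean, pvariance

def standardized_error(kind, rng):
    """Draw one error term with mean 0 and variance 1."""
    if kind == "normal":
        return rng.gauss(0.0, 1.0)
    if kind == "chisq3":
        x = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))   # chi-square(3) draw
        return (x - 3.0) / 6.0 ** 0.5                          # mean 3, variance 6
    raise ValueError(f"unknown error distribution: {kind}")

rng = random.Random(42)                          # arbitrary seed (assumption)
summary = {}
for kind in ("normal", "chisq3"):
    draws = [standardized_error(kind, rng) for _ in range(50_000)]
    mean, var = fmean(draws), pvariance(draws)
    skew = fmean([(d - mean) ** 3 for d in draws]) / var ** 1.5
    summary[kind] = (mean, var, skew)
    print(kind, round(mean, 3), round(var, 3), round(skew, 3))
```

Both errors end up on the same scale, so differences in test behaviour between them reflect shape (here, skewness) rather than variance.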

Figure 2. Density functions of the error term $\varepsilon$ for the simulation models: the selected random variables included standardized normal (in green), $\chi^2(3)$ (in blue) and empirical distribution of the residualized top SNP in Section 4 (in red).

By fixing the main effects, we varied the interaction effect $\beta_{GE}$ from −1 to 1 by an increment of 0.1 and estimated the power at each increment at the genome‐wide significance level of $5 \times 10^{-8}$ based on 100,000 replicates. We fixed the main effects at $\beta_G = 0.01$; $\beta_E = \mathrm{sign}(\beta_{GE}) \times 0.3$; and $\beta_Z = 0.3$.

2.7.2. Single Unknown Quantile‐Specific Interaction

We modified Model 3 to induce an unknown genetic interaction that affects the upper and lower quantiles of the error term as follows:

$$Y = \beta_G G + \beta_E E - \beta_{GE}\, G E\, I(F(\varepsilon) < \tau_L) + \beta_{GE}\, G E\, I(F(\varepsilon) > \tau_U) + \beta_Z Z + \varepsilon \quad \text{(Model 4)}$$

where $F(\cdot)$ is the distribution function of the error term $\varepsilon$, $I(\cdot)$ is the usual indicator function, and $\tau_U$ and $\tau_L$ are some upper and lower quantiles, respectively. Essentially, in this setting, the interaction only affects the lower quantiles below $\tau_L$, with a negative effect $-\beta_{GE}$, and the upper quantiles above $\tau_U$, with a positive effect $\beta_{GE}$. We fixed $\tau_U = 0.8$ and $\tau_L = 0.3$. The simulation settings were kept the same as in the scenario with a single unknown variable above except that we varied the interaction effect $\beta_{GE}$ from 0 to 1 by an increment of 0.1.
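The quantile‐specific interaction can be written as a small function of one observation. The sketch assumes a standard normal error, so that F is the normal distribution function, and uses an illustrative interaction effect of 0.5; the function name and default arguments are assumptions.

```python
from statistics import NormalDist

def model4_response(g, e, z, eps, beta_g=0.01, beta_e=0.3, beta_ge=0.5,
                    beta_z=0.3, tau_l=0.3, tau_u=0.8,
                    error_cdf=NormalDist().cdf):
    """Phenotype under Model 4 for one individual with error draw eps."""
    u = error_cdf(eps)                           # quantile level of the error draw
    if u < tau_l:
        interaction = -beta_ge * g * e           # negative effect in the lower tail
    elif u > tau_u:
        interaction = beta_ge * g * e            # positive effect in the upper tail
    else:
        interaction = 0.0                        # no interaction in between
    return beta_g * g + beta_e * e + interaction + beta_z * z + eps

# Exposed carrier of one minor allele, covariate at zero, three error draws
for eps in (-1.0, 0.0, 1.5):
    print(eps, round(model4_response(g=1, e=1, z=0.0, eps=eps), 2))
```

The interaction pushes low error draws lower and high error draws higher among exposed carriers, so it stretches the tails of the conditional distribution while leaving the middle quantiles untouched.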

2.7.3. Two Unknown Interactions

The following model includes two binary variables E1 and E2 interacting with G:

$$Y = \beta_{E_1} E_1 + \beta_{E_2} E_2 + \beta_{GE_1}\, G E_1 + \beta_{GE_2}\, G E_2 + \beta_Z Z + \varepsilon \quad \text{(Model 5)}$$

where we sampled $E_1$ and $E_2$ independently from a Bernoulli distribution with an equal probability of success $p_{E_1} = p_{E_2} = 0.3$. We considered two scenarios in which (i) $\beta_{GE_1} = 0.2$; and (ii) $\beta_{GE_1} = \beta_{GE_2}$. For the former scenario, we varied $\beta_{GE_2}$ from −1 to 1 by an increment of 0.1 and, for the latter, from 0 to 1 by an increment of 0.1. The main effects were fixed at $\beta_{E_1} = 0.3$. The remaining simulation parameters were kept the same as in the scenario with a single unknown interaction.

2.8. Type I Error

2.8.1. Null Hypothesis

We generated data under the following model by setting $\beta_G = \beta_{GE} = 0$ in Model 3:

$$Y = \beta_E E + \beta_Z Z + \varepsilon \quad \text{(Model 6)}$$

We kept the same setting as in the scenario with a single unknown interaction except that $G$ was no longer correlated with $Z$ by sampling independently. Since $Z$ has an effect on $Y$ whereas $G$ does not, we removed the correlation between $Z$ and $G$ to maintain a coherent model. Additionally, for each simulation study, we varied the minor allele frequency ($p \in \{0.01, 0.05, 0.1, 0.3\}$) and the sample size ($N \in \{200, 500, 1000, 2000\}$) to study the impact of the minor allele frequency and sample size on the type I error for the tests considered. The type I error was estimated based on 1,000,000 replicates.

2.8.2. Null Hypothesis With Unknown Background Heterogeneity

Corty and Valdar (2018) considered some unknown source of heterogeneity other than the genetic component, which they termed a “background heterogeneity”, as opposed to the “foreground heterogeneity” that is of interest to researchers. This nuisance, independent from G, may affect the conditional distribution of the phenotype including its location and scale but, since it is unknown, it is left unspecified in the statistical model. For example, forced expiratory volume in 1 second (FEV1) in Canadians with CF is heterogeneous across ages (Kim et al. 2018); hence, age can be a source of background heterogeneity if left unspecified. We study the effect of the background heterogeneity on the type I error with the following model:

$$Y = \beta_E E + \beta_Z Z + \beta_{Z_L} Z_L + \varepsilon \sqrt{\gamma_{Z_L} Z_L + \gamma_0} \quad \text{(Model 7)}$$

where $Z_L$ is the unknown source of background heterogeneity that can affect the location and/or scale of the phenotype $Y$ if $\beta_{Z_L}$ and $\gamma_{Z_L}$ are non‐zero. We fixed $\beta_{Z_L} = 0.3$, $\gamma_{Z_L} = 0.5$ and $\gamma_0 = 1$, inducing location and scale shifts simultaneously. When estimating the type I error, we omitted $Z_L$ in the model as it is an unknown source. We note that, in this setting, the estimated parameters for the conditional mean and the parameters for the conditional variance are dependent due to their mutual dependence on $Z_L$.

2.9. Comparison With Conventional Association Tests

In the presence of unknown genetic interactions, joint location and scale tests may have superior power compared with the conventional location or scale association tests. Here, we study the loss of power compared with the conventional association tests when there is no such interaction. We simulated data under a pure location shift as follows:

$Y = \beta_G G + \beta_Z Z + \varepsilon$ (Model 8a)

We varied the main effect βG from −0.5 to 0.5 by an increment of 0.1. The remaining simulation parameters are kept the same as in the scenario with a single unknown interaction. We compared the power of the joint location and scale tests with the classical linear regression which is optimal for this scenario when the error follows a Gaussian distribution.

Similarly, we generated a pure scale shift as follows:

$Y = \varepsilon \sqrt{\gamma_G G + \gamma_Z Z}$ (Model 8b)

where we applied a standard normal quantile function to Z so that $Z \sim \mathrm{Unif}(0,1)$. We varied the scale effect $\gamma_G$ from 0 to 0.5 by an increment of 0.1, keeping other simulation parameters the same as in the scenario with a single unknown interaction. We used the generalized Levene's test (Soave and Sun 2017) as the standard and compared its power with that of the joint location and scale tests.
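For orientation, the idea behind a Levene-type scale test can be sketched with the classical median-centred (Brown-Forsythe) statistic for discrete genotype groups. This is not the generalized Levene's test of Soave and Sun (2017) used in the study, which is regression-based; it is a simplified stand-in, and the function name is hypothetical.

```python
from statistics import median


def brown_forsythe_statistic(y, g):
    """One-way ANOVA F statistic on absolute deviations from group medians.

    Returns (F, df_between, df_within). Assumes at least two groups and
    non-zero within-group variation of the absolute deviations.
    """
    groups = {}
    for value, label in zip(y, g):
        groups.setdefault(label, []).append(value)

    # Absolute deviation of each observation from its own group's median.
    deviations = {
        label: [abs(v - median(values)) for v in values]
        for label, values in groups.items()
    }

    n = len(y)
    k = len(deviations)
    grand_mean = sum(sum(d) for d in deviations.values()) / n
    means = {label: sum(d) / len(d) for label, d in deviations.items()}

    between = sum(len(d) * (means[label] - grand_mean) ** 2
                  for label, d in deviations.items())
    within = sum((x - means[label]) ** 2
                 for label, d in deviations.items() for x in d)
    return (between / (k - 1)) / (within / (n - k)), k - 1, n - k


# Toy data: same centre in both groups, three-fold larger spread in group 1.
y = [-1.0, 0.0, 1.0, -3.0, 0.0, 3.0]
g = [0, 0, 0, 1, 1, 1]
f_stat, df1, df2 = brown_forsythe_statistic(y, g)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) degrees of freedom")
```

The statistic is zero when the groups differ only in location, which is why a pure location shift such as Model 8a carries no signal for a scale test of this kind.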

2.10. Comparison of Methods

Using simulation, we compared the performance of the qJLS test with that of existing joint location and scale tests (reviewed in Section 3 of Supporting Information): the double generalized linear model (DGLM), the joint location‐scale test (JLS) and the joint location‐scale score test (JLS‐Score). The DGLM, JLS and JLS‐Score tests require normality of the error term. Since an unknown interaction alters the shape of the distribution, the residuals will depart from normality, as seen on a normal QQ‐plot or through statistical tests, even if the error term in Model 3 follows a normal distribution. On this basis, we applied the I‐INT2 as a remedial measure for all error distributions, including the normal distribution. Our simulation studies in Section 4 of Supporting Information showed that the I‐INT2 removed the type I error inflation with skewed or heavy‐tailed error distributions, whereas the other variants, D‐INT and single‐adjusted indirect INT (I‐INT1), still resulted in an inflated type I error or in numerical instability in some scenarios. Therefore, we implemented the I‐INT2 as the transformation function for the DGLM and JLS tests. Our preliminary simulation results showed that the JLS‐Score test remained robust in all simulation scenarios that we considered (Table S12 in Supporting Information). However, without any data transformation, its power under the alternative hypothesis was considerably reduced in scenarios where the error term departed severely from normality. On this basis, we applied an INT for the JLS‐Score test but also considered the case without any data transformation. We chose the I‐INT2 solely for comparison purposes. We note that the power of the JLS‐Score test varied with the type of INT, error distribution and type of distributional shift in our simulation studies in Section 4 of Supporting Information, and no single INT led to the highest power.

We did not apply any transformation function for qJLS since the asymptotic distribution of its test statistic under the null hypothesis is independent of the error distribution and, therefore, the type I error is well‐controlled regardless of the error distribution. For the simulation studies with Model 8a and Model 8b, which are based on a pure location shift and a pure scale shift without any interaction, we did not apply any transformation when the error was normal since normality actually holds in these scenarios. When the error was non‐normal, we applied the direct INT (D‐INT) to the conventional location and scale tests, since this is commonly used in GWAS, and the I‐INT2 to the joint location and scale tests except the qJLS test.
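The rank-based inverse normal transformation underlying these variants can be sketched as below. The sketch shows only the direct form applied to a raw trait; in the indirect variants the same mapping would be applied to covariate-adjusted residuals instead. The Blom offset of 3/8 and the average-rank handling of ties are assumptions made for illustration, since the text does not state which were used, and the function name is hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.375):
    """Map values to normal scores via their ranks (ties get average ranks).

    offset=0.375 is the Blom constant; this choice is an assumption.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / (n - 2.0 * offset + 1.0))
            for r in ranks]


# A right-skewed toy trait becomes symmetric around zero after transformation.
skewed = [0.1, 0.2, 0.4, 1.5, 9.0]
transformed = rank_inverse_normal(skewed)
print([round(z, 3) for z in transformed])
```

Because the output depends on the data only through ranks, the transformed trait has the same shape whatever the original error distribution, which is the property the normality-based tests rely on.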

3. Results

3.1. Type I Error

Table 1 displays the empirical type I error, varying nominal levels and error distributions under Model 6 with minor allele frequency of 0.3 and sample size N=2,000.

Table 1.

Empirical type I error under Model 6 varying the error distribution.

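| Error distribution | Nominal p‐value | DGLM (no transformation) | JLS (no transformation) | JLS‐Score (no transformation) | qJLS (no transformation) | DGLM (I‐INT2) | JLS (I‐INT2) | JLS‐Score (I‐INT2) |
|---|---|---|---|---|---|---|---|---|
| Normal | $5\times10^{-2}$ | 0.050225 | 0.049610 | 0.049605 | 0.049219 | 0.048864 | 0.049365 | 0.049581 |
| | $5\times10^{-3}$ | 0.005031 | 0.004939 | 0.004921 | 0.004848 | 0.004747 | 0.004880 | 0.004916 |
| | $5\times10^{-4}$ | 0.000506 | 0.000477 | 0.000480 | 0.000478 | 0.000461 | 0.000472 | 0.000480 |
| | $5\times10^{-5}$ | 0.000063 | 0.000061 | 0.000056 | 0.000051 | 0.000053 | 0.000054 | 0.000056 |
| $\chi^2$ | $5\times10^{-2}$ | 0.204755 | 0.079244 | 0.049834 | 0.049380 | 0.048960 | 0.049550 | 0.049751 |
| | $5\times10^{-3}$ | 0.084472 | 0.018055 | 0.004860 | 0.004745 | 0.004667 | 0.004844 | 0.004797 |
| | $5\times10^{-4}$ | 0.037523 | 0.004398 | 0.000473 | 0.000504 | 0.000482 | 0.000483 | 0.000491 |
| | $5\times10^{-5}$ | 0.017276 | 0.001057 | 0.000048 | 0.000055 | 0.000048 | 0.000052 | 0.000055 |
| Empirical Data | $5\times10^{-2}$ | 0.034136 | 0.058374 | 0.050120 | 0.049429 | 0.048979 | 0.049795 | 0.049976 |
| | $5\times10^{-3}$ | 0.003579 | 0.008962 | 0.005043 | 0.004904 | 0.004812 | 0.004933 | 0.004993 |
| | $5\times10^{-4}$ | 0.000412 | 0.001525 | 0.000507 | 0.000494 | 0.000472 | 0.000496 | 0.000506 |
| | $5\times10^{-5}$ | 0.000051 | 0.000263 | 0.000050 | 0.000047 | 0.000040 | 0.000050 | 0.000044 |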

Note: The sample size and minor allele frequency (MAF) were fixed at 2000 and 0.3, respectively. Four tests including double generalized linear model (DGLM), joint location‐scale (JLS) test, joint location‐scale score (JLS‐Score) test and quantile regression based joint location‐scale (qJLS) test were used. For the first three tests, we applied no transformation or indirect double‐adjusted rank‐based inverse normal transformation (I‐INT2) to the data. The type I error was estimated based on 1,000,000 replicates.

Under the standard normal distribution, the type I error estimates for all joint tests without any data transformation are close to each nominal level as expected, and the distribution of p‐values for all joint tests appeared to be uniform with or without any INT (Supporting Information: Figure S3). However, the estimated type I error rate for the DGLM was slightly above all nominal levels. Under the skewed $\chi^2$ distribution, the inflation was evident for the DGLM, with a large empirical type I error, and for the JLS test, which showed a more moderate inflation. For both methods, the inflation was corrected after applying the I‐INT2. The histograms of p‐values with I‐INT2 appeared to be uniform, further supporting the correction by the data transformation (Supporting Information: Figure S4). For the JLS‐Score test, the data transformation was unnecessary since the empirical type I error was below the nominal levels without the I‐INT2. Under the empirical data distribution, the empirical type I error for DGLM was well below the nominal levels. The histogram of p‐values shows a skewness to the left, indicating that the method is overly conservative for the given distribution (Supporting Information: Figure S5). After applying I‐INT2, the histogram of p‐values appeared uniform. A moderate inflation was noted for the JLS test but was corrected after applying the I‐INT2. The JLS‐Score test remained robust in this scenario, with empirical type I error close to the nominal levels. In all scenarios, qJLS did not show any evidence of inflation, as expected.
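Whether an empirical type I error in Table 1 exceeds its nominal level by more than Monte Carlo noise can be judged against the binomial standard error $\sqrt{\alpha(1-\alpha)/R}$ with $R$ = 1,000,000 replicates. The short check below is an illustrative sketch; the 1.96 multiplier (an approximate 95% band) and the helper name are our assumptions, and the values are copied from the no-transformation columns of Table 1 at the nominal level 0.05.

```python
import math

REPLICATES = 1_000_000  # replicates per scenario, as stated in the Table 1 note


def upper_noise_limit(alpha, replicates=REPLICATES, z=1.96):
    """Upper end of an approximate 95% sampling band around the nominal level."""
    standard_error = math.sqrt(alpha * (1.0 - alpha) / replicates)
    return alpha + z * standard_error


# Empirical type I errors at nominal level 0.05, no transformation (Table 1).
table1_at_005 = {
    ("Normal", "DGLM"): 0.050225,
    ("Chi-square", "DGLM"): 0.204755,
    ("Chi-square", "JLS"): 0.079244,
    ("Chi-square", "JLS-Score"): 0.049834,
    ("Chi-square", "qJLS"): 0.049380,
    ("Empirical", "JLS"): 0.058374,
}

limit = upper_noise_limit(0.05)
exceeds = {key: value > limit for key, value in table1_at_005.items()}

print(f"upper noise limit at alpha = 0.05: {limit:.6f}")
for (distribution, test), flag in exceeds.items():
    print(f"{distribution:10s} {test:9s} inflated beyond noise: {flag}")
```

On this criterion the DGLM and JLS entries under the skewed and empirical distributions stand out, while the small excess for the DGLM under normality at this level is within sampling variation.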

When the sample size was reduced, estimated type I error increased for the DGLM under the standard normal distribution whereas other methods were unaffected (Supporting Information: Table S1). This behavior for DGLM is consistent with the previous simulation studies by Soave and Sun (2017). The empirical type I error for the DGLM was slightly above the nominal levels for all sample sizes. At the nominal level of 0.05, the estimated type I error was above this level for all sample sizes except N=2,000. Furthermore, when N=200, approximately 0.011% of the replicates did not reach the numerical convergence. After applying the I‐INT2, the empirical type I error for the DGLM fell below the nominal levels. The transformation resolved the optimization problem with 100% of the replicates attaining the convergence. Under χ2, the DGLM showed a large empirical type I error and increased failures in numerical convergence, close to 0.37% when N=200, which were resolved by applying the I‐INT2 (Supporting Information: Table S2). A moderate inflation was noted for the JLS test, which was corrected after applying the I‐INT2. The results for the empirical data distribution followed the pattern observed when N=2,000 (Supporting Information: Table S3). The empirical type I error for the JLS‐Score and qJLS tests did not show any sign of inflation with the change in the sample size.

The reduction in the minor allele frequency affected the empirical type I error for all methods except the JLS test under the standard normal distribution (Supporting Information: Table S4). For the DGLM, type I error estimates were slightly above the nominal levels when MAF=0.01. For the JLS‐Score and qJLS tests, when the minor allele frequency decreased, the inflation increased at the tail of the null distribution, although at the nominal level of 0.05 no inflation was observed. In all cases, applying the I‐INT2 did not correct the inflation. A similar trend is observed under the χ2 and empirical data distribution (Supporting Information: Table S5 and Table S6, respectively). When the minor allele frequency decreased, a slight inflation was observed for the DGLM with I‐INT2 whereas this inflation occurred at the tail of the null distribution for the JLS‐Score test with I‐INT2 and the qJLS test. There was no evidence of inflation for the JLS test with I‐INT2 in any of the scenarios.

Table S7 (Supporting Information) displays the effect of an unknown background heterogeneity affecting simultaneously the location and scale of the phenotypic distribution on type I error, which results in a dependence between the estimators for the location parameters and the ones for the scale parameters. Since the DGLM and JLS test rely on the independence of estimators for the location and scale parameters, both methods were affected by the background heterogeneity. The DGLM exhibited a large inflation for all error distributions that included the standard normal distribution whereas the inflation was moderate for the JLS test. For both methods, the inflation was corrected after applying the I‐INT2. The JLS‐Score and qJLS tests remained robust to the unknown background heterogeneity.

3.2. Power

Figure 3 shows the estimated power under a single unknown interaction model (Model 3) as a function of the interaction effect βGE based on minor allele frequency of 0.3, sample size N=2,000 and 100,000 replicates at each value of βGE. Under the standard normal distribution, the power of the qJLS test was slightly greater than the other methods that required the I‐INT2. Among these three methods that required the I‐INT2, no difference in power was discernible. Under the χ2 distribution, the qJLS test showed the greatest power when the interaction effect was negative, closely followed by the DGLM. When the interaction effect was positive, JLS and JLS‐Score tests showed greater power than qJLS and DGLM. Under the empirical data distribution, the qJLS test was the most powerful for negative and positive values of βGE.

Figure 3.

Empirical power under an unknown genetic interaction model with a binary single variable (Model 3) by error distribution: the strength of the interaction effect $\beta_{GE}$ was varied. The top and bottom rows display negative and positive interaction effects, respectively. The error distributions, displayed in each column, included the standardized normal, $\chi^2(3)$ and empirical distribution of the residualized top SNP in Application section. The sample size and minor allele frequency (MAF) were fixed at 2,000 and 0.3, respectively. Four tests including double generalized linear model (DGLM in green), joint location‐scale (JLS in blue) test, joint location‐scale score (JLS‐Score in purple) test and quantile regression based joint location‐scale (qJLS in red) test were used. For the first three tests, we applied the indirect double‐adjusted rank‐based inverse normal transformation (I‐INT2) to the data for all error distributions. The power was estimated based on 100,000 replicates with the nominal level $5\times10^{-8}$.

Figure 4 shows the estimated power under the quantile‐specific unknown interaction model (Model 4). The qJLS test showed the greatest power for all error distributions. Interestingly, the power of the JLS and JLS‐Score tests was approximately 0 for all error distributions considered, showing that the JLS and JLS‐Score tests were not able to detect these distributional changes at the tail.

Figure 4.

Empirical power under a quantile‐specific unknown genetic interaction model with a binary single variable (Model 4) by error distribution: the unknown interaction effect $\beta_{GE}$ was varied, affecting the upper quantile of the error term above 0.8 positively with $\beta_{GE} > 0$ and the lower quantile below 0.3 negatively with $\beta_{GE} < 0$. The error distributions, displayed in each column, included the standardized normal, $\chi^2(3)$ and empirical distribution of the residualized top SNP in Application section. The sample size and minor allele frequency (MAF) were fixed at 2,000 and 0.3, respectively. Four tests including double generalized linear model (DGLM in green), joint location‐scale (JLS in blue) test, joint location‐scale score (JLS‐Score in purple) test and quantile regression based joint location‐scale (qJLS in red) test were used. For the first three tests, we applied the indirect double‐adjusted rank‐based inverse normal transformation (I‐INT2) to the data for all error distributions. The power was estimated based on 100,000 replicates with the nominal level $5\times10^{-8}$.

In Figure 5, the power was estimated for two interacting variables among which one was fixed with a weak effect of 0.2 and the interaction effect of the other was varied, based on Model 5. For the normal error, the estimated power was the greatest for the qJLS test. A similar pattern emerged for the χ2 and empirical data distributions.

Figure 5.

Empirical power under an unknown genetic interaction model with two independent binary variables, $E_1$ with fixed effect, and $E_2$ with varying effects, (Model 5) by error distribution: after fixing the interaction effect of $E_1$ at $\beta_{GE_1} = 0.2$, we varied the interaction effect of $E_2$, $\beta_{GE_2}$. The top and bottom rows display negative and positive interaction effects of $E_2$, respectively. The error distributions, displayed in each column, included the standardized normal, $\chi^2(3)$ and empirical distribution of the residualized top SNP in Application section. The sample size and minor allele frequency (MAF) were fixed at 2000 and 0.3, respectively. Four tests including double generalized linear model (DGLM in green), joint location‐scale (JLS in blue) test, joint location‐scale score (JLS‐Score in purple) test and quantile regression based joint location‐scale (qJLS in red) test were used. For the first three tests, we applied the indirect double‐adjusted rank‐based inverse normal transformation (I‐INT2) to the data for all error distributions. The power was estimated based on 100,000 replicates with the nominal level $5\times10^{-8}$.

Figure 6 displays the estimated power for two interacting variables with opposite interaction effects, based on Model 5. The qJLS test showed the greatest power for all error distributions. For the χ2 and empirical data distribution, the difference in the power between the qJLS test and the other three tests was more pronounced.

Figure 6.

Empirical power under an unknown genetic interaction model with two independent binary variables, $E_1$ and $E_2$ with opposite effects, (Model 5) by error distribution: the interaction effects of $E_1$ and $E_2$ were opposite, i.e. $\beta_{GE_1} = -\beta_{GE_2}$. The error distributions, displayed in each column, included the standardized normal, $\chi^2(3)$ and empirical distribution of the residualized top SNP in Application section. The sample size and minor allele frequency (MAF) were fixed at 2000 and 0.3, respectively. Four tests including double generalized linear model (DGLM in green), joint location‐scale (JLS in blue) test, joint location‐scale score (JLS‐Score in purple) test and quantile regression based joint location‐scale (qJLS in red) test were used. For the first three tests, we applied the indirect double‐adjusted rank‐based inverse normal transformation (I‐INT2) to the data for all error distributions. The power was estimated based on 100,000 replicates with the nominal level $5\times10^{-8}$.

3.3. Comparison With Conventional Association Test

Figure S1 (Supporting Information) shows the comparison of the power between the classical linear regression and the joint location and scale tests under a pure location shift model without any interaction (Model 8a), varying the main genetic effect and error distributions. In the scenario with the standard normal distribution, the classical regression is known to yield the best linear unbiased estimator. This theory is well demonstrated in Figure S1 (Supporting Information) in which linear regression was the most powerful, although the power reduction for the joint location and scale tests was moderate. The average difference over all effect sizes considered was 3% with the maximum difference of 11%. Further simulation studies showed that the qJLS test required approximately a 12% increase in sample size to achieve similar power under these conditions. When considering χ2, all joint location and scale tests showed a large improvement in the power compared with the classical regression, clearly demonstrating the benefits in this scenario.

Similarly, the generalized Levene's test for the variance heterogeneity was compared with the joint location and scale tests under a pure scale shift model (Model 8b), by varying the main variance effect of G and error distributions (Supporting Information: Figure S2). Surprisingly, DGLM and qJLS tests showed the greatest power with a large improvement over the generalized Levene's test under the standard normal distribution. The difference in power between DGLM and qJLS was minimal. JLS and JLS‐Score tests had a moderately inferior power compared with the generalized Levene's test. For the remaining error distributions, qJLS showed a large positive difference compared with the remaining tests. No difference in the power for the tests except qJLS can be discerned from Figure S2 (Supporting Information).

4. Application to Cystic Fibrosis

4.1. Motivation

Cystic Fibrosis (CF) is a life‐limiting recessive disease caused by loss‐of‐function variants in the CF transmembrane conductance regulator (CFTR). Individuals with the same CFTR genotype show variable disease severity across multiple CF‐affected organs, including the lungs. The variation in lung function is explained in part by gene modifiers and is assumed to have a complex genetic architecture (Corvol et al. 2015). Identifying contributing common genetic variants requires large sample sizes, which are difficult to assemble for rare diseases such as CF. More powerful statistical tests that can leverage potential unknown interactions through variance heterogeneity, such as the qJLS that we apply here, can mitigate this challenge.

4.2. Methods

4.2.1. Study Sample

The Canadian CF Gene Modifier Study (CGMS) was designed to recruit a representative sample of the Canadian CF population with the objective to identify genes that modify CF disease severity across the affected organs and notably in the lungs (Taylor et al. 2006). Here we investigate the use of joint location scale tests to identify loci that modify lung disease severity. The study was approved by the Research Ethics Board of the Hospital for Sick Children and written informed consent was obtained from each participant.

Clinical data were obtained via chart review and through the Canadian CF Registry, which captures longitudinal follow‐up of the Canadian CF population.

4.2.2. Pulmonary Phenotype

We used the longitudinal spirometry data in the last 3 years preceding the enrollment date for each participant enrolled in the CGMS whose genome‐wide genotype data were available. If the phenotype data were missing before the enrollment date, we used the closest 3‐year range of data after the enrollment date. The analysis was limited to individuals with at least two visits after the age of 6 years for reliable spirometric measurements. Following convention for CF lung function GWAS (Corvol et al. 2015), we only included individuals with two severe CFTR variants associated with pancreatic insufficiency (PI) (Corey et al. 1997) and removed measurements after transplantation, diagnosis of chronic B. cepacia, or treatment with a CFTR modulator. We computed the CF pulmonary phenotype Saknorm (Taylor et al. 2011), which is the standard normal percentile of FEV1 adjusted for age, sex, height and cohort‐specific survival. To generate FEV1‐percentiles adjusted for age, sex and height, we used the CF‐specific reference equations by Kulich et al. (2005) based on the US CF Foundation (CFF) Registry from 1999 to 2006 for the cohort enrolled in CGMS before 2008 and the Canadian CF‐specific reference equations (Kim et al. 2018) based on the Canadian CF Registry data from 2008 to 2014 for those enrolled after 2008. To avoid any extrapolation, we removed individuals whose height was more than 5 cm outside the height range of the CF‐specific reference equations. We then adjusted each FEV1‐percentile for the cohort‐specific survival probability and averaged the adjusted values within each individual. We note that we did not apply the standard normal quantile function to this phenotype as in Taylor et al. (2011) since our statistical method, qJLS, does not require normality. For DGLM, JLS and JLS‐Score tests that require normality, we applied I‐INT2.

4.2.3. Genome‐Wide Genotype Data

Genotyping was performed on four Illumina platforms: 610Quad, 660W, Omni 2.5 and Omni 5. The detailed quality control (QC) and imputation procedures are described in Gong et al. (2019). SNP position and annotation information were based on Genome Reference Consortium 37 (GRCh37). To satisfy the independence assumption of qJLS, we included only unrelated participants by randomly sampling one individual from each set of related individuals. Additionally, in a principal component analysis (Gogarten et al. 2019), observations within 6 standard deviations of the centre of the African (AFR) or East Asian (EAS) clusters defined using the 1000 Genomes Project data were excluded from the analysis.

4.2.4. Association Analysis

GWAS was performed using the qJLS test. For comparison, we used DGLM, JLS and JLS‐Score tests after applying the I‐INT2. We used dosage data assuming an additive effect and included the sex, type of reference equations used to compute Saknorm, genotyping platform and 9 principal components as covariates. To account for multiple hypothesis testing, we used the genome‐wide significance threshold of $5\times10^{-8}$ (Dudbridge and Gusnanto 2008).

4.3. Results

After applying the inclusion‐exclusion criteria above, a total of 1997 participants with CF were included in the analysis. After standard QC and imputation (Panjwani et al. 2018; Gong et al. 2019), 5,533,051 variants were analyzed for association with CF lung disease using Saknorm.

One locus (minimal p‐value $= 3.12\times10^{-8}$ at rs9513900 on chr13:102,090,156; MAF = 0.31), annotated between ITGBL1 (chr13:102,105,026‐102,373,206) and NALCN (chr13:101,706,128‐102,068,859), was significantly associated with Saknorm using qJLS (Figure 7).

Figure 7.

Manhattan plot for the genome‐wide association study of CF lung disease with an unrelated sample of 1997 individuals in CGMS based on qJLS: All variants with MAF > 1% are included in the analysis. The solid horizontal line in red corresponds to the genome‐wide significance threshold of $p = 5\times10^{-8}$. The top SNP is the variant rs9513900.

Examining the individual location p‐value and scale p‐value from a linear regression and generalized Levene's test (without I‐INT2), respectively, the association for this SNP was driven by a location shift ($p = 2.15\times10^{-8}$) with a positive effect of 0.046, with no significant change in the phenotypic scale detected (p = 0.12). This locus was also supported by DGLM, JLS and JLS‐Score tests following I‐INT2, with this locus showing the smallest p‐value genome‐wide. However, at the genome‐wide significance level, these three other association tests failed to detect any significant loci (Supporting Information: Figure S7). No evidence of inflation in type I error was observed based on the histogram of p‐values (Supporting Information: Figure S8) and the QQ‐plots (Supporting Information: Figure S9) for all methods.

This genome‐wide significant variant, rs9513900, was not identified in a previous meta‐GWAS of CF lung disease (Corvol et al. 2015) that included a subset of the participants analyzed here. In that study, the reported effect of the variant was 0.033, close to the estimated effect in this study, but the variant did not reach genome‐wide significance (p = 0.037). Figure S10 (Supporting Information) provides the evidence for this variant across Canadian subsets including data not previously published, all with similar effect size and direction.

Since this variant is intergenic and not in linkage disequilibrium (LD) with any protein‐coding variants, we assessed whether rs9513900 (or a variant in LD with it) showed evidence of being an expression quantitative trait locus (eQTL) in lung tissue, using data from the Genotype‐Tissue Expression (GTEx) project (Lonsdale et al. 2013) version 8. Among 12 genes annotated to the region, a significant eQTL for NALCN was observed after Bonferroni correction ($p = 1\times10^{-3}$).

To inspect a possible heterogeneity in the variant's effects on the phenotype, we applied quantile regression to estimate the quantile‐specific effects (Figure 8). Overall, the variant has a positive effect at all percentiles of the conditional phenotype but this effect is differential over τ, with the strongest positive effect of 0.074 attained at τ=0.287 that decreases to 0 when approaching the tails. The median effect was 0.054 and was close to the mean effect 0.046 which can be interpreted as the average of the quantile‐specific effects over τ. Having one variant at this locus increases the median lung phenotype by 5.4% assuming all other predictors in the statistical model are held fixed.
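The notion of a quantile‐specific effect can be illustrated with the check (pinball) loss that quantile regression minimizes. The sketch below is a deliberately simplified case and not the analysis reported here: it uses a single binary predictor and no covariates, in which case the quantile regression coefficient at $\tau$ equals the difference between the two groups' $\tau$‐th quantiles. The toy data and function names are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (residual < 0))


def fit_quantile(values, tau):
    """tau-th sample quantile as a minimiser of the summed check loss.

    A minimiser always exists among the observed values, so a search over
    the data points is sufficient for this small illustration.
    """
    return min(sorted(values),
               key=lambda q: sum(check_loss(v - q, tau) for v in values))


def quantile_effect(y_reference, y_carrier, tau):
    """Effect of a binary predictor at quantile tau (difference of quantiles)."""
    return fit_quantile(y_carrier, tau) - fit_quantile(y_reference, tau)


# Toy groups with the same median but doubled spread among carriers.
y_reference = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y_carrier = [2 * v - 5 for v in y_reference]

effects = {tau: quantile_effect(y_reference, y_carrier, tau)
           for tau in (0.1, 0.5, 0.9)}
for tau, effect in effects.items():
    print(f"tau = {tau}: effect = {effect}")
```

In this toy example a pure scale difference produces effects of opposite sign in the two tails and no effect at the median, which shows how a profile of effects across $\tau$ carries information that a single mean effect does not.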

Figure 8.

Estimated effects $\hat{\beta}(\tau)$ of the variant rs9513900 on the CF pulmonary phenotype at different percentiles $\tau$: quantile regression model was used with dosage data assuming an additive effect. Predictors included the sex, type of CF‐specific reference equations used to compute the pulmonary phenotype, genotyping platform and 9 principal components. The shaded gray area represents the simultaneous confidence band using Bonferroni correction. The solid red line is the mean effect, with dashed red line displaying the confidence band.

5. Discussion

In this study we proposed a new joint location and scale test, qJLS, based on quantile regression and compared its performance to a selection of joint location and scale tests in the literature. The main benefit of the qJLS test is its robustness to the underlying error distribution while remaining the most powerful under normality. This robustness feature is due to the asymptotic distribution of the test statistic that is completely free of the parameters for the error distribution, under the null hypothesis. In contrast, the joint location and scale tests in the literature are based on a normality assumption which necessitates transformation, such as the rank‐based inverse normal transformation, for type I error rate control and improvement in power. Although our simulation studies provided some support for type I error rate control with indirect double‐adjusted rank‐based inverse normal transformation, this was under specific scenarios and generalization of the results requires a theoretical proof which is currently lacking.

In terms of power, we purposely designed our test to achieve optimality under normality. But, when this assumption fails, the qJLS test was still more powerful than the other methods in most simulation scenarios that were considered. The basis for this performance is mainly attributable to the scale component of the test. We observed that the scale component of the qJLS test was more powerful than that of the other joint location and scale tests while the power for the location component remained similar for all tests considered. This difference was clearly demonstrated when we considered the tail‐specific interaction in Figure 4. However, the power of a test is a function of the nature of the distributional shift under the alternative and the error distribution. Since the error distribution differs between phenotypes and unknown genetic interactions may affect the error distribution in different ways, the test optimality may be scenario‐specific, which makes it difficult to determine an optimal test a priori for real‐life data. This is demonstrated in Figure 3 in which qJLS was less powerful than JLS and JLS‐Score tests for negative interaction effects under the standardized χ2(3) distribution. Although we selected rankscore functions that are optimal for location and scale shifts under normality, there are alternative rankscore functions that are optimal for other types of distributional shifts and error distributions (Koenker 2010). We conducted additional simulation studies based on Model 3, to experiment with alternative rankscore functions defined in Koenker (2010) such as Lehmann, Wilcoxon and trimmed Wilcoxon functions that included either the upper quantiles above the median or lower quantiles below the median, and compared these with the qJLS test (Figure S6 in Supporting Information). The qJLS test performed well overall but, in some scenarios, its power was lower than some alternative rankscore tests. 
However, the performance of these alternative rankscore tests was extreme, with power close to 0 in some scenarios. Based on these findings, we concluded that qJLS would offer good performance over the range of observable data.

We also explored the benefit of using qJLS over the conventional location or scale test. This benefit was clearly demonstrated in the presence of unknown genetic interactions. Since unknown genetic interactions shift the conditional phenotypic distribution, qJLS gains power by leveraging both location and scale shifts, rather than each shift individually as in the conventional tests. When we considered a pure location or scale shift without any unknown genetic interaction, qJLS was still more powerful than the individual location or scale test, except under a pure location shift model with normality. This is the ideal scenario for linear regression; however, the difference in power appeared moderate. Based on simulated data with Model 8a, the qJLS test required approximately a 12% increase in sample size to achieve similar power under these optimal conditions for the linear regression. Given the overall benefit, we recommend the use of qJLS over the conventional tests.

The benefit of the qJLS was well‐illustrated by the GWAS and its corresponding results. First, we note that, with 5,533,051 variants, the qJLS test demonstrated the fastest computational time. This was expected since this test computes the computation‐heavy rankscores only once under the null hypothesis, while the remaining iterations for each of the 5,533,051 variants consist of simple multiplications and sums. Second, our qJLS test detected the candidate variant even though the variant exerted a location shift without a significant shift in the phenotypic scale; its p‐value ($p = 3.12\times10^{-8}$) was only slightly higher than that from the linear regression ($p = 2.15\times10^{-8}$). The reduction in the sample size required to achieve 80% and 90% power compared with the linear regression was approximately 7.5% and 3.7%, respectively, based on the bootstrapping technique with independent sampling and genome‐wide significance of $p = 5\times10^{-8}$. Examination of the identified genome‐wide locus suggested the SNP is an eQTL for NALCN, a sodium leak channel nonselective protein. NALCN regulates epithelial cell trafficking to distant tissues, including the lung epithelium (Rahrmann et al. 2022). Previous findings (Corvol et al. 2015; Soave and Sun 2017; Gong et al. 2019; Panjwani et al. 2018) have suggested CF lung disease modifiers impact severity through their role in gene expression variation, providing additional support for the locus, although replication in an independent sample is necessary to establish this gene as a modifier. We also examined percentile‐specific effects, which were all positive, indicating the variant provides a protective effect at all percentiles of the lung phenotype. But these effects were not uniform across the percentiles and appeared to be quadratic. At both extremes of the phenotype, around 1% and 99%, the effect was nearly zero, and it reached its maximum at the 29th percentile. 
This differential effect suggests potential interactions at play, even though no scale shift was detected. A possible explanation for the failure to detect the scale shift is that the current sample size was not sufficient to detect the given magnitude of the scale shift.
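The computational pattern described above, in which an expensive quantity is computed once under the null and each variant then costs only sums and products, can be sketched as follows. This is not the qJLS statistic: as a stated simplification, ordinary centered Wilcoxon rank scores stand in for the regression rankscores (an intercept-only null model without covariates or ties is assumed), and the function names are hypothetical. The per-variant quantity is a rank score statistic that is approximately chi-square with one degree of freedom under the null.

```python
import random

def rank_scores(y):
    """Centered Wilcoxon scores; the costly step, done once per scan.

    Assumes no covariates and no ties in y (illustrative simplification).
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = rank / (n + 1) - 0.5
    return scores

def scan_statistic(g, scores, a2=1.0 / 12.0):
    """Per-variant statistic built from sums and products only.

    a2 is the variance of the Wilcoxon score function (1/12).
    Assumes the variant is polymorphic, so the denominator is nonzero.
    """
    n = len(g)
    g_bar = sum(g) / n
    centered = [gi - g_bar for gi in g]
    s = sum(c * b for c, b in zip(centered, scores))
    m = sum(c * c for c in centered)
    return s * s / (m * a2)

rng = random.Random(7)
n = 2000

def genotypes(maf):
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]

# Three hypothetical variants; only the first affects the trait.
variants = [genotypes(0.3) for _ in range(3)]
y = [0.5 * g + rng.gauss(0, 1) for g in variants[0]]

scores = rank_scores(y)                                # once per phenotype
stats = [scan_statistic(g, scores) for g in variants]  # cheap, per variant
```

Because `scores` does not depend on the variant, the loop over variants never refits a model, which is the property that makes this style of test attractive at genome-wide scale.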

Our study provides a review and a comparison of performance across a selection of joint location and scale tests that rely on normality, based on simulation studies. Our findings demonstrate that the DGLM, which uses the squared residuals for the scale model, was more sensitive to departures from normality than the JLS and JLS‐Score, which use the robust absolute deviations, without any data transformation. As a clear example, DGLM displayed an inflated type I error even with a logistic distribution, which is often indistinguishable from a normal distribution using a QQ‐plot or a statistical test for normality such as the Kolmogorov–Smirnov test. The DGLM is known to suffer from numerical instability with low minor allele frequency (Dumitrascu et al. 2019). In our experience, this instability increased with additional factors such as skewed and heavy‐tailed error distributions, low sample size and large numbers of predictors. We showed that applying the indirect double‐adjusted rank‐based inverse normal transformation (I‐INT2) fixed the type I error inflation and numerical instability for the scenarios considered. The JLS test without any data transformation was more robust to distributional misspecifications but showed a moderate inflation in the type I error rate with skewed or heavy‐tailed distributions. This inflation was corrected by applying the I‐INT2 for the homoscedastic model but, with variance heterogeneity, the JLS test still displayed a slight inflation in type I error, possibly due to the independent estimation of the location and scale parameters. We note that the JLS‐Score test, which accounts for the correlation between the estimated location parameters and scale parameters, remained robust in the simulation scenarios investigated. Although the type I error remained well‐controlled, power suffered when the normality assumption was violated. 
Applying the I‐INT2 resulted in a large improvement in power but the theoretical properties with the data transformation are unknown and remain to be investigated.
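As a rough illustration of the transformation discussed above, the sketch below applies a rank-based inverse normal transformation to covariate-adjusted residuals. It assumes the Blom offset c = 3/8, a single covariate, and no tied values; the variable names and the simulated data are hypothetical. Reading "indirect double-adjusted" as "transform the residuals, then adjust for the covariates again in the association model" is our interpretation for this example, not a restatement of the paper's implementation.

```python
import random
from statistics import NormalDist, mean

def residuals(y, x):
    """Residuals from least squares of y on one covariate x (with intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [yi - y_bar - slope * (xi - x_bar) for xi, yi in zip(x, y)]

def blom_int(values, c=3.0 / 8.0):
    """Rank-based inverse normal transformation (Blom offset by default).

    Assumes distinct values; ties would need an averaging rule.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    nd = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = nd.inv_cdf((rank - c) / (n - 2 * c + 1))
    return out

rng = random.Random(3)
age = [rng.uniform(6, 40) for _ in range(500)]
# Skewed (exponential) errors, to mimic a non-Gaussian trait.
trait = [1.0 + 0.05 * a + rng.expovariate(1.0) for a in age]

resid = residuals(trait, age)  # stage 1: adjust for the covariate
z = blom_int(resid)            # transform the stage-1 residuals
# Stage 2 (not shown): regress z on genotype and the same covariate again.
```

The transformed values keep the ordering of the residuals but follow normal quantiles, which is why the transformation can stabilize tests that assume normality.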

We note that a limitation of qJLS is that it is designed for independent observations and cannot accommodate related individuals or repeated observations. A strategy to meet this requirement is to keep one individual per cluster of related individuals, which results in an undesirable reduction in the sample size. Future research will extend qJLS to account for correlation in the data.
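The pruning strategy mentioned above can be sketched in a few lines. The identifiers and the cluster labels are hypothetical, and keeping the first listed member of each cluster is an arbitrary choice made for this example; an actual analysis might instead prefer the member with the most complete phenotype data.

```python
def one_per_cluster(sample_ids, cluster_ids):
    """Keep the first listed individual from each cluster of relatives."""
    kept, seen = [], set()
    for sid, cid in zip(sample_ids, cluster_ids):
        if cid not in seen:
            seen.add(cid)
            kept.append(sid)
    return kept

# Hypothetical example: two siblings in family f1, two unrelated individuals.
samples = ["s1", "s2", "s3", "s4"]
families = ["f1", "f1", "f2", "f3"]
independent = one_per_cluster(samples, families)
```

Every individual dropped this way reduces the sample size, which is the cost noted above.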

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Supporting information.

GEPI-49-0-s001.pdf (3.2MB, pdf)

Acknowledgements

We thank the patients and families for participating in the study, and the clinic research staff and directors involved in CF Centers throughout Canada for their contributions to the Canadian Gene Modifier Study. This research was supported by a peer‐reviewed Cystic Fibrosis (CF) Canada 2022 Clinical Research Grant jointly funded by CF Canada and Canadian Institutes of Health Research Institute of Circulatory and Respiratory Health (CIHR‐ICRH), FRN: BCG 187014 and a CIHR Foundation grant to L.J.S., FRN 167282. We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), funding reference number RGPIN‐2015‐03742. This research was undertaken, in part, thanks to funding from the Canada Research Chairs Program to L.J.S. This study was also funded by the Government of Canada through Genome Canada (OGI‐148) and supported by a grant from the Government of Ontario. The funders of the study played no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. S.K. is a trainee in the CANSSI Ontario STAGE (Strategic Training for Advanced Genetic Epidemiology) program at the University of Toronto.

Data Availability Statement

The qJLS software package is publicly available with the simulation code at https://github.com/strug-hub/qJLS.

References

  1. Aschard, H., Zaitlen N., Tamimi R. M., Lindström S., and Kraft P. 2013. “A Nonparametric Test to Detect Quantitative Trait Loci Where the Phenotypic Distribution Differs by Genotypes.” Genetic Epidemiology 37, no. 4: 323–333.
  2. Bartlett, M. S. 1937. “Properties of Sufficiency and Statistical Tests.” Proceedings of the Royal Society of London. Series A‐Mathematical and Physical Sciences 160, no. 901: 268–282.
  3. Beasley, T. M., Erickson S., and Allison D. B. 2009. “Rank‐Based Inverse Normal Transformations Are Increasingly Used, but Are They Merited?” Behavior Genetics 39, no. 5: 580–595.
  4. Bliss, C. I. 1967. Statistics in Biology; Statistical Methods for Research in the Natural Sciences. McGraw‐Hill.
  5. Blom, G. 1958. Statistical Estimates and Transformed Beta‐Variables (PhD thesis). Almqvist & Wiksell.
  6. Briollais, L., and Durrieu G. 2014. “Application of Quantile Regression to Recent Genetic and ‐omic Studies.” Human Genetics 133, no. 8: 951–966.
  7. Brown, M. B., and Forsythe A. B. 1974. “Robust Tests for the Equality of Variances.” Journal of the American Statistical Association 69, no. 346: 364–367.
  8. Cao, Y., Wei P., Bailey M., Kauwe J. S. K., and Maxwell T. J. 2014. “A Versatile Omnibus Test for Detecting Mean and Variance Heterogeneity.” Genetic Epidemiology 38, no. 1: 51–59.
  9. Corey, M., Edwards L., Levison H., and Knowles M. 1997. “Longitudinal Analysis of Pulmonary Function Decline in Patients With Cystic Fibrosis.” Journal of Pediatrics 131, no. 6: 809–814.
  10. Corty, R. W., and Valdar W. 2018. “QTL Mapping on a Background of Variance Heterogeneity.” G3: Genes|Genomes|Genetics 8, no. 12: 3767–3782.
  11. Corvol, H., Blackman S. M., Boëlle P.‐Y., et al. 2015. “Genome‐Wide Association Meta‐Analysis Identifies Five Modifier Loci of Lung Disease Severity in Cystic Fibrosis.” Nature Communications 6, no. 1: 8382.
  12. Dudbridge, F., and Gusnanto A. 2008. “Estimation of Significance Thresholds for Genomewide Association Scans.” Genetic Epidemiology 32, no. 3: 227–234.
  13. Dumitrascu, B., Darnell G., Ayroles J., and Engelhardt B. E. 2019. “Statistical Tests for Detecting Variance Effects in Quantitative Trait Studies.” Bioinformatics 35, no. 2: 200–210.
  14. Gogarten, S. M., Sofer T., Chen H., et al. 2019. “Genetic Association Testing Using the GENESIS R/Bioconductor Package.” Bioinformatics 35: 5346–5348.
  15. Gong, J., Wang F., Xiao B., et al. 2019. “Genetic Association and Transcriptome Integration Identify Contributing Genes and Tissues at Cystic Fibrosis Modifier Loci.” PLoS Genetics 15, no. 2: e1008007.
  16. Gutenbrunner, C., and Jurečková J. 1992. “Regression Rank Scores and Regression Quantiles.” Annals of Statistics 20, no. 1: 305–330.
  17. Gutenbrunner, C., Jurečková J., Koenker R., and Portnoy S. 1993. “Tests of Linear Hypotheses Based on Regression Rank Scores.” Journal of Nonparametric Statistics 2, no. 4: 307–331.
  18. Hong, C., Ning Y., Wei P., Cao Y., and Chen Y. 2017. “A Semiparametric Model for vQTL Mapping.” Biometrics 73, no. 2: 571–581.
  19. Kim, S.‐O., Corey M., Stephenson A. L., and Strug L. J. 2018. “Reference Percentiles of FEV1 for the Canadian Cystic Fibrosis Population: Comparisons Across Time and Countries.” Thorax 73, no. 5: 446–450.
  20. Koenker, R. 2010. “Rank Tests for Heterogeneous Treatment Effects With Covariates.” In Nonparametrics and Robustness in Modern Statistical Inference and Time Series Analysis: A Festschrift in Honor of Professor Jana Jurečková, 134–142. Institute of Mathematical Statistics.
  21. Koenker, R., and Bassett G. 1978. “Regression Quantiles.” Econometrica 46: 33–50.
  22. Koenker, R., and Xiao Z. 2002. “Inference on the Quantile Regression Process.” Econometrica 70, no. 4: 1583–1612.
  23. Kulich, M., Rosenfeld M., Campbell J., et al. 2005. “Disease‐Specific Reference Equations for Lung Function in Patients With Cystic Fibrosis.” American Journal of Respiratory and Critical Care Medicine 172, no. 7: 885–891.
  24. Levene, H. 1961. “Robust Tests for Equality of Variances.” In Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, 279–292.
  25. Lonsdale, J., Thomas J., Salvatore M., et al. 2013. “The Genotype‐Tissue Expression (GTEx) Project.” Nature Genetics 45, no. 6: 580–585.
  26. McCaw, Z. R., Lane J. M., Saxena R., Redline S., and Lin X. 2020. “Operating Characteristics of the Rank‐Based Inverse Normal Transformation for Quantitative Trait Analysis in Genome‐Wide Association Studies.” Biometrics 76, no. 4: 1262–1272.
  27. Miao, J., Lin Y., Wu Y., et al. 2022. “A Quantile Integral Linear Model to Quantify Genetic Effects on Phenotypic Variability.” Proceedings of the National Academy of Sciences 119, no. 39: e2212959119.
  28. Panjwani, N., Xiao B., Xu L., et al. 2018. “Improving Imputation in Disease‐Relevant Regions: Lessons From Cystic Fibrosis.” NPJ Genomic Medicine 3, no. 1: 8.
  29. Paré, G., Cook N. R., Ridker P. M., and Chasman D. I. 2010. “On the Use of Variance Per Genotype as a Tool to Identify Quantitative Trait Interaction Effects: A Report From the Women's Genome Health Study.” PLoS Genetics 6, no. 6: e1000981.
  30. Rahrmann, E. P., Shorthouse D., Jassim A., et al. 2022. “The NALCN Channel Regulates Metastasis and Nonmalignant Cell Dissemination.” Nature Genetics 54, no. 12: 1827–1838.
  31. Rönnegård, L., and Valdar W. 2011. “Detecting Major Genetic Loci Controlling Phenotypic Variability in Experimental Crosses.” Genetics 188, no. 2: 435–447.
  32. Soave, D., Corvol H., Panjwani N., et al. 2015. “A Joint Location‐Scale Test Improves Power to Detect Associated SNPs, Gene Sets, and Pathways.” American Journal of Human Genetics 97, no. 1: 125–138.
  33. Soave, D., and Sun L. 2017. “A Generalized Levene's Scale Test for Variance Heterogeneity in the Presence of Sample Correlation and Group Uncertainty.” Biometrics 73, no. 3: 960–971.
  34. Sofer, T., Zheng X., Gogarten S. M., et al. 2019. “A Fully Adjusted Two‐Stage Procedure for Rank‐Normalization in Genetic Association Studies.” Genetic Epidemiology 43, no. 3: 263–275.
  35. Song, X., Li G., Zhou Z., Wang X., Ionita‐Laza I., and Wei Y. 2017. “QRank: A Novel Quantile Regression Tool for eQTL Discovery.” Bioinformatics 33, no. 14: 2123–2130.
  36. Staley, J. R., Windmeijer F., Suderman M., Lyon M. S., Davey Smith G., and Tilling K. 2021. “A Robust Mean and Variance Test With Application to High‐Dimensional Phenotypes.” European Journal of Epidemiology 37: 377–387.
  37. Taylor, C., Commander C. W., Collaco J. M., et al. 2011. “A Novel Lung Disease Phenotype Adjusted for Mortality Attrition for Cystic Fibrosis Genetic Modifier Studies.” Pediatric Pulmonology 46, no. 9: 857–869.
  38. Taylor, C., Corey M., Breaton J., et al. 2006. “The Canadian CF Modifier Gene Project: A Nationally Representative DNA and Phenotype Resource.” Pediatric Pulmonology 29: 362.
  39. Tukey, J. W. 1962. “The Future of Data Analysis.” Annals of Mathematical Statistics 33, no. 1: 1–67.
  40. Van der Waerden, B. 1952. “Order Tests for the Two‐Sample Problem and Their Power.” In Indagationes Mathematicae (Proceedings) 55, 453–458. Elsevier.

Articles from Genetic Epidemiology are provided here courtesy of Wiley
