Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2025 Mar 30.
Published in final edited form as: Stat Med. 2024 Feb 2;43(7):1419–1440. doi: 10.1002/sim.10025

Statistical inference on qualitative differences in the magnitude of an effect

Aaron Hudson 1,*, Ali Shojaie 2
PMCID: PMC10947912  NIHMSID: NIHMS1959976  PMID: 38305667

Abstract

Qualitative interactions occur when a treatment effect or measure of association varies in sign by sub-population. Of particular interest in many biomedical settings are absence/presence qualitative interactions, which occur when an effect is present in one sub-population but absent in another. Absence/presence interactions arise in emerging applications in precision medicine, where the objective is to identify a set of predictive biomarkers that have prognostic value for clinical outcomes in some sub-population but not others. They also arise naturally in gene regulatory network inference, where the goal is to identify differences in networks corresponding to diseased and healthy individuals, or to different subtypes of disease; such differences lead to identification of network-based biomarkers for diseases. In this paper, we argue that while the absence/presence hypothesis is important, developing a statistical test for this hypothesis is an intractable problem. To overcome this challenge, we approximate the problem in a novel inference framework. In particular, we propose to make inferences about absence/presence interactions by quantifying the relative difference in effect size, reasoning that when the relative difference is large, an absence/presenceinteractionoccurs.Theproposedmethodologyisillustratedthrougha simulation study as well as an analysis of breast cancer data from the Cancer Genome Atlas.

Keywords: Qualitative interactions, Differential network connectivity, Precision medicine

1 |. INTRODUCTION

An objective of many biomedical studies is to identify and test for interactions, which arise when a measure of effect or association between variables differs by sub-population1,2,3. Precision medicine and genetic network inference provide examples of areas in which interactions are of interest. For instance, researchers in precision medicine seek to understand how patients’ characteristics are associated with heterogeneity in response to treatment. In omics data analysis, it is of interest to determine how biological networks, which summarize the associations between genes/proteins/etc, depend on phenotype.

Interactions may lack clinical or scientific significance when differences in effect are small. In addition to detecting interactions, it is important to identify which are meaningful. For example, in precision medicine, the most important differences in treatment effect may be those in which some sub-populations of patients benefit from the treatment, while other sub-populations are harmed or unaffected. Additionally, one may want to identify differences among sub-populations in the set of biomarkers that have prognostic value for a health outcome — that is, to determine whether some biomarkers are predictive of the outcome in only a subset of the full population. Genetic network inference provides another example: When comparing sub-population level genetic networks, it may be of primary interest to identify pairs of genes that share an association in some sub-populations but share no association in others, or to identify pairs that have a positive association in one sub-population and a negative association in another. This is known as differential network analysis4,5.

Such qualitative interactions are the focus of this paper. Qualitative interactions occur when a measure of association differs in sign by sub-population. We consider two types of qualitative interactions: positive/negative interactions — also known in the literature as cross-over interactions6 — and absence/presence interactions — sometimes referred to as pure interactions7. Positive/negative interactions occur when an effect is positive in one sub-population and negative in another, and absence/presence interactions occur when the effect is present in one population but absent in another.

Our objective is to formally test for qualitative interactions, given independent samples from each sub-population. Testing for positive/negative interactions is well-studied, with several procedures available for identifying positive/negative interactions in a fixed number of sub-populations6,8,9,10,11,12. More recent work has also extended to identifying interactions when there is a large, or possibly infinite, number of sub-groups13,14,15,16,17. In contrast to positive/negative interactions, testing for absence/presence interactions has received little attention. Naïve approaches, to be discussed in Section 2.3, require an untenable minimum signal strength condition — that if an effect is present in any sub-population, it is large enough to be detected with absolute certainty. No approaches exist, to the best of our knowledge, that avoid this assumption.

In this paper, we propose a novel framework for inference about absence/presence interactions in two sample settings. Our proposed methodology allows for well-calibrated hypothesis testing under mild assumptions. We also introduce a numerical summary that measures the strength of absence/presence interactions, while accounting for the uncertainty associated with parameter estimation. Additionally, we describe methods for simultaneous inference about absence/presence and positive/negative interactions. The methodology we introduce provides an effective and flexible inference tool in precision medicine and genetic network analysis, as we illustrate in simulations and an analysis of breast cancer data from The Cancer Genome Atlas (TCGA).

2 |. BACKGROUND

2.1 |. Notation

As we begin to formalize the problem, we first introduce some notation. We consider two sub-populations, labeled by g{1,2}. Let θgR denote a measure of association in sub-population g. For simplicity, we assume that θg is finite. When convenient, we write θ=θ1,θ2. We can consider various measures of association, such as: correlation coefficients, indicating the strength of linear relationship between two variables of interest; log odds ratios, describing the relationship between predictors and a binary outcome; and log hazard ratios, describing the association between predictors and a time-to-event outcome.

We assume that given i.i.d. samples of size n1 and n2 from each sub-population, ng-consistent and asymptotically normal estimators θˆg of θg are available, i.e.,

ng(θˆg-θg)dN0,σg2,

with σg2>0 denoting the asymptotic variance. For expositional simplicity, we assume balanced sample sizes n1=n2=n (key results are stated more generally in the Appendix). We also assume σg2 is known, though we can instead use a consistent estimate, as is commonly done in practice.

We now formally state the null hypotheses of no positive/negative interactions and no absence/presence interaction, labeled H0P/N and H0A/P, respectively:

H0P/N:θg0forg{1,2}orθg0forg1,2 (1)
H0A/P:θg=0forg{1,2}orθg0forg1,2 (2)

We let Θ0P/N,Θ0A/P and Θ1P/N,Θ1A/P denote the corresponding null and alternative regions of the parameter space, depicted in the left and center panels of Figure 1 (Recall that the null region is the set of parameters such that the null hypothesis holds, and the alternative region is the complement of the null region.) The positive/negative null region is the union of the the non-negative and non-positive orthants, and the absence/presence null region is the union of all open orthants and the origin.

Figure 1.

Figure 1

Null and alternative regions of parameter space for positive/negative (left; introduced in Section 2.1), absence/presence (center; introduced in Section 2.1), and relative difference hypotheses (right; introduced in Section 3.1)

2.2 |. Testing Composite Null Hypotheses

Our goal is to use the estimate θˆ to perform tests of H0P/N and H0A/P such that the size is controlled asymptotically under mild assumptions. Recall that for a null hypothesis H0 with accompanying null region Θ0, the size of a test is defined as

supθ0Θ0P(“Rejectthenullhypothesis”θ=θ0).

In words, the size is the largest possible type-I error rate that could be achieved under any probability distribution given that θ belongs to the null region.

Here, we describe our general approach for controlling the size at a pre-specified level α(0,1). We first define a test statistic T, a map from observable data to a real-valued number, with larger values of T corresponding to more evidence against the null hypothesis. We then calculate the test statistic on the observed data, which we denote by t. We write PT>tθ=θ0 as the probability of observing a random test statistic at least as large as the observed value t, assuming θ=θ0. We reject the null hypothesis if

ρtsupθ0Θ0PT>tθ=θ0<α.

One can think of PT>tθ=θ0 as the p-value under a specific null distribution θ=θ0; ρ(t) is then the largest of all such p-values. We reject H0 when there is sufficient evidence to reject all hypotheses θ=θ0. We can view ρ(t) as a generalization of the usual p-value for simple null hypotheses to tests with composite null hypotheses, and we will simply refer to ρ(t) as “p-value”. Tests of the above form are guaranteed to control the size18.

2.3 |. Existing Methodology

We now review existing approaches to test for qualitative interactions. We first discuss testing positive/negative interactions before moving to absence/presence interactions.

Gail and Simon6 developed the most widely used procedure to test for positive/negative interactions. Though Gail and Simon proposed a general K-sample test, we focus on the two-sample problem in this paper. We note that various K-sample tests for positive/negative interactions have been proposed8,10,11, but these procedures are essentially equivalent to the Gail-Simon test in the two-sample setting.

Gail and Simon’s approach is to perform a likelihood ratio test based on the asymptotic sampling distribution of θˆ. The likelihood ratio test rejects H0P/N for large values of

-supa1,a2Θ0P/Ng{1,2}ϕnσg-1(θˆg-ag)supb1,b2R2g{1,2}ϕnσg-1(θˆg-bg),

where ϕ() is the standard normal density. By performing algebraic manipulations, one can show that the likelihood ratio test equivalently rejects the null for large values of

TP/N=min(a1,a2)Θ0P/Ng{1,2}n{σg1(θˆgag)}2.

The test statistic TP/N can be interpreted as the shortest distance between θˆ and the null region, where the distance is inversely weighted by the asymptotic variances of the estimates, as illustrated in the left panel of Figure 2

Figure 2.

Figure 2

Geometric interpretation of likelihood ratio statistic for the positive/negative hypothesis (left; introduced in Section 2.3) and relative difference hypotheses (right; introduced in Section 3.2)

Gail and Simon show that the test statistic TP/N can be calculated as

TP/N=ming{1,2}n(θˆg/σg)21(θˆΘ1P/N),

where 1() denotes the indicator function. Furthermore, one can verify that for all t>0, the p-value is easily calculated as

ρP/N(t)=supθ0Θ0P/NlimnP(TP/N>tθ=θ0)=12Pχ12>t.

The likelihood ratio test is quite intuitive and rejects the null when a positive estimate of association is observed in one population, a negative estimate is observed in another population, and both associations are statistically significant.

Now, we discuss approaches to test for absence/presence interactions. While one might be tempted to perform a likelihood ratio test for absence/presence interactions, the likelihood ratio test fails in the sense that it never can reject the null. To see this, we first recognize that, similar to the positive/negative interaction test, the absence/presence likelihood ratio test would reject for large values of

TA/P=min(a1,a2)Θ0A/Pg{1,2}n{σg1(θˆgag)}2.

Again, the test statistic TA/P is the shortest distance between θˆ and the null region Θ0A/P. However, because the alternative region Θ1A/P has zero area, θˆ lies in the null region with probability one. Therefore, the test statistic is always 0, and the likelihood ratio test has no power.

One might alternatively attempt to test the absence/presence null by separately testing the null hypotheses H0g:θg=0 for g{1,2} and rejecting the absence/presence null when H0g is rejected for one g and not rejected for the other. To control the size of a test of this form, which we refer to as the naïve test for the absence/presence null, we need to simultaneously control the type-I error under two scenarios: (1) there is an association in both sub-populations, and (2) there is no association in either population. When there is an association in both populations, a type-I error occurs when we incorrectly fail to reject one of H0g. When there is no association in either population, we make a type-I error when we incorrectly reject one of H0g. Thus, controlling the size of the test for H0A/P using this approach requires simultaneous control of the type-I error rate and type-II error rate of tests for H0g. If the tests for H0g are consistent — that is, the type-II error rates tend to zero with sufficiently large samples — this approach is asymptotically valid, as only the type-I error rates for tests of H0g need to be controlled. However, with even moderately large samples, we will not be able to correctly reject false H0g with absolute certainty unless the true association is strong and hence easy to detect. This would make the test of H0A/P unreliable in the presence of weak signal.

Next, we argue that the naïve test controls the type-I error only if θ1,θ2>on-1/2, which is similar to the assumption recently made for differential network connectivity analysis 19. To see this, we construct a more formal argument. For simplicity, suppose σ1=σ2=1. We consider tests of H0g of the form

ψg=Reject;ifn|θˆg|>aAccept;ifn|θˆg|<a,

where a is a constant that would be selected to control the size. The probability of rejecting the absence/presence null is

P(RejectH0A/P)=Pψ1=RejectPψ2=Accept+Pψ1=AcceptPψ2=Reject.

Suppose θ1 and θ2 are guaranteed to be greater than on-1/2 if they are both nonzero. Then, for θ1,θ20,n|θˆg|, and P(RejectH0A/P)0. Therefore, to control the size, we are only required to select α so that the type-I error is controlled when θ1=θ2=0; this can be done by taking a as the (1-α/4) quantile of the standard normal distribution. However, if we allow θg<on1/2, we can see a drastically inflated type-I error rate. For instance, for a small ϵ>0, let θ1=n-1/2+ϵ, θ2=n-1/2-ϵ. This asymptotic regime allows us to approximate the type-1 error rate of this test in settings where the sample size is large relative to the signal strength. Then n|θˆ1|>a while n|θˆ2|0<a, so P(RejectH0A/P)1. Thus, when small signal is permitted, tests of this form will be asymptotically anti-conservative.

Both approaches discussed above for testing absence/presence interactions — namely, the likelihood ratio test and the naïve test — fail for a similar reason: it is difficult to gather evidence supporting that a measure of association is exactly equal to zero. This is captured by the alternative region having zero area, causing the failure of the first approach. In the second approach, to obtain evidence supporting that an association is zero, we require that H0,g is only accepted when θg=0; for this, we rely upon a minimum signal strength condition to guarantee that any non-zero association is detected.

3 |. PROPOSED METHODOLOGY

3.1 |. Refinement of Absence/Presence Hypothesis

To mitigate the challenges described in Section 2, we consider a refinement of the absence/presence null hypothesis. The key idea is that in practice, absence/presence interactions can be approximated by considering the settings where an association is at least moderately large in one population and negligible or near zero in the other; or when one association is substantially stronger than the other. This means that we can expand the alternative region to include neighborhoods of zero in a way that the absence/presence interpretation is preserved.

Recall that when there exists an absence/presence interaction, the ratio of the maximum of the absolute value of the θg to the minimum is infinite. We cannot test that the ratio is infinite because we will never have evidence to support that the denominator is exactly zero. However, we can test that the ratio is large because we may have evidence to support that the denominator is very small. Motivated by this intuition, we propose to test whether the relative difference between sub-population measures of association is greater than a large pre-specified constant κ>1. Formally, let θmax=maxgθg and θmin=mingθg. We define the new relative difference null hypothesis H0κ as

H0κ:θmax/θminκorθmax=θmin=0.

Equivalently,

H0κ:θmax-κθmin0. (3)

The null region Θ0κ, illustrated in the right panel of Figure 1, is the union of four linear subspaces — each residing in a separate orthant of R2; the boundary of each subspace is the union of the spans of vectors with absolute direction (κ,1) and (1,κ).

The relative difference null region can be viewed as a relaxation of the absence/presence null. For a large choice of κ, the original and refined null hypotheses have a similar interpretation: the greater measure of association is substantially larger than the lesser. However, the hypotheses may not coincide for any value of κ as the relative difference hypothesis can hold if θmax is non-zero while θmin is non-zero but small. It is nonetheless appealing that H0κ has a reasonable interpretation for any choice of κ; that is, the multiplicative difference in strength of association is no larger than κ.

To motivate defining the refined null hypothesis in terms of relative differences rather than absolute differences, we argue that testing for relative differences has at least the following benefits. First, relative differences are unitless, so the relative difference null is compatible with unitless measures of association such as the Pearson correlation coefficient, which are often preferred in the analysis of biological data. Second, it is possible that in some contexts, κ could be selected without prior knowledge on the range of strength of association. For instance, in studies of differential gene expression in two groups, it is common for investigators to focus on identifying genes for which the relative difference in the mean expression level exceeds a user-specified threshold. The threshold is chosen to be scientifically meaningful, based on the investigator’s own discretion20,21. When the goal is identifying pairs of genes for which the correlation between gene expression levels differs by group, an investigator could similarly select a scientifically meaningful threshold for the relative difference in correlation at their own discretion.

Of course, the relative difference null hypothesis depends on the choice of κ. For small values of κ, the relative difference null may be too dissimilar from the absence/presence null to retain its interpretation; for large κ, the alternative region becomes very small, and the hypothesis may be overly conservative. In settings where there is a scientifically meaningful threshold for the relative difference in effect size, κ can be specified a priori based on its interpretation, often at the investigator’s discretion. In the following subsections, we first construct a test of the relative difference null hypothesis for a pre-specified κ. We then describe an approach to identify the set of κ values for which the test rejects the null hypothesis. This method can be useful when there is no clear scientifically meaningful threshold for the relative difference in effects.

3.2 |. Likelihood Ratio Test for Relative Difference Hypothesis

We next develop a testing procedure for the new relative difference null hypothesis. The relative difference null region, unlike the absence/presence null region, has a non-zero area, so a likelihood ratio test will not fail in the same manner as the likelihood ratio test for the absence/presence hypothesis. Similar to the previously discussed examples, the likelihood ratio test statistic is

Tκ=mina1,a2Θ0κg{1,2}nσg-1(θˆg-ag)2,

and can be interpreted as the shortest (weighted) distance between θˆ and the null region Θ0κ, as depicted in the right panel of Figure 2 Clearly, the test statistic is zero whenever θˆ lies in the null region. Otherwise, Tκ is the shortest distance between θˆ and the closest of the four linear subspaces that define Θ0κ. The test statistic can be calculated as the distance between (θˆmax,θˆmin) and its projection onto the span of the vector (κ,1).

The likelihood ratio test statistic is straightforward to calculate. Let θˆmax=maxg|θˆg| and θˆmin=ming|θˆg| be the strongest and weakest estimated absolute association, respectively. In the following lemma, we state that Tκ is equal to the difference between θˆmax and κθˆmin divided by a normalizing constant. Thus, the test statistic can be viewed as a plug-in of θˆ into (3) with an additional normalizing constant. This test statistic closely resembles the test statistic proposed by Fieller22 to conduct inference about ratios of means.

Lemma 1.

The likelihood ratio test statistic Tκ can be written as

Tκ=θˆmax-κθˆminτˆmax+κ2τˆmin,

where τˆmax=n-1σ121(|θˆ1|=θˆmax)+n-1σ221(|θˆ2|=θˆmax), and τˆmin=n-1σ121(|θˆ1|=θˆmin)+n-1σ221(|θˆ2|=θˆmin).

We now discuss how to obtain a p-value for the relative difference hypothesis. First, we obtain an observed test statistic t, a realization of Tκ calculated from the data. Following the approach described in Section 2.2, we define the p-value as ρκ(t), the largest of all asymptotic tail probabilities limnPTκ>tθ=θ0 such that θ0 belongs to the null region. To determine the maximum tail probability, we characterize the limiting distribution of Tκ assuming θ=θ0 for all θ0 in the null region.

Though the null region contains an infinite number of values, Tκ can only attain one of three limiting distributions corresponding to the following three cases:

  1. The true association is in the interior of the null region, i.e., θmax-κθmin<0.

  2. The true association is on the boundary of the null region, but both associations are non-zero, i.e., θmax=κθmin>0.

  3. The true association is zero in both sub-populations, i.e., θ1=θ2=0.

In Proposition 1, we describe the asymptotic behavior of Tκ for cases 1 and 2 above. We provide here some intuition for the result and reserve a formal argument for the Appendix. In case 1, it is easy to argue that because θˆ is consistent, Tκ is a negative number with probability tending to one, and therefore never provides evidence against the null. In case 2, Tκ asymptotically follows a standard normal distribution. To see this, we note that because both associations are non-zero, consistency and asymptotic normality of θˆ imply that, for large n, the sign of the estimators and the ranking of their magnitudes are deterministic. That the signs are asymptotically deterministic implies that |θˆ| is asymptotically normal (speaking loosely, |θˆg|signθgθˆg); that ranking is asymptotically deterministic implies that θˆmax and θˆmin are asymptotically independent. Therefore, taking the difference between θˆmax and θˆmin, suitably standardized, is asymptotically equivalent to taking a difference between two independent normal random variables with equal means. Dividing by the asymptotic variances gives the claimed result.

Proposition 1.

In the interior of the null region, i.e., when θmax-κθmin<0, Tκ converges in distribution to -. At all nonzero boundary points of the null region, i.e., when θmax=κθmin>0, Tκ converges in distribution to a standard normal random variable.

In case 3, when both associations are zero, the asymptotic distribution of the test statistic is more complicated. More specifically, θg=0 implies that, asymptotically, |θˆg| follows a half-normal distribution instead of a normal distribution. Moreover, the ranking of |θˆ1| and |θˆ2| remains random in the limit. This gives rise to a non-standard limiting distribution. In particular, unlike cases 1 and 2, the limiting distribution depends on the asymptotic variances of the sub-population estimates (and also the ratio of the sample sizes in the unbalanced case). We are nonetheless able to derive an analytic expression for the distribution function, stated in Proposition 2. We reserve this statement for the Appendix, as the expression is cumbersome.

By Propositions 1 and 2, we can calculate the p-value as the maximum of the tail probabilities in cases 2 and 3. That is,

ρκ(t)=max1-Φ(t),1-Fκ(t), (4)

where Φ() denotes the standard normal distribution function, and Fκ()limnPTκ<tθ=θ0 is the limiting distribution function for the test statistic when θ=0. Though the limiting distribution under θ=0 is non-standard, tail probabilities can be calculated easily, as we show in the Appendix (Remark 1). The p-value is therefore simple to calculate.

We have derived an analytic approximation for the power of the likelihood ratio test. In Proposition 2, we more generally characterize the limiting distribution of the test statistic under hypotheses of the form θ1,θ2=n-1/2c1,c2. The local asymptotic power of the proposed test at the level α is available by considering βακc1,c2limnPTκ>t1-α*θ=n-1/2c1,c2, where we define t1-α* as the maximum of the 1-α quantiles of the limiting distribution of Tκ under scenarios 2 and 3 described above. An analytic finite-sample approximation of the power can be calculated as βακn1/2θ1,n1/2θ2.

A contour plot of the local asymptotic power is given in Figure 3 for both equal and unequal variances of the estimators. We observe that the likelihood ratio test has low power when the strongest effect max c1,c2 is small, and power improves considerably when the strongest effect grows. Additionally, we find that in the presence of unequal variance, the test has greater power when the weakest effect is estimated with higher precision than the strongest effect.

Figure 3.

Figure 3

Contour plots of the local asymptotic power of the relative difference likelihood ratio test with κ=2 and α=.05. Bold lines represent the boundary of the null region. Settings of equal and unequal asymptotic variance of estimators are shown.

3.3 |. Quantifying Relative Difference in Effect by Inverting Likelihood Ratio Test

In some applications there may not be a clear scientifically meaningful threshold for the relative difference in effect size. In such settings, investigators may be disinclined to test the relative difference null for a single fixed choice of κ, as the choice can be somewhat arbitrary. It may be preferable to instead provide a more informative summary describing the relative difference in effect size.

We propose to summarize the relative difference in effect size by reporting the set of κ for which we would reject the relative difference null. We define

κmaxαsupκ:κ>1,ρκ(t)<α

as the largest κ>1 such that the likelihood ratio test rejects the null hypothesis at the α level. When the likelihood ratio test fails to reject for all κ>1, we will use the convention κmaxα=1. Since Tκ is negative for κ>θˆmax/θˆmin and since we only reject the relative difference null for large positive values of Tκ, κmaxα is bounded above by θˆmax/θˆmin.

Because the likelihood ratio test controls the type-I error at the level α, if θmax=θmin, then with probability at least 1-α,κmaxα=1. Moreover, if θmax/θmin>1, κmaxα is bounded above by θmax/θmin with probability at least 1-α. Thus, κmaxα provides a probabilistic lower bound for the relative difference in effect size. That is,

limnPκmaxαθmax/θmin1-α.

Moreover, for any κ<θmax/θmin, the relative difference test will reject the null hypothesis that θmaxκθmin with probability tending to one. Therefore, in the limit of large n, κmaxα should approach but not exceed the true relative difference in effect size. We discuss the calculation of κmaxα in the Appendix.

Larger values of κmaxα indicate that the relative difference in effect θmax/θmin is large and provide greater evidence to support the occurrence of an absence/presence interaction. However, we emphasize that no value of κmaxα will be large enough to guarantee that an absence presence interaction has truly occurred. While we argue that κmaxα is an informative and intuitive summary, we caution investigators against misinterpreting it.

3.4 |. Simultaneous Test for Qualitative Interactions

When it is of interest to identify both absence/presence and positive/negative qualitative interactions, it may be desirable to test for both simultaneously. In this section, we construct an omnibus test that achieves control of the size asymptotically.

We define the omnibus qualitative interaction null hypothesis as

H0P/N,κ:BothH0P/NandH0κhold.

The null region Θ0P/N,κ is the intersection of the positive/negative and relative difference null regions, as depicted in Figure 4 We observe that as κ, the omnibus null and alternative regions tend to the positive/negative null and alternative regions.

Figure 4.

Figure 4

(Left) Null and alternative region for omnibus qualitative interaction hypothesis. (Right) Geometric interpretation of likelihood ratio statistic for the omnibus qualitative interaction hypothesis.

To construct the likelihood ratio test, we proceed using similar arguments to those presented in Section 3.2. The likelihood ratio statistic TP/N,κ is the distance between the estimate θˆ and its projection onto the null region Θ0P/N,κ, inversely weighted by the asymptotic variance of θˆ. A simple expression for TP/N,κ is given in Lemma 2.

Lemma 2.

The likelihood ratio statistic TP/N,κ can be written as

TP/N,κ=min(θˆ1-κθˆ2)2n-1σ12+κ2σ22,(κθˆ1-θˆ2)2n-1(κ2σ12+σ22)1θˆΘ1P/N,κ.

Unsurprisingly, the likelihood ratio statistic for the omnibus test approaches the likelihood ratio statistic for the Gail-Simon likelihood ratio statistic for positive/negative interactions in the limit of large κ. The tests will, therefore, be nearly identical for sufficiently large κ.

To characterize the asymptotic behavior of the omnibus test statistic at each location under the null, we use similar arguments to those in the previous section. If θ belongs to the interior of the null region, TP/N,κ converges in probability to zero. If θ belongs to the boundary of the null region and is non-zero, TP/N,κ converges weakly to a uniform mixture of zero and the chi-squared distribution with one degree of freedom. If θ is zero, the limiting distribution of TP/N,κ is non-standard, though it can be characterized nonetheless. Formal statements of asymptotic properties of TP/N,κ are given in Propositions 3 and Remark 2 (the statement of Remark 2 is also cumbersome, and is reserved for the Appendix).

We have derived an analytic approximation for the power of the likelihood ratio test. In Proposition 4 in the Appendix, we more generally characterize the limiting distribution of the test statistic under hypotheses of the form θ1,θ2=n-1/2c1,c2, and in Remark 3, also in the Appendix, we describe how this result can be applied to calculate the local asymptotic power of the proposed test.

Proposition 3.

If θ belongs to the interior of the null region Θ0P/N,κ, TP/N,κ converges in distribution to zero. If θ is on the boundary of the null region, but θ0, P(TP/N,κ>t)12Pχ12>t as n.

Calculating the p-value for the omnibus test is no more difficult than calculating the p-value for the absence/presence test. Defining FP/N,κ(t)limnP(TP/N<tθ=0), the p-value can be calculated as

ρP/N,κ(t)=max{Pχ12>t,1-FP/N,κ(t)} (5)

where t is the value of the test statistic TP/N,κ calculated on the observed data.

4 |. SIMULATION STUDY

In a Monte Carlo simulation study, we examine how type-I error rate and statistical power of the likelihood ratio tests for the relative difference in effect are affected by signal strength, sample size, and choice of κ. Additionally, we examine how κmaxα depends on the true sub-population effects and the sample size.

We generate random observations Yg,Xg in sub-population g under the linear model:

Yg=θgXg+ϵ;Xg~N(0,1);ϵ~N(0,1).

Here, Xg is the predictor of interest, Yg is the response, and ϵ is white noise. The measure of association in which we are interested is the regression coefficient θg. We fix θ1=1 and consider θ2{-1,-.95,-.9,,.95,1}. A total of 1000 synthetic data sets are randomly generated for each θ2 and n{50,100,200,400,800,1600}.

For each synthetic data set, we perform the relative difference likelihood ratio test and the omnibus test with κ{1.5,2,4,8}, and use a significance level of α=.05. We additionally calculate κmaxα with α=.05. Parameter estimation is performed with ordinary least squares, and model-based estimates of the standard error are used. We compare the likelihood ratio test with the naïve test for absence/presence interactions described in Section 2.3, which rejects the null when there is evidence for the existence of an effect in one population but insufficient evidence in the other population. We also compare with a test for quantitative interactions, which rejects the null when there is evidence that θ1 and θ2 are unequal.

In Figure 5, we plot Monte Carlo estimates of the rejection probabilities for the relative difference likelihood ratio test, the naïve test, and the test for quantitative interactions over the range of θ2. We see that the relative difference likelihood ratio test achieves control of size for all choices of κ and all sample sizes. Power is largest when θ2 is near zero and is lower for larger values of κ, as expected, because larger values of κ correspond to a larger null region. With κ=8, the likelihood ratio test has almost zero power when the sample size is small, though we see an improvement in large sample settings.

Figure 5.

Figure 5

Monte Carlo estimate of the rejection probability of the relative difference likelihood ratio test and the naïve test. The white vertical line at θ2=0 corresponds to the single point at which the absence/presence hypothesis does not hold. The solid pink line denotes the specified size α=.05

The naïve test of the absence/presence hypothesis exhibits poor type-1 error control. Because the absence/presence does not hold only when θ2=0, a well-calibrated test would reject the null with probability no greater than α for any other value of θ2. However, we can see that the type- 1 error rate of the naïve test is greatly inflated when θ2 resides within a neighborhood of zero, and this is especially true in a small sample setting. Due to the test’s poor calibration, it is difficult to glean any meaningful information from a decision to reject the null hypothesis. While the relative difference test can also have a high rejection probability when θ2 is in a neighborhood of zero, it is more easily interpretable: because this test is well-calibrated, we are able to understand what a decision to reject the null implies about the relationship between θ1 and θ2.

Figure 6 shows the estimated rejection probabilities of the omnibus test for varying θ2. Control of size is achieved in both large and small samples, as expected. For this test, power increases as θ2 tends to −1, and as expected, power decreases with increasing κ.

Figure 6.

Figure 6

Monte Carlo estimate of the rejection probability of the omnibus test for qualitative interactions. The solid pink line denotes the specified size α=.05

In Figure 7. we plot the α, 0.5, and (1-α) quantiles of κmaxα values from 1000 synthetic data sets for each θ2. As we would expect, for any fixed θ2, at least 100(1-α)% of the κmaxα fall below θ1/θ2. In addition, the (1-α) quantile of the distribution κmaxα approaches θ1/θ2 as sample size increases. However, when the sample size is small, κmaxα tends to be much smaller than the relative difference in effect size. This suggests that κmaxα is a loose lower bound for the relative difference in effect in a small sample setting, but the bound becomes progressively tighter as the sample size grows.

Figure 7.

Figure 7

Distribution of κmaxα calculated on synthetic data. The .05, .50, and .95 quantiles are represented by the dashed blue and solid black curves. The grey curve represents θ1/θ2, the value κmaxα is expected to approach for a given θ2.

5 |. DATA EXAMPLE

In this example, we investigate genetic differences in breast cancer sub-types. Classification of breast cancer based on expression of estrogen receptor (ER) is known to be associated with clinical outcomes. Approximately 70% of breast cancers are estrogen receptor positive (ER+) cancers, meaning that estrogen causes cancer cells to grow23; breast cancers are otherwise estrogen receptor negative (ER−). Patients with ER+ breast cancer tend to experience better clinical outcomes than ER− patients24.

We conduct an analysis using publicly available data from The Cancer Genome Atlas (TCGA)25. We use clinical data and gene expression data from a total of 806 ER+ patients and 237 ER− patients.

We first investigate the differences between the genetic networks in ER+ and ER− breast cancers. Both ER+ and ER− breast cancer are expected to have similar pathways, but identifying differences between them may be key to understanding the underlying disease mechanisms. We then conduct an analysis to assess whether any genes in a set known to be associated with breast cancer are strongly prognostic of disease outcomes in only one of the estrogen receptor groups.

5.1 |. Differential Network Analysis

Our objective is to determine whether there are any pairs of genes that are much more strongly associated in one estrogen receptor group than the other. We consider the set of p=145 genes in the Kyoto Encyclopedia of Genes and Genomes (KEGG)26 breast cancer pathway and measure the association between gene expression levels using the Pearson correlation.

We test the relative difference null hypothesis for each pair of genes with κ{2,2.5,3} and provide the resulting p-values in Table 1. In Figure 8, we display the pairs of genes that are statistically significant at the α=.05 level with κ=2 after a Bonferroni adjustment. Each of the genes progesterone (PGR), insulin-like growth factor 1 (IGF1R), and estrogen receptor 1 (ESR1) have multiple differential connections; each belongs to at least two pairs such that the association is twice as strong in the ER+ population than in the ER− population. These genes have been shown in the literature to be associated with sub-type and prognosis27,28,29.

Table 1.

Bonferroni-adjusted p-values for pairs of genes that genes for which we reject the relative difference hypothesis with κ = 2, κ = 2.5, or κ = 3.

Gene Pair κ =2 κ = 2.5 κ =3

(IGF1R, AKT3) 0.034 1 1
(IGF1R, FGF1) 0.0011 0.16 1
(KIT, EGFR) 0.0069 1 1
(PGR, FGF7) 0.00034 0.17 1
(PGR, DLL1) 0.014 0.5 1
(PGR, NOTCH4) 6.5e-07 0.00054 0.04
(FZD4, PGR) 8.8e-08 0.00013 0.013
(FGF18, FGF10) 0.017 0.74 1
(E2F1,ESR1) 3.2e-07 0.001 0.13
(E2F2, ESR1) 0.00094 0.11 1

Figure 8.

Figure 8

(Left) Pairs of genes in the KEGG breast cancer pathway for which we reject the relative difference hypothesis with κ=2. Blue edges indicate associations that are stronger in the ER+ group, and red edges indicate associations that are stronger in the ER− group. (Right) Log hazard ratios for KEGG genes in ER+ and ER− groups. The gray dashed line represents the 45-degree line. Blue diamonds and red triangles indicate genes for which κmaxα>1 with α=.10, where the largest log hazard ratio is in the ER+ group and ER− group, respectively.

5.2 |. Prognostic Value of Biomarkers

The goal of this analysis is to assess whether any of the KEGG genes have a stronger association with time to death in one estrogen receptor group than in the other. For each gene, we fit a univariate Cox proportional-hazards model with time to death as the outcome in both of the estrogen receptor groups separately; we measure association using the log hazard ratio. A total of 64 deaths occurred in the ER+ group, and 33 deaths occurred the ER− group. We calculate κmaxα for α equal to .1, .05, .025, and .01, for each gene.

In Figure 8, we compare the log hazard ratios of the ER+ and ER− groups in a scatterplot. Though the log hazard ratios for most genes are similar between subgroups, there are twelve genes with κmaxα larger than one with α=0.10. A complete list is available in Table 2. The two genes with the strongest interactions are Growth Factor Receptor-bound Protein 2 (GRB2; κmax0.10=2.04), which has a stronger association in the ER− group, and Adenomatous Polyposis Coli (APC; κmax0.10=1.91), which has a stronger association in the ER+ group. Both genes have been hypothesized to be associated with breast cancer carcinogenesis30,31. We acknowledge, however, that for small α, none of the κmaxα values are much larger than one, and moreover, no κmaxα values exceed one at the Bonferroni-adjusted level .10∕145. This indicates that the relative difference in strength of association is possibly small for all genes, or alternatively, that we may not have sufficient power to detect large or moderate relative differences.

Table 2.

KEGG genes that are more strongly associated with time to death in one ER group than the other. Reported are κmaxα values for α ϵ {.1, .05, .025, .01}.

Gene ER+ Log HR (SE) ER− Log HR (SE) κmaxα
α = .1 α = .05 α = .025 α = .01

GRB2 −0.06 (0.31) −1.66 (0.68) 2.04 1.36 1.00 1.00
APC 1.34 (0.32) −0.09 (0.33) 1.91 1.57 1.32 1.08
BAX −1.05 (0.24) 0.04 (0.36) 1.53 1.26 1.06 1.00
PIK3CA 1.13 (0.28) 0.14 (0.32) 1.51 1.24 1.05 1.00
SOS2 1.13 (0.36) −0.1 (0.37) 1.33 1.03 1.00 1.00
MAP2K2 −0.87 (0.27) 0.03 (0.35) 1.22 1.00 1.00 1.00
GADD45G −0.52 (0.13) −0.07 (0.19) 1.21 1.00 1.00 1.00
HES5 0.02 (0.2) 0.51 (0.18) 1.19 1.00 1.00 1.00
WNT2 −0.36 (0.09) 0 (0.17) 1.14 1.00 1.00 1.00
DLL4 0.09 (0.2) 0.68 (0.27) 1.10 1.00 1.00 1.00
FRAT2 −1.22 (0.31) −0.45 (0.29) 1.08 1.00 1.00 1.00
SOS1 1.19 (0.3) −0.34 (0.42) 1.01 1.00 1.00 1.00

6 |. DISCUSSION

We proposed a general framework for inference about absence/presence qualitative interactions. We argued that naïve procedures rely upon untenable conditions because the absence/presence hypothesis is ill-posed. We thus proposed to relax the problem in order to conduct well-calibrated inference that maintains the absence/presence interpretation and only requires mild assumptions.

In simulations, we found that our methodology has low power when signal is weak or sample sizes are small. To an extent, this is just a feature of the problem; naturally, one would require even more information to detect absence/presence qualitative interactions than what is required to detect quantitative interactions. However, we provide no guarantee that our methodology is optimal, as tests for composite hypotheses based upon supremum p-values can be conservative in practice32.

Our framework is interpretable and provides a natural approach for quantifying differences in strength of association by sub-population in general settings. Though we only considered measures of marginal association in our examples, our method can be used with conditional measures of association as well; we only require that asymptotically normal estimators are available. In particular, our approach remains valid in the high-dimensional setting, where penalized estimators are biased33 but asymptotically normal estimates can be obtained using the techniques of, e.g., van de Geer et al.34 and Zhang and Zhang35. Finally, we demonstrated the usefulness of our method in a real data example based on genomics data

9 |. ACKNOWLEDGEMENTS

The authors gratefully acknowledge the support of the NSF Graduate Research Fellowship Program under grant DGE-1762114 as well as NSF grant DMS-1561814 and NIH grant R01-GM114029. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

APPENDIX

Theoretical Results for Relative Difference Likelihood Ratio Test

Below, we generalize Lemma 1 and Proposition 1 to the setting of unbalanced sample sizes n1n2.

Lemma 1.

The likelihood ratio test statistic Tk can be written as

Tκ=θˆmax-κθˆminτˆmax+κ2τˆmin

where τˆmax=n1-1σ121θ1=θmax+n2-1σ221θ2=θmax, and τˆmin=n1-1σ121θ1=θmin+n2-1σ221θ2=θmin.

Proof of Lemma 1.

We define θˆord and Σˆord as

θˆord=θˆmaxθˆminΣˆord=τˆmax00τˆmin.

Now, we define θord* as the projection of θˆord onto the null region. If θˆ is in the null region Θ0κ, the projection is equal to θˆord. Otherwise, θord* is the projection of θˆord onto the space spanned by (κ,1), with distance inversely weighted by Σord. We proceed by finding an expression for θord*.

θord*=κ1[(κ1)τˆmax00τˆminκ1]-1κ1τˆmax00τˆminθˆmaxθˆmin=κτˆmaxθˆmax+τˆminθˆminκ2τˆmax+τˆminκ1.

Now, the weighted distance between θˆord and θord* is

(θˆord-θord*)Σord-1(θˆord-θord*)=τˆmax-1θˆmax-κ2τˆmaxθˆmax+κτˆminθˆminκ2τˆmax+τˆmin2+τˆmin-1θˆmin-κτˆmaxθˆmax+τˆminθˆminκ2τˆmax+τˆmin2=τˆmax-1τˆmin-1θˆmax-κτˆmin-1θˆminκ2τˆmax-1+τˆmin-12+τˆmin-1κτˆmax-1θˆmax+κ2τˆmax-1θˆminκ2τˆmax-1+τˆmin-12=κ2τˆmax-2τˆmin-1+τˆmax-1τˆmin-2(κ2τˆmax-1+τˆmin-1)2(θˆmax-κθˆmin)2=(θˆmax-κθˆmin)2τˆmax+κ2τˆmin.

Hence, the likelihood ratio test rejects the null for large values of

(θˆmax-κθˆmin)2τˆmax+κ2τˆmin1(θˆmax-κθˆmin>0)

or equivalently, for large positive values of Tκ, where

Tκ=θˆmax-κθˆminτˆmax+κ2τˆmin

Proposition 1.

Let n1+n2=N, and assume n1/Nλ>0 as n1,n2. Then,

  1. In the interior of the null region, i.e., when θmax-κθmin<0,Tκ converges in distribution to -.

  2. At all nonzero boundary points of the null region, i.e., when θmax-κθmin<0,Tκ converges in distribution to a standard normal random variable.

Proof of Proposition 1.

First we prove (i). Suppose θmax-κθmin=b<0. Then consistency of θˆ and the continuous mapping theorem imply that θˆmax-κθˆminpb. Now, because n1-1σ12 and n2-1σ22 both tend to zero, 1τˆmax+κ2τˆminp. Therefore, Tκp-.

Now we prove (ii). Suppose θmax=κθmin>0. By applying the delta method,

n1n2N(θˆmax-κθˆmin)=n1n2N[(max{|θˆ1|,|θˆ2|}-κmin{|θˆ1|,|θˆ2|})-maxθ1,θ2-κminθ1,θ2]=n1n2NMθˆ1-θ1θˆ2-θ2+op(1),

where the matrix M is

M=(-κ)1θ1=θminsignθ100(-κ)1θ2=θminsignθ2.

Now, Slutsky’s theorem implies that

n1n2NMθˆ1-θ1θˆ2-θ2+op(1)dN0,(1-λ)κ21θ1=θminσ12+λκ21θ2=θminσ22.

The test statistic Tκ can be written as

Tκ=θˆmax-κθˆminτˆmax+κ2τˆmin=n1n2N(θˆmax-κθˆmin)n1n2Nτˆmax2+n1n2Nκ2τˆmin2,

with τˆmax and τˆmin as defined in Lemma 1. By the continuous mapping theorem, the denominator converges in probability to

(1-λ)κ21θ1=θminσ12+λκ21θ2=θminσ22.

Thus, Slutsky’s theorem implies that

TκdN(0,1).

In Proposition 2, we characterize the local asymptotic behavior of the relative difference likelihood ratio test.

Proposition 2.

Let n1+n2=N, and assume n1/Nλ>0 as n1,n2. Suppose θ1=n1-1/2c1, and θ2=n2-1/2c2. Further, assume that θˆ is locally regular in the sense that ngθˆgdNcg,σg2 for g{1,2}. Then as n1,n2, for all t>0,PTκ>t converges to

PW11>t,W12>t+PW11<-t,W12<-t+PW21>t,W22>t+PW21<-t,W22<-t,

where W11,W12 follows a bivariate normal distribution with mean c˜11,c˜12, unit variance and correlation v1, and W21,W22 follows a bivariate normal distribution with mean c˜21,c˜22, unit variance and correlation v2, with c˜11, c˜12, c˜21, c˜22, v1,v2 are defined as

c˜11=(1-λ)c1-κλc2(1-λ)σ12+κ2λσ22
c˜12=(1-λ)c1+κλc2(1-λ)σ12+κ2λσ22
c˜21=λc2-κ(1-λ)c1κ2(1-λ)σ12+λσ22
c˜22=λc2+κ(1-λ)c1κ2(1-λ)σ12+λσ22
v1=(1-λ)σ12-κ2λσ22(1-λ)σ12+κ2λσ22
v2=λσ22-κ2(1-λ)σ12κ2(1-λ)σ12+λσ22.
Remark 1.

Setting c1=c2=0, we obtain the limiting distribution of Tκ when θ1=θ2=0, which is critical for calculating the p-value ρκ(t).

Proof of Proposition 2.

We first re-write the tail probability as

Pθˆmax-κθˆminτˆmax+κ2τˆmin>t=Pθˆ1-κθˆ2n1-1σ12+κ2n2-1σ22>t,θˆ1+κθˆ2n1-1σ12+κ2n2-1σ22>t+Pθˆ1-κθˆ2n1-1σ12+κ2n2-1σ22<-t,θˆ1+κθˆ2n1-1σ12+κ2n2-1σ22<-t+Pθˆ2-κθˆ1κ2n1-1σ12+n2-1σ22>t,θˆ2+κθˆ1κ2n1-1σ12+n2-1σ22>t+Pθˆ2-κθˆ1κ2n1-1σ12+n2-1σ22<-t,θˆ2+κθˆ1κ2n1-1σ12+n2-1σ22<-t.

Thus,

n1n2Nθˆ1θˆ2dN(1-λ)c1λc2,(1-λ)σ1200λσ22.

And,

n1n2N1-κ1κθˆ1θˆ2dN(1-λ)c1-κλc2(1-λ)c1+κλc2,(1-λ)σ12+κ2λσ22(1-λ)σ12-κ2λσ22(1-λ)σ12-κ2λσ22κ2(1-λ)σ12+λσ22.

Now,

θˆ1-κθˆ2n1-1σ12+κ2n2-1σ22θˆ1+κθˆ2n1-1σ12+κ2n2-1σ22=n1n2N(θˆ1-κθˆ2)κ2n2N-1σ12+n1N-1σ22n1n2N(κθˆ1-θˆ2)κ2n2N-1σ12+n1-1Nσ22dNc˜11c˜12,1v1v11.

Similarly,

θˆ2-κθˆ1κ2n1-1σ12+n2-1σ22θˆ2+κθˆ1κ2n1-1σ12+n2-1σ22dNc˜21c˜22,1v2v21.

Theoretical Results for Simultaneous Test for Qualitative Interactions

In what follows, we generalize Lemma 2 and Proposition 3 to the case of unbalanced sample sizes.

Lemma 2.

The likelihood ratio statistic TP/N,κ can be written as

TP/N,κ=min(θˆ1-κθˆ2)2n1-1σ12+κ2n2-1σ22,(κθˆ1-θˆ2)2κ2n1-1σ12+n2-1σ221θˆΘ1P/N,κ.
Proof of Lemma 2.

If θˆ belongs to the null region, Θ0P/N,κ, the likelihood ratio statistic is clearly zero. If θˆ1>κθˆ2>0 or θˆ1<κθˆ2<0, the likelihood ratio statistic is minimum of the distances between θˆ and its projections onto the span of (1,κ) and the span of (-κ,-1). Algebra (similar to the proof of Lemma 1) gives the desired result, and similarly if θˆ1>-κθˆ2>0 or θˆ1<-κθˆ2<0. □

Proposition 3.

Let n1+n2=N, and assume n1/Nλ>0. Then,

  1. If θ belongs to the interior of the null region Θ0P/N,κ,TP/N,κ converges in distribution to zero.

  2. If θ is on the boundary of the null region, but θ0, P(TP/N,κ>t)12Pχ12>t as n1,n2.

Proof of Proposition 3.

We first prove (i). If θ belongs to the interior of the null region, consistency of θˆ and the continuous mapping theorem imply that 1θˆΘ1P/N,κ converges in probability to zero. Then, by Slutsky’s theorem, TP/N,κ converges in distribution to zero.

To prove (ii), recall that we can write the null and alternative regions as

Θ0P/N,κ=0<κ-1θ1<θ2<κθ1κθ1<θ2<κ-1θ1<0
Θ1P/N,κ=θ1>κθ2θ1>0-θ1>-κθ2θ1<0θ2>κθ1θ2>0-θ2>-κθ1θ2<0.

Now, we write

P(TP/N,κ>t)=P(θˆ1-κθˆ2)2n1-1σ12+κ2n2-1σ221(θˆΘ1)>t,(κθˆ1-θˆ2)2κ2n1-1σ12+n2-1σ221(θˆΘ1)>t=Pθˆ1-κθˆ2n1-1σ12+κ2n2-1σ221(θˆΘ1)>t,κθˆ1-θˆ2κ2n1-1σ12+n2-1σ221(θˆΘ1)>t+Pθˆ1-κθˆ2n1-1σ12+κ2n2-1σ221(θˆΘ1)>t,κθˆ1-θˆ2κ2n1-1σ12+n2-1σ221(θˆΘ1)<-t+Pθˆ1-κθˆ2n1-1σ12+κ2n2-1σ221(θˆΘ1)<-t,κθˆ1-θˆ2κ2n1-1σ12+n2-1σ221(θˆΘ1)>t+Pθˆ1-κθˆ2n1-1σ12+κ2n2-1σ221(θˆΘ1)<-t,κθˆ1-θˆ2κ2n1-1σ12+n2-1σ221(θˆΘ1)<-t.

The second and third summands above are exactly equal to 0. To see this, note that in the second term θˆ1>κθˆ2 and θˆ1<κ-1θˆ2 implies that θˆΘ0P/N,κ, and similarly for the third term. In the first and last summands, we can ignore the term 1θˆΘ1P/N,κ. To see this, note that in the first term θˆ1>κθˆ2 and θˆ1>κ-1θˆ2 imply that θˆΘ1P/N,κ; a similar argument holds for the fourth term. Thus,

P(TP/N,κ>t)=Pθˆ1-κθˆ2n1-1σ12+κ2n2-1σ22>t,κθˆ1-θˆ2κ2n1-1σ12+n2-1σ22>t+Pθˆ1-κθˆ2n1-1σ12+κ2n2-1σ22<-t,κθˆ1-θˆ2κ2n1-1σ12+n2-1σ22<-t.

Suppose, for the moment, θ1=κθ2>0. Then

n1n2Nθˆ1-θ1θˆ2-θ2dN0,(1-λ)σ1200λσ22.

Therefore,

n1n2Nθˆ1-κθˆ2κθˆ1-θˆ2=n1n2Nθˆ1-κθˆ2κ(θˆ1-θ1)-(θˆ2-θ2)+(κθ1-θ2)dN0,(1-λ)σ12+κ2λσ22κ(1-λ)σ12+κλσ22κ(1-λ)σ12+κλσ22κ2(1-λ)σ12+λσ22+0.

Thus,

PTP/N,κ>t)1-Φ(t)=12Pχ12>t.

A similar argument follows if θ1=κθ2<0, θ2=κθ1>0, or θ2=κθ1<0. □

In Proposition 4, we characterize the local asymptotic behavior of the omnibus test for qualitative interactions.

Proposition 4.

Let n1+n2=N, and assume n1/Nλ>0 as n1,n2. Suppose θ1=n1-1/2c1,θ2n-1/2c2 with c1,c20. Further, assume θˆ is locally regular in the sense that ngθˆgdNcg,σg2 for g{1,2}. Then as n1,n2, for all t>0,P(TP/N,κ>t) converges to

PV1>t,V2>t+PV1<-t,V2<-t,

where V1,V2followsabivariatenormaldistributionwithmeanc˜1,c˜2, unit variance, and correlation v, where

c˜1=(1-λ)c1-κλc2(1-λ)σ12+κ2λσ22
c˜2=κ(1-λ)c1-λc2κ2(1-λ)σ12+λσ22
v=κ(1-λ)σ12+κλσ22(1-λ)σ12+κ2λσ22κ2(1-λ)σ12+λσ22.
Remark 2.

Setting c1=c2=0, we obtain the limiting distribution of TP/N,κ when θ1=θ2=0, which is critical for calculating the p-value ρP/N,κ(t).

Proof of Proposition 4.

First,

n1n2Nθˆ1θˆ2dN(1-λ)c1λc2,(1-λ)σ1200λσ22.

Also,

n1n2N1-κκ-1θˆ1θˆ2dN(1-λ)c1-κλc2κ(1-λ)c1-λc2,(1-λ)σ12+κ2λσ22κ(1-λ)σ12+κλσ22κ(1-λ)σ12+κλσ22κ2(1-λ)σ12+λσ22.

Thus,

θˆ1-κθˆ2n1-1σ12+κ2n2-1σ22κθˆ1-θˆ2κ2n1-1σ12+n2-1σ22=n1n2N(θˆ1-κθˆ2)n2N-1σ12+κ2n1N-1σ22n1n2N(κθˆ1-θˆ2)κ2n2N-1σ12+n1-1Nσ22dNc˜1c˜2,1vv1.

Application of (6) completes the argument. □

Remark 3.

The local asymptotic power of the omnibus test at the level α can be obtained by considering βαP/N,κc1,c2limnPTP/N,κ>t1-α*θ=n-1/2c1,c2, where we define t1-α* as the maximum of the 1-α quantiles of the limiting distribution of TP/N,κ under scenarios (i) and (ii) in Proposition 3. An analytic finite-sample approximation of the power can be calculated as βαP/N,κn1/2θ1,n1/2θ2.

Calculation of Lower Bound for Relative Difference in Effect

Define tκ as a realization of the likelihood ratio statistic Tκ for the relative difference hypothesis, calculated on the observed data. For a pre-specified α<1, recall that κmaxαsupκ:κ>1,ρκtκ<α is defined as the largest κ>1 such that the likelihood ratio test rejects the relative difference null. Defining Fκ(t)limn1,n2PTκ>tθ=0, we can evaluate κmaxα as

κmaxα=minsupκ:κ>1,1-Φtκ<α,supκ:κ>1,1-Fκtκ<αminπ1,π2.

Therefore, we are only required to calculate π1 and π2.

Both π1 and π2 can be calculated with root-finding algorithms. To see this, we first observe that tκ is monotone in κ, recalling that the numerator of Tκ=θˆmax-κθˆminτˆmax+κ2τˆmin decreases in κ while the denominator increases. Monotonicity of Φ() implies that 1-Φtκ is monotone in κ. Now, applying of Proposition 2,

1-FKt=2PW11>t,W12>t+PW21>t,W22>t,

where W11,W12 and W21,W22 follow bivariate normal distributions with mean zero, unit variance, and correlations v1 and v2, as defined in Proposition 2, respectively. Both v1 and v2 are monotone decreasing in κ, so Theorem 8 in36 implies that 1-Fκ(t) is monotone in t. Thus 1-Fκtκ is monotone in κ.

Monotonicity of 1-Φtκ and 1-Fκtκ implies that if the likelihood ratio test rejects the null hypothesis for some κ>1,π1 and π2 are the unique roots of

f1(κ)=1-Φtκ-α
f2(κ)=1-Fκtκ-α,

respectively. These roots can be easily calculated via, e.g., the bisection method. In our implementation, we use the uniroot function in R. □

Footnotes

7 | SOFTWARE AVAILABILITY

Implementation of the proposed methodology and code to reproduce results from the data analysis are available at https://github.com/awhudson/QualitativeInteractions.

8 |. DATA AVAILABILITY

The TCGA data are available using the public R package RTCGA.

10 | BIBLIOGRAPHY

References

  • 1.VanderWeele TJ, Knol MJ. A tutorial on interaction. Epidemiologic methods 2014; 3(1): 33–72. [Google Scholar]
  • 2.Greenland S Tests for interaction in epidemiologic studies: a review and a study of power. Statistics in medicine 1983; 2(2): 243–251. [DOI] [PubMed] [Google Scholar]
  • 3.Gonzalez dAB, Cox DR. Interpretation of interaction: A review. The Annals of Applied Statistics 2007; 1(2): 371–385. [Google Scholar]
  • 4.Ideker T, Krogan NJ. Differential network biology. Molecular systems biology 2012; 8(1): 565. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Shojaie A Differential network analysis: A statistical perspective. Wiley Interdisciplinary Reviews: Computational Statistics 2021; 13(2): e1508. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Gail M, Simon R. Testing for qualitative interactions between treatment effects and patient subsets.. Biometrics 1985; 41(2): 361–372. [PubMed] [Google Scholar]
  • 7.VanderWeele TJ. The Interaction Continuum. Epidemiology (Cambridge, Mass.) 2019; 30(5): 648. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Piantadosi S, Gail M. A comparison of the power of two tests for qualitative interactions. Statistics in Medicine 1993; 12(13): 1239–1248. [DOI] [PubMed] [Google Scholar]
  • 9.Pan G, Wolfe DA. Test for qualitative interaction of clinical significance. Statistics in medicine 1997; 16(14): 1645–1652. [DOI] [PubMed] [Google Scholar]
  • 10.Silvapulle MJ. Tests against qualitative interaction: Exact critical values and robust tests. Biometrics 2001; 57(4): 1157–1165. [DOI] [PubMed] [Google Scholar]
  • 11.Li J, Chan IS. Detecting qualitative interactions in clinical trials: an extension of range test. Journal of Biopharmaceutical statistics 2006; 16(6): 831–841. [DOI] [PubMed] [Google Scholar]
  • 12.Bayman EÖ, Chaloner K, Cowles MK. Detecting qualitative interaction: a Bayesian approach. Statistics in medicine 2010; 29(4): 455–463. [DOI] [PubMed] [Google Scholar]
  • 13.Gunter L, Zhu J, Murphy S. Variable selection for qualitative interactions in personalized medicine while controlling the family-wise error rate. Journal of biopharmaceutical statistics 2011; 21(6): 1063–1078. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Roth J, Simon N. A framework for estimating and testing qualitative interactions with applications to predictive biomarkers. Biostatistics 2018; 19(3): 263–280. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Dusseldorp E, Van Mechelen I. Qualitative interaction trees: a tool to identify qualitative treatment–subgroup interactions. Statistics in medicine 2014; 33(2): 219–237. [DOI] [PubMed] [Google Scholar]
  • 16.Poulson RS, Gadbury GL, Allison DB. Treatment heterogeneity and individual qualitative interaction. The American Statistician 2012; 66(1): 16–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Chen W, Ghosh D, Raghunathan TE, Norkin M, Sargent DJ, Bepler G. On Bayesian methods of exploring qualitative interactions for targeted treatment. Statistics in Medicine 2012; 31(28): 3693–3707. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Casella G, Berger RL. Statistical inference. 2. Duxbury Pacific Grove, CA. 2002. [Google Scholar]
  • 19.Zhao S, Shojaie A. Network differential connectivity analysis. The Annals of Applied Statistics 2022; 16(4): 2166–2182. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology 2014; 15(12): 1–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.McCarthy DJ, Smyth GK. Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 2009; 25(6): 765–771. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Fieller EC. The biological standardization of insulin. Supplement to the Journal of the Royal Statistical Society 1940; 7(1): 1–64. [Google Scholar]
  • 23.Lumachi F, Brunello A, Maruzzo M, Basso U, Mm Basso S. Treatment of estrogen receptor-positive breast cancer. Current medicinal chemistry 2013; 20(5): 596–604. [DOI] [PubMed] [Google Scholar]
  • 24.Carey LA, Perou CM, Livasy CA, et al. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. Jama 2006; 295(21): 2492–2502. [DOI] [PubMed] [Google Scholar]
  • 25.Weinstein JN, Collisson EA, Mills GB, et al. The cancer genome atlas pan-cancer analysis project. Nature genetics 2013; 45(10): 1113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research 2000; 28(1): 27–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Farabaugh SM,Boone DN,Lee AV. RoleofIGF1Rinbreastcancersubtypes,stemness,andlineagedifferentiation. Frontiers in endocrinology 2015; 6: 59. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Reinert T, Coelho GP, Mandelli J, et al. Association of ESR1 Mutations and Visceral Metastasis in Patients with Estrogen Receptor-Positive Advanced Breast Cancer from Brazil. Journal of oncology 2019; 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Kurozumi S, Matsumoto H, Hayashi Y, et al. Power of PgR expression as a prognostic factor for ER-positive/HER2-negative breast cancer patients at intermediate risk classified by the Ki67 labeling index. BMC cancer 2017; 17(1): 354. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Daly RJ, Binder MD, Sutherland RL. Overexpression of the Grb2 gene in human breast cancer cell lines.. Oncogene 1994; 9(9): 2723–2727. [PubMed] [Google Scholar]
  • 31.Jin Z, Tamura G, Tsuchiya T, et al. Adenomatous polyposis coli (APC) gene promoter hypermethylation in primary breast cancers. British journal of cancer 2001; 85(1): 69–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Bayarri M, Berger JO. P values for composite null models. Journal of the American Statistical Association 2000; 95(452): 1127–1142. [Google Scholar]
  • 33.Voorman A, Shojaie A, Witten D. Inference in high dimensions with the penalized score test. arXiv preprint arXiv:1401.2678 2014. [Google Scholar]
  • 34.Geer v. dS, Bühlmann P, Ritov Y, Dezeure R, others. On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics 2014; 42(3): 1166–1202. [Google Scholar]
  • 35.Zhang CH, Zhang SS. Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2014; 76(1): 217–242. [Google Scholar]
  • 36.Müller A Stochastic ordering of multivariate normal distributions. Annals of the Institute of Statistical Mathematics 2001; 53(3): 567–575. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The TCGA data are available using the public R package RTCGA.

RESOURCES