Abstract
Qualitative interactions occur when a treatment effect or measure of association varies in sign by sub-population. Of particular interest in many biomedical settings are absence/presence qualitative interactions, which occur when an effect is present in one sub-population but absent in another. Absence/presence interactions arise in emerging applications in precision medicine, where the objective is to identify a set of predictive biomarkers that have prognostic value for clinical outcomes in some sub-population but not others. They also arise naturally in gene regulatory network inference, where the goal is to identify differences in networks corresponding to diseased and healthy individuals, or to different subtypes of disease; such differences lead to identification of network-based biomarkers for diseases. In this paper, we argue that while the absence/presence hypothesis is important, developing a statistical test for this hypothesis is an intractable problem. To overcome this challenge, we approximate the problem in a novel inference framework. In particular, we propose to make inferences about absence/presence interactions by quantifying the relative difference in effect size, reasoning that when the relative difference is large, an absence/presenceinteractionoccurs.Theproposedmethodologyisillustratedthrougha simulation study as well as an analysis of breast cancer data from the Cancer Genome Atlas.
Keywords: Qualitative interactions, Differential network connectivity, Precision medicine
1 |. INTRODUCTION
An objective of many biomedical studies is to identify and test for interactions, which arise when a measure of effect or association between variables differs by sub-population1,2,3. Precision medicine and genetic network inference provide examples of areas in which interactions are of interest. For instance, researchers in precision medicine seek to understand how patients’ characteristics are associated with heterogeneity in response to treatment. In omics data analysis, it is of interest to determine how biological networks, which summarize the associations between genes/proteins/etc, depend on phenotype.
Interactions may lack clinical or scientific significance when differences in effect are small. In addition to detecting interactions, it is important to identify which are meaningful. For example, in precision medicine, the most important differences in treatment effect may be those in which some sub-populations of patients benefit from the treatment, while other sub-populations are harmed or unaffected. Additionally, one may want to identify differences among sub-populations in the set of biomarkers that have prognostic value for a health outcome — that is, to determine whether some biomarkers are predictive of the outcome in only a subset of the full population. Genetic network inference provides another example: When comparing sub-population level genetic networks, it may be of primary interest to identify pairs of genes that share an association in some sub-populations but share no association in others, or to identify pairs that have a positive association in one sub-population and a negative association in another. This is known as differential network analysis4,5.
Such qualitative interactions are the focus of this paper. Qualitative interactions occur when a measure of association differs in sign by sub-population. We consider two types of qualitative interactions: positive/negative interactions — also known in the literature as cross-over interactions6 — and absence/presence interactions — sometimes referred to as pure interactions7. Positive/negative interactions occur when an effect is positive in one sub-population and negative in another, and absence/presence interactions occur when the effect is present in one population but absent in another.
Our objective is to formally test for qualitative interactions, given independent samples from each sub-population. Testing for positive/negative interactions is well-studied, with several procedures available for identifying positive/negative interactions in a fixed number of sub-populations6,8,9,10,11,12. More recent work has also extended to identifying interactions when there is a large, or possibly infinite, number of sub-groups13,14,15,16,17. In contrast to positive/negative interactions, testing for absence/presence interactions has received little attention. Naïve approaches, to be discussed in Section 2.3, require an untenable minimum signal strength condition — that if an effect is present in any sub-population, it is large enough to be detected with absolute certainty. No approaches exist, to the best of our knowledge, that avoid this assumption.
In this paper, we propose a novel framework for inference about absence/presence interactions in two sample settings. Our proposed methodology allows for well-calibrated hypothesis testing under mild assumptions. We also introduce a numerical summary that measures the strength of absence/presence interactions, while accounting for the uncertainty associated with parameter estimation. Additionally, we describe methods for simultaneous inference about absence/presence and positive/negative interactions. The methodology we introduce provides an effective and flexible inference tool in precision medicine and genetic network analysis, as we illustrate in simulations and an analysis of breast cancer data from The Cancer Genome Atlas (TCGA).
2 |. BACKGROUND
2.1 |. Notation
As we begin to formalize the problem, we first introduce some notation. We consider two sub-populations, labeled by . Let denote a measure of association in sub-population . For simplicity, we assume that is finite. When convenient, we write . We can consider various measures of association, such as: correlation coefficients, indicating the strength of linear relationship between two variables of interest; odds ratios, describing the relationship between predictors and a binary outcome; and log hazard ratios, describing the association between predictors and a time-to-event outcome.
We assume that given i.i.d. samples of size and from each sub-population, -consistent and asymptotically normal estimators of are available, i.e.,
with denoting the asymptotic variance. For expositional simplicity, we assume balanced sample sizes (key results are stated more generally in the Appendix). We also assume is known, though we can instead use a consistent estimate, as is commonly done in practice.
We now formally state the null hypotheses of no positive/negative interactions and no absence/presence interaction, labeled and , respectively:
| (1) |
| (2) |
We let and denote the corresponding null and alternative regions of the parameter space, depicted in the left and center panels of Figure 1 (Recall that the null region is the set of parameters such that the null hypothesis holds, and the alternative region is the complement of the null region.) The positive/negative null region is the union of the the non-negative and non-positive orthants, and the absence/presence null region is the union of all open orthants and the origin.
Figure 1.

Null and alternative regions of parameter space for positive/negative (left; introduced in Section 2.1), absence/presence (center; introduced in Section 2.1), and relative difference hypotheses (right; introduced in Section 3.1)
2.2 |. Testing Composite Null Hypotheses
Our goal is to use the estimate to perform tests of and such that the size is controlled asymptotically under mild assumptions. Recall that for a null hypothesis with accompanying null region , the size of a test is defined as
In words, the size is the largest possible type-I error rate that could be achieved under any probability distribution given that belongs to the null region.
Here, we describe our general approach for controlling the size at a pre-specified level . We first define a test statistic , a map from observable data to a real-valued number, with larger values of corresponding to more evidence against the null hypothesis. We then calculate the test statistic on the observed data, which we denote by . We write as the probability of observing a random test statistic at least as large as the observed value , assuming . We reject the null hypothesis if
One can think of as the p-value under a specific null distribution ; is then the largest of all such p-values. We reject when there is sufficient evidence to reject all hypotheses . We can view as a generalization of the usual p-value for simple null hypotheses to tests with composite null hypotheses, and we will simply refer to as “p-value”. Tests of the above form are guaranteed to control the size18.
2.3 |. Existing Methodology
We now review existing approaches to test for qualitative interactions. We first discuss testing positive/negative interactions before moving to absence/presence interactions.
Gail and Simon6 developed the most widely used procedure to test for positive/negative interactions. Though Gail and Simon proposed a general -sample test, we focus on the two-sample problem in this paper. We note that various -sample tests for positive/negative interactions have been proposed8,10,11, but these procedures are essentially equivalent to the Gail-Simon test in the two-sample setting.
Gail and Simon’s approach is to perform a likelihood ratio test based on the asymptotic sampling distribution of . The likelihood ratio test rejects for large values of
where is the standard normal density. By performing algebraic manipulations, one can show that the likelihood ratio test equivalently rejects the null for large values of
The test statistic can be interpreted as the shortest distance between and the null region, where the distance is inversely weighted by the asymptotic variances of the estimates, as illustrated in the left panel of Figure 2
Figure 2.

Geometric interpretation of likelihood ratio statistic for the positive/negative hypothesis (left; introduced in Section 2.3) and relative difference hypotheses (right; introduced in Section 3.2)
Gail and Simon show that the test statistic can be calculated as
where denotes the indicator function. Furthermore, one can verify that for all , the p-value is easily calculated as
The likelihood ratio test is quite intuitive and rejects the null when a positive estimate of association is observed in one population, a negative estimate is observed in another population, and both associations are statistically significant.
Now, we discuss approaches to test for absence/presence interactions. While one might be tempted to perform a likelihood ratio test for absence/presence interactions, the likelihood ratio test fails in the sense that it never can reject the null. To see this, we first recognize that, similar to the positive/negative interaction test, the absence/presence likelihood ratio test would reject for large values of
Again, the test statistic is the shortest distance between and the null region . However, because the alternative region has zero area, lies in the null region with probability one. Therefore, the test statistic is always 0, and the likelihood ratio test has no power.
One might alternatively attempt to test the absence/presence null by separately testing the null hypotheses for and rejecting the absence/presence null when is rejected for one and not rejected for the other. To control the size of a test of this form, which we refer to as the naïve test for the absence/presence null, we need to simultaneously control the type-I error under two scenarios: (1) there is an association in both sub-populations, and (2) there is no association in either population. When there is an association in both populations, a type-I error occurs when we incorrectly fail to reject one of . When there is no association in either population, we make a type-I error when we incorrectly reject one of . Thus, controlling the size of the test for using this approach requires simultaneous control of the type-I error rate and type-II error rate of tests for . If the tests for are consistent — that is, the type-II error rates tend to zero with sufficiently large samples — this approach is asymptotically valid, as only the type-I error rates for tests of need to be controlled. However, with even moderately large samples, we will not be able to correctly reject false with absolute certainty unless the true association is strong and hence easy to detect. This would make the test of unreliable in the presence of weak signal.
Next, we argue that the naïve test controls the type-I error only if , which is similar to the assumption recently made for differential network connectivity analysis 19. To see this, we construct a more formal argument. For simplicity, suppose . We consider tests of of the form
where is a constant that would be selected to control the size. The probability of rejecting the absence/presence null is
Suppose and are guaranteed to be greater than if they are both nonzero. Then, for , and . Therefore, to control the size, we are only required to select so that the type-I error is controlled when ; this can be done by taking as the quantile of the standard normal distribution. However, if we allow , we can see a drastically inflated type-I error rate. For instance, for a small , let , . This asymptotic regime allows us to approximate the type-1 error rate of this test in settings where the sample size is large relative to the signal strength. Then while , so . Thus, when small signal is permitted, tests of this form will be asymptotically anti-conservative.
Both approaches discussed above for testing absence/presence interactions — namely, the likelihood ratio test and the naïve test — fail for a similar reason: it is difficult to gather evidence supporting that a measure of association is exactly equal to zero. This is captured by the alternative region having zero area, causing the failure of the first approach. In the second approach, to obtain evidence supporting that an association is zero, we require that is only accepted when ; for this, we rely upon a minimum signal strength condition to guarantee that any non-zero association is detected.
3 |. PROPOSED METHODOLOGY
3.1 |. Refinement of Absence/Presence Hypothesis
To mitigate the challenges described in Section 2, we consider a refinement of the absence/presence null hypothesis. The key idea is that in practice, absence/presence interactions can be approximated by considering the settings where an association is at least moderately large in one population and negligible or near zero in the other; or when one association is substantially stronger than the other. This means that we can expand the alternative region to include neighborhoods of zero in a way that the absence/presence interpretation is preserved.
Recall that when there exists an absence/presence interaction, the ratio of the maximum of the absolute value of the to the minimum is infinite. We cannot test that the ratio is infinite because we will never have evidence to support that the denominator is exactly zero. However, we can test that the ratio is large because we may have evidence to support that the denominator is very small. Motivated by this intuition, we propose to test whether the relative difference between sub-population measures of association is greater than a large pre-specified constant . Formally, let and . We define the new relative difference null hypothesis as
Equivalently,
| (3) |
The null region , illustrated in the right panel of Figure 1, is the union of four linear subspaces — each residing in a separate orthant of ; the boundary of each subspace is the union of the spans of vectors with absolute direction and .
The relative difference null region can be viewed as a relaxation of the absence/presence null. For a large choice of , the original and refined null hypotheses have a similar interpretation: the greater measure of association is substantially larger than the lesser. However, the hypotheses may not coincide for any value of as the relative difference hypothesis can hold if is non-zero while is non-zero but small. It is nonetheless appealing that has a reasonable interpretation for any choice of ; that is, the multiplicative difference in strength of association is no larger than .
To motivate defining the refined null hypothesis in terms of relative differences rather than absolute differences, we argue that testing for relative differences has at least the following benefits. First, relative differences are unitless, so the relative difference null is compatible with unitless measures of association such as the Pearson correlation coefficient, which are often preferred in the analysis of biological data. Second, it is possible that in some contexts, could be selected without prior knowledge on the range of strength of association. For instance, in studies of differential gene expression in two groups, it is common for investigators to focus on identifying genes for which the relative difference in the mean expression level exceeds a user-specified threshold. The threshold is chosen to be scientifically meaningful, based on the investigator’s own discretion20,21. When the goal is identifying pairs of genes for which the correlation between gene expression levels differs by group, an investigator could similarly select a scientifically meaningful threshold for the relative difference in correlation at their own discretion.
Of course, the relative difference null hypothesis depends on the choice of . For small values of , the relative difference null may be too dissimilar from the absence/presence null to retain its interpretation; for large , the alternative region becomes very small, and the hypothesis may be overly conservative. In settings where there is a scientifically meaningful threshold for the relative difference in effect size, can be specified a priori based on its interpretation, often at the investigator’s discretion. In the following subsections, we first construct a test of the relative difference null hypothesis for a pre-specified . We then describe an approach to identify the set of values for which the test rejects the null hypothesis. This method can be useful when there is no clear scientifically meaningful threshold for the relative difference in effects.
3.2 |. Likelihood Ratio Test for Relative Difference Hypothesis
We next develop a testing procedure for the new relative difference null hypothesis. The relative difference null region, unlike the absence/presence null region, has a non-zero area, so a likelihood ratio test will not fail in the same manner as the likelihood ratio test for the absence/presence hypothesis. Similar to the previously discussed examples, the likelihood ratio test statistic is
and can be interpreted as the shortest (weighted) distance between and the null region , as depicted in the right panel of Figure 2 Clearly, the test statistic is zero whenever lies in the null region. Otherwise, is the shortest distance between and the closest of the four linear subspaces that define . The test statistic can be calculated as the distance between and its projection onto the span of the vector .
The likelihood ratio test statistic is straightforward to calculate. Let and be the strongest and weakest estimated absolute association, respectively. In the following lemma, we state that is equal to the difference between and divided by a normalizing constant. Thus, the test statistic can be viewed as a plug-in of into (3) with an additional normalizing constant. This test statistic closely resembles the test statistic proposed by Fieller22 to conduct inference about ratios of means.
Lemma 1.
The likelihood ratio test statistic can be written as
where , and .
We now discuss how to obtain a p-value for the relative difference hypothesis. First, we obtain an observed test statistic , a realization of calculated from the data. Following the approach described in Section 2.2, we define the p-value as , the largest of all asymptotic tail probabilities such that belongs to the null region. To determine the maximum tail probability, we characterize the limiting distribution of assuming for all in the null region.
Though the null region contains an infinite number of values, can only attain one of three limiting distributions corresponding to the following three cases:
The true association is in the interior of the null region, i.e., .
The true association is on the boundary of the null region, but both associations are non-zero, i.e., .
The true association is zero in both sub-populations, i.e., .
In Proposition 1, we describe the asymptotic behavior of for cases 1 and 2 above. We provide here some intuition for the result and reserve a formal argument for the Appendix. In case 1, it is easy to argue that because is consistent, is a negative number with probability tending to one, and therefore never provides evidence against the null. In case 2, asymptotically follows a standard normal distribution. To see this, we note that because both associations are non-zero, consistency and asymptotic normality of imply that, for large , the sign of the estimators and the ranking of their magnitudes are deterministic. That the signs are asymptotically deterministic implies that is asymptotically normal (speaking loosely, ); that ranking is asymptotically deterministic implies that and are asymptotically independent. Therefore, taking the difference between and , suitably standardized, is asymptotically equivalent to taking a difference between two independent normal random variables with equal means. Dividing by the asymptotic variances gives the claimed result.
Proposition 1.
In the interior of the null region, i.e., when , converges in distribution to . At all nonzero boundary points of the null region, i.e., when , converges in distribution to a standard normal random variable.
In case 3, when both associations are zero, the asymptotic distribution of the test statistic is more complicated. More specifically, implies that, asymptotically, follows a half-normal distribution instead of a normal distribution. Moreover, the ranking of and remains random in the limit. This gives rise to a non-standard limiting distribution. In particular, unlike cases 1 and 2, the limiting distribution depends on the asymptotic variances of the sub-population estimates (and also the ratio of the sample sizes in the unbalanced case). We are nonetheless able to derive an analytic expression for the distribution function, stated in Proposition 2. We reserve this statement for the Appendix, as the expression is cumbersome.
By Propositions 1 and 2, we can calculate the p-value as the maximum of the tail probabilities in cases 2 and 3. That is,
| (4) |
where denotes the standard normal distribution function, and is the limiting distribution function for the test statistic when . Though the limiting distribution under is non-standard, tail probabilities can be calculated easily, as we show in the Appendix (Remark 1). The p-value is therefore simple to calculate.
We have derived an analytic approximation for the power of the likelihood ratio test. In Proposition 2, we more generally characterize the limiting distribution of the test statistic under hypotheses of the form . The local asymptotic power of the proposed test at the level is available by considering , where we define as the maximum of the quantiles of the limiting distribution of under scenarios 2 and 3 described above. An analytic finite-sample approximation of the power can be calculated as .
A contour plot of the local asymptotic power is given in Figure 3 for both equal and unequal variances of the estimators. We observe that the likelihood ratio test has low power when the strongest effect max is small, and power improves considerably when the strongest effect grows. Additionally, we find that in the presence of unequal variance, the test has greater power when the weakest effect is estimated with higher precision than the strongest effect.
Figure 3.

Contour plots of the local asymptotic power of the relative difference likelihood ratio test with and . Bold lines represent the boundary of the null region. Settings of equal and unequal asymptotic variance of estimators are shown.
3.3 |. Quantifying Relative Difference in Effect by Inverting Likelihood Ratio Test
In some applications there may not be a clear scientifically meaningful threshold for the relative difference in effect size. In such settings, investigators may be disinclined to test the relative difference null for a single fixed choice of , as the choice can be somewhat arbitrary. It may be preferable to instead provide a more informative summary describing the relative difference in effect size.
We propose to summarize the relative difference in effect size by reporting the set of for which we would reject the relative difference null. We define
as the largest such that the likelihood ratio test rejects the null hypothesis at the level. When the likelihood ratio test fails to reject for all , we will use the convention . Since is negative for and since we only reject the relative difference null for large positive values of , is bounded above by .
Because the likelihood ratio test controls the type-I error at the level , if , then with probability at least . Moreover, if , is bounded above by with probability at least . Thus, provides a probabilistic lower bound for the relative difference in effect size. That is,
Moreover, for any , the relative difference test will reject the null hypothesis that with probability tending to one. Therefore, in the limit of large , should approach but not exceed the true relative difference in effect size. We discuss the calculation of in the Appendix.
Larger values of indicate that the relative difference in effect is large and provide greater evidence to support the occurrence of an absence/presence interaction. However, we emphasize that no value of will be large enough to guarantee that an absence presence interaction has truly occurred. While we argue that is an informative and intuitive summary, we caution investigators against misinterpreting it.
3.4 |. Simultaneous Test for Qualitative Interactions
When it is of interest to identify both absence/presence and positive/negative qualitative interactions, it may be desirable to test for both simultaneously. In this section, we construct an omnibus test that achieves control of the size asymptotically.
We define the omnibus qualitative interaction null hypothesis as
The null region is the intersection of the positive/negative and relative difference null regions, as depicted in Figure 4 We observe that as , the omnibus null and alternative regions tend to the positive/negative null and alternative regions.
Figure 4.

(Left) Null and alternative region for omnibus qualitative interaction hypothesis. (Right) Geometric interpretation of likelihood ratio statistic for the omnibus qualitative interaction hypothesis.
To construct the likelihood ratio test, we proceed using similar arguments to those presented in Section 3.2. The likelihood ratio statistic is the distance between the estimate and its projection onto the null region , inversely weighted by the asymptotic variance of . A simple expression for is given in Lemma 2.
Lemma 2.
The likelihood ratio statistic can be written as
Unsurprisingly, the likelihood ratio statistic for the omnibus test approaches the likelihood ratio statistic for the Gail-Simon likelihood ratio statistic for positive/negative interactions in the limit of large . The tests will, therefore, be nearly identical for sufficiently large .
To characterize the asymptotic behavior of the omnibus test statistic at each location under the null, we use similar arguments to those in the previous section. If belongs to the interior of the null region, converges in probability to zero. If belongs to the boundary of the null region and is non-zero, converges weakly to a uniform mixture of zero and the chi-squared distribution with one degree of freedom. If is zero, the limiting distribution of is non-standard, though it can be characterized nonetheless. Formal statements of asymptotic properties of are given in Propositions 3 and Remark 2 (the statement of Remark 2 is also cumbersome, and is reserved for the Appendix).
We have derived an analytic approximation for the power of the likelihood ratio test. In Proposition 4 in the Appendix, we more generally characterize the limiting distribution of the test statistic under hypotheses of the form , and in Remark 3, also in the Appendix, we describe how this result can be applied to calculate the local asymptotic power of the proposed test.
Proposition 3.
If belongs to the interior of the null region , converges in distribution to zero. If is on the boundary of the null region, but , as .
Calculating the p-value for the omnibus test is no more difficult than calculating the p-value for the absence/presence test. Defining , the p-value can be calculated as
| (5) |
where is the value of the test statistic calculated on the observed data.
4 |. SIMULATION STUDY
In a Monte Carlo simulation study, we examine how type-I error rate and statistical power of the likelihood ratio tests for the relative difference in effect are affected by signal strength, sample size, and choice of . Additionally, we examine how depends on the true sub-population effects and the sample size.
We generate random observations in sub-population under the linear model:
Here, is the predictor of interest, is the response, and is white noise. The measure of association in which we are interested is the regression coefficient . We fix and consider . A total of 1000 synthetic data sets are randomly generated for each and .
For each synthetic data set, we perform the relative difference likelihood ratio test and the omnibus test with , and use a significance level of . We additionally calculate with . Parameter estimation is performed with ordinary least squares, and model-based estimates of the standard error are used. We compare the likelihood ratio test with the naïve test for absence/presence interactions described in Section 2.3, which rejects the null when there is evidence for the existence of an effect in one population but insufficient evidence in the other population. We also compare with a test for quantitative interactions, which rejects the null when there is evidence that and are unequal.
In Figure 5, we plot Monte Carlo estimates of the rejection probabilities for the relative difference likelihood ratio test, the naïve test, and the test for quantitative interactions over the range of . We see that the relative difference likelihood ratio test achieves control of size for all choices of and all sample sizes. Power is largest when is near zero and is lower for larger values of , as expected, because larger values of correspond to a larger null region. With , the likelihood ratio test has almost zero power when the sample size is small, though we see an improvement in large sample settings.
Figure 5.

Monte Carlo estimate of the rejection probability of the relative difference likelihood ratio test and the naïve test. The white vertical line at corresponds to the single point at which the absence/presence hypothesis does not hold. The solid pink line denotes the specified size
The naïve test of the absence/presence hypothesis exhibits poor type-1 error control. Because the absence/presence does not hold only when , a well-calibrated test would reject the null with probability no greater than for any other value of . However, we can see that the type- 1 error rate of the naïve test is greatly inflated when resides within a neighborhood of zero, and this is especially true in a small sample setting. Due to the test’s poor calibration, it is difficult to glean any meaningful information from a decision to reject the null hypothesis. While the relative difference test can also have a high rejection probability when is in a neighborhood of zero, it is more easily interpretable: because this test is well-calibrated, we are able to understand what a decision to reject the null implies about the relationship between and .
Figure 6 shows the estimated rejection probabilities of the omnibus test for varying . Control of size is achieved in both large and small samples, as expected. For this test, power increases as tends to −1, and as expected, power decreases with increasing .
Figure 6.

Monte Carlo estimate of the rejection probability of the omnibus test for qualitative interactions. The solid pink line denotes the specified size
In Figure 7. we plot the 0.5, and quantiles of values from 1000 synthetic data sets for each . As we would expect, for any fixed , at least of the fall below . In addition, the quantile of the distribution approaches as sample size increases. However, when the sample size is small, tends to be much smaller than the relative difference in effect size. This suggests that is a loose lower bound for the relative difference in effect in a small sample setting, but the bound becomes progressively tighter as the sample size grows.
Figure 7.

Distribution of calculated on synthetic data. The .05, .50, and .95 quantiles are represented by the dashed blue and solid black curves. The grey curve represents , the value is expected to approach for a given .
5 |. DATA EXAMPLE
In this example, we investigate genetic differences in breast cancer sub-types. Classification of breast cancer based on expression of estrogen receptor (ER) is known to be associated with clinical outcomes. Approximately 70% of breast cancers are estrogen receptor positive (ER+) cancers, meaning that estrogen causes cancer cells to grow23; breast cancers are otherwise estrogen receptor negative (ER−). Patients with ER+ breast cancer tend to experience better clinical outcomes than ER− patients24.
We conduct an analysis using publicly available data from The Cancer Genome Atlas (TCGA)25. We use clinical data and gene expression data from a total of 806 ER+ patients and 237 ER− patients.
We first investigate the differences between the genetic networks in ER+ and ER− breast cancers. Both ER+ and ER− breast cancer are expected to have similar pathways, but identifying differences between them may be key to understanding the underlying disease mechanisms. We then conduct an analysis to assess whether any genes in a set known to be associated with breast cancer are strongly prognostic of disease outcomes in only one of the estrogen receptor groups.
5.1 |. Differential Network Analysis
Our objective is to determine whether there are any pairs of genes that are much more strongly associated in one estrogen receptor group than the other. We consider the set of genes in the Kyoto Encyclopedia of Genes and Genomes (KEGG)26 breast cancer pathway and measure the association between gene expression levels using the Pearson correlation.
We test the relative difference null hypothesis for each pair of genes with and provide the resulting p-values in Table 1. In Figure 8, we display the pairs of genes that are statistically significant at the level with after a Bonferroni adjustment. Each of the genes progesterone (PGR), insulin-like growth factor 1 (IGF1R), and estrogen receptor 1 (ESR1) have multiple differential connections; each belongs to at least two pairs such that the association is twice as strong in the ER+ population than in the ER− population. These genes have been shown in the literature to be associated with sub-type and prognosis27,28,29.
Table 1.
Bonferroni-adjusted p-values for pairs of genes that genes for which we reject the relative difference hypothesis with κ = 2, κ = 2.5, or κ = 3.
| Gene Pair | κ =2 | κ = 2.5 | κ =3 |
|---|---|---|---|
|
| |||
| (IGF1R, AKT3) | 0.034 | 1 | 1 |
| (IGF1R, FGF1) | 0.0011 | 0.16 | 1 |
| (KIT, EGFR) | 0.0069 | 1 | 1 |
| (PGR, FGF7) | 0.00034 | 0.17 | 1 |
| (PGR, DLL1) | 0.014 | 0.5 | 1 |
| (PGR, NOTCH4) | 6.5e-07 | 0.00054 | 0.04 |
| (FZD4, PGR) | 8.8e-08 | 0.00013 | 0.013 |
| (FGF18, FGF10) | 0.017 | 0.74 | 1 |
| (E2F1,ESR1) | 3.2e-07 | 0.001 | 0.13 |
| (E2F2, ESR1) | 0.00094 | 0.11 | 1 |
Figure 8.

(Left) Pairs of genes in the KEGG breast cancer pathway for which we reject the relative difference hypothesis with . Blue edges indicate associations that are stronger in the ER+ group, and red edges indicate associations that are stronger in the ER− group. (Right) Log hazard ratios for KEGG genes in ER+ and ER− groups. The gray dashed line represents the 45-degree line. Blue diamonds and red triangles indicate genes for which with , where the largest log hazard ratio is in the ER+ group and ER− group, respectively.
5.2 |. Prognostic Value of Biomarkers
The goal of this analysis is to assess whether any of the KEGG genes have a stronger association with time to death in one estrogen receptor group than in the other. For each gene, we fit a univariate Cox proportional-hazards model with time to death as the outcome in both of the estrogen receptor groups separately; we measure association using the log hazard ratio. A total of 64 deaths occurred in the ER+ group, and 33 deaths occurred the ER− group. We calculate for equal to .1, .05, .025, and .01, for each gene.
In Figure 8, we compare the log hazard ratios of the ER+ and ER− groups in a scatterplot. Though the log hazard ratios for most genes are similar between subgroups, there are twelve genes with larger than one with . A complete list is available in Table 2. The two genes with the strongest interactions are Growth Factor Receptor-bound Protein 2 (GRB2; ), which has a stronger association in the ER− group, and Adenomatous Polyposis Coli (APC; ), which has a stronger association in the ER+ group. Both genes have been hypothesized to be associated with breast cancer carcinogenesis30,31. We acknowledge, however, that for small , none of the values are much larger than one, and moreover, no values exceed one at the Bonferroni-adjusted level .10∕145. This indicates that the relative difference in strength of association is possibly small for all genes, or alternatively, that we may not have sufficient power to detect large or moderate relative differences.
Table 2.
KEGG genes that are more strongly associated with time to death in one ER group than the other. Reported are values for α ϵ {.1, .05, .025, .01}.
| Gene | ER+ Log HR (SE) | ER− Log HR (SE) | ||||
|---|---|---|---|---|---|---|
| α = .1 | α = .05 | α = .025 | α = .01 | |||
|
| ||||||
| GRB2 | −0.06 (0.31) | −1.66 (0.68) | 2.04 | 1.36 | 1.00 | 1.00 |
| APC | 1.34 (0.32) | −0.09 (0.33) | 1.91 | 1.57 | 1.32 | 1.08 |
| BAX | −1.05 (0.24) | 0.04 (0.36) | 1.53 | 1.26 | 1.06 | 1.00 |
| PIK3CA | 1.13 (0.28) | 0.14 (0.32) | 1.51 | 1.24 | 1.05 | 1.00 |
| SOS2 | 1.13 (0.36) | −0.1 (0.37) | 1.33 | 1.03 | 1.00 | 1.00 |
| MAP2K2 | −0.87 (0.27) | 0.03 (0.35) | 1.22 | 1.00 | 1.00 | 1.00 |
| GADD45G | −0.52 (0.13) | −0.07 (0.19) | 1.21 | 1.00 | 1.00 | 1.00 |
| HES5 | 0.02 (0.2) | 0.51 (0.18) | 1.19 | 1.00 | 1.00 | 1.00 |
| WNT2 | −0.36 (0.09) | 0 (0.17) | 1.14 | 1.00 | 1.00 | 1.00 |
| DLL4 | 0.09 (0.2) | 0.68 (0.27) | 1.10 | 1.00 | 1.00 | 1.00 |
| FRAT2 | −1.22 (0.31) | −0.45 (0.29) | 1.08 | 1.00 | 1.00 | 1.00 |
| SOS1 | 1.19 (0.3) | −0.34 (0.42) | 1.01 | 1.00 | 1.00 | 1.00 |
6 |. DISCUSSION
We proposed a general framework for inference about absence/presence qualitative interactions. We argued that naïve procedures rely upon untenable conditions because the absence/presence hypothesis is ill-posed. We thus proposed to relax the problem in order to conduct well-calibrated inference that maintains the absence/presence interpretation and only requires mild assumptions.
In simulations, we found that our methodology has low power when signal is weak or sample sizes are small. To an extent, this is just a feature of the problem; naturally, one would require even more information to detect absence/presence qualitative interactions than what is required to detect quantitative interactions. However, we provide no guarantee that our methodology is optimal, as tests for composite hypotheses based upon supremum p-values can be conservative in practice32.
Our framework is interpretable and provides a natural approach for quantifying differences in strength of association by sub-population in general settings. Though we only considered measures of marginal association in our examples, our method can be used with conditional measures of association as well; we only require that asymptotically normal estimators are available. In particular, our approach remains valid in the high-dimensional setting, where penalized estimators are biased33 but asymptotically normal estimates can be obtained using the techniques of, e.g., van de Geer et al.34 and Zhang and Zhang35. Finally, we demonstrated the usefulness of our method in a real data example based on genomics data
9 |. ACKNOWLEDGEMENTS
The authors gratefully acknowledge the support of the NSF Graduate Research Fellowship Program under grant DGE-1762114 as well as NSF grant DMS-1561814 and NIH grant R01-GM114029. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.
APPENDIX
Theoretical Results for Relative Difference Likelihood Ratio Test
Below, we generalize Lemma 1 and Proposition 1 to the setting of unbalanced sample sizes .
Lemma 1.
The likelihood ratio test statistic can be written as
where , and .
Proof of Lemma 1.
We define and as
Now, we define as the projection of onto the null region. If is in the null region , the projection is equal to . Otherwise, is the projection of onto the space spanned by , with distance inversely weighted by . We proceed by finding an expression for .
Now, the weighted distance between and is
Hence, the likelihood ratio test rejects the null for large values of
or equivalently, for large positive values of , where
Proposition 1.
Let , and assume as . Then,
In the interior of the null region, i.e., when converges in distribution to .
At all nonzero boundary points of the null region, i.e., when converges in distribution to a standard normal random variable.
Proof of Proposition 1.
First we prove (i). Suppose . Then consistency of and the continuous mapping theorem imply that . Now, because and both tend to zero, . Therefore, .
Now we prove (ii). Suppose . By applying the delta method,
where the matrix is
Now, Slutsky’s theorem implies that
The test statistic can be written as
with and as defined in Lemma 1. By the continuous mapping theorem, the denominator converges in probability to
Thus, Slutsky’s theorem implies that
In Proposition 2, we characterize the local asymptotic behavior of the relative difference likelihood ratio test.
Proposition 2.
Let , and assume as . Suppose , and . Further, assume that is locally regular in the sense that for . Then as , for all converges to
where follows a bivariate normal distribution with mean , unit variance and correlation , and follows a bivariate normal distribution with mean , unit variance and correlation , with , , , , are defined as
Remark 1.
Setting , we obtain the limiting distribution of when , which is critical for calculating the p-value .
Proof of Proposition 2.
We first re-write the tail probability as
Thus,
And,
Now,
Similarly,
Theoretical Results for Simultaneous Test for Qualitative Interactions
In what follows, we generalize Lemma 2 and Proposition 3 to the case of unbalanced sample sizes.
Lemma 2.
The likelihood ratio statistic can be written as
Proof of Lemma 2.
If belongs to the null region, , the likelihood ratio statistic is clearly zero. If or , the likelihood ratio statistic is minimum of the distances between and its projections onto the span of and the span of . Algebra (similar to the proof of Lemma 1) gives the desired result, and similarly if or . □
Proposition 3.
Let , and assume . Then,
If belongs to the interior of the null region converges in distribution to zero.
If is on the boundary of the null region, but , as .
Proof of Proposition 3.
We first prove (i). If belongs to the interior of the null region, consistency of and the continuous mapping theorem imply that converges in probability to zero. Then, by Slutsky’s theorem, converges in distribution to zero.
To prove (ii), recall that we can write the null and alternative regions as
Now, we write
The second and third summands above are exactly equal to 0. To see this, note that in the second term and implies that , and similarly for the third term. In the first and last summands, we can ignore the term . To see this, note that in the first term and imply that ; a similar argument holds for the fourth term. Thus,
Suppose, for the moment, . Then
Therefore,
Thus,
A similar argument follows if , , or . □
In Proposition 4, we characterize the local asymptotic behavior of the omnibus test for qualitative interactions.
Proposition 4.
Let , and assume as . Suppose with . Further, assume is locally regular in the sense that for . Then as , for all converges to
where , unit variance, and correlation , where
Remark 2.
Setting , we obtain the limiting distribution of when , which is critical for calculating the p-value .
Proof of Proposition 4.
First,
Also,
Thus,
Application of (6) completes the argument. □
Remark 3.
The local asymptotic power of the omnibus test at the level can be obtained by considering , where we define as the maximum of the quantiles of the limiting distribution of under scenarios (i) and (ii) in Proposition 3. An analytic finite-sample approximation of the power can be calculated as .
Calculation of Lower Bound for Relative Difference in Effect
Define as a realization of the likelihood ratio statistic for the relative difference hypothesis, calculated on the observed data. For a pre-specified , recall that is defined as the largest such that the likelihood ratio test rejects the relative difference null. Defining , we can evaluate as
Therefore, we are only required to calculate and .
Both and can be calculated with root-finding algorithms. To see this, we first observe that is monotone in , recalling that the numerator of decreases in while the denominator increases. Monotonicity of implies that is monotone in . Now, applying of Proposition 2,
where and follow bivariate normal distributions with mean zero, unit variance, and correlations and , as defined in Proposition 2, respectively. Both and are monotone decreasing in , so Theorem 8 in36 implies that is monotone in . Thus is monotone in .
Monotonicity of and implies that if the likelihood ratio test rejects the null hypothesis for some and are the unique roots of
respectively. These roots can be easily calculated via, e.g., the bisection method. In our implementation, we use the uniroot function in R. □
Footnotes
7 | SOFTWARE AVAILABILITY
Implementation of the proposed methodology and code to reproduce results from the data analysis are available at https://github.com/awhudson/QualitativeInteractions.
8 |. DATA AVAILABILITY
The TCGA data are available using the public R package RTCGA.
10 | BIBLIOGRAPHY
References
- 1.VanderWeele TJ, Knol MJ. A tutorial on interaction. Epidemiologic methods 2014; 3(1): 33–72. [Google Scholar]
- 2.Greenland S Tests for interaction in epidemiologic studies: a review and a study of power. Statistics in medicine 1983; 2(2): 243–251. [DOI] [PubMed] [Google Scholar]
- 3.Gonzalez dAB, Cox DR. Interpretation of interaction: A review. The Annals of Applied Statistics 2007; 1(2): 371–385. [Google Scholar]
- 4.Ideker T, Krogan NJ. Differential network biology. Molecular systems biology 2012; 8(1): 565. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Shojaie A Differential network analysis: A statistical perspective. Wiley Interdisciplinary Reviews: Computational Statistics 2021; 13(2): e1508. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Gail M, Simon R. Testing for qualitative interactions between treatment effects and patient subsets.. Biometrics 1985; 41(2): 361–372. [PubMed] [Google Scholar]
- 7.VanderWeele TJ. The Interaction Continuum. Epidemiology (Cambridge, Mass.) 2019; 30(5): 648. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Piantadosi S, Gail M. A comparison of the power of two tests for qualitative interactions. Statistics in Medicine 1993; 12(13): 1239–1248. [DOI] [PubMed] [Google Scholar]
- 9.Pan G, Wolfe DA. Test for qualitative interaction of clinical significance. Statistics in medicine 1997; 16(14): 1645–1652. [DOI] [PubMed] [Google Scholar]
- 10.Silvapulle MJ. Tests against qualitative interaction: Exact critical values and robust tests. Biometrics 2001; 57(4): 1157–1165. [DOI] [PubMed] [Google Scholar]
- 11.Li J, Chan IS. Detecting qualitative interactions in clinical trials: an extension of range test. Journal of Biopharmaceutical statistics 2006; 16(6): 831–841. [DOI] [PubMed] [Google Scholar]
- 12.Bayman EÖ, Chaloner K, Cowles MK. Detecting qualitative interaction: a Bayesian approach. Statistics in medicine 2010; 29(4): 455–463. [DOI] [PubMed] [Google Scholar]
- 13.Gunter L, Zhu J, Murphy S. Variable selection for qualitative interactions in personalized medicine while controlling the family-wise error rate. Journal of biopharmaceutical statistics 2011; 21(6): 1063–1078. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Roth J, Simon N. A framework for estimating and testing qualitative interactions with applications to predictive biomarkers. Biostatistics 2018; 19(3): 263–280. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Dusseldorp E, Van Mechelen I. Qualitative interaction trees: a tool to identify qualitative treatment–subgroup interactions. Statistics in medicine 2014; 33(2): 219–237. [DOI] [PubMed] [Google Scholar]
- 16.Poulson RS, Gadbury GL, Allison DB. Treatment heterogeneity and individual qualitative interaction. The American Statistician 2012; 66(1): 16–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Chen W, Ghosh D, Raghunathan TE, Norkin M, Sargent DJ, Bepler G. On Bayesian methods of exploring qualitative interactions for targeted treatment. Statistics in Medicine 2012; 31(28): 3693–3707. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Casella G, Berger RL. Statistical inference. 2. Duxbury Pacific Grove, CA. 2002. [Google Scholar]
- 19.Zhao S, Shojaie A. Network differential connectivity analysis. The Annals of Applied Statistics 2022; 16(4): 2166–2182. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology 2014; 15(12): 1–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.McCarthy DJ, Smyth GK. Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 2009; 25(6): 765–771. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Fieller EC. The biological standardization of insulin. Supplement to the Journal of the Royal Statistical Society 1940; 7(1): 1–64. [Google Scholar]
- 23.Lumachi F, Brunello A, Maruzzo M, Basso U, Mm Basso S. Treatment of estrogen receptor-positive breast cancer. Current medicinal chemistry 2013; 20(5): 596–604. [DOI] [PubMed] [Google Scholar]
- 24.Carey LA, Perou CM, Livasy CA, et al. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. Jama 2006; 295(21): 2492–2502. [DOI] [PubMed] [Google Scholar]
- 25.Weinstein JN, Collisson EA, Mills GB, et al. The cancer genome atlas pan-cancer analysis project. Nature genetics 2013; 45(10): 1113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research 2000; 28(1): 27–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Farabaugh SM,Boone DN,Lee AV. RoleofIGF1Rinbreastcancersubtypes,stemness,andlineagedifferentiation. Frontiers in endocrinology 2015; 6: 59. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Reinert T, Coelho GP, Mandelli J, et al. Association of ESR1 Mutations and Visceral Metastasis in Patients with Estrogen Receptor-Positive Advanced Breast Cancer from Brazil. Journal of oncology 2019; 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Kurozumi S, Matsumoto H, Hayashi Y, et al. Power of PgR expression as a prognostic factor for ER-positive/HER2-negative breast cancer patients at intermediate risk classified by the Ki67 labeling index. BMC cancer 2017; 17(1): 354. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Daly RJ, Binder MD, Sutherland RL. Overexpression of the Grb2 gene in human breast cancer cell lines.. Oncogene 1994; 9(9): 2723–2727. [PubMed] [Google Scholar]
- 31.Jin Z, Tamura G, Tsuchiya T, et al. Adenomatous polyposis coli (APC) gene promoter hypermethylation in primary breast cancers. British journal of cancer 2001; 85(1): 69–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Bayarri M, Berger JO. P values for composite null models. Journal of the American Statistical Association 2000; 95(452): 1127–1142. [Google Scholar]
- 33.Voorman A, Shojaie A, Witten D. Inference in high dimensions with the penalized score test. arXiv preprint arXiv:1401.2678 2014. [Google Scholar]
- 34.Geer v. dS, Bühlmann P, Ritov Y, Dezeure R, others. On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics 2014; 42(3): 1166–1202. [Google Scholar]
- 35.Zhang CH, Zhang SS. Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2014; 76(1): 217–242. [Google Scholar]
- 36.Müller A Stochastic ordering of multivariate normal distributions. Annals of the Institute of Statistical Mathematics 2001; 53(3): 567–575. [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
The TCGA data are available using the public R package RTCGA.
