Author manuscript; available in PMC: 2024 Mar 9.
Published in final edited form as: Ann Appl Stat. 2020 Jun 29;14(2):829–849. doi: 10.1214/20-aoas1329

EVIDENCE FACTORS IN A CASE-CONTROL STUDY WITH APPLICATION TO THE EFFECT OF FLEXIBLE SIGMOIDOSCOPY SCREENING ON COLORECTAL CANCER

BIKRAM KARMAKAR 1, CHYKE A DOUBENI 2, DYLAN S SMALL 3
PMCID: PMC10924422  NIHMSID: NIHMS1962443  PMID: 38465229

Abstract

As in any observational study, in a case-control study a primary concern is potential unmeasured confounders. Bias, due to unmeasured confounders, can result in a false discovery of an apparent treatment effect when there is none. Replication of an observational study, which tries to provide multiple analyses of the data where the biases affecting each analysis are thought to be different, is one way to strengthen the evidence from an observational study. Evidence factors allow for internal replication by testing a hypothesis using multiple comparisons in a way that the comparisons yield independent evidence and differ in the sources of potential bias. We construct evidence factors in a case-control study in which there are two types of cases, “narrow” cases which are thought to be potentially more affected by the exposure and “marginal” cases which are thought to have more heterogeneous causes. We develop and study an inference procedure for using such evidence factors and apply it to a study of the effect of sigmoidoscopy screening on colorectal cancer.

Keywords: Case-control studies, colorectal cancer, evidence factors, observational study, replicability, sigmoidoscopy

1. Introduction.

1.1. Distal and proximal colon cancer and sigmoidoscopy screening.

The U.S. Preventive Services Task Force (USPSTF) recommendations for colorectal cancer screening include flexible sigmoidoscopy every five years for men and women above 50 at average risk (Preventive Services Task Force et al. (2016)). Yet, only 58% of adults aged 50–75 were up to date with the screening recommendations (Joseph et al. (2016)). Is screening with sigmoidoscopy effective? Using a case-control study we aim to answer this question; more specifically, we study the effect of screening by flexible sigmoidoscopy as per USPSTF recommendations on reducing mortality from colorectal cancer.

In case-control studies patients with (cases) or without (controls) an outcome of interest are compared in terms of their exposure to treatment. Case-control studies are particularly useful for assessing treatment or exposure effects for rare outcomes. In a case-control study there is often a choice of how to define a case. In many settings there are two (or more) ways to define a case, one being more “narrow,” in that it is more likely to be caused by the exposure of interest if that exposure in fact has an effect, and the other being “broad,” in that it may have more heterogeneous causes. A case unit according to a narrow case definition is also a case unit in a broad case definition. A marginal case unit is not a case in a narrow case definition but is a case in a broad case definition.

Sigmoidoscopy can evaluate the lower or distal one-third of the colon for lesions; if abnormal, then a full colon evaluation with a colonoscopy is typically done for confirming the presence of cancer or precancerous polyps. The distal colon is the lower one-third part of the colon on the left side of the body, consisting of the descending colon, the sigmoid colon and the rectum; the proximal colon is the higher two thirds of the colon. We consider broad cases to be all cases of colorectal cancer, and, following Doubeni et al. (2018) and Selby et al. (1992), we consider narrow cases to be cases where there are malignant polyps on the left side of the colon and rectum that are within the reach of the sigmoidoscope. We expect that sigmoidoscopy screening, if it is effective, would only directly reduce the risk of diagnosis or death from cancers in the distal colon (narrow cases) but would also indirectly find or prevent some colorectal cancers in the proximal colon because abnormal findings in the distal colon could trigger a colonoscopy. Is it possible to learn separate evidence about the treatment effect when we have two or more definitions for a case? Before answering this question in Section 1.3, we consider why one might want to construct separate evidence and what we mean by separate evidence.

1.2. Evidence factors in an observational study.

Unlike in a randomized trial, in a case-control study, as in any observational study, treatment is not assigned to the subjects randomly. Therefore, a primary concern in a case-control study is the potential for unmeasured confounders. In an observational study, bias due to unmeasured confounders can result in a false discovery of an apparent treatment effect when there is none. In such a situation we should consider whether it is possible to replicate the study without repeating the bias (Cochran (1965), Section 4.1).

Consider the effect of exposure to radiation on leukemia incidence. Radiologists, who are occupationally exposed to radiation, have been found to have a high incidence of leukemia (Lewis (1963)). A replication of this observational study is a comparison of the leukemia risk in people living in Japan near epicenters of the atomic bomb drops at the end of World War II to people living further from them (Bizzozero, Johnson and Ciocco (1966)). Radiologists may have higher rates of leukemia because they are more likely to diagnose it, and people living near the atomic bomb might have higher rates of leukemia because living in an urban area may be a confounder for leukemia, but these are two different sources of potential bias. Concurring finding of higher rates of leukemia incidence in each exposed group relative to its control group strengthens the evidence for a causal effect since two sources of bias, rather than just one, would be needed to refute the evidence (Rosenbaum (2001)).

While the above two comparisons are from separate studies, in some studies there may be two comparisons we can make within the same study that have different sources of bias, offering an opportunity for internal replication. When these comparisons are statistically independent or “nearly” independent, the comparisons are called evidence factors (Rosenbaum (2010)). A general perspective on evidence factors in an observational study is provided in Karmakar, French and Small (2019), which we briefly review here, and the formal definition is given in Section 5. Suppose two analyses are performed to test the null hypothesis; the first analysis requires a set of assumptions $A_1$, and the second analysis requires a second set of assumptions $A_2$. Let $P_1$ and $P_2$ be the corresponding p-values. Then, to be evidence factors, we require that under the null hypothesis, when both assumptions $A_1$ and $A_2$ hold, for $(p_1, p_2) \in [0,1]^2$

$$\Pr(P_1 \le p_1, P_2 \le p_2) \le p_1 p_2. \tag{1.1}$$

The inequality in (1.1)—which would be an equality if P1 and P2 were independent—means that the joint distribution of the p-values under the null hypothesis is stochastically bigger than that of two independent p-values under the null hypothesis. So, treating them as independent when combining them would be conservative—this is the “near independence” we spoke of above. By asking for independence or near independence, we ensure that we are learning two separate pieces of evidence rather than essentially one piece which would be the case if one uses two highly correlated tests, such as a t-test and a Wilcoxon rank sum test (Rosenbaum (2010, 2011)). We wish to avoid the mistake of the man who bought “several copies of the morning paper to assure himself that what it said was true” (Wittgenstein (1958), #265, quoted in Rosenbaum (2010)). If both analyses from the evidence factors are significant, both assumptions, A1 and A2, would have to be violated in order for there not to be evidence of a treatment effect.
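To make the remark about combining concrete, the following minimal sketch combines two p-values as if they were independent. It uses Fisher's combination rule purely as a familiar illustration; the text above does not name a combining rule at this point, so the choice of Fisher's method and the function name `fisher_combine_two` are assumptions made here.

```python
import math


def fisher_combine_two(p1: float, p2: float) -> float:
    """Combine two p-values with Fisher's rule, treating them as independent.

    For two independent Uniform(0, 1) p-values, -2 * log(p1 * p2) has a
    chi-square distribution with 4 degrees of freedom, whose upper tail has
    the closed form q * (1 - log(q)) with q = p1 * p2.
    """
    if not (0.0 < p1 <= 1.0 and 0.0 < p2 <= 1.0):
        raise ValueError("p-values must lie in (0, 1]")
    q = p1 * p2
    return q * (1.0 - math.log(q))


# Hypothetical p-values, chosen only for illustration.
combined = fisher_combine_two(0.05, 0.05)
```

The rejection region of this rule, the set of pairs with a small product, is a decreasing set, so if the joint distribution of the two p-values is stochastically larger than that of two independent uniforms in the sense of Section 5, treating them as independent should not inflate the type I error.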

An example of the use of evidence factors is discussed in Karmakar, Small and Rosenbaum (2020), which follows up on the question raised by Bazzano et al. (2003): does smoking increase homocysteine levels? Bazzano et al. (2003) looked at the association between homocysteine and cotinine, a biomarker for exposure to tobacco. Cotinine level is a personal measure of a dose for exposure to tobacco. An association between homocysteine and cotinine can be confounded by a physiological process that affects both homocysteine levels and the way the exposure is internalized into cotinine levels. Karmakar, Small and Rosenbaum (2020) pair smokers with nonsmokers on their age, gender, race and education levels. Two tests are considered. The first test is Wilcoxon’s signed-rank test of the differences in the homocysteine levels between the smoker and the nonsmoker in each pair. The second test is a cross-cut test statistic that looks at the association between differences in biomarker levels and differences in the homocysteine levels of the pairs. Pairs of test statistics that use the same data are typically dependent, but these two test statistics are independent when there is no effect of smoking and there is no effect of an increase in the cotinine biomarker on homocysteine levels. Further, a bias in who reports smoking does not affect the cross-cut test, and confounding in the cotinine biomarker does not affect the signed-rank test. Because the two tests are independent when there is no treatment effect and are affected by different biases, they are evidence factors. Their analysis found that the two factors concur, providing two independent pieces of information linking smoking with increased homocysteine. For other examples of evidence factors, see Zhang et al. (2011) and Zubizarreta et al. (2012).

Rosenbaum (2017) provides a general formulation for building evidence factors based on multiple treatment assignment mechanisms. Starting with a set of n units, Rosenbaum (2017) showed how to construct evidence factors using the knit product of two subgroups of the symmetric group of size n. This and other previous work have only considered constructing evidence factors based on different ways of assigning treatment.

In this paper we develop novel evidence factors for case-control studies that use different definitions of a case. To the best of our knowledge, ours is the first demonstration of using differences in outcomes to develop evidence factors. In previous presentations of evidence factors, evidence factors are constructed from a study design in which treatment assignment splits into multiple aspects that exhibit certain symmetries (Rosenbaum (2010, 2017)). A case-control study differs in this view. The retrospective measure of an exposure to the treatment does not split into multiple aspects. The implicit symmetries that create the evidence factors in a case-control study come from multiple case definitions. The following subsection elaborates on this point.

This paper further demonstrates the usefulness of evidence factors when there are overlapping, but not completely overlapping, potential sources of bias for the analyses. This differs from previous discussions of evidence factors in the literature where separate sources of bias would affect the factors. Our quantitative demonstration of how evidence factors can work with overlapping biases widens the applicability of evidence factors. Expansion of the scope of evidence factors to incorporate the design aspects of case-control studies and overlapping biases is crucial for our sigmoidoscopy study.

1.3. Evidence factors in a case-control study with narrow and marginal cases.

In a case-control study with narrow and broad cases, we expect that if the exposure has an effect and our theory that the narrow cases are more likely to be caused by the exposure than the more heterogeneous broad cases is correct and also there is no unmeasured confounding, then: (a) the exposure should have a larger association with narrow cases than marginal cases, that is, cases that are broad but not narrow and (b) the exposure should have an association with broad cases compared to controls. This is an elaborate theory of what a treatment effect, if there is an effect, is expected to look like. Elaborate theories, advocated by Sir Karl Popper and Sir Ronald Fisher, are an integral part of drawing causal conclusions from observational data (see Popper (1959), Cochran (1965), Section 5). For related discussion on considerations for deducing causality from observational data, see Hill (1965).

We compare the narrow cases to marginal cases to appraise association of pattern (a) in the elaborate theory and compare broad cases to controls to appraise association of pattern (b). To test for patterns (a) and (b), we would like to use nearly independent test statistics in the sense of (1.1). In other words, we would like to develop evidence factors associated with the patterns. These two comparisons could be biased differently. Continuing our discussion of Section 1.1, in the sigmoidoscopy study unmeasured variables, such as healthy lifestyle or greater compliance with medical treatment, could be associated with screening. Some of these variables may be more associated with whether a person dies from any colorectal cancer or not (broad case vs. control); some may be more associated with, among people who die from colorectal cancer, does the person die from a colorectal cancer on the distal colon or proximal colon (narrow case vs. marginal case)? If we find evidence for both patterns (a) and (b), this would require a skeptic to explain more types of bias than if we found one pattern alone; this point is developed formally in Section 6.

Using the notation in Section 3, we develop a method for building the evidence factors in Section 4, and Section 5 proves that the test statistics developed are evidence factors. The data from the study are analyzed in Section 7, and in Section 8 a few other examples of case-control studies are discussed where multiple case definitions are used. Before developing our method, we discuss the data for the sigmoidoscopy study in Section 2.

2. Sigmoidoscopy and colorectal cancer.

Based on the reasoning of Section 1, we consider the effectiveness of screening sigmoidoscopy in relation to mortality from distal and proximal colon cancer. In relation to sigmoidoscopy screening, distal cancer cases are narrow cases, and proximal cancer cases are marginal cases. Throughout the paper by sigmoidoscopy screening we mean specifically flexible sigmoidoscopy screening.

2.1. SCOLAR data.

In a nested case-control study on members of Kaiser Permanente Northern California and Kaiser Permanente Southern California health-care systems, study subjects were selected who were 55–90 years old between 2006 and 2012. Details of the study design are given in Doubeni et al. (2018), Goodman et al. (2015). A selected case unit would be a man or a woman who was 55–90 years old on the date of death with colorectal adenocarcinoma as the underlying cause of death. Using cancer diagnosis data and tumor characteristics, 822 proximal and 886 distal cancer cases were identified. Each case patient was individually matched to controls on the reference date (which was the diagnosis date for each patient who died of colorectal cancer), gender, the duration of health plan prior to diagnosis and the health-care site. In this process 3635 controls were included.

Thus, in our design there are 822 narrow cases and 886 marginal cases. To facilitate the comparison of narrow cases to marginal cases, we pair matched narrow (distal cancer) cases to marginal (proximal cancer) cases using the optmatch package in R, which uses methods of Hansen and Klopfer (2006). The matching algorithm used a weighted sum of rank-based Mahalanobis distance and absolute distance of estimated logit propensity scores. It also near-fine balanced on gender (Rosenbaum, Ross and Silber (2007)). By pair matching the narrow and marginal cases, we obtained 822 matched sets consisting of one narrow case, one marginal case and the controls associated with these cases, and 886 – 822 = 64 matched sets consisting of one marginal case and the controls associated with this case. Table 1 shows the covariate balance of the matched sets. Figure 1 further shows the distribution of the diagnosis year of the colorectal cancer patients. Gender, reference date and enrollment source are well balanced between the narrow cases, marginal cases and controls over the matched sets.

Table 1.

Balance on the covariates in the matched sets. Distal cancer cases are those who have been diagnosed to have died from cancer on the left colon or rectum: proximal cancer cases are from right colon cancer. For each covariate the mean is calculated within a matched set, then averaged over sets

| Covariate | Controls | Distal cancer cases | Proximal cancer cases |
|---|---|---|---|
| Number of years enrolled before reference date | 12 | 12 | 12 |
| % from Center 1 | 83 | 83 | 84 |
| % female | 47 | 46 | 47 |

Fig. 1.

Reference data of the colorectal cancer cases and controls in the matched sets.

Although the match controls well for the above covariates, there could be unmeasured confounders. For example, lack of physical activity is a known risk factor for colorectal cancer incidence, and people who are less active also may be less likely to get screened (Eldridge et al. (2013)). Because we are not able to match on or adjust for physical activity in our analysis, the comparison of all colorectal cancer cases to controls may be biased. Family history of cancer screening is another likely unmeasured confounder in this analysis. The comparison of sigmoidoscopy screening in proximal vs. distal cancers may also be biased by unmeasured confounding. There are potential biological differences between proximal and distal colon cancers such that variables such as diet (e.g., use of the Mediterranean diet) may be differentially associated with proximal and distal colon cancer (Doubeni et al. (2012), Missiaglia et al. (2014)). Such diet choices may be associated with screening. If we find that sigmoidoscopy screening is associated with reduced mortality from colorectal cancer when comparing all cases to controls and with reduced mortality from proximal vs. distal cancer cases when comparing proximal to distal cases, then, in order for these associations to arise purely from bias and not at all from a causal effect of sigmoidoscopy screening on reducing cancer, there would need to be unmeasured confounders in both comparisons rather than just one comparison. In Section 6 we show that, even if the unmeasured confounders for the two comparisons overlap but have different relative magnitudes, the evidence is strengthened by finding significant associations in both comparisons.

As suggested earlier, we shall assess the effect of sigmoidoscopy screening by comparing the prevalence of screening between all colorectal cancer cases and controls and also by comparing the prevalence between the distal cancer cases and proximal cancer cases. Results of this analysis will be discussed in Section 7. We first present the methodology.

3. Notation and review: Case-control studies.

Let observational units be denoted by indices $l = 1, \ldots, L$. We use the binary variable $Z_l$ to denote whether unit $l$ was exposed to treatment ($Z_l = 1$) or spared exposure ($Z_l = 0$). Under the potential response model, suppose unit $l$ would have response $r_{Tl}$ if exposed and response $r_{Cl}$ if spared exposure. The observed response for unit $l$ is $R_l = Z_l r_{Tl} + (1 - Z_l) r_{Cl}$. Consequently, we cannot observe $r_{Tl}$ and $r_{Cl}$ simultaneously for one unit (Splawa-Neyman (1990), Rubin (1974)). Now, let $x_l$ denote the observed pretreatment covariates, that is, covariates recorded in the study that can potentially affect the treatment assignment and the response. The unobserved confounders are summarized by an unobserved number $u_l$ for unit $l$, scaled to be valued in $[0, 1]$ (Rosenbaum (1991)). Write $\mathcal{F} = \{(r_{Tl}, r_{Cl}, x_l, u_l) : l = 1, \ldots, L\}$. The hypothesis we are interested in studying is Fisher’s sharp null hypothesis of no treatment effect

$$H_0 : r_{Tl} = r_{Cl}, \quad l = 1, \ldots, L.$$

A case definition is a function $k(\cdot)$ which labels each unit as a case, a control or neither based on the observed response. A case definition would identify a subset of the units as cases and a separate subset as controls.

For a given case definition, a test for the hypothesis $H_0$ can be carried out by matching as follows. Create $S$ strata labeled $s = 1, \ldots, S$, where each stratum consists of a total of $t_s$ units, with some case units and the rest control units (say $c_s$), which are similar with respect to the observed covariates (the $x_l$'s). Now, let $Y_s$ denote the total number of exposed case units in stratum $s$. A positive linear combination $T = \sum_{s=1}^{S} d_s Y_s$ can be taken as a test statistic for testing the hypothesis $H_0$. When all $d_s = 1$, the statistic $T$ is exactly the total number of exposed cases, which is the Mantel–Haenszel test statistic.

We assume that the treatment assignments for distinct units are independent. We consider the following model for treatment assignment:

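$$\Pr(Z_l = 1 \mid \mathcal{F}) = \frac{\exp\{\lambda(x_l) + \gamma u_l\}}{1 + \exp\{\lambda(x_l) + \gamma u_l\}}, \tag{3.1}$$

where $\lambda(\cdot)$ is an unknown function and $\gamma \ge 0$ is an unknown parameter. Since $0 \le u_l \le 1$, for two units $l$ and $l'$ ($l \ne l'$) with the same observed covariates, $x_l = x_{l'}$, under this model their odds of exposure can differ at most by a factor of $\Gamma := \exp(\gamma)$. Model (3.1) is equivalent to writing

$$\max_{1 \le l, l' \le L} \left\{ \frac{\Pr(Z_l = 1 \mid \mathcal{F}) / \Pr(Z_l = 0 \mid \mathcal{F})}{\Pr(Z_{l'} = 1 \mid \mathcal{F}) / \Pr(Z_{l'} = 0 \mid \mathcal{F})} : x_l = x_{l'} \right\} \le \Gamma. \tag{3.2}$$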

The fact that (3.1) implies (3.2) is obvious; the proof of the reverse implication constructs a set of $u_l$ from the odds of exposure (Rosenbaum (2002), Section 4.4.4). The parameter $\Gamma$ ($\ge 1$) is the hidden bias level. Thus, when $\Gamma = 1$, there is no unmeasured confounder, and there is no bias in treatment assignment after controlling for observed covariates. As $\Gamma$ increases, this model allows more and more bias in treatment assignment. For example, when $\Gamma = 2$, due to the presence of unmeasured confounders, it might be possible that, for individuals who are the same in their observed covariates, one has twice the odds of getting assigned treatment as the other.
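The direction from (3.1) to (3.2) can be spelled out in one line. The short derivation below is a sketch that assumes the logistic form of (3.1), with $\gamma \ge 0$, $0 \le u_l \le 1$ and the bias level read as $\Gamma = \exp(\gamma)$; the primed index $l'$ denotes a second unit.

```latex
\frac{\Pr(Z_l = 1 \mid \mathcal{F})}{\Pr(Z_l = 0 \mid \mathcal{F})}
  = \exp\{\lambda(x_l) + \gamma u_l\},
\qquad
\frac{\Pr(Z_l = 1 \mid \mathcal{F}) / \Pr(Z_l = 0 \mid \mathcal{F})}
     {\Pr(Z_{l'} = 1 \mid \mathcal{F}) / \Pr(Z_{l'} = 0 \mid \mathcal{F})}
  = \exp\{\gamma (u_l - u_{l'})\}
  \quad \text{when } x_l = x_{l'} .
% Since -1 <= u_l - u_{l'} <= 1 and gamma >= 0:
\frac{1}{\Gamma} = e^{-\gamma}
  \le \exp\{\gamma (u_l - u_{l'})\}
  \le e^{\gamma} = \Gamma .
```

The upper bound is attained when $u_l = 1$ and $u_{l'} = 0$, which is why the maximum in (3.2) is bounded by $\Gamma$ and cannot be bounded by anything smaller in general.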

Let $e_s$ be the number of exposed units in stratum $s$. Then, under model (3.2) we can bound the tail probability of $T$ under $H_0$ asymptotically,

$$\Pr(T \ge k \mid \{t_s\}, \{c_s\}, \{e_s\}, \mathcal{F}) \le 1 - \Phi\left( \frac{k - \sum_s d_s (t_s - c_s) \bar{p}_s}{\sqrt{\sum_s d_s^2 (t_s - c_s) \bar{p}_s (1 - \bar{p}_s)}} \right), \tag{3.3}$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution and $\bar{p}_s = \Gamma e_s / (\Gamma e_s + (t_s - e_s))$ (Small et al. (2013)). This tail bound is sharp, in that it is attained for a particular vector of unobserved confounders (Rosenbaum (1991), Rosenbaum (2002), Section 4.4.4).

Therefore, given a case-control study, after constructing a satisfactory stratum structure, when the hidden bias level is at most Γ, that is, (3.2) holds, (3.3) can be used to get an upper bound for the p-value of testing the hypothesis H0. If this value is less than α, the significance level, then we have evidence to reject the null hypothesis as long as the hidden bias is at most Γ. A sensitivity analysis asks how much bias in the treatment assignment must be present so that the observed association can be explained just from bias under H0.
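The recipe in the preceding paragraph can be sketched in a few lines. The code below assumes the bound (3.3) is read as one minus the standard normal distribution function evaluated at $(k - \sum_s d_s (t_s - c_s) \bar{p}_s) / \sqrt{\sum_s d_s^2 (t_s - c_s) \bar{p}_s (1 - \bar{p}_s)}$ with $\bar{p}_s = \Gamma e_s / (\Gamma e_s + t_s - e_s)$; the function names and the strata are hypothetical and are not taken from the SCOLAR data.

```python
import math
from typing import Optional, Sequence


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pvalue_upper_bound(
    k: float,
    t: Sequence[int],
    c: Sequence[int],
    e: Sequence[int],
    gamma: float,
    d: Optional[Sequence[float]] = None,
) -> float:
    """Normal-approximation upper bound on Pr(T >= k) under H0.

    t[s]: stratum size, c[s]: number of controls, e[s]: number of exposed
    units, d[s]: stratum weight (defaults to 1), gamma: hidden bias level.
    """
    if gamma < 1.0:
        raise ValueError("the hidden bias level must be at least 1")
    if d is None:
        d = [1.0] * len(t)
    mean = 0.0
    var = 0.0
    for ts, cs, es, ds in zip(t, c, e, d):
        p_bar = gamma * es / (gamma * es + (ts - es))
        n_cases = ts - cs  # number of case units in the stratum
        mean += ds * n_cases * p_bar
        var += ds ** 2 * n_cases * p_bar * (1.0 - p_bar)
    if var <= 0.0:
        raise ValueError("degenerate strata: the variance bound is zero")
    return 1.0 - normal_cdf((k - mean) / math.sqrt(var))


# Four hypothetical strata: one case and three controls each, two exposed units.
t, c, e = [4, 4, 4, 4], [3, 3, 3, 3], [2, 2, 2, 2]
bound_no_bias = pvalue_upper_bound(4, t, c, e, gamma=1.0)
bound_gamma_2 = pvalue_upper_bound(4, t, c, e, gamma=2.0)
```

In this toy example the bound grows as the allowed bias level increases, which is the behavior a sensitivity analysis traces out: one raises the bias level until the bound first exceeds the significance level.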

4. Two case definitions and two comparisons.

Following our discussion in Section 1.3, consider a design in which two case definitions are available, one narrow and one broad. A case unit according to a narrow case definition is also a case unit in a broad case definition. We label a unit as a marginal case unit if it is not a case in a narrow case definition but is a case in a broad case definition. The study units which are noncases in the broad case definition are, thus, also noncases in the narrow case definition and are labeled as controls. A matching argument similar to that of Section 3 can still be used with appropriate modifications.

4.1. Matched strata for the comparisons.

Suppose the matching procedure creates $S$ strata of all three types of units (narrow cases, marginal cases and controls), where units in a stratum are similar in their observed covariates. Let a generic stratum labeled $s$ have $n_s$ narrow cases and $m_s$ marginal cases, thus a total of $b_s = n_s + m_s$ broad cases, and $c_s$ controls. In a cohort of $L$ units, a narrow case definition might have a much smaller number of cases than a broad case definition. In such situations some of the strata may only have marginal cases and controls, resulting in $n_s = 0$, which is allowed in our notation. But each stratum must consist of at least two different labels of units. We use the letters $n$, $m$, $b$ or $c$ to denote that the unit is a narrow case, a marginal case, a broad case or a control, respectively. For example, $Z_{n\{si\}}$ denotes the exposure (0 or 1) for the $i$th narrow case in stratum $s$ ($s = 1, 2, \ldots, S$). The index $i$ runs in $[n_s]$ (we use the notation $[k]$ to denote the set $\{1, \ldots, k\}$ if $k$ is a positive integer and the empty set otherwise). Similarly, $x_{c\{si\}}$ denotes the observed covariate for the $i$th control in stratum $s$. $R_{m\{si\}}$, $r_{Cn\{si\}}$, $u_{c\{si\}}$, etc. have similar meanings.

At this point we can quantify the evidence against $H_0$ by calculating the p-values from the two comparisons of narrow cases vs. marginal cases and broad cases vs. controls. We focus on the linear statistics of the number of exposed narrow cases and broad cases, respectively, for these two comparisons. For the stratum labeled $s$, let $Y_{n\{s\}}$ and $Y_{b\{s\}}$ denote the number of exposed narrow cases and the number of exposed broad cases. Notice that $Y_{n\{s\}} = \sum_{i \in [n_s]} Z_{n\{si\}}$ and $Y_{b\{s\}} = \sum_{i \in [b_s]} Z_{b\{si\}}$. Since broad cases encompass narrow cases, in fact,

$$Y_{b\{s\}} = \sum_{i \in [n_s]} Z_{n\{si\}} + \sum_{i \in [m_s]} Z_{m\{si\}} = Y_{n\{s\}} + Y_{m\{s\}}.$$

Two test statistics for these two comparisons can be written as $T_{nm} = \sum_{s=1}^{S} d_{nm\{s\}} Y_{n\{s\}}$ and $T_{bc} = \sum_{s=1}^{S} d_{bc\{s\}} Y_{b\{s\}}$, where $d_{nm\{s\}}$ and $d_{bc\{s\}}$ are nonnegative constants given $\mathcal{F}$. Under assumption (3.2) about the treatment assignment distribution, we can get bounds on the p-values for $T_{nm}$ and $T_{bc}$. But there are a few subtleties here that are important to point out.
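As a concrete illustration of the two statistics just defined, the sketch below computes the weighted number of exposed narrow cases and the weighted number of exposed broad cases from stratum-level exposure indicators. The `Stratum` container, the function name and the toy data are hypothetical, and unit weights are assumed when none are supplied.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Stratum:
    """Exposure indicators (0 or 1) for the units of one matched stratum."""
    narrow: List[int]    # narrow cases
    marginal: List[int]  # marginal cases
    controls: List[int]  # controls


def two_statistics(
    strata: Sequence[Stratum],
    d_nm: Optional[Sequence[float]] = None,
    d_bc: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (T_nm, T_bc): weighted exposed narrow and exposed broad cases."""
    if d_nm is None:
        d_nm = [1.0] * len(strata)
    if d_bc is None:
        d_bc = [1.0] * len(strata)
    t_nm = 0.0
    t_bc = 0.0
    for s, w_nm, w_bc in zip(strata, d_nm, d_bc):
        y_n = sum(s.narrow)          # exposed narrow cases in the stratum
        y_b = y_n + sum(s.marginal)  # exposed broad cases: narrow plus marginal
        t_nm += w_nm * y_n
        t_bc += w_bc * y_b
    return t_nm, t_bc


# Toy strata; the second has no narrow case, which the notation allows.
toy = [
    Stratum(narrow=[1], marginal=[0], controls=[0, 1, 0]),
    Stratum(narrow=[], marginal=[1], controls=[0, 0]),
    Stratum(narrow=[0], marginal=[1], controls=[1, 0]),
]
t_nm, t_bc = two_statistics(toy)
```

Note that the control exposures do not enter either sum; they matter only through the conditioning used when the p-value for the broad cases vs. controls comparison is computed.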

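First, a p-value for $T_{nm}$ should only be based on information from the narrow cases and marginal cases. In other words, the p-value $P_{nm}$ is computed based on the tail distribution

$$\Pr\Big(T_{nm} \ge k \,\Big|\, \{b_s\}, \{m_s\}, \sum_{i \in [n_s]} Z_{n\{si\}} + \sum_{i \in [m_s]} Z_{m\{si\}}, \mathcal{F}_b\Big), \tag{4.1}$$

where $\mathcal{F}_b$ is the subset of $\mathcal{F}$ restricted to the broad cases. In equation (3.3), $t_s$ was used instead of $b_s$, $c_s$ was used instead of $m_s$, and the sum above replaces $e_s$. Similarly, the p-value $P_{bc}$ is computed based on the tail distribution

$$\Pr\Big(T_{bc} \ge k \,\Big|\, \{b_s + c_s\}, \{c_s\}, \sum_{i \in [n_s]} Z_{n\{si\}} + \sum_{i \in [m_s]} Z_{m\{si\}} + \sum_{i \in [c_s]} Z_{c\{si\}}, \mathcal{F}\Big). \tag{4.2}$$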

Thus, in technical terms Pnm and Pbc are measurable with respect to different sigma fields.

Second, in assumption (3.2) the sensitivity parameter Γ bounds the odds ratio of treatment assignment for all the units stratified on their observed covariates. But unmeasured confounders are likely to affect the two comparisons in different ways (see also Section 6). Therefore, while considering narrow versus marginal comparison, we should relax this assumption only to the broad cases since these are the only ones contributing to Tnm. Hence, we distinguish the effect of unmeasured covariates for the two comparisons by using two sensitivity parameters Γnm and Γbc for the narrow vs. marginal and broad vs. control comparisons, respectively. Then, Γnm measures the bias in treatment assignment among all the case units, and Γbc measures the bias in treatment assignment among all case and control units which are similar in their observed covariates.

Therefore, the comparison of narrow vs. marginal cases would compute the upper bound on the p-value for $T_{nm}$ based on the tail distribution (4.1) for sensitivity parameter $\Gamma_{nm}$; the broad cases vs. controls comparison would compute the upper bound on the p-value for $T_{bc}$ based on the tail distribution (4.2) for sensitivity parameter $\Gamma_{bc}$. We denote them by $P_{nm,\Gamma_{nm}}$ and $P_{bc,\Gamma_{bc}}$, respectively, and, when $\Gamma_{nm} = \Gamma_{bc} = 1$, we simply write $P_{nm}$ and $P_{bc}$ for $P_{nm,1}$ and $P_{bc,1}$, respectively. Section 5 proves that $P_{nm,\Gamma_{nm}}$ and $P_{bc,\Gamma_{bc}}$ are nearly independent.

4.2. Two sensitivity parameters and their amplification.

In a sensitivity analysis the sensitivity parameters $\Gamma_{nm}$ and $\Gamma_{bc}$ would be used to get the maximum p-values $P_{nm,\Gamma_{nm}}$ and $P_{bc,\Gamma_{bc}}$. How does a bias of level $\Gamma_{nm}$ relate to the influence of the unmeasured confounding on the exposure to treatment of a unit and the influence of the unmeasured confounding on the narrow to marginal case status of the unit? The sensitivity analysis model (3.1) conditions on the information set $\mathcal{F}$, which includes the potential outcomes of the units. The maximum p-value calculated under this model is achieved when there is a near perfect relationship between the case definition and the unmeasured confounders. We discuss here that this model can be interpreted differently, “amplified,” to be a model that limits the relationship between the case definition and the unmeasured confounders as well as the relationship between the exposure and the unmeasured confounders (Gastwirth, Krieger and Rosenbaum (1998), Rosenbaum and Silber (2009)).

Let the confounding variable in the broad cases to controls comparison be $u_1$ and the confounding variable in the narrow to marginal comparison be $u_2$. Consider now the set $\mathcal{C} = \{(x_l, u_{1l}, u_{2l}) : l = 1, \ldots, L\}$. As before, $0 \le u_{1l} \le 1$ and $0 \le u_{2l} \le 1$. Conditioning on the set $\mathcal{C}$ does not condition on the potential outcomes.

Consider two units $i_1$ and $i_2$ with the same observed covariates. We model the relationship between the unmeasured confounding and the treatment assignment with a parameter $\lambda$, for $z_{i_1} + z_{i_2} = 1$, as

$$\Pr(Z_{i_1} = z_{i_1}, Z_{i_2} = z_{i_2} \mid \mathcal{C}, x_{i_1} = x_{i_2}, Z_{i_1} + Z_{i_2} = 1) = \frac{\exp\{\lambda (z_{i_1} w_{i_1} + z_{i_2} w_{i_2})\}}{\exp(\lambda w_{i_1}) + \exp(\lambda w_{i_2})}, \tag{4.3}$$

where

$$w_l = \xi_1 u_{1l} + \xi_2 u_{2l} \quad \text{for } l = 1, \ldots, L; \qquad \xi_1, \xi_2 \ge 0, \ \xi_1 + \xi_2 = 1. \tag{4.4}$$

If $\lambda = 0$, the probability is 1/2, and the confounders have no effect. A larger value of $\lambda$ indicates a larger influence of the unmeasured confounders on the treatment assignment. Equation (4.4) in itself is not a new assumption. Any number $w_l$ taking value in $[0, 1]$ can be rewritten as $w_l = \xi_1 u_{1l} + \xi_2 u_{2l}$, for $\xi_1, \xi_2 \ge 0$, $\xi_1 + \xi_2 = 1$ and $0 \le u_{1l}, u_{2l} \le 1$, and vice versa. Hence, this model is similar in spirit to model (3.1) except that the principal conditioning now changes from $\mathcal{F}$ to $\mathcal{C}$.

Next, we model the relationship of the unmeasured confounding and the case status. For unit $l$, when not exposed to the treatment, let the indicator variable $k^{b}_{Cl}$ denote whether the unit is a case and $k^{n}_{Cl}$ denote whether the unit is a narrow case. Thus, $k^{b}_{Cl} = 1$ if the $l$th unit is a case, either narrow or marginal, when not exposed to the treatment, and $k^{b}_{Cl} = 0$ if the unit is a control when not exposed to the treatment. Similarly, $k^{n}_{Cl} = 1$ if the $l$th unit is a narrow case when not exposed to the treatment, and $k^{n}_{Cl} = 0$ otherwise. It might be helpful to think of $k^{b}_{Cl}$ and $k^{n}_{Cl}$ as being determined by $r_{Cl}$. For two units $i_1$ and $i_2$ with similar observed covariates, the following model relates the case label with the confounders:

$$\frac{\Pr(k^{b}_{Ci_1} = 1, k^{b}_{Ci_2} = 0 \mid \mathcal{C}, x_{i_1} = x_{i_2})}{\Pr(k^{b}_{Ci_1} = 0, k^{b}_{Ci_2} = 1 \mid \mathcal{C}, x_{i_1} = x_{i_2})} = \exp\{\delta_{bc} (u_{1,i_1} - u_{1,i_2})\}; \tag{4.5}$$

$$\frac{\Pr(k^{n}_{Ci_1} = 1, k^{n}_{Ci_2} = 0 \mid \mathcal{C}, x_{i_1} = x_{i_2}, k^{b}_{Ci_1} = k^{b}_{Ci_2} = 1)}{\Pr(k^{n}_{Ci_1} = 0, k^{n}_{Ci_2} = 1 \mid \mathcal{C}, x_{i_1} = x_{i_2}, k^{b}_{Ci_1} = k^{b}_{Ci_2} = 1)} = \exp\{\delta_{nm} (u_{2,i_1} - u_{2,i_2})\}. \tag{4.6}$$

The level of bias from unmeasured confounding u1 in being a broad case is δbc, and the level of bias from unmeasured confounding u2 in being a narrow case over a marginal case is δnm—the larger the value of these parameters, the higher the influence of the unmeasured confounding.

How do $\lambda$, $\delta_{bc}$ and $\delta_{nm}$ relate to the sensitivity parameters $\Gamma_{bc}$ and $\Gamma_{nm}$? Proposition 1 of Rosenbaum and Silber (2009) provides the correspondence. Let $\Lambda = \exp(\lambda)$, $\Delta_{bc} = \exp(\delta_{bc})$ and $\Delta_{nm} = \exp(\delta_{nm})$. Then, $\Gamma_{bc} = (\Delta_{bc} \Lambda + 1) / (\Delta_{bc} + \Lambda)$ and $\Gamma_{nm} = (\Delta_{nm} \Lambda + 1) / (\Delta_{nm} + \Lambda)$. These formulas allow one to interpret the result of a sensitivity analysis either using the sensitivity parameters $\Gamma_{bc}$ and $\Gamma_{nm}$ or, under model (4.3)–(4.6), using the parameters $\lambda$, $\delta_{bc}$ and $\delta_{nm}$. For example, $\Gamma_{nm} = 1.5$, $\Gamma_{bc} = 1.4$ corresponds to $\Lambda = 2$, $\Delta_{nm} = 5/3$ and $\Delta_{bc} = 2$. In words, a pair of bias levels of $\Gamma_{nm} = 1.5$ and $\Gamma_{bc} = 1.4$ is equivalent to an effect of unmeasured confounders that, for units that are similar in their observed covariates, doubles the chance of an exposure, while also increasing the chance of being a case by 5/3-fold and increasing the chance of being a narrow case over a marginal case by twofold. Similarly, $\Gamma_{nm} = 3$, $\Gamma_{bc} = 2$ corresponds to $\Lambda = 5$, $\Delta_{nm} = 7$ and $\Delta_{bc} = 3$, and so on.
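The amplification formula is simple enough to evaluate directly. The sketch below implements the map from the exposure and case-status parameters to the sensitivity parameter, together with its inverse for a fixed exposure parameter, and evaluates it at the values $\Lambda = 5$, $\Delta_{nm} = 7$, $\Delta_{bc} = 3$ mentioned in the text. The function names are hypothetical, and the same formula is assumed to apply to both comparisons.

```python
def gamma_from_amplification(big_lambda: float, big_delta: float) -> float:
    """Sensitivity parameter Gamma = (Delta * Lambda + 1) / (Delta + Lambda)."""
    if big_lambda < 1.0 or big_delta < 1.0:
        raise ValueError("Lambda and Delta are exponentials of nonnegative "
                         "parameters and must be at least 1")
    return (big_delta * big_lambda + 1.0) / (big_delta + big_lambda)


def delta_from_gamma(gamma: float, big_lambda: float) -> float:
    """Solve Gamma = (Delta * Lambda + 1) / (Delta + Lambda) for Delta.

    Rearranging gives Delta = (Gamma * Lambda - 1) / (Lambda - Gamma),
    which requires Lambda > Gamma.
    """
    if big_lambda <= gamma:
        raise ValueError("no finite Delta exists unless Lambda exceeds Gamma")
    return (gamma * big_lambda - 1.0) / (big_lambda - gamma)


gamma_nm = gamma_from_amplification(big_lambda=5.0, big_delta=7.0)  # 36 / 12
gamma_bc = gamma_from_amplification(big_lambda=5.0, big_delta=3.0)  # 16 / 8
```

Because the formula is increasing in each argument and tends to $\Lambda$ as $\Delta$ grows, a given sensitivity parameter can only be amplified with an exposure parameter $\Lambda$ strictly larger than it, which is the condition the inverse function checks.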

5. Evidence factors.

This section aims to establish that the two comparisons discussed in Section 4.1 explore different aspects of the study design and give separate evidence and, thus, are evidence factors. The idea of evidence factors was first formalized by Rosenbaum (2010) and extended to studies with multiple treatment assignment mechanisms in Rosenbaum (2011) and Rosenbaum (2017). As discussed in Section 1.2, Karmakar, French and Small (2019) provide a general formulation of evidence factors in observational study designs. Readers interested in the results of the SCOLAR data analysis can skip this technical discussion and go to Sections 5.1 and 7.

We start this section by stating the definition of evidence factors. To see that equation (5.1) is a more general statement than (1.1), which was used to introduce evidence factors in Section 1.2, notice that taking $X=(P_1,P_2)$, $D=[0,p_1]\times[0,p_2]$ and $Y$ uniformly distributed on $[0,1]^2$ recovers (1.1). The main result of this section, Theorem 5.1, says that, according to this definition, $(P_{nm,\Gamma_{nm}}, P_{bc,\Gamma_{bc}})$ form evidence factors.

Definition 1.

A set $D$ is called a decreasing set if for any pair $(x,y)$ with $x\le y$, if $y\in D$, then $x\in D$. For two random vectors $X$ and $Y$ we say that $X$ is stochastically larger than $Y$ if

$$\Pr(X\in D)\le \Pr(Y\in D) \tag{5.1}$$

for all decreasing sets $D$. If $X$ is stochastically larger than $Y$, we write $X\succeq Y$.

Definition 2.

For any pair of bias levels $(\Gamma_{nm},\Gamma_{bc})$, $(P_{nm,\Gamma_{nm}}, P_{bc,\Gamma_{bc}})$ are evidence factors for testing $H_0$ if $(P_{nm,\Gamma_{nm}}, P_{bc,\Gamma_{bc}})\succeq(U_1,U_2)$ under the bias levels $\Gamma_{nm}$, $\Gamma_{bc}$ and under $H_0$ for two independent Unif[0, 1] random variables $U_1$ and $U_2$.

Now, we state the main theorem.

Theorem 5.1.

Under $H_0$ and for bias levels $\Gamma_{nm}$ and $\Gamma_{bc}$, we have $(P_{nm,\Gamma_{nm}}, P_{bc,\Gamma_{bc}})\succeq(U_1,U_2)$ for two independent Unif[0, 1] random variables $U_1$ and $U_2$.

The rest of the section is dedicated to proving this theorem using a few lemmas. The proofs of all the lemmas are given in the Appendix. These lemmas clarify how $P_{nm,\Gamma_{nm}}$ and $P_{bc,\Gamma_{bc}}$ depend functionally on the exposures of the units, the $Z_l$'s. Since the $Z_l$'s are the only random variables that determine the p-values or their upper bounds, the purpose of these lemmas in proving the theorem is to show that $P_{nm,\Gamma_{nm}}$ and $P_{bc,\Gamma_{bc}}$ depend on different parts of the $Z_l$'s. For a crude understanding of this, notice the term $Z_{c\{si\}}$ in the expression of $P_{bc,\Gamma_{bc}}$ in Lemma 5.2, which is missing from the corresponding expression of $P_{nm,\Gamma_{nm}}$: whether a control unit is exposed to the treatment does not affect the narrow vs. marginal cases analysis. Lemma 5.3 shows that $P_{nm,\Gamma_{nm}}$ and $P_{bc,\Gamma_{bc}}$ are stochastically larger than a uniform distribution on [0, 1], because they are larger than the true but unknown p-values, and that certain conditional distributions of them are also stochastically larger than a uniform distribution on [0, 1]. Theorem 5.1 is about the joint distribution of $(P_{nm,\Gamma_{nm}}, P_{bc,\Gamma_{bc}})$. Thus, the facts about the marginal distributions of $P_{nm,\Gamma_{nm}}$ and $P_{bc,\Gamma_{bc}}$ and their conditional distributions given certain events, along with a general lemma, Lemma 5.5, prove the theorem.

To slightly simplify our notation in what follows, for two random vectors X and Y we write [X|Y] to denote the conditional distribution of X given Y. Since we are dealing with discrete spaces, [X|Y] is a real valued measurable function of X and Y.

The following is one of the main lemmas needed to prove Theorem 5.1:

Lemma 5.2.

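There exist functions $f_{nm}$ and $f_{bc}$ on appropriate domains such that

$$P_{nm,\Gamma_{nm}}=f_{nm}\Big(\Big\{Z_{n\{si\}},\,i\in[n_s];\ \sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}} : s\in[S]\Big\}\Big)$$

and

$$P_{bc,\Gamma_{bc}}=f_{bc}\Big(\Big\{Z_{c\{si\}},\,i\in[c_s];\ \sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}} : s\in[S]\Big\}\Big).$$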

Following Definition 1, for a random variable $X$ and a probability distribution $D$, let us write $X\succeq D$ to say that $X$ is stochastically larger than $D$, that is, $\Pr(X\le x)\le\Pr(Y\le x\mid Y\sim D)$ for all $x$.

Lemma 5.3.

Under H0, we have the following:

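  1. $\big[P_{nm,\Gamma_{nm}} \mid \big\{\sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}}\big\},\ \mathcal{F}_b,\ \{n_s\}\big]\succeq \mathrm{Unif}[0,1]$.

  2. $\big[P_{bc,\Gamma_{bc}} \mid \big\{\sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}}+\sum_{i\in[c_s]}Z_{c\{si\}}\big\},\ \mathcal{F},\ \{b_s+c_s\}\big]\succeq \mathrm{Unif}[0,1]$.

  3. $P_{nm,\Gamma_{nm}}\succeq \mathrm{Unif}[0,1]$.

  4. $P_{bc,\Gamma_{bc}}\succeq \mathrm{Unif}[0,1]$.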

The following lemma relies on the assumption of no interference in treatment assignment among the units, which is to say $Z_l$ and $Z_{l'}$ are independently distributed for two distinct units $l$ and $l'$:

Lemma 5.4.

Under H0,

$$\Big[P_{nm,\Gamma_{nm}} \,\Big|\, \{Z_{c\{si\}},\,i\in[c_s]\};\ \sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}}\Big]\succeq \mathrm{Unif}[0,1].$$

Lemma 5.5.

Suppose two random variables $P_1$ and $P_2$ satisfy

C1 random variable $P_1$ is a function of random quantity $V_1$,

C2 $[P_2\mid V_1]\succeq \mathrm{Unif}[0,1]$,

then for $0\le q\le 1$, $\Pr(P_2\le q\mid P_1)\le q$, that is, $[P_2\mid P_1]\succeq \mathrm{Unif}[0,1]$.

Now, we have all the necessary facts to prove Theorem 5.1.

PROOF OF THEOREM 5.1.

In Lemma 5.5 take $P_1=P_{bc,\Gamma_{bc}}$, $P_2=P_{nm,\Gamma_{nm}}$ with $V_1=\big\{\{Z_{c\{si\}},\,i\in[c_s]\};\ \sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}}\big\}$. Then, by Lemma 5.2 condition C1 is satisfied, and condition C2 is proved in Lemma 5.4. Thus, by Lemma 5.5, $[P_{nm,\Gamma_{nm}}\mid P_{bc,\Gamma_{bc}}]\succeq \mathrm{Unif}[0,1]$.

Let $U_1$ and $U_2$ be two independent uniformly distributed random variables on [0, 1]. We use the theory of Shaked and Shanthikumar ((2007), Section 6.B). Being an independent pair, $(U_1,U_2)$ is conditionally increasing in sequence (CIS). Combining this with the facts that $P_{bc,\Gamma_{bc}}\succeq \mathrm{Unif}[0,1]$ (by Lemma 5.3) and $[P_{nm,\Gamma_{nm}}\mid P_{bc,\Gamma_{bc}}]\succeq \mathrm{Unif}[0,1]$, Theorem 6.B.4 of Shaked and Shanthikumar (2007) finally gives us

$$(P_{nm,\Gamma_{nm}},\,P_{bc,\Gamma_{bc}})\succeq(U_1,U_2).$$

Thus, the proof is complete.

5.1. Combining evidence.

In words, Theorem 5.1 says that the combined information from the two evidence factors, $P_{nm,\Gamma_{nm}}$ and $P_{bc,\Gamma_{bc}}$, carries as much evidence as two independent evidence factors. This allows us to combine these two pieces of evidence and provide a total evidence against the hypothesis under both the comparisons. Karmakar, French and Small (2019) discuss different methods for combining evidence. Any method of combining p-values that is monotone in both of the p-values can be used, for example, Fisher’s combination method (Fisher (1932)), the mean of the normal transformation (Liptak (1958)) and the truncated product method of combining (Hsu, Small and Rosenbaum (2013), Zaykin et al. (2002)). Also see Becker (1994). These methods of combining p-values are used when p-values are available from independent sources, for example, in meta-analysis. In an observational study, even when there are independent tests, combining them does not strengthen the evidence against the biases from unmeasured confounders if the analyses are affected by the same unmeasured confounding. The evidence factors are two analyses that are nearly independent and that do not share completely overlapping biases. Thus, combining the maximum p-values from the evidence factors strengthens the evidence in an observational study. The simulation study below considers which combining method has the largest power in a sensitivity analysis for unmeasured confounding.

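Fisher’s method computes the joint evidence as the upper tail probability of the $\chi^2_4$ distribution at $-2\log(P_{nm,\Gamma_{nm}}P_{bc,\Gamma_{bc}})$. In the scenario of sensitivity analysis, since we only consider the largest possible p-values for a given value of hidden bias level, the truncated product method, which weights the evidence by the strength of the evidence, is often preferred. For a given $\tilde\alpha$, the combined evidence using the truncated product method is given by $F_W\{w\}$ evaluated at the truncated product $w=\exp\{\mathrm{Ev}(\Gamma_{nm},\Gamma_{bc})\}$, where

$$\mathrm{Ev}(\Gamma_{nm},\Gamma_{bc})=\mathbb{1}_{\{P_{nm,\Gamma_{nm}}\le\tilde\alpha\}}\log(P_{nm,\Gamma_{nm}})+\mathbb{1}_{\{P_{bc,\Gamma_{bc}}\le\tilde\alpha\}}\log(P_{bc,\Gamma_{bc}})\quad\text{and} \tag{5.2}$$

$$F_W\{w\}=2\tilde\alpha(1-\tilde\alpha)\,G_{\mathrm{Exp}(1)}\{-\log(w/\tilde\alpha)\}+\tilde\alpha^2\,G_{\mathrm{Gamma}(2,1)}\{-\log(w/\tilde\alpha^2)\}.$$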

In the above, $G_{\mathrm{Exp}(1)}$ is the survival function of a random variable with exponential distribution with rate 1, and $G_{\mathrm{Gamma}(2,1)}$ is the survival function of a random variable with Gamma distribution with shape parameter 2 and rate 1. The advised choice of $\tilde\alpha$ is 0.20 (Hsu, Small and Rosenbaum (2013), Zaykin et al. (2002)).
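The two combining rules for a pair of p-value bounds can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation (their code is in the R package evidenceFactors). It assumes that $F_W$ is the null distribution function of the truncated product $W$ of two independent uniform p-values, as in Zaykin et al. (2002), that the combined p-value is 1 when neither input falls below $\tilde\alpha$, and that both inputs are strictly positive. The function names are hypothetical.

```python
import math


def fisher_combine(p1, p2):
    """Upper tail of chi-square(4) at -2*log(p1*p2), using the closed form
    Pr(chi2_4 > x) = exp(-x/2) * (1 + x/2)."""
    x = -2.0 * math.log(p1 * p2)
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def _surv_exp1(t):
    # Survival function of Exp(1); equals 1 for non-positive arguments.
    return 1.0 if t <= 0 else math.exp(-t)


def _surv_gamma21(t):
    # Survival function of Gamma(shape=2, rate=1); equals 1 for t <= 0.
    return 1.0 if t <= 0 else math.exp(-t) * (1.0 + t)


def truncated_product_combine(p1, p2, alpha_tilde=0.20):
    """Truncated product p-value for two p-values (assumed independent
    uniform under the null for the reference distribution)."""
    w = 1.0
    for p in (p1, p2):
        if p <= alpha_tilde:  # only p-values at or below the cutoff count
            w *= p
    if w >= 1.0:              # assumption: no p-value below the cutoff
        return 1.0
    a = alpha_tilde
    return (2.0 * a * (1.0 - a) * _surv_exp1(-math.log(w / a))
            + a * a * _surv_gamma21(-math.log(w / (a * a))))


print(fisher_combine(0.05, 0.05))
print(truncated_product_combine(0.05, 0.05))
print(truncated_product_combine(0.01, 0.50))
```

In a sensitivity analysis the inputs would be the upper bounds $P_{nm,\Gamma_{nm}}$ and $P_{bc,\Gamma_{bc}}$; both rules are monotone in each argument, which is what Theorem 5.1 requires.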

We conducted a simulation study to compare the powers of Fisher’s method and the truncated product method in the setting of our problem. The simulation scenario considered here is based on the case-control study structure. We look at the favorable situation where there are no unmeasured confounders and there is a treatment effect. Then, for varied treatment effect sizes we compare the power of the two combining methods for different values of $(\Gamma_{nm},\Gamma_{bc})$.

We consider a population where the chance of exposure is 1/3. Thus, for a unit $l$, $\Pr(Z_l=1)=1/3$. The treatment effect is denoted by $\beta$. We consider a univariate response and two types of response distributions in the population. The two types of distributions when spared exposure are a normal distribution with mean 0 and variance 1 and a $t$-distribution with 3 degrees of freedom normalized to have variance 1. Therefore, if a unit $l$ is exposed to treatment, then the response is a sample from $N(\beta,1)$ (or $\beta+t_3/\sqrt{3}$), and if not exposed, then the response is a sample from $N(0,1)$ (or $t_3/\sqrt{3}$). The case definition for each of the scenarios is taken such that if the treatment effect was 0.5, then 20% of the population would be broad cases. Thus, in the setting where the response is from the normal distribution, a response above the 0.8 quantile of the mixture distribution $\tfrac{1}{3}N(\beta,1)+\tfrac{2}{3}N(0,1)$ with $\beta=0.5$ would be labeled a broad case. In our simulation we sample 2000 broad cases, and half of them are labeled as narrow cases. Then, we sample 2000 controls. In both comparisons of narrow cases vs. marginal cases and broad cases vs. controls, we consider paired strata, that is, $n_s=m_s=1$, $c_s=2$.
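The broad-case cutoff in the normal scenario can be computed directly. The sketch below is an illustration under stated assumptions, not the authors' simulation code: it takes the cutoff to be the 0.8 quantile of the mixture at $\beta=0.5$ and finds it by bisection, and the function names and search bracket are hypothetical choices.

```python
from statistics import NormalDist


def mixture_cdf(y, beta, p_exposed=1.0 / 3.0):
    """CDF of the response: exposed units ~ N(beta, 1), unexposed ~ N(0, 1)."""
    return (p_exposed * NormalDist(beta, 1.0).cdf(y)
            + (1.0 - p_exposed) * NormalDist(0.0, 1.0).cdf(y))


def broad_case_threshold(beta=0.5, level=0.8, p_exposed=1.0 / 3.0, tol=1e-10):
    """Quantile of the mixture at `level`, found by bisection on the CDF."""
    lo, hi = -10.0, 10.0 + abs(beta)  # bracket wide enough to contain the quantile
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mixture_cdf(mid, beta, p_exposed) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


cutoff = broad_case_threshold()
print(cutoff, mixture_cdf(cutoff, 0.5))
```

Responses above `cutoff` would then be labeled broad cases; the same bisection works for the $t_3/\sqrt{3}$ scenario once a CDF for the scaled $t$-distribution is supplied.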

Tables 2 and 3 report the simulated power for the two combining methods. The simulated power is based on 10,000 iterations with level of significance $\alpha=0.05$. Except for very few situations in Table 2, the truncated product method has better simulated power than Fisher’s combining method. The truncated product method seems to be less sensitive as we increase $\Gamma_{nm}$ and $\Gamma_{bc}$. Fisher’s method has slightly better simulated power in a few situations in the normal response model for moderate values of $(\Gamma_{nm},\Gamma_{bc})$ when there is a large treatment effect ($\beta=0.6$). After considering these simulation results, in our case-control study of the efficacy of screening sigmoidoscopy we use the truncated product method with $\tilde\alpha=0.20$.

Table 2.

Simulated power, in %, of a sensitivity analysis of combined evidence in a case-control study, where there is no unmeasured confounder and $\Pr(Z_l=1)=1/3$. The response is simulated from $N(\beta,1)$ if $Z_l=1$ and $N(0,1)$ if $Z_l=0$. There are 1000 narrow cases and 1000 marginal cases with 2000 controls. Based on 10,000 iterations. Fisher = Fisher’s combination method, tP = truncated product method with $\tilde\alpha=0.20$

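| Γnm | Γbc | β=0 Fisher | β=0 tP | β=0.2 Fisher | β=0.2 tP | β=0.4 Fisher | β=0.4 tP | β=0.6 Fisher | β=0.6 tP |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 5 | 5 | 100 | 100 | 100 | 100 | 100 | 100 |
| | 1.5 | 0.6 | 1 | 25 | 26 | 100 | 100 | 100 | 100 |
| | 2 | 0.6 | 1 | 18 | 22 | 87 | 86 | 100 | 100 |
| | 2.5 | 0.6 | 1 | 18 | 22 | 75 | 80 | 100 | 100 |
| 1.25 | 1.25 | 0 | 0 | 48 | 51 | 100 | 100 | 100 | 100 |
| | 2 | 0 | 0 | 0 | 0.1 | 15 | 15 | 100 | 100 |
| | 2.75 | 0 | 0 | 0 | 0.1 | 3 | 5 | 69 | 66 |
| | 3.5 | 0 | 0 | 0 | 0.1 | 3 | 5 | 69 | 66 |
| 1.5 | 1.5 | 0 | 0 | 0.2 | 0.3 | 99.2 | 99.4 | 100 | 100 |
| | 2.5 | 0 | 0 | 0 | 0 | 0 | 0 | 54 | 52 |
| | 3.5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
| 1.75 | 1.75 | 0 | 0 | 0 | 0 | 51 | 58 | 100 | 100 |
| | 2 | 0 | 0 | 0 | 0 | 2 | 3 | 100 | 100 |
| | 3.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | 0 | 0 | 0 | 0 | 2 | 3 | 100 | 100 |
| | 2.5 | 0 | 0 | 0 | 0 | 0 | 0 | 36 | 43 |
| | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1 | 0.2 |
| | 3.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2.25 | 2.25 | 0 | 0 | 0 | 0 | 0 | 0 | 88 | 91 |
| | 2.5 | 0 | 0 | 0 | 0 | 0 | 0 | 35 | 42 |
| | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1 | 0.2 |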

Table 3.

Simulated power, in %, of a sensitivity analysis of combined evidence in a case-control study, where there is no unmeasured confounder and $\Pr(Z_l=1)=1/3$. The response is simulated from $\beta+t_3/\sqrt{3}$ if $Z_l=1$ and $t_3/\sqrt{3}$ if $Z_l=0$. There are 1000 narrow cases and 1000 marginal cases with 2000 controls. Based on 10,000 iterations. Fisher = Fisher’s combination method, tP = truncated product method with $\tilde\alpha=0.20$

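| Γnm | Γbc | β=0 Fisher | β=0 tP | β=0.2 Fisher | β=0.2 tP | β=0.4 Fisher | β=0.4 tP | β=0.6 Fisher | β=0.6 tP |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 5 | 5 | 100 | 100 | 100 | 100 | 100 | 100 |
| | 1.5 | 1 | 1.5 | 0.5 | 0.5 | 100 | 100 | 100 | 100 |
| | 2 | 1 | 1.5 | 0 | 0.1 | 15 | 18 | 100 | 100 |
| | 2.5 | 1 | 1.5 | 0 | 0.1 | 0 | 0 | 98 | 98 |
| 1.25 | 1.25 | 0 | 0 | 47 | 54 | 100 | 100 | 100 | 100 |
| | 2 | 0 | 0 | 0 | 0 | 14 | 18 | 100 | 100 |
| | 2.75 | 0 | 0 | 0 | 0 | 0 | 0 | 71 | 77 |
| | 3.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2 | 0.3 |
| 1.5 | 1.5 | 0 | 0 | 0.2 | 0.3 | 100 | 100 | 100 | 100 |
| | 2.5 | 0 | 0 | 0 | 0 | 0 | 0 | 98 | 98 |
| | 3.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2 | 0.3 |
| 1.75 | 1.75 | 0 | 0 | 0 | 0 | 82 | 86 | 100 | 100 |
| | 2 | 0 | 0 | 0 | 0 | 14 | 18 | 100 | 100 |
| | 3.25 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 5 |
| 2 | 2 | 0 | 0 | 0 | 0 | 14 | 18 | 100 | 100 |
| | 2.5 | 0 | 0 | 0 | 0 | 0 | 0 | 98 | 98 |
| | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 24 | 30 |
| | 3.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2 | 0.3 |
| 2.25 | 2.25 | 0 | 0 | 0 | 0 | 0.2 | 0.4 | 100 | 100 |
| | 2.5 | 0 | 0 | 0 | 0 | 0 | 0 | 98 | 98 |
| | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 24 | 30 |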

6. Evidence factors with differential effect of unmeasured confounders on the factors.

The individual factors in an evidence factors analysis, if biased, are hoped to be biased by different mechanisms so that a critic would need to consider both sources of bias to explain the observed statistical significance. As discussed in Section 2.1, in the sigmoidoscopy study the bias in comparing all colorectal cancer cases to controls could be due to imbalance between the two groups in healthy lifestyle of the patients, family history and also, potentially, due to diet. The comparison of distal cancer cases to proximal cancer cases may be biased by diet, for example, Mediterranean diet. Hence, the main source of unmeasured confounding in the second analysis can, to some extent, also be a source of bias in the first analysis. The following discussion delineates the logic of evidence factors analysis for such a scenario in which the sources of bias overlap for the two evidence factors but are different in their relative size between the two evidence factors.

Recall that Section 4.2 provides the amplification of the sensitivity parameters $\Gamma_{bc}$ and $\Gamma_{nm}$ in terms of $\lambda$, $\delta_{bc}$ and $\delta_{nm}$. There, $u_1$ and $u_2$ are assumed to be two separate unmeasured confounders. The relation of the unmeasured confounding, $u_1$ and $u_2$, and the exposure to treatment is modeled by the bias level $\lambda$. The relation of $u_1$ and the broad case status is modeled by the bias level $\delta_{bc}$. Finally, the relation of $u_2$ and the narrow case status is modeled by the bias level $\delta_{nm}$. In the following we allow for $u_1$ and $u_2$ to be influenced by overlapping factors.

For individual $l$, let $v_{1l}$ and $v_{2l}$ be unmeasured numbers summarizing two sets of unmeasured variables so that $0\le v_{1l}, v_{2l}\le 1$. We allow for both variables to bias each analysis but to have varying importance in their relationship with the outcomes. We formalize this as follows. Let $u_{1l}=\psi_1v_{1l}+\psi_2v_{2l}$ where $\psi_1,\psi_2\ge 0$, $\psi_1+\psi_2=1$ and $\psi_1$ is larger than $\psi_2$. Also, let $u_{2l}=\tilde\psi_1v_{1l}+\tilde\psi_2v_{2l}$ where $\tilde\psi_1,\tilde\psi_2\ge 0$, $\tilde\psi_1+\tilde\psi_2=1$ and $\tilde\psi_2$ is larger than $\tilde\psi_1$. The fractions $\psi_1$, $\psi_2$, $\tilde\psi_1$ and $\tilde\psi_2$ are fixed numbers. The unmeasured confounders $v_{1l}$ and $v_{2l}$ relate to the broad case status and the narrow case status by models (4.5) and (4.6) via the variables $u_{1l}$ and $u_{2l}$.

As for the relation between the unmeasured confounders v1l, v2l and the observed exposure to treatment, for two units i1 and i2 with the same observed covariates we write, for zi1+zi2=1,

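$$\Pr(Z_{i_1}=z_{i_1},\,Z_{i_2}=z_{i_2}\mid\mathcal{C},\,x_{i_1}=x_{i_2},\,Z_{i_1}+Z_{i_2}=1)=\frac{\exp\{\lambda(z_{i_1}\omega_{i_1}+z_{i_2}\omega_{i_2})\}}{\exp(\lambda\omega_{i_1})+\exp(\lambda\omega_{i_2})}, \tag{6.1}$$

where

$$\omega_l=\zeta_1v_{1l}+\zeta_2v_{2l}\quad\text{for } l=1,\ldots,L;\qquad \zeta_1,\zeta_2\ge 0,\ \zeta_1+\zeta_2=1. \tag{6.2}$$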

Now, consider the amplification of the sensitivity parameters $\Gamma_{bc}$ and $\Gamma_{nm}$ under the model specified by equations (6.1), (6.2), (4.5) and (4.6) with $u_{1l}=\psi_1v_{1l}+\psi_2v_{2l}$ and $u_{2l}=\tilde\psi_1v_{1l}+\tilde\psi_2v_{2l}$. This can be communicated under three different scenarios depending on the source of bias under doubt: bias from only one of $v_1$ or $v_2$, or bias from both $v_1$ and $v_2$. Assume a value of $\lambda$ in model (6.1)–(6.2). We find the parameters $\delta_{bc}$ and $\delta_{nm}$ from $\lambda$ and $\Gamma_{bc}$, $\Gamma_{nm}$. Let $\Lambda=\exp(\lambda)$, $\Delta_{bc}=\exp(\delta_{bc})$ and $\Delta_{nm}=\exp(\delta_{nm})$. Then, (i) if only $v_1$ is the bias in question, that is, we put the restriction $v_{2,l}=v_{2,l'}$, then $\Delta_{bc}=\{(\Lambda\Gamma_{bc}-1)/(\Lambda-\Gamma_{bc})\}^{1/\psi_1}$ and $\Delta_{nm}=\{(\Lambda\Gamma_{nm}-1)/(\Lambda-\Gamma_{nm})\}^{1/\tilde\psi_1}$. This correspondence holds with $|v_{1,i_1}-v_{1,i_2}|=1$. (ii) If only $v_2$ is the bias in question, that is, we put the restriction $v_{1,l}=v_{1,l'}$, then $\Delta_{bc}=\{(\Lambda\Gamma_{bc}-1)/(\Lambda-\Gamma_{bc})\}^{1/\psi_2}$, $\Delta_{nm}=\{(\Lambda\Gamma_{nm}-1)/(\Lambda-\Gamma_{nm})\}^{1/\tilde\psi_2}$ and $|v_{2,i_1}-v_{2,i_2}|=1$. (iii) Finally, if both the confounders $v_1$ and $v_2$ are in question, then $\Delta_{bc}=(\Lambda\Gamma_{bc}-1)/(\Lambda-\Gamma_{bc})$ and $\Delta_{nm}=(\Lambda\Gamma_{nm}-1)/(\Lambda-\Gamma_{nm})$. This correspondence holds with $|v_{1,i_1}-v_{1,i_2}|=1$ and $|v_{2,i_1}-v_{2,i_2}|=1$. A closer look at these formulas shows that the bias parameters $\delta_{bc}=\log(\Delta_{bc})$ and $\delta_{nm}=\log(\Delta_{nm})$ change markedly across the scenarios.

Guided by the above calculations, Figure 2 provides an illustration of the influence of unmeasured confounders on the broad case status, $\delta_{bc}$, and on the narrow case status relative to a marginal case status, $\delta_{nm}$. In this illustration we assume $\psi_1=3/4$, so that, in determining a broad case status, the magnitude of unmeasured confounding from $v_1$ over $v_2$ has the ratio 3:1, whereas, in determining a narrow case status relative to a marginal case status, the magnitude of unmeasured confounding from $v_1$ over $v_2$ has the ratio 1:4, that is, $\tilde\psi_1=1/5$. The plot considers three critics, shown in three colors, with different beliefs about the source of bias from unmeasured confounding. The first critic assumes bias only from $v_1$, the second critic assumes bias only from $v_2$ and, finally, the third critic assumes biases from both $v_1$ and $v_2$. The x-axis on the plot (in red) shows the amount of bias the first critic would have to assume; the y-axis on the plot (in blue) shows the amount of bias the second critic would have to assume, and, finally, the green curves show the amount of bias the third critic would have to assume. For example, the plot highlights the situation where the critics want to explain the sensitivity of the comparisons at level $\Gamma_{bc}=2$ and $\Gamma_{nm}=2$, and all of them speculate $\Lambda=4$. The first critic would have to assume biases at the amounts of $\delta_{bc}\ge 1.671$ and $\delta_{nm}\ge 6.265$. The second critic would have to assume biases at the amounts of $\delta_{bc}\ge 5.012$ and $\delta_{nm}\ge 1.566$. The third critic, however, can assume bias levels of $\delta_{bc}\ge 1.253$ and $\delta_{nm}\ge 1.253$. Hence, unless a skeptic of the study assumes unmeasured confounding from both sources of bias mechanisms, she would be forced to consider a larger influence of unmeasured confounding in one case definition over the other.
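The bias levels each critic must assume follow from the three correspondences of the preceding paragraph. The Python sketch below is illustrative only: it takes $\Lambda=4$ as an assumption (the value for which the third critic's level is $\log 3.5\approx 1.253$) and uses $\psi_1=3/4$, $\tilde\psi_1=1/5$ from the illustration; the function name is hypothetical.

```python
import math


def log_delta(lam_mult, gamma, weight=1.0):
    """delta = log(Delta), with Delta = {(Lambda*Gamma - 1)/(Lambda - Gamma)}^(1/weight).

    weight is the fraction (psi or psi-tilde) through which the suspected
    confounder enters u1 or u2; weight = 1 covers the case where both
    v1 and v2 are in question.
    """
    base = (lam_mult * gamma - 1.0) / (lam_mult - gamma)
    return math.log(base) / weight


LAM = 4.0                        # assumed exposure-bias multiplier Lambda
GAMMA_BC = GAMMA_NM = 2.0        # sensitivity levels to be explained
PSI_1, PSI_2 = 3 / 4, 1 / 4      # weights of v1, v2 in u1 (broad case status)
PSI_T1, PSI_T2 = 1 / 5, 4 / 5    # weights of v1, v2 in u2 (narrow vs. marginal)

critics = {
    "only v1": (log_delta(LAM, GAMMA_BC, PSI_1), log_delta(LAM, GAMMA_NM, PSI_T1)),
    "only v2": (log_delta(LAM, GAMMA_BC, PSI_2), log_delta(LAM, GAMMA_NM, PSI_T2)),
    "both":    (log_delta(LAM, GAMMA_BC), log_delta(LAM, GAMMA_NM)),
}
for name, (d_bc, d_nm) in critics.items():
    print(f"{name}: delta_bc >= {d_bc:.3f}, delta_nm >= {d_nm:.3f}")
```

Dividing the log by the weight is the same as raising Δ to the power 1/ψ, which is why a critic who blames the confounder with the smaller weight needs the larger δ.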

Fig. 2.

Level of bias from unmeasured confounding plotted under three speculations: bias only from $v_1$, plotted on the x-axis and in “red”; bias only from $v_2$, plotted on the y-axis and in “blue”; and biases from both $v_1$ and $v_2$, plotted in “green” contours. The contours are of the function $f(\delta_{v_1},\delta_{v_2})=(1/\delta_{v_1}+1/\delta_{v_2})^{-1}$. Here, $\psi_1=3/4$, $\psi_2=1/4$, $\tilde\psi_1=1/5$ and $\tilde\psi_2=4/5$. The bias levels $\delta_{bc}$ and $\delta_{nm}$ change with the speculation, and the required bias level is minimized when biases from both $v_1$ and $v_2$ are assumed.

Thus, when the factors overlap but do not completely overlap in their sources of bias, evidence factors will be useful in narrowing the range of explanations for how an observed association could not be causal.

7. Results: Efficacy of screening sigmoidoscopy.

In our study of mortality from colorectal cancer and screening sigmoidoscopy, the two evidence factors analyses are summarized in Table 4. The counts for screening sigmoidoscopy represent the number of individuals who had a screening procedure in the 10 years before the reference date. The raw odds ratio, without controlling for any covariates, of screening sigmoidoscopy between proximal and distal cancer cases is 0.63 (95% CI, 0.55 to 0.72) and that between all colorectal cancer cases and controls is 0.64 (95% CI, 0.50 to 0.81). To control for important covariates, we utilize the matched sets we constructed in Section 2.1. Using this matched sets design, the p-value for efficacy of screening sigmoidoscopy for the distal colorectal cancer cases vs. the proximal colorectal cancer cases is 2.3 × 10−5, with the corresponding odds ratio 0.60 (95% CI, 0.46 to 0.76). The p-value for all cases (distal and proximal) vs. the matched controls is 5.0 × 10−11, with odds ratio 0.62 (95% CI, 0.54 to 0.72) (this result is similar to previously reported odds ratios; see Atkin et al. (2010) and Segnan et al. (2011)).

Table 4.

Screening sigmoidoscopy and colorectal cancer summary data. Numbers in the parentheses show the 95% confidence intervals

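| | Distal cancer cases | Proximal cancer cases | All colorectal cancer cases | Controls |
|---|---|---|---|---|
| No screening sigmoidoscopy | 678 | 662 | 1340 | 2538 |
| Screening sigmoidoscopy | 144 | 224 | 368 | 1097 |
| Odds ratio from matched sets | 0.60 (0.46 to 0.76), distal vs. proximal cases | | 0.62 (0.54 to 0.72), all cases vs. controls | |
| p-value from matched sets | 2.3×10−5, distal vs. proximal cases | | 5.0×10−11, all cases vs. controls | |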
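The raw (unmatched) odds ratio point estimates quoted in the text can be recomputed from the counts in Table 4. The short Python sketch below does only that; it does not reproduce the matched-set estimates or the confidence intervals, and the function name is hypothetical.

```python
def odds_ratio(exposed_a, unexposed_a, exposed_b, unexposed_b):
    """Odds of exposure in group A divided by odds of exposure in group B."""
    return (exposed_a / unexposed_a) / (exposed_b / unexposed_b)


# Counts from Table 4: (screened, not screened).
distal = (144, 678)
proximal = (224, 662)
all_cases = (368, 1340)
controls = (1097, 2538)

or_distal_vs_proximal = odds_ratio(*distal, *proximal)
or_cases_vs_controls = odds_ratio(*all_cases, *controls)

print(round(or_distal_vs_proximal, 2))  # 0.63
print(round(or_cases_vs_controls, 2))   # 0.64
```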

We further conduct a sensitivity analysis to assess whether possible covariates, which were not controlled for in our study, may have been the reason behind the observed association above. Consistent with the notation of Section 4, we consider two sensitivity parameters Γnm and Γbc for the two comparisons. A value of 1 for a sensitivity parameter means that there is no bias from unmeasured confounding in the respective comparison, and the higher the value of the parameter, the larger the bias. Figure 3 shows the bias levels where the combined evidence for a beneficial effect of screening sigmoidoscopy is sensitive. The p-value upper bounds for each bias level of the two evidence factors are combined using the truncated product method with α˜=0.20. As can be seen in this plot, only a substantial amount of bias in both comparisons could explain the observed association in the data if, in fact, the null hypothesis is true. For example, with a maximum bias of Γnm = 1.4 in the comparison of distal cancer cases to proximal cancer cases, the combined evidence is sensitive only when the bias in the second comparison of all colorectal cases to the controls is larger than Γbc = 1.45. The overall evidence remains insensitive for Γnm = 2 when Γbc ≤ 1.35. Thus, the overall evidence for the efficacy of the procedure is strengthened compared to evidence from an analysis that only looks at the screening rates between all colorectal cancer cases and controls. The maximum p-values are calculated using the “mh” function in the R package sensitivity2x2xk.

Fig. 3.

Sensitivity analysis of the efficacy of screening sigmoidoscopy in reducing mortality from colorectal cancer. The darker gray color represents the bias levels where the combined evidence for a beneficial effect of screening sigmoidoscopy is sensitive.

To better understand which part of the evidence is contributing to our inferences about the effect of sigmoidoscopy screening, we can use closed testing (Marcus, Peritz and Gabriel (1976)) as in Karmakar, French and Small (2019). When both biases are small, say Γnm=Γbc=1.1, by the closed testing procedure, the joint evidence is insensitive, and both evidence factors are also insensitive with Pnm,1.1 = 1.36 × 10−7 and Pbc,1.3 = 0.0005. The closed testing procedure also says that, when Γnm = 1.5 and Γbc = 1.4, the comparison of proximal to distal cancer cases is sensitive with a maximum possible p-value of Pnm,1.5 = 0.21, but there is evidence from the comparison of all colorectal cancer cases to the controls, which is insensitive with a maximum possible p-value of Pbc,1.4 = 0.034. Recall from the discussion of Section 4.2 that the pair of bias levels Γnm = 1.5 and Γbc = 1.4 is equivalent to an effect of unmeasured confounders that doubles the chance of a sigmoidoscopy screening for a case relative to a control, while also increasing the chance of death from colorectal cancer by 5/3-fold and increasing the chance of death from a proximal colorectal cancer over a distal colorectal cancer by twofold. On the other hand, if the effect of unmeasured confounders is smaller on being a proximal cancer case so that it increases the chance of death from proximal colorectal cancer over a distal cancer only by 5/3-fold but increases the chance of death by any colorectal cancer by twofold, the joint evidence is sensitive to such unmeasured confounders. The closed testing procedure for two or for many evidence factors and plots similar to Figure 3 can be produced by the R package evidenceFactors available from CRAN (R Core Team (2020)).

8. Discussion.

In this paper we have developed evidence factors in a case-control study in which there is a narrow and a broad case definition. These evidence factors are formed by two sets of comparisons, the first one comparing narrow cases to marginal cases and the second one comparing all cases to controls. Use of these evidence factors in a case-control study can provide better insight into the study especially in a discussion and analysis of possible bias in the study.

In the sigmoidoscopy study considered in this paper, the elaborate theory (Sections 1.1 and 1.3) suggested that, if sigmoidoscopy screening is effective in reducing mortality from colorectal cancer, the benefit should be larger for distal cancer cases than for proximal cancer cases and for any colorectal cancer case over controls. Following this theory, the evidence factors were thus useful in assessing the hypothesis of no benefit of sigmoidoscopy screening. While the standard discussion of evidence factors analyses emphasizes that the biases affecting the different factors are different (Section 1.2), for the sigmoidoscopy study it was more likely that the biases overlap, but not completely. For case-control studies, this paper shows that evidence factors analyses also strengthen the evidence for a causal effect when the biases from unmeasured confounders affecting the different analyses may overlap.

The technical results of Section 5 can be extended to more complex designs, for example, to designs with more than two types of cases (see Keogh and Cox (2014)) using more complex notation. But these technical results are only a part of what makes an evidence factor useful for a case-control study. It is equally important that the factors are coherent with the elaborate theory of a causal effect of an exposure; other examples with two case definitions, where an evidence factors analysis may be considered, are discussed in the final subsection. Lastly, it would also be important to establish that under overlapping biases, which are likely more prominent when there are multiple types of cases, the multiple analyses considered still strengthen the evidence against a large number of plausible patterns of biases. Regarding this point, for the arguments of Figure 2 in Section 6 to work, one has to think of appropriate extensions of the models in equations (4.5) and (4.6). Such extensions are not readily available in the literature. We leave these developments as a potential future research direction.

Our study paired narrow cases to marginal cases on the observed covariates and included their controls in the matched sets and, then, put the remaining marginal cases in matched sets with their controls. Other matching methods could be used, for example, full matching (Hansen (2004)) and variable ratio matching (Ming and Rosenbaum (2000), Pimentel, Yoon and Keele (2015)).

8.1. Other examples with multiple case definitions.

In certain diseases, like cancer in the body of the uterus, atherosclerosis, hypertension and mental illness, multiple case definitions are considered or often necessary (Acheson (1979), Cole (1979), Cohen et al. (2005)). Some other specific studies where multiple case definitions have been considered are discussed here. These studies illustrate various ways to design a broad case vs. narrow case distinction in case-control studies. In a study to assess whether statins cause peripheral neuropathy, Gaist et al. (2002) classify the neuropathy cases as definite and nondefinite cases of idiopathic peripheral neuropathy based on the intensity of the symptom and the quality of the clinical information. In the terminology of the present paper, the definite cases would be the narrow cases where the association, if present, would be stronger compared to the marginal cases, that is, the nondefinite cases. Small et al. (2013) use an illustrative case-control study of physical abuse by parents in childhood and tendency for more anger in adulthood. In this study the cases were split into two definitions based on whether or not the anger score was in a higher range. Here, a case in a higher quantile of anger score could be defined as a narrow case. As a final example, in an effort to understand the association between genetic traits and cerebral malaria, Small et al. (2017) consider cerebral malaria cases with and without retinopathy. The World Health Organization (WHO) defines a child as having cerebral malaria when the child is in a coma (cannot localize a painful stimulus), has malaria parasites in his or her blood and has no other known cause of the coma. This definition is not specific, as hospitals in malaria-endemic areas often lack diagnostic facilities to identify nonmalarial causes of coma and many children in malaria-endemic areas have nonsymptomatic malaria infections. There are characteristic retinal abnormalities (retinopathy) that increase the specificity of a cerebral malaria diagnosis (Taylor et al. (2004)). Cerebral malaria cases with such retinal abnormalities could be considered as narrow cases and those without the retinal abnormalities could be considered as marginal cases.

Acknowledgments.

The authors thank Dr. Noel Weiss for helpful discussion that structured the paper.

Grant support.

This study was supported by awards (number R01CA213645 and number U01CA151736) from the National Cancer Institute of the National Institutes of Health. The views expressed here are those of the authors only and do not represent any official position of the National Cancer Institute or National Institutes of Health.

APPENDIX: PROOF OF THE LEMMAS

Proof of Lemma 5.2.

First we note that $T_{nm}$ is a function of $Y_{n\{s\}}$, which are simply linear functions of $Z_{n\{si\}}$. Given the strata, from equation (4.1) we have that the maximum p-value of the narrow vs. marginal comparison, $P_{nm,\Gamma_{nm}}$, is computed based on the conditional distributions $\big\{\big[Z_{n\{si\}} \mid \sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}}\big]\big\}$. Combining these facts, we get the first result that marginally $P_{nm,\Gamma_{nm}}$ is a function of $\{Z_{n\{si\}}\}$ and $\sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}}$.

Next, we note that $T_{bc}$ is a function of $\sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}}$. Now, by looking at equation (4.2), $P_{bc,\Gamma_{bc}}$ is computed based on the family of conditional distributions $\big\{\big[\sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}} \mid \sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}}+\sum_{i\in[c_s]}Z_{c\{si\}}\big]\big\}$. Consequently, $P_{bc,\Gamma_{bc}}$ is determined by the number of exposed cases $\big\{\sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}}\big\}$ and the total number of exposed individuals $\big\{\sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}}+\sum_{i\in[c_s]}Z_{c\{si\}}\big\}$. But it is enough to know whether each control is exposed or not, that is, $Z_{c\{si\}}$, to know the number of exposed cases when we have the information on the total number of exposed units. Hence, the result is proved.

Proof of Lemma 5.3.

For parts (i) and (ii) note that p-values or their upper bounds are valid p-values and, thus, are stochastically larger than Unif[0, 1]. Parts (iii) and (iv) follow from (i) and (ii) simply by marginalizing, since marginalization preserves stochastic ordering.

Proof of Lemma 5.4.

Note that, since conditional on $\sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}}$ the random variables $Z_{n\{si\}}$ and $Z_{c\{si\}}$ are independently distributed, by Lemma 5.2 the conditional distribution in the statement of the lemma is the same as $\big[P_{nm,\Gamma_{nm}} \mid \sum_{i\in[n_s]}Z_{n\{si\}}+\sum_{i\in[m_s]}Z_{m\{si\}}\big]$. Now, the result follows from part (i) of Lemma 5.3.

Proof of Lemma 5.5.

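For any $0\le p, q\le 1$, we can write the conditional probability as

$$\begin{aligned}
\Pr(P_2\le q\mid P_1\le p) &\overset{\text{by C1}}{=} \Pr(P_2\le q\mid\{V_1: P_1\le p\})\\
&= E\big[\Pr(P_2\le q\mid V_1)\mid\{V_1: P_1\le p\}\big]\\
&\overset{\text{by C2}}{\le} E\big[q\mid\{V_1: P_1\le p\}\big]=q.
\end{aligned}$$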

The second equality above follows from the tower property of conditional expectation. The lemma then follows.

Footnotes

Software. An R package evidenceFactors, available from CRAN (R Core Team (2020)), contains code for reproducing the simulation results of Section 5.1, and code used for analyzing the sigmoidoscopy study.

Disclaimer. Dr. Doubeni is a member of the U.S. Preventive Services Task Force (USPSTF). This article does not necessarily represent the views and policies of the USPSTF.

REFERENCES

  1. ACHESON ED. (1979). Comment on “The evolving case-control study.” J. Chronic Dis. 32 28–29. [DOI] [PubMed] [Google Scholar]
  2. ATKIN WS, EDWARDS R, KARLJ-HANS I et al. (2010). Once-only flexible sigmoidoscopy screening in prevention of colorectal cancer: A multicentre randomized controlled trial. Lancet 375 1624–1633. [DOI] [PubMed] [Google Scholar]
  3. BAZZANO LA, HE J, MUNTNER P, VUPPUTURI S and WHELTON PK. (2003). Relationship between cigarette smoking and novel risk factors for cardiovascular disease in the United States. Ann. Intern. Med. 138 891–897.
  4. BECKER BJ. (1994). Combining significance levels. In The Handbook of Research Synthesis 215–230. Russell Sage Foundation, Thousand Oaks, CA.
  5. BIZZOZERO OJ, JOHNSON KG and CIOCCO A. (1966). Radiation related leukemia in Hiroshima and Nagasaki, 1946–1964. N. Engl. J. Med. 274 1095–1101.
  6. COCHRAN WG. (1965). The planning of observational studies of human populations. J. Roy. Statist. Soc. Ser. A 128 134–155. Reprinted in Readings in Economic Statistics and Econometrics (A. Zellner, ed.), Little Brown, Boston, MA, pp. 11–36 (1968).
  7. COHEN JC, KISS RS, PERTSEMLIDIS A, KOTOWSKI IK, GRAHAM R, KIM GARCIA C and HOBBS HH. (2005). Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutation in PCSK9. Nat. Genet. 37 161–165.
  8. COLE P. (1979). The evolving case-control study. J. Chronic Dis. 32 15–27. doi:10.1016/0021-9681(79)90006-7
  9. DOUBENI CA, MAJOR JM, LAIYEMO AO, SCHOOTMAN M, ZAUBER AG, HOLLENBECK AR, SINHA R and ALLISON J. (2012). Contribution of behavioral risk factors and obesity to socioeconomic differences in colorectal cancer incidence. J. Natl. Cancer Inst. 104 1353–1362.
  10. DOUBENI CA, CORLEY DA, QUINN VP, JENSEN CD, ZAUBER AG, GOODMAN M, JOHNSON JR, MEHTA SJ, BECERRA TA et al. (2018). Effectiveness of screening colonoscopy in reducing the risk of death from right and left colon cancer: A large community-based study. Gut 67 291–298. doi:10.1136/gutjnl-2016-312712
  11. ELDRIDGE RC, DOUBENI CA, FLETCHER RH, ZAUBER AG, CORLEY DA, DORIA-ROSE VP and GOODMAN M. (2013). Uncontrolled confounding in studies of screening effectiveness: An example of colonoscopy. J. Med. Screen. 20 198–207. doi:10.1177/0969141313508282
  12. FISHER RA. (1932). Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh.
  13. GAIST D, JEPPESEN U, ANDERSEN M, GARCÍA RODRÍGUEZ AL, HALLAS J and SINDRUP HS. (2002). Statins and risk of polyneuropathy: A case-control study. Neurology 58 1333–1337.
  14. GASTWIRTH JL, KRIEGER AM and ROSENBAUM PR. (1998). Dual and simultaneous sensitivity analysis for matched pairs. Biometrika 85 907–920.
  15. GOODMAN M, FLETCHER RH, DORIA-ROSE VP, JENSEN CD, ZEBROWSKI AM, BECERRA TA, QUINN VP, ZAUBER AG, CORLEY DA et al. (2015). Observational methods to assess the effectiveness of screening colonoscopy in reducing right colon cancer mortality risk: SCOLAR. J. Comp. Eff. Res. 4 541–551.
  16. HANSEN BB. (2004). Full matching in an observational study of coaching for the SAT. J. Amer. Statist. Assoc. 99 609–618. MR2086387 doi:10.1198/016214504000000647
  17. HANSEN BB and KLOPFER SO. (2006). Optimal full matching and related designs via network flows. J. Comput. Graph. Statist. 15 609–627. MR2280151 doi:10.1198/106186006X137047
  18. HILL AB. (1965). The environment and disease: Association or causation? Proc. R. Soc. Med. 58 295–300.
  19. HSU JY, SMALL DS and ROSENBAUM PR. (2013). Effect modification and design sensitivity in observational studies. J. Amer. Statist. Assoc. 108 135–148. MR3174608 doi:10.1080/01621459.2012.742018
  20. JOSEPH DA, MEESTER RGS, ZAUBER AG, MANNINEN DL, WINGES L, DONG FB, PEAKER B and VAN BALLEGOOIJEN M. (2016). Colorectal cancer screening: Estimated future colonoscopy need and current volume and capacity. Cancer 122 2479–2486.
  21. KARMAKAR B, FRENCH B and SMALL DS. (2019). Integrating the evidence from evidence factors in observational studies. Biometrika 106 353–367. MR3949308 doi:10.1093/biomet/asz003
  22. KARMAKAR B, SMALL DS and ROSENBAUM PR. (2020). Using evidence factors to clarify exposure biomarkers. Am. J. Epidemiol. To appear. doi:10.1093/aje/kwz263
  23. KEOGH RH and COX DR. (2014). Case-Control Studies. Institute of Mathematical Statistics (IMS) Monographs 4. Cambridge Univ. Press, Cambridge. MR3443808 doi:10.1017/CBO9781139094757
  24. LEWIS EB. (1963). Leukemia, multiple myeloma, and aplastic anemia in American radiologists. Science 142 1492–1494.
  25. LIPTAK T. (1958). On the combination of independent tests. Magy. Tud. Akad. Mat. Kut. Intéz. Közl. 3 171–197.
  26. MARCUS R, PERITZ E and GABRIEL KR. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika 63 655–660. MR0468056 doi:10.1093/biomet/63.3.655
  27. MING K and ROSENBAUM PR. (2000). Substantial gains in bias reduction from matching with a variable number of controls. Biometrics 56 118–124.
  28. MISSIAGLIA E, JACOBS B, D’ARIO G, NARZO AFD, SONESON C, BUDINSKA E, POPOVICI V, VECCHIONE L, GERSTER S et al. (2014). Distal and proximal colon cancers differ in terms of molecular, pathological, and clinical features. Ann. Oncol. 25 1995–2001. doi:10.1093/annonc/mdu275
  29. PIMENTEL SD, YOON F and KEELE L. (2015). Variable-ratio matching with fine balance in a study of the Peer Health Exchange. Stat. Med. 34 4070–4082. MR3431322 doi:10.1002/sim.6593
  30. POPPER KR. (1959). The Logic of Scientific Discovery. Hutchinson and Co., Ltd., London. MR0107593
  31. PRENTICE RL and BRESLOW NE. (1978). Retrospective studies and failure time models. Biometrika 65 153–158.
  32. U. S. PREVENTIVE SERVICES TASK FORCE, BIBBINS-DOMINGO K, GROSSMAN DC et al. (2016). Screening for colorectal cancer: US preventive services task force recommendation statement. J. Am. Med. Assoc. 315 2564–2575.
  33. ROSENBAUM PR. (1991). Sensitivity analysis for matched case-control studies. Biometrics 47 87–100. MR1108691 doi:10.2307/2532498
  34. ROSENBAUM PR. (2001). Replicating effects and biases. Amer. Statist. 55 223–227. MR1963397 doi:10.1198/000313001317098220
  35. ROSENBAUM PR. (2002). Observational Studies, 2nd ed. Springer Series in Statistics. Springer, New York. MR1899138 doi:10.1007/978-1-4757-3692-2
  36. ROSENBAUM PR. (2010). Evidence factors in observational studies. Biometrika 97 333–345. MR2650742 doi:10.1093/biomet/asq019
  37. ROSENBAUM PR. (2011). Some approximate evidence factors in observational studies. J. Amer. Statist. Assoc. 106 285–295. MR2816721 doi:10.1198/jasa.2011.tm10422
  38. ROSENBAUM PR. (2015). How to see more in observational studies: Some new quasi-experimental devices. Annu. Rev. Stat. Appl. 2 21–48.
  39. ROSENBAUM PR. (2017). The general structure of evidence factors in observational studies. Statist. Sci. 32 514–530. MR3730520 doi:10.1214/17-STS621
  40. ROSENBAUM PR, ROSS RN and SILBER JH. (2007). Minimum distance matched sampling with fine balance in an observational study of treatment for ovarian cancer. J. Amer. Statist. Assoc. 102 75–83. MR2345534 doi:10.1198/016214506000001059
  41. ROSENBAUM PR and SILBER JH. (2009). Amplification of sensitivity analysis in matched observational studies. J. Amer. Statist. Assoc. 104 1398–1405. MR2750570 doi:10.1198/jasa.2009.tm08470
  42. RUBIN DB. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66 688–701.
  43. SEGNAN N, ARMAROLI P, BONELLI L et al. (2011). Once-only sigmoidoscopy in colorectal cancer screening: Follow-up findings of the Italian Randomized Control Trial—SCORE. J. Natl. Cancer Inst. 103 1310–1322.
  44. SELBY JV, FRIEDMAN GD, QUESENBERRY CP and WEISS N. (1992). A case-control study of screening sigmoidoscopy and mortality from colorectal cancer. N. Engl. J. Med. 326 653–657.
  45. SHAKED M and SHANTHIKUMAR JG. (2007). Stochastic Orders. Springer Series in Statistics. Springer, New York. MR2265633 doi:10.1007/978-0-387-34675-5
  46. SILBER JH, ROSENBAUM PR, POLSKY D, ROSS RN, EVEN-SHOSHAN O, SCHWARTZ JS, ARMSTRONG KA and RANDALL TC. (2015). Does ovarian cancer treatment and survival differ by the specialty providing chemotherapy? J. Clin. Oncol. 25 1169–1175.
  47. SMALL DS, CHENG J, HALLORAN ME and ROSENBAUM PR. (2013). Case definition and design sensitivity. J. Amer. Statist. Assoc. 108 1457–1468. MR3174721 doi:10.1080/01621459.2013.820660
  48. SMALL DS, TAYLOR TE, POSTELS DG, BEARE NA, CHENG J, MACCORMICK IJ and SEYDEL KB. (2017). Evidence from a natural experiment that malaria parasitemia is pathogenic in retinopathy-negative cerebral malaria. eLife 6. doi:10.7554/eLife.23699
  49. SPLAWA-NEYMAN J. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statist. Sci. 5 465–472. MR1092986
  50. TAYLOR TE, FU WJ, CARR RA, WHITTEN RO, MUELLER JS, FOSIKO NG, LEWALLEN S, LIOMBA NG, MOLYNEUX ME et al. (2004). Differentiating the pathologies of cerebral malaria by postmortem parasite counts. Nat. Med. 10 143–145.
  51. WITTGENSTEIN L. (1958). Philosophical Investigations, 2nd ed. The Macmillan Co., New York. MR0078292
  52. ZAYKIN D, ZHIVOTOVSKY LA, WESTFALL P and WEIR B. (2002). Truncated product method for combining p-values. Genet. Epidemiol. 22 170–185.
  53. ZHANG K, SMALL DS, LORCH S, SRINIVAS S and ROSENBAUM PR. (2011). Using split samples and evidence factors in an observational study of neonatal outcomes. J. Amer. Statist. Assoc. 106 511–524. MR2847966 doi:10.1198/jasa.2011.ap10604
  54. ZUBIZARRETA JR, NEUMAN M, SILBER JH and ROSENBAUM PR. (2012). Contrasting evidence within and between institutions that provide treatment in an observational study of alternative forms of anesthesia. J. Amer. Statist. Assoc. 107 901–915. MR3010879 doi:10.1080/01621459.2012.682533
  55. R CORE TEAM (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna. Available at http://www.R-project.org.
