Published in final edited form as: Res Synth Methods. 2022 Jan 5;13(2):242–254. doi: 10.1002/jrsm.1544

The confounder matrix: A tool to assess confounding bias in systematic reviews of observational studies of etiology

Julie M Petersen 1,2, Malcolm Barrett 3, Katherine A Ahrens 4, Eleanor J Murray 1, Allison S Bryant 5, Carol J Hogue 6, Sunni L Mumford 7,8, Salini Gadupudi 1, Matthew P Fox 9, Ludovic Trinquart 10,11
PMCID: PMC8965616  NIHMSID: NIHMS1787914  PMID: 34954912

Abstract

Systematic reviews and meta-analyses are essential for drawing conclusions regarding etiologic associations between exposures or interventions and health outcomes. Observational studies comprise a substantive source of the evidence base. One major threat to their validity is residual confounding, which may occur when component studies adjust for different sets of confounders, fail to control for important confounders, or have classification errors resulting in only partial control of measured confounders. We present the confounder matrix—an approach for defining and summarizing adequate confounding control in systematic reviews of observational studies and incorporating this assessment into meta-analyses. First, an expert group reaches consensus regarding the core confounders that should be controlled and the best available method for their measurement. Second, a matrix graphically depicts how each component study accounted for each confounder. Third, the assessment of control adequacy informs quantitative synthesis. We illustrate the approach with studies of the association between short interpregnancy intervals and preterm birth. Our findings suggest that uncontrolled confounding, notably by reproductive history and sociodemographics, resulted in exaggerated estimates. Moreover, no studies adequately controlled for all core confounders, so we suspect residual confounding is present, even among studies with better control. The confounder matrix serves as an extension of previously published methodological guidance for observational research synthesis, enabling transparent reporting of confounding control and directly informing meta-analysis so that conclusions are drawn from the best available evidence. Widespread application could raise awareness about gaps across a body of work and allow for more valid inference with respect to confounder control.

Keywords: bias, epidemiologic confounding factors, evidence-based medicine, interpregnancy interval, meta-analysis, systematic review

1 |. INTRODUCTION

Systematic reviews and meta-analyses are essential for drawing conclusions regarding etiologic associations between exposures or interventions and health outcomes.1 Observational studies comprise a substantive source of the evidence base. In a survey of 300 reviews from 2014, 74 focused on exposure-outcome associations and among them, all but one comprised observational studies.2 A major threat to the validity of these reviews is residual confounding in the component studies. Confounding occurs when the true etiological effect is distorted by inadequately controlled common causes of the exposure and the outcome.3

Component studies may fail to control for important confounders, have confounder measurement error limiting the effectiveness of confounding control, or inappropriately adjust for variables affected by the exposure, resulting in residual bias in the estimated summary effect size. Moreover, a major challenge for quantitative synthesis is that component studies typically adjust for different sets of confounders.4,5 Investigators of component studies may have different criteria for determining confounders and different approaches for their control, not all of which may be appropriate or adequate.

The Conducting Systematic Reviews and Meta-Analyses of Observational Studies of Etiology (COSMOS-E) guidance was developed to improve the conduct of systematic reviews and meta-analyses of observational studies.6 COSMOS-E includes general principles to assess risk of bias, but it provides limited practical guidance regarding how to transparently assess and report the adequacy of confounding control across component studies. Moreover, it is insufficient to simply describe whether control was adequate; incorporation of such assessments into meta-analyses is necessary for drawing valid conclusions from the best available observational evidence.7,8

Most systematic reviews of observational studies do not define the most important confounders, do not summarize which confounders were controlled across the component studies, and do not incorporate this assessment into the meta-analyses (Box 1). This lack of transparency with respect to confounding control suggests widespread misunderstanding around assessment of this bias in systematic reviews of observational studies and justifies the need for additional guidance and a practical tool to supplement existing risk of bias tools and to aid reporting.9

BOX 1. Confounding control in meta-analyses of etiology.

We searched MEDLINE in July 2019 and selected the 50 most recent meta-analyses of observational studies of an exposure-outcome association that were published in eight high-impact clinical and epidemiology journals (American Journal of Epidemiology, Annals of Epidemiology, Annals of Internal Medicine, BMJ, Epidemiology, International Journal of Epidemiology, JAMA, New England Journal of Medicine). We assessed whether and how each meta-analysis documented confounding control in each component study, whether the most important confounders were identified, and whether the analysis incorporated adequate confounding control. One abstractor extracted all data from the full-text articles and appendices. A second abstractor extracted data independently for 10 randomly selected articles and there were no discrepancies.

In all, four (8%) of the meta-analyses did not report which confounders were controlled. The other 46 (92%) meta-analyses provided a list of confounders in each component study, with 14 (30%) in the online Appendix only. None summarized which confounders were commonly controlled across the component studies. Most (n = 42, 84%) did not define the most important confounders and nearly all (n = 49, 98%) did not consider measurement error of confounders. Just over half (n = 28, 56%) conducted a secondary analysis restricted to a subset of studies that commonly controlled for a single confounder, but none performed such an analysis restricted to studies that commonly controlled for a set of the most important confounders. Further, none simulated an uncontrolled confounder across a set of studies.

We present a 3-step approach that can be used in conjunction with current guidance, with specific focus on defining and summarizing adequate confounding control in systematic reviews of observational studies and on incorporating this assessment into meta-analyses (Figure 1). First, a group of subject matter and methodological experts determines the core confounders that should be controlled and the best available method for their measurement. Second, a confounder matrix graphically depicts how each component study accounted for each confounder, enabling assessment of control across the body of evidence. Third, this appraisal informs the quantitative synthesis. We developed an R package metaconfoundr (https://cran.r-project.org/web/packages/metaconfoundr/index.html) and an R Shiny app (https://malcolmbarrett.shinyapps.io/metaconfoundr/) to make the creation of a tailored confounder matrix quick and easy. We illustrate our approach with application to a systematic review of short interpregnancy intervals and preterm birth risk in high-resource settings.

FIGURE 1. Steps in conducting the confounder matrix assessment.

2 |. APPROACH

2.1 |. Step 1: Define core set of confounders and criteria for adequate control

The first step is to determine a core set of confounders that theoretically should be controlled when estimating the exposure-outcome association of interest. This step should be detailed upfront in the protocol for the systematic review, and meta-analysis if applicable, and carried out prior to analysis in an effort to reduce subjectivity. In alignment with current guidance,10,11 we recommend this determination be made by small-group consensus. The group should include subject-matter experts and preferably causal inference methodologists. Subject-matter experts lay out the broad scope of potential confounders within the field and provide input on the relationships between the exposure, the outcome, and potential confounders. Methodologists assist with identifying the relevant confounders that need to be controlled to identify causal effects.

Views on how to identify confounders will vary. One potential starting point is listing all adjustment variables across the component studies. However, this approach has shortcomings and may influence the group’s final determination. Commonly, the rationale for confounder selection is poorly reported in epidemiologic studies,12,13 and adjusted estimates may be selectively reported.14,15 The choice of adjustment variables is often based on statistical grounds, for example by examining whether controlling for a covariate changes an effect estimate by more than 10% or by forward or backward selection based on statistical significance; these approaches are flawed in that they do not account for prior causal knowledge.16–18 By restricting to adjustment variables available in the component studies, the group may overlook important confounders lacking in the literature (e.g., structural racism). Thus, identifying potential confounders should also rely on content knowledge and theory.16,19 Causal diagrams (Box 2) may be useful in this process, particularly for identifying adjustment variables that are not true confounders and may lead to over- or unnecessary adjustment.20

BOX 2. Causal diagrams.

A causal diagram, also known as a directed acyclic graph, is a visual representation of the assumed relationships among all variables relevant to the data-generating mechanism linking an exposure to a health outcome.21,22 Each variable is represented as a node. The variables are connected by unidirectional arrows indicating that one variable causes the other (but not the other way around). Confounding occurs whenever there is a common cause of both the exposure and the outcome, even if mediated through other variables.

Causal diagrams make explicit the theorized relationships relevant to the study of the exposure effect on the outcome. Causal diagrams enable users to identify potential sources of bias based on the assumed causal structure. Users can identify minimal but sufficient sets of variables, which, if controlled, completely remove confounding, at least in theory. Causal diagrams also elucidate variables which, if controlled, would theoretically impose (rather than reduce) bias, such as causal intermediates and common effects of the exposure and the outcome.23 Software packages, such as DAGitty (http://www.dagitty.net/) and dagR,24 can facilitate the construction of causal diagrams and the identification of core confounders. Resources exist with details on the principles of causal diagrams and their construction.21–23,25–31
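As an illustration of this identification step, the dagitty R package (an R interface to DAGitty) can encode a diagram and return minimal sufficient adjustment sets. The sketch below uses illustrative node names and a simplified structure, not the consensus diagram developed for this review.

# Minimal sketch (illustrative structure, not the consensus diagram from this review)
library(dagitty)

g <- dagitty("dag {
  sociodemographics -> short_IPI
  sociodemographics -> preterm_birth
  socioeconomic -> family_planning
  socioeconomic -> preterm_birth
  family_planning -> short_IPI
  reproductive_hx -> short_IPI
  reproductive_hx -> preterm_birth
  short_IPI -> preterm_birth
}")

# Smallest set(s) of nodes that block all backdoor paths from exposure to outcome
adjustmentSets(g, exposure = "short_IPI", outcome = "preterm_birth", type = "minimal")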

As directed in other guidance,10,11 the group should discuss the best available methods for confounder measurement and control. If a confounder is measured with error, the data serve as only a proxy for the true confounder. For instance, if nicotine is a confounder, measurement of self-reported cigarette smoking may be adequate; however, if the target population is prone to misreporting, measurement of plasma cotinine levels may be preferable.32 Confounders that are mismeasured independently and non-differentially with respect to other key variables are only partially controlled in the adjusted estimate.33 Dependent and correlated errors are particularly problematic as the magnitude and direction of the resulting bias can be extreme and unpredictable.34–36 When data values can change over time, the timing of measurement may be important and may require specific analytic techniques.37,38 For more information, we refer readers to the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) detailed guidance,11 which discusses assessment and handling of time-varying confounders in observational research synthesis.

The group may opt to consolidate related confounder variables into underlying constructs, as has been proposed by ROBINS-I.11 For example, a socioeconomic factors construct could encompass a variety of variables, including health insurance status, education, and income. By identifying underlying constructs, one can assess confounding control even if component studies adjusted for different sets of variables. Further, causal diagrams may be easier to assemble using constructs instead of individual variables. For each construct, the group defines which variables must be controlled for adequate control of that construct. For example, the group may decide a socioeconomic factors construct is adequately controlled if both education and insurance status were controlled.

The group agrees upon the core set of confounding variables and constructs that comprise the consensus-based minimum set that must be controlled. The objective is to reach judgment for each variable and construct and then overall for a given study by assigning one of three levels: adequate control; some concerns; and inadequate control. Adequate confounding control indicates that the study is at low risk of bias because of confounding, based on expert-consensus evaluation. Some concerns indicates that there is potential for residual confounding in the study. Inadequate confounding control indicates that valid conclusions cannot be drawn from the study.

Definitions for adequacy of confounder control should be as clear as possible to reduce the potential for subjectivity during rating of each component study in Step 2. A formal consensus building approach, such as the Delphi method, may be helpful. If consensus cannot be reached, multiple definitions may be applied to the component studies (Step 2) and results compared (Step 3). If a variable was measured and controlled using the best available methods, the assessment is ‘adequate’; if a variable was measured and controlled using suboptimal methods, the assessment is ‘some concerns’; and if a variable was not measured, not controlled, or controlled using an inappropriate method, the assessment is ‘inadequate’. If control for all necessary variables under a construct, as defined in Step 1, is ‘adequate’, the construct assessment is ‘adequate’; if control for all necessary variables is deemed ‘inadequate’, the assessment is ‘inadequate’; otherwise, the assessment is ‘some concerns’.

Finally, overall confounding control for each study is determined as: ‘adequate’ if all variables and constructs in the core set are adequately controlled; ‘inadequate’ if at least one element of the core set is inadequately controlled; otherwise, ‘some concerns’. Systematic reviews commonly address several exposures or outcomes. We recommend establishing separate confounder criteria for each exposure-outcome association, as rarely would it be appropriate to apply the same confounder logic for all combinations.39
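The decision rules above amount to a simple roll-up from variable-level to construct-level to study-level ratings. The sketch below expresses those rules as hypothetical R helper functions (they are not part of the metaconfoundr package); ratings are coded as "adequate", "some concerns", or "inadequate".

# Hypothetical helpers implementing the roll-up rules described in the text
rate_construct <- function(variable_ratings) {
  # variable_ratings: ratings for the variables the expert group deemed
  # necessary for this construct
  if (all(variable_ratings == "adequate"))   return("adequate")
  if (all(variable_ratings == "inadequate")) return("inadequate")
  "some concerns"
}

rate_overall <- function(construct_ratings) {
  # construct_ratings: ratings for the constructs in the core set
  if (all(construct_ratings == "adequate"))   return("adequate")
  if (any(construct_ratings == "inadequate")) return("inadequate")
  "some concerns"
}

# Example: a study controlling education adequately but not insurance
rate_construct(c(education = "adequate", insurance = "inadequate"))  # "some concerns"
rate_overall(c(sociodemographics = "adequate",
               socioeconomic     = "some concerns",
               reproductive_hx   = "adequate"))                      # "some concerns"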

2.2 |. Step 2: Summarize confounding control across component studies

Once the component studies have been identified for inclusion in the review, the second step is to transparently report confounder control.

Abstractors extract details regarding which variables were controlled in each component study. The data abstraction process should follow the recommendations put forth by the Cochrane guidelines.40 Use of standardized, piloted data collection forms will reduce errors and subjectivity and facilitate comparison for agreement when there are multiple abstractors. Confounders may be controlled by design (e.g., restriction or matching) or via statistical analysis (e.g., multivariable regression adjustment, propensity scores, stratification, standardization). If a study reports multiple adjusted estimates, each controlling for a different set of variables, the abstractors should select the model that best controls for confounding based on the consensus-based criteria from Step 1. For each study, the abstractors document the definition of each adjustment variable, the method for measurement, and how control was achieved. Abstractors should note if studies adjusted for variables not identified as confounders in Step 1, especially if their control could have led to over-adjustment bias.20

Next, abstractors should assess confounding control based on the criteria established in Step 1. If any questions arise during the rating process, the questions should be taken back to the experts to further clarify their criteria. If there is disagreement between two raters, consensus should be reached through a formal consensus process with a third rater. In addition, inter-rater agreement could be assessed by a weighted Kappa statistic to compare ratings across the various assessments.

The confounder information across all component studies is then visually summarized in a color-coded confounder matrix. The columns of the matrix list the confounding constructs and the specific variables identified during Step 1. The rows correspond to each component study. The intersecting cells are color-coded to represent whether each study controlled for the confounders: adequate (green), some concerns (yellow), inadequate (red). Our metaconfoundr R package and Shiny app offer a colorblind-friendly palette option. The matrix can also indicate if studies controlled for other variables that are not true confounders.
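For readers who prefer to build the plot directly, the following is a minimal ggplot2 sketch of a confounder matrix based on a hypothetical long-format data set with one row per study-confounder pair; the metaconfoundr package and Shiny app automate this step.

# Generic sketch of a confounder matrix plot; data values are illustrative only
library(ggplot2)

ratings <- data.frame(
  study      = rep(c("Study A", "Study B", "Study C"), each = 3),
  confounder = rep(c("Sociodemographics", "Socioeconomic factors",
                     "Reproductive history"), times = 3),
  control    = c("adequate", "some concerns", "adequate",
                 "inadequate", "some concerns", "adequate",
                 "adequate", "inadequate", "some concerns")
)

ggplot(ratings, aes(x = confounder, y = study, fill = control)) +
  geom_tile(color = "white") +
  scale_fill_manual(values = c(adequate = "green3",
                               `some concerns` = "gold",
                               inadequate = "red3")) +
  labs(x = NULL, y = NULL, fill = "Confounding control") +
  theme_minimal()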

If the objective is to conduct a systematic review without meta-analysis, Steps 1 and 2 can be carried out in the absence of Step 3.

2.3 |. Step 3: Incorporate confounding control assessment into quantitative synthesis

The third step is to take confounding control into account when performing quantitative synthesis of the evidence. Several methods exist to account for confounding bias in meta-analysis, including quantitative bias analysis and Bayesian approaches.41–46 Subgroup analyses can be performed by level of control for a specific confounding construct, or by overall study-level control, to evaluate heterogeneity in effect sizes across subgroups.
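As a concrete example of one simple form of quantitative bias analysis, each study's estimate can be divided by a bias factor for an unmeasured binary confounder before re-pooling. The sketch below uses the metafor package with hypothetical effect estimates and assumed bias parameters (gamma, the confounder-outcome risk ratio; p1 and p0, the confounder prevalence among the exposed and unexposed); the risk-ratio bias factor is applied to odds ratios as an approximation, which is reasonable when the outcome is uncommon. It is not the specific procedure of the cited references.

# Simple external adjustment for an unmeasured binary confounder, then re-pooling.
# All numbers below are illustrative placeholders, not estimates from any study.
library(metafor)

or    <- c(1.9, 1.5, 1.4)            # hypothetical adjusted odds ratios
se    <- c(0.10, 0.08, 0.12)         # hypothetical standard errors of log(OR)
gamma <- 2.0                         # assumed confounder-outcome risk ratio
p1 <- 0.40; p0 <- 0.25               # assumed confounder prevalence in exposed/unexposed

bias_factor <- (p1 * (gamma - 1) + 1) / (p0 * (gamma - 1) + 1)
yi_adj <- log(or) - log(bias_factor)  # bias-adjusted log odds ratios

rma(yi = yi_adj, sei = se, method = "REML")  # pooled bias-adjusted estimate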

3 |. APPLICATION TO INTER-PREGNANCY INTERVAL REVIEW

3.1 |. Background

Interpregnancy interval is the time from a delivery to the start of the next pregnancy. Clinical guidelines recommend waiting at least 6 months after a live birth before attempting the next pregnancy.47–49 Observational studies show that women with short intervals (<6 months) are more likely to experience adverse outcomes in the subsequent pregnancy, including preterm birth, compared to women whose pregnancies were spaced 18–23 months apart.50 These associations may be explained by confounding, as women who experience rapid repeat pregnancy have different characteristics compared to those who become pregnant within the recommended timeframe.51–54 If the true association is null, interventions aimed at preventing close birth spacing will be ineffective at reducing adverse outcomes, and may negatively affect older mothers due to decreased fertility and increased morbidity associated with unnecessarily delayed childbearing.55

We applied the proposed confounder matrix technique to a published systematic review of studies conducted in high-resource settings.56 The investigators of the original review assessed study quality using criteria outlined by the U.S. Preventive Services Task Force57 and found that studies of worse quality tended to report higher estimates. However, their evaluation did not quantitatively isolate whether the higher estimates may be driven by residual confounding and which confounders may be of particular concern. We focused on a subset of studies identified in that review using the same study design and definitions for the exposure and outcome in order to make the studies as comparable as possible. Specifically, our evaluation included 11 retrospective cohort studies that evaluated the association between short interpregnancy intervals (<6 months compared to 18–23 months) and the risk of preterm birth (<37 weeks’ gestation). We report the criteria for study eligibility in the appendix and describe the selected studies in Table S1.

3.2 |. Step 1: Core set of confounders and criteria for adequate control

We formed a group of five subject-matter experts, two methodologists, and a biostatistician with expertise in meta-analysis. We then generated a comprehensive list of all potential confounding variables that may be measured in a given study of short interpregnancy intervals and preterm birth based on a priori knowledge.58 This list of 41 variables included factors such as maternal age, socioeconomic status (SES), and the outcome of the prior pregnancy (Table 2). We then established what we considered adequate measurement and control of these variables.

Second, we grouped together conceptually similar variables, resulting in eight overarching confounding constructs: sociodemographics (11 variables), socioeconomic factors (eight variables), reproductive history (six variables), family planning (five variables), health behaviors (six variables), underlying health status (three variables), genetics, and structural racism. We did not itemize variables in the last two constructs because they are not typically measured, although operationalized measures of institutional and individually mediated racism exist.59 We then used group consensus to define which variables would be necessary to control for a given construct (Table S2).

Finally, informed by a consensus-based causal diagram (Figure 2), we identified the minimally sufficient set of confounding constructs necessary to adequately control confounding overall. Specifically, we chose three overarching confounding constructs encompassing eight variables: 1) reproductive history (measured by the outcome of the prior pregnancy [e.g., live born versus stillbirth or neonatal death, prematurity]), 2) sociodemographics (measured by maternal age, partner or marital status, and race/ethnicity), and 3) socioeconomic factors (measured by a standardized index of SES or both education and health insurance status) (Table 3). We considered construct control to be ‘adequate’ if all necessary variables were adequately controlled; ‘inadequate’ if control for all necessary variables was inadequate; and otherwise considered control to have ‘some concerns.’ The experts requested one exception; for sociodemographics, control would be considered ‘inadequate’ if only age but none of the other necessary sociodemographic variables were controlled. For a given component study, we considered overall confounding control to be ‘adequate’ if each construct in the core set was adequately controlled; ‘inadequate’ if control for at least one of the constructs in the core set was inadequate; otherwise, we considered the study to have ‘some concerns’.

FIGURE 2. Causal diagram for the study of the association between short interpregnancy interval and preterm birth risk. The diagram depicts the causal relationships of the eight confounding constructs to the exposure (short interpregnancy interval) and the outcome (preterm birth). Arrows go from causes into effects. Confounders are common causes of the exposure and outcome. For instance, socioeconomic factors are a cause of short interpregnancy interval (via family planning), as well as a cause of preterm birth (as depicted by the highlighted paths). The minimally sufficient set of confounders needed to control confounding bias is sociodemographics, socioeconomic factors, and reproductive history. By controlling for these three constructs, one would block all backdoor paths between the exposure and the outcome, and this is the smallest number of confounders needed to do so. Non-causal associations created by conditioning on colliders are depicted via black dotted lines. However, in this instance, these non-causal associations would not bias the effect estimate of the exposure on the outcome.

Many interpregnancy interval studies measure covariate data during the subsequent pregnancy. This timing may lead to confounder misclassification for factors that could change from one pregnancy to the next.58,60 In a sensitivity analysis, we downgraded a given variable from ‘adequate’ to ‘some concerns’ if the variable was measured after the interpregnancy interval. In a second sensitivity analysis, we considered two additional confounding constructs (family planning and structural racism) as part of the core set (Figure S1). Our rationale was: 1) Family planning is strongly influenced by social factors encompassing demographics, culture, and fiscal status, and may not have an effect once those other factors are adequately controlled. 2) Structural racism influences reproductive health and pregnancy outcomes.61,62 In a third sensitivity analysis, we attempted to exclude studies that controlled for variables that we deemed not to be true confounders. Finally, we conducted a leave-one-out sensitivity analysis.

3.3 |. Step 2: Confounding control across component studies

The data abstraction process for the original review has been published.56 For our evaluation, two abstractors independently assessed which variables were controlled in each component study and then applied the expert consensus criteria for adequate confounding control. There was 100% agreement between raters. Of the eight confounding constructs and 41 variables from Step 1, six constructs and 26 variables were measured across the 11 component studies (Figure 3). With respect to control of the three constructs in the core set, most studies controlled adequately for reproductive history (9/11); the other two were inadequate (Figure 4). Reproductive history was most commonly measured via prior pregnancy outcome (nine studies) and parity (eight studies) (Figure 3). Sociodemographics were adequately controlled in five studies, four studies had some concerns, and two studies were inadequate. This construct was most commonly measured via maternal age (all studies), race/ethnicity (eight studies), and marital status (six studies). No studies controlled for partner status. For socioeconomic factors, five studies had adequate control, five studies had some concerns, and one study was inadequate. This construct was most commonly measured by educational attainment (seven studies), followed by SES index (four studies); health insurance was rarely measured (two studies).

FIGURE 3. Confounder matrix by variable for 11 cohort studies conducted in high-resource settings assessing the association between short interpregnancy intervals (<6 months versus 18–23 months) and preterm birth risk (<37 weeks’ gestation). ART, assisted reproductive therapy; BMI, body mass index; C-section, Cesarean section; Hx, history; SES, socioeconomic status.

FIGURE 4. Confounder matrix, overall and by core set construct, for 11 cohort studies conducted in high-resource settings assessing the association between short interpregnancy intervals (<6 months versus 18–23 months) and preterm birth risk (<37 weeks’ gestation). Three confounding constructs (sociodemographics, socioeconomic factors, and reproductive history) comprised the core set of confounding constructs. Sociodemographics was considered adequately controlled if maternal age, partner or marital status, and race/ethnicity were all measured. The socioeconomic factors construct was considered adequately controlled if SES index or both education and insurance status were measured. Reproductive history was considered adequately controlled if the outcome of the prior pregnancy (e.g., live born versus stillbirth/neonatal death, prematurity) was measured. Definitions for level of confounding control are provided in Table S3. For a given component study, we considered overall confounding control to be ‘adequate’ if all three core confounding constructs (sociodemographics, socioeconomic factors, and reproductive history) were adequately controlled; ‘inadequate’ if control for at least one of the constructs in the core set was inadequate; otherwise, we considered the study to have ‘some concerns’. Hx, history; SES, socioeconomic status.

Based on our primary criteria for overall confounding control, we judged that no studies were adequate; 8/11 studies had some concerns; and 3/11 studies were inadequate (Figure 4).

All but one study measured at least one confounder in the subsequent pregnancy. When we downgraded variables collected in the subsequent pregnancy, classification for reproductive history was unchanged, sociodemographics were no longer adequately controlled in any study, and socioeconomic factors were adequately controlled in only one study; the assessment of overall confounder control was unchanged. In another sensitivity analysis, we found two studies controlled for proxies of family planning (first trimester prenatal care, assisted reproductive therapy) and were considered to have some concerns, whereas all others were inadequate. No studies measured structural racism (Figure 3). Using our alternative definition for overall confounding control considering these two additional constructs, all studies would be considered inadequate (Figure S2). We were unable to perform the sensitivity analysis excluding studies that controlled for non-confounders, as all studies controlled for at least one variable that we deemed not a true confounder, most commonly the year of the delivery following the interval (Figure S3).

3.4 |. Step 3: Incorporation of confounding control assessment into meta-analysis

We estimated pooled odds ratios (OR) for the association between short interpregnancy intervals and preterm birth risk by using inverse-variance random-effects models. We estimated between-study variance τ2 by using the restricted maximum-likelihood estimator with the Knapp-Hartung adjustment.63 We performed subgroup analyses stratified by overall confounding control and by control of each core construct. We used random-effects models with a pooled estimate of τ2 and tested for differences across subgroups.
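As a sketch of how such an analysis can be implemented, the code below uses the metafor package on a hypothetical data frame dat with columns logor (log odds ratio), se (its standard error), and control (overall confounding control rating). The column names, and the use of a single moderator model to pool τ2 across strata, are illustrative assumptions rather than the exact code used for this review.

# Hypothetical data frame 'dat' with columns logor, se, and control
library(metafor)

overall <- rma(yi = logor, sei = se, data = dat,
               method = "REML", test = "knha")  # REML tau^2, Knapp-Hartung adjustment
predict(overall, transf = exp)                  # pooled OR with 95% CI

# Subgroup comparison by overall confounding control: a moderator model with a
# common tau^2 across strata; the QM test assesses differences between subgroups
by_control <- rma(yi = logor, sei = se, mods = ~ control, data = dat,
                  method = "REML", test = "knha")
summary(by_control)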

The 11 studies included >417,000 women with 6.8% preterm births. The pooled OR was 1.53 (95% CI 1.38, 1.69) (Figure 5). Between-study heterogeneity was substantial (I2 = 93% and τ2 = 0.0205). There was evidence of interaction according to level of overall confounding control (p = 0.01). Across the eight studies with some concerns, the pooled OR was 1.44 (95% CI 1.29, 1.60). The three studies with inadequate control yielded a larger pooled OR (1.75, 95% CI 1.32, 2.32). In the leave-one-out sensitivity analysis, results after removing each study in turn were similar to results based on all studies (data not shown).

FIGURE 5. Meta-analysis of 11 cohort studies conducted in high-resource settings assessing the association between short interpregnancy intervals (<6 months versus 18–23 months) and preterm birth risk (<37 weeks’ gestation), stratified according to level of confounding control. CI, confidence interval; OR, adjusted odds ratio.

In subgroup analyses, there was evidence of heterogeneity by the level of control for sociodemographics (p = 0.02) and socioeconomic factors (p = 0.01) but not reproductive history (p = 0.22; Figures S4–S6). For reproductive history, the pooled OR was 1.48 (95% CI 1.33, 1.65) across the nine studies with adequate control, and 1.72 (95% CI 0.41, 7.28) for the two studies with inadequate control. For sociodemographics, the pooled OR was 1.35 (95% CI 1.18, 1.55) across the five studies with adequate control, 1.66 (95% CI 1.34, 2.05) across the four studies with some concerns, and 1.66 (95% CI 0.64, 4.33) across the two studies with inadequate control. Unlike the other constructs, the pooled OR for socioeconomic factors was higher among the five studies with adequate control (OR 1.68, 95% CI 1.45, 1.95) compared to the five studies with some concerns (OR 1.35, 95% CI 1.18, 1.55) and the one study with inadequate control (OR 1.53, 95% CI 1.35, 1.73).

In summary, no study adequately controlled for confounding. In the eight studies with some concerns, there was evidence of a modest association between short interpregnancy intervals and preterm birth risk, although we cannot exclude the possibility of bias due to residual confounding. The timing of confounder collection had no influence on our overall assessment of confounder control. The influence of family planning and structural racism remains unknown since few to no studies controlled for these factors. All studies controlled for non-confounders, which may have biased estimates.

4 |. COMMENT

We described a theory- and consensus-driven approach expanding upon current guidelines for research synthesis to comprehensively summarize confounder control across a body of observational research. Our proposed process involves assembling a group of subject matter and methodological experts to reach agreement on criteria for adequate confounding control for a given etiologic research question. These criteria are then applied to assess level of confounding control in each component study. Confounding control across the studies is summarized using a visual confounder matrix. These data can be quantitatively incorporated into the meta-analyses. We applied this approach to a previously published review of observational studies of short interpregnancy intervals and preterm birth,56 and found that studies with inadequate control, particularly for reproductive history and sociodemographics, tend to report higher estimates. A limitation of our case study is that we did not update the search to identify more current eligible studies. Future investigation could include an updated and expanded review with revised criteria to address differences in confounding structures between high and low resource settings and differences by maternal age. Our 3-step confounder matrix approach enables transparent reporting of confounding control and directly informs quantitative synthesis to draw valid conclusions from the best available evidence.

Our approach focused on confounding because non-randomized studies are at increased risk of confounding compared with randomized studies. However, the broader network of systematic errors most likely to affect specific research topics should be considered.64,65 Strongly suspected biases should be evaluated first.65–67 We illustrated the proposed approach with a single case study. We acknowledge that the approach needs to be piloted more widely to assess its generalizability. We developed the framework with etiological studies in mind, but the approach could also be applied to other types of systematic reviews, such as those of prognostic factors.68 Further, although not detailed here, it is possible that our approach could be extended to visualize and meta-analyze studies based on the degree of other suspected biases.

Our approach builds upon previous assessments of bias in systematic reviews. These other tools provide guidance to assess risk of all biases applicable to each study design, which for observational studies includes confounding. In reviews of randomized trials, Kirkham and colleagues propose an outcome matrix displaying which outcomes are reported and meta-analyzable in each trial to identify outcome-reporting bias.69,70 For reviews of non-randomized health interventions, the ROBINS-I,11 the Risk Of Bias in Non-randomized Studies of Exposures (ROBINS-E),10 and A MeaSurement Tool to Assess systematic Reviews (AMSTAR2)71 provide some guidance to assess confounding control, as well as other biases. Similar to our approach, ROBINS-I considers important confounding domains. There are two main differences between our proposed approach and ROBINS-I. Our framework includes a graphical summary of confounding control, across constructs and overall, for each component study. Our framework also includes a detailed approach to identify a minimally sufficient set of confounders and to define criteria for adequate control. In contrast, ROBINS-I offers signaling questions relating to baseline and time-varying confounding. ROBINS-I uses four categories for risk of bias judgments (“Low risk”, “Moderate risk”, “Serious risk” and “Critical risk” of bias) while we used a three-category system, similar to the Cochrane RoB 2 tool, as it is easier to operationalize. We chose to label the categories as “adequate control”, “some concerns”, and “inadequate control” to remind users and readers that our proposed tool focuses on confounding and does not reflect all sources of bias. We recommend our framework be used in conjunction with these other approaches to enhance confounder assessment, while also assessing the risk of other relevant biases affecting study validity, including selective reporting of results14,15 that could also influence selection of key confounders. The overall confounding control assessment for each component study from our approach could be incorporated in risk of bias traffic light plots or summary plots.72

In accordance with Cochrane guidance, we do not propose to summarize the assessment of confounding control for each construct or each study into a single numeric score. We also followed the strategy recommended by Cochrane to incorporate this assessment into the quantitative synthesis, by performing stratified analyses and testing for differences in pooled effect sizes between strata defined by confounding control. However, meta-analysts may wish to explore weighting schemes to combine confounding constructs and use the resulting overall metric as a moderator in the meta-regression model, particularly in areas of research with a larger number of studies, such as economics.73,74

As in any risk of bias assessment, reaching consensus regarding which confounders are necessary for adequate control may be challenging. However, the confounder matrix plot enables ready identification of subsets of component studies that adjusted for the same (or different) sets of variables to identify gaps in the current literature, even if consensus cannot be reached regarding the core set of confounders. In our application, we noted certain determinants of obstetric health (i.e., structural racism and family planning) are rarely measured and all studies controlled for at least one variable from the subsequent pregnancy, which may signal over-adjustment bias.

In conclusion, our confounder matrix approach is an informed, targeted, and transparent method that expands on current guidance to evaluate risk of confounding in systematic reviews and meta-analyses of observational studies. We encourage researchers to utilize such visualization methods when undertaking systematic reviews and to quantitatively incorporate the results into their meta-analyses. Widespread application could raise awareness about common gaps in specific research topics and allow for more trustworthy conclusions in this era of rapid data generation and dissemination.


Highlights.

What is known?

Observational studies comprise a substantial portion of the evidence base for systematic reviews and meta-analyses of etiologic associations between exposures or interventions and health outcomes. A major threat to the validity of conclusions drawn from observational evidence is residual confounding in the component studies. Confounding occurs when the true etiological effect is distorted by inadequately controlled common causes of the exposure (or treatment) and the outcome. Residual confounding may occur because component studies adjust for different sets of confounders, fail to control for important confounders, or have measurement error limiting the effectiveness of control. Further, investigators may control for factors that are not true confounders or may use methods for control that are not adequate.

What is new?

We present the confounder matrix—an approach that expands upon current research synthesis guidance for defining and summarizing adequate confounding control across component studies included in systematic reviews of observational research. Subsequently, appraisal of adequacy of confounding control can be quantitatively incorporated into meta-analyses—a necessary step for drawing valid inferences from the best available observational evidence.

Potential impact for Research Synthesis Methods readers outside the authors’ field

The confounder matrix may be readily assembled for one’s systematic review to visually summarize confounding control across a body of observational evidence. We developed an R package metaconfoundr (https://cran.r-project.org/web/packages/metaconfoundr/index.html) and R shiny app (https://malcolmbarrett.shinyapps.io/metaconfoundr/) to aid in this process. Widespread application could raise awareness about common gaps in specific research topics and allow for more trustworthy conclusions in this era of rapid data generation and dissemination.

Funding information

Ludovic Trinquart received support from Boston University Hariri Institute for Computing (Research Incubation Awards). Sunni L. Mumford was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland.

Footnotes

CONFLICT OF INTEREST

The authors declare no conflict of interest.

SUPPORTING INFORMATION

Additional supporting information may be found in the online version of the article at the publisher’s website.

DATA AVAILABILITY STATEMENT

No subject-level data were created or analyzed in this study. The summary-level data that support the findings of this study were extracted from the published source studies. Details are openly available in this article and in the supporting online information.

REFERENCES

1. Egger M, Smith GD, Schneider M. Systematic reviews of observational studies. In: Systematic Reviews in Health Care: Meta-Analysis in Context. 2nd ed. BMJ Publishing Group; 2008.
2. Page MJ, Shamseer L, Altman DG, et al. Epidemiology and reporting characteristics of systematic reviews of biomedical research: a cross-sectional study. PLoS Med. 2016;13(5):e1002028.
3. Greenland S, Robins JM. Identifiability, exchangeability, and epidemiological confounding. Int J Epidemiol. 1986;15(3):413–419.
4. Becker BJ, Aloe AM, Duvendack M, et al. Quasi-experimental study designs series-paper 10: synthesizing evidence for effects collected from quasi-experimental studies presents surmountable challenges. J Clin Epidemiol. 2017;89:84–91.
5. Valentine JC, Thompson SG. Issues relating to confounding and meta-analysis when including non-randomized studies in systematic reviews on the effects of interventions. Res Synth Methods. 2013;4(1):26–35.
6. Dekkers OM, Vandenbroucke JP, Cevallos M, Renehan AG, Altman DG, Egger M. COSMOS-E: guidance on conducting systematic reviews and meta-analyses of observational studies of etiology. PLoS Med. 2019;16(2):e1002742.
7. Hopewell S, Boutron I, Altman DG, Ravaud P. Incorporation of assessments of risk of bias of primary studies in systematic reviews of randomised trials: a cross-sectional study. BMJ Open. 2013;3(8):e003342.
8. Katikireddi SV, Egan M, Petticrew M. How do systematic reviews incorporate risk of bias assessments into the synthesis of evidence? A methodological study. J Epidemiol Community Health. 2015;69(2):189–195.
9. Mueller M, D’Addario M, Egger M, et al. Methods to systematically review and meta-analyse observational studies: a systematic scoping review of recommendations. BMC Med Res Methodol. 2018;18(1):44.
10. Bero L, Chartres N, Diong J, et al. The risk of bias in observational studies of exposures (ROBINS-E) tool: concerns arising from application to observational studies of exposures. Syst Rev. 2018;7(1):242.
11. Sterne JA, Hernan MA, Reeves BC, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919.
12. Groenwold RH, Van Deursen AM, Hoes AW, Hak E. Poor quality of reporting confounding bias in observational intervention studies: a systematic review. Ann Epidemiol. 2008;18(10):746–751.
13. Pocock SJ, Collier TJ, Dandreo KJ, et al. Issues in the reporting of epidemiological studies: a survey of recent practice. BMJ. 2004;329(7471):883.
14. Peters J, Mengersen K. Selective reporting of adjusted estimates in observational epidemiology studies: reasons and implications for meta-analyses. Eval Health Prof. 2008;31(4):370–389.
15. Serghiou S, Patel CJ, Tan YY, Koay P, Ioannidis JP. Field-wide meta-analyses of observational associations can map selective availability of risk factors and the impact of model specifications. J Clin Epidemiol. 2016;71:58–67.
16. Hernan MA, Hernandez-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol. 2002;155(2):176–184.
17. Lee PH. Should we adjust for a confounder if empirical and theoretical criteria yield contradictory results? A simulation study. Sci Rep. 2014;4:6085.
18. Witte J, Didelez V. Covariate selection strategies for causal inference: classification and comparison. Biom J. 2019;61(5):1270–1289.
19. VanderWeele TJ. Principles of confounder selection. Eur J Epidemiol. 2019;34(3):211–219.
20. Schisterman EF, Cole SR, Platt RW. Overadjustment bias and unnecessary adjustment in epidemiologic studies. Epidemiology. 2009;20(4):488–495.
21. Glymour MM, Greenland S. Causal diagrams. In: Rothman KJ, Greenland S, Lash TL, eds. Modern Epidemiology. 3rd ed. Lippincott Williams & Wilkins; 2008.
22. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10(1):37–48.
23. Shrier I, Platt RW. Reducing bias through directed acyclic graphs. BMC Med Res Methodol. 2008;8:70.
24. Breitling LP. dagR: a suite of R functions for directed acyclic graphs. Epidemiology. 2010;21(4):586–587.
25. Cole SR, Platt RW, Schisterman EF, et al. Illustrating bias due to conditioning on a collider. Int J Epidemiol. 2010;39(2):417–420.
26. Etminan M, Collins GS, Mansournia MA. Using causal diagrams to improve the design and interpretation of medical research. Chest. 2020;158(1S):S21–S28.
27. Hernan MA, Hernandez-Diaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15(5):615–625.
28. Howards PP, Schisterman EF, Poole C, Kaufman JS, Weinberg CR. “Toward a clearer definition of confounding” revisited with directed acyclic graphs. Am J Epidemiol. 2012;176(6):506–511.
29. Pearl J. Causality: Models, Reasoning, and Inference. Cambridge University Press; 2000.
30. Shrier I. Structural approach to bias in meta-analyses. Res Synth Methods. 2011;2(4):223–237.
31. Suzuki E, Shinozaki T, Yamamoto E. Causal diagrams: pitfalls and tips. J Epidemiol. 2020;30(4):153–162.
32. George L, Granath F, Johansson AL, Cnattingius S. Self-reported nicotine exposure and plasma levels of cotinine in early and late pregnancy. Acta Obstet Gynecol Scand. 2006;85(11):1331–1337.
33. Greenland S. The effect of misclassification in the presence of covariates. Am J Epidemiol. 1980;112(4):564–569.
34. Greenland S, Robins JM. Confounding and misclassification. Am J Epidemiol. 1985;122(3):495–506.
35. Marshall JR, Hastrup JL, Ross JS. Mismeasurement and the resonance of strong confounders: correlated errors. Am J Epidemiol. 1999;150(1):88–96.
36. Wacholder S. When measurement errors correlate with truth: surprising effects of nondifferential misclassification. Epidemiology. 1995;6(2):157–161.
37. Clare PJ, Dobbins TA, Mattick RP. Causal models adjusting for time-varying confounding—a systematic review of the literature. Int J Epidemiol. 2019;48(1):254–265.
38. Daniel RM, Cousens SN, De Stavola BL, Kenward MG, Sterne JA. Methods for dealing with time-dependent confounding. Stat Med. 2013;32(9):1584–1618.
39. Westreich D, Greenland S. The table 2 fallacy: presenting and interpreting confounder and modifier coefficients. Am J Epidemiol. 2013;177(4):292–298.
40. Higgins JP, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane Handbook for Systematic Reviews of Interventions. Wiley-Blackwell; 2021. Accessed 18 October 2021. www.training.cochrane.org/handbook
41. Lash TL, Fox MP, Fink AZ. Applying Quantitative Bias Analysis to Epidemiologic Data. Springer-Verlag; 2009.
42. Mathur MB, VanderWeele TJ. Sensitivity analysis for unmeasured confounding in meta-analyses. J Am Stat Assoc. 2019;115(529):163–172.
43. McCandless LC. Meta-analysis of observational studies with unmeasured confounders. Int J Biostat. 2012;8(2):1–31.
44. Sajeev G, Weuve J, Jackson JW, et al. Late-life cognitive activity and dementia: a systematic review and bias analysis. Epidemiology. 2016;27(5):732–742.
45. Thompson S, Ekelund U, Jebb S, et al. A proposed method of bias adjustment for meta-analyses of published observational studies. Int J Epidemiol. 2011;40(3):765–777.
46. Weuve J, Sagiv SK, Fox MP. Quantitative bias analysis for collaborative science. Epidemiology. 2018;29(5):627–630.
47. Report of a WHO technical consultation on birth spacing. World Health Organization; 2006.
48. Birth spacing and birth outcomes fact sheet. March of Dimes; 2015.
49. American College of Obstetricians and Gynecologists’ Committee on Obstetric Practice, Association of Women’s Health, Obstetric and Neonatal Nurses. Committee opinion no. 666: optimizing postpartum care. Obstet Gynecol. 2016;127(6):e187–e192.
50. Conde-Agudelo A, Rosas-Bermudez A, Kafury-Goeta AC. Birth spacing and risk of adverse perinatal outcomes: a meta-analysis. JAMA. 2006;295(15):1809–1823.
51. Cheslack Postava K, Winter AS. Short and long interpregnancy intervals: correlates and variations by pregnancy timing among U.S. women. Perspect Sex Reprod Health. 2015;47(1):19–26.
52. Gemmill A, Lindberg LD. Short interpregnancy intervals in the United States. Obstet Gynecol. 2013;122(1):64–71.
53. Gold R, Connell FA, Heagerty P, Bezruchka S, Davis R, Cawthon ML. Income inequality and pregnancy spacing. Soc Sci Med. 2004;59(6):1117–1126.
54. Gold R, Connell FA, Heagerty P, et al. Predicting time to subsequent pregnancy. Matern Child Health J. 2005;9(3):219–228.
55. Haight SC, Hogue CJ, Raskind-Hood CL, Ahrens KA. Short interpregnancy intervals and adverse pregnancy outcomes by maternal age in the United States. Ann Epidemiol. 2019;31:38–44.
56. Ahrens KA, Nelson H, Stidd RL, Moskosky S, Hutcheon JA. Short interpregnancy intervals and adverse perinatal outcomes in high-resource settings: an updated systematic review. Paediatr Perinat Epidemiol. 2019;33(1):O25–O47.
57. U.S. Preventive Services Task Force. Procedure Manual. Appendix VI. Criteria for Assessing Internal Validity of Individual Studies. 2017. Accessed 15 October 2021. https://uspreventiveservicestaskforce.org/uspstf/about-uspstf/methods-and-processes/procedure-manual/procedure-manual-appendix-vi-criteria-assessing-internal-validity-individual-studies
58. Hutcheon JA, Moskosky S, Ananth CV, et al. Good practices for the design, analysis, and interpretation of observational studies on birth spacing and perinatal health outcomes. Paediatr Perinat Epidemiol. 2019;33(1):O15–O24.
59. Groos M, Wallace M, Hardeman R, Theall K. Measuring inequity: a systematic review of methods used to quantify structural racism. J Health Disparities Res Pract. 2018;11(2):190–205.
60. Schummers L, Hutcheon JA, Norman WV, Liauw J, Bolatova T, Ahrens KA. Short interpregnancy interval and pregnancy outcomes: how important is the timing of confounding variable ascertainment? Paediatr Perinat Epidemiol. 2021;35(4):428–437.
61. Bryant AS, Worjoloh A, Caughey AB, Washington AE. Racial/ethnic disparities in obstetric outcomes and care: prevalence and determinants. Am J Obstet Gynecol. 2010;202(4):335–343.
62. Gadson A, Akpovi E, Mehta PK. Exploring the social determinants of racial/ethnic disparities in prenatal care utilization and maternal outcome. Semin Perinatol. 2017;41(5):308–317.
63. Cornell JE, Mulrow CD, Localio R, et al. Random-effects meta-analysis of inconsistent effects: a time for change. Ann Intern Med. 2014;160(4):267–270.
64. Lash TL, Fox MP, MacLehose RF, Maldonado G, McCandless LC, Greenland S. Good practices for quantitative bias analysis. Int J Epidemiol. 2014;43(6):1969–1985.
65. Maclure M, Schneeweiss S. Causation of bias: the episcope. Epidemiology. 2001;12(1):114–122.
66. Greenland S. Basic methods for sensitivity analysis of biases. Int J Epidemiol. 1996;25(6):1107–1116.
67. Savitz DA, Wellenius GA, Trikalinos TA. The problem with mechanistic risk of bias assessments in evidence synthesis of observational studies and a practical alternative: assess the impact of specific sources of potential bias. Am J Epidemiol. 2019;188:1581–1585.
68. Riley RD, Moons KGM, Snell KIE, et al. A guide to systematic review and meta-analysis of prognostic factor studies. BMJ. 2019;364:k4597.
69. Kirkham JJ, Altman DG, Chan AW, Gamble C, Dwan KM, Williamson PR. Outcome reporting bias in trials: a methodological approach for assessment and adjustment in systematic reviews. BMJ. 2018;362:k3802.
70. Kirkham JJ, Dwan KM, Altman DG, et al. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ. 2010;340:c365.
71. Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.
72. McGuinness LA, Higgins JPT. Risk-of-bias VISualization (robvis): an R package and shiny web app for visualizing risk-of-bias assessments. Res Synth Methods. 2021;12(1):55–61.
73. Havranek T, Stanley TD, Doucouliagos H, et al. Reporting guidelines for meta-analysis in economics. J Econ Surv. 2020;34(3):469–475.
74. Stanley TD, Massey S. Evidence of nicotine replacement’s effectiveness dissolves when meta-regression accommodates multiple sources of bias. J Clin Epidemiol. 2016;79:41–45.
