Causal meta-analysis by integrating multiple observational studies with multivariate outcomes

Subharup Guha; Yi Li

doi:10.1093/biomtc/ujae070

Biometrics. 2024 Jul 29;80(3):ujae070. doi: 10.1093/biomtc/ujae070

PMCID: PMC11285113 PMID: 39073772

ABSTRACT

Integrating multiple observational studies to make unconfounded causal or descriptive comparisons of group potential outcomes in a large natural population is challenging. Moreover, retrospective cohorts, being convenience samples, are usually unrepresentative of the natural population of interest and have groups with unbalanced covariates. We propose a general covariate-balancing framework based on pseudo-populations that extends established weighting methods to the meta-analysis of multiple retrospective cohorts with multiple groups. Additionally, by maximizing the effective sample sizes of the cohorts, we propose a FLEXible, Optimized, and Realistic (FLEXOR) weighting method appropriate for integrative analyses. We develop new weighted estimators for unconfounded inferences on wide-ranging population-level features and estimands relevant to group comparisons of quantitative, categorical, or multivariate outcomes. Asymptotic properties of these estimators are examined. Through simulation studies and meta-analyses of TCGA datasets, we demonstrate the versatility and reliability of the proposed weighting strategy, especially for the FLEXOR pseudo-population.

Keywords: FLEXOR, pseudo-population, retrospective cohort, unconfounded comparison, weighting

1. INTRODUCTION

The study of differential patterns of oncogene expression levels across cancer subtypes has aroused great interest because it unveils new tumorigenesis mechanisms and can improve cancer screening and treatment (Kumar et al., 2020). In a multi-site breast cancer study conducted at 7 medical centers, including, for example, Memorial Sloan Kettering, Mayo Clinic, and the University of Pittsburgh, the goal was to compare the mRNA expression levels of 8 targeted breast cancer genes, namely, COL9A3, CXCL12, IGF1, ITGA11, IVL, LEF1, PRB2, and SMR3B (eg, Christopoulos et al., 2015) in the disease subtypes infiltrating ductal carcinoma (IDC) and infiltrating lobular carcinoma (ILC), which account for nearly 80% and 10% of breast cancer cases in the United States (Tran, 2022; Wright, 2022). The data reposited at The Cancer Genome Atlas (TCGA) portal (NCI, 2022) include demographic, clinicopathological, and biomarker measurements; some study-specific attributes are summarized in Web Table 2 of Supplementary Materials. Each breast cancer patient’s outcome is a vector of mRNA expression measurements for these 8 targeted genes.

Inference focuses on interpreting biomarker comparisons between the disease subtypes IDC and ILC in the context of a larger disease population in the United States, for example, Surveillance, Epidemiology, and End Results (SEER) breast cancer patients (Surveillance Research Program and NCI, 2023). The estimands of interest include contrasts and gene-gene pairwise correlations, alongside disease subtype-specific summaries (eg, means, standard deviations, and medians). Understanding gene expression and co-expression patterns in different subtypes of breast cancer among national-level patients is crucial for developing feasible guidelines for regulating targeted therapies and precision medicine (Schmidt et al., 2016). As revealed by Web Table 2 of Supplementary Materials, naive group comparisons based on the TCGA patient cohorts are severely confounded by the high degree of covariate imbalance between the IDC and ILC subtypes.

More broadly, covariate balance is vitally important in observational studies where interest focuses on unconfounded causal comparisons of group potential outcomes (Robins and Rotnitzky, 1995; Rubin, 2007) in a large natural population such as the US population. The observed populations of convenience samples such as observational studies are usually unrepresentative of this natural population. Theoretical and simulation studies have demonstrated the conceptual and practical advantages of weighting over other covariate-balancing techniques like matching and regression adjustment (Austin, 2010). As a result, weighting methods have widespread applicability in diverse research areas such as political science, sociology, and healthcare (Lunceford and Davidian, 2004). For analyzing cohorts consisting of 2 groups, the propensity score (PS) (Rosenbaum and Rubin, 1983) plays a central role. In these studies, the average treatment effect (ATE) and average treatment effect on the treated group (ATT) are overwhelmingly popular estimands (Robins et al., 2000). However, the inverse probability weights (IPW) on which these estimators rely may be unstable when some PSs are near 0 or 1 (Li and Li, 2019).

Several researchers have proposed variations of ATE based on truncated subpopulations of scientific or statistical interest (Crump et al., 2006; Li and Greene, 2013). Most weighting methods, implicitly or explicitly, provide unbiased inferences for a specific pseudo-population, a covariate-balanced construct that often differs substantially from the real but mostly unknown natural population of interest. For example, Li et al. (2018) showed that IPWs correspond to a combined pseudo-population and introduced the overlap pseudo-population, wherein the weights minimize the asymptotic variance of the weighted ATE for the overlap pseudo-population (ATO). For single observational studies comprising 2 or more groups, Li and Li (2019) proposed the generalized overlap pseudo-population that minimizes the sum of asymptotic variances of weighted estimators of pairwise group differences. For multiple observational studies with 2 groups, Wang and Rosner (2019) developed an integrative approach for Bayesian inferences on ATE. For single observational studies with 2 groups, Mao et al. (2019) obtained analytical variance expressions of modified IPW estimators adjusted for the estimated PS and augmented the estimators with outcome models for improved efficiency. Zeng et al. (2023) explored weighting procedures in single-study and multiple-group settings with censored survival outcomes.

However, these methods have several limitations. First, they are theoretically guaranteed to be effective for a specific set of outcome types and estimands under certain theoretical conditions (eg, equal variances of univariate group-specific outcomes). As study endpoints may be continuous, categorical, or multivariate, inference procedures for disparate outcome types have been inadequately explored. Furthermore, scientific interests may necessitate estimands other than ATE, ATT, or ATO, such as distribution percentiles, standard deviations, pairwise correlations of multivariate outcomes, and unplanned estimands suggested during post hoc analyses. Second, these methods may imply group assignment changes for some subjects that are sometimes difficult to justify for a meaningful, generalizable pseudo-population (Li et al., 2018; Li and Li, 2019). Lastly, very few methods can accommodate the integration of multiple observational studies with multiple unbalanced groups as encountered in the TCGA datasets. One potential use of the existing weighting methods to achieve covariate balance is by creating a new categorical variable that combines study and group information. However, it is unclear how to conduct unconfounded group comparisons independent of the “nuisance” study factor. Furthermore, the pseudo-populations generated by this approach are often impractical, and inferential accuracies for common estimands are frequently suboptimal. There is a critical need for developing efficient approaches that enable the integration of multiple observational studies and multiple unbalanced groups and the construction of pseudo-populations that resemble the natural population of interest.

To fill this gap, we extend the PS to the multiple PS and propose a new class of pseudo-populations and multi-study balancing weights to effectuate data integration and causal meta-analyses. Compared to the existing weighting methods, our work presents 2 main advances. First, our framework enables unconfounded inferences on a wide variety of population-level group features as well as planned or unplanned estimands relevant to group comparisons. Second, the framework allows us to derive efficient estimators within this proposed family of pseudo-populations. Specifically, by maximizing the effective sample size (ESS), we further obtain a FLEXible, Optimized, and Realistic (FLEXOR) weighting method and derive new weighted estimators that are efficient for a variety of quantitative, categorical, and multivariate outcomes, are applicable to different weighting strategies, and effectively utilize multivariate outcome information. For example, the estimators yield efficient estimates of various functionals of group-specific potential outcomes, such as contrasts of means and medians, correlations, and percentiles.

The rest of the paper is organized as follows. Section 2 introduces some basic notation, theoretical assumptions, and a general covariate-balancing framework for meta-analysis. We further introduce FLEXOR, an optimized pseudo-population, as its special case. Section 3 develops unconfounded integrative estimators applicable to different weighting methods, estimands, and response types and establishes asymptotic properties. Section 4 presents the finite sample performance of the proposed methodology, especially when used in conjunction with the FLEXOR weights. Section 5 meta-analyzes the aforementioned TCGA studies and detects differential targeted gene expression and co-expression patterns across the 2 major breast cancer subtypes in the United States. Section 6 concludes with some final remarks.

2. INTEGRATION OF OBSERVATIONAL STUDIES WITH MULTIPLE UNBALANCED GROUPS

2.1. Notation and basic assumptions

We aim to compare $K$ subpopulations or groups (eg, disease subtypes) of participants belonging to a large natural population such as the US patient population. Beyond basic summaries (eg, group prevalences) from preexisting registries, no additional information is available about the natural population. The investigation comprises $J$ observational studies. We assume $J$ and $K$ are not large. For $i = 1, \ldots, N$, let $z_i \in \{1, \ldots, K\}$ denote the group and $s_i \in \{1, \ldots, J\}$ denote the observational study of the $i$th participant. We assume that each participant belongs to exactly 1 observational study and each study includes at least 1 participant in each group. Additionally, there are $p$ covariates shared by all the studies and denoted by $x_i \in \mathcal{X} \subset \mathbb{R}^p$ for the $i$th participant. The motivating TCGA database comprises $K = 2$ groups corresponding to breast cancer subtypes IDC and ILC, and $p$ covariates of $N$ breast cancer patients in $J = 7$ observational studies. The $i$th participant’s potential outcome is $Y_i^{(z)}$, that is, the outcome had the patient belonged to group $z$, for $z = 1, \ldots, K$. The observed outcome is $Y_i = Y_i^{(z_i)}$. In the TCGA example, vectors $Y_i^{(1)}$ and $Y_i^{(2)}$ represent counterfactual mRNA measurements of disease subtypes IDC and ILC on $L = 8$ targeted genes, and the observed $Y_i$ contains mRNA measurements of breast cancer subtype $z_i$ with which participant $i$ is actually diagnosed.

The participant-specific measurements $(s_i, z_i, x_i, Y_i)$, $i = 1, \ldots, N$, are a random sample from an observed distribution, $[S, Z, X, Y]_+$, where $[\cdot]_+$ generically represents distributions or densities with respect to the observed population. Extending Rubin (2007) and Imbens (2000), we assume (A) Stable unit treatment value assumption (SUTVA): Given subjects’ covariates, the study and group memberships do not influence the potential outcomes, and no 2 versions of grouping lead to different potential outcomes; (B) Study-specific unconfoundedness: Given study $S$ and covariate vector $X$, group membership $Z$ is independent of the potential outcomes $Y^{(1)}, \ldots, Y^{(K)}$; and (C) Positivity: Joint density $[S = s, Z = z, X = x]_+$ is strictly positive for all $(s, z, x)$. Assumption (B) states that $[Y^{(1)}, \ldots, Y^{(K)} \mid S, Z, X]_+ = [Y^{(1)}, \ldots, Y^{(K)} \mid S, X]_+$. Assumption (C) guarantees that the study and group memberships and covariates do not have deterministic relationships and often holds when $J$ and $K$ are not large.

2.2. A new family of pseudo-populations

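We first extend variations of the PS (eg, Rosenbaum and Rubin, 1983) to the multiple PS (MPS) of the vector $(S, Z)$. For $s = 1, \ldots, J$, $z = 1, \ldots, K$, and $x \in \mathcal{X}$, the MPS

$$\delta_{sz}(x) = [S = s, Z = z \mid X = x]_+ . \tag{1}$$

It then follows that the joint density $[S = s, Z = z, X = x]_+ = \delta_{sz}(x)\, f_+(x)$, where $f_+(x) = [X = x]_+$ represents the marginal covariate density in the observed population. As the MPS is unknown in observational studies, we can estimate it by regressing the combinations of $(s_i, z_i)$ on covariate $x_i$ ($i = 1, \ldots, N$). In single studies with 2 groups, the PS is usually estimated using logistic regression (Mao et al., 2019). For estimating MPS, we recommend multinomial logistic regression: $\log\{\delta_{sz}(x)/\delta_{11}(x)\} = \omega_{sz}' x$ for $(s, z) \neq (1, 1)$, so that $\omega = \{\omega_{sz}\}$ is a $(JK - 1)p$-dimensional parameter. If we define $\omega_{11}$ to be the vector of zeros, then $\delta_{sz}(x) = \exp(\omega_{sz}' x) / \sum_{s'=1}^{J} \sum_{z'=1}^{K} \exp(\omega_{s'z'}' x)$ for all $(s, z)$.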
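The multinomial logistic step can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes each study-group combination $(s, z)$ is coded as a single integer label, that the first label is the reference combination whose coefficient vector is fixed at zero, and that plain gradient ascent on the average log-likelihood is an acceptable stand-in for a production optimizer. The function names `mps`, `log_lik`, and `fit_mps` and the toy data are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mps(omega, x):
    """Fitted probability of every study-group combination at covariate vector x."""
    return softmax([sum(w * v for w, v in zip(row, x)) for row in omega])

def log_lik(omega, X, labels):
    """Average multinomial log-likelihood of the observed combinations."""
    return sum(math.log(mps(omega, x)[c]) for x, c in zip(X, labels)) / len(X)

def fit_mps(X, labels, n_comb, lr=0.5, n_iter=300):
    """Gradient ascent; row 0 of omega is the reference combination and stays at zero."""
    p = len(X[0])
    omega = [[0.0] * p for _ in range(n_comb)]
    for _ in range(n_iter):
        grad = [[0.0] * p for _ in range(n_comb)]
        for x, c in zip(X, labels):
            probs = mps(omega, x)
            for k in range(1, n_comb):
                resid = (1.0 if k == c else 0.0) - probs[k]
                for j in range(p):
                    grad[k][j] += resid * x[j] / len(X)
        for k in range(1, n_comb):
            for j in range(p):
                omega[k][j] += lr * grad[k][j]
    return omega

# Toy data (assumption): an intercept plus one covariate, and 4 combinations
# standing for 2 studies x 2 groups, coded as label = 2 * study + group.
X = [[1.0, (i - 10) / 10.0] for i in range(21)]
labels = [0 if i < 6 else 1 + (i % 3) for i in range(21)]
omega_zero = [[0.0, 0.0] for _ in range(4)]
omega_hat = fit_mps(X, labels, n_comb=4)
```

Evaluating `mps(omega_hat, x_i)` at each participant's covariates then gives the estimated MPS vector that the weighting steps consume.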

Consider a pseudo-population with attributes fully or partially prescribed by the investigator via 2 probability vectors: (i) relative amounts of information extracted from the studies, quantified by probability tuple $\gamma = (\gamma_1, \ldots, \gamma_J)$; and (ii) relative group prevalence, $\theta = (\theta_1, \ldots, \theta_K)$. For instance, in the TCGA breast cancer studies, setting $\gamma_s = 1/7$ for every study $s$ extracts equal information from each study, whereas $\theta = (8/9, 1/9)$ constrains the pseudo-population to the known US proportions of breast cancer subtypes IDC and ILC (Tran, 2022; Wright, 2022). If some or all components of $\gamma$ or $\theta$ are unknown, subsequent inferences can optimize the pseudo-population over the multiple possibilities for these quantities.

For multiple observational studies, the participant study memberships $s_1, \ldots, s_N$ are primarily influenced by the study designs and unknown factors driving participation; moreover, study participant characteristics can differ substantially across studies, especially in cancer investigations. To address these issues, we aim to design a pseudo-population for achieving theoretical covariate balance between the $K$ groups. In other words, we construct a pseudo-population wherein the study memberships, the group memberships, and patient characteristics are mutually independent, that is, $S \perp Z \perp X$, so that

$$[S = s, Z = z, X = x] = \gamma_s\, \theta_z\, f_{\gamma\theta}(x). \tag{2}$$

Here and hereafter, $[\cdot]$ denotes a distribution or density with respect to the designed pseudo-population, whereas $[\cdot]_+$ corresponds to the observed population, as mentioned earlier. Equation 2 further emphasizes that although $S$, $Z$, and $X$ are independent in the pseudo-population, they may share some distributional parameters. More explicitly, the subscripts of $f_{\gamma\theta}$ emphasize that the pseudo-population density of $X$ may depend on $\gamma$ and $\theta$.

Next, consider the relationship between the pseudo-population covariate density, $f_{\gamma\theta}(x)$, and the marginal observed covariate density, $f_+(x)$. Assuming a common dominating measure for the densities and a common support, $\mathcal{X}$, there exists without loss of generality a positive tilting function (eg, Li et al., 2018) denoted by $\eta$ such that $f_{\gamma\theta}(x) \propto \eta(x)\, f_+(x)$ for all $x \in \mathcal{X}$. Therefore, $f_{\gamma\theta}(x) = \eta(x)\, f_+(x) / E_+[\eta(X)]$, where $E_+[\eta(X)] = \int_{\mathcal{X}} \eta(x) f_+(x)\, dx$ and $E_+[\cdot]$ denotes expectations under the observed distribution. Intuitively, high tilting function values correspond to covariate space regions with high pseudo-population weights. Let $\mathcal{S}_J$ denote the unit simplex in $\mathbb{R}^J$. Different choices of $\gamma \in \mathcal{S}_J$, $\theta \in \mathcal{S}_K$, and tilting function $\eta$ identify different pseudo-populations with structure (2).

Balancing weights for integration of multiple studies. To efficiently meta-analyze multiple studies (with $J > 1$), we propose the multi-study balancing weight, defined as the ratio of the joint densities with respect to the pseudo-population and observed population. More specifically, for any $(s, z, x)$, the multi-study balancing weight

$$\rho(s, z, x) = \frac{[S = s, Z = z, X = x]}{[S = s, Z = z, X = x]_+} = \frac{\gamma_s\, \theta_z\, \eta(x)}{\delta_{sz}(x)\, E_+[\eta(X)]}. \tag{3}$$

As $[S = s, Z = z, X = x] = \rho(s, z, x)\, [S = s, Z = z, X = x]_+$, the balancing weight serves to redistribute the observed distribution’s relative mass to match that of the pseudo-population. Defining the unnormalized weight function as $\tilde\rho(s, z, x) = \gamma_s \theta_z \eta(x) / \delta_{sz}(x)$, the unnormalized weight of the $i$th participant is $\tilde\rho_i = \tilde\rho(s_i, z_i, x_i)$. For a general pseudo-population (eg, FLEXOR pseudo-population introduced in the sequel), the unnormalized weights, even within a study-group combination, may depend on $\gamma$ and $\theta$ through the tilting function. As discussed later, the unnormalized weights can be utilized to provide unconfounded inferences on various potential outcome features for a general pseudo-population.

The proposed pseudo-populations and balancing weights are general, encompassing many well-known weighting methods in single-study settings. For example, in single studies ($J = 1$), assume equally prevalent pseudo-population groups ($\theta_z = 1/K$) in expression (2). A constant tilting function yields IPWs when $K = 2$ and generalized IPWs (Imbens, 2000) when $K > 2$. On the other hand, $\eta(x) = \{\sum_{z=1}^{K} 1/\delta_{1z}(x)\}^{-1}$ produces overlap weights (Li et al., 2018) when $K = 2$, and generalized overlap weights (Li and Li, 2019) when $K > 2$. Again, if $\eta(x) = \delta_{1z'}(x)$ for a group $z'$, then the pseudo-population’s covariate density, $f_{\gamma\theta}(x)$, matches the observed covariate density of the group $z'$ participants.
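As a quick check of the 2-group case, assume the harmonic-type tilting function of Li and Li (2019), $\eta(x) = \{\sum_z 1/e_z(x)\}^{-1}$, where $e_z(x)$ is our shorthand for the single-study group propensities with $e_1(x) + e_2(x) = 1$. Under that assumption the tilting function reduces to the familiar overlap form:

```latex
\Big\{\frac{1}{e_1(x)} + \frac{1}{e_2(x)}\Big\}^{-1}
  = \frac{e_1(x)\, e_2(x)}{e_1(x) + e_2(x)}
  = e_1(x)\,\{1 - e_1(x)\}.
```

Dividing by a participant's own group propensity then leaves, up to a constant, the propensity of the opposite group, which is the defining property of overlap weights.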

The choice of different tilting functions in Equation 2 naturally extends several weighting methods designed for single studies to meta-analytical settings. For example, assuming equally weighted studies and equally prevalent groups, that is, $\gamma_s = 1/J$ and $\theta_z = 1/K$, a constant tilting function and $\eta(x) = \{\sum_{s=1}^{J} \sum_{z=1}^{K} 1/\delta_{sz}(x)\}^{-1}$, respectively, produce extensions of the combined (Li et al., 2018) and generalized overlap (Li and Li, 2019) pseudo-populations appropriate for meta-analyzing multiple studies with multiple groups. We refer to these proposed pseudo-populations as the integrative combined (IC) and integrative generalized overlap (IGO) pseudo-populations, respectively. Similarly, for a fixed group $z'$, the tilting function $\eta(x) = \sum_{s=1}^{J} \delta_{sz'}(x)$ gives a pseudo-population whose marginal covariate density equals the observed covariate density of group $z'$ participants irrespective of their study memberships. Given the availability of different tilting functions, an important question arises: Which choice is optimal and in what sense? We address this below.
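To make the IC and IGO constructions concrete, the sketch below evaluates one participant's unnormalized weight, taken here to be $\gamma_s \theta_z \eta(x) / \delta_{sz}(x)$, under a constant tilting function and under the harmonic-type tilting function $\{\sum_{s,z} 1/\delta_{sz}(x)\}^{-1}$. Both forms are our reading of the definitions above, and the function names, 0-based labels, and MPS values are made up for illustration.

```python
def tilt_ic(delta):
    """Constant tilting function (integrative combined)."""
    return 1.0

def tilt_igo(delta):
    """Harmonic-type tilting function (integrative generalized overlap)."""
    return 1.0 / sum(1.0 / p for p in delta.values())

def unnormalized_weight(s, z, delta, gamma, theta, tilt):
    """gamma_s * theta_z * eta(x) / delta_sz(x) for one participant.

    delta maps each (study, group) pair to the MPS value at the
    participant's covariates; its values must sum to 1.
    """
    return gamma[s] * theta[z] * tilt(delta) / delta[(s, z)]

# Hypothetical MPS at one covariate value, with J = 2 studies and K = 2 groups.
delta_x = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.2}
gamma = [0.5, 0.5]   # equally weighted studies
theta = [0.5, 0.5]   # equally prevalent groups

# A participant observed in study 0, group 1 (a rare combination at this x).
w_ic = unnormalized_weight(0, 1, delta_x, gamma, theta, tilt_ic)
w_igo = unnormalized_weight(0, 1, delta_x, gamma, theta, tilt_igo)
```

In this toy case the IC weight is 2.5 and the IGO weight is 0.12, which shows how the harmonic-type tilt damps the weight of a participant from a rare study-group combination.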

Effective sample size. A widely used measure of a pseudo-population’s inferential accuracy is the ESS, $\mathcal{Q}(\gamma, \theta, \eta) = N / E_+[\rho^2(S, Z, X)]$, which relies on the second moment (provided it exists) of the balancing weights in the observed population (eg, McCaffrey et al., 2013). The ESS is asymptotically equivalent to the sample ESS, $\tilde{\mathcal{Q}}(\gamma, \theta, \eta) = (\sum_{i=1}^{N} \tilde\rho_i)^2 / \sum_{i=1}^{N} \tilde\rho_i^2$. Informally, the ESS is the hypothetical sample size from the pseudo-population containing the same information as $N$ samples from the observed population, and it is always less than $N$ unless the pseudo-population and observed population are identical.
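The sample ESS is a one-line computation. The sketch below assumes the usual form, the squared sum of the unnormalized weights divided by the sum of their squares; the function name `sample_ess` is ours.

```python
def sample_ess(weights):
    """Sample effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

ess_equal = sample_ess([2.0, 2.0, 2.0, 2.0])    # equal weights: no information lost
ess_unequal = sample_ess([1.0, 1.0, 1.0, 3.0])  # one dominant weight: ESS drops
```

Because the ratio is unchanged when every weight is multiplied by the same constant, the unnormalized weights can be used directly, and dividing by the number of participants gives the percent ESS.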

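An optimized case: FLEXOR pseudo-population. We propose FLEXOR as a member of pseudo-population family (2) that maximizes the ESS or minimizes the variation of the balancing weights, subject to any problem-dictated constraints on the vectors $\gamma$ and $\theta$. That is, if the triplet $(\gamma^*, \theta^*, \eta^*)$ identifies the FLEXOR pseudo-population, and $(\gamma, \theta)$ is known to belong to a subset, $\mathcal{A}$, of $\mathcal{S}_J \times \mathcal{S}_K$, then $(\gamma^*, \theta^*, \eta^*) = \arg\max_{(\gamma, \theta) \in \mathcal{A},\, \eta} \mathcal{Q}(\gamma, \theta, \eta)$.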

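A 2-step procedure for constructing the FLEXOR pseudo-population. Starting with an initial $(\gamma, \theta)$ satisfying the constraints, we iteratively perform the following steps until convergence:

Step I: For a fixed $(\gamma, \theta)$, maximize sample ESS over all tilting functions, $\eta$. This gives the best fixed-$(\gamma, \theta)$ pseudo-population identified by $(\gamma, \theta, \eta^{(\gamma, \theta)})$. The analytical form of $\eta^{(\gamma, \theta)}$ for the theoretical ESS is given in Theorem 1. Set function $\eta = \eta^{(\gamma, \theta)}$.
Step II: For a fixed tilting function $\eta$, maximize sample ESS over all admissible $(\gamma, \theta)$ to obtain the best fixed-$\eta$ pseudo-population, identified by the triplet $(\gamma^{(\eta)}, \theta^{(\eta)}, \eta)$. This parametric maximization over $(\gamma, \theta)$ can be quickly performed in R using the optim function or by Gauss-Seidel or Jacobi algorithms. Set $(\gamma, \theta) = (\gamma^{(\eta)}, \theta^{(\eta)})$.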
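The alternation between the two steps can be sketched as follows. Several simplifications are assumptions of this sketch, not of the paper: the group prevalences $\theta$ are treated as known and held fixed, only the study weights $\gamma$ are optimized, Step I uses the closed form $\eta(x) = \{\sum_{s,z} \gamma_s^2 \theta_z^2 / \delta_{sz}(x)\}^{-1}$ (our reading of Theorem 1), and Step II replaces a general-purpose optimizer with the closed-form maximizer of the sample ESS over the simplex, $\gamma_s \propto A_s / B_s$, where $A_s$ and $B_s$ are the sums of $\theta_z \eta / \delta_{sz}$ and of its square over participants of study $s$ (a Cauchy-Schwarz argument). All names and the simulated MPS are hypothetical.

```python
import math
import random

def flexor_tilt(delta, gamma, theta):
    """Step I tilting function at one covariate value (assumed closed form)."""
    return 1.0 / sum((gamma[s] * theta[z]) ** 2 / p for (s, z), p in delta.items())

def sample_ess(weights):
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def weights_for(data, gamma, theta, eta):
    """Unnormalized weights gamma_s * theta_z * eta_i / delta_sz(x_i)."""
    return [gamma[s] * theta[z] * e / d[(s, z)] for (s, z, d), e in zip(data, eta)]

def step_two_gamma(data, theta, eta, n_studies):
    """Step II for fixed eta and theta: gamma_s proportional to A_s / B_s."""
    a = [0.0] * n_studies
    b = [0.0] * n_studies
    for (s, z, d), e in zip(data, eta):
        c = theta[z] * e / d[(s, z)]
        a[s] += c
        b[s] += c * c
    raw = [a[s] / b[s] for s in range(n_studies)]
    return [r / sum(raw) for r in raw]

def two_step(data, theta, n_studies, n_iter=25):
    """Alternate Steps I and II; return the (sample ESS, gamma) pairs visited."""
    gamma = [1.0 / n_studies] * n_studies
    history = []
    for _ in range(n_iter):
        eta = [flexor_tilt(d, gamma, theta) for _, _, d in data]          # Step I
        history.append((sample_ess(weights_for(data, gamma, theta, eta)), gamma))
        gamma = step_two_gamma(data, theta, eta, n_studies)                # Step II
    return history

# Simulated participants (assumption): 2 studies x 2 groups, one covariate,
# with memberships drawn from a made-up multinomial-logit MPS.
random.seed(1)
combos = [(0, 0), (0, 1), (1, 0), (1, 1)]
coefs = [0.0, 1.5, -1.0, 0.5]
data = []
for _ in range(400):
    x = random.uniform(-1.0, 1.0)
    expo = [math.exp(c * x) for c in coefs]
    delta = {sz: e / sum(expo) for sz, e in zip(combos, expo)}
    s, z = random.choices(combos, weights=[delta[sz] for sz in combos])[0]
    data.append((s, z, delta))

theta = [0.8, 0.2]
history = two_step(data, theta, n_studies=2)
ess_start, _ = history[0]
ess_best, gamma_best = max(history, key=lambda h: h[0])
```

Because Step I targets the theoretical rather than the sample ESS, the sample ESS need not increase at every iteration, so the sketch keeps the whole history and reports the visited pseudo-population with the largest sample ESS.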

In our experience, convergence is attained within only a few iterations. The converged pseudo-population with the largest ESS yields the FLEXOR pseudo-population. The following theorem gives the analytical expression for the global maximum of $\mathcal{Q}(\gamma, \theta, \eta)$ over $\eta$ mentioned in Step I. See Web Appendix A.1 of Supplementary Materials for the proof.

Theorem 1

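Suppose probability vectors $\gamma$ and $\theta$ have strictly positive elements and are held fixed. Let $\mathcal{H}$ be the set of tilting functions for which the ESS, $\mathcal{Q}(\gamma, \theta, \eta)$, of pseudo-population (2) is finite. Maximizing $\mathcal{Q}(\gamma, \theta, \eta)$ over all tilting functions $\eta \in \mathcal{H}$, the optimal fixed-$(\gamma, \theta)$ pseudo-population’s tilting function, denoted by $\eta^{(\gamma, \theta)}$, has the expression:

$$\eta^{(\gamma, \theta)}(x) = \Big\{ \sum_{s=1}^{J} \sum_{z=1}^{K} \frac{\gamma_s^2\, \theta_z^2}{\delta_{sz}(x)} \Big\}^{-1}. \tag{4}$$

The unnormalized weight function for the optimal fixed-$(\gamma, \theta)$ pseudo-population is then

$$\tilde\rho^{(\gamma, \theta)}(s, z, x) = \frac{\gamma_s\, \theta_z}{\delta_{sz}(x)} \Big\{ \sum_{s'=1}^{J} \sum_{z'=1}^{K} \frac{\gamma_{s'}^2\, \theta_{z'}^2}{\delta_{s'z'}(x)} \Big\}^{-1}. \tag{5}$$

The optimal fixed-$(\gamma, \theta)$ pseudo-population’s balancing weights, evaluated as in Equation 3, are uniformly bounded. The ESS of the optimal fixed-$(\gamma, \theta)$ pseudo-population is $N\, E_+[\eta^{(\gamma, \theta)}(X)]$ with the expectation taken over $f_+$, the observed population’s covariate density.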

It can be shown that the optimal tilting function $\eta^{(\gamma, \theta)}$ apportions low importance to outlying regions of covariate space where $\delta_{sz}(x)$ is approximately 0 for some $(s, z)$. Furthermore, the optimal tilting function emphasizes covariate regions where the group propensities match the group proportions of the larger natural population. In particular, in pseudo-populations with equally prevalent groups, the tilting function promotes covariate regions where the group propensities are approximately equal.
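The optimality claim can be checked numerically on a small discrete covariate distribution, where the theoretical ESS is an exact finite sum. The sketch assumes the closed form $\eta(x) = \{\sum_{s,z} \gamma_s^2 \theta_z^2 / \delta_{sz}(x)\}^{-1}$ for the optimal tilting function and the identity $E_+[\rho^2] = E_+[\eta^2(X)\, a(X)] / \{E_+[\eta(X)]\}^2$ with $a(x) = \sum_{s,z} \gamma_s^2 \theta_z^2 / \delta_{sz}(x)$, both of which follow from the weight definition as we read it. The support points, probabilities, and names are invented, and the three tilting functions are compared at a common $(\gamma, \theta)$.

```python
def spread(delta, gamma, theta):
    """a(x) = sum over (s, z) of gamma_s^2 * theta_z^2 / delta_sz(x)."""
    return sum((gamma[s] * theta[z]) ** 2 / p for (s, z), p in delta.items())

def percent_ess(f, deltas, gamma, theta, tilt):
    """100 / E_+[rho^2] for a discrete covariate distribution f."""
    first = 0.0    # E_+[eta(X)]
    second = 0.0   # E_+[eta(X)^2 * a(X)]
    for fx, d in zip(f, deltas):
        eta = tilt(d)
        first += fx * eta
        second += fx * eta * eta * spread(d, gamma, theta)
    return 100.0 * first * first / second

# Hypothetical population: 3 covariate values, 2 studies x 2 groups.
f = [0.5, 0.3, 0.2]
deltas = [
    {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.30, (1, 1): 0.20},
    {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25},
    {(0, 0): 0.05, (0, 1): 0.45, (1, 0): 0.10, (1, 1): 0.40},
]
gamma = [0.5, 0.5]
theta = [0.8, 0.2]

tilts = {
    "IC": lambda d: 1.0,
    "IGO": lambda d: 1.0 / sum(1.0 / p for p in d.values()),
    "FLEXOR": lambda d: 1.0 / spread(d, gamma, theta),
}
pct = {name: percent_ess(f, deltas, gamma, theta, t) for name, t in tilts.items()}

# Largest optimal unnormalized weight relative to its bound 1 / (gamma_s * theta_z).
max_ratio = max(
    (gamma[s] * theta[z]) ** 2 / (p * spread(d, gamma, theta))
    for d in deltas for (s, z), p in d.items()
)
```

The last quantity illustrates the boundedness statement: each optimal unnormalized weight is at most $1/(\gamma_s \theta_z)$, so `max_ratio` cannot exceed 1.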

3. META-ANALYSES OF GROUP POTENTIAL OUTCOMES

Causal meta-analyses generally follow a 2-stage inferential procedure (eg, Rubin, 2007). In Stage 1, the “outcome free” analysis only utilizes covariate information to estimate the PSs, as done in Section 2. In Stage 2, for the pseudo-population of interest, the procedure makes unconfounded comparisons of group potential outcomes via estimands such as pairwise difference of group means. For any known pseudo-population belonging to family (2), the procedure accommodates wide-ranging group-level features of the endpoints using the available multivariate outcome information. Additionally, we derive analytical expressions for the asymptotic variances of the proposed multivariate estimators.

Suppose potential outcome vectors $Y^{(1)}, \ldots, Y^{(K)}$ have a common support, $\mathcal{Y} \subset \mathbb{R}^L$. To ensure that SUTVA, unconfoundedness, and positivity of the observed population also hold for the pseudo-population, we assume identical conditional distributions:

$$[Y^{(1)}, \ldots, Y^{(K)} \mid S = s, Z = z, X = x] = [Y^{(1)}, \ldots, Y^{(K)} \mid S = s, Z = z, X = x]_+ \quad \text{for all } (s, z, x), \tag{6}$$

where $[\cdot \mid \cdot]_+$ and $[\cdot \mid \cdot]$ denote the observed and pseudo-population conditional densities, respectively. Unlike the observed population, the covariate-balanced pseudo-population entails $[Y^{(z)} \mid Z = z] = [Y^{(z)}]$, enabling us to construct weighted estimators of various features of the pseudo-population potential outcomes.

Let $E[\cdot]$ denote expectations with respect to the pseudo-population. Let $\phi_1, \ldots, \phi_M$ be real-valued functions having domain $\mathcal{Y}$. We wish to infer pseudo-population means of transformed potential outcomes, $E[\phi_m(Y^{(z)})]$ for $m = 1, \ldots, M$ and $z = 1, \ldots, K$. Appropriate choices of $\phi_m$ correspond to pseudo-population inferences about group-specific marginal means, medians, variances, and CDFs of potential outcome components. Equivalently, writing $\Phi = (\phi_1, \ldots, \phi_M)'$, the inferential focus is the vector, $\lambda^{(z)} = E[\Phi(Y^{(z)})]$.

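For real-valued functions $\psi$ with domain $\mathbb{R}^M$, we estimate $\psi(\lambda^{(z)})$. For example, if the first 2 components of $Y^{(z)} = (Y_1^{(z)}, \ldots, Y_L^{(z)})$ are quantitative, then defining $\phi_1(y) = y_1$, $\phi_2(y) = y_2$, $\phi_3(y) = y_1 y_2$, and $\psi(t_1, t_2, t_3) = t_3 - t_1 t_2$, we obtain $\psi(\lambda^{(z)})$ as the pseudo-population covariance of $Y_1^{(z)}$ and $Y_2^{(z)}$ in the $z$th group. The pseudo-population correlation of pairwise components of $Y^{(z)}$ can be estimated from estimates of the covariance and standard deviations, as in the motivating breast cancer studies, where the goal is to estimate the pairwise correlations of the 8 targeted genes in groups $z = 1, 2$ (ie, IDC and ILC subtypes). For a second example, let $c_1 < \cdots < c_M$ be a fine grid of prespecified points in the support of the first component and $\phi_m(y) = I(y_1 \le c_m)$. For $m = 1, \ldots, M$, the pseudo-population CDF of $Y_1^{(z)}$ evaluated at $c_m$ equals $\lambda_m^{(z)}$. Similarly, for $\psi(\lambda^{(z)}) = c_{m^*}$ where $m^* = \arg\min_m |\lambda_m^{(z)} - 1/2|$, the approximate pseudo-population median of $Y_1^{(z)}$ is given by $\psi(\lambda^{(z)})$.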

Using the unnormalized weights $\tilde\rho_1, \ldots, \tilde\rho_N$ [defined underneath Equation 3] of a pseudo-population, we estimate $\lambda^{(z)}$ as random vector

(7)
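A weighted moment estimator of this kind can be sketched as below. The sketch assumes a self-normalized (Hájek-type) weighted average over the participants observed in group $z$; the paper's Equation 7 may normalize differently, for instance through the prescribed group prevalence, so this should be read as one plausible form rather than the authors' estimator. The function names and the data are hypothetical. The two helper functions reproduce the examples in the text: a correlation built from first and second moments, and an approximate median read off a grid of weighted CDF values.

```python
import math

def weighted_moments(phis, outcomes, groups, weights, z):
    """Self-normalized weighted means of phi_m(Y_i) over participants in group z."""
    num = [0.0] * len(phis)
    den = 0.0
    for y, g, w in zip(outcomes, groups, weights):
        if g == z:
            den += w
            for m, phi in enumerate(phis):
                num[m] += w * phi(y)
    return [v / den for v in num]

def correlation(outcomes, groups, weights, z):
    """Weighted correlation of the first 2 outcome components in group z."""
    phis = [lambda y: y[0], lambda y: y[1], lambda y: y[0] * y[1],
            lambda y: y[0] ** 2, lambda y: y[1] ** 2]
    m1, m2, m12, m11, m22 = weighted_moments(phis, outcomes, groups, weights, z)
    cov = m12 - m1 * m2
    return cov / math.sqrt((m11 - m1 * m1) * (m22 - m2 * m2))

def approx_median(grid, outcomes, groups, weights, z):
    """Grid point whose weighted CDF value is closest to 1/2 (first component)."""
    phis = [lambda y, c=c: 1.0 if y[0] <= c else 0.0 for c in grid]
    cdf = weighted_moments(phis, outcomes, groups, weights, z)
    best = min(range(len(grid)), key=lambda m: abs(cdf[m] - 0.5))
    return grid[best]

# Made-up bivariate outcomes, group labels, and unnormalized weights.
outcomes = [(1.0, 2.0), (3.0, 6.0), (5.0, 1.0), (2.0, 2.0)]
groups = [0, 0, 0, 1]
weights = [1.0, 1.0, 2.0, 1.0]

mean_y1 = weighted_moments([lambda y: y[0]], outcomes, groups, weights, 0)[0]
median_y1 = approx_median([1.0, 3.0, 5.0], outcomes, groups, weights, 0)
corr_01 = correlation(outcomes, groups, weights, 0)
# A perfectly linear pair should give correlation 1 whatever the weights.
corr_linear = correlation([(1.0, 2.0), (3.0, 6.0)], [0, 0], [1.0, 3.0], 0)
```

Any other feature expressible as a smooth function of weighted moments (standard deviations, ratios of means, coefficients of variation) can be obtained by passing different functions to `weighted_moments`.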

The following theorem and corollaries study asymptotic properties of random vector $\bar\Phi_z$, the estimator defined in Equation 7, as an estimator of multivariate feature $\lambda^{(z)}$. Part 2(a) of the theorem considers a simpler situation where the MPS is known. As discussed in Mao et al. (2019) and Zeng et al. (2023), Part 2(b) considers a more realistic situation in which the MPS is estimated. The proofs are available in Web Appendix A.2 of Supplementary Materials.

Theorem 2

Let and , respectively, denote expectations with respect to the observed population and a pseudo-population of the form (2). Let observed probability be strictly positive for study . Suppose the conditional distributions of the potential outcomes are unconfounded, as described in Section 2, and satisfy assumption (6). Suppose the multi-study balancing weight (3) is such that is finite. For , let be a real-valued function with domain such that is finite. For group , interest focuses on the pseudo-population moment, , also denoted by vector . For estimator defined in Equation 7, as :

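Consistency: The estimator defined in Equation 7 converges in probability to the pseudo-population moment of interest.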

Asymptotic normality: Consider the following situations:

Known MPS: Suppose multiple PS (1) is known. Let variance matrix
Then .

Estimated MPS: Suppose the MPS is estimated using multinomial logistic regression as outlined after definition (1). Let be the maximum likelihood estimate (MLE) of parameter that determines the unnormalized weights in estimator (7). We denote the variance matrix of Part 2(a) by to make explicit its dependence on . Define variance matrix , where is given in Web Appendix A.2 of Supplementary Materials. Then, .

Corollary 1

Suppose the MPS is estimated using multinomial logistic regression. Let be a real-valued differentiable function with domain . Let denote the gradient vector of length at . With , suppose gradient vector is non-zero at . With variance matrix defined as in Part 2(b) of Theorem 2, set . Then, is an asymptotically normal estimator of :

Remark. Theorem 2 and its corollaries summarize several noteworthy features of estimator (7), in that it (i) is applicable to the balancing weights of any pseudo-populations, including IC, IGO, and FLEXOR weights; (ii) generalizes plug-in sample moment estimators (Li and Li, 2019) to multiple groups and studies while accommodating mixed-type multivariate outcomes; (iii) exploits known or researcher-supplied information about the group proportions of the pseudo-population; as mentioned, the FLEXOR weights typically set $\theta$ equal to the known group prevalences of the larger population, whereas $\theta_z = 1/K$ for most other weighting methods; and (iv) extends Mao et al. (2019) by quantifying the sampling errors in multiple group settings; the additional matrix in Part 2(b) represents the adjustment due to MPS estimation, and in the event that the MPS parameter $\omega$ is known, this adjustment term vanishes and the variance matrix of Part 2(b) equals that of Part 2(a).

Group comparisons. Consider estimation of the pseudo-population moment Inline graphic using . Applying standard results (eg, Johnson et al., 2002, Chapter 5), we can construct approximate % confidence intervals simultaneously for all possible linear combinations of . In particular, for large , using an estimate, , of the th diagonal element of the variance matrix defined in Theorem 2, the interval Inline graphic contains with approximate probability simultaneously for all scalars . Various pseudo-population features can then be compared between the groups. Writing , we could estimate (eg, average difference between the th gene’s mRNA expression levels for IDC and ILC breast cancer patients) and, when Inline graphic , (eg, for the th gene, average difference between the mRNA expression levels for a reference group and the average of the other groups). We could also estimate ratios of means such as , ratios of mean differences such as , group-specific standard deviations, percentiles, ratios of medians, and ratios of coefficients of variation. Under mild conditions, these estimators are consistent and asymptotically normal, and their asymptotic variances are available by applying Corollary 1 and the delta method. If Inline graphic is small for some groups, such as rare or undersampled treatments, the asymptotic confidence intervals may not have proper coverage, and we could employ bootstrap methods to construct confidence intervals.
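The delta-method step for group comparisons can be illustrated with a short sketch. It assumes we already hold a vector of estimated group-specific moments and an estimate of the covariance matrix of that vector (that is, the asymptotic variance matrix already divided by $N$); both inputs below are invented numbers, and the function name `delta_method_ci` is ours. The interval is a pointwise Wald interval, not the simultaneous interval discussed in the text.

```python
import math

def delta_method_ci(estimate, cov, g, grad, z_crit=1.96):
    """Point estimate, variance, and Wald interval for g(estimate).

    cov is the estimated covariance matrix of `estimate`, and grad
    returns the gradient of g evaluated at its argument.
    """
    value = g(estimate)
    d = grad(estimate)
    var = sum(d[i] * cov[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    half = z_crit * math.sqrt(var)
    return value, var, (value - half, value + half)

# Hypothetical estimated means of one gene in groups 1 and 2, with a
# hypothetical (diagonal) covariance matrix of the two estimators.
means = [4.0, 2.0]
cov = [[0.04, 0.0], [0.0, 0.01]]

# Difference of means: gradient (1, -1).
diff, diff_var, diff_ci = delta_method_ci(
    means, cov, lambda m: m[0] - m[1], lambda m: [1.0, -1.0])

# Ratio of means: gradient (1 / m2, -m1 / m2^2).
ratio, ratio_var, ratio_ci = delta_method_ci(
    means, cov, lambda m: m[0] / m[1], lambda m: [1.0 / m[1], -m[0] / m[1] ** 2])
```

When a group is small, the same point estimates can be recomputed on bootstrap resamples and the interval taken from their empirical percentiles, in line with the bootstrap alternative mentioned above.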

In single studies ($J = 1$), Hirano et al. (2003) and Zeng et al. (2023) have shown that treating IPWs as known counter-intuitively overestimates the variance of pairwise group mean comparisons. However, with multiple studies and arbitrary functions of group-specific features, this is not generally guaranteed because the adjustment matrix of Theorem 2 may be neither positive nor negative definite for a general pseudo-population (2).

4. SIMULATION STUDY

We used simulated datasets to evaluate different weighting strategies for inferring the population-level features of 2 subject groups and assessed the accuracy of the Section 3 asymptotic variance expression for the mean group differences. Mimicking the motivating TCGA breast cancer studies, we simulated 500 independent datasets, each consisting of $J = 4$ observational studies, $K = 2$ groups, and $L = 1$ (ie, univariate) outcomes for $N$ subjects whose covariate vectors were sampled with replacement from the TCGA breast cancer patients. We first took $N = 500$ subjects in 2 simulation scenarios, labeled “high” and “low,” to represent the relative degrees of covariate similarity or balance among the $JK = 8$ study-group combinations; in other words, the low similarity scenario represented higher confounding levels. We then applied the Section 3 procedure to meta-analyze the 4 studies in each artificial dataset. Additionally, by increasing $N$ from 125 to 250, and then to 500 subjects, we compared the asymptotic and bootstrap-based variances of the group mean difference, $\mu^{(1)} - \mu^{(2)}$, where $\mu^{(z)} = E[Y^{(z)}]$.

As a common initial step to all 500 artificial datasets, we performed k-means clustering of the covariates of the TCGA datasets and detected lower-dimensional structure by aggregating them into clusters, each with its own center and allocated number of covariate vectors. Independently for the 500 artificial datasets comprising $N$ patients each, we generated the data as follows:

Natural population: Generate the cluster relative weights from a Dirichlet distribution on the unit simplex, parameterized through the vector of ones. For each patient in the large natural population, sample a cluster membership from the mixture of point masses on the cluster labels defined by these relative weights. Thence, pick the patient's covariate vector uniformly from the TCGA covariates allocated previously to the sampled k-means cluster. Generate the “known” relative group proportions in the natural population for the $K = 2$ groups. Fix the association between group memberships and covariates through a parameter that equals 1 and 0.1 in the high and low similarity scenarios, respectively, with a second constant chosen so that the associated quantity, averaged over the natural population, equals a prescribed value.
Covariates: For subject $i = 1, \ldots, N$, sample covariate vector $x_i$ with replacement from the TCGA covariate vectors.
Study and group memberships: Study $s_i$ and group $z_i$ were generated as follows:
1. Multiple PS: Define the group-specific study propensities, $[S = s \mid Z = z, X = x]_+$, for $s = 1, \ldots, 4$ and $z = 1, 2$, in terms of a similarity parameter that we set equal to 0.5 and 0.05 in the high and low similarity scenario, respectively. Assuming the same group PS as the natural population, the MPS is available as $\delta_{sz}(x) = [S = s \mid Z = z, X = x]_+ \, [Z = z \mid X = x]_+$. For patient $i$, evaluate their probability vector $\{\delta_{sz}(x_i)\}$.
2. Study-group memberships: For patient $i$, generate $(s_i, z_i)$ from the categorical distribution with parameter $\{\delta_{sz}(x_i)\}$.
Subject-specific observed outcomes: Generate the observed outcome $Y_i$, with the outcome model calibrated to achieve an approximate $R$-squared of 0.9.

Subsequently, we disregarded knowledge of all simulation parameters and analyzed each artificial dataset using the proposed methods. As discussed in Section 3, during Stage 1 of the inferential procedure, we estimated the MPS of each dataset. We then evaluated the unnormalized balancing weights, $\tilde\rho_1, \ldots, \tilde\rho_N$, for the IC, IGO, and FLEXOR pseudo-populations. The computational costs of evaluating the FLEXOR weights were negligible.

Define percent ESS as the ESS per 100 participants. For 500 subjects, Table 1 presents summaries of the percent ESS of the FLEXOR, IGO, and IC pseudo-populations in the low and high similarity scenarios. Unsurprisingly, all 3 pseudo-populations had substantially higher ESS in the less challenging high similarity scenario, in which the covariates were almost balanced even before the weighting methods were applied. In both scenarios, the IC and IGO pseudo-populations had similar ESS, with a median percent ESS of approximately 32% (74%) in the low (high) similarity scenario. The FLEXOR pseudo-population had substantially higher ESS in every dataset and scenario, with a median percent ESS of 87.26% (95.19%) in the low (high) similarity scenario, corresponding to 436.3 and 475.95 of 500 subjects, respectively.
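As a rough illustration of the percent ESS, the sketch below uses the common Kish formula, ESS = (sum of weights)² / (sum of squared weights). This is an assumption: the paper's exact ESS definition is given in an earlier section and may differ in detail. The function name `percent_ess` is hypothetical.

```python
import numpy as np

def percent_ess(weights):
    """Kish effective sample size, expressed per 100 participants."""
    w = np.asarray(weights, dtype=float)
    ess = w.sum() ** 2 / np.sum(w ** 2)
    return 100.0 * ess / w.size

# Equal weights lose no information, so the percent ESS is 100.
equal = percent_ess(np.ones(500))

# Concentrating all weight on half of the subjects halves the ESS.
half = percent_ess([1.0, 1.0, 0.0, 0.0])

# Converting a reported median percent ESS back to a subject count.
flexor_low_subjects = 87.26 / 100 * 500
```

The last line reproduces the arithmetic behind the 436.3 subjects quoted for the FLEXOR pseudo-population in the low similarity scenario.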

TABLE 1.

For the 500 simulated datasets, percentage ESS summaries for the 3 pseudo-populations in the low and high similarity scenarios with 500 subjects.

	Low similarity			High similarity
	FLEXOR	IGO	IC	FLEXOR	IGO	IC
Minimum	78.37	20.52	20.61	85.81	30.80	30.62
First quartile	85.51	29.91	29.83	94.08	55.67	55.61
Median	87.26	32.07	31.92	95.19	73.73	73.70
Mean	87.20	31.97	31.87	95.03	70.21	69.90
Third quartile	89.02	34.46	34.50	96.17	86.09	85.62
Maximum	93.92	42.56	42.87	98.59	94.52	94.61

Abbreviations: ESS, effective sample size; FLEXOR, FLEXible, Optimized, and Realistic; IC, integrative combined; and IGO, integrative generalized overlap.

We applied the Section 3 strategy to make weighted inferences about functionals of the group-specific means Inline graphic and standard deviations of the th group’s potential outcomes. The sufficient conditions of Theorem 2 and its corollaries are satisfied by the potential outcome features and pseudo-populations. Since the estimands depend on the pseudo-population, we evaluated each estimator’s accuracy relative to the true value of its corresponding estimand computed using Monte Carlo methods.

For various estimands and both similarity scenarios, and averaging over the 500 artificial datasets comprising Inline graphic subjects each, Table 2 displays the absolute biases, variances, and coverages of nominally 95% confidence intervals for the FLEXOR, IGO, and IC pseudo-populations. For each artificial dataset and weighting method, the 3 performance measures were estimated using 500 independent bootstrap samples. For each potential outcome feature (row), and separately for the absolute bias and variance performance measures (3-column block), a pseudo-population (column) is marked in bold if it significantly outperforms the other competing pseudo-populations. In general, the IGO and IC weights had comparable performances for these data. The 3 methods had somewhat similar accuracies and reasonable coverages in the high similarity scenario, where the covariates were almost balanced across the study-group combinations. However, in the more realistic and challenging low similarity simulation scenario, the best results typically corresponded to the FLEXOR pseudo-population, which often substantially outperformed the other methods. Somewhat unexpectedly, this included the mean group difference Inline graphic , for which IGO weights are theoretically optimal under additional assumptions such as homoscedasticity (see Li and Li, 2019, for single studies); the simulation mechanism did not comply with these sufficient conditions. The results demonstrate the advantages of the FLEXOR strategy, which focuses on stabilizing the balancing weights rather than inferences about specific estimands.
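The bootstrap step described above can be illustrated for one estimand, the weighted mean group difference. This is a simplified sketch under stated assumptions: the weights are treated as fixed and are resampled together with the subjects, whereas a full analysis would re-estimate them within each bootstrap sample. The function names and the toy data are hypothetical.

```python
import numpy as np

def weighted_mean_difference(y, group, w):
    """Difference of normalized weighted means between groups 1 and 2."""
    m1 = np.average(y[group == 1], weights=w[group == 1])
    m2 = np.average(y[group == 2], weights=w[group == 2])
    return m1 - m2

def bootstrap_summary(y, group, w, n_boot=500, seed=0):
    """Bootstrap variance and 95% percentile interval of the estimator."""
    rng = np.random.default_rng(seed)
    n = y.size
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        stats[b] = weighted_mean_difference(y[idx], group[idx], w[idx])
    lower, upper = np.percentile(stats, [2.5, 97.5])
    return stats.var(ddof=1), (lower, upper)

# Toy data: two equally sized groups whose true means differ by 1.
rng = np.random.default_rng(1)
n = 500
group = np.tile([1, 2], n // 2)
y = rng.normal(loc=np.where(group == 1, 1.0, 0.0))
w = rng.uniform(0.5, 1.5, size=n)
variance, (lower, upper) = bootstrap_summary(y, group, w)
```

Coverage is then the fraction of simulated datasets whose interval contains the true value of the estimand.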

TABLE 2.

In the 2 simulation scenarios with Inline graphic subjects each, averaging over 500 artificial datasets, the absolute biases, variances, and 95% confidence interval coverages of some potential outcome features for the FLEXOR, IGO, and IC pseudo-populations. For each artificial dataset and weighting method, the 3 performance measures were estimated using 500 independent bootstrap samples. The estimated standard errors are displayed in parentheses. For each feature (row), and separately for the absolute bias and variance measures (3-column block), a weighting strategy (column) is marked in bold if it significantly outperforms the other 2 strategies.

	Absolute bias × 10²			Standard deviation × 10			Coverage (%)
Estimand	FLEXOR	IGO	IC	FLEXOR	IGO	IC	FLEXOR	IGO	IC
Low similarity scenario
	2.9 (0.1)	4.1 (0.1)	3.8 (0.1)	2.9 (0.0)	3.6 (0.0)	3.4 (0.0)	97	93	95
	4.5 (0.1)	8.4 (0.2)	6.6 (0.1)	4.6 (0.1)	6.5 (0.1)	5.7 (0.1)	98	88	89
	2.6 (0.1)	3.0 (0.1)	3.1 (0.1)	2.6 (0.1)	2.6 (0.0)	2.8 (0.1)	95	90	90
	4.4 (0.1)	6.0 (0.1)	6.2 (0.1)	3.8 (0.1)	4.5 (0.1)	4.8 (0.1)	93	89	89
	4.6 (0.1)	7.9 (0.2)	7.4 (0.2)	4.4 (0.1)	6.1 (0.0)	6.1 (0.0)	96	89	90
High similarity scenario
	2.8 (0.1)	3.3 (0.1)	2.7 (0.1)	2.8 (0.0)	3.2 (0.0)	2.8 (0.0)	97	96	96
	4.4 (0.1)	5.7 (0.1)	4.2 (0.1)	4.5 (0.0)	5.5 (0.0)	4.6 (0.0)	97	95	97
	3.0 (0.1)	2.9 (0.1)	2.7 (0.1)	2.2 (0.0)	2.3 (0.0)	2.5 (0.0)	94	94	94
	6.1 (0.2)	5.9 (0.1)	5.2 (0.1)	3.9 (0.0)	4.1 (0.1)	4.5 (0.1)	93	94	94
	4.3 (0.1)	4.7 (0.1)	4.5 (0.1)	4.1 (0.0)	4.4 (0.0)	4.4 (0.0)	97	95	96

Abbreviations: FLEXOR, FLEXible, Optimized, and Realistic; IC, integrative combined; and IGO, integrative generalized overlap.

Finally, we compared the bootstrap-based and asymptotic variances of estimator (7) for unconfounded inferences about the mean group difference, Inline graphic . For an increasing number of subjects, that is, , 250, and 500, we generated 500 artificial datasets in the high and low similarity scenarios. For any dataset, the asymptotic variance of weighted estimator is available by applying Theorem 2 and the subsequently discussed group comparison strategies. This theoretical limiting value can be compared to the variance estimate based on Inline graphic bootstrap samples. Web Table 1 of Supplementary Materials compares these numbers for the simulation scenarios and sample sizes. We find that when the sample size is relatively small (ie, ), there is a substantial difference between the asymptotic and bootstrap-based variances. This difference indicates that a sufficiently large number of samples may be required for the asymptotic variance to be reliable. However, for Inline graphic subjects, the 2 variances match very well, giving us the confidence to use asymptotic variances in the TCGA data analysis with a comparable number of patients.
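To make the comparison concrete, the sketch below contrasts a bootstrap variance with a large-sample approximation for the simplest case, a normalized weighted mean with fixed weights. The linearization formula used here is the standard one for a ratio estimator and is only a stand-in; it is not the asymptotic variance of Theorem 2, which also accounts for the group comparison. All names and the toy data are hypothetical.

```python
import numpy as np

def hajek_mean(y, w):
    """Normalized weighted mean."""
    return np.sum(w * y) / np.sum(w)

def linearized_variance(y, w):
    """Large-sample variance approximation, treating the weights as fixed."""
    mu = hajek_mean(y, w)
    return np.sum((w * (y - mu)) ** 2) / np.sum(w) ** 2

def bootstrap_variance(y, w, n_boot=500, seed=0):
    """Variance of the estimator across bootstrap resamples of the subjects."""
    rng = np.random.default_rng(seed)
    n = y.size
    est = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        est[b] = hajek_mean(y[idx], w[idx])
    return est.var(ddof=1)

# Toy comparison at two of the sample sizes mentioned in the text.
rng = np.random.default_rng(1)
for n in (250, 500):
    y = rng.normal(size=n)
    w = rng.uniform(0.5, 1.5, size=n)
    print(n, linearized_variance(y, w), bootstrap_variance(y, w))
```

For well-behaved weights the two numbers should be close at these sample sizes; a large gap suggests that the large-sample approximation is not yet reliable.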

5. DATA ANALYSIS

To understand breast cancer oncogenesis, we analyzed the 7 motivating TCGA studies using mRNA expression measurements on 8 targeted genes and demographic and clinicopathological covariates for patients. The participants are partitioned into 2 groups determined by the cancer subtypes IDC and ILC, which constitute approximately 80% and 10% of US breast cancer cases (Tran, 2022; Wright, 2022); the study-specific percentages in Web Table 2 of Supplementary Materials are significantly different.

The percent ESS of the IC weights was 25.7%, or 115.7 patients. The IGO weights had a similar percent ESS of 26.4%, or 118.7 patients. The FLEXOR pseudo-population had a higher percent ESS of 40.9%, or 183.9 patients, while also guaranteeing that the weight-adjusted composition of IDC and ILC patients in each TCGA study matched the composition of US breast cancer patients. Applying the Section 3 procedure, we estimated population-level functionals of the group potential outcomes for the FLEXOR, IC, and IGO pseudo-populations. For example, for the Inline graphic th biomarker, the group-specific mean and standard deviation were estimated by setting in Theorem 2 and in Corollary 1. The median was estimated by first estimating the CDF of the potential outcome on a fine grid of points. Group comparison estimands like and were estimated by applying appropriately defined functionals to the estimates of Inline graphic , , , and . The estimate and 95% confidence interval based on bootstrap samples are displayed in Table 3 for each feature (row), pseudo-population (column), and genes COL9A3, CXCL12, IGF1, and ITGA11 (block). The results for the genes IVL, LEF1, PRB2, and SMR3B are displayed in Web Table 3 of Supplementary Materials. For each gene-estimand combination, a confidence interval for the IC or IGO pseudo-population is marked in bold whenever the FLEXOR pseudo-population’s confidence interval was narrower; we find that the FLEXOR pseudo-population often provided the most precise (narrowest) confidence intervals.
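The feature estimates described above can be sketched for a single group and biomarker. This is a minimal illustration, assuming the balancing weights are already available: the mean and standard deviation come from weighted first and second moments, and the median is read off a weighted empirical CDF evaluated on a grid. The function names and grid size are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def weighted_mean_sd(y, w):
    """Weighted mean and standard deviation from first and second moments."""
    mean = np.average(y, weights=w)
    second_moment = np.average(y ** 2, weights=w)
    return mean, np.sqrt(second_moment - mean ** 2)

def weighted_median(y, w, n_grid=1001):
    """Smallest grid point at which the weighted empirical CDF reaches 0.5."""
    grid = np.linspace(y.min(), y.max(), n_grid)
    cdf = np.array([np.sum(w[y <= t]) for t in grid]) / np.sum(w)
    return grid[np.argmax(cdf >= 0.5)]

# Toy example with equal weights.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.ones_like(y)
mean, sd = weighted_mean_sd(y, w)
median = weighted_median(y, w)
```

Group comparison estimands, such as a ratio of standard deviations, then follow by applying the corresponding function to the two groups' estimates. The accuracy of the median is limited by the grid spacing.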

TABLE 3.

For 4 targeted genes, estimates and 95% bootstrap confidence intervals (shown in parentheses) of different population-level estimands of the potential outcomes of group 1 (IDC cancer subtype, denoted by superscript 1) and group 2 (ILC cancer subtype, denoted by superscript 2) with FLEXOR, IC, and IGO weights. An IC or IGO confidence interval is highlighted in bold if it is wider than the FLEXOR confidence interval. All numbers are rounded to 2 decimal places. See Section 5 for further explanation.

Estimand	FLEXOR	IC	IGO
Gene COL9A3 ()
	(	()	(
	(	()	()
	1.03 (0.87, 1.26)	0.97 ()	0.93 ()
	0.68 (0.54, 0.86)	0.69 ()	0.69 ()
M	(	()	(
M	(	0.07 ()	()
	0.06 (	0.05 ()	0.09 ()
	1.52 (1.15, 2.09)	1.41 ()	1.35 ()
Gene CXCL12 ()
	(	()	0.02 ()
	0.59 (0.23, 0.88)	0.55 ()	0.58 ()
	0.91 (0.84, 1.16)	0.97 ()	0.94 ()
	0.80 (0.52, 1.10)	0.83 ()	0.82 ()
M	(	()	()
M	0.68 (0.44, 1.01)	0.69 ()	0.58 ()
	(	()	()
	1.14 (0.87, 1.99)	1.17 ()	1.14 ()
Gene IGF1 ()
	0.04 (	0.10 ()	0.13 ()
	0.82 (0.54, 1.09)	0.84 ()	0.82 ()
	0.81 (0.80, 1.12)	0.86 ()	0.87 ()
	0.76 (0.47, 0.95)	0.82 ()	0.76 ()
M	(	0.06 ()	0.10 ()
M	0.95 (0.60, 1.22)	0.95 ()	0.88 ()
	(	()	()
	1.06 (0.94, 2.03)	1.05 ()	1.14 ()
Gene ITGA11 ()
	0.01 (	0.03 ()	0.07 (
	0.01 (	()	0.07 ()
	0.92 (0.83, 1.10)	0.96 ()	0.94 ()
	0.81 (0.60, 1.03)	0.93 ()	0.98 ()
M	0.14 (	0.19 ()	0.20 ()
M	(	()	()
	0.01 (	0.05 ()	0.00 ()
	1.14 (0.89, 1.62)	1.03 ()	0.96 ()

Abbreviations: FLEXOR, FLEXible, Optimized, and Realistic; IC, integrative combined; IDC, infiltrating ductal carcinoma; ILC, infiltrating lobular carcinoma; IGO, integrative generalized overlap.

For FLEXOR, the confidence intervals for Inline graphic reveal that the mean potential outcomes were significantly different between the disease subtypes for genes CXCL12, IGF1, LEF1, PRB2, and SMR3B. Additionally, the standard deviations of the IDC and ILC potential outcomes for FLEXOR were substantially different for the genes COL9A3, PRB2, and IVL; the respective confidence intervals for Inline graphic excluded 1. If required, the group-specific medians could be compared by inferences on or .

Next, we estimated the correlation between the potential outcomes of the Inline graphic th and th biomarker in the th group: for and , we assumed an -variate function, , with component functions, , , and . For the th group, we estimated for a pseudo-population by applying Theorem 2. Setting , we then applied Corollary 1 to estimate pseudo-population covariance, . Using the estimated standard deviations Inline graphic and for the pseudo-population, as described above, we estimated the correlation. Independent estimates from bootstrap samples were used to compute 95% confidence intervals of the true correlation between the th and th gene pairs in the th group. Web Tables 4–6 of Supplementary Materials present 95% confidence intervals of the group-specific correlations for each gene pair and weighting method.
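The correlation calculation described above amounts to combining weighted first moments, second moments, and a cross moment. The sketch below illustrates this for one gene pair in one group, assuming the balancing weights are given; the function name is hypothetical and the code is not the authors' implementation.

```python
import numpy as np

def weighted_correlation(y1, y2, w):
    """Correlation built from weighted means, second moments, and cross moment."""
    m1 = np.average(y1, weights=w)
    m2 = np.average(y2, weights=w)
    cross_moment = np.average(y1 * y2, weights=w)
    covariance = cross_moment - m1 * m2
    sd1 = np.sqrt(np.average(y1 ** 2, weights=w) - m1 ** 2)
    sd2 = np.sqrt(np.average(y2 ** 2, weights=w) - m2 ** 2)
    return covariance / (sd1 * sd2)

# Toy check: an exact linear relationship gives correlation +1 or -1.
rng = np.random.default_rng(0)
y1 = rng.normal(size=200)
w = rng.uniform(0.5, 1.5, size=200)
r_pos = weighted_correlation(y1, 2.0 * y1 + 1.0, w)
r_neg = weighted_correlation(y1, -y1, w)
```

Repeating the calculation on bootstrap resamples gives the percentile confidence intervals used to decide whether a gene pair is significantly correlated.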

Table 4 lists the significantly correlated gene pairs for each disease subtype. For the FLEXOR pseudo-population and IDC disease subtype, gene CXCL12 was significantly co-expressed with IGF1, ITGA11, and LEF1; gene IGF1 was co-expressed with ITGA11 and LEF1; gene COL9A3 was co-expressed with LEF1 and PRB2; and gene LEF1 was co-expressed with IVL and ITGA11. For disease subtype ILC, only the CXCL12-IGF1 gene pair was significantly correlated according to FLEXOR. The differential correlation pattern for the FLEXOR pseudo-population was, therefore, the gene pairs (CXCL12, ITGA11), (IGF1, ITGA11), (COL9A3, LEF1), (CXCL12, LEF1), (IGF1, LEF1), (ITGA11, LEF1), (IVL, LEF1), and (COL9A3, PRB2). Detecting these variations in gene co-expression patterns between the IDC and ILC subtypes of breast cancer patients in the United States is crucial for informing precision medicine and targeted therapies (Schmidt et al., 2016).

TABLE 4.

Co-expressed gene pairs for each pseudo-population and breast cancer subtype.

Pseudo-population	Significantly correlated gene pairs
Infiltrating ductal carcinoma
FLEXOR	CXCL12-IGF1, CXCL12-ITGA11, IGF1-ITGA11,
	COL9A3-LEF1, CXCL12-LEF1, IGF1-LEF1,
	ITGA11-LEF1, IVL-LEF1, COL9A3-PRB2
IC	CXCL12-IGF1, CXCL12-ITGA11, IGF1-ITGA11,
	CXCL12-LEF1, IGF1-LEF1, COL9A3-PRB2
IGO	CXCL12-IGF1, CXCL12-ITGA11, IGF1-ITGA11,
	CXCL12-LEF1, IGF1-LEF1, COL9A3-PRB2
Infiltrating lobular carcinoma
FLEXOR	CXCL12-IGF1
IC	CXCL12-IGF1
IGO	CXCL12-IGF1

Abbreviations: FLEXOR, FLEXible, Optimized, and Realistic; IC, integrative combined; IGO, integrative generalized overlap.

By contrast, Table 4 shows that the differential correlation pattern of the IC pseudo-population comprised just 5 gene pairs, and was identical to the IGO pseudo-population’s pattern. Although these gene pairs were also detected by the FLEXOR pseudo-population, the latter detected additional co-expressed gene pairs. Figure 1 graphically summarizes the number of differentially correlated gene pairs discovered by the weighting methods and (biased) unadjusted analyses. Recent literature on breast cancer gene ontology substantiates the distinctive findings of FLEXOR. The genes IVL and LEF1 are highly expressed in basal and metaplastic human breast cancers, and the cell adhesion and extracellular matrix (ECM) receptor pathways, containing the genes ITGA11 and LEF1, are deregulated (Williams et al., 2022). The focal adhesion and cell cycle pathways, containing the genes COL9A3 and LEF1, are affected by WNT signaling gene set mutations caused by breast cancer metastases (Paul, 2020).

FIGURE 1.

Venn diagram of the differential correlation pattern of the targeted gene pairs for the 3 weighting methods and unweighted analysis.
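The differential correlation patterns can be recomputed directly from Table 4 with set operations. The sketch below takes the differential pattern to be the set of gene pairs that are significantly correlated in exactly one of the two subtypes, which is our reading of the text; the helper name `parse_pairs` is hypothetical.

```python
def parse_pairs(text):
    """Turn 'A-B, C-D' into a set of unordered gene pairs."""
    return {frozenset(pair.split("-")) for pair in text.split(", ")}

# Significantly correlated gene pairs, copied from Table 4.
shared_idc = "CXCL12-IGF1, CXCL12-ITGA11, IGF1-ITGA11, CXCL12-LEF1, IGF1-LEF1, COL9A3-PRB2"
idc = {
    "FLEXOR": parse_pairs(shared_idc + ", COL9A3-LEF1, ITGA11-LEF1, IVL-LEF1"),
    "IC": parse_pairs(shared_idc),
    "IGO": parse_pairs(shared_idc),
}
ilc = {name: parse_pairs("CXCL12-IGF1") for name in idc}

# Pairs significant in exactly one subtype (symmetric difference).
differential = {name: idc[name] ^ ilc[name] for name in idc}
counts = {name: len(pairs) for name, pairs in differential.items()}

# Pairs found only by FLEXOR.
extra = sorted("-".join(sorted(p)) for p in differential["FLEXOR"] - differential["IC"])
print(counts)  # {'FLEXOR': 8, 'IC': 5, 'IGO': 5}
print(extra)   # ['COL9A3-LEF1', 'ITGA11-LEF1', 'IVL-LEF1']
```

The counts agree with the 8 FLEXOR pairs listed in the text and the 5 pairs shared by the IC and IGO pseudo-populations.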

6. CONCLUSION

In multiple retrospective cohorts, the integrative analysis of mixed-type multivariate outcomes to accurately compare multiple groups is a challenging problem. We formulate new frameworks for covariate-balanced pseudo-populations that extend existing weighting methods to meta-analytical investigations and design a novel, estimand-agnostic FLEXOR pseudo-population that maximizes the ESS by a cost-effective iterative procedure. We propose generally applicable weighted estimators for a wide variety of population-level univariate or multivariate features relevant to multigroup comparisons, for example, correlation coefficients and contrasts and ratios of means, medians, and standard deviations.

The methodology has a wide range of meta-analytical applications, including multi-arm randomized controlled clinical trials (RCTs). A component of the multi-study balancing weights is considerably simplified if the Inline graphic th study is an RCT, in which case the study-specific group MPS equals . In general, the theoretical results hold for a mix of observational studies and RCTs, although the study MPS must still be estimated because the subject-study allocations are usually non-random for multiple studies. The methodology may be generalized in several other directions, such as increased efficiency by adding an outcome modeling component (Mao et al., 2019; Zeng et al., 2023); transportability (Westreich et al., 2017) and data-fusion (Bareinboim and Pearl, 2016; Dahabreh et al., 2020; 2023) problems, which incorporate additional information in the form of random samples from the natural population; and flexible machine learning for MPS estimation that achieves Inline graphic inference (Chernozhukov et al., 2018). In recent years, weighting approaches have also been challenged and rendered ineffectual by high-dimensional genetic or genomic measurements. Our future research will explore these avenues.

Supplementary Material

ujae070_Supplemental_Files

Web Appendices, Tables, Figures, and code referenced in Sections 1-5 are available with this paper at the Biometrics website on Oxford Academic.

ujae070_supplemental_files.zip (158.1 KB, zip)

Acknowledgement

We thank the Editor, Associate Editor, and two referees for many insightful comments that improved the content and presentation of the paper.

Contributor Information

Subharup Guha, Department of Biostatistics, University of Florida, Gainesville, FL 32603, United States.

Yi Li, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, United States.

FUNDING

This work was supported by the National Science Foundation and National Institutes of Health under award DMS-1854003 to SG, award CA249096 to YL, and awards CA269398 and CA209414 to SG and YL.

CONFLICT OF INTEREST

None declared.

DATA AVAILABILITY

The data that support the findings in this paper are openly available in The Cancer Genome Atlas (TCGA) portal at https://www.cancer.gov/tcga.

References

Austin P. C. (2010). The performance of different propensity-score methods for estimating differences in proportions (risk differences or absolute risk reductions) in observational studies. Statistics in Medicine, 29, 2137–2148.
Bareinboim E., Pearl J. (2016). Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113, 7345–7352.
Chernozhukov V., Chetverikov D., Demirer M., Duflo E., Hansen C., Newey W. et al. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21, C1–C68.
Christopoulos P. F., Msaouel P., Koutsilieris M. (2015). The role of the insulin-like growth factor-1 system in breast cancer. Molecular Cancer, 14, 1–14.
Crump R. K., Hotz V. J., Imbens G. W., Mitnik O. A. (2006). Moving the goalposts: addressing limited overlap in the estimation of average treatment effects by changing the estimand. Technical Report. National Bureau of Economic Research.
Dahabreh I. J., Petito L. C., Robertson S. E., Hernán M. A., Steingrimsson J. A. (2020). Toward causally interpretable meta-analysis: transporting inferences from multiple randomized trials to a new target population. Epidemiology, 31, 334–344.
Dahabreh I. J., Robertson S. E., Petito L. C., Hernán M. A., Steingrimsson J. A. (2023). Efficient and robust methods for causally interpretable meta-analysis: transporting inferences from multiple randomized trials to a target population. Biometrics, 79, 1057–1072.
Hirano K., Imbens G. W., Ridder G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71, 1161–1189.
Imbens G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika, 87, 706–710.
Johnson R. A., Wichern D. W. et al. (2002). Applied Multivariate Statistical Analysis. Upper Saddle River, NJ: Prentice Hall.
Kumar B., Chand V., Ram A., Usmani D., Muhammad N. (2020). Oncogenic mutations in tumorigenesis and targeted therapy in breast cancer. Current Molecular Biology Reports, 6, 116–125.
Li F., Li F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13, 2389–2415.
Li F., Morgan K. L., Zaslavsky A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113, 390–400.
Li L., Greene T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9, 215–234.
Lunceford J. K., Davidian M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23, 2937–2960.
McCaffrey D. F., Griffin B. A., Almirall D., Slaughter M. E., Ramchand R., Burgette L. F. (2013). A tutorial on propensity score estimation for multiple treatments using generalized boosted models. Statistics in Medicine, 32, 3388–3414.
Mao H., Li L., Greene T. (2019). Propensity score weighting analysis and treatment effect discovery. Statistical Methods in Medical Research, 28, 2439–2454.
NCI. (2022). Genomic data commons data portal. https://portal.gdc.cancer.gov/ [Accessed 18 February 2021].
Paul M. R. (2020). The Genomic Evolution of Breast Cancer Metastasis. PhD thesis. University of Pennsylvania.
Robins J. M., Hernan M. A., Brumback B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11, 550–560.
Robins J. M., Rotnitzky A. (1995). Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, 90, 122–129.
Rosenbaum P. R., Rubin D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.
Rubin D. B. (2007). The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials. Statistics in Medicine, 26, 20–36.
Schmidt K. T., Chau C. H., Price D. K., Figg W. D. (2016). Precision oncology medicine: the clinical relevance of patient-specific biomarkers used to optimize cancer treatment. The Journal of Clinical Pharmacology, 56, 1484–1499.
Surveillance Research Program and NCI. (2023). SEER*Explorer: an interactive website for SEER cancer statistics [Internet]. https://seer.cancer.gov/statistics-network/explorer/ [Accessed 18 February 2021].
Tran H.-T. (2022). Invasive lobular carcinoma. https://www.hopkinsmedicine.org/health/conditions-and-diseases/breast-cancer/invasive-lobular-carcinoma [Accessed 13 September 2022].
Wang C., Rosner G. L. (2019). A Bayesian nonparametric causal inference model for synthesizing randomized clinical trial and real-world evidence. Statistics in Medicine, 38, 2573–2588.
Westreich D., Edwards J. K., Lesko C. R., Stuart E., Cole S. R. (2017). Transportability of trial results using inverse odds of sampling weights. American Journal of Epidemiology, 186, 1010–1014.
Williams R., Jobling S., Sims A. H., Mou C., Wilkinson L., Collu G. M. et al. (2022). Elevated edar signalling promotes mammary gland tumourigenesis with squamous metaplasia. Oncogene, 41, 1040–1049.
Wright P. (2022). Invasive ductal carcinoma. https://www.hopkinsmedicine.org/health/conditions-and-diseases/breast-cancer/invasive-ductal-carcinoma-idc [Accessed 13 September 2022].
Zeng S., Li F., Hu L., Li F. (2023). Propensity score weighting analysis of survival outcomes using pseudo-observations. Statistica Sinica, 33, 2161–2184.
