American Journal of Epidemiology. 2018 Dec 7;188(4):709–723. doi: 10.1093/aje/kwy265

Validity of Privacy-Protecting Analytical Methods That Use Only Aggregate-Level Information to Conduct Multivariable-Adjusted Analysis in Distributed Data Networks

Xiaojuan Li 1, Bruce H Fireman 2, Jeffrey R Curtis 3, David E Arterburn 4, David P Fisher 5, Érick Moyneur 6, Mia Gallagher 1, Marsha A Raebel 7, W Benjamin Nowell 8, Lindsay Lagreid 9, Sengwee Toh 1
PMCID: PMC6438804  NIHMSID: NIHMS1002303  PMID: 30535131

Abstract

Distributed data networks enable large-scale epidemiologic studies, but protecting privacy while adequately adjusting for a large number of covariates continues to pose methodological challenges. Using 2 empirical examples within a 3-site distributed data network, we tested combinations of 3 aggregate-level data-sharing approaches (risk-set, summary-table, and effect-estimate), 4 confounding adjustment methods (matching, stratification, inverse probability weighting, and matching weighting), and 2 summary scores (propensity score and disease risk score) for binary and time-to-event outcomes. We assessed the performance of combinations of these data-sharing and adjustment methods by comparing their results with results from the corresponding pooled individual-level data analysis (reference analysis). For both types of outcomes, the method combinations examined yielded results identical or comparable to the reference results in most scenarios. Within each data-sharing approach, comparability between aggregate- and individual-level data analysis depended on adjustment method; for example, risk-set data-sharing with matched or stratified analysis of summary scores produced identical results, while weighted analysis showed some discrepancies. Across the adjustment methods examined, risk-set data-sharing generally performed better, while summary-table and effect-estimate data-sharing more often produced discrepancies in settings with rare outcomes and small sample sizes. Valid multivariable-adjusted analysis can be performed in distributed data networks without sharing of individual-level data.

Keywords: confounding control, data-sharing, disease risk score, distributed data networks, meta-analysis, multicenter studies, privacy protection, propensity score


Multicenter distributed data networks support rapid evidence generation in large and diverse populations, assessment of treatment effect heterogeneity, and evaluation of rare exposures or outcomes (1–3). Existing large-scale networks include the Sentinel System (4, 5), the Health Care Systems Research Collaboratory (6), and the National Patient-Centered Clinical Research Network (7). However, efficient and privacy-protecting data-sharing remains a challenge in distributed data network studies. To maximize analytical validity, researchers have traditionally requested detailed individual-level data to control for confounding and other biases. However, sharing detailed data about patients raises concerns about privacy. Even when participating organizations are open to sharing individual-level data, the required legal and contractual agreements and ethical reviews are often labor-intensive and time-consuming, making a study less efficient or even unachievable.

Privacy-protecting analytical methods can help address this challenge (8–11). The theoretical properties of these methods have been previously explored (12). Using only aggregate-level information, these methods can produce results consistent with those from the pooled individual-level data analysis, but evidence supporting their validity in epidemiologic research is limited. Prior empirical examinations showed that propensity score (PS)-stratified analysis of risk-set data and meta-analysis of site-specific effect-estimate data can achieve similar levels of statistical sophistication as their corresponding pooled individual-level analyses (13, 14), but simulation studies also suggested that these methods could produce different results with sparse data (14). Using 2 empirical examples from a distributed data network, we assessed the performance of different combinations of data-sharing approaches and confounding adjustment methods across a range of scenarios that researchers could encounter in real-world studies.

METHODS

This study focused on the statistical performance of various combinations of data-sharing approaches and confounding adjustment methods for binary and time-to-event outcomes as evaluated by the concordance between their results and those derived from the corresponding pooled individual-level data analyses, which served as the reference analyses in our assessment (Table 1). The 2 empirical examples were comparative effectiveness and safety research topics on obesity and rheumatoid arthritis. The clinical contexts of these examples have been explored elsewhere (15–19). Both examples drew data from 3 integrated health-care delivery systems, organized as a 3-site distributed data network: Kaiser Permanente Colorado (Denver, Colorado), Kaiser Permanente Northern California (Oakland, California), and Kaiser Permanente Washington (Seattle, Washington). These systems have previously transformed their electronic health data into research-ready data sets with a common data structure (20). The Institutional Review Board at Harvard Pilgrim Health Care approved this study; the 3 participating delivery systems ceded their institutional review board oversight to Harvard Pilgrim Health Care.

Table 1.

Combinations of Confounder Summary Scores, Confounding Adjustment Methods, Data-Sharing Approaches, and Outcome Types Evaluated in a Study of Privacy-Protecting Analytical Methods

Confounding Adjustment Method and Data-Sharing Approach | Statistical Analysis Performed at the Analysis Center: Binary Outcomeᵃ | Statistical Analysis Performed at the Analysis Center: Time-to-Event Outcomeᵇ
Propensity Score
Stratification
 Pooled individual-level | PS- and site-stratified (reference analysisᶜ) | PS- and site-stratified (reference analysis)
 Risk-set | PS- and site-stratified | Case-centered logistic regressionᶜ
 Summary-tableᵈ | PS- and site-stratified | PS- and site-stratified CPR
 Effect-estimate | IVW meta-analysis | IVW meta-analysis
Matching
 Pooled individual-level | PS-matched, site-stratified (reference analysis) | PS-matched, site-stratified (reference analysis)
 Risk-set | PS-matched, site-stratified | Case-centered logistic regression
 Summary-table | PS-matched, site-stratified | PS-matched, site-stratified CPR
 Effect-estimate | IVW meta-analysis | IVW meta-analysis
Inverse probability weighting
 Pooled individual-level | IPW, site-stratified (reference analysis) | IPW, site-stratified (reference analysis)
 Risk-set | IPW, site-stratified | IPW, site-stratified
 Summary-table | Not established | Not established
 Effect-estimate | IVW meta-analysis | IVW meta-analysis
Matching weighting
 Pooled individual-level | Matching-weighted, site-stratified (reference analysis) | Matching-weighted, site-stratified (reference analysis)
 Risk-set | Matching-weighted, site-stratified | Matching-weighted, site-stratified
 Summary-table | Not established | Not established
 Effect-estimate | IVW meta-analysis | IVW meta-analysis
Disease Risk Score
Stratification
 Pooled individual-level | DRS- and site-stratified (reference analysis) | DRS- and site-stratified (reference analysis)
 Risk-set | DRS- and site-stratified | Case-centered logistic regression
 Summary-table | DRS- and site-stratified | DRS- and site-stratified CPR
 Effect-estimate | IVW meta-analysis | IVW meta-analysis
Matching
 Pooled individual-level | DRS-matched, site-stratified (reference analysis) | DRS-matched, site-stratified (reference analysis)
 Risk-set | DRS-matched, site-stratified | Case-centered logistic regression
 Summary-table | DRS-matched, site-stratified | DRS-matched, site-stratified CPR
 Effect-estimate | IVW meta-analysis | IVW meta-analysis
Inverse probability weighting
 Pooled individual-level | Not established | Not established
 Risk-set | Not established | Not established
 Summary-table | Not established | Not established
 Effect-estimate | Not established | Not established
Matching weighting
 Pooled individual-level | Not established | Not established
 Risk-set | Not established | Not established
 Summary-table | Not established | Not established
 Effect-estimate | Not established | Not established

Abbreviations: CPR, conditional Poisson regression; DRS, disease risk score; IPW, inverse-probability–weighted; IVW, inverse-variance–weighted; PS, propensity score.

a Unless otherwise specified, logistic regression was used to obtain estimates of odds ratios and their 95% confidence intervals for binary outcomes.

b Unless otherwise specified, Cox proportional hazards regression was used to obtain estimates of hazard ratios and their 95% confidence intervals for time-to-event outcomes.

c Case-centered logistic regression is a logistic regression model with the proportion of exposed outcome events among all events used as the dependent variable and the log odds of having the study exposure in the risk set used as the independent variable, specified as an offset (9). Each risk set, anchored by a unique outcome event time, comprises patients who experienced the outcome and patients who were still at risk of developing the outcome at that time point. When combined with confounder summary scores, the risk set is created within a matched cohort or stratum defined by the confounder summary score within a site. In this particular analysis, each risk set comprised the patient or patients who developed the outcome plus all other at-risk patients belonging to the same PS stratum at the time of the event within each site.

d In situations where the regression-based analysis was not feasible for the summary-table data-sharing approach, we used the Mantel-Haenszel method to compute a weighted estimate of the desired effect measure.

Empirical examples

Example 1

The first example assessed the comparative effectiveness and safety of adjustable gastric banding and Roux-en-Y gastric bypass. We identified a retrospective cohort of patients aged ≥18 years who underwent one of these procedures between January 1, 2005, and September 30, 2015. Eligible patients had continuous health plan enrollment with medical and pharmacy benefits, at least 1 recorded body mass index (weight (kg)/height (m)²) measurement greater than or equal to 35, and no exposure to any major gastrointestinal procedures during the 365-day period preceding the initial bariatric procedure.

The effectiveness outcomes of interest were achievement of clinically meaningful changes in body mass index (e.g., ≥10%) from baseline within the first postprocedure year. The safety outcomes included reintervention and all-cause hospitalization within the first postprocedure year (15, 21). We analyzed both effectiveness and safety outcomes as binary and time-to-event outcomes. We defined binary safety and effectiveness outcomes as occurrence of the outcomes of interest closest to the end of the first postprocedure year and time-to-event outcomes as time to the first occurrence of outcomes of interest within the same follow-up period. Follow-up began on the day after the discharge date of the index procedure hospitalization and ended at the earliest occurrence of an outcome event, 365 days of follow-up, death, end of health plan enrollment, or September 30, 2015. We identified potential confounders (see Web Table 1, available at https://academic.oup.com/aje) during the 365-day period preceding the index procedure on the basis of subject-matter knowledge and prior studies (15, 16, 21).

Example 2

The second example compared the effectiveness and safety of the use of tumor necrosis factor α inhibitor (TNFi) biologicals (adalimumab, certolizumab pegol, etanercept, golimumab, or infliximab) and non-TNFi biologicals (abatacept, rituximab, or tocilizumab) for rheumatoid arthritis. We identified a retrospective cohort of patients aged ≥18 years with rheumatoid arthritis who had a first dispensing of a study medication between January 1, 2001, and September 30, 2015. Eligible patients had continuous health plan enrollment with medical and pharmacy benefits and no exposure to any study medications during the 365-day period preceding initial dispensing. We excluded patients who had an outcome event of interest, cancer (excluding nonmelanoma skin cancer), human immunodeficiency virus infection or acquired immune deficiency syndrome, or organ transplantation during the 365-day baseline period.

The effectiveness outcome was an adapted version of a validated claims-based clinical effectiveness measure operationalized for use with health plan data (22). The safety outcomes included bacterial infections requiring hospitalization and hypersensitivity reactions, identified using previously validated algorithms (19, 23), during the year following the index dispensing. We analyzed both effectiveness and safety outcomes as binary and time-to-event outcomes. We defined binary outcomes as occurrence of the outcomes of interest closest to the end of the first year following the index dispensing and time-to-event outcomes as time to the first occurrence of the outcomes of interest within the same follow-up period, except for the time-to-event effectiveness outcome, which was defined as time to the first occurrence of switching to another biological antirheumatic medication to which the patient had no prior exposure (a component of the validated claims-based clinical effectiveness measure). Follow-up began on the date of index dispensing and ended on the earliest occurrence of an outcome event of interest, 365 days of follow-up, death, end of health plan enrollment, cessation of initial biological treatment, initiation of another biological treatment, or September 30, 2015. We identified prespecified potential confounders during the 365-day baseline period preceding the index dispensing (Web Table 1).

Data-sharing approaches examined

We tested 3 aggregate-level data-sharing approaches that require varying levels of information to be shared by data-contributing sites. The appendix of Mazor et al. (24) and an introductory video (25), both freely available, provide examples of analytical data sets typically shared by a site using these approaches. We used pooled individual-level data from the 3 sites in the reference analysis.

Risk-set data-sharing

The risk-set data-sharing approach aggregated individual-level data into a data set that included 1 record per risk set, with each risk set anchored by a unique outcome event time. A risk set comprised patients who experienced the outcome and patients who were still at risk of developing the outcome at that time point. Each record of the shared risk-set data included the unique event time, number of exposed events, number of unexposed events, size of the exposed risk set, and size of the unexposed risk set. With different confounding adjustment methods, as discussed below, the base for at-risk patients varied. For example, when confounding was adjusted through PS matching, the risk set included all at-risk patients in the PS-matched cohort within the same site.
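The aggregation described above can be sketched in Python as follows. This is a minimal illustration rather than the study's actual code: the `Patient` record layout and the `build_risk_sets` function are hypothetical names, the function is assumed to be called once per site and per matched cohort or stratum, and patients censored at an event time are assumed to still count as at risk at that time.

```python
from collections import namedtuple

# Hypothetical individual-level record: follow-up time, event flag (1/0),
# exposure flag (1/0). These field names are assumptions for illustration.
Patient = namedtuple("Patient", "time event exposed")


def build_risk_sets(patients):
    """Collapse individual-level rows into one record per unique event time."""
    event_times = sorted({p.time for p in patients if p.event})
    records = []
    for t in event_times:
        # Everyone whose follow-up reaches t is at risk at t (cases included).
        at_risk = [p for p in patients if p.time >= t]
        cases = [p for p in at_risk if p.event and p.time == t]
        records.append({
            "event_time": t,
            "exposed_events": sum(p.exposed for p in cases),
            "unexposed_events": sum(1 - p.exposed for p in cases),
            "exposed_at_risk": sum(p.exposed for p in at_risk),
            "unexposed_at_risk": sum(1 - p.exposed for p in at_risk),
        })
    return records


cohort = [
    Patient(5, 1, 1), Patient(5, 0, 0), Patient(8, 1, 0),
    Patient(8, 1, 1), Patient(12, 0, 1), Patient(12, 0, 0),
]
shared = build_risk_sets(cohort)  # two records, for event times 5 and 8
```

Only the five counts per record leave the site; no patient-level row is shared.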

Summary-table data-sharing

The summary-table data-sharing approach further reduced the data into an aggregated data set that resembled 2-by-2 summary tables. Depending on the outcome type, this aggregated data set contained the total number of persons (for binary outcomes) or total person-time (for time-to-event outcomes), as well as the number of outcome events in each exposure group. As with risk-set data-sharing, the number of 2-by-2 summary tables depended on the confounding adjustment method. For example, when confounding was adjusted through PS matching, only a single 2-by-2 summary table was necessary for the PS-matched cohort within each site.
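As a rough sketch of this reduction, the Python function below collapses individual-level rows into the two rows of a 2-by-2 summary table. The function name `summary_table` and the `(exposed, event, person_time)` row layout are assumptions made for illustration; in a stratified analysis the function would be applied once per stratum within each site.

```python
def summary_table(rows, time_to_event=False):
    """Aggregate (exposed, event, person_time) rows by exposure group.

    The denominator is the number of persons for binary outcomes and the
    total person-time for time-to-event outcomes.
    """
    table = {
        1: {"events": 0, "denominator": 0.0},  # exposed group
        0: {"events": 0, "denominator": 0.0},  # unexposed group
    }
    for exposed, event, person_time in rows:
        cell = table[exposed]
        cell["events"] += event
        cell["denominator"] += person_time if time_to_event else 1
    return table


rows = [(1, 1, 0.5), (1, 0, 1.0), (0, 0, 1.0), (0, 1, 0.25), (0, 0, 1.0)]
binary_table = summary_table(rows)                   # persons as denominator
tte_table = summary_table(rows, time_to_event=True)  # person-time as denominator
```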

Effect-estimate data-sharing

The effect-estimate data-sharing approach shared the least amount of data—an aggregated data set that contained only the site-specific effect estimate and the corresponding variance, obtained by analyzing the individual-level data within each site using the same confounding adjustment method as that used for the corresponding reference analysis. For example, when PS matching was used for confounding adjustment, the site-specific effect estimates were obtained by analyzing individual-level data at each site using PS matching.

Confounder summary scores examined

To adjust for the large number of prespecified confounders, we used 2 confounder summary scores—the PS and the disease risk score (DRS)—to condense the information contained in individual confounders into a single variable. PS are the probabilities of having the study exposure given patients’ baseline characteristics (26), while DRS are patients’ probabilities or hazards of having the study outcome conditional on their baseline characteristics (27).

Confounding adjustment methods examined

We performed within-site confounding adjustment by incorporating the 2 confounder summary scores into the analysis via matching, stratification, or weighting (except for DRS, for which weighted analysis has not been established for single-database or multidatabase settings). We evaluated 2 types of PS weights—inverse probability of treatment weights (28) and matching weights (29, 30). When estimated correctly, these summary scores provide results comparable to those from individual covariate adjustment (27, 31).
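For readers unfamiliar with the 2 PS weights, the sketch below computes them for a single patient, assuming the commonly used definitions: the inverse probability of treatment weight is 1/PS for exposed patients and 1/(1 − PS) for unexposed patients, and the matching weight multiplies that quantity by min(PS, 1 − PS). The function names are hypothetical, and the definitions should be checked against references 28–30 before reuse.

```python
def iptw(exposed, ps):
    """Inverse probability of treatment weight for one patient.

    Assumes 0 < ps < 1, where ps is the estimated propensity score.
    """
    return 1.0 / ps if exposed else 1.0 / (1.0 - ps)


def matching_weight(exposed, ps):
    """Matching weight: min(ps, 1 - ps) divided by the probability of the
    exposure actually received, so the weight never exceeds 1."""
    return min(ps, 1.0 - ps) * iptw(exposed, ps)


w_exposed = matching_weight(1, 0.25)    # 0.25 / 0.25 = 1.0
w_unexposed = matching_weight(0, 0.25)  # 0.25 / 0.75
```

Because matching weights are bounded by 1, they are less sensitive than inverse probability weights to patients with PS values near 0 or 1.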

Statistical analysis

Analysis of individual-level data (reference analysis)

We analyzed the pooled individual-level data across 3 sites and used the results as the reference to evaluate the performance of other approaches that analyzed aggregate-level data sets. We used site-stratified logistic regression to obtain odds ratios and 95% confidence intervals for binary outcomes and site-stratified Cox proportional hazards regression to estimate hazard ratios and 95% confidence intervals for time-to-event outcomes.

Analysis of risk-set data

For time-to-event outcomes, we analyzed the risk-set data by fitting a logistic regression model with the proportion of exposed outcome events among all events as the dependent variable and the log odds of having the study exposure in the risk set as the independent variable (specified as an offset). This approach has been shown to be mathematically equivalent to a stratified Cox regression with individual-level data (9). For binary outcomes, we used logistic regression with count data.
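The case-centered model can be illustrated with a small Python sketch. It fits an intercept-only logistic regression with the risk-set log odds of exposure as an offset, using Newton-Raphson iterations; the intercept is then the log hazard ratio. The function name and the `(exposed_events, unexposed_events, exposed_at_risk, unexposed_at_risk)` tuple layout are assumptions for illustration, and the sketch is not the software used in the study.

```python
import math


def fit_case_centered(risk_sets, tol=1e-10, max_iter=50):
    """Estimate the log hazard ratio from shared risk-set records.

    Assumes every risk set has at least one exposed and one unexposed
    patient at risk, and that events occur in both exposure groups overall
    (otherwise the estimate does not converge to a finite value).
    """
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score = 0.0
        info = 0.0
        for d1, d0, n1, n0 in risk_sets:
            offset = math.log(n1 / n0)  # log odds of exposure in the risk set
            p = 1.0 / (1.0 + math.exp(-(beta + offset)))  # P(case is exposed)
            score += d1 - (d1 + d0) * p
            info += (d1 + d0) * p * (1.0 - p)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # Standard error from the information evaluated at the last iterate.
    return beta, 1.0 / math.sqrt(info)


# Three risk sets, each with 2 exposed and 4 unexposed patients at risk.
log_hr, se = fit_case_centered([(1, 0, 2, 4), (1, 0, 2, 4), (0, 1, 2, 4)])
hazard_ratio = math.exp(log_hr)
```

In this toy input all risk sets share the same offset, so the estimate has a closed form: 2 of 3 cases are exposed (odds 2) while the exposure odds at risk are 0.5, giving a hazard ratio of 4.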

Analysis of summary-table data

For binary outcomes, we fitted a site-stratified logistic regression model for grouped data, with the number of outcomes/total number of persons as the dependent variable and the exposure variable as the independent variable. For time-to-event outcomes, we fitted a site-stratified conditional Poisson regression model with the natural log of person-time as the offset. When confounding adjustment was done through stratification, we also included the quintile indicator of the confounder summary score as another stratification variable. In situations where the regression-based analysis was not feasible, we used the Mantel-Haenszel method (32) to compute a weighted estimate for the desired effect measure across strata. A method for using weighted analysis to analyze summary-table data has not yet been established.
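The Mantel-Haenszel fallback can be sketched as below, assuming the standard Mantel-Haenszel estimators for a pooled odds ratio (binary outcomes) and a pooled rate ratio (person-time data). The function names and tuple layouts are hypothetical, and each tuple stands for one stratum's summary table (for example, one PS quintile within one site).

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata.

    Each stratum is (a, b, c, d): exposed events, exposed non-events,
    unexposed events, unexposed non-events.
    """
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def mantel_haenszel_rate_ratio(strata):
    """Pooled rate ratio across strata.

    Each stratum is (a, pt1, c, pt0): exposed events, exposed person-time,
    unexposed events, unexposed person-time.
    """
    numerator = denominator = 0.0
    for a, pt1, c, pt0 in strata:
        total = pt1 + pt0
        numerator += a * pt0 / total
        denominator += c * pt1 / total
    return numerator / denominator


pooled_or = mantel_haenszel_or([(10, 20, 5, 40), (4, 6, 2, 8)])
pooled_rr = mantel_haenszel_rate_ratio([(10, 100.0, 5, 200.0)])
```

With a single stratum both estimators reduce to the crude odds ratio and crude rate ratio, which is a convenient sanity check.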

Analysis of effect-estimate data

With the site-specific effect-estimate data, we performed an inverse-variance–weighted meta-analysis using DerSimonian and Laird’s (33) fixed-effect and random-effects models to obtain the overall effect estimate and 95% confidence interval.
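A minimal sketch of this meta-analysis is given below, assuming that each site shares a log odds ratio or log hazard ratio with its variance, and using the usual moment-based DerSimonian and Laird estimator of the between-site variance. The function name `ivw_meta` and the returned dictionary keys are illustrative choices, not part of the study's software.

```python
import math


def ivw_meta(estimates, variances):
    """Inverse-variance-weighted pooling of site-specific log effect estimates.

    Returns fixed-effect and random-effects (estimate, standard error) pairs.
    Assumes at least 2 sites and strictly positive variances.
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / total_w

    # Moment estimator of the between-site variance (tau^2), truncated at 0.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)

    re_weights = [1.0 / (v + tau2) for v in variances]
    total_re = sum(re_weights)
    random = sum(w * y for w, y in zip(re_weights, estimates)) / total_re
    return {
        "fixed": (fixed, 1.0 / math.sqrt(total_w)),
        "random": (random, 1.0 / math.sqrt(total_re)),
        "tau2": tau2,
    }


result = ivw_meta([0.0, 2.0], [0.5, 0.5])
fixed_or = math.exp(result["fixed"][0])  # back-transform to the ratio scale
```

When the estimated between-site variance is 0, the random-effects result coincides with the fixed-effect result; otherwise its confidence interval is wider, as seen in the random-effects rows of the results tables.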

Assessment of treatment effect heterogeneity across sites

The goal of the study was to assess the performance of various combinations of data-sharing and analytical methods when the decision to pool data across sites had been made. However, we used Cochran’s Q test to examine treatment effect heterogeneity across sites for illustrative purposes (34).
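Cochran's Q for a 3-site network can be sketched as follows. The function name is hypothetical, and the P value uses the closed form exp(−Q/2), which holds only for a chi-square distribution with 2 degrees of freedom (that is, exactly 3 sites); other numbers of sites would need a general chi-square routine.

```python
import math


def cochran_q_three_sites(estimates, variances):
    """Cochran's Q and its P value for exactly 3 site-specific estimates."""
    if len(estimates) != 3 or len(variances) != 3:
        raise ValueError("closed-form P value is valid only for 3 sites")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    # Chi-square survival function with 2 degrees of freedom.
    return q, math.exp(-q / 2.0)


q_stat, p_value = cochran_q_three_sites([0.0, 1.0, 2.0], [1.0, 1.0, 1.0])
```

As a rough check against the reported results, a Q of 22.59 gives exp(−22.59/2), which is well below 0.0001, in line with the corresponding P value entry in Table 2.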

Assessment of statistical performance

To assess the statistical performance of different combinations of data-sharing approaches and confounding adjustment methods, we compared their results with those derived from their corresponding pooled individual-level data analyses. We did not compare the results across methods (e.g., PS matching vs. PS stratification) because they estimated different treatment effects in different target populations.

RESULTS

Example 1: comparative effectiveness and safety of bariatric procedures

We identified 584 eligible adjustable gastric banding patients and 8,777 eligible Roux-en-Y gastric bypass patients. Their baseline characteristics are shown in Web Table 2.

PS-based analyses

Binary outcomes

All aggregate-level data-sharing approaches generated results similar to those of their reference analyses for all confounding adjustment methods examined (Table 2). In fact, the results from risk-set and summary-table data-sharing were identical to those from the reference analyses. Both fixed-effect and random-effects meta-analyses of effect-estimate data produced comparable results for effectiveness outcomes, with the random-effects model showing slightly more variation. For safety outcomes, the 2 meta-analyses of effect-estimate data produced somewhat different results, with greater discrepancy being observed in inverse-probability–weighted analyses.

Table 2.

Results for Binary Outcomes From Propensity-Score–Adjusted Analyses Using Different Combinations of Confounding Adjustment Methods and Data-Sharing Approaches to Compare Adjustable Gastric Banding With Roux-en-Y Gastric Bypass (Empirical Example 1)ᵃ

Confounding Adjustment Method and Data-Sharing Approach Effectiveness Outcome (Change in BMIᵇ)ᶜ Safety Outcomeᵈ
<5% ≥5% ≥10% ≥20% ≥30% Rehospitalization Reintervention
OR 95% CI OR 95% CI OR 95% CI OR 95% CI OR 95% CI OR 95% CI OR 95% CI
Stratification
 Pooled individual-level 3.74 3.08, 4.54 0.13 0.08, 0.22 0.12 0.10, 0.16 0.09 0.07, 0.11 0.08 0.05, 0.11 0.85 0.62, 1.15 0.82 0.59, 1.11
 Risk-set 3.74 3.08, 4.54 0.13 0.08, 0.22 0.12 0.10, 0.16 0.09 0.07, 0.11 0.08 0.05, 0.11 0.85 0.62, 1.15 0.82 0.59, 1.11
 Summary-table 3.74 3.08, 4.54 0.13 0.08, 0.22 0.12 0.10, 0.16 0.09 0.07, 0.11 0.08 0.05, 0.11 0.85 0.62, 1.15 0.82 0.59, 1.11
 Effect-estimate, fixed-effect 3.73 3.08, 4.52 0.13 0.08, 0.20 0.12 0.09, 0.16 0.09 0.07, 0.11 0.09 0.06, 0.12 0.94 0.69, 1.28 0.85 0.62, 1.15
 Effect-estimate, random-effects 3.59 1.52, 8.48 0.15 0.06, 0.33 0.12 0.06, 0.23 0.07 0.04, 0.12 0.05 0.01, 0.16 0.85 0.29, 2.46 0.74 0.41, 1.32
 Heterogeneity (Qᵉ) and P value 22.59 <0.0001 2.41 0.2983 4.67 0.0962 7.23 0.0269 7.79 0.0203 12.79 0.0017 4.31 0.1156
Matching
 Pooled individual-level 3.66 2.85, 4.73 0.15 0.05, 0.37 0.17 0.11, 0.26 0.10 0.07, 0.13 0.08 0.06, 0.12 0.87 0.58, 1.30 0.78 0.52, 1.17
 Risk-set 3.66 2.85, 4.73 0.15 0.05, 0.37 0.17 0.11, 0.26 0.10 0.07, 0.13 0.08 0.06, 0.12 0.87 0.58, 1.30 0.78 0.52, 1.17
 Summary-table 3.66 2.85, 4.73 0.15 0.05, 0.37 0.17 0.11, 0.26 0.10 0.07, 0.13 0.08 0.06, 0.12 0.87 0.58, 1.30 0.78 0.52, 1.17
 Effect-estimate, fixed-effect 3.63 2.84, 4.66 0.16 0.07, 0.40 0.19 0.12, 0.29 0.10 0.07, 0.13 0.09 0.06, 0.13 0.93 0.62, 1.38 0.81 0.55, 1.21
 Effect-estimate, random-effects 2.93 1.08, 7.94 0.16 0.07, 0.40 0.19 0.12, 0.29 0.10 0.07, 0.13 0.06 0.02, 0.20 0.73 0.25, 2.11 0.64 0.24, 1.72
 Heterogeneity (Q) and P value 16.27 0.0003 1.72 0.4217 1.64 0.4385 1.71 0.4234 7.05 0.0294 7.91 0.0191 7.71 0.0211
Inverse probability weighting
 Pooled individual-level 3.16 2.67, 3.75 0.14 0.09, 0.20 0.11 0.08, 0.13 0.10 0.09, 0.12 0.09 0.06, 0.11 0.85 0.65, 1.12 0.86 0.65, 1.14
 Risk-set 3.16 2.67, 3.75 0.14 0.10, 0.21 0.11 0.09, 0.13 0.10 0.09, 0.12 0.09 0.06, 0.12 0.85 0.65, 1.12 0.86 0.65, 1.14
 Effect-estimate, fixed-effect 3.15 2.65, 3.74 0.11 0.08, 0.17 0.11 0.09, 0.13 0.10 0.09, 0.12 0.12 0.09, 0.17 1.16 0.86, 1.55 1.09 0.81, 1.45
 Effect-estimate, random-effects 3.11 2.55, 3.80 0.14 0.03, 0.60 0.09 0.05, 0.16 0.10 0.04, 0.29 0.02 0.00, 0.37 0.78 0.16, 3.71 0.43 0.09, 1.93
 Heterogeneity (Q) and P value 2.27 0.3208 4.10 0.1281 6.42 0.0403 41.94 <0.0001 12.10 0.0024 31.41 <0.0001 16.87 0.0002
Matching weighting
 Pooled individual-level 3.74 2.92, 4.80 0.15 0.06, 0.36 0.12 0.07, 0.19 0.09 0.07, 0.12 0.08 0.06, 0.12 0.80 0.54, 1.17 0.82 0.56, 1.22
 Risk-set 3.74 2.92, 4.80 0.15 0.06, 0.35 0.12 0.07, 0.19 0.09 0.07, 0.12 0.08 0.06, 0.12 0.80 0.54, 1.17 0.82 0.56, 1.22
 Effect-estimate, fixed-effect 3.73 2.90, 4.80 0.15 0.06, 0.37 0.12 0.08, 0.20 0.09 0.07, 0.12 0.09 0.06, 0.13 0.82 0.55, 1.23 0.83 0.56, 1.24
 Effect-estimate, random-effects 3.53 1.42, 8.77 0.15 0.06, 0.37 0.12 0.08, 0.20 0.08 0.05, 0.13 0.05 0.01, 0.17 0.78 0.24, 2.50 0.79 0.49, 1.29
 Heterogeneity (Q) and P value 11.59 0.0030 0.38 0.8233 0.88 0.6426 2.89 0.2352 6.52 0.0384 8.50 0.0142 2.38 0.3040

Abbreviations: AGB, adjustable gastric banding; BMI, body mass index; CI, confidence interval; OR, odds ratio; RYGB, Roux-en-Y gastric bypass.

a There were 584 (6.2%) patients who underwent AGB and 8,777 (93.8%) patients who underwent RYGB.

b BMI was calculated as weight (kg)/height (m)².

c The incidences of <5%, ≥5%, ≥10%, ≥20%, and ≥30% changes in BMI were 68.0%, 93.7%, 76.5%, 31.2%, and 6.8%, respectively, for the AGB recipients and 32.9%, 99.1%, 96.8%, 83.8%, and 47.7%, respectively, for the RYGB recipients. These effectiveness outcomes were defined as the occurrence of the outcomes of interest closest to the end of the first postprocedure year, so the incidences of <5% change in BMI and ≥5% change in BMI do not sum to 100%.

d The incidences of rehospitalization and reintervention were 9.9% and 9.3%, respectively, for the AGB recipients and 11.7% and 11.6%, respectively, for the RYGB recipients.

e Q is a measure of heterogeneity among the 3 data-contributing sites. The summary statistic and P value from Cochran’s Q test are shown.

Time-to-event outcomes

Risk-set data-sharing produced results identical to those from the reference analyses for all confounding adjustment methods assessed (Table 3). Summary-table data-sharing generated numerically different but qualitatively similar results in matched and stratified analysis of summary scores. Fixed-effect meta-analysis of effect-estimate data produced results compatible with the reference results, while random-effects meta-analysis produced slightly different results that did not materially change the overall inference for effectiveness outcomes. For safety outcomes, the effect-estimate data-sharing approach showed discrepant results, with greater divergence being seen in inverse-probability–weighted analyses.

Table 3.

Results for Time-to-Event Outcomes From Propensity-Score–Adjusted Analyses Using Different Combinations of Confounding Adjustment Methods and Data-Sharing Approaches to Compare Adjustable Gastric Banding With Roux-en-Y Gastric Bypass (Empirical Example 1)ᵃ

Confounding Adjustment Method and Data-Sharing Approach Effectiveness Outcome (Change in BMIᵇ)ᶜ Safety Outcomeᵈ
<5% ≥5% ≥10% ≥20% ≥30% Rehospitalization Reintervention
HR 95% CI HR 95% CI HR 95% CI HR 95% CI HR 95% CI HR 95% CI HR 95% CI
Stratification
 Pooled individual-level 2.20 1.97, 2.46 0.53 0.49, 0.58 0.36 0.32, 0.40 0.17 0.15, 0.20 0.10 0.07, 0.13 0.84 0.64, 1.12 0.82 0.61, 1.09
 Risk-set 2.20 1.97, 2.46 0.53 0.49, 0.58 0.36 0.32, 0.40 0.17 0.15, 0.20 0.10 0.07, 0.13 0.84 0.64, 1.12 0.82 0.61, 1.09
 Summary-table 3.48 3.11, 3.89 0.42ᵉ 0.39, 0.46ᵉ 0.37ᵉ 0.34, 0.41ᵉ 0.22ᵉ 0.19, 0.26ᵉ 0.12 0.08, 0.17 0.84 0.62, 1.11 0.81 0.60, 1.09
 Effect-estimate, fixed-effect 2.26 2.02, 2.52 0.54 0.49, 0.59 0.36 0.33, 0.40 0.17 0.15, 0.20 0.11 0.08, 0.15 0.97 0.73, 1.28 0.85 0.64, 1.14
 Effect-estimate, random-effects 2.35 1.21, 4.56 0.58 0.41, 0.81 0.36 0.27, 0.47 0.15 0.10, 0.22 0.06 0.02, 0.19 0.85 0.32, 2.22 0.76 0.45, 1.28
 Heterogeneity (Qᶠ) and P value 25.29 <0.0001 15.08 0.0005 7.80 0.0202 6.55 0.0377 6.66 0.0357 13.50 0.0012 3.97 0.1371
Matching
 Pooled individual-level 2.25 1.91, 2.66 0.50 0.44, 0.57 0.35 0.31, 0.40 0.17 0.15, 0.21 0.10 0.07, 0.14 0.85 0.60, 1.22 0.78 0.54, 1.11
 Risk-set 2.25 1.91, 2.66 0.50 0.44, 0.57 0.35 0.31, 0.40 0.17 0.15, 0.21 0.10 0.07, 0.14 0.85 0.60, 1.22 0.78 0.54, 1.11
 Summary-table 3.49 2.95, 4.13 0.40 0.36, 0.45 0.38 0.33, 0.43 0.23 0.19, 0.27 0.12 0.09, 0.17 0.85 0.58, 1.23 0.77 0.52, 1.12
 Effect-estimate, fixed-effect 2.23 1.89, 2.63 0.50 0.45, 0.57 0.35 0.31, 0.40 0.17 0.15, 0.21 0.11 0.08, 0.15 0.91 0.63, 1.32 0.81 0.56, 1.18
 Effect-estimate, random-effects 1.95 0.92, 4.16 0.56 0.32, 0.99 0.36 0.27, 0.47 0.17 0.12, 0.23 0.07 0.02, 0.23 0.73 0.29, 1.87 0.64 0.24, 1.72
 Heterogeneity (Q) and P value 15.58 0.0004 22.17 <0.0001 4.79 0.0911 3.24 0.1976 5.58 0.0612 7.65 0.0218 7.71 0.0211
Inverse probability weighting
 Pooled individual-level 1.92 1.73, 2.13 0.49 0.45, 0.54 0.32 0.29, 0.35 0.19 0.17, 0.22 0.10 0.08, 0.14 0.84 0.65, 1.08 0.85 0.65, 1.12
 Risk-set 1.93 1.51, 2.47 0.49 0.40, 0.59 0.32 0.26, 0.38 0.19 0.14, 0.27 0.10 0.06, 0.17 0.84 0.46, 1.53 0.80 0.60, 1.07
 Effect-estimate, fixed-effect 1.92 1.73, 2.13 0.50 0.46, 0.54 0.32 0.29, 0.35 0.20 0.18, 0.23 0.15 0.11, 0.20 1.16 0.89, 1.51 1.07 0.81, 1.40
 Effect-estimate, random-effects 1.85 1.55, 2.21 0.54 0.43, 0.68 0.32 0.25, 0.41 0.19 0.11, 0.33 0.02 0.00, 0.42 0.74 0.22, 2.51 0.45 0.11, 1.80
 Heterogeneity (Q) and P value 3.31 0.1905 9.67 0.0079 9.24 0.0098 22.16 <0.0001 10.49 0.0053 27.68 <0.0001 15.27 0.0005
Matching weighting
 Pooled individual-level 2.29 1.94, 2.71 0.53 0.46, 0.59 0.34 0.30, 0.39 0.17 0.14, 0.20 0.10 0.07, 0.14 0.80 0.56, 1.15 0.83 0.57, 1.20
 Risk-set 2.33 2.08, 2.62 0.52 0.47, 0.57 0.34 0.31, 0.38 0.17 0.14, 0.20 0.10 0.07, 0.13 0.80 0.60, 1.07 0.85 0.57, 1.28
 Effect-estimate, fixed-effect 2.26 1.91, 2.67 0.52 0.46, 0.59 0.34 0.30, 0.39 0.17 0.14, 0.20 0.11 0.08, 0.15 0.83 0.57, 1.20 0.83 0.57, 1.22
 Effect-estimate, random-effects 2.23 1.11, 4.49 0.56 0.40, 0.78 0.34 0.26, 0.44 0.14 0.09, 0.23 0.06 0.02, 0.20 0.77 0.27, 2.24 0.81 0.53, 1.25
 Heterogeneity (Q) and P value 10.84 0.0044 7.90 0.0192 4.05 0.1316 6.28 0.0431 6.20 0.0450 8.35 0.0154 2.22 0.3285

Abbreviations: AGB, adjustable gastric banding; BMI, body mass index; CI, confidence interval; HR, hazard ratio; RYGB, Roux-en-Y gastric bypass.

a There were 584 (6.2%) patients who underwent AGB and 8,777 (93.8%) patients who underwent RYGB.

b BMI was calculated as weight (kg)/height (m)².

c The incidences of <5%, ≥5%, ≥10%, ≥20%, and ≥30% change in BMI were 68.0%, 93.7%, 76.5%, 31.2%, and 6.8%, respectively, for the AGB recipients and 32.9%, 99.1%, 96.8%, 83.8%, and 47.7%, respectively, for the RYGB recipients. These effectiveness outcomes were defined as the occurrence of the outcomes of interest closest to the end of the first postprocedure year, so the incidences of <5% change in BMI and ≥5% change in BMI do not sum to 100%.

d The incidences of rehospitalization and reintervention were 9.9% and 9.3%, respectively, for the AGB recipients and 11.7% and 11.6%, respectively, for the RYGB recipients.

e These estimates were calculated using the Mantel-Haenszel approach, because the exact CIs for the regression-based analysis could not be obtained.

f Q is a measure of heterogeneity among the 3 data-contributing sites. The summary statistic and P value from Cochran’s Q test are shown.

DRS-based analyses

As with PS-based analyses, we observed similar performance for various data-sharing and adjustment method combinations when used with DRS for both binary and time-to-event outcomes (Table 4). Analyses of risk-set data produced results identical to those from the reference analyses. Summary-table data-sharing generated identical results for binary outcomes but slightly different results for time-to-event outcomes in comparison with the reference analyses. The 2 meta-analyses of effect-estimate data produced slightly different results for both outcome types. When compared using the same confounding adjustment method (i.e., stratification or matching) for any specific outcome, DRS-based analyses generally produced results consistent with those from PS-based analyses.

Table 4.

Results From Disease Risk Scoreᵃ–Adjusted Analyses Using Different Combinations of Confounding Adjustment Methods and Data-Sharing Approaches to Compare Adjustable Gastric Banding With Roux-en-Y Gastric Bypass (Empirical Example 1)ᵇ

| Confounding Adjustment Method and Data-Sharing Approach | Effectiveness Outcome (Change in BMIᶜ)ᵈ: <5% | | ≥5% | | ≥10% | | ≥20% | | ≥30% | | Safety Outcomeᵉ: Rehospitalization | | Reintervention | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | OR | 95% CI | OR | 95% CI | OR | 95% CI | OR | 95% CI | OR | 95% CI | OR | 95% CI | OR | 95% CI |
| Stratification | | | | | | | | | | | | | | |
| Pooled individual-level | 3.95 | 3.26, 4.79 | 0.15 | 0.10, 0.23 | 0.12 | 0.09, 0.15 | 0.08 | 0.07, 0.10 | 0.07 | 0.05, 0.10 | 0.86 | 0.63, 1.15 | 0.81 | 0.58, 1.09 |
| Risk-set | 3.95 | 3.26, 4.79 | 0.15 | 0.10, 0.23 | 0.12 | 0.09, 0.15 | 0.08 | 0.07, 0.10 | 0.07 | 0.05, 0.10 | 0.86 | 0.63, 1.15 | 0.81 | 0.58, 1.09 |
| Summary-table | 3.95 | 3.26, 4.79 | 0.15 | 0.10, 0.23 | 0.12 | 0.09, 0.15 | 0.08 | 0.07, 0.10 | 0.07 | 0.05, 0.10 | 0.86 | 0.63, 1.15 | 0.81 | 0.58, 1.09 |
| Effect-estimate, fixed-effect | 3.93 | 3.25, 4.76 | 0.14 | 0.09, 0.21 | 0.11 | 0.09, 0.14 | 0.08 | 0.07, 0.10 | 0.09 | 0.06, 0.12 | 0.95 | 0.70, 1.28 | 0.83 | 0.62, 1.12 |
| Effect-estimate, random-effects | 3.73 | 1.52, 9.12 | 0.16 | 0.08, 0.34 | 0.11 | 0.04, 0.28 | 0.07 | 0.03, 0.12 | 0.04 | 0.01, 0.16 | 0.85 | 0.30, 2.36 | 0.75 | 0.46, 1.21 |
| Heterogeneity (Qᶠ) and P value | 25.49 | <0.0001 | 2.46 | 0.2922 | 11.25 | 0.0036 | 10.50 | 0.0052 | 8.27 | 0.0159 | 12.04 | 0.0024 | 3.29 | 0.1930 |
| Matching | | | | | | | | | | | | | | |
| Pooled individual-level | 3.99 | 3.10, 5.15 | 0.08 | 0.02, 0.24 | 0.11 | 0.06, 0.18 | 0.08 | 0.06, 0.10 | 0.08 | 0.06, 0.12 | 0.86 | 0.57, 1.30 | 0.93 | 0.61, 1.42 |
| Risk-set | 3.99 | 3.10, 5.15 | 0.08 | 0.02, 0.24 | 0.11 | 0.06, 0.18 | 0.08 | 0.06, 0.10 | 0.08 | 0.06, 0.12 | 0.86 | 0.57, 1.30 | 0.93 | 0.61, 1.42 |
| Summary-table | 3.99 | 3.10, 5.15 | 0.08 | 0.02, 0.24 | 0.11 | 0.06, 0.18 | 0.08 | 0.06, 0.10 | 0.08 | 0.06, 0.12 | 0.86 | 0.57, 1.30 | 0.93 | 0.61, 1.42 |
| Effect-estimate, fixed-effect | 3.93 | 3.06, 5.05 | 0.08 | 0.02, 0.26 | 0.11 | 0.06, 0.18 | 0.08 | 0.06, 0.10 | 0.09 | 0.06, 0.13 | 0.89 | 0.61, 1.27 | 0.95 | 0.63, 1.44 |
| Effect-estimate, random-effects | 3.42 | 1.06, 10.98 | 0.08 | 0.02, 0.26 | 0.11 | 0.06, 0.18 | 0.06 | 0.04, 0.11 | 0.05 | 0.01, 0.16 | 0.79 | 0.38, 1.67 | 0.79 | 0.34, 1.83 |
| Heterogeneity (Q) and P value | 20.91 | <0.0001 | 0.00 | 0.9985 | 0.21 | 0.8981 | 3.32 | 0.1898 | 7.47 | 0.0239 | 4.11 | 0.1275 | 5.51 | 0.0633 |
| | HR | 95% CI | HR | 95% CI | HR | 95% CI | HR | 95% CI | HR | 95% CI | HR | 95% CI | HR | 95% CI |
| Stratification | | | | | | | | | | | | | | |
| Pooled individual-level | 2.23 | 2.01, 2.48 | 0.51 | 0.47, 0.56 | 0.34 | 0.31, 0.38 | 0.17 | 0.14, 0.19 | 0.09 | 0.07, 0.13 | 0.83 | 0.63, 1.08 | 0.79 | 0.59, 1.05 |
| Risk-set | 2.23 | 2.00, 2.47 | 0.51 | 0.47, 0.56 | 0.34 | 0.31, 0.38 | 0.17 | 0.14, 0.19 | 0.09 | 0.07, 0.13 | 0.82 | 0.63, 1.08 | 0.79 | 0.59, 1.05 |
| Summary-table | 3.35 | 2.98, 3.79 | 0.41 | 0.38, 0.46 | 0.36 | 0.33, 0.39 | 0.22 | 0.19, 0.25 | 0.12 | 0.08, 0.17 | 0.81 | 0.61, 1.07 | 0.78 | 0.56, 1.03 |
| Effect-estimate, fixed-effect | 2.31 | 2.08, 2.57 | 0.52 | 0.47, 0.57 | 0.34 | 0.31, 0.38 | 0.17 | 0.15, 0.20 | 0.11 | 0.08, 0.14 | 0.94 | 0.72, 1.23 | 0.81 | 0.61, 1.08 |
| Effect-estimate, random-effects | 2.28 | 1.17, 4.42 | 0.54 | 0.38, 0.75 | 0.34 | 0.26, 0.45 | 0.14 | 0.09, 0.22 | 0.06 | 0.02, 0.18 | 0.81 | 0.34, 1.96 | 0.75 | 0.49, 1.14 |
| Heterogeneity (Q) and P value | 28.81 | <0.0001 | 14.65 | 0.0007 | 8.62 | 0.0134 | 7.96 | 0.0186 | 7.06 | 0.0292 | 12.00 | 0.0025 | 2.99 | 0.2232 |
| Matching | | | | | | | | | | | | | | |
| Pooled individual-level | 2.41 | 2.03, 2.85 | 0.49 | 0.43, 0.55 | 0.34 | 0.30, 0.39 | 0.16 | 0.14, 0.19 | 0.10 | 0.07, 0.13 | 0.85 | 0.59, 1.22 | 0.90 | 0.61, 1.32 |
| Risk-set | 2.41 | 2.03, 2.85 | 0.49 | 0.43, 0.56 | 0.34 | 0.30, 0.39 | 0.16 | 0.14, 0.19 | 0.10 | 0.07, 0.13 | 0.85 | 0.59, 1.22 | 0.90 | 0.61, 1.32 |
| Summary-table | 3.75 | 3.16, 4.46 | 0.39 | 0.35, 0.44 | 0.37 | 0.32, 0.42 | 0.22 | 0.18, 0.26 | 0.12 | 0.08, 0.17 | 0.85 | 0.58, 1.24 | 0.89 | 0.59, 1.32 |
| Effect-estimate, fixed-effect | 2.36 | 1.99, 2.80 | 0.49 | 0.43, 0.55 | 0.34 | 0.30, 0.39 | 0.16 | 0.14, 0.20 | 0.11 | 0.08, 0.15 | 0.88 | 0.61, 1.27 | 0.92 | 0.62, 1.36 |
| Effect-estimate, random-effects | 2.22 | 0.88, 5.59 | 0.52 | 0.36, 0.74 | 0.35 | 0.24, 0.50 | 0.13 | 0.08, 0.21 | 0.05 | 0.01, 0.20 | 0.79 | 0.40, 1.55 | 0.79 | 0.37, 1.67 |
| Heterogeneity (Q) and P value | 21.66 | <0.0001 | 9.16 | 0.0102 | 7.95 | 0.0187 | 6.67 | 0.0356 | 6.90 | 0.0317 | 4.17 | 0.1242 | 5.07 | 0.0792 |

Abbreviations: AGB, adjustable gastric banding; BMI, body mass index; CI, confidence interval; HR, hazard ratio; OR, odds ratio; RYGB, Roux-en-Y gastric bypass.

a The disease risk score was estimated using Cox proportional hazards regression in patients receiving the Roux-en-Y gastric bypass procedure.

b BMI was calculated as weight (kg)/height (m)².

c There were 584 (6.2%) patients who underwent AGB and 8,777 (93.8%) patients who underwent RYGB.

d The incidences of <5%, ≥5%, ≥10%, ≥20%, and ≥30% change in BMI were 68.0%, 93.7%, 76.5%, 31.2%, and 6.8%, respectively, for the AGB recipients and 32.9%, 99.1%, 96.8%, 83.8%, and 47.7%, respectively, for the RYGB recipients. These effectiveness outcomes were defined as the occurrence of the outcomes of interest closest to the end of the first postprocedure year, so the incidences of <5% change in BMI and ≥5% change in BMI do not sum to 100%.

e The incidences of rehospitalization and reintervention were 9.9% and 9.3%, respectively, for the AGB recipients and 11.7% and 11.6%, respectively, for the RYGB recipients.

f Q is a measure of heterogeneity among the 3 data-contributing sites. The summary statistic and P value from Cochran’s Q test are shown.

Treatment effect heterogeneity across sites

The Q statistic suggested potential treatment effect heterogeneity across the 3 data-contributing sites for most outcomes examined (Tables 2–4).
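As a rough illustration of how such a heterogeneity check can be computed from site-level effect estimates, the sketch below calculates Cochran's Q from log effect estimates and their standard errors. The function names and the example inputs are hypothetical and are not taken from the study. With 3 sites, Q is referred to a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(−Q/2); the Q and P value pairs reported in Table 4 agree with this form (for example, Q = 11.25 corresponds to P ≈ 0.0036).

```python
import math


def cochran_q(log_estimates, std_errors):
    """Return (Q, degrees of freedom) for site-specific log effect estimates."""
    # Inverse-variance weights, one per site
    weights = [1.0 / se ** 2 for se in std_errors]
    # Fixed-effect pooled estimate around which the deviations are measured
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_estimates))
    return q, len(log_estimates) - 1


def q_p_value_two_df(q):
    """Upper-tail chi-square probability for 2 degrees of freedom (3 sites)."""
    return math.exp(-q / 2.0)


# Hypothetical log hazard ratios and standard errors from 3 sites
q, df = cochran_q([math.log(0.8), math.log(0.9), math.log(1.4)],
                  [0.20, 0.25, 0.30])
print(round(q, 2), df, round(q_p_value_two_df(q), 4))
```

The closed-form tail probability applies only when there are 2 degrees of freedom; a network with a different number of sites would need a general chi-square routine.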

Example 2: comparative effectiveness and safety of biological disease-modifying antirheumatic medications

We identified 7,419 patients who initiated use of a TNFi biological and 407 patients who initiated use of a non-TNFi biological. Their baseline characteristics are shown in Web Table 3. Because of the low numbers of outcome occurrences and the limited sample size, we present results pertaining to treatment switching for the effectiveness outcome and results pertaining to serious infections for the safety outcome—the only outcomes for which we could obtain reliable estimates.

PS-based analyses

Binary outcomes

As in the bariatric procedure example, all 3 data-sharing approaches generated results similar to those from the reference analyses (Table 5). The results from risk-set and summary-table data-sharing were identical to the reference results when confounding was adjusted for through stratification or matching. The 2 meta-analyses of effect-estimate data also produced comparable results. When inverse probability weighting was used for confounding adjustment, divergence from the reference analysis was observed, especially for the “serious infections” outcome, which had a lower incidence than treatment switching.
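For readers less familiar with the two weighting schemes, the sketch below gives textbook forms of the inverse probability weight and the matching weight as functions of the propensity score. These are assumed, unstabilized forms; the study's exact weight definitions are given in its Methods and may differ. The function names are illustrative.

```python
def inverse_probability_weight(treated, ps):
    """Unstabilized inverse probability of treatment weight."""
    if not 0.0 < ps < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)


def matching_weight(treated, ps):
    """Matching weight: min(ps, 1 - ps) over the probability of the received treatment."""
    if not 0.0 < ps < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    received = ps if treated else 1.0 - ps
    return min(ps, 1.0 - ps) / received


# Hypothetical patient with a propensity score of 0.05 for the rarer treatment
for treated in (True, False):
    print(treated,
          inverse_probability_weight(treated, 0.05),
          matching_weight(treated, 0.05))
```

Under these forms, a treated patient with a propensity score of 0.05 receives an inverse probability weight of about 20 but a matching weight of 1, since matching weights never exceed 1.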

Table 5.

Results for Binary Outcomes From Propensity-Score–Adjusted Analyses Using Different Combinations of Confounding Adjustment Methods and Data-Sharing Approaches to Compare Non–Tumor Necrosis Factor α Inhibitors With Tumor Necrosis Factor α Inhibitors (Empirical Example 2)ᵃ

| Confounding Adjustment Method and Data-Sharing Approach | Effectiveness Outcome (Treatment Switching)ᵇ | | Safety Outcome (Serious Infections)ᶜ | |
|---|---|---|---|---|
| | OR | 95% CI | OR | 95% CI |
| Stratification | | | | |
| Pooled individual-level | 0.52 | 0.34, 0.76 | 0.97 | 0.46, 1.86 |
| Risk-set | 0.52 | 0.34, 0.76 | 0.97 | 0.46, 1.86 |
| Summary-table | 0.52 | 0.34, 0.76 | 0.97 | 0.46, 1.86 |
| Effect-estimate, fixed-effect | 0.54 | 0.36, 0.79 | 1.06 | 0.56, 2.03 |
| Effect-estimate, random-effects | 0.53 | 0.32, 0.86 | 1.03 | 0.56, 2.03 |
| Heterogeneity (Qᵈ) and P value | 2.23 | 0.3268 | 0.78 | 0.6741 |
| Matching | | | | |
| Pooled individual-level | 0.47 | 0.30, 0.73 | 1.07 | 0.46, 2.37 |
| Risk-set | 0.47 | 0.30, 0.73 | 1.07 | 0.46, 2.37 |
| Summary-table | 0.47 | 0.30, 0.73 | 1.07 | 0.46, 2.37 |
| Effect-estimate, fixed-effect | 0.47 | 0.31, 0.72 | 1.23 | 0.58, 2.64 |
| Effect-estimate, random-effects | 0.47 | 0.31, 0.72 | 1.23 | 0.58, 2.64 |
| Heterogeneity (Q) and P value | 0.55 | 0.7589 | 0.03 | 0.9820 |
| Inverse probability weighting | | | | |
| Pooled individual-level | 0.36 | 0.23, 0.57 | 2.88 | 1.97, 4.21 |
| Risk-set | 0.36 | 0.23, 0.57 | 3.11 | 2.12, 4.55 |
| Effect-estimate, fixed-effect | 0.39 | 0.25, 0.61 | 3.17 | 2.16, 4.65 |
| Effect-estimate, random-effects | 0.45 | 0.18, 1.09 | 3.17 | 2.16, 4.65 |
| Heterogeneity (Q) and P value | 4.62 | 0.0992 | 0.85 | 0.6531 |
| Matching weighting | | | | |
| Pooled individual-level | 0.51 | 0.32, 0.81 | 0.89 | 0.39, 2.04 |
| Risk-set | 0.51 | 0.32, 0.81 | 0.95 | 0.41, 2.19 |
| Effect-estimate, fixed-effect | 0.51 | 0.32, 0.82 | 0.93 | 0.40, 2.17 |
| Effect-estimate, random-effects | 0.51 | 0.32, 0.82 | 0.93 | 0.40, 2.17 |
| Heterogeneity (Q) and P value | 1.30 | 0.5213 | 0.42 | 0.8105 |

Abbreviations: CI, confidence interval; OR, odds ratio; TNFi, tumor necrosis factor α inhibitor.

a There were 407 (5.2%) patients who initiated use of non-TNFi biologicals and 7,419 (94.8%) patients who initiated use of TNFi biologicals.

b The incidence of treatment switching was 7.6% in new users of non-TNFi biologicals and 11.2% in new users of TNFi biologicals.

c The incidences of serious infection were 2.9% in new users of non-TNFi biologicals and 3.1% in new users of TNFi biologicals.

d Q is a measure of heterogeneity among the 3 data-contributing sites. The summary statistic and P value from Cochran’s Q test are shown.

Time-to-event outcomes

Sharing of risk-set data produced results identical to those of the reference analyses except when confounding was adjusted for through weighting; in that case, the 95% confidence intervals diverged from those of the reference analysis, especially for the “serious infections” outcome (Table 6). Both meta-analyses of effect-estimate data produced results compatible with those of the reference analysis, with the random-effects meta-analysis showing slightly more variation. Unlike in the bariatric procedure example, however, summary-table data-sharing generated results concordant with the reference results.

Table 6.

Results for Time-to-Event Outcomes From Propensity-Score–Adjusted Analyses Using Different Combinations of Confounding Adjustment Methods and Data-Sharing Approaches to Compare Non–Tumor Necrosis Factor α Inhibitors With Tumor Necrosis Factor α Inhibitors (Empirical Example 2)ᵃ

| Confounding Adjustment Method and Data-Sharing Approach | Effectiveness Outcome (Treatment Switching)ᵇ | | Safety Outcome (Serious Infections)ᶜ | |
|---|---|---|---|---|
| | HR | 95% CI | HR | 95% CI |
| Stratification | | | | |
| Pooled individual-level | 0.59 | 0.41, 0.86 | 0.94 | 0.50, 1.77 |
| Risk-set | 0.59 | 0.41, 0.86 | 0.94 | 0.50, 1.77 |
| Summary-table | 0.59 | 0.39, 0.85 | 0.94 | 0.45, 1.78 |
| Effect-estimate, fixed-effect | 0.64 | 0.44, 0.93 | 1.04 | 0.55, 1.96 |
| Effect-estimate, random-effects | 0.63 | 0.26, 1.50 | 1.04 | 0.55, 1.96 |
| Heterogeneity (Qᵈ) and P value | 4.21 | 0.1215 | 0.83 | 0.6599 |
| Matching | | | | |
| Pooled individual-level | 0.55 | 0.37, 0.83 | 1.07 | 0.51, 2.22 |
| Risk-set | 0.55 | 0.37, 0.83 | 1.07 | 0.51, 2.22 |
| Summary-table | 0.55 | 0.35, 0.82 | 1.07 | 0.47, 2.34 |
| Effect-estimate, fixed-effect | 0.57 | 0.38, 0.84 | 1.23 | 0.58, 2.61 |
| Effect-estimate, random-effects | 0.58 | 0.34, 1.02 | 1.23 | 0.58, 2.61 |
| Heterogeneity (Q) and P value | 2.37 | 0.3053 | 0.03 | 0.9701 |
| Inverse probability weighting | | | | |
| Pooled individual-level | 0.60 | 0.38, 0.93 | 2.67 | 1.86, 3.84 |
| Risk-set | 0.60 | 0.35, 1.01 | 2.67 | 0.53, 13.52 |
| Effect-estimate, fixed-effect | 0.69 | 0.44, 1.07 | 2.93 | 2.03, 4.22 |
| Effect-estimate, random-effects | 0.79 | 0.26, 2.42 | 2.93 | 2.03, 4.22 |
| Heterogeneity (Q) and P value | 7.61 | 0.0222 | 0.81 | 0.6656 |
| Matching weighting | | | | |
| Pooled individual-level | 0.60 | 0.39, 0.94 | 0.85 | 0.38, 1.92 |
| Risk-set | 0.60 | 0.41, 0.88 | 0.85 | 0.45, 1.63 |
| Effect-estimate, fixed-effect | 0.62 | 0.40, 0.96 | 0.89 | 0.39, 2.05 |
| Effect-estimate, random-effects | 0.60 | 0.30, 1.23 | 0.89 | 0.39, 2.05 |
| Heterogeneity (Q) and P value | 2.62 | 0.2689 | 0.37 | 0.8304 |

Abbreviations: CI, confidence interval; HR, hazard ratio; TNFi, tumor necrosis factor α inhibitor.

a There were 407 (5.2%) patients who initiated use of non-TNFi biologicals and 7,419 (94.8%) patients who initiated use of TNFi biologicals.

b The incidence of treatment switching was 7.6% in new users of non-TNFi biologicals and 11.2% in new users of TNFi biologicals.

c The incidence of serious infection was 2.9% in new users of non-TNFi biologicals and 3.1% in new users of TNFi biologicals.

d Q is a measure of heterogeneity among the 3 data-contributing sites. The summary statistic and P value from Cochran’s Q test are shown.

DRS-based analyses

We observed similar findings for both binary and time-to-event outcomes when comparing results derived from the aggregate-level analytical methods with the reference results (Table 7). Risk-set and summary-table data-sharing generated identical results for both binary and time-to-event outcomes. Meta-analyses of effect-estimate data produced slightly different results for both outcome types but did not change the overall inference. For any specific outcome, DRS-based analyses generated results consistent with those from PS-based analyses when using the same confounding adjustment method (i.e., stratification or matching).

Table 7.

Results From Disease Risk Scoreᵃ–Adjusted Analyses Using Different Combinations of Confounding Adjustment Methods and Data-Sharing Approaches to Compare Non–Tumor Necrosis Factor α Inhibitors With Tumor Necrosis Factor α Inhibitors (Empirical Example 2)ᵇ

| Confounding Adjustment Method and Data-Sharing Approach | Effectiveness Outcome (Treatment Switching)ᶜ | | Safety Outcome (Serious Infections)ᵈ | |
|---|---|---|---|---|
| | OR | 95% CI | OR | 95% CI |
| Stratification | | | | |
| Pooled individual-level | 0.53 | 0.35, 0.78 | 0.88 | 0.42, 1.64 |
| Risk-set | 0.53 | 0.36, 0.78 | 0.88 | 0.42, 1.64 |
| Summary-table | 0.53 | 0.35, 0.78 | 0.88 | 0.42, 1.64 |
| Effect-estimate, fixed-effect | 0.56 | 0.38, 0.82 | 0.97 | 0.52, 1.82 |
| Effect-estimate, random-effects | 0.51 | 0.28, 0.96 | 0.97 | 0.52, 1.82 |
| Heterogeneity (Qᵉ) and P value | 2.67 | 0.2624 | 0.24 | 0.8832 |
| Matching | | | | |
| Pooled individual-level | 0.41 | 0.26, 0.63 | 0.86 | 0.37, 1.93 |
| Risk-set | 0.41 | 0.26, 0.63 | 0.86 | 0.37, 1.93 |
| Summary-table | 0.41 | 0.26, 0.63 | 0.86 | 0.37, 1.93 |
| Effect-estimate, fixed-effect | 0.41 | 0.27, 0.63 | 0.89 | 0.40, 1.98 |
| Effect-estimate, random-effects | 0.41 | 0.27, 0.63 | 0.89 | 0.40, 1.98 |
| Heterogeneity (Q) and P value | 1.77 | 0.4115 | 0.00 | 0.9961 |
| | HR | 95% CI | HR | 95% CI |
| Stratification | | | | |
| Pooled individual-level | 0.59 | 0.41, 0.84 | 0.86 | 0.47, 1.57 |
| Risk-set | 0.59 | 0.41, 0.84 | 0.86 | 0.47, 1.57 |
| Summary-table | 0.58 | 0.39, 0.83 | 0.86 | 0.42, 1.57 |
| Effect-estimate, fixed-effect | 0.63 | 0.44, 0.91 | 0.95 | 0.52, 1.75 |
| Effect-estimate, random-effects | 0.59 | 0.26, 1.32 | 0.95 | 0.52, 1.75 |
| Heterogeneity (Q) and P value | 3.83 | 0.1466 | 0.21 | 0.8989 |
| Matching | | | | |
| Pooled individual-level | 0.46 | 0.31, 0.68 | 0.89 | 0.42, 1.86 |
| Risk-set | 0.46 | 0.31, 0.68 | 0.89 | 0.42, 1.86 |
| Summary-table | 0.45 | 0.30, 0.68 | 0.88 | 0.38, 1.95 |
| Effect-estimate, fixed-effect | 0.47 | 0.32, 0.70 | 0.91 | 0.42, 2.00 |
| Effect-estimate, random-effects | 0.52 | 0.22, 1.25 | 0.91 | 0.42, 2.00 |
| Heterogeneity (Q) and P value | 4.01 | 0.1345 | 0.00 | 1.0000 |

Abbreviations: CI, confidence interval; HR, hazard ratio; OR, odds ratio; TNFi, tumor necrosis factor α inhibitor.

a The disease risk score was estimated using Cox proportional hazards regression in patients receiving tumor necrosis factor-α inhibitor biologicals.

b There were 407 (5.2%) patients who initiated use of non-TNFi biologicals and 7,419 (94.8%) patients who initiated use of TNFi biologicals.

c The incidence of treatment switching was 7.6% in new users of non-TNFi biologicals and 11.2% in new users of TNFi biologicals.

d The incidence of serious infection was 2.9% in new users of non-TNFi biologicals and 3.1% in new users of TNFi biologicals.

e Q is a measure of heterogeneity among the 3 data-contributing sites. The summary statistic and P value from Cochran’s Q test are shown.

Treatment effect heterogeneity across sites

The Q statistic suggested no treatment effect heterogeneity across the 3 data-contributing sites for most outcomes examined (Tables 5–7).

DISCUSSION

Using 2 empirical examples within a 3-site distributed data network, we tested combinations of 3 aggregate-level data-sharing approaches, 4 confounding adjustment methods, and 2 confounder summary scores and assessed their performance in multivariable-adjusted analysis of binary and time-to-event outcomes. The empirical examples included a range of exposure prevalences and outcome incidences, allowing for assessment under various real-world settings. For both types of outcomes, these aggregate-level data-sharing approaches yielded results identical or comparable to those from their corresponding pooled individual-level data analyses in most scenarios examined.

Summary of findings

For a given data-sharing approach, the comparability between aggregate- and individual-level data analysis depended on the confounding adjustment method. For example, with risk-set data-sharing, matched or stratified analysis of confounder summary scores returned identical results, while weighted analysis showed some variation. This was true for both PS- and DRS-based analysis. Our finding on the equivalence between PS-stratified analysis of risk-set data and the pooled individual-level data analysis was consistent with a previous empirical examination (13). Our study also confirmed the high comparability between inverse-probability–weighted analysis of risk-set data and the corresponding reference analysis in most scenarios, which was previously demonstrated in a simulation study (35).
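The equivalence for matched or stratified analyses can be seen from the form of the Cox partial likelihood with a single binary exposure: each risk set contributes only through the numbers of exposed and unexposed patients at risk and the numbers of events in each group. The sketch below assumes that sites share exactly these counts per risk set (the data layout and function name are hypothetical) and uses the Breslow approximation for tied event times. It recovers the log hazard ratio by Newton-Raphson. Risk sets from different sites or strata are simply concatenated, because the stratified partial likelihood is a product over strata.

```python
import math


def cox_log_hr_from_risk_sets(risk_sets, tol=1e-10, max_iter=50):
    """Estimate the log hazard ratio for a binary exposure from risk-set counts.

    risk_sets: list of tuples
        (n_exposed_at_risk, n_unexposed_at_risk, events_exposed, events_unexposed)
    Returns (log hazard ratio, standard error).
    """
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0   # first derivative of the log partial likelihood
        info = 0.0    # negative second derivative (observed information)
        for n1, n0, d1, d0 in risk_sets:
            d = d1 + d0
            # Probability that an event in this risk set falls in the exposed group
            p = n1 * math.exp(beta) / (n1 * math.exp(beta) + n0)
            score += d1 - d * p
            info += d * p * (1.0 - p)
        if info == 0.0:
            raise ValueError("risk sets carry no information on the hazard ratio")
        step = score / info
        beta += step
        if abs(step) < tol:
            # info was evaluated one (negligible) step before the final beta
            return beta, 1.0 / math.sqrt(info)
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical risk sets pooled from several sites and strata
risk_sets = [(40, 400, 1, 0), (38, 395, 0, 1), (35, 380, 0, 2), (30, 350, 1, 1)]
beta, se = cox_log_hr_from_risk_sets(risk_sets)
print(math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
```

Because the same counts determine the partial likelihood whether they are computed centrally or at each site, this calculation yields the estimate that a pooled stratified Cox model with only the exposure term would give. Weighted analyses need weighted sums rather than counts, which is outside this sketch.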

Sharing of summary-table data only requires aggregated information by exposure group, but analyses using this approach were sensitive to outcome type, outcome incidence, and sample size. Across confounding adjustment methods, this data-sharing approach yielded results identical to those of the reference analyses for binary outcomes but discrepant results for time-to-event outcomes in some scenarios. For example, the hazard ratio estimate for a <5% change in body mass index from the PS-based analysis was 3.48 (95% confidence interval: 3.11, 3.89) with summary-table data-sharing, while the reference hazard ratio was 2.20 (95% confidence interval: 1.97, 2.46) (Table 3). This discordance was not surprising because summary-table data-sharing for time-to-event outcomes was, in essence, performing a Poisson regression analysis, which assumes constant hazards. In the situation of time-varying hazards, this approach would generate results different from those of the Cox proportional hazards regression used in the pooled individual-level data analysis. This difference indicates that analysis of summary-table data may not be appropriate for certain time-to-event outcomes, especially when the hazards of the outcome under study are not constant.
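To make the constant-hazard assumption concrete, the sketch below estimates a stratified rate ratio from summary tables that hold only event counts and person-time by exposure group within each stratum. It uses the closed-form Mantel-Haenszel rate ratio as a simple stand-in for a fitted Poisson regression; both treat the hazard within each group and stratum as constant over follow-up. The input layout and function name are assumptions made for illustration.

```python
def mantel_haenszel_rate_ratio(strata):
    """Mantel-Haenszel rate ratio from stratified summary tables.

    strata: list of tuples
        (events_exposed, persontime_exposed, events_unexposed, persontime_unexposed)
    """
    numerator = 0.0
    denominator = 0.0
    for a, t1, b, t0 in strata:
        total_time = t1 + t0
        if total_time == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * t0 / total_time
        denominator += b * t1 / total_time
    if denominator == 0.0:
        raise ValueError("no unexposed events: the rate ratio is not estimable")
    return numerator / denominator


# Hypothetical summary tables for 3 summary-score strata
strata = [(12, 150.0, 30, 900.0), (9, 120.0, 25, 700.0), (5, 90.0, 14, 400.0)]
print(mantel_haenszel_rate_ratio(strata))
```

Only total events and total person-time enter the calculation, so the timing of events within follow-up is ignored. That is the point at which such an estimate can depart from a Cox model when hazards change over time.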

Meta-analysis of effect-estimate data requires that the least amount of information be shared across sites, but our empirical examples suggest that this approach was sensitive to outcome incidence and sample size. The discordance between results from this approach and the reference analyses was evident for the “≥30% change in body mass index” outcome and the safety outcomes in the bariatric procedure example and the “serious infections” outcome in the biological antirheumatic medications example. The incidences of these outcomes were less than or equal to 3.5% at some sites, much lower than those for the other outcomes. In addition, some outcomes occurred in only 1 exposure group at some sites, making the data uninformative in meta-analyses of effect-estimate data. Conversely, other data-sharing approaches can utilize data from sites with an outcome occurring in only 1 of the exposure groups. When the outcome under study was common across exposure groups and across sites, effect-estimate data-sharing, using both fixed-effect and random-effects modeling, produced estimates similar to those of the reference analysis. This finding was consistent with the results from previous simulation studies (14, 35).
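A minimal sketch of the two pooling models follows, assuming inverse-variance weighting for the fixed-effect estimate and the DerSimonian-Laird moment estimator for the between-site variance. This section does not state which random-effects estimator was used, so that choice is an assumption, as are the function names. Site-level inputs are log effect estimates with standard errors, which can be recovered from a reported ratio and its 95% confidence interval. A site where the outcome occurs in only 1 exposure group has no finite log ratio or standard error and therefore cannot enter this calculation.

```python
import math


def log_estimate_and_se(ratio, lower, upper, z=1.959964):
    """Convert a ratio and its 95% confidence limits to a log estimate and SE."""
    return math.log(ratio), (math.log(upper) - math.log(lower)) / (2.0 * z)


def pool_effect_estimates(log_estimates, std_errors):
    """Fixed-effect and DerSimonian-Laird random-effects pooled log estimates."""
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_estimates)) / sum_w
    # Cochran's Q and the moment estimate of the between-site variance
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_estimates))
    df = len(log_estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-site variance to each site variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_re = sum(w_re)
    pooled_random = sum(wi * yi for wi, yi in zip(w_re, log_estimates)) / sum_w_re
    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1.0 / sum_w),
        "random": pooled_random,
        "se_random": math.sqrt(1.0 / sum_w_re),
        "q": q,
        "tau2": tau2,
    }


# Hypothetical site-specific hazard ratios with 95% confidence limits
sites = [(0.70, 0.45, 1.09), (0.95, 0.60, 1.50), (1.60, 0.90, 2.84)]
logs, ses = zip(*(log_estimate_and_se(*s) for s in sites))
result = pool_effect_estimates(list(logs), list(ses))
print(math.exp(result["fixed"]), math.exp(result["random"]), result["tau2"])
```

Under this estimator, a Q statistic no larger than its degrees of freedom gives a between-site variance of zero, in which case the fixed-effect and random-effects estimates coincide.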

Synthesis of evidence on the performance of methods examined

Results from this empirical study confirmed and complemented those from a simulation study that examined the performance of these methods in a wider range of scenarios with varying treatment prevalence, outcome incidence, treatment effect, site size, number of sites, and covariate distributions (35). Simulation and empirical studies showed that these method combinations produced results highly comparable to those from their corresponding pooled individual-level analysis when the exposure prevalence was high, the outcome incidence was high, and the site size was adequate. The performance of these method combinations varied in scenarios with low exposure prevalence, low outcome incidence, and small site size. Web Table 4 summarizes the strengths and limitations of these methods examined in both studies and how their performance may be influenced by key parameters in a given multicenter study. This table can serve as a guide for researchers interested in applying these methods. In general, risk-set data-sharing is the method of choice in matched or stratified analysis of confounder summary scores because of its mathematical equivalence to its corresponding pooled individual-level data analysis. We demonstrated this equivalence in simulation and empirical studies. Meta-analysis of effect-estimate data is a valid alternative if all data-contributing sites are able to produce an effect estimate. Summary-table data-sharing can also be considered when the hazards of the study outcome are constant.

Additional considerations

We evaluated the performance of these methods in a distributed data network that had a common data model and reliable data quality. However, we would not expect their relative performance to differ in settings with less standardized data infrastructure, because the pooled individual-level data analysis would be equally susceptible to the same data issues. In practice, it may be more challenging to apply certain privacy-protecting methods in settings with less standardized data infrastructure. The use of these methods may also require more programming resources at each site and more coordination across sites. These operational challenges, though important, were beyond the scope of this study, which focused on the statistical performance of the methods.

It is not uncommon to have richer data at certain sites in a multicenter study. Researchers can estimate confounder summary scores using a common set of covariates or site-specific covariates. Both approaches have unique strengths and limitations that may vary by setting (12). Again, this issue applies to all data-sharing approaches, including approaches that share individual-level data. Using a common model to estimate summary scores ensures consistency across sites, but this approach may not fully utilize the information available at each site. Estimating site-specific summary scores theoretically allows better confounding adjustment at each site but may require more programming resources when using aggregate-level data-sharing. Some semiautomated modeling techniques, such as the high-dimensional PS approach (36), may help improve the feasibility of estimating site-specific summary scores. In practice, it is generally worthwhile to estimate summary scores in multiple ways to examine the robustness of the results.

Strengths

To our knowledge, this was the first study to systematically and comprehensively assess these newer privacy-protecting analytical and data-sharing methods for distributed data network studies. We used the results from pooled individual-level data analysis as the benchmark with which to evaluate the results from these more privacy-protecting methods. Although the referent pooled individual-level data analysis might not necessarily yield the true treatment effect, it represents the best possible analysis one could perform in multicenter studies; a more privacy-protecting method is a reasonable alternative if it produces identical or comparable results. It is also reassuring that our empirical studies produced results consistent with findings from prior methodological (8, 12–14, 35, 37, 38) and clinical (15–19) studies. Data from the 3 integrated delivery systems allowed us to assess the performance of these methods in settings that researchers may encounter in real-world studies with different outcome incidences and exposure prevalences. We also produced empirical evidence to support the use of DRS in combination with aggregate-level data-sharing approaches, which has not been previously evaluated.

Limitations

Because of small sample sizes and rare outcomes in some scenarios, certain analyses were not feasible or produced unreliable estimates. However, our study offers a realistic scenario involving sparse data at participating sites, a setting that necessitates multicenter studies. Our distributed data network comprised only 3 sites whose data had been converted into standardized formats. Investigators in future studies need to assess the validity of these methods in networks with more data-contributing sites, larger sample sizes, and more diverse databases.

The combinations of data-sharing approaches and confounding adjustment methods evaluated were by no means exhaustive. We did not consider distributed regression (10, 11, 39–41), which could be used in combination with confounder summary scores (42). We tested for treatment effect heterogeneity across sites but did not address it in our analyses, other than accounting for it in the random-effects meta-analysis. In the presence of treatment effect heterogeneity by site, issues around the appropriateness of combining data across sites apply to all data-sharing approaches, including approaches that share individual-level data. All methods we examined can accommodate assessment of treatment effect heterogeneity, either by site or by specific patient characteristics, if researchers specify potential effect modifiers in advance and request data accordingly. It is worth noting that the performances of the various method combinations examined were similar in both empirical examples, one of which showed substantial treatment effect heterogeneity and the other of which did not.

CONCLUSION

When used in conjunction with confounder summary scores, several combinations of data-sharing approaches and confounding adjustment methods allow researchers to perform multivariable-adjusted analysis using only aggregate-level information from participating sites and produce results identical or comparable to those from pooled individual-level data analysis. These more privacy-protecting analytical methods can be viable alternatives when sharing of individual-level data is not feasible or not preferred in multicenter studies. Generally, risk-set data-sharing is the method of choice in matched or stratified analysis of confounder summary scores. Meta-analysis of effect-estimate data is a valid alternative if all data-contributing sites can produce an effect estimate. Summary-table data-sharing can also be considered when the hazards of the study outcome are constant. Researchers should carefully evaluate exposure prevalence and outcome incidence when choosing among available data-sharing approaches and confounding adjustment methods in multicenter studies.

Supplementary Material

Web Material

ACKNOWLEDGMENTS

Author affiliations: Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts (Xiaojuan Li, Mia Gallagher, Sengwee Toh); Division of Research, Kaiser Permanente Northern California, Oakland, California (Bruce H. Fireman); Division of Clinical Immunology and Rheumatology, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama (Jeffrey R. Curtis); Kaiser Permanente Washington Health Research Institute, Seattle, Washington (David E. Arterburn); The Permanente Medical Group, Kaiser Permanente Northern California, Oakland, California (David P. Fisher); StatLog Econometrics Inc., Montreal, Quebec, Canada (Érick Moyneur); Institute for Health Research, Kaiser Permanente Colorado, Denver, Colorado (Marsha A. Raebel); CreakyJoints, Global Healthy Living Foundation, Upper Nyack, New York (W. Benjamin Nowell); and Limeade, Bellevue, Washington (Lindsay Lagreid).

This work was supported by the Patient-Centered Outcomes Research Institute (contracts ME-1403-11305 and PPRN-1306-04811). S.T. was also supported in part by the National Institutes of Health (grant U01EB023683) and the Agency for Healthcare Research and Quality (grant R01HS026214).

We thank the project managers, programmers, and staff of the 3 data-contributing sites in this study.

All statements in this article, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute (PCORI) or PCORI’s Board of Governors or Methodology Committee.

Conflict of interest: none declared.

Abbreviations

DRS, disease risk score

PS, propensity score

TNFi, tumor necrosis factor α inhibitor

REFERENCES

  • 1. Brown JS, Holmes JH, Shah K, et al. . Distributed health data networks: a practical and preferred approach to multi-institutional evaluations of comparative effectiveness, safety, and quality of care. Med Care. 2010;48(6 suppl):S45–S51. [DOI] [PubMed] [Google Scholar]
  • 2. Maro JC, Platt R, Holmes JH, et al. . Design of a national distributed health data network. Ann Intern Med. 2009;151(5):341–344. [DOI] [PubMed] [Google Scholar]
  • 3. Toh S, Platt R, Steiner JF, et al. . Comparative-effectiveness research in distributed health data networks. Clin Pharmacol Ther. 2011;90(6):883–887. [DOI] [PubMed] [Google Scholar]
  • 4. Behrman RE, Benner JS, Brown JS, et al. . Developing the Sentinel System—a national resource for evidence development. N Engl J Med. 2011;364(6):498–499. [DOI] [PubMed] [Google Scholar]
  • 5. Sentinel Coordinating Center Sentinel is a national medical product monitoring system. 2018. https://www.sentinelinitiative.org/. Accessed January 6, 2018.
  • 6. NIH Collaboratory Health Care Systems Research Collaboratory. 2018. https://www.nihcollaboratory.org/Pages/default.aspx. Accessed January 6, 2018.
  • 7. Patient-Centered Outcomes Research Institute PCORnet: the National Patient-Centered Clinical Research Network. 2018. http://www.pcori.org/funding-opportunities/pcornet-national-patient-centered-clinical-research-network/. Accessed January 6, 2018.
  • 8. Rassen JA, Avorn J, Schneeweiss S. Multivariate-adjusted pharmacoepidemiologic analyses of confidential information pooled from multiple health care utilization databases. Pharmacoepidemiol Drug Saf. 2010;19(8):848–857.
  • 9. Fireman B, Lee J, Lewis N, et al. Influenza vaccination and mortality: differentiating vaccine effects from bias. Am J Epidemiol. 2009;170(5):650–656.
  • 10. Karr AF, Lin X, Sanil AP, et al. Secure regression on distributed databases. J Comput Graph Stat. 2005;14(2):263–279.
  • 11. Wu Y, Jiang X, Kim J, et al. Grid Binary LOgistic REgression (GLORE): building shared models without sharing data. J Am Med Inform Assoc. 2012;19(5):758–764.
  • 12. Toh S, Gagne JJ, Rassen JA, et al. Confounding adjustment in comparative effectiveness research conducted within distributed research networks. Med Care. 2013;51(8 suppl 3):S4–S10.
  • 13. Toh S, Shetterly S, Powers JD, et al. Privacy-preserving analytic methods for multisite comparative effectiveness and patient-centered outcomes research. Med Care. 2014;52(7):664–668.
  • 14. Toh S, Reichman ME, Houstoun M, et al. Multivariable confounding adjustment in distributed data networks without sharing of patient-level data. Pharmacoepidemiol Drug Saf. 2013;22(11):1171–1177.
  • 15. Arterburn D, Powers JD, Toh S, et al. Comparative effectiveness of laparoscopic adjustable gastric banding vs laparoscopic gastric bypass. JAMA Surg. 2014;149(12):1279–1287.
  • 16. Maciejewski ML, Arterburn DE, Van Scoyoc L, et al. Bariatric surgery and long-term durability of weight loss. JAMA Surg. 2016;151(11):1046–1055.
  • 17. Toh S, Li L, Harrold LR, et al. Comparative safety of infliximab and etanercept on the risk of serious infections: does the association vary by patient characteristics? Pharmacoepidemiol Drug Saf. 2012;21(5):524–534.
  • 18. Grijalva CG, Chen L, Delzell E, et al. Initiation of tumor necrosis factor-α antagonists and the risk of hospitalization for infection in patients with autoimmune diseases. JAMA. 2011;306(21):2331–2339.
  • 19. Curtis JR, Patkar N, Xie A, et al. Risk of serious bacterial infections among rheumatoid arthritis patients exposed to tumor necrosis factor alpha antagonists. Arthritis Rheum. 2007;56(4):1125–1133.
  • 20. Curtis LH, Brown J, Platt R. Four health data networks illustrate the potential for a shared national multipurpose big-data network. Health Aff (Millwood). 2014;33(7):1178–1186.
  • 21. Longitudinal Assessment of Bariatric Surgery (LABS) Consortium, Flum DR, Belle SH, et al. Perioperative safety in the longitudinal assessment of bariatric surgery. N Engl J Med. 2009;361(5):445–454.
  • 22. Curtis JR, Baddley JW, Yang S, et al. Derivation and preliminary validation of an administrative claims-based algorithm for the effectiveness of medications for rheumatoid arthritis. Arthritis Res Ther. 2011;13(5):R155.
  • 23. Walsh KE, Cutrona SL, Foy S, et al. Validation of anaphylaxis in the Food and Drug Administration’s Mini-Sentinel. Pharmacoepidemiol Drug Saf. 2013;22(11):1205–1213.
  • 24. Mazor KM, Richards A, Gallagher M, et al. Stakeholders’ views on data sharing in multicenter studies. J Comp Eff Res. 2017;6(6):537–547.
  • 25. Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute. Privacy-protecting methods. Educational materials. 2018. https://www.distributedanalysis.org/educational-materials. Accessed June 27, 2018.
  • 26. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
  • 27. Arbogast PG, Ray WA. Performance of disease risk scores, propensity scores, and traditional multivariable outcome regression in the presence of multiple confounders. Am J Epidemiol. 2011;174(5):613–620.
  • 28. Xu S, Ross C, Raebel MA, et al. Use of stabilized inverse propensity scores as weights to directly estimate relative risk and its confidence intervals. Value Health. 2010;13(2):273–277.
  • 29. Li L, Greene T. A weighting analogue to pair matching in propensity score analysis. Int J Biostat. 2013;9(2):215–234.
  • 30. Yoshida K, Hernández-Díaz S, Solomon DH, et al. Matching weights to simultaneously compare three treatment groups: comparison to three-way matching. Epidemiology. 2017;28(3):387–395.
  • 31. Cook EF, Goldman L. Performance of tests of significance based on stratification by a multivariate confounder score or by a propensity score. J Clin Epidemiol. 1989;42(4):317–324.
  • 32. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22(4):719–748.
  • 33. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177–188.
  • 34. Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10(1):101–129.
  • 35. Yoshida K, Gruber S, Fireman BH, et al. Comparison of privacy-protecting analytic and data-sharing methods: a simulation study. Pharmacoepidemiol Drug Saf. 2018;27(9):1034–1041.
  • 36. Schneeweiss S, Rassen JA, Glynn RJ, et al. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20(4):512–522.
  • 37. Toh S, Reichman ME, Houstoun M, et al. Comparative risk for angioedema associated with the use of drugs that target the renin-angiotensin-aldosterone system. Arch Intern Med. 2012;172(20):1582–1589.
  • 38. Fireman B, Toh S, Butler MG, et al. A protocol for active surveillance of acute myocardial infarction in association with the use of a new antidiabetic pharmaceutical agent. Pharmacoepidemiol Drug Saf. 2012;21(suppl 1):282–290.
  • 39. Fienberg SE, Fulp WJ, Slavkovic AB, et al. “Secure” log-linear and logistic regression analysis of distributed databases. Lect Notes Comput Sci. 2006;4302:277–290.
  • 40. Lin X, Karr AF. Privacy-preserving maximum likelihood estimation for distributed data. J Priv Confid. 2010;1(2):213–222.
  • 41. Her QL, Malenfant JM, Malek S, et al. A query workflow design to perform automatable distributed regression analysis in large distributed data networks. EGEMS (Wash DC). 2018;6(1):Article 11.
  • 42. Toh S, Wellman R, Coley RY, et al. Combining distributed regression and propensity scores: a doubly privacy-protecting analytic method for multicenter research. Clin Epidemiol. 2018;10:1773–1786.


Articles from American Journal of Epidemiology are provided here courtesy of Oxford University Press
