Summary
Many clinical endpoint measures, such as the number of standard drinks consumed per week or the number of days that patients stayed in the hospital, are count data with excessive zeros. However, the zero‐inflated nature of such outcomes is sometimes ignored in analyses of clinical trials. This leads to biased estimates of study‐level intervention effects and, consequently, a biased estimate of the overall intervention effect in a meta‐analysis. The current study proposes a novel statistical approach, the Zero‐inflation Bias Correction (ZIBC) method, that can correct the bias introduced when the Poisson regression model is used despite a high rate of excess zeros in the outcome distribution of a randomized clinical trial. This correction method requires only summary information from individual studies to correct intervention effect estimates as if they were appropriately estimated using the zero‐inflated Poisson regression model. It is therefore attractive for meta‐analysis when individual participant‐level data are not available in some studies. Simulation studies and real data analyses showed that the ZIBC method performed well in correcting zero‐inflation bias in most situations.
Keywords: aggregate data, individual participant data, meta‐analysis, randomized clinical trial, zero‐inflated outcome
1. INTRODUCTION
Meta‐analysis is an established statistical approach for combining data from multiple studies to provide large‐scale evidence across many disciplines, including medical, educational, and policy research. 1 The majority of published meta‐analyses have relied on aggregate data (AD), which are study‐level summary statistics available from published or unpublished reports. 2 , 3 , 4 However, AD meta‐analysis is susceptible to estimation bias, because the biased result from a study with model misspecification (eg, a biased effect size) will be carried over in meta‐analysis if the study is included. For AD meta‐analysis, it is challenging to correct biased estimation from original studies without refitting raw individual participant data (IPD) using a more suited statistical model. 5 In this article, we aim to correct this estimation bias, that is, the bias from the conventional count model on zero‐inflated count outcome, when only AD are available for meta‐analysis.
Count outcomes are prevalent in clinical research, including number of seizures for each patient in epilepsy trials (eg, Reference 6), number of relapses in multiple sclerosis trials (eg, Reference 7), and number of standard alcohol drinks in alcohol intervention trials (eg, Reference 8). Some studies, by nature, have high proportions of zero outcome values. For example, drinking outcomes from alcohol intervention studies reflect both individuals who abstain from alcohol following an intervention and those who happened not to drink, resulting in a large proportion of zero drinks, above and beyond the frequency that would be predicted by conventional count models, such as the Poisson. Therefore, estimation results might be biased if the Poisson regression model was used in studies with potentially zero‐inflated outcomes (Reference 9; some examples may be References 10, 11, 12). If the results from Poisson regression are included in a meta‐analysis, the biased estimation results might further bias the pooled result in a meta‐analysis. We, henceforth, refer to this bias as zero‐inflation bias throughout the study. The application of appropriate statistical approaches to accommodate zero‐inflated outcome data has been on the rise in recent years, with the availability of relevant software packages (eg, zero‐inflated and hurdle models in the pscl R package 13 ). However, there still exists a non‐ignorable number of publications that did not ideally account for zero‐inflation in analysis. For example, in a recent meta‐analysis study of 17 brief alcohol interventions, the proportions of participants reporting zero number of drinks were considerably high in nine studies, eight of which did not account for zero‐inflation in outcome reporting. 14 The studies that did not properly account for zero‐inflation may still be pooled in meta‐analysis studies for years to come. 
Thus, a methodological approach capable of correcting biased estimates from past studies would help facilitate AD meta‐analyses of zero‐inflated count outcomes into the future.
A zero‐inflated Poisson (ZIP) model is more appropriate for count data with many zeros, since it assumes that the outcome follows a mixture of a point mass at zero and a Poisson distribution. From a clinical perspective, the two components of the ZIP model correspond to two distinct subpopulations: (a) participants who predictably do not engage in the behavior, and (b) participants who may or may not engage in the behavior at a particular assessment. In some clinical situations, clinicians may focus on the latter as they are the primary target of their intervention (eg, whether an alcohol intervention helps reduce drinking for those who regularly engage in drinking; See Section 2.1 for two examples). In this article, we are interested in the incidence density ratio for intervention vs. control on the mean of the Poisson portion in the ZIP model, which is important for understanding the intervention effect among the subpopulation that may potentially engage in the behavior. Note that in other trial evaluation situations, modeling the overall mean of the outcome, which accommodates structural zeros, may be more desirable. 16
In this article, we focus on mitigating the impact of zero‐inflation bias in meta‐analysis and propose a novel statistical method, called the Zero‐inflation Bias Correction (ZIBC) method. This method corrects the biased intervention effect size estimation that can result from the conventional Poisson regression model, the “go‐to” method when modeling count outcomes. We aim to correct zero‐inflation bias and produce a bias‐corrected effect size estimate equivalent to the estimate from the ZIP regression model. This bias correction is achieved by comparing the estimating equations under the ZIP and Poisson models and using summary statistics of intervention and control subgroups. We will refer to the Poisson and ZIP regression models as the conventional and true methods, respectively, in the current paper.
The article proceeds as follows. In Section 2, we describe the formulation of the standard Poisson and ZIP regression models for a single study. We then introduce the ZIBC method for correcting zero‐inflation bias as well as how to apply it in an AD meta‐analysis. In Section 3, we conduct simulation studies to evaluate the performance of the ZIBC method in bias correction. In Section 4, we consider two real data examples. In the first example, we examine the intervention effects on alcohol use utilizing data drawn from an IPD meta‐analysis study. The example is used to demonstrate the performance of the ZIBC method, pretending we have only AD, which were derived from IPD. In the second example, we illustrate the application of the method in a clinical trial for preventing dental caries, utilizing AD from the published report. In this example, we have only AD and it is not possible to perform a meta‐analysis using IPD. In Section 5, we discuss the overall findings and conclusions.
2. METHOD: FROM SINGLE STUDY TO META‐ANALYSIS
In this section, we describe the ZIBC method and how it corrects zero‐inflation bias in an AD meta‐analysis. We first focus on the case of a single randomized clinical trial, where we set up notations for the true and conventional methods (Section 2.1). We then describe zero‐inflation bias (Section 2.2), and provide the ZIBC method that can correct it (Section 2.3). Next, for each clinical trial that originally used the conventional method for zero‐inflated outcomes, we implement the ZIBC method to obtain the bias‐corrected intervention effect estimate and conduct a standard meta‐analysis for the overall bias‐corrected intervention effect (Section 2.4).
2.1. Model setup: Single randomized clinical trial
2.1.1. True method: ZIP regression model
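For a randomized clinical trial with two arms, we assume a count outcome with an excessive rate of zeros that follows a ZIP regression model. Suppose the study sample size is $n$, and for the $i$th subject, $i = 1, \ldots, n$, we assume that the outcome $Y_i$ is distributed as
$$Y_i \sim \begin{cases} 0, & \text{with probability } p_i, \\ \text{Poisson}(\mu_i), & \text{with probability } 1 - p_i, \end{cases} \tag{1}$$
where $p_i$ is the structural zero rate and $\mu_i$ is the mean parameter of the Poisson portion for subject $i$. The mean of $Y_i$ is $(1 - p_i)\mu_i$.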
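The mixture structure of the ZIP model can be made concrete with a few lines of code. The following is a minimal sketch, not part of the original analysis; the function names `zip_pmf`, `zip_mean`, and `zip_zero_prob` are hypothetical, and `p` and `mu` stand for the structural zero rate and the Poisson mean of a single subject.

```python
import math


def zip_pmf(y: int, p: float, mu: float) -> float:
    """P(Y = y) under a ZIP model: point mass at zero mixed with Poisson(mu)."""
    poisson = math.exp(-mu) * mu ** y / math.factorial(y)
    # Structural zeros contribute only at y == 0.
    return (p if y == 0 else 0.0) + (1.0 - p) * poisson


def zip_mean(p: float, mu: float) -> float:
    """Overall mean of the ZIP outcome, (1 - p) * mu."""
    return (1.0 - p) * mu


def zip_zero_prob(p: float, mu: float) -> float:
    """Total probability of a zero: structural zeros plus Poisson zeros."""
    return p + (1.0 - p) * math.exp(-mu)
```

The two summaries returned by `zip_mean` and `zip_zero_prob` are the quantities that arm-level aggregate data (outcome average and proportion of zeros) estimate, which is why they reappear when the structural zero rate is recovered from summary statistics.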
In the context of intervention or prevention studies, the structural zeros correspond to participants who do not engage in the outcome (eg, alcohol abstainers who do not drink across situation and time), whereas the Poisson portion corresponds to those who may or may not engage in the behavior at a given time or situation (eg, participants who may or may not drink during the past month at 1‐month follow‐up). The present paper focuses on the Poisson portion characterizing the intervention effect on the latter, which is of interest in many harm‐reduction alcohol intervention studies. For example, in alcohol prevention and intervention trials among college students, researchers may be most interested in students who may drink if given an opportunity (eg, Section 4.1). Another example is clinical trials to prevent dental caries among children, where the outcome of interest is the number of caries developed during a certain period (eg, Section 4.2). In such trials, some children may be unlikely to develop dental caries (eg, due to good oral hygiene habits or protective genetic factors), while others have higher chances of developing them. Therefore, targeting the latter group of children, which can be characterized through the Poisson portion, may produce higher cost‐effectiveness and utility for dental caries prevention strategies.
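The Poisson portion can be modeled as follows. Suppose covariates $\boldsymbol{x}_i = (x_{i1}, \boldsymbol{x}_{i2}^\top)^\top$ are included in the model and one of the covariates is the intervention assignment indicator $x_{i1}$, where $x_{i1} = 1$ or $0$ denotes a participant's assignment to either the intervention (T) or control (C) arm, and $\boldsymbol{x}_{i2}$ denotes the remaining covariates. The Poisson mean parameter $\mu_i$ is modeled through the covariates as
$$\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \boldsymbol{x}_{i2}^\top \boldsymbol{\beta}_2, \tag{2}$$
where $\beta_0$, $\beta_1$, and $\boldsymbol{\beta}_2$ are the regression coefficients. Note that $\beta_1$ measures the intervention effect, the log incidence density ratio between the intervention and control groups, which is the parameter we aim to recover. We denote $\boldsymbol{\beta} = (\beta_0, \beta_1, \boldsymbol{\beta}_2^\top)^\top$ as the true regression parameters.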
From Equations (1) and (2), the estimating equations under the true method are given by
By solving , we can obtain the maximum likelihood estimates (MLE), . As are the true parameters for the ZIP model (1), we also have and as by standard likelihood inference. Note that can be modeled separately in a logistic model. However, we do not attempt to model because it is not the interest of the current study.
2.1.2. Conventional method: Standard Poisson regression model
For the same trial design described in the previous subsection, some researchers (cf., References 11, 17, etc.) have used the conventional (CV) Poisson model to analyze potentially zero‐inflated count outcome with , where .
Under the conventional method, we derive the following estimating equation
(3) |
and denote as the solution of . Then are the parameter estimates in the conventional method, which are usually reported in each individual trial. Define as the solution of . By the standard asymptotic theory of M‐estimation (cf., Reference 18), we can show that , as . Since the estimating equations do not account for zero‐inflation, there is a discrepancy between and the true parameter values , so the intervention effect estimate from the conventional method, , is biased. In the current study, we focus on the MLE of the true intervention effect, (defined in Section 2.1.1), which can be recovered by modifying .
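To see what the conventional method actually estimates, it helps to look at the simplest case. The sketch below assumes the intervention indicator is the only covariate (an assumption of this illustration, not a restriction of the method); in that case the Poisson score equations have a closed-form solution in terms of the two arm means. The function name `poisson_two_arm_mle` is hypothetical.

```python
import math


def poisson_two_arm_mle(y_control, y_treated):
    """Closed-form Poisson regression MLE when the arm indicator is the only covariate.

    Solving the Poisson score equations gives exp(beta0) = control mean and
    exp(beta0 + beta1) = intervention mean.
    """
    mean_c = sum(y_control) / len(y_control)
    mean_t = sum(y_treated) / len(y_treated)
    if mean_c <= 0 or mean_t <= 0:
        raise ValueError("both arm means must be positive")
    beta0 = math.log(mean_c)            # log of the overall control mean
    beta1 = math.log(mean_t / mean_c)   # log ratio of overall arm means
    return beta0, beta1
```

Because each arm mean estimates the overall ZIP mean (one minus the structural zero rate, times the Poisson mean) rather than the Poisson mean alone, the fitted coefficients absorb any difference in structural zero rates between arms.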
2.2. Zero‐inflation bias
In this section, we formally describe zero‐inflation bias as the difference between the parameters of the true method (ie, ) and those of the conventional method (ie, ). Denote as the zero‐inflation bias for all parameters, then , and . Since is of primary interest, we focus on the corresponding zero‐inflation bias for the intervention effect and the following formula .
We can characterize by taking a close look at the equations . Plugging in , Equation (3) can be recast as
(4) |
which shows the zero‐inflation bias is part of the solution of . However, and require participant‐level information, which is unavailable in AD meta‐analysis. Hence, Equation (4) cannot be solved directly. Alternatively, we can approximate by substituting and with study‐level summary information. We describe the approximation in detail in the following section.
2.3. Approximate bias : The ZIBC method
In this section, we describe the ZIBC method to approximate using Equation (4). First, we can simplify by
(5) |
where , , is the average structural zero rate, and are the average values for covariates in the sample. Thus, part of the participant‐level information (ie, and ) are substituted with the study‐level summary statistics (ie, and ) to approximate . From the approximation of , we transfer the problem of solving to solving , which is a function with respect to .
Rewrite , where , and , then , and a solution for is
(6) |
Thus, the MLE of the true intercept can be recovered by
$$\hat{\beta}_0 = \hat{\beta}_0^{CV} - \log(1 - \bar{p}). \tag{7}$$
Note that the approximation of above is analogous to the expectation‐maximization (EM) algorithm, 19 where the expectation‐step occurs when plugging in and average values in Equations (4) and (5). The maximization‐step occurs implicitly when maximizing the log‐likelihood by taking the first derivative to obtain the estimating equations (ie, Equation (3)). The advantage of our method is that, after approximation, the estimating equations can be directly solved using summary statistics; therefore, iterations or IPD are not needed. A detailed derivation is provided in Appendix A in the Supplemental Materials.
However, cannot be obtained directly as in Equation (6). To get around this limitation, we can estimate by estimating the MLE of the intercept separately for the control and intervention groups, based on Equation (7). The specific steps are described as follows:
(S1) Consider the sample as being comprised of two separate and independent groups: intervention and control.
(S2) For each group, derive a bias‐corrected intercept from the conventional method using Equation (7).
(S3) Merge the corrected intercepts of the two groups from (S2) to obtain the corrected intervention effect estimate.
The details are given as follows.
Denote and as the index sets for control and intervention groups, respectively. We further denote and . We first consider control group. Since for , Equation (2) becomes . Denote and as the parameter estimates under the true and conventional methods, respectively. Based on Equation (7), we have
(8) |
where is the average structural zero rate in control group. For intervention group, since for , Equation (2) becomes . Note that the intercept becomes , which includes the intervention effect. Under similar arguments and notations, we then have
(9) |
where is the parameter estimate from the conventional method and is the average structural zero rate in intervention group. From Equations (8) and (9), we can see that the discrepancies in intercepts under the true and conventional methods are and for the control and intervention groups, respectively. Combining the two equations, we can obtain an estimate for the zero‐inflation bias , which is summarized in Lemma 1. The proof is provided in Appendix B in the Supplemental Materials.
Lemma 1
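In a study given by Equations (1) and (2), denote the observed covariates excluding the intervention assignment as $\boldsymbol{x}_{i2}$ for $i = 1, \ldots, n$. If $\bar{\boldsymbol{x}}_{C} = \bar{\boldsymbol{x}}_{T}$, where $\bar{\boldsymbol{x}}_{C}$ and $\bar{\boldsymbol{x}}_{T}$ are the averages of $\boldsymbol{x}_{i2}$ over the control and intervention groups, respectively, then we have
$$\hat{\beta}_1 = \hat{\beta}_1^{CV} + \hat{\delta}_1, \qquad \hat{\delta}_1 = \log(1 - \bar{p}_C) - \log(1 - \bar{p}_T). \tag{10}$$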
We denote the adjusted intervention effect as $\hat{\beta}_1^{ZIBC} = \hat{\beta}_1^{CV} + \hat{\delta}_1$. Lemma 1 gives the correction formula, Equation (10), of the proposed ZIBC method. The assumption $\bar{\boldsymbol{x}}_{C} = \bar{\boldsymbol{x}}_{T}$ requires that the “average” subject in the control group has the same covariate values as the “average” subject in the intervention group. In a typical two‐arm randomized controlled trial, subjects are randomized to either a control or intervention group. Thus, the covariates should follow similar distributions across the groups. In addition, the participants in the control and intervention groups are expected to be equivalent not only in all measured covariates but also in unmeasured ones. Hence, the assumption of Lemma 1 can reasonably hold in this case. Note that $\hat{\delta}_1$ depends on the relative difference between the average structural zero rates of the two groups: $\hat{\delta}_1 > 0$ if $\bar{p}_T > \bar{p}_C$, $\hat{\delta}_1 = 0$ if $\bar{p}_T = \bar{p}_C$, and $\hat{\delta}_1 < 0$ if $\bar{p}_T < \bar{p}_C$. The zero‐inflation bias would be minor if the structural zero rates are similar for the control and intervention groups, as the respective influences of zero‐inflation cancel each other, even when the zero‐inflation itself may be strong. We conducted a simulation study in Section 3 to further evaluate the relationship.
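The correction can be sketched in code under one stated assumption: that a Poisson fit recovers the log of the overall arm mean, which under the ZIP model is the Poisson mean scaled by one minus the structural zero rate. Under that assumption the shift added to the conventional estimate is log(1 - p_C) - log(1 - p_T), where p_C and p_T are the average structural zero rates of the control and intervention arms. The function names `zibc_shift` and `zibc_correct` are hypothetical.

```python
import math


def zibc_shift(p_control: float, p_treated: float) -> float:
    """Shift added to the conventional estimate (assumed form, see lead-in).

    Positive when the intervention arm has the higher structural zero rate,
    zero when the two rates are equal, negative otherwise.
    """
    return math.log(1.0 - p_control) - math.log(1.0 - p_treated)


def zibc_correct(beta1_cv: float, p_control: float, p_treated: float) -> float:
    """Bias-corrected log incidence density ratio from a conventional Poisson estimate."""
    return beta1_cv + zibc_shift(p_control, p_treated)
```

For example, with a true effect of -0.2, a control structural zero rate of 0.2, and an intervention rate of 0.4, the log ratio of overall arm means is -0.2 + log(0.6 / 0.8), and `zibc_correct` maps that value back to -0.2. When the two rates are equal the shift vanishes, however strong the zero-inflation is.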
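The group‐level structural zero rates $\bar{p}_C$ and $\bar{p}_T$ can be estimated using the following algorithm. Taking the control group, $i \in \mathcal{C}$, as an example, we have
$$E\left(\frac{n_{0,C}}{n_C}\right) = \frac{1}{n_C}\sum_{i \in \mathcal{C}} \left\{ p_i + (1 - p_i)\exp(-\mu_i) \right\}, \qquad E(\bar{Y}_C) = \frac{1}{n_C}\sum_{i \in \mathcal{C}} (1 - p_i)\mu_i, \tag{11}$$
where $n_C$, $\bar{Y}_C$, and $n_{0,C}$ are the sample size, observed outcome average, and observed number of zero outcomes, respectively, for the control group. To estimate $\bar{p}_C$, we approximate Equation (11) by substituting $p_i$ with $\bar{p}_C$, and $\mu_i$ with $\bar{Y}_C / (1 - \bar{p}_C)$, resulting in
$$r_C = \bar{p}_C + (1 - \bar{p}_C)\exp\left(-\frac{\bar{Y}_C}{1 - \bar{p}_C}\right). \tag{12}$$
Here, $r_C = n_{0,C}/n_C$ is the proportion of zero outcome values in the control group. By solving Equation (12), we can get an approximation of $\bar{p}_C$. Similarly, we can get $\bar{p}_T$ using the same process.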
The data required for the ZIBC method are (a) $\hat{\beta}_1^{CV}$, (b) $\bar{Y}_C$, $\bar{Y}_T$, and (c) $r_C$, $r_T$. In a typical trial study, (a) and (b) are directly reported or can be obtained, while (c) are less frequently reported but may be obtained via author queries to the investigators of original studies.
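Estimating an arm-level structural zero rate from summary statistics amounts to a one-dimensional root-finding problem. The sketch below assumes the moment-matching form in which the observed proportion of zeros equals p + (1 - p) exp(-ybar / (1 - p)), with ybar the observed arm mean; the left-hand side is increasing in p, so bisection suffices. The function name `structural_zero_rate` and its arguments are hypothetical.

```python
import math


def structural_zero_rate(zero_prop: float, mean_count: float, tol: float = 1e-10) -> float:
    """Solve zero_prop = p + (1 - p) * exp(-mean_count / (1 - p)) for p by bisection."""
    if not 0.0 <= zero_prop < 1.0 or mean_count <= 0.0:
        raise ValueError("need 0 <= zero_prop < 1 and mean_count > 0")

    def gap(p: float) -> float:
        return p + (1.0 - p) * math.exp(-mean_count / (1.0 - p)) - zero_prop

    # No more zeros than a plain Poisson with this mean predicts: no inflation.
    if gap(0.0) >= 0.0:
        return 0.0

    # gap is increasing in p, negative at 0 and positive at zero_prop.
    lo, hi = 0.0, zero_prop
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Applied to each arm in turn, the two resulting rates are the only additional inputs the correction needs beyond the reported conventional estimate and its standard error.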
2.4. Implementation in meta‐analysis
Suppose an AD meta‐analysis contains K studies that used the conventional method to model zero‐inflated outcomes. For each of the K studies, we can apply the ZIBC method to obtain the bias‐corrected intervention effect , which occurs before combining data in a meta‐analysis. For simplicity, we use the reported standard errors from the conventional method. With the new set of intervention effects and standard errors, a standard AD meta‐analysis can be applied to combine results across studies and obtain the corrected overall intervention effect estimate. For example, a random‐effects meta‐analysis model, which assumes intervention effects to vary across studies, may be used when study heterogeneity needs to be accounted for in a meta‐analysis. 20 , 21
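The pooling step can be illustrated with a small random-effects routine. This sketch uses the DerSimonian-Laird moment estimator of between-study variance because it has a closed form; it is a swapped-in, simpler estimator chosen for illustration and is not necessarily the estimator used in the analyses reported here. The function name `random_effects_dl` is hypothetical; its inputs are the bias-corrected effects and the standard errors carried over from the conventional method.

```python
import math


def random_effects_dl(effects, std_errors):
    """Random-effects pooling with the DerSimonian-Laird between-study variance.

    Returns (pooled effect, pooled standard error, tau^2).
    """
    k = len(effects)
    w = [1.0 / se ** 2 for se in std_errors]          # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2
```

Because the correction only shifts each study's point estimate, the same routine can be run on the conventional and corrected effects with identical standard errors to see how much the pooled estimate moves.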
3. SIMULATION
We conducted simulation studies to examine the performance of the ZIBC method. Specifically, we compare relative performance of the following three methods:
ZIP regression model (ie, the true method), the “gold standard” method, which is not feasible in AD meta‐analysis,
Poisson regression model (ie, the conventional method), the method with zero‐inflation bias when the outcome is zero‐inflated, and
ZIBC method, the method to correct zero‐inflation bias from the conventional method and recover the intervention effect as if it came from the true method.
In the simulation study, we consider randomized clinical trials aimed at evaluating the effect of an intervention on reducing alcohol consumption, where the outcome is the number of standard alcohol drinks. For each trial, we incorporate an additional covariate that follows a standard normal distribution. The simulation was motivated by Project INTEGRATE, a large‐scale meta‐analysis project examining the effectiveness of brief alcohol interventions on reducing alcohol consumption among young adults. 22 High proportions of zero alcoholic drinks (ie, non‐drinking) were observed in most trials included in the study.
The settings of the simulation are based on our observation of the motivating data. Specifically, the sample sizes for individual trials are set at 200 and 400 for first and last half of the studies, respectively. For study with sample size , the outcome of ith subject () is simulated by a true ZIP regression model with probability , and 0 otherwise. The structural zero rate and Poisson mean parameter are simulated by and with a continuous covariate and intervention group assignment , where for one‐third of the studies, respectively, to allow for potential group imbalance. Note that we will examine in the Poisson portion; is used only to generate data sets, and will not be examined in the simulation study.
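A stripped-down version of the data-generating step shows how the bias arises. The sketch below is a simplification: it uses constant structural zero rates within each arm and no additional covariate, and every numeric value (arm size, rates, coefficients, seed) is a hypothetical choice for illustration rather than a setting from this study. The helper names `rpois` and `simulate_arm` are also hypothetical.

```python
import math
import random


def rpois(mu: float, rng: random.Random) -> int:
    """Draw a Poisson(mu) variate with Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_arm(n: int, p: float, mu: float, rng: random.Random) -> list:
    """ZIP outcomes for one arm: structural zero with probability p, else Poisson(mu)."""
    return [0 if rng.random() < p else rpois(mu, rng) for _ in range(n)]


rng = random.Random(2021)
beta0, beta1 = math.log(3.0), -0.2    # hypothetical true Poisson-portion parameters
p_c, p_t = 0.2, 0.4                   # hypothetical structural zero rates by arm
y_c = simulate_arm(20000, p_c, math.exp(beta0), rng)
y_t = simulate_arm(20000, p_t, math.exp(beta0 + beta1), rng)

# With the arm indicator as the only covariate, the Poisson estimate of the
# intervention effect is the log ratio of the arm means.
naive = math.log((sum(y_t) / len(y_t)) / (sum(y_c) / len(y_c)))
expected = beta1 + math.log((1.0 - p_t) / (1.0 - p_c))
print(f"true {beta1:.3f}, conventional {naive:.3f}, predicted limit {expected:.3f}")
```

With a higher structural zero rate in the intervention arm, the conventional estimate lands well below the true value, so the amount that must be added back is positive, which matches the direction described for the simulation settings.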
We examine the relative performance of the three methods under the following parameter settings:
-
1)
,
-
2)
, and
-
3)
.
Note that as the intervention effect () varies from to , the intercept () also varies accordingly to fix the maximum possible at the same level of 0.95.
To evaluate the impact of different degrees of zero‐inflation on the bias and performance of the methods, we varied the overall proportion of zero drinks at 0.2, 0.3, …, 0.8 among trials. Then , and can be calculated to yield the aforementioned zero rates. In the simulation, we fixed , indicating that participants in the intervention group will have a higher probability of no drinking, compared to the control. For example, more participants who previously drank may quit drinking after intervention, compared with their control counterparts. To ensure identifiability of , , and , one additional constraint needs to be applied, and in this simulation, we used . Other constraints were considered and examined, and their comparative results from simulation remained the same (results available upon request).
In one replication of the simulation, data from K intervention studies were generated. For each study, both the true and conventional methods were estimated first, then the ZIBC method was applied to modify the intervention effect estimate from the conventional method. Finally, for each of the three methods, we applied a random‐effects meta‐analysis model using the metafor R package, 23 and generated forest plots to compare performance between the methods.
Figure 1 shows a forest plot from a typical replication during simulation when , true intervention effect , and overall zero rate . Based on the results, we have the following four observations. First, the conventional method produced biased estimates of intervention effects for individual studies as well as the overall result after meta‐analysis. Specifically, the estimated zero‐inflation bias was positive (), as the structural zero rates of intervention groups () were higher than those of control groups () across the studies, according to Lemma 1. Second, the true method produced accurate intervention effect estimate, that is, close to , for each study and the overall effect across studies. Third, the ZIBC method corrected zero‐inflation bias in the right direction for each study. Finally, after meta‐analysis, the corrected overall estimate from the ZIBC method was very close to the true parameter value of , and the standard error was also close to that of the true method (0.035 vs. 0.036). In sum, this typical simulation replication illustrates that the ZIBC method reasonably corrects the biased intervention effect estimates from the conventional method.
Figure 1 graphically illustrates the good performance of the ZIBC method in a single simulation replication. To examine the performance numerically across replications, we compared the intervention effect estimates from the three methods with the true intervention effect by calculating the coverage indicator (1 if the 95% confidence interval covers and 0 otherwise) and differences with at each replication. After 1000 replications, we calculated the proportion of replications whose 95% confidence intervals captured (coverage rate), and the mean squared error (MSE) between the effect estimate and . To evaluate the practice of using for the ZIBC method in meta‐analysis, we calculated the average combined standard errors of the three methods (denoted as average , , and ), as well as the absolute percent relative difference of the conventional method or the ZIBC method against the true method (ie, ). We compared these indices across the methods.
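The performance indices can be computed with a few helper functions. This is a sketch under two assumptions: confidence intervals are Wald-type (estimate plus or minus 1.96 standard errors), and the absolute percent relative difference is the absolute difference in average standard error divided by the average standard error of the true method. The function names are hypothetical.

```python
def coverage_indicator(estimate: float, se: float, truth: float, z: float = 1.96) -> int:
    """1 if the Wald interval estimate +/- z * se covers the true value, else 0."""
    return 1 if abs(estimate - truth) <= z * se else 0


def summarize(estimates, std_errors, truth):
    """Coverage rate, mean squared error, and average standard error over replications."""
    n = len(estimates)
    coverage = sum(coverage_indicator(e, s, truth) for e, s in zip(estimates, std_errors)) / n
    mse = sum((e - truth) ** 2 for e in estimates) / n
    avg_se = sum(std_errors) / n
    return coverage, mse, avg_se


def aprd(avg_se_method: float, avg_se_true: float) -> float:
    """Absolute relative difference of a method's average SE against the true method's."""
    return abs(avg_se_method - avg_se_true) / avg_se_true
```

Each method's pooled estimates and standard errors across replications would be passed to `summarize`, and the resulting average standard errors compared with `aprd`.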
Figure 2 presents the results for different simulation settings when . The comparative results when and 20 (Figures S1 and S2 in the Supplemental Materials) are essentially the same as the results of . From the results shown in Figure 2, first, the true method had the highest coverage rates, which were close to 0.95, and also had MSE values close to 0. Second, the conventional method resulted in biased intervention effect estimates, as indicated by low coverage rates and high MSE values. Note that as zero rates increased, zero‐inflation bias became greater, leading to progressively lower coverage rates and higher MSE values. Third, the ZIBC method had acceptable coverage rates close to 0.9 and low MSE values close to 0. Furthermore, the performance of the ZIBC method was consistent across different zero rates between 0.2 and 0.8. Table 1 presents the average combined standard errors and absolute percent relative difference of conventional vs. true method and the ZIBC vs. true method when . Compared to the conventional method, the ZIBC method generally had lower absolute percent relative differences, which were within about 3%, in all scenarios for zero rates $\le 0.7$. For the zero rate of $0.8$, the absolute percent relative difference of the ZIBC method increased dramatically. This is because as the zero rate approaches 1, the structural zero rates will also approach 1, so a small variation in $\bar{p}_T$ (and $\bar{p}_C$) would lead to a more drastic variation in $\log(1 - \bar{p}_T)$ (and $\log(1 - \bar{p}_C)$) in the correction formula (ie, Equation (10)). This produces higher standard errors around the parameter estimates. Thus, we recommend using the ZIBC method with caution when the zero rate is 80% or higher. Based on the comparative results on both the intervention effect estimates and standard errors, the ZIBC method provides reasonable correction for the intervention effect from the conventional method in AD meta‐analysis across a wide range of zero inflation.
TABLE 1.
| Zero rate | Average SE, true | Average SE, ZIBC | Average SE, CV | APRD, ZIBC vs. true | APRD, CV vs. true |
|---|---|---|---|---|---|
| | | | | | |
| 0.2 | 0.029 | 0.029 | 0.028 | 0.003 | 0.047 |
| 0.3 | 0.032 | 0.031 | 0.032 | 0.021 | 0.005 |
| 0.4 | 0.035 | 0.034 | 0.038 | 0.016 | 0.090 |
| 0.5 | 0.039 | 0.038 | 0.046 | 0.011 | 0.196 |
| 0.6 | 0.044 | 0.044 | 0.057 | 0.007 | 0.283 |
| 0.7 | 0.053 | 0.053 | 0.071 | 0.001 | 0.341 |
| 0.8 | 0.068 | 0.093 | 0.091 | 0.363 | 0.332 |
| | | | | | |
| 0.2 | 0.029 | 0.030 | 0.028 | 0.007 | 0.052 |
| 0.3 | 0.033 | 0.032 | 0.032 | 0.025 | 0.024 |
| 0.4 | 0.036 | 0.035 | 0.038 | 0.019 | 0.061 |
| 0.5 | 0.040 | 0.039 | 0.046 | 0.014 | 0.164 |
| 0.6 | 0.045 | 0.045 | 0.057 | 0.012 | 0.254 |
| 0.7 | 0.054 | 0.056 | 0.071 | 0.031 | 0.312 |
| 0.8 | 0.070 | 0.081 | 0.090 | 0.163 | 0.300 |
| | | | | | |
| 0.2 | 0.030 | 0.031 | 0.028 | 0.022 | 0.052 |
| 0.3 | 0.034 | 0.033 | 0.032 | 0.029 | 0.045 |
| 0.4 | 0.037 | 0.036 | 0.038 | 0.024 | 0.033 |
| 0.5 | 0.041 | 0.040 | 0.046 | 0.020 | 0.130 |
| 0.6 | 0.046 | 0.046 | 0.057 | 0.015 | 0.220 |
| 0.7 | 0.056 | 0.057 | 0.070 | 0.022 | 0.267 |
| 0.8 | 0.071 | 0.082 | 0.090 | 0.154 | 0.268 |
Abbreviation: APRD, absolute percent relative difference.
We conducted an additional simulation study to further verify the relationship between and the relative difference of and inferred from Lemma 1. Note that the structural zero rates between intervention and control groups are controlled by , we, therefore, consider , which represent , , and , respectively, in the simulation. We also consider zero rates of 0.2, 0.4, 0.6, and 0.8, and . For each pair of and zero rate, under a sample size of 400, Table 2 presents the average , , , , , and under 1000 replications. Regardless of the actual zero rates (from 0.2 to 0.8), we observe, on average, when (for ), when (for ), and when (for ), which is consistent with Lemma 1. In addition, we observe that is biased when , whereas is close to in all settings, suggesting that the proposed ZIBC method can provide reasonable correction for the bias in a wide range of situations.
TABLE 2.
| | Zero rate | Average | Average | Average | Average | Average | Average |
|---|---|---|---|---|---|---|---|
| | 0.2 | | 0.119 | 0.155 | | | |
| | 0.4 | | 0.300 | 0.400 | | | |
| | 0.6 | | 0.505 | 0.617 | | | |
| | 0.8 | | 0.736 | 0.810 | | | |
| 0 | 0.2 | | 0.144 | 0.135 | | | |
| | 0.4 | 0.000 | 0.358 | 0.350 | | | |
| | 0.6 | | 0.568 | 0.562 | | | |
| | 0.8 | 0.003 | 0.778 | 0.775 | | | |
| 0.5 | 0.2 | 0.039 | 0.166 | 0.115 | | | |
| | 0.4 | 0.153 | 0.414 | 0.302 | | | |
| | 0.6 | 0.282 | 0.631 | 0.506 | | | |
| | 0.8 | 0.391 | 0.821 | 0.738 | | | |
4. REAL DATA ANALYSIS
4.1. Analysis 1: Project INTEGRATE
Project INTEGRATE is a large‐scale IPD meta‐analysis study examining the overall efficacy and comparative effectiveness of brief alcohol interventions for young adults. 22 A recent IPD meta‐analysis of 6713 participants from 17 randomized controlled trials examined the effect of intervention on the total number of drinks consumed in a typical week, a count variable with a high percentage of zeros. 14 Across all studies, an average of 30% of individuals reported zero drinking, with the highest proportion of zero drinking being 66% in one study.
In this section, we evaluate the performance of the ZIBC method in a real data application. We compared the meta‐analysis results between the true, conventional, and ZIBC methods using publicly available IPD from Project INTEGRATE. 24 As in the simulation study in Section 3, IPD were used to estimate parameters from the true and conventional methods. The ZIBC method was conducted using summary statistics from the conventional method (including standard errors for subsequent meta‐analysis), mimicking a real data analysis setting where study reports with only summary statistics are available. Intervention studies included in the current study (a) randomly allocated participants to an intervention or control group, (b) had a follow‐up within 6 months from baseline, and (c) had at least one zero outcome in a study. Ten of the 17 studies met the criteria (studies 2, 7 (7.1 and 7.2), 9, 11, 13/14, 15, 16, 18, and 21). For more details of the studies, please refer to References 8, 14, 22. The outcome was the average number of drinks on a typical drinking day in the most recent follow‐up assessment within 6 months, with a fixed assessment time for each study. We included the intervention group assignment as the only covariate.
The comparative results across the three methods are presented in a forest plot (Figure 3). Since the interventions aimed at reducing the number of alcohol drinks, a negative log incidence rate ratio represents a favorable intervention effect. For most studies, the conventional method produced biased estimates of intervention effect, compared with the true method. Specifically, the zero‐inflation bias was positive (ie, ) in studies 9 and 16, whereas the bias was negative (ie, ) in studies 2, 7.1, 7.2, 15, and 18. In studies 11, 13/14, and 21, the bias was negligible. Note that although study 11 had very strong zero‐inflation (; ), because was close to , the zero‐inflation bias was very minor. For studies 13/14 and 21, similarly, the zero‐inflation bias was small because was close to . This observation is in line with Lemma 1 such that the direction and magnitude of zero‐inflation bias depends on the relative difference in structural zero rates between two groups. We also note that the standard errors from the conventional method were identical up to the second or third decimal place to their counterparts from the true method in each individual study, so the width of the confidence intervals was nearly the same across all three methods. Since the ZIBC method adjusted the effect estimates to the correct level, the confidence intervals of the ZIBC and true methods were nearly the same.
The data example demonstrates that the ZIBC method corrects zero‐inflation bias regardless of the direction of the bias. In conclusion, the ZIBC method performed well in correcting zero‐inflation bias for study‐specific intervention effects, as well as for the overall pooled intervention effect, in a real data meta‐analysis setting.
4.2. Analysis 2: A dental caries prevention clinical trial
We illustrate the application of the ZIBC method using a randomized controlled trial in dental caries prevention. 11 The study was aimed at evaluating whether the bucco‐lingual technique could increase the effectiveness of a tooth brushing program on preventing dental caries (ie, cavities) among five‐year‐old children. This study was a two‐arm trial that randomized participants to either a conventional tooth brushing program (Control) or a modified tooth brushing program (Intervention). The outcome of interest was the number of enamel and dentin caries at 18‐month follow up, which exhibited considerable zero‐inflation, with rates up to 67%. The conventional Poisson regression model was used to evaluate the intervention effect in the original study. The analysis was stratified by gender due to baseline imbalance in covariates. Since a high proportion of participants did not develop any dental caries, the presence of zero‐inflation bias in the intervention effect estimates from the original study is reasonable to assume. In this example, although IPD were not available, all of the summary information needed to implement the ZIBC method could be extracted from the study report.
We apply the ZIBC method here to examine the potential zero‐inflation bias. First, we extracted the required information from the original study (see Table 3). Specifically, the uncorrected effect estimates and their standard errors were calculated from the incidence density ratios (IDR) and 95% confidence intervals in the original Table 3 of Reference 11, and the arm‐level outcome averages and proportions of zeros were obtained directly from the original Figure 2 using the software WebPlotDigitizer version 4.2. 25 We then estimated the arm‐level average structural zero rates for the control and intervention arms by solving Equation (12), which were 49% and 32%, respectively, for girls, and 27% and 45%, respectively, for boys. Finally, we obtained the corrected intervention effect estimates by plugging these values into Equation (10). Using the original standard errors, we obtained the modified P‐values based on the Wald test.
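The first extraction step, recovering a log‐scale effect and its standard error from a reported ratio and its 95% confidence interval, can be sketched as follows. The sketch assumes the reported interval is a Wald‐type interval that is symmetric on the log scale. The function name and the input values are hypothetical and are not taken from Reference 11.

```python
import math

def log_effect_from_ratio_ci(ratio, ci_lower, ci_upper, z=1.959964):
    """Return (log ratio, standard error), assuming a Wald-type 95% CI
    that is symmetric around the estimate on the log scale."""
    if not (0 < ci_lower <= ratio <= ci_upper):
        raise ValueError("expected 0 < ci_lower <= ratio <= ci_upper")
    estimate = math.log(ratio)
    # Width of the CI on the log scale equals 2 * z * SE.
    std_error = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return estimate, std_error

# Hypothetical report: ratio 2.0 with 95% CI (1.0, 4.0).
estimate, std_error = log_effect_from_ratio_ci(2.0, 1.0, 4.0)
print(f"log ratio = {estimate:.3f}, SE = {std_error:.3f}")
```

If a report uses a different interval type (for example, a profile likelihood or bootstrap interval), the recovered standard error is only an approximation.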
TABLE 3.
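Summary information | Data source | Girls | Boys
---|---|---|---
Uncorrected effect estimate (log IDR) | Table 3 of Reference 11 | 0.29 |
Standard error of the uncorrected estimate | Table 3 of Reference 11 | 0.28 | 0.30
Outcome average, control arm | Figure 2 of Reference 11 (with WebPlotDigitizer) | 0.83 | 1.04
Outcome average, intervention arm | Figure 2 of Reference 11 (with WebPlotDigitizer) | 1.06 | 0.49
Proportion of zeros, control arm | Figure 2 of Reference 11 (with WebPlotDigitizer) | 59% | 45%
Proportion of zeros, intervention arm | Figure 2 of Reference 11 (with WebPlotDigitizer) | 47% | 67%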
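To illustrate how an arm‐level outcome average and proportion of zeros carry information about the structural zero rate, the sketch below uses simple moment matching. Equation (12) is not reproduced in this section, so this is not an implementation of it. Instead, the sketch assumes a covariate‐free ZIP distribution in each arm and matches its mean, (1 − π)λ, and its zero probability, π + (1 − π)exp(−λ), to the two reported summaries. The function name is hypothetical. Applied to the Table 3 inputs, this simplified calculation gives structural zero rates of similar magnitude to those reported above, though not identical.

```python
import math

def zip_moment_match(mean, zero_prop):
    """Solve for (pi, lam) of a covariate-free ZIP distribution such that
    (1 - pi) * lam == mean and pi + (1 - pi) * exp(-lam) == zero_prop."""
    if mean <= 0 or not (0 < zero_prop < 1):
        raise ValueError("need mean > 0 and 0 < zero_prop < 1")
    if zero_prop <= math.exp(-mean):
        return 0.0, mean  # no more zeros than a plain Poisson implies

    def f(lam):
        # Substituting pi = 1 - mean / lam into the zero-probability equation.
        return 1.0 - mean * (1.0 - math.exp(-lam)) / lam - zero_prop

    lo, hi = mean, 2.0 * mean + 1.0   # f(lo) < 0 whenever zeros are in excess
    while f(hi) < 0:                  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):              # bisection; f is increasing in lam
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 1.0 - mean / lam, lam

# Arm-level inputs for girls from Table 3: (outcome average, proportion of zeros).
for arm, (avg, p0) in {"control": (0.83, 0.59), "intervention": (1.06, 0.47)}.items():
    pi, lam = zip_moment_match(avg, p0)
    print(f"girls, {arm}: pi = {pi:.2f}, lambda = {lam:.2f}")
```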
The original and ZIBC method‐corrected results are summarized in Table 4. According to the original analysis, girls receiving the modified tooth brushing program tended to develop more caries, with an IDR of 1.34, suggesting a potentially negative or harmful intervention effect. After applying the ZIBC method to adjust for zero‐inflation, the IDR was corrected to 1.01, suggesting a null intervention effect. Note that the intervention effect was not statistically significant either before or after the correction, so the statistical conclusion did not change after applying the ZIBC method. For boys, the original analysis reported a significant protective intervention effect with an IDR of 0.48 (P‐value = .02). After applying the ZIBC method, the intervention effect was attenuated to an IDR of 0.63 and was no longer statistically significant (P‐value = .13), suggesting that the original statistical conclusion may not be valid.
TABLE 4.
Group | Analysis | Estimate | IDR | P‐value
---|---|---|---|---
Girls | Original | 0.29 | 1.34 | 0.29
Girls | Corrected | 0.01 | 1.01 | 0.97
Boys | Original | | 0.48 | 0.02
Boys | Corrected | –0.46 | 0.63 | 0.13
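The final step, converting a corrected log‐scale estimate and the original standard error into an IDR and a Wald test P‐value, can be sketched as follows. The inputs are the bias‐corrected estimates and original standard errors quoted in this section (0.01 and 0.28 for girls; –0.46 and 0.30 for boys). The function name is hypothetical, and a two‐sided test against a standard normal reference distribution is assumed.

```python
import math

def wald_summary(estimate, std_error):
    """Return (IDR, z statistic, two-sided P-value) for a log-scale estimate,
    using a standard normal reference distribution."""
    z = estimate / std_error
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(estimate), z, p_value

for group, (est, se) in {"girls": (0.01, 0.28), "boys": (-0.46, 0.30)}.items():
    idr, z, p = wald_summary(est, se)
    print(f"{group}: IDR = {idr:.2f}, z = {z:.2f}, P = {p:.2f}")
```

Because the inputs are rounded to two decimals, results recomputed this way can differ in the last digit from values derived from unrounded quantities.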
In meta‐analysis, the statistical significance of an intervention effect in an individual study is less important than its magnitude and uncertainty, which can influence the overall pooled result. Therefore, adjusting biased effect sizes would improve precision in drawing statistical inference. When evaluating the effect of the tooth brushing program on dental caries, it would be better to utilize the bias‐corrected estimates, namely 0.01 for girls and –0.46 for boys, rather than the estimates from the original report. Standard errors can be taken from the original study (ie, 0.28 for girls; 0.30 for boys) because we found that standard errors from the original study can reasonably substitute for the unknown standard errors associated with the bias‐corrected intervention effect estimates (see Table 1 and Figure 3).
5. DISCUSSION AND CONCLUSION
In this article, we propose the ZIBC method to correct zero‐inflation bias that may arise in the intervention effect estimates of clinical trials with excessive zero outcome values in AD meta‐analysis. Specifically, this method aims to recover the intervention effect estimates from a conventional Poisson model as if they were appropriately estimated in a ZIP model. The ZIBC method works well when one can use the information of the “average” subject in the sample to approximate the study result, as we substitute IPD required in the estimating equations with their group‐level average values to relax the IPD requirement. The idea of substituting IPD with average values is in line with the Mean Value Theorem for Integrals and the EM algorithm. The statistical property of the ZIBC method is justified by Lemma 1, which is based on the assumption that the characteristics (or covariates) of “average” subjects in control and intervention groups are similar, which should hold in randomized controlled trials due to random assignment to groups. In other situations where the assumption is not met, such as case‐control or cross‐sectional studies, the ZIBC method should be used with caution. In addition, by imposing linear predictors in the true ZIP regression model (ie, Equation (2)), we implicitly assume no intervention by covariate interactions on the outcome, which should hold in most trials. We note that the intervention effect targeted by the ZIBC method is the mean difference between two groups and cannot be interpreted as a causal effect. 26 If one is interested in drawing causal inference, then issues of noncompliance 27 and assumptions of temporal stability, causal transience, and unit homogeneity 28 need to be taken into consideration.
The adjusted intervention effect estimates from the ZIBC method correspond to the Poisson portion in the ZIP model, characterizing the subpopulation that may or may not engage in the targeted behavior, which may be of greater interest in certain meta‐analyses. In contrast, the intervention effect estimates derived from the conventional Poisson model pertain to the entire population. However, the ZIBC method can be used to incorporate these studies into a meta‐analysis focusing on the subpopulation described above. In practice, we recommend that researchers check the population of interest before applying the ZIBC method when conducting meta‐analysis. We also acknowledge the ideal way for combining studies with presumably biased effects is to communicate with the original investigators and request IPD, so that meta‐analysts can re‐analyze raw data using statistical methods that are most suited to the research question. However, IPD may not be available due to data sharing restrictions and other resource limitations. The proposed ZIBC method can serve as a practical alternative to adjust for zero‐inflation bias in an AD meta‐analysis when obtaining original data is not feasible.
In data analysis, having a high proportion of zeros does not necessarily mean that zero‐inflation bias exists in the estimated intervention or treatment effect size. The ZIBC method should be considered only when a proportion of zeros in data exceeds the expected proportion given a Poisson parameter. For example, when the mean of a Poisson distribution is equal to 1, the expected zero rate is 36.8%. This high rate of zeros would be in line with the Poisson model when the average value of the outcome is low in quantity (eg, 1) and there would be no need for the ZIBC method even if the actual zero rate were as high as 40%. We recommend that the ZIBC method be used when the actual zero rate is much higher than the one expected when fitting a Poisson model. In another example, the mean number of drinks following an alcohol intervention would be usually much higher than 1 (eg, 3 drinks). For the Poisson distribution with a mean of 3, its corresponding expected zero rate is less than 5.0%. Therefore, an actual zero rate of 20% or higher would signal a need to account for zero‐inflation bias. Additionally, in the specific context of bias correction for an intervention effect size estimate, as illustrated by Lemma 1, zero‐inflation bias may not occur when the intervention and control groups have similar zero rates, even when zero rates in both groups are high (eg, study 11 in Project INTEGRATE). In situations where there is a difference in zero rates between groups, we recommend that the ZIBC method be used. Note that the Poisson model is nested within the ZIP model, so misspecifying the ZIP model when the Poisson model is accurate will not lead to biased estimates but will result in efficiency loss due to the estimation of additional parameters. The consequence of incorrectly specifying a ZIP model when data follow a Poisson distribution is relatively minor, while the opposite would lead to a biased estimate. 
Therefore, when the proportion of observed zeros is considerably higher than what was expected or when there is a difference in the proportions of zeros between groups, the ZIBC method can be considered.
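The screening rule described above, comparing the observed zero rate with the zero rate a Poisson distribution implies for the same mean, can be sketched as follows. The expected zero rate of a Poisson distribution with mean μ is exp(−μ). The function names are hypothetical, and the two scenarios mirror the examples in the text (a mean of 1 with 40% zeros, and a mean of 3 with 20% zeros).

```python
import math

def poisson_expected_zero_rate(mean):
    """P(Y = 0) for a Poisson distribution with the given mean."""
    if mean < 0:
        raise ValueError("mean must be non-negative")
    return math.exp(-mean)

def excess_zero_rate(mean, observed_zero_rate):
    """Observed zero rate minus the Poisson-implied zero rate."""
    return observed_zero_rate - poisson_expected_zero_rate(mean)

for mean, observed in [(1.0, 0.40), (3.0, 0.20)]:
    expected = poisson_expected_zero_rate(mean)
    excess = excess_zero_rate(mean, observed)
    print(f"mean={mean}: expected zeros={expected:.1%}, "
          f"observed={observed:.0%}, excess={excess:+.1%}")
```

How large an excess should trigger the correction is a judgment call that the text leaves open, so no threshold is built into the sketch. The comparison of zero rates between arms is a separate check.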
The ZIBC method adjusts the intervention effect for each of the studies separately and independently, which occurs before combining data for meta‐analysis. After correcting any zero‐inflation bias for each individual trial, modified intervention effects are then combined in AD meta‐analysis to obtain a more accurate overall result. Note that the ZIBC method only targets the mean intervention effect estimates, corresponding to a first‐order correction. It would be theoretically attractive to adjust standard errors for zero inflation bias as well, a second‐order correction. However, it is beyond the scope of the current study and can be investigated in future studies. For simplicity, we used the standard errors from the Poisson models when conducting AD meta‐analysis, which showed reasonable performance in our simulation study and real data examples.
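The pooling step that follows the study‐level corrections can be sketched with the DerSimonian‐Laird random‐effects estimator (Reference 20), which is one common choice for AD meta‐analysis. Whether it matches the exact pooling settings used in the analyses reported here is an assumption. The function name and the study‐level inputs are hypothetical: each estimate stands for a bias‐corrected log rate ratio, paired with the standard error from the original Poisson model.

```python
import math

def dersimonian_laird(estimates, std_errors):
    """Random-effects pooled estimate, its standard error, and tau^2
    using the DerSimonian-Laird moment estimator."""
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching inputs")
    variances = [se ** 2 for se in std_errors]
    weights = [1.0 / v for v in variances]            # fixed-effect weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                # between-study variance
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    pooled_se = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled_se, tau2

# Hypothetical bias-corrected estimates with original Poisson standard errors.
corrected = [-0.20, -0.05, -0.35, 0.10]
original_se = [0.10, 0.15, 0.20, 0.12]
pooled, pooled_se, tau2 = dersimonian_laird(corrected, original_se)
print(f"pooled = {pooled:.3f}, SE = {pooled_se:.3f}, tau^2 = {tau2:.4f}")
```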
The ZIBC method minimally requires summary information for its correction. In many situations, all the required data can be directly obtained from study reports (eg, the real data example in Section 4.2). It also requires the group‐level outcome zero rates, which sometimes may not be described in study reports but can be obtained through inquiries with original investigators, or an educated guess when prior information or expert knowledge is available. Note that the outcome average and zero rate are sufficient statistics for a ZIP distribution, so they are good substitutes for IPD when only AD are available.
The ZIBC method we describe can be extended in the future in several ways. First, although we illustrate the ZIBC method in the context of a two‐arm trial design, it can be applied to multi‐arm trials by sequentially comparing each intervention group with control and correcting the biased intervention effect per pair. Second, aside from the ZIBC method, alternative strategies may be investigated for their feasibility and validity when adjusting the estimating equations for zero‐inflation bias. One potential strategy is to generate pseudo IPD based on AD of the outcome and each covariate, and then solve the estimating equations using the pseudo data, which is similar to the idea of Approximate Bayesian Computation (see, eg, References 29, 30). Finally, the proposed method is designed to recover biased intervention effect estimates from the conventional Poisson model when the ZIP regression model should have been used; however, it can be extended to other statistical models with appropriate adjustments, such as a negative binomial regression model and a two‐sample t‐test, which can be thought of as a Wald test in a simple linear regression with intervention group membership as the lone covariate.
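As a rough illustration of the pseudo‐IPD idea, the sketch below draws pseudo outcomes for two arms from ZIP distributions and computes what a conventional Poisson analysis with group as the only covariate would return, namely the log ratio of the arm means. This is not the authors' procedure. The arm‐level parameters, sample sizes, seed, and function names are all hypothetical, and the Poisson sampler is a plain textbook method chosen to keep the example dependency‐free.

```python
import math
import random

def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def draw_zip(pi, lam, n, rng):
    """n pseudo outcomes: structural zero with probability pi, else Poisson(lam)."""
    return [0 if rng.random() < pi else draw_poisson(lam, rng) for _ in range(n)]

def log_ratio_of_means(control, intervention):
    """Poisson-regression estimate when group is the only covariate."""
    mean_c = sum(control) / len(control)
    mean_i = sum(intervention) / len(intervention)
    return math.log(mean_i / mean_c)

rng = random.Random(2021)
# Hypothetical arms: same count-part rate, different structural zero rates.
control = draw_zip(pi=0.50, lam=3.0, n=20000, rng=rng)
intervention = draw_zip(pi=0.30, lam=3.0, n=20000, rng=rng)
poisson_estimate = log_ratio_of_means(control, intervention)
print(f"count-part log rate ratio: 0.000, Poisson estimate: {poisson_estimate:.3f}")
```

With equal count‐part rates in both arms, any nonzero Poisson estimate in this setup reflects only the difference in structural zero rates.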
ACKNOWLEDGMENTS
This work was supported by the National Institute on Alcohol Abuse and Alcoholism (NIAAA) grants R01 AA019511, K02 AA028630 and the National Science Foundation (NSF) grants DMS1737857, 1812048, 2015373 and 2027855. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIAAA, the National Institutes of Health, or the NSF.
Zhou Z, Xie M, Huh D, Mun E‐Y. A bias correction method in meta‐analysis of randomized clinical trials with no adjustments for zero‐inflated outcomes. Statistics in Medicine. 2021;40(26):5894–5909. 10.1002/sim.9161
Funding information National Institute on Alcohol Abuse and Alcoholism, R01 AA019511; K02 AA028630; National Science Foundation, DMS1737857; DMS1812048; DMS2015373; DMS2027855
Contributor Information
Zhengyang Zhou, Email: zhengyang.zhou@unthsc.edu.
Eun‐Young Mun, Email: eun-young.mun@unthsc.edu.
DATA AVAILABILITY STATEMENT
The data from Project INTEGRATE used in this article to illustrate our findings are openly available in Mendeley Data at http://doi.org/10.17632/4dw4kn97fz.2. 24
REFERENCES
- 1. Schmid CH, Stijnen T, White I. Handbook of Meta‐Analysis. Boca Raton, FL: CRC Press; 2020.
- 2. Sutton AJ, Higgins JPT. Recent developments in meta‐analysis. Stat Med. 2008;27:625‐650.
- 3. Lyman GH, Kuderer NM. The strengths and limitations of meta‐analyses based on aggregate data. BMC Med Res Methodol. 2005;5:14.
- 4. Chen D‐G, Liu D, Min X, Zhang H. Relative efficiency of using summary versus individual data in random‐effects meta‐analysis. Biometrics. 2020.
- 5. Liu Y, Chen Y. Avenues for further research. Diagnostic Meta‐Analysis. New York, NY: Springer; 2018:305‐315.
- 6. Garcia HH, Pretell EJ, Gilman RH, et al. A trial of antiparasitic treatment to reduce the rate of seizures due to cerebral cysticercosis. N Engl J Med. 2004;350:249‐258.
- 7. Silcocks P, Whitham D, Whitehouse WP. P3MC: a double blind parallel group randomised placebo controlled trial of Propranolol and Pizotifen in preventing migraine in children. Trials. 2010;11:71.
- 8. Huh D, Mun E‐Y, Walters ST, Zhou Z, Atkins DC. A tutorial on individual participant data meta‐analysis using Bayesian multilevel modeling to estimate alcohol intervention effects across heterogeneous studies. Addict Behav. 2019;94:162‐170.
- 9. Horton NJ, Kim E, Saitz R. A cautionary note regarding count models of alcohol consumption in randomized controlled trials. BMC Med Res Methodol. 2007;7:1‐9.
- 10. Milgrom P, Ly KA, Tut OK, et al. Xylitol pediatric topical oral syrup to prevent dental caries: a double‐blind randomized clinical trial of efficacy. Arch Pediatr Adolesc Med. 2009;163:601‐607.
- 11. Frazão P. Effectiveness of the bucco‐lingual technique within a school‐based supervised toothbrushing program on preventing caries: a randomized controlled trial. BMC Oral Health. 2011;11:11.
- 12. Kelly JF, Kaminer Y, Kahler CW, et al. A pilot randomized clinical trial testing integrated 12‐Step facilitation (iTSF) treatment for adolescent substance use disorder. Addiction. 2017;112:2155‐2166.
- 13. Zeileis A, Kleiber C, Jackman S. Regression models for count data in R. J Stat Softw. 2008;27:1‐25.
- 14. Huh D, Mun E‐Y, Larimer ME, et al. Brief motivational interventions for college student drinking may not be as powerful as we think: an individual participant‐level data meta‐analysis. Alcohol Clin Exp Res. 2015;39:919‐931.
- 15. Lambert D. Zero‐inflated Poisson regression, with an application to defects in manufacturing. Technometrics. 1992;34:1‐14.
- 16. Long DL, Preisser JS, Herring AH, Golin CE. A marginalized zero‐inflated Poisson regression model with overall exposure effects. Stat Med. 2014;33:5151‐5165.
- 17. Murphy SW, Foley RN, Barrett BJ, et al. Comparative hospitalization of hemodialysis and peritoneal dialysis patients in Canada. Kidney Int. 2000;57:2557‐2563.
- 18. Serfling RJ. Approximation Theorems of Mathematical Statistics. Hoboken, NJ: John Wiley & Sons; 1980.
- 19. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc Ser B (Methodol). 1977;39:1‐22.
- 20. DerSimonian R, Laird N. Meta‐analysis in clinical trials. Control Clin Trials. 1986;7:177‐188.
- 21. DerSimonian R, Kacker R. Random‐effects model for meta‐analysis of clinical trials: an update. Contemp Clin Trials. 2007;28:105‐114.
- 22. Mun E‐Y, de la Torre J, Atkins DC, et al. Project INTEGRATE: an integrative study of brief alcohol interventions for college students. Psychol Addict Behav. 2015;29:34‐48.
- 23. Viechtbauer W. Conducting meta‐analyses in R with the metafor package. J Stat Softw. 2010;36:1‐48.
- 24. Huh D, Mun EY, Walters ST, Zhou Z, Atkins DC. Data and code for: a tutorial on individual participant data meta‐analysis using Bayesian multilevel modeling to estimate alcohol intervention effects across heterogeneous studies. Mendeley Data, V2; 2019.
- 25. Ankit R. WebPlotDigitizer version: 4.2; 2019. https://automeris.io/WebPlotDigitizer
- 26. Zheng C, Dai R, Gale RP, Zhang M‐J. Causal inference in randomized clinical trials. Bone Marrow Transplant. 2020;4‐8.
- 27. Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58:21‐29.
- 28. Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986;81:945‐960.
- 29. Marin J‐M, Pudlo P, Robert CP, Ryder RJ. Approximate Bayesian computational methods. Stat Comput. 2012;22:1167‐1180.
- 30. Beaumont MA, Zhang W, Balding DJ. Approximate Bayesian computation in population genetics. Genetics. 2002;162:2025‐2035.