Author manuscript; available in PMC: 2025 Dec 13.
Published in final edited form as: Stat Med. 2025 Sep;44(20-22):e70260. doi: 10.1002/sim.70260

Equivalency between the Generalized Bivariate Bernoulli Model Dependency Test and a Logistic Regression Model with Interaction Effects

Kazi Md Farhad Mahmud 1, Yanming Li 1, Devin C Koestler 1,*
PMCID: PMC12699481  NIHMSID: NIHMS2118154  PMID: 40916462

Abstract

Background:

Binary endpoints measured at two timepoints—such as pre- and post-treatment—are common in biomedical and healthcare research. The Generalized Bivariate Bernoulli Model (GBBM) provides a specialized framework for analyzing such bivariate binary data, allowing for formal tests of covariate-dependent associations conditional on baseline outcomes. Despite its potential utility, the GBBM remains underutilized due to the lack of direct implementation in standard statistical software. Moreover, we contend that the comparison made in the original publication between the GBBM dependency test and the regressive logistic regression model has shortcomings and does not provide an ideal basis for evaluating the model's performance.

Methods:

In this paper, we propose a standard logistic regression model with an interaction term and demonstrate that it yields an equivalent dependency test to the GBBM approach. This equivalence is established conceptually, theoretically, and empirically. Extensive simulations compared the power of the GBBM dependency test with: a) dependency test from regressive logistic model; b) test derived from the logistic regression model with interaction; and c) the Pearson Chi-square test. We also applied these methods to infant mortality data from the Bangladesh Demographic & Health Survey (BDHS).

Results:

The power of the GBBM dependency test differs from the regressive logistic regression model used as a benchmark in the original paper that introduced the GBBM methodology. In contrast, the power and type 1-error rate of the GBBM dependency test and the logistic regression model with interaction described herein are equivalent across varying effect sizes and sample sizes.

Conclusion:

Our work reveals that a widely available and flexible logistic regression model can serve as a practical alternative to the GBBM dependency test, enhancing accessibility for researchers. Moreover, this approach provides a foundation for extending dependency analyses to more complex longitudinal binary data structures, broadening its applicability in biomedical research.

Keywords: Longitudinal binary endpoints, generalized linear models, repeated measures

1. INTRODUCTION

Dependence in outcome variables frequently arises in various fields. This is especially true for studies involving repeated measures or observations on the same unit/subject over time, particularly when binary outcomes are measured at two time points, such as pre- and post-intervention. For example, in genomics, genetic association studies often assess disease status at baseline and follow-up to examine how genetic variants affect disease progression, highlighting the need to model dependencies between time points [1]. In behavioral health, smoking cessation studies evaluate smoking status before and after interventions to understand the impact of personal habits [2]. Clinical research frequently employs randomized pre-post designs to compare treatment efficacy, with disease status measured before and after treatment [3]. Similarly, in psychological research, assessments of stress levels pre- and post-intervention have been employed to evaluate mindfulness therapies [4]. Vaccine efficacy studies assess infection status before and after vaccination to determine how well the vaccine works [5]. Recognizing the types of studies that are prone to dependency between repeated measurements of the same endpoint(s) and accurately modeling these dependencies is crucial both for valid interpretation and for avoiding misleading conclusions.

Several marginal approaches have been employed to model repeated binary outcomes arising from longitudinal studies [6]. Noteworthy among these are Generalized Estimating Equations (GEE), introduced and developed by Liang and Zeger [7], Lipsitz et al. [8], and others [9–11]. Additional approaches include the use of dependence measures for binary data with logistic regression [12], the quadratic exponential form model [13,14], and the generalized multivariate logistic model [15]. Several studies, including those by Wakefield [16], have explored the limitations of marginal models, particularly in relation to Simpson’s paradox [17]. In contrast to marginal approaches, fewer studies have focused on conditional approaches. We briefly note that a conditional model describes the probability of an outcome given both the covariates and a previous outcome, capturing subject-specific dependence, whereas a marginal model describes the average relationship between the covariates and the outcome across the population, without conditioning on prior outcomes. Key contributions in this area include the development of Markov models for covariate dependence [18–20, 17]. However, Islam et al. [6] later argued that marginal or conditional models alone are insufficient to fully address the dependence in correlated outcome variables without incorporating a joint model. To address this issue, Islam et al. proposed a joint model called the Generalized Bivariate Bernoulli Model (GBBM). GBBM integrates both marginal and conditional probabilities of correlated binary events, allowing the joint function to be fully specified. Furthermore, Islam et al. introduced a test for association, called the “dependency test”, which allows one to examine whether the relationship between a covariate(s) and the odds of the event depends on the outcome of the event at the earlier time point. While the GBBM model exclusively addresses repeated measures of binary outcomes collected at only two time points, it is essential to acknowledge its broader applicability in various fields.

While working with the joint model proposed by Islam et al., we identified a significant gap in the availability of packages for established statistical software such as R, SAS, and Stata that support the analysis of data via the GBBM. Upon further investigation, we realized that the parameterization of GBBM and the dependency test introduced by Islam et al. can be estimated using a more straightforward and easily implementable marginal logistic regression model that includes interaction effects. The latter allows for parameter estimation and the dependency test to be carried out using existing packages in established statistical software. Additionally, we identified certain points of contention in the original paper regarding the authors’ comparison of their proposed dependency test with what they referred to as the regressive logistic model. We find it valuable to further explore and clarify these differences, particularly in relation to various types of dependency tests for binary outcomes.

The primary objective of this study is to provide an alternative to the joint model proposed by Islam et al. that can be readily implemented using existing software and packages. To achieve this objective, we provide both a theoretical/conceptual comparison of the proposed logistic regression model with interaction effects and the GBBM dependency test, as well as an empirical comparison of parameter estimates and test results between the two approaches. Our applied comparison involved extensive simulation studies as well as an analysis of infant mortality data across consecutive births from the same mother using the Bangladesh Demographic & Health Survey (BDHS) data [21]. Finally, we demonstrate that the comparison of the GBBM dependency test to the regressive logistic model reported in Islam et al. has shortcomings and, as a consequence, does not allow for a comprehensive comparison of the two approaches.

2. METHODS

2.1. Generalized Bivariate Bernoulli Model (GBBM)

To address within-subject dependence in repeated measures of binary outcomes, Islam et al. [6] proposed the Generalized Bivariate Bernoulli Model (GBBM). GBBM uses a marginal-conditional approach to construct a joint model. Specifically, the joint distribution of two binary outcomes $Y_1 \in \{0,1\}$ and $Y_2 \in \{0,1\}$ is modeled using the Bivariate Bernoulli distribution (Eq. 1), which effectively captures the joint behavior of the two binary variables:

$$P(Y_1=y_1, Y_2=y_2) = P_{00}^{(1-y_1)(1-y_2)}\, P_{01}^{(1-y_1)y_2}\, P_{10}^{y_1(1-y_2)}\, P_{11}^{y_1 y_2}$$ (Eq. 1)

where $P_{rs}$, $r=0,1$ and $s=0,1$, are the joint probabilities corresponding to different combinations of $Y_1$ and $Y_2$. The joint probability mass function can be represented in the exponential family form, as indicated in Eq. 2:

$$P(Y_1=y_1, Y_2=y_2) = \exp\left\{ y_1 \log\frac{P_{10}}{P_{00}} + y_2 \log\frac{P_{01}}{P_{00}} + y_1 y_2 \log\frac{P_{00}P_{11}}{P_{01}P_{10}} + \log P_{00} \right\}$$ (Eq. 2)
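As a brief check of the step from Eq. 1 to Eq. 2 (a sketch added for the reader, using only the symbols already defined), taking logarithms of Eq. 1 and collecting the terms in $y_1$, $y_2$, and $y_1y_2$ gives:

$$\begin{aligned}
\log P(Y_1=y_1,Y_2=y_2) &= (1-y_1)(1-y_2)\log P_{00} + (1-y_1)y_2\log P_{01} + y_1(1-y_2)\log P_{10} + y_1y_2\log P_{11} \\
&= \log P_{00} + y_1\log\frac{P_{10}}{P_{00}} + y_2\log\frac{P_{01}}{P_{00}} + y_1y_2\log\frac{P_{00}P_{11}}{P_{01}P_{10}}
\end{aligned}$$

Exponentiating both sides recovers Eq. 2.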

Following from Eq.2 and assuming n independent samples, the log-likelihood function for the Bivariate Bernoulli distribution is given by:

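$$l = \sum_{i=1}^{n}\left[ y_{1i}\log\frac{P_{10i}}{P_{00i}} + y_{2i}\log\frac{P_{01i}}{P_{00i}} + y_{1i}y_{2i}\log\frac{P_{00i}P_{11i}}{P_{01i}P_{10i}} + \log P_{00i} \right]$$ (Eq. 3)

where $i=1,2,\ldots,n$ and

$$\eta_0 = \log P_{00},\quad \eta_1 = \log\frac{P_{01}}{P_{00}},\quad \eta_2 = \log\frac{P_{10}}{P_{00}},\quad \eta_3 = \log\frac{P_{00}P_{11}}{P_{01}P_{10}}$$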

As defined above, η0, η1, η2 and η3 represent the link functions used to relate the probabilities to covariates. Following from the relationship between joint, conditional, and marginal probabilities, the joint probability of Y1 and Y2 given in Eq. 1 can be expressed as:

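$$P(Y_{1i}=y_{1i}, Y_{2i}=y_{2i}) = P(Y_{2i}=y_{2i} \mid Y_{1i}=y_{1i}) \cdot P(Y_{1i}=y_{1i})$$ (Eq. 4)

which allows one to model the dependence between $Y_1$ and $Y_2$ using conditional relationships. When covariates $X$ are introduced, Eq. 4 becomes:

$$P(Y_{1i}=y_{1i}, Y_{2i}=y_{2i} \mid X_i=x_i) = P(Y_{2i}=y_{2i} \mid Y_{1i}=y_{1i}; X_i=x_i) \cdot P(Y_{1i}=y_{1i} \mid X_i=x_i)$$ (Eq. 5)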

As per Islam et al., the conditional probabilities in Eq.5, are modeled assuming a logit link function, resulting in the following expressions:

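$$P(Y_{2i}=1 \mid Y_{1i}=0; X_i=x_i) = \frac{e^{\alpha_{01}+\beta_{01}x_i}}{1+e^{\alpha_{01}+\beta_{01}x_i}}$$ (Eq. 6)
$$P(Y_{2i}=1 \mid Y_{1i}=1; X_i=x_i) = \frac{e^{\alpha_{11}+\beta_{11}x_i}}{1+e^{\alpha_{11}+\beta_{11}x_i}}$$ (Eq. 7)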

where, β01 is the effect of covariate X on the probability that Y2=1 conditioning on Y1=0, β11 is the effect of covariate X on the probability that Y2=1 conditioned on Y1=1, and (α01, α11) represent the corresponding intercept parameters. Similarly, the marginal probability of Y1 can be written as a function of the covariates. Again, assuming a logit-link, we have:

$$P(Y_{1i}=1 \mid X_i=x_i) = \frac{e^{\alpha_{1}+\beta_{1}x_i}}{1+e^{\alpha_{1}+\beta_{1}x_i}}$$ (Eq. 8)

where, β1 is the effect of covariate X on the probability of Y1=1 and α1 is the intercept parameter. Given the above, the joint probabilities of Y1 and Y2 conditional on covariates X, can be written as follows:

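$$P_{01}(x_i) = P(Y_{2i}=1 \mid Y_{1i}=0; X_i=x_i) \cdot P(Y_{1i}=0 \mid X_i=x_i) = \frac{e^{\alpha_{01}+\beta_{01}x_i}}{1+e^{\alpha_{01}+\beta_{01}x_i}} \cdot \frac{1}{1+e^{\alpha_{1}+\beta_{1}x_i}}$$ (Eq. 9)
$$P_{11}(x_i) = P(Y_{2i}=1 \mid Y_{1i}=1; X_i=x_i) \cdot P(Y_{1i}=1 \mid X_i=x_i) = \frac{e^{\alpha_{11}+\beta_{11}x_i}}{1+e^{\alpha_{11}+\beta_{11}x_i}} \cdot \frac{e^{\alpha_{1}+\beta_{1}x_i}}{1+e^{\alpha_{1}+\beta_{1}x_i}}$$ (Eq. 10)
$$P_{00}(x_i) = P(Y_{2i}=0 \mid Y_{1i}=0; X_i=x_i) \cdot P(Y_{1i}=0 \mid X_i=x_i) = \frac{1}{1+e^{\alpha_{01}+\beta_{01}x_i}} \cdot \frac{1}{1+e^{\alpha_{1}+\beta_{1}x_i}}$$ (Eq. 11)
$$P_{10}(x_i) = P(Y_{2i}=0 \mid Y_{1i}=1; X_i=x_i) \cdot P(Y_{1i}=1 \mid X_i=x_i) = \frac{1}{1+e^{\alpha_{11}+\beta_{11}x_i}} \cdot \frac{e^{\alpha_{1}+\beta_{1}x_i}}{1+e^{\alpha_{1}+\beta_{1}x_i}}$$ (Eq. 12)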
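The four joint probabilities can be computed directly from the six model parameters. The following is a minimal Python sketch, not the authors' implementation: the function names `expit` and `joint_probs` are hypothetical, and the parameter values are illustrative (they match the simulation settings of Section 2.4 with an effect size of 0.5). It also evaluates the log cross-product ratio $\eta_3$ defined earlier.

```python
import math

def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def joint_probs(x, a01, b01, a11, b11, a1, b1):
    """Joint probabilities (P00, P01, P10, P11) of (Y1, Y2) given x, per Eq. 9-12."""
    p_y1 = expit(a1 + b1 * x)        # P(Y1=1 | x), Eq. 8
    q0 = expit(a01 + b01 * x)        # P(Y2=1 | Y1=0, x), Eq. 6
    q1 = expit(a11 + b11 * x)        # P(Y2=1 | Y1=1, x), Eq. 7
    p00 = (1 - q0) * (1 - p_y1)      # Eq. 11
    p01 = q0 * (1 - p_y1)            # Eq. 9
    p10 = (1 - q1) * p_y1            # Eq. 12
    p11 = q1 * p_y1                  # Eq. 10
    return p00, p01, p10, p11

# Illustrative parameter values only
p00, p01, p10, p11 = joint_probs(x=0.7, a01=0.5, b01=0.3, a11=0.5, b11=0.8, a1=0.5, b1=0.2)
total = p00 + p01 + p10 + p11                 # the four cells form a distribution
eta3 = math.log(p00 * p11 / (p01 * p10))      # log cross-product ratio
```

With these values the marginal parameters cancel out of `eta3`, which depends only on the differences between the two sets of conditional parameters.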

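As described in Islam et al., estimating equations for the parameters of the GBBM model are derived from the log-likelihood function (Eq. 3) using the above joint probabilities (Eq. 9–12). Because there is no closed-form analytic solution for the model parameters, Newton-Raphson is used to estimate the parameters $\boldsymbol{\beta}_{01}=(\alpha_{01},\beta_{01})^T$, $\boldsymbol{\beta}_{11}=(\alpha_{11},\beta_{11})^T$, and $\boldsymbol{\beta}_{1}=(\alpha_{1},\beta_{1})^T$. We draw the reader’s attention to the above expressions for the link functions, $\eta_0$, $\eta_1$, $\eta_2$ and $\eta_3$, and note that the dependency between $Y_1$ and $Y_2$ is captured by $\eta_3$. Substituting the expressions given for the joint probabilities (Eq. 9–12) and writing $\eta_3$ in terms of $\boldsymbol{\beta}_{01}$ and $\boldsymbol{\beta}_{11}$, we obtain the following expression:

$$\eta_3 = \ln\frac{P_{00}(x)P_{11}(x)}{P_{01}(x)P_{10}(x)} = \mathbf{x}\left(\boldsymbol{\beta}_{11}-\boldsymbol{\beta}_{01}\right)$$

where $\mathbf{x}=(1,x)$. When $\eta_3=0$ and assuming $\mathbf{x}\neq\mathbf{0}$, this implies that $\boldsymbol{\beta}_{11}-\boldsymbol{\beta}_{01}=\mathbf{0}$ or, equivalently, that $\boldsymbol{\beta}_{11}=\boldsymbol{\beta}_{01}$. That is, the relationship between $X$ and $Y_2$ is the same irrespective of the outcome $Y_1$. In other words, the relationship between $X$ and $Y_2$ is not dependent on $Y_1$ (i.e., no dependency). Islam et al. formally assess the dependency of $Y_2$ on $Y_1$ by developing a dependency test using a Wald test (Eq. 13) to test the null hypothesis $H_0:\boldsymbol{\beta}_{01}=\boldsymbol{\beta}_{11}$.

$$\chi^2 = \left(\hat{\boldsymbol{\beta}}_{01}-\hat{\boldsymbol{\beta}}_{11}\right)^T \left[\widehat{Var}\left(\hat{\boldsymbol{\beta}}_{01}-\hat{\boldsymbol{\beta}}_{11}\right)\right]^{-1} \left(\hat{\boldsymbol{\beta}}_{01}-\hat{\boldsymbol{\beta}}_{11}\right)$$ (Eq. 13)

Under the null hypothesis, the test statistic (Eq. 13) follows a central $\chi^2$ distribution with 2 degrees of freedom.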
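The expression for $\eta_3$ can be checked by direct substitution of the joint probabilities in Eq. 9 through Eq. 12 (a short verification added here; it uses only quantities defined above). The marginal factors for $Y_1$ cancel in the cross-product ratio, leaving the ratio of the two conditional odds:

$$\frac{P_{00}(x)P_{11}(x)}{P_{01}(x)P_{10}(x)} = \frac{\dfrac{1}{1+e^{\alpha_{01}+\beta_{01}x}}\cdot\dfrac{e^{\alpha_{11}+\beta_{11}x}}{1+e^{\alpha_{11}+\beta_{11}x}}}{\dfrac{e^{\alpha_{01}+\beta_{01}x}}{1+e^{\alpha_{01}+\beta_{01}x}}\cdot\dfrac{1}{1+e^{\alpha_{11}+\beta_{11}x}}} = \frac{e^{\alpha_{11}+\beta_{11}x}}{e^{\alpha_{01}+\beta_{01}x}}$$

$$\eta_3 = (\alpha_{11}-\alpha_{01}) + (\beta_{11}-\beta_{01})\,x$$

Hence $\eta_3$ vanishes for every value of $x$ only if both the intercept difference and the slope difference are zero, which is consistent with the 2 degrees of freedom of the test.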

2.2. Regressive Logistic Regression Model

Islam et al. compared the performance of the proposed test for dependence (Eq.13) with an alternative test deriving from a “regressive logistic model”. The regressive model represents the conditional part (not the entire full model), wherein the probability of Y2 is modeled conditional on previous outcome, Y1, and explanatory variable, X, as follows:

$$P(Y_{2i}=1 \mid Y_{1i}=y_{1i}; X_i=x_i) = \frac{e^{\alpha_{1}+\beta_{1}x_i+\gamma y_{1i}}}{1+e^{\alpha_{1}+\beta_{1}x_i+\gamma y_{1i}}}$$ (Eq. 14)

where α1, β1 and γ are parameters. It is important to note that while H0:γ=0 suggests no dependence between Y1 and Y2 based on the parameterization given in Eq. 14, the GBBM dependency test (Eq.13) captures the dependence in the covariate X and Y2 conditional on the outcome at time-point 1, Y1. Therefore, we argue that this comparison is not ideal as it fundamentally addresses a different question regarding dependency.

2.3. Logistic Regression Model with Interaction

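In the present paper, we compare the GBBM dependency test (Eq. 13) with a test resulting from a subtle change to the regressive logistic regression model described above (Eq. 14). In Eq. 14, the $\gamma$ parameter was treated as a scalar. However, by allowing $\gamma$ to be a functional parameter, $\gamma(x)=\gamma_0+\gamma_1x$, we derive a dependency test equivalent to the GBBM dependency test (Eq. 13). This functional parameter introduces an interaction term between $y_{1i}$ and covariate $x_i$ to the regressive logistic regression model. The addition of this interaction term allows one to distinguish how the effect of $x_i$ on $y_{2i}$ varies depending on the value of $y_{1i}$. This is reflected through the parameter $\gamma_1$ in the model below:

$$P(Y_{2i}=1 \mid Y_{1i}=y_{1i}, X_i=x_i) = \frac{e^{\alpha_{1}+\beta_{1}x_i+(\gamma_0+\gamma_1x_i)y_{1i}}}{1+e^{\alpha_{1}+\beta_{1}x_i+(\gamma_0+\gamma_1x_i)y_{1i}}}$$ (Eq. 15)

When $Y_{1i}=0$, the linear predictor is $\alpha_1+\beta_1x_i$, and when $Y_{1i}=1$, it is $\alpha_1+\gamma_0+(\beta_1+\gamma_1)x_i$. Comparing these results to Eq. 6 and Eq. 7, we observe that $\alpha_{01}\equiv\alpha_1$, $\beta_{01}\equiv\beta_1$, $\alpha_{11}\equiv\alpha_1+\gamma_0$, and $\beta_{11}\equiv\beta_1+\gamma_1$. Since the GBBM dependency test is based on the differences $(\alpha_{11}-\alpha_{01})$ and $(\beta_{11}-\beta_{01})$, and given that $\alpha_{11}-\alpha_{01}\equiv\gamma_0$ and $\beta_{11}-\beta_{01}\equiv\gamma_1$, the equivalent Wald test for testing $H_0:\boldsymbol{\beta}=\mathbf{0}$ from the logistic regression model is:

$$\chi^2 = \hat{\boldsymbol{\beta}}^T \left[\widehat{Var}\left(\hat{\boldsymbol{\beta}}\right)\right]^{-1} \hat{\boldsymbol{\beta}}$$ (Eq. 16)

where $\hat{\boldsymbol{\beta}}=(\hat{\gamma}_0,\hat{\gamma}_1)^T$ and

$$\widehat{Var}\left(\hat{\boldsymbol{\beta}}\right) = \begin{pmatrix} \widehat{Var}(\hat{\gamma}_0) & \widehat{Cov}(\hat{\gamma}_0,\hat{\gamma}_1) \\ \widehat{Cov}(\hat{\gamma}_0,\hat{\gamma}_1) & \widehat{Var}(\hat{\gamma}_1) \end{pmatrix}$$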

Under the null hypothesis, the aforementioned test statistic follows central χ2 with 2 degrees of freedom.
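For two tested parameters, the Wald statistic and its p-value have simple closed forms. The sketch below is illustrative only: the function name `wald_2df` is hypothetical, and the estimates and covariance matrix passed to it are made-up numbers rather than results from this study. It uses the closed-form inverse of a 2 × 2 matrix and the fact that the survival function of a $\chi^2$ distribution with 2 degrees of freedom is $e^{-\chi^2/2}$.

```python
import math

def wald_2df(est, cov):
    """Wald chi-square statistic and p-value for H0: (gamma0, gamma1) = (0, 0).

    est: (g0, g1); cov: ((v00, v01), (v01, v11)), the estimated covariance matrix.
    """
    g0, g1 = est
    (v00, v01), (_, v11) = cov
    det = v00 * v11 - v01 * v01
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form est^T cov^{-1} est, using the closed-form 2x2 inverse
    stat = (g0 * g0 * v11 - 2.0 * g0 * g1 * v01 + g1 * g1 * v00) / det
    # With 2 degrees of freedom the chi-square survival function is exp(-stat / 2)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value

# Hypothetical estimates and covariance matrix, for illustration only
stat, p_value = wald_2df((0.10, 0.60), ((0.010, 0.001), (0.001, 0.012)))
```

In practice the estimates and their covariance matrix would be taken from the fitted logistic regression model in whatever software is used.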

Further, we note that the likelihood of the joint GBBM model is proportional to that of the conditional logistic regression model with interaction. Specifically, by expressing the joint distribution P(Y1,Y2) in exponential family form and conditioning on Y1, the resulting conditional distribution of Y2|Y1 follows a logistic form consistent with Eq. 15. This proportionality implies that the two models share the same score functions and Fisher Information and thus yield equivalent asymptotic variances for corresponding parameters.
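One way to see this (a sketch under the model of Section 2.1, added for clarity) is that, by Eq. 5, the GBBM log-likelihood separates into a conditional part and a marginal part that share no parameters:

$$l = \sum_{i=1}^{n}\log P(Y_{2i}=y_{2i}\mid Y_{1i}=y_{1i};X_i=x_i) \;+\; \sum_{i=1}^{n}\log P(Y_{1i}=y_{1i}\mid X_i=x_i)$$

The first sum involves only $(\alpha_{01},\beta_{01},\alpha_{11},\beta_{11})$ and coincides with the log-likelihood of Eq. 15 once the intercept and slope of Eq. 15 are identified with $\alpha_{01}$ and $\beta_{01}$, and $\gamma_0$ and $\gamma_1$ with $\alpha_{11}-\alpha_{01}$ and $\beta_{11}-\beta_{01}$. The second sum involves only the marginal parameters of Eq. 8. Maximizing over the conditional parameters therefore gives the same estimates, and the same information matrix for them, whether or not the marginal part is included.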

The logistic regression model with interaction can be extended to include multiple covariates. For example, if we consider two covariates in the analysis, the model can be expressed as:

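$$\text{logit}\left[P(Y_{2i}=1 \mid Y_{1i}=y_{1i}; X_{1i}=x_{1i}; X_{2i}=x_{2i})\right] = \alpha_1+\beta_1x_{1i}+\beta_2x_{2i}+(\gamma_0+\gamma_1x_{1i}+\gamma_2x_{2i})\,y_{1i}$$

In this model, $\alpha_1$, $\beta_1$, $\beta_2$, $\gamma_0$, $\gamma_1$ and $\gamma_2$ are the model parameters. To test the null hypothesis $H_0:\boldsymbol{\beta}=\mathbf{0}$, the Wald test statistic is $\chi^2=\hat{\boldsymbol{\beta}}^T\left[\widehat{Var}(\hat{\boldsymbol{\beta}})\right]^{-1}\hat{\boldsymbol{\beta}}$, where

$$\hat{\boldsymbol{\beta}}=(\hat{\gamma}_0,\hat{\gamma}_1,\hat{\gamma}_2)^T \quad\text{and}\quad \widehat{Var}\left(\hat{\boldsymbol{\beta}}\right) = \begin{pmatrix} \widehat{Var}(\hat{\gamma}_0) & \widehat{Cov}(\hat{\gamma}_0,\hat{\gamma}_1) & \widehat{Cov}(\hat{\gamma}_0,\hat{\gamma}_2) \\ \widehat{Cov}(\hat{\gamma}_0,\hat{\gamma}_1) & \widehat{Var}(\hat{\gamma}_1) & \widehat{Cov}(\hat{\gamma}_1,\hat{\gamma}_2) \\ \widehat{Cov}(\hat{\gamma}_0,\hat{\gamma}_2) & \widehat{Cov}(\hat{\gamma}_1,\hat{\gamma}_2) & \widehat{Var}(\hat{\gamma}_2) \end{pmatrix}$$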

Under the null hypothesis, the test statistic follows a central chi-squared distribution with 3 degrees of freedom. In a similar manner, this approach can be extended to accommodate P covariates in the model.
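To make the extension to several covariates concrete, the sketch below builds one row of the design matrix for the interaction model. The helper name `design_row` is hypothetical. With $P$ covariates the row has $2(P+1)$ columns, and the last $P+1$ of them (the previous outcome and its products with each covariate) correspond to the parameters tested jointly, which is why the two-covariate case above has 3 degrees of freedom.

```python
def design_row(x, y1):
    """Design-matrix row for the interaction model with covariates x = (x_1, ..., x_P).

    Columns: intercept, x_1..x_P (main effects), then y1, y1*x_1..y1*x_P.
    The last P + 1 columns carry (gamma_0, ..., gamma_P), the parameters
    tested jointly by the dependency test.
    """
    x = list(x)
    return [1.0] + x + [float(y1)] + [y1 * xj for xj in x]

row = design_row([0.4, -1.2], y1=1)   # two covariates, previous outcome equal to 1
n_tested = len(row) // 2              # P + 1 parameters under the null hypothesis
```

When the previous outcome is 0, the last $P+1$ columns are all zero, so only the main-effect parameters enter the linear predictor.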

2.4. Simulation Studies

The objective of our simulation studies was to compare the power of the GBBM dependency test (Eq. 13) with: a) the Pearson Chi-square test of dependency; b) a dependency test based on the regressive logistic regression model (Eq. 14); and c) a test derived from the logistic regression model with interaction (Eq. 15). Simulations were conducted assuming different study sample sizes and effect sizes; for each setting of the simulation parameters, we used a total of 1000 Monte Carlo iterations. The data generation process involved several steps. First, a covariate $x_i$ was drawn from a standard normal distribution, $x_i\sim N(0,1)$. Next, using predetermined parameters of the GBBM models, we generated conditional (Eq. 6–7) and marginal (Eq. 8) probabilities as defined in Section 2.1. The conditional and marginal probabilities were then multiplied to calculate joint probabilities (Eq. 9–12). Using the joint probabilities, we generated the response variable $Y$ from a multinomial distribution, $Y\sim\text{Multinomial}(1; P_{00},P_{10},P_{01},P_{11})$, where $P_{rs}$, $r=0,1$ and $s=0,1$, represent the joint probabilities of $Y_1=r$ and $Y_2=s$. Finally, values of $Y_1$ and $Y_2$ were determined based on the value generated from the multinomial distribution.
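The data generation steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the reported simulations: the function name `simulate_gbbm`, the seed, and the use of the standard library `random` module are all choices made here for the example.

```python
import math
import random

def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def simulate_gbbm(n, a01, b01, a11, b11, a1, b1, seed=None):
    """Draw n triples (x, y1, y2) following the steps described above."""
    rng = random.Random(seed)
    cells = [(0, 0), (1, 0), (0, 1), (1, 1)]   # (y1, y2) in the order P00, P10, P01, P11
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                # x_i ~ N(0, 1)
        p1 = expit(a1 + b1 * x)                # P(Y1=1 | x), Eq. 8
        q0 = expit(a01 + b01 * x)              # P(Y2=1 | Y1=0, x), Eq. 6
        q1 = expit(a11 + b11 * x)              # P(Y2=1 | Y1=1, x), Eq. 7
        weights = [(1 - q0) * (1 - p1),        # P00
                   (1 - q1) * p1,              # P10
                   q0 * (1 - p1),              # P01
                   q1 * p1]                    # P11
        # One multinomial draw of size 1 selects the (y1, y2) cell
        y1, y2 = rng.choices(cells, weights=weights, k=1)[0]
        data.append((x, y1, y2))
    return data

# Example: n = 500 with an effect size of 0.5 (b11 = b01 + 0.5)
sample = simulate_gbbm(500, 0.5, 0.3, 0.5, 0.8, 0.5, 0.2, seed=1)
```

Fixing the seed makes a single dataset reproducible; a full study would repeat the draw once per Monte Carlo iteration.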

Each simulation involved data generation based on the following fixed parameters: $\alpha_{01}=0.5$, $\beta_{01}=0.3$, $\alpha_{11}=0.5$, $\beta_{11}=\beta_{01}+\text{effect size}$, $\alpha_1=0.5$, and $\beta_1=0.2$. We considered the following sample sizes ($n$) and effect sizes ($\beta_{11}-\beta_{01}$): $n$ = 500, 1000, 1500, 2000 and effect sizes of 0.2, 0.4, and 0.5. Each test was conducted using a predetermined Type I error rate of $\alpha=0.05$. Power was calculated as the proportion of Monte Carlo simulations in which the null hypothesis was correctly rejected (p-value < 0.05).
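Because power (and, under the null hypothesis, the type I error rate) is estimated as a proportion over Monte Carlo iterations, it carries binomial sampling error. A minimal sketch follows; the function name `rejection_rate` and the toy p-values are hypothetical.

```python
import math

def rejection_rate(p_values, alpha=0.05):
    """Proportion of Monte Carlo replicates with p < alpha, and its binomial standard error."""
    b = len(p_values)
    rate = sum(p < alpha for p in p_values) / b
    se = math.sqrt(rate * (1.0 - rate) / b)
    return rate, se

# Toy p-values for illustration; a real run would have 1000 of them per setting
rate, se = rejection_rate([0.001, 0.20, 0.03, 0.60, 0.049, 0.051, 0.01, 0.90])
```

As a point of reference, with 1000 iterations and a true rejection rate of 0.05 the standard error is about 0.007, so estimated rates within roughly 0.014 of the nominal level are compatible with Monte Carlo noise.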

  • One of the objectives of this simulation study was to compare the dependency test of the GBBM model with Pearson’s Chi-square test of dependency. For each dataset generated, we performed two tests: the GBBM test of dependency (Eq.13) and a Pearson’s Chi-Square test using the 2 × 2 contingency table of Y1 and Y2.

  • Another comparison focused on evaluating the GBBM dependency test against the regressive logistic regression model (Eq.14) used by Islam et al. to benchmark the performance of the GBBM dependency test. The data generation process followed the same approach as in the previous case. For the regressive model, the null hypothesis H0:γ=0 was tested, and the results were compared with those of the GBBM dependency test, which evaluates the null hypothesis H0:β01=β11.

  • Additionally, the performance of the GBBM dependency test was examined in relation to the dependency test derived from the logistic regression model with interaction (Eq.15). This comparison was intended to showcase the performance of these two dependency tests across varying sample sizes and effect sizes. For each generated dataset, we conducted the dependency test using the GBBM method (Eq.13), a dependency test based on the logistic regression model with interaction (Eq.16), and a likelihood ratio test based on the logistic regression model with an interaction.
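The comparison between the GBBM dependency test and the Wald test from the logistic regression model with interaction can be illustrated numerically. The sketch below is a simplified stand-in rather than the authors' code, and all helper names are hypothetical. It assumes that the conditional part of the GBBM can be fitted as two separate logistic regressions (one within $Y_1=0$, one within $Y_1=1$) and that the two sets of estimates are independent, so that the variance of their difference is the sum of the two covariance matrices. Data are drawn sequentially ($Y_1$ first, then $Y_2$ given $Y_1$), which yields the same joint distribution as the multinomial draw.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def inverse(a):
    n = len(a)
    cols = [solve(a, [float(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def score_info(rows, y, beta):
    """Score vector and Fisher information of a logistic regression at beta."""
    k = len(beta)
    score, info = [0.0] * k, [[0.0] * k for _ in range(k)]
    for z, yi in zip(rows, y):
        p = expit(sum(b * v for b, v in zip(beta, z)))
        for j in range(k):
            score[j] += (yi - p) * z[j]
            for l in range(k):
                info[j][l] += p * (1.0 - p) * z[j] * z[l]
    return score, info

def fit_logistic(rows, y, iters=25):
    """Newton-Raphson maximum likelihood; returns (estimates, information matrix)."""
    beta = [0.0] * len(rows[0])
    for _ in range(iters):
        score, info = score_info(rows, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta, score_info(rows, y, beta)[1]

def quad_form(d, v):
    """Wald quadratic form d^T v^{-1} d."""
    return sum(di * wi for di, wi in zip(d, solve(v, d)))

# Simulate one dataset (parameter values of Section 2.4, effect size 0.5; seed is arbitrary)
rng = random.Random(2025)
a01, b01, a11, b11, a1, b1 = 0.5, 0.3, 0.5, 0.8, 0.5, 0.2
xs, y1s, y2s = [], [], []
for _ in range(500):
    x = rng.gauss(0.0, 1.0)
    y1 = int(rng.random() < expit(a1 + b1 * x))
    y2 = int(rng.random() < expit((a11 + b11 * x) if y1 else (a01 + b01 * x)))
    xs.append(x); y1s.append(y1); y2s.append(y2)

# (a) Logistic regression with interaction (Eq. 15): columns 1, x, y1, x*y1
beta, info = fit_logistic([[1.0, x, y1, x * y1] for x, y1 in zip(xs, y1s)], y2s)
cov = inverse(info)
gamma = beta[2:]
wald_interaction = quad_form(gamma, [r[2:] for r in cov[2:]])

# (b) Conditional part of the GBBM: separate fits within Y1 = 0 and Y1 = 1
fits = []
for level in (0, 1):
    idx = [i for i, v in enumerate(y1s) if v == level]
    fits.append(fit_logistic([[1.0, xs[i]] for i in idx], [y2s[i] for i in idx]))
(est0, info0), (est1, info1) = fits
diff = [u - v for u, v in zip(est1, est0)]
var_diff = [[p + q for p, q in zip(r0, r1)] for r0, r1 in zip(inverse(info0), inverse(info1))]
wald_gbbm = quad_form(diff, var_diff)
```

Under these assumptions the two statistics agree up to floating-point error, because the interaction model is a linear reparameterization of the two stratum-specific models.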

We also calculated the type I error rates for all competing models to assess their ability to maintain the nominal significance level under the null hypothesis. For this, we generated data using the following fixed parameters: α01=α11=0.5, β01=β11=0.3, α1=0.5, and β1=0.2. As α01=α11 & β01=β11 in this simulation, the effect of covariate X on Y2 does not depend on Y1.

2.5. Analyzing the Dependence of Neonatal Mortality Status Across Consecutive Births

To compare GBBM and the logistic regression model with an interaction term in a real-world context, we utilized data collected from the 2014 Bangladesh Demographic and Health Survey (BDHS). This survey selected 18,000 residential households and interviewed 17,862 ever-married women, comprising 6,167 urban and 11,696 rural residents. For the purposes of the present study, we focused on a subset of 11,951 women who had their first two children between 1991 and 2014. Our analysis examined whether the relationship between the neonatal mortality status of the second child and the mother's education level depends on the neonatal mortality status of the first child. The binary outcome of interest was neonatal survival (0 = alive, 1 = deceased) for both the first and second births. To maintain comparability with our simulation studies, we considered a single explanatory factor: the mother's education level, categorized as secondary education or beyond or less than secondary education (treated as the referent group). Along with the GBBM and the logistic regression model with interaction, we also performed dependency tests using the regressive logistic model and the Pearson Chi-square test to evaluate and compare the performance of these methods in real data.

3. RESULTS

3.1. Comparison of the Operating Characteristics of the Dependency Tests in Simulated Data

Table 1 presents the results comparing the power of the different dependency tests. As expected, statistical power of the GBBM dependency test increases as both the sample size and effect size increase. Conversely, for the Pearson Chi-square test there was a much less pronounced relationship between power and effect size, demonstrating that a distinct pattern of dependency between $Y_1$ and $Y_2$ via the covariate $X$ is not captured by the Pearson Chi-square test. For example, at a sample size of 1000 and an effect size of 0.5, the power was 0.867 for GBBM and 0.055 for Pearson’s $\chi^2$ test.
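For context, the Pearson test uses only the 2 × 2 table of $Y_1$ by $Y_2$ and never sees the covariate, which is one way to understand why it cannot pick up dependency that operates through $X$. A minimal sketch of the statistic follows; the function name `pearson_2x2` and the counts are hypothetical, and no continuity correction is applied, which is an assumption since the text does not say whether one was used.

```python
import math

def pearson_2x2(n00, n01, n10, n11):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    n = n00 + n01 + n10 + n11
    r0, r1 = n00 + n01, n10 + n11      # row totals (Y1 = 0, Y1 = 1)
    c0, c1 = n00 + n10, n01 + n11      # column totals (Y2 = 0, Y2 = 1)
    stat = n * (n00 * n11 - n01 * n10) ** 2 / (r0 * r1 * c0 * c1)
    p_value = math.erfc(math.sqrt(stat / 2.0))   # chi-square survival function, 1 df
    return stat, p_value

# Hypothetical counts: rows index Y1 (0, 1), columns index Y2 (0, 1)
stat, p_value = pearson_2x2(40, 60, 35, 65)
```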

Table 1:

Statistical power across varied sample size & effect size for the GBBM, Logistic Regression Model with Interaction, Regressive Logistic Model & Pearson’s χ2 Test.

| Sample size | Effect size | GBBM | Logistic Regression Model with interaction (Wald’s) | Logistic Regression Model with interaction (LRT) | Regressive Logistic Model | Pearson’s χ2 Test |
|---|---|---|---|---|---|---|
| 500 | 0.2 | 0.138 | 0.138 | 0.141 | 0.046 | 0.053 |
| 500 | 0.4 | 0.423 | 0.423 | 0.427 | 0.063 | 0.049 |
| 500 | 0.5 | 0.590 | 0.590 | 0.591 | 0.074 | 0.05 |
| 1000 | 0.2 | 0.231 | 0.231 | 0.233 | 0.061 | 0.065 |
| 1000 | 0.4 | 0.708 | 0.708 | 0.708 | 0.059 | 0.046 |
| 1000 | 0.5 | 0.867 | 0.867 | 0.866 | 0.090 | 0.055 |
| 1500 | 0.2 | 0.338 | 0.338 | 0.337 | 0.064 | 0.070 |
| 1500 | 0.4 | 0.874 | 0.874 | 0.873 | 0.090 | 0.062 |
| 1500 | 0.5 | 0.972 | 0.972 | 0.972 | 0.094 | 0.056 |
| 2000 | 0.2 | 0.418 | 0.418 | 0.416 | 0.056 | 0.087 |
| 2000 | 0.4 | 0.941 | 0.941 | 0.941 | 0.103 | 0.071 |
| 2000 | 0.5 | 0.992 | 0.992 | 0.992 | 0.117 | 0.069 |

As a reminder, another objective was to compare the GBBM dependency test with the regressive logistic regression model, which was used as a performance benchmark by Islam et al. We observed distinct differences between the power of the GBBM dependency test and the dependency test based on the regressive logistic regression model (Table 1). The GBBM model demonstrated a clear increase in power with both increasing sample size and effect size, aligning with expected theoretical behavior. In contrast, the regressive logistic regression model did not exhibit a consistent pattern, indicating that the underlying dependencies captured by GBBM are not fully captured by the regressive logistic regression approach. For example, at a sample size of 500 and an effect size of 0.5, power was 0.590 for the GBBM dependency test and 0.074 for the regressive logistic regression model. Furthermore, the power of the regressive logistic regression model was comparable to that of the Pearson Chi-square test, which is unsurprising given that the latter is focused on the marginal dependency of $Y_1$ and $Y_2$. This suggests that although the regressive logistic regression model shows power similar to that of the Pearson Chi-square test, it fails to capture the more detailed and complex dependency patterns that the GBBM identifies.

Table 1 also presents the power comparison results between the GBBM dependency test and the logistic regression model with an interaction, using both a Wald and LRT. The results show that the GBBM model and the logistic regression model with interaction with Wald’s test produced identical estimates of power across all effect sizes and sample sizes, indicating that both tests perform equivalently in terms of power. In contrast, the power values obtained from the LRT showed slight differences compared to those of the GBBM dependency test, although these differences were minimal and within a comparable range. For instance, at a sample size of 500 and an effect size of 0.5, the power was 0.590 for both the GBBM dependency test and Wald’s test, and 0.591 for the LRT. These minor variations suggest that while the LRT yields power values very similar to those of the GBBM model, slight discrepancies exist due to the distinct statistical framework of the LRT and Wald tests.

Table 2 shows the type I error rate across all approaches. We observe that for the GBBM dependency test, the type I error rate is controlled at approximately 5% as expected given our predetermined significance threshold. The type I error rate is also controlled at approximately 5% for regressive logistic regression model and logistic regression with interaction effects, regardless of whether the Wald or likelihood ratio test is used. In contrast, for Pearson Chi-square test we observe a slight increase in the type 1 error rate as a function of increasing study sample size.

Table 2:

Type I error rates for the GBBM, Logistic Regression Model with Interaction, Regressive Logistic Model & Pearson’s χ2 Test.

| Model / Sample size | 500 | 1000 | 1500 | 2000 |
|---|---|---|---|---|
| GBBM | 0.049 | 0.055 | 0.043 | 0.053 |
| Logistic Regression Model with interaction (Wald) | 0.049 | 0.055 | 0.043 | 0.053 |
| Logistic Regression Model with interaction (LRT) | 0.053 | 0.051 | 0.045 | 0.050 |
| Regressive Logistic Model | 0.042 | 0.047 | 0.041 | 0.052 |
| Pearson’s χ2 Test | 0.052 | 0.071 | 0.084 | 0.086 |

3.2. Comparison of Parameter Estimates Between GBBM and Logistic Regression Model with Interaction

Here, we compare the parameter estimates and their variances between the GBBM and the logistic regression model with interaction based on a sample size of 2000. As noted in Table 3, for GBBM the estimated value of $\alpha_{01}$ was 0.445, with an estimated variance of 0.0058. The estimated value of $\beta_{01}$ was 0.212, with an estimated variance of 0.0060. Parameter $\alpha_{11}$ was estimated to be 0.518, with an estimated variance of 0.0039, while $\beta_{11}$ was estimated to be 0.795, with an estimated variance of 0.0048. The logistic regression model with interaction provides estimates of $\alpha_1$ and $\beta_1$ that are identical to those of the GBBM for the parameters $\alpha_{01}$ and $\beta_{01}$ (Table 3). The logistic regression model with interaction also captures the differences between parameters from the GBBM. The difference between $\hat{\alpha}_{11}$ and $\hat{\alpha}_{01}$ was 0.073, with a combined variance of 0.0096, which is identical to the estimated value and variance of $\hat{\gamma}_0$ in the logistic regression model with interaction. Similarly, the difference between $\hat{\beta}_{11}$ and $\hat{\beta}_{01}$ was 0.583, with a combined variance of 0.0108, which is identical to the estimated value and variance of $\hat{\gamma}_1$ in the logistic regression model with interaction. We direct readers to the Appendix for a proof that $Var(\hat{\gamma}_0)=Var(\hat{\alpha}_{01}-\hat{\alpha}_{11})$. Finally, the $\chi^2$ statistic for the dependency test is the same for the GBBM and the logistic regression model with interaction ($\chi^2=31.523$ for both).
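As a worked check using the four-decimal values reported in Table 3, and assuming (as the term "combined variance" suggests) that the estimates obtained conditional on $Y_1=0$ and on $Y_1=1$ are uncorrelated so that their variances add:

$$\hat{\gamma}_0 = \hat{\alpha}_{11}-\hat{\alpha}_{01} = 0.5183-0.4454 = 0.0729, \qquad \hat{\gamma}_1 = \hat{\beta}_{11}-\hat{\beta}_{01} = 0.7946-0.2116 = 0.5830$$

$$\widehat{Var}(\hat{\gamma}_1) = \widehat{Var}(\hat{\beta}_{11})+\widehat{Var}(\hat{\beta}_{01}) = 0.0048+0.0060 = 0.0108$$

The same sum for the intercepts, $0.0039+0.0058=0.0097$, agrees with the reported 0.0096 up to rounding of the displayed variances.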

Table 3:

Comparison of parameters estimates from GBBM and Logistic Regression Model with Interaction.

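| Model | Parameter | Estimate | Variance |
|---|---|---|---|
| Generalized Bivariate Bernoulli Model (GBBM) | $\alpha_{01}$ | 0.4454 | 0.0058 |
| | $\beta_{01}$ | 0.2116 | 0.0060 |
| | $\alpha_{11}$ | 0.5183 | 0.0039 |
| | $\beta_{11}$ | 0.7946 | 0.0048 |
| | $\chi^2$ | 31.523 | |
| Logistic Regression Model with Interaction | $\alpha_1$ | 0.4454 | 0.0058 |
| | $\beta_1$ | 0.2116 | 0.0060 |
| | $\gamma_0$ | 0.0729 | 0.0096 |
| | $\alpha_{11}=\gamma_0+\alpha_{01}$ | 0.5183 | 0.0039 |
| | $\gamma_1$ | 0.5830 | 0.0108 |
| | $\beta_{11}=\gamma_1+\beta_{01}$ | 0.7946 | 0.0048 |
| | $\chi^2$ | 31.523 | |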

3.3. Analyzing the Dependence of Neonatal Mortality Status Across Consecutive Births

Table 4 presents the frequency distribution of neonatal survival (Death or Alive) by birth order (first and second birth) and maternal education levels. In this study, we included 11,951 mothers who had their first two children between 1991 and 2014. The neonatal mortality rate for first births is 8% (960 out of 11,951), which is higher than the 4.6% (555 out of 11,951) observed for second births. Additionally, among these mothers, 34% (4,068) had attained at least a secondary level of education (≥6 years), while 66% (7,883) had less than six years of education.

Table 4:

Frequency distribution of death status for first and second births and maternal educational attainment in the 2014 Bangladesh Demographic and Health Survey data set.

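| | First birth: Alive | First birth: Dead | Second birth: Alive | Second birth: Dead | Mother’s education ≥ secondary (6 years): Yes | Mother’s education ≥ secondary (6 years): No |
|---|---|---|---|---|---|---|
| # | 10991 | 960 | 11396 | 555 | 4068 | 7883 |
| % | 92.00 | 8.00 | 95.40 | 4.60 | 34.00 | 66.00 |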

Table 5 presents a comparative analysis of parameter estimates between the Generalized Bivariate Bernoulli Model (GBBM), the logistic regression model with an interaction term, the regressive logistic regression model, and the Pearson Chi-square test. The focus is on assessing the neonatal mortality of the second birth, conditional on the neonatal survival status of the first birth. The GBBM provides the following interpretations: $\alpha_{01}$ is the intercept term for the mortality of the second child ($Y_2=1$) when the first child survived ($Y_1=0$), and $\alpha_{11}$ is the intercept term when the first child died ($Y_1=1$). The coefficient $\hat{\beta}_{01}=-0.7468$ indicates that if the first child survived ($Y_1=0$), women with education at the secondary level or higher have approximately 52.61% lower odds of experiencing neonatal death for their second child compared to women with less educational attainment. Similarly, $\hat{\beta}_{11}=-0.6856$ suggests that if the first child died ($Y_1=1$), women with education at the secondary level or higher have about 49.62% lower odds of experiencing neonatal death for their second child compared to women with lower educational attainment. Our findings also indicate that the relationship between the neonatal mortality of the second child and the mother’s educational attainment is dependent on the neonatal mortality of the first child ($\chi^2_{GBBM}=74.118$, $p<0.05$). Table 5 also provides the estimated parameters and their standard errors for the logistic regression model with interaction term. Here, $\alpha_1$ represents the intercept of the model. The coefficient $\hat{\beta}_1=-0.7468$ indicates the change in the log-odds of the second child’s neonatal mortality for mothers with education at the secondary level or higher ($X=1$) compared to those with lower educational attainment ($X=0$) when the first child survived ($Y_1=0$). The coefficient $\hat{\gamma}_0=0.9785$ reflects the change in the log-odds of the mortality of the second child when the first child died ($Y_1=1$) compared to when the first child survived ($Y_1=0$) among women with educational attainment below the secondary level. The positive sign suggests that if the first child died, the likelihood of the second child’s death increases among women with educational attainment below the secondary level. The interaction term $\hat{\gamma}_1=0.0612$ reflects how the effect of the first child’s mortality status on the likelihood of the second child’s survival varies based on the mother’s education level. The positive coefficient indicates that the positive effect of the first child’s survival on the second child’s survival probability is more pronounced for mothers with higher education compared to those with less educational attainment. Similar to the GBBM dependency test, our findings show that the relationship between the neonatal mortality of the second child and the mother’s educational attainment is dependent on the neonatal mortality of the first child ($\chi^2_{Logistic\ with\ interaction}=74.118$, $p<0.05$). Consistent with the results from Section 3.2, we observe identical parameter estimates for both the GBBM and the logistic regression model with interaction term across all four analyses presented in Table 5. This consistency indicates that both models equivalently capture the dependency structure in neonatal mortality across births, regardless of the different parameter estimates introduced by the additional terms in the logistic regression model with interaction term.
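The percentages quoted above follow from exponentiating the logistic regression coefficients, which gives odds ratios. A short sketch of the conversion, using the two coefficients reported in Table 5 (the function name `percent_change_in_odds` is hypothetical):

```python
import math

def percent_change_in_odds(log_odds_ratio):
    """Percent change in the odds implied by a logistic regression coefficient."""
    return 100.0 * (math.exp(log_odds_ratio) - 1.0)

change_first_alive = percent_change_in_odds(-0.7468)   # beta_01 from Table 5
change_first_dead = percent_change_in_odds(-0.6856)    # beta_11 from Table 5
```

Both values are negative, corresponding to reductions of roughly 52.61% and 49.62% in the odds of neonatal death of the second child. These are changes in odds, which approximate changes in probability only when the outcome is rare.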

Table 5:

Comparison of parameter estimates and dependency tests between the GBBM Dependency Test, Logistic Regression Model with Interaction, Regressive Logistic Regression Model, and Pearson Chi-square Test for neonatal death of 2nd birth | 1st birth.

| Model | Parameter | Estimate | SE |
|---|---|---|---|
| Generalized Bivariate Bernoulli Model | α01 | −2.9453* | 0.0542 |
| | β01 | −0.7468* | 0.1182 |
| | α11 | −1.9668* | 0.1138 |
| | β11 | −0.6856* | 0.2826 |
| | χ²_GBBM | 74.1180* | |
| Logistic Regression Model with Interaction | α1 | −2.9453* | 0.0542 |
| | β1 | −0.7468* | 0.1182 |
| | γ0 | 0.9785* | 0.1261 |
| | α11 = γ0 + α01 | −1.9668 | 0.1138 |
| | γ1 | 0.0612 | 0.3063 |
| | β11 = γ1 + β01 | −0.6856* | 0.2826 |
| | χ²_Logistic with interaction | 74.1180* | |
| Regressive Logistic Regression Model | α1 | −2.9472 | 0.0534 |
| | β1 | −0.7378 | 0.1090 |
| | γ | 0.9887 | 0.1150 |
| | χ²_Regressive | 73.96* | |
| Pearson Chi-square Test | χ²_Pearson | 88.7880* | |

* Significant at α=0.05

We have also included test results from the regressive logistic model (χ²_Regressive = 73.96, p < 0.05) and the Pearson Chi-square test (χ²_Pearson = 88.788, p < 0.05). Both tests indicate a significant marginal dependency of the survival of the second child on the survival of the first child during the neonatal stage.

4. DISCUSSION

The objectives of this paper were two-fold: (1) to demonstrate that the regressive logistic regression model does not provide an appropriate comparison with the GBBM dependency test, and (2) to show that the latter can be effectively conducted using a logistic regression model with an interaction term. This work is significant because no existing software package directly supports the GBBM dependency test, so custom code is required for implementation. In contrast, logistic regression models can be fit using various software platforms (e.g., R, SAS, Stata). Because the GBBM dependency test offers valuable insights into the relationship between two binary outcomes with respect to covariates, conducting the dependency test using a logistic regression model benefits the research community by making this analysis more accessible. The logistic regression model with interaction may also serve as a valuable alternative for outcomes measured at more than two time points, as it allows for the inclusion of additional interaction terms. This flexibility opens the door to promising future research opportunities.
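To make the correspondence concrete, the following is a minimal sketch of the dependency test for the special case of a single binary covariate. It rests on several assumptions: the counts are hypothetical (they are not BDHS data), the null hypothesis is taken to be γ0 = γ1 = 0 (equivalently α01 = α11 and β01 = β11), and a likelihood ratio test with 2 degrees of freedom is used. With binary X and Y1 the interaction model is saturated, so its maximum likelihood estimates are simple functions of the four cell logits and no iterative fitting is needed; with continuous covariates a standard logistic regression routine would be used instead.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def binom_loglik(deaths, survivors, p):
    """Binomial log-likelihood kernel for one cell."""
    return deaths * math.log(p) + survivors * math.log(1.0 - p)

# HYPOTHETICAL counts: counts[(x, y1)] = (number with Y2 = 1, number with Y2 = 0)
counts = {(0, 0): (60, 940), (1, 0): (25, 775),
          (0, 1): (30, 170), (1, 1): (8, 92)}
cell = {k: logit(d / (d + s)) for k, (d, s) in counts.items()}

# GBBM-style conditional models: one logistic model per level of Y1.
alpha01, beta01 = cell[(0, 0)], cell[(1, 0)] - cell[(0, 0)]
alpha11, beta11 = cell[(0, 1)], cell[(1, 1)] - cell[(0, 1)]

# Interaction model: logit P(Y2=1) = alpha1 + beta1*X + gamma0*Y1 + gamma1*X*Y1.
# It is saturated here, so the MLEs solve the four cell-logit equations.
alpha1 = cell[(0, 0)]
beta1 = cell[(1, 0)] - cell[(0, 0)]
gamma0 = cell[(0, 1)] - cell[(0, 0)]
gamma1 = cell[(1, 1)] - cell[(1, 0)] - cell[(0, 1)] + cell[(0, 0)]

# Likelihood ratio test of H0: gamma0 = gamma1 = 0 (Y2 depends on X only).
ll_full = sum(binom_loglik(d, s, d / (d + s)) for d, s in counts.values())
ll_null = 0.0
for x in (0, 1):
    d = counts[(x, 0)][0] + counts[(x, 1)][0]
    s = counts[(x, 0)][1] + counts[(x, 1)][1]
    ll_null += binom_loglik(d, s, d / (d + s))
lrt = 2.0 * (ll_full - ll_null)
p_value = math.exp(-lrt / 2.0)  # chi-square survival function with 2 df

print(round(alpha11 - (alpha1 + gamma0), 12), round(beta11 - (beta1 + gamma1), 12))
print(lrt, p_value)
```

Both parameterizations reproduce the same four cell probabilities, so they share the same maximized likelihood, and any likelihood-based test of the dependency hypothesis gives the same statistic under either one.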

Our results show that the power of the GBBM dependency test differs from that of the dependency test from the regressive logistic regression model. This contrasts with the findings of Islam et al., where both tests produced comparable results. The discrepancy arises from differences in the data generation process: in our setting, we assume α01=α11 to examine the dependency of Y2 on Y1 through X, thereby eliminating any unmediated dependency of Y2 on Y1. Our results also indicate that the power of the marginal dependency test (e.g., Pearson’s Chi-square test) and that of the regressive logistic regression model are comparable; however, both differ from the power of the GBBM dependency test. In contrast, the power of the GBBM dependency test and that of the dependency test from the logistic regression model with an interaction are equivalent across varying effect sizes and sample sizes. Both the GBBM dependency test and the logistic regression model with interaction effectively captured the effect of mother’s education on neonatal mortality for the second birth, conditional on the neonatal survival status of the first birth.
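The data generation setting described above can be illustrated with a small sketch. It assumes a single binary covariate and treats the marginal probabilities of X and Y1 (`p_x`, `p_y1`), the parameter values, and the function names as hypothetical choices for illustration; the authors' actual simulation settings are in their repository and are not reproduced here.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate(n, alpha, beta01, beta11, p_x=0.5, p_y1=0.3, seed=0):
    """Draw (x, y1, y2) triples with a common intercept (alpha01 = alpha11 = alpha).

    p_x and p_y1 are hypothetical marginal probabilities for X and Y1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < p_x)
        y1 = int(rng.random() < p_y1)
        # The slope of X depends on Y1; the intercept does not.
        beta = beta11 if y1 == 1 else beta01
        y2 = int(rng.random() < expit(alpha + beta * x))
        rows.append((x, y1, y2))
    return rows

def event_rate(rows, x, y1):
    """Empirical P(Y2 = 1 | X = x, Y1 = y1)."""
    y2s = [r[2] for r in rows if r[0] == x and r[1] == y1]
    return sum(y2s) / len(y2s)

rows = simulate(20000, alpha=-1.0, beta01=0.0, beta11=1.0)
for x in (0, 1):
    print(x, round(event_rate(rows, x, 0), 3), round(event_rate(rows, x, 1), 3))
```

In this setting Y2 is unrelated to Y1 when X=0 and depends on Y1 only when X=1, so a test that looks only at an additive Y1 effect or at the marginal association can behave differently from one that also tests the difference in slopes.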

The current model assumes that the dependency between Y1 and Y2 varies linearly with the covariate(s) X. A more general formulation could involve replacing the linear term with an arbitrary function f(x), which may be linear or nonlinear (e.g., polynomial or spline), to better accommodate complex patterns in how X influences the dependency between Y1 and Y2. However, our focus in this paper is to demonstrate that a standard logistic regression model with an interaction term yields results equivalent to the GBBM. Thus, we prioritize interpretability and accessibility, leaving exploration of more flexible functional forms for future research. Additionally, while this work focuses on binary outcomes observed at two time points, extending this framework to accommodate more than two time points would require the development of a Generalized Trivariate or Multivariate Bernoulli model. These represent important directions for future research.
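One way to write the generalization described above, using the interaction-model symbols from Table 5, is sketched here. The functions f_0 and f_1 are placeholders introduced for illustration and are not notation from the original model.

```latex
\operatorname{logit} P(Y_2 = 1 \mid Y_1 = y_1, X = x)
  = \alpha_1 + f_0(x) + y_1 \bigl[ \gamma_0 + f_1(x) \bigr]
```

The model used in this paper corresponds to the linear choices f_0(x) = \beta_1 x and f_1(x) = \gamma_1 x; under this sketch, the absence of dependency would correspond to \gamma_0 = 0 and f_1(x) = 0 for all x.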

While our primary focus is on demonstrating the equivalence between the GBBM and the logistic model with interaction, we acknowledge that many other frameworks exist for modeling dependency in binary outcomes. Generalized estimating equations (GEEs)22 are widely used for correlated binary data and provide population-averaged estimates. Copula-based models23 offer flexibility in modeling joint distributions with specified marginals and dependence structures. Although a detailed comparison of these methods is beyond the scope of this study, future research could explore their use in dependency testing and assess their comparative performance in practical biomedical contexts.

This paper contributes by clarifying, through the lens of the GBBM dependency test, how different modelling approaches capture dependency between two binary outcomes at consecutive time points. It also offers a practical alternative to the GBBM test, making dependency analysis more accessible to the scientific community.

Supplementary Material

FUNDING

Research reported was supported by: the National Cancer Institute (NCI) Cancer Center Support Grant P30 CA168524; the Kansas IDeA Network of Biomedical Research Excellence Bioinformatics Core, supported by the National Institute of General Medical Sciences award P20 GM103418; and the Kansas Institute for Precision Medicine COBRE, supported by the National Institute of General Medical Sciences award P20 GM130423.

ABBREVIATIONS

GBBM

Generalized Bivariate Bernoulli Model

BDHS

Bangladesh Demographic and Health Survey

LRT

Likelihood Ratio Test

Footnotes

DISCLOSURES

The authors have no conflicts of interest to disclose.

DATA AVAILABILITY

Code for the simulation can be found here: https://github.com/kmahmud01/GBBM

REFERENCES

  • 1. Patron J, Serra-Cayuela A, Han B, Li C, Wishart DS. Assessing the performance of genome-wide association studies for predicting disease risk. PLoS One. 2019;14(12):e0220215.
  • 2. West R, McEwen A, Bolling K, Owen L. Smoking cessation and smoking patterns in the general population: a 1-year follow-up. Addiction. 2001;96(6):891–902.
  • 3. Wan F. Statistical analysis of two arm randomized pre-post designs with one post-treatment measurement. BMC Medical Research Methodology. 2021;21:1–16.
  • 4. Goyal M, Singh S, Sibinga EM, et al. Meditation programs for psychological stress and well-being: a systematic review and meta-analysis. JAMA Internal Medicine. 2014;174(3):357–368.
  • 5. Ssentongo P, Ssentongo AE, Voleti N, et al. SARS-CoV-2 vaccine effectiveness against infection, symptomatic and severe COVID-19: a systematic review and meta-analysis. BMC Infectious Diseases. 2022;22(1):439.
  • 6. Islam MA, Alzaid AA, Chowdhury RI, Sultan KS. A generalized bivariate Bernoulli model with covariate dependence. Journal of Applied Statistics. 2013;40(5):1064–1075.
  • 7. Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22.
  • 8. Lipsitz SR, Laird NM, Harrington DP. Generalized estimating equations for correlated binary data: using the odds ratio as a measure of association. Biometrika. 1991;78(1):153–160.
  • 9. Guo X, Qi H, Verfaillie CM, Pan W. Statistical significance analysis of longitudinal gene expression data. Bioinformatics. 2003;19(13):1628–1635.
  • 10. Prentice RL. Correlated binary regression with covariates specific to each binary observation. Biometrics. 1988:1033–1048.
  • 11. Liang KY, Zeger SL, Qaqish B. Multivariate regression analyses for categorical data. Journal of the Royal Statistical Society: Series B (Methodological). 1992;54(1):3–24.
  • 12. Le Cessie S, Van Houwelingen JC. Logistic regression for correlated binary data. Journal of the Royal Statistical Society: Series C (Applied Statistics). 1994;43(1):95–108.
  • 13. Bahadur RR. A representation of the joint distribution of responses to n dichotomous items. Studies in item analysis and prediction. 1961:158–168.
  • 14. Cox DR, Wermuth N. A note on the quadratic exponential binary distribution. Biometrika. 1994;81(2):403–408.
  • 15. Glonek GF, McCullagh P. Multivariate logistic models. Journal of the Royal Statistical Society: Series B (Methodological). 1995;57(3):533–546.
  • 16. Wakefield J. Ecological inference for 2×2 tables (with discussion). Journal of the Royal Statistical Society Series A: Statistics in Society. 2004;167(3):385–445.
  • 17. Simpson EH. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society: Series B (Methodological). 1951;13(2):238–241.
  • 18. Muenz LR, Rubinstein LV. Markov models for covariate dependence of binary sequences. Biometrics. 1985:91–101.
  • 19. Bonney GE. Logistic regression for dependent binary observations. Biometrics. 1987:951–973.
  • 20. Islam MA, Chowdhury RI, Huda S. Markov models with covariate dependence for repeated measures. (No Title). 2009;
  • 21. National Institute of Population Research and Training (NIPORT) MaA, ICF International. Bangladesh Demographic and Health Survey Report. 2014.
  • 22. Zeger SL, Liang K-Y. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986:121–130.
  • 23. de Leon AR, Wu B. Copula-based regression models for a bivariate mixed discrete and continuous outcome. Statistics in Medicine. 2011;30(2):175–185.
