Abstract
Consider a set of categorical variables where at least one, denoted by Y, is binary. The log-linear model that describes the contingency table counts implies a logistic regression model, with outcome Y. Extending results from Christensen (1997, Log-linear models and logistic regression, 2nd edn. New York, NY: Springer), we prove that the maximum-likelihood estimates (MLE) of the logistic regression parameters equal the MLE of the corresponding log-linear model parameters, also considering the case where contingency table factors are not present in the corresponding logistic regression and some of the contingency table cells are collapsed together. We prove that, asymptotically, standard errors are also equal. These results demonstrate the extent to which inferences from the log-linear framework translate to inferences within the logistic regression framework, on the magnitude of main effects and interactions. Finally, we prove that the deviance of the log-linear model is equal to the deviance of the corresponding logistic regression, provided that no cell observations are collapsed together when one or more factors become obsolete. We illustrate the derived results with the analysis of a real dataset.
Keywords: categorical variables, contingency table, generalized linear modelling
1. Introduction
Let v = {v1, …, vn} denote a set of observations, θ = {θ1, …, θn} a set of parameters, and consider known or nuisance quantities ϕ = {ϕ1, …, ϕn}. Now, vi, i = 1, …, n, belongs to the exponential family of distributions if its probability function can be written as
f(vi | θi, ϕi) = exp{wi[viθi − b(θi)]/ϕi + c(vi, ϕi)},
where w = {w1, …, wn} are known weights, and ϕi is the dispersion or scale parameter. Regarding first-order moments, μi ≡ E(vi) = b′(θi). A generalized linear model relates μ = {μ1, …, μn} to covariates by setting ζ(μ) = Xdγ, where ζ denotes the link function, Xd the covariate design matrix and γ a vector of parameters. For a single μi, we write ζi(μi) = Xd(i)γ, where Xd(i) denotes the ith row of Xd, defining ζ as a vector function ζ ≡ {ζ1, …, ζn}.
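As a concrete sketch of this form (standard generalized-linear-model facts, stated here for later reference; the weight convention wi = ti for binomial proportions is one common choice), the two distributions used in this manuscript specialize as:

```latex
% Poisson counts (Section 2.1):
\theta_i = \log\mu_i, \qquad b(\theta_i) = e^{\theta_i}, \qquad w_i = \phi_i = 1,
\qquad \mu_i = b'(\theta_i) = e^{\theta_i}.
% Binomial proportions (Section 2.2), with $t_i$ trials:
\theta_i = \log\frac{p_i}{1-p_i}, \qquad b(\theta_i) = \log\!\left(1+e^{\theta_i}\right),
\qquad w_i = t_i,\ \phi_i = 1, \qquad
\mu_i \equiv p_i = b'(\theta_i) = \frac{e^{\theta_i}}{1+e^{\theta_i}}.
```

The canonical links, ζ(μ) = log(μ) for the Poisson and ζ(p) = logit(p) for the binomial, follow by inverting μi = b′(θi).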
Consider a finite set of P categorical variables. Observations from these variables can be arranged as counts in a P-way contingency table, with cell counts denoted by ni, i = 1, …, nll. The ‘ll’ indicator alludes to a log-linear model. The counts follow a Poisson distribution with E(ni) = μi. A Poisson log-linear interaction model is a generalized linear model that relates the expected counts to the categorical variables.
From Christensen [1], there is an association between log-linear modelling and multinomial logistic regression. Consider categorical variables X, Y and Z, with JX, JY and JZ levels, respectively. Let jX, jY, jZ be integer indices that describe the level of X, Y and Z. In a multinomial logistic regression with outcome Y, one typically models the log-odds of an observation at level jY + 1 relative to one at level jY, jY = 0, …, JY − 1. This can be viewed as equivalent to fitting a log-linear model as
For more details, see [1, Section 4.6] where, in addition to the above approach, the alternative of constructing a multinomial model to model the log-odds of an observation at level jY, jY = 1, …, JY − 1, relative to one at fixed level JY is considered. In this manuscript, we focus on the association between log-linear modelling and binary logistic regression. Assume that the categorical variable Y is binary. Then, a logistic regression can be fitted with Y as the outcome, and all or some of the remaining P − 1 variables as covariates. We write logit(pi) ≡ log{pi/(1 − pi)} = Xlt(i)β, i = 1, …, nlt, using the ‘lt’ indicator for the logistic model, denoting by pi the conditional probability that Y = 1 given covariates Xlt(i), and by β the vector of model parameters.
From Agresti [2], when the set of categorical variables contains a binary Y, a log-linear model implies a specific logistic regression model with parameters β defined uniquely by λ. As Y is binary, jY = 0, 1. Consider the log-linear model
| M1 |
where the superscript denotes the main effect or interaction term. Similar to the derivation above, the corresponding logistic regression model for the conditional odds ratios for Y is
This is a logistic regression with parameters β0 = λY1 − λY0, βXjX = λXYjX1 − λXYjX0 and βZjZ = λYZ1jZ − λYZ0jZ. Identifiability corner point constraints set all elements in λ with a zero subscript equal to zero. Then, β0 = λY1, βXjX = λXYjX1 and βZjZ = λYZ1jZ. This scales in a straightforward manner to larger log-linear models. If a factor does not interact with Y in the log-linear model, this factor disappears from the corresponding logistic regression. Without any loss of generality, and to simplify the analysis and notation, we henceforth assume corner point constraints.
Considering the log-odds implied by a logistic regression, more than one log-linear model provides the same structure. For example, the log-linear model log(μjYjXjZ) = λ + λXjX + λYjY + λZjZ + λXYjXjY + λYZjYjZ implies the same conditional log-odds structure for Y as (M1). However, as shown in Christensen [3, Section 3.3.2] in conjunction with Christensen [1, Sections 11.1 and 12.4], the log-linear model that determines exactly the same logistic structure is the one that contains all possible interaction terms between the categorical factors other than Y. Other log-linear models, even when they imply the same log-odds, impose additional constraints on the logistic structure. To avoid any confusion, the description of our results in this manuscript will make explicit that the considered log-linear model contains all possible interaction terms between the categorical factors other than Y.
The relationship between β and λ can be described as β = Tλ, where T is an incidence matrix [4]. In the context of this manuscript, matrix T has one row for each element of β, and one column for each element of λ. The elements of T are zero, except in the case where the element of β is defined by the corresponding element of λ. The number of rows of T cannot be greater than the number of columns.
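The mapping β = Tλ can be sketched numerically. The snippet below uses hypothetical λ values for model (M1), with X and Z taken binary for brevity so that, under corner point constraints, β = (β0, βX1, βZ1); the parameter ordering is an assumption made for illustration only:

```python
import numpy as np

# hypothetical lambda for model (M1) under corner-point constraints, ordered as
# [lam, lam_X, lam_Z, lam_XZ, lam_Y, lam_XY, lam_YZ] (X, Z binary for brevity)
lam = np.array([1.2, -0.5, 0.1, 0.3, -0.414, 0.550, 0.489])

# incidence matrix T: one row per element of beta, one column per element of
# lambda; a single 1 marks the lambda term carrying a Y in its superscript
T = np.zeros((3, 7))
T[0, 4] = 1.0   # beta_0  = lam_Y
T[1, 5] = 1.0   # beta_X1 = lam_XY
T[2, 6] = 1.0   # beta_Z1 = lam_YZ

beta = T @ lam  # beta = (lam_Y, lam_XY, lam_YZ)
```

Each row of T contains exactly one non-zero entry, so T simply selects the λ terms that involve Y, as described above.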
Papathomas [5] studied the correspondence between the two modelling frameworks within the Bayesian setting, deriving exact and asymptotic results. In this manuscript, we focus on the frequentist framework, and derive results on maximum-likelihood estimates (MLE), interval estimates and deviances. Christensen [1] offers a comprehensive account of log-linear and logistic regression modelling. In Christensen [1, ch. 11], results on the equivalence between MLE and confidence intervals were derived. We extend these results by also considering the case where factors present in the contingency table and log-linear model are not present in the corresponding logistic regression model, and some of the contingency table cells are collapsed together. This case is not considered in [1,2] or, to the best of our knowledge, in any other published work. As stated in theorem 3.2, the MLE for the parameters of the logistic regression equals the MLE for the corresponding parameters of the log-linear model. Theorem 3.3 states that, asymptotically, standard errors for the logistic regression and corresponding log-linear model parameters are equal. Subsequently, Wald confidence intervals [2] are asymptotically equal.
For theorem 3.4, we stipulate that the logistic model is fitted to a dataset where no cell observations are collapsed together when one or more factors in the log-linear model are not present in the logistic regression. Then, we prove that the deviance of the log-linear model equals the deviance of the corresponding logistic regression. Christensen [1, p. 371] refers to this equality, considering a simple logistic regression with two parameters and showing that the likelihood ratio test statistic (LRTS) for the log-linear model equals the LRTS for the logistic regression. This is done by using the invariance of the MLE and the properties of the product-binomial sampling scheme [1, Section 2.6]. Christensen [1, p. 365] also shows that applying the logistic regression to a contingency table implies that the sampling scheme of the contingency table is product-binomial instead of multinomial. As these results are based on a logistic regression with two parameters, a general mathematical proof is required, provided in appendix A.
In §2 we provide additional notation and essential derivations for the log-linear and logistic regression model, then §3 contains the main contributions in this manuscript. In §4, the correspondence from a log-linear to a logistic regression model is illustrated using real data. We conclude with a discussion, where we also consider possible practical implications of our results.
2. Deviances and the information matrix
The deviance of a generalized linear model is crucial for assessing goodness of fit [6]. Let θ̂ denote the MLE of θ. Let L(θ̂sat, v) and L(θ̂sim, v) denote the log-likelihood for the saturated model, and for a simpler model, respectively. The deviance is defined as
D = 2{L(θ̂sat, v) − L(θ̂sim, v)}.
Then,
D = 2 Σi wi{vi(θ̂sat,i − θ̂sim,i) − b(θ̂sat,i) + b(θ̂sim,i)}/ϕi.
Denote by γ̂ the MLE of γ, and by I(γ) the information matrix. (I(γ) will be specified below for both modelling frameworks as I(λ) and I(β).) Then, from Agresti [2], asymptotically
γ̂ ∼ N(γ, I(γ̂)−1).
2.1. Log-linear regression
Consider a vector n of counts ni, i = 1, …, nll. Now, ni ∼ Poisson(μi), and
f(ni | θi) = exp{ni log(μi) − μi − log(ni!)},
with θi = log(μi), b(θi) = exp(θi) and c(ni, ϕi) = −log(ni!). Also, var(ni) = μi, so that wi = 1 implies ϕi = 1. Note that μi = b′(θi) = exp(θi) and b″(θi) = μi. For the log-linear model, ζ(μ) = log(μ) = Xllλ, where Xll is a nll × nλ design matrix of covariates, and γ ≡ λ. Given the above,
Dll = 2 Σi {ni log(ni/μ̂i) − (ni − μ̂i)}.
From Agresti [2, p. 140], when the log-linear model contains an intercept, Σi (ni − μ̂i) = 0. Then,
Dll = 2 Σi ni log(ni/μ̂i). (2.1)
The diagonal matrix Wll in the information matrix I(λ) = Xll⊤WllXll has non-zero elements μi, i = 1, …, nll.
2.2. Logistic regression
Assume that yi, i = 1, …, nlt, is the proportion of successes out of ti trials. Now, tiyi ∼ Binomial(ti, pi), and
f(yi | θi) = exp{ti[yiθi − b(θi)] + c(yi, ϕi)},
where θi = log{pi/(1 − pi)} and b(θi) = log{1 + exp(θi)}. Also, var(yi) = pi(1 − pi)/ti, so that wi = ti implies ϕi = 1. Note that
μi ≡ pi = b′(θi) = exp(θi)/{1 + exp(θi)}.
For the logistic regression, ζ(p) = logit(p) = Xltβ, where Xlt is a nlt × nβ design matrix, and γ ≡ β. Given the above,
Dlt = 2 Σi ti{yi log(yi/p̂i) + (1 − yi) log[(1 − yi)/(1 − p̂i)]}.
After some algebra,
Dlt = 2 Σi {tiyi log(tiyi/tip̂i) + (ti − tiyi) log[(ti − tiyi)/(ti − tip̂i)]}. (2.2)
The diagonal matrix Wlt in the information matrix I(β) = Xlt⊤WltXlt has non-zero elements tipi(1 − pi), i = 1, …, nlt.
3. Results
To facilitate the derivation of theoretical results, we introduce the following additional notation. Without any loss of generality, let x.1 be the binary Y factor, and x.2, …, x.q the q − 1 factors that are present in the log-linear model but disappear from the logistic regression model as they do not interact with Y. Denote the rest of the factors by x.q+1, …, x.P. Each element of n is denoted by nj, j = (j1, …, jP), 0 ≤ jp ≤ Jp − 1, p = 1, …, P, where Jp is the number of levels of x.p. Here, j identifies the combination of variable levels that cross-classify the given cell. We define L as the set of all nll cross-classifications, so that n = {nj, j ∈ L}. Elements yj and μj are defined analogously.
Lemma 3.1. —
Assume that the log-linear model contains all possible interaction terms between the categorical factors other than Y. Then, for all 0 ≤ jp ≤ Jp − 1, p = 2, …, P,
μ̂0j2…jP + μ̂1j2…jP = n0j2…jP + n1j2…jP.
Proof. —
The proof is given in appendix A. ▪
Theorem 3.2. —
Assume that the log-linear model contains all possible interaction terms between the categorical factors other than Y. Then, the MLE of the parameters of the logistic regression is equal to the MLE of the corresponding parameters of the log-linear model.
Proof. —
The proof is given in appendix A. ▪
Theorem 3.3. —
Assume that the log-linear model contains all possible interaction terms between the categorical factors other than Y. Then, asymptotically, the standard error for each element of β is equal to the standard error for the corresponding parameter of the log-linear model.
Proof. —
The proof is given in appendix A. ▪
The proofs for theorems 3.2 and 3.3 include the case where factors present in the log-linear model are not present in the corresponding logistic regression and some of the contingency table cells are collapsed together. For completeness, our proofs also include the case where all factors are present in the logistic regression model. Theorem 3.4 requires that nlt = nll/2, i.e. the number of proportions fitted by the logistic regression should be half the number of cell counts in the contingency table. This happens either because all factors other than Y are present in the logistic regression, or because counts in cells with the same cross-classification considering x.q+1, …, x.P are not collapsed. This is important for observing equal deviances for the log-linear model and the corresponding logistic regression. Intuitively, when nlt = nll/2, the number of observations fitted by the logistic regression is in direct correspondence with the number of observations fitted by the log-linear model. When nlt < nll/2, a logistic regression model with the same number of parameters fits a smaller number of observations, which naturally results in a smaller deviance compared with the deviance observed when the contingency table is not collapsed. This is illustrated in §4 with the analysis of a real dataset.
Theorem 3.4. —
Assume that the log-linear model contains all possible interaction terms between the categorical factors other than Y. Assume also that the corresponding logistic regression is fitted to a dataset where nlt = nll/2. Then, the deviance of the log-linear model equals the deviance of the corresponding logistic regression.
Proof. —
The proof is given in appendix A. ▪
4. Illustration
Edwards & Havránek [7] presented a 2⁶ contingency table in which 1841 men were cross-classified by six binary risk factors {A, B, C, D, E, F} for coronary heart disease. Adopting the notation in Agresti [2], a single letter denotes the presence of a main effect, two-letter terms denote the presence of the implied first-order interaction, and so on. The presence of an interaction between a set of variables implies the presence of all lower-order interactions plus main effects for that set. Consider the log-linear model
| M2 |
Treating A as the outcome, the corresponding logistic regression is
| M3 |
The deviances, MLE and standard errors for the relevant parameters of both models are given in table 1, after fitting the models in R using the ‘glm’ function. We observe that corresponding quantities are equal. To obtain equal deviances, although factors B and F are not present in the logistic regression, the logistic model was fitted to a dataset where contingency table cell counts discriminated only by B and F were not collapsed together. This resulted in nlt = 32. The datasets for (M2) and (M3) are given in appendix A. The design matrix is shown below, with ⊤ denoting the transpose; some of its rows are identical.
Table 1.
Deviances, MLE and standard errors for the relevant parameters of log-linear model (M2) and the corresponding logistic regressions (M3) and (M4). (Standard errors are given in brackets.)
| log-linear model (M2), deviance = 33.51 | | | | |
| | A | AC | AD | AE |
| MLE | −0.4140 (0.0892) | 0.5501 (0.0958) | −0.3684 (0.0967) | 0.4893 (0.0973) |
| outcome is A (M3), deviance = 33.51 | | | | |
| | intercept | C | D | E |
| MLE | −0.4140 (0.0892) | 0.5501 (0.0958) | −0.3684 (0.0967) | 0.4893 (0.0973) |
| outcome is A (M4), deviance = 3.47 | | | | |
| | intercept | C | D | E |
| MLE | −0.4140 (0.0892) | 0.5501 (0.0958) | −0.3684 (0.0967) | 0.4893 (0.0973) |
As factors B and F disappear from the logistic regression that corresponds to (M2), one may decide to collapse together the contingency table cells with the same cross-classification considering C, D and E. A logistic regression is fitted, denoted by (M4). It only contains main effects for C, D and E, as does (M3). The dataset for (M4) is shown in appendix A. The design matrix for (M4) is
Relevant output is given in table 1. MLE and standard errors are equal, as theorems 3.2 and 3.3 hold. However, as cells are collapsed together and nlt ≠ nll/2, the deviances differ.
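The pattern in table 1 can be reproduced on synthetic counts. The sketch below is our own illustration, not the authors' code: `fit_poisson` and `fit_logistic` are hypothetical helper names implementing plain iteratively reweighted least squares, and the 2 × 2 × 2 counts are invented. It fits a log-linear model saturated in (X, Z) with Y, XY and YZ terms, and the corresponding logistic regression with nlt = nll/2, so that theorems 3.2–3.4 all apply:

```python
import numpy as np

def fit_poisson(X, n, iters=50):
    """IRLS for a Poisson log-linear model: returns MLE, standard errors, deviance."""
    mu = n + 0.5                          # standard GLM starting values
    eta = np.log(mu)
    for _ in range(iters):
        z = eta + (n - mu) / mu           # working response, log link
        XtW = X.T * mu                    # X^T diag(mu)
        coef = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ coef
        mu = np.exp(eta)
    se = np.sqrt(np.diag(np.linalg.inv((X.T * mu) @ X)))
    dev = 2.0 * np.sum(n * np.log(n / mu) - (n - mu))
    return coef, se, dev

def fit_logistic(X, y, t, iters=50):
    """IRLS for binomial logistic regression: y successes out of t trials."""
    p = (y + 0.5) / (t + 1.0)             # standard GLM starting values
    eta = np.log(p / (1 - p))
    for _ in range(iters):
        w = t * p * (1 - p)
        z = eta + (y - t * p) / w         # working response, logit link
        XtW = X.T * w                     # X^T diag(t p (1 - p))
        coef = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ coef
        p = 1 / (1 + np.exp(-eta))
    se = np.sqrt(np.diag(np.linalg.inv((X.T * (t * p * (1 - p))) @ X)))
    dev = 2.0 * np.sum(y * np.log(y / (t * p))
                       + (t - y) * np.log((t - y) / (t * (1 - p))))
    return coef, se, dev

# invented 2x2x2 table: binary outcome Y and binary factors X, Z
patterns = [(x, z) for x in (0, 1) for z in (0, 1)]
n0 = np.array([20., 35., 12., 40.])       # counts with Y = 0, per (x, z) pattern
n1 = np.array([15., 28., 22., 30.])       # counts with Y = 1

# log-linear model: saturated in (X, Z), plus Y, XY and YZ terms
X_ll, n = [], []
for i, (x, z) in enumerate(patterns):
    for jy, cnt in ((0, n0[i]), (1, n1[i])):
        X_ll.append([1, x, z, x * z, jy, jy * x, jy * z])
        n.append(cnt)
X_ll, n = np.array(X_ll, float), np.array(n)

# corresponding logistic regression: intercept, X, Z, with n_lt = n_ll / 2
X_lt = np.array([[1, x, z] for x, z in patterns], float)

lam, se_ll, dev_ll = fit_poisson(X_ll, n)
beta, se_lt, dev_lt = fit_logistic(X_lt, n1, n0 + n1)

print(np.allclose(lam[4:], beta))     # MLE of (Y, XY, YZ) terms equal beta
print(np.allclose(se_ll[4:], se_lt))  # standard errors equal (theorem 3.3)
print(abs(dev_ll - dev_lt))           # deviances equal (theorem 3.4)
```

With these counts both deviances are strictly positive (the XYZ term is omitted) yet they coincide exactly, as theorem 3.4 states; collapsing cells would leave the estimates and standard errors unchanged but change the logistic deviance, mirroring (M4).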
5. Discussion
The results in Christensen [1] and this manuscript demonstrate the extent to which inferences from the log-linear framework translate to inferences within the logistic regression framework, on the magnitude of main effects and interactions.
When factors are not present in the logistic regression, one may choose to collapse the counts in the contingency table cells that are only discriminated by the obsolete variables x.2, …, x.q. Logistic regression parameter estimates and associated standard errors are not affected by collapsing the cell counts. This is shown in the proofs for theorems 3.2 and 3.3 in appendix A. However, the logistic regression fitted to the collapsed dataset returns a different deviance compared with a logistic regression with the same covariates (parameters) fitted without collapsing. This is expected, as two models with the same number of parameters are fitted to a different number of data points. The deviance naturally increases for the larger dataset.
Our results concern two of the most popular approaches for the analysis of categorical observations and the correspondence between them. Theoretical derivations on such associations improve understanding and enhance the models’ use, as advances for one framework are not always readily available to the other. For instance, to describe the joint probability distribution between covariates, Zhou et al. [8] adopt a PARAFAC factorization. Marginal independence is modelled with fixed baseline vectors, providing expressions for parameters of the log-linear models that correspond to the adopted latent class model. Another example is Papathomas & Richardson [9], where the use of variable selection within clustering to assist log-linear modelling is investigated, without examining logistic regression models.
Acknowledgements
We are grateful to Prof. Ronald Christensen for the instructive discussions we had during the preparation of this manuscript. We are also grateful for the comments by two reviewers and the editor that improved this manuscript.
Appendix A.
Proof of Lemma 3.1. —
To facilitate this and subsequent proofs, the following notation is introduced, similar to Papathomas [5]. Using the incidence matrix T discussed in §1, write the mapping between β and λ as β = Tλ, where T = (λ(1), …, λ(nβ))⊤,
and λ(k), k = 1, …, nβ, is a vector of zeros with the exception of one element that is equal to one. This element is in the position of the kth λ parameter with a Y in its superscript. With nβ we denote the number of parameters in λ with a Y in their superscript. To ease algebraic calculations, and without any loss of generality, rearrange the elements of λ, creating a new vector λr, so that T changes accordingly to Tr = (I 0), where I is an nβ × nβ identity matrix. (Vector μ is similarly rearranged to μr.) The rows and columns of Xll are also rearranged accordingly to create Xrll, so that
Xrll = ( Xlt* Xll−lt ; 0 Xll−lt ). (A 1)
Xll−lt is a square (nll/2 × nll/2) matrix. This is because we consider the log-linear model that, in addition to the terms that involve Y, contains all possible interaction terms between the categorical factors other than Y. The number of parameters that correspond to the intercept, main effects and interactions for the factors other than Y is nll/2. Xlt* is a nll/2 × nβ matrix. When q = 1, all factors other than Y remain in the logistic regression model as covariates. When no cell counts are collapsed, either because q = 1, or because we opt not to collapse, Xlt* = Xlt, and nll = 2 × nlt. When the cell counts that are only discriminated by the obsolete variables x.2, …, x.q are collapsed, by rearranging the rows of Xrll when necessary, we can write Xlt* as Xlt* = (Xlt⊤, …, Xlt⊤)⊤, where Xlt is repeated (J1 − 1) × J2 × … × Jq times. For example, for q = 2, Xlt repeats J2 times within Xlt*, and nll = 2 × J2 × nlt. When q = P, the corresponding logistic regression model only contains an intercept, and one may decide to fit the logistic regression to a collapsed contingency table that only contains two cells describing the total number of counts where Y = 0 and Y = 1. Then, nll = 2 × J2 × … × JP × nlt.
We can now write β = Trλr. For example, assume the log-linear model (M1) describes a 3 × 2 × 2 contingency table. Then, q = 1, and the standard arrangement of the elements of λ would be such that,
After rearranging
See Papathomas [5] for another example where q = 2. From Agresti [2, p. 138], the likelihood equations for a log-linear model are
Σl∈L nl xlj = Σl∈L μ̂l xlj, j = 1, …, nλ,
where xlj is the element of Xrll in the row that corresponds to nl, and column j, j = 1, …, nλ. As Xll−lt includes all interactions between factors other than Y, Xll−lt is the design matrix for a saturated log-linear model for all factors other than Y. Because Xll−lt repeats within Xrll (as shown in (A1)), the nll/2 likelihood equations for j = nβ + 1, …, nλ are also the likelihood equations of a saturated log-linear model for fitting the nll/2 collapsed observations n0j2…jP + n1j2…jP:
Σ (n0j2…jP + n1j2…jP) xlj = Σ (μ̂0j2…jP + μ̂1j2…jP) xlj, j = nβ + 1, …, nλ.
Here, xlj is the element of Xll−lt in the row that corresponds to n0j2…jP + n1j2…jP, and column j, j = nβ + 1, …, nλ. As these are the likelihood equations of a saturated model,
μ̂0j2…jP + μ̂1j2…jP = n0j2…jP + n1j2…jP for all (j2, …, jP),
and this completes the proof. ▪
Proof of Theorem 3.2. —
All factors other than Y are present in the logistic regression, or no collapsing of cells. From Agresti [2, p. 193], the likelihood equations for the logistic regression model are
Σi tiyi xlt(i)j = Σi tip̂i xlt(i)j,
for j = 1, …, nβ. Now,
where, a[a1 : a2], specifies the vector formed by all elements from the a1th to the a2th element of vector a, including the a1th and a2th elements. Therefore,
Thus, to estimate β, the likelihood equations are
For the log-linear model, for λr[1 : nβ], the likelihood equations are
where j = 1, …, nβ. As, for all j, the likelihood equations for estimating λr[1 : nβ] are
As the two sets of equations coincide, the likelihood equations for estimating β and the corresponding λr[1 : nβ] are the same. Therefore, β̂ = λ̂r[1 : nβ], as the number of equations equals the number of parameters.
Factors not present in the logistic regression, with collapsing of cells. As Xlt repeats J2 × · · · × Jq times within Xlt*, the likelihood equations for estimating λr[1 : nβ], for j = 1, …, nβ, are shown below:
where
These are also the equations for estimating the logistic regression parameters β. So, β̂ = λ̂r[1 : nβ], as the number of equations equals the number of parameters. ▪
Proof of Theorem 3.3. —
Consider a vector of cell counts n = {n1, …, nnll}, and the log-linear model log(μ) = Xllλ. Then, from Agresti [2], asymptotically:
λ̂ ∼ N(λ, (Xll⊤WllXll)−1).
After rearranging the rows and columns of Xll, consider the log-linear model with linear predictor Xrllλr, for cell counts nr, where nr is n rearranged to correspond to Xrll. Now
denotes a diagonal matrix with non-zero elements , i = 1, …, nll/2. denotes a diagonal matrix with non-zero elements , i = 1, …, nll/2, where denotes the MLE for . Now,
where and . From Lutkepohl [10, p. 147, result 2(a)], and Lutkepohl [10, p. 29, line 6], the submatrix H that is formed by the first nβ rows and columns of is
Thus,
All factors other than Y are present in the logistic regression, or no collapsing of cells. Assume cell counts are not collapsed (by choice or when q = 1), so that nlt = nll/2 and Xlt* = Xlt. We now use the standard result (e.g. [11, p. 200]) that, asymptotically, the Binomial distribution of a data point tiyi, i = 1, …, nlt, can be approximated by the Normal distribution N(tipi, tipi(1 − pi)). Considering the Poisson log-linear model, the Binomial observation ti − ti × yi follows the Poisson distribution:
Therefore, approximately,
In matrix notation, we can now write that, asymptotically,
where has diagonal elements , i = 1, …, nlt. is, asymptotically, the variance of when the logistic regression is fitted directly, and this completes the proof when no collapsing of cell counts takes place.
Factors not present in the logistic regression, with collapsing of cells. When one chooses to collapse the counts in the contingency table cells that are only discriminated by the obsolete variables x.2, …, x.q,
where denotes a diagonal matrix with non-zero elements , i = 1, …, nlt. , k = 1, …, J2 × · · · × Jq, denotes a diagonal matrix with elements . Similar to the previous case, we use the standard result that, asymptotically, the Binomial distribution of a data point ti yi, i = 1, …, nlt, can be approximated by . When cell counts are collapsed, the Binomial observation ti − ti × yi is formed by adding J2 × · · · × Jq independent Poisson cell counts. Considering the Poisson log-linear model, ti − ti yi follows the Poisson distribution:
Therefore, approximately
In matrix notation, we can now write that, asymptotically
where t is a diagonal matrix with diagonal elements the number of trials ti, and has diagonal elements , i = 1, …, nlt. is, asymptotically, the variance of when the logistic regression is fitted directly, and this completes the proof. ▪
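The partitioned-inverse step invoked at the start of this proof (Lutkepohl's block-inversion result) can be checked numerically. Below is a minimal sketch, using an arbitrary symmetric positive-definite matrix with arbitrarily chosen block sizes: the leading block of the full inverse equals the inverse of the Schur complement.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 3, 4                               # arbitrary block sizes
M = rng.normal(size=(k + m, k + m))
M = M @ M.T + (k + m) * np.eye(k + m)     # symmetric positive definite

A, B, D = M[:k, :k], M[:k, k:], M[k:, k:]
H = np.linalg.inv(M)[:k, :k]              # leading k x k block of the inverse
schur = np.linalg.inv(A - B @ np.linalg.inv(D) @ B.T)

print(np.allclose(H, schur))
```

In the proof, M plays the role of the rearranged information matrix Xrll⊤WXrll, with k = nβ.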
Proof of Theorem 3.4. —
Assume that no cell observations are collapsed when one or more factors in the log-linear model are not present in the logistic regression. From (2.2),
This, in turn, is equal to
A 2
A 3
A 4 For the log-linear model, from (2.1),
This, in turn, is equal to
A 5
A 6
A 7
A 8 Now, (A2)=(A5) by inspection. Furthermore, from theorem 3.2, . As,
we have that (A3)=(A6). Finally, from Lemma 3.1,
Also,
Then,
This completes the proof of theorem 3.4. ▪
Data analysed in §4. The dataset for log-linear model (M2) is given by vector
The dataset for the logistic regression (M3) is
The dataset for (M4) is,
and
Data accessibility
All data considered in this manuscript are provided in appendix A. This article has no additional data.
Authors' contributions
W.J. and M.P. contributed equally to all parts of this manuscript. All authors gave final approval for publication.
Competing interests
We declare we have no competing interests.
Funding
The first author acknowledges the support of the School of Mathematics and Statistics, as well as CREEM, at the University of St Andrews, and the University of St Andrews St Leonard’s 7th Century Scholarship.
References
- 1. Christensen R. 1997. Log-linear models and logistic regression, 2nd edn. New York, NY: Springer.
- 2. Agresti A. 2002. Categorical data analysis, 2nd edn. Hoboken, NJ: John Wiley and Sons.
- 3. Christensen R. 1996. Plane answers to complex questions: the theory of linear models, 4th edn. New York, NY: Springer.
- 4. Bapat RB. 2001. Graphs and matrices. New Delhi, India: Springer; Hindustan Book Agency.
- 5. Papathomas M. 2018. On the correspondence from Bayesian log-linear modelling to logistic regression modelling with g-priors. Test 27, 197–220. (doi:10.1007/s11749-017-0540-8)
- 6. Wood SN. 2006. Generalized additive models: an introduction with R. New York, NY: Chapman and Hall.
- 7. Edwards D, Havránek T. 1985. A fast procedure for model search in multi-dimensional contingency tables. Biometrika 72, 339–351. (doi:10.1093/biomet/72.2.339)
- 8. Zhou J, Bhattacharya A, Herring AH, Dunson DB. 2015. Bayesian factorizations of big sparse tensors. J. Am. Statist. Assoc. 110, 1562–1576. (doi:10.1080/01621459.2014.983233)
- 9. Papathomas M, Richardson S. 2016. Exploring dependence between categorical variables: benefits and limitations of using variable selection within Bayesian clustering in relation to log-linear modelling with interaction terms. J. Stat. Plan. Infer. 173, 47–63. (doi:10.1016/j.jspi.2016.01.002)
- 10. Lutkepohl H. 1996. Handbook of matrices. Chichester, UK: John Wiley and Sons.
- 11. Rohatgi VK. 1976. An introduction to probability theory and mathematical statistics. New York, NY: John Wiley and Sons.
