Computational and Mathematical Methods in Medicine. 2012 May 29;2012:639124. doi: 10.1155/2012/639124

Let Continuous Outcome Variables Remain Continuous

Enayatollah Bakhshi 1,*, Brian McArdle 2, Kazem Mohammad 3, Behjat Seifi 4, Akbar Biglarian 1
PMCID: PMC3368309  PMID: 22693539

Abstract

The complementary log-log model is an alternative to the logistic model. In many areas of research, the outcome data are continuous. We aim to provide a procedure that allows the researcher to estimate the coefficients of the complementary log-log model without dichotomizing and without loss of information. We show that the sample size required to attain a specified power with the proposed approach is substantially smaller than with the dichotomizing method, and we find that estimators derived from the proposed method are consistently more efficient than those from the dichotomizing method. To illustrate the use of the proposed method, we employ data arising from the National Health Survey in Iran (NHSI).

1. Introduction

Recently, logistic regression has become a popular tool in biomedical studies. The parameter in logistic regression has the interpretation of a log odds ratio, which is easy for people such as physicians to understand. The probit and complementary log-log models are alternatives to the logistic model. For a covariate X and a binary response variable Y, let π(X) = P(Y = 1 | X = x). A related model to the complementary log-log link is the log-log link, for which π(x) approaches 0 sharply but approaches 1 slowly. When the complementary log-log model holds for the probability of a success, the log-log model holds for the probability of a failure [1].

These models use a categorical (dichotomous or polytomous) outcome variable. In many areas of research, the outcome data are continuous. Many researchers have no hesitation in dichotomizing a continuous variable, but this practice does not make use of within-category information. Several investigators have noted the disadvantages of dichotomizing both independent and outcome variables [2–10]. Ragland [11] showed that the magnitude of the odds ratio and the statistical power depend on the cutpoint used to dichotomize the response variable. From a clinical point of view, binary outcomes may be preferred for reasons such as (1) setting diagnostic criteria for disease and (2) offering a simpler interpretation of common effect measures from statistical models, such as odds ratios and relative risks. However, these advantages come at the cost of lost information. From a statistical point of view, this loss of information means that larger samples are required to attain a prespecified power.

Moser and Coombs [12] provided a closed-form relationship that allows a direct comparison between the logistic and linear regression coefficients. They also provided a procedure that allows the researcher to analyze the original continuous outcome without dichotomizing. To date, a method that applies the complementary log-log model without dichotomizing and without loss of information has not been available.

We aim to (a) provide a method that allows the researcher to estimate the coefficients of the complementary log-log model without dichotomizing and without loss of information, (b) show that the coefficient of the complementary log-log model can be interpreted in terms of the regression coefficients, and (c) demonstrate that the coefficient estimates from this method have smaller variances and shorter confidence intervals than those from the dichotomizing method.

2. Methods

2.1. Model

Let $y_1, y_2, \ldots, y_n$ be n independent observations on y, and let $x_1, x_2, \ldots, x_{p-1}$ be p − 1 predictor variables thought to be related to the response variable y. The multiple linear regression model for the ith observation can be expressed as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1} + E_i, \quad i = 1, 2, \ldots, n, \tag{1}$$

or

$$y_i = x_i'\beta + E_i, \quad i = 1, 2, \ldots, n, \tag{2}$$

where

$$x_i' = (1, x_{i1}, x_{i2}, \ldots, x_{i,p-1}). \tag{3}$$

To complete the model, we make the following assumptions:

  1. E(E_i) = 0 for i = 1, 2, …, n,

  2. var(E_i) = σ² for i = 1, 2, …, n,

  3. the independent E_i follow an extreme value distribution for i = 1, 2, …, n.

Writing the model for each of the n observations, in matrix form, we have

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
= \begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1,p-1} \\
1 & x_{21} & x_{22} & \cdots & x_{2,p-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{n,p-1}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}
+ \begin{bmatrix} E_1 \\ E_2 \\ \vdots \\ E_n \end{bmatrix}, \tag{4}$$

or

$$y = X\beta + E. \tag{5}$$

The preceding three assumptions on E_i and y_i can be expressed in terms of this model:

  1. E(E) = 0,

  2. cov(E) = σ²I,

  3. the E_i are extreme value (0, σ²) for i = 1, 2, …, n.

2.2. (Largest) Extreme Value Distribution

The PDF and CDF of the extreme value distribution are given by

$$f(y \mid x'\beta, \sigma) = \frac{\pi}{\sigma\sqrt{6}} \times \exp\!\left(\frac{y - x'\beta - k\sigma}{\sigma} \times \frac{\pi}{\sqrt{6}} - \exp\!\left(\frac{y - x'\beta - k\sigma}{\sigma} \times \frac{\pi}{\sqrt{6}}\right)\right), \quad -\infty < x'\beta < \infty,\ \sigma > 0,$$
$$P(y \ge c) = \exp\!\left(-\exp\!\left(\frac{c - x'\beta + k\sigma}{\sigma} \times \frac{\pi}{\sqrt{6}}\right)\right), \quad -\infty < x'\beta < \infty,\ \sigma > 0,\ k \approx 0.45. \tag{6}$$

It is easy to check that

$$\omega_j = \frac{\ln \pi_1}{\ln \pi_2} = \frac{\ln\bigl(P(y \ge c \mid x)\bigr)}{\ln\bigl(P(y \ge c \mid x_{(1,j)})\bigr)} = \frac{\exp\bigl(-((c - x'\beta + k\sigma)/\sigma) \times \pi/\sqrt{6}\bigr)}{\exp\bigl(-((c - x'_{(1,j)}\beta + k\sigma)/\sigma) \times \pi/\sqrt{6}\bigr)} = \exp\!\left(\frac{\pi}{\sqrt{6}} \cdot \frac{\beta_j}{\sigma}\right) \;\Longrightarrow\; \pi_1 = \pi_2^{\exp((\pi/\sqrt{6}) \cdot (\beta_j/\sigma))}, \tag{7}$$

where

$$x' = (1, x_1, \ldots, x_j, \ldots, x_{p-1}), \qquad x'_{(1,j)} = (1, x_1, \ldots, x_j - 1, \ldots, x_{p-1}), \qquad \beta = (\beta_0, \beta_1, \ldots, \beta_j, \ldots, \beta_{p-1})'. \tag{8}$$

Returning to a random sample of observations $(y_1, y_2, \ldots, y_n)$, we conclude that the PDF and CDF of each independent $y_i$ are given by (6), and the corresponding version of equality (7) is given by

$$\frac{\ln \hat{\pi}_1}{\ln \hat{\pi}_2} = \exp\!\left(\frac{\pi}{\hat{\sigma}\sqrt{6}}\,\hat{\beta}_j\right), \tag{9}$$

where the estimate $\hat{\beta}_j$ is the (j + 1)th element of the vector $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_j, \ldots, \hat{\beta}_{p-1})'$. It is readily shown that the results also hold true for the smallest extreme value distribution (Appendix A).
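
As a small numerical illustration of (9) (the values of $\hat{\beta}_j$ and $\hat{\sigma}$ below are hypothetical, not taken from the paper):

```python
import math

# Hypothetical estimates from a fitted linear regression of the continuous outcome on x.
beta_hat_j, sigma_hat = 0.5, 2.0

# Equation (9): omega_hat_j = exp(pi * beta_hat_j / (sigma_hat * sqrt(6)))
omega_hat_j = math.exp(math.pi * beta_hat_j / (sigma_hat * math.sqrt(6)))
print(round(omega_hat_j, 3))  # ~1.378, i.e., ln(pi_1_hat) is about 1.378 times ln(pi_2_hat)
```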

2.3. The Proposed Confidence Intervals

Let

$$\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_j, \ldots, \hat{\beta}_{p-1})' = (X'X)^{-1}X'Y, \quad j = 0, \ldots, p-1, \qquad \hat{\sigma}^2 = \frac{Y'\bigl(I_n - X(X'X)^{-1}X'\bigr)Y}{(n-p)}. \tag{10}$$

According to the preceding three assumptions on E i and  y i, we obtain

$$\begin{aligned}
E(\hat{\beta}) &= E\bigl[(X'X)^{-1}X'Y\bigr] = (X'X)^{-1}X'E(Y) = (X'X)^{-1}X'X\beta = \beta,\\
E(\hat{\sigma}^2) &= \frac{1}{n-p}\,E\bigl(Y'(I_n - X(X'X)^{-1}X')Y\bigr)\\
&= \frac{1}{n-p}\Bigl\{\operatorname{tr}\bigl[(I_n - X(X'X)^{-1}X')\sigma^2 I\bigr] + E(Y)'\bigl[I_n - X(X'X)^{-1}X'\bigr]E(Y)\Bigr\}\\
&= \frac{1}{n-p}\Bigl\{\sigma^2\operatorname{tr}\bigl[I_n - X(X'X)^{-1}X'\bigr] + \beta'X'\bigl[I_n - X(X'X)^{-1}X'\bigr]X\beta\Bigr\}\\
&= \frac{1}{n-p}\Bigl\{\sigma^2\bigl[n - \operatorname{tr}\bigl(X(X'X)^{-1}X'\bigr)\bigr] + \beta'X'X\beta - \beta'X'X(X'X)^{-1}X'X\beta\Bigr\}\\
&= \frac{1}{n-p}\Bigl\{\sigma^2\bigl[n - \operatorname{tr}\bigl(X(X'X)^{-1}X'\bigr)\bigr] + \beta'X'X\beta - \beta'X'X\beta\Bigr\}\\
&= \frac{1}{n-p}\,\sigma^2\operatorname{tr}(I_n - P) = \frac{1}{n-p}\,\sigma^2(n-p) = \sigma^2.
\end{aligned} \tag{11}$$

Therefore, $\hat{\beta}$ and $\hat{\sigma}^2$ are unbiased estimators of β and σ².
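
A minimal sketch of the estimators in (10); the simulated design and error distribution here are illustrative assumptions, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.uniform(-2, 2, size=(n, 2))])    # intercept + 2 predictors
b_scale = np.sqrt(6) / np.pi                                          # Gumbel scale giving variance 1
E = rng.gumbel(loc=-np.euler_gamma * b_scale, scale=b_scale, size=n)  # mean-zero extreme value errors
y = X @ np.array([0.0, 0.5, -0.3]) + E

# Equation (10): beta_hat = (X'X)^{-1} X'Y, sigma_hat^2 = Y'(I - X(X'X)^{-1}X')Y / (n - p)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat                     # equals (I - H) y, with H the hat matrix
sigma2_hat = resid @ resid / (n - p)
```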

We have assumed that the E_i follow an extreme value distribution, and we approximate the extreme value distribution of the errors E_i by the normal distribution. For normally distributed observations, $\hat{\beta}_j/(\hat{\sigma}\sqrt{\delta_j})$ follows a noncentral t distribution with n − p degrees of freedom and noncentrality parameter $\beta_j/(\sigma\sqrt{\delta_j})$, $-\infty < \beta_j/(\sigma\sqrt{\delta_j}) < \infty$:

$$1 - \alpha = P\left\{ t_{1-(\alpha/2)}\!\left[n-p,\ \frac{\beta_j}{\sigma\sqrt{\delta_j}}\right] < \frac{\hat{\beta}_j}{\hat{\sigma}\sqrt{\delta_j}} < t_{\alpha/2}\!\left[n-p,\ \frac{\beta_j}{\sigma\sqrt{\delta_j}}\right] \right\}, \tag{12}$$

where $t_{\alpha/2}[r, s]$ represents the 100(1 − (α/2)) percentile point of a noncentral t distribution with r degrees of freedom and noncentrality parameter s, $-\infty < s < \infty$, and $\delta_j$ is the (j + 1)st diagonal element of $(X'X)^{-1}$. We approximate the percentiles of the noncentral t distribution by standard normal percentiles [13]; then

$$\begin{aligned}
1 - \alpha &= P\left\{ \frac{\beta_j/(\sigma\sqrt{\delta_j}) - z_{\alpha/2}\bigl[1 + \bigl(\beta_j^2/(\sigma^2\delta_j) - z_{\alpha/2}^2\bigr)/2(n-p)\bigr]^{1/2}}{1 - \bigl(z_{\alpha/2}^2/2(n-p)\bigr)} < \frac{\hat{\beta}_j}{\hat{\sigma}\sqrt{\delta_j}} < \frac{\beta_j/(\sigma\sqrt{\delta_j}) + z_{\alpha/2}\bigl[1 + \bigl(\beta_j^2/(\sigma^2\delta_j) - z_{\alpha/2}^2\bigr)/2(n-p)\bigr]^{1/2}}{1 - \bigl(z_{\alpha/2}^2/2(n-p)\bigr)} \right\},\\[4pt]
\left(\frac{\beta_j}{\sigma}\right)_U &= \left\{ \frac{\hat{\beta}_j}{\hat{\sigma}}\left[1 - \frac{z_{\alpha/2}^2}{2(n-p)}\right] + z_{\alpha/2}\left[\delta_j\left(1 + \frac{\hat{\beta}_j^2/(\hat{\sigma}^2\delta_j) - z_{\alpha/2}^2}{2(n-p)}\right)\right]^{1/2} \right\},\\[4pt]
\left(\frac{\beta_j}{\sigma}\right)_L &= \left\{ \frac{\hat{\beta}_j}{\hat{\sigma}}\left[1 - \frac{z_{\alpha/2}^2}{2(n-p)}\right] - z_{\alpha/2}\left[\delta_j\left(1 + \frac{\hat{\beta}_j^2/(\hat{\sigma}^2\delta_j) - z_{\alpha/2}^2}{2(n-p)}\right)\right]^{1/2} \right\}.
\end{aligned} \tag{13}$$

Thus, we obtain an approximate 100(1 − α) percent confidence interval for ω_j:

$$\left\{ \exp\!\left[\frac{\pi}{\sqrt{6}}\left(\frac{\beta_j}{\sigma}\right)_L\right],\; \exp\!\left[\frac{\pi}{\sqrt{6}}\left(\frac{\beta_j}{\sigma}\right)_U\right] \right\}. \tag{14}$$
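
A sketch of the proposed interval (13)–(14). Here `delta_j` is the (j + 1)st diagonal element of $(X'X)^{-1}$, and the function signature is an assumption of this sketch rather than something defined in the paper:

```python
import numpy as np
from scipy.stats import norm

def proposed_ci_omega(beta_hat_j, sigma_hat, delta_j, n, p, alpha=0.05):
    """Approximate 100(1 - alpha)% confidence interval for omega_j, per (13)-(14)."""
    z = norm.ppf(1 - alpha / 2)
    shrink = 1 - z**2 / (2 * (n - p))
    half = z * np.sqrt(delta_j * (1 + (beta_hat_j**2 / (sigma_hat**2 * delta_j) - z**2)
                                      / (2 * (n - p))))
    lower = (beta_hat_j / sigma_hat) * shrink - half   # (beta_j / sigma)_L
    upper = (beta_hat_j / sigma_hat) * shrink + half   # (beta_j / sigma)_U
    c = np.pi / np.sqrt(6)
    return np.exp(c * lower), np.exp(c * upper)        # equation (14)
```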

3. Comparison of the Two Methods

Let Y_i be a continuous outcome variable. For a fixed value of C, we define Y_i* such that

$$Y_i^* = \begin{cases} 1 & \text{if } Y_i \ge C, \\ 0 & \text{if } Y_i < C. \end{cases} \tag{15}$$

Suppose that $Y_1^*, \ldots, Y_n^*$ form a random sample of observations, and we fit a complementary log-log model

$$\pi_{i1} = P(Y_i^* = 1 \mid x_i) = \exp\bigl(-\exp(x_i'\theta)\bigr), \qquad \pi_{i2} = P(Y_i^* = 1 \mid x_{(1,i)}) = \exp\bigl(-\exp(x_{(1,i)}'\theta)\bigr), \tag{16}$$

where $x_i = (1, x_{i1}, \ldots, x_{i,p-1})'$ is the p × 1 vector of covariates for the ith observation, and $\theta = (\theta_0, \ldots, \theta_{p-1})'$ is the p × 1 vector of unknown parameters. The dichotomized parameter ω_j* corresponding to the effect θ_j is

$$\omega_j^* = \frac{\ln(\pi_1^*)}{\ln(\pi_2^*)} = \frac{\ln\bigl(P(Y^* = 1 \mid x)\bigr)}{\ln\bigl(P(Y^* = 1 \mid x_{(1,j)})\bigr)} = \frac{-\exp(x'\theta)}{-\exp(x_{(1,j)}'\theta)} = \exp(\theta_j), \quad j = 0, \ldots, p-1. \tag{17}$$

In general, maximum likelihood estimation (MLE) can be used to estimate the parameter $\theta = (\theta_0, \ldots, \theta_{p-1})$. Let $\hat{\theta} = (\hat{\theta}_0, \ldots, \hat{\theta}_{p-1})$ be the p × 1 ML estimate of θ, and let $\operatorname{cov}(\hat{\theta})$ be the p × p covariance matrix of $\hat{\theta}$. Using $\operatorname{cov}(\hat{\theta})$ from (23), one can construct confidence intervals. This matrix has as its diagonal the estimated variances of each of the ML estimates; the (j + 1)th diagonal element is $\sigma^2_{\hat{\theta}_j}$. Therefore,

$$\hat{\omega}_j^* = \exp(\hat{\theta}_j), \tag{18}$$

and, for large samples, $(\hat{\theta}_{jL}, \hat{\theta}_{jU}) = (\hat{\theta}_j - z_{\alpha/2}\hat{\sigma}_{\hat{\theta}_j},\ \hat{\theta}_j + z_{\alpha/2}\hat{\sigma}_{\hat{\theta}_j})$ is a 100(1 − α) percent confidence interval for the true θ_j. Then $(\exp(\hat{\theta}_{jL}), \exp(\hat{\theta}_{jU}))$ is a 100(1 − α) percent confidence interval for the true ω_j*.
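
A sketch of the dichotomizing analysis (15)–(18). It fits the model $\pi = \exp(-\exp(x'\theta))$ of (16) by maximum likelihood with a generic optimizer and uses (23) for the large-sample covariance; the function and variable names are assumptions of this sketch:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def dichotomized_analysis(y, X, C, alpha=0.05):
    """Dichotomize y at C, fit P(Y* = 1 | x) = exp(-exp(x'theta)), return omega* estimates and CIs."""
    ystar = (y >= C).astype(float)                         # equation (15)

    def negloglik(theta):
        pi = np.clip(np.exp(-np.exp(X @ theta)), 1e-12, 1 - 1e-12)
        return -np.sum(ystar * np.log(pi) + (1 - ystar) * np.log(1 - pi))

    theta_hat = minimize(negloglik, np.zeros(X.shape[1]), method="BFGS").x
    pi = np.clip(np.exp(-np.exp(X @ theta_hat)), 1e-12, 1 - 1e-12)
    w = pi * np.log(pi) ** 2 / (1 - pi)                    # weights from (22)
    cov = np.linalg.inv(X.T @ (w[:, None] * X))            # inverse of X'WX, as in (23)
    se = np.sqrt(np.diag(cov))
    z = norm.ppf(1 - alpha / 2)
    omega_star = np.exp(theta_hat)                         # equation (18)
    ci = np.exp(np.column_stack([theta_hat - z * se, theta_hat + z * se]))
    return omega_star, ci
```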

We now compare the ω_j from (7) with the ω_j* from (17):

$$\left.\begin{aligned} \omega_j &= \frac{\ln(\pi_1)}{\ln(\pi_2)}\\ \omega_j^* &= \frac{\ln(\pi_1^*)}{\ln(\pi_2^*)}\\ \omega_j &= \omega_j^* \end{aligned}\right\} \;\Longrightarrow\; \exp\!\left(\frac{\pi}{\sqrt{6}} \cdot \frac{\beta_j}{\sigma}\right) = \exp(\theta_j) \;\Longrightarrow\; \frac{\pi}{\sqrt{6}} \cdot \frac{\beta_j}{\sigma} = \theta_j, \quad -\infty < \beta_j, \theta_j < \infty,\ \sigma > 0. \tag{19}$$

This shows that the coefficient of the complementary log-log model, θ_j, can be interpreted in terms of the regression coefficient β_j. Note that β is related to the responses through the general linear regression model

$$y_i = x_i'\beta + E_i, \quad i = 1, \ldots, n, \tag{20}$$

where the independent E_i follow an extreme value distribution with mean 0 and variance σ² > 0.

4. Covariance Matrix of Model Parameter Estimators

4.1. Derivation of $\operatorname{var}(\hat{\omega}_j^*)$ for Large n

The information matrix of a generalized linear model has the form $\mathcal{I} = X'WX$ [1], where W is the diagonal matrix with diagonal elements $w_i = (\partial\mu_i/\partial\eta_i)^2/\operatorname{var}(y_i)$, y is the response variable with independent observations $(y_1, \ldots, y_n)$, and $x_{ij}$ denotes the value of predictor j for observation i,

$$\mu_i = E(y_i), \qquad \eta_i = g(\mu_i) = \sum_j \theta_j x_{ij}, \quad j = 0, 1, \ldots, p-1. \tag{21}$$

The covariance matrix of $\hat{\theta}$ is estimated by $(X'\hat{W}X)^{-1}$.

Maximum likelihood estimation for the complementary log-log model is a special case of the generalized linear models. Let

$$\begin{aligned}
\mu_i = \pi_i &= \exp\Bigl(-\exp\Bigl(\sum_j \theta_j x_{ij}\Bigr)\Bigr) \;\Longrightarrow\; \pi_i = \exp\bigl(-\exp(\eta_i)\bigr),\\
\frac{\partial\mu_i}{\partial\eta_i} &= \bigl(-\exp(\eta_i)\bigr)\exp\bigl(-\exp(\eta_i)\bigr) = \pi_i \ln \pi_i,\\
w_i &= \frac{(\pi_i \ln \pi_i)^2}{\pi_i(1-\pi_i)} = \frac{\pi_i(\ln \pi_i)^2}{1 - \pi_i},
\end{aligned} \tag{22}$$

then

$$X'WX = \begin{bmatrix}
\sum_{i=1}^{n} \dfrac{\pi_i(\ln\pi_i)^2}{1-\pi_i} & \sum_{i=1}^{n} x_{i1}\dfrac{\pi_i(\ln\pi_i)^2}{1-\pi_i} & \cdots & \sum_{i=1}^{n} x_{i,p-1}\dfrac{\pi_i(\ln\pi_i)^2}{1-\pi_i} \\
\sum_{i=1}^{n} x_{i1}\dfrac{\pi_i(\ln\pi_i)^2}{1-\pi_i} & \sum_{i=1}^{n} x_{i1}^2\dfrac{\pi_i(\ln\pi_i)^2}{1-\pi_i} & \cdots & \sum_{i=1}^{n} x_{i1}x_{i,p-1}\dfrac{\pi_i(\ln\pi_i)^2}{1-\pi_i} \\
\vdots & \vdots & & \vdots \\
\sum_{i=1}^{n} x_{i,p-1}\dfrac{\pi_i(\ln\pi_i)^2}{1-\pi_i} & \sum_{i=1}^{n} x_{i1}x_{i,p-1}\dfrac{\pi_i(\ln\pi_i)^2}{1-\pi_i} & \cdots & \sum_{i=1}^{n} x_{i,p-1}^2\dfrac{\pi_i(\ln\pi_i)^2}{1-\pi_i}
\end{bmatrix}. \tag{23}$$

It is readily shown that the results hold true for the largest extreme value distribution (Appendix A).

In large samples, $\operatorname{var}(\hat{\theta}_j)$ approaches $\sigma^2_{\hat{\theta}_j}\big|_{\theta=\hat{\theta}}$ [14], which equals the (j + 1)th diagonal element of $(X'WX)^{-1}$.

By applying the delta method, let $f(\hat{\theta}_j) = \exp(\hat{\theta}_j)$; then

$$\operatorname{var}(\hat{\omega}_j^*) \approx \operatorname{var}\bigl(\exp(\hat{\theta}_j)\bigr) = \operatorname{var}\bigl(f(\hat{\theta}_j)\bigr) = \left(\frac{\partial f(\hat{\theta}_j)}{\partial \hat{\theta}_j}\bigg|_{\hat{\theta}_j=\theta_j}\right)^{2} \operatorname{var}(\hat{\theta}_j) = \bigl(\exp(\theta_j)\bigr)^2 \times \sigma^2_{\hat{\theta}_j}. \tag{24}$$

4.2. Derivation of $\operatorname{var}(\hat{\omega}_j)$ for Large n

In large samples, from (10), $\hat{\sigma}^2 \approx \sigma^2$ [15]. Therefore,

$$\operatorname{var}(\hat{\omega}_j) = \operatorname{var}\!\left(\exp\!\left(\frac{\pi\hat{\beta}_j}{\hat{\sigma}\sqrt{6}}\right)\right) \approx \operatorname{var}\!\left(\exp\!\left(\frac{\pi\hat{\beta}_j}{\sigma\sqrt{6}}\right)\right). \tag{25}$$

In addition, $\operatorname{var}(\hat{\beta}_j) = \sigma^2\delta_j$.

By applying the delta method, let $g(\hat{\beta}_j) = \exp(\pi\hat{\beta}_j/(\sigma\sqrt{6}))$; then

$$\operatorname{var}(\hat{\omega}_j) \approx \operatorname{var}\!\left(\exp\!\left(\frac{\pi\hat{\beta}_j}{\sigma\sqrt{6}}\right)\right) = \operatorname{var}\bigl(g(\hat{\beta}_j)\bigr) = \left(\frac{\partial g(\hat{\beta}_j)}{\partial \hat{\beta}_j}\bigg|_{\hat{\beta}_j=\beta_j}\right)^{2} \times \operatorname{var}(\hat{\beta}_j) = \left(\frac{\pi}{\sigma\sqrt{6}}\exp\!\left(\frac{\pi\beta_j}{\sigma\sqrt{6}}\right)\right)^{2}\sigma^2\delta_j = \frac{\pi^2}{6}\,\delta_j\left(\exp\frac{\pi\beta_j}{\sigma\sqrt{6}}\right)^{2}. \tag{26}$$
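
A small numerical check of the delta-method variances (24) and (26) under assumed parameter values (illustrative only):

```python
import numpy as np

sigma, delta_j = 1.0, 0.01          # assumed values for illustration
sigma2_theta_hat = 0.02             # assumed var(theta_hat_j) for the dichotomized fit
theta_j = np.log(0.75)              # so omega_j* = 0.75
beta_j = sigma * np.log(0.75) * np.sqrt(6) / np.pi   # the matching beta_j, per (19)

var_omega_star = np.exp(theta_j) ** 2 * sigma2_theta_hat                                     # (24)
var_omega = (np.pi ** 2 / 6) * delta_j * np.exp(np.pi * beta_j / (sigma * np.sqrt(6))) ** 2  # (26)
print(var_omega_star / var_omega)   # this ratio is examined as the relative efficiency in Section 6
```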

5. Sample Size Savings

5.1. The Power for the Dichotomized Method

In large samples, $\hat{\sigma}_{\hat{\theta}_j}$ converges to $\sigma_{\hat{\theta}_j}$ almost surely [14]. Therefore, for a given value of $\omega_j^* = \exp(\theta_j)$ (i.e., $\ln\omega_j^* = \theta_j$), the power is given by

$$\begin{aligned}
p(\omega_j^*) &= P\{\text{rejection of } \omega_j^* = 1 \mid \omega_j^* \ne 1\}\\
&= P\{\exp(\hat{\theta}_{jL}) > 1 \mid \theta_j\} + P\{\exp(\hat{\theta}_{jU}) < 1 \mid \theta_j\}\\
&= P\{\hat{\theta}_j > z_{\alpha/2}\sigma_{\hat{\theta}_j} \mid \theta_j\} + P\{\hat{\theta}_j < -z_{\alpha/2}\sigma_{\hat{\theta}_j} \mid \theta_j\}\\
&= P\left\{Z > \frac{z_{\alpha/2}\sigma_{\hat{\theta}_j} - \ln\omega_j^*}{\sigma_{\hat{\theta}_j}}\right\} + P\left\{Z < \frac{-z_{\alpha/2}\sigma_{\hat{\theta}_j} - \ln\omega_j^*}{\sigma_{\hat{\theta}_j}}\right\}\\
&= P\left\{Z > z_{\alpha/2} - \frac{\ln\omega_j^*}{\sigma_{\hat{\theta}_j}}\right\} + P\left\{Z < -z_{\alpha/2} - \frac{\ln\omega_j^*}{\sigma_{\hat{\theta}_j}}\right\}\\
&= P\{Z > z_1^*\} + P\{Z < z_2^*\},
\end{aligned} \tag{27}$$

where

$$z_1^* = z_{\alpha/2} - \frac{\ln\omega_j^*}{\sigma_{\hat{\theta}_j}}, \qquad z_2^* = -\left(z_{\alpha/2} + \frac{\ln\omega_j^*}{\sigma_{\hat{\theta}_j}}\right). \tag{28}$$

5.2. The Power for the Proposed Method

In large samples, $\hat{\sigma}$ converges to σ almost surely [15]. Therefore, for a given value of $\omega_j = \exp(\pi\beta_j/(\sigma\sqrt{6}))$ (i.e., $\beta_j = \sigma\ln\omega_j\sqrt{6}/\pi$), the power is given by

$$\begin{aligned}
p(\omega_j) &= P\left\{\exp\!\left(\frac{\pi}{\sqrt{6}}\Bigl(\frac{\beta_j}{\sigma}\Bigr)_L\right) > 1 \;\Big|\; \omega_j\right\} + P\left\{\exp\!\left(\frac{\pi}{\sqrt{6}}\Bigl(\frac{\beta_j}{\sigma}\Bigr)_U\right) < 1 \;\Big|\; \omega_j\right\}\\
&= P\left\{\hat{\beta}_j > z_{\alpha/2}\sigma\sqrt{\delta_j} \;\Big|\; \beta_j = \frac{\sigma\ln\omega_j\sqrt{6}}{\pi}\right\} + P\left\{\hat{\beta}_j < -z_{\alpha/2}\sigma\sqrt{\delta_j} \;\Big|\; \beta_j = \frac{\sigma\ln\omega_j\sqrt{6}}{\pi}\right\}\\
&= P\left\{Z > \frac{z_{\alpha/2}\sigma\sqrt{\delta_j} - \sigma\ln\omega_j\sqrt{6}/\pi}{\sigma\sqrt{\delta_j}}\right\} + P\left\{Z < \frac{-z_{\alpha/2}\sigma\sqrt{\delta_j} - \sigma\ln\omega_j\sqrt{6}/\pi}{\sigma\sqrt{\delta_j}}\right\}\\
&= P\left\{Z > z_{\alpha/2} - \frac{\ln\omega_j\sqrt{6}}{\pi\sqrt{\delta_j}}\right\} + P\left\{Z < -z_{\alpha/2} - \frac{\ln\omega_j\sqrt{6}}{\pi\sqrt{\delta_j}}\right\}\\
&= P\{Z > z_1\} + P\{Z < z_2\},
\end{aligned} \tag{29}$$

where

$$z_1 = z_{\alpha/2} - \frac{\ln\omega_j\sqrt{6}}{\pi\sqrt{\delta_j}}, \qquad z_2 = -\left(z_{\alpha/2} + \frac{\ln\omega_j\sqrt{6}}{\pi\sqrt{\delta_j}}\right). \tag{30}$$
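
A sketch of the two power functions, (27)–(28) and (29)–(30); $\sigma_{\hat{\theta}_j}$ and $\delta_j$ are treated as known inputs here:

```python
import numpy as np
from scipy.stats import norm

def power_dichotomized(omega_star, sigma_theta_hat, alpha=0.05):
    """Power of the dichotomizing method, per (27)-(28)."""
    z = norm.ppf(1 - alpha / 2)
    shift = np.log(omega_star) / sigma_theta_hat
    return norm.sf(z - shift) + norm.cdf(-(z + shift))     # P(Z > z1*) + P(Z < z2*)

def power_proposed(omega, delta_j, alpha=0.05):
    """Power of the proposed method, per (29)-(30)."""
    z = norm.ppf(1 - alpha / 2)
    shift = np.log(omega) * np.sqrt(6) / (np.pi * np.sqrt(delta_j))
    return norm.sf(z - shift) + norm.cdf(-(z + shift))     # P(Z > z1) + P(Z < z2)
```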

Since it is based on continuous rather than dichotomized data, our proposed method is likely to be more powerful. We show that the proposed method can produce substantial sample size savings for a given power. Let

  1. the number of parameters be p = 2 (i.e., θ = (θ_0, θ_1)),

  2. $x_i = (1, x_{i1})'$, with $x_{i1} \in \{-a + 2an/(g-1) \mid n = 0, \ldots, g-1\}$; that is, $x_{i1}$ follows a discrete uniform distribution on the range (−a, a). For simplicity, a = 2.

  3. the total sample sizes be n and n* for the proposed and dichotomized methods, respectively. These samples consist of k and k* sets of the g uniformly distributed points for the proposed and dichotomized methods, respectively; that is, n = gk and n* = gk*. Then

$$\delta_j = \left[k\sum_{i=1}^{g}(x_{i1} - \bar{x}_1)^2\right]^{-1}, \quad j = 1, \tag{31}$$

and from (23),

$$\sigma^2_{\hat{\theta}_j} = \frac{\sum_{i=1}^{g}\bigl(\pi_i(\ln\pi_i)^2/(1-\pi_i)\bigr)}{k^*\left\{\sum_{i=1}^{g}x_{i1}^2\bigl(\pi_i(\ln\pi_i)^2/(1-\pi_i)\bigr)\sum_{i=1}^{g}\bigl(\pi_i(\ln\pi_i)^2/(1-\pi_i)\bigr) - \left[\sum_{i=1}^{g}x_{i1}\bigl(\pi_i(\ln\pi_i)^2/(1-\pi_i)\bigr)\right]^2\right\}}. \tag{32}$$

We require the same power for the two methods:

$$z_1^* = z_1,\ z_2^* = z_2 \;\Longrightarrow\;
\begin{cases}
z_{\alpha/2} - \dfrac{\ln\omega_j^*}{\sigma_{\hat{\theta}_j}} = z_{\alpha/2} - \dfrac{\ln\omega_j\sqrt{6}}{\pi\sqrt{\delta_j}}\\[8pt]
-\left(z_{\alpha/2} + \dfrac{\ln\omega_j^*}{\sigma_{\hat{\theta}_j}}\right) = -\left(z_{\alpha/2} + \dfrac{\ln\omega_j\sqrt{6}}{\pi\sqrt{\delta_j}}\right)
\end{cases}
\;\Longrightarrow\; \frac{\pi}{\sqrt{6}}\sqrt{\delta_j} = \sigma_{\hat{\theta}_j}, \quad j = 1,$$
$$\frac{\pi^2}{6}\left[k\sum_{i=1}^{g}(x_{i1}-\bar{x}_1)^2\right]^{-1} = \frac{\sum_{i=1}^{g}\bigl(\pi_i(\ln\pi_i)^2/(1-\pi_i)\bigr)}{k^*\left\{\sum_{i=1}^{g}x_{i1}^2\bigl(\pi_i(\ln\pi_i)^2/(1-\pi_i)\bigr)\sum_{i=1}^{g}\bigl(\pi_i(\ln\pi_i)^2/(1-\pi_i)\bigr) - \left[\sum_{i=1}^{g}x_{i1}\bigl(\pi_i(\ln\pi_i)^2/(1-\pi_i)\bigr)\right]^2\right\}}, \tag{33}$$

and the relative sample size is

$$\frac{n^*}{n} = \frac{k^*}{k} = \frac{6\sigma^2_{\hat{\theta}_j}}{\pi^2\delta_j} = \frac{\sum_{i=1}^{g}(x_{i1}-\bar{x}_1)^2 \times \sum_{i=1}^{g}\bigl(\pi_i(\ln\pi_i)^2/(1-\pi_i)\bigr)}{(\pi^2/6)\left\{\sum_{i=1}^{g}x_{i1}^2\bigl(\pi_i(\ln\pi_i)^2/(1-\pi_i)\bigr)\sum_{i=1}^{g}\bigl(\pi_i(\ln\pi_i)^2/(1-\pi_i)\bigr) - \left[\sum_{i=1}^{g}x_{i1}\bigl(\pi_i(\ln\pi_i)^2/(1-\pi_i)\bigr)\right]^2\right\}}. \tag{34}$$

Note that (34) is independent of σ² and applies for any power and any test size α.

Table 1 presents relative sample sizes n*/n for a given fixed parameter ω_j* and average proportion of successes π̄. We consider the situations in which $\bar{\pi} = \sum_{i=1}^{g}\pi_i/g = 0.1, 0.2, 0.3, 0.4, 0.5$, g = 9, and ω_j* = 0.25, 0.50, 0.75.

Table 1.

Relative sample sizes required to attain any power for the dichotomizing method versus the proposed method.

ω* = exp(θ)    Average proportion of successes (π̄)
               0.1        0.2       0.3       0.4       0.5
0.25           23.7166    9.5092    7.4954    7.1996    6.8575
0.50           10.6719    5.4176    3.4215    2.5209    2.1784
0.75            7.7088    3.8713    2.5171    1.9380    1.5841

For given fixed ω_j* and π̄, the relative sample sizes in Table 1 can be computed by the following steps (a sketch of this computation follows the list):

  1. compute the value θ_j via the equation θ_j = ln(ω_j*),

  2. calculate the cutoff point C iteratively so that π̄ attains the specified value for the design values $x_{i1}$, using the value of θ_j from step 1.
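
A sketch of this computation. It assumes that the success probabilities of the dichotomized observations take the form $\pi_i = \exp(-\exp(\theta_0 + \theta_j x_{i1}))$ as in (16), with the intercept $\theta_0$ standing in for the cutoff C and found iteratively so that the average of the $\pi_i$ equals $\bar{\pi}$; because the exact probability model behind Table 1 is not fully spelled out in the text, the numbers this sketch produces need not reproduce the table entries exactly.

```python
import numpy as np
from scipy.optimize import brentq

def relative_sample_size(omega_star, pi_bar, g=9, a=2.0):
    """n*/n per (34) for the discrete uniform design with g points on (-a, a)."""
    x = -a + 2 * a * np.arange(g) / (g - 1)
    theta_j = np.log(omega_star)                                   # step 1

    def mean_success(theta0):
        return np.mean(np.exp(-np.exp(theta0 + theta_j * x))) - pi_bar

    theta0 = brentq(mean_success, -20.0, 20.0)                     # step 2: iterate on the cutoff
    pi = np.exp(-np.exp(theta0 + theta_j * x))
    w = pi * np.log(pi) ** 2 / (1 - pi)                            # weights as in (22)-(23)
    num = np.sum((x - x.mean()) ** 2) * np.sum(w)
    den = (np.pi ** 2 / 6) * (np.sum(x ** 2 * w) * np.sum(w) - np.sum(x * w) ** 2)
    return num / den                                               # equation (34)
```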

As can be seen from Table 1, all values are greater than 1, and the values of n*/n increase as ω_j* moves farther away from 1. The values in Table 1 immediately highlight the improvement achieved by the proposed method.

6. Relative Efficiency of $\hat{\omega}_j^*$ with Respect to $\hat{\omega}_j$

Here, we examine the relative efficiency of the estimate $\hat{\omega}_j^*$ with respect to the estimate $\hat{\omega}_j$.

Using (24) and (26), the relative efficiency is given by

$$\mathrm{r.e.}(\hat{\omega}_j^*, \hat{\omega}_j) = \frac{\operatorname{var}(\hat{\omega}_j^*)}{\operatorname{var}(\hat{\omega}_j)} = \frac{6\bigl(\exp(\theta_j)\bigr)^2 \times \sigma^2_{\hat{\theta}_j}}{\pi^2\delta_j\bigl(\exp(\pi\beta_j/(\sigma\sqrt{6}))\bigr)^2} = \frac{6\sigma^2_{\hat{\theta}_j}}{\pi^2\delta_j}. \tag{35}$$

Note that the relative efficiency is independent of n and σ² and converges to a constant. Comparing (34) and (35), the relative efficiency equals the relative sample size. Therefore, as in Table 1, the proposed method is a consistent improvement over the dichotomizing method with respect to relative efficiency.

It should be noted that these results hold true under the following assumptions:

  1. the responses y_i and β are related through the equation $y_i = x_i'\beta + E_i$, where the independent E_i follow an extreme value distribution with mean 0 and variance σ² > 0,

  2. the independent variables x i follow a discrete uniform distribution.

7. Odds Ratio

For values of π larger than 0.90, −ln(π) and (1 − π)/π are very close. Hence, for large values of π,

$$\frac{\ln(\pi_1)}{\ln(\pi_2)} \approx \frac{(1-\pi_1)/\pi_1}{(1-\pi_2)/\pi_2} = \mathrm{OR}, \tag{36}$$

and, from (7), the odds ratio is given by

$$\mathrm{OR} = \exp\!\left(\frac{\pi}{\sqrt{6}} \cdot \frac{\beta_j}{\sigma}\right). \tag{37}$$

Thus the parameters estimated from the linear regression can be interpreted as odds ratios.
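
A brief numerical check of the approximation in (36), with hypothetical probabilities:

```python
import math

pi1, pi2 = 0.95, 0.98                                     # hypothetical large success probabilities
ratio_of_logs = math.log(pi1) / math.log(pi2)             # left side of (36), about 2.539
odds_ratio = ((1 - pi1) / pi1) / ((1 - pi2) / pi2)        # right side of (36), about 2.579
print(ratio_of_logs, odds_ratio)
```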

8. Simulation Study

It should be noted that, as in Table 1, the proposed method is a consistent improvement over the dichotomizing method with respect to relative efficiencies. These results hold true under the assumptions that the predictor variable has a discrete uniform distribution and that the random variables E_i follow an extreme value distribution. To demonstrate the robustness of this conclusion to changes in the distribution of the predictor variable, simulations were run under different distributional conditions. The data were sampled 10000 times for three sample sizes (n = 250, 500, 1000), three average proportions of successes (π̄ = 0.10, 0.50, 0.95), and seven values of ω_j (0.75, 0.90, 1.1, 1.2, 1.3, 1.4, 1.5). The simulated data were generated using the following algorithm:

  1. Generate y_i, where $y_i = \beta_0 + \beta_1 x_i + E_i$, with $\beta_1 = \sqrt{6}\ln\omega_j/\pi$ from (7) to produce the correct ω_j, and for simplicity β_0 = 0, σ² = 1.

  2. For fixed π̄, determine the cutoff point C and dichotomize using (15).

We simulated the data under two scenarios based on the distribution of the explanatory variable. In the first scenario, the independent variable follows a continuous uniform distribution on the range (−2, 2); in the second, it follows a truncated normal distribution with mean 0 and range (−2, 2). The relative mean square errors, relative interval lengths, absolute biases, and coverage probabilities were calculated.
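
A condensed sketch of one replicate of the first simulation scenario; the Gumbel parameterization used to give the errors mean 0 and variance 1, and the quantile-based choice of the cutoff, are assumptions spelled out in the comments rather than details quoted from the paper:

```python
import numpy as np

def one_replicate(n, omega_j, pi_bar, rng):
    """Simulate one data set: continuous y from the linear model plus its dichotomized version."""
    x = rng.uniform(-2, 2, size=n)                             # scenario 1: continuous uniform covariate
    beta1 = np.sqrt(6) * np.log(omega_j) / np.pi               # step 1: beta_1 from (7), beta_0 = 0, sigma = 1
    b = np.sqrt(6) / np.pi                                     # extreme value scale giving variance 1
    e = rng.gumbel(loc=-np.euler_gamma * b, scale=b, size=n)   # mean-zero extreme value errors
    y = beta1 * x + e
    C = np.quantile(y, 1 - pi_bar)                             # step 2: cutoff so about pi_bar of the y exceed C
    return x, y, (y >= C).astype(int)                          # last element is Y* from (15)

rng = np.random.default_rng(1)
x, y, y_star = one_replicate(n=500, omega_j=1.3, pi_bar=0.5, rng=rng)
# The proposed analysis regresses y on x and applies (9) and (13)-(14); the dichotomizing analysis
# fits y_star as in Section 3; repeating this 10000 times gives the quantities in Tables 2 and 3.
```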

Results of the simulations addressing the validity of the proposed method are displayed in Tables 2 and 3.

Table 2.

Simulated relative mean square errors, relative intervals lengths, coverage probabilities, and absolute biases for the proposed and dichotomizing methods (using a continuous uniform distribution for the explanatory variable and an extreme value distribution for the errors).

Sample size   Cutoff (π̄)   Columns: ω = 0.75, 0.9, 1.1, 1.2, 1.3, 1.4, 1.5
(Within each sample size × cutoff block, the six rows are, in order, (a)–(f) as defined in the table footnote.)
1.15a 1.07 1.09 1.14 1.24 1.47 1.71
1.10b 1.03 1.03 1.07 1.14 1.23 1.35
0.10 0.943c 0.948 0.949 0.949 0.945 0.938 0.933
0.948d 0.947 0.949 0.947 0.951 0.947 0.953
0.05e 0.04 0.12 0.14 0.10 0.15 0.11
0.07f 0.01 0.17 0.13 0.24 0.34 0.58
1.23 1.26 1.27 1.28 1.27 1.24 1.26
2.16 1.13 1.23 1.14 1.15 1.17 1.19
1000 0.50 0.940 0.951 0.951 0.945 0.942 0.937 0.934
0.951 0.949 0.951 0.950 0.948 0.947 0.948
0.04 0.01 0.08 0.10 0.05 0.09 0.04
0.05 0.04 0.15 0.12 0.09 0.12 0.13
12.75 12.44 13.22 12.68 13.14 12.91 12.79
3.67 3.57 3.58 3.63 3.69 3.76 3.84
0.95 0.943 0.951 0.952 0.944 0.944 0.938 0.929
0.952 0.954 0.952 0.952 0.951 0.951 0.951
0.04 0.07 0.11 0.10 0.10 0.17 0.10
0.75 0.68 0.86 1.01 1.21 1.45 1.24

1.30 1.08 1.07 1.17 1.24 1.54 1.95
1.16 1.03 1.04 1.08 1.15 1.25 1.39
0.10 0.942 0.950 0.951 0.95 0.944 0.941 0.936
0.951 0.950 0.949 0.951 0.954 0.954 0.953
0.12 0.07 0.24 0.25 0.21 0.18 0.29
0.23 0.08 0.33 0.39 0.41 0.73 1.21
1.35 1.10 1.27 1.26 1.26 1.25 1.26
1.26 1.03 1.13 1.14 1.16 1.17 1.20
500 0.50 0.940 0.949 0.947 0.948 0.943 0.940 0.933
0.952 0.951 0.949 0.949 0.954 0.950 0.951
0.23 0.34 0.27 0.23 0.26 0.25 0.38
0.48 0.11 0.17 0.18 0.31 0.26 0.42
13.04 13.17 13.8 13.90 14.45 14.48 14.47
3.72 3.65 3.68 3.73 3.82 3.91 3.99
0.95 0.942 0.947 0.951 0.949 0.947 0.938 0.935
0.953 0.952 0.954 0.955 0.955 0.953 0.954
0.05 0.11 0.08 0.08 0.24 0.32 0.27
0.94 1.38 1.78 1.92 2.52 3.00 2.90

13.41 14.46 1.12 1.28 1.52 1.96 2.33
3.78 3.73 1.04 1.09 1.18 1.30 1.45
0.10 0.942 0.949 0.949 0.945 0.942 0.942 0.933
0.957 0.954 0.948 0.949 0.952 0.957 0.953
0.02 0.20 0.38 0.33 0.42 0.41 0.66
2.11 2.74 0.42 0.84 1.18 1.78 2.24
1.27 1.25 1.32 1.28 1.30 1.30 1.29
1.16 1.13 1.13 1.14 1.16 1.18 1.20
250 0.50 0.941 0.948 0.952 0.947 0.945 0.943 0.933
0.951 0.951 0.951 0.950 0.951 0.951 0.951
0.12 0.13 0.35 0.44 0.41 0.53 0.55
0.11 0.22 0.39 0.47 0.51 0.74 0.59
12.98 14.6 15.64 15.46 17.05 16.89 18.33
3.75 3.72 3.82 3.88 4.01 4.12 4.29
0.95 0.945 0.955 0.946 0.948 0.940 0.937 0.932
0.959 0.955 0.955 0.959 0.958 0.957 0.952
0.02 0.16 0.39 0.22 0.46 0.47 0.51
1.22 2.75 3.97 3.98 4.99 5.19 6.19

a: Relative mean square errors, b: Relative intervals lengths, c: Coverage probability (proposed), d: Coverage probability (dichotomized), e: % bias (proposed), f: % bias (dichotomized).

Table 3.

Simulated relative mean square errors, relative intervals lengths, coverage probabilities, and absolute biases for the proposed and dichotomizing methods (using a truncated normal distribution for the explanatory variable and an extreme value distribution for the errors).

Sample size   Cutoff (π̄)   Columns: ω = 0.75, 0.9, 1.1, 1.2, 1.3, 1.4, 1.5
(Within each sample size × cutoff block, the six rows are, in order, (a)–(f) as defined in the table footnote.)
1.17a 1.02 1.08 1.13 1.19 1.28 1.36
1.11b 1.03 1.03 1.06 1.10 1.25 1.22
0.10 0.942c 0.948 0.948 0.952 0.944 0.942 0.940
0.951d 0.951 0.950 0.952 0.949 0.951 0.951
0.08e 0.06 0.03 0.14 0.13 0.14 0.16
0.10f 0.11 0.15 0.23 0.30 0.39 0.39
1.26 1.24 1.26 1.28 1.28 1.25 1.28
1.24 1.13 1.13 1.14 1.14 1.15 1.17
1000 0.50 0.944 0.948 0.952 0.947 0.947 0.944 0.941
0.948 0.951 0.949 0.949 0.947 0.950 0.949
0.02 0.09 0.08 0.07 0.18 0.16 0.13
0.03 0.06 0.12 0.16 0.20 0.16 0.14
12.33 13.12 13.03 12.71 12.86 12.55 12.88
3.62 3.59 3.61 3.62 3.64 3.68 3.71
0.95 0.944 0.951 0.948 0.948 0.945 0.945 0.946
0.952 0.948 0.95 0.949 0.949 0.951 0.952
0.10 0.04 0.11 0.04 0.16 0.16 0.20
1.26 1.05 1.56 1.36 1.43 1.80 1.94

1.18 1.09 1.06 1.75 1.23 1.32 1.58
1.11 1.03 1.03 1.06 1.11 1.16 1.23
0.10 0.945 0.95 0.951 0.951 0.949 0.943 0.944
0.953 0.953 0.953 0.950 0.949 0.951 0.950
0.04 0.13 0.31 0.18 0.33 0.36 0.37
0.21 0.08 0.37 0.50 0.62 0.69 0.96
1.25 1.27 1.27 1.29 1.27 1.29 1.25
1.14 1.13 1.13 1.14 1.15 1.16 1.17
500 0.50 0.944 0.948 0.949 0.947 0.948 0.944 0.935
0.951 0.951 0.951 0.948 0.951 0.948 0.949
0.13 0.22 0.35 0.37 0.35 0.30 0.44
0.16 0.19 0.39 0.48 0.44 0.41 0.54
13.11 14.02 14.02 13.5 13.54 13.80 14.32
3.73 3.71 3.73 3.75 3.77 3.81 3.86
0.95 0.944 0.95 0.951 0.950 0.947 0.944 0.944
0.954 0.95 0.951 0.953 0.948 0.956 0.953
0.15 0.10 0.24 0.38 0.32 0.33 0.43
2.50 2.70 2.92 3.10 2.92 3.36 3.89

1.28 1.11 1.12 1.19 1.33 1.54 1.76
1.11 1.03 1.04 1.08 1.13 1.19 1.28
0.10 0.947 0.951 0.950 0.947 0.950 0.950 0.942
0.951 0.950 0.950 0.952 0.954 0.952 0.951
0.40 0.34 0.37 0.64 0.69 0.58 0.81
0.26 0.06 0.69 1.08 1.30 1.55 2.22
1.32 1.30 1.27 1.33 1.31 1.33 1.31
1.15 1.13 1.13 1.14 1.18 1.17 1.18
250 0.50 0.951 0.95 0.953 0.951 0.940 0.945 0.940
0.949 0.951 0.952 0.948 0.948 0.950 0.948
0.22 0.43 0.57 0.69 0.66 0.58 0.66
0.38 0.53 0.64 0.89 0.91 0.82 0.91
14.09 14.51 16.27 15.91 15.89 15.73 15.60
3.86 3.87 3.93 3.92 3.98 4.04 4.11
0.95 0.943 0.95 0.951 0.951 0.947 0.944 0.937
0.953 0.95 0.953 0.956 0.953 0.956 0.952
0.30 0.37 0.57 0.68 0.42 0.62 0.75
4.98 5.52 6.547 5.91 6.17 6.88 7.72

a: Relative mean square errors, b: Relative intervals lengths, c: Coverage probability (proposed), d: Coverage probability (dichotomized), e: % bias (proposed), f: % bias (dichotomized).

The simulations show that the relative mean square errors are all greater than 1, increasing with the average proportion of successes and as ω_j moves farther away from 1. The results in Tables 2 and 3 demonstrate that the proposed method provides confidence intervals which successfully maintain their nominal 95 percent coverage. For the proposed method in the first scenario, 51 out of 63 coverage probabilities fell within (0.94, 0.96), and all 63 coverage probabilities are greater than 0.93; in the second scenario, almost all coverage probabilities fell within (0.94, 0.96). The absolute biases for the proposed method are never greater than a few percent. The dichotomizing method is less biased than the proposed method in only 6 of the 63 simulations in each of the two scenarios.

9. An Example

To illustrate the application of the method presented in the previous sections, we use data arising from the National Health Survey in Iran. Other analyses using these data appear elsewhere [16].

In this study, 14176 women aged 20–69 years were investigated. BMI (body mass index), our dependent variable, was calculated as weight in kilograms divided by height in meters squared (kg/m²). Independent variables included place of residence, age, smoking, economic index, marital status, and education level; they were both categorical and continuous. First, BMI was treated as a continuous variable, and $\hat{\omega}_j$ and 95 percent confidence intervals were calculated using the proposed linear regression method. Then subjects were classified into obese (BMI ≥ 30 kg/m²) and nonobese (BMI < 30 kg/m²). A complementary log-log model was used for the binary analysis, with obese or nonobese as the outcome measure, and $\hat{\omega}_j^*$ and 95 percent confidence intervals were calculated using the dichotomized method. Table 4 presents the coefficient estimates, estimated confidence intervals, and relative confidence interval lengths. The proposed and dichotomizing methods produced different confidence intervals, although $\hat{\omega}_j$ and $\hat{\omega}_j^*$ were similar, varying only slightly. The $\hat{\omega}_j$ estimates from the proposed method had smaller variances and shorter confidence intervals than those from the dichotomizing method. All relative confidence interval lengths were greater than 1, ranging from 1.47 to 2.58.
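
A sketch of how the two analyses of this example could be organized in code. The data frame and column names (`bmi` and the covariate columns) are hypothetical stand-ins, since the NHSI data are not distributed with the paper:

```python
import numpy as np
import pandas as pd

def both_analyses(df: pd.DataFrame, covariate_cols, cutoff=30.0):
    """Proposed (continuous BMI) and dichotomizing (obese vs nonobese) analyses, side by side."""
    X = np.column_stack([np.ones(len(df)), df[covariate_cols].to_numpy(float)])
    y = df["bmi"].to_numpy(float)

    # Proposed method: keep BMI continuous and apply (9)-(10).
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    sigma_hat = np.sqrt((y - X @ beta_hat) @ (y - X @ beta_hat) / (n - p))
    omega_hat = np.exp(np.pi * beta_hat / (sigma_hat * np.sqrt(6)))

    # Dichotomizing method: obese if BMI >= cutoff, analyzed as in Section 3
    # (e.g., with the maximum likelihood sketch given after equation (18)).
    y_star = (y >= cutoff).astype(int)
    return omega_hat, y_star
```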

Table 4.

Adjusted ω^j, ω^j  for obesity and confidence intervals using two methods for the National Health Survey.

Covariates   $\hat{\omega}_j$ ($\hat{\omega}_j^*$)   95% CIa (proposed)   95% CI (dichotomized)   Relativeb length of CI
Place of residence 1.65 (1.97)c 1.58–1.74 1.79–2.18 2.43
Age 1.021 (1.019) 1.018–1.022 1.015–1.022 1.75
Years of education 0.99 (0.98) 0.985–0.997 0.971–0.994 1.92
Smoking 0.76 (0.68) 0.66–0.90 0.51–0.92 1.71
Marital status 1.16 (1.42) 1.10–1.22 1.27–1.58 2.58
Lower-middle economy index 1.24 (1.32) 1.14–1.32 1.18–1.48 1.67
Upper-middle economy index 1.21 (1.26) 1.14–1.29 1.12–1.42 2.0
High economy index 1.20 (1.21) 1.11–1.30 1.08–1.36 1.47

aconfidence interval, bdichotomized/proposed, cproposed (dichotomized).

10. Discussion

When the errors E_i are assumed to follow an extreme value distribution, the method has several advantages, as noted before. First, it allows the researcher to apply the complementary log-log model without dichotomizing and without loss of information. Second, the $\hat{\omega}_j^*$ from the dichotomizing method depends on the chosen cutoff point C and will vary with C, whereas the proposed $\hat{\omega}_j$ is independent of C, since it is a function of the continuous Y_i and not of the dichotomized Y_i* defined through C. Third, the coefficient of the complementary log-log model, θ_j, can be interpreted in terms of the regression coefficients β_j. Fourth, when the independent variables x_i follow a discrete uniform distribution, the proposed method is a consistent improvement over the dichotomizing method with respect to relative efficiency; it can provide sample size savings, smaller variances, and shorter confidence intervals than the dichotomized method. Fifth, when π is large, the parameters estimated from the linear regression can be interpreted as odds ratios.

Our results were consistent with the findings by Moser and Coombs [12] and Bakhshi et al. [16] showing the greater efficiency of parameter estimates from the regression method that avoids dichotomizing in comparison with a more traditional dichotomizing method using the logistic regression.

Our main recommendation is to let continuous responses remain continuous: do not throw away information by transforming the data to binary. If the objective is to estimate and/or test coefficients when responses are continuous, resist dichotomizing the response variable.

Conflict of Interests

The authors have declared no conflict of interests.

Appendix

A. Largest Extreme Value Distribution

(a) The PDF and CDF are given by

$$f(y \mid x'\beta, \sigma) = \frac{\pi}{\sigma\sqrt{6}} \times \exp\!\left(\frac{y - x'\beta - k\sigma}{\sigma} \times \frac{\pi}{\sqrt{6}} - \exp\!\left(\frac{y - x'\beta - k\sigma}{\sigma} \times \frac{\pi}{\sqrt{6}}\right)\right), \quad -\infty < x'\beta < \infty,\ \sigma > 0,$$
$$P(y \ge c) = 1 - \exp\!\left(-\exp\!\left(\frac{c - x'\beta - k\sigma}{\sigma} \times \frac{\pi}{\sqrt{6}}\right)\right), \quad -\infty < x'\beta < \infty,\ \sigma > 0, \tag{A.1}$$

where Y is a continuous outcome variable, $x' = (1, x_1, \ldots, x_{p-1})$ is the p × 1 vector of known independent variables, $\beta = (\beta_0, \beta_1, \ldots, \beta_{p-1})'$ is the p × 1 vector of unknown parameters, and k ≈ 0.45.

It is easy to check that

$$\omega_j = \frac{\ln(1-\pi_1)}{\ln(1-\pi_2)} = \frac{\ln\bigl(1 - P(y \ge c \mid x)\bigr)}{\ln\bigl(1 - P(y \ge c \mid x_{(1,j)})\bigr)} = \frac{\exp\bigl(((c - x'\beta - k\sigma)/\sigma) \times (\pi/\sqrt{6})\bigr)}{\exp\bigl(((c - x'_{(1,j)}\beta - k\sigma)/\sigma) \times (\pi/\sqrt{6})\bigr)} = \exp\!\left(\frac{\pi}{\sqrt{6}} \cdot \frac{\beta_j}{\sigma}\right) \;\Longrightarrow\; 1-\pi_1 = (1-\pi_2)^{\exp((\pi/\sqrt{6})\cdot(\beta_j/\sigma))}, \tag{A.2}$$

where

$$x' = (1, x_1, \ldots, x_j, \ldots, x_{p-1}), \qquad x'_{(1,j)} = (1, x_1, \ldots, x_j - 1, \ldots, x_{p-1}), \qquad \beta = (\beta_0, \beta_1, \ldots, \beta_j, \ldots, \beta_{p-1})'. \tag{A.3}$$

(b) Suppose that E_i is distributed as a largest extreme value with mean 0 and variance σ² > 0. We conclude that the PDF and CDF of each independent Y_i are given by (A.1), and the corresponding version of equality (A.2) is given by

$$\hat{\omega}_j = \frac{\ln(1-\hat{\pi}_1)}{\ln(1-\hat{\pi}_2)} = \exp\!\left(\frac{\pi}{\sqrt{6}} \cdot \frac{\hat{\beta}_j}{\hat{\sigma}}\right). \tag{A.4}$$

(c) Analogously to (22), for the largest extreme value distribution,

$$\begin{aligned}
\mu_i = \pi_i &= 1 - \exp\Bigl(-\exp\Bigl(\sum_j \theta_j x_{ij}\Bigr)\Bigr) \;\Longrightarrow\; \pi_i = 1 - \exp\bigl(-\exp(\eta_i)\bigr),\\
\frac{\partial\mu_i}{\partial\eta_i} &= \bigl(\exp(\eta_i)\bigr)\exp\bigl(-\exp(\eta_i)\bigr) = -(1-\pi_i)\ln(1-\pi_i),\\
w_i &= \frac{\bigl((1-\pi_i)\ln(1-\pi_i)\bigr)^2}{\pi_i(1-\pi_i)} = \frac{(1-\pi_i)\bigl(\ln(1-\pi_i)\bigr)^2}{\pi_i},
\end{aligned} \tag{A.5}$$

then

$$X'WX = \begin{bmatrix}
\sum_{i=1}^{n} \dfrac{(1-\pi_i)(\ln(1-\pi_i))^2}{\pi_i} & \sum_{i=1}^{n} x_{i1}\dfrac{(1-\pi_i)(\ln(1-\pi_i))^2}{\pi_i} & \cdots & \sum_{i=1}^{n} x_{i,p-1}\dfrac{(1-\pi_i)(\ln(1-\pi_i))^2}{\pi_i} \\
\sum_{i=1}^{n} x_{i1}\dfrac{(1-\pi_i)(\ln(1-\pi_i))^2}{\pi_i} & \sum_{i=1}^{n} x_{i1}^2\dfrac{(1-\pi_i)(\ln(1-\pi_i))^2}{\pi_i} & \cdots & \sum_{i=1}^{n} x_{i1}x_{i,p-1}\dfrac{(1-\pi_i)(\ln(1-\pi_i))^2}{\pi_i} \\
\vdots & \vdots & & \vdots \\
\sum_{i=1}^{n} x_{i,p-1}\dfrac{(1-\pi_i)(\ln(1-\pi_i))^2}{\pi_i} & \sum_{i=1}^{n} x_{i1}x_{i,p-1}\dfrac{(1-\pi_i)(\ln(1-\pi_i))^2}{\pi_i} & \cdots & \sum_{i=1}^{n} x_{i,p-1}^2\dfrac{(1-\pi_i)(\ln(1-\pi_i))^2}{\pi_i}
\end{bmatrix}. \tag{A.6}$$

References

  1. Agresti A. Categorical Data Analysis. 2nd edition. New York, NY, USA: Wiley; 2002.
  2. Zhao LP, Kolonel LN. Efficiency loss from categorizing quantitative exposures into qualitative exposures in case-control studies. American Journal of Epidemiology. 1992;136(4):464–474. doi: 10.1093/oxfordjournals.aje.a116520.
  3. MacCallum RC, Zhang S, Preacher KJ, Rucker DD. On the practice of dichotomization of quantitative variables. Psychological Methods. 2002;7(1):19–40. doi: 10.1037/1082-989x.7.1.19.
  4. Cohen J. The cost of dichotomization. Applied Psychological Measurement. 1983;7(3):249–253.
  5. Greenland S. Avoiding power loss associated with categorization and ordinal scores in dose-response and trend analysis. Epidemiology. 1995;6(4):450–454. doi: 10.1097/00001648-199507000-00025.
  6. Austin PC, Brunner LJ. Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses. Statistics in Medicine. 2004;23(7):1159–1178. doi: 10.1002/sim.1687.
  7. Vargha A, Rudas T, Delaney HD, Maxwell SE. Dichotomization, partial correlation, and conditional independence. Journal of Educational and Behavioral Statistics. 1996;21(3):264–282.
  8. Maxwell SE, Delaney HD. Bivariate median splits and spurious statistical significance. Psychological Bulletin. 1993;113(1):181–190.
  9. Streiner DL. Breaking up is hard to do: the heartbreak of dichotomizing continuous data. Canadian Journal of Psychiatry. 2002;47(3):262–266. doi: 10.1177/070674370204700307.
  10. Chen H, Cohen P, Chen S. Biased odds ratios from dichotomization of age. Statistics in Medicine. 2007;26(18):3487–3497. doi: 10.1002/sim.2737.
  11. Ragland DR. Dichotomizing continuous outcome variables: dependence of the magnitude of association and statistical power on the cutpoint. Epidemiology. 1992;3(5):434–440. doi: 10.1097/00001648-199209000-00009.
  12. Moser BK, Coombs LP. Odds ratios for a continuous outcome variable without dichotomizing. Statistics in Medicine. 2004;23(12):1843–1860. doi: 10.1002/sim.1776.
  13. Johnson NL, Welch BL. Applications of the non-central t-distribution. Biometrika. 1940;31(3-4):362–389.
  14. Serfling RJ. Approximation Theorems of Mathematical Statistics. New York, NY, USA: Wiley; 1980.
  15. Lai TL, Robbins H, Wei CZ. Strong consistency of least squares estimates in multiple regression. Proceedings of the National Academy of Sciences of the United States of America. 1978;75(7):3034–3036. doi: 10.1073/pnas.75.7.3034.
  16. Bakhshi E, Eshraghian MR, Mohammad K, Seifi B. A comparison of two methods for estimating odds ratios: results from the National Health Survey. BMC Medical Research Methodology. 2008;8:78. doi: 10.1186/1471-2288-8-78.
