Considering the sample sizes as truncated Poisson random variables in mixed effects models

Célia Nunes; Elsa Moreira; Sandra S Ferreira; Dário Ferreira; João T Mexia

doi:10.1080/02664763.2019.1641188

2019 Jul 14;47(13-15):2641–2657.

PMCID: PMC9042002 PMID: 35707435

Abstract

When applying analysis of variance, the sample sizes may not be previously known, so it is more appropriate to consider them as realizations of random variables. A motivating example is the collection of observations during a fixed time span in a study comparing, for example, several pathologies of patients arriving at a hospital. This paper extends the theory of analysis of variance to those situations considering mixed effects models. We will assume that the occurrences of observations correspond to a counting process and the sample dimensions have Poisson distribution. The proposed approach is applied to a study of cancer patients.

Keywords: Random sample sizes, mixed effects, L extensions models, F-tests, counting processes, cancer registries

2010 Mathematics Subject Classifications: 62J12, 62J10, 62J99

1. Introduction

In some applications of analysis of variance in medicine, social sciences, economics, agriculture, etc., it is more appropriate to regard the sample sizes as random variables. These situations commonly occur when there is a fixed time span for collecting the observations; other examples arise when some other resource is limited. A motivating example is the collection of data from patients with several pathologies arriving at a hospital during a fixed time span. The number of patients for each pathology is not known in advance, and a replication of the study during a different time period of the same length would result in a sample of different size. Therefore, if we plan to conduct just one study to compare the pathologies, it is more appropriate to consider the sample sizes as realizations, $n_{1}, \dots, n_{m}$, of random variables, $N_{1}, \dots, N_{m}$ [15,17,20]. Another important case arises when one of the pathologies is rare since, in that case, the desired number of patients in the sample set may not be achieved [19]. In the cited studies, fixed effects ANOVA was applied. Now we extend the results to mixed effects models to deal with random sample sizes.

The current approach must be based on an adequate choice of the distribution of $N_{1}, \dots, N_{m}$. In this paper, we will assume that the occurrence of observations corresponds to independent counting processes. An illustrative example of this is the aforementioned case, concerning the comparison of pathologies. This leads us to assume that $N_{1}, \dots, N_{m}$ are independent and Poisson distributed with parameters $λ_{1}, \dots, λ_{m}$, $N_{i} \sim P (λ_{i})$, $i = 1, \dots, m$ [12,15,17–20]. Since we need to have at least one observation per treatment, we will consider the random variables ${\ddot{N}}_{i}$, $i = 1, \dots, m$, obtained by truncating the random variables $N_{i}$ to $N_{i} \geq 1$, $i = 1, \dots, m$ (see Appendix 1). Through the independence of ${\ddot{N}}_{i}, i = 1, \dots, m$, the variable $\ddot{N} = \sum_{i = 1}^{m} {\ddot{N}}_{i}$ has a truncated Poisson distribution with parameter

λ = \sum_{i = 1}^{m} λ_{i} .

For different situations, it will be more appropriate to consider other discrete distributions for random sample sizes, such as

the Binomial distribution, when there exists an upper bound for the sample sizes, which however may not be attained (either owing to occurrences of failures or for some other reason). An illustrative example of this is when a planned number of patients are approached but only a proportion of them give consent to be included in the study [16,17];
the Negative Binomial distribution, which can be used as an alternative to the Poisson distribution in cases in which the observations are overdispersed with respect to a Poisson distribution.

This paper is structured as follows. In Section 2, we present the formulation of the mixed models in the context of random sample sizes. The test statistics and their conditional and unconditional distributions are obtained in Section 3. Section 4 presents an application based on real medical data, namely on patients affected by cancer, in order to illustrate the usefulness of our approach. Finally, some concluding remarks are made in Section 5.

2. Model

When the sample sizes in mixed models are considered to be random variables, we will very likely get different numbers of observations per treatment (combination of factor levels), that is, we have an unbalanced design. In order to cope with unbalanced situations, a broader class of models, designated as L extensions or L models, was developed some years ago in [3] and [14]. Using the L extensions in the formulation of the mixed models with random sample sizes allows us to deal with the lack of orthogonality originating from the unbalanced situation.

Let us suppose that the m components of $Y^{o}$ correspond to the treatments of a linear model and let

L = L (n) = D (1_{n_{1}}, \dots, 1_{n_{m}})

(1)

be the block diagonal matrix with the principal blocks $1_{n_{1}}, \dots, 1_{n_{m}}$ , where $1_{n}$ denotes the vector with all n components equal to 1 and $n = (n_{1}, \dots, n_{m})^{'}$ . Then

Y = L Y^{o} + ε

(2)

corresponds to a model with sample sizes $n_{1}, \dots, n_{m}$ , where $ε$ is the error vector with null mean vector and variance–covariance matrix $σ^{2} I_{n}$ , with $I_{n}$ the $n \times n$ identity matrix and

n = \sum_{i = 1}^{m} n_{i} .

Let's consider that

Y^{o} = \sum_{i = 0}^{w} X_{i} β_{i},

(3)

where $β_{0}$ is fixed with $c_{0}$ components and $β_{1}, \dots, β_{w}$ are random and independent, with null mean vectors and variance–covariance matrices $σ_{1}^{2} I_{c_{1}}, \dots, σ_{w}^{2} I_{c_{w}}$ , where $c_{i}$ , $i = 1, \dots, w$ , denote the number of components of $β_{i}, i = 1, \dots, w$ . Thus $Y^{o}$ has mean vector and variance–covariance matrix given by

\begin{aligned} μ^{o} & = X_{0} β_{0} \\ V^{o} & = \sum_{i = 1}^{w} σ_{i}^{2} M_{i}, \end{aligned}

with $M_{i} = X_{i} X_{i}^{'}$ , $i = 1, \dots, w$ , where matrices $X_{i}$ have m rows and $c_{i}$ , $i = 0, \dots, w$ , columns, see e.g. [5,8,23]. We point out that $Y^{o}$ and $Y$ are random vectors with m and n components, respectively, since $L$ is an $n \times m$ matrix.

3. Test statistics and their distributions

In this section, we obtain the test statistics and their conditional distribution and unconditional distribution, under the assumption that we have random sample sizes. We will start by presenting some important results about L extensions.

Let us assume that $Y^{o}$ has orthogonal block structure, so the matrices $M_{1}, \dots, M_{w}$ commute and they will be linear combinations of pairwise orthogonal projection matrices $K_{1}, \dots, K_{ℓ}$ , see [2]. Thus we have

M_{i} = \sum_{j = 1}^{ℓ} b_{i, j} K_{j}, i = 1, \dots, w,

and

V^{o} = \sum_{j = 1}^{ℓ} γ_{j} K_{j},

where $γ_{j} = \sum_{i = 1}^{w} b_{i, j} σ_{i}^{2}$ , $j = 1, \dots, ℓ$ . With $B = [b_{i, j}]$ , $γ = (γ_{1}, \dots, γ_{ℓ})^{'}$ and $σ^{2} = (σ_{1}^{2}, \dots, σ_{w}^{2})^{'}$ , we also have

γ = B^{'} σ^{2},

see e.g. [1,2,4,6]. Let's consider that the row vectors of $A_{j}$ , $j = 1, \dots, ℓ$ , constitute an orthonormal basis for the range space of $K_{j}$ , $R (K_{j})$ , $j = 1, \dots, ℓ$ , then we have

\begin{aligned} K_{j} & = A_{j}^{'} A_{j}, j = 1, \dots, ℓ \\ I_{g_{j}} & = A_{j} A_{j}^{'}, j = 1, \dots, ℓ, \end{aligned}

with $g_{j} = rank (K_{j})$ .

Let $L^{+}$ be the Moore–Penrose inverse of matrix $L$; then the orthogonal projection matrices (OPM) on $\bar{Ω} = R (L)$ and on its orthogonal complement ${\bar{Ω}}^{⊥}$ are, respectively, [22]

T = L L^{+} \quad \text{and} \quad I_{n} - T .

So, with $L = D (1_{n_{1}}, \dots, 1_{n_{m}})$ , we have

L^{+} = D (\frac{1}{n_{1}} 1_{n_{1}}^{'}, \dots, \frac{1}{n_{m}} 1_{n_{m}}^{'}) .
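As a small numerical illustration, the following Python sketch builds $L$ and $L^{+}$ for made-up sample sizes $n = (2, 3)^{'}$ and made-up observations (neither is taken from the paper). It checks that $L^{+} Y$ is the vector of sample means, that $L^{+} L$ is the identity and that $T = L L^{+}$ is idempotent. The helper names are hypothetical.

```python
def block_ones(sizes):
    """L = D(1_{n_1}, ..., 1_{n_m}) as a list of rows (n x m)."""
    m = len(sizes)
    rows = []
    for i, n_i in enumerate(sizes):
        for _ in range(n_i):
            rows.append([1.0 if k == i else 0.0 for k in range(m)])
    return rows


def pinv_block_ones(sizes):
    """L^+ = D(1'_{n_1}/n_1, ..., 1'_{n_m}/n_m) as a list of rows (m x n)."""
    n = sum(sizes)
    rows = []
    start = 0
    for n_i in sizes:
        row = [0.0] * n
        for k in range(start, start + n_i):
            row[k] = 1.0 / n_i
        rows.append(row)
        start += n_i
    return rows


def matmul(a, b):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


sizes = [2, 3]                                   # illustrative n_1, n_2
y = [[1.0], [3.0], [2.0], [4.0], [6.0]]          # illustrative observations (column vector)

L = block_ones(sizes)
L_plus = pinv_block_ones(sizes)

means = matmul(L_plus, y)        # sample means of the two groups
identity = matmul(L_plus, L)     # should be I_m
T = matmul(L, L_plus)            # orthogonal projector on R(L)

print(means)
```

With these inputs the two entries of `means` are the group averages of (1, 3) and (2, 4, 6), up to floating-point rounding.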

When $Y^{o}$ is independent of $ε \sim N (0, σ^{2} I_{n})$ , i.e. $ε$ is normal with null mean vector and variance–covariance matrix $σ^{2} I_{n}$ , then $T ε$ and $(I_{n} - T) ε$ are also independent, since they have normal joint distribution and null cross-covariance matrices. Therefore

T Y = T L Y^{o} + T ε = L Y^{o} + T ε

and

Y_{{\bar{Ω}}^{⊥}} = (I_{n} - T) Y = (I_{n} - T) ε

are independent.

Since the column vectors of $L$ are linearly independent we have [22]

L^{+} L = I_{m} .

So we can consider [3]

Y^{o o} = L^{+} Y = Y^{o} + L^{+} ε = Y^{o} + L^{+} T ε,

since $L^{+} T = L^{+} L L^{+} = L^{+}$. Thus $Y^{o o}$ is independent of $Y_{{\bar{Ω}}^{⊥}}$ and hence independent of

S = ‖ Y_{{\bar{Ω}}^{⊥}} ‖^{2},

(4)

where $(1 / σ^{2}) S$ has chi-square distribution with

g (n) = n - m

degrees of freedom, $S \sim σ^{2} χ_{g (n)}^{2}$ .

Let us now observe that $Y^{o o}$ has mean vector and variance–covariance matrix given by

\begin{aligned} μ^{o o} & = μ^{o} = X_{0} β_{0} \\ V^{o o} & = V^{o} + σ^{2} L^{+} (L^{+})^{'} = \sum_{j = 1}^{ℓ} γ_{j} K_{j} + σ^{2} L^{+} (L^{+})^{'} . \end{aligned}

With $L = D (1_{n_{1}}, \dots, 1_{n_{m}})$ , we will have

L^{+} (L^{+})^{'} = D (n_{1}^{- 1}, \dots, n_{m}^{- 1})

and

Y_{j}^{o o} = A_{j} Y^{o o}, j = 1, \dots, ℓ,

has mean vector and variance–covariance matrix

\begin{aligned} μ_{j}^{o o} & = A_{j} μ^{o} = A_{j} X_{0} β_{0}, j = 1, \dots, ℓ \\ V_{j}^{o o} & = γ_{j} I_{g_{j}} + σ^{2} A_{j} (L^{+} (L^{+})^{'}) A_{j}^{'}, j = 1, \dots, ℓ . \end{aligned}

Let $P_{j}$ and $Q_{j}$ be the OPM on $R (A_{j} X_{0})$ and $R (A_{j} X_{0})^{⊥}$, with ranks $p_{j}$ and $f_{j} = g_{j} - p_{j}$, respectively, and let $S_{j}$ and $W_{j}$ be the matrices whose row vectors constitute orthonormal bases for $R (A_{j} X_{0})$ and $R (A_{j} X_{0})^{⊥}$, $j = 1, \dots, ℓ$. Then we have

\begin{aligned} P_{j} & = S_{j}^{'} S_{j}, j = 1, \dots, ℓ \\ Q_{j} & = W_{j}^{'} W_{j}, j = 1, \dots, ℓ . \end{aligned}

3.1. Fixed sample sizes

Let us now address the hypothesis tests for the canonical variance components [13], $γ_{1}, \dots, γ_{ℓ}$ , assuming that, with $0 \leq z < ℓ,$

p_{j} < g_{j}, j = z + 1, \dots, ℓ .

So, let us consider

Y_{j}^{∙} = W_{j} Y_{j}^{o o} = W_{j} A_{j} Y^{o} + W_{j} A_{j} L^{+} ε, j = z + 1, \dots, ℓ,

which has null mean vector and variance–covariance matrix $γ_{j} I_{f_{j}} + σ^{2} B_{j}, j > z,$ with

B_{j} = W_{j} A_{j} L^{+} (L^{+})^{'} A_{j}^{'} W_{j}^{'}, j = z + 1, \dots, ℓ .

We intend to test the hypothesis

H_{0, j} : γ_{j} = 0, j = z + 1, \dots, ℓ .

(5)

When $H_{0, j}$ holds, we have

\mathrm{pr} (W_{j} A_{j} Y^{o} = 0) = 1, j = z + 1, \dots, ℓ,

and consequently

\mathrm{pr} (Y_{j}^{∙} = W_{j} A_{j} L^{+} ε) = 1, j = z + 1, \dots, ℓ .

Therefore, when $H_{0, j}$ holds, $Y_{j}^{∙}$ has null mean vector and variance–covariance matrix $σ^{2} B_{j}, j = z + 1, \dots, ℓ,$ and $(1 / σ^{2}) (Y_{j}^{∙})^{'} (B_{j}^{- 1}) Y_{j}^{∙}$ has chi-square distribution with $f_{j}$ degrees of freedom, $(Y_{j}^{∙})^{'} (B_{j}^{- 1}) Y_{j}^{∙} \sim σ^{2} χ_{f_{j}}^{2}$ , $j = z + 1, \dots, ℓ$ [10].

Since $Y_{j}^{o o}$ is independent of S, $Y_{j}^{∙}$ is also independent of S, $j = z + 1, \dots, ℓ .$ Due to this, when $H_{0, j}$ holds, the statistic

F_{j} = \frac{g (n)}{f_{j}} \frac{(Y_{j}^{∙})^{'} (B_{j}^{- 1}) Y_{j}^{∙}}{S}, j = z + 1, \dots, ℓ,

(6)

has a central F distribution with $f_{j}$, $j = z + 1, \dots, ℓ$, and $g (n)$ degrees of freedom, $F (\cdot | f_{j}, g (n))$, referred to as the conditional distribution, and $F_{j}$ may be used as the test statistic [21]. Moreover, the tests with the statistic $F_{j}$, $j = z + 1, \dots, ℓ,$ are unbiased, see e.g. [9,10].

3.2. Random sample sizes

Let us consider that $n$ is the realization of a random vector $\ddot{N} = ({\ddot{N}}_{1}, \dots, {\ddot{N}}_{m})^{'}$ , which means that the samples will have random dimensions. In this section, we will focus on the case where

L (\ddot{N}) = D (1_{{\ddot{N}}_{1}}, \dots, 1_{{\ddot{N}}_{m}}),

so the previous results need to be unconditioned with respect to $\ddot{N}$.

Let us now suppose that we intend to test the hypothesis

H_{0} : θ = 0,

where $θ$ is a general parameter, and that the test is unbiased whatever $n$. So, denoting by $\mathrm{pr}_{n, θ} (\mathrm{Rej}_{α})$ [$\mathrm{pr}_{n, 0} (\mathrm{Rej}_{α})$] the probability of rejecting $H_{0}$ for a significance level α, given $n$ and the parameter $θ$ [the probability of rejecting $H_{0}$, given $n$ and $θ = 0$], we have

\mathrm{pr}_{n, θ} (\mathrm{Rej}_{α}) > \mathrm{pr}_{n, 0} (\mathrm{Rej}_{α}) .

(7)

Unconditioning (7) with respect to $\ddot{N}$, we still obtain

\mathrm{pr}_{θ} (\mathrm{Rej}_{α}) > \mathrm{pr}_{0} (\mathrm{Rej}_{α}),

and the test is still unbiased.

So, since the tests for the hypothesis $H_{0, j} : γ_{j} = 0, j = z + 1, \dots, ℓ,$ are unbiased whatever $n$ , we can conclude that they still remain unbiased after unconditioning.

Let us assume that the occurrence of observations corresponds to independent counting processes, which leads us to consider that ${\ddot{N}}_{1}, \dots, {\ddot{N}}_{m}$ have truncated Poisson distributions with parameters $λ_{i}$, $i = 1, \dots, m$. Furthermore, to perform inference we also require that $\ddot{N} = \sum_{i = 1}^{m} {\ddot{N}}_{i} > m$.

In order to avoid unbalanced cases we will assume that we have a global minimum dimension for the samples [12,20]. Therefore, considering $\ddot{N} > m^{∙}$ , with $m^{∙} \geq m$ , we may take the probability

\begin{aligned} {\ddot{p}}_{n, m^{∙}} & = \mathrm{pr} (\ddot{N} = n | \ddot{N} > m^{∙}) = \frac{\mathrm{pr} (\ddot{N} = n)}{\mathrm{pr} (\ddot{N} > m^{∙})} \\ & = \frac{{\ddot{p}}_{n}}{\mathrm{pr} (\ddot{N} > m)} \frac{\mathrm{pr} (\ddot{N} > m)}{\mathrm{pr} (\ddot{N} > m^{∙})} = \frac{{\ddot{p}}_{n}}{1 - {\ddot{p}}_{m}} \frac{1 - {\ddot{p}}_{m}}{1 - \sum_{h = m}^{m^{∙}} {\ddot{p}}_{h}} \\ & = {\ddot{p}}_{n, m} \frac{1 - {\ddot{p}}_{m}}{1 - \sum_{h = m}^{m^{∙}} {\ddot{p}}_{h}}, n = m^{∙} + 1, \dots, \end{aligned}

where

{\ddot{p}}_{n, m} = \frac{{\ddot{p}}_{n}}{1 - {\ddot{p}}_{m}}, n = m^{∙} + 1, \dots,

(8)

as defined in (A1), Appendix 1, which is dedicated to the truncated Poisson distribution.

Consequently, the unconditional distribution of $F_{j}$ , $j = z + 1, \dots, ℓ,$ when the hypothesis $H_{0, j}$ holds, will be given by, e.g. [12,20],

\begin{aligned} {\bar{\bar{F}}}_{j} (z) & = \sum_{n = m^{∙} + 1}^{\infty} \mathrm{pr} (\ddot{N} = n | \ddot{N} > m^{∙}) F (z | f_{j}, g (n)) \\ & = \sum_{n = m^{∙} + 1}^{\infty} {\ddot{p}}_{n, m^{∙}} F (z | f_{j}, g (n)), j = z + 1, \dots, ℓ . \end{aligned}

(9)
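The weights ${\ddot{p}}_{n, m^{∙}}$ entering (9) can be computed numerically. The following Python sketch is one possible way to do so, assuming illustrative rates (not the ones used in Section 4) and hypothetical function names: it obtains the distribution of the total sample size by convolving zero-truncated Poisson distributions and then conditions on $\ddot{N} > m^{∙}$.

```python
from math import exp, factorial


def zero_truncated_poisson_pmf(lam, r_max):
    """pr(N_i = r | N_i >= 1) for r = 0..r_max (the entry for r = 0 is 0)."""
    norm = 1.0 - exp(-lam)
    return [0.0] + [exp(-lam) * lam ** r / factorial(r) / norm
                    for r in range(1, r_max + 1)]


def convolve(p, q, r_max):
    """Distribution of the sum of two independent counts, cut at r_max."""
    out = [0.0] * (r_max + 1)
    for i, p_i in enumerate(p):
        if p_i == 0.0:
            continue
        for k, q_k in enumerate(q):
            if i + k > r_max:
                break
            out[i + k] += p_i * q_k
    return out


def total_size_pmf(lams, r_max):
    """pr(N = r), r = 0..r_max, for the sum of the truncated variables."""
    pmf = [1.0] + [0.0] * r_max
    for lam in lams:
        pmf = convolve(pmf, zero_truncated_poisson_pmf(lam, r_max), r_max)
    return pmf


def conditional_pmf(pmf, m_dot):
    """pr(N = n | N > m_dot), using 1 - sum_{h <= m_dot} p_h as denominator."""
    tail = 1.0 - sum(pmf[: m_dot + 1])
    return [0.0 if n <= m_dot else p / tail for n, p in enumerate(pmf)]


lams = [2.0, 3.0, 1.5]      # illustrative rates, m = 3 treatments
m, m_dot = len(lams), 4     # illustrative global minimum m_dot + 1 = 5
pmf = total_size_pmf(lams, r_max=60)
weights = conditional_pmf(pmf, m_dot)
print(sum(weights))
```

For rates much smaller than these, the denominator $1 - \sum_{h = m}^{m^{∙}} {\ddot{p}}_{h}$ becomes a difference of nearly equal numbers, so summing the tail probabilities directly may be numerically preferable.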

4. An application to real data

In this section, we apply the proposed methodology to a dataset of patients affected by cancer. The data were collected from the U.S. Cancer Statistics Working Group [24] according to official guidelines and refer to the age of disease detection in 2009. We compare the results obtained using our approach with those of the common ANOVA.

We will consider a mixed model with one fixed effects factor and one random effects factor. The fixed effects factor will be the Gender, with two levels (Male and Female). Due to the large number of cancer types, we resorted to simple random sampling to select three different types of cancer from the available list. Thus the random effects factor will be the Type of Cancer and the selected types constitute a random sample.

Table 1 illustrates the types of cancer which have been selected, the number of patients and the mean ages at the time of disease detection. This leads to $m = 2 \times 3 = 6$ different treatments. The global frequencies of these three types of cancer, for males and females, are provided in Appendix 2.

Table 1. Number of patients and sample mean ages.

Type of cancer	Number of patients (Male)	Number of patients (Female)	Sample mean age (Male)	Sample mean age (Female)
Stomach (digestive system)	44	30	70.523	68.833
Melanomas of the skin	134	99	63.791	57.303
Non-Hodgkin lymphoma	123	105	63.382	66.286

According to (3), in this particular example we have

Y^{o} = X_{0} β_{0} + X_{1} β_{1} + X_{2} β_{2},

(10)

where $β_{0}$ is fixed and $β_{1}$ and $β_{2}$ are random, independent, corresponding, respectively, to the random effects factor (Type of cancer) and interaction between the two factors. We have the design matrices

\begin{aligned} X_{0} & = I_{2} \otimes 1_{3} \\ X_{1} & = 1_{2} \otimes I_{3} \\ X_{2} & = I_{2} \otimes I_{3}, \end{aligned}

where ⊗ denotes the Kronecker product, and

\begin{aligned} M_{1} & = J_{2} \otimes I_{3} \\ M_{2} & = I_{2} \otimes I_{3} . \end{aligned}

Let's assume that

\begin{aligned} M_{1} & = 2 K_{1} \\ M_{2} & = K_{1} + K_{2}, \end{aligned}

which means that

\begin{aligned} K_{1} & = \frac{1}{2} M_{1} = \frac{1}{2} J_{2} \otimes I_{3} \\ K_{2} & = M_{2} - \frac{1}{2} M_{1} = (I_{2} - \frac{1}{2} J_{2}) \otimes I_{3} \end{aligned}

and consequently the matrices $A_{j},$ j=1,2, will be given by

\begin{aligned} A_{1} & = [\begin{matrix} \frac{1}{\sqrt{2}} & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & 0 & 0 & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & \frac{1}{\sqrt{2}} \end{matrix}] \\ A_{2} & = [\begin{matrix} - \frac{1}{\sqrt{2}} & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 \\ 0 & - \frac{1}{\sqrt{2}} & 0 & 0 & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & - \frac{1}{\sqrt{2}} & 0 & 0 & \frac{1}{\sqrt{2}} \end{matrix}] \end{aligned}

and

\begin{aligned} A_{1} X_{0} & = \frac{1}{\sqrt{2}} 1_{2}^{'} \otimes 1_{3} \\ A_{2} X_{0} & = [\begin{matrix} - \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{matrix}] \otimes 1_{3} . \end{aligned}

The matrices $Q_{j}$ , j=1,2, which are the OPM on $R (A_{j} X_{0})^{⊥}$ , j=1,2, will be given by

\begin{aligned} Q_{1} & = W_{1}^{'} W_{1} = I_{3} - \frac{1}{3} J_{3} \\ Q_{2} & = W_{2}^{'} W_{2} = I_{3} - \frac{1}{3} J_{3}, \end{aligned}

with $J_{r} = 1_{r} 1_{r}^{'}$ and

W_{1} = W_{2} = [\begin{matrix} - \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{6}} & - \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \end{matrix}] .

Moreover, $f_{1} = rank (Q_{1}) = 3$ and $f_{2} = rank (Q_{2}) = 3$ . Besides this, the OPM on $R (A_{j} X_{0})$ , j=1,2, are

\begin{aligned} P_{1} & = A_{1} X_{0} (A_{1} X_{0})^{+} = [\begin{matrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{matrix}] \\ P_{2} & = A_{2} X_{0} (A_{2} X_{0})^{+} = [\begin{matrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{matrix}] . \end{aligned}
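The matrix identities of this example can be checked numerically. The following Python sketch (plain lists, hypothetical helper names) rebuilds $X_{0}$, $A_{1}$, $A_{2}$, $K_{1}$, $K_{2}$, $W_{1} = W_{2}$ and the projectors from the definitions above and verifies a few of the stated relations; it is only a consistency check, not part of the original computations.

```python
from math import sqrt


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def eye(n):
    return [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


s = 1.0 / sqrt(2.0)
I3 = eye(3)
ones3 = [[1.0], [1.0], [1.0]]

X0 = kron(eye(2), ones3)                    # I_2 (x) 1_3, 6 x 2
A1 = kron([[s, s]], I3)                     # 3 x 6
A2 = kron([[-s, s]], I3)                    # 3 x 6
K1 = matmul(transpose(A1), A1)
K2 = matmul(transpose(A2), A2)
M1 = kron([[1.0, 1.0], [1.0, 1.0]], I3)     # J_2 (x) I_3
W = [[-1 / sqrt(2), 0.0, 1 / sqrt(2)],
     [1 / sqrt(6), -2 / sqrt(6), 1 / sqrt(6)]]
Q = matmul(transpose(W), W)
P = [[1.0 / 3.0] * 3 for _ in range(3)]     # J_3 / 3

checks = {
    "M1 = 2 K1": close(M1, [[2 * x for x in row] for row in K1]),
    "K1 + K2 = I6": close([[x + y for x, y in zip(r1, r2)]
                           for r1, r2 in zip(K1, K2)], eye(6)),
    "K1 K2 = 0": close(matmul(K1, K2), [[0.0] * 6 for _ in range(6)]),
    "A1 A1' = I3": close(matmul(A1, transpose(A1)), I3),
    "A2 A2' = I3": close(matmul(A2, transpose(A2)), I3),
    "Q = I3 - J3/3": close(Q, [[I3[i][k] - 1.0 / 3.0 for k in range(3)]
                               for i in range(3)]),
    "Q A1 X0 = 0": close(matmul(Q, matmul(A1, X0)), [[0.0] * 2 for _ in range(3)]),
    "P A2 X0 = A2 X0": close(matmul(P, matmul(A2, X0)), matmul(A2, X0)),
}
for name, ok in checks.items():
    print(name, ok)
```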

We will test the hypotheses

H_{0, j} : γ_{j} = 0, j = 1, 2,

which are the hypotheses of absence of random effects and interaction between the two factors.

Given $\ddot{N} = n$ , when $H_{0, j}$ , j=1,2 holds, the conditional distribution of

F_{j} = \frac{g (n)}{3} \frac{(Y_{j}^{∙})^{'} (B_{j}^{- 1}) Y_{j}^{∙}}{S}, j = 1, 2

is a central F distribution with $f_{j} = rank (Q_{j}) = 3$ , j=1,2, and $g (n) = n - 6$ degrees of freedom, $F (\cdot | 3, n - 6)$ .

In the calculations, we assume that

\sum_{n = 0}^{m^{∙}} {\ddot{p}}_{n, m^{∙}} ≃ 0,

which means that, with high probability, we have $\ddot{N} > m^{∙}$ , so $m^{∙} + 1$ is the global minimum dimension for the samples. Therefore the unconditional distribution of the statistics will be given by

{\bar{\bar{F}}}_{j} (z) = \sum_{n = m^{∙} + 1}^{\infty} {\ddot{p}}_{n, m^{∙}} F (z | 3, n - 6), j = 1, 2.

(11)

Besides this, due to the monotonicity property of the F distribution [12], when $n < n^{o}$, we have

F (z | 3, n - 6) < F (z | 3, n^{o} - 6),

(12)

so that

F (z | 3, m^{∙} + 1 - 6) \leq {\bar{\bar{F}}}_{j} (z) \leq 1

which gives us a lower bound for ${\bar{\bar{F}}}_{j} (z)$ . Thus, from $F (z | 3, m^{∙} - 5)$ , we can obtain upper bounds for the quantiles of the unconditional distributions ${\bar{\bar{F}}}_{j} (z)$ , j=1,2. If we use these upper bounds as critical values, we will have tests with sizes that do not exceed the theoretical values.
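The upper bounds can be obtained from any routine for quantiles of the F distribution. The following Python sketch is a standard-library-only illustration and not the authors' R code: it evaluates the F distribution function through the regularized incomplete beta function (continued-fraction evaluation) and inverts it by bisection. All function names are hypothetical, and the degrees of freedom $3$ and $m^{∙} + 1 - 6$ are the ones used in the text.

```python
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=500, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        k2 = 2 * k
        for aa in (k * (b - k) * x / ((qam + k2) * (a + k2)),
                   -(a + k) * (qab + k) * x / ((a + k2) * (qap + k2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(z, d1, d2):
    """Distribution function F(z | d1, d2) of the central F distribution."""
    if z <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * z / (d1 * z + d2))


def f_quantile(prob, d1, d2, lo=0.0, hi=1e4, tol=1e-10):
    """Quantile by bisection; assumes the quantile lies in [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def upper_bound_quantile(alpha, m_dot, f_j=3, m=6):
    """Quantile of F(. | f_j, m_dot + 1 - m), an upper bound for the unconditional quantile."""
    return f_quantile(1.0 - alpha, f_j, m_dot + 1 - m)


for m_dot in (11, 15, 18):
    print(m_dot, [round(upper_bound_quantile(a, m_dot), 3) for a in (0.1, 0.05, 0.01)])
```

In practice a library routine (for instance the F quantile function of a statistics package) would replace the hand-written incomplete beta evaluation.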

Remark

We can use these upper bounds for a preliminary test. If the test statistic exceeds the upper bound, it also exceeds the real critical value (obtained when using the unconditional distribution). When the test statistic is lower than the upper bound, one must compute the critical value by solving the equation ${\bar{\bar{F}}}_{j} (z) = 1 - α$ for z, j=1,2. To solve it we may truncate the series in Equation (11) according to the rule established in [11,19]. This way, truncating the sum at the term $\bar{m} = \sum_{i = 1}^{m} {\bar{m}}_{i}$, with $n_{i} \leq {\bar{m}}_{i}$, where $n_{i}$ are the realizations of the ${\ddot{N}}_{i}$, $i = 1, \dots, m$, we will have
${\bar{\bar{F}}}_{j, \bar{m}} (z) = \sum_{n = m^{∙} + 1}^{\bar{m}} {\ddot{p}}_{n, m^{∙}} F (z | 3, n - 6), j = 1, 2.$
Considering ϵ small, we choose each ${\bar{m}}_{i}$ such that
$\sum_{n_{i} = 0}^{{\bar{m}}_{i}} e^{- λ_{i}} \frac{λ_{i}^{n_{i}}}{n_{i}!} > 1 - ϵ \Leftrightarrow ϵ > 1 - \sum_{n_{i} = 0}^{{\bar{m}}_{i}} e^{- λ_{i}} \frac{λ_{i}^{n_{i}}}{n_{i}!}, i = 1, \dots, m .$ (13)
This inequality will be used to obtain the minimum value of $\bar{m}$ needed for ${\bar{\bar{F}}}_{j, \bar{m}} (z)$ to be a good approximation of the distribution ${\bar{\bar{F}}}_{j} (z)$, j=1,2, [11].

Usually the analysis starts with a test of interaction and proceeds to the tests of the main effects whenever the interaction is not significant. We do not follow this approach since we are interested in showing how these tests could be carried out through unconditioning [20].

4.1. Random effects factor

For the second factor, we have

Y_{1}^{∙} = W_{1} A_{1} L^{+} Y = [\begin{matrix} 1.1255 \\ - 1.8846 \end{matrix}],

where

L^{+} = D (\frac{1}{44} 1_{44}^{'}, \frac{1}{30} 1_{30}^{'}, \frac{1}{134} 1_{134}^{'}, \frac{1}{99} 1_{99}^{'}, \frac{1}{123} 1_{123}^{'}, \frac{1}{105} 1_{105}^{'}),

with $L^{+} Y$ the vector of the sample means with components 70.523, 68.833, 63.791, 57.303, 63.382, 66.286 and

B_{1} = W_{1} A_{1} L^{+} (L^{+})^{'} A_{1}^{'} W_{1}^{'} = [\begin{matrix} 0.012453695 & - 0.002286565 \\ - 0.002286565 & 0.017972370 \end{matrix}]

So, for the numerator of the statistic $F_{1}$ we obtain

(Y_{1}^{∙})^{'} (B_{1}^{- 1}) Y_{1}^{∙} = 262.120.

When $\ddot{N} = n$, $S = ‖ Y_{{\bar{Ω}}^{⊥}} ‖^{2}$ is the product by $σ^{2}$ of a central chi-square with $g (n) = n - 6$ degrees of freedom, $σ^{2} χ_{n - 6}^{2}$. In this case, we obtained $S = 131250.672$.

Therefore, the statistic's value, $F_{1, O b s}$ , is given by

F_{1, O b s} = \frac{529}{3} \frac{262.120}{131250.672} = 0.352.
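The arithmetic leading to $F_{1, O b s}$ can be retraced from the quantities printed above. The following Python sketch is a hedged reconstruction: it takes the rounded sample means and the sample sizes in the order listed in the text, uses the reported value of $S$ (the raw data are not reproduced here) and the factor $f_{1} = 3$ exactly as in the paper. Because the means are rounded to three decimals, the last digits of the intermediate results may differ slightly from the printed ones. The function name is hypothetical.

```python
from math import sqrt


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Values copied from the text, in the order in which the text lists them.
sizes = [44, 30, 134, 99, 123, 105]
means = [70.523, 68.833, 63.791, 57.303, 63.382, 66.286]   # L^+ Y (rounded)
S = 131250.672                        # reported value of S
n, m, f = sum(sizes), len(sizes), 3   # f = f_j as used in the paper

s = 1.0 / sqrt(2.0)
A1 = [[s if k in (i, i + 3) else 0.0 for k in range(6)] for i in range(3)]
W = [[-1 / sqrt(2), 0.0, 1 / sqrt(2)],
     [1 / sqrt(6), -2 / sqrt(6), 1 / sqrt(6)]]


def f_statistic(A):
    """Return (Y_j^dot, B_j, quadratic form, observed F) for a given A_j."""
    WA = matmul(W, A)                                        # 2 x 6
    y_dot = [row[0] for row in matmul(WA, [[v] for v in means])]
    # L^+ (L^+)' = D(1/n_1, ..., 1/n_m)
    D = [[1.0 / sizes[i] if i == k else 0.0 for k in range(m)] for i in range(m)]
    B = matmul(matmul(WA, D), transpose(WA))                 # 2 x 2
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    B_inv = [[B[1][1] / det, -B[0][1] / det],
             [-B[1][0] / det, B[0][0] / det]]
    quad = sum(y_dot[i] * B_inv[i][k] * y_dot[k] for i in range(2) for k in range(2))
    return y_dot, B, quad, (n - m) / f * quad / S


y1, B1, quad1, F1 = f_statistic(A1)
print(y1, B1, round(quad1, 2), round(F1, 3))
```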

If we use the common conditional distribution of $F_{1}$ , which corresponds to $F (z | 3, 529)$ , since n=535, we will obtain the quantiles given in Table 2.

Table 2. The quantiles of the conditional distribution.

Values of α	0.1	0.05	0.01
$z_{1 - α}$	2.094	2.622	3.819

So, since $F_{1, O b s} < z_{1 - α}$ , we do not reject $H_{0, 1}$ for the usual levels of significance.

Let's assume that we have 12 [16 and 19] observations as global minimum dimensions for the samples, which means that we consider $m^{∙} + 1 = 12 \Leftrightarrow m^{∙} = 11$ [ $m^{∙} = 15$ and $m^{∙} = 18$ ]. Table 3 shows the upper bounds for the quantiles with probability $1 - α$ , $z_{1 - α}^{u}$ , of the unconditional distribution ${\bar{\bar{F}}}_{1} (z)$ .

Table 3. Upper bounds for the quantiles.

Quantile	$m^{∙}$	α = 0.1	α = 0.05	α = 0.01
$z_{1 - α}^{u}$	11	3.289	4.757	9.779
$z_{1 - α}^{u}$	15	2.728	3.708	6.552
$z_{1 - α}^{u}$	18	2.560	3.410	5.739

It is to be expected that the quantiles for random sample sizes (obtained when using the unconditional distribution) exceed the classical ones (obtained when using the common conditional distribution), since the former take into account an additional source of variation. Then, since in this case we do not reject the hypothesis using the classical quantiles, the same result is expected when using the quantiles for random sample sizes and, consequently, the upper bound approach. This interpretation leads us not to reject $H_{0, 1}$.

The quantiles for the unconditional distribution are approximated by truncation of the infinite series indicated in Equation (11). We obtained the minimum value $\bar{m} = 38$ for a truncation error not greater than $10^{- 8}$ ($ϵ \leq 10^{- 8}$). To carry out the computation, we assumed that $λ_{i}$, $i = 1, \dots, 6$, are the average daily numbers of occurrences during the year. So we have $λ_{1} = 0.13, λ_{2} = 0.09, λ_{3} = 0.37, λ_{4} = 0.28, λ_{5} = 0.34, λ_{6} = 0.29$.
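The truncation point can be retraced with a few lines of code. The following Python sketch assumes that inequality (13) is applied to each $λ_{i}$ separately with $ϵ = 10^{- 8}$ and that the resulting ${\bar{m}}_{i}$ are then added; this is our reading of the rule, and the function name is hypothetical. Under that reading the sketch returns the same total as reported above.

```python
from math import exp


def poisson_truncation_point(lam, eps):
    """Smallest k with sum_{r=0}^{k} exp(-lam) * lam**r / r! > 1 - eps."""
    term = exp(-lam)      # Poisson probability of r = 0
    cdf = term
    k = 0
    while not cdf > 1.0 - eps:
        k += 1
        term *= lam / k   # recursive update: p_k = p_{k-1} * lam / k
        cdf += term
    return k


lams = [0.13, 0.09, 0.37, 0.28, 0.34, 0.29]   # rates reported in the text
eps = 1e-8

points = [poisson_truncation_point(lam, eps) for lam in lams]
m_bar = sum(points)
print(points, m_bar)
```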

The obtained quantiles with probability $1 - α$ , $z_{1 - α}^{t}$ , of the truncated unconditional distribution

{\bar{\bar{F}}}_{1, \bar{m}} (z) = \sum_{n = m^{∙} + 1}^{38} {\ddot{p}}_{n, m^{∙}} F (z | 3, n - 6)

(14)

are presented in Table 4.

Table 4. The quantiles of the truncated unconditional distribution.

Quantile	$m^{∙}$	α = 0.1	α = 0.05	α = 0.01
$z_{1 - α}^{t}$	11	3.255	4.693	9.583
$z_{1 - α}^{t}$	15	2.720	3.695	6.518
$z_{1 - α}^{t}$	18	2.555	3.402	5.722

Results in Table 4 agree with those in Table 3, i.e. $H_{0, 1}$ is not rejected; therefore, the random factor is not significant.

4.2. Interaction

For the interaction between the fixed factor and the random one, we have

Y_{2}^{∙} = W_{2} A_{2} L^{+} Y = [\begin{matrix} 7.8572 \\ 0.0512 \end{matrix}]

and

B_{2} = W_{2} A_{2} L^{+} (L^{+})^{'} A_{2}^{'} W_{2}^{'} = [\begin{matrix} 0.012453695 & - 0.002286565 \\ - 0.002286565 & 0.017972370 \end{matrix}] .

For the numerator of the statistic $F_{2}$ , we obtain

(Y_{2}^{∙})^{'} (B_{2}^{- 1}) Y_{2}^{∙} = 5084.346.

Therefore, the statistic's value, $F_{2, O b s}$ , is given by

F_{2, O b s} = \frac{529}{3} \frac{5084.346}{131250.672} = 6.831.

If we use the common conditional distribution of $F_{2}$ , which corresponds to $F (z | 3, 529)$ , we obtain the quantiles given in Table 2. Since $F_{2, O b s} > z_{1 - α}$ , we reject $H_{0, 2}$ for the usual levels of significance.

Considering the truncated unconditional distribution, ${\bar{\bar{F}}}_{2, \bar{m}}$, which corresponds to ${\bar{\bar{F}}}_{1, \bar{m}}$ defined in (14), we obtained the quantiles, $z_{1 - α}^{t}$, given in Table 4. The results in this table lead us to:

reject $H_{0, 2}$ for $α = 0.1$ and 0.05 and not reject it for $α = 0.01$, considering $m^{∙} + 1 = 12$;
reject $H_{0, 2}$ for the usual levels of significance, considering $m^{∙} + 1 = 16$ or 19.

Table 3 shows the upper bounds for the quantiles with probability $1 - α$, $z_{1 - α}^{u}$, of the unconditional distribution. These results agree with those based on the quantiles of the truncated unconditional distribution. Assuming that the value of the test statistic remains unchanged, Table 5 presents the minimum values of $m^{∙}$ that ensure rejection.

Table 5. Minimum value of $m^{∙}$ that leads to rejection of the hypothesis $H_{0, 2}$.

Values of α	0.1	0.05	0.01
$m^{∙}$	8	9	15

Since for higher values of $m^{∙}$ we would get lower values for the quantiles, we have $F_{2, O b s} > z_{1 - α}^{u}$ for all $m^{∙} \geq 15$. In this case, we reject $H_{0, 2}$ considering the usual levels of significance, which means that the interaction between factors is significant.

4.3. Conclusion

Our discussion shows the relevance of the unconditional approach in avoiding false rejections. As we saw, the inference results for some situations depend on the approach. Since the unconditional approach is more conservative, when testing the interaction the null hypothesis is not rejected when $m^{∙} = 11$ and $α = 0.01$, whereas the common conditional approach would lead to a false rejection.

The results in Tables 3 and 4 show that for higher minimum sample sizes, we get smaller upper bounds and quantiles of the unconditional distribution. Due to this, we may conclude that with the increase of the minimum sample sizes, the decision based on both approaches is similar.

To finish we would like to note that all the computations were performed using the R software.

5. Final remarks

The approach followed in this paper is more realistic than the usual F tests for situations where it is not possible to know the sample sizes in advance. To apply it, we have to make assumptions regarding the distribution of the sample sizes, based on previous knowledge of the sample collection, and incorporate this source of variation into the mixed model. We chose the Poisson distribution since it corresponds to Poisson processes for observation collection, and the underlying assumptions for these (independent and stationary increments and no clustering) seem realistic. Moreover, the L extensions fit easily with the assumption of random sample sizes. This model formulation has been used to handle the unbalance originating from different numbers of observations per treatment, which causes non-orthogonality in fixed and mixed effects models. We included an application with cancer data to illustrate how straightforward it is to apply our approach in a medical context. The comparative results show that when random sample sizes are considered the critical values may exceed those of classical ANOVA (obtained when using the common conditional F distribution). So, we can conclude that this approach avoids working with incorrect critical values and thus carrying out tests without the proper level. We would also like to highlight that our methodology is not restricted to the medical domain and may be applied to several other research areas.

Acknowledgments

The authors would like to thank the anonymous referees for useful comments and suggestions.

Appendices.

Appendix 1. Truncated Poisson distributions

This appendix presents some results about the truncated Poisson distribution, which are useful in obtaining the unconditional distribution of the test statistics.

Since we need to have at least one observation per treatment, we will consider the common form of truncated Poisson distribution, which corresponds to the omission of the zero class, e.g. [7]. So we have $N_{i} \geq 1, i = 1, \dots, m$ . To perform inference, we also consider that $N > m,$ where $N = \sum_{i = 1}^{m} N_{i} .$

As previously mentioned, we assumed that $N_{i} \sim P (λ_{i})$ , $i = 1, \dots, m$ and $N \sim P (λ)$ . So we have

p_{r, i} = \mathrm{pr} (N_{i} = r | N_{i} \geq 1) = \frac{\mathrm{pr} (N_{i} = r)}{\mathrm{pr} (N_{i} \geq 1)} = \frac{e^{- λ_{i}} λ_{i}^{r} / r!}{1 - e^{- λ_{i}}} = \frac{e^{- λ_{i}}}{1 - e^{- λ_{i}}} \frac{λ_{i}^{r}}{r!}, r \geq 1, i = 1, \dots, m .

Therefore, the moment generating function of $N_{i}$ , when $N_{i} \geq 1$ , $i = 1, \dots, m$ , will be

φ_{i} (u) = \sum_{r = 1}^{\infty} \frac{e^{- λ_{i}}}{1 - e^{- λ_{i}}} \frac{λ_{i}^{r} e^{r u}}{r!} = \frac{e^{- λ_{i}}}{1 - e^{- λ_{i}}} (e^{λ_{i} e^{u}} - 1), i = 1, \dots, m,

and the probability generating functions

χ_{i} (z) = φ_{i} (\ln z) = \frac{e^{- λ_{i}}}{1 - e^{- λ_{i}}} (e^{λ_{i} z} - 1), i = 1, \dots, m .

With ${\ddot{N}}_{i}$ , $i = 1, \dots, m$ , the truncated variables $N_{i}$ , $i = 1, \dots, m$ , when $N_{i} \geq 1$ , and considering

\ddot{N} = \sum_{i = 1}^{m} {\ddot{N}}_{i},

we will obtain the probability generating function

\begin{aligned} \ddot{χ} (z) & = \prod_{i = 1}^{m} χ_{i} (z) = (\prod_{i = 1}^{m} \frac{e^{- λ_{i}}}{1 - e^{- λ_{i}}}) \prod_{i = 1}^{m} (e^{λ_{i} z} - 1) \\ & = (\prod_{i = 1}^{m} \frac{e^{- λ_{i}}}{1 - e^{- λ_{i}}}) \sum_{C \subseteq \bar{\bar{m}}} (- 1)^{m - ♯ (C)} e^{(\sum_{i \in C} λ_{i}) z}, \end{aligned}

where $\bar{\bar{m}} = \{1, \dots, m\}$ and $♯ (C)$ denotes the cardinality of $C$, any subset of $\bar{\bar{m}}$.

Therefore we will have

{\ddot{p}}_{r} = \mathrm{pr} (\ddot{N} = r) = \frac{1}{r!} (\prod_{i = 1}^{m} \frac{e^{- λ_{i}}}{1 - e^{- λ_{i}}}) \sum_{C \subseteq \bar{\bar{m}}} (- 1)^{m - ♯ (C)} {(\sum_{i \in C} λ_{i})}^{r}, r = m, \dots .
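The closed form for ${\ddot{p}}_{r}$ can be cross-checked against a direct convolution of the zero-truncated Poisson distributions. The following Python sketch does this for illustrative rates (not taken from the application); the function names are hypothetical.

```python
from itertools import combinations
from math import exp, factorial


def pmf_inclusion_exclusion(lams, r):
    """pr(N = r), r >= 1, from the closed form with the sum over subsets C."""
    m = len(lams)
    const = 1.0
    for lam in lams:
        const *= exp(-lam) / (1.0 - exp(-lam))
    total = 0.0
    for size in range(1, m + 1):      # the empty subset contributes 0 when r >= 1
        for subset in combinations(lams, size):
            total += (-1) ** (m - size) * sum(subset) ** r
    return const * total / factorial(r)


def pmf_by_convolution(lams, r):
    """Same probability by convolving the zero-truncated Poisson pmfs."""
    pmf = [1.0] + [0.0] * r
    for lam in lams:
        norm = 1.0 - exp(-lam)
        single = [0.0] + [exp(-lam) * lam ** k / factorial(k) / norm
                          for k in range(1, r + 1)]
        pmf = [sum(pmf[i] * single[n - i] for i in range(n + 1))
               for n in range(r + 1)]
    return pmf[r]


lams = [0.5, 1.2, 2.0]   # illustrative rates, m = 3
for r in range(1, 9):
    print(r, pmf_inclusion_exclusion(lams, r), pmf_by_convolution(lams, r))
```

For larger m the alternating sum runs over $2^{m}$ subsets and may lose accuracy through cancellation, so the convolution is the safer route in numerical work.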

It is interesting to observe that we have

{\ddot{χ}}^{⟨ s ⟩} (0) = 0, s = 1, \dots, m - 1,

where $⟨ s ⟩$ denotes the derivative of order s. This results from the fact that, whenever

j_{1} + \dots + j_{m} = s; s = 1, \dots, m - 1,

the vector $j = (j_{1}, \dots, j_{m})^{'}$ has one or more null components, together with $χ_{i} (0) = 0$, $i = 1, \dots, m .$

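Indeed, with $P_{s}^{(m)}$ the family of vectors $j = (j_{1}, \dots, j_{m})^{'}$ of non-negative integers such that $j_{1} + \dots + j_{m} = s$ , the general Leibniz rule for the derivative of a product gives

{\ddot{χ}}^{⟨ s ⟩} (0) = \sum_{j \in P_{s}^{(m)}} \frac{(\sum_{i = 1}^{m} j_{i})!}{\prod_{i = 1}^{m} j_{i}!} \prod_{i = 1}^{m} χ_{i}^{⟨ j_{i} ⟩} (0), s = 1, \dots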

and, if s<m, whatever $j \in P_{s}^{(m)}$ ,

\prod_{i = 1}^{m} χ_{i}^{⟨ j_{i} ⟩} (0) = 0

since $j$ has at least one null component. So, since ${\ddot{χ}}^{⟨ s ⟩} (0) = s! {\ddot{p}}_{s}$ , we obtain

{\ddot{p}}_{s} = \frac{1}{s!} {\ddot{χ}}^{⟨ s ⟩} (0) = 0, s \leq m - 1.

Furthermore, the only non-null term of ${\ddot{χ}}^{⟨ m ⟩} (0)$ corresponds to $j = 1_{m}$ , so

{\ddot{p}}_{m} = p r (\ddot{N} = m) = \frac{1}{m!} {\ddot{χ}}^{⟨ m ⟩} (0) = \prod_{i = 1}^{m} χ_{i}^{⟨ 1 ⟩} (0) = \prod_{i = 1}^{m} \frac{e^{- λ_{i}} λ_{i}}{1 - e^{- λ_{i}}}

and

p r (\ddot{N} > m) = 1 - {\ddot{p}}_{m} = 1 - \prod_{i = 1}^{m} \frac{e^{- λ_{i}} λ_{i}}{1 - e^{- λ_{i}}} .
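
The closed form for $p r (\ddot{N} > m)$ can be compared with a small Monte Carlo experiment. This is a sketch under stated assumptions: the truncated counts are drawn independently, and the sampler `sample_truncated_poisson`, the rates in `lams`, the seed and the number of draws are all illustrative choices rather than anything prescribed by the paper.

```python
import math
import random


def sample_truncated_poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) conditioned on being >= 1, by inverse-CDF search."""
    u = rng.random()
    r = 1
    p = lam * math.exp(-lam) / (1.0 - math.exp(-lam))  # pr(N = 1 | N >= 1)
    cdf = p
    # the cap on r guards against rounding when u is extremely close to 1
    while u > cdf and r < 1000:
        r += 1
        p *= lam / r  # recurrence p_r = p_{r-1} * lam / r
        cdf += p
    return r


def prob_sum_exceeds_m(lams: list[float]) -> float:
    """1 minus the product of lam_i e^{-lam_i} / (1 - e^{-lam_i})."""
    p_m = math.prod(l * math.exp(-l) / (1.0 - math.exp(-l)) for l in lams)
    return 1.0 - p_m


rng = random.Random(0)   # fixed seed so the run is reproducible
lams = [0.5, 1.0, 2.0]   # illustrative rates (assumption)
n_draws = 100_000
hits = sum(
    sum(sample_truncated_poisson(l, rng) for l in lams) > len(lams)
    for _ in range(n_draws)
)
estimate = hits / n_draws
exact = prob_sum_exceeds_m(lams)
```

With this many draws the simulated frequency `estimate` should agree with `exact` to roughly two decimal places; small rates make the event $\ddot{N} = m$ more likely, which is when the additional condition $\ddot{N} > m$ matters most.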

With $\ddot{\mathbf{N}}$ denoting the random vector with components ${\ddot{N}}_{1}, \dots, {\ddot{N}}_{m}$ , we have $\ddot{N} > m$ , that is, ${\ddot{N}}_{i} > 1$ for at least one $i \in \{1, \dots, m\}$ , if and only if $\ddot{\mathbf{N}} > 1_{m}$ , so

p r (\ddot{\mathbf{N}} > 1_{m}) = 1 - {\ddot{p}}_{m} .

We also have

{\ddot{p}}_{r, m} = p r (\ddot{N} = r | \ddot{N} > m) = \frac{{\ddot{p}}_{r}}{1 - {\ddot{p}}_{m}}, r = m + 1, \dots .

(A1)

Appendix 2. Frequency tables of types of cancer

Table A1. Males with stomach (digestive system) cancer.

Age	1–4	5–9	10–14	15–19	20–24	25–29	30–34	35–39	40–44
Mean age	2	7	12	17	22	27	32	37	42
Patients	0	0	0	0	0	0	0	0	1
Age	45–49	50–54	55–59	60–64	65–69	70–74	75–79	80–84	85+
Mean age	47	52	57	62	67	72	77	82	87
Patients	1	2	4	5	6	7	7	6	5

Table A2. Females with stomach (digestive system) cancer.

Age	1–4	5–9	10–14	15–19	20–24	25–29	30–34	35–39	40–44
Mean age	2	7	12	17	22	27	32	37	42
Patients	0	0	0	0	0	0	0	1	1
Age	45–49	50–54	55–59	60–64	65–69	70–74	75–79	80–84	85+
Mean age	47	52	57	62	67	72	77	82	87
Patients	2	2	2	3	3	3	4	4	5

Table A3. Males with melanomas of the skin.

Age	1–4	5–9	10–14	15–19	20–24	25–29	30–34	35–39	40–44
Mean age	2	7	12	17	22	27	32	37	42
Patients	0	0	0	0	1	2	2	4	6
Age	45–49	50–54	55–59	60–64	65–69	70–74	75–79	80–84	85+
Mean age	47	52	57	62	67	72	77	82	87
Patients	8	12	14	17	16	16	14	12	10

Table A4. Females with melanomas of the skin.

Age	1–4	5–9	10–14	15–19	20–24	25–29	30–34	35–39	40–44
Mean age	2	7	12	17	22	27	32	37	42
Patients	0	0	0	1	2	4	4	6	7
Age	45–49	50–54	55–59	60–64	65–69	70–74	75–79	80–84	85+
Mean age	47	52	57	62	67	72	77	82	87
Patients	10	10	10	10	8	7	7	6	7

Table A5. Males with non-Hodgkin lymphoma.

Age	1–4	5–9	10–14	15–19	20–24	25–29	30–34	35–39	40–44
Mean age	2	7	12	17	22	27	32	37	42
Patients	0	0	1	1	1	2	2	3	5
Age	45–49	50–54	55–59	60–64	65–69	70–74	75–79	80–84	85+
Mean age	47	52	57	62	67	72	77	82	87
Patients	8	10	12	14	15	14	14	12	9

Table A6. Females with non-Hodgkin lymphoma.

Age	1–4	5–9	10–14	15–19	20–24	25–29	30–34	35–39	40–44
Mean age	2	7	12	17	22	27	32	37	42
Patients	0	0	0	1	1	1	2	2	3
Age	45–49	50–54	55–59	60–64	65–69	70–74	75–79	80–84	85+
Mean age	47	52	57	62	67	72	77	82	87
Patients	5	7	9	11	13	13	13	12	12

Funding Statement

This work was partially supported by the FCT- Fundação para a Ciência e Tecnologia, under the projects UID/MAT/00212/2019 and UID/MAT/00297/2019.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

1. Bailey R.A., Ferreira S.S., Ferreira D., and Nunes C., Estimability of variance components when all model matrices commute, Linear Algebra Appl. 492 (2016), pp. 144–160. doi: 10.1016/j.laa.2015.11.002
2. Carvalho F., Mexia J.T., Santos C., and Nunes C., Inference for types and structured families of commutative orthogonal block structures, Metrika 78 (2015), pp. 337–372. doi: 10.1007/s00184-014-0506-8
3. Ferreira S., Ferreira D., Moreira E., and Mexia J.T., Inference for L orthogonal models, J. Interdiscip. Math. 12 (2009), pp. 815–824. doi: 10.1080/09720502.2009.10700666
4. Ferreira S.S., Ferreira D., Nunes C., and Mexia J.T., Estimation of variance components in linear mixed models with commutative orthogonal block structure, Rev. Colomb. Estadist. 36 (2013), pp. 261–271.
5. Heinzl F. and Tutz G., Clustering in linear mixed models with approximate Dirichlet process mixtures using EM algorithm, Stat. Modell. 13 (2013), pp. 41–67. doi: 10.1177/1471082X12471372
6. Houtman A.M. and Speed T.P., Balance in designed experiments with orthogonal block structure, Ann. Statist. 11 (1983), pp. 1069–1085. doi: 10.1214/aos/1176346322
7. Johnson N.L. and Kotz S., Discrete Distributions, John Wiley & Sons, New York, 1969.
8. Khuri A.I., Mathew T., and Sinha B.K., Statistical Tests for Mixed Linear Models, John Wiley & Sons, New York, 1998.
9. Lehmann E.L., Testing Statistical Hypotheses, John Wiley & Sons, New York, 1959.
10. Mexia J.T., Best linear unbiased estimates, duality of F tests and the Scheffé multiple comparison method in presence of controlled heterocedasticity, Comput. Stat. Data Anal. 10 (1990), pp. 271–281. doi: 10.1016/0167-9473(90)90007-5
11. Mexia J.T. and Moreira E., Randomized sample size F tests for the one-way layout, 8th International Conference on Numerical Analysis and Applied Mathematics 2010, AIP Conf. Proc. 1281(II), 2010, pp. 1248–1251.
12. Mexia J.T., Nunes C., Ferreira D., Ferreira S.S., and Moreira E., Orthogonal fixed effects ANOVA with random sample sizes, Proceedings of the 5th International Conference on Applied Mathematics, Simulation, Modelling (ASM'11), 2011, pp. 84–90.
13. Michalski A. and Zmyślony R., Testing hypothesis for variance components in mixed linear models, Statistics 27 (1996), pp. 297–310. doi: 10.1080/02331889708802533
14. Moreira E., Mexia J.T., Fonseca M., and Zmyślony R., L models and multiple regressions designs, Statist. Papers 50 (2009), pp. 869–885. doi: 10.1007/s00362-009-0255-3
15. Moreira E.E., Mexia J.T., and Minder C.E., F tests with random sample size. Theory and applications, Stat. Probab. Lett. 83 (2013), pp. 1520–1526. doi: 10.1016/j.spl.2013.02.020
16. Nunes C., Capistrano G., Ferreira D., Ferreira S.S., and Mexia J.T., One-way fixed effects ANOVA with missing observations, Proceedings of the 12th International Conference on Numerical Analysis and Applied Mathematics, AIP Conf. Proc. 1648, 2015, p. 110008.
17. Nunes C., Capistrano G., Ferreira D., Ferreira S.S., and Mexia J.T., Exact critical values for one-way fixed effects models with random sample sizes, J. Comput. Appl. Math. 354 (2019), pp. 112–122. doi: 10.1016/j.cam.2018.05.057
18. Nunes C., Ferreira D., Ferreira S.S., and Mexia J.T., F tests with random sample sizes, 8th International Conference on Numerical Analysis and Applied Mathematics, AIP Conf. Proc. 1281(II), 2010, pp. 1241–1244.
19. Nunes C., Ferreira D., Ferreira S.S., and Mexia J.T., F-tests with a rare pathology, J. Appl. Stat. 39 (2012), pp. 551–561. doi: 10.1080/02664763.2011.603293
20. Nunes C., Ferreira D., Ferreira S.S., and Mexia J.T., Fixed effects ANOVA: An extension to samples with random size, J. Stat. Comput. Simul. 84 (2014), pp. 2316–2328. doi: 10.1080/00949655.2013.791293
21. Scheffé H., The Analysis of Variance, Wiley Series in Probability and Statistics, John Wiley & Sons, New York, 1959.
22. Schott J.R., Matrix Analysis for Statistics, John Wiley & Sons, New York, 1997.
23. Searle S.R., Casella G., and McCulloch C.E., Variance Components, Wiley Series in Probability and Statistics, John Wiley & Sons, New York, 1992.
24. U.S. Cancer Statistics Working Group, United States Cancer Statistics: 1999–2010 Incidence and Mortality Web-based Report. Atlanta: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute, 2013. Available at https://nccd.cdc.gov/uscs/.
