Journal of Applied Statistics. 2019 Jul 14;47(13-15):2641–2657. doi: 10.1080/02664763.2019.1641188

Considering the sample sizes as truncated Poisson random variables in mixed effects models

Célia Nunes, Elsa Moreira, Sandra S. Ferreira, Dário Ferreira, João T. Mexia
PMCID: PMC9042002  PMID: 35707435

Abstract

When applying analysis of variance, the sample sizes may not be previously known, so it is more appropriate to consider them as realizations of random variables. A motivating example is the collection of observations during a fixed time span in a study comparing, for example, several pathologies of patients arriving at a hospital. This paper extends the theory of analysis of variance to those situations considering mixed effects models. We will assume that the occurrences of observations correspond to a counting process and the sample dimensions have Poisson distribution. The proposed approach is applied to a study of cancer patients.

Keywords: Random sample sizes, mixed effects, L extensions models, F-tests, counting processes, cancer registries

2010 Mathematics Subject Classifications: 62J12, 62J10, 62J99

1. Introduction

In some applications of analysis of variance in medicine, the social sciences, economics, agriculture, etc., it is more appropriate to regard the sample sizes as random variables. These situations commonly occur when there is a fixed time span for collecting the observations; other examples arise when some other resource is limited. A motivating example is the collection of data from patients with several pathologies arriving at a hospital during a fixed time span. The number of patients for each pathology is not known in advance, and a replication of the study during a different time period of the same length would result in a sample of different size. Therefore, if we plan to conduct just one study to compare the pathologies, it is more appropriate to consider the sample sizes as realizations, $n_1,\dots,n_m$, of random variables, $N_1,\dots,N_m$ [15,17,20]. Another important case arises when one of the pathologies is rare since, in that case, the desired number of patients in the sample may not be achieved [19]. In the cited studies, fixed effects ANOVA was applied. Here we extend those results to mixed effects models with random sample sizes.

The current approach must be based on an adequate choice of the distribution of $N_1,\dots,N_m$. In this paper, we will assume that the occurrence of observations corresponds to independent counting processes. An illustrative example of this is the aforementioned case, concerning the comparison of pathologies. This leads us to assume that $N_1,\dots,N_m$ are independent and Poisson distributed with parameters $\lambda_1,\dots,\lambda_m$, $N_i\sim P(\lambda_i)$, $i=1,\dots,m$ [12,15,17–20]. Since we need to have at least one observation per treatment, we will consider the random variables $\ddot{N}_i$, $i=1,\dots,m$, obtained by truncating the random variables $N_i$ to $N_i\ge 1$, $i=1,\dots,m$ (see Appendix 1). Through the independence of $\ddot{N}_i$, $i=1,\dots,m$, the variable $\ddot{N}=\sum_{i=1}^{m}\ddot{N}_i$ has truncated Poisson distribution with parameter

$$\lambda=\sum_{i=1}^{m}\lambda_i.$$

For different situations, it will be more appropriate to consider other discrete distributions for random sample sizes, such as

  • the Binomial distribution, when there exists an upper bound for the sample sizes, which however may not be attained (either owing to occurrences of failures or for some other reason). An illustrative example of this is when a planned number of patients are approached but only a proportion of them give consent to be included in the study [16,17];

  • the Negative Binomial distribution, which can be used as an alternative to the Poisson distribution in cases in which the observations are overdispersed with respect to a Poisson distribution.

This paper is structured as follows. In Section 2, we present the formulation of the mixed models in the context of random sample sizes. The test statistics and their conditional and unconditional distributions are obtained in Section 3. Section 4 presents an application based on real medical data, namely on patients affected by cancer, in order to illustrate the usefulness of our approach. Finally, some concluding remarks are made in Section 5.

2. Model

When the sample sizes in mixed models are regarded as random variables, we will very likely get different numbers of observations per treatment (combination of factor levels), that is, an unbalanced design. In order to cope with unbalanced situations, a broader class of models, designated as L extensions or L models, was developed in [3] and [14]. Using the L extensions in the formulation of mixed models with random sample sizes allows us to deal with the lack of orthogonality caused by the unbalanced situation.

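Let us suppose that the $m$ components of $\mathbf{Y}^o$ correspond to the treatments of a linear model and let

$$\mathbf{L}=\mathbf{L}(\mathbf{n})=D(\mathbf{1}_{n_1},\dots,\mathbf{1}_{n_m}) \tag{1}$$

be the block diagonal matrix with principal blocks $\mathbf{1}_{n_1},\dots,\mathbf{1}_{n_m}$, where $\mathbf{1}_n$ denotes the vector with all $n$ components equal to 1 and $\mathbf{n}=(n_1,\dots,n_m)$. Then

$$\mathbf{Y}=\mathbf{L}\mathbf{Y}^o+\boldsymbol{\varepsilon} \tag{2}$$

corresponds to a model with sample sizes $n_1,\dots,n_m$, where $\boldsymbol{\varepsilon}$ is the error vector with null mean vector and variance–covariance matrix $\sigma^2\mathbf{I}_n$, with $\mathbf{I}_n$ the $n\times n$ identity matrix and

$$n=\sum_{i=1}^{m}n_i.$$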

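Let us consider that

$$\mathbf{Y}^o=\sum_{i=0}^{w}\mathbf{X}_i\boldsymbol{\beta}_i, \tag{3}$$

where $\boldsymbol{\beta}_0$ is fixed with $c_0$ components and $\boldsymbol{\beta}_1,\dots,\boldsymbol{\beta}_w$ are random and independent, with null mean vectors and variance–covariance matrices $\sigma_1^2\mathbf{I}_{c_1},\dots,\sigma_w^2\mathbf{I}_{c_w}$, where $c_i$ denotes the number of components of $\boldsymbol{\beta}_i$, $i=1,\dots,w$. Thus $\mathbf{Y}^o$ has mean vector and variance–covariance matrix given by

$$\boldsymbol{\mu}^o=\mathbf{X}_0\boldsymbol{\beta}_0,\qquad \mathbf{V}^o=\sum_{i=1}^{w}\sigma_i^2\mathbf{M}_i,$$

with $\mathbf{M}_i=\mathbf{X}_i\mathbf{X}_i^\top$, $i=1,\dots,w$, where the matrices $\mathbf{X}_i$ have $m$ rows and $c_i$ columns, $i=0,\dots,w$; see e.g. [5,8,23]. We point out that $\mathbf{Y}^o$ and $\mathbf{Y}$ are random vectors with $m$ and $n$ components, respectively, since $\mathbf{L}$ is an $n\times m$ matrix.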

3. Test statistics and their distributions

In this section, we obtain the test statistics and their conditional distribution and unconditional distribution, under the assumption that we have random sample sizes. We will start by presenting some important results about L extensions.

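Let us assume that $\mathbf{Y}^o$ has orthogonal block structure, so the matrices $\mathbf{M}_1,\dots,\mathbf{M}_w$ commute and are linear combinations of pairwise orthogonal projection matrices $\mathbf{K}_1,\dots,\mathbf{K}_\ell$, see [2]. Thus we have

$$\mathbf{M}_i=\sum_{j=1}^{\ell}b_{i,j}\mathbf{K}_j,\quad i=1,\dots,w,$$

and

$$\mathbf{V}^o=\sum_{j=1}^{\ell}\gamma_j\mathbf{K}_j,$$

where $\gamma_j=\sum_{i=1}^{w}b_{i,j}\sigma_i^2$, $j=1,\dots,\ell$. With $\mathbf{B}=[b_{i,j}]$, $\boldsymbol{\gamma}=(\gamma_1,\dots,\gamma_\ell)$ and $\boldsymbol{\sigma}^2=(\sigma_1^2,\dots,\sigma_w^2)$, we also have

$$\boldsymbol{\gamma}=\mathbf{B}^\top\boldsymbol{\sigma}^2,$$

see e.g. [1,2,4,6]. Let the row vectors of $\mathbf{A}_j$, $j=1,\dots,\ell$, constitute an orthonormal basis for the range space of $\mathbf{K}_j$, $R(\mathbf{K}_j)$, $j=1,\dots,\ell$; then we have

$$\mathbf{K}_j=\mathbf{A}_j^\top\mathbf{A}_j,\qquad \mathbf{I}_{g_j}=\mathbf{A}_j\mathbf{A}_j^\top,\quad j=1,\dots,\ell,$$

with $g_j=\operatorname{rank}(\mathbf{K}_j)$.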

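Let $\mathbf{L}^+$ be the Moore–Penrose inverse of the matrix $\mathbf{L}$; then the orthogonal projection matrices (OPM) on $\bar{\Omega}=R(\mathbf{L})$ and on its orthogonal complement $\bar{\Omega}^\perp$ are [22]

$$\mathbf{L}\mathbf{L}^+=\mathbf{T}\quad\text{and}\quad\mathbf{I}_n-\mathbf{T}.$$

So, with $\mathbf{L}=D(\mathbf{1}_{n_1},\dots,\mathbf{1}_{n_m})$, we have

$$\mathbf{L}^+=D\!\left(\tfrac{1}{n_1}\mathbf{1}_{n_1}^\top,\dots,\tfrac{1}{n_m}\mathbf{1}_{n_m}^\top\right).$$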
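As a minimal sketch of the two matrices just defined, the following Python code builds $\mathbf{L}$ and $\mathbf{L}^+$ for a toy vector of sample sizes and checks numerically that $\mathbf{L}^+\mathbf{L}$ is the identity and that $\mathbf{L}^+\mathbf{Y}$ returns the treatment sample means. The sizes `(2, 3)`, the observations `y` and all function names are illustrative assumptions, not part of the paper.

```python
def build_L(sizes):
    """Block diagonal matrix D(1_{n_1}, ..., 1_{n_m}) as a list of rows (n x m)."""
    m = len(sizes)
    rows = []
    for i, n_i in enumerate(sizes):
        for _ in range(n_i):
            rows.append([1.0 if j == i else 0.0 for j in range(m)])
    return rows


def build_L_plus(sizes):
    """Moore-Penrose inverse D(1'_{n_1}/n_1, ..., 1'_{n_m}/n_m) as rows (m x n)."""
    n = sum(sizes)
    rows = []
    start = 0
    for n_i in sizes:
        rows.append([1.0 / n_i if start <= k < start + n_i else 0.0 for k in range(n)])
        start += n_i
    return rows


def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


sizes = (2, 3)                                # hypothetical sample sizes n_1, n_2
y = [[1.0], [3.0], [2.0], [4.0], [6.0]]       # hypothetical observations (column vector)

L = build_L(sizes)
L_plus = build_L_plus(sizes)

identity = matmul(L_plus, L)                  # numerically the 2 x 2 identity
means = matmul(L_plus, y)                     # treatment means of the toy data
```

Storing the matrices as nested lists keeps the sketch dependency-free; with real data one would normally compute the group means directly rather than form $\mathbf{L}^+$ explicitly.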

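When $\mathbf{Y}^o$ is independent of $\boldsymbol{\varepsilon}\sim N(\mathbf{0},\sigma^2\mathbf{I}_n)$, i.e. $\boldsymbol{\varepsilon}$ is normal with null mean vector and variance–covariance matrix $\sigma^2\mathbf{I}_n$, then $\mathbf{T}\boldsymbol{\varepsilon}$ and $(\mathbf{I}_n-\mathbf{T})\boldsymbol{\varepsilon}$ are also independent, since they have a joint normal distribution and a null cross-covariance matrix. Therefore

$$\mathbf{T}\mathbf{Y}=\mathbf{T}\mathbf{L}\mathbf{Y}^o+\mathbf{T}\boldsymbol{\varepsilon}=\mathbf{L}\mathbf{Y}^o+\mathbf{T}\boldsymbol{\varepsilon}$$

and

$$\mathbf{Y}_{\bar{\Omega}^\perp}=(\mathbf{I}_n-\mathbf{T})\mathbf{Y}=(\mathbf{I}_n-\mathbf{T})\boldsymbol{\varepsilon}$$

are independent.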

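Since the column vectors of $\mathbf{L}$ are linearly independent, we have [22]

$$\mathbf{L}^+\mathbf{L}=\mathbf{I}_m.$$

So we can consider [3]

$$\mathbf{Y}^{oo}=\mathbf{L}^+\mathbf{Y}=\mathbf{Y}^o+\mathbf{L}^+\boldsymbol{\varepsilon}=\mathbf{Y}^o+\mathbf{L}^+\mathbf{T}\boldsymbol{\varepsilon},$$

since $\mathbf{L}^+\mathbf{T}=\mathbf{L}^+\mathbf{L}\mathbf{L}^+=\mathbf{L}^+$. The vector $\mathbf{Y}^{oo}$ is independent of $\mathbf{Y}_{\bar{\Omega}^\perp}$ and hence independent of

$$S=\|\mathbf{Y}_{\bar{\Omega}^\perp}\|^2, \tag{4}$$

where $(1/\sigma^2)S$ has chi-square distribution with

$$g(n)=n-m$$

degrees of freedom, $S\sim\sigma^2\chi^2_{g(n)}$.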
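Because $\mathbf{T}=\mathbf{L}\mathbf{L}^+$ replaces each observation by the mean of its treatment, $S$ in (4) is the usual within-treatment (residual) sum of squares. A minimal Python sketch, with hypothetical data and function names chosen only for illustration:

```python
def residual_sum_of_squares(groups):
    """S = ||(I_n - T) Y||^2: squared deviations from each treatment mean."""
    total = 0.0
    for observations in groups:
        mean = sum(observations) / len(observations)
        total += sum((value - mean) ** 2 for value in observations)
    return total


def residual_degrees_of_freedom(groups):
    """g(n) = n - m, with n observations in total and m treatments."""
    return sum(len(observations) for observations in groups) - len(groups)


groups = [[1.0, 3.0], [2.0, 4.0, 6.0]]   # hypothetical samples for m = 2 treatments
S = residual_sum_of_squares(groups)      # (1 + 1) + (4 + 0 + 4) = 10.0
g = residual_degrees_of_freedom(groups)  # 5 - 2 = 3
```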

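Let us now observe that $\mathbf{Y}^{oo}$ has mean vector and variance–covariance matrix given by

$$\boldsymbol{\mu}^{oo}=\boldsymbol{\mu}^o=\mathbf{X}_0\boldsymbol{\beta}_0,\qquad \mathbf{V}^{oo}=\mathbf{V}^o+\sigma^2\mathbf{L}^+(\mathbf{L}^+)^\top=\sum_{j=1}^{\ell}\gamma_j\mathbf{K}_j+\sigma^2\mathbf{L}^+(\mathbf{L}^+)^\top.$$

With $\mathbf{L}=D(\mathbf{1}_{n_1},\dots,\mathbf{1}_{n_m})$, we will have

$$\mathbf{L}^+(\mathbf{L}^+)^\top=D(n_1^{-1},\dots,n_m^{-1})$$

and

$$\mathbf{Y}_j^{oo}=\mathbf{A}_j\mathbf{Y}^{oo},\quad j=1,\dots,\ell,$$

has mean vector and variance–covariance matrix

$$\boldsymbol{\mu}_j^{oo}=\mathbf{A}_j\boldsymbol{\mu}^o=\mathbf{A}_j\mathbf{X}_0\boldsymbol{\beta}_0,\qquad \mathbf{V}_j^{oo}=\gamma_j\mathbf{I}_{g_j}+\sigma^2\mathbf{A}_j\big(\mathbf{L}^+(\mathbf{L}^+)^\top\big)\mathbf{A}_j^\top,\quad j=1,\dots,\ell.$$

Let $\mathbf{P}_j$ and $\mathbf{Q}_j$ be the OPM on $R(\mathbf{A}_j\mathbf{X}_0)$ and $R(\mathbf{A}_j\mathbf{X}_0)^\perp$, with ranks $p_j$ and $f_j=g_j-p_j$, $j=1,\dots,\ell$, respectively, and let $\mathbf{S}_j$ and $\mathbf{W}_j$ be matrices whose row vectors constitute orthonormal bases for $R(\mathbf{A}_j\mathbf{X}_0)$ and $R(\mathbf{A}_j\mathbf{X}_0)^\perp$, $j=1,\dots,\ell$. Then we have

$$\mathbf{P}_j=\mathbf{S}_j^\top\mathbf{S}_j,\qquad \mathbf{Q}_j=\mathbf{W}_j^\top\mathbf{W}_j,\quad j=1,\dots,\ell.$$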

3.1. Fixed sample sizes

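Let us now address the hypothesis tests for the canonical variance components [13], $\gamma_1,\dots,\gamma_\ell$, assuming that, with $0\le z<\ell$,

$$p_j<g_j,\quad j=z+1,\dots,\ell.$$

So, let us consider

$$\mathbf{Y}_j=\mathbf{W}_j\mathbf{Y}_j^{oo}=\mathbf{W}_j\mathbf{A}_j\mathbf{Y}^o+\mathbf{W}_j\mathbf{A}_j\mathbf{L}^+\boldsymbol{\varepsilon},\quad j=z+1,\dots,\ell,$$

which has null mean vector and variance–covariance matrix $\gamma_j\mathbf{I}_{f_j}+\sigma^2\mathbf{B}_j$, $j>z$, with

$$\mathbf{B}_j=\mathbf{W}_j\mathbf{A}_j\mathbf{L}^+(\mathbf{L}^+)^\top\mathbf{A}_j^\top\mathbf{W}_j^\top,\quad j=z+1,\dots,\ell.$$

We intend to test the hypotheses

$$H_{0,j}:\gamma_j=0,\quad j=z+1,\dots,\ell. \tag{5}$$

When $H_{0,j}$ holds, we have

$$\mathrm{pr}(\mathbf{W}_j\mathbf{A}_j\mathbf{Y}^o=\mathbf{0})=1,\quad j=z+1,\dots,\ell,$$

and consequently

$$\mathrm{pr}(\mathbf{Y}_j=\mathbf{W}_j\mathbf{A}_j\mathbf{L}^+\boldsymbol{\varepsilon})=1,\quad j=z+1,\dots,\ell.$$

Therefore, when $H_{0,j}$ holds, $\mathbf{Y}_j$ has null mean vector and variance–covariance matrix $\sigma^2\mathbf{B}_j$, $j=z+1,\dots,\ell$, and $(1/\sigma^2)\mathbf{Y}_j^\top\mathbf{B}_j^{-1}\mathbf{Y}_j$ has chi-square distribution with $f_j$ degrees of freedom, $\mathbf{Y}_j^\top\mathbf{B}_j^{-1}\mathbf{Y}_j\sim\sigma^2\chi^2_{f_j}$, $j=z+1,\dots,\ell$ [10].

Since $\mathbf{Y}_j^{oo}$ is independent of $S$, $\mathbf{Y}_j$ is also independent of $S$, $j=z+1,\dots,\ell$. Hence, when $H_{0,j}$ holds, the statistic

$$F_j=\frac{g(n)}{f_j}\,\frac{\mathbf{Y}_j^\top\mathbf{B}_j^{-1}\mathbf{Y}_j}{S},\quad j=z+1,\dots,\ell, \tag{6}$$

has central F distribution with $f_j$, $j=z+1,\dots,\ell$, and $g(n)$ degrees of freedom, $F(\cdot\mid f_j,g(n))$, referred to as the conditional distribution, and $F_j$ may be used as the test statistic [21]. Moreover, the tests with the statistic $F_j$, $j=z+1,\dots,\ell$, are unbiased, e.g. [9,10].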

3.2. Random sample sizes

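Let us consider that $\mathbf{n}$ is the realization of a random vector $\ddot{\mathbf{N}}=(\ddot{N}_1,\dots,\ddot{N}_m)$, which means that the samples will have random dimensions. In this section, we will focus on the case where

$$\mathbf{L}(\ddot{\mathbf{N}})=D(\mathbf{1}_{\ddot{N}_1},\dots,\mathbf{1}_{\ddot{N}_m});$$

for this reason the previous results need to be unconditioned with respect to $\ddot{\mathbf{N}}$.

Let us now suppose that we intend to test the hypothesis

$$H_0:\theta=0,$$

where $\theta$ is a general parameter, and that the test is unbiased whatever $\mathbf{n}$. So, denoting by $\mathrm{pr}_{\mathbf{n},\theta}(\mathrm{Rej}_\alpha)$ [$\mathrm{pr}_{\mathbf{n},0}(\mathrm{Rej}_\alpha)$] the probability of rejecting $H_0$ for a significance level $\alpha$, given $\mathbf{n}$ and the parameter $\theta$ [the probability of rejecting $H_0$, given $\mathbf{n}$ and $\theta=0$], we have

$$\mathrm{pr}_{\mathbf{n},\theta}(\mathrm{Rej}_\alpha)>\mathrm{pr}_{\mathbf{n},0}(\mathrm{Rej}_\alpha). \tag{7}$$

Unconditioning (7) with respect to $\ddot{\mathbf{N}}$, we still obtain

$$\mathrm{pr}_{\theta}(\mathrm{Rej}_\alpha)>\mathrm{pr}_{0}(\mathrm{Rej}_\alpha),$$

and the test remains unbiased.

So, since the tests for the hypotheses $H_{0,j}:\gamma_j=0$, $j=z+1,\dots,\ell$, are unbiased whatever $\mathbf{n}$, we can conclude that they remain unbiased after unconditioning.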

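Let us assume that the occurrence of observations corresponds to independent counting processes, which leads us to consider that $\ddot{N}_1,\dots,\ddot{N}_m$ have truncated Poisson distributions with parameters $\lambda_i$, $i=1,\dots,m$. Furthermore, to perform inference we also consider that $\ddot{N}=\sum_{i=1}^{m}\ddot{N}_i>m$.

In order to avoid unbalanced cases we will assume that we have a global minimum dimension for the samples [12,20]. Therefore, considering $\ddot{N}>m^\bullet$, with $m^\bullet\ge m$, we may take the probability

$$\ddot{p}_{n,m^\bullet}=\mathrm{pr}(\ddot{N}=n\mid\ddot{N}>m^\bullet)=\frac{\mathrm{pr}(\ddot{N}=n)}{\mathrm{pr}(\ddot{N}>m^\bullet)}=\frac{\ddot{p}_n}{\mathrm{pr}(\ddot{N}>m)}\cdot\frac{\mathrm{pr}(\ddot{N}>m)}{\mathrm{pr}(\ddot{N}>m^\bullet)}=\frac{\ddot{p}_n}{1-\ddot{p}_m}\cdot\frac{1-\ddot{p}_m}{1-\sum_{h=m}^{m^\bullet}\ddot{p}_h}=\ddot{p}_{n,m}\,\frac{1-\ddot{p}_m}{1-\sum_{h=m}^{m^\bullet}\ddot{p}_h},\quad n=m^\bullet+1,\dots,$$

where

$$\ddot{p}_{n,m}=\frac{\ddot{p}_n}{1-\ddot{p}_m},\quad n=m+1,\dots, \tag{8}$$

as defined in (A1), Appendix 1, which is dedicated to the truncated Poisson distribution.

Consequently, the unconditional distribution of $F_j$, $j=z+1,\dots,\ell$, when the hypothesis $H_{0,j}$ holds, will be given by, e.g. [12,20],

$$\bar{\bar{F}}_j(z)=\sum_{n=m^\bullet+1}^{\infty}\mathrm{pr}(\ddot{N}=n\mid\ddot{N}>m^\bullet)\,F(z\mid f_j,g(n))=\sum_{n=m^\bullet+1}^{\infty}\ddot{p}_{n,m^\bullet}\,F(z\mid f_j,g(n)),\quad j=z+1,\dots,\ell. \tag{9}$$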

4. An application to real data

In this section, we apply the proposed methodology to a dataset of patients affected by cancer. The data were collected from the U.S. Cancer Statistics Working Group [24] according to official guidelines and refer to the age of disease detection in 2009. We compare the results obtained using our approach with those of the common ANOVA.

We will consider a mixed model with one fixed and one random effects factors. The fixed effects factor will be the Gender, with two levels (Male and Female). Due to the large number of cancer types we resorted to the simple random sampling method to select three different types of cancer from the available list. Thus the random effects factor will be the Type of Cancer and the selected types constitute a random sample.

Table 1 illustrates the types of cancer which have been selected, the number of patients and the mean ages at the time of disease detection. This leads to m=2×3=6 different treatments. The global frequencies of these three types of cancer, for males and females, are provided in Appendix 2.

Table 1. Number of patients and sample mean ages.

| Type of cancer | Patients (Male) | Patients (Female) | Sample mean age (Male) | Sample mean age (Female) |
|---|---|---|---|---|
| Stomach (digestive system) | 44 | 30 | 70.523 | 68.833 |
| Melanomas of the skin | 134 | 99 | 63.791 | 57.303 |
| Non-Hodgkin lymphoma | 123 | 105 | 63.382 | 66.286 |

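According to (3), in this particular example we have

$$\mathbf{Y}^o=\mathbf{X}_0\boldsymbol{\beta}_0+\mathbf{X}_1\boldsymbol{\beta}_1+\mathbf{X}_2\boldsymbol{\beta}_2, \tag{10}$$

where $\boldsymbol{\beta}_0$ is fixed and $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ are random and independent, corresponding, respectively, to the random effects factor (Type of cancer) and to the interaction between the two factors. We have the design matrices

$$\mathbf{X}_0=\mathbf{I}_2\otimes\mathbf{1}_3,\qquad \mathbf{X}_1=\mathbf{1}_2\otimes\mathbf{I}_3,\qquad \mathbf{X}_2=\mathbf{I}_2\otimes\mathbf{I}_3,$$

where $\otimes$ denotes the Kronecker product, and

$$\mathbf{M}_1=\mathbf{J}_2\otimes\mathbf{I}_3,\qquad \mathbf{M}_2=\mathbf{I}_2\otimes\mathbf{I}_3.$$

Let us write

$$\mathbf{M}_1=2\mathbf{K}_1,\qquad \mathbf{M}_2=\mathbf{K}_1+\mathbf{K}_2,$$

which means that

$$\mathbf{K}_1=\tfrac{1}{2}\mathbf{M}_1=\tfrac{1}{2}\mathbf{J}_2\otimes\mathbf{I}_3,\qquad \mathbf{K}_2=\mathbf{M}_2-\tfrac{1}{2}\mathbf{M}_1=\left(\mathbf{I}_2-\tfrac{1}{2}\mathbf{J}_2\right)\otimes\mathbf{I}_3,$$

and consequently the matrices $\mathbf{A}_j$, $j=1,2$, will be given by

$$\mathbf{A}_1=\frac{1}{\sqrt{2}}\begin{bmatrix}1&0&0&1&0&0\\0&1&0&0&1&0\\0&0&1&0&0&1\end{bmatrix},\qquad \mathbf{A}_2=\frac{1}{\sqrt{2}}\begin{bmatrix}1&0&0&-1&0&0\\0&1&0&0&-1&0\\0&0&1&0&0&-1\end{bmatrix},$$

and

$$\mathbf{A}_1\mathbf{X}_0=\frac{1}{\sqrt{2}}\,\mathbf{1}_2^\top\otimes\mathbf{1}_3,\qquad \mathbf{A}_2\mathbf{X}_0=\begin{bmatrix}\frac{1}{\sqrt{2}}&-\frac{1}{\sqrt{2}}\end{bmatrix}\otimes\mathbf{1}_3.$$

The matrices $\mathbf{Q}_j$, $j=1,2$, which are the OPM on $R(\mathbf{A}_j\mathbf{X}_0)^\perp$, $j=1,2$, will be given by

$$\mathbf{Q}_1=\mathbf{W}_1^\top\mathbf{W}_1=\mathbf{I}_3-\tfrac{1}{3}\mathbf{J}_3,\qquad \mathbf{Q}_2=\mathbf{W}_2^\top\mathbf{W}_2=\mathbf{I}_3-\tfrac{1}{3}\mathbf{J}_3,$$

with $\mathbf{J}_r=\mathbf{1}_r\mathbf{1}_r^\top$ and

$$\mathbf{W}_1=\mathbf{W}_2=\begin{bmatrix}\frac{1}{\sqrt{2}}&0&-\frac{1}{\sqrt{2}}\\[2pt]\frac{1}{\sqrt{6}}&-\frac{2}{\sqrt{6}}&\frac{1}{\sqrt{6}}\end{bmatrix}.$$

Moreover, $f_1=\operatorname{rank}(\mathbf{Q}_1)=3$ and $f_2=\operatorname{rank}(\mathbf{Q}_2)=3$. Besides this, the OPM on $R(\mathbf{A}_j\mathbf{X}_0)$, $j=1,2$, are

$$\mathbf{P}_1=\mathbf{A}_1\mathbf{X}_0(\mathbf{A}_1\mathbf{X}_0)^+=\frac{1}{3}\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix},\qquad \mathbf{P}_2=\mathbf{A}_2\mathbf{X}_0(\mathbf{A}_2\mathbf{X}_0)^+=\frac{1}{3}\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}.$$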

We will test the hypotheses

$$H_{0,j}:\gamma_j=0,\quad j=1,2,$$

which are the hypotheses of absence of random effects and of interaction between the two factors.

Given $\ddot{\mathbf{N}}=\mathbf{n}$, when $H_{0,j}$, $j=1,2$, holds, the conditional distribution of

$$F_j=\frac{g(n)}{3}\,\frac{\mathbf{Y}_j^\top\mathbf{B}_j^{-1}\mathbf{Y}_j}{S},\quad j=1,2,$$

is a central F distribution with $f_j=\operatorname{rank}(\mathbf{Q}_j)=3$, $j=1,2$, and $g(n)=n-6$ degrees of freedom, $F(\cdot\mid 3,n-6)$.

In the calculations, we assume that

$$\sum_{n=0}^{m^\bullet}\ddot{p}_{n,m}\approx 0,$$

which means that, with high probability, we have $\ddot{N}>m^\bullet$, so $m^\bullet+1$ is the global minimum dimension for the samples. Therefore the unconditional distribution of the statistics will be given by

$$\bar{\bar{F}}_j(z)=\sum_{n=m^\bullet+1}^{\infty}\ddot{p}_{n,m^\bullet}\,F(z\mid 3,n-6),\quad j=1,2. \tag{11}$$

Besides this, due to the monotonicity property of the F distribution [12], when $n<n^o$, we have

$$F(z\mid 3,n-6)<F(z\mid 3,n^o-6), \tag{12}$$

so that

$$F(z\mid 3,m^\bullet+1-6)\le\bar{\bar{F}}_j(z)\le 1,$$

which gives us a lower bound for $\bar{\bar{F}}_j(z)$. Thus, from $F(z\mid 3,m^\bullet-5)$, we can obtain upper bounds for the quantiles of the unconditional distributions $\bar{\bar{F}}_j(z)$, $j=1,2$. If we use these upper bounds as critical values, we will have tests with sizes that do not exceed the theoretical values.

Remark

  • We can use these upper bounds for a preliminary test. If the test statistic exceeds the upper bound, it also exceeds the real critical value (obtained when using the unconditional distribution). When the test statistic is lower than the upper bound, one must compute the critical value by solving the equation $\bar{\bar{F}}_j(z)=1-\alpha$ for $z$, $j=1,2$. To solve it we may truncate the series in Equation (11) according to the rule established in [11,19]. This way, restricting the sum to the term $\bar{m}=\sum_{i=1}^{m}\bar{m}_i$, with $n_i\le\bar{m}_i$, where the $n_i$ are the realizations of the $\ddot{N}_i$, $i=1,\dots,m$, we will have
    $$\bar{\bar{F}}_{j,\bar{m}}(z)=\sum_{n=m^\bullet+1}^{\bar{m}}\ddot{p}_{n,m^\bullet}\,F(z\mid 3,n-6),\quad j=1,2.$$
    Considering $\epsilon$ small, we choose each $\bar{m}_i$ such that
    $$\sum_{n_i=0}^{\bar{m}_i}e^{-\lambda_i}\frac{\lambda_i^{n_i}}{n_i!}>1-\epsilon\iff\epsilon>1-\sum_{n_i=0}^{\bar{m}_i}e^{-\lambda_i}\frac{\lambda_i^{n_i}}{n_i!},\quad i=1,\dots,m. \tag{13}$$
    This inequality will be used to obtain the minimum value of $\bar{m}$ needed for $\bar{\bar{F}}_{j,\bar{m}}(z)$ to be a good approximation of the distribution $\bar{\bar{F}}_j(z)$, $j=1,2$ [11].
  • Usually the analysis starts with a test of interaction and proceeds to the tests of the main effects whenever the interaction is not significant. We do not follow this approach since we are interested in showing how these tests could be carried out through unconditioning [20].

4.1. Random effects factor

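For the second factor, we have

$$\mathbf{Y}_1=\mathbf{W}_1\mathbf{A}_1\mathbf{L}^+\mathbf{Y}=\begin{bmatrix}-1.1255\\-1.8846\end{bmatrix},$$

where

$$\mathbf{L}^+=D\!\left(\tfrac{1}{44}\mathbf{1}_{44}^\top,\tfrac{1}{30}\mathbf{1}_{30}^\top,\tfrac{1}{134}\mathbf{1}_{134}^\top,\tfrac{1}{99}\mathbf{1}_{99}^\top,\tfrac{1}{123}\mathbf{1}_{123}^\top,\tfrac{1}{105}\mathbf{1}_{105}^\top\right),$$

with $\mathbf{L}^+\mathbf{Y}$ the vector of the sample means with components 70.523, 68.833, 63.791, 57.303, 63.382, 66.286 and

$$\mathbf{B}_1=\mathbf{W}_1\mathbf{A}_1\mathbf{L}^+(\mathbf{L}^+)^\top\mathbf{A}_1^\top\mathbf{W}_1^\top=\begin{bmatrix}0.012453695&0.002286565\\0.002286565&0.017972370\end{bmatrix}.$$

So, for the numerator of the statistic $F_1$ we obtain

$$\mathbf{Y}_1^\top\mathbf{B}_1^{-1}\mathbf{Y}_1=262.120.$$

When $\ddot{\mathbf{N}}=\mathbf{n}$, $S=\|\mathbf{Y}_{\bar{\Omega}^\perp}\|^2$ is the product by $\sigma^2$ of a central chi-square with $g(n)=n-6$ degrees of freedom, $\sigma^2\chi^2_{n-6}$. In this case, we obtained $S=131250.672$.

Therefore, the statistic's value, $F_{1,\mathrm{Obs}}$, is given by

$$F_{1,\mathrm{Obs}}=\frac{529}{3}\cdot\frac{262.120}{131250.672}=0.352.$$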

If we use the common conditional distribution of $F_1$, which corresponds to $F(z\mid 3,529)$, since $n=535$, we will obtain the quantiles given in Table 2.

Table 2. The quantiles of the conditional distribution.

| Values of $\alpha$ | 0.1 | 0.05 | 0.01 |
|---|---|---|---|
| $z_{1-\alpha}$ | 2.094 | 2.622 | 3.819 |

So, since $F_{1,\mathrm{Obs}}<z_{1-\alpha}$, we do not reject $H_{0,1}$ for the usual levels of significance.

Let us assume that we have 12 [16 and 19] observations as global minimum dimensions for the samples, which means that we consider $m^\bullet+1=12\iff m^\bullet=11$ [$m^\bullet=15$ and $m^\bullet=18$]. Table 3 shows the upper bounds for the quantiles with probability $1-\alpha$, $z^u_{1-\alpha}$, of the unconditional distribution $\bar{\bar{F}}_1(z)$.

Table 3. Upper bounds for the quantiles, $z^u_{1-\alpha}$.

| Values of $\alpha$ | 0.1 | 0.05 | 0.01 |
|---|---|---|---|
| $m^\bullet=11$ | 3.289 | 4.757 | 9.779 |
| $m^\bullet=15$ | 2.728 | 3.708 | 6.552 |
| $m^\bullet=18$ | 2.560 | 3.410 | 5.739 |
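The upper bounds are quantiles of the central F distribution with 3 and (global minimum sample size minus 6) degrees of freedom, so they can be recomputed with any F quantile routine. The sketch below is one hedged way to do this in plain Python: it evaluates the F distribution function through the regularized incomplete beta function (continued-fraction method) and inverts it by bisection. All function names, the bisection bracket and the tolerances are assumptions made for illustration; the paper's own computations were done in R.

```python
from math import exp, lgamma, log


def _beta_continued_fraction(a, b, x, max_iter=500, eps=1e-13):
    """Continued fraction of the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for i in range(1, max_iter + 1):
        i2 = 2 * i
        coefficients = (
            i * (b - i) * x / ((qam + i2) * (a + i2)),            # even step
            -(a + i) * (qab + i) * x / ((a + i2) * (qap + i2)),   # odd step
        )
        for aa in coefficients:
            d = 1.0 + aa * d
            if abs(d) < tiny:
                d = tiny
            c = 1.0 + aa / c
            if abs(c) < tiny:
                c = tiny
            d = 1.0 / d
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_cdf(z, d1, d2):
    """Distribution function F(z | d1, d2) of the central F distribution."""
    if z <= 0.0:
        return 0.0
    return regularized_incomplete_beta(d1 / 2.0, d2 / 2.0, d1 * z / (d1 * z + d2))


def f_quantile(p, d1, d2, low=0.0, high=1e4, tol=1e-10):
    """Quantile by bisection; assumes the quantile lies in (low, high)."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if f_cdf(mid, d1, d2) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def upper_bound(alpha, minimum_total, treatments=6, f_num=3):
    """Quantile of F(. | f_num, minimum_total + 1 - treatments), used as upper bound."""
    return f_quantile(1.0 - alpha, f_num, minimum_total + 1 - treatments)


bounds = {
    m_min: [round(upper_bound(alpha, m_min), 3) for alpha in (0.1, 0.05, 0.01)]
    for m_min in (11, 15, 18)
}
```

The dictionary `bounds` can be compared row by row with Table 3; small differences in the last digit may arise from rounding.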

It is to be expected that the quantiles for random sample sizes (obtained when using the unconditional distribution) exceed the classical ones (obtained when using the common conditional distribution), since the former take into account a new source of variation. Since in this case we do not reject the hypothesis using the classical quantiles, the same result is expected when using the quantiles for random sample sizes and, consequently, the upper bounds. This interpretation leads us not to reject $H_{0,1}$.

The quantiles for the unconditional distribution are approximated by truncation of the infinite series indicated in Equation (11). We obtained the minimum value $\bar{m}=38$ for a truncation error not greater than $10^{-8}$ ($\epsilon\le 10^{-8}$). To carry out the computation, we assumed that $\lambda_i$, $i=1,\dots,6$, are the daily averages of occurrences per year. So we have $\lambda_1=0.13$, $\lambda_2=0.09$, $\lambda_3=0.37$, $\lambda_4=0.28$, $\lambda_5=0.34$, $\lambda_6=0.29$.
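The truncation point can be retraced with a few lines of Python. The sketch assumes that rule (13) is applied separately to each (untruncated) Poisson distribution with $\epsilon=10^{-8}$ and that the individual truncation points are then added; the function names are illustrative and not taken from the paper.

```python
from math import exp


def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam), summed term by term."""
    term = exp(-lam)
    total = term
    for r in range(1, k + 1):
        term *= lam / r
        total += term
    return total


def truncation_point(lam, eps):
    """Smallest k with P(N <= k) > 1 - eps, i.e. with tail mass below eps."""
    k = 0
    while 1.0 - poisson_cdf(k, lam) >= eps:
        k += 1
    return k


lambdas = [0.13, 0.09, 0.37, 0.28, 0.34, 0.29]   # values of lambda_1, ..., lambda_6 given above
points = [truncation_point(lam, 1e-8) for lam in lambdas]
m_bar = sum(points)
```

Under these assumptions the individual truncation points add up to the value of $\bar{m}$ reported above.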

The obtained quantiles with probability $1-\alpha$, $z^t_{1-\alpha}$, of the truncated unconditional distribution

$$\bar{\bar{F}}_{1,\bar{m}}(z)=\sum_{n=m^\bullet+1}^{38}\ddot{p}_{n,m^\bullet}\,F(z\mid 3,n-6) \tag{14}$$

are presented in Table 4.

Table 4. The quantiles of the truncated unconditional distribution, $z^t_{1-\alpha}$.

| Values of $\alpha$ | 0.1 | 0.05 | 0.01 |
|---|---|---|---|
| $m^\bullet=11$ | 3.255 | 4.693 | 9.583 |
| $m^\bullet=15$ | 2.720 | 3.695 | 6.518 |
| $m^\bullet=18$ | 2.555 | 3.402 | 5.722 |

The results in Table 4 agree with those in Table 3, i.e. $H_{0,1}$ is not rejected; therefore the random factor is not significant.

4.2. Interaction

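For the interaction between the fixed factor and the random one, we have

$$\mathbf{Y}_2=\mathbf{W}_2\mathbf{A}_2\mathbf{L}^+\mathbf{Y}=\begin{bmatrix}7.8572\\-0.0512\end{bmatrix}$$

and

$$\mathbf{B}_2=\mathbf{W}_2\mathbf{A}_2\mathbf{L}^+(\mathbf{L}^+)^\top\mathbf{A}_2^\top\mathbf{W}_2^\top=\begin{bmatrix}0.012453695&0.002286565\\0.002286565&0.017972370\end{bmatrix}.$$

For the numerator of the statistic $F_2$, we obtain

$$\mathbf{Y}_2^\top\mathbf{B}_2^{-1}\mathbf{Y}_2=5084.346.$$

Therefore, the statistic's value, $F_{2,\mathrm{Obs}}$, is given by

$$F_{2,\mathrm{Obs}}=\frac{529}{3}\cdot\frac{5084.346}{131250.672}=6.831.$$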
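The observed statistics for the random factor and for the interaction can be retraced from the sample means and sample sizes alone. The Python sketch below does so under several stated assumptions: the means and sizes are taken in the order in which they are listed in the text, the residual sum of squares $S=131250.672$ is copied from the text rather than recomputed, the divisor 3 is the numerator degrees of freedom used in the text, and the sign convention chosen for the rows of `W` is an assumption (the quadratic forms do not depend on it). Helper names are illustrative.

```python
from math import sqrt

means = [70.523, 68.833, 63.791, 57.303, 63.382, 66.286]   # sample means, order as listed in the text
sizes = [44, 30, 134, 99, 123, 105]                        # sample sizes, same order
S = 131250.672                                             # residual sum of squares reported in the text


def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


r2, r6 = 1.0 / sqrt(2.0), 1.0 / sqrt(6.0)
# A1 = (1/sqrt 2) [1  1] (x) I_3 and A2 = (1/sqrt 2) [1 -1] (x) I_3
A1 = [[r2 if k % 3 == i else 0.0 for k in range(6)] for i in range(3)]
A2 = [[(r2 if k < 3 else -r2) if k % 3 == i else 0.0 for k in range(6)] for i in range(3)]
# Rows: an orthonormal basis of the orthogonal complement of span(1_3); signs are assumed
W = [[r2, 0.0, -r2], [r6, -2.0 * r6, r6]]


def f_statistic(A, f_num=3):
    """F_j = (g(n) / f_num) * Y_j' B_j^{-1} Y_j / S for the contrast matrix W A."""
    WA = matmul(W, A)
    y = matmul(WA, [[v] for v in means])                       # Y_j, a 2 x 1 vector
    D = [[1.0 / sizes[i] if i == j else 0.0 for j in range(6)] for i in range(6)]
    B = matmul(matmul(WA, D), transpose(WA))                   # B_j, a 2 x 2 matrix
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    y1, y2 = y[0][0], y[1][0]
    quad = (B[1][1] * y1 * y1 - 2.0 * B[0][1] * y1 * y2 + B[0][0] * y2 * y2) / det
    g = sum(sizes) - len(sizes)                                # g(n) = n - 6 = 529
    return g / f_num * quad / S


F1 = f_statistic(A1)   # random effects factor
F2 = f_statistic(A2)   # interaction
```

Because the inputs are the rounded means printed in Table 1, the values obtained this way agree with the reported 0.352 and 6.831 only up to rounding.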

If we use the common conditional distribution of $F_2$, which corresponds to $F(z\mid 3,529)$, we obtain the quantiles given in Table 2. Since $F_{2,\mathrm{Obs}}>z_{1-\alpha}$, we reject $H_{0,2}$ for the usual levels of significance.

Considering the truncated unconditional distribution, $\bar{\bar{F}}_{2,\bar{m}}$, which corresponds to $\bar{\bar{F}}_{1,\bar{m}}$ defined in (14), we obtained the quantiles, $z^t_{1-\alpha}$, given in Table 4. The results in this table lead us to:

  • reject $H_{0,2}$ for $\alpha=0.1$ and $0.05$ and not reject it for $\alpha=0.01$, considering $m^\bullet+1=12$;

  • reject $H_{0,2}$ for the usual levels of significance, considering $m^\bullet+1=16$ or $19$.

Table 3 shows the upper bounds for the quantiles with probability $1-\alpha$, $z^u_{1-\alpha}$, of the unconditional distribution. These results agree with those based on the quantiles of the truncated unconditional distribution. Assuming the values of the test statistic remain unchanged, we should have the minimum total sample sizes presented in Table 5 to ensure rejection.

Table 5. Minimum value of $m^\bullet$ that leads to the rejection of the hypothesis $H_{0,2}$.

| Values of $\alpha$ | 0.1 | 0.05 | 0.01 |
|---|---|---|---|
| $m^\bullet$ | 8 | 9 | 15 |

Since for higher values of $m^\bullet$ we would get lower values for the quantiles, we have $F_{2,\mathrm{Obs}}>z^u_{1-\alpha}$ for all $m^\bullet\ge 15$. In this case, we reject $H_{0,2}$ at the usual levels of significance, which means that the interaction between the factors is significant.

4.3. Conclusion

Our discussion shows the relevance of the unconditional approach in avoiding false rejections. As we saw, the inference results in some situations depend on the approach. The unconditional approach is the safer one: when testing the interaction, the null hypothesis is not rejected when $m^\bullet=11$ and $\alpha=0.01$, whereas the common conditional approach would lead to a false rejection.

The results in Tables 3 and 4 show that for higher minimum sample sizes, we get smaller upper bounds and quantiles of the unconditional distribution. Due to this, we may conclude that with the increase of the minimum sample sizes, the decision based on both approaches is similar.

To finish we would like to note that all the computations were performed using the R software.

5. Final remarks

The approach followed in this paper is more realistic than the usual F tests in situations where it is not possible to know the sample sizes in advance. To apply it, we have to make assumptions regarding the distribution of the sample sizes, based on previous knowledge of the sample collection, and incorporate this source of variation into the mixed model. We chose the Poisson distribution since it corresponds to Poisson processes for the collection of observations, and the underlying assumptions for these (independent and stable increments and no clustering) seem realistic. Moreover, the L extensions fit easily with the assumption of random sample sizes. This model formulation has been used to deal with the unbalance caused by different numbers of observations per treatment, which causes non-orthogonality in fixed and mixed effects models. We included an application with cancer data to illustrate how straightforward it is to apply our approach in a medical context. The comparative results show that, when random sample sizes are considered, the critical values may exceed those of classical ANOVA (obtained when using the common conditional F distribution). So, we can conclude that this approach avoids working with incorrect critical values and thus carrying out tests without the proper level. We would also like to highlight that our methodology is not restricted to the medical domain and may be applied to several other research areas.

Acknowledgments

The authors would like to thank the anonymous referees for useful comments and suggestions.

Appendices.

Appendix 1. Truncated Poisson distributions

This appendix presents some results about the truncated Poisson distribution, which are useful in obtaining the unconditional distribution of the test statistics.

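Since we need to have at least one observation per treatment, we will consider the common form of truncated Poisson distribution, which corresponds to the omission of the zero class, e.g. [7]. So we have $N_i\ge 1$, $i=1,\dots,m$. To perform inference, we also consider that $N>m$, where $N=\sum_{i=1}^{m}N_i$.

As previously mentioned, we assumed that $N_i\sim P(\lambda_i)$, $i=1,\dots,m$, and $N\sim P(\lambda)$. So we have

$$p_{r,i}=\mathrm{pr}(N_i=r\mid N_i\ge 1)=\frac{\mathrm{pr}(N_i=r)}{\mathrm{pr}(N_i\ge 1)}=\frac{e^{-\lambda_i}\lambda_i^r/r!}{1-e^{-\lambda_i}}=\frac{e^{-\lambda_i}}{1-e^{-\lambda_i}}\,\frac{\lambda_i^r}{r!},\quad r\ge 1,\ i=1,\dots,m.$$

Therefore, the moment generating function of $N_i$, when $N_i\ge 1$, $i=1,\dots,m$, will be

$$\varphi_i(u)=\sum_{r=1}^{\infty}\frac{e^{-\lambda_i}}{1-e^{-\lambda_i}}\,\frac{\lambda_i^r e^{ru}}{r!}=\frac{e^{-\lambda_i}}{1-e^{-\lambda_i}}\left(e^{\lambda_i e^u}-1\right),\quad i=1,\dots,m,$$

and the probability generating functions

$$\chi_i(z)=\varphi_i(\ln z)=\frac{e^{-\lambda_i}}{1-e^{-\lambda_i}}\left(e^{\lambda_i z}-1\right),\quad i=1,\dots,m.$$

With $\ddot{N}_i$, $i=1,\dots,m$, the truncated variables $N_i$, $i=1,\dots,m$, when $N_i\ge 1$, and considering

$$\ddot{N}=\sum_{i=1}^{m}\ddot{N}_i,$$

we will obtain the probability generating function

$$\ddot{\chi}(z)=\prod_{i=1}^{m}\chi_i(z)=\left(\prod_{i=1}^{m}\frac{e^{-\lambda_i}}{1-e^{-\lambda_i}}\right)\prod_{i=1}^{m}\left(e^{\lambda_i z}-1\right)=\left(\prod_{i=1}^{m}\frac{e^{-\lambda_i}}{1-e^{-\lambda_i}}\right)\sum_{C\subseteq\bar{\bar{m}}}(-1)^{m-\#(C)}e^{\left(\sum_{i\in C}\lambda_i\right)z},$$

where $\bar{\bar{m}}=\{1,\dots,m\}$ and $\#(C)$ denotes the cardinal of $C$, any subset of $\bar{\bar{m}}$.

Therefore we will have

$$\ddot{p}_r=\mathrm{pr}(\ddot{N}=r)=\frac{1}{r!}\left(\prod_{i=1}^{m}\frac{e^{-\lambda_i}}{1-e^{-\lambda_i}}\right)\sum_{C\subseteq\bar{\bar{m}}}(-1)^{m-\#(C)}\left(\sum_{i\in C}\lambda_i\right)^{r},\quad r=m,\dots.$$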
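The closed form for the probabilities of the sum of zero-truncated Poisson variables can be checked numerically against a direct convolution of the individual truncated distributions. The following Python sketch does this for three hypothetical parameters; the values of `lams`, the cut-off `r_max` and the function names are assumptions chosen only for illustration.

```python
from itertools import combinations
from math import exp, factorial


def zero_truncated_pmf(lam, r_max):
    """pr(N_i = r | N_i >= 1) for r = 0, ..., r_max (the entry for r = 0 is zero)."""
    const = exp(-lam) / (1.0 - exp(-lam))
    return [0.0] + [const * lam ** r / factorial(r) for r in range(1, r_max + 1)]


def convolve(p, q):
    """Distribution of the sum of two independent variables, cut at len(p) - 1."""
    out = [0.0] * len(p)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            if a + b < len(out):
                out[a + b] += pa * qb
    return out


def pmf_by_convolution(lams, r_max):
    """Probabilities of the sum of the truncated variables, by repeated convolution."""
    total = [1.0] + [0.0] * r_max          # point mass at zero
    for lam in lams:
        total = convolve(total, zero_truncated_pmf(lam, r_max))
    return total


def pmf_closed_form(lams, r):
    """Inclusion-exclusion formula for the same probability, r >= 1."""
    m = len(lams)
    const = 1.0
    for lam in lams:
        const *= exp(-lam) / (1.0 - exp(-lam))
    alternating_sum = 0.0
    for size in range(1, m + 1):           # the empty subset contributes 0 when r >= 1
        for subset in combinations(lams, size):
            alternating_sum += (-1) ** (m - size) * sum(subset) ** r
    return const * alternating_sum / factorial(r)


lams = [1.0, 2.0, 0.5]                     # hypothetical parameters, m = 3
r_max = 8
by_convolution = pmf_by_convolution(lams, r_max)
by_formula = [pmf_closed_form(lams, r) for r in range(1, r_max + 1)]
```

For $r<m$ both computations give zero (up to rounding), in line with the fact that the sum of $m$ truncated variables is at least $m$.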

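It is interesting to observe that we have

$$\ddot{\chi}^{\langle s\rangle}(0)=0,\quad s=1,\dots,m-1,$$

where $\langle s\rangle$ denotes the derivative of order $s$. This results from the fact that, when

$$j_1+\dots+j_m=s,\quad s=1,\dots,m-1,$$

the vector $\mathbf{j}=(j_1,\dots,j_m)$ has one or more null components, and $\chi_i(0)=0$, $i=1,\dots,m$.

Indeed, with $P_s(m)$ the family of partitions of $s$ with cardinal $m$, we have

$$\ddot{\chi}^{\langle s\rangle}(0)=\sum_{\mathbf{j}\in P_s(m)}\frac{\left(\sum_{i=1}^{m}j_i\right)!}{\prod_{i=1}^{m}j_i!}\prod_{i=1}^{m}\chi_i^{\langle j_i\rangle}(0),\quad s=1,\dots,$$

and, if $s<m$, whatever $\mathbf{j}\in P_s(m)$,

$$\prod_{i=1}^{m}\chi_i^{\langle j_i\rangle}(0)=0,$$

since $\mathbf{j}$ has at least one null component. So, since $\ddot{\chi}^{\langle s\rangle}(0)=s!\,\ddot{p}_s$, we obtain

$$\ddot{p}_s=\frac{1}{s!}\ddot{\chi}^{\langle s\rangle}(0)=0,\quad s\le m-1.$$

Furthermore, the only non-null term of $\ddot{\chi}^{\langle m\rangle}(0)$ corresponds to $\mathbf{j}=\mathbf{1}_m$, so

$$\ddot{p}_m=\mathrm{pr}(\ddot{N}=m)=\frac{1}{m!}\ddot{\chi}^{\langle m\rangle}(0)=\prod_{i=1}^{m}\chi_i^{\langle 1\rangle}(0)=\prod_{i=1}^{m}\frac{e^{-\lambda_i}\lambda_i}{1-e^{-\lambda_i}}$$

and

$$\mathrm{pr}(\ddot{N}>m)=1-\ddot{p}_m=1-\prod_{i=1}^{m}\frac{e^{-\lambda_i}\lambda_i}{1-e^{-\lambda_i}}.$$

Considering $\ddot{\mathbf{N}}$ the random vector with components $\ddot{N}_1,\dots,\ddot{N}_m$, we have $\ddot{N}>m$, which means that there exists at least one $\ddot{N}_i>1$, $i=1,\dots,m$, if and only if $\ddot{\mathbf{N}}>\mathbf{1}_m$, so

$$\mathrm{pr}(\ddot{\mathbf{N}}>\mathbf{1}_m)=1-\ddot{p}_m.$$

We also have

$$\ddot{p}_{r,m}=\mathrm{pr}(\ddot{N}=r\mid\ddot{N}>m)=\frac{\ddot{p}_r}{1-\ddot{p}_m},\quad r=m+1,\dots. \tag{A1}$$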

Appendix 2. Frequency tables of types of cancer

Table A1. Males with stomach (digestive system) cancer.

Age 1–4 5–9 10–14 15–19 20–24 25–29 30–34 35–39 40–44
Mean age 2 7 12 17 22 27 32 37 42
Patients 0 0 0 0 0 0 0 0 1
Age 45–49 50–54 55–59 60–64 65–69 70–74 75–79 80–84 85+
Mean age 47 52 57 62 67 72 77 82 87
Patients 1 2 4 5 6 7 7 6 5

Table A2. Females with stomach (digestive system) cancer.

Age 1–4 5–9 10–14 15–19 20–24 25–29 30–34 35–39 40–44
Mean age 2 7 12 17 22 27 32 37 42
Patients 0 0 0 0 0 0 0 1 1
Age 45–49 50–54 55–59 60–64 65–69 70–74 75–79 80–84 85+
Mean age 47 52 57 62 67 72 77 82 87
Patients 2 2 2 3 3 3 4 4 5

Table A3. Males with melanomas of the skin.

Age 1–4 5–9 10–14 15–19 20–24 25–29 30–34 35–39 40–44
Mean age 2 7 12 17 22 27 32 37 42
Patients 0 0 0 0 1 2 2 4 6
Age 45–49 50–54 55–59 60–64 65–69 70–74 75–79 80–84 85+
Mean age 47 52 57 62 67 72 77 82 87
Patients 8 12 14 17 16 16 14 12 10

Table A4. Females with melanomas of the skin.

Age 1–4 5–9 10–14 15–19 20–24 25–29 30–34 35–39 40–44
Mean age 2 7 12 17 22 27 32 37 42
Patients 0 0 0 1 2 4 4 6 7
Age 45–49 50–54 55–59 60–64 65–69 70–74 75–79 80–84 85+
Mean age 47 52 57 62 67 72 77 82 87
Patients 10 10 10 10 8 7 7 6 7

Table A5. Males with non-Hodgkin lymphoma.

Age 1–4 5–9 10–14 15–19 20–24 25–29 30–34 35–39 40–44
Mean age 2 7 12 17 22 27 32 37 42
Patients 0 0 1 1 1 2 2 3 5
Age 45–49 50–54 55–59 60–64 65–69 70–74 75–79 80–84 85+
Mean age 47 52 57 62 67 72 77 82 87
Patients 8 10 12 14 15 14 14 12 9

Table A6. Females with non-Hodgkin lymphoma.

Age 1–4 5–9 10–14 15–19 20–24 25–29 30–34 35–39 40–44
Mean age 2 7 12 17 22 27 32 37 42
Patients 0 0 0 1 1 1 2 2 3
Age 45–49 50–54 55–59 60–64 65–69 70–74 75–79 80–84 85+
Mean age 47 52 57 62 67 72 77 82 87
Patients 5 7 9 11 13 13 13 12 12

Funding Statement

This work was partially supported by the FCT- Fundação para a Ciência e Tecnologia, under the projects UID/MAT/00212/2019 and UID/MAT/00297/2019.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  • 1. Bailey R.A., Ferreira S.S., Ferreira D., and Nunes C., Estimability of variance components when all model matrices commute, Linear Algebra Appl. 492 (2016), pp. 144–160. doi: 10.1016/j.laa.2015.11.002
  • 2. Carvalho F., Mexia J.T., Santos C., and Nunes C., Inference for types and structured families of commutative orthogonal block structures, Metrika 78 (2015), pp. 337–372. doi: 10.1007/s00184-014-0506-8
  • 3. Ferreira S., Ferreira D., Moreira E., and Mexia J.T., Inference for L orthogonal models, J. Interdiscip. Math. 12 (2009), pp. 815–824. doi: 10.1080/09720502.2009.10700666
  • 4. Ferreira S.S., Ferreira D., Nunes C., and Mexia J.T., Estimation of variance components in linear mixed models with commutative orthogonal block structure, Rev. Colomb. Estadist. 36 (2013), pp. 261–271.
  • 5. Heinzl F. and Tutz G., Clustering in linear mixed models with approximate Dirichlet process mixtures using EM algorithm, Stat. Modell. 13 (2013), pp. 41–67. doi: 10.1177/1471082X12471372
  • 6. Houtman A.M. and Speed T.P., Balance in designed experiments with orthogonal block structure, Ann. Statist. 11 (1983), pp. 1069–1085. doi: 10.1214/aos/1176346322
  • 7. Johnson N.L. and Kotz S., Discrete Distributions, John Wiley & Sons, New York, 1969.
  • 8. Khuri A.I., Mathew T., and Sinha B.K., Statistical Tests for Mixed Linear Models, John Wiley & Sons, New York, 1998.
  • 9. Lehmann E.L., Testing Statistical Hypotheses, John Wiley & Sons, New York, 1959.
  • 10. Mexia J.T., Best linear unbiased estimates, duality of F tests and the Scheffé multiple comparison method in presence of controlled heterocedasticity, Comput. Stat. Data Anal. 10 (1990), pp. 271–281. doi: 10.1016/0167-9473(90)90007-5
  • 11. Mexia J.T. and Moreira E., Randomized sample size F tests for the one-way layout, 8th International Conference on Numerical Analysis and Applied Mathematics 2010, AIP Conf. Proc. 1281(II), 2010, pp. 1248–1251.
  • 12. Mexia J.T., Nunes C., Ferreira D., Ferreira S.S., and Moreira E., Orthogonal fixed effects ANOVA with random sample sizes, Proceedings of the 5th International Conference on Applied Mathematics, Simulation, Modelling (ASM'11), 2011, pp. 84–90.
  • 13. Michalski A. and Zmyślony R., Testing hypothesis for variance components in mixed linear models, Statistics 27 (1996), pp. 297–310. doi: 10.1080/02331889708802533
  • 14. Moreira E., Mexia J.T., Fonseca M., and Zmyślony R., L models and multiple regressions designs, Statist. Papers 50 (2009), pp. 869–885. doi: 10.1007/s00362-009-0255-3
  • 15. Moreira E.E., Mexia J.T., and Minder C.E., F tests with random sample size. Theory and applications, Stat. Probab. Lett. 83 (2013), pp. 1520–1526. doi: 10.1016/j.spl.2013.02.020
  • 16. Nunes C., Capistrano G., Ferreira D., Ferreira S.S., and Mexia J.T., One-way fixed effects ANOVA with missing observations, Proceedings of the 12th International Conference on Numerical Analysis and Applied Mathematics, AIP Conf. Proc. 1648, 2015, p. 110008.
  • 17. Nunes C., Capistrano G., Ferreira D., Ferreira S.S., and Mexia J.T., Exact critical values for one-way fixed effects models with random sample sizes, J. Comput. Appl. Math. 354 (2019), pp. 112–122. doi: 10.1016/j.cam.2018.05.057
  • 18. Nunes C., Ferreira D., Ferreira S.S., and Mexia J.T., F tests with random sample sizes, 8th International Conference on Numerical Analysis and Applied Mathematics, AIP Conf. Proc. 1281(II), 2010, pp. 1241–1244.
  • 19. Nunes C., Ferreira D., Ferreira S.S., and Mexia J.T., F-tests with a rare pathology, J. Appl. Stat. 39 (2012), pp. 551–561. doi: 10.1080/02664763.2011.603293
  • 20. Nunes C., Ferreira D., Ferreira S.S., and Mexia J.T., Fixed effects ANOVA: An extension to samples with random size, J. Stat. Comput. Simul. 84 (2014), pp. 2316–2328. doi: 10.1080/00949655.2013.791293
  • 21. Scheffé H., The Analysis of Variance, Wiley Series in Probability and Statistics, John Wiley & Sons, New York, 1959.
  • 22. Schott J.R., Matrix Analysis for Statistics, John Wiley & Sons, New York, 1997.
  • 23. Searle S.R., Casella G., and McCulloch C.E., Variance Components, Wiley Series in Probability and Statistics, John Wiley & Sons, New York, 1992.
  • 24. U.S. Cancer Statistics Working Group, United States Cancer Statistics: 1999–2010 Incidence and Mortality Web-based Report. Atlanta: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute, 2013. Available at https://nccd.cdc.gov/uscs/.

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
