2022 Sep 2;32(1):163–183. doi: 10.1007/s11749-022-00828-9

Homogeneity tests for one-way models with dependent errors under correlated groups

Yuichi Goto 1, Koichi Arakaki 2, Yan Liu 3,4, Masanobu Taniguchi 3
PMCID: PMC9438895  PMID: 36091581

Abstract

We consider the problem of testing for the existence of fixed effects and random effects in one-way models, where the groups are correlated and the disturbances are dependent. The classical F-statistic in the analysis of variance is not asymptotically distribution-free in this setting. To overcome this problem, we propose a new test statistic that requires no distributional assumptions and is asymptotically distribution-free. The proposed test statistic is a natural extension of the classical F-statistic in the sense of distribution-freeness. The new tests are shown to be asymptotically size α and consistent. The nontrivial power under local alternatives is also elucidated. The theoretical results are supported by numerical simulations for the model with disturbances from linear time series with innovations of symmetric random variables, heavy-tailed variables, and skewed variables, and furthermore from GARCH models. The proposed test is applied to log-returns for stock prices and uncovers random effects in sectors.

Supplementary Information

The online version contains supplementary material available at 10.1007/s11749-022-00828-9.

Keywords: Dynamic panel data, Fixed effect, Homogeneity test, Longitudinal data, One-way model, Random effect

Introduction

Longitudinal data and panel data are omnipresent in the real world. Statistical methods to analyze such data have been studied for several decades (Diggle et al. 2002). The methods have a wide range of applications, e.g., analysis of stress in mothers (Zeger et al. 1985), the weight of infants (Hoover et al. 1998), and COVID-19 data (Bernardes et al. 2020; Lucas et al. 2020).

The analysis of variance (ANOVA) is a common method to test for equality among groups. An F-statistic, defined as the ratio of variance between groups to variance within groups, is designed to test for the homogeneity of groups for independent and identically distributed (i.i.d.) data. Numerous papers are devoted to ANOVA and related topics for i.i.d. data (see, e.g., Searle et al. 1992; Rashid 1995; Clarke 2008; Liu and Xu 2016, and references therein). By contrast, the F-statistic is not valid for dependent data. To resolve the issue, Nagahata and Taniguchi (2018) studied a test for the equality of means among groups based on the Whittle likelihood for multivariate one-way fixed effect models. Their statistic can be rephrased as the classical F-statistic rescaled by the spectral density of disturbances. They showed that their statistic is asymptotically Chi-square distributed, although they did not derive the consistency of the test and assumed independence of groups.

One-way models for time series are closely related to the analysis of longitudinal data and dynamic panel data. For dynamic panel data, Baltagi and Li (1991) constructed a consistent estimator of the variance of random effects for models with errors from the autoregressive process of order 1 (AR(1)), provided that the number of groups and the sample size tend to infinity. Galbraith and Zinde-Walsh (1995) dealt with error components models for panel data models with errors from the autoregressive moving-average process of orders p and q. You and Zhou (2013) advocated semiparametric panel data partially linear additive models with errors from the AR(1). The statistical methods for longitudinal data have also been intensively investigated. For example, Tang and Leng (2011) estimated regression coefficients by the empirical likelihood. Li (2011) constructed an efficient estimator for semiparametric regression models. A panel data model with common shocks was proposed by Bai and Li (2014), and Ergemen and Velasco (2017) extended the model to a fractionally integrated panel data model with common shocks. Under high-dimensional settings, Zhong et al. (2019) considered a test for homogeneity of covariance matrices and constructed a change test for covariance matrices. Fang et al. (2020) proposed a test for regression parameters. However, the principal objects of interest in these fields are regression coefficients rather than fixed and random effects.

The importance of fixed effects and random effects has been recognized, whereas, to the best of our knowledge, there are few references on diagnostic tests for fixed effects and random effects. On a related topic, Akharif et al. (2020) and Fihri et al. (2020) established optimal tests for the existence of random coefficients for i.i.d. data based on the locally asymptotic normality for random coefficient regression models. An optimal test based on multivariate ranks for the existence of fixed effects for i.i.d. data was proposed by Hallin et al. (2021). Recently, González et al. (2021) discussed tests for the existence of fixed effects and interactions for two-way models for spatial point processes. Ditzhaus et al. (2021) proposed robust tests based on quantiles for fixed effects and interactions for i.i.d. random variables.

We propose a test for the existence of fixed or random effects in one-way models for correlated groups and derive the asymptotic null distribution. In addition, the consistency of the proposed test and the nontrivial power under the local alternatives are elucidated. The numerical study illustrates the finite sample performance of the proposed test and compares it with the classical test. In particular, we include skewness and heteroscedasticity in the disturbance process, both of which are important in practical applications (Cook and Weisberg 1983). The classical statistic, defined in Sect. 2, assumes independence between groups, which is a major drawback in its application. The new statistic, defined in Sect. 3, relaxes the strong assumption of independence between groups. We emphasize that our setting allows us to deal with correlated groups, and thus, our proposed method has a wide range of applications.

A motivating real-data example with correlated groups is the analysis of stock prices. Stock prices can be categorized by industry. Equity-focused investors believe that stock prices are linked by factors related to earnings. For example, stock prices of automobile companies are linked to exchange rates. In other words, equity-focused investors believe that there are random effects related to industries. Our test, which takes into account correlations between groups, can be applied to verify this hypothesis.

This paper is organized as follows: We briefly review spectra and the classical settings and test statistic in Sect. 2. In Sect. 3, we introduce the fixed effects model and propose a new test for the existence of fixed effects. In Sect. 4, we deal with the random effects model and derive the asymptotic results for the proposed test. Section 5 presents the simulation study. In Sect. 6, we apply our test for the existence of effects to the log-returns in stock prices. The discussion is provided in Sect. 7. Supplementary material includes all proofs of theorems and additional simulation results.

Preliminary

Spectral density

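In the frequency-domain approach, the $L^2$-based spectral density is a pivotal index to describe time-dependent structures of data. To recall the definition, let $\{X_t\}$ be a strictly stationary process with the autocovariance function $\gamma_X(h)=EX_tX_{t+h}$ satisfying $\sum_{h=-\infty}^{\infty}|\gamma_X(h)|<\infty$. Then, the spectral density function is defined, for $\lambda\in[-\pi,\pi]$, as

$$f_X(\lambda)=\frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\gamma_X(h)e^{-ih\lambda}. \tag{1}$$

Since $\gamma_X(h)=\int_{-\pi}^{\pi}f_X(\lambda)e^{ih\lambda}\,d\lambda$, the information of the spectrum $f_X(\lambda)$ is equivalent to that of autocovariance functions for all lags $\{\gamma_X(h)\}_{h\in\mathbb{Z}}$. A multivariate spectral density function can be defined by replacing $\gamma_X(h)$ in (1) with $\Gamma_X(h)=EX_tX_{t+h}^{\top}$ for a $p$-dimensional strictly stationary process $\{X_t\}$. Typical examples of spectra are the spectrum for ARMA models of orders $(p,q)$ and the exponential type of the spectrum proposed by Bloomfield (1973), taking the forms of

$$f_{\mathrm{ARMA}}(\lambda)=\frac{\sigma^2}{2\pi}\frac{\left|1+\theta_1e^{-i\lambda}+\cdots+\theta_qe^{-iq\lambda}\right|^2}{\left|1-\phi_1e^{-i\lambda}-\cdots-\phi_pe^{-ip\lambda}\right|^2}\quad\text{and}\quad f_{\mathrm{EXP}}(\lambda)=\frac{\sigma^2}{2\pi}\exp\left\{2\sum_{r=1}^{d}\varsigma_r\cos(r\lambda)\right\},$$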

where $\sigma,\theta_1,\ldots,\theta_q,\phi_1,\ldots,\phi_p,\varsigma_1,\ldots,\varsigma_d$ are parameters. Other examples can be found in, e.g., Chiu (1988). We refer readers to von Sachs (2020) for a review.
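As a quick numerical check of definition (1), the following minimal Python sketch evaluates the ARMA spectrum and compares it, for an MA(1) process, with the Fourier sum of the autocovariances. The function names and the parameter values (a single MA coefficient 0.5 and unit innovation variance) are illustrative assumptions and are not taken from the paper.

```python
import cmath
import math


def arma_spectral_density(lam, sigma2=1.0, theta=(), phi=()):
    """Spectral density of an ARMA(p, q) process at frequency lam."""
    z = cmath.exp(-1j * lam)
    ma = 1 + sum(th * z ** (k + 1) for k, th in enumerate(theta))
    ar = 1 - sum(ph * z ** (k + 1) for k, ph in enumerate(phi))
    return sigma2 / (2 * math.pi) * abs(ma) ** 2 / abs(ar) ** 2


def spectral_density_from_autocov(lam, gamma):
    """(1 / 2pi) * sum_h gamma(h) * exp(-i h lam) for a finite dict {h: gamma(h)}."""
    total = sum(g * cmath.exp(-1j * h * lam) for h, g in gamma.items())
    return total.real / (2 * math.pi)


# MA(1) with coefficient 0.5 and unit innovation variance (illustrative values):
# gamma(0) = 1 + theta^2, gamma(1) = gamma(-1) = theta, and gamma(h) = 0 otherwise.
theta1 = 0.5
gamma_ma1 = {0: 1 + theta1 ** 2, 1: theta1, -1: theta1}

max_gap = max(
    abs(arma_spectral_density(lam, theta=(theta1,))
        - spectral_density_from_autocov(lam, gamma_ma1))
    for lam in (0.0, 0.5, 1.0, 2.0, math.pi)
)
f_at_zero = arma_spectral_density(0.0, theta=(theta1,))
```

Both routes agree up to rounding error, which illustrates the equivalence between the spectrum and the full set of autocovariances mentioned above.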

Classical setting and statistic

Nagahata and Taniguchi (2018) discussed one-way models with independent groups; for a fixed number of groups $a$, a growing sample size $n_i$ of the $i$th group ($i=1,\ldots,a$), and a fixed dimension $p$ of time series in each group,

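$$y_{it}=\mu+\tau_i+e_{it},\quad i=1,\ldots,a;\ t=1,\ldots,n_i, \tag{2}$$

where $y_{it}=(y_{it1},\ldots,y_{itp})^{\top}$ is the $t$th $p$-dimensional observation of the $i$th group, $\mu=(\mu_1,\ldots,\mu_p)^{\top}$ is a general mean, $\tau_i=(\tau_{i1},\ldots,\tau_{ip})^{\top}$ is a fixed effect such that $\sum_{i=1}^{a}\tau_i=0$, and $e_{it}=(e_{it1},\ldots,e_{itp})^{\top}$ is a centered strictly stationary sequence such that $\{e_{it}\}_{t\in\mathbb{Z}}$ is independent of $\{e_{jt}\}_{t\in\mathbb{Z}}$ for $j\neq i$ and $e_{it}$ has a $p$-by-$p$ spectral density matrix $f(\lambda)=(f_{j_1j_2}(\lambda))_{j_1,j_2=1,\ldots,p}$ which is independent of $i$. For a test for the existence of fixed effects defined in (5), they proposed the following test statistic

$$S_n=n\sum_{i=1}^{a}(\bar y_{i\cdot}-\bar y_{\cdot\cdot})^{\top}\{2\pi\tilde f_n(0)\}^{-1}(\bar y_{i\cdot}-\bar y_{\cdot\cdot}), \tag{3}$$

where $\bar y_{i\cdot}=\sum_{t=1}^{n_i}y_{it}/n_i$, $\bar y_{\cdot\cdot}=\sum_{i=1}^{a}\sum_{t=1}^{n_i}y_{it}/(an_i)$, and $\tilde f_n(0)$ is defined as

$$\tilde f_n(0)=\frac{1}{a}\sum_{i=1}^{a}\hat f_{ii}(\lambda)/\rho_i,$$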

where $\hat f_{ii}(\lambda)$ is given in (6), and $\rho_i=n_i/n$ with $n=\sum_{i=1}^{a}n_i$. This statistic is standardized within groups, and thus, the test based on $S_n$ is asymptotically distribution-free in the case of independent groups (see Sect. 7). However, this does not hold when groups are correlated. This paper focuses on data with correlated groups, such as stock prices, where sectors correspond to groups. We propose a test statistic, defined in (7), that is standardized not only within groups but also between groups, so that it is asymptotically distribution-free. In this sense, our statistic is a natural extension of $S_n$.

Test for existence of fixed effects

In this section, we scrutinize the one-way fixed effects model with dependent disturbance processes when the number of groups is fixed and the number of observations for each group diverges. Let us consider the model

$$y_{it}=\mu+\tau_i+e_{it},\quad i=1,\ldots,a;\ t=1,\ldots,n_i, \tag{4}$$

where $y_{it}=(y_{it1},\ldots,y_{itp})^{\top}$ is the $t$th $p$-dimensional observation of the $i$th group, $\mu=(\mu_1,\ldots,\mu_p)^{\top}$ is a general mean, $\tau_i=(\tau_{i1},\ldots,\tau_{ip})^{\top}$ is a fixed effect such that $\sum_{i=1}^{a}\tau_i=0$, and $e_{it}=(e_{it1},\ldots,e_{itp})^{\top}$ is a centered strictly stationary sequence. Suppose that an observed stretch $\{y_{it};\ i=1,\ldots,a,\ t=1,\ldots,n_i\}$ is available, and $(e_{1t}^{\top},\ldots,e_{at}^{\top})^{\top}$ has an $ap$-by-$ap$ spectral density matrix $f(\lambda)=(f_{ij}(\lambda))_{i,j=1,\ldots,a}$ for $\lambda\in[-\pi,\pi]$. In addition, there exists $\rho_i\in(0,1)$ such that $n_i=\rho_in$ with $n=\sum_{i=1}^{a}n_i$. The number of groups, the length of time series from the $i$th group, and the dimension of time series from each group at each time are denoted as $a$, $n_i$, and $p$, respectively. The role of $p$ is to include the multivariate analysis of variance (MANOVA) case. Obviously, $p=1$ corresponds to the univariate ANOVA.

Remark 1

The one-way model defined in (4) may seem to accommodate only one time series per group. However, we can handle the case of more than one time series per group by reconfiguring the setting as follows: taking $p$ as $pq$ for $q\in\mathbb{N}$,

$y_{it}=(y_{it11},\ldots,y_{it1q},y_{it21},\ldots,y_{itp1},\ldots,y_{itpq})^{\top}$, $\mu=(\mu_11_q^{\top},\ldots,\mu_p1_q^{\top})^{\top}$, $\tau_i=(\tau_{i1}1_q^{\top},\ldots,\tau_{ip}1_q^{\top})^{\top}$, and $e_{it}=(e_{it11},\ldots,e_{it1q},e_{it21},\ldots,e_{itp1},\ldots,e_{itpq})^{\top}$, where $1_q$ is a $q$-dimensional vector with all elements equal to one. Moreover, $p$ and $q$ can depend on $i$. In this case, $p$ and $q$ represent the dimension of time series from each group at each time and the number of time series in each group, respectively.

Remark 2

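The condition $\sum_{i=1}^{a}\tau_i=0$ is not essential. When $\sum_{i=1}^{a}\tau_i\neq0$, we can redefine $\mu$ as $\mu+a^{-1}\sum_{s=1}^{a}\tau_s$ and $\tau_i$ as $\tau_i-a^{-1}\sum_{s=1}^{a}\tau_s$, which leaves $\mu+\tau_i$ unchanged and makes the redefined effects sum to zero.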

Let the null hypothesis H0 and the alternative K0 be

$$H_0:\tau_1=\cdots=\tau_a\quad\text{vs}\quad K_0:\tau_i\neq0\ \text{for some}\ i. \tag{5}$$

Under the assumption $\sum_{i=1}^{a}\tau_i=0$, the null hypothesis is equivalent to $\tau_i=0$ for all $i\in\{1,\ldots,a\}$.

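Let $\hat f_n(\lambda)=(\hat f_{ij}(\lambda))_{i,j=1,\ldots,a}$ be the nonparametric spectral density estimator defined as

$$\hat f_{ij}(\lambda)=\frac{1}{2\pi}\sum_{\{h\in\mathbb{Z};\ |h|\le\min\{n_i,n_j\}-1\}}\omega\left(\frac{h}{M_n}\right)\hat\Gamma_{ij}(h)e^{-ih\lambda},\quad\lambda\in[-\pi,\pi], \tag{6}$$

where $\omega(x)=\int_{-\infty}^{\infty}W(t)e^{ixt}\,dt$ and the function $W(\cdot)$ satisfies Assumption 3.2. Here, $M_n$ is a positive sequence such that $M_n\to\infty$ and $M_n/\min_{i=1,\ldots,a}n_i\to0$ as $\min_{i=1,\ldots,a}n_i\to\infty$. The sample cross-covariance is defined, for $h\in\{0,\ldots,\min\{n_i,n_j\}-1\}$, as

$$\hat\Gamma_{ij}(h)=\frac{1}{\min\{n_i,n_j\}-h}\sum_{t=1}^{\min\{n_i,n_j\}-h}(y_{i(t+h)}-\bar y_{i\cdot})(y_{jt}-\bar y_{j\cdot})^{\top},$$

and, for $h\in\{-\min\{n_i,n_j\}+1,\ldots,0\}$, as

$$\hat\Gamma_{ij}(h)=\frac{1}{\min\{n_i,n_j\}-|h|}\sum_{t=-h+1}^{\min\{n_i,n_j\}}(y_{i(t+h)}-\bar y_{i\cdot})(y_{jt}-\bar y_{j\cdot})^{\top},$$

where $\bar y_{i\cdot}=\sum_{t=1}^{n_i}y_{it}/n_i$, and $\bar y_{\cdot\cdot}=\sum_{i=1}^{a}\sum_{t=1}^{n_i}y_{it}/(an_i)$. Let $\hat V_n=(\hat V_{ij})_{i,j=1,\ldots,a}$ be

$$\hat V_{ij}=\frac{2\pi\min\{\rho_i,\rho_j\}}{\rho_i\rho_j}\hat f_{ij}(0)-\frac{2\pi}{a}\sum_{s=1}^{a}\left(\frac{\min\{\rho_s,\rho_j\}}{\rho_s\rho_j}\hat f_{sj}(0)+\frac{\min\{\rho_i,\rho_s\}}{\rho_i\rho_s}\hat f_{is}(0)\right)+\frac{2\pi}{a^2}\sum_{s,k=1}^{a}\frac{\min\{\rho_s,\rho_k\}}{\rho_s\rho_k}\hat f_{sk}(0).$$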
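The construction of the estimator (6) at frequency zero and of the matrix built from it can be sketched as follows for the univariate case p = 1. This is only an illustration under stated assumptions: the Bartlett (triangular) lag window, the bandwidth equal to the cube root of the smallest group size, the simulated white-noise data, and all function names are choices made here and are not prescribed by the paper.

```python
import math
import random


def bartlett_lag_window(x):
    """Triangular lag window omega(x) = max(0, 1 - |x|); an illustrative choice."""
    return max(0.0, 1.0 - abs(x))


def cross_cov(yi, yj, h):
    """Sample cross-covariance at lag h for scalar series, using min(n_i, n_j) points."""
    m = min(len(yi), len(yj))
    mean_i = sum(yi) / len(yi)
    mean_j = sum(yj) / len(yj)
    ts = range(0, m - h) if h >= 0 else range(-h, m)  # 0-based time index
    return sum((yi[t + h] - mean_i) * (yj[t] - mean_j) for t in ts) / (m - abs(h))


def f_hat_zero(yi, yj, bandwidth):
    """Lag-window estimate of the cross-spectral density at frequency zero."""
    m = min(len(yi), len(yj))
    total = 0.0
    for h in range(-(m - 1), m):
        w = bartlett_lag_window(h / bandwidth)
        if w > 0.0:
            total += w * cross_cov(yi, yj, h)
    return total / (2 * math.pi)


def v_hat(groups, bandwidth):
    """Matrix (V_hat_ij) for p = 1: doubly centered, rescaled estimates at frequency zero."""
    a = len(groups)
    n = sum(len(g) for g in groups)
    rho = [len(g) / n for g in groups]
    g_mat = [[2 * math.pi * min(rho[i], rho[j]) / (rho[i] * rho[j])
              * f_hat_zero(groups[i], groups[j], bandwidth)
              for j in range(a)] for i in range(a)]
    col = [sum(g_mat[s][j] for s in range(a)) / a for j in range(a)]
    row = [sum(g_mat[i][s] for s in range(a)) / a for i in range(a)]
    tot = sum(map(sum, g_mat)) / a ** 2
    return [[g_mat[i][j] - col[j] - row[i] + tot for j in range(a)] for i in range(a)]


# Illustrative data: three white-noise groups with unequal lengths.
random.seed(0)
groups = [[random.gauss(0.0, 1.0) for _ in range(size)] for size in (300, 300, 150)]
V = v_hat(groups, bandwidth=min(len(g) for g in groups) ** (1 / 3))
column_sums = [sum(V[i][j] for i in range(3)) for j in range(3)]
```

By construction every column of the resulting matrix sums to zero, so the matrix is singular.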

The test statistic for H0 is proposed as

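$$T_n=n\left(\bar y_{1\cdot}^{\top}-\bar y_{\cdot\cdot}^{\top},\ldots,\bar y_{a\cdot}^{\top}-\bar y_{\cdot\cdot}^{\top}\right)\hat V_n^{-}\left(\bar y_{1\cdot}^{\top}-\bar y_{\cdot\cdot}^{\top},\ldots,\bar y_{a\cdot}^{\top}-\bar y_{\cdot\cdot}^{\top}\right)^{\top}, \tag{7}$$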

where $\hat V_n^{-}$ denotes the Moore–Penrose inverse of $\hat V_n$. Using the Moore–Penrose inverse $\hat V_n^{-}$ in $T_n$ is essential since $\hat V_n$ is a singular matrix. Indeed, $\sum_{i=1}^{a}\hat V_{ij}=O_p$ for any $j$, where $O_p$ is a $p$-by-$p$ zero matrix; thus, 0 is an eigenvalue of $\hat V_n$. It is worth mentioning that our proposed test statistic $T_n$ is scale-invariant. Since $\sqrt{n}\,(\bar y_{1\cdot}^{\top}-\bar y_{\cdot\cdot}^{\top},\ldots,\bar y_{a\cdot}^{\top}-\bar y_{\cdot\cdot}^{\top})$ converges in distribution to a centered normal distribution with variance $V$, defined in Theorem 1, and $V$ is a function of the spectral density matrix $f(\lambda)$ (see Lemma 1 in Section A in the supplementary material), the spectral density matrix enters the test statistic through $\hat V_n$.
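A minimal sketch of the quadratic form in (7) for p = 1 is given below. It relies on an assumption that should be checked in practice: the estimated matrix is symmetric with rank a − 1, so that its null space is spanned by the vector of ones. In that case the Moore–Penrose inverse equals the ordinary inverse of the matrix plus J/a, minus J/a, where J is the matrix of ones, and the quadratic form in the centered means only requires solving one linear system. The helper names and the numerical inputs are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def t_statistic(group_means, v_matrix, n_total):
    """T_n for p = 1, assuming v_matrix is symmetric with null space spanned by ones."""
    a = len(group_means)
    grand_mean = sum(group_means) / a          # mean of the group means
    d = [m - grand_mean for m in group_means]  # centered means, summing to zero
    shifted = [[v_matrix[i][j] + 1.0 / a for j in range(a)] for i in range(a)]
    x = solve_linear(shifted, d)               # equals pinv(v_matrix) @ d under the assumption
    return n_total * sum(di * xi for di, xi in zip(d, x))


# Illustrative input: three balanced, independent white-noise groups with unit variance
# give V = 3 * (I - J / 3); the group means and total sample size are made-up numbers.
V_example = [[2.0, -1.0, -1.0], [-1.0, 2.0, -1.0], [-1.0, -1.0, 2.0]]
T_example = t_statistic([0.1, 0.0, -0.1], V_example, n_total=300)
```

For a general rank-deficient matrix, or for p > 1, a numerical pseudoinverse routine from a linear algebra library would be the more robust choice.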

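To state the assumptions, we define, for random variables $\{X_t\}$, the cumulant of order $\ell$ of $(X_1,\ldots,X_\ell)$ as

$$\mathrm{cum}(X_1,\ldots,X_\ell)=\sum_{(\nu_1,\ldots,\nu_p)}(-1)^{p-1}(p-1)!\left(E\prod_{j\in\nu_1}X_j\right)\cdots\left(E\prod_{j\in\nu_p}X_j\right),$$

where the summation $\sum_{(\nu_1,\ldots,\nu_p)}$ extends over all partitions $(\nu_1,\ldots,\nu_p)$ of $\{1,2,\ldots,\ell\}$ (see Brillinger 1981, p. 19). The following assumptions are made throughout the paper.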

Assumption 3.1

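For all $\ell\in\mathbb{N}$, $(k_1,\ldots,k_\ell)\in\{1,\ldots,a\}^{\ell}$, and $(r_1,\ldots,r_\ell)\in\{1,\ldots,p\}^{\ell}$,

$$\sum_{s_2,\ldots,s_\ell=-\infty}^{\infty}\left(1+\sum_{j=2}^{\ell}|s_j|\right)\left|\kappa^{k_1\cdots k_\ell}_{r_1\cdots r_\ell}(s_2,\ldots,s_\ell)\right|<\infty,$$

where $\kappa^{k_1\cdots k_\ell}_{r_1\cdots r_\ell}(s_2,\ldots,s_\ell)=\mathrm{cum}\{e_{k_10r_1},e_{k_2s_2r_2},\ldots,e_{k_\ell s_\ell r_\ell}\}$.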

Assumption 3.2

$W(\cdot)$ is a real, bounded, nonnegative, even function such that $\int_{-\infty}^{\infty}W(t)\,dt=1$ and $\int_{-\infty}^{\infty}W^2(t)\,dt<\infty$ with a bounded derivative.

Assumption 3.3

$\mathrm{rank}(\hat V_n)$ converges in probability to $\mathrm{rank}(V)$, where $V$ is defined in Theorem 1, as $\min_{i=1,\ldots,a}n_i\to\infty$.

We briefly explain all assumptions. Assumption 3.1 is often imposed for dependent observations (see Brillinger 1981, p. 26). It implies the asymptotic normality of $(\bar y_{1\cdot}^{\top}-\bar y_{\cdot\cdot}^{\top},\ldots,\bar y_{a\cdot}^{\top}-\bar y_{\cdot\cdot}^{\top})$. This assumption can be relaxed as described in Remark 1 in Section A in the supplementary material. Assumption 3.2 is a natural assumption for the nonparametric spectral density estimator. In conjunction with Assumption 3.1, it ensures that $\hat f_n(\lambda)$ is a consistent estimator (see Brillinger 1981, Corollaries 5.6.1 and 5.6.2 and Theorem 5.9.1). Other conditions which ensure the consistency of the nonparametric spectral density estimator can be found in Robinson (1991). Assumption 3.3 is a technical assumption to ensure that $\hat V_n^{-}$ converges in probability to $V^{-}$ as $\min_{i=1,\ldots,a}n_i\to\infty$ (see Rakocevic 1997; Stewart 1969).

Remark 3

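When we assume independence of groups and $f_{11}(0)=\cdots=f_{aa}(0)$, $\hat V_n$ fulfills Assumption 3.3. As an illustration, we set $p=1$ and $a=3$. Then,

$$V=\begin{pmatrix}1-1/a&-1/a&-1/a\\-1/a&1-1/a&-1/a\\-1/a&-1/a&1-1/a\end{pmatrix}2\pi f(0),$$

and for matrices,

$$P=\begin{pmatrix}1&0&0\\0&1&0\\1&1&1\end{pmatrix}\quad\text{and}\quad B=\begin{pmatrix}1-1/a&-1/a&-1/a\\-1/a&1-1/a&-1/a\end{pmatrix}2\pi f(0),$$

it holds that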

$$PV=\begin{pmatrix}B\\0_a^{\top}\end{pmatrix}.$$

Also, the matrix $P\hat V_n$ takes the form of

$$P\hat V_n=\begin{pmatrix}\hat B_n\\0_a^{\top}\end{pmatrix},$$

where $\hat B_n$ is an appropriate $(a-1)$-by-$a$ matrix. Since $B$ is a full rank matrix and the set of all full rank $(a-1)$-by-$a$ matrices is open, $\hat B_n$ is a full rank matrix for large $n$. Hence, the condition is confirmed.

Then, we obtain the following asymptotic null distribution based on Rao and Mitra (1971, Theorem 9.2.3, p. 173).

Theorem 1

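Suppose Assumptions 3.1–3.3 hold. Under $H_0$, $T_n$ converges in distribution to the Chi-square distribution with $r$ degrees of freedom as $\min_{i=1,\ldots,a}n_i\to\infty$, where $r=\mathrm{rank}(V)$ and $V=(V_{ij})_{i,j=1,\ldots,a}$ with

$$V_{ij}=\frac{2\pi\min\{\rho_i,\rho_j\}}{\rho_i\rho_j}f_{ij}(0)-\frac{2\pi}{a}\sum_{s=1}^{a}\left(\frac{\min\{\rho_s,\rho_j\}}{\rho_s\rho_j}f_{sj}(0)+\frac{\min\{\rho_i,\rho_s\}}{\rho_i\rho_s}f_{is}(0)\right)+\frac{2\pi}{a^2}\sum_{s,k=1}^{a}\frac{\min\{\rho_s,\rho_k\}}{\rho_s\rho_k}f_{sk}(0).$$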

From Theorem 1, we obtain an asymptotically size $\alpha$ test that rejects $H_0$ when $T_n\ge\chi^2_{\hat r_n}[1-\alpha]$, where $\hat r_n=\mathrm{rank}(\hat V_n^{-})$ and $\chi^2_{\hat r_n}[1-\alpha]$ denotes the upper $\alpha$-percentile of the Chi-square distribution with $\hat r_n$ degrees of freedom.
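The rejection rule can equivalently be phrased through a p-value, as in the following sketch. The Chi-square upper tail probability is computed with a textbook power series for the regularized incomplete gamma function using only the Python standard library. The function name and the example values of the statistic and of the degrees of freedom are illustrative assumptions.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(chi-square with df degrees of freedom > x)."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    # Series: lower incomplete gamma(s, z) = z^s e^{-z} sum_k z^k / (s (s+1) ... (s+k)).
    term = 1.0 / s
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (s + k)
        total += term
        k += 1
    lower = total * math.exp(-z + s * math.log(z) - math.lgamma(s))
    return 1.0 - lower


# Hypothetical values of the statistic and of the estimated rank.
t_value, rank_hat, alpha = 7.2, 2, 0.05
p_value = chi2_sf(t_value, rank_hat)
reject = p_value <= alpha  # same decision as comparing t_value with the upper alpha-percentile
```

The series is adequate for the moderate arguments that arise here; for very large statistics a continued-fraction evaluation or a statistical library would be preferable.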

We elucidate the theoretical power of the test in the next theorem.

Theorem 2

Suppose Assumptions 3.1–3.3 hold. Under the alternative $K_0$, the power of the above test based on $T_n$ converges to 1, as $\min_{i=1,\ldots,a}n_i\to\infty$. In other words, the test is consistent.

To see the nontrivial power of the proposed test, let us consider local alternative hypotheses. For perturbations $h_1,\ldots,h_a$ satisfying $\sum_{i=1}^{a}h_i=0$, the local alternative is defined as

$$K_0^{(n)}:\tau_i=\frac{h_i}{\sqrt{n}}\quad(i=1,\ldots,a).$$

Theorem 3

Suppose Assumptions 3.1–3.3 hold. Under the local alternatives $K_0^{(n)}$, $T_n$ converges in distribution to the noncentral Chi-square distribution with $r$ degrees of freedom and the noncentrality parameter $\delta=(h_1^{\top},\ldots,h_a^{\top})V^{-}(h_1^{\top},\ldots,h_a^{\top})^{\top}$, as $\min_{i=1,\ldots,a}n_i\to\infty$.

In view of this theorem, the nontrivial asymptotic power of the test under the local alternatives can be expressed as

$$1-\Psi_{r,\delta}\left(\chi^2_r[1-\alpha]\right),$$

where $\Psi_{r,\delta}$ is the cumulative distribution function of the noncentral Chi-square with $r$ degrees of freedom and the noncentrality parameter $\delta$.
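The asymptotic local power can be evaluated numerically by writing the noncentral Chi-square distribution as a Poisson mixture of central Chi-square distributions. The sketch below is restricted to even degrees of freedom, for which the central distribution function has a closed form; this restriction, the truncation of the Poisson sum at 200 terms, and the function names are assumptions made for the illustration.

```python
import math


def chi2_cdf_even(x, df):
    """CDF of the central chi-square distribution for even df (closed form)."""
    z = x / 2.0
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term          # accumulates z^k / k!
        term *= z / (k + 1)
    return 1.0 - math.exp(-z) * total


def chi2_quantile_even(prob, df):
    """Quantile of the central chi-square distribution (even df) by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < prob:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def local_power(delta, df, alpha=0.05, terms=200):
    """1 - Psi_{df, delta}(critical value), with Psi written as a Poisson mixture."""
    crit = chi2_quantile_even(1.0 - alpha, df)
    weight = math.exp(-delta / 2.0)  # Poisson(delta / 2) probability of j = 0
    noncentral_cdf = 0.0
    for j in range(terms):
        noncentral_cdf += weight * chi2_cdf_even(crit, df + 2 * j)
        weight *= (delta / 2.0) / (j + 1)
    return 1.0 - noncentral_cdf


power_null = local_power(0.0, df=2)      # reduces to the nominal level
power_small = local_power(1.0, df=2)
power_large = local_power(5.0, df=2)
```

With a zero noncentrality parameter the expression returns the nominal level, and the power increases with the noncentrality parameter, in line with Theorem 3. The 200-term truncation is sufficient only for moderate noncentrality parameters.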

Remark 4

When the number of time series in each group is greater than one ($q\ge2$, see Remark 1), a multiple comparison problem occurs since our test provides different p-values for different orderings of the time series. For example, for $p=1$, we obtain $(q!)^{a-1}$ different p-values in total. To avoid the multiple comparison problem, we propose to use $y_{it}=(y_{it1\cdot},y_{it2\cdot},\ldots,y_{itp\cdot})^{\top}$, where $y_{itp\cdot}=\sum_{j=1}^{q}y_{itpj}/q$, instead of $(y_{it11},\ldots,y_{it1q},y_{it21},\ldots,y_{itp1},\ldots,y_{itpq})^{\top}$.
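For p = 1, the averaging proposed in Remark 4 amounts to replacing the q series of a group by their pointwise mean. A minimal sketch follows; the function name and the toy numbers are hypothetical.

```python
def average_within_group(series_list):
    """Average q equally long scalar series observation by observation."""
    lengths = {len(s) for s in series_list}
    if len(lengths) != 1:
        raise ValueError("all series in a group must have the same length")
    q = len(series_list)
    return [sum(values) / q for values in zip(*series_list)]


# Toy group with q = 2 series of length 3.
group = [[0.01, -0.02, 0.03], [0.03, 0.00, -0.01]]
averaged = average_within_group(group)
```

The averaged series does not depend on the order in which the q series are listed, which is the property needed to obtain a single p-value.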

Test for existence of random effects

In this section, we consider the one-way random effects model with a series of strictly stationary residuals when the number of groups is fixed and the number of observations for each group diverges. The only difference from the fixed effects model (4) is that $\tau_i$ is a random effect of the $i$th group. For simplicity, we assume that $(\tau_1^{\top},\ldots,\tau_a^{\top})^{\top}$ follows the $ap$-dimensional centered normal distribution with variance $\Sigma_\tau=(\Sigma^{\tau}_{ij})_{i,j=1,\ldots,a}$. Here, $\{\tau_j\}$ are supposed to be independent of any disturbance process $\{e_{it};\ t=1,\ldots,n_i\}$. In this random effects model, the spectral density of $y_{it}$ does not exist due to the random effects.

Let the null hypothesis H1 and the alternative K1 for the existence of random effects be

$$H_1:\Sigma_\tau=O_{ap}\quad\text{vs}\quad K_1:\Sigma_\tau\neq O_{ap}, \tag{8}$$

where $O_{ap}$ is an $ap$-by-$ap$ zero matrix. The test statistic $T_n$, defined in (7), is still available in this situation. The following theorem shows that the asymptotic null distribution is exactly the same as that for the fixed effects model.

Theorem 4

Suppose Assumptions 3.1–3.3 hold. Under the null $H_1$, $T_n$ converges in distribution to the Chi-square distribution with $r$ degrees of freedom as $\min_{i=1,\ldots,a}n_i\to\infty$.

In consequence, we reject $H_1$ in favor of $K_1$ if $T_n\ge\chi^2_{\hat r_n}[1-\alpha]$. The consistency of the test is shown as follows.

Theorem 5

Suppose Assumptions 3.1–3.3 hold. Under the alternative $K_1$, the proposed test is consistent. More precisely, under the alternative $K_1$, $\mathrm{pr}(T_n\ge\chi^2_{\hat r_n}[1-\alpha])\to1$, as $\min_{i=1,\ldots,a}n_i\to\infty$.

Now we consider the local alternative hypothesis to study the nontrivial power of the test based on $T_n$. Let $H=(H_{ij})_{i,j=1,\ldots,a}$ be an $ap$-by-$ap$ symmetric, positive definite matrix, and the local alternatives $K_1^{(n)}$ be defined as

$$K_1^{(n)}:\Sigma_\tau=\frac{H}{n}.$$

The nontrivial power of the proposed test is elucidated in the next result.

Theorem 6

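Suppose Assumptions 3.1–3.3 hold. Under the alternatives $K_1^{(n)}$, we have

$$\lim_{\min_{i=1,\ldots,a}n_i\to\infty}\mathrm{pr}\left(T_n\ge\chi^2_{\hat r_n}[1-\alpha]\right)=\mathrm{pr}\left(Z^{\top}V^{-}Z\ge\chi^2_r[1-\alpha]\right),$$

where $Z$ follows an $ap$-dimensional centered normal distribution with variance $\tilde H+V$. Here, $\tilde H=(\tilde H_{ij})_{i,j=1,\ldots,a}$ is determined in terms of the matrix $H$ as

$$\tilde H_{ij}=H_{ij}-\frac{1}{a}\sum_{s=1}^{a}(H_{sj}+H_{is})+\frac{1}{a^2}\sum_{s,k=1}^{a}H_{sk}.$$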
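The entrywise definition above is a double centering of the blocks of the matrix, and it can be restated compactly in matrix form. The display below is only a restatement, assuming that the symbol J with subscript a denotes the a-by-a matrix of ones and introducing the shorthand C with subscript a for the centering matrix; the matrix V of Theorem 1 has the same structure with the rescaled spectral density blocks at frequency zero in place of the blocks of H.

```latex
\tilde H = (C_a \otimes I_p)\, H \,(C_a \otimes I_p),
\qquad C_a = I_a - \frac{1}{a} J_a ,
\qquad\text{so that}\qquad
\sum_{i=1}^{a} \tilde H_{ij} = O_p \quad \text{for every } j .
```

In particular, the double-centered matrix is singular in the same direction as V, which is why only the Moore–Penrose inverse of V appears in the limit.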

Remark 5

We can generalize the random effects $(\tau_1,\ldots,\tau_a)$ to an $ap$-dimensional random vector and show corresponding theorems to Theorems 4–6.

Numerical study

The finite sample performance of the proposed test based on $T_n$ and comparison with the classical test based on $S_n$ are illustrated in this section. To be specific, we let the dimension of time series from each group at each time $p$, the number of time series in each group $q$, and the number of groups $a$ be $p=1$, $q=1$, and $a=3,9$. The sample sizes are set as (I) $n_1=\cdots=n_a=1000$, (II) $n_{3k-1}=n_{3k-2}=2000$ and $n_{3k}=1000$ for $k\le a/3$, (III) $n_1=\cdots=n_a=2000$. (I) and (III) are cases of the sample size of each group being equal (balanced design). (II) is the case of the sample size of each group being unequal (unbalanced design). For each $1\le t\le\max_{i=1,\ldots,a}n_i$, denote $(e_{1t},\ldots,e_{at})^{\top}$ by $e_t=(e_{it})_{i=1,\ldots,a}$.

We consider two scenarios, independent groups (Case 1) and correlated groups (Case 2). The disturbance process {eit} is supposed to follow a multivariate moving-average model or a generalized autoregressive conditional heteroscedasticity model. Let {εt} be an i.i.d. sequence in the following.

As for Processes 1–3, we suppose $e_t=\varepsilon_t+\Phi\varepsilon_{t-1}$, with the coefficient matrix $\Phi=(\Phi_{ij})$, where, in Case 1, $\Phi=0.5I_a$ and, in Case 2, $\Phi_{3k-2,3k-2}=0.7$, $\Phi_{3k-1,3k-1}=-0.5$, $\Phi_{3k,3k}=0.3$, $\Phi_{3k,3k-2}=0.3$, $\Phi_{3k,3k-1}=-0.1$ for positive integer $k\le a/3$; and otherwise $\Phi_{ij}=0$.

Process 1: In Case 1, each component of $\varepsilon_t$ follows a centered normal distribution with unit variance, independently of the other components of $\varepsilon_t$. In Case 2, $\varepsilon_t$ is distributed as a zero mean multivariate normal distribution with covariance matrix $\Sigma=(\Sigma_{ij})$, where $\Sigma_{ii}=1$ and $\Sigma_{j,j+1}=\Sigma_{j+1,j}=0.5$ for $1\le i\le a$, $1\le j\le a-1$.

Process 2: In Case 1, each component of $\varepsilon_t$ follows a centered t-distribution with 5 degrees of freedom, independently of the other components of $\varepsilon_t$. In Case 2, $\varepsilon_t$ is distributed as a zero mean multivariate t-distribution with 5 degrees of freedom, with the scale matrix $\Sigma$ defined in Process 1.

Process 3: In Case 1, each component of $\varepsilon_t$ follows a centered skew normal distribution with location parameter 0, scale 1 and shape parameter 50, independently of the other components of $\varepsilon_t$. The noncentered skew normal distribution has a nonzero mean $50\sqrt{2/(\pi(1+50^2))}$. In Case 2, $\varepsilon_t$ is distributed as a centered multivariate skew normal distribution with location parameter $0_a$, correlation matrix $\Sigma$ defined in Process 1, and shape parameter $\zeta=50\cdot1_a$, where $0_a$ and $1_a$ are $a$-dimensional vectors with every component being zero and one, respectively. The skewed process is found in Chan and Tong (1986). The joint density function of the multivariate skew normal distribution is given, for $x\in\mathbb{R}^a$, by

$$f_{SN}(x;\Sigma,\zeta)=2\upsilon_a(x;\Sigma)\Upsilon(\zeta^{\top}x),$$

where $\upsilon_a(\cdot;\Sigma)$ is the probability density function of the $a$-dimensional centered multivariate normal distribution with a correlation matrix $\Sigma$ and $\Upsilon(\cdot)$ is the cumulative distribution function of the standard normal distribution. Note that the noncentered process has a nonzero mean $\sqrt{2/(\pi(1+\zeta^{\top}\Sigma\zeta))}\,\Sigma\zeta$ unless $\zeta=0$, so we need to subtract the mean. More details on the multivariate skew normal distribution can be found in Azzalini and Valle (1996) and Azzalini and Capitanio (1999).

As for Process 4, we suppose

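Process 4: $\{e_t\}$ follows the generalized autoregressive conditional heteroscedasticity model

$$e_{it}=h_{it}^{1/2}\varepsilon_{it},\quad i=1,\ldots,a,\qquad\begin{pmatrix}h_{1t}\\\vdots\\h_{at}\end{pmatrix}=\begin{pmatrix}1\\\vdots\\1\end{pmatrix}+0.1\,\Phi\begin{pmatrix}e_{1t}^2\\\vdots\\e_{at}^2\end{pmatrix}+\begin{pmatrix}0.1\,h_{1,t-1}\\\vdots\\0.1\,h_{a,t-1}\end{pmatrix},$$

where $\varepsilon_t$ is distributed as a zero mean multivariate normal distribution with covariance $I_a$ in Case 1 and $\Sigma$ in Case 2.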

R package mvtnorm (Genz et al. 2021) is available to produce innovation processes for Processes 1 and 2. Process 4 can be produced by R package ccgarch (Nakatani 2014). The skew normal distribution can be generated by R package sn (Azzalini 2022).

The features of Processes 1–4 are as follows: Process 1 is the most standard setting. Fifth and higher moments of Process 2 do not exist. Processes 3 and 4 have a nonzero skewness and conditional heteroskedasticity, respectively.
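To see why Case 2 produces correlated groups at frequency zero, one can compute the long-run covariance matrix of the moving-average disturbance for a = 3 with Gaussian innovations (Process 1), namely the product of I + Φ, Σ, and the transpose of I + Φ, which equals 2π times the spectral density matrix at frequency zero. The sketch below assumes that the entries of Σ not listed in the text are zero; that reading and the helper names are assumptions.

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


# Case 2 coefficient matrix for a = 3 (k = 1 in the text).
Phi = [[0.7, 0.0, 0.0],
       [0.0, -0.5, 0.0],
       [0.3, -0.1, 0.3]]
# Innovation covariance; unlisted entries are assumed to be zero.
Sigma = [[1.0, 0.5, 0.0],
         [0.5, 1.0, 0.5],
         [0.0, 0.5, 1.0]]

I_plus_Phi = [[Phi[i][j] + (1.0 if i == j else 0.0) for j in range(3)] for i in range(3)]
# Long-run covariance of e_t = eps_t + Phi eps_{t-1}: (I + Phi) Sigma (I + Phi)^T.
long_run_cov = matmul(matmul(I_plus_Phi, Sigma), transpose(I_plus_Phi))
```

Under this reading the off-diagonal entries are 0.425, 0.425, and 0.35, so the group means remain correlated asymptotically, which is the situation in which a statistic standardized only within groups is not distribution-free.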

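We report the rejection probabilities of our proposed test $T_n$ and the classical test $S_n$ in Figs. 1, 2 and 3 over 1000 simulations for the following situations: (i) $\tau=0_a$; (ii) $\tau=(\tau_1,\ldots,\tau_a)$, where $\tau_{3k-2}=-0.03$, $\tau_{3k-1}=0$, and $\tau_{3k}=0.03$ for $k\le a/3$; and (iii) $\tau$ is distributed as a zero mean multivariate normal with covariance matrix $\Sigma_\tau$. We let $\Sigma_\tau$ be a block diagonal matrix whose off-diagonal blocks are all $3\times3$ zero matrices and main-diagonal blocks are all the same $3\times3$ matrix $\tilde\Sigma_\tau=(\tilde\Sigma^{\tau}_{ij})/5000$, where $\tilde\Sigma^{\tau}_{11}=3$, $\tilde\Sigma^{\tau}_{22}=2$, $\tilde\Sigma^{\tau}_{33}=\tilde\Sigma^{\tau}_{12}=\tilde\Sigma^{\tau}_{21}=1$, $\tilde\Sigma^{\tau}_{23}=\tilde\Sigma^{\tau}_{32}=-0.5$, and $\tilde\Sigma^{\tau}_{13}=\tilde\Sigma^{\tau}_{31}=0.008$. The significance level is set to be 0.05.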

Fig. 1. Empirical size of tests for the existence of fixed and random effects based on $T_n$ and $S_n$. The upper and lower plots correspond to $a=3$ and $a=9$, respectively. The left and right plots correspond to Case 1 (independent groups) and Case 2 (correlated groups), respectively. The x-axis tick labels (I), (II), and (III) correspond to the sample sizes $n_1=\cdots=n_a=1000$; $n_{3k-1}=n_{3k-2}=2000$ and $n_{3k}=1000$ for $k\le a/3$; and $n_1=\cdots=n_a=2000$, respectively

Fig. 2. Empirical power of tests for the existence of fixed effects based on $T_n$ and $S_n$ for fixed effects $\tau=(\tau_1,\ldots,\tau_a)$, where $\tau_{3k-2}=-0.03$, $\tau_{3k-1}=0$, and $\tau_{3k}=0.03$ for $k\le a/3$. The upper and lower plots correspond to $a=3$ and $a=9$, respectively. The left and right plots correspond to Case 1 (independent groups) and Case 2 (correlated groups), respectively. The x-axis tick labels (I), (II), and (III) correspond to the sample sizes $n_1=\cdots=n_a=1000$; $n_{3k-1}=n_{3k-2}=2000$ and $n_{3k}=1000$ for $k\le a/3$; and $n_1=\cdots=n_a=2000$, respectively

Fig. 3. Empirical power of tests for the existence of random effects based on $T_n$ and $S_n$ for random effects $\tau$ distributed as a zero mean multivariate normal with covariance matrix $\Sigma_\tau$, a block diagonal matrix whose main-diagonal blocks are all the same $3\times3$ matrix $\tilde\Sigma_\tau=(\tilde\Sigma^{\tau}_{ij})/5000$ with $\tilde\Sigma^{\tau}_{11}=3$, $\tilde\Sigma^{\tau}_{22}=2$, $\tilde\Sigma^{\tau}_{33}=\tilde\Sigma^{\tau}_{12}=\tilde\Sigma^{\tau}_{21}=1$, $\tilde\Sigma^{\tau}_{23}=\tilde\Sigma^{\tau}_{32}=-0.5$, and $\tilde\Sigma^{\tau}_{13}=\tilde\Sigma^{\tau}_{31}=0.008$. The upper and lower plots correspond to $a=3$ and $a=9$, respectively. The left and right plots correspond to Case 1 (independent groups) and Case 2 (correlated groups), respectively. The x-axis tick labels (I), (II), and (III) correspond to the sample sizes $n_1=\cdots=n_a=1000$; $n_{3k-1}=n_{3k-2}=2000$ and $n_{3k}=1000$ for $k\le a/3$; and $n_1=\cdots=n_a=2000$, respectively

The situation (i) corresponds to both null hypotheses $H_0$ and $H_1$ defined in (5) and (8), respectively, and (ii) and (iii) correspond to the alternatives $K_0$ and $K_1$, respectively. Note that the fixed effects and random effects are chosen to be tiny so that the power stays below one, which allows us to compare the performance of the tests across Processes 1–4. The consistency is confirmed by the results in the supplementary material (see Tables 1–6 in Section B.1).

Figure 1 shows the empirical size of the tests. Both tests work well for $a=3$ and Case 1 (the top left plot) for all processes. Our proposed test based on $T_n$ has good size for $a=3$ and Case 2 (the top right plot). On the other hand, our test has a small size distortion for $a=9$ in Cases 1 and 2 (the lower plots). This distortion is caused by the accumulation of estimation errors in the large matrix $V^{-}$ (see Figures 1 and 2 in Section B.2 in the supplementary material). As expected, the classical test based on $S_n$ has size distortion for both $a=3$ and $a=9$ in Case 2 (the right plots) since the groups are correlated.

Figures 2 and 3 show the empirical power of the tests. For both $a=3$ and $a=9$ in Case 1, the empirical powers of the two tests are nearly equal for each model.

In most cases, size and power for the unbalanced design (II) $n_{3k-1}=n_{3k-2}=2000$ and $n_{3k}=1000$ for $k\le a/3$ fall between the results for the balanced designs (I) $n_1=\cdots=n_a=1000$ and (III) $n_1=\cdots=n_a=2000$. There are cases in which the empirical power for the unbalanced design (II) is worse than that for the balanced design (I) even though the total sample size of (II) is larger than that of (I), e.g., the power of Processes 1 and 2 for $a=9$ and Case 2 in Fig. 3. Further, we implemented some additional experiments in the supplementary material and confirmed the consistency of our test, i.e., the empirical power goes to one (see Tables 1–6 in Section B.1). Overall, our proposed test works well to detect the existence of fixed or random effects. In summary, our test outperforms the classical test when groups are correlated and $a$ is moderate.

Application to real data

Data analysis on stock prices often does not take random effects into account. However, for some portfolios of stocks, random effects cannot always be ignored. In fact, equity-focused investors take into account the sensitivities to currency, oil prices, the market, etc. in determining their equity portfolios. In other words, equity-focused investors believe that the factors related to earnings and stock prices are linked. For example, stock prices of trading companies are linked to oil prices. Put differently, equity-focused investors believe that random effects with respect to industries exist. In this empirical study, we pursue the question of whether random effects really exist for a portfolio that combines automobile, telecom, and trading companies. We analyze the log-returns of stock prices from January 4, 2016, to December 30, 2019. The companies we investigate are Itochu Corp., Mitsubishi Corp., Mitsui & Co., Ltd., and Marubeni Corp. from trading companies, Honda Motor Co. Ltd., Nissan Motor Co., Ltd., Suzuki Motor Corp., and Subaru Corp. from car companies, and KDDI Corp., Hikari Tsushin Inc. and NTT Data Corp. from telecom companies. The length of each time series is 978. These data can be downloaded from the website https://www.investing.com.

For this dataset, the number of groups $a$ is three (trading, car, and telecom sectors), the dimension $p$ of time series from each firm is one, which corresponds to univariate ANOVA, the number of firms $q$ is three for the telecom sector and four for the car and trading sectors, and the number of observations is $n_1=n_2=n_3=978$.
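For readers reproducing the preprocessing, log-returns are obtained from a price series as differences of logarithms of consecutive prices. The sketch below uses made-up prices and a hypothetical function name; it does not reproduce the actual dataset.

```python
import math


def log_returns(prices):
    """Log-returns log(P_t / P_{t-1}) of a strictly positive price series."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Made-up closing prices for illustration only.
prices = [100.0, 110.0, 99.0]
returns = log_returns(prices)
```

A series of m prices yields m − 1 log-returns, and the log-returns sum to the logarithm of the ratio of the last price to the first.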

The plots of the log-returns are shown in Fig. 4. The dataset seems stationary, and the sectors cannot be distinguished visually. Table 1 gives the sample means and variances of the log-returns. The sample means of Suzuki and Hikari appear to be large compared to the sample means for other car and telecom companies, respectively. As for sample variances, the variances of Suzuki and Subaru are a little larger than those of the other companies. Figure 5 shows the heatmap of sample correlations. These data have correlations between and within groups. This implies that the classical F-test statistic should not be applied in this situation since it is designed for independent groups. Interesting observations for the data are as follows: Within-group correlations are low for telecom companies and rather high for trading companies. This may be because of a similar product mix for trading companies and a different product mix for telecom companies. Between-group correlations for car and trading companies are higher than those for telecom and car companies and those for telecom and trading companies. This may be ascribed to the fact that car and trading companies’ stocks are cyclical, whereas telecom companies’ stocks are defensive.

Fig. 4. Plots of log-return for stock prices

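Table 1. Sample means and sample variances of log-returns

| Sector | Company | Mean ($\times10^{-4}$) | Variance ($\times10^{-4}$) |
| --- | --- | --- | --- |
| Trading company | Itochu | 5.89 | 2.15 |
| Trading company | Mitsubishi | 3.74 | 2.57 |
| Trading company | Mitsui | 3.15 | 2.16 |
| Trading company | Marubeni | 2.80 | 2.81 |
| Car company | Honda | -1.90 | 2.75 |
| Car company | Nissan | -6.80 | 2.35 |
| Car company | Suzuki | 2.51 | 3.91 |
| Car company | Subaru | -5.91 | 3.58 |
| Telecom company | KDDI | 0.71 | 2.45 |
| Telecom company | Hikari | 12.61 | 2.86 |
| Telecom company | NTT | 2.53 | 2.71 |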

Fig. 5. Heatmap of sample correlations between companies

We apply our test and the classical test as a comparison to this dataset (see Remark 4) and obtain the values 5.517 and 2.401 and the corresponding p-values 0.0634 and 0.301, respectively.
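The reported p-values can be checked by hand under the assumption, not stated explicitly in the text, that both statistics are compared with a Chi-square distribution with two degrees of freedom (three groups and p = 1). For two degrees of freedom the upper tail probability has a closed form:

```latex
\mathrm{pr}\left(\chi^2_2 \ge x\right) = e^{-x/2},
\qquad e^{-5.517/2} \approx 0.0634,
\qquad e^{-2.401/2} \approx 0.301 .
```

Both values agree with the p-values quoted above to the reported precision.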

Therefore, the null hypothesis $H_1$ of no random effects is not rejected at the significance level 0.05 by either test. However, the p-value of our test is close to 0.05, and at the significance level 0.1, our test rejects the hypothesis, whereas the classical test does not. From the observations that (i) our dataset has between-group correlations, and thus, the classical test is not appropriate, (ii) there is a tendency in the sample means: car companies tend to have negative sample means, whereas telecom and trading companies tend to have positive sample means, and (iii) the p-value of our test is close to 0.05, we conclude that random effects should be taken into consideration when modeling log-returns of stock prices. This result supports the view of equity-focused investors that different industries have different factors that affect corporate profits and that corporate profits influence stock prices; for example, profits of trading companies are linked to the price of crude oil.

Our result is consistent with portfolio theory. In that field, it is well known that portfolios of stocks have systematic risks related to the whole market and unsystematic risks related to sectors and companies. Many studies have taken unsystematic risk into account and emphasized its importance (see Aber 1976; Hsu and Jang 2008, and references therein). Industry effects correspond to unsystematic risks in our case.

Additional thoughts/remarks

Nagahata and Taniguchi (2018) showed the asymptotic null distribution of Sn under the independence of groups. The following lines show that the independence of groups can be relaxed to uncorrelated groups. Simple algebra gives

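$$
S_n = n \sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^{T} \{2\pi \tilde{f}_n(0)\}^{-1} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})
= n \begin{pmatrix} \{2\pi \tilde{f}_n(0)\}^{-1/2}\, \bar{e}_{1\cdot} \\ \vdots \\ \{2\pi \tilde{f}_n(0)\}^{-1/2}\, \bar{e}_{a\cdot} \end{pmatrix}^{T}
\{ (I_a - J_a/a) \otimes I_p \}
\begin{pmatrix} \{2\pi \tilde{f}_n(0)\}^{-1/2}\, \bar{e}_{1\cdot} \\ \vdots \\ \{2\pi \tilde{f}_n(0)\}^{-1/2}\, \bar{e}_{a\cdot} \end{pmatrix}.
$$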

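Under Assumption 3.1 and the balanced design ($n_1 = \cdots = n_a$), it holds that $\sqrt{n}\,\big(\{2\pi \tilde{f}_n(0)\}^{-1/2}\, \bar{e}_{1\cdot}, \ldots, \{2\pi \tilde{f}_n(0)\}^{-1/2}\, \bar{e}_{a\cdot}\big)$ converges in distribution to $N(0, I_{ap})$ as $n \to \infty$.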

The idempotence of $(I_a - J_a/a) \otimes I_p$, the fact that $\operatorname{rank}\{(I_a - J_a/a) \otimes I_p\} = (a-1)p$, the positive definiteness of the spectral density matrix, and the continuous mapping theorem yield that Sn converges in distribution to the Chi-square distribution with (a-1)p degrees of freedom under the independence of groups. The consistency of the test under the alternative and the power of the test under the local alternative can also be derived along the same line as our proof.
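The matrix facts used in this argument can be checked numerically. The following is a minimal sketch in exact rational arithmetic, not the authors' code: the sizes a = 3 and p = 2, the vectors in `u`, and the helper names `centering_kron_identity` and `matmul` are all hypothetical choices made for illustration. It verifies that (I_a - J_a/a) ⊗ I_p is idempotent, that its trace (which equals the rank for an idempotent matrix) is (a-1)p, and that the sum of squared deviations from the group average equals the corresponding quadratic form.

```python
from fractions import Fraction


def centering_kron_identity(a: int, p: int):
    """Build (I_a - J_a/a) ⊗ I_p as a list of lists of Fractions."""
    size = a * p
    m = [[Fraction(0)] * size for _ in range(size)]
    for i in range(a):
        for j in range(a):
            # (i, j) entry of the centering matrix I_a - J_a/a.
            c = Fraction(int(i == j)) - Fraction(1, a)
            for k in range(p):
                # Block (i, j) of the Kronecker product is c * I_p.
                m[i * p + k][j * p + k] = c
    return m


def matmul(x, y):
    """Plain matrix product for square lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


a, p = 3, 2  # hypothetical number of groups and dimension
M = centering_kron_identity(a, p)

# Idempotence: M M = M, so rank(M) = trace(M).
is_idempotent = matmul(M, M) == M
trace = sum(M[i][i] for i in range(a * p))

# Arbitrary illustrative vectors u_1, ..., u_a in R^p (standing in for the
# standardized group means in the display above).
u = [[Fraction(1), Fraction(-2)],
     [Fraction(3), Fraction(1, 2)],
     [Fraction(0), Fraction(5)]]
u_bar = [sum(u[i][k] for i in range(a)) / a for k in range(p)]
lhs = sum((u[i][k] - u_bar[k]) ** 2 for i in range(a) for k in range(p))
flat = [u[i][k] for i in range(a) for k in range(p)]
rhs = sum(flat[r] * M[r][c] * flat[c]
          for r in range(a * p) for c in range(a * p))

print(is_idempotent, trace == (a - 1) * p, lhs == rhs)
```

Exact fractions are used instead of floating point so that the equalities hold without tolerance checks.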

The independence or uncorrelatedness of groups is quite restrictive and impractical. When groups are correlated, the asymptotic null distribution of Sn depends on the process, since the off-diagonal elements of the asymptotic variance of the vector $\sqrt{n}\,\big(\{2\pi \tilde{f}_n(0)\}^{-1/2}\, \bar{e}_{1\cdot}, \ldots, \{2\pi \tilde{f}_n(0)\}^{-1/2}\, \bar{e}_{a\cdot}\big)$ are not equal to zero. Thus, the p-value of the test based on Sn is not easy to compute. On the other hand, our proposed test statistic Tn is asymptotically distribution-free under the null. In the numerical studies, we found that the proposed test statistic has some size distortion under the null for large a. One direction to solve this problem is to use Sn and apply a bootstrap method to obtain critical values. Homogeneity tests specialized for this type of model will be investigated in our future work.
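To see concretely why correlation between groups breaks the chi-square limit, consider a deliberately simple special case that is not taken from the paper: p = 1 and a hypothetical common cross-group correlation ρ, so that the limiting covariance of the stacked vector is assumed to be Σ = (1-ρ)I_a + ρJ_a. Under these assumptions, a short calculation gives:

```latex
\begin{align*}
P &:= I_a - J_a/a, \qquad
\Sigma := (1-\rho)\, I_a + \rho\, J_a, \qquad
-\tfrac{1}{a-1} < \rho < 1, \\
P J_a &= 0 \;\Longrightarrow\; P \Sigma P = (1-\rho)\, P, \\
Z \sim N(0, \Sigma) &\;\Longrightarrow\; Z^{T} P Z \sim (1-\rho)\, \chi^2_{a-1}.
\end{align*}
```

In this illustrative case the limit coincides with the chi-square distribution with a-1 degrees of freedom only when ρ = 0; otherwise it is rescaled by the unknown factor 1-ρ, which is one way of seeing that the limit depends on the underlying process.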

Discussion

In this paper, tests for the existence of fixed and random effects in one-way models with correlated groups were considered. A new test statistic was proposed, and our tests were shown to be asymptotically of size α under the null and consistent. The nontrivial power of the tests was derived under the local alternative. In the numerical study, we confirmed that our test performs well in several settings. In particular, our test is superior to the classical test when groups are correlated and a is moderate. The empirical study suggests that random effects should be taken into account in the analysis of stock prices.

Supplementary information

Supplementary material includes all proofs of theorems and additional simulation results.


Acknowledgements

The authors are grateful to the editor and two referees for their instructive comments. The authors gratefully acknowledge Mr. Takeshi Tamaoka, the chief executive officer of Ananas Japan Co. Ltd, and Mr. Yuki Nakayasu, the chief executive officer of Minsetsu Inc., for their comments from practical points of view on the real data analysis. This work was supported by JSPS Grant-in-Aid for Research Activity Start-up under Grant Number JP21K20338 (Y.G.); JSPS Grant-in-Aid for Scientific Research (C) under Grant Number JP20K11719 (Y.L.); JSPS Grant-in-Aid for Scientific Research (S) under Grant Number JP18H05290 (M.T.); and the Research Institute for Science & Engineering of Waseda University (M.T.). This work was mainly carried out when the first author was affiliated with Waseda University.

Declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Yuichi Goto, Email: yuichi.goto@math.kyushu-u.ac.jp.

Koichi Arakaki, Email: arakaki74@akane.waseda.jp.

Yan Liu, Email: liu@ims.sci.waseda.ac.jp.

Masanobu Taniguchi, Email: taniguchi@waseda.jp.

References

  1. Aber JW. Industry effects and multivariate stock price behavior. J Financ Quant Anal. 1976;11(4):617–624. doi: 10.2307/2330216.
  2. Akharif A, Fihri M, Hallin M, Mellouk A. Optimal pseudo-Gaussian and rank-based random coefficient detection in multiple regression. Electron J Stat. 2020;14(2):4207–4243. doi: 10.1214/20-EJS1770.
  3. Azzalini A. The R package sn: the skew-normal and related distributions such as the skew-t and the SUN (version 2.0.2). Università degli Studi di Padova, Italia; 2022.
  4. Azzalini A, Capitanio A. Statistical applications of the multivariate skew normal distribution. J R Stat Soc Ser B. 1999;61(3):579–602. doi: 10.1111/1467-9868.00194.
  5. Azzalini A, Valle AD. The multivariate skew-normal distribution. Biometrika. 1996;83(4):715–726. doi: 10.1093/biomet/83.4.715.
  6. Bai J, Li K. Theory and methods of panel data models with interactive effects. Ann Stat. 2014;42(1):142–170. doi: 10.1214/13-AOS1183.
  7. Baltagi BH, Li Q. A transformation that will circumvent the problem of autocorrelation in an error-component model. J Econom. 1991;48(3):385–393. doi: 10.1016/0304-4076(91)90070-T.
  8. Bernardes J, Mishra N, Tran F, Bahmer T, Best L, Blase J, Bordoni D, Franzenburg J, Geisen U, Josephs-Spaulding J, Köhler P, Künstner A, Rosati E, Aschenbrenner A, Bacher P, Baran N, Boysen T, Brandt B, Bruse N, Dörr J, Dräger A, Elke G, Ellinghaus D, Fischer J, Forster M, Franke A, Franzenburg S, Frey N, Friedrichs A, Fuß J, Glück A, Hamm J, Hinrichsen F, Hoeppner M, Imm S, Junker R, Kaiser S, Kan Y, Knoll R, Lange C, Laue G, Lier C, Lindner M, Marinos G, Markewitz R, Nattermann J, Noth R, Pickkers P, Rabe K, Renz A, Röcken C, Rupp J, Schaffarzyk A, Scheffold A, Schulte-Schrepping J, Schunk D, Skowasch D, Ulas T, Wandinger K, Wittig M, Zimmermann J, Busch H, Hoyer B, Kaleta C, Heyckendorf J, Kox M, Rybniker J, Schreiber S, Schultze J, Rosenstiel P, DeCOI. Longitudinal multi-omics analyses identify responses of megakaryocytes, erythroid cells, and plasmablasts as hallmarks of severe COVID-19. Immunity. 2020;53(6):1296–1314.
  9. Bloomfield P. An exponential model for the spectrum of a scalar time series. Biometrika. 1973;60(2):217–226. doi: 10.1093/biomet/60.2.217.
  10. Brillinger DR. Time series: data analysis and theory. San Francisco: Holden-Day; 1981.
  11. Chan K, Tong H. A note on certain integral equations associated with non-linear time series analysis. Probab Theory Relat Fields. 1986;73(1):153–158. doi: 10.1007/BF01845999.
  12. Chiu ST. Weighted least squares estimators on the frequency domain for the parameters of a time series. Ann Stat. 1988;16(3):1315–1326. doi: 10.1214/aos/1176350963.
  13. Clarke BR. Linear models: the theory and application of analysis of variance. Wiley; 2008.
  14. Cook RD, Weisberg S. Diagnostics for heteroscedasticity in regression. Biometrika. 1983;70(1):1–10. doi: 10.1093/biomet/70.1.1.
  15. Diggle PJ, Heagerty P, Liang KY, Zeger SL. Analysis of longitudinal data. Oxford: Oxford University Press; 2002.
  16. Ditzhaus M, Fried R, Pauly M. QANOVA: quantile-based permutation methods for general factorial designs. TEST. 2021;30:960–979. doi: 10.1007/s11749-021-00758-y.
  17. Ergemen YE, Velasco C. Estimation of fractionally integrated panels with fixed effects and cross-section dependence. J Econom. 2017;196(2):248–258. doi: 10.1016/j.jeconom.2016.05.020.
  18. Fang EX, Ning Y, Li R. Test of significance for high-dimensional longitudinal data. Ann Stat. 2020;48(5):2622–2645. doi: 10.1214/19-AOS1900.
  19. Fihri M, Akharif A, Mellouk A, Hallin M. Efficient pseudo-Gaussian and rank-based detection of random regression coefficients. J Nonparam Stat. 2020;32(2):367–402. doi: 10.1080/10485252.2020.1748625.
  20. Galbraith JW, Zinde-Walsh V. Transforming the error-components model for estimation with general ARMA disturbances. J Econom. 1995;66(1–2):349–355. doi: 10.1016/0304-4076(94)01621-6.
  21. Genz A, Bretz F, Miwa T, Mi X, Leisch F, Scheipl F, Hothorn T. mvtnorm: multivariate normal and t distributions. R package version 1.1-3; 2021.
  22. González JA, Lagos-Álvarez BM, Mateu J. Two-way layout factorial experiments of spatial point pattern responses in mineral flotation. TEST. 2021;30:1046–1075. doi: 10.1007/s11749-021-00768-w.
  23. Hallin M, Hlubinká D, Hudecova S. Efficient fully distribution-free center-outward rank tests for multiple-output regression and MANOVA. J Am Stat Assoc. 2021;66:1–43.
  24. Hoover DR, Rice JA, Wu CO, Yang LP. Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika. 1998;85(4):809–822. doi: 10.1093/biomet/85.4.809.
  25. Hsu LT, Jang S. The determinant of the hospitality industry’s unsystematic risk: a comparison between hotel and restaurant firms. Int J Hosp Tour Admin. 2008;9(2):105–127.
  26. Li Y. Efficient semiparametric regression for longitudinal data with nonparametric covariance estimation. Biometrika. 2011;98(2):355–370. doi: 10.1093/biomet/asq080.
  27. Liu X, Xu X. Confidence distribution inferences in one-way random effects model. TEST. 2016;25(1):59–74. doi: 10.1007/s11749-015-0440-8.
  28. Lucas, C., P. Wong, J. Klein, T.B. Castro, J. Silva, M. Sundaram, M.K. Ellingson, T. Mao, J.E. Oh, B. Israelow, T. Takahashi, M. Tokuyama, P. Lu, A. Venkataraman, A. Park, S. Mohanty, H. Wang, A.L. Wyllie, C.B.F. Vogels, R. Earnest, S. Lapidus, I.M. Ott, A.J. Moore, M.C. Muenker, J.B. Fournier, M. Campbell, C.D. Odio, A. Casanovas-Massana, Y.I. Team, R. Herbst, A.C. Shaw, R. Medzhitov, W.L. Schulz, N.D. Grubaugh, C.D. Cruz, S. Farhadian, A.I. Ko, S.B. Omer, and A. Iwasaki. Longitudinal analyses reveal immunological misfiring in severe COVID-19. Nature. 2020;584(7821):463–469. doi: 10.1038/s41586-020-2588-y.
  29. Nagahata H, Taniguchi M. Analysis of variance for multivariate time series. Metron. 2018;76:69–82. doi: 10.1007/s40300-017-0122-2.
  30. Nakatani T. ccgarch: an R package for modelling multivariate GARCH models with conditional correlations; 2014.
  31. Rakocevic V. On continuity of the Moore–Penrose and Drazin inverses. Mater Vesn. 1997;49(3–4):163–172.
  32. Rao CR, Mitra SK. Generalized inverse of matrices and its applications. New York: Wiley; 1971.
  33. Rashid MM. Robust analysis of two-way models with repeated measures on both factors. TEST. 1995;4(1):39–62. doi: 10.1007/BF02563102.
  34. Robinson PM. Automatic frequency domain inference on semiparametric and nonparametric models. Econometrica. 1991;59(5):1329–1363. doi: 10.2307/2938370.
  35. Searle SR, Casella G, McCulloch CE. Variance Components. New York: Wiley; 1992.
  36. Stewart G. On the continuity of the generalized inverse. SIAM J Appl Math. 1969;17(1):33–45. doi: 10.1137/0117004.
  37. Tang CY, Leng C. Empirical likelihood and quantile regression in longitudinal data analysis. Biometrika. 2011;98(4):1001–1006. doi: 10.1093/biomet/asr050.
  38. von Sachs R. Nonparametric spectral analysis of multivariate time series. Annu Rev Stat Appl. 2020;7:361–386. doi: 10.1146/annurev-statistics-031219-041138.
  39. You J, Zhou X. Efficient estimation in panel data partially additive linear model with serially correlated errors. Stat Sin. 2013;23:271–303.
  40. Zeger SL, Liang KY, Self SG. The analysis of binary longitudinal data with time independent covariates. Biometrika. 1985;72(1):31–38.
  41. Zhong PS, Li R, Santo S. Homogeneity tests of covariance matrices with high-dimensional longitudinal data. Biometrika. 2019;106(3):619–634. doi: 10.1093/biomet/asz011.


Articles from Test (Madrid, Spain) are provided here courtesy of Nature Publishing Group
