Journal of Applied Statistics. 2021 Apr 14;49(10):2629–2656. doi: 10.1080/02664763.2021.1913106

A new multivariate t distribution with variant tail weights and its application in robust regression analysis

Chi Zhang a, Guo-Liang Tian b (corresponding author), Kam Chuen Yuen c, Pengyi Liu d, Man-Lai Tang e
PMCID: PMC9225396  PMID: 35757045

ABSTRACT

In this paper, we propose a new kind of multivariate t distribution by allowing different degrees of freedom for each univariate component. Compared with the classical multivariate t distribution, it is more flexible in the model specification that can be used to deal with the variant amounts of tail weights on marginals in multivariate data modeling. In particular, it could include components following the multivariate normal distribution, and it contains the product of independent t-distributions as a special case. Subsequently, it is extended to the regression model as the joint distribution of the error terms. Important distributional properties are explored and useful statistical methods are developed. The flexibility of the specified structure in better capturing the characteristic of data is exemplified by both simulation studies and real data analyses.

Keywords: Expectation/conditional maximization algorithm, multivariate t distribution, multivariate t regression model, multivariate truncated normal distribution, stochastic representation

1. Introduction

As a natural generalization of the univariate Student's t distribution, the multivariate t (MVT) distribution is a robust alternative to the multivariate normal distribution for analyzing multivariate continuous data with heavy tails or outliers. The MVT distribution was first derived independently in [4] and [6], the latter treating only the bivariate case. Characterizations of the MVT distribution were presented in [5,16], and an efficient computational procedure for the bivariate case was given in [1]. Maximum likelihood estimation for the MVT distribution with missing data was discussed in [17,19–21]. Regression models with MVT error terms have been widely investigated [8,18,29]. A comprehensive review of the topic was given in [14], the first monograph on the MVT distribution. Mathematical properties such as the stochastic representation (SR), consistency property, density expansion, moments and conditional distributions can also be found in [23], and the associated estimation methods in [24]. Recently, a sampling method for the MVT distribution, together with an R package, was provided in [11].

The most common way of constructing the MVT distribution is through the SR of a normal random vector and an independent chi-squared random variable. Assume that the random vector $\mathbf{z}=(Z_1,\ldots,Z_d)^\top\sim N_d(\mathbf{0},\Sigma)$, the random variable $V\sim\chi^2(\nu)$, and they are independent (denoted by $\mathbf{z}\perp\!\!\!\perp V$). Define

$$(X_1,\ldots,X_d)^\top=\mathbf{x}=\boldsymbol{\mu}+\frac{\mathbf{z}}{\sqrt{V/\nu}}, \tag{1}$$

then $\mathbf{x}$ is said to follow a multivariate t distribution, denoted by $\mathbf{x}\sim t_d(\boldsymbol{\mu},\Sigma,\nu)$, where $\boldsymbol{\mu}=(\mu_1,\ldots,\mu_d)^\top$ is the location parameter vector, $\Sigma=(\sigma_{ij})_{d\times d}$ is a positive-definite scale matrix and $\nu$ is the degrees of freedom. When $\nu=1$, the distribution reduces to the multivariate Cauchy distribution. As $\nu\to\infty$, the distribution approaches the multivariate normal distribution. Hence, the parameter $\nu$ may be viewed as a robustness tuning parameter (see [22], pp.5332). It can be fixed in advance or inferred from the observed data. An equivalent expression of (1) is

$$\mathbf{x}=\boldsymbol{\mu}+U^{-1/2}\mathbf{z}, \tag{2}$$

where $U\sim\text{Gamma}(\nu/2,\nu/2)$ (the density of $Y\sim\text{Gamma}(\alpha,\beta)$ is denoted by $\text{Gamma}(y|\alpha,\beta)=\beta^{\alpha}y^{\alpha-1}\exp(-\beta y)/\Gamma(\alpha)$ with $\alpha>0$ and $\beta>0$), $\mathbf{z}\sim N_d(\mathbf{0},\Sigma)$ and $U\perp\!\!\!\perp\mathbf{z}$. We call (1) or (2) the SR of the classical MVT or Type I MVT random vector.

From the SR (1) or (2), we have observed that the Type I MVT distribution has the following particular disadvantages:

  1. All components follow univariate t distributions with the same degrees of freedom ν and hence the same amount of tailweight (see [13], pp. 163).

  2. The Type I MVT random vector $\mathbf{x}$ with a finite $\nu$ includes neither a component $X_i$ following the univariate normal distribution nor a sub-vector $(X_{i_1},\ldots,X_{i_r})^\top$ following the r-dimensional normal distribution, where $1\leqslant i_1<\cdots<i_r\leqslant d$ and $1\leqslant r<d$.

  3. The Type I MVT random vector $\mathbf{x}$ can never contain statistically independent components since all components $\{X_i\}_{i=1}^d$ share a common random variable $V$ or $U$, even when $\Sigma$ is diagonal.

These drawbacks limit its range of application. To overcome the first drawback, a class of meta-elliptical distributions was proposed within the copula framework in [7]; it includes as a special member the so-called asymmetric multivariate t (AMVT) distribution, whose marginals are univariate t distributions with different degrees of freedom. The authors of [7] further pointed out that the AMVT distribution and the Type I MVT distribution have the same copula (i.e. the same correlation structure). Hence, the AMVT distribution still cannot overcome the second and third drawbacks. Moreover, they did not develop any statistical inference methods for this particular distribution. Later, a new bivariate t distribution with marginals having different degrees of freedom was proposed in [13] by defining

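$$X_1=\frac{Z_1}{\sqrt{V_1/\nu_1}}\quad\text{and}\quad X_2=\frac{Z_2}{\sqrt{(V_1+V_2)/\nu_2}}, \tag{3}$$

where $Z_1,Z_2,V_1,V_2$ are assumed to be mutually independent, $Z_1,Z_2\overset{\text{iid}}{\sim}N(0,1)$, $V_1\sim\chi^2(\nu_1)$ and $V_2\sim\chi^2(\nu_2-\nu_1)$ with $\nu_2\geqslant\nu_1$. Two possible multivariate extensions of (3) were also mentioned in [13]:

$$X_1=\frac{Z_1}{\sqrt{V_1/\nu_1}},\quad X_2=\frac{Z_2}{\sqrt{(V_1+V_2)/\nu_2}},\quad\ldots,\quad X_d=\frac{Z_d}{\sqrt{(V_1+\cdots+V_d)/\nu_d}}, \tag{4}$$

where $Z_1,\ldots,Z_d,V_1,\ldots,V_d$ are assumed to be mutually independent, $Z_1,\ldots,Z_d\overset{\text{iid}}{\sim}N(0,1)$, $V_1\sim\chi^2(\nu_1)$ and $V_i\sim\chi^2(\nu_i-\nu_{i-1})$ for $i=2,\ldots,d$ with $0<\nu_1\leqslant\nu_2\leqslant\cdots\leqslant\nu_d$; and

$$X_1=\frac{Z_1}{\sqrt{V_1/\nu_1}},\quad X_2=\frac{Z_2}{\sqrt{(V_1+W_2)/\nu_2}},\quad\ldots,\quad X_d=\frac{Z_d}{\sqrt{(V_1+W_d)/\nu_d}}, \tag{5}$$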

where the $\{Z_i\}_{i=1}^d$ and $V_1$ are the same as defined above and $W_i\sim\chi^2(\nu_i-\nu_1)$ with $0<\nu_1\leqslant\nu_i$ for $i=2,\ldots,d$. We have three comments on (3)–(5). First, all components in (3)–(5) follow univariate t distributions with possibly different degrees of freedom, i.e. the first drawback of the Type I MVT distribution is overcome; however, the second and third drawbacks are not. Second, [13] studied only the distributional properties of the bivariate case and did not provide any statistical inference methods. Third, for the multivariate cases, it remarked that ‘It appears, however, that only in the bivariate case are the joint density function and hence conditional distributions fully tractable. It is this that prompted the publication of the current special case, along with any independent interest, this special case may possess’, see [13]. An alternative t-distribution that replaces the common gamma divisor in (2) with p i.i.d. gamma divisors was then proposed in [9]; the authors only mentioned the possibility of allowing these independent gamma divisors to have different degrees of freedom, without any detailed discussion. By inserting multidimensional weights into the Gaussian scale mixture, a further generalized multivariate t-distribution was obtained in [10]. However, the resulting marginal is a linear combination of univariate t-distributions and thus has no intuitive interpretation in terms of marginal degrees of freedom. In the two data sets presented in Section 6 below, there is a dependency structure among the components, while the components do not share the same tail weight. The classical MVT distribution is no longer appropriate for such data, so a new tool is needed to overcome the limitations of existing models and to allow broader application.

In this paper, we propose a new kind of MVT distribution that allows different degrees of freedom for each univariate component, called the Type II MVT distribution. The proposed distribution has several notable features: (a) all components follow univariate t-distributions with possibly different degrees of freedom; (b) it can include components following a multivariate normal distribution when the corresponding $\nu_i\to\infty$; (c) it can contain statistically independent components, so that the product of independent t distributions is a special case. Compared with the classical MVT distribution, this new structure can better capture the characteristics of the data. We first derive important distributional properties and then develop useful statistical inference methods.

The rest of the paper is organized as follows. In Section 2, the Type II  MVT distribution is outlined and some distributional properties are explored. In Section 3, the maximum likelihood estimation of the parameters via the Monte Carlo expectation/conditional maximization (ECM) algorithm and test of independence are developed, and then an extended regression model is introduced and investigated. Bayesian methods are presented in Section 4. In Section 5, some simulation studies are performed to evaluate the proposed methods. Two real data sets are used to compare the proposed distribution with the classical MVT distribution in Section 6. Finally, a discussion is given in Section 7.

2. Type II multivariate t distribution

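We define a new MVT distribution through the SR instead of the joint probability density function (pdf). A random vector $\mathbf{x}=(X_1,\ldots,X_d)^\top$ is said to follow the Type II MVT distribution, denoted by $\mathbf{x}\sim t_d^{(\mathrm{II})}(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$, if $\mathbf{x}$ can be stochastically represented as

$$\mathbf{x}=\boldsymbol{\mu}+\mathbf{U}^{-1/2}\mathbf{z}, \tag{6}$$

where $\boldsymbol{\mu}=(\mu_1,\ldots,\mu_d)^\top$, $\mathbf{U}^{-1/2}=\text{diag}(\mathbf{u}^{-1/2})$, $\mathbf{u}=(U_1,\ldots,U_d)^\top$, $U_i\overset{\text{ind}}{\sim}\text{Gamma}(\nu_i/2,\nu_i/2)$ for $i=1,\ldots,d$, $\mathbf{z}=(Z_1,\ldots,Z_d)^\top\sim N_d(\mathbf{0},\Sigma)$, $\boldsymbol{\nu}=(\nu_1,\ldots,\nu_d)^\top$, and $\mathbf{u}\perp\!\!\!\perp\mathbf{z}$. The vector $\boldsymbol{\nu}$ can be fixed in advance or estimated from the observed data. Note that if $\nu_1=\cdots=\nu_d=\nu$, the distribution of $\mathbf{x}$ in (6) becomes the alternative multivariate t distribution proposed by [9].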
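The SR (6) translates directly into a sampler. The following is a minimal sketch for the bivariate case using only the Python standard library; the function name `sample_type2_mvt_2d` and its arguments are illustrative and not part of the paper. Note that `random.gammavariate` is parameterized by shape and scale, so the rate $\nu_i/2$ used in the paper becomes the scale $2/\nu_i$.

```python
import math
import random

def sample_type2_mvt_2d(mu, sigma, nu, n, seed=0):
    """Draw n points from the bivariate Type II MVT via x = mu + U^{-1/2} z.

    mu: (mu1, mu2); sigma: ((s11, s12), (s12, s22)), positive definite;
    nu: (nu1, nu2), one degrees-of-freedom value per component.
    """
    rng = random.Random(seed)
    s11, s12, s22 = sigma[0][0], sigma[0][1], sigma[1][1]
    # Cholesky factor of the 2x2 scale matrix: Sigma = L L^T.
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    draws = []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = (l11 * e1, l21 * e1 + l22 * e2)  # z ~ N_2(0, Sigma)
        # Independent U_i ~ Gamma(shape nu_i/2, rate nu_i/2), i.e. scale 2/nu_i.
        u = tuple(rng.gammavariate(v / 2.0, 2.0 / v) for v in nu)
        draws.append(tuple(m + zi / math.sqrt(ui) for m, zi, ui in zip(mu, z, u)))
    return draws
```

Replacing the two independent gamma draws by a single shared draw would give the Type I construction (2) instead, which is exactly the difference between the two families.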

By comparing with the Type I MVT distribution, the Type II  MVT distribution possesses the following three significant characteristics:

  1. From (6), it is easy to obtain $X_i=\mu_i+U_i^{-1/2}Z_i$, indicating that all components follow univariate t-distributions with possibly different degrees of freedom; i.e. $X_i\sim t(\mu_i,\sigma_{ii},\nu_i)$ for $i=1,\ldots,d$.

  2. The Type II MVT random vector $\mathbf{x}$ could include components following a multivariate normal distribution if the corresponding $\nu_i\to\infty$.

  3. The Type II MVT random vector $\mathbf{x}$ could contain statistically independent components. From (7) and (8) below, in the Type II MVT distribution any two components $X_i,X_j$ ($i\neq j$) are independent provided that $Z_i,Z_j$ are independent, whereas in the Type I MVT distribution any two components are always dependent since they share a common factor $U$. In particular, when $\Sigma$ is diagonal, the Type II MVT density becomes the product of d independent univariate t densities.

Remark 2.1

The dependency structure of the Type II MVT distribution induced by the SR (6) is completely determined by the variance–covariance matrix $\Sigma$ of $\mathbf{z}$. Since the Gamma random variables $\{U_i\}_{i=1}^d$ are independent, all $\{\nu_i\}_{i=1}^d$ are free parameters, and the estimation of each $\nu_i$ can be carried out separately and is straightforward. In contrast, the dependency structure of the two distributions induced by the SRs (4) and (5) is due only to the common random variable $V_1$, since no correlation from a multivariate normal distribution is incorporated. Under the construction (4) or (5), the estimation of the $\nu_i$'s is also more complicated, since an order constraint $\nu_1\leqslant\nu_2\leqslant\cdots\leqslant\nu_d$, or $\nu_1\leqslant\nu_i$ for $i=2,\ldots,d$, is involved.

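2.1. Density function of Type II MVT distribution

To derive the joint pdf of the Type II MVT distribution, we first introduce the multivariate truncated normal distribution. A d-dimensional random vector $\mathbf{w}=(W_1,\ldots,W_d)^\top$ is said to follow the multivariate normal distribution truncated to the box $[\boldsymbol{a},\boldsymbol{b}]$, denoted by $\mathbf{w}\sim TN_d(\boldsymbol{\mu},\Sigma;[\boldsymbol{a},\boldsymbol{b}])$ with $\boldsymbol{a}=(a_1,\ldots,a_d)^\top\in\mathbb{R}^d$ and $\boldsymbol{b}=(b_1,\ldots,b_d)^\top\in\mathbb{R}^d$, if its joint pdf is (see [12], pp. 210)

$$TN_d(\boldsymbol{w}|\boldsymbol{\mu},\Sigma;[\boldsymbol{a},\boldsymbol{b}])=\frac{\exp[-0.5(\boldsymbol{w}-\boldsymbol{\mu})^\top\Sigma^{-1}(\boldsymbol{w}-\boldsymbol{\mu})]}{\int_{[\boldsymbol{a},\boldsymbol{b}]}\exp[-0.5(\boldsymbol{w}-\boldsymbol{\mu})^\top\Sigma^{-1}(\boldsymbol{w}-\boldsymbol{\mu})]\,d\boldsymbol{w}},\quad\boldsymbol{a}\leqslant\boldsymbol{w}\leqslant\boldsymbol{b},$$

where $\boldsymbol{w}=(w_1,\ldots,w_d)^\top$ is the realization of $\mathbf{w}=(W_1,\ldots,W_d)^\top$.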

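Now, we can rewrite (6) as a mixture of

$$\mathbf{x}|(\mathbf{u}=\boldsymbol{u})\sim N_d(\boldsymbol{\mu},U^{-1/2}\Sigma U^{-1/2})\quad\text{and}\quad\{U_i\}_{i=1}^d\overset{\text{ind}}{\sim}\text{Gamma}(\nu_i/2,\nu_i/2), \tag{7}$$

where $\boldsymbol{u}=(u_1,\ldots,u_d)^\top$ is the realization of $\mathbf{u}=(U_1,\ldots,U_d)^\top$ and $U^{-1/2}=\text{diag}(\boldsymbol{u}^{-1/2})$ is the realization of $\mathbf{U}^{-1/2}=\text{diag}(\mathbf{u}^{-1/2})$. The joint pdf of $\mathbf{x}$ is obtained as (for detailed derivation, see Appendix):

$$f_{\mathbf{x}}(\boldsymbol{x})=\left(\frac{\pi}{2}\right)^{-\frac{d}{2}}|\Sigma|^{-\frac{1}{2}}\left[\prod_{i=1}^d\frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}\right]\mu_{\boldsymbol{\nu}}(\boldsymbol{w})\,C_*, \tag{8}$$

where

$$\mu_{\boldsymbol{\nu}}(\boldsymbol{w})=\int_{\mathbb{R}_+^d}\prod_{i=1}^dw_i^{\nu_i}\,C_*^{-1}\exp\left(-0.5\,\boldsymbol{w}^\top\Sigma_*^{-1}\boldsymbol{w}\right)d\boldsymbol{w},\quad C_*=\int_{\mathbb{R}_+^d}\exp\left(-0.5\,\boldsymbol{w}^\top\Sigma_*^{-1}\boldsymbol{w}\right)d\boldsymbol{w}, \tag{9}$$

and $\Sigma_*=(X\Sigma^{-1}X+\Sigma_0)^{-1}$ with

$$X=\text{diag}(\boldsymbol{x}-\boldsymbol{\mu})\quad\text{and}\quad\Sigma_0=\text{diag}(\boldsymbol{\nu}). \tag{10}$$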
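The Appendix derivation is not reproduced in this section. As a reading aid, the following sketch shows one way the pieces of (8)–(10) arise from the mixture (7); it is our reconstruction rather than the authors' text, and it uses the substitution $w_i=u_i^{1/2}$ (so $du_i=2w_i\,dw_i$) together with the identity $\text{diag}(\boldsymbol{w})(\boldsymbol{x}-\boldsymbol{\mu})=X\boldsymbol{w}$.

```latex
\begin{aligned}
f_{\mathbf{x}}(\boldsymbol{x})
&=\int_{\mathbb{R}_+^d} N_d\!\left(\boldsymbol{x}\,\middle|\,\boldsymbol{\mu},\,U^{-1/2}\Sigma U^{-1/2}\right)
  \prod_{i=1}^d \mathrm{Gamma}\!\left(u_i\,\middle|\,\tfrac{\nu_i}{2},\tfrac{\nu_i}{2}\right) d\boldsymbol{u}\\
&=(2\pi)^{-\frac d2}|\Sigma|^{-\frac12}\prod_{i=1}^d\frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}
  \int_{\mathbb{R}_+^d}\prod_{i=1}^d u_i^{\frac{\nu_i-1}{2}}
  \exp\!\Big\{-\tfrac12(\boldsymbol{x}-\boldsymbol{\mu})^\top U^{1/2}\Sigma^{-1}U^{1/2}(\boldsymbol{x}-\boldsymbol{\mu})
  -\tfrac12\sum_{i=1}^d\nu_i u_i\Big\}\,d\boldsymbol{u}\\
&=(2\pi)^{-\frac d2}\,2^{d}\,|\Sigma|^{-\frac12}\prod_{i=1}^d\frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}
  \int_{\mathbb{R}_+^d}\prod_{i=1}^d w_i^{\nu_i}
  \exp\!\Big\{-\tfrac12\,\boldsymbol{w}^\top\big(X\Sigma^{-1}X+\Sigma_0\big)\boldsymbol{w}\Big\}\,d\boldsymbol{w}\\
&=\Big(\frac{\pi}{2}\Big)^{-\frac d2}|\Sigma|^{-\frac12}\prod_{i=1}^d\frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}\;
  \mu_{\boldsymbol{\nu}}(\boldsymbol{w})\,C_* .
\end{aligned}
```

The last step only multiplies and divides the integrand by $C_*$, which turns the integral into the mixed moment $\mu_{\boldsymbol{\nu}}(\boldsymbol{w})$ of a $TN_d(\mathbf{0},\Sigma_*;\mathbb{R}_+^d)$ vector.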

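To gain insight into the function $\mu_{\boldsymbol{\nu}}(\boldsymbol{w})$, we first define the mixed moments of a random vector. Let a random vector $\mathbf{x}=(X_1,\ldots,X_d)^\top$ have the joint pdf $f_{\mathbf{x}}(\boldsymbol{x})$. Given $\boldsymbol{r}=(r_1,\ldots,r_d)^\top$, where $r_1,\ldots,r_d$ are non-negative constants, then

$$E(X_1^{r_1}\cdots X_d^{r_d})=\int x_1^{r_1}\cdots x_d^{r_d}\,f_{\mathbf{x}}(\boldsymbol{x})\,d\boldsymbol{x}$$

is called the mixed moment of $\mathbf{x}$ in power of $\boldsymbol{r}$ with respect to $\mathbf{x}\sim f_{\mathbf{x}}(\boldsymbol{x})$. Note that $C_*$ is the normalizing constant of $TN_d(\mathbf{0},\Sigma_*;\mathbb{R}_+^d)$, so $\mu_{\boldsymbol{\nu}}(\boldsymbol{w})$ given by (9) is the mixed moment of $\mathbf{w}$ in power of $\boldsymbol{\nu}$ with respect to $\mathbf{w}\sim TN_d(\mathbf{0},\Sigma_*;\mathbb{R}_+^d)$. It is clear from (8) that this distribution is not of the elliptical form (see [7]).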

Let $\mathbf{x}=(\mathbf{x}_1^\top,\mathbf{x}_2^\top)^\top$, where $\mathbf{x}_1$ consists of the first q components and $\mathbf{x}_2$ consists of the last $(d-q)$ components. For the conditional distribution of $\mathbf{x}_1$ given $\mathbf{x}_2=\boldsymbol{x}_2$, deriving $f_{\mathbf{x}_1|\mathbf{x}_2}(\boldsymbol{x}_1|\boldsymbol{x}_2)=f_{\mathbf{x}}(\boldsymbol{x})/f_{\mathbf{x}_2}(\boldsymbol{x}_2)$ in closed form is not easy, since the marginal density of $\mathbf{x}_2$ depends on a multiple integral over hypercubes. For visualization, we present the conditional density curves for the bivariate case. The parameters are set as

  1. Case 1: $\boldsymbol{\mu}=(0,0)^\top$, $\Sigma=[1,0.8;0.8,1]$, $\boldsymbol{\nu}=(3,5)^\top$. The curves of $f(x_1|x_2)$ are plotted for $x_2=-2,-1,0,1,2$, respectively.

  2. Case 2: $\boldsymbol{\mu}=(1,3)^\top$, $\Sigma=[1,0.5;0.5,1]$, $\boldsymbol{\nu}=(3,2)^\top$. The curves of $f(x_1|x_2)$ are plotted for $x_2=-1,1,3,5,7$, respectively.

From Figure 1, under the same parameter configuration the curves of $f(x_1|x_2)$ keep a similar shape as the value of $x_2$ varies.

Figure 1.

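The conditional distribution curves for $f(x_1|x_2)$ in the Type II MVT distribution. (a) $\boldsymbol{\mu}=(0,0)^\top$, $\Sigma=[1,0.8;0.8,1]$, $\boldsymbol{\nu}=(3,5)^\top$; (b) $\boldsymbol{\mu}=(1,3)^\top$, $\Sigma=[1,0.5;0.5,1]$, $\boldsymbol{\nu}=(3,2)^\top$.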

2.2. Moments and correlation

The mean vector and variance–covariance matrix of x are, respectively, given by

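$$E(\mathbf{x})=\boldsymbol{\mu},\ \text{for }\nu_i>1,\ i=1,\ldots,d\quad\text{and}\quad\text{Var}(\mathbf{x})=\widetilde{\Sigma}=(\tilde{\sigma}_{ij}),$$

where

$$\tilde{\sigma}_{ii}=\frac{\nu_i}{\nu_i-2}\,\sigma_{ii},\qquad\tilde{\sigma}_{ij}=\frac{(\nu_i/2)^{\frac12}\,\Gamma\!\left(\frac{\nu_i-1}{2}\right)(\nu_j/2)^{\frac12}\,\Gamma\!\left(\frac{\nu_j-1}{2}\right)}{\Gamma\!\left(\frac{\nu_i}{2}\right)\Gamma\!\left(\frac{\nu_j}{2}\right)}\,\sigma_{ij},\quad i\neq j,$$

for $\nu_i,\nu_j>2$. Thus, the correlation coefficient between $X_i$ and $X_j$ is given by

$$\text{Corr}(X_i,X_j)=\frac{\sqrt{(\nu_i-2)(\nu_j-2)}\;\Gamma\!\left(\frac{\nu_i-1}{2}\right)\Gamma\!\left(\frac{\nu_j-1}{2}\right)\sigma_{ij}}{2\,\Gamma\!\left(\frac{\nu_i}{2}\right)\Gamma\!\left(\frac{\nu_j}{2}\right)\sqrt{\sigma_{ii}\sigma_{jj}}},\quad\nu_i,\nu_j>2. \tag{11}$$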

From (11), we can see that the sign of Corr(Xi,Xj) is determined by σij and can be arbitrary (positive, zero or negative).
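Formula (11) is easy to evaluate numerically. The sketch below (the helper name `type2_corr` is ours) computes it on the log scale with `math.lgamma`; for the configurations $\Sigma=[2,2;2,3]$ with $(\nu_1,\nu_2)=(10,10)$ and $(4,20)$ it gives values that agree with the ρ entries 0.767150 and 0.713626 reported in Tables 1 and 2.

```python
import math

def type2_corr(nu_i, nu_j, s_ii, s_jj, s_ij):
    """Correlation between X_i and X_j under the Type II MVT, following (11)."""
    if min(nu_i, nu_j) <= 2:
        raise ValueError("(11) requires nu_i, nu_j > 2")
    # Gamma-function ratio on the log scale, so large degrees of freedom do not overflow.
    log_ratio = (math.lgamma((nu_i - 1) / 2) + math.lgamma((nu_j - 1) / 2)
                 - math.lgamma(nu_i / 2) - math.lgamma(nu_j / 2))
    shrink = 0.5 * math.sqrt((nu_i - 2) * (nu_j - 2)) * math.exp(log_ratio)
    return shrink * s_ij / math.sqrt(s_ii * s_jj)

print(type2_corr(10, 10, 2.0, 3.0, 2.0))  # compare with rho for f(II) in Table 1
print(type2_corr(4, 20, 2.0, 3.0, 2.0))   # compare with rho in Table 2
```

The factor multiplying $\sigma_{ij}/\sqrt{\sigma_{ii}\sigma_{jj}}$ is below 1 for finite degrees of freedom, which is consistent with the weaker correlation of the Type II construction noted in Section 2.3.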

2.3. Comparison of the densities

We compare the pdfs of the Type I MVT distribution $t_d(\boldsymbol{\mu},\Sigma,\nu)$ and the Type II MVT distribution $t_d^{(\mathrm{II})}(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$. For the same $\boldsymbol{\mu}$ and $\Sigma$, if one sets $\boldsymbol{\nu}=\nu\mathbf{1}_d$ where $\mathbf{1}_d=(1,\ldots,1)^\top$, from (2) and (6) we can see that the marginal densities of each $X_i$ under the two distributions are identical, namely $t(\mu_i,\sigma_{ii},\nu)$. However, the shapes of the two joint density functions may be quite different. Moreover, the Type II MVT distribution is more flexible since it also covers the case $\boldsymbol{\nu}\neq\nu\mathbf{1}_d$.

For more comparisons, we consider d = 2 and set the parameter configurations as $\boldsymbol{\mu}=(\mu_1,\mu_2)^\top$ and $\Sigma=[\sigma_{11},\sigma_{12};\sigma_{21},\sigma_{22}]$ for both distributions, with $\nu$ being the degrees of freedom in Type I MVT, $\boldsymbol{\nu}=(\nu_1,\nu_2)^\top$ being the degrees of freedom in Type II MVT, and $\nu=\nu_1=\nu_2$. To illustrate the differences, we take 30 equally spaced values of $X_1$ within $[\mu_1-3\sqrt{\sigma_{11}},\mu_1+3\sqrt{\sigma_{11}}]$ and 30 equally spaced values of $X_2$ within $[\mu_2-3\sqrt{\sigma_{22}},\mu_2+3\sqrt{\sigma_{22}}]$. The maximum and minimum of the Type I and Type II MVT densities over these $(X_1,X_2)$ points are computed and displayed in Table 1. The correlation coefficients between the two components, denoted by ρ, are also compared for the two distributions. In addition, the shapes of the corresponding densities are shown as contour plots and three-dimensional perspective plots in Figures 2 and 3. It is observed that when the two densities have the same marginal tail weights, the Type II density surface is a bit flatter than the Type I surface, and the correlation coefficient between components in Type II MVT is also weaker than that in Type I MVT. This is not surprising, as the dependence in Type II MVT comes only from the dependence in the multivariate normal vector.

Table 1.

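Maximum and minimum values of Type I and II MVT densities for $\nu=\nu_1=\nu_2$.

| | $\boldsymbol{\mu}=(0,0)^\top$, $\Sigma=\text{diag}(\mathbf{1}_2)$ | | $\boldsymbol{\mu}=(1,1)^\top$, $\Sigma=[2,2;2,3]$ | |
|---|---|---|---|---|
| | $\nu=3$ | $(\nu_1,\nu_2)=(3,3)$ | $\nu=10$ | $(\nu_1,\nu_2)=(10,10)$ |
| | $f^{(\mathrm{I})}$ | $f^{(\mathrm{II})}$ | $f^{(\mathrm{I})}$ | $f^{(\mathrm{II})}$ |
| Maximum | 0.156351 | 0.133184 | 0.111747 | 0.106234 |
| Minimum | 0.001228 | 0.000528 | $7.0562\times10^{-8}$ | $4.4166\times10^{-10}$ |
| ρ | 0 | 0 | 0.816497 | 0.767150 |

Note: $f^{(\mathrm{I})}$: Density function of Type I MVT distribution; $f^{(\mathrm{II})}$: Density function of Type II MVT distribution, c.f. (8).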
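The first two columns of Table 1 can be checked without evaluating the integral in (9), because for diagonal $\Sigma$ the Type II density is a product of univariate t densities. The sketch below assumes that the 30 grid values are equally spaced and include both endpoints of $[-3,3]$; this is our reading of the text, and the helper names are illustrative. Under that assumption the printed extremes agree with the tabulated values to the digits shown.

```python
import math

def t_pdf(x, nu):
    """Standard univariate Student t density with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def type1_pdf_identity(x1, x2, nu):
    """Bivariate Type I MVT density with mu = 0 and Sigma = I_2."""
    return (1 + (x1 * x1 + x2 * x2) / nu) ** (-(nu + 2) / 2) / (2 * math.pi)

def type2_pdf_identity(x1, x2, nu1, nu2):
    """Type II density for Sigma = I_2: a product of two univariate t densities."""
    return t_pdf(x1, nu1) * t_pdf(x2, nu2)

# 30 equally spaced points on [mu_i - 3*sqrt(sigma_ii), mu_i + 3*sqrt(sigma_ii)] = [-3, 3].
grid = [-3 + 6 * k / 29 for k in range(30)]
f1 = [type1_pdf_identity(a, b, 3) for a in grid for b in grid]
f2 = [type2_pdf_identity(a, b, 3, 3) for a in grid for b in grid]
print(max(f1), min(f1))  # compare with the f(I) column for nu = 3
print(max(f2), min(f2))  # compare with the f(II) column for (nu1, nu2) = (3, 3)
```

Because the grid has an even number of points, it does not contain the mode itself, which is why the tabulated maxima are slightly below the densities at $\boldsymbol{\mu}$.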

Figure 2.

The contour plots and 3-D perspectives of Type I and Type II MVT density curves given $\boldsymbol{\mu}=(0,0)^\top$, $\Sigma=\text{diag}(\mathbf{1}_2)$. (a1)–(a2) Type I MVT with $\nu=3$; (b1)–(b2) Type II MVT with $\nu_1=\nu_2=3$.

Figure 3.

The contour plots and 3-D perspectives of Type I and Type II MVT density curves given $\boldsymbol{\mu}=(1,1)^\top$, $\Sigma=[2,2;2,3]$. (a1)–(a2) Type I MVT with $\nu=10$; (b1)–(b2) Type II MVT with $\nu_1=\nu_2=10$.

Moreover, the $\nu_i$'s may differ in Type II MVT, so we choose several combinations of parameters with $\nu_1\neq\nu_2$. Similarly, the maximum and minimum of the densities are presented in Table 2 and their various shapes are displayed in Figure 4.

Table 2.

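Maximum and minimum values of Type II MVT density function for $\nu_1\neq\nu_2$.

| | $\boldsymbol{\mu}=(0,0)^\top$, $\Sigma=\text{diag}(\mathbf{1}_2)$ | $\boldsymbol{\mu}=(1,1)^\top$, $\Sigma=[2,2;2,3]$ |
|---|---|---|
| | $(\nu_1,\nu_2)=(3,5)$ | $(\nu_1,\nu_2)=(4,20)$ |
| | $f^{(\mathrm{II})}$ | $f^{(\mathrm{II})}$ |
| Maximum | 0.137650 | 0.103573 |
| Minimum | 0.000397 | $1.2944\times10^{-9}$ |
| ρ | 0 | 0.713626 |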

Figure 4.

The contour plots and 3-D perspectives of Type II MVT density curves. (a1)–(a2) $\boldsymbol{\mu}=(0,0)^\top$, $\Sigma=\text{diag}(\mathbf{1}_2)$, $(\nu_1,\nu_2)=(3,5)$; (b1)–(b2) $\boldsymbol{\mu}=(1,1)^\top$, $\Sigma=[2,2;2,3]$, $(\nu_1,\nu_2)=(4,20)$.

3. Estimation of parameters and test of independence

3.1. MLEs of (μ,Σ,ν) via the Monte Carlo ECM algorithm

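In practice, the value of $\boldsymbol{\nu}$ is usually unknown in advance, so we need to estimate $\boldsymbol{\nu}$ together with $(\boldsymbol{\mu},\Sigma)$. Let $\mathbf{x}_1,\ldots,\mathbf{x}_n\overset{\text{iid}}{\sim}t_d^{(\mathrm{II})}(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$ and the observed data be $Y_{\text{obs}}=\{\boldsymbol{x}_j\}_{j=1}^n$, where $\boldsymbol{x}_j=(x_{1j},\ldots,x_{dj})^\top$ denotes the realization of $\mathbf{x}_j=(X_{1j},\ldots,X_{dj})^\top$ for $j=1,\ldots,n$. The mixture expression (7) of the Type II MVT random vector motivates us to employ the ECM algorithm to obtain the MLEs of the parameters. For each $\mathbf{x}_j$ ($j=1,\ldots,n$), based on (7), we introduce the corresponding latent vector $\mathbf{u}_j=(U_{1j},\ldots,U_{dj})^\top$, where $U_{ij}\overset{\text{ind}}{\sim}\text{Gamma}(\nu_i/2,\nu_i/2)$ for $i=1,\ldots,d$. The missing data are denoted by $Y_{\text{mis}}=\{\boldsymbol{u}_j\}_{j=1}^n$, where $\boldsymbol{u}_j=(u_{1j},\ldots,u_{dj})^\top$ is the realization of $\mathbf{u}_j$, and the complete data are $Y_{\text{com}}=\{Y_{\text{obs}},Y_{\text{mis}}\}=\{\boldsymbol{x}_j,\boldsymbol{u}_j\}_{j=1}^n$. The complete-data likelihood function is given by

$$\begin{aligned}L(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu}|Y_{\text{com}})&\propto\left(\prod_{j=1}^n\prod_{i=1}^du_{ij}^{\frac12}\right)|\Sigma|^{-\frac n2}\exp\left\{-\frac12\sum_{j=1}^n(\boldsymbol{x}_j-\boldsymbol{\mu})^\top\big(U_j^{\frac12}\Sigma^{-1}U_j^{\frac12}\big)(\boldsymbol{x}_j-\boldsymbol{\mu})\right\}\times\prod_{j=1}^n\prod_{i=1}^d\frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}u_{ij}^{\frac{\nu_i}{2}-1}\exp\left(-\frac{\nu_i}{2}u_{ij}\right)\\&=|\Sigma|^{-\frac n2}\exp\left\{-\frac12\sum_{j=1}^n(\boldsymbol{x}_j-\boldsymbol{\mu})^\top\big(U_j^{\frac12}\Sigma^{-1}U_j^{\frac12}\big)(\boldsymbol{x}_j-\boldsymbol{\mu})\right\}\times\prod_{j=1}^n\prod_{i=1}^d\frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}u_{ij}^{\frac{\nu_i-1}{2}}\exp\left(-\frac{\nu_i}{2}u_{ij}\right),\end{aligned}$$

where $U_j^{1/2}=\text{diag}(\boldsymbol{u}_j^{1/2})$. Let $\mathbf{W}_j=\text{diag}(\mathbf{w}_j)=\text{diag}(\mathbf{u}_j^{1/2})=\mathbf{U}_j^{1/2}$, and let $W_j=\text{diag}(\boldsymbol{w}_j)=U_j^{1/2}$ be its realization, with $\mathbf{w}_j=(W_{1j},\ldots,W_{dj})^\top$ and $\boldsymbol{w}_j=(w_{1j},\ldots,w_{dj})^\top$. Then, up to an additive constant, the log-likelihood function becomes

$$\ell(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu}|Y_{\text{com}})=\text{const}-\frac n2\log|\Sigma|-\frac12\text{tr}\left\{\Sigma^{-1}\sum_{j=1}^n[W_j(\boldsymbol{x}_j-\boldsymbol{\mu})][W_j(\boldsymbol{x}_j-\boldsymbol{\mu})]^\top\right\}+\sum_{i=1}^d\left\{\frac{n\nu_i}{2}\log\frac{\nu_i}{2}-n\log\Gamma\!\left(\frac{\nu_i}{2}\right)+(\nu_i-1)\sum_{j=1}^n\log(w_{ij})-\frac{\nu_i}{2}\sum_{j=1}^nw_{ij}^2\right\}.$$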

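The CM-step is to calculate the complete-data conditional MLEs of $(\boldsymbol{\mu},\Sigma)$ as given by

$$\hat{\boldsymbol{\mu}}=\left(\sum_{j=1}^nW_j\Sigma^{-1}W_j\right)^{-1}\sum_{j=1}^nW_j\Sigma^{-1}W_j\boldsymbol{x}_j,\qquad\hat{\Sigma}=\frac1n\sum_{j=1}^nW_j(\boldsymbol{x}_j-\boldsymbol{\mu})(\boldsymbol{x}_j-\boldsymbol{\mu})^\top W_j. \tag{12}$$

And the complete-data MLE of $\nu_i$ is the solution to the equation

$$\log\frac{\nu_i}{2}+1-\psi\!\left(\frac{\nu_i}{2}\right)+\frac1n\sum_{j=1}^n\left[2\log(w_{ij})-w_{ij}^2\right]=0,\quad i=1,\ldots,d, \tag{13}$$

where $\psi(\cdot)$ is the digamma function. The E-step is to replace the terms involving the $W_j$'s in (12)–(13), i.e. $w_{ij}w_{kj}$ and $\log(w_{ij})$ for $i,k=1,\ldots,d$ and $j=1,\ldots,n$, by their conditional expectations. To this end, we first need to derive the conditional distribution of $\mathbf{w}_j|\mathbf{x}_j$, which is given by

$$f_{\mathbf{w}_j|\mathbf{x}_j}(\boldsymbol{w}_j|\boldsymbol{x}_j)\propto w_{1j}^{\nu_1}\cdots w_{dj}^{\nu_d}\exp\left(-\frac12\boldsymbol{w}_j^\top\Sigma_j^{-1}\boldsymbol{w}_j\right),\quad\boldsymbol{w}_j\geqslant\mathbf{0}, \tag{14}$$

where $\Sigma_j=(X_j\Sigma^{-1}X_j+\Sigma_0)^{-1}$, $X_j=\text{diag}(\boldsymbol{x}_j-\boldsymbol{\mu})$ and $\Sigma_0$ is defined by (10). Since the normalizing constant in (14) is not available in closed form, we will perform the Monte Carlo implementation in the E-step (see [2,15,28]).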
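Equation (13) has no closed-form solution, so each $\nu_i$ must be found numerically. The paper does not say which root finder is used; the sketch below uses plain bisection, which is safe here because $\log a-\psi(a)$ is decreasing in $a$. The standard library has no digamma function, so a small one is included (upward recurrence plus the usual asymptotic series). The names `digamma` and `solve_nu`, and the search bounds, are our own choices.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    acc = 0.0
    while x < 6.0:  # shift upwards using psi(x) = psi(x + 1) - 1/x
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def solve_nu(s, lo=1e-3, hi=1e4, tol=1e-10):
    """Solve log(nu/2) + 1 - psi(nu/2) + s = 0 for nu by bisection.

    s stands for (1/n) * sum_j [2*log(w_ij) - w_ij**2], with the w-terms
    replaced by their conditional expectations from the E-step.
    """
    def g(nu):
        return math.log(nu / 2) + 1 - digamma(nu / 2) + s
    if g(hi) > 0:  # no root below hi: treat the margin as effectively normal
        return math.inf
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:  # g is decreasing in nu, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check, if $s$ is set to its population value $\psi(\nu_0/2)-\log(\nu_0/2)-1$ for some $\nu_0$, the solver returns $\nu_0$.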

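Given (14), the Monte Carlo method for calculating the conditional expectations is summarized as follows.

  • Step 1:
    Generate $\boldsymbol{w}_j^{(1)},\ldots,\boldsymbol{w}_j^{(G)}\overset{\text{iid}}{\sim}TN_d(\mathbf{0},\Sigma_j;\mathbb{R}_+^d)$, and approximate the conditional pdf (14) by
    $$f_{\mathbf{w}_j|\mathbf{x}_j}(\boldsymbol{w}_j|\boldsymbol{x}_j)\approx\frac{1}{c_j}\,w_{1j}^{\nu_1}\cdots w_{dj}^{\nu_d}\,TN_d\big(\boldsymbol{w}_j|\mathbf{0},\Sigma_j;\mathbb{R}_+^d\big),$$
    where
    $$c_j=\frac1G\sum_{g=1}^G\big(w_{1j}^{(g)}\big)^{\nu_1}\cdots\big(w_{dj}^{(g)}\big)^{\nu_d}. \tag{15}$$
  • Step 2:
    Approximate the conditional expectations by
    $$E(W_{ij}W_{kj}|\boldsymbol{x}_j,\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})\approx\frac{1}{Gc_j}\sum_{g=1}^Gw_{ij}^{(g)}w_{kj}^{(g)}\prod_{l=1}^d\big(w_{lj}^{(g)}\big)^{\nu_l}, \tag{16}$$
    $$E(\log W_{ij}|\boldsymbol{x}_j,\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})\approx\frac{1}{Gc_j}\sum_{g=1}^G\log w_{ij}^{(g)}\prod_{l=1}^d\big(w_{lj}^{(g)}\big)^{\nu_l} \tag{17}$$
    for $i,k=1,\ldots,d$.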

By combining (12)–(13) with (16)–(17), we could obtain the MLEs of (μ,Σ,ν).
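The two Monte Carlo steps amount to self-normalized importance weighting of truncated-normal draws with weights $\prod_l w_l^{\nu_l}$. The sketch below illustrates (15)–(17) for $d=2$ only. It samples $TN_2(\mathbf{0},\Sigma_j;\mathbb{R}_+^2)$ by naive rejection from the untruncated normal, which is an illustrative stand-in and not the sampler used in the paper; the function name and defaults are hypothetical.

```python
import math
import random

def mc_e_step_2d(sigma_j, nu, G=20000, seed=0):
    """Monte Carlo version of (15)-(17) for d = 2.

    sigma_j: 2x2 matrix Sigma_j from (14); nu: (nu1, nu2).
    Returns c_j, the 2x2 matrix of E(W_i W_k | x_j) and the pair E(log W_i | x_j).
    """
    rng = random.Random(seed)
    l11 = math.sqrt(sigma_j[0][0])
    l21 = sigma_j[0][1] / l11
    l22 = math.sqrt(sigma_j[1][1] - l21 * l21)
    draws = []
    while len(draws) < G:
        # Draw from N_2(0, Sigma_j) and keep only points in the positive quadrant.
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w = (l11 * e1, l21 * e1 + l22 * e2)
        if w[0] > 0 and w[1] > 0:
            draws.append(w)
    weights = [w[0] ** nu[0] * w[1] ** nu[1] for w in draws]  # prod_l w_l^{nu_l}
    total = sum(weights)
    c_j = total / G  # equation (15)
    e_ww = [[sum(q * w[i] * w[k] for q, w in zip(weights, draws)) / total
             for k in range(2)] for i in range(2)]  # equation (16)
    e_log = [sum(q * math.log(w[i]) for q, w in zip(weights, draws)) / total
             for i in range(2)]  # equation (17)
    return c_j, e_ww, e_log
```

For diagonal $\Sigma_j=\text{diag}(s_1^2,s_2^2)$ the target (14) factorizes and $E(W_i^2)=s_i^2(\nu_i+1)$, which gives a convenient check of the approximation. The weights can be highly variable when the $\nu_i$ are large, which is one reason the choice of $G$ matters.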

Regarding the convergence of the Monte Carlo ECM algorithm, the choice of G is very important. A large value of G moves the approximation closer to the true maximizer, but it is time-consuming. Therefore, it is inefficient to start with a large value of G when the current approximation is still far from the maximizer. Instead, it is recommended to monitor the convergence of the algorithm with a moderate G until the process has almost stabilized, then to stop, take the stabilized point as the new initial value and continue with a large value of G. This further decreases the Monte Carlo variability and yields a maximizer closer to the true value. Besides, letting $\boldsymbol{\theta}^{(t)}$ be the t-th approximation of $\hat{\boldsymbol{\theta}}$, the stopping rule of the Monte Carlo ECM algorithm is set as $\|\boldsymbol{\theta}^{(t+1)}-\boldsymbol{\theta}^{(t)}\|\leqslant\delta$, where δ is a predetermined precision.

3.2. Testing hypothesis of independence

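Suppose that we want to test the following hypotheses

$$H_0\colon\ \Sigma\text{ is diagonal}\quad\text{against}\quad H_1\colon\ \Sigma\text{ is not diagonal}.$$

The likelihood ratio test (LRT) statistic is given by

$$T=-2\left\{\ell(\hat{\boldsymbol{\mu}}_0,\hat{\Sigma}_0,\hat{\boldsymbol{\nu}}_0|Y_{\text{obs}})-\ell(\hat{\boldsymbol{\mu}},\hat{\Sigma},\hat{\boldsymbol{\nu}}|Y_{\text{obs}})\right\}\ \dot{\sim}\ \chi^2\!\left(\frac{d(d-1)}{2}\right), \tag{18}$$

where $(\hat{\boldsymbol{\mu}}_0,\hat{\Sigma}_0,\hat{\boldsymbol{\nu}}_0)$ are the constrained MLEs of $(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$ under $H_0$ while $(\hat{\boldsymbol{\mu}},\hat{\Sigma},\hat{\boldsymbol{\nu}})$ are the unconstrained MLEs of $(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$. Under $H_0$, the Type II MVT distribution reduces to the product of d independent univariate t distributions, thus the constrained MLEs $(\hat{\boldsymbol{\mu}}_0,\hat{\Sigma}_0,\hat{\boldsymbol{\nu}}_0)$ are easily obtained. The test statistic T asymptotically follows a chi-squared distribution with $d(d-1)/2$ degrees of freedom under $H_0$, and, with t denoting the observed value of T, the corresponding p-value is given by

$$p\text{-value}=\begin{cases}\Pr(T>t|H_0),&\text{if }d=2,\\2\min\{\Pr(T>t|H_0),\Pr(T\leqslant t|H_0)\},&\text{if }d\geqslant3.\end{cases} \tag{19}$$

For a given significance level α, the null hypothesis should be rejected if $p\text{-value}<\alpha$.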

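3.3. Type II multivariate t regression model

In existing multivariate linear regression models, the error terms are often assumed to follow a multivariate normal or t distribution, where the latter can improve the robustness of the normal model for data with outliers. Since the proposed Type II MVT distribution is much more flexible than the Type I MVT distribution, we adopt it as the joint distribution of the error terms. Then the Type II MVT regression model is formulated as

$$\mathbf{y}_j=\boldsymbol{\mu}_j+\boldsymbol{\epsilon}_j,\quad j=1,\ldots,n,\qquad\mu_{ij}=\boldsymbol{x}_j^\top\boldsymbol{\beta}_i,\quad i=1,\ldots,m, \tag{20}$$

where $\mathbf{y}_j=(Y_{1j},\ldots,Y_{mj})^\top$ is an $m\times1$ vector of responses, $\boldsymbol{\mu}_j=(\mu_{1j},\ldots,\mu_{mj})^\top$, $\{\boldsymbol{\epsilon}_j\}_{j=1}^n\overset{\text{iid}}{\sim}t_m^{(\mathrm{II})}(\mathbf{0},\Sigma,\boldsymbol{\nu})$, $\Sigma$ is a positive-definite matrix, $\boldsymbol{\nu}=(\nu_1,\ldots,\nu_m)^\top$ is the vector of degrees of freedom, $\boldsymbol{x}_j=(1,x_{1j},\ldots,x_{pj})^\top$ is the known covariate vector for subject j and $\boldsymbol{\beta}_i=(\beta_{i0},\beta_{i1},\ldots,\beta_{ip})^\top$ is the vector of regression coefficients. Note that $\boldsymbol{\mu}_j$ can be expressed in matrix form as

$$\boldsymbol{\mu}_j=\begin{pmatrix}\boldsymbol{x}_j^\top&\mathbf{0}^\top&\cdots&\mathbf{0}^\top\\\mathbf{0}^\top&\boldsymbol{x}_j^\top&\cdots&\mathbf{0}^\top\\\vdots&\vdots&\ddots&\vdots\\\mathbf{0}^\top&\mathbf{0}^\top&\cdots&\boldsymbol{x}_j^\top\end{pmatrix}_{m\times m(p+1)}\begin{pmatrix}\boldsymbol{\beta}_1\\\boldsymbol{\beta}_2\\\vdots\\\boldsymbol{\beta}_m\end{pmatrix}\ \hat{=}\ X_j\boldsymbol{\beta},$$

where $\boldsymbol{\beta}=(\boldsymbol{\beta}_1^\top,\ldots,\boldsymbol{\beta}_m^\top)^\top$. Now we rewrite (20) as

$$\mathbf{y}_j=X_j\boldsymbol{\beta}+\boldsymbol{\epsilon}_j,\quad j=1,\ldots,n, \tag{21}$$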

and the objective is to estimate β, Σ and ν.
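The block-diagonal design matrix in (21) is simply $X_j=I_m\otimes\boldsymbol{x}_j^\top$. A small sketch with hypothetical helper names and made-up numbers shows the layout and confirms that $X_j\boldsymbol{\beta}$ reproduces the componentwise means $\mu_{ij}=\boldsymbol{x}_j^\top\boldsymbol{\beta}_i$ of (20).

```python
def design_matrix(x_j, m):
    """Return X_j = I_m (Kronecker) x_j^T as an m x m(p+1) list of rows."""
    q = len(x_j)  # q = p + 1, including the intercept entry
    return [[x_j[c - r * q] if r * q <= c < (r + 1) * q else 0.0
             for c in range(m * q)] for r in range(m)]

def mean_vector(x_j, betas):
    """mu_j = X_j beta, with betas = [beta_1, ..., beta_m] stacked into beta."""
    X = design_matrix(x_j, len(betas))
    beta = [b for beta_i in betas for b in beta_i]
    return [sum(row[c] * beta[c] for c in range(len(beta))) for row in X]

x_j = [1.0, 2.0]                    # intercept plus one covariate (p = 1)
betas = [[1.0, 2.0], [0.0, -1.0]]   # m = 2 responses
print(design_matrix(x_j, 2))        # [[1.0, 2.0, 0.0, 0.0], [0.0, 0.0, 1.0, 2.0]]
print(mean_vector(x_j, betas))      # [5.0, -2.0]
```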

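Let $\boldsymbol{y}_j=(y_{1j},\ldots,y_{mj})^\top$ be the realization of $\mathbf{y}_j$, and the observed data be denoted by $Y_{\text{obs}}=\{\boldsymbol{y}_j,\boldsymbol{x}_j\}_{j=1}^n$. As in Section 3.1, we introduce latent variables $W_{ij}^2\overset{\text{ind}}{\sim}\text{Gamma}(\nu_i/2,\nu_i/2)$ for $i=1,\ldots,m$ and $j=1,\ldots,n$ such that

$$\mathbf{y}_j|(\mathbf{w}_j=\boldsymbol{w}_j)\sim N_m(X_j\boldsymbol{\beta},W_j^{-1}\Sigma W_j^{-1}),\quad j=1,\ldots,n,$$

where the notations $\mathbf{w}_j$, $\boldsymbol{w}_j$ and $W_j$ are the same as those in Section 3.1.

The Monte Carlo ECM algorithm is also employed to obtain the MLEs of the parameters. The first two CM-steps are to calculate the conditional MLEs

$$\hat{\boldsymbol{\beta}}=\left(\sum_{j=1}^nX_j^\top W_j\Sigma^{-1}W_jX_j\right)^{-1}\sum_{j=1}^nX_j^\top W_j\Sigma^{-1}W_j\boldsymbol{y}_j,\qquad\hat{\Sigma}=\frac1n\sum_{j=1}^nW_j(\boldsymbol{y}_j-X_j\boldsymbol{\beta})(\boldsymbol{y}_j-X_j\boldsymbol{\beta})^\top W_j, \tag{22}$$

and the third CM-step is to calculate $\hat{\nu}_i$, which is again the solution to equation (13) for $i=1,\ldots,m$. The E-step is to replace $w_{ij}w_{kj}$ and $\log(w_{ij})$ for $i,k=1,\ldots,m$ by their conditional expectations, which can be calculated similarly to (15)–(17) based on the following conditional distribution

$$f_{\mathbf{w}_j|\mathbf{y}_j}(\boldsymbol{w}_j|\boldsymbol{y}_j)\propto w_{1j}^{\nu_1}\cdots w_{mj}^{\nu_m}\exp\left\{-\frac12\boldsymbol{w}_j^\top\big(Y_j\Sigma^{-1}Y_j+\Sigma_0\big)\boldsymbol{w}_j\right\},\quad\boldsymbol{w}_j\geqslant\mathbf{0}, \tag{23}$$

where $Y_j=\text{diag}(\boldsymbol{y}_j-X_j\boldsymbol{\beta})$ and $\Sigma_0=\text{diag}(\boldsymbol{\nu})$.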

It is quite difficult to obtain the standard deviations of MLEs and Wald confidence intervals of parameters due to the complexity of the observed-data likelihood function and the observed information matrix. Fortunately, the posterior samples of these parameters are relatively easy to generate by using the Gibbs sampling as shown in the next section. Based on the posterior samples, we can provide the posterior means, the posterior standard deviations and Bayesian credible intervals.

4. Bayesian methods

In this section, we discuss Bayesian methods for the Type II MVT distribution. The prior distribution of the parameters $(\boldsymbol{\mu},\Sigma,\nu)$ for the classical MVT distribution has been discussed in [17], where $\boldsymbol{\mu}$, $\Sigma$ and $\nu$ are assumed independent. A locally uniform prior is assigned to $\boldsymbol{\mu}$ and an inverse Wishart prior is assigned to $\Sigma$:

$$p(\boldsymbol{\mu})\propto\text{constant}\quad\text{and}\quad p(\Sigma)\propto|\Sigma|^{-\frac{m+1}{2}}\exp\left\{-\frac12\text{tr}\big(A\Sigma^{-1}\big)\right\}, \tag{24}$$

where m is a scalar and A is a $d\times d$ non-negative definite matrix. If m = d and $A=O$ (i.e. the zero matrix), $p(\Sigma)$ is the non-informative prior; if m = −1 and $A=O$, $p(\Sigma)$ is the flat prior. Priors for the hyperparameter $\nu$, the degrees of freedom, have been suggested in the literature (see [17,25]). One choice is a flat prior on $\nu^{-1}$, i.e.

$$p(\nu)\propto\nu^{-2},\quad\nu\geqslant1. \tag{25}$$

For the proposed Type II MVT distribution $t_d^{(\mathrm{II})}(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$, we can similarly apply (24) as the priors for $(\boldsymbol{\mu},\Sigma)$, and extend the prior for the univariate hyperparameter in (25) to the multivariate case as

$$p(\boldsymbol{\nu})=\prod_{i=1}^d\nu_i^{-2},\quad\nu_i\geqslant1\text{ for }i=1,\ldots,d. \tag{26}$$

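The complete-data posterior distribution is

$$p(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu}|Y_{\text{com}})\propto L(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu}|Y_{\text{com}})\,p(\boldsymbol{\mu},\Sigma)\,p(\boldsymbol{\nu})\propto|\Sigma|^{-\frac{n+m+1}{2}}\exp\left\{-\frac12\text{tr}\big(C_1\Sigma^{-1}+A\Sigma^{-1}\big)\right\}\times\prod_{i=1}^d\left\{\prod_{j=1}^n\frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}u_{ij}^{\frac{\nu_i-1}{2}}\exp\left(-\frac{\nu_i}{2}u_{ij}\right)\right\}\nu_i^{-2},$$

where

$$C_1=\sum_{j=1}^nU_j^{\frac12}(\boldsymbol{x}_j-\boldsymbol{\mu})(\boldsymbol{x}_j-\boldsymbol{\mu})^\top U_j^{\frac12}.$$

Firstly, the conditional predictive distribution of the missing data is derived as

$$f(Y_{\text{mis}}|Y_{\text{obs}},\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})=\prod_{j=1}^nf_{\mathbf{u}_j|\mathbf{x}_j}(\boldsymbol{u}_j|\boldsymbol{x}_j,\boldsymbol{\mu},\Sigma,\boldsymbol{\nu}),$$

where

$$f_{\mathbf{u}_j|\mathbf{x}_j}(\boldsymbol{u}_j|\boldsymbol{x}_j,\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})\propto\left(\prod_{i=1}^du_{ij}^{\frac{\nu_i-1}{2}}\right)\exp\left[-\frac12(\boldsymbol{x}_j-\boldsymbol{\mu})^\top\big(U_j^{\frac12}\Sigma^{-1}U_j^{\frac12}\big)(\boldsymbol{x}_j-\boldsymbol{\mu})-\frac12\sum_{i=1}^d\nu_iu_{ij}\right]. \tag{27}$$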

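To generate samples of $(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$, we combine the Gibbs sampling method with the acceptance–rejection (AR) algorithm (see [27]). We propose that the envelope density and the envelope constant are given by

$$g_j(\boldsymbol{u}_j)=\prod_{i\in\mathbb{I}}\text{Gamma}\!\left(u_{ij}\,\Big|\,\frac{\nu_i+1}{2},\frac{\nu_i}{2}\right)\prod_{i\notin\mathbb{I}}U\!\left(u_{ij}\,\Big|\,0,\left[\frac{e\nu_i}{\nu_i-1}\right]^{\frac{\nu_i-1}{2}}\right)\quad\text{and}\quad c=\prod_{i\in\mathbb{I}}\frac{\Gamma\!\left(\frac{\nu_i+1}{2}\right)}{(\nu_i/2)^{\frac{\nu_i+1}{2}}}, \tag{28}$$

respectively, where

$$\mathbb{I}=\{i\,|\,0<\nu_i\leqslant1.837,\ i=1,\ldots,d\}$$

and $U(\cdot|a,b)$ denotes the density of the uniform distribution $U(a,b)$. Since $\Sigma$ and $\Sigma^{-1}$ are positive definite, for any non-zero vector $\boldsymbol{z}$ we always have $\boldsymbol{z}^\top\Sigma^{-1}\boldsymbol{z}>0$. Thus, from (27), the unnormalized function on its right-hand side satisfies

$$f_{\mathbf{u}_j|\mathbf{x}_j}(\boldsymbol{u}_j|\boldsymbol{x}_j,\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})\leqslant\prod_{i=1}^du_{ij}^{\frac{\nu_i-1}{2}}\exp\left(-\frac{\nu_iu_{ij}}{2}\right)\leqslant\prod_{i\in\mathbb{I}}u_{ij}^{\frac{\nu_i-1}{2}}\exp\left(-\frac{\nu_iu_{ij}}{2}\right)\prod_{i\notin\mathbb{I}}\left[\frac{e\nu_i}{\nu_i-1}\right]^{-\frac{\nu_i-1}{2}}=c\,g_j(\boldsymbol{u}_j),$$

since the function $h(u)=u^{\frac{\nu-1}{2}}\exp(-\nu u/2)$ attains its maximum $[e\nu/(\nu-1)]^{-\frac{\nu-1}{2}}$ at $u=1-1/\nu$ when $\nu>1.837$. For $0<\nu\leqslant1.837$, we have $\Gamma(\frac{\nu+1}{2})/(\nu/2)^{\frac{\nu+1}{2}}\geqslant1$. Overall, the envelope constant c specified by (28) is always greater than or equal to 1, and the function in (27) is majorized (bounded above) by $c\,g_j(\boldsymbol{u}_j)$.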
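The two numerical facts used above can be checked directly: the ratio $\Gamma(\frac{\nu+1}{2})/(\nu/2)^{\frac{\nu+1}{2}}$ crosses 1 near $\nu=1.837$, and $h(u)$ peaks at $u=1-1/\nu$. The sketch below is only a verification aid under our own naming; it is not part of the sampler.

```python
import math

def envelope_ratio(nu):
    """Gamma((nu+1)/2) / (nu/2)^((nu+1)/2), the per-component factor of c in (28)."""
    a = (nu + 1) / 2
    return math.exp(math.lgamma(a) - a * math.log(nu / 2))

def h(u, nu):
    """h(u) = u^((nu-1)/2) * exp(-nu*u/2), the kernel bounded for components outside I."""
    return u ** ((nu - 1) / 2) * math.exp(-nu * u / 2)

# Bisection for the point where the ratio equals 1 (the ratio decreases on [1, 3]).
lo, hi = 1.0, 3.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if envelope_ratio(mid) > 1:
        lo = mid
    else:
        hi = mid
threshold = 0.5 * (lo + hi)
print(threshold)  # should be close to the cut-off 1.837 used to define the index set I

nu = 4.0
u_star = 1 - 1 / nu
h_max = (math.e * nu / (nu - 1)) ** (-(nu - 1) / 2)
print(h(u_star, nu), h_max)  # the two values should coincide
```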

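Thus, the procedure to generate posterior samples by the AR algorithm is as follows:

  • Step 1:
    For each $j\in\{1,\ldots,n\}$, draw $\mathbf{u}_j=\boldsymbol{u}_j$ from $f_{\mathbf{u}_j|\mathbf{x}_j}(\boldsymbol{u}_j|\boldsymbol{x}_j,\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$ through the AR algorithm with the following two steps:
    (a) Draw $V_j=v_j\sim U(0,1)$, and independently draw $Y_{ij}=y_{ij}\overset{\text{ind}}{\sim}\text{Gamma}(\frac{\nu_i+1}{2},\frac{\nu_i}{2})$ for $i\in\mathbb{I}$ and $Y_{ij}=y_{ij}\overset{\text{ind}}{\sim}U(0,[e\nu_i/(\nu_i-1)]^{\frac{\nu_i-1}{2}})$ for $i\notin\mathbb{I}$. Set $\boldsymbol{y}_j=(y_{1j},\ldots,y_{dj})^\top$.
    (b) If $v_j\leqslant f_{\mathbf{u}_j|\mathbf{x}_j}(\boldsymbol{y}_j|\boldsymbol{x}_j,\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})/[c\,g_j(\boldsymbol{y}_j)]$ (c and $g_j(\cdot)$ are given by (28)), set $\mathbf{u}_j=\boldsymbol{u}_j=\boldsymbol{y}_j$; otherwise, go back to (a).
  • Step 2:
    Draw $(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$ from $p(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu}|Y_{\text{com}})$ via the Gibbs sampling method with the following loop:
    $$\boldsymbol{\mu}|(\Sigma,\boldsymbol{\nu},Y_{\text{obs}},Y_{\text{mis}})\sim N_d(A_1^{-1}B_1,A_1^{-1}),\qquad\Sigma|(\boldsymbol{\mu},\boldsymbol{\nu},Y_{\text{obs}},Y_{\text{mis}})\sim\text{IWishart}_d(C_1+A,\,n+m-d),$$
    where
    $$A_1=\sum_{j=1}^nU_j^{\frac12}\Sigma^{-1}U_j^{\frac12}\quad\text{and}\quad B_1=\sum_{j=1}^nU_j^{\frac12}\Sigma^{-1}U_j^{\frac12}\boldsymbol{x}_j,$$
    and generate $\boldsymbol{\nu}$ from the posterior distribution of $\boldsymbol{\nu}$ given $(\boldsymbol{\mu},\Sigma,Y_{\text{obs}},Y_{\text{mis}})$ via the grid method, with the density for each $\nu_i$ given by
    $$f(\nu_i|\boldsymbol{\mu},\Sigma,Y_{\text{obs}},Y_{\text{mis}})\propto\left\{\prod_{j=1}^n\frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}u_{ij}^{\frac{\nu_i-1}{2}}\exp\left(-\frac{\nu_i}{2}u_{ij}\right)\right\}\nu_i^{-2}.$$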

The above methods could be extended to the regression model specified in (20) or (21) by assigning priors for (β,Σ,ν).

5. Simulation studies

The components in the Type I MVT distribution must be dependent and share the same degrees of freedom, whereas the proposed Type II MVT distribution can accommodate different marginal tail weights in the univariate t distributions and can even approximate normal margins. Thus, we conduct some simulations to compare the fitting performance of the two distributions when data are generated from one of them, and to highlight the characteristics of the Type II MVT distribution.

We consider bivariate and trivariate cases, i.e. d = 2, 3. The sample size is set as n = 100, the mean vector $\boldsymbol{\mu}$ is always chosen to be zero, and the scale matrix $\Sigma=(\sigma_{ij})_{d\times d}$ has diagonal elements $\sigma_{ii}=1$ for all i, while the off-diagonal elements $\sigma_{ij}$ for $i\neq j$ are specified in each case. Under different configurations of the parameters $(\boldsymbol{\mu},\Sigma,\nu,\boldsymbol{\nu})$, samples are generated from either $t_d(\boldsymbol{\mu},\Sigma,\nu)$ or $t_d^{(\mathrm{II})}(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$, as indicated by the first line of each experiment in the following two tables, and the results are obtained by fitting both distributions to each sample. In the case of d = 2, we have Experiments 1–4; the estimation results, including the average MLE and the corresponding mean squared error (MSE) of the estimates based on L = 1000 replications, are presented in Table 3. The Akaike information criterion (AIC) and Bayesian information criterion (BIC), averaged over the replications for each fitted model, are used to compare the fitting performance.

Table 3.

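Comparisons of estimate performances with d = 2.

Experiment 1: $t_2(\boldsymbol{\mu},\Sigma,\nu)$, $\sigma_{12}=0.5$, $\nu=3$

| Fit | | μ1 | μ2 | σ11 | σ12 | σ22 | ν (Type I) or ν1 (Type II) | ν2 | AIC | BIC |
|---|---|---|---|---|---|---|---|---|---|---|
| Type I | MLE | 0.0009 | 0.0024 | 0.9794 | 0.4919 | 0.9955 | 2.9793 | | 682.7873 | 698.4183 |
| Type I | MSE | 0.0142 | 0.0127 | 0.0401 | 0.0194 | 0.0394 | 0.3975 | | | |
| Type II | MLE | 0.0011 | 0.0026 | 1.0712 | 0.6064 | 1.0978 | 3.4333 | 3.4489 | 694.9977 | 713.2339 |
| Type II | MSE | 0.0158 | 0.0145 | 0.0681 | 0.0480 | 0.0785 | | | | |

Experiment 2: $t_2^{(\mathrm{II})}(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$, $\sigma_{12}=0.5$, $(\nu_1,\nu_2)=(3,3)$

| Fit | | μ1 | μ2 | σ11 | σ12 | σ22 | ν (Type I) or ν1 (Type II) | ν2 | AIC | BIC |
|---|---|---|---|---|---|---|---|---|---|---|
| Type I | MLE | −0.0024 | −0.0035 | 1.1145 | 0.4475 | 1.1175 | 3.2953 | | 705.6417 | 721.2727 |
| Type I | MSE | 0.0140 | 0.0148 | 0.0734 | 0.0168 | 0.0802 | | | | |
| Type II | MLE | −0.0006 | −0.0014 | 0.9967 | 0.4965 | 1.0077 | 3.1612 | 3.2248 | 697.3014 | 715.5376 |
| Type II | MSE | 0.0127 | 0.0134 | 0.0553 | 0.0165 | 0.0567 | 0.8421 | 0.8900 | | |

Experiment 3: $t_2^{(\mathrm{II})}(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$, $\sigma_{12}=0.5$, $(\nu_1,\nu_2)=(3,10)$

| Fit | | μ1 | μ2 | σ11 | σ12 | σ22 | ν (Type I) or ν1 (Type II) | ν2 | AIC | BIC |
|---|---|---|---|---|---|---|---|---|---|---|
| Type I | MLE | −0.0025 | −0.0050 | 1.2948 | 0.4552 | 0.8495 | 4.8398 | | 653.4693 | 669.1004 |
| Type I | MSE | 0.0166 | 0.0128 | 0.1672 | 0.0164 | 0.0475 | | | | |
| Type II | MLE | −0.0024 | −0.0047 | 0.9957 | 0.5007 | 0.9884 | 3.0641 | 10.0156 | 646.4751 | 664.7113 |
| Type II | MSE | 0.0152 | 0.0121 | 0.0551 | 0.0159 | 0.0266 | 0.7348 | 5.0330 | | |

Experiment 4: $t_2^{(\mathrm{II})}(\boldsymbol{\mu},\Sigma,\boldsymbol{\nu})$, $\sigma_{12}=0$, $(\nu_1,\nu_2)=(5,5)$

| Fit | | μ1 | μ2 | σ11 | σ12 | σ22 | ν (Type I) or ν1 (Type II) | ν2 | AIC | BIC |
|---|---|---|---|---|---|---|---|---|---|---|
| Type I | MLE | −0.0030 | 0.0005 | 1.0831 | 0.0013 | 1.0807 | 5.5423 | | 666.9350 | 682.5660 |
| Type I | MSE | 0.0127 | 0.0133 | 0.0470 | 0.0097 | 0.0533 | | | | |
| Type II | MLE | −0.0029 | 0.0005 | 0.9560 | 0.0006 | 0.9533 | 4.6581 | 4.6488 | 663.7192 | 681.9554 |
| Type II | MSE | 0.0123 | 0.0126 | 0.0372 | 0.0096 | 0.0417 | 1.4100 | 1.4080 | | |

Note: MLE is the average of 1000 point estimates with precision $\delta=10^{-2}$ via the Monte Carlo ECM algorithm; Mean squared error (MSE) of the estimate is equal to the sum of the variance and the squared bias of the estimator.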
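The AIC and BIC columns can be related to the model sizes. The paper does not state the parameter counts explicitly; the sketch below assumes d location parameters, d(d+1)/2 free scale entries and either one (Type I) or d (Type II) degrees-of-freedom parameters. With n = 100 and d = 2 this gives 6 and 7 parameters, and the implied gap BIC − AIC = k(log n − 2) matches the differences between the BIC and AIC columns of Table 3.

```python
import math

def n_params(d, model):
    """Assumed count: d locations, d(d+1)/2 scale entries, plus 1 (Type I) or d (Type II) dfs."""
    return d + d * (d + 1) // 2 + (1 if model == "I" else d)

def aic_bic(loglik, k, n):
    """AIC = -2*loglik + 2k and BIC = -2*loglik + k*log(n)."""
    return -2 * loglik + 2 * k, -2 * loglik + k * math.log(n)

# BIC - AIC = k * (log(n) - 2) does not depend on the log-likelihood, so it can be
# compared with Table 3 directly (n = 100, d = 2).
for model in ("I", "II"):
    k = n_params(2, model)
    print(model, k, round(k * (math.log(100) - 2), 4))
```

Since the gap does not depend on the data, it also holds for the averaged AIC and BIC values reported in the table.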

In the case of d = 3, we conduct Experiments 5–6; the estimation accuracies of the parameters are compared through the MLEs and MSEs summarized in Table 4. For a more direct view, box plots of the estimation results for each experiment are given in Figures 5–7.

Figure 6.

(a1–a3) Experiment 3: Box plots for MLEs of the parameters by bivariate Type I and Type II  MVT distributions; (b1–b3) Experiment 4: Box plots for MLEs of the parameters by bivariate Type I and Type II  MVT distributions.

Table 4.

Comparisons of estimate performances with d = 3.

  Experiment 5: t3(μ,Σ,ν), σij=(1)|ij|×0.5, ν=3
Type I μ1 μ2 μ3 σ11 σ12 σ13
MLE 0.0233 0.0020 0.0250 0.9997 −0.4970 0.5032
MSE 0.0102 0.0118 0.0101 0.0230 0.0118 0.0122
Type I σ22 σ23 σ33 ν    
MLE 0.9996 −0.4974 1.0042 3.0911    
MSE 0.0261 0.0118 0.0280 0.3512    
Type II μ1 μ2 μ3 σ11 σ12 σ13
MLE 0.0213 0.0016 0.0218 0.9147 −0.5093 0.5146
MSE 0.0070 0.0080 0.0077 0.0398 0.0191 0.0191
Type II σ22 σ23 σ33 ν1 ν2 ν3
MLE 0.9116 −0.5081 0.9157 2.9983 2.9775 2.9868
MSE 0.0434 0.0182 0.0446
  Experiment 6: t3(II)(μ,Σ,ν), σij=(1)|ij|×0.5, (ν1,ν2,ν3)=(3,3,3)
Type I μ1 μ2 μ3 σ11 σ12 σ13
MLE 0.0356 0.0075 0.0330 1.2886 −0.5332 0.5327
MSE 0.0155 0.0177 0.0157 0.1558 0.0181 0.0175
Type I σ22 σ23 σ33 ν    
MLE 1.2996 −0.5349 1.2854 4.0222    
MSE 0.1648 0.0185 0.1547    
Type II μ1 μ2 μ3 σ11 σ12 σ13
MLE 0.0230 0.0045 0.0206 0.9020 −0.4621 0.4629
MSE 0.0074 0.0089 0.0082 0.0405 0.0132 0.0125
Type II σ22 σ23 σ33 ν1 ν2 ν3
MLE 0.9039 −0.4646 0.8966 2.9699 2.9454 2.9617
MSE 0.0433 0.0139 0.0445 0.3139 0.3137 0.3377

Note: MLE is the average of 1000 point estimates with precision δ = 10^{-2} via the Monte Carlo ECM algorithm; the mean squared error (MSE) of an estimate equals the sum of the variance and the squared bias of the estimator.

Figure 5.

(a1–a3) Experiment 1: Box plots for MLEs of the parameters by bivariate Type I and Type II  MVT distributions; (b1–b3) Experiment 2: Box plots for MLEs of the parameters by bivariate Type I and Type II  MVT distributions.

Figure 7.

(a1–a3) Experiment 5: Box plots for MLEs of the parameters by trivariate Type I and Type II  MVT distributions; (b1–b3) Experiment 6: Box plots for MLEs of the parameters by trivariate Type I and Type II  MVT distributions.

From Tables 3 and 4, the Monte Carlo ECM algorithm performs satisfactorily in estimating the parameters of the Type II MVT distribution, in the sense that all the average MLEs are close to their true values. When the data are generated from the Type I MVT distribution, as in Experiments 1 and 5, the point estimates under Type I MVT are closer to their true values and more stable, with smaller MSEs for most of the parameters, and Type I MVT outperforms Type II MVT in model fitting in terms of AIC and BIC. Even so, the Type II MVT fit remains acceptable: the estimates of (μ,Σ) are not far from the true values and the estimates of the νi's are close to each other. Moreover, when the samples are generated from the Type II MVT distribution with identical νi's, as in Experiments 2, 4 and 6, the Type II MVT distribution performs better, since its estimates are more accurate and have smaller MSEs, whereas Figures 5–7 show that the Type I MVT estimates, especially those of Σ and ν, are more likely to produce outliers. If the true values of ν1 and ν2 are distinct, as in Experiment 3, the Type II MVT distribution clearly outperforms the Type I MVT distribution, which has only a single degrees-of-freedom parameter, and it better describes the different amounts of tail weight of the marginal components.

6. Applications

To ease the numerical work and graphical presentation, we focus on the bivariate case to provide the numerical illustration of the presented methodologies.

6.1. Sport data

We use a data set collected by the Australian Institute of Sport and reported in [3], which contains several variables measured on n = 202 Australian athletes. Specifically, we consider the pair of variables BMI (body mass index) and LBM (lean body mass). Table 5 summarizes the parameter estimates from the Type I bivariate t distribution t2(μ,Σ,ν) with unknown ν and the Type II bivariate t distribution t2(II)(μ,Σ,ν) with unknown ν.

Table 5.

MLEs of parameters for fitting (BMI, LBM).

  MLEs of parameters   Criterion
Dist. μ Σ ν log-L AIC BIC
Type I (22.7479, 64.3418) [6.2010, 21.3709; 21.3709, 147.3899] 11.0467 −1228.471 2468.942 2488.792
Type II (22.6509, 64.1985) [5.4081, 22.3384; 22.3384, 167.2201] (6.2217, 74.1270) −1223.653 2461.307 2484.464
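
The AIC and BIC columns of Table 5 can be reproduced from the reported log-likelihoods. The sketch below assumes the usual definitions AIC = −2 log L + 2k and BIC = −2 log L + k log n, and it assumes parameter counts of k = 6 for Type I (two locations, three distinct scale entries, one ν) and k = 7 for Type II (two νi's); these counts are inferred here rather than stated in the text.

```python
import math

def aic_bic(log_lik, n_params, n_obs):
    """Standard AIC and BIC from a maximized log-likelihood."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    return aic, bic

n = 202  # number of athletes
# Assumed parameter counts: Type I has 2 + 3 + 1 = 6, Type II has 2 + 3 + 2 = 7.
aic1, bic1 = aic_bic(-1228.471, 6, n)
aic2, bic2 = aic_bic(-1223.653, 7, n)
print(f"Type I : AIC={aic1:.3f}  BIC={bic1:.3f}")
print(f"Type II: AIC={aic2:.3f}  BIC={bic2:.3f}")
```

Under these assumptions the values agree with Table 5 up to rounding of the log-likelihood, and both criteria favour the Type II model even though it has one more parameter.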

To test the independence between BMI and LBM, from (18) and (19), the observed value of the test statistic T is t = 144.9212 > χ²_{0.05}(1) = 3.84, and the corresponding p-value is approximately 0 (< 0.05), indicating that the null hypothesis should be rejected at the 0.05 significance level.

Figure 8 displays the scatter plot and the superimposed contours of the fitted Type II bivariate t distribution for the (BMI, LBM) pair. Comparing the two models, the proposed Type II distribution is selected by both AIC and BIC and also attains a larger log-likelihood. Besides, the estimated correlation coefficient from (11) is r = 0.6981, which is very close to the sample correlation coefficient ρ = 0.7139. Kendall's tau and Spearman's rho, which measure the rank correlation between BMI and LBM, are rk = 0.5176 and rs = 0.7026, respectively. All of these statistics reveal a clear positive correlation between the two variables, a feature that is well captured by the proposed Type II MVT distribution.

Figure 8.

Scatter plot (BMI, LBM) and fitted Type II  bivariate t distribution.

6.2. Tuberculosis vaccine data

The data consist of 13 trials on the efficacy of the Bacillus Calmette–Guérin (BCG) vaccine against tuberculosis, each with a vaccinated group and a non-vaccinated control group, as presented in Table 6. Let VD and VND denote the numbers of disease cases and non-disease cases in the vaccinated group, and let NVD and NVND denote the corresponding numbers in the non-vaccinated group. Two covariates are available: the geographic latitude of the place where the study was done and the year of publication. Here, we carry out a bivariate analysis of the log-odds of tuberculosis in the vaccinated and non-vaccinated arms. Let y = (Y1, Y2)⊤, where Y1 = log(VD/VND) and Y2 = log(NVD/NVND), and let the covariates be x = (1, x1, x2)⊤, where x1 = Latitude − 33 and x2 = Year − 66. The Type II bivariate t regression model is applied to analyze this data set.

Table 6.

Data from clinical trials on the efficacy of BCG vaccine in the prevention of tuberculosis (see [26]).

  Vaccinated Not Vaccinated    
Trial Disease No disease Disease No disease Latitude Year
1 4 119 11 128 44 48
2 6 300 29 274 55 49
3 3 228 11 209 42 60
4 62 13536 248 12619 52 77
5 33 5036 47 5761 13 73
6 180 1361 372 1079 44 53
7 8 2537 10 619 19 73
8 505 87886 499 87892 13 80
9 29 7470 45 7232 27 68
10 17 1699 65 1600 42 61
11 186 50448 141 27197 18 74
12 5 2493 3 2338 33 69
13 27 16886 29 17825 33 76

By adopting the bivariate t regression model specified by (21) and setting the initial values of (β,Σ,ν) as

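$$\beta^{(0)} = 0.1 \times \mathbf{1}_6, \qquad \Sigma^{(0)} = \mathrm{Var}(y) = \begin{pmatrix} 1.4994 & 1.8882 \\ 1.8882 & 2.7850 \end{pmatrix}, \qquad \nu^{(0)} = \mathbf{1}_2,$$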

we calculate the MLEs of these parameters by using the Monte Carlo ECM algorithm (22), (13) and (23), which converged to (β̂, Σ̂, ν̂) in 51 iterations with precision δ = 10^{-2}. These results are summarized in Table 7.
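
The responses, the centered covariates and the initial scale matrix can be rebuilt directly from Table 6. The sketch below is illustrative: it assumes that Var(y) denotes the sample covariance matrix with divisor n − 1, and the helper name `sample_cov` is a choice made here.

```python
import math

# Table 6 rows: (VD, VND, NVD, NVND, latitude, year)
trials = [
    (4, 119, 11, 128, 44, 48), (6, 300, 29, 274, 55, 49),
    (3, 228, 11, 209, 42, 60), (62, 13536, 248, 12619, 52, 77),
    (33, 5036, 47, 5761, 13, 73), (180, 1361, 372, 1079, 44, 53),
    (8, 2537, 10, 619, 19, 73), (505, 87886, 499, 87892, 13, 80),
    (29, 7470, 45, 7232, 27, 68), (17, 1699, 65, 1600, 42, 61),
    (186, 50448, 141, 27197, 18, 74), (5, 2493, 3, 2338, 33, 69),
    (27, 16886, 29, 17825, 33, 76),
]

# Log-odds of disease in the vaccinated and non-vaccinated arms.
y1 = [math.log(row[0] / row[1]) for row in trials]
y2 = [math.log(row[2] / row[3]) for row in trials]
# Covariates centered as in the text: Latitude - 33 and Year - 66.
x1 = [row[4] - 33 for row in trials]
x2 = [row[5] - 66 for row in trials]

def sample_cov(a, b):
    """Sample covariance with divisor n - 1 (an assumption about Var(y))."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

s11, s12, s22 = sample_cov(y1, y1), sample_cov(y1, y2), sample_cov(y2, y2)
print(f"Var(y) = [[{s11:.4f}, {s12:.4f}], [{s12:.4f}, {s22:.4f}]]")
```

The printed matrix can be compared with the entries of Σ(0) given above.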

Table 7.

MLEs and posterior estimates of parameters for the tuberculosis vaccine data.

Parameter Coefficient MLE Posterior mean Posterior std 95% Bayesian credible interval
β1 Constant −5.006 −5.017 0.190 [−5.408, −4.606]
β1 x1 0.009 0.005 0.021 [−0.031, 0.046]
β1 x2 −0.080 −0.085 0.028 [−0.128, −0.026]
β2 Constant −4.137 −4.139 0.243 [−4.652, −3.685]
β2 x1 0.043 0.035 0.032 [−0.023, 0.116]
β2 x2 −0.078 −0.090 0.041 [−0.168, −0.003]
Σ σ11 0.445 0.494 0.192 [0.235, 1.005]
Σ σ12 0.510 0.519 0.159 [0.245, 0.812]
Σ σ22 0.649 0.690 0.220 [0.305, 1.085]
ν ν1 13.834 12.870 0.581 [12.027, 13.923]
ν ν2 3.062 3.234 0.141 [3.018, 3.479]

To illustrate the Bayesian methods proposed in Section 4, we assign the non-informative prior (24) to (β,Σ) (i.e. p(β,Σ) ∝ |Σ|^{−(d+1)/2}) and the prior (26) to ν, and generate 1000 posterior samples of (β,Σ,ν) by using the Gibbs sampler embedded with the AR algorithm. After discarding the first half of these samples, we calculate the posterior means, the posterior standard deviations and the 95% Bayesian credible intervals shown in the last three columns of Table 7.
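
The post-processing step described here (discard the first half of the chain, then summarize) can be sketched as follows. The chain below is a hypothetical stand-in generated from a normal distribution, not output of the Gibbs sampler, and the use of an equal-tailed interval is an assumption, since the text does not say how the credible intervals were formed.

```python
import random
import statistics

def posterior_summary(draws, burn_in_fraction=0.5):
    """Posterior mean, standard deviation and equal-tailed 95% interval after burn-in."""
    kept = draws[int(len(draws) * burn_in_fraction):]  # drop the first part of the chain
    cuts = statistics.quantiles(kept, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return {
        "n_kept": len(kept),
        "mean": statistics.mean(kept),
        "std": statistics.stdev(kept),
        "ci95": (cuts[0], cuts[-1]),
    }

rng = random.Random(7)
# Hypothetical stand-in for 1000 posterior draws of a single parameter.
chain = [rng.gauss(-5.0, 0.2) for _ in range(1000)]
summary = posterior_summary(chain)
print(summary)
```

Applying the same function to each coordinate of the sampled (β,Σ,ν) would give one row of summaries per parameter.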

Now suppose that we want to test the independence between Y1 and Y2, i.e. to test whether σ12 is zero or not. Under the null hypothesis, we have Yij ∼ t(xj⊤βi, σii, νi) for j = 1, …, n and i = 1, 2. That is, we fit each component by the univariate t regression model. The parameters for each univariate distribution are estimated by

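$$\hat{\beta}_1 = (-4.9466,\, 0.0038,\, -0.0809)^\top,\quad \hat{\sigma}_{11} = 0.2683,\quad \hat{\nu}_1 = 3.6620,\qquad \hat{\beta}_2 = (-3.8122,\, 0.0275,\, -0.0582)^\top,\quad \hat{\sigma}_{22} = 0.0012,\quad \hat{\nu}_2 = 0.3679.$$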

From (18)–(19), the LRT statistic is given by t = 9.6778 and the corresponding p-value=0.0019<0.05. Thus, the null hypothesis should be rejected.
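
The p-values quoted for the two likelihood ratio tests can be checked against the χ²(1) reference distribution. The sketch below uses the identity that a χ²(1) variable is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function; the function name `chi2_sf_df1` is an illustrative choice.

```python
import math

def chi2_sf_df1(t):
    """Upper-tail probability P(chi2(1) > t), using chi2(1) = Z^2 with Z standard normal."""
    return math.erfc(math.sqrt(t / 2.0))

p_critical = chi2_sf_df1(3.84)      # tail area at the 5% critical value
p_sport = chi2_sf_df1(144.9212)     # sport data, Section 6.1
p_vaccine = chi2_sf_df1(9.6778)     # tuberculosis vaccine data, Section 6.2
print(f"P(chi2_1 > 3.84)     = {p_critical:.4f}")
print(f"P(chi2_1 > 144.9212) = {p_sport:.3e}")
print(f"P(chi2_1 > 9.6778)   = {p_vaccine:.4f}")
```

The third value rounds to the reported 0.0019, and the second is numerically indistinguishable from zero at any practical precision.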

Moreover, from (11), the estimated correlation coefficient between Y1 and Y2 is r = 0.7487, which is acceptable compared with the sample correlation coefficient of 0.924 and indicates a positive correlation between Y1 and Y2. Kendall's tau and Spearman's rho for the rank correlation between Y1 and Y2 are rk = 0.7692 and rs = 0.8956, which agree with this result. The difference between the estimates of ν1 and ν2 from the marginal analysis indicates distinct amounts of tail weight for Y1 and Y2; thus the Type II bivariate t regression model provides a more reasonable fit to this data set than the Type I model.
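
The two rank correlations can be recomputed from the counts in Table 6. The following sketch implements Kendall's tau and Spearman's rho in their simplest forms, which assume no ties (none occur in these log-odds); the function names are illustrative.

```python
import math

# Table 6 counts: (VD, VND, NVD, NVND)
counts = [
    (4, 119, 11, 128), (6, 300, 29, 274), (3, 228, 11, 209),
    (62, 13536, 248, 12619), (33, 5036, 47, 5761), (180, 1361, 372, 1079),
    (8, 2537, 10, 619), (505, 87886, 499, 87892), (29, 7470, 45, 7232),
    (17, 1699, 65, 1600), (186, 50448, 141, 27197), (5, 2493, 3, 2338),
    (27, 16886, 29, 17825),
]
y1 = [math.log(vd / vnd) for vd, vnd, _, _ in counts]
y2 = [math.log(nvd / nvnd) for _, _, nvd, nvnd in counts]

def kendall_tau(a, b):
    """(concordant - discordant) / number of pairs; valid when there are no ties."""
    n, score = len(a), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

def ranks(values):
    """Ranks 1..n in ascending order (no tie handling)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(a, b):
    """1 - 6 * sum(d^2) / (n (n^2 - 1)); valid when there are no ties."""
    n = len(a)
    d2 = sum((p - q) ** 2 for p, q in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))

r_k, r_s = kendall_tau(y1, y2), spearman_rho(y1, y2)
print(f"Kendall's tau = {r_k:.4f}, Spearman's rho = {r_s:.4f}")
```

Both values match the reported rk and rs to four decimal places.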

From the regression results, we know that the covariate x1 is insignificant while x2 is significant. To get a final model, we still need some extra work. We use this data set just for demonstrating that the proposed model is applicable to a regression analysis. Furthermore, incorporating covariates into the scale structure could be a potential future work.

7. Discussions

To overcome three obvious disadvantages of the classical MVT distribution, in this paper we introduced a Type II MVT distribution as a new robust alternative to the multivariate normal distribution. This new distribution has three noteworthy characteristics: (1) all components follow univariate t-distributions with not necessarily identical degrees of freedom; (2) it is applicable to multivariate data in which some components are marginally t distributed while others are approximately normally distributed, depending on whether the corresponding νi is small or large; and (3) it can contain some dependent components and some statistically independent components. Therefore, the proposed distribution is more flexible in model specification. As the two real data sets in Section 6 confirm, variables can be moderately or strongly correlated and yet have very different tail weights from the marginal perspective. In such cases, both the dependency structure and the differing amounts of tail weight should be considered. The classical MVT distribution has the conspicuous drawback of imposing the same degrees of freedom on all components. In contrast, the proposed Type II MVT distribution covers both aspects; it is thus more flexible than the classical model and performs better in our two real data analyses.

In general, although the joint density of the Type II MVT random vector does not have a closed-form expression, the SR (6) is very useful in deriving the marginal distributions, the mixed moments and the Monte Carlo ECM algorithm. From characteristic (1) in Section 2, each component follows a univariate t-distribution. In fact, any sub-vector of x, say (X_{i_1}, X_{i_2}, …, X_{i_r})⊤, follows t_r(II)(μ̃, Σ̃, ν̃), where μ̃ = (μ_{i_1}, μ_{i_2}, …, μ_{i_r})⊤, Σ̃ is the sub-matrix consisting of rows i_1, i_2, …, i_r and columns i_1, i_2, …, i_r of Σ, and ν̃ = (ν_{i_1}, ν_{i_2}, …, ν_{i_r})⊤. We also applied the Type II MVT distribution to the linear regression model by adopting it as the joint distribution of the error terms, taking advantage of its flexibility to better capture the characteristics of the data when outliers exist.

Further extension of (6) can be considered as follows:

$$X_i = \mu_i + \frac{Z_i}{\sqrt{(V + V_i)/(\nu + \nu_i)}}, \qquad i = 1, \ldots, d, \tag{29}$$

where z = (Z1, …, Zd)⊤ ∼ Nd(0, Σ), V ∼ χ²(ν), Vi ∼ χ²(νi) for i = 1, …, d, and (z, V, V1, …, Vd) are mutually independent. In particular, when ν1 = ⋯ = νd = 0 in (29), it reduces to the Type I MVT distribution, and when ν = 0, it reduces to the Type II MVT distribution. Note that the Type I MVT distribution (2) and the proposed Type II MVT distribution (6) are non-nested. In (29), the dependency among the components of x = (X1, …, Xd)⊤ can come from both the multivariate normal vector z and the common Gamma random variable V. Besides, although similar in form to the two models in (4)–(5), model (29) makes parameter estimation more convenient because it does not require the constraint stated in Remark 2.1. When analyzing real data, we can start with the general model (29) and turn to a reduced model based on the estimation results. Because model (29) includes the Type I and Type II MVT distributions as special cases, it is convenient for model selection.
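
To make the two special cases concrete, the stochastic representation can be simulated directly. The sketch below is restricted to d = 2 and reads (29) as Xi = μi + Zi / sqrt((V + Vi)/(ν + νi)); a χ² variable with zero degrees of freedom is treated as the point mass at 0, and the function names `chi2_draw` and `sample_general_t2` are illustrative choices.

```python
import math
import random

def chi2_draw(rng, df):
    """Chi-square(df) as Gamma(shape=df/2, scale=2); df = 0 is taken as the point mass at 0."""
    return rng.gammavariate(df / 2.0, 2.0) if df > 0 else 0.0

def sample_general_t2(rng, mu, sigma, nu, nu_i):
    """One bivariate draw from the extended representation; requires nu + nu_i[k] > 0."""
    s11, s12, s22 = sigma
    # Cholesky factor of [[s11, s12], [s12, s22]] applied to two standard normals.
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z = (l11 * e1, l21 * e1 + l22 * e2)
    v = chi2_draw(rng, nu)  # common mixing variable shared by both components
    x = []
    for mu_k, z_k, nu_k in zip(mu, z, nu_i):
        v_k = chi2_draw(rng, nu_k)  # component-specific mixing variable
        x.append(mu_k + z_k / math.sqrt((v + v_k) / (nu + nu_k)))
    return tuple(x)

rng = random.Random(29)
mu, sigma = (0.0, 0.0), (1.0, 0.5, 1.0)
# nu = 0 gives the Type II case; nu_i = (0, 0) gives the Type I case.
type2 = [sample_general_t2(rng, mu, sigma, 0, (5, 5)) for _ in range(5000)]
type1 = [sample_general_t2(rng, mu, sigma, 5, (0, 0)) for _ in range(5000)]
mean_x1_type2 = sum(x[0] for x in type2) / len(type2)
print(f"Type II sample mean of X1: {mean_x1_type2:.3f}")
```

In the Type I case both components are divided by the same mixing variable, whereas in the Type II case each component has its own, which is what allows the marginal tail weights to differ.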

As a comparison with the model proposed in [10], we similarly decompose the scale matrix as Σ = DAD⊤, where D is the matrix of eigenvectors of Σ and A is a diagonal matrix whose entries are the eigenvalues of Σ. Then the vector x in (6) can be reformulated as

$$x = \mu + U^{-1/2} D A^{1/2} z_0, \tag{30}$$

where z0 = (Z10, …, Zd0)⊤ is a d-dimensional Gaussian random vector with zero mean vector and identity covariance matrix. Note that the model in (30) differs from the formulation in [10] because matrix multiplication is not commutative in general; the two are equivalent only when D is a diagonal matrix.

Acknowledgments

The authors are grateful to the editor and the anonymous reviewer for their valuable comments and suggestions.

Appendix. Derivation of joint pdf fx(x).

Based on the mixture expressions in (7), given the conditional distribution of x | (u = u) and the marginal distributions of the Ui's, the joint pdf of x is given by

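$$
\begin{aligned}
f_x(x) &= \int_{\mathbb{R}_+^d} f_{x|u}(x|u)\, f_u(u)\, du \\
&= \int_{\mathbb{R}_+^d} (2\pi)^{-d/2}\, \bigl|U^{-1/2}\Sigma U^{-1/2}\bigr|^{-1/2} \exp\Bigl[-\tfrac{1}{2}(x-\mu)^\top \bigl(U^{-1/2}\Sigma U^{-1/2}\bigr)^{-1}(x-\mu)\Bigr] \times \prod_{i=1}^{d} \frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}\, u_i^{\nu_i/2-1} e^{-\nu_i u_i/2}\, du \\
&= (2\pi)^{-d/2}\, |\Sigma|^{-1/2} \Bigl[\prod_{i=1}^{d} \frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}\Bigr] \times \int_{\mathbb{R}_+^d} \Bigl(\prod_{i=1}^{d} u_i^{(\nu_i-1)/2}\Bigr) \exp\Bigl\{-\tfrac{1}{2}\Bigl[(u^{1/2})^\top X\Sigma^{-1}X u^{1/2} + \sum_{i=1}^{d} \nu_i u_i\Bigr]\Bigr\}\, du.
\end{aligned}
$$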

By employing the transformation w_i = u_i^{1/2} for i = 1, …, d, so that du_i = 2 w_i dw_i, it follows that

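$$
\begin{aligned}
f_x(x) &= (2\pi)^{-d/2}\, |\Sigma|^{-1/2} \Bigl[\prod_{i=1}^{d} \frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}\Bigr] \times \int_{\mathbb{R}_+^d} \Bigl(\prod_{i=1}^{d} w_i^{\nu_i-1}\Bigr) \exp\Bigl\{-\tfrac{1}{2}\Bigl[w^\top X\Sigma^{-1}X w + \sum_{i=1}^{d} \nu_i w_i^2\Bigr]\Bigr\} \Bigl(\prod_{i=1}^{d} 2 w_i\Bigr)\, dw \\
&= \Bigl(\frac{\pi}{2}\Bigr)^{-d/2} |\Sigma|^{-1/2} \Bigl[\prod_{i=1}^{d} \frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}\Bigr] \int_{\mathbb{R}_+^d} \Bigl(\prod_{i=1}^{d} w_i^{\nu_i}\Bigr) \exp\Bigl\{-\tfrac{1}{2}\, w^\top \bigl(X\Sigma^{-1}X + \Sigma_0\bigr) w\Bigr\}\, dw \\
&= \Bigl(\frac{\pi}{2}\Bigr)^{-d/2} |\Sigma|^{-1/2} \Bigl[\prod_{i=1}^{d} \frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}\Bigr] \int_{\mathbb{R}_+^d} \Bigl(\prod_{i=1}^{d} w_i^{\nu_i}\Bigr) \exp\Bigl\{-\tfrac{1}{2}\, w^\top \Sigma_*^{-1} w\Bigr\}\, dw \\
&= \Bigl(\frac{\pi}{2}\Bigr)^{-d/2} |\Sigma|^{-1/2} \Bigl[\prod_{i=1}^{d} \frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}\Bigr]\, C \int_{\mathbb{R}_+^d} \Bigl(\prod_{i=1}^{d} w_i^{\nu_i}\Bigr) \frac{1}{C} \exp\Bigl\{-\tfrac{1}{2}\, w^\top \Sigma_*^{-1} w\Bigr\}\, dw \\
&= \Bigl(\frac{\pi}{2}\Bigr)^{-d/2} |\Sigma|^{-1/2} \Bigl[\prod_{i=1}^{d} \frac{(\nu_i/2)^{\nu_i/2}}{\Gamma(\nu_i/2)}\Bigr]\, \mu_\nu(w)\, C,
\end{aligned}
$$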

where X and Σ0 are defined in (10).
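
As a sanity check on the derivation, the case d = 1 can be worked out in closed form. The sketch below assumes that, for d = 1, the quantities defined in (10) reduce to the scalars X = x − μ and Σ0 = ν, and it writes Σ = σ²; under these assumptions the integral form collapses to the familiar univariate t density, in line with characteristic (1).

```latex
% d = 1, with \Sigma = \sigma^2, X = x - \mu and \Sigma_0 = \nu (assumed scalar forms)
f_X(x) = \Bigl(\frac{\pi}{2}\Bigr)^{-1/2} \frac{1}{\sigma}\,
         \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)}
         \int_0^\infty w^{\nu} \exp\Bigl(-\frac{a}{2}\, w^2\Bigr)\, dw,
\qquad a = \nu + \frac{(x-\mu)^2}{\sigma^2}.

% Gamma integral, via the substitution s = a w^2 / 2
\int_0^\infty w^{\nu} \exp\Bigl(-\frac{a}{2}\, w^2\Bigr)\, dw
   = \frac{1}{2} \Bigl(\frac{2}{a}\Bigr)^{(\nu+1)/2} \Gamma\Bigl(\frac{\nu+1}{2}\Bigr).

% Collecting constants: \sqrt{2/\pi}\cdot\tfrac12\cdot 2^{(\nu+1)/2}\, 2^{-\nu/2}\, \nu^{\nu/2}\, \nu^{-(\nu+1)/2} = (\nu\pi)^{-1/2}
f_X(x) = \frac{\Gamma\bigl(\frac{\nu+1}{2}\bigr)}{\Gamma\bigl(\frac{\nu}{2}\bigr)\sqrt{\nu\pi}\,\sigma}
         \Bigl[1 + \frac{(x-\mu)^2}{\nu\sigma^2}\Bigr]^{-(\nu+1)/2}.
```

For d ≥ 2 the cross terms in the quadratic form prevent the integral from factorizing in this way, which is why the general expression is left in terms of a mixed moment and a normalizing constant.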

Funding Statement

Chi Zhang's research was supported by National Natural Science Foundation of China (Grant No. 11801380). Guo-Liang Tian's research was fully supported by National Natural Science Foundation of China (Grant No. 11771199). Kam Chuen Yuen's research was supported by a Seed Fund for Basic Research of the University of Hong Kong, and a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKU17306220). The work of Man-Lai Tang was partially supported through grants from the Research Grant Council of the Hong Kong Special Administrative Region (UGC/FDS14/P06/17, UGC/FDS14/P02/18, and the Research Matching Grant Scheme (RMGS)) and a grant from the National Natural Science Foundation of China (11871124). The computing facilities/software were supported by SAS Viya and the Big Data Intelligence Centre at Hang Seng University of Hong Kong.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Amos D.E. and Bulgren W.G., On the computation of a bivariate t-distribution, Math. Comp. 23 (1969), pp. 319–333.
  • 2. Booth J.G. and Hobert J.P., Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm, J. R. Stat. Soc. Ser. B. Stat. Methodol. 61 (1999), pp. 265–285.
  • 3. Cook R.D. and Weisberg S., An Introduction to Regression Graphics, John Wiley & Sons, New York, 2009.
  • 4. Cornish E.A., The multivariate t-distribution associated with a set of normal sample deviates, Aust. J. Phys. 7 (1954), pp. 531–542.
  • 5. Cornish E.A., The multivariate t-distribution associated with the general multivariate normal distribution, CSIRO Tech. Paper No. 13, CSIRO Division in Mathematics and Statistics, Adelaide, 1962.
  • 6. Dunnett C.W. and Sobel M., A bivariate generalization of Student's t-distribution, with tables for certain special cases, Biometrika 41 (1954), pp. 153–169.
  • 7. Fang H.B., Fang K.T. and Kotz S., The meta-elliptical distributions with given marginals, J. Multivariate Anal. 82 (2002), pp. 1–16.
  • 8. Fernández C. and Steel M.F.J., Multivariate student-t regression models: Pitfalls and inference, Biometrika 86 (1999), pp. 153–167.
  • 9. Finegold M. and Drton M., Robust graphical modeling of gene networks using classical and alternative t-distributions, Ann. Appl. Stat. 5 (2011), pp. 1057–1080.
  • 10. Forbes F. and Wraith D., A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweight: Application to robust clustering, Stat. Comput. 24 (2014), pp. 971–984.
  • 11. Hofert M., On sampling from the multivariate t distribution, R. J. 5 (2013), pp. 129–136.
  • 12. Horrace W.C., Some results on the multivariate truncated normal distribution, J. Multivariate Anal. 94 (2005), pp. 209–221.
  • 13. Jones M.C., A dependent bivariate t distribution with marginals on different degrees of freedom, Statist. Probab. Lett. 56 (2002), pp. 163–170.
  • 14. Kotz S. and Nadarajah S., Multivariate t Distributions and Their Applications, Cambridge University Press, Cambridge, UK, 2004.
  • 15. Levine R.A. and Casella G., Implementations of the Monte Carlo EM algorithm, J. Comput. Graph. Statist. 10 (2001), pp. 422–439.
  • 16. Lin P.E., Some characterizations of the multivariate t distribution, J. Multivariate Anal. 2 (1972), pp. 339–344.
  • 17. Liu C., Missing data imputation using the multivariate t distribution, J. Multivariate Anal. 53 (1995), pp. 139–158.
  • 18. Liu C., Bayesian robust multivariate linear regression with incomplete data, J. Amer. Statist. Assoc. 91 (1996), pp. 1219–1227.
  • 19. Liu C., ML estimation of the multivariate t distribution and the EM algorithm, J. Multivariate Anal. 63 (1997), pp. 296–312.
  • 20. Liu C. and Rubin D.B., The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence, Biometrika 81 (1994), pp. 633–648.
  • 21. Liu C. and Rubin D.B., ML estimation of the t distribution using EM and its extensions, ECM and ECME, Statist. Sinica 5 (1995), pp. 19–39.
  • 22. McLachlan G.J., Bean R.W. and Jones L.B.T., Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution, Comput. Statist. Data Anal. 51 (2007), pp. 5327–5338.
  • 23. Nadarajah S. and Kotz S., Mathematical properties of the multivariate t distribution, Acta. Appl. Math. 89 (2005), pp. 53–84.
  • 24. Nadarajah S. and Kotz S., Estimation methods for the multivariate t distribution, Acta. Appl. Math. 102 (2008), pp. 99–118.
  • 25. Relles D.A. and Rogers W.H., Statisticians are fairly robust estimators of location, J. Amer. Statist. Assoc. 72 (1977), pp. 107–111.
  • 26. van Houwelingen H.C., Arends L.R. and Stijnen T., Advanced methods in meta-analysis: Multivariate approach and meta-regression, Stat. Med. 21 (2002), pp. 589–624.
  • 27. von Neumann J., Various techniques used in connection with random digits, J. Res. Nat. Bur. Stand. Appl. Math. Series 3 (1951), pp. 36–38.
  • 28. Wei G.C.G. and Tanner M.A., A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms, J. Amer. Statist. Assoc. 85 (1990), pp. 699–704.
  • 29. Zellner A., Bayesian and non-Bayesian analysis of the regression model with multivariate student-t error terms, J. Amer. Statist. Assoc. 71 (1976), pp. 400–405.

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
