A new multivariate t distribution with variant tail weights and its application in robust regression analysis

Chi Zhang; Guo-Liang Tian; Kam Chuen Yuen; Pengyi Liu; Man-Lai Tang

doi:10.1080/02664763.2021.1913106

2021 Apr 14;49(10):2629–2656. doi: 10.1080/02664763.2021.1913106


PMCID: PMC9225396 PMID: 35757045

ABSTRACT

In this paper, we propose a new kind of multivariate t distribution by allowing different degrees of freedom for each univariate component. Compared with the classical multivariate t distribution, it is more flexible in the model specification that can be used to deal with the variant amounts of tail weights on marginals in multivariate data modeling. In particular, it could include components following the multivariate normal distribution, and it contains the product of independent t-distributions as a special case. Subsequently, it is extended to the regression model as the joint distribution of the error terms. Important distributional properties are explored and useful statistical methods are developed. The flexibility of the specified structure in better capturing the characteristic of data is exemplified by both simulation studies and real data analyses.

Keywords: Expectation/conditional maximization algorithm, multivariate t distribution, multivariate t regression model, multivariate truncated normal distribution, stochastic representation

1. Introduction

As a natural generalization of the univariate Student's t distribution, the multivariate t (MVT) distribution is a robust alternative to the multivariate normal distribution in the analysis of multivariate continuous data with heavy tails or outliers. The MVT distribution was first derived independently in [4] and [6], the latter concentrating on the bivariate case. Some characterizations of the MVT distribution were presented in [5,16]. An efficient computational procedure for the bivariate case was considered in [1]. Maximum likelihood estimation methods for the MVT distribution with missing data were discussed in [17,19–21]. Regression models with MVT error terms have been widely investigated; see [8,18,29]. A comprehensive review of the topic was given in [14], the first monograph on the MVT distribution. The corresponding mathematical properties, such as the stochastic representation (SR), consistency property, density expansion, moments and conditional distributions, can also be found in [23], and the associated estimation methods can be found in [24]. Recently, a sampling method for the MVT distribution, together with an R package, was provided in [11].

The most common way of constructing the MVT distribution is through the SR of a normal random vector and an independent chi-squared random variable. Assume that the random vector $z = (Z_{1}, \dots, Z_{d})^{⊤} \sim N_{d} (\mathbf{0}, Σ)$, the random variable $V \sim χ^{2} (ν)$, and they are independent (denoted by $z \perp\!\!\!\perp V$). Define

(X_{1}, \dots, X_{d})^{⊤} = x = μ + \frac{z}{\sqrt{V / ν}},

(1)

then $x$ is said to follow a multivariate t distribution, denoted by $x \sim t_{d} (μ, Σ, ν)$, where $μ = (μ_{1}, \dots, μ_{d})^{⊤}$ is the location parameter vector, $Σ = (σ_{i j})_{d \times d}$ is a positive-definite scale matrix and ν is the degrees of freedom. When $ν = 1$, the distribution reduces to the multivariate Cauchy distribution. As $ν \to \infty$, the distribution approaches the multivariate normal distribution. Hence, the parameter ν may be viewed as a robustness tuning parameter (see [22], p. 5332). It can be fixed in advance or inferred from the observed data. An equivalent expression of (1) is

x = μ + U^{- \frac{1}{2}} z,

(2)

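where $U \sim Gamma (ν / 2, ν / 2)$, $z \sim N_{d} (\mathbf{0}, Σ)$ and $U \perp\!\!\!\perp z$. Here the density of $Y \sim Gamma (α, β)$ is denoted by $Gamma (y ∣ α, β) = β^{α} y^{α - 1} \exp (- β y) / Γ (α)$ with $α > 0$ and $β > 0$, so that β is a rate parameter. The two expressions are equivalent because $V / ν \sim Gamma (ν / 2, ν / 2)$ when $V \sim χ^{2} (ν)$. We call (1) or (2) the SR of the classical MVT or Type I MVT random vector.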

From the SR (1) or (2), we observe that the Type I MVT distribution has the following three drawbacks:

(i) All components follow univariate t distributions with the same degrees of freedom ν and hence the same amount of tailweight (see [13], p. 163).
(ii) The Type I MVT random vector $x$ with a finite ν includes neither a component $X_{i}$ following the univariate normal distribution nor a sub-vector $(X_{i_{1}}, \dots, X_{i_{r}})^{⊤}$ following the r-dimensional normal distribution, where $1 \leq i_{1} < \dots < i_{r} \leq d$ and $1 \leq r < d$.
(iii) The Type I MVT random vector $x$ can never contain statistically independent components since all components $\{X_{i}\}_{i = 1}^{d}$ share a common random variable V or U, even when $Σ$ is diagonal.

These drawbacks limit its applicability. To overcome the first drawback, a class of meta-elliptical distributions was proposed within the copula framework in [7]; it includes as a special member the so-called asymmetric multivariate t (AMVT) distribution, whose marginals are univariate t distributions with different degrees of freedom. The authors of [7] further pointed out that the AMVT distribution and the Type I MVT distribution have the same copula (i.e. the same correlation structure). Hence, the AMVT distribution still cannot overcome the second and third drawbacks. Moreover, no statistical inference methods were developed for this particular distribution. Then, a new bivariate t distribution with marginals having different degrees of freedom was proposed in [13], by defining

X_{1} = \frac{Z_{1}}{\sqrt{V_{1} / ν_{1}}} and X_{2} = \frac{Z_{2}}{\sqrt{(V_{1} + V_{2}) / ν_{2}}},

(3)

where $Z_{1}, Z_{2}, V_{1}, V_{2}$ are assumed to be mutually independent, $Z_{1}, Z_{2} \overset{iid}{\sim} N (0, 1)$, $V_{1} \sim χ^{2} (ν_{1})$ and $V_{2} \sim χ^{2} (ν_{2} - ν_{1})$ with $ν_{2} \geq ν_{1}$. Two possible multivariate extensions of (3) were also mentioned in [13]:

X_{1}^{'} = \frac{Z_{1}}{\sqrt{V_{1} / ν_{1}}}, X_{2}^{'} = \frac{Z_{2}}{\sqrt{(V_{1} + V_{2}) / ν_{2}}}, \dots, X_{d}^{'} = \frac{Z_{d}}{\sqrt{(V_{1} + \dots + V_{d}) / ν_{d}}},

(4)

where $Z_{1}, \dots, Z_{d}, V_{1}, \dots, V_{d}$ are assumed to be mutually independent, $Z_{1}, \dots, Z_{d} \overset{i i d}{\sim} N (0, 1)$ , $V_{1} \sim χ^{2} (ν_{1})$ and $V_{i} \sim χ^{2} (ν_{i} - ν_{i - 1})$ for $i = 2, \dots, d$ with $0 < ν_{1} \leq ν_{2} \leq \dots \leq ν_{d}$ ; and

X_{1}^{″} = \frac{Z_{1}}{\sqrt{V_{1} / ν_{1}}}, X_{2}^{″} = \frac{Z_{2}}{\sqrt{(V_{1} + W_{2}) / ν_{2}}}, \dots, X_{d}^{″} = \frac{Z_{d}}{\sqrt{(V_{1} + W_{d}) / ν_{d}}},

(5)

where the $\{Z_{i}\}_{i = 1}^{d}$ and $V_{1}$ are the same as defined above and $W_{i} \sim χ^{2} (ν_{i} - ν_{1})$ with $0 < ν_{1} \leq ν_{i}$ for $i = 2, \dots, d$. We have three comments on (3)–(5). First, all components in (3)–(5) follow univariate t distributions with possibly different degrees of freedom, i.e. the first drawback of the Type I MVT distribution is overcome; however, the second and third drawbacks are not. Second, only the distributional properties for the bivariate case were studied in [13], and no statistical inference methods were provided. Third, for multivariate cases, [13] remarked that ‘It appears, however, that only in the bivariate case are the joint density function and hence conditional distributions fully tractable. It is this that prompted the publication of the current special case, along with any independent interest, this special case may possess’.

An alternative t-distribution, obtained by replacing the common gamma divisor in (2) with p i.i.d. gamma divisors, was then proposed in [9]; the authors only mentioned the possibility of allowing these independent gamma divisors to have different degrees of freedom, without any detailed discussion. By inserting multidimensional weights into the Gaussian scale mixture, this construction was generalized to a multivariate t-distribution in [10]. However, the resulting marginal is a linear combination of univariate t-distributions and thus has no intuitive interpretation in terms of marginal degrees of freedom.

For the two data sets presented in Section 6 below, we may find a dependency structure among the components while the components do not all have the same tailweight. The classical MVT distribution is then no longer appropriate, and a new tool is needed to overcome the limitations of existing models and to allow broader application.

In this paper, we propose a new kind of MVT distribution that allows different degrees of freedom for each univariate component, called the Type II MVT distribution. The proposed distribution has several remarkable features: (a) all components follow univariate t-distributions with possibly different degrees of freedom; (b) it can include components following the multivariate normal distribution when the corresponding $ν_{i} \to \infty$; (c) it can contain statistically independent components, so that the product of independent t distributions is a special case. Compared with the classical MVT distribution, this new structure can better capture the characteristics of the data. We first derive important distributional properties and then develop useful statistical inference methods.

The rest of the paper is organized as follows. In Section 2, the Type II MVT distribution is outlined and some distributional properties are explored. In Section 3, the maximum likelihood estimation of the parameters via the Monte Carlo expectation/conditional maximization (ECM) algorithm and test of independence are developed, and then an extended regression model is introduced and investigated. Bayesian methods are presented in Section 4. In Section 5, some simulation studies are performed to evaluate the proposed methods. Two real data sets are used to compare the proposed distribution with the classical MVT distribution in Section 6. Finally, a discussion is given in Section 7.

2. Type II multivariate $t$ distribution

We define a new MVT distribution through the SR instead of the joint probability density function (pdf). A random vector $x = (X_{1}, \dots, X_{d})^{⊤}$ is said to follow the Type II MVT distribution, denoted by $x \sim t_{d}^{(I I)} (μ, Σ, ν)$ , if $x$ can be stochastically represented as

x = μ + U^{- 1 / 2} z,

(6)

where $μ = (μ_{1}, \dots, μ_{d})^{⊤}$, $U^{- 1 / 2} = diag (u^{- 1 / 2})$ with the power taken elementwise, $u = (U_{1}, \dots, U_{d})^{⊤}$, $U_{i} \overset{ind}{\sim} Gamma (ν_{i} / 2, ν_{i} / 2)$ for $i = 1, \dots, d$, $z = (Z_{1}, \dots, Z_{d})^{⊤} \sim N_{d} (\mathbf{0}, Σ)$, $ν = (ν_{1}, \dots, ν_{d})^{⊤}$, and $u \perp\!\!\!\perp z$. The vector $ν$ can be fixed in advance or estimated from the observed data. Note that if all the $ν_{i}$ are equal, the distribution of $x$ in (6) becomes the alternative multivariate t distribution proposed by [9].

Compared with the Type I MVT distribution, the Type II MVT distribution possesses the following three significant characteristics:

(i) From (6), it is easy to obtain $X_{i} = μ_{i} + U_{i}^{- 1 / 2} Z_{i}$, indicating that all components follow univariate t-distributions with possibly different degrees of freedom; i.e. $X_{i} \sim t (μ_{i}, σ_{i i}, ν_{i})$ for $i = 1, \dots, d$.
(ii) The Type II MVT random vector $x$ could include components following a multivariate normal distribution if the corresponding $ν_{i} \to \infty$.
(iii) The Type II MVT random vector $x$ could contain statistically independent components. From (7) and (8) below, any two components $X_{i}, X_{j}$ ($i \neq j$) of a Type II MVT random vector are independent provided that $Z_{i}, Z_{j}$ are independent, whereas any two components of a Type I MVT random vector are always dependent since they share a common factor U. In particular, when $Σ$ is diagonal, the Type II MVT density becomes the product of d independent univariate t densities.

Remark 2.1

The dependency structure of the Type II MVT distribution induced by the SR (6) is totally determined by the variance–covariance matrix $Σ$ of $z$. Because the Gamma random variables $\{U_{i}\}_{i = 1}^{d}$ are independent, all $\{ν_{i}\}_{i = 1}^{d}$ are free parameters, and each $ν_{i}$ can be estimated separately and straightforwardly. In contrast, the dependency structure of the two distributions induced by the SRs (4) and (5) is due only to the common random variable $V_{1}$, since the correlation from the multivariate normal distribution is not incorporated. Under the construction of (4) or (5), the estimation of the $ν_{i}$'s is more complicated, since an ascending order constraint $ν_{1} \leq ν_{2} \leq \dots \leq ν_{d}$ or $ν_{1} \leq ν_{i}$ for $i = 2, \dots, d$ is involved.

2.1. Density function of Type II MVT distribution

To derive the joint pdf of the Type II MVT distribution, we first introduce the multivariate truncated normal distribution. A d-dimensional random vector $w = (W_{1}, \dots, W_{d})^{⊤}$ is said to follow the multivariate normal distribution truncated to the box $[a, b]$, denoted by $w \sim {TN}_{d} (μ, Σ; [a, b])$ with $a = (a_{1}, \dots, a_{d})^{⊤} \in R^{d}$ and $b = (b_{1}, \dots, b_{d})^{⊤} \in R^{d}$, if its joint pdf is (see [12], p. 210)

{TN}_{d} (w | μ, Σ; [a, b]) = \frac{\exp [- 0.5 (w - μ)^{⊤} Σ^{- 1} (w - μ)]}{\int_{[a, b]} \exp [- 0.5 (w - μ)^{⊤} Σ^{- 1} (w - μ)] d w}, a \leq w \leq b,

where $w = (w_{1}, \dots, w_{d})^{⊤}$ is a realization of the random vector $(W_{1}, \dots, W_{d})^{⊤}$ and the inequality $a \leq w \leq b$ is understood componentwise.

Now, we can rewrite (6) as a mixture of

x | (u = u) \sim N_{d} (μ, U^{- 1 / 2} Σ U^{- 1 / 2}) and {U_{i}}_{i = 1}^{d} \overset{i n d}{\sim} Gamma (ν_{i} / 2, ν_{i} / 2),

(7)

where $u = (u_{1}, \dots, u_{d})^{⊤}$ is a realization of the random vector $(U_{1}, \dots, U_{d})^{⊤}$ and $U^{- 1 / 2} = diag (u_{1}^{- 1 / 2}, \dots, u_{d}^{- 1 / 2})$ is the corresponding realization of the random diagonal matrix in (6). The joint pdf of $x$ is obtained as (for detailed derivation, see Appendix):

f_{x} (x) = {(\frac{π}{2})}^{- \frac{d}{2}} | Σ |^{- \frac{1}{2}} [\prod_{i = 1}^{d} \frac{(\frac{ν_{i}}{2})^{\frac{ν_{i}}{2}}}{Γ (\frac{ν_{i}}{2})}] \cdot μ_{ν} (w) \cdot C,

(8)

where

\begin{aligned} μ_{ν} (w) & = \int_{R_{+}^{d}} (\prod_{i = 1}^{d} w_{i}^{ν_{i}}) \cdot C^{- 1} \exp (- 0.5 w^{⊤} Σ^{* - 1} w) d w, \\ C & = \int_{R_{+}^{d}} \exp (- 0.5 w^{⊤} Σ^{* - 1} w) d w, \end{aligned}

(9)

and $Σ^{*} = (X^{*} Σ^{- 1} X^{*} + Σ_{0})^{- 1}$ with

X^{*} = diag (x - μ) and Σ_{0} = diag (ν) .

(10)

To have an insight into the function $μ_{ν} (w)$ , we first define the mixed moments of a random vector. Let a random vector $x = (X_{1}, \dots, X_{d})^{⊤}$ have the joint pdf $f_{x} (x)$ . Given $r = (r_{1}, \dots, r_{d})^{⊤}$ , where $r_{1}, \dots, r_{d}$ are non-negative constants, then

E (X_{1}^{r_{1}} \dots X_{d}^{r_{d}}) = \int x_{1}^{r_{1}} \dots x_{d}^{r_{d}} \cdot f_{x} (x) d x

is called the mixed moment of $x$ in power of $r$ with respect to $x \sim f_{x} (x)$. Note that C is the normalizing constant of ${TN}_{d} (\mathbf{0}, Σ^{*}; R_{+}^{d})$; hence $μ_{ν} (w)$ given by (9) is the mixed moment of $w$ in power of $ν$ with respect to $w \sim {TN}_{d} (\mathbf{0}, Σ^{*}; R_{+}^{d})$. It is clear from (8) that this distribution is not of the elliptical form (see [7]).
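For d = 2 the product $μ_{ν} (w) \cdot C$ in (8) is a two-dimensional integral over the positive quadrant and can be approximated by elementary quadrature. The sketch below (Python standard library; the function names, the truncation point `upper` and the grid size are assumptions made for illustration, not the authors' implementation) evaluates (8) with a midpoint rule and compares it, for a diagonal $Σ$, with the product of two univariate t densities.

```python
import math

def student_t_pdf(x, mu, sigma2, nu):
    """Density of the univariate t(mu, sigma2, nu) distribution."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi * sigma2))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p((x - mu) ** 2 / (nu * sigma2)))

def type2_pdf_2d(x, mu, sigma, nus, upper=8.0, n_grid=300):
    """Evaluate (8) for d = 2 with a midpoint rule on [0, upper]^2.

    `upper` truncates the integral over the positive quadrant; this is an
    assumption that is adequate when the integrand has decayed by then.
    """
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 * s12
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    # Entries of (Sigma*)^{-1} = X* Sigma^{-1} X* + diag(nu), cf. (10).
    a11 = d1 * d1 * s22 / det + nus[0]
    a12 = -d1 * d2 * s12 / det
    a22 = d2 * d2 * s11 / det + nus[1]
    h = upper / n_grid
    total = 0.0
    for i in range(n_grid):
        w1 = (i + 0.5) * h
        p1 = w1 ** nus[0]
        for j in range(n_grid):
            w2 = (j + 0.5) * h
            quad = a11 * w1 * w1 + 2.0 * a12 * w1 * w2 + a22 * w2 * w2
            total += p1 * w2 ** nus[1] * math.exp(-0.5 * quad)
    integral = total * h * h  # approximates mu_nu(w) * C in (8)
    log_const = -math.log(math.pi / 2) - 0.5 * math.log(det)  # (pi/2)^{-d/2} |Sigma|^{-1/2}, d = 2
    for nu in nus:
        log_const += nu / 2 * math.log(nu / 2) - math.lgamma(nu / 2)
    return math.exp(log_const) * integral

x = (3 / 29, 3 / 29)
identity = [[1.0, 0.0], [0.0, 1.0]]
val = type2_pdf_2d(x, (0.0, 0.0), identity, (3.0, 5.0))
prod = student_t_pdf(x[0], 0.0, 1.0, 3.0) * student_t_pdf(x[1], 0.0, 1.0, 5.0)
print(val, prod)
```

For a diagonal $Σ$ the two printed numbers should agree up to quadrature error, in line with the statement that the Type II density then factorizes; for a non-diagonal $Σ$ only the quadrature route is available in this sketch.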

Let $x = (x_{1}^{⊤}, x_{2}^{⊤})^{⊤}$, where $x_{1}$ consists of the first q components and $x_{2}$ consists of the last $(d - q)$ components. The conditional density of $x_{1}$ given a realization of $x_{2}$, namely $f_{x_{1} | x_{2}} (x_{1} | x_{2}) = f_{x} (x) / f_{x_{2}} (x_{2})$, is not easy to derive, since the marginal density of $x_{2}$ depends on a multiple integral over hypercubes. For visualization, we present the conditional density curves for the bivariate case under the following two parameter settings:

Case 1: $μ = (0, 0)^{⊤}$, $Σ = [1, 0.8; 0.8, 1]$, $ν = (3, 5)^{⊤}$. The curves of $f (x_{1} | x_{2})$ are plotted for $x_{2} = - 2, - 1, 0, 1, 2$.
Case 2: $μ = (1, 3)^{⊤}$, $Σ = [1, - 0.5; - 0.5, 1]$, $ν = (3, 2)^{⊤}$. The curves of $f (x_{1} | x_{2})$ are plotted for $x_{2} = - 1, 1, 3, 5, 7$.

From Figure 1, the curves of $f (x_{1} | x_{2})$ have similar shapes as the value of $x_{2}$ varies under the same parameter configuration.

2.2. Moments and correlation

The mean vector and variance–covariance matrix of $x$ are, respectively, given by

E (x) = μ, for ν_{i} > 1, i = 1, \dots, d and Var (x) = Σ^{'} = (σ_{i j}^{'}),

where

σ_{i i}^{'} = \frac{ν_{i}}{ν_{i} - 2} σ_{i i}, σ_{i j}^{'} = \frac{(ν_{i} / 2)^{\frac{1}{2}} Γ (\frac{ν_{i} - 1}{2}) (ν_{j} / 2)^{\frac{1}{2}} Γ (\frac{ν_{j} - 1}{2})}{Γ (\frac{ν_{i}}{2}) Γ (\frac{ν_{j}}{2})} σ_{i j}, i \neq j,

for $ν_{i}, ν_{j} > 2$ . Thus, the correlation coefficient between $X_{i}$ and $X_{j}$ is given by

Corr (X_{i}, X_{j}) = \frac{\sqrt{(ν_{i} - 2) (ν_{j} - 2)} Γ (\frac{ν_{i} - 1}{2}) Γ (\frac{ν_{j} - 1}{2}) σ_{i j}}{2 Γ (\frac{ν_{i}}{2}) Γ (\frac{ν_{j}}{2}) \sqrt{σ_{i i} σ_{j j}}}, ν_{i}, ν_{j} > 2.

(11)

From (11), we can see that the sign of $Corr (X_{i}, X_{j})$ is determined by $σ_{i j}$ and can be arbitrary (positive, zero or negative).
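Formula (11) is straightforward to evaluate numerically. The following sketch (Python standard library; the function name is hypothetical) computes it on the log scale with `math.lgamma` to avoid overflow of the gamma functions for large degrees of freedom.

```python
import math

def type2_corr(nu_i, nu_j, s_ii, s_jj, s_ij):
    """Correlation (11) between X_i and X_j; requires nu_i, nu_j > 2."""
    if min(nu_i, nu_j) <= 2:
        raise ValueError("(11) requires both degrees of freedom to exceed 2")
    log_ratio = (math.lgamma((nu_i - 1) / 2) - math.lgamma(nu_i / 2)
                 + math.lgamma((nu_j - 1) / 2) - math.lgamma(nu_j / 2))
    # Factor multiplying the normal-theory correlation s_ij / sqrt(s_ii * s_jj).
    factor = 0.5 * math.sqrt((nu_i - 2) * (nu_j - 2)) * math.exp(log_ratio)
    return factor * s_ij / math.sqrt(s_ii * s_jj)

print(round(type2_corr(10, 10, 2.0, 3.0, 2.0), 6))
print(round(type2_corr(4, 20, 2.0, 3.0, 2.0), 6))
```

The two calls use $Σ = [2, 2; 2, 3]$ with $(ν_{1}, ν_{2}) = (10, 10)$ and $(4, 20)$, the configurations for which ρ is reported in Tables 1 and 2, so the printed values can be compared with those entries.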

2.3. Comparison of the densities

We compare the pdfs of the Type I MVT distribution $t_{d} (μ, Σ, ν)$ and the Type II MVT distribution $t_{d}^{(II)} (μ, Σ, ν)$. For the same $μ$ and $Σ$, if every component of the degrees-of-freedom vector of the Type II distribution is set equal to the scalar ν of the Type I distribution, i.e. $ν_{1} = \dots = ν_{d} = ν$, then from (2) and (6) the marginal densities of each $X_{i}$ under the two distributions are identical, namely $t (μ_{i}, σ_{i i}, ν)$. However, the shapes of the two joint density functions may be quite different. Moreover, the Type II MVT distribution is more flexible since it also allows unequal $ν_{i}$'s.

For more comparisons, we consider d = 2 and set $μ = (μ_{1}, μ_{2})^{⊤}$ and $Σ = [σ_{11}, σ_{12}; σ_{21}, σ_{22}]$ for both distributions, with ν the degrees of freedom in the Type I MVT distribution, $(ν_{1}, ν_{2})^{⊤}$ the degrees of freedom in the Type II MVT distribution, and $ν = ν_{1} = ν_{2}$. To illustrate the differences, we take 30 equally spaced values of $X_{1}$ within $[μ_{1} - 3 \sqrt{σ_{11}}, μ_{1} + 3 \sqrt{σ_{11}}]$ and 30 equally spaced values of $X_{2}$ within $[μ_{2} - 3 \sqrt{σ_{22}}, μ_{2} + 3 \sqrt{σ_{22}}]$. The maximum and minimum of the Type I and Type II MVT densities over these $(X_{1}, X_{2})$ grid points are computed and displayed in Table 1. The correlation coefficients between the two components, denoted by ρ, are also compared. In addition, the corresponding densities are shown as contour plots and three-dimensional perspective plots in Figures 2 and 3. It is observed that when the two densities have the same marginal tail weights, the Type II density surface is slightly flatter than the Type I surface, and the correlation between components in the Type II MVT distribution is weaker than that in the Type I MVT distribution. This is not surprising, as the dependence in the Type II MVT distribution relies only on the dependence in the multivariate normal vector.

Table 1.

Maximum and minimum values of Type I and II MVT densities for $ν = ν_{1} = ν_{2}$ .

Parameters	Density	Degrees of freedom	Maximum	Minimum	ρ
$μ = (0, 0)^{⊤}, Σ = diag (\mathbf{1}_{2})$	$f^{(I)}$	$ν = 3$	0.156351	0.001228	0
$μ = (0, 0)^{⊤}, Σ = diag (\mathbf{1}_{2})$	$f^{(II)}$	$(ν_{1}, ν_{2}) = (3, 3)$	0.133184	0.000528	0
$μ = (1, - 1)^{⊤}, Σ = [2, 2; 2, 3]$	$f^{(I)}$	$ν = 10$	0.111747	$7.0562 \times 10^{- 8}$	0.816497
$μ = (1, - 1)^{⊤}, Σ = [2, 2; 2, 3]$	$f^{(II)}$	$(ν_{1}, ν_{2}) = (10, 10)$	0.106234	$4.4166 \times 10^{- 10}$	0.767150


Note: $f^{(I)}$ : Density function of Type I MVT distribution; $f^{(I I)}$ : Density function of Type II MVT distribution, c.f. (8).
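For the diagonal configuration $μ = (0, 0)^{⊤}$, $Σ = diag (\mathbf{1}_{2})$, both densities are available in closed form: the classical bivariate t density, and, for Type II, the product of two univariate t densities. The following Python sketch (standard library only; the function names are hypothetical, and the grid is assumed to consist of 30 equally spaced points including both endpoints) recomputes the maximum and minimum over the $30 \times 30$ grid so that they can be compared with the corresponding rows of Table 1.

```python
import math

def t_pdf(x, nu):
    """Standard univariate t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def type1_pdf_identity(x1, x2, nu):
    """Classical (Type I) bivariate t density with mu = 0 and Sigma = I_2."""
    return (1.0 + (x1 * x1 + x2 * x2) / nu) ** (-(nu + 2) / 2) / (2.0 * math.pi)

def type2_pdf_identity(x1, x2, nu1, nu2):
    """Type II density with mu = 0 and Sigma = I_2: a product of t densities."""
    return t_pdf(x1, nu1) * t_pdf(x2, nu2)

# 30 equally spaced points on [mu - 3*sqrt(sigma_ii), mu + 3*sqrt(sigma_ii)] = [-3, 3].
grid = [-3.0 + 6.0 * k / 29 for k in range(30)]
f1 = [type1_pdf_identity(a, b, 3.0) for a in grid for b in grid]
f2 = [type2_pdf_identity(a, b, 3.0, 3.0) for a in grid for b in grid]
print("Type I :", round(max(f1), 6), round(min(f1), 6))
print("Type II:", round(max(f2), 6), round(min(f2), 6))
```

Because the grid has an even number of points, it does not contain the mode $(0, 0)$ itself, which is why the tabulated maxima are slightly below the densities at the mode.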

Figure 2. The contour plots and 3-D perspectives of Type I and Type II MVT density curves given $μ = (0, 0)^{⊤}$, $Σ = diag (\mathbf{1}_{2})$. (a1)–(a2) Type I MVT with $ν = 3$; (b1)–(b2) Type II MVT with $ν_{1} = ν_{2} = 3$.

Figure 3. The contour plots and 3-D perspectives of Type I and Type II MVT density curves given $μ = (1, - 1)^{⊤}$, $Σ = [2, 2; 2, 3]$. (a1)–(a2) Type I MVT with $ν = 10$; (b1)–(b2) Type II MVT with $ν_{1} = ν_{2} = 10$.

Moreover, we could have different $ν_{i}$ 's in Type II MVT, thus we choose several combinations of parameters with $ν_{1} \neq ν_{2}$ . Similarly, the maximum and minimum of densities are presented in Table 2 and their various shapes are displayed in Figure 4.

Table 2.

Maximum and minimum values of Type II MVT density function for $ν_{1} \neq ν_{2}$ .

Parameters	Density	Degrees of freedom	Maximum	Minimum	ρ
$μ = (0, 0)^{⊤}, Σ = diag (\mathbf{1}_{2})$	$f^{(II)}$	$(ν_{1}, ν_{2}) = (3, 5)$	0.137650	0.000397	0
$μ = (1, - 1)^{⊤}, Σ = [2, 2; 2, 3]$	$f^{(II)}$	$(ν_{1}, ν_{2}) = (4, 20)$	0.103573	$1.2944 \times 10^{- 9}$	0.713626


Figure 4. The contour plots and 3-D perspectives of Type II MVT density curves. (a1)–(a2) $μ = (0, 0)^{⊤}$, $Σ = diag (\mathbf{1}_{2})$, $(ν_{1}, ν_{2}) = (3, 5)$; (b1)–(b2) $μ = (1, - 1)^{⊤}$, $Σ = [2, 2; 2, 3]$, $(ν_{1}, ν_{2}) = (4, 20)$.

3. Estimation of parameters and test of independence

3.1. MLEs of $(μ, Σ, ν)$ via the Monte Carlo ECM algorithm

In practice the value of $ν$ is usually unknown, so we need to estimate $ν$ together with $(μ, Σ)$. Let $x_{1}, \dots, x_{n} \overset{iid}{\sim} t_{d}^{(II)} (μ, Σ, ν)$ and let the observed data be $Y_{obs} = \{x_{j}\}_{j = 1}^{n}$, where $x_{j} = (x_{1 j}, \dots, x_{d j})^{⊤}$ denotes the realization of the j-th random vector $(X_{1 j}, \dots, X_{d j})^{⊤}$ for $j = 1, \dots, n$. The mixture expression of the Type II MVT random vector in (7) motivates us to employ the ECM algorithm to obtain the MLEs of the parameters. For each $x_{j}$ ($j = 1, \dots, n$), based on (7), we introduce the corresponding latent vector $(U_{1 j}, \dots, U_{d j})^{⊤}$, where $U_{i j} \overset{ind}{\sim} Gamma (ν_{i} / 2, ν_{i} / 2)$ for $i = 1, \dots, d$. The missing data are denoted by $Y_{mis} = \{u_{j}\}_{j = 1}^{n}$, where $u_{j} = (u_{1 j}, \dots, u_{d j})^{⊤}$ is the realization of the j-th latent vector, and the complete data are $Y_{com} = \{Y_{obs}, Y_{mis}\} = \{x_{j}, u_{j}\}_{j = 1}^{n}$. The complete-data likelihood function is given by

\begin{aligned} L (μ, Σ, ν | Y_{c o m}) & \propto (\prod_{j = 1}^{n} \prod_{i = 1}^{d} u_{i j}^{\frac{1}{2}}) | Σ |^{- \frac{n}{2}} \exp [- \frac{1}{2} \sum_{j = 1}^{n} (x_{j} - μ)^{⊤} (U_{j}^{\frac{1}{2}} Σ^{- 1} U_{j}^{\frac{1}{2}}) (x_{j} - μ)] \\ \times \prod_{j = 1}^{n} \prod_{i = 1}^{d} \frac{{(\frac{ν_{i}}{2})}^{ν_{i} / 2}}{Γ (\frac{ν_{i}}{2})} u_{i j}^{\frac{ν_{i}}{2} - 1} \exp (- \frac{ν_{i}}{2} u_{i j}) \\ \propto & | Σ |^{- \frac{n}{2}} \exp [- \frac{1}{2} \sum_{j = 1}^{n} (x_{j} - μ)^{⊤} (U_{j}^{\frac{1}{2}} Σ^{- 1} U_{j}^{\frac{1}{2}}) (x_{j} - μ)] \\ \times \prod_{j = 1}^{n} \prod_{i = 1}^{d} \frac{{(\frac{ν_{i}}{2})}^{ν_{i} / 2}}{Γ (\frac{ν_{i}}{2})} u_{i j}^{\frac{ν_{i} - 1}{2}} \exp (- \frac{ν_{i}}{2} u_{i j}), \end{aligned}

where $U_{j}^{1 / 2} = diag (u_{j}^{1 / 2})$. Let $w_{i j} = u_{i j}^{1 / 2}$ be the realization of $W_{i j} = U_{i j}^{1 / 2}$, and write $w_{j} = (w_{1 j}, \dots, w_{d j})^{⊤}$ and $W_{j} = diag (w_{j}) = U_{j}^{1 / 2}$. Then the log-likelihood function becomes

\begin{aligned} ℓ (μ, Σ, ν | Y_{c o m}) \\ \propto - \frac{n}{2} \log | Σ | - \frac{1}{2} t r \{Σ^{- 1} \sum_{j = 1}^{n} [W_{j} (x_{j} - μ)] {[W_{j} (x_{j} - μ)]}^{⊤}\} \\ + \sum_{i = 1}^{d} [\frac{n ν_{i}}{2} \log (\frac{ν_{i}}{2}) - n \log Γ (\frac{ν_{i}}{2}) + (ν_{i} - 1) \sum_{j = 1}^{n} \log (w_{i j}) - \frac{ν_{i}}{2} \sum_{j = 1}^{n} w_{i j}^{2}] . \end{aligned}

The CM-step is to calculate the complete-data conditional MLEs of $(μ, Σ)$ as given by

\begin{cases} \hat{μ} = {(\sum_{j = 1}^{n} W_{j} Σ^{- 1} W_{j})}^{- 1} (\sum_{j = 1}^{n} W_{j} Σ^{- 1} W_{j} x_{j}), \\ \hat{Σ} = \frac{1}{n} \sum_{j = 1}^{n} W_{j} (x_{j} - μ) (x_{j} - μ)^{⊤} W_{j} . \end{cases}

(12)
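The two updates in (12) are a weighted generalized least-squares mean and a weighted scatter matrix. As a minimal illustration for d = 2 (Python standard library; the function names are hypothetical, and the weights $w_{i j}$ are treated as known, whereas in the Monte Carlo ECM algorithm the products $w_{i j} w_{k j}$ are replaced by conditional expectations), they can be sketched as follows.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def cm_step_mu(xs, ws, sigma):
    """mu-hat in (12) for d = 2, given the current Sigma and weights w_j."""
    p = inv2(sigma)
    lhs = [[0.0, 0.0], [0.0, 0.0]]
    rhs = [0.0, 0.0]
    for x, w in zip(xs, ws):
        for i in range(2):
            for k in range(2):
                a_ik = w[i] * p[i][k] * w[k]  # (W_j Sigma^{-1} W_j)_{ik}
                lhs[i][k] += a_ik
                rhs[i] += a_ik * x[k]
    q = inv2(lhs)
    return [q[i][0] * rhs[0] + q[i][1] * rhs[1] for i in range(2)]

def cm_step_sigma(xs, ws, mu):
    """Sigma-hat in (12): average of W_j (x_j - mu)(x_j - mu)^T W_j."""
    n = len(xs)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, w in zip(xs, ws):
        r = [w[i] * (x[i] - mu[i]) for i in range(2)]
        for i in range(2):
            for k in range(2):
                s[i][k] += r[i] * r[k] / n
    return s

xs = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
ws = [[1.0, 1.0]] * 3                      # unit weights: the Gaussian special case
mu_hat = cm_step_mu(xs, ws, [[2.0, 1.0], [1.0, 3.0]])
sigma_hat = cm_step_sigma(xs, ws, mu_hat)
print(mu_hat, sigma_hat)
```

With all weights equal to one, the updates reduce to the sample mean and the (biased) sample covariance matrix, which gives a simple sanity check of the formulas.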

And the complete-data MLE of $ν_{i}$ is the solution to the equation

\log (\frac{ν_{i}}{2}) + 1 - ψ (\frac{ν_{i}}{2}) + \frac{1}{n} \sum_{j = 1}^{n} [2 \log (w_{i j}) - w_{i j}^{2}] = 0, i = 1, \dots, d,

(13)

where $ψ (\cdot)$ is the digamma function. The E-step is to replace the terms involving the $W_{j}$'s in (12)–(13), i.e. $w_{i j} w_{k j}$ and $\log (w_{i j})$ for $i, k = 1, \dots, d$ and $j = 1, \dots, n$, by their conditional expectations. To this end, we first need to derive the conditional distribution of $w_{j} | x_{j}$, which is given by

f_{w_{j} | x_{j}} (w_{j} | x_{j}) \propto w_{1 j}^{ν_{1}} \dots w_{d j}^{ν_{d}} \exp (- \frac{1}{2} w_{j}^{⊤} Σ_{j}^{* - 1} w_{j}), w_{j} \geq \mathbf{0},

(14)

where $Σ_{j}^{*} = (X_{j}^{*} Σ^{- 1} X_{j}^{*} + Σ_{0})^{- 1}$ , $X_{j}^{*} = diag (x_{j} - μ)$ and $Σ_{0}$ is defined by (10). Since the normalizing constant in (14) is not available in closed form, we will perform the Monte Carlo implementation in the E-step (see [2,15,28]).
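Equation (13) has no closed-form solution in $ν_{i}$, but its left-hand side is monotone decreasing in $ν_{i}$, so a one-dimensional root finder suffices. The sketch below (Python standard library; the digamma approximation, the bracketing interval and the function names are assumptions of this illustration) solves (13) by bisection, where `s` stands for the average $\frac{1}{n} \sum_{j} [2 \log (w_{i j}) - w_{i j}^{2}]$ or its Monte Carlo E-step counterpart.

```python
import math

def digamma(x):
    """Digamma function for x > 0: upward recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    r2 = 1.0 / (x * x)
    series = r2 * (1.0 / 12 - r2 * (1.0 / 120 - r2 / 252))
    return result + math.log(x) - 0.5 / x - series

def solve_nu(s, lo=1e-3, hi=1e4, iters=200):
    """Solve log(nu/2) + 1 - psi(nu/2) + s = 0 for nu by bisection on [lo, hi]."""
    def g(nu):
        return math.log(nu / 2) + 1.0 - digamma(nu / 2) + s
    if g(hi) > 0:
        # No sign change in the bracket: the margin is effectively Gaussian.
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid   # g is decreasing, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Population value of s when U ~ Gamma(nu0/2, nu0/2) with nu0 = 5:
# E[log U] = psi(nu0/2) - log(nu0/2) and E[U] = 1.
s_true = digamma(2.5) - math.log(2.5) - 1.0
print(solve_nu(s_true))
```

Feeding in the population value of `s` for $ν_{i} = 5$ should return a value very close to 5, which checks that the root finder is consistent with (13).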

Given (14), the Monte Carlo method for calculating the conditional expectations is summarized as follows.

Step 1:
Generate $w_{j}^{(1)}, \dots, w_{j}^{(G)} \overset{iid}{\sim} {TN}_{d} (\mathbf{0}, Σ_{j}^{*}; R_{+}^{d})$, and approximate the conditional pdf (14) by
$f_{w_{j} | x_{j}} (w_{j} | x_{j}) \approx \frac{1}{c_{j}} w_{1 j}^{ν_{1}} \dots w_{d j}^{ν_{d}} \cdot {TN}_{d} (w_{j} | \mathbf{0}, Σ_{j}^{*}; R_{+}^{d}),$
where
$c_{j} = \frac{1}{G} \sum_{g = 1}^{G} (w_{1 j}^{(g)})^{ν_{1}} \dots (w_{d j}^{(g)})^{ν_{d}} .$ (15)
Step 2:
Approximate the conditional expectations by
$E (W_{i j} W_{k j} | x_{j}, μ, Σ, ν) \approx \frac{1}{G c_{j}} \sum_{g = 1}^{G} [w_{i j}^{(g)} w_{k j}^{(g)} \prod_{l = 1}^{d} (w_{l j}^{(g)})^{ν_{l}}],$ (16)

$E (\log W_{i j} | x_{j}, μ, Σ, ν) \approx \frac{1}{G c_{j}} \sum_{g = 1}^{G} [\log (w_{i j}^{(g)}) \prod_{l = 1}^{d} (w_{l j}^{(g)})^{ν_{l}}]$ (17)
for $i, k = 1, \dots, d$.

By combining (12)–(13) with (16)–(17), we could obtain the MLEs of $(μ, Σ, ν)$ .
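Steps 1 and 2 amount to self-normalized importance sampling with the truncated normal as proposal and $\prod_{l} w_{l j}^{ν_{l}}$ as weight. A minimal bivariate sketch is given below (Python standard library). The text does not specify how the truncated normal draws are generated; plain rejection sampling from $N_{2} (\mathbf{0}, Σ_{j}^{*})$ is used here purely as an assumption for illustration, and the function name, seed and number of draws are hypothetical.

```python
import math
import random

def e_step_bivariate(x, mu, sigma, nus, n_draws, rng):
    """Approximate E(W_i W_k | x) and E(log W_i | x) via (15)-(17) for d = 2."""
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 * s12
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    # Precision matrix (Sigma_j*)^{-1} = X_j* Sigma^{-1} X_j* + diag(nu).
    a11 = d1 * d1 * s22 / det + nus[0]
    a12 = -d1 * d2 * s12 / det
    a22 = d2 * d2 * s11 / det + nus[1]
    adet = a11 * a22 - a12 * a12
    c11, c12, c22 = a22 / adet, -a12 / adet, a11 / adet  # entries of Sigma_j*
    l11 = math.sqrt(c11)                                 # Cholesky factor of Sigma_j*
    l21 = c12 / l11
    l22 = math.sqrt(c22 - l21 * l21)
    sum_weight = 0.0
    sum_ww = [[0.0, 0.0], [0.0, 0.0]]
    sum_log = [0.0, 0.0]
    accepted = 0
    while accepted < n_draws:
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w = (l11 * e1, l21 * e1 + l22 * e2)
        if w[0] <= 0.0 or w[1] <= 0.0:
            continue  # rejection step: keep only draws in the positive quadrant
        accepted += 1
        weight = w[0] ** nus[0] * w[1] ** nus[1]
        sum_weight += weight                             # equals G * c_j in (15)
        for i in range(2):
            sum_log[i] += weight * math.log(w[i])
            for k in range(2):
                sum_ww[i][k] += weight * w[i] * w[k]
    e_ww = [[sum_ww[i][k] / sum_weight for k in range(2)] for i in range(2)]
    e_log = [sum_log[i] / sum_weight for i in range(2)]
    return e_ww, e_log

rng = random.Random(7)  # fixed seed, chosen arbitrarily
e_ww, e_log = e_step_bivariate((1.0, -0.5), (0.0, 0.0),
                               [[1.0, 0.0], [0.0, 1.0]], (2.0, 3.0), 40000, rng)
print(e_ww[0][0], e_ww[1][1])
```

When $Σ$ is diagonal, (14) factorizes and $W_{i j}^{2} | x_{j}$ is Gamma distributed with mean $(ν_{i} + 1) / (ν_{i} + (x_{i j} - μ_{i})^{2} / σ_{i i})$, so the Monte Carlo output for the example above can be compared with the exact values 1 and $4 / 3.25$. Rejection sampling becomes inefficient in higher dimensions or when $Σ_{j}^{*}$ has strong negative correlations, in which case a dedicated truncated normal sampler would be preferable.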

Regarding the convergence of the Monte Carlo ECM algorithm, the choice of G is very important. A large value of G brings the approximation closer to the true maximizer, but is time-consuming. It is therefore inefficient to start with a large value of G when the current approximation is still far from the maximizer. Instead, it is recommended to run the algorithm with a smaller G until the iterates have almost stabilized, and then to restart from the stabilized point with a larger value of G. This further decreases the Monte Carlo variability and yields an approximation closer to the true maximizer. Let $θ^{(t)}$ be the t-th approximation of $\hat{θ}$; the stopping rule of the Monte Carlo ECM algorithm is then $| θ^{(t + 1)} - θ^{(t)} | \leq δ$, where δ is a predetermined precision.

3.2. Testing hypothesis of independence

Suppose that we want to test the following hypotheses

H_{0} : Σ is diagonal against H_{1} : Σ is not diagonal .

Under $H_{0}$ , the likelihood ratio test (LRT) statistic is given by

T = - 2 \{ℓ ({\hat{μ}}_{0}, {\hat{Σ}}_{0}, {\hat{ν}}_{0} | Y_{o b s}) - ℓ (\hat{μ}, \hat{Σ}, \hat{ν} | Y_{o b s})\} \dot{\sim} χ^{2} (\frac{d (d - 1)}{2}),

(18)

where $({\hat{μ}}_{0}, {\hat{Σ}}_{0}, {\hat{ν}}_{0})$ are the constrained MLEs of $(μ, Σ, ν)$ under $H_{0}$ while $(\hat{μ}, \hat{Σ}, \hat{ν})$ are the unconstrained MLEs of $(μ, Σ, ν)$. Under $H_{0}$, the Type II MVT distribution reduces to the product of d independent univariate t distributions, so the constrained MLEs $({\hat{μ}}_{0}, {\hat{Σ}}_{0}, {\hat{ν}}_{0})$ are easily obtained. The test statistic T asymptotically follows a chi-squared distribution with $d (d - 1) / 2$ degrees of freedom under $H_{0}$, and the corresponding p-value is given by

p\text{-value} = \begin{cases} \Pr (T > t | H_{0}), & \text{if } d \leq 2, \\ 2 \min \{\Pr (T > t | H_{0}), \Pr (T \leq t | H_{0})\}, & \text{if } d \geq 3, \end{cases}

(19)

where t is the observed value of T. For a given significance level α, the null hypothesis is rejected if the p-value does not exceed α.
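Given the two maximized log-likelihoods, the statistic (18) and the p-value rule (19) are simple to compute. The sketch below (Python standard library; function names are hypothetical) obtains the chi-squared tail probability for integer degrees of freedom from the standard recurrence $Q (a + 1, x) = Q (a, x) + x^{a} e^{- x} / Γ (a + 1)$ for the regularized upper incomplete gamma function, and then applies (19) exactly as stated in the text.

```python
import math

def chi2_sf(t, df):
    """Upper tail Pr(chi2_df > t) for a positive integer df."""
    if t <= 0.0:
        return 1.0
    x = t / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)               # Q(1, x) = exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))    # Q(1/2, x) = erfc(sqrt(x))
    while a < df / 2.0:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q

def lrt_independence_pvalue(loglik_null, loglik_full, d):
    """Statistic (18) and p-value (19) for H0: Sigma is diagonal."""
    if d < 2:
        raise ValueError("the test requires at least two components")
    t = -2.0 * (loglik_null - loglik_full)
    df = d * (d - 1) // 2
    upper = chi2_sf(t, df)
    # Rule (19) as stated in the text: one-sided for d <= 2, doubled minimum otherwise.
    p = upper if d <= 2 else 2.0 * min(upper, 1.0 - upper)
    return t, p

t_stat, p_val = lrt_independence_pvalue(-105.0, -100.0, 2)  # made-up log-likelihoods
print(t_stat, p_val)
```

The log-likelihood values in the usage line are invented solely to exercise the function; in practice they would come from the constrained and unconstrained fits described above.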

3.3. Type II multivariate t regression model

In existing multivariate linear regression models, the error terms are often assumed to follow a multivariate normal or t distribution, where the latter can improve the robustness of the normal model for data with outliers. Since the proposed Type II MVT distribution is much more flexible than Type I MVT distribution, we adopt it as the joint distribution of the error terms. Then Type II MVT regression model is formulated as

\begin{cases} y_{j} = μ_{j} + ϵ_{j}, j = 1, \dots, n, \\ μ_{i j} = x_{j}^{⊤} β_{i}, i = 1, \dots, m, \end{cases}

(20)

where $y_{j} = (Y_{1 j}, \dots, Y_{m j})^{⊤}$ is an $m \times 1$ vector of responses, $μ_{j} = (μ_{1 j}, \dots, μ_{m j})^{⊤}$, $\{ϵ_{j}\}_{j = 1}^{n} \overset{iid}{\sim} t_{m}^{(II)} (\mathbf{0}, Σ, ν)$, $Σ$ is a positive-definite matrix, $ν = (ν_{1}, \dots, ν_{m})^{⊤}$ is the vector of degrees of freedom, $x_{j} = (1, x_{1 j}, \dots, x_{p j})^{⊤}$ is the known covariate vector for subject j and $β_{i} = (β_{i 0}, β_{i 1}, \dots, β_{i p})^{⊤}$ is the vector of regression coefficients. Note that $μ_{j}$ can be expressed in matrix form as

μ_{j} = {(\begin{array}{cccc} x_{j}^{⊤} & 0 & \dots & 0 \\ 0 & x_{j}^{⊤} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & x_{j}^{⊤} \end{array})}_{m \times m (p + 1)} (\begin{matrix} β_{1} \\ β_{2} \\ ⋮ \\ β_{m} \end{matrix}) \hat{=} X_{j} β,

where $β = (β_{1}^{⊤}, \dots, β_{m}^{⊤})^{⊤}$ . Now we rewrite (20) as

y_{j} = X_{j} β + ϵ_{j}, j = 1, \dots, n,

(21)

and the objective is to estimate $β$ , $Σ$ and $ν$ .
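The block structure of $X_{j}$ in (21) is that of a Kronecker product of the $m \times m$ identity matrix with $x_{j}^{⊤}$. A small Python sketch (nested lists only; the function names and the numerical values are hypothetical) builds $X_{j}$ and evaluates $μ_{j} = X_{j} β$.

```python
def block_design(x_j, m):
    """Build X_j: an m x m(p+1) block-diagonal matrix with x_j^T in each block."""
    q = len(x_j)  # q = p + 1, including the intercept entry
    rows = []
    for i in range(m):
        row = [0.0] * (m * q)
        row[i * q:(i + 1) * q] = x_j
        rows.append(row)
    return rows

def mean_response(x_j, betas):
    """mu_j = X_j beta, where beta stacks beta_1, ..., beta_m."""
    design = block_design(x_j, len(betas))
    beta = [b for beta_i in betas for b in beta_i]
    return [sum(a * b for a, b in zip(row, beta)) for row in design]

x_j = [1.0, 2.0]                      # intercept and one covariate (p = 1)
betas = [[1.0, 0.5], [-1.0, 2.0]]     # beta_1 and beta_2 (m = 2)
print(block_design(x_j, 2))
print(mean_response(x_j, betas))
```

Each row of $X_{j}$ picks out the coefficient vector of one response, so the i-th entry of the result equals $x_{j}^{⊤} β_{i}$, in agreement with (20).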

Let $y_{j} = (y_{1 j}, \dots, y_{m j})^{⊤}$ be the realization of the j-th response vector, and let the observed data be denoted by $Y_{obs} = \{y_{j}, x_{j}\}_{j = 1}^{n}$. As in Section 3.1, we introduce latent variables $W_{i j}^{2} \overset{ind}{\sim} Gamma (ν_{i} / 2, ν_{i} / 2)$ for $i = 1, \dots, m$ and $j = 1, \dots, n$ such that

y_{j} | (w_{j} = w_{j}) \sim N_{m} (X_{j} β, W_{j}^{- 1} Σ W_{j}^{- 1}), j = 1, \dots, n,

where $w_{j}$ and $W_{j}$ are defined as in Section 3.1, with d replaced by m.

The Monte Carlo ECM algorithm is also employed to give the MLEs of parameters. The first two CM-steps are to calculate the conditional MLEs

\begin{cases} \hat{β} = {(\sum_{j = 1}^{n} X_{j}^{⊤} W_{j} Σ^{- 1} W_{j} X_{j})}^{- 1} (\sum_{j = 1}^{n} X_{j}^{⊤} W_{j} Σ^{- 1} W_{j} y_{j}), \\ \hat{Σ} = \frac{1}{n} \sum_{j = 1}^{n} W_{j} (y_{j} - X_{j} β) (y_{j} - X_{j} β)^{⊤} W_{j}, \end{cases}

(22)

and the third CM-step is to calculate ${\hat{ν}}_{i}$ as the solution to equation (13) for $i = 1, \dots, m$. The E-step is to replace $w_{i j} w_{k j}$ and $\log (w_{i j})$ for $i, k = 1, \dots, m$ by their conditional expectations, which can be calculated similarly to (15)–(17) based on the following conditional distribution

f_{w_{j} | y_{j}} (w_{j} | y_{j}) \propto w_{1 j}^{ν_{1}} \dots w_{m j}^{ν_{m}} \exp [- \frac{1}{2} w_{j}^{⊤} (Y_{j}^{*} Σ^{- 1} Y_{j}^{*} + Σ_{0}) w_{j}], w_{j} \geq \mathbf{0},

(23)

where $Y_{j}^{*} = diag (y_{j} - X_{j} β)$ and $Σ_{0} = diag (ν)$ .

It is quite difficult to obtain the standard deviations of MLEs and Wald confidence intervals of parameters due to the complexity of the observed-data likelihood function and the observed information matrix. Fortunately, the posterior samples of these parameters are relatively easy to generate by using the Gibbs sampling as shown in the next section. Based on the posterior samples, we can provide the posterior means, the posterior standard deviations and Bayesian credible intervals.

4. Bayesian methods

In this section, we discuss Bayesian methods for the Type II MVT distribution. The prior distribution of the parameters $(μ, Σ, ν)$ for the classical MVT distribution was discussed in [17], where $μ$, $Σ$ and ν are assumed to be independent, with a locally uniform prior assigned to $μ$ and an inverse Wishart prior assigned to $Σ$:

p (μ) \propto constant and p (Σ) \propto | Σ |^{- \frac{m + 1}{2}} \exp [- \frac{1}{2} t r (A Σ^{- 1})],

(24)

where m is a scalar and $A$ is a $d \times d$ non-negative definite matrix. If m = d and $A = O$ (i.e. the zero matrix), $p (Σ)$ is the non-informative prior; if m = −1 and $A = O$ , $p (Σ)$ is the flat prior. Priors for the hyperparameter ν, the degrees of freedom, have been suggested in the literature (see [17,25]). One choice is a flat prior on $ν^{- 1}$ , i.e.

p (ν) \propto ν^{- 2}, ν \geq 1.

(25)

For the proposed Type II MVT distribution $t_{d}^{(I I)} (μ, Σ, ν)$ , similarly we can apply (24) as the priors for $(μ, Σ)$ , and extend the prior for the univariate hyperparameter in (25) to the multivariate case as

p (ν) = \prod_{i = 1}^{d} ν_{i}^{- 2}, ν_{i} \geq 1 for i = 1, \dots, d .

(26)

The posterior distribution of the complete-data is

\begin{aligned} p (μ, Σ, ν | Y_{c o m}) \propto L (μ, Σ, ν | Y_{c o m}) \cdot p (μ, Σ) \cdot p (ν) \\ \propto | Σ |^{- \frac{n + m + 1}{2}} \exp [- \frac{1}{2} t r (C_{1} Σ^{- 1} + A Σ^{- 1})] \\ \times \prod_{i = 1}^{d} [\prod_{j = 1}^{n} \frac{{(\frac{ν_{i}}{2})}^{ν_{i} / 2}}{Γ (\frac{ν_{i}}{2})} u_{i j}^{\frac{ν_{i} - 1}{2}} \exp (- \frac{ν_{i}}{2} u_{i j})] ν_{i}^{- 2}, \end{aligned}

where

C_{1} = \sum_{j = 1}^{n} U_{j}^{\frac{1}{2}} (x_{j} - μ) (x_{j} - μ)^{⊤} U_{j}^{\frac{1}{2}} .

Firstly, the conditional predictive distribution of missing data is derived as

f (Y_{m i s} | Y_{o b s}, μ, Σ, ν) = \prod_{j = 1}^{n} f_{u_{j} | x_{j}} (u_{j} | x_{j}, μ, Σ, ν),

where

\begin{aligned} f_{u_{j} | x_{j}} (u_{j} | x_{j}, μ, Σ, ν) \\ \propto (\prod_{i = 1}^{d} u_{i j}^{\frac{ν_{i} - 1}{2}}) \exp [- \frac{1}{2} (x_{j} - μ)^{⊤} (U_{j}^{\frac{1}{2}} Σ^{- 1} U_{j}^{\frac{1}{2}}) (x_{j} - μ) - \frac{1}{2} \sum_{i = 1}^{d} ν_{i} u_{i j}] . \end{aligned}

(27)

To generate samples of $(μ, Σ, ν)$ , we combine the Gibbs sampling method with the acceptance–rejection (AR) algorithm (see [27]). We propose that the envelope density and the envelope constant are given by

\begin{aligned} g_{j} (u_{j}) & = \prod_{i \in I} Gamma (u_{i j} | \frac{ν_{i} + 1}{2}, \frac{ν_{i}}{2}) \cdot \prod_{i \notin I} U (u_{i j} | 0, {(\frac{e ν_{i}}{ν_{i} - 1})}^{\frac{ν_{i} - 1}{2}}) and \\ c & = \prod_{i \in I} \frac{Γ (\frac{ν_{i} + 1}{2})}{(ν_{i} / 2)^{\frac{ν_{i} + 1}{2}}}, \end{aligned}

(28)

respectively, where

I = {i | 0 < ν_{i} \leq 1.837, i = 1, \dots, d}

and $U (\cdot | a, b)$ denotes the density of the uniform distribution $U (a, b)$ . Since $Σ$ and hence $Σ^{- 1}$ are positive definite, we have $z^{⊤} Σ^{- 1} z \geq 0$ for any vector $z$ . Thus, from (27), it follows that

\begin{aligned} f_{u_{j} | x_{j}} (u_{j} | x_{j}, μ, Σ) & \leq \prod_{i = 1}^{d} [u_{i j}^{\frac{ν_{i} - 1}{2}} \exp (- \frac{ν_{i} u_{i j}}{2})] \\ \leq [\prod_{i \in I} u_{i j}^{\frac{ν_{i} - 1}{2}} \exp (- \frac{ν_{i} u_{i j}}{2})] \cdot \prod_{i \notin I} {(\frac{e ν_{i}}{ν_{i} - 1})}^{- \frac{ν_{i} - 1}{2}} = c g_{j} (u_{j}), \end{aligned}

since the function $h (u) = u^{\frac{ν - 1}{2}} \exp (- ν u / 2)$ attains its maximum $[e ν / (ν - 1)]^{- \frac{ν - 1}{2}}$ at $u = 1 - 1 / ν$ when $ν > 1.837$ . For $0 < ν \leq 1.837$ , we have $Γ (\frac{ν + 1}{2}) / (ν / 2)^{\frac{ν + 1}{2}} \geq 1$ . Overall, the envelope constant c specified by (28) is always greater than or equal to 1, and the function in (27) is majorized (bounded above) by $c g_{j} (u_{j})$ .
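The two facts used in this bound can be checked numerically. The sketch below is only an illustration with hypothetical helper names: it evaluates $h (u)$ , its claimed maximum, and the Gamma normalizing ratio that enters the envelope constant c in (28).

```python
import math

def h(u, nu):
    """Kernel h(u) = u^((nu-1)/2) * exp(-nu*u/2) from the bound on (27)."""
    return u ** ((nu - 1.0) / 2.0) * math.exp(-nu * u / 2.0)

def h_max(nu):
    """Claimed maximum of h for nu > 1, attained at u = 1 - 1/nu."""
    return (math.e * nu / (nu - 1.0)) ** (-(nu - 1.0) / 2.0)

def gamma_ratio(nu):
    """Factor Gamma((nu+1)/2) / (nu/2)^((nu+1)/2) appearing in c."""
    return math.gamma((nu + 1.0) / 2.0) / (nu / 2.0) ** ((nu + 1.0) / 2.0)

# h evaluated at the stationary point matches the closed-form maximum
for nu in (2.0, 3.0, 10.0):
    assert abs(h(1.0 - 1.0 / nu, nu) - h_max(nu)) < 1e-12
```

Evaluating `gamma_ratio` near 1.837 gives a value close to 1, which suggests that the threshold is approximately where this ratio crosses 1.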

Thus, the procedures to generate posterior samples by AR algorithm are given as:

Step 1:
For each $j \in {1, \dots, n}$ , draw $u_{j} = u_{j}$ from $f_{u_{j} | x_{j}} (u_{j} | x_{j}, μ, Σ, ν)$ through the AR algorithm with the following two steps:
1. Draw $V_{j} = v_{j} \sim U (0, 1)$ , and independently draw $Y_{i j} = y_{i j} \overset{i n d}{\sim} Gamma (\frac{ν_{i} + 1}{2}, \frac{ν_{i}}{2})$ for $i \in I$ and $Y_{i j} = y_{i j} \overset{i n d}{\sim} U (0, [e ν_{i} / (ν_{i} - 1)]^{\frac{ν_{i} - 1}{2}})$ for $i \notin I$ . Set $y_{j} = (y_{1 j}, \dots, y_{d j})^{⊤}$ .
2. If $v_{j} \leq f_{u_{j} | x_{j}} (y_{j} | x_{j}, μ, Σ, ν) / [c g_{j} (y_{j})]$ (c and $g_{j} (\cdot)$ are given by (28)), set $u_{j} = u_{j} = y_{j}$ ; otherwise, go back to sub-step 1.
Step 2:
Draw $(μ, Σ, ν)$ from $p (μ, Σ, ν | Y_{c o m})$ via the Gibbs sampling method with the following loop:
$\begin{aligned} μ | (Σ, ν, Y_{o b s}, Y_{m i s}) & \sim N_{d} (A_{1}^{- 1} B_{1}, A_{1}^{- 1}), \\ Σ | (μ, ν, Y_{o b s}, Y_{m i s}) & \sim {IWishart}_{d} (C_{1} + A, n + m - d), \end{aligned}$
where
$A_{1} = \sum_{j = 1}^{n} U_{j}^{\frac{1}{2}} Σ^{- 1} U_{j}^{\frac{1}{2}} and B_{1} = \sum_{j = 1}^{n} U_{j}^{\frac{1}{2}} Σ^{- 1} U_{j}^{\frac{1}{2}} x_{j},$
and generate $ν$ from the posterior distribution of $ν$ given $(μ, Σ, Y_{o b s}, Y_{m i s})$ via the grid method, with the density for each $ν_{i}$ given by
$f (ν_{i} | μ, Σ, Y_{o b s}, Y_{m i s}) \propto [\prod_{j = 1}^{n} \frac{{(ν_{i} / 2)}^{ν_{i} / 2}}{Γ (ν_{i} / 2)} u_{i j}^{\frac{ν_{i} - 1}{2}} \exp (- \frac{ν_{i}}{2} u_{i j})] ν_{i}^{- 2} .$
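Step 1 can be illustrated with a simplified acceptance-rejection sampler. The sketch below deliberately departs from the envelope in (28): it uses the Gamma proposal for every component (as if all indices belonged to I). By the first inequality in the bound on (27), this is still a valid envelope, and the acceptance probability reduces to $\exp [- \frac{1}{2} (x_{j} - μ)^{⊤} U_{j}^{1 / 2} Σ^{- 1} U_{j}^{1 / 2} (x_{j} - μ)]$ . The function name and the `max_tries` guard are assumptions made for illustration.

```python
import numpy as np

def draw_u(x, mu, Sigma_inv, nu, rng, max_tries=100_000):
    """Draw u_j from the kernel in (27) by acceptance-rejection.

    Proposal: independent Gamma((nu_i + 1)/2, rate nu_i/2) components.
    """
    x, mu, nu = (np.asarray(a, dtype=float) for a in (x, mu, nu))
    r = x - mu
    shape = (nu + 1.0) / 2.0
    scale = 2.0 / nu  # NumPy parameterizes by scale = 1 / rate
    for _ in range(max_tries):
        y = rng.gamma(shape, scale)
        z = np.sqrt(y) * r  # U^{1/2} (x - mu)
        log_accept = -0.5 * z @ Sigma_inv @ z
        v = 1.0 - rng.uniform()  # uniform on (0, 1], avoids log(0)
        if np.log(v) <= log_accept:
            return y
    raise RuntimeError("no proposal accepted; the observation may be extreme")
```

The acceptance rate falls as $x_{j}$ moves away from $μ$ , which is one reason a sharper envelope such as (28) may be preferable in practice.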

The above methods could be extended to the regression model specified in (20) or (21) by assigning priors for $(β, Σ, ν)$ .
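The grid method used for each $ν_{i}$ in Step 2 can be sketched as follows. The grid range and spacing below are arbitrary illustrative choices (only the lower bound 1 comes from the prior (26)), and the function names are hypothetical. The log-density is the logarithm of the conditional density of $ν_{i}$ given above, up to an additive constant.

```python
import math
import numpy as np

def log_post_nu(nu, u_i):
    """Unnormalized log conditional posterior of nu_i given u_{i1..in}."""
    n = len(u_i)
    return (n * (0.5 * nu * math.log(0.5 * nu) - math.lgamma(0.5 * nu))
            + 0.5 * (nu - 1.0) * np.sum(np.log(u_i))
            - 0.5 * nu * np.sum(u_i)
            - 2.0 * math.log(nu))

def draw_nu_grid(u_i, rng, grid=None):
    """Sample nu_i from its discretized conditional posterior."""
    if grid is None:
        grid = np.linspace(1.0, 50.0, 491)  # assumed grid with step 0.1
    logp = np.array([log_post_nu(v, u_i) for v in grid])
    p = np.exp(logp - logp.max())  # subtract the max for numerical stability
    p /= p.sum()
    return rng.choice(grid, p=p)
```

A finer or wider grid trades computing time for accuracy; very large $ν_{i}$ values are truncated by the upper grid limit.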

5. Simulation studies

The components of the Type I MVT distribution must be dependent and share the same degrees of freedom, whereas the proposed Type II MVT distribution can accommodate different amounts of tail weight on the univariate t margins and can even approximate normal margins. We therefore conduct simulations to compare the fitting performance of the two distributions when data are generated from one of them, and to highlight the characteristics of the Type II MVT distribution.

We consider the bivariate and trivariate cases, i.e. d = 2, 3. The sample size is n = 100, the mean vector $μ$ is always zero, and the scale matrix $Σ = (σ_{i j})_{d \times d}$ has diagonal elements $σ_{i i} = 1$ for all i, with the off-diagonal elements $σ_{i j}$ , $i \neq j$ , specified in each case. Under different configurations of the parameters $(μ, Σ)$ and the degrees of freedom (a scalar ν for Type I, a vector $ν$ for Type II), samples are generated from either $t_{d} (μ, Σ, ν)$ or $t_{d}^{(I I)} (μ, Σ, ν)$ , as indicated in the first line of each experiment in the following two tables, and each sample is then fitted by both distributions. For d = 2, we run Experiments 1–4; the average MLEs and the corresponding mean squared errors (MSEs) of the estimates, based on L = 1000 replications, are presented in Table 3. The Akaike information criterion (AIC) and Bayesian information criterion (BIC), averaged over the replications for each fitted model, are used to compare fitting performance.

Table 3.

Comparisons of estimate performances with d = 2.

	Experiment 1: $t_{2} (μ, Σ, ν)$ , $σ_{12} = 0.5$ , $ν = 3$
Type I	$μ_{1}$	$μ_{2}$	$σ_{11}$	$σ_{12}$	$σ_{22}$	ν		AIC	BIC
MLE	0.0009	0.0024	0.9794	0.4919	0.9955	2.9793		682.7873	698.4183
MSE	0.0142	0.0127	0.0401	0.0194	0.0394	0.3975		–	–
Type II	$μ_{1}$	$μ_{2}$	$σ_{11}$	$σ_{12}$	$σ_{22}$	$ν_{1}$	$ν_{2}$	AIC	BIC
MLE	0.0011	0.0026	1.0712	0.6064	1.0978	3.4333	3.4489	694.9977	713.2339
MSE	0.0158	0.0145	0.0681	0.0480	0.0785	–	–	–	–
	Experiment 2: $t_{2}^{(I I)} (μ, Σ, ν)$ , $σ_{12} = 0.5$ , $(ν_{1}, ν_{2}) = (3, 3)$
Type I	$μ_{1}$	$μ_{2}$	$σ_{11}$	$σ_{12}$	$σ_{22}$	ν		AIC	BIC
MLE	−0.0024	−0.0035	1.1145	0.4475	1.1175	3.2953		705.6417	721.2727
MSE	0.0140	0.0148	0.0734	0.0168	0.0802	–		–	–
Type II	$μ_{1}$	$μ_{2}$	$σ_{11}$	$σ_{12}$	$σ_{22}$	$ν_{1}$	$ν_{2}$	AIC	BIC
MLE	−0.0006	−0.0014	0.9967	0.4965	1.0077	3.1612	3.2248	697.3014	715.5376
MSE	0.0127	0.0134	0.0553	0.0165	0.0567	0.8421	0.8900	–	–
	Experiment 3: $t_{2}^{(I I)} (μ, Σ, ν)$ , $σ_{12} = 0.5$ , $(ν_{1}, ν_{2}) = (3, 10)$
Type I	$μ_{1}$	$μ_{2}$	$σ_{11}$	$σ_{12}$	$σ_{22}$	ν		AIC	BIC
MLE	−0.0025	−0.0050	1.2948	0.4552	0.8495	4.8398		653.4693	669.1004
MSE	0.0166	0.0128	0.1672	0.0164	0.0475	–		–	–
Type II	$μ_{1}$	$μ_{2}$	$σ_{11}$	$σ_{12}$	$σ_{22}$	$ν_{1}$	$ν_{2}$	AIC	BIC
MLE	−0.0024	−0.0047	0.9957	0.5007	0.9884	3.0641	10.0156	646.4751	664.7113
MSE	0.0152	0.0121	0.0551	0.0159	0.0266	0.7348	5.0330	–	–
	Experiment 4: $t_{2}^{(I I)} (μ, Σ, ν)$ , $σ_{12} = 0$ , $(ν_{1}, ν_{2}) = (5, 5)$
Type I	$μ_{1}$	$μ_{2}$	$σ_{11}$	$σ_{12}$	$σ_{22}$	ν		AIC	BIC
MLE	−0.0030	0.0005	1.0831	0.0013	1.0807	5.5423		666.9350	682.5660
MSE	0.0127	0.0133	0.0470	0.0097	0.0533	–		–	–
Type II	$μ_{1}$	$μ_{2}$	$σ_{11}$	$σ_{12}$	$σ_{22}$	$ν_{1}$	$ν_{2}$	AIC	BIC
MLE	−0.0029	0.0005	0.9560	0.0006	0.9533	4.6581	4.6488	663.7192	681.9554
MSE	0.0123	0.0126	0.0372	0.0096	0.0417	1.4100	1.4080	–	–

Note: MLE is the average of 1000 point estimates with precision $δ = 10^{- 2}$ via the Monte Carlo ECM algorithm; Mean squared error (MSE) of the estimate is equal to the sum of the variance and the squared bias of the estimator.
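The decomposition stated in the note can be illustrated with a short computation over replicated estimates. The function name is hypothetical, and the variance is the population version (divisor L), which is what makes the identity exact.

```python
import numpy as np

def mse_decomposition(estimates, true_value):
    """Return (MSE, variance, bias) of replicated point estimates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    variance = estimates.var()  # ddof=0, i.e. divisor L
    mse = np.mean((estimates - true_value) ** 2)
    return mse, variance, bias

mse, variance, bias = mse_decomposition([0.9, 1.1, 1.2, 0.7], 1.0)
# MSE equals variance plus squared bias up to floating-point error
assert abs(mse - (variance + bias ** 2)) < 1e-12
```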

For d = 3, we conduct Experiments 5–6; the estimation accuracy for each parameter is compared through the MLEs and MSEs summarized in Table 4. For a more direct visual comparison, box plots of the estimates for each experiment are shown in Figures 5–7.

Figure 6. — (a1–a3) Experiment 3: Box plots for MLEs of the parameters by bivariate Type I and Type II MVT distributions; (b1–b3) Experiment 4: Box plots for MLEs of the parameters by bivariate Type I and Type II MVT distributions.

Table 4.

Comparisons of estimate performances with d = 3.

	Experiment 5: $t_{3} (μ, Σ, ν)$ , $σ_{i j} = (- 1)^{\| i - j \|} \times 0.5$ , $ν = 3$
Type I	$μ_{1}$	$μ_{2}$	$μ_{3}$	$σ_{11}$	$σ_{12}$	$σ_{13}$
MLE	0.0233	0.0020	0.0250	0.9997	−0.4970	0.5032
MSE	0.0102	0.0118	0.0101	0.0230	0.0118	0.0122
Type I	$σ_{22}$	$σ_{23}$	$σ_{33}$	ν
MLE	0.9996	−0.4974	1.0042	3.0911
MSE	0.0261	0.0118	0.0280	0.3512
Type II	$μ_{1}$	$μ_{2}$	$μ_{3}$	$σ_{11}$	$σ_{12}$	$σ_{13}$
MLE	0.0213	0.0016	0.0218	0.9147	−0.5093	0.5146
MSE	0.0070	0.0080	0.0077	0.0398	0.0191	0.0191
Type II	$σ_{22}$	$σ_{23}$	$σ_{33}$	$ν_{1}$	$ν_{2}$	$ν_{3}$
MLE	0.9116	−0.5081	0.9157	2.9983	2.9775	2.9868
MSE	0.0434	0.0182	0.0446	–	–	–
	Experiment 6: $t_{3}^{(I I)} (μ, Σ, ν)$ , $σ_{i j} = (- 1)^{\| i - j \|} \times 0.5$ , $(ν_{1}, ν_{2}, ν_{3}) = (3, 3, 3)$
Type I	$μ_{1}$	$μ_{2}$	$μ_{3}$	$σ_{11}$	$σ_{12}$	$σ_{13}$
MLE	0.0356	0.0075	0.0330	1.2886	−0.5332	0.5327
MSE	0.0155	0.0177	0.0157	0.1558	0.0181	0.0175
Type I	$σ_{22}$	$σ_{23}$	$σ_{33}$	ν
MLE	1.2996	−0.5349	1.2854	4.0222
MSE	0.1648	0.0185	0.1547	–
Type II	$μ_{1}$	$μ_{2}$	$μ_{3}$	$σ_{11}$	$σ_{12}$	$σ_{13}$
MLE	0.0230	0.0045	0.0206	0.9020	−0.4621	0.4629
MSE	0.0074	0.0089	0.0082	0.0405	0.0132	0.0125
Type II	$σ_{22}$	$σ_{23}$	$σ_{33}$	$ν_{1}$	$ν_{2}$	$ν_{3}$
MLE	0.9039	−0.4646	0.8966	2.9699	2.9454	2.9617
MSE	0.0433	0.0139	0.0445	0.3139	0.3137	0.3377

Figure 5. — (a1–a3) Experiment 1: Box plots for MLEs of the parameters by bivariate Type I and Type II MVT distributions; (b1–b3) Experiment 2: Box plots for MLEs of the parameters by bivariate Type I and Type II MVT distributions.

Figure 7. — (a1–a3) Experiment 5: Box plots for MLEs of the parameters by trivariate Type I and Type II MVT distributions; (b1–b3) Experiment 6: Box plots for MLEs of the parameters by trivariate Type I and Type II MVT distributions.

From Tables 3 and 4, the performance of the Monte Carlo ECM algorithm in estimating the parameters of the Type II MVT distribution is quite satisfactory, in the sense that all the average MLEs are close to their true values. When data are generated from the Type I MVT distribution, as in Experiments 1 and 5, the Type I point estimates are closer to the true values and more stable, with smaller MSEs for most of the parameters, and Type I MVT outperforms Type II MVT in terms of AIC and BIC. The Type II fit nevertheless gives an acceptable result: the estimates of $(μ, Σ)$ are not far from the true values, and the estimates of the $ν_{i}$ 's are close to each other. When samples are generated from the Type II MVT distribution with identical $ν_{i}$ 's, as in Experiments 2, 4 and 6, the Type II fit performs better, with more accurate estimates and smaller MSEs, whereas Figures 5–7 show that the Type I estimates, especially those of $Σ$ and ν, are more likely to produce outliers. If the true values of $ν_{1}$ and $ν_{2}$ are distinct, as in Experiment 3, the Type II MVT distribution clearly outperforms the Type I MVT distribution, which has only one degrees-of-freedom parameter, and it better describes the different amounts of tail weight in the marginal components.

6. Applications

To ease the numerical work and graphical presentation, we focus on the bivariate case to provide the numerical illustration of the presented methodologies.

6.1. Sport data

We use a data set collected by the Australian Institute of Sport and reported in [3], which contains several variables measured on n = 202 Australian athletes. Specifically, we consider the pair of variables BMI (body mass index) and LBM (lean body mass). Table 5 summarizes the parameter estimates from the Type I bivariate t distribution $t_{2} (μ, Σ, ν)$ with unknown ν and the Type II bivariate t distribution $t_{2}^{(I I)} (μ, Σ, ν)$ with unknown $ν$ .

Table 5.

MLEs of parameters for fitting (BMI, LBM).

	MLEs of parameters					Criterion
Dist.	$μ$	$Σ$	ν	$ν$	log-L	AIC	BIC
Type I	$(\begin{matrix} 22.7479 \\ 64.3418 \end{matrix})$	$(\begin{array}{cc} 6.2010 & 21.3709 \\ 21.3709 & 147.3899 \end{array})$	11.0467	–	−1228.471	2468.942	2488.792
Type II	$(\begin{matrix} 22.6509 \\ 64.1985 \end{matrix})$	$(\begin{array}{cc} 5.4081 & 22.3384 \\ 22.3384 & 167.2201 \end{array})$	–	$(\begin{matrix} 6.2217 \\ 74.1270 \end{matrix})$	−1223.653	2461.307	2484.464

To test the independence between BMI and LBM, from (18) and (19), the value of the test statistic T is given as $t = 144.9212 ≫ χ_{0.05}^{2} (1) = 3.84$ and the corresponding p-value $\approx 0 ≪ 0.05$ , indicating that the null hypothesis should be rejected at 0.05 significance level.

Figure 8 displays the scatter plot and the superimposed contours of the fitted Type II bivariate t distribution for the (BMI, LBM) pair. When comparing the two models, the proposed Type II distribution is selected by both AIC and BIC, and it also has a larger log-likelihood. Besides, the estimated correlation coefficient from (11) is $r = 0.6981$ , which is very close to the sample correlation coefficient $ρ = 0.7139$ . Kendall's tau and Spearman's rho, which measure the rank correlation between BMI and LBM, are $r_{k} = 0.5176$ and $r_{s} = 0.7026$ , respectively. All of these statistics reveal a clear positive correlation between the two variables, a feature that is well captured by the proposed Type II MVT distribution.
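The AIC and BIC values in Table 5 can be recovered from the reported log-likelihoods. The parameter counts used below (6 for the bivariate Type I model and 7 for Type II, i.e. two means, three distinct scale entries, and one or two degrees of freedom) are an assumption made here; they are consistent with the reported criteria.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for k free parameters."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return -2.0 * loglik + k * math.log(n)

n = 202  # number of athletes
type1 = (aic(-1228.471, 6), bic(-1228.471, 6, n))
type2 = (aic(-1223.653, 7), bic(-1223.653, 7, n))
print(type1, type2)
```

Both criteria are smaller for the Type II fit, matching the model selection reported in the text.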

6.2. Tuberculosis vaccine data

The data consist of 13 trials on the efficacy of the Bacillus Calmette–Guérin (BCG) vaccine against tuberculosis, each with a vaccinated group and a non-vaccinated control group, as presented in Table 6. Let VD and VND denote the numbers of disease cases and non-disease cases in the vaccinated group, and let NVD and NVND denote the numbers of disease cases and non-disease cases in the non-vaccinated group, respectively. Two covariates are available: the geographic latitude of the place where the study was done and the year of publication. Here, we carry out a bivariate analysis of the log-odds of tuberculosis in the vaccinated and non-vaccinated arms. Let $y = (Y_{1}, Y_{2})^{⊤}$ where $Y_{1} = \log (VD / VND)$ and $Y_{2} = \log (NVD / NVND)$ , and let the covariates be $x = (1, x_{1}, x_{2})^{⊤}$ , where $x_{1} = Latitude - 33$ and $x_{2} = Year - 66$ . The Type II bivariate t regression model is applied to analyze this data set.

Table 6.

Data from clinical trials on the efficacy of BCG vaccine in the prevention of tuberculosis (see [26]).

	Vaccinated		Not Vaccinated
Trial	Disease	No disease	Disease	No disease	Latitude	Year
1	4	119	11	128	44	48
2	6	300	29	274	55	49
3	3	228	11	209	42	60
4	62	13536	248	12619	52	77
5	33	5036	47	5761	13	73
6	180	1361	372	1079	44	53
7	8	2537	10	619	19	73
8	505	87886	499	87892	13	80
9	29	7470	45	7232	27	68
10	17	1699	65	1600	42	61
11	186	50448	141	27197	18	74
12	5	2493	3	2338	33	69
13	27	16886	29	17825	33	76

By adopting the bivariate t regression model specified by (21) and setting the initial values of $(β, Σ, ν)$ as

β^{(0)} = 0.1 \times 1_{6}, Σ^{(0)} = Var (y) = (\begin{array}{cc} 1.4994 & 1.8882 \\ 1.8882 & 2.7850 \end{array}), ν^{(0)} = 1_{2},

we calculate the MLEs of these parameters by using the Monte Carlo ECM algorithm (22), (13) and (23), which converged to $(\hat{β}, \hat{Σ}, \hat{ν})$ in 51 iterations with precision $δ = 10^{- 2}$ . These results are summarized in Table 7.
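The responses, covariates and starting value $Σ^{(0)} = Var (y)$ can be reconstructed directly from Table 6. The sketch below assumes the unbiased sample covariance (divisor n − 1), which appears to match the reported matrix up to rounding. Variable names are illustrative.

```python
import numpy as np

# Columns: VD, VND, NVD, NVND, latitude, year (rows follow Table 6)
trials = np.array([
    [4, 119, 11, 128, 44, 48],
    [6, 300, 29, 274, 55, 49],
    [3, 228, 11, 209, 42, 60],
    [62, 13536, 248, 12619, 52, 77],
    [33, 5036, 47, 5761, 13, 73],
    [180, 1361, 372, 1079, 44, 53],
    [8, 2537, 10, 619, 19, 73],
    [505, 87886, 499, 87892, 13, 80],
    [29, 7470, 45, 7232, 27, 68],
    [17, 1699, 65, 1600, 42, 61],
    [186, 50448, 141, 27197, 18, 74],
    [5, 2493, 3, 2338, 33, 69],
    [27, 16886, 29, 17825, 33, 76],
], dtype=float)
vd, vnd, nvd, nvnd, lat, year = trials.T

# Log-odds of disease in the vaccinated and non-vaccinated arms
y = np.column_stack([np.log(vd / vnd), np.log(nvd / nvnd)])
# Design row x = (1, Latitude - 33, Year - 66) for each trial
X = np.column_stack([np.ones(len(trials)), lat - 33.0, year - 66.0])

Sigma0 = np.cov(y, rowvar=False)  # sample covariance with divisor n - 1
sample_corr = np.corrcoef(y, rowvar=False)[0, 1]
print(np.round(Sigma0, 4), round(sample_corr, 3))
```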

Table 7.

MLEs and posterior estimates of parameters for the tuberculosis vaccine data.

			Posterior	Posterior	95% Bayesian
Parameter	Coefficients	MLE	Mean	Std	Credible Interval
	Constant	−5.006	−5.017	0.190	[−5.408, −4.606]
$β_{1}$	$x_{1}$	0.009	0.005	0.021	[−0.031,0.046]
	$x_{2}$	−0.080	−0.085	0.028	[−0.128, −0.026]
	Constant	−4.137	−4.139	0.243	[−4.652, −3.685]
$β_{2}$	$x_{1}$	0.043	0.035	0.032	[−0.023, 0.116]
	$x_{2}$	−0.078	−0.090	0.041	[−0.168, −0.003]
	$σ_{11}$	0.445	0.494	0.192	[0.235, 1.005]
$Σ$	$σ_{12}$	0.510	0.519	0.159	[0.245, 0.812]
	$σ_{22}$	0.649	0.690	0.220	[0.305, 1.085]
$ν$	$ν_{1}$	13.834	12.870	0.581	[12.027, 13.923]
	$ν_{2}$	3.062	3.234	0.141	[3.018, 3.479]

To illustrate the Bayesian methods proposed in Section 4, we assign the non-informative prior in (24) to $(β, Σ)$ (i.e. $p (β, Σ) \propto | Σ |^{- \frac{d + 1}{2}}$ ) and the prior (26) to $ν$ , and generate 1000 posterior samples of $(β, Σ, ν)$ by using Gibbs sampling embedded with the AR algorithm. After discarding the first half of these samples, we calculate the posterior means, the posterior standard deviations and the 95% Bayesian credible intervals shown in the last three columns of Table 7.

Now suppose that we want to test the independence between $Y_{1}$ and $Y_{2}$ , i.e. to test whether $σ_{12}$ is zero or not. Under the null hypothesis, we have $Y_{i j} \sim t (x_{j}^{⊤} β_{i}, σ_{i i}, ν_{i})$ for $j = 1, \dots, n$ and i = 1, 2. That is, we fit each component by the univariate t regression model. The parameters for each univariate distribution are estimated by

\begin{aligned} {\hat{β}}_{1} & = (- 4.9466, - 0.0038, - 0.0809)^{⊤}, {\hat{σ}}_{11} = 0.2683, {\hat{ν}}_{1} = 3.6620, \\ {\hat{β}}_{2} & = (- 3.8122, 0.0275, - 0.0582)^{⊤}, {\hat{σ}}_{22} = 0.0012, {\hat{ν}}_{2} = 0.3679. \end{aligned}

From (18)–(19), the LRT statistic is t = 9.6778 and the corresponding p-value is 0.0019 < 0.05. Thus, the null hypothesis should be rejected.
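The reported p-value can be checked against a chi-square reference distribution with one degree of freedom, the same reference used for the sport data in Section 6.1. The sketch relies on the identity $P (χ^{2} (1) > t) = erfc (\sqrt{t / 2})$ , which holds because a $χ^{2} (1)$ variable is the square of a standard normal one. The function name is illustrative.

```python
import math

def chi2_sf_1df(t):
    """Upper-tail probability P(chi-square(1) > t) for t >= 0."""
    return math.erfc(math.sqrt(t / 2.0))

p_vaccine = chi2_sf_1df(9.6778)   # LRT statistic for the vaccine data
p_sport = chi2_sf_1df(144.9212)   # LRT statistic for the (BMI, LBM) pair
print(round(p_vaccine, 4), p_sport)
```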

Moreover, from (11), the estimated correlation coefficient between $Y_{1}$ and $Y_{2}$ is $r = 0.7487$ , which is acceptable compared with the sample correlation coefficient of 0.924 and indicates a positive correlation between $Y_{1}$ and $Y_{2}$ . Kendall's tau and Spearman's rho for the rank correlation between $Y_{1}$ and $Y_{2}$ are $r_{k} = 0.7692$ and $r_{s} = 0.8956$ , which agree with this result. The difference between the estimates of $ν_{1}$ and $ν_{2}$ from the marginal analysis indicates distinct amounts of tail weight for $Y_{1}$ and $Y_{2}$ ; thus the Type II bivariate t regression model provides a more reasonable fit to this data set than the Type I model.

From the regression results, we know that the covariate $x_{1}$ is insignificant while $x_{2}$ is significant. To get a final model, we still need some extra work. We use this data set just for demonstrating that the proposed model is applicable to a regression analysis. Furthermore, incorporating covariates into the scale structure could be a potential future work.

7. Discussions

To overcome three obvious disadvantages associated with the classical MVT distribution, in this paper we introduced a Type II MVT distribution as a new robust alternative to the multivariate normal distribution. This new distribution has three noteworthy characteristics: (1) all components follow univariate t-distributions with not necessarily identical degrees of freedom; (2) it is applicable to multivariate data in which some components are marginally t distributed while others are approximately normally distributed, depending on whether the corresponding $ν_{i}$ is small or large; and (3) it can contain some dependent components and some statistically independent components. Therefore, the proposed distribution is more flexible in model specification. As confirmed by the two real data sets presented in Section 6, variables can be moderately or strongly correlated while having very different tail weights from the marginal perspective. In such cases, both the dependence structure and the differing amounts of tail weight should be considered. The classical MVT distribution has the conspicuous drawback of imposing the same degrees of freedom on all components. In contrast, the proposed Type II MVT distribution covers both aspects; it is thus superior to existing models and performs better in our two real data analyses.

In general, although the joint density of the Type II MVT random vector does not have a closed-form expression, the SR (6) is very useful in the derivation of the marginal distributions, the mixed moments, and the Monte Carlo ECM algorithm. From characteristic (1) in Section 2, we have shown that each component follows a univariate t-distribution. In fact, any sub-vector of $x$ , say $(X_{i_{1}}, X_{i_{2}}, \dots, X_{i_{r}})^{⊤}$ , follows $t_{r}^{(I I)} (μ^{*}, Σ^{*}, ν^{*})$ , where $μ^{*} = (μ_{i_{1}}, μ_{i_{2}}, \dots, μ_{i_{r}})^{⊤}$ , $Σ^{*}$ is the sub-matrix consisting of rows $i_{1}, i_{2}, \dots, i_{r}$ and columns $i_{1}, i_{2}, \dots, i_{r}$ of $Σ$ , and $ν^{*} = (ν_{i_{1}}, ν_{i_{2}}, \dots, ν_{i_{r}})^{⊤}$ . We also applied the Type II MVT distribution to the linear regression model by adopting it as the joint distribution of the error terms, taking advantage of its flexibility to better capture the characteristics of the data when outliers exist.

Further extension of (6) can be considered as follows:

X_{i} = μ_{i} + \frac{Z_{i}}{\sqrt{(V + V_{i}) / (ν + ν_{i})}}, i = 1, \dots, d,

(29)

where $z = (Z_{1}, \dots, Z_{d})^{⊤} \sim N_{d} (0, Σ)$ , $V \sim χ^{2} (ν)$ , $V_{i} \sim χ^{2} (ν_{i})$ for $i = 1, \dots, d$ , and $(z, V, V_{1}, \dots, V_{d})$ are mutually independent. In particular, in (29) when $ν_{1} = \dots = ν_{d} = 0$ , it reduces to Type I MVT distribution; and when $ν = 0$ , it reduces to Type II MVT distribution. Note that the Type I MVT distribution (2) and the proposed Type II MVT distribution (6) are non-nested. In (29), the dependency of the random vector $x = (X_{1}, \dots, X_{d})^{⊤}$ can possibly come from both the multivariate normal vector $z$ and the common Gamma random variable V. Besides, compared with the two models in (4)–(5), although similar in expressions, the model (29) will bring convenience in parameter estimations without embedding the constraint as stated in Remark 2.1. When analyzing real data, we can start with this general model (29), and turn to the reduced model based on the estimation results. The model (29) includes the Type I and Type II MVT distributions as special cases, thus it is much more convenient and useful for model selection.
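A random-number generator based on the representation (29) can be sketched as follows. This is a minimal illustration under stated assumptions: a chi-square variable with zero degrees of freedom is treated as identically zero, so that setting the common degrees of freedom to zero gives Type II samples and setting all $ν_{i}$ to zero gives Type I samples. The function and argument names are hypothetical.

```python
import numpy as np

def sample_general_mvt(n, mu, Sigma, nu_common, nu_vec, rng):
    """Draw n vectors X_i = mu_i + Z_i / sqrt((V + V_i) / (nu + nu_i))."""
    mu = np.asarray(mu, dtype=float)
    nu_vec = np.asarray(nu_vec, dtype=float)
    if np.any(nu_common + nu_vec <= 0):
        raise ValueError("nu + nu_i must be positive for every component")
    d = len(mu)
    z = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # (n, d)
    # Common chi-square factor V; taken as 0 when nu_common == 0
    v = rng.chisquare(nu_common, size=n) if nu_common > 0 else np.zeros(n)
    # Component-specific factors V_i; taken as 0 where nu_i == 0
    vi = np.zeros((n, d))
    pos = nu_vec > 0
    if pos.any():
        vi[:, pos] = rng.chisquare(nu_vec[pos], size=(n, int(pos.sum())))
    scale = np.sqrt((v[:, None] + vi) / (nu_common + nu_vec))
    return mu + z / scale

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
x_type2 = sample_general_mvt(100, [0.0, 0.0], Sigma, 0.0, [3.0, 10.0], rng)
x_type1 = sample_general_mvt(100, [0.0, 0.0], Sigma, 3.0, [0.0, 0.0], rng)
```

The two calls mirror the bivariate simulation settings with n = 100, zero mean and $σ_{12} = 0.5$ .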

As a comparison with the model proposed in [10], similarly we decompose the scale matrix into $Σ = D A D^{⊤}$ , where $D$ is the matrix of eigenvectors of $Σ$ and $A$ is a diagonal matrix whose entries are the eigenvalues of $Σ$ , then the vector $x$ in (6) can be reformulated as

x = μ + U^{- 1 / 2} D A^{1 / 2} z_{0},

(30)

where $z_{0} = (Z_{10}, \dots, Z_{d 0})^{⊤}$ is a d-dimensional Gaussian random vector with zero mean vector and identity covariance matrix. Note that the model in (30) differs from the formulation in [10] because matrix multiplication is not commutative in general; the two are equivalent only when $D$ is a diagonal matrix.
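The reformulation (30) and the role of non-commutativity can be checked numerically. In the sketch below, the values of $Σ$ and $u$ are arbitrary illustrative choices, and the "swapped" ordering is shown only to demonstrate that moving the weight matrix past $D$ changes the conditional covariance; it is not claimed to reproduce the model of [10] exactly.

```python
import numpy as np

Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
eigvals, D = np.linalg.eigh(Sigma)   # Sigma = D A D^T with A = diag(eigvals)
A_half = np.diag(np.sqrt(eigvals))

u = np.array([0.5, 2.0])             # one fixed realization of the weights
U_inv_half = np.diag(u ** -0.5)

# Conditional covariance implied by (30): M M^T with M = U^{-1/2} D A^{1/2}
M = U_inv_half @ D @ A_half
cov_30 = M @ M.T
cov_target = U_inv_half @ Sigma @ U_inv_half

# Swapping the order of U^{-1/2} and D gives a different covariance
M_swapped = D @ U_inv_half @ A_half
cov_swapped = M_swapped @ M_swapped.T

print(np.allclose(cov_30, cov_target), np.allclose(cov_swapped, cov_target))
```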

Acknowledgments

The authors are grateful to the editor and the anonymous reviewer for their valuable comments and suggestions.

Appendix. Derivation of joint pdf $f_{x} (x)$ .

Based on the mixture expressions in (7), given the conditional distribution of $x ∣ (u = u)$ and the marginal distributions of $U_{i}$ 's, the joint pdf of $x$ is given by

\begin{aligned} f_{x} (x) & = \int_{R_{+}^{d}} f_{x | u} (x | u) f_{u} (u) d u \\ = \int_{R_{+}^{d}} {(2 π)}^{- \frac{d}{2}} | U^{- 1 / 2} Σ U^{- 1 / 2} |^{- \frac{1}{2}} \exp [- \frac{1}{2} (x - μ)^{⊤} (U^{- 1 / 2} Σ U^{- 1 / 2})^{- 1} (x - μ)] \\ \times \prod_{i = 1}^{d} \frac{(\frac{ν_{i}}{2})^{\frac{ν_{i}}{2}}}{Γ (\frac{ν_{i}}{2})} u_{i}^{\frac{ν_{i}}{2} - 1} e^{- \frac{ν_{i} u_{i}}{2}} d u \\ = {(2 π)}^{- \frac{d}{2}} | Σ |^{- \frac{1}{2}} [\prod_{i = 1}^{d} \frac{(\frac{ν_{i}}{2})^{\frac{ν_{i}}{2}}}{Γ (\frac{ν_{i}}{2})}] \\ \times \int_{R_{+}^{d}} (\prod_{i = 1}^{d} u_{i}^{\frac{ν_{i} - 1}{2}}) \exp {- \frac{1}{2} [{(u^{1 / 2})}^{⊤} X^{*} Σ^{- 1} X^{*} u^{1 / 2} + \sum_{i = 1}^{d} ν_{i} u_{i}]} d u . \end{aligned}

By employing the transformation of $w_{i} = u_{i}^{1 / 2}$ for $i = 1, \dots, d$ and then $d u_{i} = 2 w_{i} d w_{i}$ , it follows that

\begin{aligned} f_{x} (x) & = {(2 π)}^{- \frac{d}{2}} | Σ |^{- \frac{1}{2}} [\prod_{i = 1}^{d} \frac{(\frac{ν_{i}}{2})^{\frac{ν_{i}}{2}}}{Γ (\frac{ν_{i}}{2})}] \\ \times \int_{R_{+}^{d}} (\prod_{i = 1}^{d} w_{i}^{ν_{i} - 1}) \exp {- \frac{1}{2} [w^{⊤} X^{*} Σ^{- 1} X^{*} w + \sum_{i = 1}^{d} ν_{i} w_{i}^{2}]} (\prod_{i = 1}^{d} 2 w_{i}) d w \\ = {(\frac{π}{2})}^{- \frac{d}{2}} | Σ |^{- \frac{1}{2}} [\prod_{i = 1}^{d} \frac{(\frac{ν_{i}}{2})^{\frac{ν_{i}}{2}}}{Γ (\frac{ν_{i}}{2})}] \int_{R_{+}^{d}} (\prod_{i = 1}^{d} w_{i}^{ν_{i}}) \exp [- \frac{1}{2} w^{⊤} (X^{*} Σ^{- 1} X^{*} + Σ_{0}) w] d w \\ = {(\frac{π}{2})}^{- \frac{d}{2}} | Σ |^{- \frac{1}{2}} [\prod_{i = 1}^{d} \frac{(\frac{ν_{i}}{2})^{\frac{ν_{i}}{2}}}{Γ (\frac{ν_{i}}{2})}] \int_{R_{+}^{d}} (\prod_{i = 1}^{d} w_{i}^{ν_{i}}) \exp [- \frac{1}{2} w^{⊤} Σ^{* - 1} w] d w \\ = {(\frac{π}{2})}^{- \frac{d}{2}} | Σ |^{- \frac{1}{2}} [\prod_{i = 1}^{d} \frac{(\frac{ν_{i}}{2})^{\frac{ν_{i}}{2}}}{Γ (\frac{ν_{i}}{2})}] \cdot C \int_{R_{+}^{d}} (\prod_{i = 1}^{d} w_{i}^{ν_{i}}) \cdot \frac{1}{C} \exp [- \frac{1}{2} w^{⊤} Σ^{* - 1} w] d w \\ = {(\frac{π}{2})}^{- \frac{d}{2}} | Σ |^{- \frac{1}{2}} [\prod_{i = 1}^{d} \frac{(\frac{ν_{i}}{2})^{\frac{ν_{i}}{2}}}{Γ (\frac{ν_{i}}{2})}] \cdot μ_{ν} (w) \cdot C, \end{aligned}

where $X^{*}$ and $Σ_{0}$ are defined in (10).

Funding Statement

Chi Zhang's research was supported by National Natural Science Foundation of China (Grant No. 11801380). Guo-Liang Tian's research was fully supported by National Natural Science Foundation of China (Grant No. 11771199). Kam Chuen Yuen's research was supported by a Seed Fund for Basic Research of the University of Hong Kong, and a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKU17306220). The work of Man-Lai Tang was partially supported through grants from the Research Grant Council of the Hong Kong Special Administrative Region (UGC/FDS14/P06/17, UGC/FDS14/P02/18, and the Research Matching Grant Scheme (RMGS)) and a grant from the National Natural Science Foundation of China (11871124). The computing facilities/software were supported by SAS Viya and the Big Data Intelligence Centre at Hang Seng University of Hong Kong.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Amos D.E. and Bulgren W.G., On the computation of a bivariate t-distribution, Math. Comp. 23 (1969), pp. 319–333.
2. Booth J.G. and Hobert J.P., Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm, J. R. Stat. Soc. Ser. B. Stat. Methodol. 61 (1999), pp. 265–285.
3. Cook R.D. and Weisberg S., An Introduction to Regression Graphics, John Wiley & Sons, New York, 2009.
4. Cornish E.A., The multivariate t-distribution associated with a set of normal sample deviates, Aust. J. Phys. 7 (1954), pp. 531–542.
5. Cornish E.A., The multivariate t-distribution associated with the general multivariate normal distribution, CSIRO Tech. Paper No. 13, CSIRO Division in Mathematics and Statistics, Adelaide, 1962.
6. Dunnett C.W. and Sobel M., A bivariate generalization of Student's t-distribution, with tables for certain special cases, Biometrika 41 (1954), pp. 153–169.
7. Fang H.B., Fang K.T. and Kotz S., The meta-elliptical distributions with given marginals, J. Multivariate Anal. 82 (2002), pp. 1–16.
8. Fernández C. and Steel M.F.J., Multivariate student-t regression models: Pitfalls and inference, Biometrika 86 (1999), pp. 153–167.
9. Finegold M. and Drton M., Robust graphical modeling of gene networks using classical and alternative t-distributions, Ann. Appl. Stat. 5 (2011), pp. 1057–1080.
10. Forbes F. and Wraith D., A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweight: Application to robust clustering, Stat. Comput. 24 (2014), pp. 971–984.
11. Hofert M., On sampling from the multivariate t distribution, R. J. 5 (2013), pp. 129–136.
12. Horrace W.C., Some results on the multivariate truncated normal distribution, J. Multivariate Anal. 94 (2005), pp. 209–221.
13. Jones M.C., A dependent bivariate t distribution with marginals on different degrees of freedom, Statist. Probab. Lett. 56 (2002), pp. 163–170.
14. Kotz S. and Nadarajah S., Multivariate t Distributions and Their Applications, Cambridge University Press, Cambridge, UK, 2004.
15. Levine R.A. and Casella G., Implementations of the Monte Carlo EM algorithm, J. Comput. Graph. Statist. 10 (2001), pp. 422–439.
16. Lin P.E., Some characterizations of the multivariate t distribution, J. Multivariate Anal. 2 (1972), pp. 339–344.
17. Liu C., Missing data imputation using the multivariate t distribution, J. Multivariate Anal. 53 (1995), pp. 139–158.
18. Liu C., Bayesian robust multivariate linear regression with incomplete data, J. Amer. Statist. Assoc. 91 (1996), pp. 1219–1227.
19. Liu C., ML estimation of the multivariate t distribution and the EM algorithm, J. Multivariate Anal. 63 (1997), pp. 296–312.
20. Liu C. and Rubin D.B., The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence, Biometrika 81 (1994), pp. 633–648.
21. Liu C. and Rubin D.B., ML estimation of the t distribution using EM and its extensions, ECM and ECME, Statist. Sinica 5 (1995), pp. 19–39.
22. McLachlan G.J., Bean R.W. and Jones L.B.T., Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution, Comput. Statist. Data Anal. 51 (2007), pp. 5327–5338.
23. Nadarajah S. and Kotz S., Mathematical properties of the multivariate t distribution, Acta. Appl. Math. 89 (2005), pp. 53–84.
24. Nadarajah S. and Kotz S., Estimation methods for the multivariate t distribution, Acta. Appl. Math. 102 (2008), pp. 99–118.
25. Relles D.A. and Rogers W.H., Statisticians are fairly robust estimators of location, J. Amer. Statist. Assoc. 72 (1977), pp. 107–111.
26. van Houwelingen H.C., Arends L.R. and Stijnen T., Advanced methods in meta-analysis: Multivariate approach and meta-regression, Stat. Med. 21 (2002), pp. 589–624.
27. von Neumann J., Various techniques used in connection with random digits, J. Res. Nat. Bur. Stand. Appl. Math. Series 3 (1951), pp. 36–38.
28. Wei G.C.G. and Tanner M.A., A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms, J. Amer. Statist. Assoc. 85 (1990), pp. 699–704.
29. Zellner A., Bayesian and non-Bayesian analysis of the regression model with multivariate student-t error terms, J. Amer. Statist. Assoc. 71 (1976), pp. 400–405.
