Author manuscript; available in PMC: 2025 Oct 1.
Published before final editing as: Commun Stat Theory Methods. 2025 Sep 11. doi: 10.1080/03610926.2025.2559116

Parameter-expanded data augmentation for analyzing multinomial probit models

Xiao Zhang 1
PMCID: PMC12483000  NIHMSID: NIHMS2113234  PMID: 41036164

Abstract

The multinomial probit model has been a prominent tool for analyzing nominal categorical data, but the computational complexity of its maximum likelihood function presents challenges to its use. Furthermore, the model's identification is extremely tenuous and usually requires the covariance matrix of the latent multivariate normal variables to be restricted, which poses a demanding task for both likelihood-based estimation and Markov chain Monte Carlo (MCMC) sampling. We tackle this issue by constructing a non-identifiable model and developing parameter-expanded data augmentation. Our proposed methods circumvent the sampling of a restricted covariance matrix, commonly implemented by a painstaking Metropolis-Hastings (MH) algorithm, and instead sample an unrestricted covariance matrix through a Gibbs sampler. Our proposed methods therefore considerably improve the convergence and mixing of the MCMC components. We investigate our proposed methods along with the method based on the identifiable model through simulation studies, and we further illustrate their performance through an application to consumer choice data on liquid laundry detergents.

Keywords: MCMC, multinomial probit model, non-identifiable model, parameter-expanded data augmentation

1. Introduction

Discrete choice models have been widely used to analyze categorical data, especially in econometrics and transportation research (Hausman and Wise 1978; Daganzo 1979). The multinomial logit (MNL) model and the multinomial probit (MNP) model are among the most popular discrete choice models. The MNL model assumes independence of irrelevant alternatives, which limits its use for categorical data with correlated levels (McFadden 1974, 1987). As an alternative, the MNP model allows a correlated structure among the latent utility scores underlying the different categories and has therefore seen extended use in categorical data analysis. However, the MNP model is challenging to use because of the computational complexity of its maximum likelihood function.

Various maximum likelihood estimation methods for the MNP model have been developed and investigated over the last several decades. Berndt, Hall, Hall, and Hausman (1974) proposed a minimum distance method; Clark (1961), Horowitz et al. (1982) and Kamakura (1989) explored approximation methods; McFadden (1989) and Pakes and Pollard (1989) developed the method of simulated moments estimator; and Geweke (1991), Hajivassiliou and McFadden (1990) and Keane (1992, 1994) conducted seminal work on simulated maximum likelihood. Other methodological work, including model estimability and identification, can be found in Bunch (1991), Bunch and Kitamura (1991), Geweke et al. (1994), Hajivassiliou et al. (1996), and Train (2009).

With the theoretical development of the Gibbs sampler by Gelfand and Smith (1990) and the maturation of Markov chain theory (Tanner and Wong 1987; Smith and Roberts 1993; Tierney 1994; Gilks et al. 1996; Gelman et al. 2013), MCMC methods have been popularized and have become a general computational tool in Bayesian inference. Pioneering contributions using MCMC to estimate the MNP model include Albert and Chib (1993), Geweke (1991), McCulloch and Rossi (1994), Geweke et al. (1994) and Chib et al. (1998). Albert and Chib (1993), Chib et al. (1998), Zhang et al. (2008) and Chan and Jeliazkov (2009) employed the MH algorithm to sample the restricted covariance matrix in the MNP model, while McCulloch and Rossi (1994) and Geweke et al. (1994) considered Gibbs sampling of a covariance matrix without restriction. Nobile (1998) added a Metropolis step along a direction of constant likelihood to the approach of McCulloch and Rossi (1994) to improve the convergence and mixing of the MCMC components. Chen and Kuo (2002) and Edwards and Allenby (2003) also considered the same model and conducted MCMC analyses. McCulloch et al. (2000) specified a prior on the identified parameters, enabling Gibbs sampling for the identifiable MNP model. These Bayesian methods avoid direct evaluation of the likelihood and show advantages over frequentist methods (Albert and Chib 1993; McCulloch and Rossi 1994; Geweke et al. 1994).

Liu and Wu (1999) proposed parameter-expanded data augmentation, which expands the parameter space by including a working/artificial parameter or parameter vector and thereby makes the identifiable model non-identifiable. They proved that, under mild conditions, the expanded data augmentation based on the non-identifiable model converges no more slowly than that based on the identifiable model. Another major line of work on extending the parameter space to accelerate the convergence of data augmentation is offered by Meng and van Dyk (1999). Liu et al. (1998) explored parameter expansion for the expectation-maximization algorithm; MacEachern (2007) illustrated this issue using the mixture of Dirichlet process model; and Lawrence et al. (2008) and Zhang (2020, 2022) investigated the matter using multivariate probit models.

Imai and van Dyk (2005a, b) developed parameter-expanded data augmentation for the MNP model. Building on Imai and van Dyk (2005a), Burgette and Nordheim (2012) and Burgette et al. (2021) developed MCMC algorithms based on a trace restriction and a symmetric prior for the MNP model, respectively. Jiao and van Dyk (2015) pointed out errors in the work of Imai and van Dyk (2005b), and Burgette et al. (2021) claimed to have carefully corrected the errors in their own work. However, all of these works are based on the same non-identifiable MNP model, and the derived data augmentation ignores the working parameter(s) entangled in both the regression parameters and the covariance matrix and therefore produces errors.

In this investigation, we construct a non-identifiable MNP model and develop correct and efficient parameter-expanded data augmentation for analyzing the MNP model. We point out the error in the work of Imai and van Dyk (2005a, 2005b) and in the work related to theirs. The article is organized as follows. The MNP model and the constructed non-identifiable model are presented in Section 2. Section 3 contains two proposed parameter-expanded data augmentation algorithms. We then illustrate our methods through simulation studies and compare them with the method based on the identifiable MNP model in Section 4. An application to the consumer choice data on liquid laundry detergents is demonstrated in Section 5. Finally, discussion and conclusions are offered in Section 6.

2. The MNP model

Let $i=1,\ldots,n$ index subjects and $j=1,\ldots,k+1$ index the levels of a multinomial/categorical outcome with $k+1$ levels. We let $y_{ij}=1$ if subject $i$ chooses category level $j$ and $y_{ij}=0$ otherwise, so that $y_i=(y_{i1},\ldots,y_{i(k+1)})$ is a multinomial $1\times(k+1)$ vector. More compactly, we define $d=(d_1,\ldots,d_n)^T$, where $d_i$ contains the index of the category level chosen by subject $i$, i.e., $d_i=j$ if $y_{ij}=1$. For example, for a three-category variable, if subject $i$ chooses category level 2, then $y_i=(y_{i1},y_{i2},y_{i3})=(0,1,0)$ and $d_i=2$.
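For concreteness, the following is a minimal sketch of this outcome coding in Python; it simply reproduces the three-category illustration above.

```python
# A tiny sketch of the outcome coding: indicator vector y_i and chosen-category index d_i.
import numpy as np

y_i = np.array([0, 1, 0])       # subject i chooses category level 2
d_i = int(np.argmax(y_i)) + 1   # d_i = 2 (1-based index of the chosen level)
```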

The MNP model assumes that there is a latent vector $u_i=(u_{i1},\ldots,u_{i(k+1)})$ underlying each multinomial vector $y_i$, such that the multinomial outcome is determined by the maximum $u_{ij}$. That is,

$$d_i = j \iff u_{ij} = \max_{1\le l\le k+1} u_{il}. \tag{1}$$

The MNP model further assumes that the vector $u_i$ follows a multivariate normal distribution with mean $A_i\beta$ and covariance matrix $V$, where $A_i$ is a $(k+1)\times p$ covariate matrix for subject $i$ and $\beta$ is a $p\times 1$ regression parameter vector. With this notation,

$$u_i = A_i\beta + \delta_i, \tag{2}$$

where $\delta_i \sim N_{k+1}(0,V)$. Detailed specifications of the covariate matrix $A_i$ can be found in Keane (1992) and Geweke et al. (1994).
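To make the data-generating mechanism concrete, the following is a minimal sketch of one draw from equations (1)-(2); the dimensions and the values of $A_i$, $\beta$ and $V$ below are illustrative assumptions, not quantities from the paper.

```python
# A minimal sketch of simulating one subject from the MNP model in equations (1)-(2).
import numpy as np

rng = np.random.default_rng(0)

k_plus_1, p = 3, 2                        # assumed: 3 categories, 2 covariates
A_i = rng.normal(size=(k_plus_1, p))      # (k+1) x p covariate matrix for subject i
beta = np.array([0.5, -1.0])              # p x 1 regression parameter vector
V = 0.5 * np.eye(k_plus_1) + 0.5          # latent covariance: diag 1.0, off-diag 0.5

u_i = A_i @ beta + rng.multivariate_normal(np.zeros(k_plus_1), V)  # equation (2)
d_i = int(np.argmax(u_i)) + 1             # equation (1): chosen category (1-based)
y_i = np.zeros(k_plus_1, dtype=int)
y_i[d_i - 1] = 1                          # multinomial indicator vector
```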

There are two identification problems in the above MNP model (Bunch 1991; Keane 1992; Train 2009). First, the model is unchanged if a constant is added to each element of $u_i$ in (1). This problem is usually solved by subtracting the $(k+1)$-th row of equation (2) from the first $k$ rows. The model becomes

$$Z_i = X_i\beta + \epsilon_i,$$

where $Z_{ij}=u_{ij}-u_{i(k+1)}$, $X_{ij}=A_{ij}-A_{i(k+1)}$, $\epsilon_{ij}=\delta_{ij}-\delta_{i(k+1)}$, $\epsilon_i \sim N_k(0,\Sigma_r)$, and $\Sigma_r=[I_k, -\mathbf{1}_k]\,V\,[I_k, -\mathbf{1}_k]^T$, with $I_k$ the identity matrix and $\mathbf{1}_k$ a vector of ones. The model can then be described as:

$$d_i = \begin{cases} 0 & \text{if } \max_{1\le l\le k} Z_{il} < 0, \\ j & \text{if } \max_{1\le l\le k} Z_{il} = Z_{ij} > 0. \end{cases}$$

Second, we notice that the model is still unchanged if $Z_{il}$, $l=1,\ldots,k$, are multiplied by a positive constant. The usual way to solve this issue is to restrict the first diagonal element of $\Sigma_r$ to equal 1. The model can then be defined as follows:

$$d_i = \begin{cases} 0 & \text{if } \max_{1\le l\le k} Z_{il} < 0, \\ j & \text{if } \max_{1\le l\le k} Z_{il} = Z_{ij} > 0, \end{cases} \tag{3}$$

where $Z_i \sim N_k(X_i\beta, \Sigma_r)$ and the first diagonal element of $\Sigma_r$ equals 1.
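The differencing transformation and the scale restriction can be sketched as follows; the covariance $V$ is an illustrative assumption.

```python
# A sketch of the differencing step that removes location invariance: subtract the
# (k+1)-th row of equation (2) from the first k rows, giving Sigma_r = [I_k, -1_k] V [I_k, -1_k]^T;
# the scale is then fixed by forcing the first diagonal element of Sigma_r to 1.
import numpy as np

k = 2
V = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])                 # assumed (k+1) x (k+1) latent covariance
D = np.hstack([np.eye(k), -np.ones((k, 1))])    # [I_k, -1_k], shape k x (k+1)
Sigma_r = D @ V @ D.T                           # covariance of eps_i = delta_ij - delta_i(k+1)
Sigma_r /= Sigma_r[0, 0]                        # impose the restriction: first element = 1

def category(Z_i):
    """Model (3): 0 if all differenced utilities are negative, else 1-based argmax."""
    return 0 if Z_i.max() < 0 else int(np.argmax(Z_i)) + 1
```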

Based on Liu and Wu (1999), Meng and van Dyk (1999) and MacEachern (2007), non-identifiable models may improve the convergence and mixing of MCMC components compared with identifiable models. We therefore construct a non-identifiable MNP model as

$$d_i = \begin{cases} 0 & \text{if } \max_{1\le l\le k} Z_{il} < 0, \\ j & \text{if } \max_{1\le l\le k} Z_{il} = Z_{ij} > 0, \end{cases} \tag{4}$$

with $Z_i \sim N_k(\sigma_{11}^{-1/2} X_i\beta, \Sigma_r)$. As can be seen, the model is non-identifiable because of the redundant/artificial parameter $\sigma_{11}$. Let $W_i = \sqrt{\sigma_{11}}\, Z_i$; then $W_i \sim N_k(X_i\beta, \sigma_{11}\Sigma_r)$. Accordingly, the model can be defined as

$$d_i = \begin{cases} 0 & \text{if } \max_{1\le l\le k} W_{il} < 0, \\ j & \text{if } \max_{1\le l\le k} W_{il} = W_{ij} > 0, \end{cases}$$

with $W_i \sim N_k(X_i\beta, \sigma_{11}\Sigma_r)$. Notice that $\sigma_{11}\Sigma_r$ is a covariance matrix without restriction; we denote $\Sigma = \sigma_{11}\Sigma_r$.
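A small sketch of this parameter expansion and of how the identified quantities are recovered from an unrestricted draw; the numerical values are illustrative assumptions.

```python
# A sketch of the expansion Sigma = sigma_11 * Sigma_r: an unrestricted covariance maps
# back to the identified scale by dividing by its first diagonal element, and the
# regression parameters rescale by 1/sqrt(sigma_11).
import numpy as np

Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])          # assumed unrestricted draw from the expanded model
beta = np.array([1.2, -0.8])            # assumed expanded-model regression parameters

sigma_11 = Sigma[0, 0]                  # the working/artificial parameter
Sigma_r = Sigma / sigma_11              # identified covariance: first diagonal element = 1
beta_identified = beta / np.sqrt(sigma_11)
```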

3. Parameter-expanded data augmentation

Continuing with the notation of Section 2, the joint posterior density of $\beta$, $\Sigma$, and $W=(W_1,\ldots,W_n)$ given $d=(d_1,\ldots,d_n)$ can be written as

$$P(\beta,\Sigma,W \mid d) \propto p(\beta)\times p(\Sigma)\times \prod_{i=1}^n I_i\,\phi(W_i;\, X_i\beta, \Sigma),$$

where $\phi$ is the multivariate normal density function and

$$I_i = \begin{cases} 1 & \text{if } d_i=0 \text{ and } w_{il}<0,\ l=1,\ldots,k, \text{ or } d_i=h \text{ and } w_{ih}=\max_{1\le l\le k} w_{il}>0, \\ 0 & \text{otherwise.} \end{cases}$$

The MCMC sampling steps can then be derived as follows:

  • Step 1: $\beta \mid (\Sigma, W, d) \sim N_p(\hat\beta, V_\beta)$, where $V_\beta=\left(\sum_{i=1}^n X_i^T\Sigma^{-1}X_i + C^{-1}\right)^{-1}$ and $\hat\beta=V_\beta\left(\sum_{i=1}^n X_i^T\Sigma^{-1}W_i + C^{-1}b\right)$, assuming the prior of $\beta$ is $N_p(b, C)$.

  • Step 2: $P(W_{ij} \mid \beta, \Sigma, d_i, W_{i(-j)}) \propto I_i \times P(W_{ij} \mid \beta, \Sigma, W_{i(-j)})$, which is a truncated univariate normal distribution.

  • Step 3: $\Sigma \mid (\beta, W, d) \sim \text{Inverse-Wishart}_k\left(\sum_{i=1}^n (W_i - X_i\beta)(W_i - X_i\beta)^T + V,\ n+m\right)$, assuming the conjugate prior $\Sigma \sim \text{Inverse-Wishart}_k(V, m)$.
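For illustration, the following is a minimal sketch of one sweep of Steps 1-3 in Python (NumPy/SciPy). It is a sketch under stated assumptions, not the author's implementation: the function name px_gs_sweep is ours, Step 2 uses standard multivariate-normal conditioning, and the truncation bounds encode the indicator $I_i$.

```python
# One sweep of the PX-GS sampler: beta (Step 1), each W_ij (Step 2), Sigma (Step 3).
import numpy as np
from scipy.stats import invwishart, truncnorm

def px_gs_sweep(W, X, d, Sigma, b, C, V0, m, rng):
    n, k = W.shape                                   # W: n x k latent matrix; X: n x k x p
    Sig_inv = np.linalg.inv(Sigma)
    C_inv = np.linalg.inv(C)
    # Step 1: beta | Sigma, W, d ~ N_p(beta_hat, V_beta).
    V_beta = np.linalg.inv(sum(X[i].T @ Sig_inv @ X[i] for i in range(n)) + C_inv)
    beta_hat = V_beta @ (sum(X[i].T @ Sig_inv @ W[i] for i in range(n)) + C_inv @ b)
    beta = rng.multivariate_normal(beta_hat, V_beta)
    # Step 2: each W_ij | rest is a univariate truncated normal on the region defined by I_i.
    for i in range(n):
        mu_i = X[i] @ beta
        for j in range(k):
            rest = [l for l in range(k) if l != j]
            S12 = Sigma[j, rest]
            S22_inv = np.linalg.inv(Sigma[np.ix_(rest, rest)])
            cm = mu_i[j] + S12 @ S22_inv @ (W[i, rest] - mu_i[rest])   # conditional mean
            cs = np.sqrt(Sigma[j, j] - S12 @ S22_inv @ S12)            # conditional sd
            lo, hi = -np.inf, np.inf
            if d[i] == 0:                    # baseline chosen: all components negative
                hi = 0.0
            elif d[i] == j + 1:              # component j must be the positive maximum
                lo = max(W[i, rest].max(), 0.0)
            else:                            # component j must stay below the chosen one
                hi = W[i, d[i] - 1]
            W[i, j] = truncnorm.rvs((lo - cm) / cs, (hi - cm) / cs,
                                    loc=cm, scale=cs, random_state=rng)
    # Step 3: Sigma | beta, W, d ~ Inverse-Wishart_k(residual SS + V0, n + m).
    R = W - np.stack([X[i] @ beta for i in range(n)])
    Sigma = invwishart.rvs(df=n + m, scale=R.T @ R + V0, random_state=rng)
    return beta, W, Sigma
```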

We denote this algorithm the parameter-expanded Gibbs sampling (PX-GS) algorithm. To further improve the convergence and mixing of the Gibbs sampler (Liu 1994; Liu and Wu 1999), we marginalize $\sigma_{11}$ by jointly sampling $(W,\sigma_{11})$ given $(\beta,\Sigma_r,d)$ and then $(\beta,\Sigma_r,\sigma_{11})$ given $(W,d)$:

  • Sampling $(W, \sigma_{11}) \mid (\beta, \Sigma_r, d)$ by first sampling $\sigma_{11} \mid (\beta, \Sigma_r, d) \equiv \sigma_{11} \mid \Sigma_r$, the conditional prior of $\sigma_{11}$ given $\Sigma_r$, which can be derived as $\sigma_{11} \mid \Sigma_r \sim \operatorname{tr}(V\Sigma_r^{-1})/\chi^2_{mk}$, with $\operatorname{tr}$ denoting the trace of a matrix and $\chi^2_{mk}$ a chi-squared random variable with $mk$ degrees of freedom; then sampling $W \mid (\beta, \Sigma_r, \sigma_{11}, d) \equiv W \mid (\beta, \Sigma, d)$, which is Step 2.

  • Sampling $(\beta, \Sigma_r, \sigma_{11}) \mid (W, d)$ by first sampling $\beta \mid (\Sigma_r, \sigma_{11}, W, d) \equiv \beta \mid (\Sigma, W, d)$, which is Step 1, then sampling $(\Sigma_r, \sigma_{11}) \mid (\beta, W, d) \equiv \Sigma \mid (\beta, W, d)$, which is Step 3.

We call this algorithm parameter-expanded Gibbs sampling with marginalization for the MNP model (PX-GMN).
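A minimal sketch of the extra marginalization draw, $\sigma_{11} \mid \Sigma_r \sim \operatorname{tr}(V\Sigma_r^{-1})/\chi^2_{mk}$, implied by the prior $\Sigma \sim \text{Inverse-Wishart}_k(V, m)$; the function name draw_sigma11 is ours.

```python
# Draw sigma_11 from its conditional prior given Sigma_r: tr(V Sigma_r^{-1}) / chi^2_{mk}.
import numpy as np

def draw_sigma11(Sigma_r, V, m, rng):
    k = Sigma_r.shape[0]
    numer = np.trace(V @ np.linalg.inv(Sigma_r))   # tr(V Sigma_r^{-1})
    return numer / rng.chisquare(df=m * k)         # chi-squared with mk degrees of freedom
```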

Imai and van Dyk (2005a, b) proposed marginal data augmentation by assuming that $\sqrt{\sigma_{11}}\,Z_i=\sqrt{\sigma_{11}}(Z_{i1},\ldots,Z_{ik})^T$ follows $N_k(\sqrt{\sigma_{11}}\,X_i\beta,\ \sigma_{11}\Sigma_r)$ for $i=1,\ldots,n$. The joint posterior density of $\beta,\Sigma_r,\sigma_{11}$, and $W=(W_1,\ldots,W_n)$ given the observed $d$ can be derived as follows:

$$P(\beta,\Sigma_r,\sigma_{11},W \mid d) \propto \prod_{i=1}^n I_i \times P(\beta)\times P(\Sigma_r,\sigma_{11}) \times \sigma_{11}^{-nk/2}\,|\Sigma_r|^{-n/2} \exp\left\{-\frac{1}{2}\sum_{i=1}^n \left(W_i-\sqrt{\sigma_{11}}\,X_i\beta\right)^T \sigma_{11}^{-1/2}\,\Sigma_r^{-1}\,\sigma_{11}^{-1/2}\left(W_i-\sqrt{\sigma_{11}}\,X_i\beta\right)\right\}.$$

As can be seen, the posterior conditional distribution of $\beta$ is multivariate normal, and the posterior conditional distribution of each $W_{ij}$ is truncated normal. The posterior density of $\Sigma$ can be written as follows:

$$P(\Sigma \mid \beta,W,d) \propto P(\Sigma)\times|\Sigma|^{-n/2}\exp\left\{-\frac{1}{2}\sum_{i=1}^n \left(W_i-\sqrt{\sigma_{11}}\,X_i\beta\right)^T \Sigma^{-1}\left(W_i-\sqrt{\sigma_{11}}\,X_i\beta\right)\right\}.$$

As shown, although the prior of $\Sigma$, $P(\Sigma)$, can be an Inverse-Wishart distribution, the posterior density of $\Sigma$ cannot be Inverse-Wishart because $\sigma_{11}$ is entangled in the terms $\sqrt{\sigma_{11}}\,X_i\beta$. Therefore, the MCMC sampling algorithms derived in Imai and van Dyk (2005a, b) and in the related work (Burgette and Nordheim 2012; Burgette et al. 2021; Jiao and van Dyk 2015) contain errors. Algorithm 1 in Imai and van Dyk (2005a) can be restated in the following steps:

  • Sampling $(W, \sigma_{11}) \mid (\beta, \Sigma_r, d)$ by sampling $\tilde{W} \mid (\tilde{\beta}, \Sigma, d)$, followed by sampling $\sigma_{11} \mid \Sigma_r$, then setting $W=\sqrt{\sigma_{11}}\,\tilde{W}$.

  • Sampling $(\beta, \sigma_{11}) \mid (W, \Sigma, d)$ by sampling $\sigma_{11} \mid (W, \Sigma_r, d)$, then $\beta \mid (W, \Sigma, d)$.

  • Sampling $\Sigma \mid (\beta, W, d)$.

Here the tilde notation indicates the identifiable parameters. As can be seen, $\sigma_{11}$ is marginalized over the above three steps instead of the two steps specified in the PX-GMN algorithm. However, because the correct joint posterior distribution is not derived, the derivation of $\sigma_{11} \mid (W, \Sigma_r, d)$ is erroneous. We continue by pointing out the theoretical flaw in Algorithm 2 of Imai and van Dyk (2005a). Specifically, Algorithm 2 can be rewritten as:

  • Sampling $\tilde{W} \mid (\tilde{\beta}, \Sigma, d)$, followed by sampling $\sigma_{11} \mid \Sigma_r$, then setting $W=\sqrt{\sigma_{11}}\,\tilde{W}$.

  • Sampling $\Sigma \mid (\beta, W, d)$.

  • Sampling $\tilde{\beta} \mid (\tilde{W}, \Sigma_r, d)$, with $\tilde{\beta}=\beta/\sqrt{\sigma_{11}}$.

As noted, the algorithm in essence jointly samples $(W, \Sigma, \tilde{\beta})$ given $d$. However, this is invalid: it samples the identifiable $\tilde{\beta}$ and the non-identifiable $(W, \Sigma)$ jointly, instead of jointly sampling either the identifiable $(\tilde{W}, \Sigma_r, \tilde{\beta})$ based on the identifiable model or the non-identifiable $(W, \Sigma, \beta)$ based on the non-identifiable model, as in the PX-GS algorithm. The main cause of the errors in both algorithms is that their non-identifiable model cannot produce a Gibbs sampler for all parameters; moreover, the sampling steps are not constructed from the joint posterior distribution, which is never derived.

4. Simulation studies

We conduct thorough simulation studies to illustrate the performance of the PX-GS and PX-GMN algorithms proposed in Section 3, which are based on the non-identifiable model, and compare them with the PX-MH algorithm based on the identifiable model, which entails an MH algorithm for sampling the restricted covariance matrix (Zhang et al. 2008).

The simulation studies are designed following the simulation setting of McCulloch and Rossi (1994). Specifically, we generate a six-choice categorical outcome with one independent covariate $X$ drawn from a uniform distribution over $(-2.0, 2.0)$ and the identified regression parameter $\beta = 2/\sqrt{5} \approx 0.894$. The covariance matrix $\Sigma$ has a compound symmetry (CS) correlation structure with equal correlation 0.5, i.e., CS(0.5), and four identified variances 0.8, 0.6, 0.4, 0.2 (the first variance, i.e., the first diagonal element of $\Sigma$, is fixed at 1). We consider four prior scenarios, ranging from weak to strong prior information, as follows: the first, denoted PID, is non-informative, with a non-informative prior for $\beta$ and $\Sigma \sim \text{Inverse-Wishart}_k((m-k-1)I, m)$, with $I$ the identity matrix; the second, denoted PCS, takes the prior of $\beta$ to be $N(1.0, 1.0)$ and $\Sigma \sim \text{Inverse-Wishart}_k((m-k-1)\,\text{CS}(0.4), m)$; the third, denoted PCS1, is based on PCS with the four identified variances all set to 0.5; and the fourth, denoted PCS2, is based on PCS with the four identified variances set to the true values. Here $k=5$ is the dimension of the latent variables and the degrees of freedom $m$ is set to 10. We investigate sample sizes 500 and 2,000 and generate 100 datasets for each simulated scenario. We run the MCMC sampling for 20,000 iterations with a 5,000-iteration burn-in and check MCMC convergence using the R packages coda (Plummer et al. 2015) and boa (Smith 2016).
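A minimal sketch of this simulated design follows. Since the exact construction of the covariate matrix $X_i$ from the single covariate is not spelled out here, using the same mean $x_i\beta$ in all latent components is our illustrative assumption.

```python
# Simulated design: six choices (k = 5 latent dimensions), X ~ Uniform(-2, 2),
# beta = 2/sqrt(5), Sigma with CS(0.5) correlation and variances (1, 0.8, 0.6, 0.4, 0.2).
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 5
beta = 2.0 / np.sqrt(5.0)                          # identified regression parameter, ~0.894

sd = np.sqrt(np.array([1.0, 0.8, 0.6, 0.4, 0.2]))  # first variance fixed at 1
R = 0.5 * np.ones((k, k)) + 0.5 * np.eye(k)        # CS(0.5) correlation matrix
Sigma = np.outer(sd, sd) * R                       # covariance matrix

x = rng.uniform(-2.0, 2.0, size=n)                 # one covariate per subject
d = np.empty(n, dtype=int)
for i in range(n):
    Z_i = rng.multivariate_normal(np.full(k, x[i] * beta), Sigma)
    d[i] = 0 if Z_i.max() < 0 else int(np.argmax(Z_i)) + 1   # model (3)
```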

Table 1, Figure 1 and Figure 2 are based on the PID prior for illustration. Table 1 shows that the averaged posterior means for all three algorithms are closer to the true values, with smaller standard deviations and better 95% credible interval coverage probabilities (CP), for sample size 2,000 than for 500. The PX-GS and PX-GMN algorithms generally produce larger standard deviations than the PX-MH algorithm and accordingly better CP. The PX-GMN algorithm has the largest standard deviations among the three algorithms for the regression parameter $\beta$ and hence yields the best CP. The PX-GS and PX-GMN algorithms have similar standard deviations for the correlations and variances. Notably, all three algorithms overestimate the variance $\sigma_{55}$ and give low CP for it, especially the PX-MH algorithm with sample size 500 (30% CP) and the PX-GS algorithm with sample size 2,000 (61% CP).

Table 1.

Averaged posterior means (Mean), standard deviations (SD) and 95% credible interval coverage probabilities (CP%) with sample sizes 500 and 2,000, based on 100 generated datasets using PID.

N=500 True Values PX-MH PX-GS PX-GMN
Mean SD CP% Mean SD CP% Mean SD CP%
β 0.89 1.02 0.09 67 1.01 0.14 91 1.01 0.18 97
r12 0.5 0.36 0.15 87 0.36 0.17 91 0.38 0.17 92
r13 0.5 0.35 0.14 79 0.41 0.16 93 0.41 0.17 95
r14 0.5 0.32 0.14 78 0.37 0.17 96 0.38 0.17 95
r15 0.5 0.31 0.15 76 0.34 0.18 95 0.36 0.19 96
r23 0.5 0.45 0.14 94 0.41 0.16 92 0.42 0.17 94
r24 0.5 0.41 0.16 96 0.38 0.16 94 0.39 0.17 95
r25 0.5 0.38 0.15 89 0.37 0.16 96 0.39 0.17 97
r34 0.5 0.39 0.15 94 0.42 0.15 99 0.45 0.16 99
r35 0.5 0.40 0.14 94 0.41 0.16 98 0.43 0.17 99
r45 0.5 0.37 0.16 95 0.39 0.16 99 0.4 0.17 98
σ22 0.8 0.97 0.28 88 0.91 0.35 97 0.93 0.37 98
σ33 0.6 0.75 0.21 88 0.77 0.30 94 0.77 0.31 94
σ44 0.4 0.52 0.17 95 0.54 0.22 97 0.54 0.24 98
σ55 0.2 0.41 0.12 30 0.42 0.19 75 0.39 0.18 84
N=2000
β 0.89 0.89 0.04 94 0.94 0.06 92 0.93 0.12 100
r12 0.5 0.48 0.07 95 0.45 0.08 97 0.49 0.09 98
r13 0.5 0.46 0.07 89 0.43 0.09 87 0.47 0.09 95
r14 0.5 0.48 0.08 93 0.44 0.09 94 0.46 0.1 96
r15 0.5 0.49 0.1 95 0.42 0.1 95 0.45 0.11 94
r23 0.5 0.47 0.08 87 0.46 0.08 94 0.49 0.09 92
r24 0.5 0.47 0.08 93 0.44 0.09 91 0.48 0.1 95
r25 0.5 0.45 0.08 89 0.44 0.09 96 0.47 0.11 98
r34 0.5 0.44 0.08 80 0.44 0.09 90 0.47 0.1 93
r35 0.5 0.44 0.08 91 0.45 0.09 96 0.47 0.1 96
r45 0.5 0.42 0.09 91 0.45 0.09 99 0.47 0.1 98
σ22 0.8 0.76 0.11 88 0.85 0.16 95 0.86 0.17 95
σ33 0.6 0.54 0.08 81 0.63 0.13 94 0.61 0.13 93
σ44 0.4 0.40 0.07 96 0.45 0.09 93 0.42 0.09 94
σ55 0.2 0.26 0.05 79 0.31 0.07 61 0.25 0.06 88

Figure 1.

Relative biases for the regression parameter, correlations and variances for sample sizes 500 (1st column) and 2,000 (2nd column), based on 100 simulated datasets using PID, for the PX-MH (orchid), PX-GS (blue), and PX-GMN (gray) algorithms.

Figure 2.

ACF plots for the regression parameter $\beta$, correlations ($r_{23}$, $r_{45}$) and variances ($\sigma_{22}$, $\sigma_{44}$) for sample sizes 500 (1st row) and 2,000 (2nd row), based on 100 simulated datasets using PID, for the PX-MH (orchid dot-dash), PX-GS (blue long-dash), and PX-GMN (black solid) algorithms.

Figure 1 shows that for sample size 500, all algorithms have broadly similar averaged relative biases for all parameters. For sample size 2,000, the PX-GS algorithm has larger relative biases than the PX-MH and PX-GMN algorithms for most parameters: for the regression parameter $\beta$, for the correlations (except $r_{35}$ and $r_{45}$) and for the variances (except $\sigma_{33}$).

Figure 2 displays autocorrelation function (ACF) plots for selected parameters for illustration. For the regression parameter $\beta$, the ACF values of the PX-GMN algorithm decrease much faster than those of the PX-GS and PX-MH algorithms, and they decrease faster for sample size 2,000 than for 500, whereas the ACF values of the PX-MH and PX-GS algorithms are similar for both sample sizes. For the correlations and variances, all three algorithms show similar ACF plots for sample size 500 compared with sample size 2,000; specifically, the PX-GS and PX-GMN algorithms have similar ACF plots, with the PX-GS algorithm showing a marginal advantage over the PX-GMN algorithm for $r_{45}$ and $\sigma_{44}$, and both are superior to the PX-MH algorithm.
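For reference, a minimal sketch of the sample ACF computation underlying plots like Figure 2, for a single one-dimensional MCMC trace.

```python
# Sample autocorrelations of a 1-D MCMC trace at lags 0..max_lag.
import numpy as np

def acf(chain, max_lag=50):
    x = np.asarray(chain, dtype=float) - np.mean(chain)
    denom = np.dot(x, x)                         # lag-0 sum of squares (normalizer)
    return np.array([np.dot(x[: len(x) - h], x[h:]) / denom
                     for h in range(max_lag + 1)])
```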

Figure 3 exhibits a prior sensitivity analysis for selected parameters using relative biases, shown for sample size 2,000 for exemplification. As shown, the PX-MH algorithm is the most affected by the priors and performs best under the non-informative PID prior rather than any informative prior. In contrast, the PX-GS and PX-GMN algorithms perform better with stronger prior information. Notably, the PX-GMN algorithm is more robust to the prior specification than the PX-GS algorithm.

Figure 3.

Relative biases for the regression parameter $\beta$, correlation $r_{15}$ and variance $\sigma_{44}$ for sample size 2,000, based on 100 simulated datasets, using PID (bright gray), PCS (light gray), PCS1 (medium gray) and PCS2 (dark gray) for the PX-MH, PX-GS and PX-GMN algorithms.

Through the above examination, we conclude that the PX-GMN algorithm is superior in parameter estimation, MCMC convergence and mixing, and robustness to the prior specification compared with the PX-GS and PX-MH algorithms, suggesting that parameter-expanded data augmentation with marginalization should be considered to improve the performance of MCMC sampling. In addition, the PX-GS and PX-GMN algorithms based on the non-identifiable model perform better than the PX-MH algorithm based on the identifiable model in convergence and mixing of the MCMC components; the PX-GS algorithm may produce biased estimates for large sample sizes, such as 2,000; and the PX-MH algorithm has the slowest convergence, may hence produce biased estimates with the smallest standard deviations and inferior credible interval coverage, and is sensitive to the prior specification.

5. Application to consumer choice on liquid laundry detergents

The consumer choice data on liquid laundry detergents were investigated by Chintagunta and Prasad (1998), Imai and van Dyk (2005a, b) and Burgette et al. (2021). The data involve 2,657 households in the Sioux Falls, South Dakota market and six brands: Tide, Wisk, EraPlus, Surf, Solo and All, with the log price of each brand. We analyze these data using a six-choice MNP model whose covariate matrix includes five intercept terms and, for each non-reference brand (Tide, Wisk, EraPlus, Surf, Solo), the log price of that brand minus the log price of All (chosen as the reference category).
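A small sketch of the resulting $5\times 6$ covariate matrix $X_i$ for one household; the log-price values and the column ordering are illustrative assumptions, not values from the data.

```python
# Covariate matrix X_i: one intercept per non-reference brand plus the log-price
# difference relative to All (the reference category), entering the price coefficient beta_6.
import numpy as np

brands = ["Tide", "Wisk", "EraPlus", "Surf", "Solo"]        # All is the reference
log_prices = {"Tide": 1.20, "Wisk": 1.10, "EraPlus": 1.05,
              "Surf": 0.95, "Solo": 0.90, "All": 0.85}      # hypothetical log prices

k = len(brands)
X_i = np.zeros((k, k + 1))
X_i[:, :k] = np.eye(k)                                      # brand-specific intercepts
X_i[:, k] = [log_prices[b] - log_prices["All"] for b in brands]  # price column
```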

We consider a non-informative prior for the regression parameter vector $\beta=(\beta_1,\ldots,\beta_6)^T$ and $\Sigma \sim \text{Inverse-Wishart}_k((m-k-1)I, m)$, with $k=5$, $m=10$, and $I$ the identity matrix. We run the PX-MH, PX-GS and PX-GMN algorithms, each with 200,000 iterations and a 100,000-iteration burn-in, and present the results in Table 2. All three algorithms produce a significantly negative estimated regression parameter $\beta_6$ for price (95% CIs excluding 0), suggesting that brand choice is negatively related to the brand's price; the PX-GMN algorithm has the largest standard deviation (12.00) while the PX-MH algorithm has the smallest (5.07). The PX-MH and PX-GS algorithms have much larger estimated correlations, with smaller estimated standard deviations, than the PX-GMN algorithm; however, the 95% CIs of the three algorithms are basically consistent in whether they cover 0. The same pattern is observed for the estimated variances. As shown, the PX-GMN algorithm gives the largest estimates of $\sigma_{22}$ and $\sigma_{33}$, with the largest standard deviations (0.76 and 0.43), and the PX-GS algorithm gives the largest estimates of $\sigma_{44}$ and $\sigma_{55}$, with the largest standard deviations (0.36 and 0.54). This phenomenon of large estimated standard deviations/variances was also noticed and commented on by Keane (1992). Hence, we also calculate the log-likelihood value for each algorithm; the three algorithms have similar log-likelihood values, with the PX-GMN algorithm having the largest and the PX-MH algorithm the smallest. This finding, as elaborated by Keane (1992), is related to the model identification and estimability issue.

Table 2.

Posterior means (Mean) (shown only for the price coefficient $\beta_6$, not the intercepts $\beta_1$ through $\beta_5$), standard deviations (SD), 95% credible intervals (CI) and log-likelihoods (LL) for the consumer choice data on liquid laundry detergents.

PX-MH (LL = −3776.784) PX-GS (LL = −3715.483) PX-GMN (LL = −3643.716)
Mean SD CI Mean SD CI Mean SD CI
β6 −70.12 5.07 (−80.06, −59.84) −76.74 9.28 (−95.71, −59.81) −86.00 12.00 (−111.17, −64.31)
r12 0.63 0.10 (0.432, 0.827) 0.56 0.12 (0.280, 0.764) 0.38 0.14 (0.082, 0.615)
r13 0.25 0.14 (−0.052, 0.497) 0.15 0.15 (−0.151, 0.427) −0.12 0.14 (−0.399, 0.157)
r14 0.33 0.12 (0.083, 0.546) 0.28 0.13 (0.003, 0.513) −0.03 0.13 (−0.285, 0.224)
r15 0.63 0.07 (0.493, 0.750) 0.58 0.09 (0.385, 0.741) 0.33 0.11 (0.106, 0.532)
r23 0.38 0.15 (0.038, 0.662) 0.27 0.17 (−0.092, 0.582) 0.01 0.20 (−0.400, 0.386)
r24 0.38 0.17 (0.077, 0.743) 0.31 0.16 (−0.002, 0.605) 0.00 0.18 (−0.333, 0.355)
r25 0.66 0.07 (0.516, 0.797) 0.60 0.09 (0.409, 0.764) 0.41 0.12 (0.136, 0.623)
r34 0.55 0.09 (0.351, 0.703) 0.53 0.09 (0.336, 0.698) 0.38 0.11 (0.160, 0.577)
r35 0.53 0.09 (0.346, 0.696) 0.50 0.10 (0.291, 0.681) 0.27 0.11 (0.035, 0.485)
r45 0.62 0.09 (0.421, 0.785) 0.60 0.09 (0.411, 0.749) 0.34 0.10 (0.134, 0.539)
σ22 2.12 0.58 (1.350, 3.646) 2.30 0.60 (1.318, 3.664) 2.56 0.76 (1.393, 4.334)
σ33 1.28 0.30 (0.790, 1.902) 1.46 0.45 (0.765, 2.494) 1.51 0.43 (0.825, 2.479)
σ44 0.97 0.22 (0.602, 1.445) 1.19 0.36 (0.611, 2.019) 1.01 0.29 (0.554, 1.698)
σ55 2.15 0.49 (1.346, 3.251) 2.37 0.54 (1.458, 3.562) 1.84 0.43 (1.117, 2.789)

6. Discussion

In this article, we propose two parameter-expanded data augmentation algorithms, PX-GS and PX-GMN, based on the constructed non-identifiable MNP model, and we compare their performance with that of the PX-MH algorithm, which is based on the identifiable MNP model. Our investigation shows that the PX-GMN algorithm, which marginalizes the redundant parameter, is superior in MCMC estimation, convergence and mixing to both the PX-GS algorithm without marginalization and the PX-MH algorithm based on the identifiable model.

The PX-GS algorithm produces biased estimates for large sample sizes, such as 2,000. A similar issue arises when analyzing multivariate ordinal data using the multivariate probit model (Zhang 2023). In comparison with the PX-MH algorithm based on the identifiable model, the PX-GS and PX-GMN algorithms manifest advantages in 95% CI coverage probabilities, in convergence and mixing of the MCMC components, and in robustness to the prior specification. However, they tend to produce large estimated standard deviations for the estimated quantities, a phenomenon related to the tenuous identification of the MNP model itself.

It is generally acknowledged that posterior means are close to maximum likelihood estimates under relatively weak priors. However, when strong or informative priors are specified, maximum a posteriori estimation may be explored to reduce the possible biases associated with posterior means. This may become one of our future research projects.

Our investigation indicates that constructing non-identifiable models and developing parameter-expanded data augmentation with marginalization should be considered to improve the convergence and mixing of MCMC components. At the same time, the fragile identification and estimability of the MNP model deserve attention in its application.

Funding

This work is supported by NIH Grant R15GM151700 to Xiao Zhang.

Footnotes

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  1. Albert JH, and Chib S. 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88 (422):669–79. doi: 10.2307/2290350.
  2. Berndt ER, Hall BH, Hall RE, and Hausman J. 1974. Estimation and inference in nonlinear structural models. Annals of Economic and Social Measurement 3:653–66.
  3. Bunch DS. 1991. Estimability in the multinomial probit model. Transportation Research Part B: Methodological 25 (1):1–12. doi: 10.1016/0191-2615(91)90009-8.
  4. Bunch DS, and Kitamura R. 1991. Probit model estimation revisited: Trinomial models of household car ownership. Working Paper qt2hr8d4bs, University of California Transportation Center.
  5. Burgette LF, and Nordheim EV. 2012. The trace restriction: An alternative identification strategy for the Bayesian multinomial probit model. Journal of Business & Economic Statistics 30 (3):404–10. doi: 10.1080/07350015.2012.680416.
  6. Burgette LF, Puelz D, and Hahn PR. 2021. A symmetric prior for multinomial probit models. Bayesian Analysis 16 (3):991–1008. doi: 10.1214/20-BA1233.
  7. Chan JC, and Jeliazkov I. 2009. MCMC estimation of restricted covariance matrices. Journal of Computational and Graphical Statistics 18 (2):457–80. doi: 10.1198/jcgs.2009.08095.
  8. Chen Z, and Kuo L. 2002. Discrete choice models based on the scale mixture of multivariate normal distributions. Sankhyā: The Indian Journal of Statistics, Series B 1:192–213.
  9. Chib S, Greenberg E, and Chen Y. 1998. MCMC methods for fitting and comparing multinomial response models. Working Paper OLIN-97-15. Available at SSRN: https://ssrn.com/abstract=61445.
  10. Chintagunta PK, and Prasad AR. 1998. An empirical investigation of the “Dynamic McFadden” model of purchase timing and brand choice: Implications for market structure. Journal of Business & Economic Statistics 16 (1):2–12. doi: 10.2307/1392011.
  11. Clark CE. 1961. The greatest of a finite set of random variables. Operations Research 9 (2):145–62. doi: 10.1287/opre.9.2.145.
  12. Daganzo C. 1979. Multinomial Probit: The Theory and Its Application to Demand Forecasting. New York, NY: Academic Press.
  13. Edwards YD, and Allenby GM. 2003. Multivariate analysis of multiple response data. Journal of Marketing Research 40 (3):321–34. doi: 10.1509/jmkr.40.3.321.19233.
  14. Gelfand AE, and Smith AF. 1990. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85 (410):398–409. doi: 10.2307/2289776.
  15. Gelman A, Carlin JB, Stern HS, and Rubin DB. 2013. Bayesian Data Analysis. 3rd ed. Chapman and Hall/CRC.
  16. Geweke J. 1991. Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints. In Computer Science and Statistics: Proceedings of the Twenty-Third Symposium on the Interface, 571–8. Alexandria, VA: American Statistical Association.
  17. Geweke J, Keane M, and Runkle D. 1994. Alternative computational approaches to inference in the multinomial probit model. The Review of Economics and Statistics 1:609–32.
  18. Gilks WR, Richardson S, and Spiegelhalter DJ, eds. 1996. Markov Chain Monte Carlo in Practice. London: Chapman and Hall.
  19. Hajivassiliou V, and McFadden D. 1990. The method of simulated scores for the estimation of LDV models with an application to external debt crises. Cowles Foundation Discussion Paper 967, Yale University.
  20. Hajivassiliou V, McFadden D, and Ruud P. 1996. Simulation of multivariate normal rectangle probabilities and their derivatives: Theoretical and computational results. Journal of Econometrics 72 (1–2):85–134. doi: 10.1016/0304-4076(94)01716-6.
  21. Hausman JA, and Wise DA. 1978. A conditional probit model for qualitative choice: Discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica 46 (2):403–26. doi: 10.2307/1913909.
  22. Horowitz JL, Sparmann JM, and Daganzo CF. 1982. An investigation of the accuracy of the Clark approximation for the multinomial probit model. Transportation Science 16 (3):382–401. doi: 10.1287/trsc.16.3.382.
  23. Imai K, and van Dyk DA. 2005a. A Bayesian analysis of the multinomial probit model using marginal data augmentation. Journal of Econometrics 124 (2):311–34. doi: 10.1016/j.jeconom.2004.02.002.
  24. Imai K, and van Dyk DA. 2005b. MNP: R package for fitting the multinomial probit model. Journal of Statistical Software 14 (3):1–32. doi: 10.18637/jss.v014.i03.
  25. Jiao X, and van Dyk DA. 2015. A corrected and more efficient suite of MCMC samplers for the multinomial probit model. arXiv preprint arXiv:1504.07823.
  26. Kamakura WA. 1989. The estimation of multinomial probit models: A new calibration algorithm. Transportation Science 23 (4):253–65. doi: 10.1287/trsc.23.4.253.
  27. Keane MP. 1992. A note on identification in the multinomial probit model. Journal of Business & Economic Statistics 10 (2):193–200. doi: 10.1080/07350015.1992.10509898.
  28. Keane MP. 1994. A computationally practical simulation estimator for panel data. Econometrica 62 (1):95–116. doi: 10.2307/2951477.
  29. Lawrence E, Liu C, Bingham D, and Nair VN. 2008. Bayesian inference for multivariate ordinal data using parameter expansion. Technometrics 50 (2):182–91. doi: 10.1198/004017008000000064.
  30. Liu C, Rubin DB, and Wu Y. 1998. Parameter expansion to accelerate EM: The PX-EM algorithm. Biometrika 85 (4):755–70. doi: 10.1093/biomet/85.4.755.
  31. Liu JS. 1994. The collapsed Gibbs sampler with applications to a gene regulation problem. Journal of the American Statistical Association 89 (427):958–66. doi: 10.2307/2290921.
  32. Liu JS, and Wu YN. 1999. Parameter expansion for data augmentation. Journal of the American Statistical Association 94 (448):1264–74. doi: 10.2307/2669940.
  33. MacEachern SN. 2007. Comment on article by Jain and Neal. Bayesian Analysis 2 (3):483–94. doi: 10.1214/07-BA219C.
  34. McCulloch RE, Polson NG, and Rossi PE. 2000. A Bayesian analysis of the multinomial probit model with fully identified parameters. Journal of Econometrics 99 (1):173–93. doi: 10.1016/S0304-4076(00)00034-8.
  35. McCulloch R, and Rossi PE. 1994. An exact likelihood analysis of the multinomial probit model. Journal of Econometrics 64 (1–2):207–40. doi: 10.1016/0304-4076(94)90064-7.
  36. McFadden D. 1974. Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics, edited by Zarembka P, 105–42. New York: Academic Press.
  37. McFadden D. 1987. Regression-based specification tests for the multinomial logit model. Journal of Econometrics 34 (1–2):63–82. doi: 10.1016/0304-4076(87)90067-4.
  38. McFadden D. 1989. A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica 57 (5):995–1026. doi: 10.2307/1913621.
  39. Meng X-L, and van Dyk DA. 1999. Seeking efficient data augmentation schemes via conditional and marginal augmentation. Biometrika 86 (2):301–20. doi: 10.1093/biomet/86.2.301.
  40. Nobile A. 1998. A hybrid Markov chain for the Bayesian analysis of the multinomial probit model. Statistics and Computing 8 (3):229–42. doi: 10.1023/A:1008905311214.
  41. Pakes A, and Pollard D. 1989. Simulation and the asymptotics of optimization estimators. Econometrica 57 (5):1027–57. doi: 10.2307/1913622.
  42. Plummer M, Best N, Cowles K, and Vines K. 2015. Package ‘coda’. http://cran.r-project.org/web/packages/coda/coda.pdf.
  43. Smith BJ. 2016. Package ‘boa’. http://cran.r-project.org/web/packages/boa/boa.pdf.
  44. Smith AFM, and Roberts GO. 1993. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society Series B: Statistical Methodology 55 (1):3–23. doi: 10.1111/j.2517-6161.1993.tb01466.x.
  45. Tanner MA, and Wong WH. 1987. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82 (398):528–40. doi: 10.2307/2289457.
  46. Tierney L. 1994. Markov chains for exploring posterior distributions. The Annals of Statistics 22 (4):1701–62. doi: 10.1214/aos/1176325750.
  47. Train KE. 2009. Discrete Choice Methods with Simulation. Cambridge: Cambridge University Press.
  48. Zhang X. 2020. Parameter-expanded data augmentation for analyzing correlated binary data using multivariate probit models. Statistics in Medicine 39 (25):3637–52. doi: 10.1002/sim.8685.
  49. Zhang X. 2022. Bayesian analysis of longitudinal ordinal data using non-identifiable multivariate probit models. Journal of Mathematics and Statistics 18 (1):163–75. doi: 10.3844/jmssp.2022.163.175.
  50. Zhang X. 2023. Bayesian analysis of multivariate longitudinal ordinal data using multiple multivariate probit models. American Journal of Theoretical and Applied Statistics 12:1–12. doi: 10.11648/j.ajtas.20231201.11.
  51. Zhang X, Boscardin WJ, and Belin TR. 2008. Bayesian analysis of multivariate nominal measures using multivariate multinomial probit models. Computational Statistics & Data Analysis 52 (7):3697–708. doi: 10.1016/j.csda.2007.12.012.
