Population Pharmacokinetic/Pharmacodyanamic Mixture Models via Maximum a Posteriori Estimation

Xiaoning Wang; Alan Schumitzky; David Z D’Argenio

doi:10.1016/j.csda.2009.04.017

. Author manuscript; available in PMC: 2010 Oct 1.

Published in final edited form as: Comput Stat Data Anal. 2009 Oct 1;53(12):3907–3915. doi: 10.1016/j.csda.2009.04.017

Population Pharmacokinetic/Pharmacodyanamic Mixture Models via Maximum a Posteriori Estimation

Xiaoning Wang ^a, Alan Schumitzky ^b, David Z D’Argenio ^c,^*

PMCID: PMC2743512 NIHMSID: NIHMS117073 PMID: 20161085

Abstract

Pharmacokinetic/pharmacodynamic phenotypes are identified using nonlinear random effects models with finite mixture structures. A maximum a posteriori probability estimation approach is presented using an EM algorithm with importance sampling. Parameters for the conjugate prior densities can be based on prior studies or set to represent vague knowledge about the model parameters. A detailed simulation study illustrates the feasibility of the approach and evaluates its performance, including selecting the number of mixture components and proper subject classification.

Keywords: Nonlinear random effects, EM algorithm, Importance sampling, Bayesian Analysis

1. Introduction

The use of mathematical modeling is central to the study of the absorption, distribution and elimination of therapeutic drugs and to understanding how drugs produce their effects. From its inception the field of pharmacokinetics and pharmacodynamics has incorporated methods of mathematical modeling, simulation and computation in an effort to better understand and quantify the processes of uptake, disposition and action of therapeutic drugs. These methods for pharmacokinetic/pharmacodynamic (PK/PD) systems analysis impact all aspects of drug development including in vitro, animal and human testing, as well as drug therapy. Modeling methodologies developed for studying pharmacokinetic/pharmacodynamic processes confront many challenges related in part to the severe restrictions on the number and type of measurements that are available from laboratory experiments and clinical trials, as well as the variability in the experiments and the uncertainty associated with the processes themselves.

Since their initial application to pharmacokinetics in the 1970’s, Bayesian methods have provided a framework for PK/PD modeling in drug development that can address some of the above-mentioned challenges. Sheiner et al. (1975) applied Bayesian estimation (maximum a posteriori probability estimation, MAP) to combine population information with drug concentration measurements obtained in an individual, in order to determine a patient-specific dosage regimen. Katz, Azen and Schumitzky (1981) reported the first efforts to calculate the complete posterior distribution of model parameters in a nonlinear pharmacokinetic individual estimation problem, for which they used numerical quadrature methods to perform the needed multi-dimensional integrations.

The population PK/PD problem has also been cast in a Bayesian framework, initially by Racine-Poon (1985) using a two-stage approach, and more generally by Wakefield et al. (1994). Solution to this computationally demanding problem is accomplished through the application of Markov chain Monte Carlo methods pioneered by Gelfand and Smith (1990) and now available in general purpose software (see Lunn et al., 2002 for discuss relevant to PK/PD population modeling).

Population PK/PD modeling, including Bayesian approaches, are used in drug development to identify the influence of measured covariates (e.g., demographic factors and disease status) on drug kinetics and response. It is now recognized, however, that genetic polymorphisms in drug metabolism and in the molecular targets of drug therapy can also influence the efficacy and toxicity of medications (Evans and Relling, 1999). Population modeling approaches that can identify and model distinct subpopulations not related to available measured covariates may, therefore, help determine otherwise unknown genetic and other determinants of observed pharmacokinetic/pharmacodynamic phenotypes.

We have previously reported on a maximum likelihood approach using finite mixture models to identify subpopulations with distinct pharmacokinetic/pharmacodynamic properties (Wang et al., 2007). Wakefield and Walker (1997) and Rosner and Mueller (1997) have introduced Bayesian approaches to address this problem within a nonparametric mixture model framework. In this paper a computationally practical maximum a posteriori probability estimation (MAP) approach is proposed using finite mixture models. Section 2 of this paper describes the finite mixture model within a nonlinear random effects framework and defines the MAP estimation problem. A solution via the EM algorithm is presented in Section 3. Subject classification and model selection issues are discussed in Section 4. An example and detailed simulation study are presented in Section 5 and Section 6 contains a discussion. The Appendix discusses important asymptotic properties of the MAP estimator, further motivating its use, and derives the formulas for the asymptotic covariance.

2. Nonlinear Random Effects Finite Mixture Model and MAP Estimation Problem

A two-stage nonlinear random effects model that incorporates a finite mixture model is given by

Y_{i} ∣ θ_{i}, β \sim N (h_{i} (θ_{i}), G_{i} (θ_{i}, β)), i = 1, \dots, n

(1)

and

θ_{1}, \dots, θ_{n} \sim_{i . i . d} \sum_{k = 1}^{K} w_{k} N (μ_{k}, \sum_{k}),

(2)

where i =1, …, n indexes the individuals and k =1, …, K indexes the mixing components. The alternate problem formulation presented in Cruz-Mesía et al. (2008) and in Pauler and Laird (2000), also results in the solution outlined below.

At the first stage represented by (1), Y_i = (y₁_i, …, y_{m_ii})^T is the observation vector for the ith individual (Y_i ∈R^mⁱ); h_i(θ_i) is the function defining the pharmacokinetic/pharmacodynamic (PK/PD) model, including subject specific variables (e.g., drug doses); θ_i is the vector of model parameters (random effects) (θ_i ∈ R^p); and G_i(θ_i, β) is a positive definite covariance matrix (G_i ∈ R^mⁱ^×^mⁱ) that may depend upon θ_i as well as on other parameters β (fixed effects) (β ∈ R^q).

At the second stage given by (2), a finite mixture model with K multivariate normal components is used to describe the population distribution. The weights {w_k}are nonnegative numbers summing to one, denoting the relative size of each mixing component (subpopulation), for which μ_k (μ_k ∈ R^p) is the mean vector and Σ_k (Σ_k ∈ R^p^×^p) is the positive definite covariance matrix.

Letting φ represent the collection of parameters {β, (w_k, μ_k, Σ_k), k = 1, …, K}, the population problem involves estimating φ given the observed data{Y₁, …, Y_n}. If φ is regarded as a random variable with a known prior distribution p(φ), the maximum a posteriori probability (MAP) estimate (φ^MAP) is the mode of the posterior distribution p(φ|Yⁿ):

p (φ ∣ Y^{n}) = \frac{p (Y^{n} ∣ φ) p (φ)}{p (Y^{n})} = \frac{\prod_{i = 1}^{n} p (Y_{i} ∣ φ) p (φ)}{p (Y^{n})} .

where Yⁿ={Y₁, …, Y_n}. The MAP estimator enjoys the same large sample properties as the ML estimator, namely consistency and asymptotic normality. In addition, the MAP objective function is a regularization of the likelihood function and as such avoids the well-documented singularities and degeneracies of mixture models (see, Fraley and Raftery (2005) and Ormoneit and Tresp(1998)).

For mixture of normal distributions, a multivariate normal prior on the mean for each mixing component is given by:

μ_{k} ∣ \sum_{k} \sim N (λ_{k}, \sum_{k} / τ_{k}), k = 1, \dots, K,

(3)

where λ_k and τ _k can be viewed as the mean and shrinkage respectively. An inverse Wishart prior with degrees of freedom q_k and scale matrix Ψ_k is assigned to each covariance component:

{(\sum_{k})}^{- 1} \sim Wishart (q_{k}, Ψ_{k}), k = 1, \dots, K .

(4)

The mixing weights have a Dirichlet distribution as the prior:

(w_{1}, \dots, w_{K}) \sim Dirichlet (a_{1}, \dots, a_{K}) .

(5)

We consider the model given by (1) and (2) for the important case G_i(θ_i, β) = σ²H_i(θ_i), where H_i(θ_i) is a known function and β = σ². The prior for σ² is an inverse Gamma distribution:

{(σ^{2})}^{- 1} \sim Gamma (a, b) .

(6)

These densities are conjugate priors to the multivariate normal mixtures (see appendix for the parameterizations used).

The hyper-parameters {a, b, (a_k,λ_k,τ_k,q_k,Ψ_k), k = 1, …, K} can be based on prior studies or set to represent vague knowledge about the parameters (see below).

3. Solution via the EM Algorithm

The EM algorithm, originally introduced by Dempster, Laird and Rubin (1977), is used to perform iterative computation of maximum likelihood (ML) estimates. It is applied below to solve the MAP estimation problem defined in the previous section.

The component label vector z_i is introduced as a K dimensional indicator such that z_i(k) is one or zero depending on whether or not the parameter θ_i arises from the kth mixing component. Letting φ⁽^r⁾ ={β,(w_k⁽^r⁾, μ_k⁽^r⁾, Σ_k⁽^r⁾, k = 1, …, K)} represent the parameters at the rth iteration of the EM algorithm, the E step computes a conditional posterior expectation, given by

Q (φ, φ^{(r)}) = E {log L_{c} (φ) ∣ Y^{N}, φ^{(r)}} + log p (φ),

where $log L_{c} (φ) = \sum_{i = 1}^{n} \sum_{k = 1}^{K} z_{i} (k) log (w_{k} p (Y_{i}, θ_{i} ∣ σ^{2}, μ_{k}, \sum_{k}))$ . In the M step, the posterior mode φ⁽^r⁺¹⁾ is estimated as the optimizer of Q(φ,φ⁽^r⁾) such that $φ^{(r + 1)} = \underset{φ}{arg max} Q (φ, φ^{(r)})$ . Under the prior defined above, the updating process of the M step is:

\begin{array}{l} {w_{k}}^{(r + 1)} = \frac{\sum_{i = 1}^{n} \int g_{i k} (θ_{i}, φ^{(r)}) d θ_{i} + (a_{k} - 1)}{n - K + \sum_{k = 1}^{K} a_{k}}, \\ μ_{k}^{(r + 1)} = \frac{\sum_{i = 1}^{n} \int θ_{i} g_{i k} (θ_{i}, φ^{(r)}) d θ_{i} + τ_{k} λ_{k}}{\sum_{i = 1}^{n} \int g_{i k} (θ_{i}, φ^{(r)}) d θ_{i} + τ_{k}}, \\ \sum_{k}^{(r + 1)} = \frac{\sum_{i = 1}^{n} \int (θ_{i} - {μ_{k}}^{(r + 1)}) (θ_{i} - {μ_{k}}^{(r + 1)}) g_{i k} (θ_{i}, φ^{(r)}) d θ_{i} + τ_{k} (λ_{k} - {μ_{k}}^{(r + 1)}) {(λ_{k} - {μ_{k}}^{(r + 1)})}^{T} + ψ_{k}}{\sum_{i = 1}^{n} \int g_{i k} (θ_{i}, φ^{(r)}) d θ_{i} + q_{k} - d}, \end{array}

and

{(σ^{2})}^{(r + 1)} = \frac{\sum_{i = 1}^{n} \sum_{k = 1}^{K} \int {(Y_{i} - h_{i} (θ_{i}))}^{T} H_{i} {(θ_{i})}^{- 1} (Y_{i} - h_{i} (θ_{i})) g_{i k} (θ_{i}, φ^{(r)}) d θ_{i} + 2 b}{\sum_{i = 1}^{n} m_{i} + 2 (a - 1)},

where $g_{i k} (θ_{i}, φ) = \frac{w_{k} p (Y_{i} ∣ σ^{2}, θ_{i}) p (θ_{i} ∣ μ_{k}, \sum_{k})}{\sum_{k = 1}^{K} w_{k} \int p (Y_{i} ∣ σ^{2}, θ_{i}) p (θ_{i} ∣ μ_{k}, \sum_{k}) d θ_{i}} for k = 1, \dots, K$ and d = dim(θ_i).

The E and M steps are repeated until convergence. Discussions on the sufficient conditions for the convergence can be found at Dempster et al. (1977), Wu (1983) and Tseng (2005). In the appendix, an error analysis for φ^MAP is presented.

In order to implement the algorithm, all the integrals in the updating equations must be evaluated at each iterative step. For the maximum likelihood mixture model problem we have successfully used importance sampling to calculate the corresponding integrals in the EM algorithm (see Wang et al. 2007). The same method can be applied here and was used in the examples presented below. In brief, an envelope function p_e₍_k) is selected for each mixing component and for each subject, so that a number of individual specific random samples are taken from $p_{e (k)} : θ_{i (k)}^{(1)}, \dots, θ_{i (k)}^{(T)} \sim_{i . i . d .} p_{e (k)} (θ_{i})$ . As all the integrals in the algorithm share the form ∫f(θ_i)g_ik (θ_i, φ)dθ_i, they are approximated as follows:

\int f (θ_{i}) g_{i k} (θ_{i}, φ) d θ_{i} ≅ \frac{\sum_{l = 1}^{T} f (θ_{i (k)}^{(l)}) w_{k} p (Y_{i} ∣ σ^{2}, θ_{i (k)}^{(l)}) p (θ_{i (k)}^{(l)} ∣ μ_{k}, \sum_{k}) / p_{e (k)} (θ_{i (k)}^{(l)})}{\sum_{k = 1}^{K} \sum_{l = 1}^{T} w_{k} p (Y_{i} ∣ σ^{2}, θ_{i (k)}^{(l)}) p (θ_{i (k)}^{(l)} ∣ μ_{k}, \sum_{k}) / p_{e (k)} (θ_{i (k)}^{(l)})} .

The envelope distribution is taken to be a multivariate normal density using the subject’s previously estimated mixing-component specific conditional mean (i.e., ∫θ_ig_ik (θ_i, φ)dθ_i) and conditional covariance (i.e., ∫(θ_i − μ_k)(θ_i − μ_k)^T g_ik (θ_i, φ)dθ_i) as its mean and covariance. The number of independent samples (T) depends on the complexity of the model and the required accuracy in the integral approximations.

4. Subpopulation Classification and Model Selection

Assigning each individual subject to a subpopulation follows the same method as presented in Wang et al. (2007) for ML estimation. The E-step computes the posterior probability that subject i belongs to the kth subgroup:

τ_{i} (k) = \frac{w_{k} p (Y_{i} ∣ σ^{2}, μ_{k}, \sum_{k})}{\sum_{k = 1}^{K} w_{k} p (Y_{i} ∣ σ^{2}, μ_{k}, \sum_{k})} = \frac{w_{k} \int p (Y_{i} ∣ σ^{2}, θ_{i}) p (θ_{i} ∣ μ_{k}, \sum_{k}) d θ_{i}}{\sum_{k = 1}^{K} w_{k} \int p (Y_{i} ∣ σ^{2}, θ_{i}) p (θ_{i} ∣ μ_{k}, \sum_{k}) d θ_{i}},

and each individual is classified to the subpopulation associated with the highest posterior probability of membership. For example, for each i, (i =1, …, n), set ẑ_i (k) = 1 if $k = arg max_{c} τ_{i} (c)$ or to zero otherwise. No additional computation is required since all the τ _i (k) are evaluated at each EM step.

Determining the number of mixing components K in a finite mixture model is an important but difficult problem. To compare two non-nested models, in contrast to likelihood ratio procedures for comparing nested models, several criteria have been proposed. For example, Lemenuel-diot et al. (2006) based such model selection on the Kullback-Leibler test, with the null hypothesis of p₀ components versus the alternative hypothesis of p₁ components (p₁ > p₀). Other criteria are based on a penalized objective function. For two models A and B with different values of K, say K_A and K_B, one would compare objective functions Q_A and Q_B, and prefer the model with the larger value. The Akaike Information Criterion (AIC) takes the form AIC = L − P, where L is the maximized log-likelihood for the model and P is the total number of estimated parameters. The Bayesian information criterion (BIC) is defined as BIC = L − 1/2P log N, where N is the number of observations in the data set. Fraley and Raftery (2005) reported using a BIC criterion for mixture model selection. Based on its simplicity and good behavior, their version of BIC criterion is also adopted here, with L replaced by the log-likelihood evaluated at φ^MAP.

5. Example

A one-compartment model with first-order elimination and first-order absorption is used as the model of the drug’s plasma concentrations y_ji, where

y_{j i} = \frac{D K a_{i}}{V_{i} K a_{i} - C L_{i}} (e^{- \frac{C L_{i}}{V_{i}} t_{j}} - e^{- K a_{i} t_{j}}) (1 + ε_{j i})

The model parameters V (L) and K_a (hr⁻¹) were assumed to be independent with $V_{i} \sim_{i . i . d .} L N (μ_{V} = 50, σ_{V}^{2} = 10^{2})$ and $K a_{i} \sim_{i . i . d .} L N (μ_{K a} = 1, σ_{K a}^{2} = {0.2}^{2})$ . The drug’s clearance CL (L/hr) was assumed to be a mixture of two lognormal densities:

C L_{i} \sim_{i . i . d .} w_{1} L N (μ_{C L_{1}} = 5, σ_{C L_{1}}^{2} = {(0.5 μ_{C L_{1}})}^{2}) + w_{2} L N (μ_{C L_{2}} = 10, σ_{C L_{2}}^{2} = {(0.5 μ_{C L_{2}})}^{2})

with mixing weights w₁ =0.3 and w₂ =0.7. A sparse sampling schedule was used (m=4) with t₁ =2, t₂ =8, t₃ =12 and t₂₄ = 24 hrs. This is a difficult mixture problem, as is evident from inspection of the density for CL shown in Fig. 1, and was taken from a study of the kinetics of the drug metoprolol used by Kaila et al. (2006). The within individual error ε_ji is assumed to be i.i.d with σ² =0.1^2. The number of samples was reduced from the six sample times used by Kaila et al. (1, 2, 4, 8, 12, 24 hrs) to the four sample times used here in order to provide an even more challenging test. The number and timing of samples can influence estimation results and formal approaches to sample schedule design for population studies are available (Mentré et al., (1997)). A total of 300 population data sets were simulated from this model each consisting of n =50 subjects.

Fig. 1 — Simulated population density of clearance CL

The MAP estimates were obtained for each of the 300 population data sets using the EM algorithm with importance sampling as described above (the lognormal kinetic parameters were transformed). Very dispersed priors were assumed for the transformed parameters as follows:

\begin{array}{l} μ_{1} ∣ \sum_{1} \sim N (log (5), \sum_{1} / 0.01) μ_{2} ∣ \sum_{2} \sim N (log (10), \sum_{2} / 0.01) \\ {(\sum_{1})}^{- 1} = {(\sum_{2})}^{- 1} \sim Wishart (1, 0.25) \\ (w_{1}, w_{2}) \sim Dirichlet (1, 1) \\ {(σ^{2})}^{- 1} \sim Gamma (1, 0) \end{array}

For each of the estimated parametersφ, its percent prediction error was calculated for each population data set as:

{pe}_{j} = 100 \times (φ_{j}^{MAP} - φ_{j}) / φ_{j}, j = 1, \dots, 300

These percent prediction errors were used to calculate the mean prediction error and root mean square prediction error (RMSE) for each parameter. In addition, the individual subject classification accuracy was evaluated for each population data set.

Figure 2 plots the estimated parameter population distributions of the 300 simulated data sets, as well as the true distributions used in the simulation, while Table 1 shows the mean estimates, estimation biases and root mean square errors (RMSE). The histogram of the percent correct classification is presented in Figure 3.

Fig. 2 — True (bold lines) and estimated (thin lines) population densities of CL (upper panel), V (middle panel) and Ka (lower panel).

Table 1.

True population values and mean of parameter estimates (300 trials), mean percent prediction error (PE) and root mean square percent prediction error (RMSE)

Parameter	Population Values	Mean of Estimates	Mean PE (%)	RMSE (%)
μ_CL1	5	5.177	3.541	30.50
μ_CL2	10	9.709	−2.910	20.55
μ_V	50	49.09	−1.816	3.374
μ_ka	1	0.986	−1.401	6.287
σ_CL² (cv%)	50	45.21	−18.25	36.79
σ_V² (cv%)	20	19.31	−6.796	22.64
σ_ka² (cv%)	20	19.77	−2.270	42.25
w₁	0.3	0.4486	49.53	68.44
σ	0.1	0.1094	9.421	15.46

Open in a new tab

Fig. 3 — Histogram of percent correct classification.

The data were also analyzed assuming a single component model for CL (lognormal) as well as for V and K_a as above. Model selection was based on the Bayesian information criterion (BIC) to select between a one or two component model for CL in each of the 300 population data sets. The estimated densities of clearance are displayed in Figure 4, while the detailed estimation results are given in Table 2. Based on BIC values, the two-component lognormal model for CL was correctly selected in 198 out of the 300 population trials.

Fig. 4 — True (bold lines) and estimated (thin lines) population densities of CL for the single component analysis

Table 2.

True population values and mean of parameter estimates (300 trials) for the single component model

Parameter	Population Values	Mean of Estimates
μ_CL1	5	7.271
μ_CL2	10	X
μ_V	50	49.16
μ_ka	1	0.9891
σ_CL² (cv%)	50	56.97
σ_V² (cv%)	20	18.84
σ_ka² (cv%)	20	18.30
σ	0.1	0.1163

Open in a new tab

6. Discussion

In this paper, a maximum a posteriori probability estimation approach is presented for nonlinear random effects finite mixtures models that has application to identifying hidden subpopulations in pharmacokinetic/pharmacodynamic studies. Previously reported nonparametric Bayesian approaches to this problem (Wakefield and Walker (1997), Rosner and Mueller (1997)) have advantages over the MAP estimation approach presented herein, including calculation of the complete posterior distribution of model parameters including the number of mixing components. However, the computational challenges associated with the proper solution to the nonparametric Bayesian mixture problem are considerable, whereas the MAP estimation approach using the EM algorithm with importance sampling presented here is straightforward in comparison.

In calculating the MAP estimator, the values of the parameters defining the conjugate prior densities may be available from previous studies. When no prior information is available, these parameters can be set to reflect very disperse priors (e.g, τ_k → 0 and other parameters set as illustrated in the example presented). For linear mixture models with diffuse priors, as noted by Fraley and Raftery (2005), it is expected that the MAP results will be similar to the MLE results when they latter can be calculated. The advantage of the MAP estimator in such cases is that it avoids the unboundedness that can be associated with maximum likelihood mixture model problems (Fraley and Raftery (2005), Ormoneit and Tresp (1998)).

For determining the number of components in mixture models, several measures have been suggested, including the BIC criterion used in the example presented in this paper. In addition, a priori knowledge or assumptions about the biological mechanism for the modeled PK/PD polymorphism can also facilitate the model selection procedure, when combined with a model selection measure. For example, for drugs primarily metabolized by the liver, the information on hepatic cytochrome P450 family can help to decide a reasonable range for the number of clearance subgroups (e.g., K=3 accounting for extensive, intermediate, and poor metabolizer subpopulations) thus limiting the number of competing models to be tested.

Acknowledgments

This work was supported in part by National Institute of Health grants P41-EB001978 and R01-GM068968.

Appendix: Asymptotic Properites

Let $φ_{n}^{MAP}$ be the posterior mode of φ given Yⁿ. Assuming that there is a “true” parameter φ₀ which lies in the interior of the support of p(φ), it can be shown under suitable hypotheses that $φ_{n}^{MAP}$ is consistent and asymptotically normal (White, 1994). However, from a Bayesian perspective, it is more natural to investigate the asymptotic behavior of the conditional density p(φ|Yⁿ). Under the assumptions stated below, it is found that p(φ|Yⁿ) is asymptotically normal with mean $φ_{n}^{MAP}$ . This gives another justification for calculating the MAP estimate.

This result is now stated more precisely. Assume that the Hessian matrix $A_{n} (φ) = \frac{\partial^{2} log p (φ ∣ Y^{n})}{\partial φ \partial φ}$ is invertible at $φ = φ_{n}^{MAP}$ . Then given the regularity conditions of Philppou and Roussas (1975) and Bernardo and Smith (2001), it can be shown that $Γ_{n}^{- 1 / 2} (φ_{n} - φ_{n}^{M L})$ converges in distribution to a standard multivariate normal random variable as n → ∞, where φ_n ~ p(φ|Yⁿ) and $Γ_{n}^{- 1} = - A_{n} (φ_{n}^{MAP})$ . It follows that asymptotically, $φ_{n}^{MAP}$ and Γ_n are the posterior mean and posterior covariance of p(φ|Yⁿ). For any component of $φ_{n}^{MAP}$ , the posterior standard error is the corresponding diagonal element of Γ_n.

The computation of Γ_n proceeds as follows:

\frac{\partial^{2} log p (φ ∣ Y^{n})}{\partial φ \partial φ} = (\sum_{i = 1}^{n} \frac{\partial^{2} log p (Y_{i} ∣ φ)}{\partial φ \partial φ}) + \frac{\partial^{2} log p (φ)}{\partial φ \partial φ}

and

- \sum_{i = 1}^{n} \frac{\partial log p (Y_{i} ∣ φ_{n}^{MAP})}{\partial φ \partial φ} \approx \sum_{i = 1}^{n} V_{i} (φ_{n}^{MAP})

where

V_{i} (φ) = \frac{\partial log p (Y_{i} ∣ φ)}{\partial φ} {(\frac{\partial log p (Y_{i} ∣ φ)}{\partial φ})}^{T}

It follows that

Γ_{n} \approx {(\sum_{i = 1}^{n} V_{i} (φ_{n}^{MAP}) - \frac{\partial^{2} log p (φ_{n}^{MAP})}{\partial φ \partial φ})}^{- 1} .

Note that in the maximum likelihood setting of Wang et al (2007), $Cov (φ_{n}^{M L}) \approx {(\sum_{i = 1}^{n} V_{i} (φ_{n}^{MAP}))}^{- 1}$ . So the difference between MAP and ML is the term $\frac{\partial^{2} log p (φ_{n}^{MAP})}{\partial φ \partial φ}$ . As n → ∞, the contribution of this term diminishes.

In this asymptotic analysis the precisions (σ²)⁻¹ and $\sum_{k}^{- 1}$ are considered as primary variables instead of the variances σ² and Σ_k. This is standard practice in the normal, gamma, Wishart model. Further using $\sum_{k}^{- 1}$ instead of Σ_k greatly simplifies the second derivative calculations.

We first calculate V_i(φ), i =1, …, n. The formulas below are similar to those given in given in Wang, et al (2007). The gradient components are calculated using the relation:

\frac{\partial}{\partial φ} log p (Y_{i} ∣ φ) = \sum_{k = 1}^{K} \int {\frac{\partial}{\partial φ} log p (Y_{i} ∣ φ, γ) w_{k} η (θ ∣ μ_{k}, \sum_{k})} g_{i k} (θ_{i}, φ) d θ,

We have:

s_{{(σ^{2})}^{- 1}} = \frac{\partial}{\partial {(σ^{2})}^{- 1}} log p (Y_{i} ∣ φ) = \sum_{k = 1}^{K} \int {(1 / 2) m_{i} σ^{2} - (1 / 2) {(Y_{i} - h_{i} (θ))}^{T} H_{i} {(θ)}^{- 1} (Y_{i} - h_{i} (θ)} g_{i k} (θ, φ) d θ

Next using the constraint: w_K = 1 – w₁ – … – w_K₋₁, we have for k =1, …, K − 1

s_{w_{k}} = \frac{\partial}{\partial w_{k}} log p (Y_{i} ∣ φ) = \frac{1}{w_{k}} \int g_{i k} (θ, φ) d θ_{i} - \frac{1}{w_{K}} \int g_{i K} (θ, φ) d θ .

And for k =1, …, K

\begin{array}{l} s_{μ_{k}} = \frac{\partial}{\partial μ_{k}} log p (Y_{i} ∣ φ) = \int g_{i k} (θ, φ) {{\sum_{k}}^{- 1} (θ - μ_{k})} d θ, \\ \frac{\partial}{\partial \sum_{k}^{- 1}} log p (Y_{i} ∣ φ) = \int {(1 / 2) \sum_{k} - (1 / 2) (θ - μ_{k}) {(θ - μ_{k})}^{T}} g_{i k} (θ, φ) d θ, \end{array}

so that

s_{vech (\sum_{k}^{- 1})} = \frac{\partial}{\partial vech (\sum_{k}^{- 1})} log p (Y_{i} ∣ φ) = vech [2 \frac{\partial}{\partial \sum_{k}^{- 1}} log p (Y_{i} ∣ φ) - Diagonal (\frac{\partial}{\partial \sum_{k}^{- 1}} log p (Y_{i} ∣ φ))]

where vech(X) is the vector of the lower triangular components of a symmetric matrix X. Putting these results together produces the vector

s_{i} = (s_{{(σ^{2})}^{- 1}}, s_{w_{1}}, \dots, s_{w_{K - 1}}, s_{μ_{1}}, \dots, s_{μ_{K}}, s_{vech (\sum_{1}^{- 1})}, \dots, s_{vech (\sum_{K}^{- 1})}),

and the formula

V_{i} (φ) = {(\sum_{i = 1}^{n} s_{i} {s_{i}}^{T})}^{- 1} .

All the above computations can be performed during the importance sampler calculation at the final iteration of the EM algorithm.

It remains to calculate $\frac{\partial^{2} log p (φ)}{\partial φ \partial φ}$ . For this calculation we represent the parameter φ as $φ = {{(σ^{2})}^{- 1}, w_{1}, \dots, w_{K - 1}, (μ_{k}, \sum_{k}^{- 1}), k = 1, \dots, K}$ . Now supposeθ has p components. Then each μ_k has p components and each $\sum_{k}^{- 1}$ has p(p+1)/2 unique components. Therefore $\frac{\partial^{2} log p (φ)}{\partial φ \partial φ}$ is an R×R matrix, where R = 1 + (p − 1) + Kp + Kp(p+1)/2. Now

log p (φ) = log p (σ^{- 2}) + log p (w_{1}, \dots w_{K - 1}) + \sum_{k = 1}^{K} log p (μ_{k} ∣ \sum_{k}^{- 1}) + \sum_{k = 1}^{K} log p (\sum_{k}^{- 1})

so that $B = \frac{\partial^{2} log p (φ)}{\partial φ \partial φ}$ is a block diagonal matrix of the form B =Diag (B_σ_⁻²; B_w; B(_{μ_kν_k}), k =1, …, K), $ν_{k} = vech (\sum_{k}^{- 1})$ .

To calculate B the following parameterizations for the prior p(φ) are assumed:

\begin{array}{l} p (κ) = c κ^{(a - 1)} exp (- b κ), κ = σ^{- 2} \\ p (w_{1}, \dots, w_{K - 1}) = c {(w_{1})}^{a_{1} - 1} \dots \cdot {(w_{K})}^{a_{K} - 1}, w_{K} = 1 - (w_{1} + \dots + w_{K - 1}) \\ p (μ_{k} ∣ \sum_{k}^{- 1}) = C {(det \sum_{k}^{- 1})}^{1 / 2} exp ((- τ_{k} / 2) {(μ_{k} - λ_{k})}^{T} \sum_{k}^{- 1} (μ_{k} - λ_{k}) \\ p (\sum_{k}^{- 1}) = C {(det \sum_{k}^{- 1})}^{β} exp (- t r (Ψ_{k} \sum_{k}^{- 1}) / 2), β = (q_{k} - d - 1) / 2 \end{array}

It follows:

\begin{array}{l} B_{σ^{- 2}} = \frac{\partial^{2}}{\partial σ^{- 2} \partial σ^{- 2}} log p (σ^{- 2}) = - (a - 1) σ^{4} \\ B_{w} = \frac{\partial^{2}}{\partial w \partial w} log p (w), w = (w_{1}, \dots, w_{K - 1}) \\ {[B_{w}]}_{r r} = - (a_{r} - 1) / {(w_{r})}^{2} + (a_{K} - 1) / {(w_{K})}^{2}, r = 1, \dots, p - 1 \\ {[B_{w}]}_{r s} = (a_{K} - 1) / {(w_{K})}^{2}, 1 \leq r, s \leq p - 1, r \neq s \\ B_{(μ_{k}, v_{k})} = [\begin{matrix} G_{p \times p} & H_{p \times p (p + 1) / 2} \\ H^{T} & J_{p (p + 1) / 2 \times p (p + 1) / 2} \end{matrix}] \\ G = \frac{\partial^{2}}{\partial μ_{k} \partial μ_{k}} log p (μ_{k} ∣ \sum_{k}^{- 1}) = - τ_{k} \sum_{k}^{- 1} \\ H = \frac{\partial^{2}}{\partial μ_{k} \partial vech (\sum_{k}^{- 1})} log p (μ_{k} ∣ \sum_{k}) = - τ_{k} D^{T} ((μ_{k} - λ_{k}) \otimes I) D, \\ J = J_{1} + J_{2} \\ J_{1} = \frac{\partial^{2}}{\partial vech (\sum_{k}^{- 1}) \partial vech (\sum_{k}^{- 1})} log p (μ_{k} ∣ \sum_{k}^{- 1}) = - (1 / 2) D^{T} \sum_{k} \otimes \sum_{k} D \\ J_{2} = \frac{\partial^{2}}{\partial vech (\sum_{k}^{- 1}) \partial vech (\sum_{k}^{- 1})} log p (\sum_{k}) = - β D^{T} \sum_{k} \otimes \sum_{k} D \end{array}

In the above equations, D is the permutation matrix satisfying D vech (X) =vec (X), where vec (X) is the vector of the stacked columns of the symmetric matrix X, and ⊗ is the Kronecker product (see Minka (2000)).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

Bernardo JM, Smith AFM. Bayesian Theory, Section 5.3.2. Wiley; New York: 2001. [Google Scholar]
Cruz-Mesía R, Quintana FA, Marshall G. Model-based clustering for longitudinal data. Computational Statistics & Data Analysis. 2008;52:1441–1457. [Google Scholar]
Dempster AP, Laird N, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Statist Soc B. 1977;39:1–38. [Google Scholar]
Evans WE, Relling MV. Pharmacogenomics: Translating functional genomics into rational therapeutics. Science. 1999;286:487–491. doi: 10.1126/science.286.5439.487. [DOI] [PubMed] [Google Scholar]
Fraley C, Raftery AE. Technical Report no. 486. Department of Statistics, University of Washington; 2005. Bayesian regularization for Normal mixture estimation and model-based clustering. http://handle.dtic.mil/100.2/ADA454825. [Google Scholar]
Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. J of the Am Stat Assoc. 1990;85:398–409. [Google Scholar]
Katz D, Azen SP, Schumitzky A. Bayesian approach to the analysis of nonlinear models: Implementation and evaluation. Biometrics. 1981;37:137–142. [Google Scholar]
Kaila N, Straka RJ, Brundage RC. Mixture models and subpopulation classification: a pharmacokinetic simulation study and application to metoprolol CYP2D6 phenotype. J Pharmacokinet Pharmacodyn. 2007;34:141–156. doi: 10.1007/s10928-006-9038-9. [DOI] [PubMed] [Google Scholar]
Lemenuel-diot A, Laveille C, Frey N, Jochemsen R, Mallet A. Mixture modeling for the detection of subpopulations in a pharmacokinetic/pharmacodynamic analysis. J Pharmacokinet Pharmacodyn. 2006;34:157–181. doi: 10.1007/s10928-006-9039-8. [DOI] [PubMed] [Google Scholar]
Lunn DJ, Best N, Thomas A, Wakefield J, Spiegelhalter D. Bayesian analysis of population PK/PD models: general concepts and software. J of Pharmacokin and Pharmacodyn. 2002;29:217–307. doi: 10.1023/a:1020206907668. [DOI] [PubMed] [Google Scholar]
Mentré F, Mallet A, Baccar D. Optimal design in random effect regression models. Biometrika. 1997;84:429–442. [Google Scholar]
Minka T. Old and new matrix algebra useful for statistics. MIT Media Lab. 2000 http://research.microsoft.com/minka/papers/matrix.
Ormoneit D, Tresp V. Averaging, maximum penalized likelihood and Bayesian estimation for improving Gaussian mixture probability density estimates. IEEE Transactions on Neural Networks. 1998;9:639–650. doi: 10.1109/72.701177. [DOI] [PubMed] [Google Scholar]
Pauler DK, Laird NM. A mixture model for longitudinal data with application to assessment of noncompliance. Biometrics. 2000;56:464–472. doi: 10.1111/j.0006-341x.2000.00464.x. [DOI] [PubMed] [Google Scholar]
Racine-Poon A. A Bayesian approach to non-linear random effects models. Biometrics. 1985;41:1015–1024. [PubMed] [Google Scholar]
Rosner GL, Muller P. Bayesian population pharmacokinetic and pharmacodynamic analyses using mixture models. J Pharmacokinet Biopharm. 1997;2:209–234. doi: 10.1023/a:1025784113869. [DOI] [PubMed] [Google Scholar]
Sheiner LB, Halkin H, Peck C, Rosenberg B, Melmon KL. Improved computer assisted digoxin therapy: a method using feedback of measured serum digoxin concentrations. Ann Intern Med. 1975;82:619–627. doi: 10.7326/0003-4819-82-5-619. [DOI] [PubMed] [Google Scholar]
Tseng P. An analysis of the EM algorithm and entropy-like proximal point methods. Math Oper Res. 2005;29:27–44. [Google Scholar]
Wakefield JC, Smith AFM, Racine-Poon A, Gelfand AE. Bayesian analysis of linear and non-linear population models by using the Gibbs sampler. Applied Statistics. 1994;43:201–221. [Google Scholar]
Wakefield JC, Walker SG. Bayesian nonparametric population models: formulation and comparison with likelihood approaches. J Pharmacokinet Biopharm. 1997;25:235–253. doi: 10.1023/a:1025736230707. [DOI] [PubMed] [Google Scholar]
Wang X, Schumitzky A, D’Argenio DZ. Nonlinear random effects mixture models: Maximum likelihood estimation via the EM algorithm. Comput Stat Data Anal. 2007;51:6614–6623. doi: 10.1016/j.csda.2007.03.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
White H. Estimation, Inference and Specification Analysis. Cambridge University Press; 1994. Chapter 6. [Google Scholar]
Wu CF. On the convergence properties of the EM algorithm. Ann Stat. 1983;11:95–103. [Google Scholar]

[R1] Bernardo JM, Smith AFM. Bayesian Theory, Section 5.3.2. Wiley; New York: 2001. [Google Scholar]

[R2] Cruz-Mesía R, Quintana FA, Marshall G. Model-based clustering for longitudinal data. Computational Statistics & Data Analysis. 2008;52:1441–1457. [Google Scholar]

[R3] Dempster AP, Laird N, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Statist Soc B. 1977;39:1–38. [Google Scholar]

[R4] Evans WE, Relling MV. Pharmacogenomics: Translating functional genomics into rational therapeutics. Science. 1999;286:487–491. doi: 10.1126/science.286.5439.487. [DOI] [PubMed] [Google Scholar]

[R5] Fraley C, Raftery AE. Technical Report no. 486. Department of Statistics, University of Washington; 2005. Bayesian regularization for Normal mixture estimation and model-based clustering. http://handle.dtic.mil/100.2/ADA454825. [Google Scholar]

[R6] Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. J of the Am Stat Assoc. 1990;85:398–409. [Google Scholar]

[R7] Katz D, Azen SP, Schumitzky A. Bayesian approach to the analysis of nonlinear models: Implementation and evaluation. Biometrics. 1981;37:137–142. [Google Scholar]

[R8] Kaila N, Straka RJ, Brundage RC. Mixture models and subpopulation classification: a pharmacokinetic simulation study and application to metoprolol CYP2D6 phenotype. J Pharmacokinet Pharmacodyn. 2007;34:141–156. doi: 10.1007/s10928-006-9038-9. [DOI] [PubMed] [Google Scholar]

[R9] Lemenuel-diot A, Laveille C, Frey N, Jochemsen R, Mallet A. Mixture modeling for the detection of subpopulations in a pharmacokinetic/pharmacodynamic analysis. J Pharmacokinet Pharmacodyn. 2006;34:157–181. doi: 10.1007/s10928-006-9039-8. [DOI] [PubMed] [Google Scholar]

[R10] Lunn DJ, Best N, Thomas A, Wakefield J, Spiegelhalter D. Bayesian analysis of population PK/PD models: general concepts and software. J of Pharmacokin and Pharmacodyn. 2002;29:217–307. doi: 10.1023/a:1020206907668. [DOI] [PubMed] [Google Scholar]

[R11] Mentré F, Mallet A, Baccar D. Optimal design in random effect regression models. Biometrika. 1997;84:429–442. [Google Scholar]

[R12] Minka T. Old and new matrix algebra useful for statistics. MIT Media Lab. 2000 http://research.microsoft.com/minka/papers/matrix.

[R13] Ormoneit D, Tresp V. Averaging, maximum penalized likelihood and Bayesian estimation for improving Gaussian mixture probability density estimates. IEEE Transactions on Neural Networks. 1998;9:639–650. doi: 10.1109/72.701177. [DOI] [PubMed] [Google Scholar]

[R14] Pauler DK, Laird NM. A mixture model for longitudinal data with application to assessment of noncompliance. Biometrics. 2000;56:464–472. doi: 10.1111/j.0006-341x.2000.00464.x. [DOI] [PubMed] [Google Scholar]

[R15] Racine-Poon A. A Bayesian approach to non-linear random effects models. Biometrics. 1985;41:1015–1024. [PubMed] [Google Scholar]

[R16] Rosner GL, Muller P. Bayesian population pharmacokinetic and pharmacodynamic analyses using mixture models. J Pharmacokinet Biopharm. 1997;2:209–234. doi: 10.1023/a:1025784113869. [DOI] [PubMed] [Google Scholar]

[R17] Sheiner LB, Halkin H, Peck C, Rosenberg B, Melmon KL. Improved computer assisted digoxin therapy: a method using feedback of measured serum digoxin concentrations. Ann Intern Med. 1975;82:619–627. doi: 10.7326/0003-4819-82-5-619. [DOI] [PubMed] [Google Scholar]

[R18] Tseng P. An analysis of the EM algorithm and entropy-like proximal point methods. Math Oper Res. 2005;29:27–44. [Google Scholar]

[R19] Wakefield JC, Smith AFM, Racine-Poon A, Gelfand AE. Bayesian analysis of linear and non-linear population models by using the Gibbs sampler. Applied Statistics. 1994;43:201–221. [Google Scholar]

[R20] Wakefield JC, Walker SG. Bayesian nonparametric population models: formulation and comparison with likelihood approaches. J Pharmacokinet Biopharm. 1997;25:235–253. doi: 10.1023/a:1025736230707. [DOI] [PubMed] [Google Scholar]

[R21] Wang X, Schumitzky A, D’Argenio DZ. Nonlinear random effects mixture models: Maximum likelihood estimation via the EM algorithm. Comput Stat Data Anal. 2007;51:6614–6623. doi: 10.1016/j.csda.2007.03.008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] White H. Estimation, Inference and Specification Analysis. Cambridge University Press; 1994. Chapter 6. [Google Scholar]

[R23] Wu CF. On the convergence properties of the EM algorithm. Ann Stat. 1983;11:95–103. [Google Scholar]

PERMALINK

Population Pharmacokinetic/Pharmacodyanamic Mixture Models via Maximum a Posteriori Estimation

Xiaoning Wang

Alan Schumitzky

David Z D’Argenio

Abstract

1. Introduction

2. Nonlinear Random Effects Finite Mixture Model and MAP Estimation Problem

3. Solution via the EM Algorithm

4. Subpopulation Classification and Model Selection

5. Example

Fig. 1.

Fig. 2.

Table 1.

Fig. 3.

Fig. 4.

Table 2.

6. Discussion

Acknowledgments

Appendix: Asymptotic Properites

Footnotes

References

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Population Pharmacokinetic/Pharmacodyanamic Mixture Models via Maximum a Posteriori Estimation

Xiaoning Wang

Alan Schumitzky

David Z D’Argenio

Abstract

1. Introduction

2. Nonlinear Random Effects Finite Mixture Model and MAP Estimation Problem

3. Solution via the EM Algorithm

4. Subpopulation Classification and Model Selection

5. Example

Fig. 1.

Fig. 2.

Table 1.

Fig. 3.

Fig. 4.

Table 2.

6. Discussion

Acknowledgments

Appendix: Asymptotic Properites

Footnotes

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases