Modelling Placebo Response via Infinite Mixtures

Thaddeus Tarpey; Eva Petkova

Author manuscript; available in PMC: 2011 Jul 28.

Published in final edited form as: JP J Biostat. 2010 Jun 1;4(2):161–179.

PMCID: PMC3145361 NIHMSID: NIHMS167949 PMID: 21804745

Abstract

Non-specific treatment response, also known as placebo response, is ubiquitous in the treatment of mental illness, particularly in treating depression. The study of placebo effect is complicated because the factors that constitute non-specific treatment effects are latent and not directly observed. A flexible infinite mixture model is introduced to model these nonspecific treatment effects. The infinite mixture model stipulates that the non-specific treatment effects are continuous and this is contrasted with a finite mixture model that is based on the assumption that the non-specific treatment effects are discrete. Data from a depression clinical trial is used to illustrate the model and to study the evolution of the placebo effect over the course of treatment.

Keywords: bootstrap, convolution, depression, EM algorithm, finite mixture, maximum likelihood estimation

AMS 1991 subject classifications: Primary-62H30, Secondary-62P10

1 Introduction

Finite mixture models and other latent class models have come to play a dominant role in statistical analysis. These models stipulate that unobserved heterogeneity is due to the existence of distinct but unobserved latent classes. The popularity of these models may stem from the natural inclination of humans to put things into categories. In medical science, these models can help define distinct classes of diseases, such as different types of depression. In other cases, mixture models can be used when distinct classes of individuals are assumed to exist based on how they would respond to treatment. A recent example is provided by [10], who use a growth mixture model approach in a depression trial, assuming that latent classes of individuals exist corresponding to “never responders,” “drug only responders,” “placebo only responders” and “always responders.”

In many applications, including medical applications, it may not be clear whether distinct latent classes actually exist. For example, with heart disease, severity can vary along the continuum defined by blood pressure and biologically distinct classes may not exist. Instead, disease diagnosis is made according to a cut-point along the continuum. Assuming distinct classes exist when they do not exist can lead to controversy and misunderstanding [e.g. 5]. More recently, the question of whether distinct latent classes exist has arisen with regard to mental illnesses such as autism. A recent excerpt from the New York Times states that changes in diagnoses “are part of a bigger overhaul that will largely replace the old ‘you have it or you do not’ model of mental illness with a more modern view that psychiatric disorders should be seen as a continuum, with many degrees of severity [16].”

If distinct classes do not exist, finite mixture models that are defined in terms of latent indicator variables may not be appropriate. Instead, infinite mixture models defined in terms of a continuous latent variable may be more appropriate. In this paper, we introduce an infinite mixture model approach. The motivation for the infinite mixture model is to model the placebo effect when treating depression. The model is illustrated with data from the placebo arm of a depression trial.

In psychiatry, ill people who are treated and improve are called responders (to treatment). Improvement among responders treated with an active drug can be due to a specific effect of the chemical component of the drug or due to non-specific effects of the treatment, known also as placebo effects. Placebo effects are known to play a substantial role in mood improvement when treating depressed individuals. For instance, [7] performed a meta-analysis of data submitted to the Food and Drug Administration for four new-generation antidepressants and found exceptionally large placebo response rates that duplicated more than 80% of the improvement observed in the drug groups. High rates of placebo response make it difficult to quantify the specific effect of a medication. Thus, understanding and identifying the non-specific effects of treatment is one of the biggest challenges in antidepressant clinical research.

The outcomes of depressed subjects in the placebo arm of depression studies can be quite varied: from worsening, to no improvement, to substantial decrease of symptoms. The non-specific effects can vary with respect to time of onset, sustainability, and strength. Thus, non-specific treatment effects are a latent source of heterogeneity.

Although the physiological and psychological mechanisms behind the placebo effects remain mysterious, modelling data from the placebo arm of clinical trials can hopefully provide insight into the manifestations of the placebo effects. The following is a list of factors not attributable to specific effects of the drug that will help define the statistical model:

1. A pill effect where the act of taking a pill can affect symptoms.
2. Non-specific effects of treatment such as receiving attention from clinicians and having expectations of improvement.
3. The natural course of the illness, which might resolve on its own.
4. Life events.

[8] introduced an algebraic model for specific and non-specific effects. An error term ε represents the combination of factors that can have either positive or negative impact on outcomes such as items 3 and 4 above. When there is no intervention, including a placebo pill, the outcome is simply

y = β_{0} + ε,

(1)

where β₀ is the mean outcome. In this case, the outcome distribution can be considered homogeneous, perhaps even normal. [8] also introduced a non-specific effect variable, say x, consisting of items 1 and 2 above. The non-specific effect will be a source of heterogeneity in the outcomes that can distort the no-intervention distribution in (1). By examining outcomes in the placebo arm of a double-blind study, the distribution of the non-specific effect x can be studied without contamination from the specific chemical effect of the drug. However, the non-specific effect is not directly observed because it will be convolved with the error ε that comprises in part items 3 and 4 above. If y denotes the observed outcome in the placebo arm of a study, then a natural model for y is a simple linear regression model:

y = β_{0} + β_{1} x + ε,

(2)

where the latent predictor x is assumed independent of the error ε ~ N(0, σ²). [12] call (2) a latent regression model. The problem of modelling the non-specific effect becomes a de-convolution problem: from the observed outcome y, de-convolve the non-specific effect x from the random error ε.

The distribution of the latent regressor x in (2) will help answer questions about the nature of the non-specific effects of treatment. One possibility is that the latent non-specific effect is a discrete indicator variable that distinguishes people who do and do not experience non-specific (placebo) treatment effects. If there exists a latent class of subjects who do not experience a placebo effect, then treating these subjects with an active drug will allow a focused examination of the drug’s efficacy without contamination from non-specific effects of treatment. Another possibility is that x is a continuous variable, in which case the non-specific effects vary continuously from non-existent to strong.

Often finite mixture models are postulated to model unexplained heterogeneity in outcomes [e.g. 3, 6, 15]. A 2-component finite mixture makes sense if there are two types of subjects, those that do and do not respond to non-specific effects of treatment. The 2-component finite mixture results from (2) when the latent regressor x is Bernoulli (e.g. x is a binary indicator variable for the presence or absence of non-specific effects). A more intuitive model for x would allow the degree of non-specific effect to range continuously from no effect to complete remission. In this case, the resulting model is an infinite mixture model.

An example of a continuous distribution for x is the exponential distribution. The resulting distribution for the outcome y in (2) is the exGaussian [9], indicating that the strength of the non-specific effect varies continuously and its frequency decreases exponentially with its magnitude. Because the latent variable in a 2-component finite mixture is a 0–1 Bernoulli variable, [12] proposed a continuous latent regressor supported on the interval (0, 1) as a generalization of the finite mixture model. In particular, they specified a beta distribution for the latent regressor x in (2), which can assume a wide variety of shapes.

In Section 2, we describe the general infinite mixture model and how to estimate the parameters. In Section 3, we use the infinite mixture model to study the evolution of the non-specific placebo effect in a clinical trial on depression. In Section 4 we compare the finite mixture model to the infinite mixture model when modelling the non-specific placebo effect. In Section 5 we discuss the implications of the results.

2 The Infinite Mixture Model

Let y be an observed response variable in (2) and assume the error ε ~ N(0, σ²). Let g(x; θ) denote the density for the latent predictor x, parameterized in terms of θ. If the latent variable x is discrete, then y has a finite mixture distribution. If g, the mixing density, is continuous, then y is an infinite mixture with density

f (y; β_{0}, β_{1}, σ^{2}, θ) = \int N (y; β_{0} + β_{1} x, σ^{2}) g (x; θ) dx,

(3)

where N(y; μ, σ²) denotes the normal density function of N(μ, σ²). With varying choices of the mixing density g, the infinite mixture model is very flexible. The shape of the mixing density g sheds light on the nature of the underlying latent distribution. Even though an infinite mixture model may represent a better fidelity to the truth in modelling compared to the finite mixture model, there are two problems that hamper utilizing the infinite mixture model:

In order to maximize the likelihood for parameter estimation, the integral in (3) typically needs to be solved numerically.
For certain parametric families of distributions g(x; θ), the infinite mixture likelihood surface can become relatively flat making parameter estimation difficult.

We address these issues in the sequel.
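As a minimal sketch of the first issue, the integral in (3) can be approximated by composite Simpson quadrature. The example assumes an exponential mixing density with mean θ and β₀ = 0, β₁ = 1, a case in which (3) also has a closed form (the exGaussian density), so the two can be compared. The function names, the truncation point of the integral and the parameter values are illustrative assumptions and are not taken from the analysis in this paper.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3.0

def mixture_density(y, b0, b1, sigma, g, lower, upper):
    """Numerical version of (3): integrate N(y; b0 + b1*x, sigma^2) g(x) over x."""
    return simpson(lambda x: normal_pdf(y, b0 + b1 * x, sigma) * g(x), lower, upper)

def exgaussian_pdf(y, theta, sigma):
    """Closed form of (3) when x is exponential with mean theta, b0 = 0, b1 = 1."""
    arg = y / sigma - sigma / theta
    cdf = 0.5 * math.erfc(-arg / math.sqrt(2.0))  # standard normal CDF at arg
    return math.exp(sigma**2 / (2.0 * theta**2) - y / theta) * cdf / theta

theta, sigma = 4.0, 3.0           # illustrative values only
g = lambda x: math.exp(-x / theta) / theta
upper = 40.0 * theta              # truncate the infinite range of integration
for y in (-5.0, 0.0, 5.0, 15.0):
    numeric = mixture_density(y, 0.0, 1.0, sigma, g, 0.0, upper)
    print(y, numeric, exgaussian_pdf(y, theta, sigma))
```

For mixing densities without a closed-form convolution, such as those on (0, 1) considered later, only the quadrature route is available, and the number of subintervals becomes a tuning choice.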

2.1 Estimation

We use maximum likelihood to estimate the parameters of the infinite mixture model. This requires specifying a parametric form for the latent density g(x; θ). If y₁, …, y_n represent the observed data and the latent predictor x has a density g(x; θ), then the marginal log-likelihood, with the integral taken over the support of g, is:

l (β_{0}, β_{1}, σ^{2}, θ) = \sum_{i = 1}^{n} log {\int N (y_{i}; β_{0} + β_{1} x, σ^{2}) g (x; θ) dx} .

(4)

Numerical methods are generally required to maximize the log-likelihood. In the results below, we use the quasi-Newton L-BFGS-B method [2], which is available in the R software [11] through the “optim” function. This method allows box constraints on the parameters, which guarantee that the parameter estimates stay within the parameter space. Once the log-likelihood function is defined in R, it is fairly straightforward to apply the optim function.

An alternative approach to maximizing the likelihood is to use the EM algorithm [4] which was used in [12]. However, the EM algorithm is known to be slow to converge and coding up the EM algorithm can become quite involved compared to simply using a built-in optimizer such as optim in R.
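The box-constrained maximization described above can be sketched as follows. This is not the R code used for the results in this paper: it uses only the Python standard library, substitutes a simple derivative-free compass search for L-BFGS-B, and fits the no-intercept exGaussian model to simulated data. The function names, the simulated sample, the starting values and the bounds are all assumptions made for illustration.

```python
import math
import random

def exgaussian_logpdf(y, theta, sigma):
    """Log density of y = x + e, x ~ exponential(mean theta), e ~ N(0, sigma^2)."""
    arg = y / sigma - sigma / theta
    log_cdf = math.log(0.5 * math.erfc(-arg / math.sqrt(2.0)))
    return -math.log(theta) + sigma**2 / (2.0 * theta**2) - y / theta + log_cdf

def neg_loglik(params, data):
    """Negative marginal log-likelihood for the parameter pair (theta, sigma)."""
    theta, sigma = params
    return -sum(exgaussian_logpdf(y, theta, sigma) for y in data)

def compass_search(func, start, lower, upper, step=1.0, tol=1e-4):
    """Derivative-free minimizer; trial points are clipped to the box [lower, upper]."""
    best = list(start)
    best_val = func(best)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for direction in (1.0, -1.0):
                trial = list(best)
                trial[i] = min(max(trial[i] + direction * step, lower[i]), upper[i])
                val = func(trial)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step *= 0.5  # shrink the step only when no neighbour improves
    return best, best_val

# Simulated data with hypothetical true values; not the trial data.
rng = random.Random(1)
true_theta, true_sigma = 4.0, 3.0
data = [rng.expovariate(1.0 / true_theta) + rng.gauss(0.0, true_sigma)
        for _ in range(300)]

start = [2.0, 2.0]
estimate, value = compass_search(lambda p: neg_loglik(p, data), start,
                                 lower=[1.0, 1.0], upper=[10.0, 10.0])
print("theta_hat, sigma_hat:", estimate)
print("log-likelihood at the estimate:", -value)
```

The box constraints play the same role as in the text: they keep θ and σ strictly positive and, in this sketch, also keep the log-density numerically finite during the search.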

3 Modelling Placebo Response

In this section we examine the outcomes of subjects from a study where fluoxetine was compared to a placebo for treating depression. In order to study non-specific effects of treatment, we will focus attention on the placebo arm of the study. Subjects were evaluated at baseline and at weekly intervals for six weeks. The outcome of interest is a 21-item Hamilton Depression Score (HAM-D). Lower scores on the HAM-D indicate lower levels of depression severity. At each week, a subject’s improvement is measured by the difference between the baseline HAM-D score and the weekly HAM-D score, where larger differences indicate a greater degree of improvement. We shall model the distribution of improvement at each week via a latent regression model.

In this setting, the latent predictor x in (2) corresponds to the non-specific treatment effect. Other available covariates, age, sex, and race, were not significant at any of the six weeks. Theoretically, if x represents the non-specific effects, then a no-intercept latent regression model

y = β_{1} x + ε,

should hold because conditional on x = 0, the mean improvement should be zero. However, as in other regression settings, constraining the intercept to be zero can lead to a poor fit even if theoretically the intercept should be zero, since dropping the intercept parameter makes the model less flexible. Except for weeks 1 and 2, fitting a no-intercept model led to a notably poorer fit. Therefore an intercept was used in the latent regression model for weeks 3–6.

3.1 Week 1 and 2 Improvement

A no-intercept exGaussian model y = x + ε was fit to the week 1 improvement data, where x has an exponential distribution, g(x; θ) = e^{−x/θ}/θ, x > 0. Justification for using the exGaussian model is provided in Section 3.2. The top-left panel of Figure 1 shows a histogram of the week 1 improvement, with the estimated exGaussian latent regression density overlaid. The maximum likelihood estimate of θ is θ̂ = 3.7 and the estimate of the error standard deviation is σ̂ = 3.9. The no-intercept exGaussian model was also fit to the week 2 improvement data, and the results are shown in the top-right panel of Figure 1. For week 2 we estimated θ̂ = 5.7 and σ̂ = 3.9.

Figure 1. Top 2 Panels: Histograms of HAM-D improvement scores for weeks 1 (left) and 2 (right). Overlaid on each is the estimated latent regression response density using an exponential latent regressor. The bottom panel shows the estimated exponential latent regressor densities for week 1 (solid curve) and week 2 (dashed curve).

The bottom panel in Figure 1 shows the exponential density of the latent regressor x for week 1 (solid curve) and for week 2 (dashed curve). From this figure, we see that the initial non-specific effect skews the distribution in the direction of improvement at week 1. By week 2, the degree of positive non-specific effect is greater, as indicated by the exponential density for week 2 being stochastically larger than the density for week 1. Bootstrap resampling was performed to assess the variability in the parameter estimates. Figure 2 shows parameter estimates obtained from 500 bootstrap samples of the improvements at weeks 1 and 2. The figure shows clearly that the exponential parameter for week 2 is greater than for week 1. Also, the elliptical clouds of bootstrap estimates for weeks 1 and 2 indicate that the maximum likelihood estimators have an approximately bivariate normal distribution.

Figure 2. 500 bootstrap estimates of θ and σ in the exGaussian latent regression model for week 1 (open points) and week 2 (solid points).

3.2 A Flexible Latent Predictor

The exGaussian model produced a poor fit for the improvement distribution for the later weeks. Instead, we consider an alternative latent predictor density that can provide a variety of flexible shapes:

g (x; θ) = c (θ) {(x - θ)}^{2}, 0 < x < 1

(5)

where c(θ) is the normalization constant. The density (5) can assume skew left, right and “cup” shaped densities. To add even greater flexibility, we can introduce another parameter in the exponent and consider

g (x; θ_{1}, θ_{2}) = c (θ_{1}, θ_{2}) {[{(x - θ_{1})}^{2}]}^{θ_{2}}, 0 < x < 1,

(6)

for θ₂ ≥ 1 and where c(θ₁, θ₂) is a normalization constant. Note that θ₂ ≥ 1 constrains the latent predictor density to be smooth which seems biologically desirable. If θ₂ is allowed to take values less than one, the latent density or its derivatives can form a sharp bend, such as the absolute value function that results if θ₂ = 1/2. When g(x; θ) is “∪”-shaped, we obtain a continuous mixture generalization to the 2-component finite mixture model. Figure 3 shows some of the different shapes that result for the latent regressor for different values of θ₁ and θ₂.

Figure 3. A variety of shapes of the latent regressor density (6) for different parameter values.
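The normalization constant in (6) is not written out in the text, but it follows from the antiderivative of |x − θ₁|^{2θ₂}. The sketch below computes it in closed form and checks numerically that the resulting density integrates to one on (0, 1). The function names are assumptions for illustration; the parameter pairs in the loop are the week 3, 5 and 6 estimates quoted in Section 3.3.

```python
import math

def norm_const(theta1, theta2):
    """c(theta1, theta2) in (6), from the antiderivative of |x - theta1|^(2*theta2).

    The integral over (0, 1) equals
    [sgn(1 - theta1)|1 - theta1|^p + sgn(theta1)|theta1|^p] / p, with p = 2*theta2 + 1.
    """
    p = 2.0 * theta2 + 1.0
    upper = math.copysign(abs(1.0 - theta1) ** p, 1.0 - theta1)
    lower = math.copysign(abs(theta1) ** p, theta1)
    return p / (upper + lower)

def latent_density(x, theta1, theta2):
    """Density (6) on (0, 1); zero outside the unit interval."""
    if not 0.0 < x < 1.0:
        return 0.0
    return norm_const(theta1, theta2) * ((x - theta1) ** 2) ** theta2

def total_mass(theta1, theta2, n=2000):
    """Midpoint-rule approximation to the integral of (6) over (0, 1)."""
    return sum(latent_density((k + 0.5) / n, theta1, theta2) for k in range(n)) / n

for theta1, theta2 in [(0.52, 4.4), (0.57, 1.0), (0.51, 1.0)]:
    values = [latent_density(x, theta1, theta2) for x in (0.05, 0.5, 0.95)]
    print(theta1, theta2, total_mass(theta1, theta2), values)
```

With θ₁ near 0.5 the printed densities are much larger near the endpoints than at x = 0.5, which is the “∪” shape discussed in the text.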

The latent density (6) also has the exGaussian as a limiting case. To see this, write (6) as

g (x; θ_{1}, θ_{2}) = c {[{(x - θ_{1})}^{2}]}^{θ_{2}} = c θ_{1}^{2 θ_{2}} {[{(1 - \frac{x}{θ_{1}})}^{θ_{1}}]}^{2 θ_{2} / θ_{1}} .

(7)

As θ₁ → ∞, (1 − x/θ₁)^{θ₁} → e^{−x}. Let θ₁ and θ₂ go to infinity such that 2θ₂/θ₁ → η < ∞. Then (6) converges to an exponential distribution constrained to the interval (0, 1) with probability density function

η e^{- η x} / (1 - e^{- η}), 0 < x < 1,

(8)

which follows by noting that the normalization constant c in (7) is a function of θ₁ and θ₂. For moderate to large values of η in (8), the denominator is approximately equal to one, the term β₁x in the latent regression model (2) will have an approximate exponential distribution, and the response y will be approximately exGaussian. When the latent regression model using the latent regressor density (6) was fit to the week 1 improvement data, the algorithm crashed. Next, an increasing sequence of upper bounds, imposed through box constraints, was placed on the exponent parameter θ₂. In each case, the algorithm returned an estimate of θ₂ equal to the upper bound. Additionally, the estimate of θ₁ increased as the upper bound on θ₂ increased. This suggests that the limiting exponential latent density (8) is appropriate for the week 1 improvement.
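The convergence of (6) to the truncated exponential density (8) can be checked numerically. The sketch below assumes θ₁ > 1 and rewrites (6) as (1 − x/θ₁)^{2θ₂}/Z, so that very large powers of θ₁ never have to be formed; the function names and the value η = 3 are illustrative assumptions.

```python
import math

def scaled_density(x, theta1, theta2):
    """Density (6) for theta1 > 1, written as (1 - x/theta1)^(2*theta2) / Z.

    Z is the integral of (1 - u/theta1)^(2*theta2) over (0, 1), in closed form.
    """
    p = 2.0 * theta2
    z = theta1 * (1.0 - (1.0 - 1.0 / theta1) ** (p + 1.0)) / (p + 1.0)
    return (1.0 - x / theta1) ** p / z

def truncated_exponential(x, eta):
    """Limiting density (8) on (0, 1)."""
    return eta * math.exp(-eta * x) / (1.0 - math.exp(-eta))

def max_gap(theta1, eta):
    """Largest absolute difference between (6) and (8) on a grid, with 2*theta2/theta1 = eta."""
    theta2 = eta * theta1 / 2.0
    grid = [k / 10.0 for k in range(1, 10)]
    return max(abs(scaled_density(x, theta1, theta2) - truncated_exponential(x, eta))
               for x in grid)

eta = 3.0  # illustrative limit of 2*theta2/theta1
for theta1 in (5.0, 50.0, 2000.0):
    print(theta1, max_gap(theta1, eta))
```

The gap shrinks as θ₁ grows with the ratio 2θ₂/θ₁ held fixed, which is consistent with the optimizer pushing θ₂ to its upper bound when the data favour an exponential latent density.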

Applying the results of [1], the latent regression model with latent regressor density (6) is identifiable, following the same arguments as in [12]. The one exception is the equivalent parameterizations of β₀ + β₁x₁ + ε and (β₀ + β₁) − β₁x₂ + ε, where x₁ has a latent regressor density g(x₁; θ₁, θ₂) in (6) and x₂ has a latent regressor density g with parameters (1 − θ₁) and θ₂. Restricting β₁ > 0 avoids this dual parametrization issue.

3.3 Improvement Distributions for Weeks 3 to 6

Now we return to modelling placebo response using the infinite mixture for weeks 3 through 6 of the trial. First, we note that the improvement distribution at week 4 is approximately normal: the Shapiro-Wilk normality test produces a test statistic of W = 0.9904 with corresponding p-value = 0.054. The Shapiro-Wilk test for normality is known to be quite powerful; yet with a sample size of n = 289 here, the test does not definitively reject normality at week 4. The latent regression and finite mixture models are nearly unidentifiable when the response is approximately normal. Therefore, in this section, we will focus on weeks 3, 5, and 6.

Figure 4 shows the results of fitting the latent regression model to weeks 3, 5, and 6 using (6). The top panels show histograms of the weekly improvement distributions. Overlaid are the estimated latent regression densities. Also overlaid are the estimated 2-component normal mixture densities, which are essentially indistinguishable from the latent regression densities. (Note that for the week 4 improvement distribution, which is not shown here, the latent regression and finite mixture densities are also indistinguishable.)

Figure 4. Top panels show histograms of the improvement distributions for weeks 3, 5 and 6. Overlaid are the latent regression densities and the finite mixture densities. The bottom panels show the corresponding latent regressor density for each week.

The bottom panels in Figure 4 show the estimated latent regressor density for weeks 3, 5, and 6. In each case, the latent regressor density is “∪”-shaped. The parameter θ₁ in (6) is estimated to be 0.52, 0.57 and 0.51 for weeks 3, 5, and 6 respectively. The exponent parameter θ₂ in (6) is estimated to be 4.4, 1.0, and 1.0 for weeks 3, 5, and 6 respectively. The large exponent for week 3 produces a latent regressor density (bottom-left panel of Figure 4) that approximates a 0–1 Bernoulli, indicating the latent regression model is similar to a two-component normal mixture. However, for weeks 5 and 6, the exponent coefficient θ₂ is estimated to be approximately equal to 1, so that the latent predictor density is approximately a parabola.

Assuming the latent predictor represents the non-specific placebo effect, Figure 4 indicates that the non-specific effect varies continuously with subjects either pooling towards low to weak placebo effects or subjects pooling towards moderate to strong placebo effects, as indicated by the “∪”-shape of the latent predictors. In order to predict the latent non-specific effect, one could compute

{\hat{x}}_{i} = E [x | y_{i}]

(9)

where the conditional expectation is computed using the maximum likelihood estimates in place of the true parameters. The prediction (9) is analogous to the posterior probability in a finite mixture model.
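A minimal sketch of the prediction (9) is given below. By Bayes' rule, E[x | y] is the ratio of ∫ x N(y; β₀ + β₁x, σ²) g(x) dx to ∫ N(y; β₀ + β₁x, σ²) g(x) dx, and both integrals are approximated here with the midpoint rule. The values of θ₁ and θ₂ are the week 6 estimates quoted above; the regression parameters b0, b1 and sigma are hypothetical, since their estimates are not reported in the text, and the function names are assumptions.

```python
import math

def latent_density(x, theta1, theta2):
    """Density (6) on (0, 1), assuming 0 < theta1 < 1."""
    p = 2.0 * theta2 + 1.0
    c = p / (theta1 ** p + (1.0 - theta1) ** p)
    return c * ((x - theta1) ** 2) ** theta2

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_mean(y, b0, b1, sigma, theta1, theta2, n=2000):
    """E[x | y] in (9) as a ratio of two midpoint-rule integrals over (0, 1)."""
    num = den = 0.0
    for k in range(n):
        x = (k + 0.5) / n
        w = normal_pdf(y, b0 + b1 * x, sigma) * latent_density(x, theta1, theta2)
        num += x * w
        den += w
    return num / den

# Hypothetical regression parameters; theta1, theta2 are the week 6 estimates.
b0, b1, sigma = 2.0, 12.0, 4.0
for y in (-5.0, 0.0, 8.0, 16.0, 25.0):
    print(y, posterior_mean(y, b0, b1, sigma, 0.51, 1.0))
```

With β₁ > 0 the predicted latent effect increases with the observed improvement, and it always lies inside (0, 1), the support of the latent regressor.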

3.4 A Restricted Model

Expanding the infinite mixture model (5) by introducing the exponent parameter θ₂ in (6) creates a relatively flat ridge on the log-likelihood surface in the direction of the θ₂ axis. Although introducing θ₂ makes the model more flexible, it causes the maximum likelihood estimate of θ₂ to become very unstable. To illustrate, Figure 5 shows the results of 500 bootstrap samples obtained from the week 6 improvement distribution. The latent regression model was fit to each bootstrap sample and Figure 5 shows the 500 pairs of estimates θ̂₁ and θ̂₂ in the latent regressor density (6). The strange, non-normal pattern seen in this figure is due to the fairly flat ridge along the θ₂ direction of the log-likelihood surface. This non-normal behavior in the estimator can be avoided by considering a restricted model where the exponent θ₂ is fixed.

Figure 5. 500 bootstrap estimates of θ₁ and θ₂ of the latent regressor density (6) from the week 6 improvement scores.

To investigate the effect of restricting the value of θ₂ on the estimates of θ₁, we consider a simple scenario where the outcome y is related to the latent predictor by y = x + ε with x having the density (6) with θ₂ = 2 and ε ~ N(0, 0.25²). If the true value of θ₁ is θ₁₀, then the “population-based” true log-likelihood function is

L (θ_{1}) = \int f (y; θ_{1} = θ_{10}, θ_{2} = 2) log (f (y; θ_{1}, θ_{2} = 2)) dy,

(10)

and the misspecified log-likelihood is

L_{0} (θ_{1}) = \int f (y; θ_{1} = θ_{10}, θ_{2} = 2) log (f (y; θ_{1}, θ_{2} = 1)) dy .

(11)

The solid curve in Figure 6 is the true log-likelihood function (10) and the dashed curve is the misspecified log-likelihood function (11). In the left panel of Figure 6, the true value is θ₁₀ = 0.5. The misspecified log-likelihood (11) takes its maximum value at this true value of θ₁ = 0.5, and the log-likelihood function is concave down and quite steep at this point. Thus, in this case, the maximum likelihood estimator of θ₁ using the misspecified log-likelihood will be unbiased and fairly stable. Also, the maximum of the misspecified log-likelihood is almost equal to the maximum of the true log-likelihood. In the right panel of Figure 6, the true value is θ₁₀ = 0.6. In this case, the maximum of the misspecified log-likelihood occurs to the right of 0.6, indicating that the maximum likelihood estimator of θ₁ under the misspecified model will be slightly biased. Because the misspecified log-likelihood is flatter in this case, the parameter estimates using the misspecified log-likelihood will be less stable.

Figure 6. Plots of the true (solid curve) and misspecified (dashed curve) population-based log-likelihoods in (10) and (11). In the left panel, the true value of θ₁ = 0.5 and in the right panel, the true value of θ₁ = 0.6.

Of course, models that are used in practice represent an approximation to the truth and hence they are misspecified to some degree. For week 6, if we implement model (5) and thus effectively restrict θ₂ = 1 in (6), this is akin to considering only a slice of the likelihood surface produced by model (6). The likelihood function along this slice in our case is quite well-behaved and concave near the value θ₁ = 0.5. This is analogous to the regression setting where an estimated regression coefficient may be very stable in the model but, if additional terms are added to the model (such as higher polynomial terms), the estimated coefficient can become very unstable. For week 6, using the restricted quadratic model (5), the estimates of the remaining parameters and their distributions did not change appreciably from those obtained using (6). Additionally, except for a few extreme outliers (which can be explained by the non-concavity of the log-likelihood), the bootstrap distribution of the latent regression parameter estimators appears approximately normal when specifying the quadratic latent regressor (5).

4 Finite or Infinite Mixture?

An alternative to the infinite mixture model for modelling the latent non-specific effect is to consider a finite mixture model. The finite mixture model in the placebo response example would be suitable if there exist two distinct types of people: those that experience non-specific effects and those that do not. Of course, the components of a finite mixture could correspond to two levels of some other unobserved covariate that may have little to do with the non-specific treatment effect. As noted above, the latent regression densities and the finite mixture densities are nearly indistinguishable from each other for the weeks 3, 5, and 6 improvement scores, as depicted in Figure 4. This similarity is highlighted by the fact that the observed log-likelihoods for the infinite mixture model and the finite mixture model are very close in value: −974.0 and −973.9 for week 3; −944.8 and −944.8 for week 5; and −926.6 and −926.1 for week 6. The log-likelihood for the exGaussian fit at week 1 was −942.4 compared to −1004.6 for the 2-component finite mixture. At week 2, the log-likelihood for the exGaussian model is −1007.8 compared to −1002.8 for the 2-component finite mixture model. It should be noted that the 5-parameter finite mixture model is much more flexible than the 2-parameter exGaussian model. Although the infinite and finite mixture models are not nested within each other, the log-likelihoods do provide a useful way to informally compare the models.

If the underlying improvement distribution was truly a 2-component finite mixture corresponding to responders and non-responders to non-specific treatment effects, then the lower component mean should theoretically be zero. A constrained 2-component finite mixture was estimated for the weekly improvement distributions where the lower component mean was set to zero. Maximum likelihood estimates of the constrained and unconstrained finite mixture models were obtained using the EM algorithm and the models were compared via a likelihood ratio test. The hypothesis that the lower component mean is zero was rejected for weeks 1, 2, and 5 with p-values 0.0005, 0.0126 and 0.0402 respectively. The p-values for weeks 3, 4, and 6 were 0.0892, 0.1357, and 0.2781 respectively. In addition, the unconstrained lower component means were each estimated to be greater than zero for each week: μ̂₁ = 2.2, 1.9, 3.5, 4.2, 4.1, 1.8 for weeks 1–6 respectively.
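The likelihood ratio comparison of the constrained and unconstrained finite mixtures can be sketched as follows. The reference distribution is not stated in the text; the sketch assumes a chi-square distribution with one degree of freedom, because the constraint fixes a single parameter. The function name is an assumption, and the constrained log-likelihood in the example is hypothetical.

```python
import math

def lrt_pvalue(loglik_unconstrained, loglik_constrained):
    """Likelihood ratio statistic and p-value against a chi-square(1) reference."""
    stat = 2.0 * (loglik_unconstrained - loglik_constrained)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimizer error
    # For one degree of freedom, P(chi2_1 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# -944.8 is the week 5 finite mixture log-likelihood quoted in the text;
# the constrained value -946.9 is hypothetical and used only for illustration.
stat, p = lrt_pvalue(-944.8, -946.9)
print(stat, p)
```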

Two of the parameters for the finite mixture model are the proportions of the population associated with each mixture component. The estimated proportions of the population associated with the larger finite mixture component mean (i.e., those experiencing a non-specific response) from the unconstrained model are 0.42 for week 1, 0.71 for week 2, 0.36 for week 3, 0.46 for week 4, 0.32 for week 5 and 0.58 for week 6. In other words, according to the finite mixture model, the percentage of the population experiencing a non-specific response jumps up and down as the trial progresses. If the underlying model is not a 2-component mixture, then these estimated proportions do not have any interpretation and the finite mixture model can simply be regarded as a nonparametric density estimate of the true density.

The infinite mixture, on the other hand, shows a fairly continuous shift from a skew-right distribution of non-specific treatment response at the beginning of the trial to a symmetric “∪”-shape distribution at the end of the trial.

In [14], a population-based log-likelihood was used to show that a 2-component finite mixture can approximate homogeneous distributions very closely. Therefore, whether a distribution is a finite or an infinite mixture cannot be determined from goodness-of-fit alone. The problems with finite mixture models are that (1) the parameters of the finite mixture model do not have any clear meaning when the distribution is not a finite mixture; and (2) based on goodness-of-fit of finite mixture models, investigators can be led to believe that real and distinct sub-populations exist when they do not exist.

5 Discussion

The mean improvement in the placebo arm of the antidepressant treatment study increases from week 1 to week 6. The variability in the improvement distributions also increases until week 5 and then levels off. The changing means, variability and the emergent bimodality in the improvement densities are a result of an evolving non-specific treatment effect over time. According to the latent regression modelling, the non-specific treatment effect evolves from a strictly skew-right distribution when treatment begins to a “∪”-shaped distribution by the end of the 6 week trial stipulated by the quadratic latent predictor.

This paper does not specifically answer questions regarding the mechanisms of the non-specific treatment effects. However, assuming the infinite mixture latent regression model is a good approximation to the truth, the data analysis presented here indicates that individuals in the population can experience a non-specific effect of continuous magnitude and that this magnitude evolves over time. Initially the non-specific treatment effects skew the outcomes of subjects in the placebo arm towards improvement. After a few weeks, subjects begin segregating towards either little non-specific effect or stronger non-specific response.

Of the placebo-treated subjects, 41% were rated responders, where response is defined as a 50% or greater drop in the HAM-D (17-item) score from the start of the study. This of course is a somewhat arbitrary definition of responder. According to our latent regression model for week 6, almost all subjects show some degree of non-specific response and, since θ̂₁ = 0.51 from (6), about 50% of the subjects are experiencing moderate to very strong non-specific placebo effects.

In actual practice, depressed individuals are treated with an active drug, not a placebo pill. An important problem, and the motivation for this work, is to identify drug treated subjects that respond due to non-specific treatment effects. If responders to the non-specific treatment effects can be identified, then active chemical treatments could be targeted more towards those who do not benefit from non-specific effects. If there exist distinct classes of subjects that do and do not benefit from non-specific treatment effects, then the finite mixture model is appropriate. However, based on the arguments in Section 4, the infinite mixture model seems a better approximation to the truth. If distinct latent classes do not exist, treatment decisions would need to be made on somewhat arbitrary cutoff values along a continuum, perhaps based on (9). Instead of fitting a finite mixture model predicated on the assumption that distinct groups exist, a more appropriate statistical analysis may be to determine an optimal partition of the distribution that acknowledges that the non-specific treatment effects can vary continuously [e.g. see 13].

Unfortunately, from Figure 4 we see that the infinite mixture and the finite mixture densities are essentially identical for the improvement distributions later in the trial. Other choices for latent densities may also provide a good fit to the data but lead to different inferences. This illustrates that statistical models may not always be able to determine the true nature of illnesses and responses to treatment.

Acknowledgments

We would like to acknowledge Jonathan Stewart, MD, for his careful review of this manuscript and for providing expertise in the problem of placebo response. We are also grateful to Donald Klein, MD and Patrick McGrath, MD, for generously spending time with us to discuss the existing theories about placebo effects in the treatment of psychiatric illnesses. We thank Erin Tewksbury and Liping Deng for assistance with the data analysis and programming. The authors would like to thank the Eli Lilly Company for providing the data used in this paper. This work was supported by NIMH grant R01 MH68401.

Contributor Information

Thaddeus Tarpey, Department of Mathematics and Statistics, Wright State University, Dayton, Ohio, thaddeus.tarpey@wright.edu, (937)-775-2861 and fax (937) 775-2081.

Eva Petkova, Department of Child and Adolescent Psychiatry at New York University, and a Senior Researcher at the Nathan Kline Institute, New York, NY 10016-6023, ep120@med.nyu.edu.

References

1. Bruni C, Koch G. Identifiability of continuous mixtures of unknown Gaussian distributions. Annals of Probability. 1985;13:1341–1357.
2. Byrd RH, Lu P, Nocedal J, Zhu C. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing. 1995;16:1190–1208.
3. Day NE. Estimating the components of a mixture of normal distributions. Biometrika. 1969;56:463–474.
4. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977;39:1–38.
5. Everitt BS. Bimodality and the nature of depression. British Journal of Psychiatry. 1981;138:336–339. doi: 10.1192/bjp.138.4.336.
6. Fraley C, Raftery A. Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association. 2002;97:611–631.
7. Kirsch I, Deacon BJ, Huedo-Medina TB, Scoboria A, Moore TJ, Johnson BT. Initial severity and antidepressant benefits: A meta-analysis of data submitted to the Food and Drug Administration. Public Library of Science Medicine. 2008;5:e45. doi: 10.1371/journal.pmed.0050045.
8. Klein DF. Control groups in pharmacotherapy and psychotherapy evaluations. Treatment. 1997:1.
9. Luce RD. Response Times: Their role in inferring elementary mental organization. New York: Oxford University Press; 1986.
10. Muthén B, Brown HC. Estimating drug effects in the presence of placebo response: Causal inference using growth mixture modeling. Statistics in Medicine. 2009;28:3363–3385. doi: 10.1002/sim.3721.
11. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2003. ISBN 3-900051-00-3.
12. Tarpey T, Petkova E. Latent regression analysis. Statistical Modelling. 2010. doi: 10.1177/1471082X0801000202. To appear.
13. Tarpey T, Petkova E, Ogden RT. Profiling placebo responders by self-consistent partitions of functional data. Journal of the American Statistical Association. 2003;98:850–858.
14. Tarpey T, Yun D, Petkova E. Model misspecification: Finite mixture or homogeneous? Statistical Modelling. 2008;8:199–218. doi: 10.1177/1471082X0800800204.
15. Titterington DM, Smith AFM, Makov UE. Statistical Analysis of Finite Mixture Distributions. New York: Wiley; 1985.
16. Wallis C. A powerful identity, a vanishing diagnosis. New York Times. Nov 2, 2009.
