Bayesian EMAX model with a mixture of normal distributions for dose-response in clinical trials

Fengming Tang; Susan Carlson; Jo Wick; Byron J Gajewski

doi:10.1016/j.cct.2021.106571

Author manuscript; available in PMC: 2022 Nov 1.

Published in final edited form as: Contemp Clin Trials. 2021 Sep 20;110:106571. doi: 10.1016/j.cct.2021.106571

PMCID: PMC8596292 NIHMSID: NIHMS1741041 PMID: 34555517

Abstract

When a dose-response relationship is monotonic, the EMAX model has been shown to provide a good empirical fit for designing and analyzing dose-response data across a wide range of pharmaceutical studies. However, the EMAX model has never been applied to a finite mixture distribution. Motivated by a proposal investigating the effect of DHA dose on the preterm birth (PTB, <37 weeks gestation) rate, we developed a Bayesian EMAX mixture model that incorporates a three-component normal finite mixture model into the EMAX framework. The proposed Bayesian EMAX mixture model analyzes gestational age as a continuous variable, which allows for statistically efficient estimates of PTB rates at various cut points with the same parsimonious model. For example, we can estimate the rate of early PTB (ePTB, <34 weeks gestation), PTB (<37 weeks gestation), and late-term birth (>41 weeks gestation) using the same model. Using extensive simulations, we compared our proposed EMAX mixture model with an EMAX logistic model and an independent doses logistic model for a dichotomized endpoint. Across the scenarios under consideration, the EMAX mixture model achieved higher power than the EMAX logistic model and the independent doses logistic model in detecting the effect of DHA supplementation on the PTB rate. The EMAX mixture model also resulted in smaller mean squared errors (MSE) in PTB rate estimates.

Keywords: EMAX model, dose-response model, Bayesian analysis, normal mixture distributions

1. Introduction

Preterm birth (PTB) is defined as birth before 37 weeks gestation. One in 10 U.S. pregnancies ends in PTB, yielding nearly half a million preterm infants born each year. PTB is the primary cause of infant mortality, costs the U.S. health system billions of dollars annually, and, for many of the infants who survive, results in continued individual, family, and societal challenges due to associated morbidity and disabilities. Despite the significant investment of the National Institutes of Health (NIH), the Centers for Disease Control and Prevention (CDC), and foundations such as the March of Dimes toward understanding and preventing PTB, researchers have only recently identified prevention strategies for spontaneous PTB. In a November 2018 Cochrane Review (1), researchers concluded there was strong evidence that consumption of the omega-3 fatty acid docosahexaenoic acid (DHA) could reduce PTB by 11%, and early PTB (ePTB, <34 weeks gestation) by 42%. These results are compelling. However, additional research is necessary to move from an observed effect of DHA to a scalable preventive intervention for PTB. The critical issue is that the DHA dose needed to reduce PTB is unknown. At present, the National Academy of Medicine does not set a Dietary Reference Intake (DRI) for DHA in pregnancy because the amount of DHA required to reduce PTB has not been established. Most prenatal supplements available in the U.S. contain ~0.2g DHA, a much lower dose than provided in most randomized controlled trials (RCTs) included in the Cochrane Review (≥0.6g DHA). A dose-response study is necessary to develop evidence-based policy and advise women about the DHA dose needed to reduce PTB. Our goal is to identify an efficient trial design to evaluate the effect of DHA dose on PTB.

One frequently used dose-response model is the pairwise independent doses model. In this model, no functional relationship is assumed between dose and effect, and all doses are modeled independently and compared with each other. Because it assumes no functional relationship, the independent doses model is often inefficient and has lower power. When the dose-response relationship can be assumed monotonic, an EMAX (MAXimum Effect) model has been shown to provide a good empirical fit for designing and analyzing dose-response data across a wide range of pharmaceutical studies (2). The EMAX model assumes the dose-response relationship follows a nonlinear monotonic function, with one parameter representing the maximum effect that can be achieved as the dose approaches infinity and another parameter representing the dose that achieves 50% of the maximum effect. One option to evaluate the DHA dose effect on PTB is to apply the EMAX model treating PTB as a dichotomous endpoint. However, studies have shown that dichotomizing continuous endpoints results in a loss of information and reduced power (3,4,5). We propose a Bayesian EMAX model that analyzes gestational age as a continuous variable. Schwartz et al. showed that the distribution of gestational age can be described by a mixture of three normal distributions (6). Thus, we developed our EMAX Mixture model for a continuous endpoint with a mixture distribution. We compared our model with two models that dichotomize gestational age: the EMAX model (EMAX logistic model) and the independent doses logistic model. Extensive simulations showed that the EMAX Mixture model achieved much higher power for detecting the DHA dose effect on PTB than the other two models and resulted in much smaller mean squared errors (MSEs) in estimates of PTB rates. Additionally, the EMAX Mixture model is attractive because it allows for statistically efficient estimates of PTB rates using different gestational age cut points within the same parsimonious model. For example, we can estimate the rate of early preterm birth (<34 weeks gestation), preterm birth (<37 weeks gestation), and late-term birth (>41 weeks gestation) using the same model.

The remainder of the article is organized as follows. In Sections 2.1 and 2.2, we describe the study motivation and cover the three statistical models in detail (EMAX Mixture, EMAX logistic, and independent doses logistic). Section 2.3 describes the simulation scenarios used to assess model operating characteristics. Section 2.4 covers model calibration and type I error control, and Section 2.5 provides simulation details. The simulation results are summarized in Section 3. In Section 4, we apply the three models to a simulated dataset to illustrate their application. In Section 5, we draw conclusions from our analysis and discuss the advantages and limitations of the EMAX Mixture model as well as future work.

2. Methods

2.1. Study summary

Our research was motivated by a proposal whose primary aim was to evaluate the dose-response relationship for DHA supplementation on PTB by leveraging the data from six NICHD-supported randomized clinical trials (RCTs) of DHA supplementation in pregnancy conducted between 2006 and 2020 (R21 HD058269, R21 HD059019; R01 HD084586; R01 HD086001, R01 HD047315, R01 HD083292). Combined, the trials enrolled over 2000 U.S. women with a singleton pregnancy in four metropolitan areas (Kansas City, Chicago, Cincinnati, and Columbus). Six DHA doses were used across the trials: 0g (n=350), 0.2g (n=700), 0.45g (n=175), 0.6g (n=180), 0.8g (n=150) and 1g (n=550).

As a secondary aim, DHA supplementation was hypothesized to have a bigger effect on ePTB and/or PTB in participants with a lower phospholipid DHA level at enrollment. By dividing the participants into two groups according to their phospholipid DHA (as a percent of total fatty acids) at enrollment (Low: phospholipid DHA <6%; High: phospholipid DHA ≥ 6%), the proposal wanted to determine if phospholipid DHA status at enrollment influences the effect of DHA supplement on ePTB and/or PTB.

Our goal was to identify an efficient trial design to evaluate the primary and secondary aims of the proposal.

2.2. Statistical models

Let T_di denote the gestational age for participant i in arm d, where d represents the DHA supplement dose and can take values of 0g, 0.2g, 0.45g, 0.6g, 0.8g, and 1g. The number of participants in each arm, denoted by n_d, is 350, 700, 175, 180, 150, and 550 for the 6 doses, respectively. Let y_d be the number of ePTBs in dose d, which can be determined by $y_{d} = \sum_{i = 1}^{n_{d}} I (T_{d i} < 34)$, where $I (x < a) = \begin{cases} 1 & \text{if } x < a \\ 0 & \text{if } x \geq a \end{cases}$. This section describes the independent doses logistic model, the EMAX logistic model, and the EMAX Mixture model.
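As a minimal illustration of the indicator sum above, the following Python sketch counts births below a gestational age cutoff in one dose arm. The function name `count_eptb` and the six gestational ages are hypothetical and serve only to show the calculation.

```python
def count_eptb(gestational_ages, cutoff=34.0):
    """Return y_d: the number of births with gestational age below `cutoff` (weeks)."""
    return sum(1 for t in gestational_ages if t < cutoff)


# Hypothetical gestational ages (weeks) for a single dose arm.
ages = [39.1, 33.5, 40.2, 36.8, 31.0, 38.4]

y_eptb = count_eptb(ages)        # births before 34 weeks
y_ptb = count_eptb(ages, 37.0)   # births before 37 weeks
```

The same continuous data yield counts at any cut point, which is the property the EMAX Mixture model exploits.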

2.2.1. Independent doses logistic model

Let p_d be the probability of an ePTB in dose d. The number of ePTBs in dose d follows a binomial distribution, y_d ~ Bin(n_d,p_d), and it is modeled independently for each dose. A normal distribution N(0,5²) is used as a vague prior for $θ_{d} = log (\frac{p_{d}}{1 - p_{d}})$ . When transformed back to probability scale using an anti-logit function, the prior yields a 95% equal-tailed interval of (0.001, 0.999). Hamiltonian Monte Carlo (Betancourt; Gelman et al., 2014) is used to obtain the posterior distribution of θ_d. The posterior probability p_d can be calculated using $p_{d} = \frac{exp (θ_{d})}{1 + exp (θ_{d})}$ . The posterior probability of dose d being better than the control arm, Pr(p_d < p₀ | data), can be estimated as the proportion of Monte Carlo draws satisfying p_d < p₀. The trial success is achieved when max(Pr(p_d < p₀ | data)) > δ_ind.1. The threshold δ_ind.1 is chosen by simulations to ensure a 5% type I error rate.
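The decision rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: a simple grid approximation of the posterior of θ_d stands in for Hamiltonian Monte Carlo, the event counts (12 of 350 and 7 of 550) are hypothetical, and the names `log_post`, `sample_p`, and `prob_better` are ours. Only one active dose is shown; with several doses, the maximum of the posterior probabilities would be compared with the threshold.

```python
import math
import random


def log_post(theta, y, n, prior_sd=5.0):
    """Unnormalized log-posterior of theta = logit(p): binomial likelihood plus N(0, prior_sd^2) prior."""
    log_lik = y * theta - n * math.log1p(math.exp(theta))
    return log_lik - 0.5 * (theta / prior_sd) ** 2


def sample_p(y, n, n_draws, rng):
    """Draw p = expit(theta) from a grid approximation of the posterior (a stand-in for HMC)."""
    grid = [-12 + 0.01 * i for i in range(2401)]  # theta values in [-12, 12]
    lp = [log_post(t, y, n) for t in grid]
    top = max(lp)
    weights = [math.exp(v - top) for v in lp]
    thetas = rng.choices(grid, weights=weights, k=n_draws)
    return [1 / (1 + math.exp(-t)) for t in thetas]


def prob_better(p_dose, p_control):
    """Proportion of paired draws with p_dose < p_control."""
    return sum(pd < p0 for pd, p0 in zip(p_dose, p_control)) / len(p_dose)


rng = random.Random(1)
p0 = sample_p(y=12, n=350, n_draws=4000, rng=rng)  # hypothetical control arm
p1 = sample_p(y=7, n=550, n_draws=4000, rng=rng)   # hypothetical 1g arm
pr = prob_better(p1, p0)
success = pr > 0.992  # threshold reported for the independent doses model in Section 2.4
```

Because the arms are modeled independently, pairing the draws index by index is a valid way to estimate the posterior probability.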

To determine whether phospholipid DHA at enrollment influences the effect of DHA supplementation, we model the high phospholipid DHA cohort and low phospholipid DHA cohort using the same model described above, but separately. We denote the odds ratio between the arm with the highest dose (d = 1g) and the control arm (d = 0g) using $O = \frac{p_{1}}{p_{0}}$ . The posterior probability of having a bigger DHA effect in the low phospholipid DHA cohort than in the high phospholipid DHA cohort is Pr(O_L < O_H | data). It can be calculated as the proportion of Monte Carlo draws satisfying O_L < O_H. Trial success is achieved when Pr(O_L < O_H | data) > δ_ind.2. The threshold δ_ind.2 is chosen by simulations to ensure a 5% type I error rate.

2.2.2. EMAX logistic model

As in the independent doses logistic model, y_d ~ Bin(n_d, p_d). Instead of modeling θ_d independently for each dose d, the EMAX function is used to model the relationship between θ_d and d: $θ_{d} = a_{1} + \frac{a_{2} d}{d + a_{3}}$.

a₁ is a constant offset. When d = 0, a₁ determines θ₀ solely, which in turn determines the ePTB rate in the control arm.
a₂ is a scalar coefficient reflecting the dose effect. It is the theoretical maximum effect above the constant offset that can be achieved. As dose tends to infinity the theoretical maximum efficacy on the logit scale is a₁ + a₂, thus the model is called the EMAX model.
a₃ is a positive scalar representing the effective dose strength that achieves 50% of the theoretical maximal effect above the constant offset. For an effective dose of d=a₃ the efficacy on logit scale is $a_{1} + \frac{a_{2}}{2}$ .
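The roles of the three parameters can be checked numerically with a short Python sketch. The parameter values below (a₁ = −3.0, a₂ = −1.5, a₃ = 0.5) are hypothetical and chosen only for illustration, as are the function names.

```python
import math


def emax_logit(d, a1, a2, a3):
    """EMAX function on the logit scale: theta_d = a1 + a2 * d / (d + a3)."""
    return a1 + a2 * d / (d + a3)


def eptb_prob(d, a1, a2, a3):
    """Back-transform theta_d to the probability scale with the anti-logit."""
    return 1 / (1 + math.exp(-emax_logit(d, a1, a2, a3)))


# Hypothetical parameter values, for illustration only.
a1, a2, a3 = -3.0, -1.5, 0.5
doses = [0.0, 0.2, 0.45, 0.6, 0.8, 1.0]

theta_control = emax_logit(0.0, a1, a2, a3)  # equals a1 when d = 0
theta_half = emax_logit(a3, a1, a2, a3)      # equals a1 + a2 / 2 when d = a3
probs = [eptb_prob(d, a1, a2, a3) for d in doses]
```

With a negative a₂, the probabilities in `probs` decrease monotonically with dose and approach the anti-logit of a₁ + a₂ as the dose grows.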

A non-informative prior distribution N(0,4) is used for a₁ and a₂. For a₃, a half-normal prior N(0,1) is used so that a₃ can take positive values only. Hamiltonian Monte Carlo (Betancourt; Gelman et al., 2014) is used to obtain the posterior distribution of a₁, a₂, and a₃, and the posterior probability p_d can be calculated using $p_{d} = \frac{exp (θ_{d})}{1 + exp (θ_{d})}$, where $θ_{d} = a_{1} + \frac{a_{2} d}{d + a_{3}}$. Because a₃ is positive, the derivative of θ_d with respect to d, $\frac{a_{2} a_{3}}{(d + a_{3})^{2}}$, has the same sign as a₂, so θ_d decreases as d increases when a₂ is negative. Therefore, the success of a trial is defined as having a posterior probability of a₂ < 0 greater than a cutoff value, Pr(a₂ < 0) > δ_EMAX.1. The threshold δ_EMAX.1 is chosen by simulations to ensure a 5% type I error rate.

To determine whether phospholipid DHA at enrollment influences the effect of DHA supplementation on PTB, we model the high and low phospholipid DHA cohorts using the same model described above separately. Let a_2H and a_2L denote EMAX parameters a₂ in the high and low phospholipid DHA cohorts, respectively. The success of the trial is defined as having a posterior probability of a_2L < a_2H greater than a cutoff, Pr(a_2L < a_2H | data) > δ_EMAX.2, where Pr(a_2L < a_2H | data) can be estimated using the proportion of Monte Carlo draws satisfying a_2L < a_2H, and the threshold δ_EMAX.2 is chosen by simulations to ensure a 5% type I error rate.

2.2.3. EMAX Mixture model

In the finite mixture model developed by Schwartz et al. (6), gestational age T_di follows a finite mixture model with three normal components that describe the mixture of high-, medium-, and low-risk groups: N₁ = N(33.29, 13.23), N₂ = N(38.26, 2.48), and N₃ = N(39.59, 0.96). The probability density function of T_di is f(T_di | Δ_1d, Δ_2d, Δ_3d) = Δ_1d ϕ(T_di | 33.29, 13.23) + Δ_2d ϕ(T_di | 38.26, 2.48) + Δ_3d ϕ(T_di | 39.59, 0.96), where ϕ(T | μ, σ²) is the normal probability density function with mean μ and variance σ², and Δ_1d, Δ_2d, and Δ_3d are the mixture weights for arm d, which take values between 0 and 1 with Δ_1d + Δ_2d + Δ_3d = 1. The three components represent the high-, medium-, and low-risk groups for ePTB and can model different populations by adjusting the mixture weights. Schwartz’s choice of three components was driven by model identifiability and the Bayesian Information Criterion rather than clinical significance. The component means and variances were derived by Schwartz from the North Carolina Detailed Birth Record (NCDBR) registry, with 336,129 records included in the final analysis. The model has been shown to be representative and generalizable and has been used successfully in other studies of PTB (12,13,14).

Based on Schwartz’s finite mixture model, we propose a dose-response model that applies the EMAX function to finite mixture distributions. We call it the EMAX Mixture model. Let $θ_{1 d} = log (\frac{Δ_{1 d}}{Δ_{3 d}})$ represent the log odds of the mixture weights for the high- versus the low-risk groups’ normal components. Similarly, $θ_{2 d} = log (\frac{Δ_{2 d}}{Δ_{3 d}})$ is the log odds for the medium- versus low-risk groups’ normal components. The EMAX function is employed to model the relationship between the log odds comparing high- and low-risk groups, θ_1d, and dose d. This relationship is given by $θ_{1 d} = a_{1} + \frac{a_{2} d}{d + a_{3}}$. Without loss of generality, we assume the log odds comparing medium- to low-risk groups, θ_2d, stays constant for all doses: θ_2d = a₄.

a₁ and a₄ are the constant offsets. They determine the three mixture weights when the effective dose strength is 0: $Δ_{10} = \frac{exp (a_{1})}{1 + exp (a_{1}) + exp (a_{4})}$ , $Δ_{20} = \frac{exp (a_{4})}{1 + exp (a_{1}) + exp (a_{4})}$ and $Δ_{30} = \frac{1}{1 + exp (a_{1}) + exp (a_{4})}$ .
a₂ is the scalar coefficient reflecting the dose effect. When it is negative, as the dose increases the mixture weight of the 1st component (high risk) decreases and the mixture weights of the 2nd (medium risk) and 3rd (low risk) components increase. a₂ determines the theoretical maximum effect (the minimum weight of the 1st component) above the constant offset that can be achieved. When the effective dose strength is not 0: $Δ_{1 d} = \frac{exp (a_{1} + \frac{a_{2} d}{d + a_{3}})}{1 + exp (a_{1} + \frac{a_{2} d}{d + a_{3}}) + exp (a_{4})}$ , $Δ_{2 d} = \frac{exp (a_{4})}{1 + exp (a_{1} + \frac{a_{2} d}{d + a_{3}}) + exp (a_{4})}$ , and $Δ_{3 d} = \frac{1}{1 + exp (a_{1} + \frac{a_{2} d}{d + a_{3}}) + exp (a_{4})}$ .
a₃ is a positive scalar representing the effective dose strength that achieves 50% of the theoretical maximal effect.
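The mapping from (a₁, a₂, a₃, a₄) to the mixture weights is a softmax with the low-risk component as the reference. A minimal Python sketch is given below; the function name `mixture_weights` is ours, and the parameter values are those reported for the "expected" scenario in Table 3.

```python
import math


def mixture_weights(d, a1, a2, a3, a4):
    """Return (Delta_1d, Delta_2d, Delta_3d) for dose d under the EMAX Mixture model."""
    e1 = math.exp(a1 + a2 * d / (d + a3))  # exp(theta_1d), high- vs low-risk
    e2 = math.exp(a4)                      # exp(theta_2d), medium- vs low-risk
    denom = 1 + e1 + e2
    return e1 / denom, e2 / denom, 1 / denom


# Parameters of the "expected" scenario (Table 3).
params = dict(a1=-2.64, a2=-3.6, a3=2.5, a4=-1.72)
doses = [0.0, 0.2, 0.45, 0.6, 0.8, 1.0]
weights_by_dose = {d: mixture_weights(d, **params) for d in doses}
```

With a negative a₂, the high-risk weight shrinks as the dose increases, while the other two weights grow because they share the same shrinking denominator.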

We use a vague prior N(0,4) for a₁, a₂, and a₄, and a half-normal N(0,1) for a₃ to restrict it to be positive. Hamiltonian Monte Carlo (Betancourt; Gelman et al., 2014) is used to obtain the posterior distribution of a₁, a₂, a₃, and a₄. The posterior distribution of the mixture weights Δ_1d, Δ_2d, Δ_3d can be calculated using the formulas given above. The posterior probability of ePTB (<34 weeks gestational age) for dose d can be calculated using $p_{d} = \int_{0}^{34} f (t |Δ_{1 d}, Δ_{2 d}, Δ_{3 d}) d t$ . By changing the upper integration bound, we can calculate the posterior probability of PTB at different cutoffs. For example, the posterior probability of PTB (<37 weeks gestational age) is $\int_{0}^{37} f (t |Δ_{1 d}, Δ_{2 d}, Δ_{3 d}) d t$ . The ability to produce statistically efficient PTB rate estimates at various cut points with the same parsimonious model makes the EMAX Mixture model attractive.
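Because each component is normal, the integral above reduces to a weighted sum of normal CDF differences. The Python sketch below evaluates it for a single parameter vector; in practice the same calculation would be applied to every posterior draw. The names `prob_before` and `mixture_weights` are ours, and the parameter values are those of the "expected" scenario in Table 3.

```python
import math
from statistics import NormalDist

# Schwartz et al. components: N(mean, variance), stored here with standard deviations.
COMPONENTS = [
    NormalDist(33.29, math.sqrt(13.23)),  # high risk
    NormalDist(38.26, math.sqrt(2.48)),   # medium risk
    NormalDist(39.59, math.sqrt(0.96)),   # low risk
]


def mixture_weights(d, a1, a2, a3, a4):
    """Mixture weights (Delta_1d, Delta_2d, Delta_3d) for dose d."""
    e1 = math.exp(a1 + a2 * d / (d + a3))
    e2 = math.exp(a4)
    denom = 1 + e1 + e2
    return e1 / denom, e2 / denom, 1 / denom


def prob_before(cut, d, a1, a2, a3, a4):
    """Integral of the mixture density from 0 to `cut` weeks for dose d."""
    weights = mixture_weights(d, a1, a2, a3, a4)
    return sum(w * (c.cdf(cut) - c.cdf(0.0)) for w, c in zip(weights, COMPONENTS))


params = dict(a1=-2.64, a2=-3.6, a3=2.5, a4=-1.72)  # "expected" scenario, Table 3
eptb_control = prob_before(34, 0.0, **params)  # early preterm birth, 0 g/day
eptb_high = prob_before(34, 1.0, **params)     # early preterm birth, 1 g/day
ptb_control = prob_before(37, 0.0, **params)   # preterm birth, 0 g/day
```

Changing only the `cut` argument gives the rate at another gestational age cutoff without refitting the model.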

The success of a trial is defined as having a posterior probability of a₂ < 0 greater than a threshold, Pr(a₂ < 0) >δ_{EMAX_Mix.1}. The posterior probability of a₂ < 0 can be estimated using the proportion of Monte Carlo draws with a₂ < 0. δ_{EMAX_Mix.1} is chosen using simulations to ensure a 5% type I error rate.

The high and low baseline phospholipid DHA cohorts are modeled separately using the same model described above to investigate whether phospholipid DHA at enrollment influences the effect of DHA supplementation. Let a_2H and a_2L denote the EMAX parameters a₂ in the high and low phospholipid DHA cohorts, respectively. The success of the trial is defined as Pr(a_2L < a_2H | data) > δ_{EMAX_Mix.2}. The posterior probability of a_2L < a_2H can be estimated using the proportion of Monte Carlo draws with a_2L < a_2H. δ_{EMAX_Mix.2} is chosen using simulations to ensure a 5% type I error rate.

2.3. Simulation scenarios

Two sets of simulations were performed to compare the operating characteristics of the three models in consideration. The first set of simulations evaluates the dose-response relationship for the effect of DHA supplement. The second set evaluates whether phospholipid DHA at enrollment impacts the effect of DHA supplement.

Four virtual scenarios (Table 1) with realistic ePTB rates derived from an existing clinical trial were used to evaluate the dose-response relationship between DHA and ePTB. The “expected” scenario represents the response we consider most likely based on the results of the Kansas University DHA Outcome Study (KUDOS) (7). The “optimistic” and “pessimistic” scenarios reflect the 97.5th and 2.5th percentiles of the expected response. Lastly, the improbable scenario that serves as our null hypothesis is labeled “no effect” in Table 1. In this scenario, the assumed rates of ePTB are the same across doses. Therefore, the proportion of simulated trials declared “successful” under this scenario reflects the type I error rate.

Table 1.

Virtual scenarios (rate of ePTB) for evaluating dose-response relationship for the effect of DHA.

			Dose (g/day)
Scenario	0 (n=350)	0.2 (n=700)	0.45 (n=175)	0.6 (n=180)	0.8 (n=150)	1 (n=550)
1 (optimistic)	6.27%	4.90%	3.91%	3.52%	3.13%	2.85%
2 (expected)	3.34%	2.60%	2.00%	1.74%	1.47%	1.27%
3 (pessimistic)	1.56%	1.17%	0.87%	0.75%	0.63%	0.54%
4 (no effect)	3.34%	3.34%	3.34%	3.34%	3.34%	3.34%

Simulation scenarios investigating whether phospholipid DHA at enrollment impacts the effect of DHA supplementation are given in Table 2. In the “optimistic” scenario, the high phospholipid DHA group has a very low but constant ePTB rate of 1.56% across doses. The low phospholipid DHA group has decreasing ePTB rates that range from 11.01% when d = 0g to 4.16% when d = 1g. The average ePTB rates of the high and low groups are equal to those of the “optimistic” scenario in Table 1 (6.27%, 4.90%, 3.91%, 3.52%, 3.13%, and 2.85% for doses of 0g, 0.2g, 0.45g, 0.6g, 0.8g, and 1g, respectively). The “no effect” scenario represents the null hypothesis, in which both the high and low groups have ePTB rates equal to those of the “optimistic” scenario in Table 1.

Table 2.

Scenarios (rate of ePTB) for investigating whether DHA status at enrollment impacts the effect of DHA supplementation.

				Dose (g/day)
Scenario	DHA	0 (n=350)	0.2 (n=700)	0.45 (n=175)	0.6 (n=180)	0.8 (n=150)	1 (n=550)
1 (optimistic)	High	1.56%	1.56%	1.56%	1.56%	1.56%	1.56%
	Low	11.01%	8.19%	6.20%	5.43%	4.69%	4.16%
2 (no effect)	High	6.27%	4.90%	3.91%	3.52%	3.13%	2.85%
	Low	6.27%	4.90%	3.91%	3.52%	3.13%	2.85%

2.4. Model calibration

According to the U.S. Department of Health and Human Services Food and Drug Administration, Center for Drug Evaluation and Research (CDER), and Center for Biologics Evaluation and Research (CBER), the type I error rate can be estimated by the proportion of simulations that meet the success threshold in null scenarios (8), and power can be estimated by the proportion of simulations that meet the success threshold in alternative scenarios. To make designs comparable, success thresholds are chosen by simulation to achieve similar type I error rates across designs. This process is called model calibration. For example, Figure 1 plots the proportion of successes (type I error rate) against threshold values (δ_{EMAX_Mix.1}) based on simulations using the EMAX Mixture model under the null scenario. As the threshold increases, the proportion of simulations meeting the success criterion decreases. When the threshold is 0.845, the type I error rate is roughly 5%. The more simulations we run for each scenario, the more precisely the type I error rate is estimated. Using the same method, we identified δ_{EMAX_Mix.1} = 0.845 and δ_{EMAX_Mix.2} = 0.74 for the EMAX Mixture model, δ_EMAX.1 = 0.955 and δ_EMAX.2 = 0.92 for the EMAX logistic model, and δ_ind.1 = 0.992 and δ_ind.2 = 0.97 for the independent doses logistic model. With these thresholds, the success rates in the null scenarios were held at or below 5% in all models.
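The calibration step can be sketched in Python as a search over a grid of candidate thresholds. This is an illustration only: the function names are ours, the grid resolution of 0.005 is an assumption, and the evenly spaced values in `null_probs` are a hypothetical stand-in for the posterior probabilities that would be obtained by fitting the model to each dataset simulated under the null scenario.

```python
def type1_error(null_probs, threshold):
    """Proportion of null-scenario simulations declared successful at `threshold`."""
    return sum(p > threshold for p in null_probs) / len(null_probs)


def calibrate(null_probs, alpha=0.05, grid_size=200):
    """Smallest grid threshold whose estimated type I error rate is at most `alpha`."""
    for i in range(grid_size + 1):
        threshold = i / grid_size
        if type1_error(null_probs, threshold) <= alpha:
            return threshold
    return 1.0


# Hypothetical stand-in for Pr(a2 < 0 | data) from 100 null simulations.
null_probs = [i / 100 for i in range(100)]
delta = calibrate(null_probs)
```

Choosing the smallest threshold that respects the 5% limit keeps power as high as possible for the chosen type I error rate.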

Figure 1. Type I error rate (proportion of successes) by threshold (δ_{EMAX_Mix.1}) based on simulations for the EMAX Mixture model in the null scenario

2.5. Simulations

For the independent doses logistic model and the EMAX logistic model, we simulated the number of participants with ePTB (y_d) using binomial distributions with n_d ∈ {350, 700, 175, 180, 150, 550} and p_d given in Table 1 and Table 2.

For the EMAX mixture model, we first used a trial and error method to find values of (a₁, a₂, a₃, a₄) that would result in the early preterm birth rates specified in Table 1 and Table 2. These values are given in Table 3 and Table 4, respectively. We then calculated (Δ_1d, Δ_2d, Δ_3d) for each dose using the formulas: $Δ_{1 d} = \frac{exp (a_{1} + \frac{a_{2} d}{d + a_{3}})}{1 + exp (a_{1} + \frac{a_{2} d}{d + a_{3}}) + exp (a_{4})}$ , $Δ_{2 d} = \frac{exp (a_{4})}{1 + exp (a_{1} + \frac{a_{2} d}{d + a_{3}}) + exp (a_{4})}$ , and $Δ_{3 d} = \frac{1}{1 + exp (a_{1} + \frac{a_{2} d}{d + a_{3}}) + exp (a_{4})}$ . And finally, we used the normal mixture distributions f (T_di | Δ_1d, Δ_2d, Δ_3d) to generate gestational ages T_di.
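The data-generating steps above can be sketched in Python as follows. The original simulations were run in R; this is only an illustrative equivalent. The function names and the random seed are ours, and the parameter values are those of the "expected" scenario in Table 3.

```python
import math
import random

MEANS = (33.29, 38.26, 39.59)
SDS = (math.sqrt(13.23), math.sqrt(2.48), math.sqrt(0.96))  # variances from Schwartz et al.


def mixture_weights(d, a1, a2, a3, a4):
    """Mixture weights (Delta_1d, Delta_2d, Delta_3d) for dose d."""
    e1 = math.exp(a1 + a2 * d / (d + a3))
    e2 = math.exp(a4)
    denom = 1 + e1 + e2
    return e1 / denom, e2 / denom, 1 / denom


def simulate_ages(n, weights, rng):
    """Draw n gestational ages: pick a component by weight, then draw from its normal."""
    components = rng.choices(range(3), weights=weights, k=n)
    return [rng.gauss(MEANS[k], SDS[k]) for k in components]


rng = random.Random(2021)  # arbitrary seed for reproducibility
params = dict(a1=-2.64, a2=-3.6, a3=2.5, a4=-1.72)  # "expected" scenario, Table 3
weights_control = mixture_weights(0.0, **params)
ages_control = simulate_ages(350, weights_control, rng)  # control arm, n = 350
eptb_rate = sum(t < 34 for t in ages_control) / len(ages_control)
```

In a single arm of 350 participants the observed ePTB rate varies noticeably around its target, which is why operating characteristics are assessed over many simulated trials.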

Table 3.

Parameters used to simulate gestational ages for scenarios in Table 1.

Scenario	a₁	a₂	a₃	a₄
1 (optimistic)	−2.00	−1.85	1.15	−2.16
2 (expected)	−2.64	−3.6	2.5	−1.72
3 (pessimistic)	−3.53	−3.3	2	−2.52
4 (no effect)	−2.64	0	NA	−1.72

Table 4.

Parameters used to simulate gestational ages for scenarios in Table 2.

Scenario	DHA	a₁	a₂	a₃	a₄
1 (optimistic)	High	−3.53	0	NA	−2.52
	Low	−1.30	−2.5	1.15	−1.85
2 (no effect)	High	−2.00	−1.85	1.15	−2.16
	Low	−2.00	−1.85	1.15	−2.16

For each model and each scenario in Table 1 and Table 2, we ran 10,000 simulations. The maximum 95% margin of error for the estimated type I error rate and the model power is $1.96 \sqrt{0.5 \times 0.5 / 10{,}000} < 0.01$ . For a true type I error rate of 5%, the margin of error for the estimated type I error rate is $1.96 \sqrt{0.05 \times 0.95 / 10{,}000} \approx 0.004$ . For a true power of 90%, the margin of error is $1.96 \sqrt{0.1 \times 0.9 / 10{,}000} \approx 0.006$ .
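These margins of error follow from the normal approximation to a binomial proportion and can be reproduced with a few lines of Python. The function name `margin_of_error` is ours.

```python
import math


def margin_of_error(p, n_sims=10_000, z=1.96):
    """95% margin of error for a proportion p estimated from n_sims simulations."""
    return z * math.sqrt(p * (1 - p) / n_sims)


moe_max = margin_of_error(0.5)      # worst case, p = 0.5
moe_type1 = margin_of_error(0.05)   # true type I error rate of 5%
moe_power = margin_of_error(0.90)   # true power of 90%
```

The worst case occurs at p = 0.5 because p(1 − p) is largest there, so every estimated rate in the simulation study has a margin of error below one percentage point.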

The simulations were implemented in R (9,10) and Stan (11). R was used to generate gestational age data, and Stan was used to perform analyses.

3. Simulation Results

In this section, we report the simulation results comparing the three models under different scenarios described in Section 2.3. We assessed two critical aspects of model performance: statistical power in detecting the effect of DHA supplementation on the PTB rate and mean squared errors (MSE) and bias in PTB rate estimates.

3.1. Power

Table 5 shows the simulation results for power (the proportion of successful simulations) for the optimistic, expected, and pessimistic scenarios in Table 1, where the goal was to evaluate the dose-response relationship. The EMAX Mixture model had the highest power and the independent doses logistic model had the lowest power across all scenarios. In the order of EMAX Mixture, EMAX logistic, and independent doses logistic, power was 99.98%, 84.89%, and 59.86% in the optimistic scenario; 99.79%, 73.35%, and 48.43% in the expected scenario; and 96.76%, 48.92%, and 24.75% in the pessimistic scenario.

Table 5.

Power for the effective scenarios in Table 1 where the goal was to evaluate the dose-response relationship for effect of DHA supplement on ePTB


Scenario	EMAX Mixture	EMAX logistic	Independent logistic
1 (optimistic)	99.98%	84.89%	59.86%
2 (expected)	99.79%	73.35%	48.43%
3 (pessimistic)	96.76%	48.92%	24.75%

Table 6 shows the simulation results for power for the optimistic scenario in Table 2, where the aim was to investigate whether DHA level at enrollment impacted the effect of DHA supplementation. The EMAX Mixture model had the highest power (95.4%), followed by the EMAX logistic model (35.1%). The independent doses logistic model had the lowest power (27.6%).

Table 6.

Power for the effective scenarios in Table 2 where the aim was to investigate whether DHA level at enrollment had an impact on the effect of DHA supplement on PTBs


Scenario	EMAX Mixture	EMAX logistic	Independent logistic
1 (optimistic)	95.4%	35.1%	27.6%

Compared with the independent doses logistic model, the EMAX Mixture and EMAX logistic models are more efficient because they take advantage of the monotonic dose-response relationship by using the EMAX function. Compared with the EMAX logistic model, the EMAX Mixture model is more efficient because it treats gestational age as a continuous variable, while the EMAX logistic model uses a dichotomized gestational age variable. Studies have shown that dichotomizing continuous endpoints results in a loss of information and reduced power (3,4,5).

3.2. MSE and bias

As described in Section 2.2, the posterior distribution of the probability of ePTB, p_d, can be obtained using Monte Carlo simulations. Let ${\hat{p}}_{d} | data$ denote the posterior mean of p_d, and let ${\hat{p}}_{d,k}$ denote its value in simulation k. The expected posterior probability of ePTB can be obtained as the average of the posterior means across simulations, $E ({\hat{p}}_{d} | data) = \frac{1}{S} \sum_{k = 1}^{S} {\hat{p}}_{d,k}$, where S is the number of simulations. The sample variance, ${\hat{V}}_{d}$ , can be calculated as ${\hat{V}}_{d} = \frac{\sum_{k = 1}^{S} {({\hat{p}}_{d,k} - E ({\hat{p}}_{d}))}^{2}}{S - 1}$ . The bias is the difference between the expected posterior probability $E ({\hat{p}}_{d})$ and the true probability $p_{d}^{T}$ , $bias = E ({\hat{p}}_{d}) - p_{d}^{T}$ . The mean squared error is $MSE = {bias}^{2} + {\hat{V}}_{d}$ .
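The three summary quantities can be computed from the per-simulation posterior means with a short Python sketch. The function name `bias_var_mse` and the three posterior means are hypothetical; the true rate of 3.34% is the control-arm ePTB rate of the expected scenario in Table 1.

```python
from statistics import mean, variance


def bias_var_mse(post_means, p_true):
    """Bias, sample variance (divisor S - 1), and MSE of posterior means across simulations."""
    expected = mean(post_means)
    bias = expected - p_true
    var = variance(post_means)
    return bias, var, bias ** 2 + var


# Hypothetical posterior means of p_d from three simulated trials.
post_means = [0.030, 0.034, 0.038]
bias, var, mse = bias_var_mse(post_means, p_true=0.0334)
```

Because the MSE adds the variance to the squared bias, a model with slightly more bias can still have a much lower MSE when its estimates vary less across simulations.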

Table 7 shows the simulation results for MSE× 10⁵ of $E ({\hat{p}}_{d})$ . Across all scenarios and doses, the EMAX Mixture model had the lowest MSE, and the independent doses logistic model had the highest MSE. When averaged across different doses, MSE× 10⁵ for the three models (EMAX Mixture, EMAX logistic, and independent doses logistic) were 1.12, 3.97, and 16.27, respectively, in the Optimistic scenario; 0.79, 2.09, and 8.18, respectively, in the Expected scenario; 0.32, 0.89, 3.61, respectively, in the Pessimistic scenario; 0.81, 2.77, and 14.11, respectively, in the no effect scenario.

Table 7.

MSE × 10⁵ of the expected estimated posterior ePTB rate $E ({\hat{p}}_{d} | data)$

		Dose (g/day)
Scenario	Model	0	0.2	0.45	0.6	0.8	1	Average
Optimistic	EMAX Mixture	2.31	0.92	0.58	0.68	0.96	1.29	1.12
	EMAX logistic	11.04	3.00	2.04	2.17	2.56	3.01	3.97
	Independent logistic	18.23	6.82	25.66	19.99	21.81	5.13	16.27
Expected	EMAX Mixture	2.02	0.65	0.37	0.35	0.51	0.81	0.79
	EMAX logistic	5.99	1.61	1.02	1.02	1.27	1.65	2.09
	Independent logistic	10.12	3.68	13.23	9.56	10.17	2.30	8.18
Pessimistic	EMAX Mixture	0.93	0.25	0.15	0.14	0.18	0.25	0.32
	EMAX logistic	2.66	0.64	0.43	0.44	0.52	0.64	0.89
	Independent logistic	4.67	1.67	5.70	4.25	4.37	0.97	3.61
No effect	EMAX Mixture	1.56	0.57	0.51	0.59	0.74	0.90	0.81
	EMAX logistic	5.26	1.93	1.65	1.95	2.55	3.26	2.77
	Independent logistic	9.93	4.78	21.93	18.65	23.52	5.82	14.11

Table 8 shows the simulation result for bias of $E ({\hat{p}}_{d})$ . In most cases, independent doses logistic model had the lowest bias. EMAX Mixture and the EMAX logistic had a comparable amount of bias. Nevertheless, the differences were very small in comparison with sample variance.

Table 8.

Bias of expected estimated posterior ePTB rate $E ({\hat{p}}_{d} | data)$

		Dose (g/day)
Scenario	Model	0	0.2	0.45	0.6	0.8	1	Average
Optimistic	EMAX Mixture	0.0004	−0.0013	0.0000	0.0008	0.0017	0.0024	0.0007
	EMAX logistic	0.0011	−0.0013	−0.0002	0.0005	0.0013	0.0019	0.0005
	Independent logistic	−0.0001	0.0000	0.0001	−0.0001	−0.0001	0.0000	0.0000
Expected	EMAX Mixture	0.0019	−0.0013	−0.0005	0.0003	0.0013	0.0021	0.0006
	EMAX logistic	0.0016	−0.0014	−0.0005	0.0004	0.0014	0.0022	0.0006
	Independent logistic	−0.0001	0.0000	0.0001	−0.0001	−0.0001	−0.0001	0.0000
Pessimistic	EMAX Mixture	0.0014	−0.0007	−0.0002	0.0002	0.0007	0.0011	0.0004
	EMAX logistic	0.0012	−0.0006	0.0001	0.0005	0.0010	0.0015	0.0006
	Independent logistic	0.0000	0.0000	−0.0001	0.0000	0.0000	0.0000	0.0000
No effect	EMAX Mixture	0.0016	0.0004	0.0000	−0.0002	−0.0002	−0.0003	0.0002
	EMAX logistic	0.0020	0.0005	−0.0001	−0.0002	−0.0002	−0.0001	0.0003
	Independent logistic	−0.0001	0.0001	0.0000	0.0001	−0.0002	0.0000	0.0000

4. Application to a simulated data set

To illustrate the three models’ application, we simulated a dataset using the expected scenario in Table 1. We then applied the three models to the simulated dataset and reported the analysis results.

4.1. Generating the simulated dataset

Following Section 2.5, a cohort with the true ePTB rates of the expected scenario in Table 1 was simulated using a₁ = −2.64, a₂ = −3.6, a₃ = 2.5, and a₄ = −1.72. The mixture weights for each dose were calculated using the formulas in Section 2.5 and are given in Table 9. Gestational ages were then simulated using the normal mixture distributions. The descriptive statistics of the simulated data are given in Table 10. For dose 0 g/day, 0.2 g/day, 0.45 g/day, 0.6 g/day, 0.8 g/day, and 1g/day, we simulated 325, 690, 150, 175, 140, and 550 gestational ages, respectively. The mean gestational ages were 39.00, 39.05, 39.16, 39.08, 39.12, 39.15 weeks, respectively. The proportions of ePTB (<34 weeks) were 3.69%, 3.19%, 2.00%, 1.71%, 1.43%, and 1.45%, respectively. The proportions of PTB (<37 weeks) were 8.00%, 7.83%, 5.33%, 6.86%, 7.14%, and 6.00%, respectively.

Table 9.

Mixture weights used to simulate dataset.

	Dose (g/day)
Mixture weights	0	0.2	0.45	0.6	0.8	1
Δ_1d	0.0570	0.0442	0.0337	0.0292	0.0246	0.0212
Δ_2d	0.1430	0.1449	0.1465	0.1472	0.1479	0.1484
Δ_3d	0.8000	0.8108	0.8197	0.8236	0.8275	0.8304

Table 10.

Descriptive statistics of the simulated dataset

Dose (g/day)	n	mean	sd	min	Q1	median	Q3	max	ePTB (%)	PTB (%)
0	325	39.00	1.83	30.31	38.60	39.33	40.01	42.02	3.69%	8.00%
0.2	690	39.05	2.00	24.96	38.61	39.40	40.11	42.50	3.19%	7.83%
0.45	150	39.16	2.04	22.52	38.60	39.33	40.19	43.40	2.00%	5.33%
0.6	175	39.08	2.06	23.83	38.62	39.36	40.11	42.54	1.71%	6.86%
0.8	140	39.12	1.57	31.74	38.39	39.27	40.22	42.74	1.43%	7.14%
1	550	39.25	1.66	23.92	38.77	39.44	40.13	42.21	1.45%	6.00%
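
The simulation step itself (pick a mixture component with the dose-specific weights, then draw from that component's normal distribution) can be sketched as follows. This is a minimal illustration rather than the code used for the paper: the component means and standard deviations in `MU` and `SIGMA` are hypothetical placeholders, since the values taken from Schwartz et al. are not reproduced in this section, and the seed and function name are likewise assumptions.

```python
import random

# Hypothetical component means and SDs in weeks (placeholders, not the
# Schwartz et al. values used in the paper).
MU = (31.0, 36.5, 39.5)
SIGMA = (3.5, 1.8, 1.1)

def simulate_ages(n, weights, rng):
    """Draw n gestational ages from a three-component normal mixture."""
    ages = []
    for _ in range(n):
        k = rng.choices(range(3), weights=weights)[0]  # choose a component
        ages.append(rng.gauss(MU[k], SIGMA[k]))        # draw from that component
    return ages

rng = random.Random(2021)  # fixed seed so the sketch is repeatable
ages = simulate_ages(325, (0.0570, 0.1430, 0.8000), rng)  # dose 0 weights, Table 9

# Empirical proportions below the two cut points used in Table 10.
eptb_rate = sum(a < 34 for a in ages) / len(ages)
ptb_rate = sum(a < 37 for a in ages) / len(ages)
print(f"n={len(ages)}, ePTB={eptb_rate:.2%}, PTB={ptb_rate:.2%}")
```

With placeholder component parameters the printed proportions will not match Table 10; the sketch only shows the mechanics of the draw.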

4.2. Analysis of the simulated data

The simulated gestational ages were analyzed as a continuous variable using the EMAX Mixture model. The numbers of ePTBs were analyzed as a binomial variable using the EMAX logistic model and the independent doses logistic model. The Stan code for the three models can be found in the Appendix.

The posterior distributions of the model parameters for the EMAX Mixture model, the independent doses logistic model, and the EMAX logistic model are summarized in Table 11, Table 12, and Table 13, respectively.

The posterior probabilities of ePTB and their credible intervals are reported in Figure 2. The credible intervals of all three models covered the true values (box in the plot). The independent doses logistic model had the widest credible intervals. The EMAX Mixture model and the EMAX logistic model had credible intervals of similar width.

Figure 2.

Analysis result for the simulated dataset: the posterior probability of ePTB

Both the EMAX Mixture model and the EMAX logistic model had a Pr(a₂ < 0) greater than the corresponding cutoff, so the trial can be declared a success under either model: EMAX Mixture, Pr(a₂ < 0) = 0.994 > 0.845; EMAX logistic, Pr(a₂ < 0) = 0.989 > 0.955. The independent doses logistic model had max(Pr(p_d < p₀ | data)) = 0.984, which is less than its cutoff of 0.992, so the trial was not a success when this model was used.
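
The success rule amounts to comparing a posterior probability with a model-specific cutoff. A minimal Python sketch, using only the probabilities and cutoffs reported in this section, is given here; the helper names `prob_negative` and `is_success` are illustrative and do not come from the paper.

```python
def prob_negative(draws):
    """Posterior probability that a parameter is below zero, from MCMC draws."""
    return sum(x < 0 for x in draws) / len(draws)

def is_success(post_prob, cutoff):
    """Declare success when the posterior probability exceeds the cutoff."""
    return post_prob > cutoff

# (posterior probability, calibrated cutoff) as reported in the text.
reported = {
    "EMAX Mixture": (0.994, 0.845),
    "EMAX logistic": (0.989, 0.955),
    "Independent doses logistic": (0.984, 0.992),
}
decisions = {name: is_success(p, c) for name, (p, c) in reported.items()}
print(decisions)
```

In practice `prob_negative` would be applied to the posterior draws of a₂ returned by the sampler; the independent doses model instead uses the maximum over doses of Pr(p_d < p₀ | data).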

As mentioned in Section 2.2, one advantage of the EMAX Mixture model is that it can estimate the posterior probability of a birth falling below (or above) any gestational-age cut point. For example, the posterior probabilities of PTB (<37 weeks) were 8.66%, 7.34%, 6.59%, 6.31%, 6.05%, and 5.87% for doses of 0, 0.2, 0.45, 0.6, 0.8, and 1 g/day, respectively. To estimate the probabilities of PTB (<37 weeks) using the EMAX logistic model or the independent doses logistic model, we would have to conduct another set of analyses using the numbers of gestational ages < 37 weeks.
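
Under a finite normal mixture, the probability of a gestational age below a cut point c is the weighted sum of the component normal CDFs evaluated at c. The Python sketch below illustrates this for several cut points with one set of weights. The component means and standard deviations are hypothetical placeholders (the values from Schwartz et al. are not reproduced in this section), and the function names are assumptions of this example.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_below(cut, weights, mu, sigma):
    """P(gestational age < cut) under a finite normal mixture."""
    return sum(w * normal_cdf(cut, m, s) for w, m, s in zip(weights, mu, sigma))

# Hypothetical component parameters in weeks (placeholders, not the paper's values).
mu = (31.0, 36.5, 39.5)
sigma = (3.5, 1.8, 1.1)
weights = (0.0570, 0.1430, 0.8000)  # dose 0 weights from Table 9

p34 = prob_below(34, weights, mu, sigma)           # early preterm birth
p37 = prob_below(37, weights, mu, sigma)           # preterm birth
p_late = 1.0 - prob_below(41, weights, mu, sigma)  # late-term birth
print(f"ePTB={p34:.4f}, PTB={p37:.4f}, late-term={p_late:.4f}")
```

In a Bayesian analysis the same calculation would be repeated for each posterior draw of the weights, which yields a posterior distribution for each rate without refitting the model.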

5. Conclusion and discussion

The EMAX model has never been applied to finite mixture distributions. The Bayesian EMAX Mixture model we proposed applies the EMAX model to the three-component normal finite mixture distribution developed for gestational age by Schwartz et al. We compared the EMAX Mixture model with the EMAX logistic model and the independent doses logistic model using extensive simulations. Across different scenarios, the EMAX Mixture model achieved significantly higher power in detecting the DHA effect on ePTB and resulted in much smaller MSE in the posterior expected estimate of the ePTB rate. Its bias was comparable to that of the EMAX logistic model but slightly larger than that of the independent doses logistic model.

Another attractive feature of the EMAX Mixture model is that it allows for statistically efficient estimates of PTB rates using various cut points with the same parsimonious model. For example, we can estimate the rate of early preterm birth (<34 weeks gestation), preterm birth (<37 weeks gestation), and late-term birth (>41 weeks gestation) using the same model. In future work, when we conduct analyses on the data collected in the 6 RCTs, it will be valuable to report these estimates.

Though the EMAX Mixture model was motivated by the three-component normal finite mixture model used for gestational age, it can have a much wider range of applications. It can be modified to accommodate almost all kinds of mixture distributions. For example, if there are two, instead of three, normal components in the mixture distribution, the EMAX Mixture model can be easily adapted by removing θ_2d from the model, and the mixture weights can be written as $\Delta_{1d} = \frac{\exp\left(a_{1} + \frac{a_{2} d}{d + a_{3}}\right)}{1 + \exp\left(a_{1} + \frac{a_{2} d}{d + a_{3}}\right)}$ and $\Delta_{2d} = \frac{1}{1 + \exp\left(a_{1} + \frac{a_{2} d}{d + a_{3}}\right)}$. Additionally, the EMAX Mixture model can be applied to non-normal finite mixture distributions by modifying the density function f (T_di | Δ_1d, Δ_2d, Δ_3d) accordingly.
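
For the two-component case, the weight of the first component is simply the inverse logit of the EMAX term. A minimal Python sketch follows; the function name is illustrative, and the coefficients in the usage lines are borrowed from Section 4.1 purely as example inputs, not as values fitted to a two-component model.

```python
import math

def two_component_weights(dose, a1, a2, a3):
    """Return (delta1, delta2) for a two-component EMAX mixture."""
    eta = a1 + a2 * dose / (dose + a3)               # EMAX term on the logit scale
    delta1 = math.exp(eta) / (1.0 + math.exp(eta))   # inverse logit
    return delta1, 1.0 - delta1                      # weights sum to one

# Example inputs only: a negative a2 makes delta1 decrease as dose increases.
w_low = two_component_weights(0.0, -2.64, -3.6, 2.5)
w_high = two_component_weights(1.0, -2.64, -3.6, 2.5)
print(w_low, w_high)
```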

One limitation of our study is that we assumed that the means and variances of the three normal distributions for gestational age, determined by Schwartz et al. from the NCDBR, fit the new data well. Although these parameters have been used successfully in the past (12,13,14), it is possible, though unlikely, that the data from the 6 RCTs under consideration are very different from the NCDBR registry. In that case, one possible solution is to allow the model to estimate the means and variances of the three normal distributions. The model would then be more complicated and may have convergence issues. This will be explored in our future work.

6. Acknowledgement

This study was partially supported by NIH Clinical and Translational Science Award UL1TR002366.

8. Appendix

Stan code for EMAX Mixture model

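data {
  int<lower=1> K;            // number of mixture components (3 in this model)
  int<lower=1> N;            // number of data points
  real y[N];                 // observed gestational ages (weeks)
  real<lower=0> dose[N];     // dose (g/day) for each observation
  ordered[K] mu;             // fixed component means
  vector<lower=0>[K] sigma;  // fixed component standard deviations
}
parameters {
  real a11;                  // a1 in the text: intercept for component 1
  real a12;                  // a2 in the text: maximum dose effect
  real<lower=0> a13;         // a3 in the text: dose giving half the maximum effect
  real a21;                  // a4 in the text: intercept for component 2
}

model {
  vector[K] theta;           // mixture weights for the current observation
  real beta1;
  real beta2;
  vector[K] lps;             // log weight plus log density, per component
  a11 ~ normal(0, 2);
  a12 ~ normal(0, 2);
  a13 ~ normal(0, 1);        // half-normal because a13 is constrained positive
  a21 ~ normal(0, 2);

  for (n in 1:N) {
    beta1 = exp(a11 + a12 * dose[n] / (a13 + dose[n]));
    beta2 = exp(a21);
    theta[1] = beta1 / (1 + beta1 + beta2);
    theta[2] = beta2 / (1 + beta1 + beta2);
    theta[3] = 1 / (1 + beta1 + beta2);
    lps = log(theta);
    for (k in 1:K)
      lps[k] += normal_lpdf(y[n] | mu[k], sigma[k]);
    target += log_sum_exp(lps);  // marginalize over the component label
  }
}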

Stan code for EMAX logistic model

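data {
  int<lower=1> N;              // number of data points
  int<lower=0, upper=1> y[N];  // ePTB indicator for each observation
  real<lower=0> dose[N];       // dose (g/day) for each observation
}

parameters {
  real a11;                    // intercept on the logit scale
  real a12;                    // maximum dose effect
  real<lower=0> a13;           // dose giving half the maximum effect
}

model {
  real theta;                  // linear predictor (log-odds of ePTB)
  a11 ~ normal(0, 2);
  a12 ~ normal(0, 2);
  a13 ~ normal(0.5, 1);        // truncated at 0 by the lower bound
  for (n in 1:N) {
    theta = a11 + a12 * dose[n] / (a13 + dose[n]);
    y[n] ~ bernoulli_logit(theta);
  }
}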

Stan code for independent doses logistic model

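data {
  int<lower=1> N;              // number of data points
  int<lower=0, upper=1> y[N];  // ePTB indicator for each observation
  matrix[N, 6] dose;           // design matrix, one column per dose group (6 treatments)
}
parameters {
  vector[6] beta;              // log-odds of ePTB for each dose group
}

model {
  beta ~ normal(0, 5);
  y ~ bernoulli_logit(dose * beta);
}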

7. References

1. Middleton P, Gomersall JC, Gould JF, Shepherd E, Olsen SF, Makrides M. Omega-3 fatty acid addition during pregnancy. Cochrane Database Syst Rev. 2018;11(11).
2. Thomas N, Sweeney K, Somayaji V. Meta-analysis of clinical dose-response in a large drug development portfolio. Stat Biopharm Res. 2014;6(4):302–317.
3. Altman DG, Royston P. The cost of dichotomising continuous variables. BMJ (Clinical research ed). 2006;332:1080.
4. Deyi BA, Kosinski AS, Snapinn SM. Power considerations when a continuous outcome variable is dichotomized. Journal of biopharmaceutical statistics. 1998;8:337–52.
5. Peacock JL, et al. Dichotomising continuous data while retaining statistical power using a distributional approach. Statistics in medicine. 2012;31:3089–103.
6. Schwartz S, Gelfand A, Miranda M. Joint Bayesian Analysis of Birthweight and Censored Gestational Age Using Finite Mixture Models. Statistics in Medicine. 2010;29:1710–1723.
7. Carlson SE, et al. DHA supplementation and pregnancy outcomes. The American journal of clinical nutrition. 2013;97:808–15.
8. U.S. Department of Health and Human Services Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER). Adaptive Designs for Clinical Trials of Drugs and Biologics Guidance for Industry. November 2019.
9. R Core Team (2017). R: a Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
10. Stan Development Team (2017). RStan: the R interface to Stan, version 2.16.1. Available from http://mc-stan.org.
11. Stan Development Team. Stan Modeling Language User’s Guide and Reference Manual, Version 2.16.0. Available from http://mc-stan.org.
12. Lei Y, et al. Comparison of Dichotomized and Distributional Approaches in Rare Event Clinical Trial Design: a Fixed Bayesian Design. Journal of applied statistics. 2017;44(8):1466–1478.
13. Yelland LN, et al. Predicting the effect of maternal docosahexaenoic acid (DHA) supplementation to reduce early preterm birth in Australia and the United States using results of within country randomized controlled trials. Prostaglandins Leukot Essent Fatty Acids. 2016 September;112:44–49.
14. Gajewski BJ, Reese CS, Colombo J, Carlson S. Commensurate Priors on a Finite Mixture Model for Incorporating Repository Data in Clinical Trials. Statistics in Biopharmaceutical Research. 2016;8(2):151–160.
