Abstract
Aims
To clarify the hypothesis tests associated with the full covariate modelling (FCM) approach in population pharmacokinetic analysis, investigate the potential impact of multiplicity in population pharmacokinetic analysis, and evaluate simultaneous confidence intervals (SCI) as an approach to control multiplicity.
Methods
Clinical trial simulations were performed using a simple one‐compartment pharmacokinetic model. Different numbers of covariates, sample sizes, effect sizes of covariates, and correlations among covariates were explored. The false positive rate (FPR) and power were evaluated.
Results
The FPR for the FCM approach increases dramatically with the number of covariates. The chance of incorrectly selecting ≥1 seemingly clinically relevant covariates can increase from 5% to 40–70% for 10–20 covariates. The SCI approach may provide appropriate control of the family‐wise FPR, allowing more appropriate decision making. As a result, the power to detect real effects without incorrectly identifying non‐existing effects can be greatly improved by the SCI approach compared to the approach in current practice. The performance of the SCI approach is driven by the ratio of sample size to number of covariates. The FPR can be controlled at 5% and 10% using the SCI approach when the ratio is ≥20 and 10, respectively.
Conclusion
The FCM approach still lies within the framework of statistical testing, and therefore multiplicity is an issue for this approach. It is imperative to consider multiplicity reporting and adjustments in FCM modelling practice to ensure more appropriate decision making.
Keywords: population analysis < pharmacodynamics, modelling and simulation < pharmacodynamics, pharmacometrics
What is Already Known about this Subject
Full covariate modelling (FCM) is a popular approach for evaluating covariate effects and supporting dose recommendations in drug labels.
The underlying, corresponding hypothesis tests and multiplicity associated with the FCM approach have not been fully discussed.
What this Study Adds
Due to multiplicity, the false positive rate of the FCM approach is inflated as the number of covariates increases.
The simultaneous confidence interval approach may be considered in future practice to provide appropriate control of family‐wise error rate and therefore increase correct‐detection power.
Introduction
Covariate analysis based on population pharmacokinetic (PPK) modelling has often been used to support dose recommendations in drug labels 1, 2, 3. Stepwise covariate modelling is frequently used to build covariate models 4. Full covariate modelling (FCM) is another popular approach for evaluating covariate effects in PPK modelling 5, 6, 7, 8. In the FCM approach, after constructing a base structural population PK model, all candidate covariates are forced into the base model at once to form a so‐called full model. The effect of each covariate is then evaluated by comparing the 95% confidence interval (CI) of the covariate effect against a predetermined threshold or range (based on known PK–pharmacodynamic relationships or a commonly accepted value when the PK–pharmacodynamic relationship is unknown). If the 95% CI is outside the predetermined range, the influence of the covariate is considered clinically relevant (Figure S1). The main postulated advantage of this approach is that it supports interpretation of the clinical relevance of a covariate effect instead of relying on statistical significance for inference 6, therefore avoiding the difficulties in interpreting P values due to multiplicity associated with stepwise covariate selection 7.
However, the FCM approach interprets clinical relevance of covariates based on confidence intervals, which are often used to carry out statistical hypothesis tests. The close connection between hypothesis testing and confidence intervals is well established in the statistical literature 9. So far, the hypothesis tests underlying the FCM approach have not been fully discussed in the pharmacometrics literature.
In addition, the FCM approach simultaneously evaluates multiple individual covariates or scientific questions. It is well known that the multiplicity problem [inflated false positive rate (FPR) associated with multiple tests] arises when inferences are made across multiple individual questions or covariates, and that the FPR increases with the number of statistical tests 10. However, the multiplicity issue associated with the FCM approach in PPK analysis has not been fully investigated. Currently, the confidence intervals used to construct the forest plots in the FCM approach (Figure S1) are often based on individual, univariate t distributions and do not account for multiplicity. Therefore, it is important to characterize the multiplicity issue for the FCM approach and to identify an appropriate method to control it.
In this paper, we aim to clarify the hypothesis tests associated with the FCM approach, and to investigate the multiplicity issue associated with the FCM approach using simulation studies in the context of a simple one‐compartment linear PK model. Many multiple comparison procedures have been developed to control multiplicity for statistical testing 10. Recently, a simultaneous confidence interval (SCI) approach based on multivariate distributions has been proposed for general linear parametric models 11, 12. The SCI approach based on the multivariate t distribution is evaluated here as an alternative approach to control for multiplicity in full covariate modelling.
Methods
A nonlinear mixed‐effects model is considered to represent the PK data. Suppose there are n individuals in the study. Let yi be the vector of ni observations at sampling times ti = (ti1, ti2, …, tini) for the ith individual. The function f, which is nonlinear in at least some of its parameters, is assumed to describe the behavior of the PK concentrations over time. The statistical model for the PK data can be expressed as follows,
$$y_i = f(t_i, \theta_i) + \epsilon_i, \qquad \theta_i = \alpha + Z_i \beta + \eta_i, \qquad i = 1, \ldots, n,$$
where θi = (θi1, θi2, …, θip)′ is the p vector of individual PK parameters. The random error vector ϵi follows a normal distribution with zero mean and variance–covariance matrix G, and the random effects ηi = (ηi1, ηi2, …, ηip)′ follow a multivariate normal distribution with mean zero and covariance R. Zi is the covariate matrix for the ith individual, and α = (α1, α2, …, αp)′ and β = (β1, β2, …, βq)′ are the fixed effects. To facilitate the interpretation of the magnitude of the estimated covariate effects, a multiplicative function is often used to parameterize the FCM in PPK analysis (6; see Supplemental Data).
Underlying hypothesis tests in FCM approach
The determination of potentially clinically relevant covariates in the FCM approach mainly relies on the forest plot, where the 95% CIs of individual covariate effects are displayed (Figure S1). If the estimated 95% CI for a covariate effect is completely outside a predetermined threshold or range (e.g. ±20%), then the covariate is considered clinically relevant. However, if the 95% CI for the covariate effect is completely within the predefined range, the covariate effect is considered clinically irrelevant (notice that it could also be statistically significant in the traditional sense, if zero is not included in the CI). In cases where the 95% CI crosses either one of the predefined limits, there are insufficient data to determine the clinical relevance. The concept used for determining clinical relevance in the FCM approach can be readily translated into a hypothesis test framework where the null hypothesis (hereafter referred to as the clinical‐relevance null hypothesis) is that the effect is within a certain predefined range as follows:
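$$H_{0j}: |\beta_j| \le \Delta \quad \text{vs.} \quad H_{1j}: |\beta_j| > \Delta \tag{1}$$
We reject the null hypothesis H0j (effect within ∆, e.g. ∆ = 20%) if
$$\frac{\hat{\beta}_j - \Delta}{\mathrm{SE}(\hat{\beta}_j)} > t_{1-\alpha,\nu}$$
or
$$\frac{\hat{\beta}_j + \Delta}{\mathrm{SE}(\hat{\beta}_j)} < -t_{1-\alpha,\nu},$$
where $\hat{\beta}_j$ is the estimate of βj, $\mathrm{SE}(\hat{\beta}_j)$ is its standard error, and $t_{1-\alpha,\nu}$ is the upper (1 − α) quantile of the univariate t distribution with ν degrees of freedom.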
Secondly, the confidence interval method used in the forest plot can readily be used to test if there is any statistically significant effect for an individual covariate included in the full model:
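$$H_{0j}: \beta_j = 0 \quad \text{vs.} \quad H_{1j}: \beta_j \ne 0 \tag{2}$$
Thus, the appropriate t statistic is
$$t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)}.$$
The null hypothesis (hereafter referred to as the statistical‐significance null hypothesis) will be rejected if $|t_j| > t_{1-\alpha/2,\nu}$ and not rejected otherwise.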
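The two tests described in this section can be sketched for a single covariate as follows. This is a minimal illustration rather than the authors' implementation: the function name `fcm_tests`, the example estimates, standard errors and degrees of freedom are all hypothetical, and the covariate effect is assumed to be expressed on the same scale as the threshold Δ.

```python
from scipy import stats

def fcm_tests(beta_hat, se, df, delta=0.2, alpha=0.05):
    """Return (reject_relevance_null, reject_significance_null) for one covariate.

    beta_hat, se and df are hypothetical inputs: an estimated covariate effect,
    its standard error, and the degrees of freedom of the t distribution.
    """
    t_one_sided = stats.t.ppf(1 - alpha, df)      # upper (1 - alpha) quantile
    t_two_sided = stats.t.ppf(1 - alpha / 2, df)  # upper (1 - alpha/2) quantile

    # Clinical-relevance null |beta| <= delta: reject when the estimate is
    # significantly above +delta or significantly below -delta.
    reject_relevance = bool(
        (beta_hat - delta) / se > t_one_sided
        or (beta_hat + delta) / se < -t_one_sided
    )
    # Statistical-significance null beta = 0: two-sided t test.
    reject_significance = bool(abs(beta_hat / se) > t_two_sided)
    return reject_relevance, reject_significance

# Hypothetical examples: a large effect and a borderline effect.
large = fcm_tests(beta_hat=0.45, se=0.10, df=100)
borderline = fcm_tests(beta_hat=0.25, se=0.10, df=100)
print(large, borderline)
```

The borderline case illustrates the distinction drawn in the text: an effect can be statistically different from zero without being shown to exceed the clinical‐relevance threshold.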
Constructing simultaneous confidence intervals and adjustment of multiplicity
As mentioned before, the family‐wise FPR might be substantially inflated under the multiple comparisons problem underpinning covariate selection. The Bonferroni procedure, which performs each univariate t test at a significance level of α/q (q representing the number of elementary null hypotheses), controls the family‐wise FPR at a level no greater than α. The Bonferroni test, however, tends to be too conservative, and may reduce the power to identify covariates that are actually relevant 10. The SCI approach, which utilizes the correlation structure of the joint distribution of the univariate t test statistics, may allow more efficient control of the family‐wise error than the Bonferroni test, and may be more powerful in detecting true covariate effects when correlations exist between test statistics 10. Formally, let c1 − α denote the critical value derived from the multivariate t distribution of $(t_1, t_2, \ldots, t_q)$ at the prespecified significance level α by
$$P\left(\max_{1 \le j \le q} |t_j| \le c_{1-\alpha}\right) = 1 - \alpha.$$
We then reject the null hypothesis H0j if $|t_j| > c_{1-\alpha}$. Simultaneous confidence intervals for β = (β1, β2, …, βq) with joint coverage rate 1 – α can be constructed as follows,
$$\hat{\beta}_j \pm c_{1-\alpha}\,\mathrm{SE}(\hat{\beta}_j), \qquad j = 1, \ldots, q.$$
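To make the SCI critical value concrete, the sketch below approximates it by Monte Carlo sampling from a multivariate t distribution and compares it with the unadjusted and Bonferroni critical values. It is an illustration under stated assumptions, not the procedure used in this paper: the function `sci_critical_value`, the equicorrelated matrix, the degrees of freedom and the number of draws are all hypothetical choices, and `corr` stands for the correlation matrix of the test statistics (which in practice would be derived from the estimated covariance matrix of the parameter estimates).

```python
import numpy as np
from scipy import stats

def sci_critical_value(corr, df, alpha=0.05, n_draws=200_000, seed=0):
    """Monte Carlo estimate of c such that P(max_j |t_j| <= c) = 1 - alpha."""
    rng = np.random.default_rng(seed)
    q = corr.shape[0]
    # Multivariate t draws: correlated normals divided by a shared
    # sqrt(chi-square / df) scale factor.
    z = rng.multivariate_normal(np.zeros(q), corr, size=n_draws)
    scale = np.sqrt(rng.chisquare(df, size=n_draws) / df)
    t = z / scale[:, None]
    return float(np.quantile(np.abs(t).max(axis=1), 1 - alpha))

# Hypothetical setting: 10 test statistics with common correlation 0.5.
q, df, alpha = 10, 390, 0.05
corr = 0.5 * np.eye(q) + 0.5 * np.ones((q, q))

c_unadjusted = stats.t.ppf(1 - alpha / 2, df)        # univariate, no adjustment
c_bonferroni = stats.t.ppf(1 - alpha / (2 * q), df)  # Bonferroni adjustment
c_sci = sci_critical_value(corr, df, alpha)
print(c_unadjusted, c_sci, c_bonferroni)
```

Under this assumed correlation structure the simultaneous critical value falls between the unadjusted and Bonferroni values, which is the sense in which the SCI approach is described as less conservative than Bonferroni.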
Simulations
The purpose of the simulations was to explore the FPR and the power to detect relevant covariates for different types of hypothesis tests in the full covariate PPK modelling approach. A one‐compartment PK model with first‐order oral absorption and first‐order elimination was used in the simulations. Two sets of simulations were performed to evaluate the statistical‐significance null hypothesis and the clinical‐relevance null hypothesis separately. The first set of simulations was used to evaluate the FPR under the statistical‐significance null hypothesis, while the second set was performed to evaluate the maximum FPR under the clinical‐relevance null hypothesis, where the effect of covariates is ≤Δ, when the true effect is Δ. We used a threshold (Δ) of 20%, a commonly used conservative threshold for assessing potential clinical relevance in population PK modelling when the actual threshold for clinical relevance based on exposure‐response analysis is not available. It should be mentioned that this threshold does not necessarily translate to decisions regarding dosing recommendations on drug labels. Since the maximum FPR is investigated, the selection of the threshold does not affect the simulation results.
Within each set of simulations, multiple scenarios were simulated. The number of covariates in the simulation was set to 5, 10, 20 or 40, and the common correlation among the covariates was set to 0 or 0.5. For the first set of simulations, all but one covariate had no effect on drug clearance. Four different effect sizes were considered for the only covariate with a non‐zero effect, representing a weak (10%), borderline (20%), medium (40%) or large (80%) effect. The 80% effect provided maximum power in this simulation setting. For the second set of simulations, one variable out of the simulated covariates (5, 10, 20 or 40) had a 40% or 80% covariate effect, and all the other covariates had an effect of 20% on clearance. In addition, four different sample sizes (100, 200, 400 and 800) were evaluated in the simulations. In total, 128 and 64 different simulation scenarios were created for the first and second sets of simulations, respectively.
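The scenario counts follow directly from crossing the design factors listed above; the short enumeration below reproduces them. The variable names are illustrative only.

```python
from itertools import product

n_covariates = [5, 10, 20, 40]
correlations = [0.0, 0.5]
sample_sizes = [100, 200, 400, 800]
effects_set1 = [0.1, 0.2, 0.4, 0.8]  # effect of the single non-null covariate
effects_set2 = [0.4, 0.8]            # effect of the single covariate above 20%

# Each scenario is one combination of the design factors.
set1 = list(product(n_covariates, correlations, effects_set1, sample_sizes))
set2 = list(product(n_covariates, correlations, effects_set2, sample_sizes))
print(len(set1), len(set2))  # 128 64
```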
For each simulation scenario, 10 000 PK datasets were simulated using R version 3.2.3 13. Details of the simulations are in Supplemental Simulation Methods.
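For readers who want a feel for the simulation set‐up, the following sketch generates one dataset from a one‐compartment model with first‐order absorption and elimination. It is not the simulation code used in this study (which is described in the Supplemental Simulation Methods): the parameter values, the exponential covariate model on clearance, the variability magnitudes, the assumption of complete bioavailability and the sampling times are hypothetical and chosen only for illustration.

```python
import numpy as np

def one_cmt_oral(t, dose, ka, cl, v):
    """Concentration for a one-compartment model with first-order
    absorption and first-order elimination (bioavailability assumed 1)."""
    ke = cl / v
    return dose * ka / (v * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

def simulate_dataset(n, q, rho, effect, rng):
    """Simulate covariates and concentrations for n subjects (all values assumed)."""
    # Covariates with a common pairwise correlation rho.
    corr = (1 - rho) * np.eye(q) + rho * np.ones((q, q))
    z = rng.multivariate_normal(np.zeros(q), corr, size=n)
    beta = np.zeros(q)
    beta[0] = effect                       # only the first covariate affects CL
    eta = rng.normal(0.0, 0.3, size=n)     # between-subject variability on CL
    cl = 5.0 * np.exp(z @ beta + eta)      # assumed exponential covariate model
    times = np.linspace(0.5, 24.0, 12)     # 12 samples per subject
    conc = one_cmt_oral(times[None, :], 100.0, 1.0, cl[:, None], 50.0)
    obs = conc * np.exp(rng.normal(0.0, 0.1, size=conc.shape))  # residual error
    return z, times, obs

z, times, obs = simulate_dataset(n=100, q=5, rho=0.5, effect=0.4,
                                 rng=np.random.default_rng(1))
print(z.shape, obs.shape)
```

In a full simulation study each such dataset would then be fitted with the base and full covariate models; that estimation step is outside the scope of this sketch.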
Determination of error rate and power
The statistical significance of an effect being >20% in absolute value for an individual covariate was tested using the t test for β = 0.2 at the (one‐sided) significance level of 0.05. The statistical significance of an effect being different from 0 for an individual covariate was tested using the t test for β = 0 at the (two‐sided) significance level of 0.05. The critical value for the univariate t test was calculated as the upper (1 − α) or (1 − α/2) quantile of the univariate t distribution under the clinical‐relevance hypothesis (1) and the statistical‐significance hypothesis (2), respectively, while the critical value for the joint multivariate t test was derived from the joint multivariate t distribution of the test statistics (see Constructing simultaneous confidence intervals and adjustment of multiplicity). Depending on the type of hypothesis test (βj = 0, or |βj| ≤ 20%), the FPR was calculated as the fraction of simulation replicates where at least one significant variable was detected among the variables without an effect (or at least one variable with a >20% effect among the variables with an effect = 20%). Because false identification of a nonexisting covariate effect in a model can lead to unnecessary dosing adjustments, the power in this paper is specifically defined as the power to detect the true significant covariate without falsely identifying any other nonsignificant variables (hereafter referred to as correct‐detection power). It was calculated as the fraction of simulation replicates where the variable with an existing effect (or effect >20%, respectively) was correctly detected without false identification of any variables without an effect (or with an effect ≤20%, respectively).
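The definitions of the family‐wise FPR and the correct‐detection power can be expressed compactly in terms of a replicate‐by‐covariate matrix of rejection decisions. The sketch below is illustrative: the function name `fpr_and_power` and the small example matrix are hypothetical.

```python
import numpy as np

def fpr_and_power(rejected, is_true_effect):
    """Family-wise FPR and correct-detection power.

    rejected: (replicates x covariates) boolean matrix of rejected nulls.
    is_true_effect: boolean vector marking covariates whose null is false.
    """
    rejected = np.asarray(rejected, dtype=bool)
    is_true = np.asarray(is_true_effect, dtype=bool)
    # A replicate is a false positive if any true-null covariate is rejected.
    any_false_positive = rejected[:, ~is_true].any(axis=1)
    # Correct detection: every real effect found and no false positive.
    all_true_found = rejected[:, is_true].all(axis=1)
    fpr = any_false_positive.mean()
    power = (all_true_found & ~any_false_positive).mean()
    return float(fpr), float(power)

# Hypothetical example: 4 replicates, 3 covariates, only the first is real.
rejected = [[1, 0, 0],
            [1, 1, 0],
            [0, 0, 0],
            [1, 0, 0]]
fpr, power = fpr_and_power(rejected, [True, False, False])
print(fpr, power)  # 0.25 0.5
```

Note that the second replicate detects the real effect but does not count towards the correct‐detection power because it also contains a false positive.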
Maximum size of false positive effect
The maximum value of the estimated magnitude of the incorrectly identified covariate effects under either the statistical‐significance null hypothesis or the clinical‐relevance null hypothesis was calculated for each simulation scenario. This is referred to as the maximum size of false positive effect in this manuscript.
Application
In this section, we demonstrate the application of the SCI approach to evaluate the presence of covariate effects on the PK of an investigational drug and compare the results with those obtained using the conventional FCM approach based on univariate t distributions. Partial data from pooled clinical studies of an investigational drug in healthy subjects and patients were used. The absorption of the drug after oral administration is rapid, with a mean time to maximum serum concentration of around 1 h. The mean absolute bioavailability after single‐dose administration is approximately 30% due to extensive first‐pass metabolism. The drug is extensively metabolized (about 97% of the parent compound), mainly via Phase 2 conjugation pathways, and only a small amount is metabolized by Phase 1 oxidative pathways. Approximately 70% of the drug is excreted in urine in the conjugated form. A total of 3% of the drug is excreted in urine as unchanged drug. The terminal half‐life is on average 4 h after oral administration.
The data consist of a total of 10 114 PK samples from 1474 subjects. Ten covariates were included in the analysis: age, serum creatinine level, body weight, total bilirubin, total protein, formulation (capsule vs. tablet), race (white vs. others), sex (male vs. female), health status (patients vs healthy subjects), and alanine transaminase. The PK of the investigational drug followed a two‐compartment linear PK model with a first‐order absorption and first‐order elimination. Interindividual variability was included on the apparent clearance (CL/F) and volume of distribution for central compartment using an exponential error model and with random effects being assumed normally distributed. The model fitting was performed in NONMEM.
For illustration purposes, we only evaluate the covariate effects on CL/F. To construct the full covariate model, all 10 candidate covariates were included in the model on CL/F at the same time. For continuous covariates, we estimated the covariate effect from the 5th and 95th percentiles to the median (the value of the typical reference subject) 5. The variance–covariance matrix of the parameter estimates from NONMEM was then output to approximate the joint multivariate t distribution for the SCI approach and calculate the simultaneous confidence intervals. A sample R code is included in Supplemental Data.
Results
The convergence of both the base and full covariate models was achieved for the majority of simulation replicates (e.g. > 99% convergence).
FPR for FCM approach without adjusting for multiplicity
The probability of erroneously identifying at least one significant covariate was well above the 5% nominal significance level when evaluating more than one covariate in the FCM approach without adjusting for multiplicity with univariate t distributions (Figure 1A). The FPR clearly increased with the number of covariates. The relationship between error rate and number of covariates was similar to the predicted values based on the theoretical formula for inflation of FPR under independent statistical tests 10 when the sample size was 800. The inflation of FPR was more severe as the sample size decreased. At least one falsely significant covariate would almost certainly be identified when the number of candidate covariates becomes large (e.g. ≥40). The correlation among the covariates does not appear to markedly influence the FPR, as similar results were observed for independent and correlated covariates. A similar pattern of inflation of the maximum FPR was also observed for testing the clinical‐relevance null hypothesis that the absolute effect is ≤20% when the actual effects were 20% (Figure 1B).
Figure 1.
False positive rate by number of covariates based on unadjusted univariate t tests under (A) the statistical‐significance null hypothesis where there is no effect (βj = 0) [left panel] and (B) the clinical‐relevance null hypothesis where the effect is not greater than 20% (βj ≤ 0.2) when the true effect is 20% [right panel]. The horizontal line is the expected false positive rate of 5% if there is no multiplicity issue when a single covariate is tested. Solid dots represent data from simulations with no correlation among covariates, while open circles represent data from simulations with correlation = 0.5. The blue dashed curves represent the false positive rate calculated using the theoretical equation for multiplicity [1 – (1 – α)^q, where q is the number of covariates]
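The theoretical curve in Figure 1 can be reproduced numerically. The short calculation below evaluates 1 – (1 – α)^q, which holds for independent tests, at the numbers of covariates used in the simulations; the function name is illustrative.

```python
def familywise_fpr(alpha, q):
    """Probability of at least one false positive among q independent tests."""
    return 1 - (1 - alpha) ** q

for q in (1, 5, 10, 20, 40):
    print(q, round(familywise_fpr(0.05, q), 3))
# 1 0.05
# 5 0.226
# 10 0.401
# 20 0.642
# 40 0.871
```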
Impact of inflation of FPR on the correct‐detection power
Figure 2 shows the correct‐detection power to detect the single variable with a real effect without incorrectly identifying any other covariates without an effect. As a consequence of the inflated FPR, the correct‐detection power markedly decreased with the number of covariates. When the number of covariates was five, the correct‐detection power of the univariate t tests appeared to be in an acceptable range (e.g. above 0.75) for detecting a 40% or 80% effect. However, when the number of covariates increased to 10, the correct‐detection power was markedly reduced to the 0.5–0.6 range. The correct‐detection power was eventually reduced to close to 0 when the number of covariates was 40, regardless of sample size and effect size. As expected, the correct‐detection power was higher for larger sample sizes and larger effect sizes. A similar impact of the false positives on the correct‐detection power was also observed for simulations with correlated covariates, and for detecting effects >20% for the clinical‐relevance null hypothesis where the effect was 20% (Figure 2B).
Figure 2.
Correct‐detection power by number of covariates based on unadjusted univariate t tests under (A) the statistical‐significance null hypothesis where there is no effect (βj = 0) [left panel] and (B) the clinical‐relevance null hypothesis where the effect is not greater than 20% (βj ≤ 0.2) when the true effect is 20% [right panel]. Solid dots represent data from simulations with no correlation among covariates, while open circles represent data from simulations with correlation = 0.5
Estimated maximum size of false positive effects
We also investigated the potential maximum magnitude (size) of false positive effects (Figure 3). The maximum size of false positive effects decreased with larger sample size and smaller number of covariates. The correlation among the covariates appears to increase the size of the false positive effects. When the sample size is 400 or 800, the size of the false positive effect relative to the true effect is reasonably small (within 20%). However, if the number of subjects was 100 or 200, the maximum size of the false positive effect (relative to the true effect) can be substantial (>20%), and could reach 70% in extreme cases where 40 covariates were evaluated. A similar pattern was observed when testing an effect = 20% (Figure 3B). When the true effect was 20%, a 40–70% false positive effect could be identified (i.e. with the lower bound of the confidence interval >20%) under the current simulation conditions.
Figure 3.
Maximum size of false positive effects based on unadjusted univariate t tests under (A) the statistical‐significance null hypothesis where there is no effect (βj = 0) [left panel] and (B) the clinical‐relevance null hypothesis where the effect is not greater than 20% (βj ≤ 0.2) when the true effect is 20% [right panel]. The horizontal dashed line represents the true effect in the simulations. Solid dots represent data from simulations with no correlation among covariates, while open circles represent data from simulations with correlation = 0.5
Adjustment of multiplicity using SCI based on multivariate t distribution
Figure S2 shows the performance of the SCI approach in controlling multiplicity for the full covariate PPK modelling approach. Compared to the univariate t distribution approach (Figure 1), the SCI approach markedly reduced the FPR. In particular, when the number of covariates was small (e.g. 5 and 10) or the sample size was large (e.g. 400 and 800), the adjusted FPR could be maintained close to the prespecified 5%. When the number of covariates was 20 or 40, substantial inflation of the FPR could still be observed when the sample size was small (n = 100 or 200). Further evaluation revealed that the performance of the SCI approach appears to be related to the ratio of sample size to number of covariates (Figure 4). The SCI appeared to control the FPR very well (i.e. at 5%) when the ratio was ≥20. The SCI could still control the FPR at a somewhat reasonable level (<0.1) at a ratio of 10. However, when the ratio was <10, although the performance of the SCI approach was still much better than that of the unadjusted univariate t distribution approach (Figure 4 and Figure S5), a steep relationship between FPR and the ratio was observed within this range.
Figure 4.
Control of the false positive rate by ratio of number of subjects to number of covariates using simultaneous confidence intervals under (A) the statistical‐significance null hypothesis where there is no effect (βj = 0) [left panel] and (B) the clinical‐relevance null hypothesis where the effect is not greater than 20% (βj ≤ 0.2) when the true effect is 20% [right panel]. The horizontal dashed line is the expected false positive rate of 5% if there is no multiplicity issue. The vertical dashed line represents a ratio of 20. Solid dots represent data from simulations with no correlation among covariates, while open circles represent data from simulations with correlation = 0.5
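The ratio‐based pattern in Figure 4 can be summarized as a simple screening rule for a planned analysis. The helper below is a hypothetical convenience function, not part of the study; its bands merely restate the simulation findings reported above and should not be read as general guarantees outside the simulated conditions.

```python
def sci_control_band(n_subjects, n_covariates):
    """Classify a design by its subjects-to-covariates ratio.

    The bands restate the simulation results in this paper (ratio >= 20:
    FPR near 5%; ratio >= 10: FPR below about 10%) and are illustrative only.
    """
    ratio = n_subjects / n_covariates
    if ratio >= 20:
        band = "FPR close to nominal 5%"
    elif ratio >= 10:
        band = "FPR below about 10%"
    else:
        band = "FPR not controlled at nominal level"
    return ratio, band

# Simulated design grid from this study.
for n in (100, 200, 400, 800):
    for q in (5, 10, 20, 40):
        print(n, q, *sci_control_band(n, q))
```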
Comparison of correct‐detection power with vs. without adjusting for multiplicity
The correct‐detection power was generally much higher after adjusting for multiplicity using the SCI approach than without adjustment using univariate t distributions (Figure 5). This is particularly true when the effect size is relatively large (e.g. βk = 0.4 or 0.8), where the correct‐detection power for the SCI approach was above 0.75 in the majority of simulation scenarios. When the effect size was small (e.g. βk = 0.1 or 0.2), the unadjusted correct‐detection power could be slightly higher than the adjusted power. In addition, the improvement in correct‐detection power (adjusted vs. unadjusted) was greater with an increasing number of covariates.
Figure 5.
Comparison of correct‐detection power to detect a real effect without falsely identifying a nonsignificant variable with vs. without adjusting for multiplicity. (A) Power to detect a 10, 20, 40 or 80% effect for the statistical‐significance null hypothesis where there is no effect (βj = 0) [left panel] and (B) power to detect a 40% or 80% effect for the clinical‐relevance null hypothesis where the effect is not greater than 20% (βj ≤ 0.2) when the true effect is 20% [right panel]. Solid dots represent data from simulations with no correlation among covariates, while open circles represent data from simulations with correlation = 0.5
Application
The parameter estimates for the covariate effects on the clearance of the PPK model are listed in Table 1. Based on the univariate t distribution approach, serum creatinine, alanine transaminase and health status did not have any significant effect, whereas age, body weight, total bilirubin, total protein and race were significantly different from 0 (P < 0.001). The effects of formulation and sex appear marginal (P < 0.1). After adjusting for multiplicity using the SCI approach, the effects of formulation and sex were no longer statistically significant.
Table 1.
Parameter estimates for the covariates in the population pharmacokinetic model
Parameter | Estimate | SE | t value | Unadjusted P value | Adjusted P value |
---|---|---|---|---|---|
Covariates on apparent clearance | | | | | |
Age (year) | −0.142 | 0.0325 | −4.36 | <0.0001 | 0.0001 |
Serum creatinine (μmol l⁻¹) | 0.0118 | 0.0255 | 0.463 | 0.64 | 1.00 |
Body weight (kg) | 0.667 | 0.0675 | 9.88 | <0.0001 | <0.0001 |
Total bilirubin (μmol l⁻¹) | −0.158 | 0.0332 | −4.77 | <0.0001 | <0.0001 |
Total protein (g l⁻¹) | −0.678 | 0.178 | −3.81 | 0.0001 | 0.0014 |
Formulation (capsule vs. tablet) | 0.0319 | 0.0164 | 1.94 | 0.05 | 0.40 |
Race (white vs. others) | 0.132 | 0.029 | 4.56 | <0.0001 | <0.0001 |
Sex (men vs. women) | −0.0546 | 0.0303 | −1.8 | 0.07 | 0.50 |
Health status (patients vs. healthy) | −0.0473 | 0.0359 | −1.32 | 0.19 | 0.85 |
Alanine transaminase (UI dl⁻¹) | −0.0269 | 0.0235 | −1.14 | 0.25 | 0.93 |
SE, standard error. The unadjusted P value was calculated based on univariate t distributions, while the adjusted P value was calculated based on the simultaneous confidence interval approach. The t value is calculated as Estimate/SE
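As a rough plausibility check on the adjusted P values in Table 1, the sketch below applies a Šidák‐type adjustment to the reported t values. This is deliberately not the SCI calculation used in this paper: it assumes the 10 test statistics are independent and uses a normal approximation to the t distribution (reasonable given the large sample), whereas the SCI approach uses the estimated correlation structure. The function names are illustrative.

```python
import math

def two_sided_p(t_value):
    """Two-sided P value under a standard normal approximation."""
    return math.erfc(abs(t_value) / math.sqrt(2.0))

def sidak_adjust(p, q):
    """Adjusted P value for q tests, assuming independence (Sidak)."""
    return 1.0 - (1.0 - p) ** q

# t values for the non-significant covariates, taken from Table 1.
t_values = {
    "Formulation": 1.94,
    "Sex": -1.8,
    "Health status": -1.32,
    "Alanine transaminase": -1.14,
}
adjusted = {}
for name, t in t_values.items():
    p = two_sided_p(t)
    adjusted[name] = sidak_adjust(p, q=10)
    print(name, round(p, 3), round(adjusted[name], 2))
```

Under these simplifying assumptions the independence‐based values are of similar magnitude to the adjusted P values reported in Table 1, which illustrates how a marginal unadjusted result (P ≈ 0.05) becomes clearly non‐significant once 10 simultaneous tests are taken into account.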
Figures S3 and S4 show the forest plots based on the univariate t distribution approach and the SCI approach, respectively. For illustration purposes, we used ±20% as the threshold range. For both approaches, the effects of age, serum creatinine, total protein, formulation, sex and health status, together with their 95% CIs, are within the predefined 20% range. In addition, both the univariate t distribution approach and the SCI approach show that the 95% CIs for total bilirubin and race crossed the 20% limits, suggesting that there may not be sufficient information to conclude whether these effects are significantly outside the 20% range. Both approaches demonstrated that the body weight effect was significantly different from 0 (CIs did not include 100%). The univariate t distribution approach suggests that body weight may have an effect significantly greater than 20%, as the 95% CI of the 95th percentile weight is completely outside the 20% limit. However, after adjusting for multiplicity, the SCI approach suggested that there may be insufficient data to reach this conclusion. Overall, both approaches suggest a small existing body weight effect, but this effect could be no greater than 20%.
Discussion
While stepwise covariate modelling dominates variable selection in regression analysis 12 and nonlinear mixed‐effects PPK modelling 4, FCM is becoming another commonly used approach in PPK analysis to identify influential covariates and to justify dose recommendations for special populations in drug labels 5, 6, 7, 8. Although the multiplicity issues for stepwise variable selection are well recognized 12, the underlying statistical hypothesis tests and the associated multiplicity for the FCM approach have not been well characterized or recognized in the present clinical pharmacology and pharmacometrics literature. Consequently, there is currently no formal regulatory guidance on the control or reporting of multiplicity for FCM‐based covariate analysis. Since covariate analysis for PK data is often used to inform dose selection and adjustments in drug labels, a thorough investigation of multiplicity and its control in PPK analysis could provide helpful data for future practice and regulatory guidance in this area.
Our manuscript clarifies the underlying hypothesis tests associated with the FCM approach and demonstrates that the FPR increased dramatically due to the multiplicity that arises from evaluating multiple covariates. The FPR increased with the number of covariates included in the full covariate PPK models, with the probability of committing at least one false positive error being approximately 40–80% when 10 to 20 covariates were included in the analysis. This was the case for both the clinical‐relevance null hypothesis and the statistical‐significance null hypothesis. For FCM, evaluation of 10 to 20 covariates in an analysis is relatively common 5, 6, 7, although sometimes >30 covariate evaluations could be included in an analysis 8. Our simulations also showed that, because of the inflated FPR, the correct‐detection power of the FCM approach to detect a real effect without committing a false positive error is low if multiplicity is not appropriately controlled. In the present simulation, only 20–60% correct‐detection power could be achieved when evaluating 10–20 covariates. Since the current literature for FCM in population PK usually does not adjust for multiplicity, it is likely that some of the published findings represent false positive conclusions. In addition, the estimated size of false positive covariate effects could be substantial (>20%) when the sample size was relatively small (100–200 subjects). It is worth pointing out that the estimated magnitude of the false positive effects in our simulations may represent the best‐case scenario, since a very intensive PK sampling scheme (12 samples per subject) was implemented to reduce estimation error in the simulation. In real‐world applications, sparse PK sampling designs in late‐stage clinical development could lead to much larger estimation error and, therefore, a larger estimated magnitude of false positive findings.
More importantly, we confirmed that, for PPK modelling, the SCI approach based on multivariate t distributions can control the family‐wise FPR close to predefined nominal levels (e.g. 5%) and markedly increase the correct‐detection power to an acceptable level, particularly for detecting an effect >20%. When the effect size was small (e.g. <20%), the unadjusted correct‐detection power based on the univariate t distribution could be slightly higher than the SCI‐adjusted power (Figure 5). Since the covariate analysis in PPK modelling is mainly intended to detect potentially clinically relevant effects (most likely >20%) to identify opportunities for dose adjustments, the SCI approach may be a more appropriate approach than the univariate t distribution approach utilized in current practice. The univariate t distribution approach, however, may be used as a sensitivity analysis tool to support robust interpretation of the results and to identify potential small covariate effects if these are of interest.
In addition, the performance of the SCI approach was related to the ratio of sample size to number of covariates. The FPR could be controlled at 5% when the ratio was ≥20, and below 10% for a ratio of 10. When the ratio was lower than 10, even though the SCI approach could not control the false positive rate at the nominal 5%, its performance was still much better than that of the unadjusted univariate t distribution approach (Figure 4 and Figure S5). Therefore, scientists need to be mindful of the ratio of number of subjects to number of covariates when conducting an FCM analysis and should use prior knowledge to narrow down the candidate covariate list to the most relevant research questions before analysis. This is particularly true when the sample size is small.
The theory for simultaneous confidence intervals has been widely applied to linear parametric and semiparametric statistical models 10, and was recommended to account for multiplicity in linear regression analysis 12. This approach only requires that the parameter estimates follow an asymptotic multivariate normal distribution and that a consistent estimate of their covariance matrix is available 10. Both are readily available from the output of NONMEM (and of most model‐fitting software used with PPK models). For illustration, we have implemented the SCI approach based on NONMEM runs for a set of real‐world PK data and provided the R script of the implementation in the appendix of the manuscript.
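To make the construction concrete, the following Python sketch shows one way to obtain simultaneous intervals from a vector of estimates and their covariance matrix. It is not the R script provided in the appendix. It approximates the equicoordinate critical value of the multivariate t distribution by Monte Carlo simulation of max|T|, whereas dedicated software typically computes this quantile numerically. All function names are hypothetical, and the degrees of freedom `df` is an assumption that the analyst must choose (for example, based on the number of subjects).

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def cov_to_corr(cov):
    """Convert a covariance matrix of the estimates to a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]


def sci_critical_value(corr, df, level=0.95, n_draws=20000, seed=1):
    """Monte Carlo estimate of the two-sided equicoordinate quantile of max|T|."""
    rng = random.Random(seed)
    lower = cholesky(corr)
    k = len(corr)
    maxima = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Correlated standard normal draws.
        x = [sum(lower[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
        # A chi-square(df) variate is Gamma(shape=df/2, scale=2).
        scale = math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
        maxima.append(max(abs(v) for v in x) / scale)
    maxima.sort()
    return maxima[math.ceil(level * n_draws) - 1]


def simultaneous_intervals(estimates, cov, df, level=0.95):
    """Intervals estimate +/- c * SE, with one shared critical value c."""
    crit = sci_critical_value(cov_to_corr(cov), df, level)
    return [(b - crit * math.sqrt(cov[i][i]), b + crit * math.sqrt(cov[i][i]))
            for i, b in enumerate(estimates)]
```

Because a single critical value is shared by all covariates, the resulting intervals are wider than the univariate t intervals, and the widening grows with the number of covariate effects evaluated.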
Resampling procedures such as bootstrapping, sampling importance resampling 14 and randomization tests 15 may be used to control for multiplicity to some extent 16, particularly when the theoretical distribution of a test statistic of interest is complex or unknown, or when the sample size is insufficient for standard asymptotic statistical inference. The performance of these methods relies on the number of re‐samples. In practice, a large number of re‐samples is required to obtain an accurate approximation of the critical values for the construction of confidence intervals, even for the simple regression case. In the example of the mouse growth data of Westfall 16, 1 000 000 re‐sampling replicates were needed. However, considering the nonlinear functions and high‐dimensional integration involved in PPK modelling and nonlinear mixed‐effects modelling, it may be too computationally intensive and time‐consuming to carry out such a large number of model fittings in practice. Therefore, by comparison, the SCI approach may be more time efficient, computationally feasible and practically appealing.
In summary, the FCM approach still lies within the framework of statistical testing, and therefore multiplicity is an issue for this approach, as the confidence intervals for individual covariates currently used to construct forest plots are often based on univariate t distributions. Due to the dramatic inflation of the FPR and the potentially substantial estimated size of the false positive effects, it is imperative to consider multiplicity reporting and adjustments in full covariate PPK modelling practice to ensure more appropriate decision making. The SCI approach based on approximated multivariate distributions may be considered in future practice to provide appropriate control of the family‐wise error rate and therefore increase correct‐detection power. Scientists need to be selective and use prior information to identify the most relevant research questions or candidate covariates before analysis. Finally, collecting PK data from more subjects could help to reduce not only the chance but also the estimated size of false positive effects.
Competing Interests
There are no competing interests to declare.
Y.Y. is supported by the National Science Foundation of China (NSFC) (No. 11671375).
Supporting information
Figure S1 Clarification of the hypothesis tests in full covariate modelling. The dotted vertical lines are plotted at 80% and 120%, representing a ±20% covariate effect relative to a typical reference subject (the solid vertical line at 100%). Test 1 is the hypothesis test in which the null hypothesis is that the effect is within 20%, while Test 2 is the hypothesis test in which the null hypothesis is that there is no effect
Figure S2 Performance of control of false positive rate by number of covariates using the simultaneous confidence interval approach under (a) the statistical‐significance null hypothesis where there is no effect (βj = 0) and (b) the clinical‐relevance null hypothesis where the effect is not greater than 20% (βj ≤ 0.2) when the true effect is 20%. The horizontal dashed line is the expected false positive rate of 5% if there is no multiplicity issue. Solid dots represent data from simulations with no correlation among covariates, while open circles represent data from simulations with correlation = 0.3
Figure S3 Covariate effects on clearance of an investigational drug (estimated using the full model approach) without control of multiplicity (ie, univariate t distribution). Confidence intervals are estimated by
Figure S4 Covariate effects on clearance of an investigational drug (estimated using the full model approach) after control of multiplicity using the simultaneous confidence interval approach
Figure S5 Performance of control of false positive rate by ratio of number of subjects to number of covariates using the univariate t distribution approach under (a) the statistical‐significance null hypothesis where there is no effect (βj = 0) [left panel] and (b) the clinical‐relevance null hypothesis where the effect is not greater than 20% (βj ≤ 0.2) when the true effect is 20% [right panel]. The horizontal dashed line is the expected false positive rate of 5% if there is no multiplicity issue. The vertical dashed line represents a ratio of 20. Solid dots represent data from simulations with no correlation among covariates, while open circles represent data from simulations with correlation = 0.5
Xu, X. S. , Yuan, M. , Zhu, H. , Yang, Y. , Wang, H. , Zhou, H. , Xu, J. , Zhang, L. , and Pinheiro, J. (2018) Full covariate modelling approach in population pharmacokinetics: understanding the underlying hypothesis tests and implications of multiplicity. Br J Clin Pharmacol, 84: 1525–1534. doi: 10.1111/bcp.13577.
References
- 1. Duan JZ. Applications of population pharmacokinetics in current drug labelling. J Clin Pharm Ther 2007; 32: 57–79.
- 2. Joerger M. Covariate pharmacokinetic model building in oncology and its potential clinical relevance. AAPS J 2012; 14: 119–132.
- 3. Menon‐Andersen D, Yu B, Madabushi R, Bhattaram V, Hao W, Uppoor RS, et al. Essential pharmacokinetic information for drug dosage decisions: a concise visual presentation in the drug label. Clin Pharmacol Ther 2011; 90: 471–474.
- 4. Jonsson EN, Karlsson MO. Automated covariate model building within NONMEM. Pharm Res 1998; 15: 1463–1468.
- 5. Feng Y, Masson E, Dai D, Parker SM, Berman D, Roy A. Model‐based clinical pharmacology profiling of ipilimumab in patients with advanced melanoma. Br J Clin Pharmacol 2014; 78: 106–117.
- 6. Gastonguay MR. Full covariate models as an alternative to methods relying on statistical significance for inferences about covariate effects: a review of methodology and 42 case studies. Twentieth Meeting, Population Approach Group in Europe; 2011 Jun 7–10; Athens [online]. Available at https://www.page-meeting.org/pdf_assets/1694-GastonguayPAGE2011.pdf (last accessed 25 April 2018).
- 7. Hu C, Zhang J, Zhou H. Confirmatory analysis for phase III population pharmacokinetics. Pharm Stat 2011; 10: 14–26.
- 8. Hu C, Zhou H. An improved approach for confirmatory phase III population pharmacokinetic analysis. J Clin Pharmacol 2008; 48: 812–822.
- 9. Lehmann EL, Romano JP. Testing Statistical Hypotheses. New York: Springer Science & Business Media, 2006.
- 10. Bretz F, Hothorn T, Westfall PH. Multiple Comparisons using R. Boca Raton, FL, USA: CRC Press, 2011.
- 11. Hothorn T, Bretz F, Westfall P. Simultaneous inference in general parametric models. Biom J 2008; 50: 346–363.
- 12. Harrell F. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. New York: Springer, 2015.
- 13. R Development Core Team. R: A Language and Environment for Statistical Computing (www.r-project.org). Vienna, Austria: R Foundation for Statistical Computing, 2008.
- 14. Dosne AG, Bergstrand M, Harling K, Karlsson MO. Improving the estimation of parameter uncertainty distributions in nonlinear mixed effects models using sampling importance resampling. J Pharmacokinet Pharmacodyn 2016; 43: 583–596.
- 15. Wahlby U, Jonsson EN, Karlsson MO. Assessment of actual significance levels for covariate effects in NONMEM. J Pharmacokinet Pharmacodyn 2001; 28: 231–252. PubMed PMID: 11468939.
- 16. Westfall PH. On using the bootstrap for multiple comparisons. J Biopharm Stat 2011; 21: 1187–1205.