Abstract
The Mayo Clinical Score is used in clinical trials to describe the clinical status of patients with ulcerative colitis (UC). It comprises four subscores: rectal bleeding (RB), stool frequency (SF), physician's global assessment, and endoscopy (ENDO). According to recent US Food and Drug Administration guidelines (Ulcerative colitis: developing drugs for treatment, Guidance Document, https://www.fda.gov/regulatory‐information/s. 2022), clinical response and remission should be based on modified Mayo Score (mMS) relying on RB, SF, and ENDO. Typically, ENDO is performed at the beginning and end of each phase, whereas RB and SF are more frequently available. Item response theory (IRT) models allow the shared information to be used for prediction of all subscores at each observation time; they therefore leverage information from RB and SF to predict ENDO. A UC disease IRT model was developed based on four etrolizumab phase III studies to describe the longitudinal mMS subscores, placebo response, and remission at the end of induction and maintenance. For each subscore, a bounded integer model was developed. The placebo response was characterized by a mono‐exponential function acting on all mMS subscores similarly. The final model reliably predicted longitudinal mMS data. In addition, remission was well‐predicted by the model, with only 5% overprediction at the end of induction and 3% underprediction at the end of maintenance. External evaluation of the final model using placebo arms from five different studies indicated adequate performance for both longitudinal mMS subscores and remission status. These results suggest utility of the current disease model for informed decision making in UC clinical development, such as assisting future clinical trial designs and evaluations.
Study Highlights.
WHAT IS THE CURRENT KNOWLEDGE ON THE TOPIC?
To our knowledge, this is the first disease model developed that uses item response theory with bounded integer methodology to assess placebo effect and predict remission of patients with ulcerative colitis (UC) based on the longitudinal modified Mayo Score (mMS).
WHAT QUESTION DID THIS STUDY ADDRESS?
Clinical response and remission in patients with UC rely on all three mMS subscores being available, whereas this disease model reliably predicts longitudinal mMS and remission at the end of induction and maintenance regardless of partially complete subscores or missing records.
WHAT DOES THIS STUDY ADD TO OUR KNOWLEDGE?
This disease model structure (Item response theory‐bounded integer [IRT‐BI]) is a valid platform for longitudinally predicting the clinical disease status of patients with UC and could improve model informed drug development for UC treatment by quantifying the placebo effect versus drug response.
HOW MIGHT THIS CHANGE CLINICAL PHARMACOLOGY OR TRANSLATIONAL SCIENCE?
This IRT‐BI disease model quantitatively assesses clinical status of patients receiving placebo and active treatment in order to better inform current clinical trial interpretation and/or future clinical trial designs.
INTRODUCTION
Ulcerative colitis (UC) is a type of chronic inflammatory bowel disease (IBD) affecting the colon in various patterns. Mucosal inflammation, ulcers, diarrhea, abdominal pain, and rectal bleeding (RB) are all possible UC symptoms; it may also cause severe bloody diarrhea and toxic megacolon leading to surgery. 1 , 2 , 3 , 4 , 5 Dysregulation of the mucosal immune system in response to environmental factors, such as commensal microbiota, plays an important role in the pathogenesis of UC. 1 , 2 , 4 , 5 Currently, there is no cure for UC and treatments focus on symptom management and preventing disease progression.
Clinical trials in UC come with substantial challenges. Endoscopies (ENDOs) for efficacy assessment are sparsely performed. As such, patient‐reported outcomes need to be used as secondary or primary end points. High and variable placebo response rates 6 , 7 , 8 have historically been observed in placebo‐controlled randomized clinical trials in UC, which makes it difficult to accurately predict future treatment‐related responses. Quantitative insight on patients' remission status and placebo response is valuable for clinical trial designs and interpretation.
The Mayo Clinical Score (MCS) is an ordered categorical variable that has been widely used in trials to describe the clinical status of patients with UC. It is built as a composite score, comprising four subscores, each ranging from 0 to 3: RB, stool frequency (SF), physician's global assessment (PGA), and ENDO subscores (Table S1). Recently, the US Food and Drug Administration (FDA) released new guidance suggesting that the more subjective PGA subscore be excluded from key efficacy measures. 9 The modified Mayo Score (mMS) consists of RB, SF, and ENDO. As such, derived efficacy end points, such as clinical response and remission, rely on RB, SF, and ENDO scores being available. ENDO is typically performed at the beginning and the end of each study phase, whereas RB and SF are more frequently available. An analysis method that leverages the frequently available information of RB and SF, while also utilizing the objective but less frequent ENDO, could be an effective approach. A longitudinal model based on individual scores has previously been proposed using a proportional odds model. 10 Here, we propose a more parsimonious approach based on a bounded integer (BI) model in combination with item response theory (IRT) to describe all mMS subscores simultaneously. The IRT models allow the shared information to be used for predictions of all subscores and derived end points like clinical response and remission at each observation time, 11 , 12 regardless of when observations are available. Therefore, IRT methodology was used in the development of a UC disease model to predict longitudinal mMS subscores and remission at the end of induction and maintenance phases in patients in order to improve model informed drug development for UC treatment.
METHODS
Analysis dataset
Data from four randomized, double‐blind, placebo‐controlled, multicenter phase III clinical trials for etrolizumab in patients with moderately‐to‐severely active UC were used in model development 13 , 14 , 15 : HIBISCUS I/HIBISCUS II (NCT02163759/NCT02171429), HICKORY (NCT02100696), and LAUREL (NCT02165215). Patients who switched treatment from active to placebo during the studies were excluded. Study designs of these clinical trials are presented in Figure S1. All clinical trial information can be accessed through https://clinicaltrials.gov/. All trials were approved by the institutional review board or independent ethics committee. All patients provided written informed consent.
External evaluation dataset
Available longitudinal, patient‐level placebo arm data from clinical trials for UC were extracted from the TransCelerate database. Data were pooled from five randomized, double‐blind, placebo‐controlled, multicenter phase II and III clinical trials for moderately to severely active UC 16 , 17 , 18 , 19 : NCT00385736, NCT00408629, NCT00410410, NCT00787202, and NCT00853099. All clinical trial information can be accessed through https://clinicaltrials.gov/.
IRT model structure
Each mMS subscore was described by a BI model. 20 The BI model for each subscore consisted of a typical value (MEAN) and a standard deviation (SD) parameter. A shared treatment response acting equally on all subscores was added, thereby implicitly linking the scores by a latent disease status, constituting the IRT model. Baseline population and individual SF, RB, and ENDO were independently described and estimated in the model. Parameters related to time‐dependent trajectories of response (placebo or active treatment) were shared across all subscores; interindividual variability (IIV) for SD was assumed the same across subscores for parsimony. Illustration of the IRT‐BI model is presented in Figure 1 and Figure S2. The latent disease model was described using Equation 1; the maximum response for placebo and for active treatment was estimated and shared across all three subscores. The model code is provided in Data S1.
(1) $$R(t) = R_{MAX} \cdot \left(1 - e^{-\frac{\ln 2}{T50} \cdot t}\right)$$
R MAX, maximal placebo response or maximal active treatment response; T50, time to achieve 50% of the maximal response (placebo or active treatment); t, time since first dose.
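As an illustration of how a mono‐exponential response with the R MAX and T50 parameters defined above behaves, the following Python sketch evaluates the latent response over time. The half‐life style parametrization (rate constant ln 2 / T50) and the function names are assumptions made for this sketch; the authors' actual implementation is the NONMEM code in Data S1. The active treatment term is added on top of the placebo term, as described for the model.

```python
import math


def mono_exponential(t_weeks, r_max, t50):
    """Response at time t that approaches r_max, reaching half of it at t50.

    Assumed form: r_max * (1 - exp(-ln(2) * t / t50)).
    """
    return r_max * (1.0 - math.exp(-math.log(2.0) * t_weeks / t50))


def latent_response(t_weeks, on_active, r_max_pl, t50_pl, r_max_act, t50_act):
    """Shared latent response: placebo term plus, for active arms, an active term."""
    placebo = mono_exponential(t_weeks, r_max_pl, t50_pl)
    active = mono_exponential(t_weeks, r_max_act, t50_act) if on_active else 0.0
    return placebo + active


if __name__ == "__main__":
    # Illustrative inputs only (quantile-scale values of the kind reported in Table 2).
    for week in (0, 2, 4, 10, 14):
        pl = latent_response(week, False, -0.630, 2.67, -0.125, 4.57)
        act = latent_response(week, True, -0.630, 2.67, -0.125, 4.57)
        print(f"week {week:>2}: placebo {pl:.3f}, active {act:.3f}")
```

By construction, the response is zero at baseline and equals half of R MAX at t = T50.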
FIGURE 1.
IRT structural model illustration. The IRT model is subdivided in item‐specific parameters and a latent disease model. BASE, baseline subscore; ENDO, endoscopy; IRT, item response theory; RB, rectal bleeding; SD, standard deviation; SF, stool frequency.
Assumptions related to the development of the IRT model were: (a) unidimensionality, that is, one latent trait is measured by the items in a scale; and (b) local (conditional) independence, that is, aside from their relationship with a common latent variable, there should be no association among the item responses. 21
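To make the per‐subscore BI model more concrete, the sketch below computes category probabilities for a 0 to 3 score from a MEAN and an SD. It assumes the commonly used bounded integer formulation in which category boundaries are fixed at equally spaced quantiles of the standard normal distribution; the exact parametrization used in Data S1 may differ, and the function name is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def bi_probabilities(mean, sd, n_categories=4):
    """Probabilities of scores 0..n_categories-1 under a bounded integer model.

    Cut points are the probit quantiles at 1/n, 2/n, ..., (n-1)/n (assumed form).
    """
    cuts = [_STD_NORMAL.inv_cdf(k / n_categories) for k in range(1, n_categories)]
    cumulative = [_STD_NORMAL.cdf((c - mean) / sd) for c in cuts] + [1.0]
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


if __name__ == "__main__":
    # Illustrative values; a lower latent mean shifts probability toward score 0.
    print([round(p, 3) for p in bi_probabilities(0.13, 0.31)])
    print([round(p, 3) for p in bi_probabilities(-0.50, 0.31)])
```

With MEAN = 0 and SD = 1, this formulation gives equal probability to every category, which is why two parameters per item suffice regardless of the number of score levels.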
PGA model structure
In the etrolizumab phase III clinical trials design, assessment of clinical response was based on total MCS subscores, including PGA score, and only patients who achieved clinical response (MCS with ≥3‐point decrease and ≥30% reduction from baseline as well as ≥1‐point decrease in RB subscore or an absolute RB of 0 or 1) at the end of induction were eligible for randomization in the maintenance phase. Therefore, a similarly structured BI model was also estimated for the PGA subscore to allow for evaluation of clinical responders at the end of induction in simulation‐based diagnostics. All parameters from the PGA subscore model (i.e., MEAN, SD, R max, and T50) were separated from the IRT model to prevent any influence/bias of PGA subscores on the IRT model estimates.
Model development
Placebo data were used in the development of a base model and active treatment data were added later to: (a) increase the overall number of observations and (b) increase the number of observed remissions, for model evaluation purposes. Upon addition of active treatment data, an active treatment response was estimated on top of the placebo response. Development of a more mechanistic treatment effect for etrolizumab was considered beyond the scope of this paper.
A stepwise covariate model (SCM) building procedure based on placebo data was performed after the establishment of the base model and was later refined with addition of active treatment data. The covariates were prespecified based on etrolizumab population pharmacokinetic and exposure‐response analysis. 22 , 23
Parameters estimated in the final UC disease model were transformed from the quantile scale to an approximate mean mMS scale. The transformation was done in R by applying Equation 2 to the model parameter estimates, where the pnorm function gives cumulative probabilities from a normal distribution with a mean of zero and a standard deviation of one.
(2) |
pnorm, cumulative distribution function of the standard normal distribution.
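As a rough numerical check of the quantile‐to‐score transformation, the Python sketch below applies the standard normal cumulative distribution function (the equivalent of R's pnorm) to the baseline estimates reported in Table 2 and scales the result by the maximum subscore of 3. This simple mapping is an assumption that happens to reproduce the reported score‐scale baselines to within rounding; it is not claimed to be Equation 2 itself, nor to apply to the response or SD parameters.

```python
from statistics import NormalDist


def approx_score_scale(theta, max_score=3):
    """Map a quantile-scale baseline to an approximate 0..max_score scale.

    Assumed mapping: max_score * Phi(theta), with Phi the standard normal CDF.
    """
    return max_score * NormalDist().cdf(theta)


# (quantile-scale value, reported score-scale value) taken from Table 2
REPORTED_BASELINES = {
    "BASERB": (0.130, 1.65),
    "BASESF": (0.557, 2.13),
    "BASEPGA": (0.598, 2.18),
    "BASEENDO": (0.798, 2.36),
}

if __name__ == "__main__":
    for name, (theta, reported) in REPORTED_BASELINES.items():
        print(f"{name}: computed {approx_score_scale(theta):.3f}, reported {reported}")
```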
Model evaluation
RB, SF, ENDO, and PGA subscore predictions using the final UC disease model were evaluated for three properties: (1) individual predictive performance, (2) item (or subscore) characteristics predictability, and (3) simulation properties. Predictive performance for longitudinal mMS subscores and for remission were of interest for assessment. Remission was defined as mMS less than 3, with individual subscores of less than 2 and a RB subscore of 0.
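The remission rule stated above can be written as a small predicate. This is a minimal sketch of the definition as worded in the text (mMS below 3, every subscore below 2, and RB equal to 0); the function name is hypothetical.

```python
def is_remission(rb, sf, endo):
    """Return True if the three mMS subscores satisfy the remission definition."""
    subscores = (rb, sf, endo)
    mms = sum(subscores)
    return mms < 3 and all(score < 2 for score in subscores) and rb == 0


if __name__ == "__main__":
    print(is_remission(rb=0, sf=1, endo=1))  # all three rules satisfied
    print(is_remission(rb=1, sf=0, endo=0))  # fails the RB = 0 rule
    print(is_remission(rb=0, sf=2, endo=0))  # fails the subscore < 2 rule
```

Evaluating the three rules separately, as the remission figures do, helps show which rule drives any mismatch between simulated and observed remission.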
Plots of individual observed versus predicted subscores were created by extrapolating the available subscores (categorical) to a theoretical mMS at each timepoint by summing up the scores and adjusting it to the number of associated observations. In addition, the observed versus predicted remission at the end of induction was evaluated. The number of placebo observations at the end of maintenance was considered too limited for a reasonable separate evaluation.
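The adjustment of a partially observed set of subscores to a theoretical mMS can be sketched as follows, using the rule given in the Figure 2 footnote (sum of available scores multiplied by three and divided by the number of observations). The function name and the use of None for a missing subscore are assumptions of this sketch.

```python
def adjusted_sum_of_scores(rb=None, sf=None, endo=None):
    """Theoretical mMS from whichever subscores are available at a time point.

    Missing subscores are passed as None; returns None if nothing was observed.
    """
    available = [score for score in (rb, sf, endo) if score is not None]
    if not available:
        return None
    return sum(available) * 3 / len(available)


if __name__ == "__main__":
    print(adjusted_sum_of_scores(rb=1, sf=2))          # ENDO missing
    print(adjusted_sum_of_scores(rb=2, sf=2, endo=3))  # all observed
```

When all three subscores are present, the adjusted value equals the plain mMS sum.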
The item characteristic curves (ICCs) were used to illustrate and assess the probability of response based on the relationship between patients' latent variable and each item's difficulty (to improve). The ICCs were created in a similar fashion to that described by Lyauk and colleagues. 24 Uncertainty in the empirical Bayes estimates was taken into account for the predicted item characteristics that were compared to the observed. The observed ICC was estimated by a single generalized additive model (GAM) smooth of the attainment of subscore results versus the latent disease status. These were compared, per subscore and result, to the 90% confidence interval of 200 GAM results predicted from parameter vectors sampled from a normal distribution with mean equal to the individual empirical Bayes estimates and variance equal to the individual η variance assessed from the Fisher information.
The simulation properties of the models were evaluated using visual predictive checks (VPCs). The VPCs were performed with 200 simulations for both the analysis dataset and the external evaluation dataset. In the external evaluation, the final model with covariate was applied to the external data. Patients who achieved clinical response (MCS with ≥3‐point decrease and ≥30% reduction from baseline as well as ≥1‐point decrease in RB subscore or an absolute RB of 0 or 1) at the end of induction were randomized to the maintenance phase. In VPCs, this was taken into account by a dropout model, where the simulated scores at the end of induction versus the baseline simulated score guided the inclusion of the simulated subjects in the maintenance phase.
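The clinical response rule that gates entry into the maintenance phase can be expressed as a simple check on baseline and end‐of‐induction scores. The sketch below follows the definition in the text; the function and argument names are hypothetical, and integer arithmetic is used for the 30% criterion to avoid floating‐point edge cases.

```python
def is_clinical_response(mcs_base, mcs_eoi, rb_base, rb_eoi):
    """Clinical response at end of induction (eoi) relative to baseline.

    Requires an MCS decrease of at least 3 points and at least 30% from baseline,
    plus either a >=1-point decrease in RB or an absolute RB of 0 or 1.
    """
    decrease = mcs_base - mcs_eoi
    enough_decrease = decrease >= 3 and decrease * 10 >= 3 * mcs_base
    rb_criterion = (rb_base - rb_eoi) >= 1 or rb_eoi in (0, 1)
    return enough_decrease and rb_criterion


if __name__ == "__main__":
    print(is_clinical_response(mcs_base=9, mcs_eoi=5, rb_base=2, rb_eoi=1))
    print(is_clinical_response(mcs_base=12, mcs_eoi=9, rb_base=3, rb_eoi=2))
```

In a simulation setting, applying such a check to simulated baseline and end‐of‐induction scores would determine which simulated subjects continue into maintenance.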
Software
The modeling analyses were performed using NONMEM version 7.4, 25 facilitated by Perl‐speaks‐NONMEM (PsN), version 4.9.0. 26 Data management and further processing of NONMEM output were performed using R version 3.5.3. 27
RESULTS
Data
The final UC disease model analysis dataset included a total of 31,090 RB, SF, and ENDO observations from 1390 patients, and 7893 PGA observations from 1382 patients with moderately to severely active UC. Summary of studies and subscore observations included in the analysis dataset is presented in Table 1 and Table S3. Summary of observed baseline MCS characteristics and the mean (95% confidence interval) observed longitudinal profiles of MCS subscore are presented in Table S2 and Figure S3, respectively. Patient demographics for HIBISCUS I/II, HICKORY, and LAUREL studies have been previously published. 13 , 14 , 15 , 22 , 23
TABLE 1.
Studies included in model development and external evaluation.
Clinical trial [Model Development] | HIBISCUS I | HIBISCUS II | HICKORY | LAUREL | Total |
---|---|---|---|---|---|
Induction/maintenance (weeks) | 10/– | 10/– | 14/52 | 10/52 | – |
Population | TNF Naïve | TNF Naïve | TNF IR | TNF Naïve | – |
Placebo N (induction/maintenance) | 72/– | 72/– | 94/27 | –/– | 238/27 |
Etrolizumab N (induction/maintenance) | 144/– | 143/– | 510/139 | 355/122 | 1152/261 |
Clinical trial [external evaluation] | AbbVie (NCT00385736) | AbbVie (NCT00408629) | BMS (NCT00410410) | Pfizer (NCT00787202) | AbbVie (NCT00853099) |
---|---|---|---|---|---|
Induction/maintenance (weeks) | 8/– | 8/44 | 12/40 | 8/– | 8/44 |
Population | TNF naïve | TNF naïve/IR | TNF naïve/IR | TNF naïve/IR | TNF naïve |
Placebo N (induction/maintenance) | 222/– | 256/143 | 135/20 | 46/– | 96/57 |
Abbreviations: IR, patients with inadequate response or intolerance to prior anti‐TNF treatment; N, number of subjects; TNF, tumor necrosis factor.
For the external evaluation, individual‐level longitudinal data from 755 adult placebo arm patients were pooled from five phase II and III clinical trials (data presented in Table 1 and Table S3). 16 , 17 , 18 , 19 MCS, subscores, and baseline characteristics are summarized in Table S2C. The mean (95% confidence interval) observed longitudinal profiles of MCS subscores of the analysis (placebo only) and external evaluation data are presented in Figure S4.
Model development
The IRT‐BI model was developed to describe the relationship between the patients' observed mMS subscores and a hidden (latent) disease status. In the final model, all item‐specific parameters for RB, SF, and ENDO subscores were successfully estimated using a single latent variable described by the sum of two mono‐exponential functions: one for placebo response and one for active treatment response (which assumed zero for placebo).
Population typical values for the key IRT components at the approximate score scale were estimated as 1.65 for BASERB, 2.13 for BASESF, and 2.36 for BASEENDO, −0.731 for maximum placebo response (R MaxPL) and −0.137 for maximum etrolizumab response (R MaxACT). Because the maximum etrolizumab response was estimated on top of the maximum placebo response, the total maximum response for subjects receiving etrolizumab was −0.868. Based on the final model, time to half of the maximal response was estimated as 2.67 weeks for placebo response (T50PL) and 4.57 weeks for etrolizumab (T50ACT). Final model parameter estimates, including those for PGA, are presented in Table 2. The SCM analysis identified that higher baseline albumin levels were associated with lower baseline SF and ENDO subscores (BASESF and BASEENDO).
TABLE 2.
Final ulcerative colitis disease model parameter estimates.
Parameters | Value | RSE (%) | Score scale |
---|---|---|---|
BASERB | 0.130 | 9.19 | 1.65 |
BASESF | 0.557 | 2.96 | 2.13 |
BASEPGA | 0.598 | 2.01 | 2.18 |
BASEENDO | 0.798 | 1.90 | 2.36 |
SDRB | 0.308 | 1.49 | 0.360 |
SDSF | 0.315 | 1.52 | 0.319 |
SDPGA | 0.327 | 1.35 | 0.324 |
SDENDO | 0.498 | 2.73 | 0.427 |
R MaxPL,IRT | −0.630 | 4.39 | −0.731 |
R MaxPL,PGA | −0.592 | 5.88 | −0.669 |
R MaxACT,IRT | −0.125 | 26.4 | −0.137 |
R MaxACT,PGA | −0.157 | 28.7 | −0.164 |
T50PL,IRT | 2.67 | 3.21 | |
T50PL,PGA | 1.85 | 7.87 | |
T50ACT,IRT | 4.57 | 3.77 | |
T50ACT,PGA | 4.88 | 5.30 | |
COVBASE SF:Albumin | −0.038 | 16.3 | |
COVBASE ENDO:Albumin | −0.047 | 10.3 | |
IIVSD,IRT | 0.376 | 3.85 | |
IIVBASE,RB | 0.376 | 2.57 | |
IIVBASE,SF | 0.551 | 2.32 | |
IIVBASE,PGA | 0.191 | 5.45 | |
IIVBASE,ENDO | 0.287 | 5.45 | |
IIV R MaxPL,IRT | 0.539 | 6.13 | |
IIV R MaxPL,PGA | 0.397 | 7.81 | |
IIV R MaxACT,IRT | 0.663 | 2.71 | |
IIV R MaxACT,PGA | 0.617 | 4.01 | |
Note: Value: model estimated parameter (Mayo clinical subscores estimated in bounded integer scale).
Abbreviations: BASE, baseline subscore; COV, covariate; ENDO, endoscopy; IIV, interindividual variability; IRT, item response theory model estimated parameters based on modified Mayo score of SF, RB, and ENDO; PGA, physician's global assessment; RB, rectal bleeding; R MaxACT, maximum active response; R MaxPL, maximum placebo response; RSE, relative standard error; Score Scale, Mayo clinical subscore scale; SD, standard deviation; SF, stool frequency; T50ACT, time to achieve 50% of the maximal active response; T50PL, time to achieve 50% of the maximal placebo response.
Model evaluation
The individual predictive performance of the predicted mMS sum of scores was adequate with no visible bias (Figure 2a). Remission at the end of the induction phase was correctly predicted 249 out of 311 times (80%) and non‐remission was correctly predicted 891 out of 1002 times (89%; Figure 2b). Remission for subjects receiving etrolizumab was correctly predicted more often compared to placebo, but the number of observations was also larger for the etrolizumab arms.
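The reported counts can be turned into per‐class and overall agreement rates with a few lines of arithmetic. In the sketch below, the labels "sensitivity" and "specificity" are our shorthand for the fractions of correctly predicted remissions and non‐remissions, respectively; the overall accuracy is derived from the same counts and is not a figure stated in the text.

```python
def remission_prediction_rates(correct_rem, total_rem, correct_nonrem, total_nonrem):
    """Fractions of correctly predicted remissions, non-remissions, and overall."""
    return {
        "sensitivity": correct_rem / total_rem,
        "specificity": correct_nonrem / total_nonrem,
        "accuracy": (correct_rem + correct_nonrem) / (total_rem + total_nonrem),
    }


if __name__ == "__main__":
    rates = remission_prediction_rates(249, 311, 891, 1002)
    for name, value in rates.items():
        print(f"{name}: {value:.1%}")
```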
FIGURE 2.
Individual predictive performance of the final model, stratified by treatment. (a) Observed versus predicted adjusted sum of scores* per measurement superimposed with a smooth and the two‐dimensional kernel density; the black line depicts the line of identity. (b) Correctness of individual predictions of remission (upper panel) or non‐remission (lower panel) and the associated rules. *Sum of all scores multiplied by three and divided by the number of observations: theoretical mMS assuming all scores are available and 100% correlation between scores. mMS, modified Mayo Score.
The ICCs of the final model are presented in Figure 3. The probability of observing lower scores increases with the latent variable decreasing (because lower latent variable means higher treatment response). Overall, the subscores have similar discriminative ability (i.e., similar slopes). RB has distinctively lower difficulty, whereas ENDO has a higher difficulty (i.e., for 50% probability of an ENDO of 0, the overall mMS has to be decreased by ~70%, whereas for 50% probability of an RB of 0, ~30% decrease in mMS is needed). The item characteristics are overall adequately captured by the model.
FIGURE 3.
Item characteristic curves including uncertainty in empirical Bayes estimates. The figure shows the cumulative probabilities (red lines) along with the 90% confidence interval from 200 generalized additive model results predicted from parameter vectors sampled from a normal distribution with mean equal to the individual empirical Bayes estimates and variance equal to the individual η variance assessed from the Fisher information. The 90% confidence interval was derived for each 0.5% quantile of the latent variable as the 5th and 95th percentile of 200 mean generalized additive model estimates.
The VPC for the analysis data, stratified by treatment (placebo vs. etrolizumab), is presented in Figure 4a. Remission at the end of induction and maintenance is presented in Figure 4b,c, respectively. The final model reliably predicted the observed mean subscores. Remission was overpredicted by 5% at the end of induction, mainly due to the rule of all subscores less than 2. On the other hand, remission was underpredicted by 3% at the end of maintenance, mostly due to the rule of RB = 0.
FIGURE 4.
Visual predictive checks for the final model and the analysis dataset, stratified by treatment. (a) The mMS subscores versus time since first dose, where the mean of observations is superimposed with the 90% confidence interval of 200 simulated means. (b, c) Simulated versus observed remission and the associated rules at end of induction and maintenance, where the median percentage of deviation is shown in the blue box, the absolute observed is shown with the black text, and the blue error bar depicts the 90% confidence interval of 200 simulated remission rates versus the observed. mMS, modified Mayo Score.
Finally, the VPC for the external evaluation data, stratified by study, is shown in Figure 5. RB, SF, and ENDO are well‐predicted by the model. Remission at the end of induction was well‐described with only 3% underprediction. Remission at the end of maintenance was also adequately predicted with only 9% overprediction.
FIGURE 5.
Visual predictive checks for the final model and the external evaluation dataset, stratified by study. (a) The mMS subscores versus time since first dose, where the mean of observations is superimposed with the 90% confidence interval of 200 simulated means. (b, c) Simulated versus observed remission and the associated rules at end of induction and end of maintenance, where the median percentage of deviation is shown in the blue box, the absolute observed is shown with the black text, and the blue error bar depicts the 90% confidence interval of 200 simulated remission rates versus the observed. mMS, modified Mayo Score.
DISCUSSION
This paper describes the application of the IRT methodology to develop a disease model for the clinical status of patients with moderately to severely active UC, and quantification of the placebo effect. The clinical status of patients is related to the longitudinal changes in the mMS subscores. A BI model was developed for each subscore linked by a latent disease model. Baseline SF, RB, and ENDO were independently described, and the time‐dependent trajectory (i.e., treatment response) was shared across subscores, thereby indirectly linking all subscore models.
To our knowledge, this is the first UC disease model developed that uses IRT‐BI methodology to describe longitudinal mMS and assess clinical status, including remission for patients receiving placebo or active treatment. Currently, decisions to continue or discontinue UC clinical programs during drug development are mainly based on remission at the end of treatment (remitter or non‐remitter). As such, partially complete data (e.g., interim analysis) or missing subscores are not informative and longitudinal trends of treatment benefit that have not yet reached the full potential cannot be extrapolated. On the other hand, a longitudinal approach of directly analyzing distinct composite mMS can be potentially limiting or misleading due to implicit assumptions such as (a) the importance of the subscore relative to the total score disregarded, (b) the difficulty and discrimination between the subscores disregarded, 28 and (c) reliance on all scores being available and disregarding the responses with missing subscores. IRT offers an alternative approach by considering all the individual subscores and relating them to an underlying hidden latent variable, 11 defined here as UC disease status. What is unique about an IRT model is the decomposition of the data into assessment‐specific (RB, SF, and ENDO) and subject‐specific features that enable unique insights. The developed disease model can be used to more accurately describe and predict the effect of both placebo and active treatment during drug development programs for UC. The use of this model would leverage all key efficacy data. Each subject has multiple observations of different subscores (RB, SF, and ENDO) and the model can be used to predict the most likely (individual) outcome based on partial data. Therefore, longitudinal trends can be extrapolated to simulate the theoretical full potential of the candidate drug, including the onset and duration of the treatment effect.
In the current analysis, numerical and graphical model‐diagnostics were used to evaluate different components of the UC disease model for the analysis data and for the external evaluation data, which verified adequacy of the underlying structural model both at the individual level and at the population level. In addition, predictive performance was evaluated for individual subscores and for the key derived efficacy parameter “remission.” The predictive performance of the IRT model was reliable; RB, SF, and ENDO were adequately predicted for both the analysis data and the external data. Remission status was adequately described for both the analysis and external dataset. The identified covariate relationship was in general anticipated: lower baseline SF and ENDO with higher baseline albumin levels.
The item characteristics, that is, the relationship between subscores and the latent disease, were sufficiently characterized by the IRT model, as illustrated in Figure 3. The increase in response related to the probability of having lower scores for all mMS subscores is quantified in the model by the maximal placebo response and maximal active treatment response parameters. The ICCs also indicated that ENDO is the most difficult item (i.e., lower improvement probability): with −0.5 response (latent variable, ~30% decrease in mMS), the percentage of subjects with a score of 3 (more severe disease) is predicted to be 0% for RB versus ~25% for ENDO. Furthermore, based on the steepness of the ICC slopes, RB is the most discriminative item, which is quantified by the lower SD parameter in the model. These results are in agreement with a previous publication that analyzed the prevalence of endoscopic improvement and endoscopic remission according to patient‐reported outcomes, using pooled data from the active intervention and placebo arms of infliximab, golimumab, vedolizumab, and tofacitinib clinical trials in UC. 29 The study indicated improvement in endoscopic disease activity in eight out of 10 patients with normalized SF and RB, but endoscopic remission was only observed in 50% of these patients. 29
A longitudinal mixed‐effects IRT model with Markov elements has been developed previously, with the advantage of accounting for the dependency between subsequent observations. 30 In this analysis, observation records were not far apart, and Markov elements were not needed. In contrast to classic IRT models, a BI model was used here to describe the items instead of a proportional odds model. Because the BI model needs two parameters, regardless of the number of score results, 21 all parameters were simultaneously estimated. In contrast, during classic IRT model development, the item characteristics are typically estimated on drug‐free data and henceforth fixed, prior to the estimation of the latent disease parameters using all data. To justify this sequential approach, adequate baseline data are needed that also reflect the range of post‐baseline latent disease status. The current UC disease model was suitable for the current data size and can be modified if needed (e.g., by reducing the number of simultaneously estimated parameters) to apply to different phases of drug development (i.e., clinical trial phases I to III). This model can be used for interim analysis of phase I or II with limited data to evaluate clinical status of patients and could facilitate early evaluation of active treatment efficacy more accurately by quantifying the placebo effect, and ultimately predict long‐term treatment outcomes. Although PGA was disregarded for the IRT model, the separate PGA model is available to support ongoing UC studies that still include this subscore in their primary efficacy assessments.
The external evaluation based on data from five clinical trials provided further reassurance, as the final UC disease model described the external data reasonably well, supporting the usability of the model to predict clinical status and remission of patients for early informed decision making in UC clinical development. Finally, this modeling concept could theoretically be applied to any disease that is described by multiple integer scores.
There are some limitations associated with disease models developed based on placebo data. In particular, this study does not include data from subjects who received neither active drug nor placebo, which prevents the estimation of natural disease progression separate from the placebo effect. In addition, disease progression can vary widely among patients, and placebo groups may not adequately represent this heterogeneity, leading to models that may not generalize well to diverse patient populations. Nevertheless, placebo data from three multicenter phase III clinical trials were used to develop the base model, which should give an accurate reflection of the heterogeneity in response from natural disease progression in the presence of placebo treatment. Furthermore, a limitation of the current model is that we tested for key influential covariates, but we did not account for all possible influential covariates, such as comedication. Standard covariate modeling techniques may not be suitable to quantify such heterogeneous exposures, but the model may be expanded to account for them using machine learning methods.
In summary, the main objective of this analysis was to quantitatively assess clinical status of patients and quantify placebo response in order to better inform future clinical trial design and interpretation. A population disease model was built to describe the mMS over time in placebo and etrolizumab treated patients with UC. The IRT‐BI model adequately described the subscores of mMS and reliably predicted the remission status for both the analysis data and external placebo data. These results suggest that this disease model structure is a valid platform for longitudinally predicting the clinical disease status of patients with UC. Other drug effects can be added as deemed appropriate, allowing the approach to propel model‐informed drug development across a range of UC programs.
AUTHOR CONTRIBUTIONS
A.M., J.L., E.L.P., J.Y.J., M.K., and N.K. wrote the manuscript. A.M., J.L., E.L.P., J.Y.J., M.K., and N.K. designed the research. A.M. and J.L. performed the research. J.L. and A.M. analyzed the data.
FUNDING INFORMATION
This work was supported by Genentech Inc., a member of the Roche group.
CONFLICT OF INTEREST STATEMENT
A.M., J.Y.J., M.K., and N.K. are employees of Genentech and own Roche stock. J.L. and E.L.P. are employees of Pharmetheus AB and are paid consultants for Genentech, Inc. E.L.P. owns stock in Pharmetheus AB.
Supporting information
FigureS1
FigureS2
FigureS3
FigureS4
TableS1
TableS2
TableS3
DataS1
ACKNOWLEDGMENTS
The authors thank the patients and their families who participated in these studies, and physicians and staff who conducted and/or managed these studies.
Moein A, Langenhorst J, Plan EL, Jin JY, Kågedal M, Kassir N. A disease model predicting placebo response and remission status of patients with ulcerative colitis using modified Mayo score. Clin Transl Sci. 2023;16:2310‐2322. doi: 10.1111/cts.13632
Anita Moein and Jurgen Langenhorst contributed equally to this article.
REFERENCES
- 1. Brown C, Gibson PR, Hart A, et al. Long‐term outcomes of colectomy surgery among patients with ulcerative colitis. Springerplus. 2015;4:573.
- 2. GBD 2017 Inflammatory Bowel Disease Collaborators. The global, regional, and national burden of inflammatory bowel disease in 195 countries and territories, 1990–2017: a systematic analysis for the global burden of disease study 2017. Lancet Gastroenterol Hepatol. 2020;5:17‐30.
- 3. Danese S, Allez M, van Bodegraven AA, et al. Unmet medical needs in ulcerative colitis: an expert group consensus. Dig Dis. 2019;37:266‐283.
- 4. Muller KR, Prosser R, Bampton P, Mountifield R, Andrews JM. Female gender and surgery impair relationships, body image, and sexuality in inflammatory bowel disease: patient perceptions. Inflamm Bowel Dis. 2010;16:657‐663.
- 5. Ungaro R, Mehandru S, Allen PB, Peyrin‐Biroulet L, Colombel JF. Ulcerative colitis. Lancet. 2017;389:1756‐1770.
- 6. Hindryckx P, Baert F, Hart A, et al. Clinical trials in ulcerative colitis: a historical perspective. J Crohns Colitis. 2015;9:580‐588.
- 7. Jairath V, Zou G, Parker CE, et al. Systematic review and meta‐analysis: placebo rates in induction and maintenance trials of ulcerative colitis. J Crohns Colitis. 2016;10:607‐618.
- 8. Su C, Lewis JD, Goldberg B, Brensinger C, Lichtenstein GR. A meta‐analysis of the placebo rates of remission and response in clinical trials of active ulcerative colitis. Gastroenterology. 2007;132:516‐526.
- 9. US Food and Drug Administration. Ulcerative colitis: developing drugs for treatment. Guidance Document. 2022. https://www.fda.gov/regulatory‐information/search‐fda‐guidance‐documents/ulcerative‐colitis‐developing‐drugs‐treatment
- 10. Kawakatsu S, Zhu R, Zhang W, et al. A longitudinal model for the Mayo clinical score and its sub‐components in patients with ulcerative colitis. J Pharmacokinet Pharmacodyn. 2022;49:179‐190.
- 11. Ueckert S. Modeling composite assessment data using item response theory. CPT Pharmacometrics Syst Pharmacol. 2018;7:205‐218.
- 12. Ueckert S, et al. Improved utilization of ADAS‐cog assessment data through item response theory based pharmacometric modeling. Pharm Res. 2014;31:2152‐2165.
- 13. Peyrin‐Biroulet L, Hart A, Bossuyt P, et al. Etrolizumab as induction and maintenance therapy for ulcerative colitis in patients previously treated with tumour necrosis factor inhibitors (HICKORY): a phase 3, randomised, controlled trial. Lancet Gastroenterol Hepatol. 2022;7:128‐140.
- 14. Rubin DT, Dotan I, DuVall A, et al. Etrolizumab versus adalimumab or placebo as induction therapy for moderately to severely active ulcerative colitis (HIBISCUS): two phase 3 randomised, controlled trials. Lancet Gastroenterol Hepatol. 2022;7:17‐27.
- 15. Vermeire S, Lakatos PL, Ritter T, et al. Etrolizumab for maintenance therapy in patients with moderately to severely active ulcerative colitis (LAUREL): a randomised, placebo‐controlled, double‐blind, phase 3 study. Lancet Gastroenterol Hepatol. 2022;7:28‐37.
- 16. Reinisch W, Sandborn WJ, Hommes DW, et al. Adalimumab for induction of clinical remission in moderately to severely active ulcerative colitis: results of a randomised controlled trial. Gut. 2011;60:780‐787.
- 17. Sandborn WJ, Colombel JF, Sands BE, et al. Abatacept for Crohn's disease and ulcerative colitis. Gastroenterology. 2012;143:62‐69.e64.
- 18. Sandborn WJ, Ghosh S, Panes J, et al. Tofacitinib, an oral Janus kinase inhibitor, in active ulcerative colitis. N Engl J Med. 2012;367:616‐624.
- 19. Sandborn WJ, van Assche G, Reinisch W, et al. Adalimumab induces and maintains clinical remission in patients with moderate‐to‐severe ulcerative colitis. Gastroenterology. 2012;142:257‐265.e1‐3.
- 20. Wellhagen GJ, Kjellsson MC, Karlsson MO. A bounded integer model for rating and composite scale data. AAPS J. 2019;21:74.
- 21. Schindler E, Friberg LE, Lum BL, et al. A pharmacometric analysis of patient‐reported outcomes in breast cancer patients through item response theory. Pharm Res. 2018;35:122.
- 22. Kassir N, Zhu R, Moein A, et al. Exposure‐response relationships of etrolizumab in patients with moderately‐to‐severely active ulcerative colitis. CPT Pharmacometrics Syst Pharmacol. 2022;11:1234‐1243.
- 23. Moein A, Lu T, Jönsson S, et al. Population pharmacokinetic analysis of etrolizumab in patients with moderately‐to‐severely active ulcerative colitis. CPT Pharmacometrics Syst Pharmacol. 2022;11:1244‐1255.
- 24. Lyauk YK, Jonker DM, Lund TM, Hooker AC, Karlsson MO. Item response theory modeling of the international prostate symptom score in patients with lower urinary tract symptoms associated with benign prostatic hyperplasia. AAPS J. 2020;22:115.
- 25. Beal SL, Sheiner LB, Boeckmann AJ, Bauer RJ. NONMEM 7.4 Users Guides. ICON plc; 1989‐2019.
- 26. Lindbom L, Pihlgren P, Jonsson EN. PsN‐Toolkit—a collection of computer intensive statistical methods for non‐linear mixed effect modeling using NONMEM. Comput Methods Programs Biomed. 2005;79:241‐257.
- 27. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2019.
- 28. Gottipati G, Karlsson MO, Plan EL. Modeling a composite score in Parkinson's disease using item response theory. AAPS J. 2017;19:837‐845.
- 29. Dulai PS, Singh S, Jairath V, et al. Prevalence of endoscopic improvement and remission according to patient‐reported outcomes in ulcerative colitis. Aliment Pharmacol Ther. 2020;51:435‐445.
- 30. Germovsek E, Ambery C, Yang S, Beerahee M, Karlsson MO, Plan EL. A novel method for analysing frequent observations from questionnaires in order to model patient‐reported outcomes: application to EXACT(R) daily diary data from COPD patients. AAPS J. 2019;21:60.