ABSTRACT
Model‐based meta‐analysis allows integration of aggregated‐level data (AD) from different clinical trials in one model to assess population efficacy/safety. However, AD is limited in individual‐level information, while individual‐patient‐level data (IPD) are hard to obtain. Combined modeling may take advantage of both sources. Chronic obstructive pulmonary disease (COPD) is a leading cause of poor health and death. This study established a combined ADIPD model of COPD clinical trials with forced expiratory volume in 1 s (FEV1) as an endpoint and explored methods for estimating interstudy variability (ISV), interindividual variability (IIV), and aggregation bias. Stochastic simulation and estimation (SSE) showed the best method in NONMEM to estimate ISV/IIV: using $LEVEL with equal weight of studies; for the AD part, ISVs from the AD model were fixed, estimating IIV with separate ETAs for each arm; the IPD part shared the fixed ISV and estimated IIV. An approximated normal distribution was derived for lognormal IIV to avoid aggregation bias. Covariate correlations were different at aggregated and individual levels, but did not introduce aggregation bias according to SSE. A separate AD model (published) and IPD model were built, then combined to form the ADIPD model. The ADIPD model included FEV1 baseline, disease progression, placebo effect, and Emax/constant dose‐responses for 23 compounds. The identified covariate relationships were that higher age, female sex, higher disease severity, and non‐current smoking were related to a lower baseline, and that a higher baseline was related to faster disease progression and higher drug effects. Covariate coefficients were estimated more precisely in the ADIPD model than the AD model. ADIPD modeling allows more informed clinical trial simulations for study design.
Trial Registration: ClinicalTrials.gov identifier: NCT01053988 and NCT01054885
Keywords: aggregated data, aggregation bias, combined modeling, COPD, individual‐patient data
Summary
- What is the current knowledge on the topic?
○ Model‐based meta‐analyses of aggregated data (AD) of COPD clinical trials have been published. However, AD lacked information and model inference on the individual level. Combined modeling of AD and individual‐patient data (IPD) is not widely applied.
- What question did this study address?
○ This study aims to establish a combined ADIPD model for COPD, including finding methods for estimating interstudy and interindividual variability (ISV and IIV) and solving aggregation bias.
- What does this study add to our knowledge?
○ A method utilizing $LEVEL and an interoccasion (IOV)‐like approach was found suitable to estimate ISV/IIV in NONMEM. An approximate normal distribution was derived for lognormal IIV for AD to avoid aggregation bias. Different covariate correlations in AD and IPD did not introduce aggregation bias. The combined ADIPD model of COPD was built and estimated both IIV and ISV and covariate coefficients.
- How might this change drug discovery, development, and/or therapeutics?
○ This study proposed relevant methods and building strategies for the combined ADIPD model. The combined model took advantage of both AD and IPD and could allow more informed clinical trial simulations for designing studies.
1. Introduction
Model‐based meta‐analysis (MBMA) allows the integration of aggregated‐level data (AD) from different clinical trials into one pharmacometric model to assess and compare the efficacy and safety of different treatments and estimate interstudy variability (ISV) with covariate relations. However, AD are limited in information about interindividual variability (IIV) and the covariate relations that can explain such variability, which may lead to ecological fallacies [1]. Meanwhile, individual patient‐level data (IPD) are typically expensive to acquire, not always available, and more computationally demanding. Combined modeling with both AD and IPD may take advantage of the information in both data types and allow more informed clinical trial simulations for designing studies. Compared to using AD alone, the inclusion of IPD has been reported to improve ecological inference [2] and parameter estimation [3, 4, 5].
Various methods have been developed to combine aggregate data (AD) and individual patient data (IPD), each with distinct characteristics and approaches. These methods include: (1) the two‐stage approach, (2) IPD reconstruction, (3) multilevel modeling, and (4) Bayesian hierarchical regression [6]. The two‐stage method reduces IPD to AD to allow combining with additional AD, ignoring IPD information. Reconstruction of IPD can be done for binary or ordinal AD by sampling binary/ordinal distributions; then the reconstructed data are combined with IPD. The multilevel modeling allows different levels of data to contribute towards the estimation of different levels. The Bayesian hierarchical regression applies Markov Chain Monte Carlo methods, allowing AD and IPD to share information on common parameters. Among these methods, the multilevel modeling method is the most widely applied technique in pharmacometrics with tools such as NONMEM and MONOLIX [7]. Despite its compatibility with combined AD‐IPD frameworks, practical applications of multilevel modeling in this domain are uncommon. In this study, we showcase the building of a combined ADIPD model by the multilevel modeling method. A major issue in ADIPD modeling is aggregation bias [8], which refers to situations where parameter estimates based on aggregate‐level data are systematically different from those based on individual‐level data.
Chronic obstructive pulmonary disease (COPD) is the seventh leading cause of poor health and the third leading cause of death in 2019 according to the World Health Organization [9]. According to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) [10], COPD is defined as “a heterogeneous lung condition characterized by chronic respiratory symptoms (dyspnea, cough, expectoration and/or exacerbations) due to abnormalities of the airways (bronchitis, bronchiolitis) and/or alveoli (emphysema) that cause persistent, often progressive, airflow obstruction.” COPD results from multiple risk factors such as smoking, air pollution, prematurity and genetics [9, 10]. The therapy is guided by the severity of airflow limitation, symptoms, exacerbations and multimorbidity. The medication includes short‐/long‐acting β2 agonists (SABA/LABA), short‐/long‐acting muscarinic antagonists (SAMA/LAMA), and combination therapies, some including inhaled corticosteroids (ICS), such as LABA+LAMA, LABA+ICS and LABA+LAMA+ICS [10]. COPD is a heterogeneous disease, and patients experience exacerbations due to suboptimal management [10, 11]. The modeling approach could help predict drug efficacy and identify subpopulations for targeted treatment [12]. The incorporation of multiple clinical trials in modeling helps to fully reflect disease progression/variation in the population.
Legacy MBMA models [13] of randomized COPD trials have been developed with endpoints as mean forced expiratory volume in 1 s (FEV1) and exacerbation rate. The latest MBMA model evaluated mono‐, dual‐, and triple‐therapies of bronchodilators and anti‐inflammatories from 298 studies [13]. None of the MBMA models included IPD. In this paper, IPD of two clinical trials [14, 15] of the 298 studies were available, which could be used for combined ADIPD modeling.
This study aims to establish a combined model for the AD and IPD of COPD clinical trials, including finding suitable methods for estimating ISV/IIV and solutions to possible aggregation bias.
2. Methods
2.1. Data Description
AD: The AD were the same dataset as in the published MBMA model paper [13]. The dataset contained 4137 mean morning trough FEV1 measurements from 298 studies of 250,543 patients until November 24, 2020, involving 23 compounds in mono‐, dual‐, or triple‐therapies. The two IPD studies described below were also among these 298 AD studies. To avoid duplication, these two studies were excluded from the AD for the ADIPD model.
IPD: The IPD of two clinical trials [14, 15] of mono‐ or dual‐therapies of fluticasone furoate/vilanterol were obtained from GSK. The two studies are 24‐week, randomized, stratified (by smoking status), placebo‐controlled, double‐blind, parallel‐group, multicenter studies. The demographic information of interest and FEV1 observations during treatment are summarized in Table S1 (details in publications [14, 15]).
2.2. Evaluation of Different Methods to Estimate ISV/IIV for the Combined ADIPD Model
The combined modeling of AD and IPD involves three levels of variabilities: ISV, IIV, and residual variability. In the combined ADIPD model, IIV contribution to variability in AD was defined as IIV variance/N. The remaining variability in AD is apportioned between ISV and residual variability. To deal with more than two levels in NONMEM, the $LEVEL method and the interoccasion variability (IOV)‐like method [16] were explored. $LEVEL functionality allows additional nested random levels (NONMEM 7.5.1). By default, option LEVWT = 0 weights each study equally, regardless of the number of subjects. Alternatively, LEVWT = 1 weights according to the number of subjects. The IOV‐like method assigns different random variables (ETAs) to the individuals of the lower level and forces them to originate from the same distribution. One limitation of the IOV‐like method is that it requires as many ETAs as the number of individuals at the lower level. In our AD, there are a maximum of 10 arms. However, IPD have over 1000 individuals within a study, which means over 1000 ETAs are required, causing unacceptable run‐time. The two methods can be combined by applying the IOV‐like method to AD only while sharing IIV/ISV with $LEVEL.
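The statement that the IIV contribution to an arm mean is the IIV variance divided by N can be checked numerically. The following is a minimal sketch, assuming normally distributed individual random effects; the variance, arm size, and function name are hypothetical and are not taken from the study or from NONMEM.

```python
import random
import statistics


def arm_mean_variance(omega2: float, n_per_arm: int, n_arms: int, seed: int = 1) -> float:
    """Empirical variance of arm means of normally distributed individual ETAs."""
    rng = random.Random(seed)
    sd = omega2 ** 0.5
    means = [
        statistics.fmean(rng.gauss(0.0, sd) for _ in range(n_per_arm))
        for _ in range(n_arms)
    ]
    return statistics.pvariance(means)


omega2 = 0.04  # hypothetical IIV variance
n = 50         # hypothetical number of patients per arm
empirical = arm_mean_variance(omega2, n, n_arms=4000)
expected = omega2 / n  # IIV contribution at the arm (AD) level
print(f"empirical={empirical:.6f}, expected={expected:.6f}")
```

The empirical variance of the simulated arm means should be close to `omega2 / n`, which is why IIV becomes hard to identify from arms with many patients.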
These methods were tested using stochastic simulation and estimation (SSE) by NONMEM 7.5.1 and Perl‐speaks‐NONMEM 5.3.1 (PsN). Overall, there were eight settings:
$LEVEL, LEVWT = 0, estimate ISV
$LEVEL, LEVWT = 0, fix ISV to AD‐only estimate
$LEVEL, LEVWT = 1, estimate ISV
$LEVEL, LEVWT = 1, fix ISV to AD‐only estimate
$LEVEL, LEVWT = 0, IOV‐like method for AD, estimate ISV
$LEVEL, LEVWT = 0, IOV‐like method for AD, fix ISV to AD‐only estimate
$LEVEL, LEVWT = 1, IOV‐like method for AD, estimate ISV
$LEVEL, LEVWT = 1, IOV‐like method for AD, fix ISV to AD‐only estimate
The SSE data were set to the same size as the real dataset, and for simplicity, it was assumed all arms were placebo. The parameter values and model structure were set similarly to the original AD model, but with a simpler placebo model without a mixture component. Parameter bias and precision were used as criteria to compare methods.
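The text does not give the formulas used for bias and precision. As an illustration only, a common choice in SSE work is the relative bias and the relative root mean squared error of the estimates across replicates. The sketch below assumes those definitions; the function name and the example estimates are hypothetical.

```python
import math
import statistics


def sse_summary(estimates, true_value):
    """Return (relative bias %, relative RMSE %) of estimates around the true value."""
    rel_errors = [(est - true_value) / true_value for est in estimates]
    rel_bias = 100.0 * statistics.fmean(rel_errors)
    rel_rmse = 100.0 * math.sqrt(statistics.fmean(err * err for err in rel_errors))
    return rel_bias, rel_rmse


# Hypothetical estimates of one parameter from five SSE replicates (true value 1.0)
bias, rmse = sse_summary([1.05, 0.98, 1.10, 1.02, 0.95], true_value=1.0)
print(f"relative bias = {bias:.1f}%, relative RMSE = {rmse:.1f}%")
```

A method with small relative bias and small relative RMSE for the ISV and IIV parameters would be preferred under these criteria.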
2.3. Aggregation Bias and Solving Methods
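The nonlinearity of AD models leads to aggregation bias when applied to infer individual‐level relationships [8]. For example, when age is added as a nonlinear covariate effect $f(\mathrm{AGE})$, the mean of the individual effects differs from the effect evaluated at the mean age, $\overline{f(\mathrm{AGE})} \neq f(\overline{\mathrm{AGE}})$, which causes aggregation bias. The parameter in $f$ will not have the same interpretation in AD and IPD models if the degree of nonlinearity is high [17].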
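The source of aggregation bias can be illustrated with a small numeric example: for a nonlinear covariate function, the mean of the individual effects is not the effect at the mean covariate, whereas for a linear function the two coincide. The power and linear forms, their coefficients, and the ages below are hypothetical choices for illustration, not the covariate models of this study.

```python
import statistics


def power_effect(age, theta=-0.5, ref=65.0):
    """Hypothetical nonlinear (power) covariate effect of age."""
    return (age / ref) ** theta


def linear_effect(age, slope=-0.01, ref=65.0):
    """Hypothetical linear covariate effect of age."""
    return 1.0 + slope * (age - ref)


ages = [45.0, 55.0, 65.0, 75.0, 85.0]  # hypothetical individual ages in one arm
mean_age = statistics.fmean(ages)

# Nonlinear case: the arm-level effect differs from the effect at the mean age
nonlinear_individual = statistics.fmean(power_effect(a) for a in ages)
nonlinear_at_mean = power_effect(mean_age)

# Linear case: both quantities agree, so AD and IPD share one interpretation
linear_individual = statistics.fmean(linear_effect(a) for a in ages)
linear_at_mean = linear_effect(mean_age)

print(nonlinear_individual, nonlinear_at_mean)
print(linear_individual, linear_at_mean)
```

This is why revising nonlinear relationships to linear ones removes this particular source of bias.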
To avoid aggregation bias, nonlinearities of the original AD and IPD model were analyzed and solved in three ways: (1) revise nonlinear relationships as linear, (2) use normal distribution to approximate lognormal distribution of IIV, and (3) explore possible effects introduced by covariate correlations.
2.3.1. Revise Nonlinear Relationships as Linear
Three nonlinear parts of the published AD model [13] were revised to linear relationship: a mixture model of Emax(t) and immediate placebo effect was revised to an immediate placebo effect; a step function of baseline as a covariate effect on drug effect was revised to a linear equation; the covariate effects of lowest/highest disease severity on baseline, formulated as a variable of the two (lowest and highest), was revised to a single covariate (lowest+highest)/2 on baseline to keep consistent with IPD part of the model.
2.3.2. Use Normal Distribution to Approximate the Lognormal Distribution of IIV
The IIV of baseline in the IPD model followed a lognormal distribution. However, the means of IIVs are not lognormally distributed, and an approximate normal distribution was derived as described below.
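An individual parameter $P_{ijk}$ (Equation 1) and its mean $\bar{P}_{ij}$ (Equation 2) of one arm were defined. $P_{ijk}$ followed a lognormal distribution and $\eta_{ijk}$ followed a normal distribution (Equation 3). The expectation and variance of $e^{\eta_{ijk}}$ were $\mu$ and $\sigma^2$ (Equations 4 and 5). According to the central limit theorem, $\bar{P}_{ij}$ approximately followed a normal distribution with the expectation $\theta\mu$ and variance $\theta^2\sigma^2/N_{ij}$ (Equations 6, 7, and 8). This approximation is valid under a large sample size $N_{ij}$ or small variance $\omega^2$. The number of individuals within one arm ranged from 18 to 5724, and the variance of baseline was about 0.04 in AD, which meets the requirements.

$$P_{ijk} = \theta \cdot e^{\eta_{ijk}} \tag{1}$$

where $i$: study, $j$: arm, $k$: individual, and $\theta$ is a parameter.

$$\bar{P}_{ij} = \frac{1}{N_{ij}} \sum_{k=1}^{N_{ij}} P_{ijk} \tag{2}$$

where $N_{ij}$ is the number of patients in one arm.

$$\eta_{ijk} \sim N(0, \omega^2) \tag{3}$$

The expectation and variance of $e^{\eta_{ijk}}$ are:

$$\mu = E\left(e^{\eta_{ijk}}\right) = e^{\omega^2/2} \tag{4}$$

$$\sigma^2 = \mathrm{Var}\left(e^{\eta_{ijk}}\right) = \left(e^{\omega^2} - 1\right) e^{\omega^2} \tag{5}$$

According to the central limit theorem,

$$\bar{P}_{ij} \overset{\text{approx.}}{\sim} N\left(E(\bar{P}_{ij}),\ \mathrm{Var}(\bar{P}_{ij})\right) \tag{6}$$

$$E(\bar{P}_{ij}) = \theta \cdot e^{\omega^2/2} \tag{7}$$

$$\mathrm{Var}(\bar{P}_{ij}) = \frac{\theta^2 \left(e^{\omega^2} - 1\right) e^{\omega^2}}{N_{ij}} \tag{8}$$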
The above approximate distribution was evaluated by simulation in R 4.3.3 and an SSE using PsN and NONMEM. In the simulation, the true distribution of the mean baseline was derived by calculating the mean of the individual baseline, where the individual baseline was simulated by a lognormal distribution (sample sizes were 20, 200, 1000, and 5000), and then compared with the approximate distributions. The simulation was repeated 1000 times. In the SSE analysis, individual data were simulated by a model with baseline (lognormal distributed) and linear disease progression (normally distributed). Simulation parameter values were set according to the AD model [13]. Aggregated data was calculated by averaging the individual data. The simulation data included 10 studies, 10 arms for each study, and 200 patients for each arm. Estimation of aggregated data by two models of lognormal or approximate normal distribution of baseline was performed, respectively. Other parts of the two models were linear, and no drug effect was included. The SSE was repeated 100 times.
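The simulation check described above can be sketched in a few lines. The study used R 4.3.3; the version below is a Python illustration under stated assumptions: a hypothetical typical baseline of 1.2 L, a log‐scale IIV variance of 0.04, 200 patients per arm, and 1000 replicates. It compares the simulated arm means with the moments of the approximate normal distribution (standard lognormal moments combined with the central limit theorem) and with a lognormal distribution whose SD is simply scaled by √N.

```python
import math
import random
import statistics

theta = 1.2    # hypothetical typical baseline FEV1 (L)
omega2 = 0.04  # IIV variance on the log scale
n = 200        # patients per arm
reps = 1000    # number of simulated arms

rng = random.Random(2024)
omega = math.sqrt(omega2)

# "True" distribution of the arm mean: average of lognormal individual baselines
arm_means = [
    statistics.fmean(theta * rng.lognormvariate(0.0, omega) for _ in range(n))
    for _ in range(reps)
]
emp_mean = statistics.fmean(arm_means)
emp_var = statistics.variance(arm_means)

# Approximate normal distribution of the arm mean
approx_mean = theta * math.exp(omega2 / 2)
approx_var = theta**2 * (math.exp(omega2) - 1) * math.exp(omega2) / n

# Lognormal with SD scaled by sqrt(N): its mean stays close to theta
naive_mean = theta * math.exp(omega2 / (2 * n))

print(f"simulated mean {emp_mean:.4f}, approx. normal {approx_mean:.4f}, scaled lognormal {naive_mean:.4f}")
print(f"simulated variance {emp_var:.6f}, approx. normal {approx_var:.6f}")
```

Under these assumptions the simulated arm means center near `approx_mean` rather than `naive_mean`, which is the kind of discrepancy that would appear as bias in the typical baseline when the scaled lognormal is used for AD.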
2.3.3. Exploration of Possible Effects Introduced by Covariate Correlations
Four covariates were added on baseline in the AD and the IPD models: age, sex, disease severity and smoking. The correlations of the individual covariates were calculated by assuming the variance/covariance is the sum of the variances/covariances of aggregated data and individual data within a study [18] as shown in Equation (9).
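$$\rho_{1,2} = \frac{\mathrm{cov}_{AD} + \mathrm{cov}_{IPD}}{\sqrt{\left(\sigma^2_{AD,1} + \sigma^2_{IPD,1}\right)\left(\sigma^2_{AD,2} + \sigma^2_{IPD,2}\right)}} \tag{9}$$

where $\sigma^2_{AD,x}$ and $\sigma^2_{IPD,x}$ ($x$ = 1 or 2) are the variances of covariate 1 or 2 of AD and of IPD within a study, respectively, and $\mathrm{cov}_{AD}$ and $\mathrm{cov}_{IPD}$ are the covariances of covariates 1 and 2 of AD and of IPD within a study, respectively. As there were two IPD studies, the means of the (co)variances of the two studies were used as $\sigma^2_{IPD,x}$ and $\mathrm{cov}_{IPD}$.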
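The calculation of the overall individual‐level correlation, assuming that between‐study (AD) and within‐study (IPD) variances and covariances add up, can be sketched as follows. The function name and all numeric inputs are hypothetical; they were chosen only so that the resulting pattern resembles an AD correlation that is higher than the within‐study correlation.

```python
import math


def overall_correlation(var1_ad, var1_ipd, var2_ad, var2_ipd, cov_ad, cov_ipd):
    """Overall correlation when (co)variances of AD and within-study IPD are summed."""
    total_cov = cov_ad + cov_ipd
    total_var1 = var1_ad + var1_ipd
    total_var2 = var2_ad + var2_ipd
    return total_cov / math.sqrt(total_var1 * total_var2)


# Hypothetical (co)variances for two covariates
var1_ad, var1_ipd = 4.0, 60.0   # covariate 1: between-study and within-study variance
var2_ad, var2_ipd = 0.01, 0.20  # covariate 2: between-study and within-study variance
cov_ad, cov_ipd = 0.06, 0.49    # between-study and within-study covariance

rho_ad = cov_ad / math.sqrt(var1_ad * var2_ad)
rho_within = cov_ipd / math.sqrt(var1_ipd * var2_ipd)
rho_overall = overall_correlation(var1_ad, var1_ipd, var2_ad, var2_ipd, cov_ad, cov_ipd)

print(f"AD: {rho_ad:.3f}, within study: {rho_within:.3f}, overall: {rho_overall:.3f}")
```

Because the within‐study variances dominate in this example, the overall correlation stays close to the within‐study value even though the AD correlation is about twice as large.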
An SSE analysis was conducted to investigate whether differences in covariate correlations resulted in estimation bias. Individual covariates age and sex were simulated by two‐layered randomness: firstly, simulated mean age and mean sex for each study applying the same mean/variance/correlation of real AD, then using simulated means as the expectations to simulate individual covariates in each study applying the same variance/correlation of real IPD. The dataset with simulated age/sex and a model with the covariate effect of age/sex on baseline were used to simulate individual data. The AD was generated by averaging the IPD. Model estimation was performed on the AD and IPD datasets to check the estimation bias of covariate effects. The SSE was repeated 100 times.
2.4. The Workflow and Structure of the Combined ADIPD Model
The published AD model [13] was revised and linearized to avoid aggregation bias and keep consistent with IPD, as described above.
The IPD model was built separately and followed a similar structure to the original AD model. Various placebo models (no placebo/immediate/linear/saturable models), IIV structures (exponential/additive), and error models (additive/proportional/power) were tested. In IPD, fluticasone furoate and vilanterol were dosed alone or in combination; thus, the interaction effect of the two drugs was tested. Covariate modeling was performed using scmplus (PsN, 5.3.1) with forward inclusion (p < 0.05) and backward elimination (p < 0.01). Age, disease severity, smoking, and sex were investigated on baseline, disease progression slope, and drug effects.
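The p‐value criteria for covariate selection translate into OFV thresholds through the likelihood ratio test. The sketch below assumes one added parameter per covariate relationship (1 degree of freedom) and that the OFV behaves as −2 log‐likelihood; it uses only the Python standard library, and the function name is hypothetical.

```python
from statistics import NormalDist


def ofv_threshold_1df(p_value: float) -> float:
    """Chi-square critical value for 1 degree of freedom.

    A chi-square variable with 1 df is a squared standard normal, so the
    critical value is the squared two-sided normal quantile.
    """
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return z * z


forward = ofv_threshold_1df(0.05)   # forward inclusion, p < 0.05
backward = ofv_threshold_1df(0.01)  # backward elimination, p < 0.01
print(f"forward: dOFV > {forward:.2f}; backward: dOFV > {backward:.2f}")
```

Under these assumptions, a covariate would enter when the OFV drops by more than about 3.84 and would be retained when its removal raises the OFV by more than about 6.63.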
The combined ADIPD model was established based on the separate AD and IPD models. The ADIPD model followed the same structure as the original MBMA model [13] (Equation 10) (Code in Supporting Information).
| (10) |
The terms of Equation 10 are defined as follows. Indices: i denotes the ith study and j denotes the jth individual or arm. Baseline: the baseline of FEV1. Placebo effect: a constant value at time > 0. Disease progression effect: a linear function of time. Drug effect: an Emax or constant function of dose, with an onset time function for some drugs. Correction: an estimated proportional correction for studies where the change from baseline (CFB) in trough FEV1 could be transformed to an absolute value (absolute FEV1 = FEV1 baseline + CFB) but the FEV1 baseline value was measured post short‐acting bronchodilator (post‐SABD) [13]. Error model: for AD, an additive error model scaled by sample size; for IPD, a power error model in which the residual error depends on the individual prediction (IPRED) of FEV1. The residual error is described by separate random variables for AD and IPD, with IIV on the residual error [19] and an estimated parameter representing the relationship between residual error and IPRED.
Note: if both FEV1 baseline and FEV1 during treatment were measured post‐SABD, a mean absolute reversibility of 0.18 L was added to FEV1 baseline and a fractional reduction in the overall LABD effect was estimated [13].
Covariates from the AD and IPD models were added to the combined ADIPD model. Redundant covariate relationships were removed. For example, if age was added on baseline and predicted baseline was added on drug efficacy, then age on drug efficacy was removed, as its effect would have been accounted for through the baseline‐age relationship. The significance of the covariates was tested by backward elimination. Extemporaneous single value imputation of missing values of age and medical history/background was performed based on multi‐linear regression [20] as described in the MBMA model [13]. The missing values of smoking were also imputed based on the same method [20] (regression model in supplementary code in Supporting Information). The fraction of missing disease severity values was small (0.62%), and these values were replaced with the median value.
A linear combined ADIPD model was also established by setting all relationships linear, without applying the above linearization methods. The linear model was compared with the nonlinear model to evaluate the advantages of applying non‐linear relationships and linearization methods.
The combined models were evaluated by model convergence, reasonable parameter ranges, model fitting plots and visual predictive check (VPC) plots.
2.5. Applications and Benefits of the Combined ADIPD Model
The relative standard errors (RSE) of covariates were compared between the original AD model and the combined ADIPD model.
The combined model was applied to the simulation of multiple trials. First, the four covariates (age, sex, smoking, and disease severity) were simulated by R for a population of 100 studies, 200 patients per study. The covariates were simulated according to Table 1 by two‐layered randomness: (1) simulated mean of the covariates for each study, (2) then use the simulated means as expectation to simulate individual covariates. Second, the drug effects of two treatments (vilanterol 25 μg, vilanterol/fluticasone furoate 25/200 μg) on FEV1 for 24 weeks were simulated on the virtual population by NONMEM. Parameter uncertainty was not included in the simulation.
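The two‐layered covariate simulation can be sketched as below. The study performed this step in R with four correlated covariates; this Python illustration is deliberately simplified to two uncorrelated covariates (age and sex), and every distribution parameter in it is a hypothetical placeholder rather than a value from Table 1.

```python
import random
import statistics


def simulate_population(n_studies=100, n_per_study=200, seed=42):
    """Two-layered simulation: study-level means first, then individuals around them."""
    rng = random.Random(seed)
    rows = []
    for study in range(1, n_studies + 1):
        # Layer 1: study-level means (hypothetical between-study distributions)
        mean_age = rng.gauss(64.0, 3.0)
        prop_male = min(max(rng.gauss(0.65, 0.10), 0.0), 1.0)
        # Layer 2: individuals simulated around the study-level means
        for _ in range(n_per_study):
            age = rng.gauss(mean_age, 8.0)
            sex = 1 if rng.random() < prop_male else 0  # 0 = female, 1 = male
            rows.append((study, age, sex))
    return rows


rows = simulate_population()
ages = [age for _, age, _ in rows]
total_age_variance = statistics.pvariance(ages)
print(len(rows), round(total_age_variance, 1))
```

With these placeholder values the total age variance is roughly the sum of the between‐study variance (3² = 9) and the within‐study variance (8² = 64), which mirrors the additive (co)variance assumption used for the covariate correlations.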
TABLE 1.
Covariate correlations of AD and IPD.
| Covariate | Correlation of IPD within study^a | Correlation of IPD^a | Correlation of AD | Ratio of AD to IPD |
|---|---|---|---|---|
| age_sex^b | 0.141 | 0.150 | 0.301 | 2.005 |
| age_COPD^c | 0.013 | 0.040 | 0.251 | 6.300 |
| age_smoking^d | −0.322 | −0.336 | −0.587 | 1.746 |
| sex_COPD | 0.045 | 0.060 | 0.177 | 2.958 |
| sex_smoking | −0.094 | −0.116 | −0.484 | 4.170 |
| COPD_smoking | −0.015 | −0.039 | −0.237 | 6.134 |
^a “Correlation of IPD within study” is the individual correlation within one study, while “Correlation of IPD” is the overall individual correlation, which includes both “Correlation of IPD within study” and “Correlation of AD” (Equation 9).
^b Sex is categorical in IPD (0 = female, 1 = male) but is continuous in AD (proportion of male, 0 to 1).
^c COPD is disease severity level and is categorical in IPD (level = 1, 2, 3, 4) but is continuous in AD. COPD of AD was calculated by (highest disease level + lowest disease level)/2.
^d Smoking is categorical in IPD (0 = not smoker, 1 = smoker) but continuous in AD (proportion of smokers, 0 to 1).
3. Results
3.1. Evaluation of Different Methods to Estimate ISV and IIV for the Combined AD and IPD Model
SSE results indicated the best method (Figure 1, red rectangle) for estimating ISV/IIV for combined modeling of many AD (298 studies) and a few IPD (2 studies): using $LEVEL with equal weight of studies (LEVWT = 0); for the AD part, the ISV was fixed to the estimates from the AD model and IIV was estimated with separate ETAs for each arm (the IOV‐like method, with ETA scaled by the size of the arm); the IPD part shared the fixed ISV and estimated IIV. Applying $LEVEL to both the AD and IPD parts ($LEVEL, LEVWT = 0 or 1, with ISV fixed or estimated) did not work well. This was probably due to the small number of arms (maximum 10) within each AD study, which was insufficient for estimating two levels of variability. Applying the IOV‐like method to both the AD and IPD is impractical, as the method requires each individual to have their own ETA, resulting in over 1000 ETAs for one IPD study and long run‐times.
FIGURE 1.

Comparison of different methods to estimate ISV/IIV in the combined model. SSE was repeated 100 times. The parameter estimates of different methods are shown in different colors. Notation in the figure: B: baseline; DPS: disease progression slope; OM: omega, random effect parameter for ISV/IIV; PMX: maximal placebo effect; PT50: time when placebo effect reaches 50%; SI: sigma, random effect for residual error model; TH: theta, fixed effect parameter. The best‐performing method is highlighted with a red rectangle. A version of this figure without y‐scale truncation is shown in Figure S2.
3.2. Aggregation Bias and Solving Methods
The published AD model was linearized. The linearization and revision did not significantly change the model fitting performance (Figure S1). The parameter values other than those of the revised parts were similar to the originally published model [13], with fold changes ranging from 0.774 to 1.41.
The simulations of the baseline parameter showed that the distribution of individual parameter values across all 1000 simulations followed a lognormal distribution (Figure 2, left panel) whereas the true distribution of the parameter mean (Figure 2, right panel, red curve) was close to the approximated normal distribution (Figure 2, right panel, blue curve). The lognormal distribution with variance scaled by sample size N (Figure 2, right panel, green curve) showed a discrepancy from the true distribution, where its median was smaller than the true distribution. With the sample size N increasing, the variances of all three distributions became smaller, and the discrepancy between the green and red curves was more apparent.
FIGURE 2.

Simulations of the distributions of an individual parameter and an aggregated parameter, using FEV1 baseline as an example for illustration. The individual parameter baseline was simulated following a lognormal distribution 1000 times with different sample sizes per arm, and the individual parameter distributions across all 1000 simulations are shown (left column). The distribution of the parameter mean was obtained in three ways: calculation of the mean (the true distribution, red curve); the same lognormal distribution as the individual parameter with scaled SD = (individual SD)/√N (green curve); and the approximate normal distribution (blue curve). The sample size of each arm was set to 20, 200, 1000, or 5000.
An SSE was performed to evaluate the effect of this aggregation bias on estimation. The SSE results showed that the AD model using the lognormal distribution (Figure 3A, green bar) had biases of 8% and 10% on the typical value (TH_B) and the variance (OM_B) of baseline, respectively. TH_B in this AD model tended to be estimated higher than the true value, which was consistent with the simulation results in Figure 2. In comparison, disease progression was assumed to follow a normal distribution in individual simulations, and there was no bias in the slope (TH_DPS) or its variance (OM_DPS). The adjusted AD model using an approximate normal distribution (Figure 3A, blue bar) showed no biases. The precision of parameters was lower in the AD models than in the IPD model. The two AD models showed no difference in objective function value (OFV) (Figure 3A, bottom right panel), which indicated that the two models could not be distinguished according to OFV. In the combined ADIPD model, the application of the approximate normal distribution on baseline contributed to a decrease in OFV of 64, which indicated that consistency between AD and IPD within a combined model can improve the fitting.
FIGURE 3.

(A) SSE results of aggregation bias caused by the assumption of a lognormal distribution of IIV. The IPD model (red) and two AD models were applied to estimate parameters of baseline (lognormal IIV distribution) and disease progression slope (normal IIV distribution) using 100 simulated datasets. The simulated datasets of AD were obtained by averaging the simulated datasets of IPD. The two AD models assumed the lognormal distribution (green) and the approximated normal distribution (blue) of baseline, respectively. The final panel (black bar) shows the difference in OFVs of the two AD models. (B) SSE results of aggregation bias caused by covariate correlation. The IPD model (red) and the AD model (green) were applied to estimate parameters of baseline (normal IIV distribution), disease progression slope, and covariates on baseline using 100 simulated datasets. The simulated datasets of AD were obtained by averaging the simulated datasets of IPD. TH_BAGE1 and TH_BSEX were the covariate coefficients of age and sex on baseline, respectively.
The four covariates (age, sex, disease severity (COPD level) and smoking) showed higher correlations at the aggregated level than at the individual level (Table 1). This phenomenon could be explained by measurement error and confounding factors [18] such as inclusion criteria, study locations and population differences across studies. To test how this difference in covariate correlation would affect estimations, the SSE was performed based on covariate distributions similar to the real data. The results (Figure 3B) indicated that this correlation difference did not introduce parameter bias. However, the precision of the parameters was reduced in the AD model.
3.3. The Combined ADIPD Model
The AD model was linearized as described in the above section.
The IPD model followed a similar model structure to the AD model. After testing different options, a model structure with no placebo effect, exponential IIV on baseline, and additive IIV on disease progression/vilanterol efficacy was applied. The model with drug interaction did not significantly decrease OFV (decrease of less than 0.1); drug interaction was therefore not included in the model. The results of covariate modeling showed that higher age was related to lower baseline and lower vilanterol efficacy; higher disease severity was related to lower baseline, lower vilanterol efficacy, and slower disease progression; non‐current smoker and female sex were related to lower baseline (Table S2). The model showed good fitting and prediction performance (Figure S3).
The ADIPD model structure followed a similar structure to the original AD model. The parameter estimates are shown in Table S3. IPD and AD parts shared the same model equations, except residual error model and lognormal IIV. Residual error models were different for AD and IPD parts: AD applied an additive error model and IPD applied a power error model with IIV. The error sources of AD included additional data collection errors (e.g., digitization), compared to IPD. The IIV of the disease progression slope in AD was excluded because it was small after being scaled by the sample size, and its omission helped reduce computation time. The diagnostic plots showed that the combined model fitted the observations well (Figure S4A). The ISV/IIV followed reference normal distribution (Figure S4B), apart from the ISV in DPS, which demonstrated a relatively high shrinkage (43.4%). Prediction‐corrected VPC showed observation percentiles were in the range of prediction intervals (Figure S4C).
The combined model included covariates from both the AD and IPD models, and redundant covariate relationships were removed (details in Table S4). The included covariate relationships were: higher disease severity, higher age, female sex, and non‐current smoker were related to lower baseline; higher predicted baseline was related to higher drug efficacy and faster disease progression. All these covariates were statistically significant in backward elimination, with OFV increases ranging from 15 to 563 when any single covariate was removed. The inclusion criterion of exacerbation history existed as a covariate only for AD, as all IPD individuals had no exacerbation history.
A linear ADIPD model was established by making all relationships linear to avoid aggregation bias. The linear ADIPD model showed a higher OFV (−85,860) compared to the nonlinear ADIPD model (−90,035). Besides, the linear ADIPD model showed worse diagnostic plots than the nonlinear model (not shown). Therefore, applying appropriate nonlinear relationships and linearization methods in the ADIPD model improved data description.
3.4. Applications and Benefits of the Combined ADIPD Model
Comparing the ADIPD model to the original AD model, the covariate estimates in the combined ADIPD model had higher precision (Table 2).
TABLE 2.
Improvement on precisions of covariate estimates of the combined model.
| Relative standard error (RSE%) | Disease severity on baseline | Age on baseline | Baseline on anti‐inflammatory drug effect | Baseline on bronchodilator effect |
|---|---|---|---|---|
| Original AD model | 6.7% | 21.5% | 37.1% | 33.1% |
| Combined ADIPD model | 1.8% | 3.8% | 25.7% | 15.4% |
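The gain in precision can be expressed as the fold reduction in RSE between the two models. The short sketch below only reuses the RSE values listed in Table 2; the variable names are illustrative.

```python
# RSE (%) from Table 2: (original AD model, combined ADIPD model)
rse = {
    "Disease severity on baseline": (6.7, 1.8),
    "Age on baseline": (21.5, 3.8),
    "Baseline on anti-inflammatory drug effect": (37.1, 25.7),
    "Baseline on bronchodilator effect": (33.1, 15.4),
}

# Fold reduction in RSE when moving from the AD model to the ADIPD model
fold_reduction = {name: ad / adipd for name, (ad, adipd) in rse.items()}

for name, fold in fold_reduction.items():
    print(f"{name}: {fold:.1f}-fold lower RSE")
```

The covariate effects on baseline show the largest reductions, while the baseline effects on drug response improve more modestly.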
The combined model simulation showed both study‐level and individual‐level variations (Figure 4). In the figure, the two treatments (vilanterol 25 μg, vilanterol/fluticasone furoate 25/200 μg) performed slightly differently in the same virtual population, and males showed on average higher FEV1 than females.
FIGURE 4.

The simulation from the combined ADIPD model with two treatments (vilanterol 25 μg, vilanterol/fluticasone furoate 25/200 μg). The two treatments were applied to the same virtual population of 100 studies, 200 patients per study. The distributions of FEV1 at week 24 are plotted. The black dashed lines are the density plots of the simulated overall populations across all the studies in each panel. The vertical lines represent the means of the simulated overall populations across all the studies in each panel, and the mean and variance of the simulated overall population are noted in each panel. Curves of different colors represent different studies. The figure is faceted by sex and treatment. The blue dashed lines are the density plots of the observed FEV1 at 24 weeks from study NCT01054885; patient numbers were: VIL 25 μg: 35 female and 116 male; VIL/FF 25/200 μg: 53 female and 110 male.
4. Discussion and Conclusions
We have built a combined ADIPD model of COPD randomized clinical trials. The combined method of $LEVEL/IOV and fixing ISV was identified as a suitable method of ISV/IIV estimation for this combined model with many AD and a few IPD studies. The distribution of the mean of parameters with lognormal IIV was approximated by a normal distribution according to the central limit theorem. Although the covariates' correlations in AD were larger than those in IPD, this did not introduce aggregation bias. New significant covariates (sex and smoking on baseline) and multiple‐level variabilities (ISV/IIV) were identified in the ADIPD model. Covariate/variability precisions were improved through combined ADIPD modeling.
Several covariates were identified as significant for FEV1 baseline, and possible underlying mechanisms are discussed here. The negative correlation between age and baseline is possibly due to physiological lung aging [10]. Females showed a lower baseline, which could be due to the combined effects of sex differences in anatomy (smaller airways, lungs and respiratory musculature in women [21, 22, 23]), hormonal differences [24], and more severe COPD symptoms in females than in males [25]. Current smokers showed a 3.6% higher FEV1 baseline than non‐current smokers in the model. However, this correlation does not imply causality. Smoking does not improve FEV1; rather, patients tend to cease smoking in the more severe COPD stages [26, 27, 28]. This study did not differentiate between study‐ and individual‐level covariate effects, whereas Riley [29] proposed a two‐level covariate model that allows AD and IPD to contribute to the two levels separately. Furthermore, different covariates can be incorporated into a two‐level model without restrictions, and linearization of the covariate equation is not required. We provide a two‐level covariate model in the Supporting Information.
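As a generic illustration of the two‐level idea, and not the model given in the Supporting Information, a covariate effect can be split into a within‐study and an across‐study part. The notation is assumed here: $y_{ij}$ and $x_{ij}$ are the response and covariate of patient $j$ in study $i$, $\bar{x}_i$ is the study mean of the covariate, $\eta_i$ is the study‐level random effect, and $\varepsilon_{ij}$ is the residual error.

```latex
% IPD: both the within-study and the across-study effects are informed
y_{ij} = \theta_0 + \beta_W \,(x_{ij} - \bar{x}_i) + \beta_A \,\bar{x}_i + \eta_i + \varepsilon_{ij}

% AD: averaging over patients removes the within-study term,
% so aggregated data inform only the across-study effect
\bar{y}_i = \theta_0 + \beta_A \,\bar{x}_i + \eta_i + \bar{\varepsilon}_i
```

In this linear sketch, AD contribute only to $\beta_A$, while IPD contribute to both $\beta_W$ and $\beta_A$; aggregation bias in the covariate effect corresponds to $\beta_A \neq \beta_W$.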
ISV and IIV were estimated in this study with a two‐step process: first, ISV was estimated from a pooled dataset of AD and aggregated IPD; then, ISV was fixed and IIV was estimated from the combined AD and IPD. This method was chosen because of the dataset characteristics: many AD studies, a few IPD studies, and only a few arms in each AD study. In addition, LEVWT can only take the values 0 or 1: either every super‐ID is weighted equally (with study as the super‐ID for both AD and IPD), or each super‐ID is weighted by the number of IDs it contains (where the patient is the ID in IPD and the study is the ID in AD). Weighting by study population size recorded as a covariate in the dataset is not possible. Such a combination of many AD and few IPD studies is common, as pharmaceutical companies usually have a few in‐house IPD studies and many publicly available AD studies. More direct methods for this special case of the multi‐level variability problem are worth developing, and the estimation could also allow different weights for AD and IPD according to sample size.
Aggregation bias was a major problem for ADIPD modeling and should be considered when AD results are used to address individual‐level questions. The typical value of baseline decreased from 1.17 L (AD model) to 1.07 L (ADIPD model) after correction for lognormal IIV. This is shown in Figures 2 and 3A, where the theta of the uncorrected lognormal distribution of AD would be overestimated. We therefore suggest using the approximate normal distribution for lognormal IIV in AD modeling. In this model, the residual error structures of AD and IPD differ, likely due to variation in ISV and distinct error sources. For instance, AD may include digitization inaccuracies from literature extraction; the impact of dropout could be more pronounced in AD than in IPD; additionally, AD reported in the literature vary between raw data and least‐square means derived from MMRM analyses, which influences the residual error. The difference in covariate correlation between AD and IPD is difficult to explain; it may be due to the combined effects of measurement error and confounding factors. According to the SSE results, however, the estimation of the covariate coefficient was not affected by this difference in correlation.
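The direction of this bias follows from the moments of the lognormal distribution: if an individual parameter is $\theta \cdot e^{\eta}$ with $\eta \sim N(0, \omega^2)$, its population mean is $\theta \cdot e^{\omega^2/2}$, which is larger than $\theta$. The sketch below checks a normal approximation for an arm mean by simulation. It uses standard lognormal moments and the central limit theorem and is not necessarily the exact derivation used in this work; the value of `omega`, the arm size and the function name are hypothetical.

```python
import numpy as np

def arm_mean_normal_approx(theta, omega, n):
    """Mean and SD of the approximate normal distribution of an arm mean.

    Individual values are theta * exp(eta), eta ~ N(0, omega^2);
    the arm mean of n such values is approximately normal (CLT).
    """
    mean = theta * np.exp(omega**2 / 2)
    var_individual = theta**2 * np.exp(omega**2) * (np.exp(omega**2) - 1)
    return mean, np.sqrt(var_individual / n)

theta = 1.07      # typical baseline (L) from the ADIPD model
omega = 0.4       # hypothetical IIV standard deviation on the log scale
n_per_arm = 200   # hypothetical arm size
n_arms = 20000    # number of simulated arms

rng = np.random.default_rng(1)
eta = rng.normal(0.0, omega, size=(n_arms, n_per_arm))
arm_means = (theta * np.exp(eta)).mean(axis=1)

approx_mean, approx_sd = arm_mean_normal_approx(theta, omega, n_per_arm)
print(f"simulated arm means: mean={arm_means.mean():.4f}, sd={arm_means.std():.4f}")
print(f"normal approximation: mean={approx_mean:.4f}, sd={approx_sd:.4f}")
```

An AD model that equates the observed arm mean with $\theta$ therefore estimates $\theta \cdot e^{\omega^2/2}$ instead, which is the overestimation described above.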
We have not explored dropout‐related bias, as dropout information was lacking in AD. Some studies handled dropout using MMRM, whereas in other studies, the average response was estimated without consideration of dropout. When dropout is due to disease progression (e.g., decline in FEV1), handling discontinuations via MMRM is expected to produce less bias in the disease slope estimate than omitting discontinued subjects from the analysis [30]. The IPD analysis can be expected to be even less affected by dropout owing to the individual‐level information and the hierarchical structure of the model. In principle, a dropout model developed on IPD can be included in the ADIPD model to correct the bias. However, dropout patterns often differ between studies and drugs, and with only two IPD studies in this work, this is unlikely to be a reliable strategy.
The ADIPD model could be used to support the design of clinical trials, for example by simulating efficacy in virtual subpopulations to design inclusion criteria [31], and by comparing new treatments with standard of care or competitors across studies. In a mixed treatment comparison meta‐analysis of malaria therapies, combining AD and IPD could improve the precision of treatment comparisons [32].
In summary, combined ADIPD modeling allowed better estimation of IIV and related covariate effects. Aggregated‐ and individual‐level data can be used together to generate virtual populations that incorporate both individual‐ and study‐level variability to assist clinical trial design.
Author Contributions
L.Y. wrote the manuscript. M.C.K., M.O.K., S.Y., C.A., and A.B. designed the research. L.Y. performed the research. C.L.‐P. analyzed data; all authors reviewed and revised the paper.
Conflicts of Interest
S.Y., C.A. and A.B. are GSK employees and hold GSK shares. The other authors declare no conflicts of interest.
Supporting information
Data S1.
Acknowledgments
We would like to thank Yevgen Ryeznik (Department of Pharmacy, Uppsala University) for help in the derivation of the approximate normal distribution of lognormal IIV. The computations were enabled by resources in project [NAISS 2023/22‐1130] provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at UPPMAX, funded by the Swedish Research Council through grant agreement no. 2022‐06725.
Yang L., Llanos‐Paez C., Yang S., et al., “A Combined Model‐Based Meta‐Analysis of Aggregated and Individual FEV1 Data From Randomized COPD Trials,” CPT: Pharmacometrics & Systems Pharmacology 15, no. 2 (2026): e70059, 10.1002/psp4.70059.
Funding: GlaxoSmithKline funded this research in the form of a Research payment to Uppsala University.
Data Availability Statement
For requests for access to anonymized subject level data, please contact S.Y.
References
- 1. Berlin J. A., Santanna J., Schmid C. H., Szczech L. A., and Feldman H. I., “Individual Patient‐ Versus Group‐Level Data Meta‐Regressions for the Investigation of Treatment Effect Modifiers: Ecological Bias Rears Its Ugly Head,” Statistics in Medicine 21 (2002): 371–387.
- 2. Jackson C., Best N., and Richardson S., “Improving Ecological Inference Using Individual‐Level Data,” Statistics in Medicine 25 (2006): 2136–2159.
- 3. Agarwala N., Park J., and Roy A., “Efficient Integration of Aggregate Data and Individual Participant Data in One‐Way Mixed Models,” Statistics in Medicine 41 (2022): 1555–1572.
- 4. Jones A. P., Riley R. D., Williamson P. R., and Whitehead A., “Meta‐Analysis of Individual Patient Data Versus Aggregate Data From Longitudinal Clinical Trials,” Clinical Trials 6 (2009): 16–27.
- 5. Cooper H. and Patall E. A., “The Relative Benefits of Meta‐Analysis Conducted With Individual Participant Data Versus Aggregated Data,” Psychological Methods 14 (2009): 165–176.
- 6. Riley R. D., Simmonds M. C., and Look M. P., “Evidence Synthesis Combining Individual Patient Data and Aggregate Data: A Systematic Review Identified Current Practice and Possible Methods,” Journal of Clinical Epidemiology 60 (2007): 431.
- 7. Byon W., Smith M. K., Chan P., et al., “Establishing Best Practices and Guidance in Population Modeling: An Experience With an Internal Population Pharmacokinetic Analysis Guidance,” CPT: Pharmacometrics & Systems Pharmacology 2 (2013): 51.
- 8. Ravva P., Karlsson M. O., and French J. L., “A Linearization Approach for the Model‐Based Analysis of Combined Aggregate and Individual Patient Data,” Statistics in Medicine 33 (2014): 1460–1476.
- 9. “Chronic Obstructive Pulmonary Disease (COPD),” (2024), https://www.who.int/news‐room/fact‐sheets/detail/chronic‐obstructive‐pulmonary‐disease‐(copd).
- 10. Global Initiative for Chronic Obstructive Lung Disease, Global Strategy for Prevention, Diagnosis and Management of COPD: 2024 Report (Global Initiative for Chronic Obstructive Lung Disease, 2024), https://goldcopd.org/2024‐gold‐report/.
- 11. Stevermer J. J., Fisher L., Lin K. W., et al., “Pharmacologic Management of COPD Exacerbations: A Clinical Practice Guideline From the AAFP,” American Family Physician 104 (2021): online.
- 12. Standing J. F., “Understanding and Applying Pharmacometric Modelling and Simulation in Clinical Practice and Research,” British Journal of Clinical Pharmacology 83 (2017): 247–254.
- 13. Llanos‐Paez C., Ambery C., Yang S., Beerahee M., Plan E. L., and Karlsson M. O., “Joint Longitudinal Model‐Based Meta‐Analysis of FEV1 and Exacerbation Rate in Randomized COPD Trials,” Journal of Pharmacokinetics and Pharmacodynamics 50 (2023): 297–314.
- 14. Kerwin E. M., Scott‐Wilson C., Sanford L., et al., “A Randomised Trial of Fluticasone Furoate/Vilanterol (50/25 μg; 100/25 μg) on Lung Function in COPD,” Respiratory Medicine 107 (2013): 560–569.
- 15. Martinez F. J., Boscia J., Feldman G., et al., “Fluticasone Furoate/Vilanterol (100/25; 200/25 μg) Improves Lung Function in COPD: A Randomised Trial,” Respiratory Medicine 107 (2013): 550–559.
- 16. Karlsson M. O. and Sheiner L. B., “The Importance of Modeling Interoccasion Variability in Population Pharmacokinetic Analyses,” Journal of Pharmacokinetics and Biopharmaceutics 21 (1993): 735–750.
- 17. French J. and Ravva P., “When and How Should I Combine Patient‐Level Data and Literature Data in a Meta‐Analysis?,” Page‐Meet, (2010), http://www.page‐meeting.org/pdf_assets/5031‐JFrench%20PAGE%202010%20Presentation%20v2%20for%20distribution.pdf.
- 18. Ostroff C., “Comparing Correlations Based on Individual‐Level and Aggregated Data,” Journal of Applied Psychology 78 (1993): 569–582.
- 19. Karlsson M. O., Jonsson E. N., Wiltse C. G., and Wade J. R., “Assumption Testing in Population Pharmacokinetic Models: Illustrated With an Analysis of Moxonidine Data From Congestive Heart Failure Patients,” Journal of Pharmacokinetics and Biopharmaceutics 26 (1998): 207–246.
- 20. Johansson Å. M. and Karlsson M. O., “Comparison of Methods for Handling Missing Covariate Data,” AAPS Journal 15 (2013): 1232–1241.
- 21. Brooks L. J., Byard P. J., Helms R. C., Fouke J. M., and Strohl K. P., “Relationship Between Lung Volume and Tracheal Area as Assessed by Acoustic Reflection,” Journal of Applied Physiology 64 (1988): 1050–1054.
- 22. Ekström M., Schiöler L., Grønseth R., et al., “Absolute Values of Lung Function Explain the Sex Difference in Breathlessness in the General Population,” European Respiratory Journal 49 (2017): 1602047.
- 23. Becklake M. R. and Kauffmann F., “Gender Differences in Airway Behaviour Over the Human Life Span,” Thorax 54 (1999): 1119–1138.
- 24. Silveyra P., Fuentes N., and Rodriguez Bauza D., “Sex and Gender Differences in Lung Disease,” Advances in Experimental Medicine and Biology 1304 (2021): 227–258.
- 25. DeMeo D. L., Ramagopalan S., Kavati A., et al., “Women Manifest More Severe COPD Symptoms Across the Life Course,” International Journal of Chronic Obstructive Pulmonary Disease 13 (2018): 3021–3029.
- 26. Liu C., Cheng W., Zeng Y., et al., “Different Characteristics of ex‐Smokers and Current Smokers With COPD: A Cross‐Sectional Study in China,” International Journal of Chronic Obstructive Pulmonary Disease 15 (2020): 1613–1619.
- 27. Tian T., Jiang X., Qin R., et al., “Effect of Smoking on Lung Function Decline in a Retrospective Study of a Health Examination Population in Chinese Males,” Frontiers in Medicine 9 (2023): 843162.
- 28. Alter P., Stoleriu C., Kahnert K., et al., “Characteristics of Current Smokers Versus Former Smokers With COPD and Their Associations With Smoking Cessation Within 4.5 Years: Results From COSYCONET,” International Journal of Chronic Obstructive Pulmonary Disease 18 (2023): 2911–2923.
- 29. Riley R. D., Lambert P. C., Staessen J. A., et al., “Meta‐Analysis of Continuous Outcomes Combining Individual Patient Data and Aggregate Data,” Statistics in Medicine 27 (2008): 1870–1893.
- 30. Siddiqui O., Hung H. M. J., and O'Neill R., “MMRM vs. LOCF: A Comprehensive Comparison Based on Simulation Study and 25 NDA Datasets,” Journal of Biopharmaceutical Statistics 19 (2009): 227–246.
- 31. Saramago P., Sutton A. J., Cooper N. J., and Manca A., “Mixed Treatment Comparisons Using Aggregate and Individual Participant Level Data,” Statistics in Medicine 31 (2012): 3516–3536.
- 32. Donegan S., Williamson P., D'Alessandro U., Garner P., and Smith C. T., “Combining Individual Patient Data and Aggregate Data in Mixed Treatment Comparison Meta‐Analysis: Individual Patient Data May Be Beneficial if Only for a Subset of Trials,” Statistics in Medicine 32 (2013): 914–930.