ABSTRACT
Single‐arm trials are increasingly proposed as a potential approach for treatment evaluation. However, the limitations of this design restrict its methodological acceptability. Regulatory agencies have raised concerns about this approach, although they sometimes receive applications based solely on such studies. Consequently, the need for accurate indirect treatment comparisons has become critical, especially when external control arms are constructed from routinely collected data, in which outcome measurements may differ from those recorded in the single‐arm trial, leading to potential misclassification of outcomes. This study aimed to quantify, through simulations, the bias from ignoring misclassification of a binary outcome within unanchored indirect comparisons, and to propose a likelihood‐based method to correct this bias (i.e., the outcome‐corrected model). Simulations demonstrated that ignoring misclassification results in substantial bias and poor coverage probabilities. In contrast, the outcome‐corrected model reduced bias and improved the 95% confidence interval coverage probability and root mean square error in various scenarios. The methodology was applied to two hepatocellular carcinoma trials as a practical illustration. The findings underscore the importance of addressing outcome misclassification in indirect comparisons. The proposed correction method may improve the reliability of unanchored indirect treatment comparisons.
Keywords: external control group; indirect treatment comparison; single‐arm study; measurement error; misclassification
1. Introduction
Randomized controlled trials (RCTs) are the gold standard for assessing experimental interventions. Nevertheless, practical or ethical limitations may prevent the implementation of an RCT, resulting in the adoption of single‐arm trials, in which all participants receive the same treatment. Regulatory agencies have raised concerns about this design [1], although they sometimes receive marketing authorization applications based solely on such studies. The Food and Drug Administration is increasingly granting approvals based on the findings of single‐arm trials [2], especially in the field of oncology [3, 4]. In addition, the molecular fragmentation of cancers means that more and more subgroups need to be considered, making Phase III trials very difficult to conduct. Single‐arm trials require an External Control Arm (ECA) to estimate the treatment effect [5], and are part of the unanchored indirect comparison framework [6]. Studies have shown that indirect comparisons in health technology assessments were predominantly unanchored [7, 8]. For example, a new drug might be tested in a single‐arm trial, with the ECA constructed from the standard of care in Electronic Medical Records (EMRs) or from a previous clinical trial. When Individual Patient Data (IPD) are available for both studies, causal effects can be estimated using methods for population‐adjusted indirect comparisons (PAICs) based on a frequentist approach, such as g‐computation, inverse‐probability treatment weighting, and targeted maximum likelihood estimation [9, 10, 11, 12]. When researchers only have access to IPD from the single‐arm trial and aggregate data (AgD) from the other source, matching‐adjusted indirect comparison (MAIC) [13] and simulated treatment comparison (STC) [14] have been proposed. MAIC is based on propensity score weighting and STC on outcome regression adjustment [6]. STC and MAIC have been evaluated in anchored [15, 16, 17, 18] and unanchored [19, 20] indirect comparisons, without a clear conclusion as to whether MAIC or STC performs better.
PAIC methods can account for imbalance in patient characteristics, but not for differences in outcome measurements between the two studies [21]. Therefore, a commonly made assumption for estimating indirect treatment effects is that there is no difference in the outcome measurement between the single‐arm trial and the ECA. However, in routinely collected data (i.e., real‐world data), for example, outcome measures may differ from those recorded in clinical trials [22, 23]. A proxy measure $Y^*$, evaluated in the ECA, can substitute for the reference measure $Y$ evaluated in the single‐arm trial. Both outcomes $Y$ and $Y^*$ could measure the same concept, with one predicting the other using a different scale and possibly a different level. For example, in a single‐arm trial evaluating a new immunotherapy for cancer, radiological progression, $Y$, might be assessed using RECIST criteria [24], while the ECA might use clinical progression, $Y^*$, collected in EMRs or a previous clinical trial. Similarly, various scales may be used to assess depression in routinely collected data. Misclassification of the outcome could bias the treatment effect either towards or away from the null value [25, 26, 27, 28]. Thus, to estimate an unbiased indirect treatment effect, one must first address the proxy outcome's misclassification. Misclassification in a binary outcome can be quantified by estimating sensitivity and specificity using ancillary studies [26]. For instance, validation studies involve measuring both the reference outcome and its proxy in the same individuals [26, 27]. When conducted within the same population (i.e., the ECA), these are termed internal validation studies, and external validation studies when a third study assesses both $Y$ and $Y^*$. Methods have already been developed for incorporating validation study data into outcome regression estimation using a Bayesian framework [29, 30] or a parametric frequentist perspective; we refer the reader to Carroll et al. [25] for general expressions with likelihood‐based methods and to Lyles et al. [31] for implementation examples. However, these methods have not yet been employed in the context of unanchored indirect comparisons.
The present study examines an unanchored indirect comparison involving two studies: one with AgD or IPD for the experimental treatment, assessing a binary reference outcome $Y$, and another with IPD for the control treatment, potentially derived from EMR data or a previous clinical trial, which uses a proxy outcome $Y^*$. The objectives are to quantify the bias from ignoring misclassification of a binary outcome in unanchored indirect comparisons and to propose a method to correct this bias (i.e., the outcome‐corrected model). We first describe the indirect comparison framework and provide an overview of indirect treatment effect estimation; subsequently, the outcome‐corrected model is introduced. Methods with and without correction for misclassification in the binary outcome were evaluated in a simulation analysis and in a practical example using real data from two trials investigating advanced hepatocellular carcinoma: the SHARP trial [32], which evaluated the effect of sorafenib versus placebo on radiological progression measured using RECIST, and the PRODIGE‐11 trial [33], which evaluated the effect of sorafenib and pravastatin combined versus sorafenib alone on clinical progression.
2. Outcome Model for Indirect Comparison
2.1. Notations and Data Structures
There are three studies (Table 1): the study that includes either AgD or IPD, labeled study $S_1$; the ECA study that includes IPD, labeled study $S_2$; and the validation study used to estimate the outcome measurement error model, labeled $S_3$. The outcome $Y$ is the reference binary outcome response and $Y^*$ is the proxy binary outcome response. $X = (X_1, \dots, X_P)$ are prognostic and effect modifier covariates. Two treatments $A$ and $B$ will be considered; the patients from study $S_1$ received the experimental treatment $A$, and the patients from study $S_2$ received the control treatment $B$.
TABLE 1.
Data structure with an external validation study $S_3$.
| Study set | Prognostic and effect modifier covariates | Treatment | Error‐free outcome $Y$ | Error‐prone outcome $Y^*$ |
|---|---|---|---|---|
| $S_1$: Single‐arm trial (IPD or AgD) | $X_1$ | $A$ | $Y_1$ | – |
| $S_2$: External control arm (IPD) | $X_2$ | $B$ | – | $Y^*_2$ |
| $S_3$: Validation study | $X_3$ | – | $Y_3$ | $Y^*_3$ |
2.2. Unanchored Indirect Comparisons Framework
For a binary outcome, the form of indirect comparisons is as follows [6]: within a target population $S_1$, the effect of treatment $A$ compared with treatment $B$, $\Delta_{AB(1)}$, is calculated as the difference in the log odds of the outcome between the two treatments:

$$\Delta_{AB(1)} = \mu_{A(1)} - \mu_{B(1)},$$

where $\mu_{A(1)}$ and $\mu_{B(1)}$ denote the log odds of the outcome under treatments $A$ and $B$ in population $S_1$. The estimand of interest is the population‐average treatment effect of $A$ versus $B$ in the targeted population $S_1$, $\Delta_{AB(1)}$, along with its estimate:

$$\hat{\Delta}_{AB(1)} = \hat{\mu}_{A(1)} - \hat{\mu}_{B(1)}. \quad (1)$$

The challenge lies in estimating $\hat{\mu}_{B(1)}$, the predicted log odds of the outcome under treatment $B$ in target population $S_1$, as the participants in $S_1$ received only treatment $A$.
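For illustration, the following minimal R sketch computes Equation (1) from hypothetical aggregate counts; the numbers are invented, and it is assumed that both quantities already refer to the target population $S_1$:

```r
# Hypothetical aggregate counts (illustrative only)
r_A <- 60; n_A <- 150   # events under treatment A in the target population S1
r_B <- 45; n_B <- 150   # events under treatment B, assumed already adjusted to population S1

mu_A1 <- log(r_A / (n_A - r_A))   # log odds of the outcome under A in S1
mu_B1 <- log(r_B / (n_B - r_B))   # log odds of the outcome under B in S1

delta_AB1 <- mu_A1 - mu_B1        # Equation (1), on the log odds ratio scale
exp(delta_AB1)                    # corresponding odds ratio
```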
2.3. Indirect Comparisons Without Measurement Error
Several methods can be used to estimate the indirect treatment effect from Equation (1) [9, 10, 11, 12]. The g‐computation method involves fitting an outcome model using the IPD from study $S_2$. This model then uses the prognostic and effect modifier variables from the study $S_1$ population to predict the marginal value of the outcome under treatment $B$ in the target population $S_1$ [11, 12].
Now, consider a conditional indirect treatment effect as follows:

$$\Delta_{AB(1)|X} = \mu_{A(1)|X} - \mu_{B(1)|X},$$

where $X$ is a set of prognostic covariates in population $S_1$. To predict $\mu_{B(1)|X}$, the conditional log odds of the outcome under treatment $B$ in the target population $S_1$, an outcome regression model is fitted to the IPD from study $S_2$:

$$\text{logit}\, P(Y_{i2} = 1 \mid X_{i2}) = \mu_{B(1)|X} + \sum_{p=1}^{P} \beta_p \left( X_{ip2} - \bar{X}_{p1} \right), \quad (2)$$

where $Y_{i2}$ is the binary outcome value for patient $i$ from study $S_2$, and $X_{ip2}$ denotes the $p$th prognostic covariate, along with its associated coefficient $\beta_p$. The prediction is centred on the mean covariate value of study $S_1$, $\bar{X}_{p1}$. By doing so, $\mu_{B(1)|X}$ is interpreted as the predicted conditional log odds of the outcome under treatment $B$ for an average patient sampled from the target population $S_1$.
In the case of AgD for $S_1$, $\mu_{A(1)}$, the marginal log odds of the outcome under treatment $A$ in the target population $S_1$, is derived from the reported summary estimate in a published article. However, it is unlikely that the conditional estimate (i.e., $\mu_{A(1)|X}$) is available in a published article. Conversely, when IPD are available for study $S_1$, a second outcome model is fitted to the data of $S_1$ with the same prognostic covariates:

$$\text{logit}\, P(Y_{i1} = 1 \mid X_{i1}) = \mu_{A(1)|X} + \sum_{p=1}^{P} \alpha_p \left( X_{ip1} - \bar{X}_{p1} \right), \quad (3)$$

where $\mu_{A(1)|X}$ is the estimated conditional log odds of the outcome under treatment $A$ for an average patient sampled from the target population $S_1$, to be consistent with $\mu_{B(1)|X}$, which also has a conditional effect interpretation.
Finally, a conditional population‐adjusted treatment effect is estimated using Equation (1), with $\hat{\mu}_{A(1)|X}$ and $\hat{\mu}_{B(1)|X}$ when IPD are available for study $S_1$, or with $\hat{\mu}_{B(1)|X}$ and the marginal log odds of the outcome reported, for example, in a published article when only AgD are available for study $S_1$.
The variance of $\hat{\Delta}_{AB(1)|X}$ is:

$$\operatorname{Var}(\hat{\Delta}_{AB(1)|X}) = \operatorname{Var}(\hat{\mu}_{A(1)|X}) + \operatorname{Var}(\hat{\mu}_{B(1)|X}) - 2 \operatorname{Cov}(\hat{\mu}_{A(1)|X}, \hat{\mu}_{B(1)|X}), \quad (4)$$

where $\operatorname{Var}(\hat{\mu}_{A(1)|X})$ and $\operatorname{Var}(\hat{\mu}_{B(1)|X})$ are the variances of $\hat{\mu}_{A(1)|X}$ and $\hat{\mu}_{B(1)|X}$, respectively. The covariance term in Equation (4) arises because the values of the covariates in study $S_1$ are used in both outcome models: in study $S_1$ (Equation 3) to predict $\hat{\mu}_{A(1)|X}$ and in study $S_2$ (Equation 2) to predict $\hat{\mu}_{B(1)|X}$. More precisely, for the $p$th covariate, the values $\bar{X}_{p1}$ are used. In the case of IPD availability for study $S_1$, a likelihood‐based method is proposed to account for this correlation (Section 2.5). When only AgD are available for study $S_1$, the covariance term cannot be evaluated, and the variance of the indirect treatment effect is approximated using the model variance from Equation (2) or by bootstrap [6, 15].
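A minimal R sketch of this conditional indirect comparison is given below, assuming IPD for both studies, a single prognostic covariate, and hypothetical data frames `ipd_s1` and `ipd_s2` with columns `y` (binary outcome) and `x`; all object names are illustrative:

```r
# Conditional indirect comparison with covariates centred on the study S1 mean.
xbar_s1 <- mean(ipd_s1$x)                       # centring value from the target population S1

fit_s2 <- glm(y ~ I(x - xbar_s1), family = binomial, data = ipd_s2)  # Equation (2), treatment B
fit_s1 <- glm(y ~ I(x - xbar_s1), family = binomial, data = ipd_s1)  # Equation (3), treatment A

mu_B <- coef(fit_s2)[1]   # conditional log odds under B for an average S1 patient
mu_A <- coef(fit_s1)[1]   # conditional log odds under A for an average S1 patient
delta <- mu_A - mu_B      # Equation (1)

# Simple variance ignoring the covariance term of Equation (4) (conservative when
# the covariance is positive); a bootstrap over both data sets can be used instead.
var_delta <- summary(fit_s1)$coefficients[1, 2]^2 +
             summary(fit_s2)$coefficients[1, 2]^2
c(estimate = unname(delta),
  lower    = unname(delta) - 1.96 * sqrt(var_delta),
  upper    = unname(delta) + 1.96 * sqrt(var_delta))
```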
2.4. Indirect Comparison With Measurement Error
Now consider the case in which only a proxy outcome $Y^*$ is available for study $S_2$. This will bias the estimates of $\mu_{B(1)|X}$ and $\beta_p$ from Equation (2):

$$\text{logit}\, P(Y^*_{i2} = 1 \mid X_{i2}) = \mu^*_{B(1)|X} + \sum_{p=1}^{P} \beta^*_p \left( X_{ip2} - \bar{X}_{p1} \right),$$

where $\mu^*_{B(1)|X}$ and $\beta^*_p$ are biased estimates of $\mu_{B(1)|X}$ and $\beta_p$. The magnitude and direction of the bias will depend upon the diagnostic properties of $Y^*$ as a substitute for $Y$ [26]. Misclassification in a binary outcome can be expressed in terms of the sensitivity $Se = P(Y^* = 1 \mid Y = 1)$ and the specificity $Sp = P(Y^* = 0 \mid Y = 0)$. Misclassification may be differential if the sensitivity and specificity depend on covariates, or non‐differential if they do not [26]. The impact of non‐differential misclassification is that the naive estimates are closer to the null value than the true ones [26]. However, if the misclassification is differential, the bias can be either away from or towards the null value [26].
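As a numeric illustration (values invented), the apparent event probability observed with the proxy outcome can be written as $P(Y^* = 1 \mid X) = Se \cdot P(Y = 1 \mid X) + (1 - Sp)\,(1 - P(Y = 1 \mid X))$, so a low specificity inflates the apparent event rate in study $S_2$ and shifts the naive log odds:

```r
# Illustration (invented values) of the distortion induced by imperfect Se and Sp.
se <- 0.9; sp <- 0.7
p_true <- 0.30                                    # hypothetical true P(Y = 1) under treatment B
p_star <- se * p_true + (1 - sp) * (1 - p_true)   # apparent P(Y* = 1) = 0.48
log(p_star / (1 - p_star)) - log(p_true / (1 - p_true))   # shift in the naive log odds for B
```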
2.5. Outcome‐Corrected Model
The proposed outcome‐corrected model is designed to account for misclassification in binary outcomes when comparing treatments between a single‐arm trial and an ECA. The first step is to correct the biased estimates $\mu^*_{B(1)|X}$ and $\beta^*_p$, and then to employ the corrected estimates to perform the indirect treatment comparison. Let the outcome measurement error model be:

$$\text{logit}\, P(Y^* = 1 \mid Y, Z) = \gamma_0 + \gamma_1 Y + \gamma_2^{\top} Z, \quad (5)$$

where $Z$ are covariates for a differential measurement error, which are a subset of $X$, and $\text{expit}(u) = 1 / (1 + e^{-u})$ denotes the inverse logit function.

For patients with $Y = 1$, the sensitivity is

$$Se(Z) = P(Y^* = 1 \mid Y = 1, Z) = \text{expit}(\gamma_0 + \gamma_1 + \gamma_2^{\top} Z),$$

and for patients with $Y = 0$, the specificity is

$$Sp(Z) = P(Y^* = 0 \mid Y = 0, Z) = 1 - \text{expit}(\gamma_0 + \gamma_2^{\top} Z).$$

Note that if $\gamma_2 = 0$, this implies a non‐differential outcome measurement error, with $Se = \text{expit}(\gamma_0 + \gamma_1)$ and $Sp = 1 - \text{expit}(\gamma_0)$. For simplicity, a single continuous prognostic variable $X$ and no differential measurement error will be assumed from now on. To correct the biased estimates $\mu^*_{B(1)|X}$ and $\beta^*$, information on the parameters of the measurement error model is required. Often, a priori information about measurement error is lacking, necessitating ancillary studies to estimate the parameters of the outcome measurement error model (Equation 5). The STRATOS guidance identifies different types of ancillary studies [26]. The following section considers an external validation study $S_3$ where both the proxy outcome $Y^*$ and the reference outcome $Y$ are available in a third study (Table 1).
According to Lyles et al. [31], using Carroll et al.'s [25] general likelihood, an external validation study can be used to correctly estimate $\mu_{B(1)|X}$ and $\beta$ from Equation (2). The external validation study is used to estimate the joint distribution of $(Y^*, Y)$. Assuming there are $n_3$ patients in the external validation study $S_3$, each providing data as $(Y^*_{i3}, Y_{i3})$, and assuming a non‐differential measurement error, the individual likelihood $L_{i3}$ for the external validation study is:

$$L_{i3} = P(Y^*_{i3} \mid Y_{i3}).$$

Using $Se = P(Y^* = 1 \mid Y = 1)$ and $Sp = P(Y^* = 0 \mid Y = 0)$,

$$L_{i3} = \left[ Se^{\,Y^*_{i3}} (1 - Se)^{1 - Y^*_{i3}} \right]^{Y_{i3}} \left[ (1 - Sp)^{Y^*_{i3}} Sp^{\,1 - Y^*_{i3}} \right]^{1 - Y_{i3}}.$$

The likelihood for the external validation study is $L_3 = \prod_{i=1}^{n_3} L_{i3}$. Note that when using an internal validation study, the individual likelihood $L_{i3}$ remains in the same form as described above [31].
Then, study $S_2$ is used to estimate the conditional probability $P(Y = 1 \mid X)$ under treatment $B$. Assuming there are $n_2$ patients in study $S_2$, each providing data as $(Y^*_{i2}, X_{i2})$, and assuming a non‐differential outcome measurement error (i.e., $Se$ and $Sp$ do not depend on covariates), the individual likelihood $L_{i2}$ for study $S_2$ is:

$$L_{i2} = P(Y^*_{i2} \mid X_{i2}) = \sum_{y \in \{0, 1\}} P(Y^*_{i2} \mid Y_{i2} = y)\, P(Y_{i2} = y \mid X_{i2}).$$

When $Y^*_{i2} = 1$, this term corresponds to:

$$Se\, p_{i2} + (1 - Sp)(1 - p_{i2}).$$

When $Y^*_{i2} = 0$, this term corresponds to:

$$(1 - Se)\, p_{i2} + Sp\, (1 - p_{i2}).$$

So, the individual likelihood of study $S_2$ is:

$$L_{i2} = \left[ Se\, p_{i2} + (1 - Sp)(1 - p_{i2}) \right]^{Y^*_{i2}} \left[ (1 - Se)\, p_{i2} + Sp\, (1 - p_{i2}) \right]^{1 - Y^*_{i2}},$$

where

$$p_{i2} = P(Y_{i2} = 1 \mid X_{i2}) = \text{expit}\left( \mu_{B(1)|X} + \beta \left( X_{i2} - \bar{X}_{1} \right) \right). \quad (6)$$

The likelihood for study $S_2$ is $L_2 = \prod_{i=1}^{n_2} L_{i2}$. Finally, the total likelihood is:

$$L = L_2 \times L_3. \quad (7)$$
When only AgD for study $S_1$ are available, the likelihood in Equation (7) is maximized to estimate the corrected $\hat{\mu}_{B(1)|X}$ and $\hat{\beta}$, which are used to predict the conditional log odds of the outcome under treatment $B$ in the target population (Equation 2).
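A minimal R sketch of maximizing the likelihood in Equation (7) with optim() is shown below, assuming a single prognostic covariate, non‐differential misclassification, and hypothetical object names (`y_star2`, `x2` for study $S_2$; `y3`, `y_star3` for the validation study; `xbar1` for the reported mean covariate value of study $S_1$):

```r
expit <- function(u) 1 / (1 + exp(-u))

negloglik_eq7 <- function(par, y_star2, x2, y3, y_star3, xbar1) {
  mu_B <- par[1]; beta <- par[2]
  se <- expit(par[3]); sp <- expit(par[4])        # logit scale keeps Se and Sp in (0, 1)

  # Study S2 contribution: P(Y* | X) mixes Se, Sp and P(Y = 1 | X) from Equation (6)
  p2    <- expit(mu_B + beta * (x2 - xbar1))
  pstar <- se * p2 + (1 - sp) * (1 - p2)
  ll2   <- sum(dbinom(y_star2, 1, pstar, log = TRUE))

  # Validation study S3 contribution: P(Y* | Y)
  ll3 <- sum(dbinom(y_star3, 1, ifelse(y3 == 1, se, 1 - sp), log = TRUE))

  -(ll2 + ll3)
}

fit <- optim(par = c(0, 0, qlogis(0.8), qlogis(0.8)),
             fn  = negloglik_eq7, method = "BFGS", hessian = TRUE,
             y_star2 = y_star2, x2 = x2, y3 = y3, y_star3 = y_star3, xbar1 = xbar1)
mu_B_corrected <- fit$par[1]   # corrected conditional log odds under treatment B
```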
When IPD for study $S_1$ are available, $\mu_{A(1)|X}$ is modeled with the binomial likelihood of the outcome in $S_1$ including $n_1$ patients, $L_1 = \prod_{i=1}^{n_1} p_{i1}^{Y_{i1}} (1 - p_{i1})^{1 - Y_{i1}}$, using as probability parameters the outcome model in Equation (3). As indicated above, the correlation between $\hat{\mu}_{A(1)|X}$ and $\hat{\mu}_{B(1)|X}$ stems from the fact that the observed values of $X_1$, the prognostic variable in study $S_1$, are used to predict $\hat{\mu}_{B(1)|X}$. To account for this correlation, the likelihood $L_X$ of the random variable $X_1$ is added (e.g., a normal likelihood with mean $\mu_{X_1}$ and standard deviation $\sigma_{X_1}$). The total likelihood to be maximized is then:

$$L = L_1 \times L_2 \times L_3 \times L_X. \quad (8)$$
To estimate the variance of the indirect treatment effect (Equation 4), the Fisher information is used. Since the covariate in Equations (3) and (6) is centered on $\mu_{X_1}$ (i.e., the true mean of the random variable $X_1$), $\mu_{A(1)|X}$ from Equation (3) and $\mu_{B(1)|X}$ from Equation (6) are interpreted as the conditional log odds under treatments $A$ and $B$, respectively. The indirect treatment effect from Equation (1) is estimated as the difference between $\hat{\mu}_{A(1)|X}$ and $\hat{\mu}_{B(1)|X}$; consequently, the variance and covariance terms are estimated using the Fisher information. Specifically, the variance of $\hat{\mu}_{A(1)|X}$ is the diagonal term for $\mu_{A(1)|X}$ (Equation 3), the variance of $\hat{\mu}_{B(1)|X}$ is the diagonal term for $\mu_{B(1)|X}$ (Equation 6), and the covariance term (Equation 4) is the covariance between $\hat{\mu}_{A(1)|X}$ and $\hat{\mu}_{B(1)|X}$. Confidence intervals can then be derived assuming a normal distribution of the estimate.
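Under the same notational assumptions, the variance of the indirect effect can be read off the Hessian returned by optim(); the object `fit8` and the parameter ordering below are hypothetical:

```r
# Suppose fit8 <- optim(..., hessian = TRUE) minimized the negative log-likelihood of
# Equation (8), with par[1] = mu_A and par[2] = mu_B (ordering assumed for illustration).
vcov_hat <- solve(fit8$hessian)            # inverse observed Fisher information

delta_hat <- fit8$par[1] - fit8$par[2]     # Equation (1), conditional log odds ratio
var_delta <- vcov_hat[1, 1] + vcov_hat[2, 2] - 2 * vcov_hat[1, 2]   # Equation (4)

ci <- delta_hat + c(-1, 1) * qnorm(0.975) * sqrt(var_delta)
exp(c(or = delta_hat, lower = ci[1], upper = ci[2]))   # on the odds ratio scale
```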
All the PAIC methods presented herein build upon the assumption of conditional exchangeability of treatment effects [6], meaning that no unknown or unmeasured prognostic factors or effect modifiers are missing from the models. The outcome‐corrected model additionally assumes a correctly specified outcome measurement error model (Equation 5). Specifically, the outcome measurement error model (Equation 5) must account for all variables that contribute to a differential measurement error. When estimating the outcome measurement error model using an external validation study, it is essential to assume transportability [25, 26], which refers to the applicability of the sensitivity and specificity parameters across both the external validation study and the ECA (study $S_2$).
3. Simulation Study Plan
We adhere to the ADEMP reporting framework [34] by describing the Aim (Section 3.1), the Data‐generating mechanism (Section 3.2), the Estimands (Section 3.3), the Methods under investigation (Section 3.4), and the Performance measures (Section 3.5).
3.1. Aims
This simulation study aimed to quantify the bias resulting from ignoring the misclassification of a binary outcome and to evaluate the performance of three different methods described in Section 2, in the context of unanchored indirect comparison involving a single‐arm trial compared to an ECA.
3.2. Data‐Generating Mechanisms
As a reminder, there are three studies (Table 1): the targeted population study $S_1$ that includes IPD or AgD, the external control group study $S_2$ that includes IPD, and the external validation study $S_3$ that is used to estimate the outcome measurement error model. Two treatments are considered: the patients from study $S_1$ received the experimental treatment $A$, and the patients from study $S_2$ received the control treatment $B$.
A binary outcome $Y_i$ for individual $i$ is generated under a logistic regression model:

$$\text{logit}\, P(Y_i = 1 \mid X_i) = \mu_T + \beta_1 X_i, \qquad T \in \{A, B\},$$

with a prognostic variable $X_i$ that follows a normal distribution with mean $\mu_X$ and variance $\sigma^2_X$: $X_i \sim N(\mu_X, \sigma^2_X)$. $\beta_1$ is the coefficient for the prognostic variable $X$. As $\beta_1$ is not stratified by treatment, the variable $X$ is thus only a prognostic variable and not an effect modifier. And $\mu_T$ is the log odds of the outcome for treatment $T$. The proxy binary outcome $Y^*_i$ is generated as a non‐differential misclassification of $Y_i$ as follows:

$$P(Y^*_i = 1 \mid Y_i) = Se \cdot Y_i + (1 - Sp)(1 - Y_i),$$

where $Se = P(Y^* = 1 \mid Y = 1)$ and $Sp = P(Y^* = 0 \mid Y = 0)$. To set $\beta_1$, the coefficient for the prognostic variable, we expressed it as a function of the prevalence of the outcome in study $S_2$ (Supporting Information material). Furthermore, the log odds of the outcome under treatment $A$, $\mu_A$, is determined by the ratio of the standardized treatment effect size to the standardized "total" difference between trials (Supporting Information material). The standardized treatment effect size is, informally, the standardized distance between studies $S_1$ and $S_2$ due to the treatment effect of $A$ versus $B$. The standardized "total" difference is the standardized distance between studies $S_1$ and $S_2$ due to both the treatment effect and the difference in the distribution of the prognostic variable. The ratio therefore represents the proportion of the difference between studies $S_1$ and $S_2$ that is due to the treatment effect, the complement being the proportion due to differences in patients' prognostic variables. The value of this ratio was set to 0.6, indicating that the standardized treatment effect constitutes 0.6 times the standardized "total" difference between trials. Thus 40% of the "total" standardized difference in outcome between trials is attributable to the "impact" of the prognostic variable. This impact is determined by the magnitude of $\beta_1$ and the degree of overlap in the distribution of $X$ between the studies.
The values of the fixed parameters used in the simulations, including the prevalence of the outcome in study $S_2$, led to a treatment effect odds ratio (OR) of 1.47. Five thousand simulations per scenario were performed.
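A minimal R sketch of this data‐generating mechanism is given below; the covariate distribution and coefficient values are placeholders chosen for illustration, not the values used in the article:

```r
set.seed(2024)
expit <- function(u) 1 / (1 + exp(-u))

n2 <- 500; se <- 0.9; sp <- 0.7      # reference scenario values (Section 3.2)
beta1 <- 0.5; mu_B <- -1.5           # placeholder coefficients (illustrative only)
x2 <- rnorm(n2, mean = 0, sd = 1)    # prognostic covariate in study S2 (placeholder distribution)

y2 <- rbinom(n2, 1, expit(mu_B + beta1 * x2))           # true outcome Y
y_star2 <- rbinom(n2, 1, ifelse(y2 == 1, se, 1 - sp))   # proxy outcome Y* with the chosen Se, Sp
```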
Different scenarios were defined according to different combinations of the following parameters (reference values in bold):

- Specificity $Sp$: **0.7**, 0.8, 0.9;
- Sensitivity $Se$: 0.7, 0.8, **0.9**;
- Overall sample sizes: $n_1$: 500, and $n_2$, $n_3$: 200, **500**, 1500.

We specified two additional scenarios with sensitivity and specificity fixed at 0.7, varying the sample sizes $n_2$ and $n_3$ (Table S1).
3.3. Estimand
The estimand of interest is the population‐average treatment effect of $A$ versus $B$ in the targeted population of the single‐arm trial, $S_1$: $\Delta_{AB(1)}$, which would represent the treatment effect in a randomized trial conducted in that population.
3.4. Methods Under Investigation
Three methods were evaluated:

- The reference method (see details in Section 2.3), which uses the true outcome $Y$ in both study $S_1$ and study $S_2$;
- The uncorrected method (see details in Section 2.4), which uses the true outcome $Y$ in study $S_1$ and the proxy outcome $Y^*$ in study $S_2$ but ignores the misclassification in the outcome;
- The outcome‐corrected model (see details in Section 2.5), which uses the true outcome $Y$ in study $S_1$ and the proxy outcome $Y^*$ in study $S_2$, and corrects the misclassification in the outcome.
3.5. Performance Measures
The performance measures included absolute and relative bias, the 95% confidence interval (95% CI) coverage probability, the empirical standard error (SE), the root mean square error (RMSE), and the proportion of simulations for which optim() did not converge. As there is no explicit solution for maximizing the likelihoods in Equations (7) and (8), numerical maximization tools can be used. Specifically, the optim() function with "BFGS" (a quasi‐Newton method) as the method argument was used in R [35]. To aid convergence, the standard deviation and probability parameters were transformed using log and logit transformations, respectively. Analyses were performed using R version 4.3.3.
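For illustration, the performance measures can be computed from the converged replicates as follows (a sketch assuming vectors `est` and `se_est` of point estimates and standard errors, and the true value `delta_true`; names are illustrative):

```r
perf <- function(est, se_est, delta_true) {
  lower <- est - qnorm(0.975) * se_est
  upper <- est + qnorm(0.975) * se_est
  c(abs_bias = mean(est) - delta_true,
    rel_bias = 100 * (mean(est) - delta_true) / delta_true,
    emp_se   = sd(est),
    coverage = 100 * mean(lower <= delta_true & delta_true <= upper),
    rmse     = sqrt(mean((est - delta_true)^2)))
}
```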
The simulation results are presented in Section 4 for scenarios where IPD are available for study $S_1$. See Tables S2 and S3 for the results when only AgD for study $S_1$ are available.
4. Simulation Results
The results presented here are based on simulations using IPD for study $S_1$, employing three different methods: the reference method, the outcome‐corrected model, and the uncorrected method. Results using AgD for study $S_1$ are outlined in the Supporting Information materials.
4.1. Variation According to Specificity
In this scenario, specificity varied between 0.7 and 0.9 while other parameters were held at their reference values. Table 2 presents the performance measures for each method. Figure S1a illustrates the distribution of absolute bias of the uncorrected method and the outcome‐corrected model.
TABLE 2.
Performance measure according to specificity and sensitivity in an IPD‐IPD setting.
| Performance measure | Method | Sp = 0.7 (Se = 0.9) | Sp = 0.8 (Se = 0.9) | Sp = 0.9 (Se = 0.9) | Se = 0.7 (Sp = 0.7) | Se = 0.8 (Sp = 0.7) | Se = 0.9 (Sp = 0.7) |
|---|---|---|---|---|---|---|---|
| Absolute bias | Reference | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Uncorrected | −0.92 | −0.60 | −0.26 | −0.72 | −0.82 | −0.92 |
| | Outcome‐corrected | 0.02 | 0.00 | 0.00 | −0.02 | 0.01 | 0.02 |
| Relative bias (%) | Reference | 0.53 | 0.79 | 0.96 | 0.27 | 0.65 | 0.53 |
| | Uncorrected | 237.27 | 156.13 | 67.66 | 187.72 | 212.26 | 237.27 |
| | Outcome‐corrected | 4.12 | 1.04 | 0.51 | 5.85 | 2.32 | 4.12 |
| Empirical SE | Reference | 0.26 | 0.26 | 0.26 | 0.26 | 0.26 | 0.26 |
| | Uncorrected | 0.22 | 0.23 | 0.25 | 0.22 | 0.22 | 0.22 |
| | Outcome‐corrected | 0.55 | 0.45 | 0.37 | 0.82 | 0.66 | 0.55 |
| 95% CI coverage (%) | Reference | 95.50 | 95.60 | 95.50 | 95.50 | 95.60 | 95.50 |
| | Uncorrected | 1.90 | 25.50 | 79.60 | 10.30 | 4.50 | 1.90 |
| | Outcome‐corrected | 96.10 | 95.90 | 95.40 | 97.20 | 96.50 | 96.10 |
| RMSE | Reference | 0.26 | 0.26 | 0.26 | 0.26 | 0.26 | 0.26 |
| | Uncorrected | 0.94 | 0.64 | 0.36 | 0.76 | 0.85 | 0.94 |
| | Outcome‐corrected | 0.55 | 0.45 | 0.37 | 0.82 | 0.66 | 0.55 |
| Non‐convergence (n) | Reference | 0 | 0 | 0 | 0 | 0 | 0 |
| | Uncorrected | 0 | 0 | 0 | 0 | 0 | 0 |
| | Outcome‐corrected | 9 | 1 | 0 | 143 | 42 | 9 |
Note: The reference method uses the true outcome $Y$ in both studies $S_1$ and $S_2$; the uncorrected method ignores outcome misclassification in study $S_2$; the outcome‐corrected model accounts for outcome misclassification in study $S_2$.
Abbreviations: CI: confidence interval; IPD: individual patient data; RMSE: root mean square error; SE: standard error; Se: sensitivity; Sp: specificity.
The relative bias of the uncorrected method was 67% when both specificity and sensitivity were at 0.9, increasing to 156% and 237% for specificities of 0.8 and 0.7, respectively. The 95% CI coverage probability of the uncorrected method was 79.6% when both specificity and sensitivity were at 0.9, decreasing to approximately 26% for a specificity of 0.8.
The absolute bias of the outcome‐corrected model remained low and close to zero as the specificity approached 0.9, with a relative bias consistently below 5%. The highest absolute bias of the outcome‐corrected model was 0.02, at a specificity of 0.7. Reducing specificity did increase the variance of the estimates; the empirical SE increased 1.5‐fold between specificities of 0.9 and 0.7. The 95% CI coverage probability for the outcome‐corrected model remained around 95.5% across specificity values. The outcome‐corrected model failed to converge in nine instances (0.18%) at a specificity of 0.7.
For specificities of 0.7 and 0.8, the RMSE of the outcome‐corrected model was approximately twice as high as that of the reference method but approximately half that of the uncorrected method. When both sensitivity and specificity were at 0.9, the outcome‐corrected model demonstrated lower bias, better coverage probability, and a similar RMSE to the uncorrected method (Table 2).
4.2. Variation According to Sensitivity
In this scenario, sensitivity varied between 0.7 and 0.9 while other parameters were held at reference values. Table 2 presents the performance measures for each method. Figure S1b illustrates the distribution of absolute bias of the uncorrected method and the outcome‐corrected model.
The relative bias of the uncorrected method increased as the sensitivity approached 0.9: 187% at a sensitivity of 0.7 and 237% at a sensitivity of 0.9 (Table 2). For a specificity of 0.7, 30% of the true negative cases were recorded as false positives, which is particularly consequential given the low prevalence of the outcome. Moreover, increasing sensitivity without a corresponding improvement in specificity led to an apparent rise in the frequency of the outcome. This rise can easily be misinterpreted as a treatment effect rather than being attributed to the improved identification of true cases. The 95% CI coverage probability of the uncorrected method was 10.3% when both specificity and sensitivity were 0.7.
For the outcome‐corrected model, the bias remained roughly constant as sensitivity increased, with a relative bias below 6% (Figure S1b and Table 2). The 95% CI coverage probabilities remained above 95%, reaching 97.2% at a sensitivity of 0.7 and 96% for sensitivities of 0.8 and 0.9. The outcome‐corrected model failed to converge in 143 instances (2.9%) at a sensitivity of 0.7.
For a sensitivity of 0.8, the RMSE of the outcome‐corrected model was three times higher than that of the reference method but around 1.5‐fold lower than that of the uncorrected method. When both specificity and sensitivity were at 0.7, the outcome‐corrected model had high variance, with an empirical SE four times higher than the other methods and a RMSE higher than the uncorrected method (Table 2).
4.3. Variation According to Sample Size in Studies $S_2$ and $S_3$
In this scenario, the sample sizes $n_2$ and $n_3$ of studies $S_2$ and $S_3$ varied between 200 and 1500, while other parameters were held at their reference values. Table 3 presents the performance measures for each method. Figure S2 illustrates the distribution of absolute bias of the uncorrected method and the outcome‐corrected model. Figure S3 presents the distribution of the absolute bias of the sensitivity and specificity estimated by the outcome‐corrected model for different sample sizes of study $S_3$.
TABLE 3.
Performance measure according to sample sizes $n_2$ and $n_3$ in an IPD‐IPD setting.
| Performance measure | Method | $n_3$ = 200 ($n_2$ = 500) | $n_3$ = 500 ($n_2$ = 500) | $n_3$ = 1500 ($n_2$ = 500) | $n_2$ = 200 ($n_3$ = 500) | $n_2$ = 500 ($n_3$ = 500) | $n_2$ = 1500 ($n_3$ = 500) |
|---|---|---|---|---|---|---|---|
| Absolute bias | Reference | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Uncorrected | −0.92 | −0.92 | −0.91 | −0.93 | −0.92 | −0.91 |
| | Outcome‐corrected | 0.02 | 0.02 | 0.02 | −0.01 | 0.02 | 0.01 |
| Relative bias (%) | Reference | 0.53 | 0.53 | 0.41 | 0.40 | 0.53 | 0.08 |
| | Uncorrected | 237.71 | 237.27 | 236.68 | 241.26 | 237.27 | 236.51 |
| | Outcome‐corrected | 3.96 | 4.12 | 5.56 | 2.30 | 4.12 | 3.39 |
| Empirical SE | Reference | 0.26 | 0.26 | 0.26 | 0.41 | 0.26 | 0.17 |
| | Uncorrected | 0.22 | 0.22 | 0.22 | 0.33 | 0.22 | 0.15 |
| | Outcome‐corrected | 0.61 | 0.55 | 0.54 | 0.86 | 0.55 | 0.35 |
| 95% CI coverage (%) | Reference | 95.60 | 95.50 | 95.10 | 94.90 | 95.50 | 95.20 |
| | Uncorrected | 1.70 | 1.90 | 2.00 | 21.30 | 1.90 | 0.00 |
| | Outcome‐corrected | 96.20 | 96.10 | 95.90 | 96.20 | 96.10 | 95.80 |
| RMSE | Reference | 0.26 | 0.26 | 0.26 | 0.41 | 0.26 | 0.17 |
| | Uncorrected | 0.94 | 0.94 | 0.94 | 0.99 | 0.94 | 0.92 |
| | Outcome‐corrected | 0.61 | 0.55 | 0.54 | 0.86 | 0.55 | 0.35 |
| Non‐convergence (n) | Reference | 0 | 0 | 0 | 0 | 0 | 0 |
| | Uncorrected | 0 | 0 | 0 | 0 | 0 | 0 |
| | Outcome‐corrected | 72 | 9 | 8 | 170 | 9 | 0 |
Note: The reference method uses the true outcome $Y$ in both studies $S_1$ and $S_2$; the uncorrected method ignores outcome misclassification in study $S_2$; the outcome‐corrected model accounts for outcome misclassification in study $S_2$.
Abbreviations: CI: confidence interval; IPD: individual patient data; RMSE: root mean square error; SE: standard error.
Since the uncorrected method does not use the validation study $S_3$, its performance measures did not change with $n_3$. The relative bias of the uncorrected method remained around 237% when $n_3$ varied. Because the uncorrected method ignores misclassification, increasing $n_2$ did not affect the bias either (Table 3 and Figure S2a). As the bias remained the same, the 95% CI coverage probability decreased as $n_2$ increased, dropping from 21% with 200 patients in study $S_2$ to 0% with 1500 patients in study $S_2$.
For the outcome‐corrected model, increasing $n_3$ did not reduce the bias, with an absolute bias remaining around 0.01 and the 95% CI coverage around 96% (Table 3). Larger validation samples enhanced the precision of the sensitivity and specificity estimates (Figure S3). Increasing $n_2$ did not decrease the absolute bias either (Table 3 and Figure S2); however, it increased precision, reducing both the empirical SE and the RMSE by a factor of about 2.5 between 200 and 1500 patients. The RMSE improvement was greater with larger $n_2$ than with larger $n_3$, dropping from 0.86 to 0.35 as $n_2$ increased from 200 to 1500 patients, versus from 0.61 to 0.54 as $n_3$ increased from 200 to 1500 patients. A low sample size in $S_2$ had a greater impact on non‐convergence, with 170 (3.4%) non‐converged iterations for $n_2$ = 200 and 72 (1.4%) for $n_3$ = 200. When both sensitivity and specificity were at 0.7, the same pattern was observed (Table S1).
4.4. Aggregated Data Results
Results using AgD for study $S_1$ are provided in the Supporting Information materials and are generally consistent with those obtained using IPD for study $S_1$.
5. Applied Example: PRODIGE‐11 and SHARP Trials
The estimand was the effect of adding sorafenib to the standard of care in adults with advanced‐stage hepatocellular carcinoma. Two RCTs were used: the SHARP trial [32] as the AgD study $S_1$ and the PRODIGE‐11 trial [33] as the IPD study $S_2$. Although SHARP and PRODIGE‐11 are comparative trials, only a single arm from each study was extracted for the analysis, focusing on the placebo arm from SHARP (i.e., standard of care) and the sorafenib‐alone arm from PRODIGE‐11. This estimand was chosen because the treatment effect of sorafenib versus placebo from the SHARP trial served as a reference. The validation study (study $S_3$) was internal, using a subset of patients from the PRODIGE‐11 trial who had both the reference outcome $Y$ (radiological progression) and the proxy outcome $Y^*$ (clinical progression) assessed.
The design characteristics of each trial are outlined in Table 4. The SHARP trial [32], a double‐blind, placebo‐controlled study, included 602 patients with advanced hepatocellular carcinoma, randomized to receive either sorafenib (400 mg twice daily) or a placebo. Radiological progression, defined as the time from randomization to disease progression, was based on independent radiological review according to RECIST criteria. Data on radiological progression at 4 months for the placebo group (i.e., standard of care) were extracted from Kaplan–Meier curves. The PRODIGE‐11 trial [33], a randomized, unblinded controlled trial, included 323 patients with advanced hepatocellular carcinoma, randomized to receive either sorafenib (400 mg twice daily) or sorafenib (400 mg twice daily) plus pravastatin (40 mg daily). Radiological progression was assessed according to RECIST criteria every 12 weeks, and clinical progression every 4 weeks. Baseline characteristics of the sorafenib arm in PRODIGE‐11 and the placebo arm in SHARP are presented in Table 5. From a reduced set of prognostic covariates, we included those with a standardized mean difference above 0.1 between the two groups, taken as an empirical threshold [36]: age, etiology (using alcohol as reference), and extrahepatic metastases, without any interaction terms.
TABLE 4.
Design characteristics of the SHARP and PRODIGE‐11 trials.
| | SHARP | PRODIGE‐11 |
|---|---|---|
| Data | Aggregated data | Individual patient data |
| Design | Randomized, double blind | Randomized, unblinded |
| Inclusion–exclusion criteria (both trials) | Adults with advanced‐stage hepatocellular carcinoma, confirmed by pathological analysis; not eligible for, or disease progression after, surgical or locoregional therapies; Child‐Pugh liver function class A; life expectancy of 12 weeks or more; ASAT and ALAT < 5 N | |
| Inclusion–exclusion criteria (trial‐specific) | ECOG < 3 | WHO‐PS < 3; CLIP score 0–4 |
| Localisation | USA, Europe, Asia | France |
| Inclusion interval | 2005–2006 | 2010–2013 |
| Treatment | Sorafenib 400 mg twice daily; placebo | Sorafenib 400 mg twice daily; sorafenib 400 mg twice daily plus pravastatin 40 mg daily |
| Outcome | Radiological progression according to RECIST assessed every 6 weeks | Clinical progression assessed every 4 weeks |
Abbreviations: ASAT: aspartate amino‐transferase; ALAT: alanine amino‐transferase; CLIP: Cancer of the Liver Italian Program; ECOG: Eastern Cooperative Oncology Group; WHO‐PS: World Health Organization Performance Status.
TABLE 5.
Baseline characteristics for SHARP and PRODIGE‐11 studies.
| | SHARP trial, placebo arm (n = 303) | PRODIGE‐11 trial, sorafenib arm (n = 161) | SMD |
|---|---|---|---|
| Age (year)—mean (SD) | 66.3 (0.2) | 68 (9) | 0.14 |
| Sex—n (%) | | | 0.03 |
| Male | 264 (87) | 142 (88) | |
| Female | 39 (13) | 19 (12) | |
| Etiology—n (%) | | | 0.62 |
| Hepatitis (B or C) | 137 (45) | 31 (20) | |
| Alcohol | 80 (26) | 81 (50) | |
| Other or unknown | 86 (30) | 49 (30) | |
| ECOG performance status—n (%) | | | 0.07 |
| 0 | 164 (54) | 93 (58) | |
| ≥ 1 | 139 (46) | 68 (42) | |
| Macroscopic vascular invasion—n (%) | 123 (41) | 60 (39) | 0.07 |
| Extrahepatic metastases—n (%) | 150 (50) | 49 (30) | 0.41 |
| Child–Pugh class—n (%) | | | 0.06 |
| A | 297 (98) | 155 (97) | |
| B | 6 (2) | 4 (2.5) | |
Abbreviations: ECOG: Eastern Cooperative Oncology Group; n: number; SD: standard deviation; SMD: standardized mean difference.
The treatment effect of sorafenib versus placebo in the SHARP trial [32] on radiological progression at 4 months was OR = 0.52, used as a reference value. Using radiological progression (the true outcome $Y$) in the PRODIGE‐11 trial, the indirect treatment effect of sorafenib (from PRODIGE‐11) versus placebo (from SHARP) was OR = 0.6, 95% CI 0.34–0.99 (5000 bootstrap iterations). Using the proxy outcome $Y^*$ (i.e., clinical progression) in the PRODIGE‐11 trial without accounting for misclassification (i.e., the uncorrected method), the indirect treatment effect was OR = 0.36, 95% CI 0.18–0.59 (5000 bootstrap iterations); using the outcome‐corrected model, it was OR = 0.55, 95% CI 0.09–5.86 (5000 bootstrap iterations).
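For reference, a sketch of the nonparametric bootstrap used for these confidence intervals, assuming a hypothetical data frame `prodige` with the PRODIGE‐11 IPD and a hypothetical function `indirect_or()` that re‐estimates the indirect OR on a resampled data set (the SHARP aggregate data being held fixed); both names are illustrative:

```r
set.seed(123)
boot_or <- replicate(5000, {
  idx <- sample(nrow(prodige), replace = TRUE)   # resample PRODIGE-11 patients with replacement
  indirect_or(prodige[idx, ])                    # re-estimate the indirect odds ratio
})
quantile(boot_or, c(0.025, 0.975))               # percentile 95% confidence interval
```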
6. Discussion
The present study addressed the challenges of unanchored indirect treatment comparisons in the presence of misclassification in binary outcomes. The focus was on a single‐arm trial compared with an ECA. The aims were to quantify the bias introduced by ignoring misclassification and to introduce a method to correct this bias (i.e., the outcome‐corrected model). The simulations served as a proof of concept in relatively straightforward scenarios. The reference method was the indirect conditional treatment effect estimated in a setting without measurement error. The outcome‐corrected model was compared to a method without measurement error correction (i.e., the uncorrected method).
The simulations found that ignoring misclassification in binary outcomes leads to substantial bias in estimating indirect treatment effects when a single‐arm trial is compared with an ECA. Even with a specificity and sensitivity of 0.9, and an outcome model accounting for all prognostic variables, the uncorrected method had a relative bias of 67% and a 95% CI coverage of 79%. Since the uncorrected method fails to reduce bias, increasing the ECA (study $S_2$) sample size resulted in a dramatic decrease in the 95% CI coverage. The outcome‐corrected model had lower bias and better coverage probabilities than the uncorrected method, even when both specificity and sensitivity were at 0.9. When sensitivity and specificity decreased, estimates remained minimally biased, with a relative bias below 6%. However, this reduction in sensitivity or specificity led to an increase in variance. Increasing the sample size of the ECA (study $S_2$) had a greater impact on improving the RMSE than increasing the sample size of the validation study ($n_3$). The validation study is used for estimating the joint probability of the reference and proxy outcomes (Equation 5). With a non‐differential measurement error, the model is straightforward, and thus even a small sample size can provide precise estimates of sensitivity and specificity, so that the RMSE improvement quickly reaches a plateau. In contrast, the sample size of the ECA (study $S_2$) helps to estimate the conditional probability $P(Y = 1 \mid X)$, which is needed to predict the effect of treatment $B$ in study $S_1$. Since the ECA (study $S_2$) could consist of routinely collected data from EMRs, it is more feasible to achieve a large sample size than in study $S_3$, where both outcomes are measured. Alternatively, study $S_3$ could be an internal validation sample, in which a random sample of patients from $S_2$ has both measurements.
All the PAIC methods presented herein build upon the assumption of conditional exchangeability of treatment effects [6]. The outcome‐corrected model additionally assumes a correctly specified outcome measurement error model (Equation 5). The simulations used a correctly specified outcome model, and the impact of a misspecified outcome measurement error model (Equation 5) was not evaluated. The sample size of the validation study should have a greater impact when there is differential misclassification. The simulations did not consider measurement errors in the prognostic covariates, which could have a substantial impact on the indirect treatment effect [26]. Simulations were performed in a setting where the transportability assumption [25, 26] holds. Transportability might not be considered feasible if the validation study and study $S_2$ involve non‐comparable populations [37]; for instance, if the ECA (study $S_2$) includes patients with advanced‐stage cancer from a general hospital, while the external validation study includes patients from a specialized oncology center known for earlier detection and treatment of cancer. The specialized center's patients might have different disease progression patterns, affecting the sensitivity and specificity of the outcome measurements. As a result, applying the sensitivity and specificity estimates from the external validation study to the ECA (study $S_2$) may lead to a biased indirect treatment effect. Furthermore, if one assumes that the treatment impacts the measurement error (i.e., differential measurement error), it would be preferable that all patients in the validation study receive the same treatment as in the ECA (study $S_2$). For all these reasons, an internal validation study may be more suitable, because it reduces concerns about transportability and provides flexibility to accommodate general patterns of differential misclassification [31]. Additionally, the individual likelihood presented in Section 2.5 remains the same with an internal validation study [31]. Another limitation is that the method proposed here estimates a conditional indirect treatment effect, and thus could suffer from non‐collapsibility when using a non‐linear link function for the outcome model [38, 39]. When AgD are available for study $S_1$, STC also estimates a conditional indirect treatment effect, with bootstrap estimates for confidence intervals [6], without accounting for the covariance term in Equation (4). When IPD are available for both studies $S_1$ and $S_2$, estimating the conditional indirect treatment effect allows a likelihood‐based method to be used to estimate the variance of the indirect treatment effect; when estimating a marginal indirect effect, the variance of the indirect treatment effect is estimated with bootstrap or robust variance estimators [12].
We illustrated the application of these methods using the AgD for the standard of care (i.e., the placebo arm) in SHARP [32] and the IPD for the sorafenib‐alone arm in the PRODIGE‐11 trial [33]. Leveraging the established efficacy of sorafenib compared to placebo in the SHARP population, we used this treatment effect estimate as a reference. The reference effect of sorafenib compared to placebo was OR = 0.52. The point estimates of the indirect treatment effect using the outcome‐corrected model and using the true outcome from the PRODIGE‐11 trial (radiological progression) were both close to the reference value (OR = 0.55 and OR = 0.6, respectively). Ignoring outcome misclassification resulted in an overestimation of the indirect treatment effect (OR = 0.36). However, with only 161 patients in the sorafenib arm of the PRODIGE‐11 trial, the 95% CI estimated by the outcome‐corrected model was wide. This is conservative, as it transfers the uncertainty in measurement to the uncertainty in the decision (provided that one refrains from concluding the absence of a difference), but it may be inefficient for small sample sizes. These findings align with the simulation results, where the empirical SE of the outcome‐corrected model was twice that of the reference method for a sample size of 200 patients (Table S3).
In the applied example, the outcome‐corrected model assumes no correlation between prognostic factors. Additional simulations are needed to evaluate the potential reduction in estimation variance when incorporating the correlation between prognostic factors into the likelihood, by adding a multivariate probability distribution to model their joint distribution when all IPD are available. It is crucial to emphasize that all methods will be biased when prognostic factors and effect modifiers are absent [6]. Furthermore, the inclusion of unnecessary prognostic factors may amplify the variance of estimation [40]. These challenges are prevalent in health technology assessment and are compounded by insufficient sample size to adjust for all covariates of interest. Consequently, it may be beneficial to employ quantitative bias methods to investigate how robust treatment effect estimates are to unmeasured confounders [41].
While the proposed correction method may enhance the reliability of unanchored indirect treatment comparisons when outcomes differ from those recorded in single‐arm trials, its implementation demands strict conditions that may be hard to meet. Consequently, our simulation study not only illustrates the method's potential benefits but also serves as a cautionary note against the limitations and risks inherent in single‐arm studies using external control groups with proxy outcomes.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Figure S1: Treatment effect absolute bias for different specificity (upper) and sensitivity (lower) values, for the uncorrected method and the outcome‐corrected model (i.e., corrected).
Figure S2: Treatment effect absolute bias for different sample sizes in study $S_2$ (upper) and study $S_3$ (lower), for the uncorrected method and the outcome‐corrected model (i.e., corrected).
Figure S3: Absolute bias of specificity (upper) and sensitivity (lower) estimated by the outcome‐corrected model for varying sample sizes of study $S_3$.
Data S1: Supporting information.
Acknowledgments
The authors would like to thank the Fédération Francophone de Cancérologie Digestive (FFCD) for providing the data for the PRODIGE‐11 trial.
Nourredine M., Gavoille A., Lepage C., Kassai‐Koupai B., Cucherat M., and Subtil F., “Accounting for Misclassification of Binary Outcomes in External Control Arm Studies for Unanchored Indirect Comparisons: Simulations and Applied Example,” Statistics in Medicine 44, no. 20‐22 (2025): e70236, 10.1002/sim.70236.
Funding: The authors received no specific funding for this work.
Data Availability Statement
The data that support the findings of this study are openly available in OSF at https://osf.io/bqs5t/.
References
- 1. Food and Drug Administration, HHS, "International Conference on Harmonisation; Choice of Control Group and Related Issues in Clinical Trials; Availability," Federal Register 66, no. 93 (2001): 24390–24391.
- 2. Zhang A. D., Puthumana J., Downing N. S., Shah N. D., Krumholz H. M., and Ross J. S., "Assessment of Clinical Trials Supporting US Food and Drug Administration Approval of Novel Therapeutic Agents, 1995–2017," JAMA Network Open 3, no. 4 (2020): e203284, 10.1001/jamanetworkopen.2020.3284.
- 3. Hatswell A. J., Baio G., Berlin J. A., Irs A., and Freemantle N., "Regulatory Approval of Pharmaceuticals Without a Randomised Controlled Study: Analysis of EMA and FDA Approvals 1999–2014," BMJ Open 6, no. 6 (2016): e011666, 10.1136/bmjopen-2016-011666.
- 4. Tenhunen O., Lasch F., Schiel A., and Turpeinen M., "Single‐Arm Clinical Trials as Pivotal Evidence for Cancer Drug Approval: A Retrospective Cohort Study of Centralized European Marketing Authorizations Between 2010 and 2019," Clinical Pharmacology and Therapeutics 108, no. 3 (2020): 653–660, 10.1002/cpt.1965.
- 5. Davi R., Mahendraratnam N., Chatterjee A., Dawson C. J., and Sherman R., "Informing Single‐Arm Clinical Trials With External Controls," Nature Reviews. Drug Discovery 19, no. 12 (2020): 821–822, 10.1038/d41573-020-00146-5.
- 6. Phillippo D., Ades A., Dias S., Palmer S., Abrams K., and Welton N., NICE DSU Technical Support Document 18: Methods for Population‐Adjusted Indirect Comparisons in Submission to NICE (National Institute for Health and Care Excellence, 2016).
- 7. Serret‐Larmande A., Zenati B., Dechartres A., Lambert J., and Hajage D., "A Methodological Review of Population‐Adjusted Indirect Comparisons Reveals Inconsistent Reporting and Suggests Publication Bias," Journal of Clinical Epidemiology 163 (2023): 1–10, 10.1016/j.jclinepi.2023.09.004.
- 8. Sultana N. and Ren S., "Review of Methods Used to Estimate Treatment Effects Against Relevant Comparators Using Evidence From Single‐Arm Studies in NICE Single Technology Appraisals," Value in Health 25, no. 12 (2022): S10, 10.1016/j.jval.2022.09.056.
- 9. Chatton A., Le Borgne F., Leyrat C., et al., "G‐Computation, Propensity Score‐Based Methods, and Targeted Maximum Likelihood Estimator for Causal Inference With Different Covariates Sets: A Comparative Simulation Study," Scientific Reports 10, no. 1 (2020): 9219, 10.1038/s41598-020-65917-x.
- 10. Schuler M. S. and Rose S., "Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies," American Journal of Epidemiology 185, no. 1 (2017): 65–73, 10.1093/aje/kww165.
- 11. Naimi A. I., Cole S. R., and Kennedy E. H., "An Introduction to G Methods," International Journal of Epidemiology 46, no. 2 (2017): 756–792, 10.1093/ije/dyw323.
- 12. Faria R., Alava M. H., Manca A., and Wailoo A. J., "NICE DSU Technical Support Document 17: The Use of Observational Data to Inform Estimates of Treatment Effectiveness in Technology Appraisal: Methods for Comparative Individual Patient Data," https://www.sheffield.ac.uk/sites/default/files/2022‐02/TSD17‐DSU‐Observational‐data‐FINAL.pdf.
- 13. Signorovitch J. E., Sikirica V., Erder M. H., et al., "Matching‐Adjusted Indirect Comparisons: A New Tool for Timely Comparative Effectiveness Research," Value in Health 15, no. 6 (2012): 940–947, 10.1016/j.jval.2012.05.004.
- 14. Caro J. J. and Ishak K. J., "No Head‐To‐Head Trial? Simulate the Missing Arms," PharmacoEconomics 28, no. 10 (2010): 957–967, 10.2165/11537420-000000000-00000.
- 15. Phillippo D. M., Dias S., Ades A. E., and Welton N. J., "Assessing the Performance of Population Adjustment Methods for Anchored Indirect Comparisons: A Simulation Study," Statistics in Medicine 39, no. 30 (2020): 4885–4911, 10.1002/sim.8759.
- 16. Jiang Y. and Ni W., "Performance of Unanchored Matching‐Adjusted Indirect Comparison (MAIC) for the Evidence Synthesis of Single‐Arm Trials With Time‐To‐Event Outcomes," BMC Medical Research Methodology 20, no. 1 (2020): 241, 10.1186/s12874-020-01124-6.
- 17. Remiro‐Azócar A., Heath A., and Baio G., "Methods for Population Adjustment With Limited Access to Individual Patient Data: A Review and Simulation Study," Research Synthesis Methods 12, no. 6 (2021): 750–775, 10.1002/jrsm.1511.
- 18. Weber D., Jensen K., and Kieser M., "Comparison of Methods for Estimating Therapy Effects by Indirect Comparisons: A Simulation Study," Medical Decision Making 40, no. 5 (2020): 644–654, 10.1177/0272989X20929309.
- 19. Hatswell A. J., Freemantle N., and Baio G., "The Effects of Model Misspecification in Unanchored Matching‐Adjusted Indirect Comparison: Results of a Simulation Study," Value in Health 23, no. 6 (2020): 751–759, 10.1016/j.jval.2020.02.008.
- 20. Ren S., Ren S., Welton N. J., and Strong M., "Advancing Unanchored Simulated Treatment Comparisons: A Novel Implementation and Simulation Study," Research Synthesis Methods 15, no. 4 (2024): 657–670, 10.1002/jrsm.1718.
- 21. Degtiar I. and Rose S., "A Review of Generalizability and Transportability," Annual Review of Statistics and Its Application 10, no. 1 (2023): 501–524, 10.1146/annurev-statistics-042522-103837.
- 22. Dreyer N. A., Hall M., and Christian J. B., "Modernizing Regulatory Evidence With Trials and Real‐World Studies," Therapeutic Innovation & Regulatory Science 54, no. 5 (2020): 1112–1115, 10.1007/s43441-020-00131-5.
- 23. Monti S., Grosso V., Todoerti M., and Caporali R., "Randomized Controlled Trials and Real‐World Data: Differences and Similarities to Untangle Literature Data," Rheumatology 57, no. Supplement_7 (2018): vii54–vii58, 10.1093/rheumatology/key109.
- 24. Eisenhauer E. A., Therasse P., Bogaerts J., et al., "New Response Evaluation Criteria in Solid Tumours: Revised RECIST Guideline (Version 1.1)," European Journal of Cancer 45, no. 2 (2009): 228–247, 10.1016/j.ejca.2008.10.026.
- 25. Carroll R. J., Ruppert D., Stefanski L. A., and Crainiceanu C. M., Measurement Error in Nonlinear Models: A Modern Perspective, 2nd ed. (Chapman and Hall/CRC, 2006), 10.1201/9781420010138.
- 26. Keogh R. H., Shaw P. A., Gustafson P., et al., "STRATOS Guidance Document on Measurement Error and Misclassification of Variables in Observational Epidemiology: Part 1-Basic Theory and Simple Methods of Adjustment," Statistics in Medicine 39, no. 16 (2020): 2197–2231, 10.1002/sim.8532.
- 27. Nab L., Van Smeden M., Keogh R. H., and Groenwold R. H. H., "Mecor: An R Package for Measurement Error Correction in Linear Regression Models With a Continuous Outcome," Computer Methods and Programs in Biomedicine 208 (2021): 106238, 10.1016/j.cmpb.2021.106238.
- 28. Yi G. Y., Statistical Analysis With Measurement Error or Misclassification (Springer, 2017), 10.1007/978-1-4939-6640-0.
- 29. Gerlach R. and Stamey J., "Bayesian Model Selection for Logistic Regression With Misclassified Outcomes," Statistical Modelling 7, no. 3 (2007): 255–273, 10.1177/1471082X0700700303.
- 30. Daniel Paulino C., Soares P., and Neuhaus J., "Binomial Regression With Misclassification," Biometrics 59, no. 3 (2003): 670–675, 10.1111/1541-0420.00077.
- 31. Lyles R. H., Tang L., Superak H. M., et al., "Validation Data‐Based Adjustments for Outcome Misclassification in Logistic Regression: An Illustration," Epidemiology 22, no. 4 (2011): 589–597, 10.1097/EDE.0b013e3182117c85.
- 32. Llovet J. M., Hilgard P., de Oliveira A. C., et al., "Sorafenib in Advanced Hepatocellular Carcinoma," New England Journal of Medicine 359, no. 4 (2008): 378–390.
- 33. Jouve J. L., Lecomte T., Bouché O., et al., "Pravastatin Combination With Sorafenib Does Not Improve Survival in Advanced Hepatocellular Carcinoma," Journal of Hepatology 71, no. 3 (2019): 516–522, 10.1016/j.jhep.2019.04.021.
- 34. Morris T. P., White I. R., and Crowther M. J., "Using Simulation Studies to Evaluate Statistical Methods," Statistics in Medicine 38, no. 11 (2019): 2074–2102, 10.1002/sim.8086.
- 35. R Core Team, "R: A Language and Environment for Statistical Computing," 2013.
- 36. Kibuchi E., Sturgis P., Durrant G. B., and Maslovskaya O., "The Efficacy of Propensity Score Matching for Separating Selection and Measurement Effects Across Different Survey Modes," Journal of Survey Statistics and Methodology 12, no. 3 (2024): 764–789, 10.1093/jssam/smae017.
- 37. Shaw P. A., Gustafson P., Carroll R. J., et al., "STRATOS Guidance Document on Measurement Error and Misclassification of Variables in Observational Epidemiology: Part 2—More Complex Methods of Adjustment and Advanced Topics," Statistics in Medicine 39, no. 16 (2020): 2232–2263, 10.1002/sim.8531.
- 38. Greenland S., Pearl J., and Robins J. M., "Confounding and Collapsibility in Causal Inference," Statistical Science 14 (1999): 29–46, 10.1214/ss/1009211805.
- 39. Colnet B., Josse J., Varoquaux G., and Scornet E., "Risk Ratio, Odds Ratio, Risk Difference… Which Causal Measure Is Easier to Generalize?," 2023, http://arxiv.org/abs/2303.16008.
- 40. Colnet B., Josse J., Varoquaux G., and Scornet E., "Re‐Weighting the Randomized Controlled Trial for Generalization: Finite‐Sample Error and Variable Selection," Journal of the Royal Statistical Society. Series A, Statistics in Society 188, no. 2 (2025): 345–372, 10.1093/jrsssa/qnae043.
- 41. Popat S., Liu S. V., Scheuer N., et al., "Addressing Challenges With Real‐World Synthetic Control Arms to Demonstrate the Comparative Effectiveness of Pralsetinib in Non‐Small Cell Lung Cancer," Nature Communications 13, no. 1 (2022): 3500, 10.1038/s41467-022-30908-1.