2018 Jul 20;28(9):2738–2753. doi: 10.1177/0962280218786526

A novel measure of drug benefit–risk assessment based on Scale Loss Score

Gaelle Saint-Hilary 1, Veronique Robert 2, Mauro Gasparini 1, Thomas Jaki 3, Pavel Mozgunov 3
PMCID: PMC6728751  PMID: 30025499

Abstract

Quantitative methods have been proposed to assess and compare the benefit–risk balance of treatments. Among them, multicriteria decision analysis (MCDA) is a popular decision tool because it summarises the benefits and the risks of a drug in a single utility score while accounting for the preferences of the decision-makers. However, the utility score is often derived using a linear model, which might lead to counter-intuitive conclusions; for example, drugs with no benefit or extreme risk could be recommended. Moreover, the linear model assumes that the relative importance of benefits against risks is constant for all levels of benefit or risk, which might not hold for all drugs. We propose the Scale Loss Score (SLoS) as a new tool for benefit–risk assessment. It offers the same advantages as the linear MCDA utility score but, in addition, avoids recommending non-effective or extremely unsafe treatments, and tolerates larger increases in risk for a given increase in benefit when the amount of benefit is small than when it is high. We present an application to a real case study on telithromycin in Community Acquired Pneumonia and Acute Bacterial Sinusitis, and we investigate the patterns of behaviour of SLoS, as compared to linear MCDA, in a comprehensive simulation study.

Keywords: Benefit–risk, bounds penalisation, decision-making, loss score, multicriteria decision analysis

1 Introduction

A drug benefit–risk assessment consists of balancing the drug’s favourable therapeutic effects against the adverse reactions it may induce.1 The benefit–risk balance is a strong predictor of the therapy’s long-term viability and a key element for decision-making during the drug’s development, the regulatory approval process, and post-marketing follow-up.2–4 For many years, a qualitative description of the evidence was the main approach to establishing a drug’s profile.5,6 This approach, however, tends to lack transparency, since the decision to adopt (or drop) a drug is based on a large amount of data coming from different sources and on criteria that can vary between experts. Structured frameworks and quantitative methodologies have recently been proposed to make benefit–risk assessment more comprehensive and consistent.7–12

According to the European Medicines Agency Benefit–Risk Methodology Project,6,13–15 one of the most comprehensive quantitative approaches is multicriteria decision analysis (MCDA).16–19 It has also been recommended by several high-profile expert groups, e.g. IMI PROTECT Work Package 5.20 The main idea of MCDA is to calculate a single utility score from multiple criteria, taking into account the importance of each criterion. While non-linear forms of the utility score are recognised in various application areas of MCDA,21,22 a linear aggregation of a treatment’s effects on benefits and risks remains the most common choice in drug development.17,23–26 The major advantage of the linear model is its intuitive interpretation: poor efficacy can be compensated by good safety, and vice versa. However, the linear utility score can result in the recommendation of highly unsafe or poorly effective drugs27,28 and, consequently, in counter-intuitive conclusions. Moreover, linearity implies that the increase in toxicity tolerated for a given gain in benefit is the same at all levels of benefit. This leads to implicit assumptions about decision-makers’ preferences which might not hold for all drugs. These pitfalls can be avoided, first, by adopting good practices to ensure that the modelling approach makes sense,27 and, second, by using non-additive and non-linear models.12,29 The main objectives of this work are to illustrate explicitly the issues of the additive linear MCDA model through a comprehensive simulation study, and to provide an alternative approach for aggregating a treatment’s effects that overcomes these issues, namely the Scale Loss Score. The proposed approach is based on recent developments in the theory of estimation in restricted parameter spaces30,31 and is shown to be sound in the context of drug evaluation.

The case study of telithromycin (Ketek®) raises questions regarding the suitability of a linear MCDA utility score for the drug benefit–risk assessment. Telithromycin was approved for the treatment of infections in several indications in 2001 by the EMA32 and in 2005 by the FDA.33 It was (qualitatively) reassessed in 2006–2007 by both agencies based on updated safety data. In particular, some serious visual adverse reactions, syncopes and acute liver failures have been reported. The terms of the marketing authorisations were varied in order to better describe the drug safety profile, and two indications were removed from the labeling by the FDA, among them Acute Bacterial Sinusitis (ABS). More recently, the IMI PROTECT Benefit–Risk Group34 applied MCDA to this clinical example. Even if this assessment was performed solely for the purpose of testing the methodology, the main results indicated a fairly strong superiority of telithromycin versus the comparators in ABS, which is not consistent with the concerns expressed by the health authorities. Consequently, alternative methods more accurately reflecting decision-makers’ preferences are of great interest.

In this work, we extend the assumption of non-linearity of preferences, which is well established in other fields such as microeconomics or ecology,21,35,36 to the drug development context. We advocate two properties that a desirable measure of drug benefit–risk assessment should have:

  1. Decreasing level of risk tolerance relative to benefits: a larger increase in risk can be tolerated when benefit improves from ‘very low’ to ‘moderate’ than when it improves from ‘moderate’ to ‘very high’.

  2. Non-effective and/or extremely unsafe treatments should never be recommended.

Motivated by recent developments in the theory of weighted information measures37,38 and in the theory of estimation in restricted parameter spaces,31 we propose the Scale Loss Score (SLoS) as a novel measure for benefit–risk assessment that has both of these properties. The first property is achieved through convex preferences between efficacy and safety, and the second through a strong penalisation of extremely low benefit and extremely high risk values.

We perform a comprehensive simulation study investigating the performance of SLoS and MCDA in many different scenarios. To our knowledge, this is the first time the properties of MCDA have been systematically explored by simulation in the medical context. We also apply the new measure to the motivating clinical example of telithromycin. The elicitation of criterion weights for linear MCDA utility scores is widely discussed in the literature.11,12,39–45 We therefore provide an algorithm for mapping MCDA weights to SLoS weights, so that the same elicitation process can be followed while preserving the interpretation of the weights.

The rest of the paper is organised as follows. The MCDA utility score and the novel measure are detailed in Section 2. Section 3 describes the application of both measures in the real case study (telithromycin). We present a simulation study in Section 4 and conclude with discussion in Section 5. Additional information including source code to reproduce the results may be found in the Supplemental Material.

2 Methods

2.1 MCDA utility score

The original proposal of MCDA16,17 ignores the uncertainty of parameter estimates. As this uncertainty can carry crucial information, an extension of MCDA taking into account the variability of the estimates was proposed by Waddingham et al.46 This approach is often called Probabilistic MCDA (or Stochastic MCDA) and is described below.

2.1.1 Utility score

Consider m treatments (indexed by i) which are assessed on n criteria (indexed by j). We adopt the following notation:

  1. $\xi_{ij}$ is the performance of treatment $i$ on criterion $j$, so that treatment $i$ is characterised by the vector $\xi_i = (\xi_{i1}, \ldots, \xi_{in})$.

  2. The monotonically increasing partial value functions $0 \le u_j(\cdot) \le 1$ are used to normalise the criterion performances. Let $\xi'_j$ and $\xi_j$ be the most and the least preferable values of criterion $j$; then $u_j(\xi_j) = 0$ and $u_j(\xi'_j) = 1$. The inequality $u_j(\xi_{ij}) > u_j(\xi_{hj})$ indicates that the performance of treatment $i$ is preferred to the performance of treatment $h$ on criterion $j$. In this work, we focus on linear partial value functions, one of the most common choices in drug benefit–risk assessment.11,16,23,25,46 They can be written as
    $$u_j(\xi_{ij}) = \frac{\xi_{ij} - \xi_j}{\xi'_j - \xi_j} \qquad (1)$$
  3. The weights indicating the relative importance of the criteria are known constants denoted by $w_j$, such that $\sum_{j=1}^{n} w_j = 1$. The vector of weights used for the analysis is denoted by $w = (w_1, \ldots, w_n)$.

The MCDA linear utility score is obtained as

$$u(\xi_i, w) := \sum_{j=1}^{n} w_j\, u_j(\xi_{ij}) \qquad (2)$$

The higher the utility score, the more favourable the benefit–risk balance. The comparison of treatments $i$ and $h$ is then based on

$$\Delta u(\xi_i, \xi_h, w) := u(\xi_i, w) - u(\xi_h, w)$$

While maximising utility is common in economics,36 the concept of a loss function is usually preferred in statistical decision theory and Bayesian analysis for parameter estimation.35 The complement of the MCDA utility score, $\bar{u}(\xi_i, w) = 1 - u(\xi_i, w)$, can be considered as a linear loss score to be minimised, and it can equivalently be used as a measure of discrimination.

Although, outside the health domain, the term ‘MCDA’ refers to the general methodology for summarising several characteristics in a single aggregated score, in this work we use ‘MCDA’ to denote the additive utility score with linear partial value functions, corresponding to the conventional model adopted so far in drug benefit–risk assessment.12

2.1.2 Estimation

Within a Bayesian approach, the utility score $u(\xi_i, w)$ is a random variable with a prior distribution. Given observed outcomes $x_i = (x_{i1}, \ldots, x_{in})$ and $x_h = (x_{h1}, \ldots, x_{hn})$ for treatments $i$ and $h$ (corresponding to the treatment performances $\xi_i$ and $\xi_h$, respectively), one can obtain the posterior distribution of $\Delta u(\xi_i, \xi_h, w)$. The inference is based on the complete posterior distribution, and the conclusion on the benefit–risk balance is supported by the probability that treatment $i$ has a greater utility score than treatment $h$:

$$P^{u}_{ih} = \mathbb{P}\left(\Delta u(\xi_i, \xi_h, w) > 0 \mid x_i, x_h\right) \qquad (3)$$

The probability (3) is used to guide a decision on adopting or dropping a drug. One way to formalise the decision based on this probability is to compare it to a threshold confidence level $0.5 \le \psi \le 1$. Then, $P^{u}_{ih} > \psi$ means that there is enough evidence to say that treatment $i$ has a better benefit–risk balance than $h$ with a level of confidence $\psi$. Note that $P^{u}_{ih} = 0.5$ corresponds to the case where the benefit–risk profiles of $i$ and $h$ are equal according to MCDA.
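As a rough illustration of equations (1)–(3), the sketch below estimates $P^{u}_{ih}$ by Monte Carlo. It assumes independent binary criteria with uniform Beta(1, 1) priors, so that each posterior is Beta(1 + events, 1 + n − events), uses NumPy, and works on hypothetical event counts; the function names are illustrative and are not taken from the paper's Supplemental Material.

```python
import numpy as np


def posterior_draws(events, n, K, rng):
    """K posterior draws per criterion, assuming independent binary criteria
    with uniform Beta(1, 1) priors: Beta(1 + events, 1 + n - events)."""
    events = np.asarray(events, dtype=float)
    n = np.asarray(n, dtype=float)
    return rng.beta(1 + events, 1 + n - events, size=(K, events.size))


def mcda_utility(xi, weights, most, least):
    """Linear partial value functions (equation (1)) aggregated by weights (equation (2))."""
    u = (xi - least) / (most - least)  # column j holds u_j(xi_ij) for every draw
    return u @ weights


def prob_mcda_better(x_i, x_h, n_i, n_h, weights, most, least, K=50_000, seed=0):
    """Monte Carlo estimate of P^u_ih = P(Delta u > 0 | x_i, x_h), equation (3)."""
    rng = np.random.default_rng(seed)
    u_i = mcda_utility(posterior_draws(x_i, n_i, K, rng), weights, most, least)
    u_h = mcda_utility(posterior_draws(x_h, n_h, K, rng), weights, most, least)
    return float(np.mean(u_i - u_h > 0))


# Hypothetical example: one benefit (responders) and one risk (patients with an AE),
# 100 patients per arm and equal weights.
weights = np.array([0.5, 0.5])
most = np.array([1.0, 0.0])   # most preferred values: 100% response, 0% AE
least = np.array([0.0, 1.0])  # least preferred values: 0% response, 100% AE
p = prob_mcda_better([60, 20], [50, 30], [100, 100], [100, 100], weights, most, least)
psi = 0.8
print(f"P^u_ih = {p:.3f}; recommend i over h at psi = {psi}: {p > psi}")
```

Feeding both arms the same counts gives an estimate close to 0.5, matching the remark that $P^{u}_{ih} = 0.5$ corresponds to equal benefit–risk profiles.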

2.1.3 Weight elicitation

Weighting is a structured way to capture the stakeholders’ preferences between the criteria. It is recognised as a complex problem since it involves both clinical and societal value judgments.41 Methods for quantifying subjective preferences have been widely studied in the literature,11,12,39,40,42,43,45 among which Discrete Choice Experiment and Swing-Weighting appeared to be appropriate in terms of theoretical foundations, cognitive burden, feasibility and robustness.16,44,47,48 In the MCDA framework, the weight assigned to one criterion is interpreted as a scaling factor which relates one increment on this criterion to increments on all other criteria.

2.1.4 MCDA illustration: two criteria

Let us consider an example with two criteria (one benefit indexed by 1 and one risk indexed by 2) to give insight into the linear utility score in equation (2). The utility score for treatment $i$ at fixed parameter values $\theta_{i1}, \theta_{i2}$ takes the form

$$u(\theta_{i1}, \theta_{i2}, w) := w\,u_1(\theta_{i1}) + (1 - w)\,u_2(\theta_{i2}) \qquad (4)$$

As $u_1(\theta_{i1}), u_2(\theta_{i2}) \in [0, 1]$, one can interpret $u_1(\theta_{i1})$ as a probability of benefit and $1 - u_2(\theta_{i2})$ as a probability of risk. The contours of equal linear loss score $\bar{u}(\theta_{i1}, \theta_{i2}, w) = 1 - u(\theta_{i1}, \theta_{i2}, w)$ for all values of $u_1(\theta_{i1})$ and $1 - u_2(\theta_{i2})$, using $w = 0.5$ (left panel) and $w = 0.25$ (right panel), are given in Figure 1.

Figure 1.

Left panel: contours of equal linear loss score $\bar{u}(\theta_{i1}, \theta_{i2}, w = 0.5)$. Right panel: contours of equal linear loss score $\bar{u}(\theta_{i1}, \theta_{i2}, w = 0.25)$.

Lower values of $\bar{u}(\theta_{i1}, \theta_{i2}, w)$ correspond to better drug benefit–risk profiles. The loss is minimised at the bottom-right corner, where the maximum possible benefit is reached ($u_1(\theta_{i1}) = 1$) with no risk ($1 - u_2(\theta_{i2}) = 0$). The contours of equation (4) are straight lines with constant slope $w/(1-w)$. This implies that if one treatment has a risk probability $x$ percentage points higher than another, its benefit probability must be $\frac{1-w}{w}x$ percentage points higher for the two treatments to have the same utility score. This holds for all values of benefit and risk. While the linear form of the utility score makes interpretation simple, it might lead to counter-intuitive conclusions. Below, we illustrate possible paradoxes for $w = 0.25$, i.e. when the risk is considered three times as important as the benefit.
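The constant trade-off stated above follows directly from equation (4). Writing $b = u_1(\theta_{i1})$ for the probability of benefit and $r = 1 - u_2(\theta_{i2})$ for the probability of risk (shorthand introduced only for this derivation), the linear loss score is

$$\bar{u}(\theta_{i1}, \theta_{i2}, w) = 1 - w\,b - (1 - w)(1 - r) = w(1 - b) + (1 - w)\,r.$$

Two treatments therefore have equal loss exactly when $w\,\Delta b = (1 - w)\,\Delta r$, that is

$$\Delta b = \frac{1 - w}{w}\,\Delta r, \qquad \frac{\mathrm{d}r}{\mathrm{d}b} = \frac{w}{1 - w},$$

whatever the starting values of $b$ and $r$. With $w = 0.25$, each additional percentage point of risk has to be offset by three percentage points of benefit, regardless of whether the benefit starts from a low or a high level.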

  1. The benefit–risk trade-off is the same for all values of the risk/benefit.

Consider two cases where, compared to another therapy, a drug increases the benefit probability (a) from 0.15 to 0.30 and (b) from 0.80 to 0.95. In case (a), the increase doubles the benefit probability, and a larger increase in toxicity can usually be tolerated. In case (b), the same absolute increase is relatively small, so it can be argued that only a smaller increase in the risk probability should be tolerated. However, the linear utility score implies that the same increase in risk is acceptable in exchange for the benefit gain in both cases.

  2. Drugs with 0% benefit or 100% risk can be recommended.

Consider the first example in Table 1: drug 1, which cannot treat patients and only causes adverse events, would be preferred, while drug 2, which adds 11 percentage points of toxicity but 30 percentage points of efficacy, would not be chosen. Similarly, in the second example in Table 1, drug 1, which leads to an adverse event in 100% of patients, would be preferred.

Table 1.

Examples of MCDA linear utility scores with two criteria and w = 0.25.

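| | Example 1: Drug 1 | Example 1: Drug 2 | Example 2: Drug 1 | Example 2: Drug 2 |
|---|---|---|---|---|
| Benefit: $u_1(\theta_{i1})$ | 0.00 | 0.30 | 0.96 | 0.50 |
| Risk: $1 - u_2(\theta_{i2})$ | 0.09 | 0.20 | 1.00 | 0.85 |
| Utility score: $u(\theta_{i1}, \theta_{i2}, w = 0.25)$ | 0.6825 | 0.6750 | 0.2400 | 0.2375 |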

Even if none of these drugs is likely to be brought to market, the goal of MCDA is to rank treatments, and these examples reveal counter-intuitive conclusions to which MCDA can lead. Note that increasing $w$ would help to solve the paradox in Example 1 but would worsen it in Example 2, while decreasing $w$ would have the opposite effect.
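The utilities in Table 1 and the effect of changing $w$ can be checked with a few lines of arithmetic. The sketch below (plain Python; the helper names are illustrative) recomputes equation (4) from the tabulated benefit and risk values and, because $\Delta u$ is linear in $w$, solves $\Delta u = 0$ for the weight at which each ranking flips.

```python
def linear_utility(benefit, risk, w):
    """Equation (4) with u1 = benefit probability and u2 = 1 - risk probability."""
    return w * benefit + (1 - w) * (1 - risk)


def flip_weight(b1, r1, b2, r2):
    """Weight w at which drugs 1 and 2 have equal linear utility.
    Delta u(w) = w*(b1 - b2) + (1 - w)*(r2 - r1) is linear in w; solve Delta u = 0."""
    return (r1 - r2) / ((b1 - b2) + (r1 - r2))


table1 = {  # (benefit, risk) for drug 1 and drug 2, taken from Table 1
    "Example 1": [(0.00, 0.09), (0.30, 0.20)],
    "Example 2": [(0.96, 1.00), (0.50, 0.85)],
}
for name, ((b1, r1), (b2, r2)) in table1.items():
    u_1, u_2 = linear_utility(b1, r1, 0.25), linear_utility(b2, r2, 0.25)
    print(f"{name}: u(drug 1) = {u_1:.4f}, u(drug 2) = {u_2:.4f}, "
          f"equal utility at w = {flip_weight(b1, r1, b2, r2):.3f}")
```

Drug 1 is preferred in Example 1 whenever $w$ is below about 0.268, and in Example 2 whenever $w$ is above about 0.246, so for these two examples no single linear weight avoids both paradoxes.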

We advocate two properties of a benefit–risk analysis measure:

  1. for a given increase in benefit, one can tolerate a larger increase in risk if the amount of benefit is small than if it is high, and

  2. the level of risk (benefit) is irrelevant if the drug treats no patients (harms all patients).

Formally, these properties correspond to (i) the concavity of equal loss score contours (or, equivalently, the convexity of equal utility score contours) and (ii) a strong penalisation of extremely low benefit values and extremely high risk values. We stress that convexity of utility (concavity of loss) is widely advocated in microeconomics and is believed to reflect preferences more adequately than linearity in many applications.21,36

One can check that, owing to its linearity, MCDA satisfies neither of these properties. There are two forms of linearity in MCDA: in the partial value functions and in the utility score. Note that property (i), a decreasing level of risk tolerance relative to benefits, can be achieved by varying the shape of the partial value functions (for instance, using concave functions for benefit and linear functions for risk). However, explicitly eliciting non-linear forms for the partial value functions may be challenging. As the linear partial value function remains a common choice in drug benefit–risk assessment, we propose a novel aggregation measure that achieves both properties even with linear partial value functions.

2.2 Scale loss score

2.2.1 Derivation

As an alternative to the linear MCDA utility score (equation (2)), we define SLoS for aggregating a treatment’s performances as

$$l(\xi_i, w) := \sum_{j=1}^{n} \left(\frac{1}{u_j(\xi_{ij})}\right)^{w_j} \qquad (5)$$

where $w_j$ is the weight indicating the average relative importance of criterion $j$ compared to the others and $u_j(\cdot)$ is a linear partial value function (equation (1)). The form of SLoS is motivated by the scale symmetric loss function31,49 and the precautionary loss function.30 These loss functions penalise values approaching the ‘boundary’ $u_j(\cdot) = 0$ increasingly heavily. In the context of benefit–risk assessment, such boundary values correspond to an extremely undesirable performance of a drug: low benefit or high risk. SLoS can be interpreted as a divergence between the characteristics $\xi_i$ of drug $i$ and the ‘perfect’ benefit–risk characteristics, which have partial values $(1, \ldots, 1)_{n \times 1}$. As a loss score is used rather than a utility score, lower values of $l(\xi_i, w)$ correspond to more desirable performances of the drug.

Clearly, $l(\xi', w)$ is minimised for $\xi'$ such that $u_j(\xi'_j) = 1$ for all $j = 1, \ldots, n$, i.e. at the point of the ideal benefit–risk profile. Additionally, $l(\xi^{(k)}, w) = +\infty$ for $\xi^{(k)}$ a vector of parameters containing $\xi_k$ such that $u_k(\xi_k) = 0$ for at least one $k \in \{1, \ldots, n\}$, so the loss score of a treatment with at least one extreme negative performance is infinite. The lower bounds are determined by the least preferred values $\xi_k$ used in the partial value functions, and correspond to unacceptable levels of benefit or risk. It should be noted that SLoS is intentionally sensitive to these unacceptable values, so their choice could have a non-negligible impact on the results. While unacceptable values of 0 (for benefit) or 1 (for risk) may be obvious choices for probabilities of a binary outcome, the unacceptable value for a continuous outcome may be more subjective and requires careful investigation.
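A minimal sketch of equation (5) for an arbitrary number of criteria is given below in plain Python. The function name `slos` is illustrative; treating any partial value at or below 0 as an infinite loss is an assumption of this sketch (with linear partial value functions, a performance beyond the least preferred value yields a negative $u_j$).

```python
import math


def slos(u, weights):
    """Scale Loss Score, equation (5): sum over criteria of (1 / u_j) ** w_j.

    u       -- partial values u_j(xi_ij), ideally in (0, 1]
    weights -- SLoS weights w_j (they need not sum to 1)
    Assumption of this sketch: any u_j <= 0 is an unacceptable performance
    and yields an infinite loss.
    """
    if any(uj <= 0 for uj in u):
        return math.inf
    return sum((1.0 / uj) ** wj for uj, wj in zip(u, weights))


weights = [0.5, 0.3, 0.2]
print(slos([1.0, 1.0, 1.0], weights))  # ideal profile: every term is 1, so the loss is n = 3
print(slos([0.8, 0.9, 0.6], weights))  # an intermediate profile: finite loss above 3
print(slos([0.8, 0.0, 0.9], weights))  # one criterion at its least preferred value: inf
```

Because each term $(1/u_j)^{w_j}$ is at least 1 on $(0, 1]$, the smallest attainable loss is $n$, reached only at the ideal profile.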

SLoS is a measure of the benefit–risk balance that discriminates between treatments according to their performances and to the weights attributed to the criteria. The lower the SLoS, the more favourable the benefit–risk balance, and the comparison of treatments $i$ and $h$ is based on

$$\Delta l(\xi_i, \xi_h, w) := l(\xi_i, w) - l(\xi_h, w)$$

2.2.2 Estimation

Similarly to MCDA, we consider a Bayesian model and assign a prior probability distribution to $\xi_{ij}$. Given the observed outcomes $x_i = (x_{i1}, \ldots, x_{in})$ and $x_h = (x_{h1}, \ldots, x_{hn})$ for treatments $i$ and $h$, one can obtain a posterior distribution of $\Delta l(\xi_i, \xi_h, w)$. Again, the inference is based on the complete posterior distribution, and the conclusion on the benefit–risk balance is supported by the probability that treatment $i$ has a smaller SLoS than treatment $h$:

$$P^{l}_{ih} = \mathbb{P}\left(\Delta l(\xi_i, \xi_h, w) < 0 \mid x_i, x_h\right)$$

This probability can be compared to a fixed confidence threshold ψ as in the MCDA approach.

2.2.3 SLoS illustration: two criteria

To illustrate the properties of SLoS, consider the example presented in Section 2.1.4 with one benefit and one risk. The SLoS for treatment $i$ at fixed parameter values takes the form

$$l(\theta_{i1}, \theta_{i2}, w) := \left(\frac{1}{u_1(\theta_{i1})}\right)^{w} + \left(\frac{1}{u_2(\theta_{i2})}\right)^{1-w} \qquad (6)$$

Figure 2 presents the contours of SLoS (equation (6)) for all pairs of $u_1(\theta_{i1})$ and $1 - u_2(\theta_{i2})$ using $w = 0.5$ (left panel) and $w = 0.25$ (right panel). The tangents of the contours at the point (0.5, 0.5) are shown on the graph for the purpose of the weight mapping detailed in the next section.

Figure 2.

Left panel: contours of l(θi1, θi2, w = 0.5). Right panel: contours of l(θi1, θi2, w = 0.25). Red lines correspond to tangents at the point (0.5,0.5).

SLoS is minimised when the benefit–risk balance of the drug is best, at the point (1, 0) (bottom-right corner), where the maximum possible benefit is reached with no risk. The loss score is infinite for extremely low benefit values and extremely high risk values, so non-effective or extremely unsafe treatments can never be recommended. For the cases presented in Table 1 (with $w = 0.25$), drug 2 has an SLoS of 2.53 in the first example and 5.34 in the second, and is therefore preferred to drug 1, whose SLoS is infinite in both cases.
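The SLoS values quoted for Table 1 can be reproduced from equation (6). The sketch below is plain Python; the helper name is illustrative, and it takes $u_1 = $ benefit probability and $u_2 = 1 - $ risk probability, as in Section 2.1.4, with $w = 0.25$.

```python
import math


def slos_two_criteria(benefit, risk, w):
    """Equation (6) with u1 = benefit probability and u2 = 1 - risk probability."""
    u1, u2 = benefit, 1.0 - risk
    if u1 <= 0 or u2 <= 0:
        return math.inf  # non-effective or extremely unsafe: infinite loss
    return (1.0 / u1) ** w + (1.0 / u2) ** (1.0 - w)


table1 = {  # (benefit, risk) for drug 1 and drug 2, taken from Table 1
    "Example 1": [(0.00, 0.09), (0.30, 0.20)],
    "Example 2": [(0.96, 1.00), (0.50, 0.85)],
}
for name, drugs in table1.items():
    scores = [slos_two_criteria(b, r, w=0.25) for b, r in drugs]
    print(name, [round(s, 2) for s in scores])
# Example 1 [inf, 2.53]
# Example 2 [inf, 5.34]
```

Unlike the linear utility, no choice of $w$ can make drug 1 preferable here, since its loss is infinite for every weight.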

The contour lines of equal loss are concave, which is equivalent to having convex preferences between additional benefit and avoided risk, and have the form

$$1 - u_2(\theta_{i2}) = 1 - \left(z - u_1(\theta_{i1})^{-w}\right)^{-\frac{1}{1-w}}, \quad \text{for } u_1(\theta_{i1}) > z^{-1/w}$$

for a fixed value $l(\theta_{i1}, \theta_{i2}, w) = z$. The slope of the tangent to the contour at a given $u_1(\theta_{i1})$ for the loss score value $z$ is

$$\frac{w}{1-w}\left(z - u_1(\theta_{i1})^{-w}\right)^{\frac{w-2}{1-w}} u_1(\theta_{i1})^{-(w+1)} \qquad (7)$$

The slope decreases as benefit increases: as $u_1(\theta_{i1})$ grows, $z - u_1(\theta_{i1})^{-w}$ grows and is raised to the negative power $\frac{w-2}{1-w}$, while $u_1(\theta_{i1})^{-(w+1)}$ shrinks. It follows that the relative importance of the benefit criterion over the risk criterion decreases with the amount of benefit itself. In other words, a larger increase in toxicity is tolerated if, in parallel, efficacy improves from ‘very low’ to ‘moderate’ than if it improves from ‘moderate’ to ‘very high’.
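A quick numerical check of the contour equation and of equation (7) is sketched below in plain Python; the helper names are illustrative. It takes the contour through (0.5, 0.5) for $w = 0.5$, whose level is $z = 2\sqrt{2}$, compares the analytic slope with a central finite difference of the contour, and shows the slope falling as benefit increases.

```python
def contour_risk(b, z, w):
    """Risk 1 - u2 on the SLoS contour l = z of equation (6), at benefit u1 = b.
    Valid for b > z ** (-1 / w)."""
    return 1.0 - (z - b ** (-w)) ** (-1.0 / (1.0 - w))


def contour_slope(b, z, w):
    """Slope of the tangent to the same contour at u1 = b, equation (7)."""
    return (w / (1.0 - w)) * (z - b ** (-w)) ** ((w - 2.0) / (1.0 - w)) * b ** (-(w + 1.0))


w = 0.5
z = 2 * 2 ** 0.5  # SLoS at u1 = u2 = 0.5 with w = 0.5: the contour through (0.5, 0.5)
h = 1e-6          # step for the central finite difference
benefits = (0.35, 0.5, 0.7, 0.9)
slopes = [contour_slope(b, z, w) for b in benefits]
for b, s in zip(benefits, slopes):
    numeric = (contour_risk(b + h, z, w) - contour_risk(b - h, z, w)) / (2 * h)
    print(f"u1 = {b:.2f}: slope (7) = {s:.4f}, finite difference = {numeric:.4f}")
```

The slope equals 1 at (0.5, 0.5), as expected for equal weights, and is more than three times larger at $u_1 = 0.35$ than at the midpoint.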

2.2.4 Weight elicitation

Since comprehensive work on weight elicitation for MCDA has been published and is ongoing, we present a way to map MCDA weights $w_j$ to SLoS weights $\tilde{w}_j$. Note that the slope of MCDA contour tangents is constant for all values of the parameters and defined by the weights $w_j$ only, while the slope of SLoS contour tangents is non-constant and defined by both $\tilde{w}_j$ and the values of the criteria. To map weights, we interpret $\tilde{w}_j$ as the average relative importance of each criterion over the others. With two criteria, the weight $\tilde{w}_j$ corresponding to the MCDA weight $w_j$ can be found by equating the slopes of the tangents of the MCDA and SLoS contours at the middle point $u_1(\theta_{i1}) = u_2(\theta_{i2}) = 0.5$ of treatment $i$ performances:

$$\frac{\tilde{w}_j}{1-\tilde{w}_j} \cdot 2^{2\tilde{w}_j - 1} = \frac{w_j}{1-w_j} \qquad (8)$$

where the slopes of the SLoS and MCDA contour tangents at this point are given on the left- and right-hand sides, respectively.
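The left-hand side of equation (8) follows from substituting the middle point into equation (7). Writing $\tilde{w}$ for the SLoS weight used in equation (6) and $w$ for the MCDA weight, the contour through $u_1(\theta_{i1}) = u_2(\theta_{i2}) = 0.5$ has level

$$z = 2^{\tilde{w}} + 2^{1-\tilde{w}}, \qquad z - 0.5^{-\tilde{w}} = 2^{1-\tilde{w}},$$

so that equation (7) evaluated at $u_1(\theta_{i1}) = 0.5$ gives

$$\frac{\tilde{w}}{1-\tilde{w}}\left(2^{1-\tilde{w}}\right)^{\frac{\tilde{w}-2}{1-\tilde{w}}} 0.5^{-(\tilde{w}+1)} = \frac{\tilde{w}}{1-\tilde{w}}\, 2^{\tilde{w}-2}\, 2^{\tilde{w}+1} = \frac{\tilde{w}}{1-\tilde{w}}\, 2^{2\tilde{w}-1}.$$

Equating this to the constant MCDA slope $w/(1-w)$ gives equation (8). At $\tilde{w} = 0.5$ the factor $2^{2\tilde{w}-1}$ equals 1, so $w = \tilde{w} = 0.5$ map onto each other; for $\tilde{w} < 0.5$ the factor is below 1, so $\tilde{w}$ must exceed $w$.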

The weight mapping (equation (8)) does not have an analytical solution, but an approximate value of $\tilde{w}_j$ can be obtained by a line search. The mapping of the weights is illustrated in Figure 3. The weights are the same when both criteria are considered equally important ($w = \tilde{w} = 0.5$), while $w < 0.5$ corresponds to slightly greater values of $\tilde{w}$. For instance, $\tilde{w} = 0.30$ corresponds to $w = 0.25$.

Figure 3.

Weight mapping.

For an arbitrary number of criteria, the mapping (equation (8)) can be applied to each of the MCDA weights. For instance, with five criteria and the weight vector $w = (0.30, 0.15, 0.15, 0.15, 0.25)$, the vector of SLoS weights is $\tilde{w} = (0.35, 0.21, 0.21, 0.21, 0.30)$. It should be noted that, in this case, the weights $\tilde{w}_j$ do not necessarily sum to 1, but this does not prevent SLoS from being calculated, as formula (5) still applies.

Mapping weights at the middle point of the benefit and risk performance range relies on the assumption that the MCDA weights were elicited across the entire range, or that the trade-off between criteria was anchored, on average, at the middle point. In practice, however, MCDA weights could have been elicited at any other point and extrapolated. In that case, the mapping procedure above can be adapted by finding the SLoS weight that equates the slopes of the MCDA and SLoS contour tangents at the point of interest.
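The line search mentioned above can be sketched with a simple bisection in plain Python; the function names and the tolerance are illustrative choices, not the authors' code. The slope helper rewrites equation (7) using $z - u_1^{-\tilde{w}} = u_2^{-(1-\tilde{w})}$, which gives $\frac{\tilde{w}}{1-\tilde{w}}\,u_2^{2-\tilde{w}}\,u_1^{-(\tilde{w}+1)}$; this reduces to the left-hand side of equation (8) at (0.5, 0.5) and also allows the anchor point to be moved, as discussed above.

```python
def slos_slope(w_tilde, b=0.5, u2=0.5):
    """Slope of the two-criterion SLoS contour tangent at partial values (u1, u2) = (b, u2):
    w/(1 - w) * u2**(2 - w) * b**(-(w + 1)), i.e. equation (7) rewritten using
    z - b**(-w) = u2**(-(1 - w)). At (0.5, 0.5) it equals w/(1 - w) * 2**(2w - 1)."""
    return w_tilde / (1.0 - w_tilde) * u2 ** (2.0 - w_tilde) * b ** (-(w_tilde + 1.0))


def map_weight(w, b=0.5, u2=0.5, tol=1e-12):
    """SLoS weight whose contour slope at (b, u2) equals the constant MCDA slope w/(1 - w).

    Bisection on (0, 1): for 0 < b, u2 <= 1 the SLoS slope increases with w_tilde,
    so the root is unique."""
    target = w / (1.0 - w)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slos_slope(mid, b, u2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(map_weight(0.25), 3))  # about 0.304, reported as 0.30 in the text
print([round(map_weight(w), 2) for w in (0.30, 0.15, 0.15, 0.15, 0.25)])
# [0.35, 0.21, 0.21, 0.21, 0.3]
```

Bisection is used only because the slope is monotone in $\tilde{w}$; any bracketing root finder would serve equally well.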

3 Case study: telithromycin

We illustrate the use of SLoS and MCDA in a real clinical context with the telithromycin (Ketek®) case study reported by the IMI PROTECT Benefit–Risk Group.34

Telithromycin was approved in 2001 for several indications as an alternative when beta-lactam antibiotics are not appropriate. We focus on the indications Community Acquired Pneumonia (CAP) and Acute Bacterial Sinusitis (ABS), as they illustrate well the similarities and differences between the two methods. A Probabilistic MCDA model was considered in the IMI PROTECT report34 (called Stochastic Multicriteria Acceptability Analysis with fixed weights), and the MCDA utility scores presented here are derived from that report.

Telithromycin is compared to a single alternative called ‘comparator’, which pools amoxicillin-clavulanic acid, cefuroxime and clarithromycin, the comparators used in the clinical studies. The probabilities of five binary criteria, one benefit and four adverse events (AE), were transformed using linear partial value functions (equation (1)) with the following most and least preferred probabilities of occurrence, $\xi'_j$ and $\xi_j$:34

  • Benefit: cure rate (CAP: $\xi'_1 = 1$, $\xi_1 = 0.4$; ABS: $\xi'_1 = 0.86$, $\xi_1 = 0.71$),

  • Risks:
    – Hepatic AE (CAP: $\xi'_2 = 0$, $\xi_2 = 0.1$; ABS: $\xi'_2 = 0$, $\xi_2 = 0.02$),
    – Cardiac AE (CAP: $\xi'_3 = 0$, $\xi_3 = 0.1$; ABS: $\xi'_3 = 0$, $\xi_3 = 0.01$),
    – Visual AE (CAP: $\xi'_4 = 0$, $\xi_4 = 0.1$; ABS: $\xi'_4 = 0$, $\xi_4 = 0.02$),
    – Syncope (CAP: $\xi'_5 = 0$, $\xi_5 = 0.1$; ABS: $\xi'_5 = 0$, $\xi_5 = 0.01$).

Using uniform priors and given the number of cures and AEs (see Supplemental Material, Table S1), the Beta posterior distributions of the event probabilities are approximated using 100,000 simulations in R,50 and are used to compute the corresponding distributions of the partial value functions. Means and 95% credible intervals (CrI) of the probabilities and of the partial value functions, together with the MCDA weights, are summarised in Table 2.

Table 2.

Mean and 95% CrI of the Beta posterior distributions of benefit and risk parameters and of corresponding partial value functions, with their MCDA weight, for Telithromycin (Teli.) and Comparator (Comp.).

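| Criterion (MCDA weight) | Quantity | CAP: Teli. | CAP: Comp. | ABS: Teli. | ABS: Comp. |
|---|---|---|---|---|---|
| Cure rate (30%) | $\xi_{i1}$, mean | 0.908 | 0.877 | 0.828 | 0.772 |
| | $\xi_{i1}$, 95% CrI | [0.896;0.919] | [0.855;0.897] | [0.800;0.855] | [0.715;0.824] |
| | $u_1(\xi_{i1})$, mean | 0.846 | 0.795 | 0.787 | 0.414 |
| | $u_1(\xi_{i1})$, 95% CrI | [0.827;0.864] | [0.759;0.829] | [0.601;0.964] | [0.036;0.760] |
| Hepatic AE (15%) | $\xi_{i2}$, mean | 0.044 | 0.042 | 0.011 | 0.004 |
| | $\xi_{i2}$, 95% CrI | [0.034;0.056] | [0.031;0.054] | [0.006;0.017] | [0.001;0.009] |
| | $u_2(\xi_{i2})$, mean | 0.561 | 0.582 | 0.468 | 0.789 |
| | $u_2(\xi_{i2})$, 95% CrI | [0.444;0.664] | [0.457;0.691] | [0.158;0.707] | [0.542;0.942] |
| Cardiac AE (15%) | $\xi_{i3}$, mean | 0.005 | 0.004 | 0.002 | 0.002 |
| | $\xi_{i3}$, 95% CrI | [0.002;0.01] | [0.001;0.009] | [0.000;0.004] | [0.000;0.006] |
| | $u_3(\xi_{i3})$, mean | 0.947 | 0.956 | 0.849 | 0.790 |
| | $u_3(\xi_{i3})$, 95% CrI | [0.902;0.979] | [0.909;0.985] | [0.579;0.982] | [0.414;0.974] |
| Visual AE (15%) | $\xi_{i4}$, mean | 0.011 | 0.004 | 0.013 | 0.005 |
| | $\xi_{i4}$, 95% CrI | [0.006;0.018] | [0.001;0.009] | [0.008;0.020] | [0.002;0.011] |
| | $u_4(\xi_{i4})$, mean | 0.887 | 0.956 | 0.357 | 0.736 |
| | $u_4(\xi_{i4})$, 95% CrI | [0.823;0.937] | [0.909;0.986] | [0.016;0.625] | [0.461;0.914] |
| Syncope (25%) | $\xi_{i5}$, mean | 0.002 | 0.004 | 0.001 | 0.002 |
| | $\xi_{i5}$, 95% CrI | [0.000;0.005] | [0.001;0.008] | [0.000;0.003] | [0.000;0.006] |
| | $u_5(\xi_{i5})$, mean | 0.977 | 0.964 | 0.924 | 0.789 |
| | $u_5(\xi_{i5})$, 95% CrI | [0.945;0.995] | [0.922;0.990] | [0.719;0.998] | [0.414;0.974] |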

This information was used to approximate the posterior distributions of the MCDA linear utility score and SLoS. The mapped SLoS weights corresponding to the MCDA weights are $\tilde{w} = (0.35, 0.21, 0.21, 0.21, 0.30)$.

For the CAP indication, MCDA and SLoS provide similar results, with probabilities that telithromycin is better than the comparator equal to 59% and 51%, respectively. These results indicate that telithromycin has a slightly better benefit–risk profile than the comparator, but with large uncertainty.

For the ABS indication, the probability that the benefit–risk balance of telithromycin is better than that of the comparator is 71% using MCDA and 55% using SLoS. While both results favour telithromycin, this advantage is much more uncertain with SLoS than with MCDA. The difference between the methods can mainly be explained by the higher rate of visual AE with telithromycin (1.3% versus 0.5%), which is close to the least preferred value for this criterion in this indication ($\xi_4 = 2\%$). This leads to low values of the corresponding partial value function (mean (95% CrI) of $u_4(\xi_{14})$: 0.36 [0.02; 0.63]), and values at the lower end of the distribution are strongly penalised by SLoS. At the same time, the mass of the comparator’s partial value function distribution (mean (95% CrI) of $u_4(\xi_{24})$: 0.74 [0.46; 0.91]) lies further from the bound, which results in a lower SLoS. A similar argument applies to hepatic AE, and the combination of these safety issues is penalised more by SLoS than by MCDA, despite the comparator’s worse cure rate. Even though the benefit–risk assessment by IMI PROTECT34 was performed to test the methodologies and may have been conducted differently in an actual regulatory context, it is worth noting that the conclusion obtained using SLoS for the ABS indication is more in line with the concerns expressed by the Committee for Medicinal Products for Human Use (CHMP) regarding the atypical safety profile of the drug51 and with the removal of this indication from the labeling by the FDA.33 This could be an example of SLoS reflecting decision-makers’ preferences more accurately than MCDA.

A sensitivity analysis was conducted using the MCDA weights directly in SLoS (omitting the weight mapping), and the conclusions are broadly robust, with probabilities of telithromycin being better than the comparator of 57% for CAP and 62% for ABS.

In the next section, we present a simulation study illustrating the properties of SLoS and MCDA in many different scenarios.

4 Simulation study

4.1 Setting

To investigate the performance of SLoS and MCDA, we simulated randomised controlled clinical trials with two treatments $i = 1, 2$, named T1 and T2, $N = 100$ patients per group, and two uncorrelated binary criteria ($j = 1$ for benefit and $j = 2$ for risk). We assume that benefit events are desirable (e.g. treatment response), while risk events should be avoided (e.g. adverse events), with the performance parameters $\xi_{ij}$ being their probabilities of occurrence. The partial value functions are defined as $u_1(\xi_{i1}) = \xi_{i1}$ and $u_2(\xi_{i2}) = 1 - \xi_{i2}$. Equally important criteria with weights $w_j = \tilde{w}_j = 0.5$, $j = 1, 2$, are considered.

The investigated scenarios are summarised in Table 3, where the expected probabilities of event $\theta_{ij}$ are presented for T1 (•) and T2 (♦). Nine sets of T1 characteristics are fixed. For each set, all combinations of T2 characteristics with $\theta_{21}, \theta_{22} \in \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9\}$ are considered. This results in 81 profiles for T2 and 729 cases in total: we explored the grid of treatment performances to identify the conditions under which MCDA and SLoS lead to different conclusions. For example, the first scenario corresponds to fixed expected probabilities of benefit and risk for treatment T1, $\theta_{11} = \theta_{12} = 0.5$, compared with all considered combinations of event probabilities for T2. In this scenario, we expect SLoS to recommend T1 more often than MCDA when T2 is associated with an extreme risk or no benefit. In the scenario where $\theta_{11} = \theta_{12} = 0.1$, T1 has almost no benefit, so it should not be recommended despite its good safety profile. Indeed, even if T1 does not harm patients (it is similar to a placebo), administering it implies the assumption that it has a positive effect, which it does not have in reality. This interpretation is close to the usual type I error, and is not acceptable from a regulatory and health economics perspective. Similarly, when $\theta_{11} = \theta_{12} = 0.9$, T1 should not be recommended despite its outstanding efficacy, as it is associated with an extreme risk. All intermediate cases are considered.

Table 3.

Simulation scenarios with two criteria.

[Scenario grid: probability of benefit $\theta_{i1}$ from 0.1 to 0.9 (columns) against probability of risk $\theta_{i2}$ from 0.9 to 0.1 (rows).]

•: treatment T1; ♦: treatment T2.

Let $S$ be the total number of simulated trials and $K$ the number of samples generated to approximate the distributions of interest. In each trial $s = 1, \ldots, S$, the number of events for each criterion was simulated from a binomial likelihood, $x_{ijs} \sim \mathrm{Bin}(N, \theta_{ij})$. Then, values $\xi_{ijsk}$ are sampled from the posterior distribution of the parameters, $\mathcal{B}(x_{ijs}, N - x_{ijs})$, for $k = 1, \ldots, K$, implicitly assuming an improper conjugate prior $\mathcal{B}(0, 0)$. The posterior distributions of the utility score and the loss score are approximated by the samples $u(\xi_{isk}, w)$ and $l(\xi_{isk}, w)$.

Assuming a threshold confidence level $\psi = 0.8$, MCDA and SLoS are compared using $\mathbb{P}[P^{u}_{1,2} > 0.8]$, $\mathbb{P}[P^{l}_{1,2} > 0.8]$ and $\varphi = \mathbb{P}[P^{l}_{1,2} > 0.8] - \mathbb{P}[P^{u}_{1,2} > 0.8]$. As a difference between probabilities, $\varphi$ ranges in $[-1, 1]$. A value $-1 \le \varphi < 0$ indicates that SLoS recommends treatment $T_1$ less often than MCDA, and a value $0 < \varphi \le 1$ indicates that SLoS recommends $T_1$ more often than MCDA. The two approaches agree when $\varphi = 0$. A similar analysis for $\mathbb{P}[P^{l}_{2,1} > 0.8]$ and $\mathbb{P}[P^{u}_{2,1} > 0.8]$ is presented in the Supplemental Material (Figure S1). Simulations with other choices of $\psi$ led to similar conclusions on the comparison between the two methods and are not presented here.
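The comparison criterion can be sketched end to end in Python with NumPy. This is an illustration under assumptions, not the authors' R implementation. $P^{u}_{1,2}$ is taken to be the posterior probability that $T_1$ has the higher utility, and $P^{l}_{1,2}$ the posterior probability that it has the lower loss. The linear partial value functions are assumed to be $\xi_{i1}$ (benefit) and $1 - \xi_{i2}$ (risk). The SLoS formula is defined in an earlier section not reproduced here, so `hypothetical_loss` is a labelled stand-in that merely diverges at the bounds. The trial size `n` is also hypothetical.

```python
import numpy as np

def posterior_draws(theta, n, k, rng):
    """x_j ~ Bin(n, theta_j), then k draws from Beta(x_j, n - x_j) (improper B(0, 0) prior)."""
    theta = np.asarray(theta, dtype=float)
    x = rng.binomial(n, theta)
    if np.any(x == 0) or np.any(x == n):
        raise ValueError("Beta(x, N - x) is improper for x = 0 or x = N")
    return rng.beta(x, n - x, size=(k, theta.size))

def partial_values(xi):
    # Assumed linear partial value functions: benefit -> xi_1, risk -> 1 - xi_2.
    return np.column_stack([xi[:, 0], 1.0 - xi[:, 1]])

def linear_utility(xi, w):
    return partial_values(xi) @ np.asarray(w, dtype=float)

def hypothetical_loss(xi, w):
    # HYPOTHETICAL stand-in for SLoS: sum_j (w_j / v_j)^2, diverges as any v_j -> 0.
    return np.sum((np.asarray(w, dtype=float) / partial_values(xi)) ** 2, axis=1)

def compare(theta1, theta2, w, n, S, K, psi=0.8, seed=0):
    """Return (P[P^u_12 > psi], P[P^l_12 > psi], phi), estimated over S simulated trials."""
    rng = np.random.default_rng(seed)
    hits_u = hits_l = 0
    for _ in range(S):
        xi1 = posterior_draws(theta1, n, K, rng)
        xi2 = posterior_draws(theta2, n, K, rng)
        # Posterior probability that T1 is preferred, from K paired draws.
        p_u = np.mean(linear_utility(xi1, w) > linear_utility(xi2, w))        # higher utility is better
        p_l = np.mean(hypothetical_loss(xi1, w) < hypothetical_loss(xi2, w))  # lower loss is better
        hits_u += int(p_u > psi)
        hits_l += int(p_l > psi)
    rate_u, rate_l = hits_u / S, hits_l / S
    return rate_u, rate_l, rate_l - rate_u  # phi > 0: the loss recommends T1 more often

if __name__ == "__main__":
    # Scenario 1 example from Section 4.2; S reduced from 2500 for speed, n hypothetical.
    print(compare((0.5, 0.5), (0.6, 0.7), w=(0.5, 0.5), n=100, S=500, K=2000))
```

Running the sketch prints the two recommendation rates and $\varphi$ for the scenario 1 example. The loss-based rate depends entirely on the stand-in and is not an SLoS result.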

The analyses were conducted in R, with $S = 2500$ simulated clinical trials and $K = 2000$ posterior samples per trial to approximate the parameter distributions.

4.2 Results

The results are presented in Figure 4. The nine scenarios for treatment $T_1$ are shown in rows, numbered 1 to 9. Each graph corresponds to fixed expected probabilities of event for treatment $T_1$ (•), and each cell corresponds to a combination of expected probabilities of benefit and risk for $T_2$. The probabilities $\mathbb{P}[P^{l}_{1,2} > 0.8]$ are shown in the left panel, $\mathbb{P}[P^{u}_{1,2} > 0.8]$ in the middle panel, and $\varphi$ in the right panel. In the right panel, positive values are displayed in blue, negative values in red, and zero values in white.

Figure 4.

Results of MCDA and SLoS performances in all simulation scenarios for two equally important criteria ($w_j = w'_j = 0.5$ for $j = 1, 2$). • = $T_1$. Left panel: $\mathbb{P}[P^{l}_{1,2} > 0.8]$. Middle panel: $\mathbb{P}[P^{u}_{1,2} > 0.8]$. Right panel: $\varphi = \mathbb{P}[P^{l}_{1,2} > 0.8] - \mathbb{P}[P^{u}_{1,2} > 0.8]$. Blue (resp. red) cells indicate that SLoS recommends $T_1$ more (resp. less) often than MCDA.

In scenario 1, the two measures agree in recommending $T_1$, which has moderate benefit and risk ($\theta_{11} = \theta_{12} = 0.5$), when $T_2$ has less benefit and more risk. On the diagonal, SLoS favours $T_1$ over more effective treatments with very high risk, and over safer treatments with very low benefit. In contrast, MCDA more often prefers to $T_1$ treatments that are more effective but highly unsafe, or safer but ineffective. For example, when $\theta_{21} = 0.8$ and $\theta_{22} = 0.9$ (large benefit but high risk), SLoS favours $T_1$ in 100% of the trials while MCDA recommends it in only 62%, giving $\varphi = 0.38$. Similarly, when $\theta_{21} = 0.6$ and $\theta_{22} = 0.7$ (benefit increased by 0.1 and risk by 0.2 compared to $T_1$), SLoS favours $T_1$ in 80% of the cases and MCDA in 57% ($\varphi = 0.23$). This reflects the property of SLoS that increases in risk are less tolerated when the amount of benefit is large enough. Similar patterns are observed in scenarios 2 and 3, where $T_1$ has either a low benefit and a large risk, or a large benefit and a low risk, but no extreme probabilities of event.
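The MCDA behaviour in these examples can be checked by hand. For this sketch only, assume linear partial value functions $\theta_{i1}$ for the benefit and $1 - \theta_{i2}$ for the risk, and evaluate the utility at the expected probabilities with equal weights:

```latex
\begin{aligned}
u(\theta_i, w) &= 0.5\,\theta_{i1} + 0.5\,(1 - \theta_{i2}) = 0.5 + 0.5\,(\theta_{i1} - \theta_{i2}),\\
T_1:\ (0.5, 0.5) &\mapsto 0.5 + 0.5\,(0.5 - 0.5) = 0.50,\\
T_2:\ (0.8, 0.9) &\mapsto 0.5 + 0.5\,(0.8 - 0.9) = 0.45,\\
T_2:\ (0.6, 0.7) &\mapsto 0.5 + 0.5\,(0.6 - 0.7) = 0.45.
\end{aligned}
```

At the expected values, the linear utility depends only on the difference $\theta_{i1} - \theta_{i2}$. Both alternatives therefore trail $T_1$ by the same 0.05, even though one of them carries a 0.9 probability of risk. The gap between the MCDA percentages (62% and 57%) then presumably reflects the different uncertainty of the estimates rather than different expected utilities.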

In scenario 4, $T_1$ has almost no benefit and almost no risk ($\theta_{11} = \theta_{12} = 0.1$). As expected, SLoS almost never recommends it. MCDA, however, can recommend it when the alternative $T_2$ has some benefit together with a larger increase in risk. For example, when $\theta_{21} = 0.2$ and $\theta_{22} = 0.3$ (benefit increased by 0.1 and risk by 0.2 compared to $T_1$), MCDA recommends $T_1$ in 70% of the cases while SLoS never does ($\varphi = -0.70$). This is consistent with the stated desirable property that the level of risk is of no interest if the drug does not treat the patients. On the other hand, when $\theta_{21} = 0.3$ and $\theta_{22} = 0.2$ (benefit increased by 0.2 and risk by 0.1 compared to $T_1$), SLoS discriminates between the treatments better and recommends $T_2$ in 100% of the cases, while MCDA recommends it in only 68% (Supplemental Material, Figure S1). Similar conclusions hold in scenario 5, where $T_1$ has both extreme efficacy and extreme risk ($\theta_{11} = \theta_{12} = 0.9$). SLoS never recommends the unsafe treatment $T_1$ if the alternative $T_2$ has lower risk and at least some small benefit. MCDA, in contrast, recommends $T_1$ over treatments whose decrease in benefit is larger than their decrease in risk. This is the case, for instance, when $\theta_{21} = 0.6$ and $\theta_{22} = 0.7$ (benefit decreased by 0.3 and risk by 0.2 compared to $T_1$): $T_1$ is recommended in 65% of the cases by MCDA and never by SLoS ($\varphi = -0.65$). When $\theta_{21} = 0.7$ and $\theta_{22} = 0.6$ (benefit decreased by 0.2 and risk by 0.3 compared to $T_1$), SLoS favours $T_2$ in 100% of the cases and MCDA in 67% (Supplemental Material, Figure S1).

Scenarios 6 and 7 correspond to a treatment $T_1$ with either low benefit and low risk ($\theta_{11} = \theta_{12} = 0.3$) or large benefit and large risk ($\theta_{11} = \theta_{12} = 0.7$), where the probabilities of event are not extreme. The measures agree in recommending $T_1$ when $T_2$ is indisputably worse. On the diagonal, SLoS recommends $T_1$ more often when $T_2$ has almost no benefit and almost no risk ($\theta_{21} = \theta_{22} = 0.1$), or very large benefit and risk ($\theta_{21} = \theta_{22} = 0.9$). Conversely, SLoS more strongly favours treatments whose benefit and risk probabilities are closer to 50%. For example, in scenario 6, when $\theta_{21} = 0.4$ and $\theta_{22} = 0.5$ (benefit increased by 0.1 and risk by 0.2 compared to $T_1$), SLoS recommends $T_1$ in only 17% of the cases, but MCDA in 59% ($\varphi = -0.42$). Similarly, when $\theta_{21} = 0.5$ and $\theta_{22} = 0.4$ (benefit increased by 0.2 and risk by 0.1 compared to $T_1$), neither method favours $T_1$. SLoS, however, recommends the alternative $T_2$ in 88% of the cases and MCDA in only 59% (Supplemental Material, Figure S1). Similar results are observed in scenario 7 for the same examples.
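The contrast between the two measures in scenarios 1, 4, 5 and 6 can be reproduced at the level of expected values. The sketch below is illustrative only. It assumes linear partial value functions ($\theta_{i1}$ for the benefit, $1 - \theta_{i2}$ for the risk). The SLoS formula is not reproduced in this section, so `hypothetical_loss` is a stand-in of our own that simply diverges when the benefit tends to 0 or the risk tends to 1. Each score is evaluated at the expected probabilities, so sampling uncertainty is ignored.

```python
# Expected-value comparison for the T1/T2 pairs quoted in Section 4.2.
W = (0.5, 0.5)  # equal weights, as in the main simulation

def linear_utility(theta, w=W):
    # Assumed linear partial value functions: benefit -> theta_1, risk -> 1 - theta_2.
    return w[0] * theta[0] + w[1] * (1.0 - theta[1])

def hypothetical_loss(theta, w=W):
    # HYPOTHETICAL stand-in, not the paper's SLoS: grows without bound
    # as the benefit tends to 0 or the risk tends to 1.
    return (w[0] / theta[0]) ** 2 + (w[1] / (1.0 - theta[1])) ** 2

EXAMPLES = {  # scenario: (T1 probabilities, T2 probabilities)
    "scenario 1": ((0.5, 0.5), (0.6, 0.7)),
    "scenario 4": ((0.1, 0.1), (0.2, 0.3)),
    "scenario 5": ((0.9, 0.9), (0.6, 0.7)),
    "scenario 6": ((0.3, 0.3), (0.4, 0.5)),
}

def gaps(t1, t2):
    """Return (utility gap, loss gap); positive values mean the score prefers T1."""
    du = linear_utility(t1) - linear_utility(t2)        # higher utility is better
    dl = hypothetical_loss(t2) - hypothetical_loss(t1)  # lower loss is better
    return du, dl

for name, (t1, t2) in EXAMPLES.items():
    du, dl = gaps(t1, t2)
    print(f"{name}: utility gap {du:+.3f}, loss gap {dl:+.3f}")
```

For every pair, the linear utility gap is +0.05 in favour of $T_1$. At expected values, MCDA therefore cannot tell scenario 1 from scenarios 4 to 6. The stand-in loss prefers $T_1$ only in scenario 1, in the same direction as the SLoS results above. This agreement does not show that the stand-in matches SLoS.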

In all scenarios, both methods agree in recommending $T_1$ when it is indisputably better than $T_2$, i.e. more effective and safer. They also agree in recommending $T_2$ when $T_1$ is indisputably worse, i.e. less effective and more toxic (Supplemental Material, Figure S1). This is well illustrated in scenarios 8 ($\theta_{11} = 0.1$, $\theta_{12} = 0.9$) and 9 ($\theta_{11} = 0.9$, $\theta_{12} = 0.1$). In scenario 8, MCDA discriminates slightly better among treatments with no efficacy or high risk, while SLoS penalises them equally, as they should not be recommended anyway.

Overall, both MCDA and SLoS perform well in discriminating the benefit–risk balance of the treatments. They lead to similar conclusions in many situations, and the cases where they differ highlight the two desirable properties of SLoS. Across all scenarios, SLoS recommends safer treatments than MCDA in half of the scenarios and less safe treatments in the other half.

4.3 Sensitivity analyses

While the analyses above consider equally important and uncorrelated criteria, we investigated the robustness of the results in the following cases:

  • Equally important criteria, $w_j = w'_j = 0.5$ for $j = 1, 2$, and strongly correlated criteria: $\rho = 0.8$ (positive correlation) and $\rho = -0.8$ (negative correlation).

  • More weight on the risk criterion, using MCDA weights $(w_1, w_2) = (0.25, 0.75)$ and mapped SLoS weights $(w'_1, w'_2) = (0.30, 0.70)$, with no correlation between the criteria.

  • More weight on the risk criterion, with $(w'_1, w'_2) = (w_1, w_2) = (0.25, 0.75)$ (no mapping), with no correlation between the criteria. Comparing this case with the previous one shows the impact of the weight mapping on the results.
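A short derivation shows what these weights imply for the linear utility. This sketch assumes linear partial value functions equal to $\theta_{i1}$ for the benefit and $1 - \theta_{i2}$ for the risk. Along a line of constant utility, the weighted gain in benefit must balance the weighted increase in risk:

```latex
u(\theta_i, w) = w_1\,\theta_{i1} + w_2\,(1 - \theta_{i2}),
\qquad
\Delta u = w_1\,\Delta\theta_{i1} - w_2\,\Delta\theta_{i2} = 0
\;\Longleftrightarrow\;
\Delta\theta_{i2} = \frac{w_1}{w_2}\,\Delta\theta_{i1}.
```

With $(w_1, w_2) = (0.5, 0.5)$, an increase of 0.1 in the probability of benefit offsets an increase of 0.1 in the probability of risk. With $(w_1, w_2) = (0.25, 0.75)$, it offsets only $0.1/3 \approx 0.033$. Under the linear utility, this exchange rate is the same at every level of benefit and risk. That is the constant trade-off that SLoS is designed to relax.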

The results of the sensitivity analyses are given in the Supplemental Material.

Both measures are robust to positive and negative correlations between the outcomes, with very similar results (Supplemental Material, Figures S2–S5). When an MCDA weight of 25% is given to the benefit, both measures penalise the risk more. The differences and similarities between them are analogous to those observed above (Supplemental Material, Figures S6–S7). Since the mapping is close to an identity transformation, omitting it has no major impact on the results (Supplemental Material, Figures S8–S9).

A simulation study was also conducted with four criteria (two benefits, $j = 1, 2$, and two risks, $j = 3, 4$). The investigated scenarios are summarised in the Supplemental Material, Table S1, under the following assumptions:

  • Equally important criteria with weights $w_j = w'_j = 0.25$ for $j = 1, \ldots, 4$, with no correlation between the criteria.

  • Equally important criteria with weights $w_j = w'_j = 0.25$ for $j = 1, \ldots, 4$, and correlated criteria (see the correlation matrices in the Supplemental Material).

  • More weight on the risk criteria, with MCDA weights $(w_1, w_2, w_3, w_4) = (0.10, 0.10, 0.40, 0.40)$ and mapped SLoS weights $(w'_1, w'_2, w'_3, w'_4) = (0.15, 0.15, 0.43, 0.43)$, with no correlation between the criteria.

  • More weight on the risk criteria, with $(w'_1, w'_2, w'_3, w'_4) = (w_1, w_2, w_3, w_4) = (0.10, 0.10, 0.40, 0.40)$ (no mapping), with no correlation between the criteria.

Similar conclusions can be drawn when comparing MCDA and SLoS with four criteria (Supplemental Material, Figures S10–S19). The simulation scenarios are, however, less straightforward to interpret, as the number of possible situations (low/moderate/high benefits and risks) increases.

Overall, for both measures, the conclusions are robust to correlations, the number of criteria, the weighting and the weight mapping.

5 Discussion

In this paper, we propose SLoS as a new tool for drug benefit–risk assessment. Like MCDA, it summarises the benefit–risk balance of the treatments in a single measure. It also has additional desirable properties: it avoids recommending non-effective or extremely unsafe treatments, and it tolerates larger increases in risk for a given increase in benefit when the amount of benefit is small than when it is large. In contrast, we have shown that the linear form of the MCDA utility score carries implicit assumptions on the part of the decision-makers, such as a constant benefit–risk trade-off for all values of benefit or risk. This form might therefore lead to counter-intuitive conclusions. These additive and linear properties have been shown to be inadequate in other application areas of MCDA,21,22 and their limitations in the health domain have been highlighted as well.27,28

The independence of the benefit and risk criteria is usually assumed for the sake of simplicity. Correlations could be taken into account in the analyses; however, our simulation study shows that both measures are robust to correlations between outcomes.

Importantly, SLoS penalises drugs with no efficacy, which is sensible for comparisons between active treatments. However, a 'no treatment/placebo' option, in the absence of a placebo effect, will most likely be strongly penalised by SLoS because of its lack of efficacy. It may nevertheless be preferable to an active treatment that has a small amount of efficacy but causes more harm overall. In such cases, MCDA's recommendations may be more reliable, and this should be carefully considered when choosing the method and when interpreting the results. Nevertheless, the area of application of SLoS remains large, as many drug comparisons involve a standard of care, or a placebo with non-negligible expected effects.52

The MCDA weights of the criteria should be elicited according to the preferences of the decision-makers (regulators, experts, patients, etc.), and methods have been proposed in the literature for this purpose.11,12,16,42,44,45,47,48 We propose a simple mapping to obtain SLoS weights from MCDA weights. The same elicitation process can then be followed while preserving the interpretation of the weights. The mapping is close to an identity transformation, and omitting it does not strongly affect the results. This paper considers fixed weights, but extended models have been proposed in which the weights are treated as random variables to account for uncertainty in their assignment.23,26

As an aggregation method involving multiple criteria, SLoS can be regarded as a member of the family of non-linear MCDA models. We have shown that SLoS has the desirable properties even under linear partial value functions, on which this work has focused exclusively. An intermediate approach between linear MCDA and SLoS could capture the decreasing tolerance of risk relative to benefit by varying the shape of the partial value functions. For instance, one could derive partial value functions, linearly weighted in the utility score, that exhibit the same degree of decreasing risk tolerance as SLoS. This, however, appears non-trivial and requires careful study. Furthermore, as stated above, the explicit elicitation of non-linear partial value functions may be difficult for project teams. Eliciting and interpreting the weights would also be more challenging, in particular if the shapes of the partial value functions differ from one criterion to another. Nevertheless, non-linear partial value functions, both within the additive utility score and within SLoS, are of great practical interest and remain to be investigated.

In many cases, SLoS and MCDA lead to similar conclusions, but SLoS shows clear advantages when treatments have no benefit or extreme risk. This situation is most likely to arise in early drug development, or at least before the marketing authorisation application. Treatments with no evidence of efficacy or with high toxicity are usually stopped before reaching that point. Until now, benefit–risk assessments have mainly been conducted at late stages by the sponsor and/or regulatory agencies. Initiating the assessment earlier is, however, recommended to better support internal decisions and discussions with health authorities about the development strategy.41 SLoS could therefore be used in early development and then updated through the subsequent phases and the regulatory process, up to post-marketing surveillance. This would ensure a transparent and consistent benefit–risk assessment throughout the drug life-cycle.

Supplementary Material

Supplementary material

Acknowledgements

The authors would like to thank both reviewers and the editor for insightful comments and suggestions that helped to improve the original work significantly. Veronique Robert is an employee of Institut de Recherches Internationales Servier (IRIS). The views expressed in this publication are those of the authors and not necessarily those of the National Health Service, the National Institute for Health Research or the Department of Health.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Gaelle Saint-Hilary’s research is supported by the Institut de Recherches Internationales Servier (IRIS). Pavel Mozgunov, Thomas Jaki and Mauro Gasparini have received funding from the European Union Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 633567. Thomas Jaki’s contribution arises in part from his Senior Research Fellowship (NIHR-SRF-2015-08-001) supported by the National Institute for Health Research.

Supplemental material

Supplemental material for this article is available online.

References


Articles from Statistical Methods in Medical Research are provided here courtesy of SAGE Publications
