Schizophrenia Bulletin. 2021 Aug 18;48(1):27–36. doi: 10.1093/schbul/sbab094

How Efficacious Are Antipsychotic Drugs for Schizophrenia? An Interpretation Based on 13 Effect Size Indices

Stefan Leucht 1,2,#, Spyridon Siafis 1,#, Rolf R Engel 3, Johannes Schneider-Thoma 1, Irene Bighelli 1, Andrea Cipriani 4,5, Toshi A Furukawa 6, John M Davis 7,8
PMCID: PMC8781341  PMID: 34405881

Abstract

Background

The magnitude of the superiority of antipsychotics over placebo is debated. One reason is that the effect-size index usually used in meta-analyses is expressed in standard deviation units. Many other indices exist, some of which are more intuitive.

Methods

We explain the formulae, advantages, and limitations of 13 effect-size indices: Mean Difference (MD), Standardized-Mean-Difference (SMD), Correlation Coefficient, Ratio-of-Means (RoM, endpoint and change data), Improvement Fraction (IF), Drug-Response Fraction (DRF), Minimally-Clinically-Important-Difference-Units (MCIDU), Number-Needed-to-Treat-derived from SMD (NNT), Odds Ratio (OR), Relative Risk (RR), and Risk Difference (RD) derived from SMD, Drug-response and Placebo-response in percent. We applied these indices to meta-analyses comparing antipsychotic drugs with placebo for acute schizophrenia.

Results

The difference of all antipsychotics pooled vs placebo (105 trials with 22741 participants) was: MD 9.4 (95% CI 8.4, 10.2) PANSS points, SMD 0.47 (0.42, 0.51), correlation coefficient 0.23 (0.21, 0.25), RoM endpoint 0.83 (0.81, 0.85), RoM change 1.94 (1.84, 2.02), IF (%) 94 (84, 102), DRF (%) 49 (46, 51), MCIDU 0.63 (0.56, 0.68), NNT 5 (5, 6), OR 2.34 (2.14, 2.52), RR 1.67 (1.59, 1.73), RD 20% (18, 22), and 50% (48, 52) improved on drug compared with 30% on placebo. Results of individual drugs compared with placebo are presented as well.

Conclusions

Taken together, these indices show a substantial, but not a large, superiority of antipsychotics compared with placebo. The general chronicity of the patients in the trials must be considered. Future meta-analyses should report other effect size indices in addition to the Standardized-Mean-Difference, in particular the percentage of responders in the drug and placebo groups. These can be easily derived and would enhance the interpretation of research findings.

Keywords: antipsychotics, schizophrenia, efficacy, effect size, standardised mean difference, minimally-clinically-important-difference

Introduction

While it is undisputed that antipsychotic drugs are effective for the acute treatment of schizophrenia, there is controversy about the magnitude of their difference compared to placebo.1–4 One reason is that, as different rating scales are used to measure the same concept, eg the Positive and Negative Syndrome Scale or the Brief Psychiatric Rating Scale as measures of overall symptoms of schizophrenia, standardized-mean-differences are usually applied as the effect size index. The units of standardized-mean-differences such as Cohen's d5 or Hedges's g are, however, standard deviations, which are difficult to interpret. Indeed, in a survey of 531 participants from 8 countries, Johnston et al6 found that presenting results as a standardized-mean-difference was poorly understood by the participants, and the SMD was perceived as the least useful effect size index. Cohen's rule of thumb that standardized-mean-differences of 0.2, 0.5, and 0.8 mean small, medium, and large differences between interventions is universally used. However, Cohen himself warned that this rule strongly depends on the context. An effect size of 0.2 can be important when it relates to a severe side-effect, and it can be meaningless for a minor side-effect. For example, Rosenthal and diMatteo7 described that the effect of a daily dose of aspirin on cardiovascular conditions only amounts to an SMD of 0.07. However, if one looks at the consequences, 34 per 1000 fewer patients die of cardiac infarction (https://www.psychometrica.de/effect_size.html).7 Many more effect size indices are available, and some of them are more intuitive. However, most of them have never been applied in meta-analyses of antipsychotics and other psychotropic drugs. We, therefore, applied 13 effect size indices to the results of recent meta-analyses of antipsychotic drugs1,8 with the aim of examining the interpretation of antipsychotic drug efficacy from different angles. We also provide the general formulae and the limitations of these indices, which may enhance their application in future meta-analyses of psychiatric treatments.

Method

We applied 13 effect size indices for continuous data to the primary outcome Positive and Negative Syndrome Scale total score (PANSS9) of a recent meta-analysis comparing antipsychotic drugs as a group with placebo in the acute treatment of schizophrenia,8 and to the results of single antipsychotics vs placebo from a network meta-analysis.1 Eight are effect size indices for continuous outcomes: Mean Difference (MD), Standardized Mean Difference (SMD), Correlation Coefficient r, Ratio of Means for endpoint and change data (RoM), Improvement Fraction (IF), Drug Response Fraction (DRF), and Minimally Clinically Important Difference Units (MCIDU). As it has been shown that indices originally developed for dichotomous outcomes are interpreted more correctly,6 we added the following 5 indices: Number-Needed-to-Treat derived from SMD (NNT), Odds Ratio (OR), Relative Risk (RR), and Risk Difference (RD) derived from SMD, and the Control Event Rate (CER, here the placebo response rate) vs the Experimental Event Rate (EER) in percent, which here is the response rate in the antipsychotic group.

The interpretation of the effect size indices requires an understanding of the underlying general formulae. We thus first present the indices' general formulae based on a hypothetical trial comparing an antipsychotic drug with placebo with results similar to the sample size weighted mean values found in the meta-analyses1,8 (see eAppendix1).

  • Sample size in both groups 100

  • Mean PANSS total score at baseline in both groups = 95

  • Mean PANSS total score at endpoint antipsychotic group = 75

  • Mean PANSS total score at endpoint placebo group = 85

  • Standard deviation for PANSS total score at endpoint/change from baseline to endpoint = 20

We then transformed the Standardized Mean Difference of all antipsychotics pooled compared to placebo found in Leucht et al8 and for the single antipsychotic drugs vs placebo found by Huhn et al1 to the other 12 effect size indices (additional formulae and assumptions are presented in eAppendix2). Only second-generation antipsychotics, which are nowadays the most frequently used drugs, and haloperidol were chosen for this purpose.

Explanation of the Formulae of the Effect Size Indices

Mean Difference

The mean difference (MD) is the simple subtraction of the means of the two groups being compared. From this point of view, it is the effect size index that is easiest to interpret:

Formula 1

Mean difference = mean group A − mean group B

It can be calculated both for the mean rating scale score (eg PANSS total) at endpoint and for the change from baseline to endpoint, and both types of data can be combined in a meta-analysis.10

Here:

MD = 85 − 75 = 10 (for PANSS at endpoint)

MD = 20 − 10 = 10 (for PANSS change from baseline to endpoint)

Standardized Mean Difference

The Mean Difference cannot be applied if different scales are used to measure the same concept, eg the PANSS or the Brief Psychiatric Rating Scale (BPRS) in schizophrenia.11 Possible PANSS total score values range from a minimum of 30 to a maximum of 210, while the BPRS scores span from 18 to 126. Thus, 10 points difference on the PANSS does not mean the same as 10 points on the BPRS. Therefore, the Mean Differences of PANSS and BPRS studies need to be standardized before combining them in a meta-analysis. The Standardized Mean Difference (SMD) accomplishes this standardization by dividing the Mean Difference by the standard deviation, either of both groups combined or of the control group. By this procedure, the Mean Difference is converted to a measure in standard deviation units. When the generic term “effect size” is used in publications, the authors usually mean standardized mean difference.

Formula 2

Standardized mean difference = mean difference / standard deviation

In our case:

SMD = 10 / 20 = 0.50

The mathematically correct interpretation of the SMD is that antipsychotics shift the distribution curve of the PANSS total score by half a standard deviation (0.50) toward improvement, see figure 1. It has been shown that clinicians do not interpret SMDs correctly, probably because standard deviation units are not intuitive.6 One option is to convert the standardized mean difference back to a mean difference by multiplying it by the standard deviation of the scale of interest.

Fig. 1. Illustration of the meaning of the Standardized Mean Difference. SMD = Standardized mean difference. SMD is expressed as Cohen's d. A standard deviation of 20, which is approximately the average in antipsychotic drug trials, was used. PANSS = Positive and Negative Syndrome Scale.

Formula 3

Mean difference = standardized mean difference × standard deviation

In our case, when converting back to PANSS units:

Mean Difference = 0.50 × 20 = 10
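
As an illustration (not part of the original article), Formulae 1–3 can be reproduced with a few lines of Python using the hypothetical trial values from the Method section; the variable names below are ours.

```python
# Hypothetical trial values from the Method section
mean_endpoint_drug = 75      # mean PANSS total score at endpoint, antipsychotic group
mean_endpoint_placebo = 85   # mean PANSS total score at endpoint, placebo group
sd = 20                      # standard deviation of the PANSS total score

# Formula 1: mean difference (placebo minus drug, so that a positive value favors the drug)
md = mean_endpoint_placebo - mean_endpoint_drug   # 10 PANSS points

# Formula 2: standardized mean difference (Cohen's d)
smd = md / sd                                     # 0.50

# Formula 3: back-transformation to PANSS units
md_back = smd * sd                                # 10 PANSS points

print(md, smd, md_back)                           # 10 0.5 10.0
```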

Correlation Coefficient

The advantage of Pearson's product moment correlation coefficient (r) is that it can be applied to correlational, dichotomous, and continuous data, and to situations where different scales are used to measure the same concept.12 It can range from −1 to 1, with −1 indicating a perfect negative linear relation, 1 indicating a perfect positive linear relation, and 0 indicating no linear relation between the grouping variable (antipsychotic vs placebo) and the outcome (any definition, either binary [eg responder vs nonresponder] or continuous [eg rating scale score at endpoint or change]).

Formula 4

r = SMD / √(SMD² + (n1 + n2)² / (n1 × n2))

Where:

SMD = standardized mean difference as in formula 2 above

n1 and n2 = the sample sizes of the antipsychotic and the placebo groups.

The ratio of n1 and n2, eg a = n1/n2, rather than their absolute values, is what matters in the formula, which can therefore be rewritten as

r = SMD / √(SMD² + (a + 1)² / a)

In our example, we assumed equal group sizes, n1 = n2 or a = 1:

r = 0.5 / √(0.5² + (1 + 1)² / 1) = 0.24

In a study where half the total population (here: antipsychotic and placebo groups combined) responded, r can be interpreted as an absolute percentage difference in responders between groups (in other words, a risk difference, see formula 12 below), thus 24% in our case.13

Moreover, Rosenthal13 has shown that r is usually approximately SMD/2. This relationship can be used to also convert a given SMD to a percentage difference in responders,13 in our case SMD = 0.5, thus a 25% difference in responders.
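
A minimal Python sketch of Formula 4 (our own illustration, not from the article), assuming the equal group sizes of the hypothetical trial; as noted above, only the ratio of the two sample sizes matters.

```python
import math

def smd_to_r(smd: float, n1: int, n2: int) -> float:
    """Convert a standardized mean difference to Pearson's r (Formula 4)."""
    return smd / math.sqrt(smd**2 + (n1 + n2)**2 / (n1 * n2))

r = smd_to_r(0.5, 100, 100)   # hypothetical trial: SMD = 0.5, 100 patients per arm
print(round(r, 2))            # 0.24; Rosenthal's shortcut r ≈ SMD/2 gives 0.25
```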

The Ratio of Means

The ratio of means translates PANSS point differences between groups into percentage differences.14 In this regard it is similar to relative effect measures for dichotomous outcomes such as the risk ratio (formula 10b below).

Formula 5

Ratio of means = mean group A / mean group B

Studies using different scales to measure the same concept, eg the PANSS and the BPRS, can also be combined with this formula. It can also be applied to either endpoint data or change from baseline data, but not to combine both data types.

In our case, for the mean PANSS at endpoint:

Ratio of means = 45 / 55 = 0.82

Thus, participants treated with antipsychotics have only 82% of the symptoms of patients treated with placebo; in other words, antipsychotics lead to an 18% lower symptom score (100% − 82% = 18%).

It must be noted that the minimum PANSS total score of 30 points, meaning no symptoms, always needs to be subtracted before this calculation (hence 75 − 30 = 45 and 85 − 30 = 55).15,16

In our case, for the mean PANSS change from baseline to endpoint:

Ratio of means = 20 / 10 = 2

Thus, antipsychotics lead to a 2 times greater reduction of the PANSS than placebo.

The difference in both the numbers and the interpretation of the results when applying endpoint data (18% better/only 82% of the symptoms with drug, see above) versus change data (2 times more symptom reduction with drug) prevents combining endpoint and change data with this index in a meta-analysis.
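
The two variants of Formula 5 can be sketched as follows (an illustration of ours, using the hypothetical trial values); note the subtraction of the 30-point PANSS floor before forming the endpoint ratio.

```python
PANSS_MINIMUM = 30   # the PANSS total score cannot fall below 30 ("no symptoms")

# Ratio of means at endpoint (Formula 5, after removing the scale floor)
rom_endpoint = (75 - PANSS_MINIMUM) / (85 - PANSS_MINIMUM)   # 45 / 55 ≈ 0.82

# Ratio of means for change from baseline (baseline of 95 in both groups)
rom_change = (95 - 75) / (95 - 85)                           # 20 / 10 = 2.0

print(round(rom_endpoint, 2), rom_change)                    # 0.82 2.0
```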

Improvement Fraction

A Prevented Fraction is frequently used in meta-analyses in dentistry17 to calculate how much dental caries is prevented, eg by a toothpaste. We converted this measure to an Improvement Fraction, because our outcome is improvement of the PANSS, not prevention of worsening.

Formula 6

Improvement fraction = (mean change drug − mean change placebo) / mean change placebo

In our case:

Improvement fraction = (20 − 10) / 10 = 1 or 100%

Thus, compared with placebo, drugs improve the PANSS by 100% more.

Drug Response Fraction

The Drug Response Fraction can be useful if one wants to know how much of the improvement in the drug group is due to drug response and not due to placebo response.

Formula 7

Drug response fraction = (mean change drug − mean change placebo) / mean change drug

In our case:

Drug response fraction = (20 − 10) / 20 = 0.5 or 50%

Thus, 50% of the improvement in the drug group is due to superiority of drug compared to placebo, rather than to placebo response.
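
Formulae 6 and 7 differ only in their denominator; a minimal sketch of ours with the hypothetical change scores:

```python
change_drug = 20      # mean PANSS reduction from baseline, antipsychotic group
change_placebo = 10   # mean PANSS reduction from baseline, placebo group

# Formula 6: improvement fraction (difference relative to the placebo change)
improvement_fraction = (change_drug - change_placebo) / change_placebo   # 1.0, ie 100%

# Formula 7: drug response fraction (difference relative to the drug change)
drug_response_fraction = (change_drug - change_placebo) / change_drug    # 0.5, ie 50%

print(improvement_fraction, drug_response_fraction)   # 1.0 0.5
```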

Minimally Clinically Important Difference Units

The minimally clinically important difference (MCID) is the smallest change in a treatment outcome that an individual patient would identify as important. Multiple ways of deriving the MCID are available,18 and the MCID index strongly depends on their results. We used 2 examples: The first MCID is from published equipercentile linking analyses in which we found that a reduction of 15 PANSS total score points from baseline corresponds to minimal improvement according to the Clinical Global Impression Improvement Score (CGI-I).19 The second MCID follows a distribution-based approach by Furukawa et al20 who suggested that the MCID could be half the standard deviation of the change, MCID = 0.5 × SD, thus in our case 0.5 × 20 = 10.

If the Mean Difference between drug and placebo is divided by this MCID, we obtain an effect size index in MCID units (MCIDU).

Formula 8

MCID effect size = mean difference / MCID

If the MCID is assumed to be 15 points19:

MCID effect size = 10 / 15 = 0.67

If the MCID is assumed to be 10 points20:

MCID effect size = 10 / 10 = 1.00

This result means that antipsychotics on average improve overall symptoms by 0.67 or 1.00 MCID units, depending on the MCID assumed.
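
Formula 8 is a simple rescaling of the mean difference; the sketch below (our illustration) loops over the two MCID anchors discussed above, 15 PANSS points from equipercentile linking and 10 points from the half-standard-deviation rule.

```python
mean_difference = 10   # PANSS points, drug vs placebo (hypothetical trial)

for mcid in (15, 10):                  # the two MCID estimates discussed in the text
    mcidu = mean_difference / mcid     # Formula 8
    print(f"MCID = {mcid}: effect = {mcidu:.2f} MCID units")
# MCID = 15: effect = 0.67 MCID units
# MCID = 10: effect = 1.00 MCID units
```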

Number-Needed-To-Treat-To-Benefit

A transformation of the Standardized Mean Difference to effect size indices for dichotomous data can be useful, eg in situations where the necessary numbers (in essence, two-by-two tables) for their direct calculation are not available, or generally because these indices may be more interpretable for readers.6 It has been shown that a transformation using the formula by Chinn21 works well for experimental and control event rates between 20% and 80%,21,22 but not for extremely low or high values. The Number-Needed-To-Treat-To-Benefit (NNTB) indicates how many patients need to be treated with an antipsychotic compared to placebo (or another antipsychotic) to obtain one additional responder. It can be derived from the SMD with the following formula22–24:

Formula 9

NNTB = 1 / (Φ(SMD + Ψ(CER)) − CER)

Where

Φ = the cumulative distribution function of the standard normal distribution and Ψ is its inverse

CER = the control event rate, in our case the response rate of the placebo group of 30% (see above).

This formula shows that, given a certain SMD, NNT will differ according to the Control Event Rate (CER), here the placebo response rate. CER depends on a) the response threshold and b) the event-rate in the control-group associated with that threshold. In our case below, we used “at least minimally improved” as a threshold which roughly corresponds to at least 20% reduction of the PANSS or BPRS total score from baseline.25 Based on this threshold approximately 30% of patients respond to placebo in antipsychotic drug trials26 which we used as Control Event Rate.

In our case:

NNTB = 1 / (Φ(0.5 + Ψ(0.3)) − 0.3) ≈ 5

Thus, 5 patients must be treated with an antipsychotic instead of placebo so that one is at least minimally improved.
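
Formula 9 requires the standard normal cumulative distribution function and its inverse; a minimal sketch of ours using Python's built-in statistics.NormalDist (no external packages), with the 30% placebo response rate as the control event rate:

```python
from statistics import NormalDist

def nnt_from_smd(smd: float, cer: float) -> float:
    """Number needed to treat derived from an SMD and a control event rate (Formula 9)."""
    phi = NormalDist().cdf        # cumulative distribution function of the standard normal (Phi)
    psi = NormalDist().inv_cdf    # its inverse (Psi)
    eer = phi(smd + psi(cer))     # implied response rate in the drug group
    return 1.0 / (eer - cer)

print(nnt_from_smd(0.5, 0.30))    # ≈ 5.3, ie an NNT of about 5
```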

Odds Ratios and Relative Risks Derived from Standardized Mean Differences

Standardized Mean Differences (SMDs) can also be converted to relative risks.

First, an Odds Ratio (OR) is estimated21:

Formula 10a

logOR = standardized mean difference × π/√3 ≈ 1.81 × standardized mean difference

Where:

π = 3.14159 (the classical circle number pi)

In our case:

logOR ≈ 1.81 × 0.5 = 0.905 and OR = e^0.905 = 2.47

Then we convert the Odds Ratio to a Relative Risk:

Formula 10b

Relative risk = odds ratio / (1 − CER + (CER × odds ratio))

Where CER = control (= placebo) event rate

In our case:

Relative risk = 2.47 / [1 − 0.3 + (0.3 × 2.47)] = 1.71

Thus, 1.71 times more patients treated with an antipsychotic are at least minimally improved than those treated with placebo.

Experimental Event Rate (%), in Our Case the Response Rate of the Antipsychotic Group, Derived from the Standardized Mean Difference

From the Relative Risk (RR) we can obtain the EER (percentage responders in the drug group) by:

Formula 11

Experimental event rate (%) = control event rate × relative risk

In our case:

Experimental event rate, here the drug response rate = 0.3 × 1.71 = 0.51 or 51%

Thus, 51% of the patients are at least minimally improved.

Risk Difference (%), in Our Case the Difference in Response Rate Between Antipsychotics and Placebo

The Risk Difference (RD) is the difference in responder rates between drug and placebo:

Formula 12

Risk difference (%) = percentage responders drug group − 100 × CER

In our case:

RD (%) = 51% − 30% = 21%

Thus, an absolute 21% more of the patients in the antipsychotic group than in the placebo group are at least minimally improved.
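
Formulae 10a to 12 form a chain from the SMD to the familiar dichotomous indices; a minimal sketch of ours, again assuming a 30% placebo response rate:

```python
import math

smd = 0.5    # standardized mean difference (hypothetical trial)
cer = 0.30   # control event rate = placebo response rate

log_or = smd * math.pi / math.sqrt(3)           # Formula 10a: logOR ≈ 1.81 × SMD
odds_ratio = math.exp(log_or)                   # ≈ 2.48 (2.47 in the text, which rounds logOR to 0.905)

rr = odds_ratio / (1 - cer + cer * odds_ratio)  # Formula 10b: relative risk ≈ 1.72
eer = cer * rr                                  # Formula 11: drug response rate ≈ 0.51
rd = eer - cer                                  # Formula 12: risk difference ≈ 0.21

print(round(odds_ratio, 2), round(rr, 2), round(eer, 2), round(rd, 2))
# 2.48 1.72 0.51 0.21
```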

Results

Table 1 summarizes the results based on the various effect size indices and the respective 95% credible intervals. It also presents a summary of the strengths and weaknesses of the indices. Effect sizes for single drugs vs placebo derived from Huhn et al1 are presented in eAppendix3.

Table 1.

Summary of Results and Strengths and Weaknesses of the Various Effect Size Indices

Antipsychotics vs placebo based on different effect size indices (results), and general strengths and weaknesses of the effect size indices (bold print means a positive property).

| Effect size index | Result, mean (95% CI) | Scales measuring the same concept can be combined | Change and at-endpoint data can be combined | Studies with a positive value in one group and a negative value in the other can be combined | The same formula works for continuous and dichotomous outcomes | Problem with results near zero | Interpretability |
|---|---|---|---|---|---|---|---|
| Mean difference (PANSS units) | 9.4 (8.4, 10.2) | No | Yes | Yes | No | No | + c |
| Standardized mean difference | 0.47 (0.42, 0.51) | Yes | Yes a | Yes | No | No | + |
| Correlation coefficient | 0.23 (0.21, 0.25) | Yes | Yes | Yes | Yes | No | +(+) |
| Ratio of means at endpoint / ratio of means change from baseline to endpoint | 0.83 (0.81, 0.85) / 1.94 (1.84, 2.02) | Yes | No | No | No | Yes | ++ |
| Improvement fraction (%) | 94 (84, 102) | Yes | No | No | No | Yes | ++ |
| Drug response fraction (%) | 48.5 (45.7, 50.5) | Yes | No | No | No | Yes | ++ |
| Minimally clinically important difference units, assuming an MCID of 15 or 10 PANSS points | 0.63 (0.56, 0.68) / 0.94 (0.84, 1.02) | Yes | Yes | Yes | No | No |  |
| Number-needed-to-treat b | 5 (5, 6) | Yes | Yes a | Yes | No | No | + |
| Odds ratio b | 2.34 (2.14, 2.52) | Yes | Yes a | Yes | No | No |  |
| Relative risk b | 1.67 (1.59, 1.73) | Yes | Yes a | Yes | No | No | + |
| Risk difference b | 20 (18, 22) | Yes | Yes a | Yes | No | No | ++ |
| Response rate drug (EER) vs response rate placebo (CER) b | 50% (48, 52) vs 30% (27, 34) | Yes | Yes a | Yes | No | No | +++ |

The effect size indices were derived from Leucht et al8. Bold means a positive property.

aTheoretically, combining endpoint and change data using the SMD and all indices derived from the SMD can lead to wrong results.10 However, one large meta-epidemiological study suggested that combining them is acceptable.27

bAll the effect size indices in italics are primarily effect sizes for dichotomous outcomes. We used the average percentage of responders of 30% in the placebo groups found in Leucht et al 20178, based on the threshold of at least 20% PANSS/BPRS total score reduction from baseline or at least minimally improved on the CGI. As we describe in the text, the percentage of placebo responders is part of formulae 9, 10b, 11, and 12, and it can be varied, eg if a lower or higher placebo response can be expected in a population. − = questionable, + = ok, ++ = good, +++ = very good, n.a. = not applicable, because the control event rate can be chosen.

cLess interpretable when the unit is not natural or the scale is not well known.

Discussion

As the standardized mean difference is an effect size index in standard deviation units, it is difficult to interpret.6 We, therefore, applied 13 indices to the results of recent meta-analyses comparing antipsychotic drugs with placebo. We found that the mean SMD was 0.47, corresponding to a mean difference of 9.4 PANSS points between drug and placebo, a correlation coefficient r of 0.23 (approximately an absolute response difference of 23%), that drug-treated patients have a 17% lower PANSS at endpoint (RoMe) or a 1.94 times (RoMc) or 94% (IF) greater reduction of the PANSS than placebo, that 48.5% of the improvement in the drug group is due to superiority of drug rather than to placebo response (DRF), that the superiority corresponds to 0.63/0.94 MCID units depending on the MCID used, an NNT of 5, an OR/RR/RD of 2.34/1.67/20%, and that with drug 50% are at least minimally improved compared to 30% with placebo. In the following, we discuss the efficacy of antipsychotics in the light of the various effect size indices, provide further insights into the advantages and limitations of these measures, and make recommendations as to how results should be presented in future meta-analyses.

Meta-analyses of psychiatric treatments usually apply the Standardized Mean Difference as the effect size index. A large survey has, however, shown that clinicians cannot interpret it, probably because its units are unintuitive standard deviations.6 Cohen suggested a rule of thumb that an SMD of 0.2 is a small, 0.5 a medium, and 0.8 a large difference between groups,5 according to which the superiority of antipsychotics over placebo would be medium-sized (SMD 0.47, table 2). Although Cohen himself warned that the interpretation depends on the context, his rule is nowadays universally used. An effect size of 0.2 can be important when it relates to a severe side-effect, and it can be meaningless for a minor side-effect. Table 2 presents some of Cohen's original examples. Moreover, as the Standardized Mean Difference (and all indices using the SMD in their formulae: r, NNT, OR, RR, RD, percentage responders) accounts for within-group variability, it has the problem that if in two trials the true effect (as measured by the difference in means) is identical, but the standard deviations (SDs) are different, then the SMDs will be different.1 For example, when explanatory trials with tight inclusion criteria are combined with pragmatic trials with broad inclusion criteria in a meta-analysis, the latter consequently have higher SDs.1 In a related vein, the SMD depends on the choice of the standard deviation in the denominator of Formula 2.1 Many software packages such as Cochrane's RevMan 5,35 apply Hedges's g (an adjustment of Cohen's d for small sample sizes), which uses the average of the standard deviations of the two groups compared. In contrast, eg Glass's delta (Δ) uses only the SD of the comparator group (see the Cochrane handbook10 and Cooper and Hedges12 for further explanations).

Table 2.

Cohen’s Classification of SMDs5

SMD Size Some of Cohen's Examples
0.20 Small The difference in mean height between 15- and 16-year-old girls ~1.27 cm
The difference in mean IQ between twins and nontwins (3 IQ-points, IQ measured with a scale with mean 100, and SD 15)b
0.50 Medium “Large enough to be visible to the naked eye”
The difference in height between 14- and 18-year-old girls ~2.54 cm
An average difference in IQ between clerical and semiskilled workers or between members of professional and managerial occupational groups
Difference between teachers and general clerks in the World War II Classification Test (7.5 IQ-points)b
0.80 Large “Grossly perceptible and therefore large differences”
The difference in height between 13- and 18-year-old girls ~4 cma
The difference in mean IQ between clerical and semiskilled workers and between professionals and managers (~ 12 IQ pointsb)
IQ difference estimated between holders of the Ph.D. degree and typical college freshmen, or between college graduates and persons with only a 50–50 chance of passing in an academic high school curriculum

a4 cm taken from a growth chart produced by the National Center for Health Statistics in collaboration with the Center for Chronic Disease Prevention and Health promotion, 2000 http://www.cdc.gov/growthcharts, because Cohen5 did not indicate an exact number.

bNumbers supplemented by us.

Due to the difficulty of interpreting Standardized Mean Differences, some authors, such as the Cochrane Schizophrenia Group, generally prefer Mean Differences, which keep the original units of the rating scale. The Mean Difference, however, does not allow combining studies which used different rating scales to measure the same concept. As another example, a mean reduction of 4 seconds in completing a Trailmaking B task is not even remotely comparable to an increase of a mean 4 categories solved on the Wisconsin Card Sorting Test, although both tests are indices of a similar cognitive function and both represent an improvement of 4 units. Moreover, it is arguable whether nonexperts, or even clinicians, know what a 9.4 PANSS total score difference (table 1) between antipsychotics as a group and placebo means. Once readers are familiar with the generic concept of Standardized Mean Differences, they might interpret SMDs more easily than Mean Differences, which is supported by a meta-epidemiological study.28 However, when the original units are intuitive, such as kilograms or QTc prolongation, meta-analysts should primarily use mean differences.

The ratio of means (RoM) expresses differences in percentages, which may be easier to interpret for clinicians than standard deviations.14 As a disadvantage, change data and endpoint data cannot be combined. Applying RoM to the PANSS at endpoint yields 83% (indicating a 17% lower score with drug than with placebo), a comparatively small effect, but in terms of PANSS reduction from baseline it is 194% (indicating a 94% greater decrease with drug), thus quite large (table 1). Moreover, the Ratio of Means requires that the values of both groups are either always positive or always negative.14 This requirement is met when endpoint data are used, but not for the usually presented change data. For example, in antipsychotic drug trials patients on placebo sometimes worsen on average, while the patients of the antipsychotic group on average improve. Therefore, RoM is not ideal for meta-analyses of antipsychotic drugs. Still, it is possible to provide at least an estimate for RoM in such situations: the means in the antipsychotic groups and the placebo groups can first be averaged, eg using a single-group meta-analysis. Then, as long as both means have the same sign, which is usually the case due to placebo response, they can be divided to provide RoM (Formula 5), although the result can only be an estimate, because some principles of pairwise meta-analysis are ignored by this approach.

The major advantage of correlation coefficient r is its universal usability for continuous and dichotomous data, change and at endpoint data, and when different scales to measure the same concept are used.13 Its interpretation as an absolute percentage difference in responders (here an absolute 23% difference between drugs and placebo, table 1), even in the case of continuous outcomes, is also attractive. We feel that 23% is a substantial, although not an extreme difference. Rosenthal called the translation of r into a responder rate difference the “binomial effect size display.” 13 However, it is based on the assumption that half the population improved, and half did not. This would roughly be the case in our meta-analysis comparing antipsychotics with placebo, if the cutoff “at least minimally improved” were applied (51% responded to drugs and 30% to placebo, thus the 23% difference derived from r approximately fits), but not for the response cutoff “at least much better” (23% vs 14%8).

The unit of the Improvement Fraction ("how large is the difference between drug and placebo in relation to the placebo change") and of the Drug Response Fraction ("how much of the improvement in the drug group is due to drug response and not due to placebo response") is percent, which may again ease interpretation. They can be applied to both change and endpoint data, and all values are possible except when the change in the denominator group (placebo change for the IF, drug change for the DRF) is zero, because division by zero is impossible. However, values close to zero can bias results. Some meta-analyses in dentistry, the area where the related measure Prevented Fraction was initially used,17 therefore had to exclude such studies.29 These measures may be useful in special situations, eg when one wants to know what the true drug effect without the placebo effect is.

There is a lot of debate about the MCID and – when applied as an effect size index – readers need considerable background knowledge to understand its meaning.30 Multiple ways of deriving the MCID are available,18 and the MCIDU index strongly depends on their results. Using the MCID from equipercentile linking analyses19 would lead to a drug-placebo difference of 0.63, which is below the MCID. In contrast, using the distribution-based approach by Furukawa et al,20 the difference would be 0.94 MCID units. Moreover, the MCID is the minimally clinically important difference for patients, but our equipercentile linking results were based on the clinical global impressions of clinicians.19 The simple reason is that MCIDs based on patient opinion are not available.

The last four measures, the Number-Needed-To-Treat-To-Benefit, the Odds Ratio, the Relative Risk, and the Placebo and Drug Response Rates, are originally effect size indices for dichotomous outcomes. Formulae 9–12 can be useful to convert mean PANSS change/endpoint scores to these indices if two-by-two tables are not available for their direct calculation. It is an advantage of these measures that a "baseline risk" (the response in the placebo group) can be chosen. For Formulae 9–12 we used the average response in the placebo group (30% were at least minimally improved8), but in a clinical situation where one would expect little placebo response a lower value can be chosen. For example, if we assumed a placebo response of 10% for the comparison of all antipsychotics combined vs placebo, the NNTB would be 9 (8, 10), the RR 2.06 (1.92, 2.19), the percentage of responders in the drug group 21%, and the risk difference 10%. In our opinion, presenting the placebo response rate (30%) together with the drug response rate (50%) is clinically the easiest measure to understand, because all other measures only show the difference between drug and placebo, but not at which level the difference occurs. In other words, we need to know the placebo response to understand whether a relative risk or absolute risk difference is clinically meaningful.

A limitation of the analysis is that we used the standardized mean differences of the original meta-analyses rather than rerunning the meta-analyses from scratch. The reason was that, as explained above, some measures such as the RoM for change data could not be used for some individual trials. We did not discuss effect size indices for dichotomous outcomes such as odds ratios, relative risks, and risk differences, except in the context of their extrapolation from SMDs. These effect size indices for dichotomous outcomes have been extensively compared with one another, and their interpretation is more straightforward.10 We emphasize that the conversion of SMDs to ORs (and subsequently RR, RD, NNT, EER) relies on the assumption that the continuous outcome follows a logistic distribution.21 Their estimation from Standardized Mean Differences using the formula by Chinn21 works well "in the middle," approximately for EERs and CERs between 20% and 80%,21,22 but not for extremely low or high values. We also did not discuss variations of effect sizes using standard deviation units (eg Hedges's g, Glass's Δ) or the formulae to derive measures of variance, but the general principles are the same. Table 1 only provides the principal pros and cons of the various indices, but there are other debates about more specific aspects.28,31 Moreover, rating scales can differ in psychometric properties such as ceiling and floor effects, sensitivity to change, and variability. Finally, all indices except the Mean Difference in principle allow the combination of scales as long as they measure the same construct. This is probably the case for the PANSS and the BPRS total score as measures of overall psychopathology, but it is much less clear for disorder-specific and generic quality of life scales.

Taken together, we feel that these indices reflect clear, but not large, antipsychotic drug effects. For example, the mean SMD of 0.47 is similar to that of antihypertensive medication (0.55).32,33 We emphasize that, as we have extensively discussed previously,26 most studies were conducted in chronic, probably partially refractory patients. First-episode patients respond much better, but a placebo-controlled first-episode study does not exist.34 Such a study would be important because it could also be that placebo response is greater in first-episode patients. Future meta-analyses should report other effect size indices in addition to the standardized mean difference, in particular the percentage of responders in the drug and placebo groups. They can be easily derived and enhance the interpretation of research findings.

Supplementary Material

sbab094_suppl_Supplementary_eAppendix

Acknowledgments

In the last three years, S.L. has received honoraria as a consultant and/or advisor and/or for lectures from Alkermes, Angelini, Eisai, Gedeon Richter, Janssen, Johnson and Johnson, Lundbeck, Medichem, Merck Sharp and Dohme, Otsuka, Recordati, Rovi, Sandoz, Sanofi Aventis, Sunovion, TEVA. A.C. has received research and consultancy fees from INCiPiT (Italian Network for Paediatric Trials), CARIPLO Foundation and Angelini Pharma. A.C. is supported by the National Institute for Health Research (NIHR) Oxford Cognitive Health Clinical Research Facility, by an NIHR Research Professorship (grant RP-2017-08-ST2-006), by the NIHR Oxford and Thames Valley Applied Research Collaboration and by the NIHR Oxford Health Biomedical Research Centre (grant BRC-1215-20005). The views expressed are those of the authors and not necessarily those of the UK National Health Service, the NIHR, or the UK Department of Health. T.A.F. reports grants and personal fees from Mitsubishi-Tanabe, personal fees from MSD, grants and personal fees from Shionogi, outside the submitted work; in addition, T.A.F. has a patent 2020-548587 concerning smartphone CBT apps pending, and intellectual properties for Kokoro-app licensed to Mitsubishi-Tanabe. The other authors have no conflict of interest to declare.

References

  • 1. Huhn M, Nikolakopoulou A, Schneider-Thoma J, et al. Comparative efficacy and tolerability of 32 oral antipsychotics for the acute treatment of adults with multi-episode schizophrenia: a systematic review and network meta-analysis. Lancet. 2019;394(10202):939–951.
  • 2. Moncrieff J. The Bitterest Pills: The Troubling Story of Antipsychotic Drugs. Houndmills, UK: Palgrave McMillan; 2013.
  • 3. Gøtzsche PC, Young AH, Crace J. Does long term use of psychiatric drugs cause more harm than good? BMJ. 2015;350:h2435.
  • 4. Tandon R, Fleischhacker WW. Comparative efficacy of antipsychotics in the treatment of schizophrenia: a critical assessment. Schizophr Res. 2005;79:145–155.
  • 5. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
  • 6. Johnston BC, Alonso-Coello P, Friedrich JO, et al. Do clinicians understand the size of treatment effects? A randomized survey across 8 countries. CMAJ. 2016;188(1):25–32.
  • 7. Rosenthal R, DiMatteo MR. Meta-analysis: recent developments in quantitative methods for literature reviews. Annu Rev Psychol. 2001;52:59–82.
  • 8. Leucht S, Leucht C, Huhn M, et al. Sixty years of placebo-controlled antipsychotic drug trials in acute schizophrenia: systematic review, Bayesian meta-analysis, and meta-regression of efficacy predictors. Am J Psychiatry. 2017;174(10):927–942.
  • 9. Kay SR, Fiszbein A, Opler LA. The positive and negative syndrome scale (PANSS) for schizophrenia. Schizophr Bull. 1987;13(2):261–276.
  • 10. Higgins JPT, Thomas J, Chandler J, et al. Cochrane Handbook for Systematic Reviews of Interventions. Version 6.1 (updated September 2020). Cochrane; 2019. Available from www.training.cochrane.org/handbook.
  • 11. Overall JE, Gorham DR. The brief psychiatric rating scale. Psychol Rep. 1962;10:790–812.
  • 12. Cooper H, Hedges LV. The Handbook of Research Synthesis. New York: Russell Sage Foundation; 1994.
  • 13. Rosenthal R. Meta-Analytic Procedures for Social Research. Vol 6. 2nd ed. New York, London, New Delhi: Sage Publications; 1991.
  • 14. Friedrich JO, Adhikari NK, Beyene J. The ratio of means method as an alternative to mean differences for analyzing continuous outcome variables in meta-analysis: a simulation study. BMC Med Res Methodol. 2008;8:32.
  • 15. Leucht S, Davis JM, Engel RR, Kane JM, Wagenpfeil S. Defining 'response' in antipsychotic drug trials: recommendations for the use of scale-derived cutoffs. Neuropsychopharmacology. 2007;32(9):1903–1910.
  • 16. Obermeier M, Mayr A, Schennach-Wolff R, Seemüller F, Möller HJ, Riedel M. Should the PANSS be rescaled? Schizophr Bull. 2010;36(3):455–460.
  • 17. Dubey ST, Lehnhoff RW, Radike AW. A statistical confidence interval for the true per cent reduction in caries-incidence studies. J Dent Res. 1964;44(5):921–923.
  • 18. Wright A, Hannon J, Hegedus EJ, Kavchak AE. Clinimetrics corner: a closer look at the minimal clinically important difference (MCID). J Man Manip Ther. 2012;20(3):160–166.
  • 19. Leucht S, Kane JM, Etschel E, Kissling W, Hamann J, Engel RR. Linking the PANSS, BPRS, and CGI: clinical implications. Neuropsychopharmacology. 2006;31(10):2318–2325.
  • 20. Furukawa TA, Scott IA, Guyatt G. Measuring patients' experience. In: Guyatt G, Drummond R, Meade MO, Cook DJ, eds. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. 3rd ed. New York: McGraw-Hill Education; 2015:219–234.
  • 21. Chinn S. A simple method for converting an odds ratio to effect size for use in meta-analysis. Stat Med. 2000;19:3127–3131.
  • 22. Furukawa TA, Leucht S. How to obtain NNT from Cohen's d: comparison of two methods. PLoS One. 2011;6(4):e19070.
  • 23. Furukawa TA. From effect size into number needed to treat. Lancet. 1999;353(9165):1680.
  • 24. da Costa BR, Rutjes AW, Johnston BC, et al. Methods to convert continuous outcomes into odds ratios of treatment response and numbers needed to treat: meta-epidemiological study. Int J Epidemiol. 2012;41(5):1445–1459.
  • 25. Leucht S, Kane JM, Kissling W, Hamann J, Etschel E, Engel RR. What does the PANSS mean? Schizophr Res. 2005;79(2-3):231–238.
  • 26. Leucht S, Leucht C, Huhn M, et al. Sixty years of placebo-controlled antipsychotic drug trials in acute schizophrenia: systematic review, Bayesian meta-analysis, and meta-regression of efficacy predictors. Am J Psychiatry. 2017;174(10):927–942.
  • 27. da Costa BR, Nüesch E, Rutjes AW, et al. Combining follow-up and change data is valid in meta-analyses of continuous outcomes: a meta-epidemiological study. J Clin Epidemiol. 2013;66(8):847–855.
  • 28. Takeshima N, Sozu T, Tajika A, Ogawa Y, Hayasaka Y, Furukawa TA. Which is more generalizable, powerful and interpretable in meta-analyses, mean difference or standardized mean difference? BMC Med Res Methodol. 2014;14:30.
  • 29. Marinho VC, Worthington HV, Walsh T, Chong LY. Fluoride gels for preventing dental caries in children and adolescents. Cochrane Database Syst Rev. 2015;2015(6):CD002280.
  • 30. Devji T, Carrasco-Labra A, Qasim A, et al. Evaluating the credibility of anchor based estimates of minimal important differences for patient reported outcomes: instrument development and reliability study. BMJ. 2020;369:m1714.
  • 31. Thorlund K, Walter SD, Johnston BC, Furukawa TA, Guyatt GH. Pooling health-related quality of life outcomes in meta-analysis: a tutorial and review of methods for enhancing interpretability. Res Synth Methods. 2011;2(3):188–203.
  • 32. Leucht S, Hierl S, Kissling W, Dold M, Davis JM. Putting the efficacy of psychiatric and general medicine medication into perspective: review of meta-analyses. Br J Psychiatry. 2012;200(2):97–106.
  • 33. Law M, Morris JK, Jordan R, Wald N. Headaches and the treatment of blood pressure: results from a meta-analysis of 94 randomized placebo-controlled trials with 24,000 participants. Circulation. 2005;112(15):2301–2306.
  • 34. Zhu Y, Li C, Huhn M, et al. How well do patients with a first episode of schizophrenia respond to antipsychotics: a systematic review and meta-analysis. Eur Neuropsychopharmacol. 2017;27(9):835–844.
  • 35. Review Manager (RevMan) [Computer program]. Version 5.4; 2020.
