Abstract
Antidepressant effectiveness in major depressive disorder (MDD) is still questioned because the extrapolation of randomized controlled trial (RCT) results to “real life” settings is problematic. The application of the RCT paradigm in a disorder of this type, where global care plays a central role, raises questions regarding the internal and external validity of this type of study. Outcome measurement, attrition rates, the ability of the double‐blind design to control for expectations, placebo response, the representativeness of trial participants and publication bias are major methodological pitfalls. This review discusses these issues. It is illustrated using original data and proposes some alternatives for assessing antidepressant effectiveness via different approaches. Some are easy to implement, such as ecological measures, qualitative approaches, improvement of analytical strategy and improvement of blinding procedures. Some are sophisticated, involving temporary deception to deal with the confounding effect of expectations, and they raise ethical issues. Others resort to external validity, this being the case in observational studies. But all are necessary to explore antidepressant effectiveness. Copyright © 2013 John Wiley & Sons, Ltd.
Keywords: antidepressants, clinical trials, depression, effectiveness, methodology
Introduction
The usefulness of antidepressants in major depressive disorder (MDD) is still questioned (Ioannidis, 2008) and the issue goes beyond the scientific debate, with a backdrop of conflicts of interest and some concerns about the medicalization of modern society (Lacasse and Leo, 2005). The usefulness of a drug is usually reflected by its efficacy (under optimal circumstances), its effectiveness (in routine care), and its efficiency (does it maximize value for money?) (Bombardier and Maetzel, 1999). Among these concepts, effectiveness seems to be the most relevant question for clinicians. Randomised controlled trials (RCTs) are generally used to address this issue.
Since the first published RCT versus placebo to explore the efficacy of streptomycin in tuberculosis (Medical Research Council, 1948), this design has become a gold standard. But tuberculosis is quite different from a mood disorder that fits a bio‐psychosocial model (Garcia‐Toro and Aguirre, 2007), where therapeutic benefits could result from the context in which the study is performed. Global care (including the ethical meaning of this term) (Beauchamp and Childress, 2008) plays a central role, and psychological factors such as expectations may influence the results. Outcome definition, analysis and extrapolation of the results are likewise somewhat specific in mood disorders.
The National Institute for Clinical Excellence (NICE) guidelines on the treatment and management of depression in adults (NICE, 2010) advise caution when considering the application of RCT results in routine practice, and suggest that better ways of assessing effectiveness have yet to be developed.
The present paper has two aims: (1) to present various methodological shortcomings in the evaluation of antidepressant effectiveness concerning internal (reliability of the results) and external (scope for generalization) validity of trials, which are two fundamental inter‐connected and sometimes contradictory guarantees; (2) to present certain alternatives to address these issues in order to achieve a balance between these two concepts. We have illustrated our reflection with secondary analyses from a previous meta‐analysis (Naudet et al., 2011).
Methodological issues
Outcome measurement and analysis is problematic
Outcome measurement
The efficacy of antidepressants is assessed using continuous outcomes (the mean change on a scale), or categorical outcomes (response rate and remission rates).
Concerning continuous outcomes, the Hamilton Depression Rating Scale (HDRS; Hamilton, 1960) and the Montgomery–Asberg Depression Rating Scale (MADRS; Montgomery and Asberg, 1979) are recognized as gold standards (Duru and Fantino, 2008). Nevertheless, they can yield differences that are statistically significant without being clinically meaningful (Ioannidis, 2008): the identification of a minimum clinically relevant difference is not straightforward (Falissard et al., 2003). In addition, they contain items that are not specific to depression (sleeping difficulties, anxiety, agitation and somatic complaints) and may highlight non‐mood‐related benefits (Moncrieff, 2002). Moreover, these scales tend to be used by tradition rather than because of their validity and reliability. For example, a review showed that the HDRS was not optimal psychometrically and was conceptually flawed (Bagby et al., 2004).
The Clinical Global Impression (CGI; Guy, 1976) is used as a global assessment. It comprises a single item with high “face validity”, but it may be more prone to rater bias (Gaudiano and Herbert, 2005). Its validity is debated mainly because the response format used in the CGI is more likely to be ambiguous (what is the definition of a patient who is “Severely ill”?) and is prone to cultural misunderstanding (for example concerning the meaning of “moderate”) (Kadouri et al., 2007).
The scales considered here are clinician‐version evaluations, while self‐administered questionnaires like the Beck Depression Inventory (BDI; Beck et al., 1961) or ecological measures such as computerized assessments of depression (Greist et al., 2002; Mundt et al., 2006) are far from being systematically reported in antidepressant studies.
Clinicians, in their day‐to‐day practice, are used to dealing with binary outcomes such as response and remission, which have considerable prognostic value (Judd et al., 1998). Although these concepts appear intuitive, no real gold standard exists. Categorical outcomes are generally calculated from continuous data, and are given by the proportion of people who meet a predefined level of improvement (response) or fall below a predefined threshold score (remission) at a given time point. This does not take into account the longitudinal aspect of these concepts, and creates the impression of clear‐cut patterns where the data do not suggest any. This phenomenon is interpreted as a major bias (Kirsch and Moncrieff, 2007) or as proof of antidepressant effectiveness (Gibbons et al., 2012) depending on the authors’ preconceived beliefs.
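The way a threshold can create a clear‐cut contrast between nearly identical patients can be sketched in a few lines. This is only an illustration: the function name, the 50% reduction criterion for response and the endpoint score of 7 for remission are assumptions chosen for the example, not definitions taken from the studies discussed here.

```python
def classify(baseline, endpoint, response_cut=0.5, remission_max=7):
    """Dichotomize a change on a depression rating scale.

    response_cut and remission_max are illustrative assumptions
    (50% reduction from baseline, endpoint score of 7 or less).
    """
    reduction = (baseline - endpoint) / baseline
    return {
        "response": reduction >= response_cut,
        "remission": endpoint <= remission_max,
    }

# Two hypothetical patients whose endpoint scores differ by a single point
patient_a = classify(baseline=24, endpoint=12)  # exactly 50% reduction
patient_b = classify(baseline=24, endpoint=13)  # about 46% reduction
print(patient_a)
print(patient_b)
```

In this sketch the first patient is counted as a responder and the second as a non‐responder, although their trajectories are almost the same, and neither classification says anything about stability over time.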
Attrition rates
Among patients enrolled in an RCT, typically 20 to 40% fail to complete the study. Whereas a loss to follow‐up of 5% or lower is usually of little concern, a loss of 20% or more prevents good quality intention‐to‐treat (ITT) analysis, can cause biased estimates of the treatment effect (Dumville et al., 2006) and restricts the scope for generalizing results (Leon et al., 2006). The two approaches to the analysis of incomplete data used in most of the studies by which efficacy of new‐generation antidepressants is established (conducted between 1990 and 2010) are far from ideal: (1) complete case analysis assumes that missing data are “Missing Completely At Random” (dropout is unrelated to the phenomenon studied or to patient characteristics), which is not likely to be valid; (2) the last observation carried forward (LOCF) procedure, which is the most frequently used method, negatively impacts treatment arm results since dropouts are assumed not to improve beyond their removal from the study. It ignores the natural history of MDD (Posternak et al., 2006). Differential dropout rates between groups may artificially inflate the superiority of one study condition over another. This method does not incorporate the uncertainty surrounding the imputed data in the analyses (Leon et al., 2006).
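The mechanics of LOCF are simple enough to sketch directly. The helper name and the scores below are hypothetical and serve only to show what the procedure assumes about a patient who drops out early.

```python
def locf(scores):
    """Carry the last observed value forward over missing visits (None)."""
    filled, last = [], None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled

# Hypothetical HDRS scores at weeks 0, 2, 4, 6 and 8;
# this patient drops out after the week 2 visit
dropout = [26, 22, None, None, None]
print(locf(dropout))  # [26, 22, 22, 22, 22]
```

The endpoint is frozen at the week 2 value and then analysed as if it had been observed at week 8, which is why the method neither reflects the natural course of the episode nor carries any uncertainty about the imputed value.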
Regarding categorical outcomes, the maximum bias hypothesis (non‐assessed patients are recorded as in remission if they belong to the placebo group and as having not responded if they belong to the antidepressant group) could be considered as the most complete and accurate measure of robustness for an analysis. It is rarely performed in MDD trials. In fact it leads to the reverse conclusion (placebo superiority) as shown in Table 1.
Table 1.
Studies | Optimistic bias | ITTLOCF | OC | Attrition = failure | Maximum bias |
---|---|---|---|---|---|
Sheehan | 3.35 [2.32; 4.84] | 1.34 [0.96; 1.87] | 1.58 [1.09; 2.28] | 1.52 [0.98; 2.37] | 0.56 [0.41; 0.75] |
Rudolph | 4.12 [2.71; 6.26] | 1.66 [1.19; 2.32] | 1.71 [1.16; 2.52] | 1.77 [1.13; 2.78] | 0.51 [0.41; 0.65] |
Mendels | 2.02 [1.54; 2.66] | 1.27 [0.99; 1.63] | 1.31 [1.03; 1.65] | 1.44 [1.08; 1.91] | 0.82 [0.69; 0.98] |
WXL101497 | 1.67 [1.39; 2.00] | 1.38 [1.15; 1.67] | 1.34 [1.12; 1.60] | 1.38 [1.14; 1.69] | 1.03 [0.87; 1.21] |
AK130940 | 1.82 [1.53; 2.16] | 1.33 [1.11; 1.59] | 1.31 [1.11; 1.55] | 1.30 [1.06; 1.58] | 0.87 [0.74; 1.02] |
Total | 2.31 [1.75; 3.06] | 1.36 [1.23; 1.51] | 1.36 [1.23; 1.50] | 1.39 [1.24; 1.57] | 0.74 [0.59; 0.94] |
Note: Data were extracted from our previous meta‐analysis: out of 26 randomized double‐blind trials, five studies on venlafaxine versus placebo had extractable data. Meta‐analyses of response rates using a random effect model were performed under different hypotheses about missing data. Five situations were considered:
• Optimistic bias analysis: non‐assessed patients are recorded as in remission if they belong to the antidepressant group and as having not responded if they belong to the placebo group;
• ITTLOCF: patient status is derived from the LOCF method on continuous outcomes;
• OC: observed case analysis;
• Attrition = failure: non‐assessed patients are recorded as not having responded in both groups;
• Maximum bias: non‐assessed patients are recorded as in remission if they belong to the placebo group and as not having responded if they belong to the antidepressant group.
Results are presented as relative risks. A relative risk greater than 1 favours venlafaxine and a relative risk less than 1 favours placebo.
This example illustrates the uncertainty that arises from missing data when assessing antidepressant effect, which can vary from a marked superiority of antidepressants over placebo to a superiority of placebo over antidepressants, depending on the imputation method used for missing data.
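The dependence of the relative risk on the missing‐data hypothesis can be reproduced on a single made‐up trial. The sketch below is not a re‐analysis of Table 1: the function names and the counts are hypothetical, and the ITT‐LOCF scenario is left out because it requires the longitudinal scores.

```python
def relative_risk(resp_drug, n_drug, resp_placebo, n_placebo):
    """Relative risk of response, antidepressant versus placebo."""
    return (resp_drug / n_drug) / (resp_placebo / n_placebo)

def scenarios(obs_resp_d, obs_n_d, miss_d, obs_resp_p, obs_n_p, miss_p):
    """Relative risk under four hypotheses about non-assessed patients."""
    n_d, n_p = obs_n_d + miss_d, obs_n_p + miss_p
    return {
        # dropouts count as successes on drug, failures on placebo
        "optimistic": relative_risk(obs_resp_d + miss_d, n_d, obs_resp_p, n_p),
        # dropouts are ignored
        "observed_cases": relative_risk(obs_resp_d, obs_n_d, obs_resp_p, obs_n_p),
        # dropouts count as failures in both arms
        "attrition_failure": relative_risk(obs_resp_d, n_d, obs_resp_p, n_p),
        # dropouts count as failures on drug, successes on placebo
        "maximum_bias": relative_risk(obs_resp_d, n_d, obs_resp_p + miss_p, n_p),
    }

# Hypothetical trial: 100 patients per arm, 30 dropouts per arm,
# 40 of 70 assessed patients respond on drug, 30 of 70 on placebo
for name, rr in scenarios(40, 70, 30, 30, 70, 30).items():
    print(f"{name}: RR = {rr:.2f}")
```

With these invented counts the relative risk ranges from about 2.33 (optimistic bias) to about 0.67 (maximum bias), while the observed case and attrition = failure analyses both give about 1.33.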
The response rate in the placebo group affects internal validity
Response rate in placebo group
Response in the placebo group is substantial in RCTs on antidepressants and has led to many negative trials (Enserink, 1999). Evidence of an increasing placebo response rate over the years has been documented (Walsh et al., 2002) and justifies the continued use of placebo‐controlled trials even if there are a number of proven treatments for MDD (Benedetti et al., 2005). However, the cause of this increase is unclear; it could be hypothesized that increasing antidepressant availability, greater social acceptability (Olfson et al., 2002) and the changes in the methods by which patients are recruited into therapeutic trials (Walsh et al., 2002) could have resulted in changes in clinically important population characteristics (for example outpatients with less severe episodes) and could have contributed to changes in the placebo effect. Moreover, expectations about the therapeutic benefit of treatment may have changed and could affect the results of antidepressant RCTs (Krell et al., 2004; Noble et al., 2001; Rutherford et al., 2010; Sotsky et al., 1991) because they are linked with the placebo effect.
Meta‐analyses (Fournier et al., 2010; Khan et al., 2002; Kirsch et al., 2008) suggest that the baseline severity of depressive symptoms is related to clinical trial outcome. The minimum baseline HDRS score needed to reach a clinically meaningful difference between antidepressant and placebo was found to be approximately 28 (very severely depressed patients) (Kirsch et al., 2008) or 25 (Fournier et al., 2010). Despite disagreements regarding whether the increasing superiority of antidepressants relative to placebo as severity increases is due to an increasing efficacy of antidepressants or a declining efficacy of placebo, the association between the drug‐placebo difference and baseline severity is consistent and robust in the different meta‐analyses.
Placebo response and internal validity
Randomization (Vandenbroucke, 2004) is a cornerstone of internal validity: it enables unbiased allocation of treatment (Schulz and Grimes, 2002) and complies with statistical theory for random sampling. Blind allocation of treatment makes it possible to infer the specific treatment effect, thus addressing the problem of patient expectations (Fisher, 1971). However, regarding antidepressants, the ability of a double‐blind design to preserve the benefit of randomization is disputed (Perlis et al., 2010). The first reason is that the majority of patients and doctors correctly distinguish between placebo and active medication (Bystritsky and Waikar, 1994; Rabkin et al., 1986): this will be referred to as “unblinding”. For instance the blinding could be compromised by the emergence of adverse effects (Perlis et al., 2010) known to be associated with a specific medication; informed consent forms, which list common adverse effects, may increase this risk (Brownell and Stunkard, 1982). Moreover, the possibility of belonging to a placebo group could lead to lower expectations of how much a patient is likely to improve during the trial. The likelihood of response and remission is significantly higher in comparator versus placebo‐controlled trials (Naudet et al., 2011; Rutherford et al., 2010; Rutherford et al., 2009; Sinyor et al., 2010; Sneed et al., 2008). This phenomenon could be differential between antidepressant arms and placebo arms: a greater probability of receiving placebo predicted a greater antidepressant efficacy versus placebo (Papakostas and Fava, 2009), without influencing attrition rates (Tedeschini et al., 2010). This has methodological implications since it could lead to an under‐estimation of the placebo effect in placebo‐controlled trials, and ethical consequences because patient improvement in such studies is poorer.
It has been hypothesized by certain authors that the apparent antidepressant effect is actually an active placebo effect (Kirsch and Sapirstein, 1998) (the different physiological experiences resulting from the ingestion of an active drug and an inert placebo may lead patients and assessors to suspect the nature of the medication and this will then introduce bias due to different expectations for treatment effect).
Understanding the placebo response
While a high response rate in a placebo group is a major methodological problem, there is considerable debate about the size, the nature and the mechanism of the placebo effect in depression. The placebo effect is quite difficult to define and has two main interpretations: the effect of the placebo intervention, and the effect of patient–provider interaction.
In a first approach, the effects of a placebo can be estimated as the difference between the placebo arm and a no‐treatment arm. A meta‐analysis utilizing a controversial approach found that a placebo effect accounted for about 50% of the response, “natural history” for about 25% and antidepressant effect for 25% (Kirsch and Sapirstein, 1998). In contrast, in another controversial and underpowered meta‐analysis of RCTs of placebo versus no‐intervention (Hrobjartsson and Gotzsche, 2010), there was no statistically significant effect of placebo interventions in depression.
Beyond these two contrasted approaches, scientific knowledge about the placebo effect in MDD is derived from RCTs, which are pragmatic and compare an antidepressant to a placebo in order to prove the superiority of the antidepressant, regardless of the underlying mechanisms. The calculation of the antidepressant–placebo difference by comparing marginal response rates is thus based on the postulate that all placebo responders should be antidepressant responders (additive model, Figure 1) whereas theoretically, antidepressant response and placebo response could be independent or, at least, substantially overlapping phenomena (non‐additive model, Figure 2) with four different types of patients: (1) placebo‐only responder; (2) treatment‐only responders; (3) placebo and treatment responders; (4) non‐responders (Kirsch, 2000; Rihmer and Gonda, 2008). It is also noteworthy that RCTs endeavour to reduce placebo effect, typically by eliminating subjects who show a strong placebo response before randomization (Benedetti et al., 2005; Muthen and Brown, 2009). These different aspects limit our understanding of the placebo effect.
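The difference between the two models can be made concrete with a small calculation. The sketch below is only one possible reading of the non‐additive case: it assumes that placebo responsiveness and drug responsiveness are statistically independent and that a patient in the antidepressant arm responds if either mechanism operates. The function name and the response rates are hypothetical.

```python
def patient_types(p_drug, p_placebo):
    """Split patients into the four responder types under two models.

    p_drug, p_placebo: marginal response rates in each arm (hypothetical).
    """
    # Additive model: every placebo responder would also respond to the drug
    additive = {
        "placebo_only": 0.0,
        "treatment_only": p_drug - p_placebo,
        "both": p_placebo,
        "non_responder": 1 - p_drug,
    }
    # Assumption: the two kinds of responsiveness are independent and a
    # patient on the drug responds if either mechanism works.
    t = 1 - (1 - p_drug) / (1 - p_placebo)  # share responsive to the drug itself
    independent = {
        "placebo_only": p_placebo * (1 - t),
        "treatment_only": t * (1 - p_placebo),
        "both": p_placebo * t,
        "non_responder": (1 - p_placebo) * (1 - t),
    }
    return additive, independent

additive, independent = patient_types(p_drug=0.50, p_placebo=0.35)
print(additive)
print(independent)
```

With hypothetical response rates of 50% and 35%, the additive model attributes a drug‐specific response to 15% of patients, whereas under the independence assumption about 23% of patients would respond to the drug itself, roughly 8% of them being placebo responders as well. The same marginal rates are therefore compatible with different compositions of the trial population.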
Moreover response to placebo is not strictly a placebo effect (the psychobiological reaction to the administration of an inert treatment based on expectation and conditioning or other learning processes) (Ernst and Resch, 1995; Finniss et al., 2010): the clinical improvement following administration of a placebo (placebo response) could result from many different factors, such as spontaneous improvement (Posternak and Zimmerman, 2000), statistical regression to the mean, co‐interventions, biases as well as the placebo effect. Indeed, spontaneous improvement can result from environmental or biological (e.g. seasonal) factors which afford scope for scientific investigation. Spontaneous improvement may be common in clinical practice (Posternak et al., 2006; Posternak and Zimmerman, 2000); the number of follow‐up assessments (Posternak and Zimmerman, 2007) is related to a significant therapeutic effect.
The extrapolation of RCT results is problematic
External validity of RCTs in major depressive disorder
Whilst the vast majority of patients with clinical depression are catered for in primary care, most of the research findings upon which decisions are based have involved secondary care patients. In a Cochrane Review, the authors found only 14 studies versus placebo in primary care with extractable data, of which 10 studies examined tricyclic agents, two examined selective serotonin reuptake inhibitors (SSRIs) and two included both classes (Arroll et al., 2009). This contrasts with a plethora of literature on antidepressants in secondary care outpatients. Primary care patients differ from these secondary care patients (Araya, 1999; Suh and Gallo, 1997): they are less severely depressed, experience a milder course of illness, have a distinct symptom profile with more complaints of fatigue and somatic symptoms, and are more likely to have accompanying physical complaints (Linde et al., 2011).
Antidepressant RCTs use numerous exclusion criteria (comorbid medical condition, short duration of depressive episode, comorbid personality disorder, mild depression, treatment response during placebo lead‐in period, comorbid anxiety disorder, long duration of depressive episode, comorbid substance use disorder, prior non‐response to treatment, comorbid dysthymia, current suicidal ideation). Some of these criteria are arguable from a fundamental viewpoint. Efficacy trials are designed to answer specific questions and they are required to investigate the disorder independently from comorbidities, which undoubtedly affect response, depending partly on the agent tested: for example, the fact that antidepressants have anxiolytic effects justifies the exclusion of comorbid anxious disorders so as to explore efficacy in depression on its own. Nevertheless, this greatly reduces scope for generalization (Posternak et al., 2002) in a disorder where comorbidity is the rule and a conclusion of effectiveness should not derive solely from these studies.
Subjects treated in antidepressant trials represent a minority of patients treated for MDD in routine clinical practice (Zimmerman et al., 2002). One study among psychiatric outpatients suggests that patients that were excluded were a more chronically ill group with more numerous previous episodes, greater psychosocial impairment, and more frequent personality disorders (Zimmerman et al., 2005). Furthermore, participants are generally recruited by newspaper advertisement, paid for their participation in the study and may not be representative of “real life” patients (Greist et al., 2002). Even the main inclusion criterion (i.e. suffering from MDD) could reduce the external validity of such studies since there could be deficits in knowledge and in the application of this criterion by clinicians (Zimmerman and Galione, 2010).
Some data suggest that antidepressants may not assist recovery, or may not do so adequately, in a “real life” setting (Brugha et al., 1992; Ronalds et al., 1997). In a retrospective analysis of a cohort of inpatients (Seemuller et al., 2010), patients eligible (applying classic inclusion criteria) for an RCT and patients not eligible differed significantly for several baseline measures and for final Global Assessment of Functioning scores, but not for any other outcome measure, such as depression rating scores. However, this study only recruited inpatients (a more homogeneous population) and the analysis was not adjusted for prognostic factors at baseline or for associated treatment.
In another similar analysis applied to an outpatient cohort (Wisniewski et al., 2009), patients eligible for an RCT had a better response to treatment, which persisted even after adjustments for baseline differences. The design of this study provides better control for confounders. A meta‐regression comparison (Naudet et al., 2011) showed that antidepressant response is lower in observational studies compared to RCTs. This result has recently been replicated (van der Lem et al., 2012).
Finally, RCTs typically last 6–8 weeks whereas it is recommended that an antidepressant treatment be continued for at least six months after remission of the episode of depression (NICE, 2010).
Meta‐analysis limitations
The limitations of a meta‐analysis are linked to the limitations of the individual studies included (Egger et al., 2001) and all the earlier‐mentioned methodological problems have to be considered. Moreover, most studies on the effects of drugs are sponsored by the pharmaceutical industry. These studies have been shown to be more likely to demonstrate positive effects for the sponsor's drug than independent studies (Lexchin et al., 2003). When meta‐analyses are not based on registered trials (e.g. FDA‐registered trials), a publication bias can occur (Turner et al., 2008). It has been shown that the publication bias can lead to considering reboxetine as a serious antidepressant agent, whereas it is probably an ineffective and potentially harmful antidepressant (Eyding et al., 2010). Since 2005, RCTs need to be registered prior to participant enrolment, but two points could be improved: unpublished but registered study results must be accessible and selective outcome reporting (Mathieu et al., 2009) must be avoided.
These considerations should lead to caution in the interpretation of efficacy meta‐analyses, and also in interpretation of meta‐analyses concerning the influence of methodological factors (Huf et al., 2011). These are precisely some of the studies on which some of the earlier remarks are based. It gives an idea of the uncertainty surrounding the discussion presented here.
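To make explicit what a pooled estimate summarizes, the ITT‐LOCF column of Table 1 can be combined with a random‐effects model. The sketch below uses the DerSimonian–Laird estimator, which is one common choice; the estimator actually used for Table 1 is not stated here, and the standard errors are recovered from the reported confidence intervals under a normal approximation on the log scale. Both points are assumptions of this example, as is the function name.

```python
import math

# ITT-LOCF relative risks with 95% confidence limits, copied from Table 1
studies = {
    "Sheehan": (1.34, 0.96, 1.87),
    "Rudolph": (1.66, 1.19, 2.32),
    "Mendels": (1.27, 0.99, 1.63),
    "WXL101497": (1.38, 1.15, 1.67),
    "AK130940": (1.33, 1.11, 1.59),
}

def pool_random_effects(data, z=1.96):
    """DerSimonian-Laird pooling of (RR, lower, upper) triples on the log scale."""
    data = list(data)
    y = [math.log(rr) for rr, _, _ in data]
    # Assumption: symmetric normal confidence intervals on the log scale
    se = [(math.log(hi) - math.log(lo)) / (2 * z) for _, lo, hi in data]
    w = [1 / s**2 for s in se]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)  # between-study variance
    w_re = [1 / (s**2 + tau2) for s in se]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    return math.exp(mu), math.exp(mu - z * se_mu), math.exp(mu + z * se_mu)

rr, lower, upper = pool_random_effects(studies.values())
print(f"Pooled RR = {rr:.2f} [{lower:.2f}; {upper:.2f}]")
```

Under these assumptions the pooled value should land close to the Total row of the ITT‐LOCF column. The calculation also shows that the pooled estimate is no more than a weighted average of the trial results, so that any bias shared by the individual trials is carried over unchanged.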
Table 2 illustrates all the points discussed earlier with a descriptive analysis of 26 RCTs on venlafaxine or fluoxetine.
Table 2.
Outcome measurement | |
Is a clinician‐version evaluation used? | |
Yes | 26 (100%) |
No | 0 (0%) |
Is a self‐administered questionnaire used? | |
Yes | 16 (62 %) |
No | 10 (38 %) |
Is an ecological measure used? | |
Yes | 0 (0%) |
No | 26 (100%) |
Attrition and its management | |
Percent of patients failing to complete the study | 14%, 25%, 33%, 37%, 50% (NA = 3) |
Is last observation carried forward method used? | |
Yes | 25 (96 %) |
No | 1 (4 %) |
Is a mixed model used? | |
Yes | 1 (4 %) |
No | 25 (96 %) |
Is complete case analysis used? | |
Yes | 7 (27 %) |
No | 19 (73 %) |
Response rate in placebo group and internal validity | |
Percentages of responders | 26%, 34%, 41%, 48%, 63% (NA = 4) |
Is the “unblinding” phenomenon evaluated? | |
Yes | 0 (0%) |
No | 26 (100%) |
External validity of RCTs in MDD | |
What category of patients is studied? | (NA = 2) |
Inpatients | 3 (11.5 %) |
Outpatients | 18 (69%) |
Outpatients in Primary Care | 3 (11.5 %) |
Is a severity score used as an inclusion criterion? | |
Yes | 26 (100 %) |
No | 0 (0 %) |
Is a treatment response during placebo lead‐in period a non‐inclusion criterion? | (NA = 1) |
Yes | 22 (88 %) |
No | 3 (12 %) |
Study duration (weeks) | 4, 6, 8, 12, 13 (NA = 1) |
Meta‐analysis limitations | |
Is there an industry sponsorship in the study? | (NA = 2) |
Yes | 24 (100 %) |
No | 0 (0%) |
Note: Results are presented as numbers (percentage) for qualitative outcomes and as minimum, first quartile, median, third quartile, maximum for quantitative outcomes.
Methodological alternatives to answer the question of antidepressant effectiveness
Improving outcome measurement and analysis
Outcome measurement
Determination of the effectiveness of antidepressants should not be based exclusively on mere interviewer ratings of outcome, which can be prone to statistical noise and/or bias. A more robust approach is needed, and outcomes should be assessed in multi‐modal fashion (Gaudiano and Herbert, 2005).
Categorical outcomes like response and remission should not be exclusively calculated from continuous data such as the HDRS (Kirsch and Moncrieff, 2007).
Assessment of categorical self‐report (remission and response) using valid instruments is needed for sensitivity analysis. It has been suggested that depressed patients consider symptom resolution as only one of the factors in determining the state of remission, and that the presence of positive features of mental health such as optimism, vigour, and self‐confidence is a better indicator of remission than the absence of the symptoms of depression (Zimmerman et al., 2006).
Furthermore, these two concepts should not be assessed at a single time point but should address the question of passing time, and whether there is stability over several weeks (Bandelow et al., 2006). Continuous self‐report measures (such as the BDI), collateral information, behavioural ratings and physiological indices should be obtained to complete the information derived from clinician‐rated scales and to examine convergence of these data (Petkova et al., 2000).
Finally, the use of qualitative approaches should be developed in RCTs (Lewin et al., 2009) and could be of interest in antidepressant trials to understand the effects of interventions and to focus on patients’ experiences, as these processes are difficult to explore using quantitative methods alone. Mixed (qualitative‐quantitative) methods could be of interest in this way (Falissard et al., 2013). The procedure is simple, and is at present under development: video‐interviews based on the improved Clinical Global Impression (iCGI) procedure (Kadouri et al., 2007) are performed and are randomly shown in a blind manner to different groups of raters (experts, clinicians or medical students, etc.) who classify them according to whether the patient received a placebo or an antidepressant. The test is a permutation test. It enables the identification of differences between groups. A qualitative analysis of the videos will enable comparison of the experiences of patients under antidepressants and under placebo in a phenomenological perspective. This would also enable a broader measurement of adverse outcomes including unwanted psychological effects as an important aspect, which could contribute to a more fine‐grained comparison of conditions. In addition it tackles the limitations of the CGI mentioned earlier.
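A permutation test of this kind can be sketched as follows. The procedure described above does not fix the test statistic, so the sketch makes several assumptions: a single rater, a statistic equal to the number of videos whose guessed arm matches the true arm, and a one‐sided test. The function name and the data are hypothetical.

```python
import random

def permutation_p_value(true_arm, guessed_arm, n_perm=10_000, seed=0):
    """One-sided permutation test on the number of correct classifications.

    The true allocation labels are shuffled to simulate raters whose
    guesses carry no information about the treatment received.
    """
    rng = random.Random(seed)
    observed = sum(t == g for t, g in zip(true_arm, guessed_arm))
    shuffled = list(true_arm)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        agreement = sum(t == g for t, g in zip(shuffled, guessed_arm))
        if agreement >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed arrangement itself
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)

# Hypothetical data: 20 videos, 10 per arm, 15 classified correctly
true_arm = ["drug"] * 10 + ["placebo"] * 10
guessed = ["drug"] * 8 + ["placebo"] * 2 + ["placebo"] * 7 + ["drug"] * 3
print(permutation_p_value(true_arm, guessed))
```

A small p‐value would indicate that raters can tell the two groups apart from the interviews, which is the difference between groups that the procedure is meant to detect.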
Attrition rate management could be improved
Before dealing with missing data, it is important to prevent them. Nevertheless, “attrition‐reduced studies” can present problems for generalization to clinical practice where the attrition rate is high. We therefore recommend that for patients who are lost to follow up, an effort should be made to obtain the principal outcome without interfering with their adherence to treatment using more accessible assessments by telephone (Greist et al., 2002) or home visits: investigators should try to obtain complete follow‐up data on all subjects, irrespective of their adherence to the treatment protocol (Lavori, 1992).
Secondly, the outcomes of subjects who withdrew should be described and compared to those of completers (Dumville et al., 2006). Concerning the handling of missing values, no universally applicable method can be recommended. Nevertheless, it should be well thought out, and pre‐defined in the protocol. Three general approaches to the analysis of incomplete data can be used: (1) analysis of complete cases; (2) missing data imputation (LOCF or multiple imputation); (3) analysis of incomplete data (survival analysis, mixed model, model of missingness). ITT analysis should, as in all RCTs, be the rule for the main analysis. Here, mixed‐effect models are useful because subjects who have missing data are not completely excluded from the analyses and the missing data are not imputed. Nevertheless, it is performed under the Missing At Random hypothesis (i.e. “missingness” is explained by observed outcomes or covariates, presumably pre‐dropout, but not unobserved outcomes). This type of analysis is therefore likely to favour arms with attrition. Finally, collecting data that can help predict attrition, for instance by asking participants to rate the likelihood of attending the subsequent assessment session, can change the problem of dropout from Not Missing at Random (i.e. missingness is explained by unobserved outcomes) to Missing At Random, but this should be used cautiously in the analysis of data (Leon et al., 2006).
Multiple imputation (Little and Rubin, 1987) procedures assume that data are Missing At Random: all non‐missing values of outcomes at all time points and baseline demographics are used in the models, which generate imputed estimates. Generally, five imputation data sets are generated and estimates are combined so that standard errors reflect the variability introduced by the imputation process.
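The way in which the estimates from the imputed data sets are combined can be illustrated with the usual combining rules (Rubin's rules): the pooled estimate is the mean of the per‐data‐set estimates, and its variance adds the between‐imputation variance to the average within‐imputation variance. The function name and the numbers below are hypothetical.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine estimates from m imputed data sets with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)       # average squared standard error
    between = variance(estimates)  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical drug-placebo differences in HDRS change from five imputed
# data sets, with their squared standard errors
estimates = [-2.1, -1.8, -2.4, -2.0, -1.7]
variances = [0.64, 0.66, 0.62, 0.65, 0.63]
pooled, se = pool_rubin(estimates, variances)
print(f"Pooled difference = {pooled:.2f}, standard error = {se:.2f}")
```

In this example the pooled standard error (about 0.85) is larger than the standard error from any single imputed data set (about 0.80), which is how the extra uncertainty due to imputation enters the analysis, in contrast with LOCF.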
Present‐day studies tend to implement these two approaches (Lynch et al., 2011) whereas other analyses such as “completer‐only analysis”, LOCF and analysis under the maximum bias hypothesis are used as sensitivity analyses to assess the robustness of the analytical strategy.
Controlling for the placebo response
Placebo response improvement should be sought in effectiveness studies
RCTs versus placebo aim to reduce the placebo effect, whereas in day‐to‐day clinical practice, everything is done to enhance the placebo effect. Thus, to assess antidepressant effectiveness it is reasonable to consider certain adjustments and explanatory designs potentiating the placebo effect in depression, allowing comparison with conditions that mimic all the theoretically important elements of placebo response associated with pharmacotherapy (e.g. expectation of improvement, doctor involvement and contact, credible treatment rationale, etc.) (Gaudiano and Herbert, 2005).
As the risk of unblinding is substantial, an assessment of the integrity of double‐blind procedures should be performed routinely (Antonuccio et al., 1999; Even et al., 2000) by asking clinicians and patients to report the study condition to which they think or guess they have been assigned. Concerning clinician rating scales, keeping raters blind to the study design and hypothesis can protect against bias from their expectations.
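One simple way to analyse such guesses is to compare the number of correct guesses with what chance alone would produce. The sketch below assumes a two‐arm trial with equal allocation, a forced choice between “drug” and “placebo” (no “do not know” option) and therefore a chance level of 50%. These are simplifying assumptions, and the function name and the counts are hypothetical.

```python
from math import comb

def binomial_tail(correct, n, p_chance=0.5):
    """One-sided P(X >= correct) if all guesses were made at chance level."""
    return sum(
        comb(n, k) * p_chance**k * (1 - p_chance) ** (n - k)
        for k in range(correct, n + 1)
    )

# Hypothetical trial: 100 participants guess their allocation, 70 are correct
p_value = binomial_tail(correct=70, n=100)
print(f"P(at least 70 correct by chance) = {p_value:.6f}")
```

A very small probability suggests that the blind has been compromised for at least some participants. The same calculation can be run separately for patients and for clinicians.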
Multi‐arm studies where different doses that may or may not be effective are used alongside a similar active comparator and placebo can address this question. Nevertheless, such studies are not valid when side effects are dose‐dependent. The use of an “active” placebo, with side effects mimicking those of the active drug, has been proposed. This method was developed in the early days of antidepressant research, but is rarely used in modern psychotropic studies (Perlis et al., 2010). A meta‐analysis of antidepressant trials using active placebos suggested smaller effect sizes than those observed in the presumably less blinded trials using inert substances (Moncrieff et al., 2004). However, the ability of a design of this sort to prevent unblinding is not established, as the raters were able to guess better than by chance what medication the patients were taking (Uhlenhuth and Park, 1964; Weintraub and Aronson, 1963).
Thus a four‐arm “balanced placebo trial design” using antidepressants, active placebo controls and intentional deception of subjects (patients are given information in a way that produces false beliefs) in a Latin square design has been proposed (Kirsch and Sapirstein, 1998) (Figure 3), and this could diminish the ability of subjects to discover the study condition to which they have been assigned. Subjects are randomized to four arms: (1) a “deception” arm where patients receive the real drug and are told they are receiving a placebo; (2) a “deception” arm where patients receive the active placebo and are told they are receiving the real drug; (3) a “non‐deception” arm where patients receive the real drug and are told they are receiving the real drug; (4) a “non‐deception” arm where patients receive the active placebo and are told they are receiving the placebo. This design makes it possible to distinguish between an additive model, in which drug and placebo effects simply add up, and a non‐additive model, in which they interact. Nevertheless, using an active placebo deliberately induces a risk of adverse effects (even if they are benign or even potentially therapeutic) and this is an ethical problem (Perlis et al., 2010). The “balanced placebo trial design” has not yet been used in clinical trials on antidepressant medication, because of the ethical issues involved with temporary deception (Dowrick et al., 2007; Waring, 2008).
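How the four arms separate additive from non‐additive effects can be illustrated with the usual contrasts of a two‐by‐two factorial layout (what is received, crossed with what is told). The sketch below is an assumption about how such data might be summarized, not an analysis taken from the cited authors; the function name and the arm means are hypothetical, and a real analysis would also need standard errors for each contrast.

```python
def balanced_placebo_contrasts(mean_improvement):
    """Return drug, expectation and interaction contrasts from the four arm means.

    `mean_improvement` maps (received, told) to the mean improvement in that arm,
    where each element is "drug" or "placebo" ("placebo" = the active placebo).
    """
    dd = mean_improvement[("drug", "drug")]        # arm 3: receives drug, told drug
    dp = mean_improvement[("drug", "placebo")]     # arm 1: receives drug, told placebo
    pd = mean_improvement[("placebo", "drug")]     # arm 2: receives placebo, told drug
    pp = mean_improvement[("placebo", "placebo")]  # arm 4: receives placebo, told placebo
    return {
        # Average effect of receiving the drug, whatever the patient is told.
        "drug_effect": ((dd + dp) - (pd + pp)) / 2,
        # Average effect of being told "drug", whatever is received.
        "expectation_effect": ((dd + pd) - (dp + pp)) / 2,
        # Zero under a purely additive model, non-zero otherwise.
        "interaction": (dd - pd) - (dp - pp),
    }


# Hypothetical arm means (e.g. points of improvement on a depression scale).
additive_example = {
    ("drug", "drug"): 11.0,
    ("drug", "placebo"): 7.0,
    ("placebo", "drug"): 8.0,
    ("placebo", "placebo"): 4.0,
}
non_additive_example = dict(additive_example)
non_additive_example[("drug", "drug")] = 9.0

print(balanced_placebo_contrasts(additive_example))
print(balanced_placebo_contrasts(non_additive_example))
```

In the first hypothetical data set the drug adds the same amount whatever the patient is told, so the interaction contrast is zero; in the second, the drug adds less when the patient already expects to receive it.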
Alternative designs to control for expectations
However, because unblinding can be highly problematic in standard trials, temporary deception is a key point in controlling for expectations (accurately informing subjects could bias response to treatment). Although its mechanisms are unclear, it is undeniable that deception is a key element in placebo potency (Lakoff, 2002). Two approaches have been suggested to minimize the ethical difficulties linked to temporarily deceiving subjects (Dowrick et al., 2007): (1) pre‐consent (subjects are informed that the study involves deception, and are asked to consent to its use, without being informed of the nature of the deception); (2) “minimized” deception. The latter can take the form of a three‐arm RCT in which the effects of placebo, active medication, and usual care are examined and where there is temporary deception concerning the placebo arm (Figure 4). Patients are told that they will be randomized to receive “usual care + nothing” or “usual care + antidepressant”. Pre‐consent (approach 1) (Wendler and Miller, 2004) (“You should be aware that the investigators have intentionally left out information about certain aspects of this study”) respects the subject's autonomy but could reduce the pragmatic effectiveness of the study because participants may guess the nature of the deception. “Minimized” deception (approach 2) is likewise possible because the information given about risk and benefit to the “usual care + nothing” group at the time when they provide consent is correct, but this nevertheless provides a placebo group. In both cases, the subjects are informed of the nature of the deception at the end of their participation.
This design is useful to preserve the methodological benefit of randomization and to obtain an unbiased assessment of the benefit of the antidepressant against the placebo and the benefit of the placebo against nothing. Certain criteria may justify deceiving the patient: (1) the use of deception is necessary and no equally effective, non‐deceptive approach is feasible; (2) the use of deception is justified by the study's social value; (3) subjects are not deceived about aspects of the study that would affect their willingness to participate, including potential risks and benefits; (4) subjects are informed of the nature of the deception at the end of their participation; (5) in the case of pre‐consent, subjects are informed prospectively of the use of deception and consent to its use (Wendler and Miller, 2004).
Nevertheless, another objection against studies involving deception is the risk of psychological harm to research participants (Bortolotti and Mameli, 2006). A number of studies performed among healthy volunteers participating in psychology experiments have found that being deceived does not upset most subjects (Wendler and Miller, 2004), but the impact on depressed patients of a design that involves deception is not known. It could undermine patients’ trust in physicians in general, as has been suggested in a qualitative study (Dowrick et al., 2007). Thus, if a trial uses deception techniques, investigators should obtain data on the impact of the deception on mood and the therapeutic alliance.
Even if they do not provide the same information, alternative trial designs can be considered. One option is to adopt a design in which all study participants are informed that they will start with a placebo and that an active drug may be substituted after a while and that they may (or may not) be informed when this switch is made. This protocol could provide information for three of the four arms of the balanced placebo design without any deception being required – the exception being “told drug/no drug” – (Colloca et al., 2004; Dowrick et al., 2007). Nevertheless, it is prone to “unblinding” because subjects can guess when the switch is made, even if they are not told.
Another design to preserve the benefit of randomization could be a non‐inferiority study comparing a placebo (presented as a new therapeutic alternative with fewer side effects) to an active antidepressant. This design is well justified for patients with a baseline HDRS score of 25 which was identified as the score needed to reach a clinically meaningful difference (Fournier et al., 2010). Here there is no deception because in this case, the placebo is a real therapeutic alternative. Nevertheless, an inclusion criterion of this sort limits the scope for generalizing the results.
In this respect, it has been recently argued that consent forms in RCTs versus placebo should generate positive expectations regarding the possible effect of a placebo (spontaneous improvement without the use of medication) to reduce patient fears of a negative outcome following study participation (Severus et al., 2012).
Another idea could be a double‐blind trial comparing an antidepressant to homeopathy. In MDD, there is not enough evidence about the efficacy of homeopathy (Pilkington et al., 2005) but it elicits expectations in patients and could be considered as a good comparator to control for expectations if we postulate that the clinical effects of homeopathy are placebo effects (Shang et al., 2005). A comparison of this sort could be performed in a double‐blind design, but to enhance the effect of expectations about the treatment, it should be performed in open label, or better, in a four‐arm design using antidepressants, homeopathy, blinding and open‐label, in a Latin square design (Figure 5). This design can evaluate both the efficacy and the effectiveness of antidepressants and homeopathy (i.e. placebo). Nevertheless, it is prone to “unblinding”, the randomization process does not take patient preferences between antidepressants and alternative medicine into account, and it can interfere with the treatment process. As an example, one study tried to compare homeopathy to fluoxetine and placebo in primary care, but failed because of recruitment difficulties, many of them linked to patient preferences (Katz et al., 2005). Indeed, this design can only meaningfully be applied in those depressed patients who feel that either antidepressants or homeopathic antidepressants could potentially work for their disorder. This results in a selection bias, with a restriction of the target population, and can in fact go against the concept of effectiveness. This is also the case for sophisticated designs ensuring internal validity such as the “balanced placebo trial design”. Recommendations concerning external validity are thus necessary.
Enabling extrapolation of RCT results
The external validity of antidepressant studies should be improved
Recruitment difficulties arising from patient preferences can lead to a selection bias, yielding a non‐representative sample of patients, and affect external validity. At the very least, patients who have been screened, patients who are eligible and patients who refuse to participate should be identified (Moher et al., 2001; Schulz et al., 2010). An interesting alternative is to perform a randomized trial with patient preference arms (for patients who agree to randomization, treatment is allocated by randomization, and for patients who refuse randomization but agree to participate, a choice of treatment is offered). Treatment and follow‐up are identical in the different groups (Brewin and Bradley, 1989; Chilvers et al., 2001; Howard and Thornicroft, 2006). This has been proposed for homeopathy (Figure 6) (Katz et al., 2005). This type of design directly synchronizes an RCT and an observational study to generate alternative evidence for assessing antidepressant drug treatment. The double‐blind design makes it possible to control for the indication bias, and the two preference arms make it possible to partly reduce the selection bias introduced by the randomization process. A simple method of analysis is the use of a model with the principal outcome as the dependent variable and treatment, design, and treatment‐design interaction as explanatory variables. Nevertheless, a design of this type requires an even larger number of patients than an RCT and the analysis should be interpreted with caution because of the potential influence of unmeasured confounders (Gemmell and Dunn, 2011).
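The model with treatment, design and their interaction can be sketched as follows. This is a minimal illustration under simplifying assumptions: two treatments, two designs (randomized versus preference), a continuous outcome and no adjustment for confounders, so that the least‐squares fit reduces to differences between the four cell means. The function name, the coding of the factors and the data are hypothetical.

```python
from statistics import fmean


def preference_design_coefficients(records):
    """Fit the saturated model
        outcome = b0 + b1*treatment + b2*design + b3*treatment*design
    from (treatment, design, outcome) records, with treatment coded
    1 = "antidepressant", 0 = "comparator" and design coded
    1 = "preference", 0 = "randomized".
    With two binary factors the least-squares solution reduces to
    differences between the four cell means."""
    cells = {}
    for treatment, design, outcome in records:
        cells.setdefault((treatment, design), []).append(outcome)
    mean = {cell: fmean(values) for cell, values in cells.items()}

    b0 = mean[("comparator", "randomized")]
    b1 = mean[("antidepressant", "randomized")] - b0
    b2 = mean[("comparator", "preference")] - b0
    # Treatment difference in the preference arms minus that in the randomized arms.
    b3 = (mean[("antidepressant", "preference")] - mean[("comparator", "preference")]) - b1
    return {"intercept": b0, "treatment": b1, "design": b2, "interaction": b3}


# Hypothetical improvement scores, two patients per cell for brevity.
records = [
    ("antidepressant", "randomized", 10), ("antidepressant", "randomized", 12),
    ("comparator", "randomized", 6), ("comparator", "randomized", 8),
    ("antidepressant", "preference", 14), ("antidepressant", "preference", 12),
    ("comparator", "preference", 7), ("comparator", "preference", 9),
]
coefficients = preference_design_coefficients(records)
print(coefficients)
```

A non‐zero interaction coefficient would suggest that the treatment difference is not the same among patients who chose their treatment as among those who were randomized. In practice a regression routine giving confidence intervals, and adjustment for measured confounders in the preference arms, would be needed.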
As the use of restrictive eligibility criteria limits the scope for generalizing RCT results, populations in the next generation of (sophisticated) RCTs should differ from the target populations of “real‐life” depressive patients as little as possible. Studies among primary care patients are needed. The only inclusion criterion should be “patient needing an antidepressant for depression”. The only exclusion criterion should be “contraindication of the treatment”. Using current suicidal ideation as an exclusion criterion could be argued for from an ethical point of view. However, depressed patients who are assigned to a placebo in antidepressant clinical trials are not at greater risk for suicide than those assigned to active treatment (Khan et al., 2000), whereas patients assigned to antidepressant treatment could well be at greater risk (Fergusson et al., 2005). Moreover, these patients are treated with antidepressants in “real life”, yet antidepressants are not studied in these particular patients.
To assess whether the patients included are truly representative of patients treated in a real‐life setting, we suggest comparing their principal clinical and sociodemographic characteristics with those recorded in registries.
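One possible way to make such a comparison is to compute standardized differences between the trial sample and the registry, which, unlike significance tests, do not depend on sample size. The sketch below assumes that only summary statistics (means, standard deviations, proportions) are available from the registry; the function names and all figures are hypothetical. Absolute values above roughly 0.1 are often read as a sign of imbalance, although this cut‐off is a convention rather than a rule.

```python
from math import sqrt


def standardized_difference_means(mean_trial, sd_trial, mean_registry, sd_registry):
    """Difference in means divided by the square root of the average variance."""
    return (mean_trial - mean_registry) / sqrt((sd_trial**2 + sd_registry**2) / 2)


def standardized_difference_proportions(p_trial, p_registry):
    """Same idea for a binary characteristic (proportions strictly between 0 and 1)."""
    pooled = (p_trial * (1 - p_trial) + p_registry * (1 - p_registry)) / 2
    return (p_trial - p_registry) / sqrt(pooled)


# Hypothetical summary statistics: age (years) and proportion of women.
age_difference = standardized_difference_means(42, 12, 48, 12)
women_difference = standardized_difference_proportions(0.6, 0.7)

print(f"Age: {age_difference:+.2f}")
print(f"Women: {women_difference:+.2f}")
```

With these invented figures, trial participants would be younger than registry patients by half a standard deviation, which would call the representativeness of the sample into question.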
A study of effectiveness should last at least six months after patient remission to obtain more information on the longitudinal effect of antidepressants. Large observational studies comparing antidepressants to usual care or to alternative medicine are needed, because they have other characteristics that make them useful sources of evidence, in that they tend to last longer and to enrol more patients than do randomized trials (Bluhm, 2009). Statistical modelling should enable adjustment for confounding factors (Concato and Horwitz, 2004; Lawlor et al., 2004), which should be pre‐specified in the protocol and assessed with as little measurement error as possible to avoid misclassification bias (Mertens, 1993).
Conclusion
Methodological alternatives to the orthodox RCT should be developed to interpret results accurately and ensure internal and external validity. Some are simple and could easily be implemented in RCTs. Others are sophisticated and raise ethical issues because they involve temporary deception of the patient. Nevertheless, improvements in study design for antidepressant effectiveness assessment are needed to further knowledge, to improve patient care and to determine what costs health authorities should cover. It is a challenge to develop study designs addressing the inevitable tension between internal and external validity, which can often appear contradictory. The methodological tools presented here can be useful. The concept of antidepressant effectiveness should be developed along different axes and based on a convergence of arguments from a range of different study designs.
Declaration of interest statement
There are no conflicts of interest regarding this paper. All authors have completed the Unified Competing Interest form at http://www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare that (1) no author has had support from any company for the submitted work; (2) N.F. has relationships (board membership or travel/accommodations expenses covered/reimbursed) with Servier, BMS, Lundbeck and Janssen, who might have an interest in the work submitted, in the previous three years; M.B. has relationships (consultancy and travel/accommodations expenses covered/reimbursed) with Janssen, BMS, Otsuka, Lundbeck, Lilly, Servier, Astra Zeneca, Medtronic and Syneïka and has received grants for research from Medtronic, Lilly and Astra Zeneca in the previous three years; R.J.M. has no relationships with any company that might have an interest in the submitted work in the previous three years; F.B. has relationships (board membership or consultancy or payment for manuscript preparation or travel/accommodations expenses covered/reimbursed) with Sanofi‐Aventis, Servier, Pierre‐Fabre, MSD, Lilly, Janssen‐Cilag, Otsuka, Lundbeck, Genzyme, Roche and BMS, who might have an interest in the work submitted, in the previous three years; (3) the spouses, partners, or children of N.F., R.J.M. and F.B. have no financial relationships that may be relevant to the submitted work; M.B.'s spouse is an employee of Janssen; none of the authors have any non‐financial interests that may be relevant to the submitted work.
Acknowledgements
This paper was supported by the Institut National de la Santé et de la Recherche Médicale (INSERM). The authors thank Eric Bellissant (M.D., PhD) for his very interesting comments, and Claudine Naudet and Angela Swaine Verdier for revising the English.
Conceived and designed the experiments: N.F., F.B. Performed the experiments: N.F. Analysed the data: N.F. Contributed reagents/materials/analysis tools: N.F., F.B. Wrote the paper: N.F. Revised the paper critically for important intellectual content: M.B., R.J.M., F.B. Final approval of the version to be published: N.F., M.B., R.J.M., F.B.
References
- Antonuccio D.O., Danton W.G., DeNelsky G.Y., Greenberg R.P., Gordon J.S. (1999) Raising questions about antidepressants. Psychotherapy and Psychosomatics, 68, 3–14.
- Araya R. (1999) The management of depression in primary health care. Current Opinion in Psychiatry, 12, 103–107.
- Arroll B., Elley C.R., Fishman T., Goodyear‐Smith F.A., Kenealy T., Blashki G., Kerse N., Macgillivray S. (2009) Antidepressants versus Placebo for Depression in Primary Care, Cochrane Database System Review, CD007954, Chichester, John Wiley & Sons.
- Bagby R.M., Ryder A.G., Schuller D.R., Marshall M.B. (2004) The Hamilton Depression Rating Scale: has the gold standard become a lead weight? American Journal of Psychiatry, 161, 2163–2177.
- Bandelow B., Baldwin D.S., Dolberg O.T., Andersen H.F., Stein D.J. (2006) What is the threshold for symptomatic response and remission for major depressive disorder, panic disorder, social anxiety disorder, and generalized anxiety disorder? Journal of Clinical Psychiatry, 67, 1428–1434.
- Beauchamp T.L., Childress J.F. (2008) Principles of Biomedical Ethics, New York, Oxford University Press.
- Beck A.T., Ward C.H., Mendelson M., Mock J., Erbaugh J. (1961) An inventory for measuring depression. Archives of General Psychiatry, 4, 561–571.
- Benedetti F., Mayberg H.S., Wager T.D., Stohler C.S., Zubieta J.K. (2005) Neurobiological mechanisms of the placebo effect. Journal of Neuroscience, 25, 10390–10402.
- Bluhm R. (2009) Some observations on observational research. Perspectives in Biology and Medicine, 52, 252–263.
- Bombardier C., Maetzel A. (1999) Pharmacoeconomic evaluation of new treatments: efficacy versus effectiveness studies? Annals of the Rheumatic Diseases, 58(Suppl 1), I82–I85.
- Bortolotti L., Mameli M. (2006) Deception in psychology: moral costs and benefits of unsought self‐knowledge. Accountability in Research, 13, 259–275.
- Brewin C.R., Bradley C. (1989) Patient preferences and randomised clinical trials. British Medical Journal, 299, 313–315.
- Brownell K.D., Stunkard A.J. (1982) The double‐blind in danger: untoward consequences of informed consent. American Journal of Psychiatry, 139, 1487–1489.
- Brugha T.S., Bebbington P.E., MacCarthy B., Sturt E., Wykes T. (1992) Antidepressants may not assist recovery in practice: a naturalistic prospective survey. Acta Psychiatrica Scandinavica, 86, 5–11.
- Bystritsky A., Waikar S.V. (1994) Inert placebo versus active medication. Patient blindability in clinical pharmacological trials. Journal of Nervous and Mental Disease, 182, 485–487.
- Chilvers C., Dewey M., Fielding K., Gretton V., Miller P., Palmer B., Weller D., Churchill R., Williams I., Bedi N., Duggan C., Lee A., Harrison G. (2001) Antidepressant drugs and generic counselling for treatment of major depression in primary care: randomised trial with patient preference arms. British Medical Journal, 322, 772–775.
- Colloca L., Lopiano L., Lanotte M., Benedetti F. (2004) Overt versus covert treatment for pain, anxiety, and Parkinson's disease. Lancet Neurology, 3, 679–684.
- Concato J., Horwitz R.I. (2004) Beyond randomised versus observational studies. Lancet, 363, 1660–1661.
- Dowrick C.F., Hughes J.G., Hiscock J.J., Wigglesworth M., Walley T.J. (2007) Considering the case for an antidepressant drug trial involving temporary deception: a qualitative enquiry of potential participants. BMC Health Services Research, 7, 64.
- Dumville J.C., Torgerson D.J., Hewitt C.E. (2006) Reporting attrition in randomised controlled trials. British Medical Journal, 332, 969–971.
- Duru G., Fantino B. (2008) The clinical relevance of changes in the Montgomery–Asberg Depression Rating Scale using the minimum clinically important difference approach. Current Medical Research and Opinion, 24, 1329–1335.
- Egger M., Smith G.D., Sterne J.A. (2001) Uses and abuses of meta‐analysis. Clinical Medicine, 1, 478–484.
- Enserink M. (1999) Can the placebo be the cure? Science, 284, 238–240.
- Ernst E., Resch K.L. (1995) Concept of true and perceived placebo effects. British Medical Journal, 311, 551–553.
- Even C., Siobud‐Dorocant E., Dardennes R.M. (2000) Critical approach to antidepressant trials. Blindness protection is necessary, feasible and measurable. British Journal of Psychiatry, 177, 47–51.
- Eyding D., Lelgemann M., Grouven U., Harter M., Kromp M., Kaiser T., Kerekes M.F., Gerken M., Wieseler B. (2010) Reboxetine for acute treatment of major depression: systematic review and meta‐analysis of published and unpublished placebo and selective serotonin reuptake inhibitor controlled trials. British Medical Journal, 341, c4737.
- Falissard B., Lukasiewicz M., Corruble E. (2003) The MDP75: a new approach in the determination of the minimal clinically meaningful difference in a scale or a questionnaire. Journal of Clinical Epidemiology, 56, 618–621.
- Falissard B., Milman D., Cohen D. (2013) A Generalization of the “Lady‐Tasting‐Tea” Procedure to Link Qualitative and Quantitative Approaches in Psychiatric Research. International Journal of Statistics in Medical Research, 2, 88–93.
- Fergusson D., Doucette S., Glass K.C., Shapiro S., Healy D., Hebert P., Hutton B. (2005) Association between suicide attempts and selective serotonin reuptake inhibitors: systematic review of randomised controlled trials. British Medical Journal, 330, 396.
- Finniss D.G., Kaptchuk T.J., Miller F., Benedetti F. (2010) Biological, clinical, and ethical advances of placebo effects. Lancet, 375, 686–695.
- Fisher R.A. (1971) The Design of Experiments, 9th edition, Macmillan.
- Fournier J.C., DeRubeis R.J., Hollon S.D., Dimidjian S., Amsterdam J.D., Shelton R.C., Fawcett J. (2010) Antidepressant drug effects and depression severity: a patient‐level meta‐analysis. Journal of the American Medical Association, 303, 47–53.
- Garcia‐Toro M., Aguirre I. (2007) Biopsychosocial model in depression revisited. Medical Hypotheses, 68, 683–691.
- Gaudiano B.A., Herbert J.D. (2005) Methodological issues in clinical trials of antidepressant medications: perspectives from psychotherapy outcome research. Psychotherapy and Psychosomatics, 74, 17–25.
- Gemmell I., Dunn G. (2011) The statistical pitfalls of the partially randomized preference design in non‐blinded trials of psychological interventions. International Journal of Methods in Psychiatric Research, 20, 1–9.
- Gibbons R.D., Hur K., Brown C.H., Davis J.M., Mann J.J. (2012) Benefits from antidepressants: synthesis of 6‐week patient‐level outcomes from double‐blind placebo‐controlled randomized trials of fluoxetine and venlafaxine. Archives of General Psychiatry, 69(6), 572–579.
- Greist J.H., Mundt J.C., Kobak K. (2002) Factors contributing to failed trials of new agents: can technology prevent some problems? Journal of Clinical Psychiatry, 63(Suppl 2), 8–13.
- Guy W. (1976) ECDEU Assessment Manual for Psychopharmacology, Rockville, MD, US Department of Health, Public Health Service, Alcohol, Drug Abuse and Mental Health Administration.
- Hamilton M. (1960) A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23, 56–62.
- Howard L., Thornicroft G. (2006) Patient preference randomised controlled trials in mental health research. British Journal of Psychiatry, 188, 303–304.
- Hrobjartsson A., Gotzsche P.C. (2010) Placebo Interventions for all Clinical Conditions, Cochrane Database System Review, CD003974, Chichester, John Wiley & Sons.
- Huf W., Kalcher K., Pail G., Friedrich M.E., Filzmoser P., Kasper S. (2011) Meta‐analysis: fact or fiction? How to interpret meta‐analyses. World Journal of Biological Psychiatry, 12, 188–200.
- Ioannidis J.P. (2008) Effectiveness of antidepressants: an evidence myth constructed from a thousand randomized trials? Philosophy, Ethics, and Humanities in Medicine, 3, 14.
- Judd L.L., Akiskal H.S., Maser J.D., Zeller P.J., Endicott J., Coryell W., Paulus M.P., Kunovac J.L., Leon A.C., Mueller T.I., Rice J.A., Keller M.B. (1998) Major depressive disorder: a prospective study of residual subthreshold depressive symptoms as predictor of rapid relapse. Journal of Affective Disorders, 50, 97–108.
- Kadouri A., Corruble E., Falissard B. (2007) The improved Clinical Global Impression Scale (iCGI): development and validation in depression. BMC Psychiatry, 7, 7.
- Katz T., Fisher P., Katz A., Davidson J., Feder G. (2005) The feasibility of a randomised, placebo‐controlled clinical trial of homeopathic treatment of depression in general practice. Homeopathy, 94, 145–152.
- Khan A., Leventhal R.M., Khan S.R., Brown W.A. (2002) Severity of depression and response to antidepressants and placebo: an analysis of the Food and Drug Administration database. Journal of Clinical Psychopharmacology, 22, 40–45.
- Khan A., Warner H.A., Brown W.A. (2000) Symptom reduction and suicide risk in patients treated with placebo in antidepressant clinical trials: an analysis of the Food and Drug Administration database. Archives of General Psychiatry, 57, 311–317.
- Kirsch I. (2000) Are drug and placebo effects in depression additive? Biological Psychiatry, 47, 733–735.
- Kirsch I., Deacon B.J., Huedo‐Medina T.B., Scoboria A., Moore T.J., Johnson B.T. (2008) Initial severity and antidepressant benefits: a meta‐analysis of data submitted to the Food and Drug Administration. PLoS Medicine, 5, e45.
- Kirsch I., Moncrieff J. (2007) Clinical trials and the response rate illusion. Contemporary Clinical Trials, 28, 348–351.
- Kirsch I., Sapirstein G. (1998) Listening to Prozac but hearing placebo: a meta‐analysis of antidepressant medication. Prevention & Treatment 1.
- Krell H.V., Leuchter A.F., Morgan M., Cook I.A., Abrams M. (2004) Subject expectations of treatment effectiveness and outcome of treatment with an experimental antidepressant. Journal of Clinical Psychiatry, 65, 1174–1179.
- Lacasse J.R., Leo J. (2005) Serotonin and depression: a disconnect between the advertisements and the scientific literature. PLoS Medicine, 2, e392.
- Lakoff A. (2002) The mousetrap: managing the placebo effect in antidepressant trials. Molecular Interventions, 2, 72–76.
- Lavori P.W. (1992) Clinical trials in psychiatry: should protocol deviation censor patient data? Neuropsychopharmacology, 6, 39–48; discussion 9–63.
- Lawlor D.A., Davey S.G., Bruckdorfer K.R., Kundu D., Ebrahim S. (2004) Observational versus randomised trial evidence. Lancet, 364, 755.
- Leon A.C., Mallinckrodt C.H., Chuang‐Stein C., Archibald D.G., Archer G.E., Chartier K. (2006) Attrition in randomized controlled clinical trials: methodological issues in psychopharmacology. Biological Psychiatry, 59, 1001–1005.
- Lewin S., Glenton C., Oxman A.D. (2009) Use of qualitative methods alongside randomised controlled trials of complex healthcare interventions: methodological study. British Medical Journal, 339, b3496.
- Lexchin J., Bero L.A., Djulbegovic B., Clark O. (2003) Pharmaceutical industry sponsorship and research outcome and quality: systematic review. British Medical Journal, 326, 1167–1170.
- Linde K., Schumann I., Meissner K., Jamil S., Kriston L., Rucker G., Antes G., Schneider A. (2011) Treatment of depressive disorders in primary care – protocol of a multiple treatment systematic review of randomized controlled trials. BMC Family Practice, 12, 127.
- Little R.J.A., Rubin D.B. (eds) (1987) Statistical Analysis with Missing Data, New York, John Wiley & Sons.
- Lynch F.L., Dickerson J.F., Clarke G., Vitiello B., Porta G., Wagner K.D., Emslie G., Asarnow J.R. Jr, Keller M.B., Birmaher B., Ryan N.D., Kennard B., Mayes T., DeBar L., McCracken J.T., Strober M., Suddath R.L., Spirito A., Onorato M., Zelazny J., Iyengar S., Brent D. (2011) Incremental cost‐effectiveness of combined therapy vs medication only for youth with selective serotonin reuptake inhibitor‐resistant depression: treatment of SSRI‐resistant depression in adolescents trial findings. Archives of General Psychiatry, 68, 253–262.
- Mathieu S., Boutron I., Moher D., Altman D.G., Ravaud P. (2009) Comparison of registered and published primary outcomes in randomized controlled trials. Journal of the American Medical Association, 302, 977–984.
- Medical Research Council (1948) Streptomycin treatment of pulmonary tuberculosis: a Medical Research Council investigation. British Medical Journal (Clinical Research Edition) 769–782.
- Mertens T.E. (1993) Estimating the effects of misclassification. Lancet, 342, 418–421.
- Moher D., Schulz K.F., Altman D.G. (2001) The CONSORT statement: revised recommendations for improving the quality of reports of parallel‐group randomised trials. Lancet, 357, 1191–1194.
- Moncrieff J. (2002) The antidepressant debate. British Journal of Psychiatry, 180, 193–194.
- Moncrieff J., Wessely S., Hardy R. (2004) Active Placebos versus Antidepressants for Depression, Cochrane Database System Review, CD003012, Chichester, John Wiley & Sons.
- Montgomery S.A., Asberg M. (1979) A new depression scale designed to be sensitive to change. British Journal of Psychiatry, 134, 382–389.
- Mundt J.C., Katzelnick D.J., Kennedy S.H., Eisfeld B.S., Bouffard B.B., Greist J.H. (2006) Validation of an IVRS version of the MADRS. Journal of Psychiatric Research, 40, 243–246.
- Muthen B., Brown H.C. (2009) Estimating drug effects in the presence of placebo response: causal inference using growth mixture modeling. Statistics in Medicine, 28, 3363–3385.
- National Institute for Clinical Excellence (NICE) (ed.) (2010) The Treatment and Management of Depression in Adults (updated edition), National Clinical Practice Guideline 90, London, The British Psychological Society and The Royal College of Psychiatrists.
- Naudet F., Maria A.S., Falissard B. (2011) Antidepressant response in major depressive disorder: a meta‐regression comparison of randomized controlled trials and observational studies. PLoS One, 6, e20811.
- Noble L.M., Douglas B.C., Newman S.P. (2001) What do patients expect of psychiatric services? A systematic and critical review of empirical studies. Social Science and Medicine, 52, 985–998.
- Olfson M., Marcus S.C., Druss B., Elinson L., Tanielian T., Pincus H.A. (2002) National trends in the outpatient treatment of depression. Journal of the American Medical Association, 287, 203–209.
- Papakostas G.I., Fava M. (2009) Does the probability of receiving placebo influence clinical trial outcome? A meta‐regression of double‐blind, randomized clinical trials in MDD. European Neuropsychopharmacology, 19, 34–40.
- Perlis R.H., Ostacher M., Fava M., Nierenberg A.A., Sachs G.S., Rosenbaum J.F. (2010) Assuring that double‐blind is blind. American Journal of Psychiatry, 167, 250–252.
- Petkova E., Quitkin F.M., McGrath P.J., Stewart J.W., Klein D.F. (2000) A method to quantify rater bias in antidepressant trials. Neuropsychopharmacology, 22, 559–565.
- Pilkington K., Kirkwood G., Rampes H., Fisher P., Richardson J. (2005) Homeopathy for depression: a systematic review of the research evidence. Homeopathy, 94, 153–163.
- Posternak M.A., Solomon D.A., Leon A.C., Mueller T.I., Shea M.T., Endicott J., Keller M.B. (2006) The naturalistic course of unipolar major depression in the absence of somatic therapy. Journal of Nervous and Mental Disease, 194, 324–329.
- Posternak M.A., Zimmerman M. (2000) Short‐term spontaneous improvement rates in depressed outpatients. Journal of Nervous and Mental Disease, 188, 799–804.
- Posternak M.A., Zimmerman M. (2007) Therapeutic effect of follow‐up assessments on antidepressant and placebo response rates in antidepressant efficacy trials: meta‐analysis. British Journal of Psychiatry, 190, 287–292.
- Posternak M.A., Zimmerman M., Keitner G.I., Miller I.W. (2002) A reevaluation of the exclusion criteria used in antidepressant efficacy trials. American Journal of Psychiatry, 159, 191–200.
- Rabkin J.G., Markowitz J.S., Stewart J., McGrath P., Harrison W., Quitkin F.M., Klein D.F. (1986) How blind is blind? Assessment of patient and doctor medication guesses in a placebo‐controlled trial of imipramine and phenelzine. Psychiatry Research, 19, 75–86.
- Rihmer Z., Gonda X. (2008) Is drug‐placebo difference in short‐term antidepressant drug trials on unipolar major depression much greater than previously believed? Journal of Affective Disorders, 108, 195–198.
- Ronalds C., Creed F., Stone K., Webb S., Tomenson B. (1997) Outcome of anxiety and depressive disorders in primary care. British Journal of Psychiatry, 171, 427–433.
- Rutherford B., Sneed J., Devanand D., Eisenstadt R., Roose S. (2010) Antidepressant study design affects patient expectancy: a pilot study. Psychological Medicine, 40, 781–788.
- Rutherford B.R., Sneed J.R., Roose S.P. (2009) Does study design influence outcome? The effects of placebo control and treatment duration in antidepressant trials. Psychotherapy and Psychosomatics, 78, 172–181.
- Schulz K.F., Altman D.G., Moher D. (2010) CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. PLoS Medicine, 7, e1000251.
- Schulz K.F., Grimes D.A. (2002) Allocation concealment in randomised trials: defending against deciphering. Lancet, 359, 614–618.
- Seemuller F., Moller H.J., Obermeier M., Adli M., Bauer M., Kronmuller K., Holsboer F., Brieger P., Laux G., Bender W., Heuser I., Zeiler J., Gaebel W., Schennach‐Wolff R., Henkel V., Riedel M. (2010) Do efficacy and effectiveness samples differ in antidepressant treatment outcome? An analysis of eligibility criteria in randomized controlled trials. Journal of Clinical Psychiatry, 71, 1425–1433.
- Severus E., Seemuller F., Berger M., Dittmann S., Obermeier M., Pfennig A., Riedel M., Frangou S., Moller H.J., Bauer M. (2012) Mirroring everyday clinical practice in clinical trial design: a new concept to improve the external validity of randomized double‐blind placebo‐controlled trials in the pharmacological treatment of major depression. BMC Medicine, 10, 67.
- Shang A., Huwiler‐Muntener K., Nartey L., Juni P., Dorig S., Sterne J.A., Pewsner D., Egger M. (2005) Are the clinical effects of homoeopathy placebo effects? Comparative study of placebo‐controlled trials of homoeopathy and allopathy. Lancet, 366, 726–732.
- Sinyor M., Levitt A.J., Cheung A.H., Schaffer A., Kiss A., Dowlati Y., Lanctot K.L. (2010) Does inclusion of a placebo arm influence response to active antidepressant treatment in randomized controlled trials? Results from pooled and meta‐analyses. Journal of Clinical Psychiatry, 71, 270–279.
- Sneed J.R., Rutherford B.R., Rindskopf D., Lane D.T., Sackeim H.A., Roose S.P. (2008) Design makes a difference: a meta‐analysis of antidepressant response rates in placebo‐controlled versus comparator trials in late‐life depression. American Journal of Geriatric Psychiatry, 16, 65–73.
- Sotsky S.M., Glass D.R., Shea M.T., Pilkonis P.A., Collins J.F., Elkin I., Watkins J.T., Imber S.D., Leber W.R., Moyer J., et al. (1991) Patient predictors of response to psychotherapy and pharmacotherapy: findings in the NIMH Treatment of Depression Collaborative Research Program. American Journal of Psychiatry, 148, 997–1008.
- Suh T., Gallo J.J. (1997) Symptom profiles of depression among general medical service users compared with specialty mental health service users. Psychological Medicine, 27, 1051–1063.
- Tedeschini E., Fava M., Goodness T.M., Papakostas G.I. (2010) Relationship between probability of receiving placebo and probability of prematurely discontinuing treatment in double‐blind, randomized clinical trials for MDD: a meta‐analysis. European Neuropsychopharmacology, 20, 562–567.
- Turner E.H., Matthews A.M., Linardatos E., Tell R.A., Rosenthal R. (2008) Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine, 358, 252–260.
- Uhlenhuth E.H., Park L.C. (1964) The influence of medication (Imipramine) and doctor in relieving depressed psychoneurotic outpatients. Journal of Psychiatric Research, 69, 101–122.
- van der Lem R., van der Wee N.J., van Veen T., Zitman F.G. (2012) Efficacy versus effectiveness: a direct comparison of the outcome of treatment for mild to moderate depression in randomized controlled trials and daily practice. Psychotherapy and Psychosomatics, 81, 226–234.
- Vandenbroucke J.P. (2004) When are observational studies as credible as randomised trials? Lancet, 363, 1728–1731.
- Walsh B.T., Seidman S.N., Sysko R., Gould M. (2002) Placebo response in studies of major depression: variable, substantial, and growing. Journal of the American Medical Association, 287, 1840–1847.
- Waring D.R. (2008) The antidepressant debate and the balanced placebo trial design: an ethical analysis. International Journal of Law and Psychiatry, 31, 453–462.
- Weintraub W., Aronson H. (1963) Clinical judgment in psychopharmacological research. Journal of Neuropsychiatry, 4, 65–70.
- Wendler D., Miller F.G. (2004) Deception in the pursuit of science. Archives of Internal Medicine, 164, 597–600.
- Wisniewski S.R., Rush A.J., Nierenberg A.A., Gaynes B.N., Warden D., Luther J.F., McGrath P.J., Lavori P.W., Thase M.E., Fava M., Trivedi M.H. (2009) Can phase III trial results of antidepressant medications be generalized to clinical practice? A STAR*D report. American Journal of Psychiatry, 166, 599–607.
- Zimmerman M., Chelminski I., Posternak M.A. (2005) Generalizability of antidepressant efficacy trials: differences between depressed psychiatric outpatients who would or would not qualify for an efficacy trial. American Journal of Psychiatry, 162, 1370–1372.
- Zimmerman M., Galione J. (2010) Psychiatrists' and nonpsychiatrist physicians' reported use of the DSM‐IV criteria for major depressive disorder. Journal of Clinical Psychiatry, 71, 235–238.
- Zimmerman M., Mattia J.I., Posternak M.A. (2002) Are subjects in pharmacological treatment trials of depression representative of patients in routine clinical practice? American Journal of Psychiatry, 159, 469–473.
- Zimmerman M., McGlinchey J.B., Posternak M.A., Friedman M., Attiullah N., Boerescu D. (2006) How should remission from depression be defined? The depressed patient's perspective. American Journal of Psychiatry, 163, 148–150.