Abstract
Health political background
The comparison of the effectiveness of health technologies is not only laid down in German law (Social Code Book V, § 139 and § 35b) but also constitutes a central element of clinical guidelines and decision making in health care. Tools that support decision making, such as Health Technology Assessments (HTA), therefore require a valid methodological repertoire for these comparisons.
Scientific background
Randomised controlled head-to-head trials which directly compare the effects of different therapies are considered the gold standard methodological approach for the comparison of the efficacy of interventions. Because this type of trial is rarely available, comparisons of efficacy often need to rely on indirect comparisons, whose validity remains controversial.
Research questions
Research questions for the current assessment are: Which (statistical) methods for indirect comparisons of therapeutic interventions exist, how often are they applied, and how valid are their results in comparison to the results of head-to-head trials?
Methods
In a systematic literature search, all medical databases of the German Institute of Medical Documentation and Information (DIMDI) are searched for methodological papers as well as applications of indirect comparisons in systematic reviews. The results of the literature analysis are summarised qualitatively to characterise the methods and quantitatively to determine the frequency of their application.
The validity of the results from indirect comparisons is checked by comparing them to the results from the gold standard, a direct comparison. Data sets from systematic reviews which use both direct and indirect comparisons are tested for consistency by means of the z-statistic.
Results
29 methodological papers and 106 applications of indirect methods in systematic reviews are analysed. Four methods for indirect comparisons can be identified:
Unadjusted indirect comparisons include, independent of any comparator, all randomised controlled trials (RCT) that provide a study arm with the intervention of interest.
Adjusted indirect comparisons and metaregression analyses include only those studies that provide one study arm with the intervention of interest and another study arm with a common comparator.
While the aforementioned methods use conventional metaanalytical techniques, mixed treatment comparisons (MTC) use Bayesian statistics and are able to analyse a complex network of RCT with multiple comparators simultaneously.
During the period from 1999 to 2008, adjusted indirect comparisons are the most commonly used method for indirect comparisons. Since 2006 an increase in the application of the methodologically more challenging MTC has been observed.
For the validity check 248 data sets, which include the results of both a direct and an indirect comparison, are available. The share of statistically significant discrepant results is greatest in the unadjusted indirect comparisons (25.5% [95% CI: 13.1%; 38.0%]), followed by metaregression analyses (16.7% [95% CI: -13.2%; 46.5%]), adjusted indirect comparisons (12.1% [95% CI: 6.1%; 18.0%]) and MTC (1.8% [95% CI: -1.7%; 5.2%]). Discrepant results are mainly detected if the basic assumption for an indirect comparison – between-study homogeneity – does not hold. However, a systematic over- or underestimation of the results of direct comparisons by any of the indirectly comparing methods is not observed in this sample.
Discussion
The selection of an appropriate method for an indirect comparison has to take into account its validity, the number of interventions to be compared, and the quality as well as the quantity of the available studies. Unadjusted indirect comparisons show low validity when contrasted with the results of direct comparisons. Adjusted indirect comparisons and MTC may, under certain circumstances, give results which are consistent with the results of direct comparisons. The limited number of available reviews that use metaregression analyses for indirect comparisons currently prohibits an empirical evaluation of this methodology.
Conclusions/Recommendations
Given the main prerequisite – a pool of homogeneous and high-quality RCT – the results of head-to-head trials may be estimated by an adjusted indirect comparison or an MTC. In the context of HTA and guideline development they are valuable tools whenever a direct comparison of the interventions of interest is lacking.
Abstract
Health political background
The comparative assessment of the benefit of health technologies is not only laid down in German law (§ 139 and § 35b, Social Code Book V (SGB V)) but is also a central element of clinical guidelines and of decision making in health care. Decision-support instruments such as Health Technology Assessments (HTA) should therefore have a valid methodological repertoire at their disposal.
Scientific background
Randomised controlled head-to-head trials, which compare therapies directly with each other, are considered the gold standard for comparisons of efficacy. Since trials of this type are available only to a limited extent, comparisons of efficacy have to rely on indirectly comparing methods, whose validity is still debated controversially.
Research questions
The research questions of the present assessment are: Which (statistical) methods for conducting indirect comparisons of therapeutic interventions exist, how often are they applied, and how is their validity to be judged in comparison with the results of direct comparisons?
Methods
In a systematic literature search, the medical databases of the German Institute of Medical Documentation and Information (DIMDI) are searched for methodological papers and for applications of indirect comparisons in systematic reviews. The literature analysis is qualitative and descriptive for the methods and quantitative for the frequency of their application.
The validity of the methods for indirect comparisons of therapies can be checked by comparing their results with the gold standard, the results of head-to-head trials. In systematic reviews in which therapies are compared both directly and indirectly, these results are tested for agreement by means of the z-statistic.
Results
29 methodological papers and 106 applications of the methods are analysed. From these, four methods for indirect comparisons can be identified:
Unadjusted indirect comparisons include, independent of the comparator, all randomised controlled trials (RCT) that contain a study arm with one of the therapeutic options of interest.
Adjusted indirect comparisons and metaregressions draw only on studies that contain one arm with a therapeutic option of interest and one arm with a common comparator.
While these procedures use conventional metaanalytical techniques, mixed treatment comparisons (MTC), which work with Bayesian methods, can analyse a complex network of RCT with multiple comparators simultaneously.
In the period from 1999 to 2008, adjusted indirect comparisons are applied most frequently. Since 2006 a marked increase in the use of the methodologically more demanding MTC has also been observed.
For the validity check, 248 data sets with paired results from a direct and an indirect comparison are available. The share of discrepant results with statistical significance was greatest for the unadjusted indirect comparisons (25.5% [95% CI: 13.1; 38]), followed by the metaregressions (16.7% [95% CI: -13.2; 46.5]), the adjusted indirect comparisons (12.1% [95% CI: 6.1; 18]) and the MTC (1.8% [95% CI: -1.7; 5.2]). Discrepant results are mainly observed when the prerequisite for conducting an indirect comparison – a homogeneous pool of studies – is not met. A systematic over- or underestimation of the results of direct comparisons by the indirect comparison is not found for any of these procedures in this sample.
Discussion
The selection of a suitable method for an indirect comparison has to be guided by its validity, the number of therapeutic options to be compared, and the quality and quantity of the available studies. Unadjusted indirect comparisons show low validity when contrasted with direct comparisons. Adjusted indirect comparisons and MTC, in contrast, can under certain conditions deliver results that in most cases correspond to those of direct comparisons. The validity of indirect comparisons by means of metaregression cannot yet be judged on the basis of the few application examples available so far.
Conclusions/Recommendations
Provided the central prerequisite is met – application to a pool of homogeneous, high-quality RCT – the results of high-quality head-to-head trials can be estimated by using adjusted indirect comparisons and MTC. In the context of HTA and guideline development they are therefore valuable auxiliary instruments whenever direct evidence for a comparison of the efficacy of therapies is not available.
Executive Summary
Health political background
In the system of statutory health insurance, coverage decisions are increasingly based on the results of effectiveness or cost-effectiveness analyses conducted in the context of Health Technology Assessments (HTA).
Randomised controlled head-to-head trials which directly compare the effects of different therapies are considered the gold standard methodological approach for the comparison of the efficacy of medical interventions. As research progresses, more and more treatment options become available for certain indications. For pharmacological interventions, proven positive effects compared to placebo may be sufficient to attain market approval. Manufacturers therefore rarely see the need to test the effects of new interventions against the effects of interventions that are already on the market. Given multiple therapeutic options for an indication, there will hardly be a head-to-head trial testing all options in parallel. Statements on comparative efficacy therefore have to rely on indirect comparisons.
Scientific background
Comparisons are defined as indirect if the effects of interventions are compared with each other via their performance against a common comparator. This may be an active intervention (usually standard care) or placebo. To date, many questions concerning the validity of indirect comparisons remain unanswered. In 2005 a British HTA report was published which contains a comprehensive systematic overview of the available methods for indirect comparisons and their validity. The report, which covers publications up to 1999, introduces three methodological approaches for indirect comparisons: unadjusted and adjusted indirect comparisons, and metaregression-analyses. The authors conclude that discrepancies between the results of direct and indirect comparisons are considerable and that their direction cannot be foreseen. They point out that unadjusted indirect comparisons are highly prone to bias. In contrast, adjusted indirect comparisons and metaregression-analyses provide a higher degree of validity.
On the basis of these results the current report gives an updated review of indirect comparisons by means of five research questions. It focuses on the comparative efficacy of medical interventions on the basis of high-quality randomised controlled trials (RCT).
Research questions
What methodological approaches for indirect comparisons of therapeutic interventions are available today (March 2008) and under what circumstances may they be applied?
What methodological approaches for indirect comparisons have been applied in systematic reviews and how often?
What is the validity of results from indirect comparisons compared to the results of direct comparisons and do both arrive at the same conclusions?
What is the validity of results from indirect comparisons compared to the results of direct comparisons if results from head-to-head trials are included in the indirect comparison?
Is it possible to identify a “gold standard methodology“ for indirect comparisons of competing interventions?
Methods
Systematic literature searches are conducted with two purposes:
Identification of papers describing methodological approaches for indirect comparisons.
Identification of systematic reviews which apply indirect comparisons (exclusively, or in addition to information from direct comparisons).
The initial set of relevant references is taken from the systematic review of Glenny et al., which covers the relevant literature up to 1999. To identify papers published after 1999, all medical databases of the German Institute of Medical Documentation and Information (DIMDI) and the ISI Web of Knowledge® are searched using the search strategy of Glenny et al. with minor modifications.
In addition, the reference lists of the main methodological papers and systematic reviews as well as the websites of the member institutions of the International Network of Agencies for Health Technology Assessment (INAHTA) are screened for relevant papers.
The description of the different methodological approaches for indirect comparisons is based as far as possible on information from methodological papers and complemented by information from the methods chapters of published applications. The application frequency is calculated by counting the number of applications in all systematic reviews with indirect comparisons published between 1999 and 2008.
Indirect comparisons which use metaanalysis techniques are validated empirically on the basis of systematic reviews that report results of direct as well as indirect comparisons. For every methodological approach the following hypothesis is tested: the results of the indirect comparison do not differ significantly from the results of the direct comparison. In order to test this hypothesis, the difference between the results of a direct and an indirect comparison of the same interventions is calculated. This difference is termed the discrepancy. In order to make discrepancies from different reviews comparable, they are transformed into z-scores. The validity check for the different methodological approaches for indirect comparisons is then performed in four steps:
Test for systematic over- or underestimation: are the z-scores normally distributed with a mean of z = 0 (Kolmogorov-Smirnov test, p ≤ 0.05)?
Quantification of the amount of discrepancy: calculation of the mean absolute z-score (mean of |z|).
Determination of the share of statistically significantly discrepant z-scores (|z| ≥ 1.96) among all z-scores.
For data sets with statistically significant discrepant z-scores: Homogeneity testing of the underlying study pool for the direct and indirect comparisons.
Finally it is reported in how many cases the direct and indirect comparisons arrive at the same conclusions.
Since it is assumed that the inclusion of head-to-head trials in indirect comparisons may level out discrepancies between direct and indirect comparisons, the validity check (main analysis) is repeated in a subgroup of data sets (subgroup analysis) in which results from head-to-head trials are not included in the indirect comparisons.
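For illustration only, the consistency check described above can be sketched in a few lines of Python. The function and variable names are assumptions made for this example, and the statistical details follow the generic description given here rather than the exact computations of the report.

# Hypothetical sketch of the validity check (assumed names and inputs).
import numpy as np
from scipy import stats

def consistency_check(theta_direct, se_direct, theta_indirect, se_indirect):
    """Compare paired direct and indirect effect estimates (e.g. log odds ratios)."""
    theta_direct = np.asarray(theta_direct, dtype=float)
    theta_indirect = np.asarray(theta_indirect, dtype=float)
    se_direct = np.asarray(se_direct, dtype=float)
    se_indirect = np.asarray(se_indirect, dtype=float)

    # Discrepancy between the direct and the indirect estimate, standardised to a z-score.
    z = (theta_direct - theta_indirect) / np.sqrt(se_direct**2 + se_indirect**2)

    # Step 1: test for systematic over- or underestimation (are the z-scores standard normal?).
    ks_result = stats.kstest(z, "norm")

    # Step 2: quantify the amount of discrepancy (mean absolute z-score).
    mean_abs_z = float(np.mean(np.abs(z)))

    # Step 3: share of statistically significantly discrepant z-scores (|z| >= 1.96).
    share_discrepant = float(np.mean(np.abs(z) >= 1.96))

    return {"ks_p": float(ks_result.pvalue),
            "mean_abs_z": mean_abs_z,
            "share_discrepant": share_discrepant}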
Results
Method descriptions
The literature reveals that all methodological approaches for indirect comparisons are based on the same assumption: the observed variability among the results of the studies that are to be included in an indirect comparison is solely due to random error or, in other words, no meaningful between-study heterogeneity is present.
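As a point of reference (standard meta-analytic practice, not a formula taken from this report), this assumption is commonly examined with Cochran's Q statistic and the I² measure for a pool of k studies:

$$Q = \sum_{i=1}^{k} w_i\,(\hat\theta_i - \hat\theta)^2, \qquad w_i = \frac{1}{\widehat{\operatorname{Var}}(\hat\theta_i)}, \qquad I^2 = \max\!\left(0,\; \frac{Q - (k-1)}{Q}\right),$$

where $\hat\theta_i$ is the effect estimate of study i and $\hat\theta$ the inverse-variance pooled estimate; under homogeneity Q approximately follows a chi-square distribution with k-1 degrees of freedom, and I² (often reported as a percentage) quantifies the share of variability attributable to heterogeneity rather than chance.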
Four frequently applied methodological approaches for indirect comparisons, all of which use metaanalytical methods, are identified (illustrative formula sketches for these approaches are given after this list):
In an unadjusted indirect comparison, the comparison of an intervention A with an intervention B is performed by metaanalytically pooling the results of all study arms treated with A to obtain a summary estimate θA, and by doing the same in a second metaanalysis with all study arms treated with B to obtain θB. This procedure is called an “unadjusted indirect comparison” because the indirect comparison does not adjust for events in the control groups. There are four ways of comparing the summary effect estimates θA and θB: calculation of a summary effect estimate θA versus B; testing the difference between θA and θB for statistical significance; checking the confidence intervals around θA and θB for overlap; or a narrative comparison of the efficacy of A and B.
To perform an indirect comparison adjusted for events in the comparator arms, the summary effect estimates θA versus comparator and θB versus comparator are calculated by conventional metaanalytic methods. For the comparison of the two summary effect estimates the same four methods as introduced in point 1 are applicable.
In metaregression-analyses the summary effect estimates θA versus comparator and θB versus comparator are estimated separately in two regression equations. In addition to adjusting for effects in the comparator arms, the regression models can adjust for the effects of further covariates which are regarded as sources of heterogeneity, e.g. the age of the study population or the severity of illness. Again, the comparison of θA versus comparator and θB versus comparator is performed by the four methods mentioned above (see point 1).
Mixed treatment comparison (MTC) is a collective term for methodological approaches that compare more than two interventions indirectly and simultaneously and that may also include head-to-head studies. MTC are able to rank an unlimited number of therapeutic options according to their efficacy. For that purpose Bayesian statistics are applied to successively pool all available evidence from RCT in order to obtain summary effect estimates for all possible comparisons of the interventions of interest.
Indirect comparisons without metaanalysis are performed if only one trial is available for each of the options of interest or if the available studies are highly heterogeneous. They also follow the principles of adjusted or unadjusted comparisons and may be performed by the four methods introduced in point 1.
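For orientation, the four approaches can be sketched in formulas. The notation below (effects on the log odds ratio scale, θ with subscripts for the pooled contrasts, C as the common comparator) is an assumption chosen for illustration and is not taken from the analysed papers.

Unadjusted indirect comparison (single-arm pooling, no common comparator):
$$\hat\theta_{A\,\mathrm{vs}\,B} = \hat\theta_A - \hat\theta_B, \qquad SE = \sqrt{SE(\hat\theta_A)^2 + SE(\hat\theta_B)^2},$$
where $\hat\theta_A$ and $\hat\theta_B$ are pooled over all arms treated with A and B, respectively.

Adjusted indirect comparison (often referred to in the literature as the Bucher method):
$$\hat\theta_{A\,\mathrm{vs}\,B} = \hat\theta_{A\,\mathrm{vs}\,C} - \hat\theta_{B\,\mathrm{vs}\,C}, \qquad SE = \sqrt{SE(\hat\theta_{A\,\mathrm{vs}\,C})^2 + SE(\hat\theta_{B\,\mathrm{vs}\,C})^2}.$$

Metaregression (two separate regressions, one per intervention, with a trial-level covariate $z$ such as mean age):
$$\hat\theta_{A\,\mathrm{vs}\,C,i} = \alpha_A + \beta_A z_i + \varepsilon_i, \qquad \hat\theta_{B\,\mathrm{vs}\,C,j} = \alpha_B + \beta_B z_j + \varepsilon_j,$$
after which the covariate-adjusted summary estimates are contrasted as in the adjusted indirect comparison.

Mixed treatment comparison (a generic random effects formulation with reference treatment A and a consistency assumption): for a trial i comparing treatments b and k with observed effect $y_i$,
$$y_i \sim N(\delta_i, se_i^2), \qquad \delta_i \sim N(d_{bk}, \tau^2), \qquad d_{bk} = d_{Ak} - d_{Ab},$$
with vague priors on the basic parameters $d_{Ak}$ and on $\tau$, and posterior estimation by Markov chain Monte Carlo.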
Application frequency of different methodological approaches for indirect comparisons
In 106 systematic reviews published between January 1999 and February 2008 and found by the literature searches, one metaanalytic method of indirect comparison is applied per review (exception: Vandermeer et al. 2007 applied three different methods). By far the most frequently applied method is the adjusted indirect comparison (60 times), followed by metaregression-analyses (17 times), unadjusted indirect comparisons (14 times), MTC (twelve times) and other approaches which cannot be allocated to the four main methodological groups (five times). From 2006 onwards a steep rise in the utilisation of MTC is observed (ten applications between 2006 and 2007).
Validity check
For the validity check of the indirect approaches a total of 248 paired results from direct and indirect comparisons (z-scores) are available from 57 systematic reviews.
The test for systematic over- or underestimation reveals that none of the approaches for indirect comparisons systematically over- or underestimates the results of the corresponding direct comparison. Nevertheless, differences in the mean absolute z-scores are observed among the indirect methods: the largest value is found for the unadjusted indirect comparisons (mean |z| = 1.63 [95%-CI: 1.20; 2.07]), while adjusted indirect comparisons (mean |z| = 0.95 [95%-CI: 0.80; 1.09]), metaregression-analyses (mean |z| = 0.99 [95%-CI: 0.20; 1.79]) and MTC (n=57; mean |z| = 0.59 [95%-CI: 0.45; 0.73]) provide lower values. For the MTC a higher mean absolute z-score is observed in the subgroup analysis without inclusion of head-to-head trials (n=12; mean |z| = 0.83 [95%-CI: 0.40; 1.26]), while the results of the main and subgroup analyses are concordant for the other methods. It should be noted, though, that the variance of the mean absolute z-scores differs considerably across the methods. The share of statistically significantly discrepant z-scores (|z| > 1.96) also varies among the indirect methodological approaches: the unadjusted indirect comparison provides a share of 25.5% (n=47; 95%-CI: 13.1%; 38.0%), the adjusted indirect comparison of 12.1% (n=116; 95%-CI: 6.1%; 18.0%), the metaregression-analysis of 16.7% (n=6; 95%-CI: -13.2%; 46.5%) and the MTC of 1.8% (n=57; 95%-CI: -1.7%; 5.2%). The results of the main and subgroup analysis are concordant. Summarising all indirect methods, 32 of the 248 comparisons provide statistically significant discrepancies (12.9% [95%-CI: 8.7%; 17.1%]).
For 15 of the 32 statistically significant discrepancies (z-scores) no information concerning the heterogeneity of the pooled studies is given by the original review authors. Significant heterogeneity is found by the original review authors in eleven of the statistically significantly discrepant comparisons but not in the remaining six.
Congruence of conclusions
In about half of the 248 comparisons of interventions no statistically significant difference in therapeutic efficacy is found, neither by the direct nor by the indirect comparison (49.2%; 95%-CI: 43.0%; 55.4%). In 21.8% (95%-CI: 16.6%; 26.9%) of cases one intervention is found to perform significantly better than the other by both the direct and the indirect comparison. In another 29% (95%-CI: 23.4%; 34.7%) of the analysed comparisons the conclusions of the direct and the indirect comparison are not concordant. However, the feared case that the direct comparison favours one intervention and the indirect comparison the other, each with statistical significance, is observed rarely (five cases, corresponding to a share of 2% [95%-CI: 0.3%; 3.8%] of all cases).
Precision of indirect comparisons
In the analysed sample (n=248) the confidence intervals around the effect estimates of the indirect comparisons are found to be slightly smaller than those around the direct estimates (median difference: 9% [25th percentile: -34%; 75th percentile: 30%]), while the indirect comparisons include six times more studies than the direct comparisons (median: 6 [25th percentile: 4; 75th percentile: 13]). It may therefore be stated that, for the analysed sample, a six-to-one ratio of included studies (with an approximately equal number of participants) for the indirect and direct comparison yields roughly comparable precision of the effect estimates. This supports the claim of Glenny et al. that an indirect comparison must include four times as many studies (of equal size) as a direct comparison to yield the same precision.
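The factor of four quoted from Glenny et al. can be made plausible by a simple variance argument under idealised assumptions (all trials equally informative, each contributing the same variance v to the pooled estimate it enters):
$$\operatorname{Var}(\hat\theta_{\mathrm{direct}}) = \frac{v}{m}, \qquad \operatorname{Var}(\hat\theta_{\mathrm{indirect}}) = \frac{v}{k} + \frac{v}{k} = \frac{2v}{k}$$
for m head-to-head trials and k trials on each side of the indirect comparison. Equal precision requires 2v/k = v/m, i.e. k = 2m trials per side and therefore 2k = 4m trials in total.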
Discussion
When deciding whether, and if so which, approach to indirect comparisons should be applied, four criteria should be taken into consideration:
1. Validity of the methodological approach
Compared to the results of head-to-head trials, unadjusted indirect comparisons provide the lowest validity. Some authors argue that the method breaks the randomisation of the included RCT because effects are not adjusted for events in the control groups. Results are therefore easily distorted by all types of bias that are typical for observational studies (e.g. selection bias and confounding).
In contrast, the adjusted indirect comparison, the metaregression and the MTC adjust for events in the control groups and thereby preserve the randomisation of the included RCT. However, a selection bias on the meta-level may still occur if the studies included for one intervention use different inclusion criteria than the studies for the other intervention. The resulting unevenly distributed patient characteristics may act as confounders if they are associated with the outcome. The introduced methods for indirect comparisons should therefore be applied only if the results to be pooled are extracted from homogeneous studies. This prerequisite holds not only for the methodological approaches to indirect comparisons but for conventional metaanalyses as well.
These theoretical considerations are supported by the results of the empirical validity check. Sufficient data are available to support the hypothesis that, provided a homogeneous pool of studies, adjusted indirect comparisons may arrive at the same results as direct comparisons.
Likewise a high validity can be ascribed to MTC, if they include head-to-head studies with the interventions of interest. The validity of metaregression-analyses, MTC without included head-to-head trials and the rarely used other methods cannot be appraised yet due to a limited number of available applications.
2. Number of therapies to compare
If only two interventions are to be compared indirectly, the adjusted indirect comparison seems to be the most appropriate methodological approach, considering the validity data and the limited methodological effort. If more than two interventions are to be compared, only an MTC is able to rank them in order of their efficacy.
3. Inclusion of results from head-to-head trials
Besides MTC, the three other methods for indirect comparisons also provide methodological extensions for the inclusion of head-to-head trials in an indirect comparison. However, there have not yet been sufficient data to check their validity. So far it can only be stated that MTC which include head-to-head trials yield results similar to those of the head-to-head trials alone. Their additional advantage is the possible increase in the precision of the effect estimate by combining the results of direct and indirect comparisons.
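As an illustration of this gain in precision (a generic inverse-variance formulation, assumed for this example rather than taken from the report), a direct estimate $\hat\theta_d$ and an independent indirect estimate $\hat\theta_i$ of the same contrast can be combined as
$$\hat\theta_{\mathrm{pooled}} = \frac{w_d\,\hat\theta_d + w_i\,\hat\theta_i}{w_d + w_i}, \qquad w_d = \frac{1}{\operatorname{Var}(\hat\theta_d)}, \; w_i = \frac{1}{\operatorname{Var}(\hat\theta_i)}, \qquad \operatorname{Var}(\hat\theta_{\mathrm{pooled}}) = \frac{1}{w_d + w_i},$$
so the variance of the combined estimate is always smaller than that of either component alone.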
4. Heterogeneous trials
The indirect comparison by metaregression-analysis cannot yet be regarded as a sufficiently validated method that reliably adjusts for factors causing heterogeneity. Likewise, adjusting for covariates in MTC by the introduction of inconsistency factors has not been validated, due to the limited number of applications. In conclusion: if considerable heterogeneity is present among the trials, the risk of bias in indirect comparisons is high, regardless of which methodological approach is used. In cases of low heterogeneity a conservative estimate may be calculated with a random effects model. Fixed effects models should only be applied to homogeneous pools of studies. Both models are applicable in all methodological approaches for indirect comparisons described.
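For orientation (standard inverse-variance formulation, not specific to this report), the fixed effect and random effects pooled estimates differ only in their weights:
$$\hat\theta_{FE} = \frac{\sum_i w_i \hat\theta_i}{\sum_i w_i}, \quad w_i = \frac{1}{\widehat{\operatorname{Var}}(\hat\theta_i)}; \qquad \hat\theta_{RE} = \frac{\sum_i w_i^{*} \hat\theta_i}{\sum_i w_i^{*}}, \quad w_i^{*} = \frac{1}{\widehat{\operatorname{Var}}(\hat\theta_i) + \hat\tau^2},$$
where $\hat\tau^2$ is the estimated between-study variance (e.g. by the DerSimonian-Laird method). The additional $\hat\tau^2$ term widens the confidence interval, which is what makes the random effects estimate the more conservative choice mentioned above.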
Conclusions
There are a number of methodological approaches available for indirect comparisons which differ in their ability to summarize the evidence from different pools of studies.
The empirical investigation reveals that it is mainly the results of unadjusted indirect comparisons that differ from the results of direct comparisons. The other indirect methods may provide results concordant with direct comparisons, especially if the summarised studies are characterised by low heterogeneity. For that reason adjusted indirect comparisons, metaregression-analyses and MTC should only be used when the study results are homogeneous. In the context of HTA and the development of clinical guidelines they are valuable tools if direct evidence for a comparison of the efficacy of interventions is not available.
Before indirect comparisons can be applied more broadly, it remains to be defined up to which degree of heterogeneity (and inconsistency) they provide effect estimates of acceptable validity, because a perfectly homogeneous pool of studies is rarely found in practice.
