Review team
It is recommended to have at least one or two individuals with clinical expertise and at least one or two individuals with methodological expertise in systematic reviews/meta-analyses and in the types of study design being included [19,20]. The team should recognize their own biases and attempt to compensate by including members with a wide range of (potentially conflicting) beliefs.
Planning
All investigations of clinical heterogeneity should ideally be planned a priori and not be driven by observations of the data [1,17,21-35]. However, methods for examining the data to identify unanticipated variables of interest (i.e., post-hoc investigations), such as reviewing summary tables or graphical displays, should also be pre-specified [24,27,28,32,36]. Describe which variables you will investigate, how and when you will investigate them, and how the results will be interpreted and incorporated into your findings and conclusions.
Rationale
Variables should have a clear scientific rationale for their role as a treatment effect modifier (e.g., pathophysiological, pharmacological, evidence from prior research, clinical experience) [1,7,17,20,26,27,32-34,37,38]. Exercise parsimony in defining variable choices [1,20,28,33,39], and consider that when variables are not reported this may reflect under-reporting in the primary studies; that is, not finding an effect for clinically relevant variables does not imply a consistency of effect [20].
Types of clinical variables to consider
Patient level: Age, baseline disease severity, sex, gender, ethnicity, comorbidities, genetic factors, other psychosocial variables, and other important features of the disease [2,3,7,16].

Intervention level: Dose/strength/intensity of treatment, duration of treatment, brand/manufacturer, co-interventions, timing, route of administration, compliance, clinician training, implementation, other [1,2,4,5,8,12].

Outcome level: Event type, outcome measure type, outcome definition, length of follow-up, timing of outcome measurement(s) [1,2,4-6].

Other: Research setting, geographical issues, length of follow-up [1,3,4].
Role of statistical heterogeneity
Reviewers should think through all potentially relevant variables to explore and not rely on statistical measures of heterogeneity to justify such investigations [1,20,40,41]. Clinical heterogeneity related to specific individual factors could be present even in the absence of a significant statistical test for the presence of heterogeneity (e.g., Cochran’s Q test) [24,27,31,36].
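To make the statistical side of this point concrete, the sketch below computes Cochran's Q and the I² statistic from a set of per-trial effect estimates; the data, the function name, and the use of log odds ratios are illustrative assumptions, and a non-significant Q would not rule out clinically important heterogeneity.

```python
# Minimal sketch (illustrative data): Cochran's Q and Higgins' I^2 from
# per-trial effect estimates and standard errors. Names are assumptions.
import numpy as np
from scipy import stats

def q_and_i2(effects, std_errs):
    """Return Cochran's Q, its p-value, and I^2 (%) under a fixed-effect model."""
    effects = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(std_errs, dtype=float) ** 2   # inverse-variance weights
    pooled = np.sum(weights * effects) / np.sum(weights)     # fixed-effect pooled estimate
    q = np.sum(weights * (effects - pooled) ** 2)            # Cochran's Q
    df = len(effects) - 1
    p_value = stats.chi2.sf(q, df)                           # chi-squared test on k-1 df
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0    # I^2 = max(0, (Q - df)/Q)
    return q, p_value, i2

# Five hypothetical trials reporting log odds ratios and standard errors
q, p, i2 = q_and_i2([0.10, -0.05, 0.42, 0.25, 0.08],
                    [0.12, 0.15, 0.20, 0.10, 0.18])
print(f"Q = {q:.2f}, p = {p:.3f}, I^2 = {i2:.1f}%")
```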
Plotting and visual aids
Consider using graphical displays of data from the included trials to help identify potential clinical reasons for heterogeneity. Examples of such plots and visual aids include: summary data sheets [27], forest plots [27,28,31,32,42], L'Abbé plots [24,32,43], funnel plots [24,44], Galbraith (radial) plots [32], influence plots [24,45,46], dose-response curves [4], multidimensional scaling [47], and heat maps [48,49]. Reviewers should be careful to avoid data dredging when using these displays.
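As one illustration, the sketch below draws a basic L'Abbé plot with matplotlib, plotting the control-group event risk against the treatment-group event risk for each trial, with marker area scaled by trial size; all data are invented for illustration.

```python
# Minimal sketch of a L'Abbé plot; all data below are invented for illustration.
import numpy as np
import matplotlib.pyplot as plt

control_risk   = np.array([0.30, 0.22, 0.45, 0.15, 0.38])   # event risk, control arm
treatment_risk = np.array([0.20, 0.18, 0.30, 0.14, 0.26])   # event risk, treatment arm
trial_size     = np.array([120, 80, 300, 60, 150])           # total patients per trial

fig, ax = plt.subplots(figsize=(4, 4))
ax.scatter(control_risk, treatment_risk, s=trial_size, alpha=0.6)
ax.plot([0, 0.5], [0, 0.5], linestyle="--", color="grey")    # line of equality (no effect)
ax.set_xlabel("Event risk, control group")
ax.set_ylabel("Event risk, treatment group")
ax.set_title("L'Abbé plot (illustrative data)")
plt.tight_layout()
plt.show()
```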
Dealing with outliers
When individual trials are clear outliers, attempt to determine why, and consider a sensitivity analysis in which the outlying trial(s) are removed to observe how the effect estimate changes. One may also consider an influence analysis, which explores how deleting individual studies from the analysis affects the overall estimate.
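A minimal leave-one-out sketch under a fixed-effect (inverse-variance) model is shown below; the effect estimates and standard errors are invented, and in practice the influence analysis should use the same model as the main analysis.

```python
# Minimal leave-one-out influence analysis under a fixed-effect
# (inverse-variance) model; effect estimates and standard errors are invented.
import numpy as np

def pooled_estimate(effects, std_errs):
    """Inverse-variance weighted (fixed-effect) pooled estimate."""
    w = 1.0 / np.asarray(std_errs, dtype=float) ** 2
    return np.sum(w * np.asarray(effects, dtype=float)) / np.sum(w)

effects  = np.array([0.10, -0.05, 0.42, 0.25, 0.08])   # e.g., log odds ratios
std_errs = np.array([0.12, 0.15, 0.20, 0.10, 0.18])

overall = pooled_estimate(effects, std_errs)
for i in range(len(effects)):
    keep = np.arange(len(effects)) != i                 # drop trial i
    loo = pooled_estimate(effects[keep], std_errs[keep])
    print(f"Omitting trial {i + 1}: pooled = {loo:.3f} (change {loo - overall:+.3f})")
```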
Number of investigations to perform and variables to explore
Use parsimony as a guide to such investigations. A rule of thumb is that there should be close to ten trials per variable examined when working with summary or aggregate patient data (APD), or ten individuals per variable when working with pooled or individual patient data (IPD) [49-52]. Consider making a hierarchy of clinically related variables and investigate only those variables for which your rationale and power are sufficient.
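As a trivial numeric illustration of this rule of thumb, the hypothetical helper below caps the number of candidate variables at roughly one per ten trials (APD) or ten patients (IPD); the threshold and names are assumptions, not a formal power calculation.

```python
# Hypothetical helper illustrating the "about ten trials or ten patients per
# variable" rule of thumb; not a substitute for a formal power assessment.
def max_variables(n_units, units_per_variable=10):
    """Rough ceiling on variables to explore, given trials (APD) or patients (IPD)."""
    return n_units // units_per_variable

print(max_variables(23))     # 23 trials with APD    -> explore at most ~2 variables
print(max_variables(460))    # 460 patients with IPD -> ~46, but parsimony still applies
```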
The use of APD vs. IPD
APD = summary or aggregate data from trials only. Such data are subject to ecological bias [30,51,53-55]: investigations of trial-level variables (e.g., dose, duration) are valid, while investigations of patient-level variables (e.g., age, baseline severity) are not.

IPD = original individual data on each patient. This type of data is valid for investigating both trial-level and patient-level variables [16,22,34-36,56-60], but one must control for baseline differences between patients across trials.
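The toy simulation below (all numbers invented) illustrates the ecological-bias point: the treatment effect in each simulated trial depends only on a trial-level covariate (dose), not on patient age, yet because trials enrolling older patients also used higher doses, an across-trial regression of effect estimates on mean age suggests a spurious age-effect relationship that patient-level (IPD) analyses would not show.

```python
# Toy illustration of ecological bias in aggregate (APD) analyses; all numbers
# are invented. Within each simulated trial the effect depends only on dose,
# not on patient age, but mean age and dose are confounded across trials.
import numpy as np

rng = np.random.default_rng(0)
mean_age = np.array([45.0, 55.0, 65.0, 75.0])   # aggregated patient-level variable
dose     = np.array([10.0, 20.0, 30.0, 40.0])   # trial-level variable, confounded with mean age
true_effect = 0.01 * dose                       # effect driven entirely by dose
observed = true_effect + rng.normal(0.0, 0.02, size=4)

slope_age, _  = np.polyfit(mean_age, observed, 1)
slope_dose, _ = np.polyfit(dose, observed, 1)
print(f"Across-trial slope on mean age: {slope_age:.4f}  (spurious)")
print(f"Across-trial slope on dose:     {slope_dose:.4f}  (genuine trial-level association)")
```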
Consider contacting authors and reviewing protocols of primary studies where available. Obtaining IPD for investigating clinically related patient-level variables is ideal.
The role of best evidence syntheses
Pre-plan to use a best evidence synthesis if the included studies are not reasonably combinable, and be sure to pre-specify the criteria used to determine the combinability of included trials (e.g., sufficiently similar patient groups). This approach can also be useful for exploring differences between and within the included studies. Several recommendations exist in the literature for performing a narrative synthesis, using levels of evidence, or performing a best evidence synthesis, e.g., [61-63].
Statistical methods
Many statistical methods are available for investigating the association of study findings with clinically related variables, including frequentist, Bayesian and mixed methods. Stratification and various forms of meta-regression can be useful. We recommend consulting respected texts and individuals with expertise in the statistical methods of meta-analyses and explorations of heterogeneity, especially meta-regression [23,27,28,32,35].
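For orientation only, the sketch below runs a simple inverse-variance weighted (fixed-effect) meta-regression of trial effect estimates on a single trial-level covariate; data and names are illustrative, a full random-effects meta-regression would additionally estimate the between-study variance (e.g., via method of moments or REML), and dedicated meta-analysis software is preferable in practice.

```python
# Minimal inverse-variance weighted (fixed-effect) meta-regression of trial
# effects on one trial-level covariate; data and variable names are invented.
import numpy as np

effects  = np.array([0.10, -0.05, 0.42, 0.25, 0.08])   # per-trial log odds ratios
std_errs = np.array([0.12, 0.15, 0.20, 0.10, 0.18])
dose     = np.array([10.0, 10.0, 40.0, 20.0, 10.0])    # trial-level covariate

w = 1.0 / std_errs ** 2
X = np.column_stack([np.ones_like(dose), dose])        # intercept + dose
W = np.diag(w)

# Weighted least squares: solve (X' W X) beta = X' W y
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ effects)
cov_beta = np.linalg.inv(X.T @ W @ X)                  # covariance assuming known variances
slope, slope_se = beta[1], np.sqrt(cov_beta[1, 1])
print(f"Estimated change in effect per unit dose: {slope:.4f} (SE {slope_se:.4f})")
```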
Interpretation of findings
Results of such investigations are generally observational and thus hypothesis generating only [1,23,24,28,33,53]. Authors should express the validity of, and their confidence in, their findings. When interpreting results of these investigations, it is suggested to consider confounding; other sources of bias (e.g., publication, misclassification, dilution, selection) [20,32]; the magnitude and direction of the effect and its confidence interval [1,20]; and the plausibility of causal relationships [41]. It may not be appropriate to conclude that there is consistency of effect if subgroup effects are not found [20]. Authors should use their findings to make specific recommendations about how future research could proceed or build upon these results, rather than simply concluding that "more research is needed".
Reporting
Consider the potential for lack of reporting of data or information relating to clinical variables in the primary studies, and consider contacting authors for missing or additional data on important clinical variables. Reviewers must be careful to report all of their proposed and actual investigations of clinical heterogeneity, and should adhere to the PRISMA statement when reporting their reviews [11].