Abstract
The validity and applicability of a systematic review depend on the quality of the primary studies that are included and on the quality of the methods used to conduct the review itself. Sometimes, observational studies represent the best available evidence. Subject to selection, information, and confounding biases, observational studies are thought to overestimate treatment or exposure effects. A systematic review of observational data must therefore attempt to minimize or prevent these sources of bias by developing explicit but also broad inclusion and exclusion criteria focused on extracting the best available evidence relevant to the review question. Reviewers must also use an expansive search strategy that draws on multiple resources, demonstrate the reproducibility of their selection and quality-assessment criteria, perform a quantitative analysis and adjustment for confounding where appropriate, and explore possible reasons for differences between the results of the primary studies. In this paper, we address the advantages and limitations of systematic reviews and meta-analyses of observational studies and suggest solutions at the design phase of protocol development.
Introduction
Systematic reviews are unique in their ability to provide a comprehensive collection, summary, and often analysis of all studies relevant to a focused research question. This study design uses a systematic approach to identify all of the available evidence from multiple studies in order to obtain an accurate and unbiased estimate of the association between interventions or exposures and events, an estimate that is widely applicable to a larger population1. While all successful systematic reviews comprehensively and explicitly summarize the results of a group of relevant studies, not every review includes a meta-analysis. A meta-analysis involves pooling data from individual studies with use of statistical techniques to derive quantitative estimates of the magnitude of treatment effects and their associated precision. By appropriately combining patient data from multiple studies, meta-analyses allow for larger sample sizes and therefore greater statistical power to detect treatment effects. Thus, every meta-analysis should be based on an underlying systematic review, but not every systematic review is analyzed with use of a meta-analysis. In the absence of large definitive clinical trials, systematic reviews and meta-analyses can help inform clinical guidelines and provide a stepping-off point for future clinical research.
Despite these benefits, systematic reviews are not without limitations. All reviews are retrospective and observational and are therefore subject to random error and systematic bias2. The validity and applicability of a systematic review depend on the quality of the primary studies that have been included in the review and on the conduct of the review itself. For instance, the results of a meta-analysis of retrospective observational studies may be subject to confounding, selection, and information biases not found in a review of well-done randomized controlled trials (Table I). In addition, as with any study, further bias can be introduced by an insufficient and inadequately rigorous methodological protocol. In orthopaedic surgical research, Bhandari et al. found the quality of meta-analyses in major orthopaedic journals to be lacking2; this is a problem because methodological deficiencies limit the validity of a review's conclusions. In this paper, we review the influence of primary study design on the results and interpretation of systematic reviews, and we present methodological considerations in the context of conducting systematic reviews and meta-analyses of observational studies in orthopaedics.
TABLE I.
Common Terms
| Term | Definition |
| --- | --- |
| Confounding bias | Occurs when two factors are closely associated and the effects of one confuse or distort the effects of the other on the outcome |
| Selection bias | The preferential inclusion of subjects with certain treatment outcomes |
| Information bias | Measurement error due to either an imperfect definition of a study variable or a flawed data-collection procedure |
| Effect size | A common measure used in meta-analyses; the standardized magnitude of a treatment effect, independent of study sample size. Generally, the larger the effect size, the greater the impact of an intervention |
| Odds ratio | The ratio of the odds of exposure among persons with the outcome of interest to the odds of exposure among those without it. An odds ratio of 1 indicates that there is no association between exposure and effect. This is the most appropriate measure of a dichotomous outcome in a case-control study |
| Relative risk | The ratio of the risk of the outcome of interest among those exposed to the risk among those not exposed. A relative risk of 1 indicates that the exposure (for example, smoking) is not associated with the outcome |
The Influence of Study Design
The quality of any study within the hierarchy of evidence depends on “the confidence that the trial design, conduct, and analysis has minimized or avoided biases in its treatment comparisons.”3 It is generally held that, to varying degrees, different observational study types are subject to more sources of bias than randomized trials are4.
Although randomization is not inherent to their design, well-done prospective observational studies are generally less biased than retrospective studies and are therefore considered higher-quality evidence. This is because the predictor variable is measured before the outcome, thus establishing a time sequence of events and preventing predictor measurements from being influenced by knowledge of the outcome5. The prospective approach also allows investigators to measure multiple exposures and events more completely and accurately than is possible retrospectively5. Because they are more prone to confounding and selection bias, retrospective studies may overestimate treatment effects6. For example, consider a study in which all of the information was collected retrospectively from medical records, and only for those patients who had at least one year of follow-up. Some patients may not be included because they had very poor results and dropped out of the study, or even died, before one year; the treatment effect would thus be overestimated because of selection bias. Because only the best-designed studies include enough safeguards against bias to provide a reasonable estimate of the true association between events1, any systematic review must draw on the best available evidence.
In some cases, for ethical, feasibility, and applicability reasons, observational studies may represent the highest form of evidence in a body of literature. For example, in a recent systematic review that examined return to function after limb salvage or early amputation for the treatment of severe lower-limb injury, the highest available evidence consisted of small prospective cohort and case-control studies7. Similarly, because it is generally considered unethical to randomize patients to either delayed or early surgery, the best available evidence on the effect of a delay in surgical treatment for a hip fracture on mortality comes from prospective observational studies (Level-II evidence)8. This is true for most research questions that seek to determine the influence of potential risk factors on patient outcomes. For instance, the question of which risk factors contribute to earlier compared with later falls among elderly patients undergoing rehabilitation following an acute illness9 can best be answered by collecting data on many potential risk factors and determining whether they are associated with a change in risk among the patients who fell earlier or later during the study. A comprehensive summary and analysis of this literature can help not only to better inform clinical management but also to identify whether there is a need for further research. However, because these studies are subject to the influence of known, unknown, or unmeasured prognostic factors due to a lack of randomization, their systematic compilation, analysis, and interpretation must be that much more rigorous.
Conducting a Systematic Review and Meta-Analysis of Observational Studies
Irrespective of study design, the steps involved in conducting a systematic review are the same (Table II). Specific methodological issues that pertain to systematic reviews of observational studies are discussed below in relation to this process.
TABLE II.
The Process of Conducting a Systematic Review36*
| Step | Details |
| --- | --- |
| Define the question | Specify inclusion and exclusion criteria, population, intervention or exposure, outcome, and methodology |
| | Establish a priori hypotheses to explain heterogeneity |
| Conduct a literature search | Decide on information sources: databases, experts, funding agencies, pharmaceutical companies, personal files, registries, citation lists of retrieved articles |
| | Determine restrictions: time frame, unpublished data, language |
| | Identify titles and abstracts |
| Apply inclusion and exclusion criteria | Apply inclusion and exclusion criteria to titles and abstracts |
| | Obtain full articles for eligible titles and abstracts |
| | Apply inclusion and exclusion criteria to full articles |
| | Select final eligible articles |
| | Assess agreement among reviewers on study selection |
| Create data abstraction | Assess methodologic quality (validity of the study) |
| | Assess agreement among reviewers on validity decisions |
| | Data abstraction: participants, interventions, comparison interventions, study design, results |
| Conduct analysis | Determine method for pooling results |
| | Pool results (if appropriate) |
| | Decide on handling missing data |
| | Explore heterogeneity |
| | Conduct sensitivity and subgroup analyses |
| | Explore possibility of publication bias |
(Reproduced, in modified form, from: Bhandari M, Devereaux PJ, Montori V, Cinà C, Tandan V, Guyatt GH; Evidence-Based Surgery Working Group. Users' guide to the surgical literature: how to use a systematic literature review and meta-analysis. Can J Surg. 2004;47:61. Reprinted with permission of the publisher. ©2004 Canadian Medical Association.)
Because a systematic review is retrospective in design, there is potential for systematic bias; this can be minimized by developing an explicit protocol that details the methodology of the study in an a priori manner. Every systematic review should start with a well-designed protocol that is centered on a researchable, focused, and clinically relevant question and that provides a clear description of the population, intervention or exposure, and outcomes of interest. This typically determines the type of study design that best answers the question1. For example, a question dealing with the impact of smoking on time to bone union in patients with a tibial fracture10 might best be answered by a rigorously conducted prospective cohort study. Although a researcher may have a strong opinion on the best study design to answer the review question, this does not always mean that such studies are available or that studies of higher methodological quality do not exist. In order to maximize the quality of the primary study set, and therefore of the systematic review, the inclusion criteria for study design should initially be broad. Reviewers may then decide, prior to applying more specific eligibility criteria, which of the relevant study designs, if not all of them, are best suited to minimizing the chance of a biased estimate of the treatment effect.
Defining Eligibility Criteria
In addition to indicating study design, the inclusion criteria specify the population, intervention or exposure, type of comparison, and outcomes of interest as defined by the research question. While the inclusion criteria tend to be broad and inclusive1, the exclusion criteria often limit the scope of the review and can ensure a more homogeneous sample of studies. Montori et al.1 and Bhandari et al.2 showed that 78% of orthopaedic meta-analyses explicitly stated their criteria for the inclusion of studies, but only 43% were judged to have avoided bias in the selection of those studies. For a review of observational research, it is of even greater importance to provide a rationale for the inclusion and exclusion criteria that are used, showing how those criteria may minimize potential biases and confounding. For example, setting the eligibility criteria before the systematic search may prevent inclusion criteria bias (tailoring the inclusion criteria to a direction of effect after reviewing the results of relevant studies)11. Confounding bias may be minimized by narrowing the types of patients, conditions, or injuries eligible for inclusion to produce a more homogeneous sample of patients; however, this may come at the expense of generalizability or an adequate sample size.
As previously mentioned, certain observational study designs can be subject to more bias than others, and their exclusion may help to increase the validity of the systematic review. If the intent is to conduct a meta-analysis, case series and ecological-type studies should invariably be excluded because noncomparative studies cannot be pooled statistically12. If both retrospective and prospective cohort studies relevant to a certain topic are available in the literature, reviewers may decide to exclude the retrospective designs because they are more prone to bias and to overestimation of treatment effects5. However, there is controversy over how different study designs should be used in a systematic review. While different study types should rarely be merged in a meta-analysis12, the strength of an association may be increased if the results of good observational studies with different designs consistently support the superiority of one treatment over another, particularly because studies of different designs are unlikely to share the same biases6. Despite the potential for confounding, Shrier et al. suggested that including observational studies and randomized controlled trials in the same review may provide important additional information, thereby improving the inference of the results13.
Identification of Studies—The Search Strategy
The most important aspect and defining characteristic of a systematic review is the systematic search used to identify all of the studies that are relevant to the research question. A systematic search has been defined as a protocol that “lists all potential data sources and multiple (and frequently overlapping) strategies to consult them.”1 Although largely contingent on the primary studies themselves, the validity of the results of any systematic review is invariably linked to the search strategy employed to find the breadth of this evidence. It is therefore essential to use a broad search strategy when identifying original publications for inclusion. Electronic databases such as MEDLINE and EMBASE provide the bulk of the data for most systematic reviews1. However, while these and other databases, such as the Cochrane Controlled Trials Register14, have improved indexing and limit tools for searching controlled trials, it is not yet feasible to narrow a search to a particular observational study type; any attempt to do so could result in potentially relevant studies being systematically missed. It is also important to search multiple databases for observational literature: Lemeshow et al. empirically demonstrated that searches for observational studies limited to one or two databases retrieve only 60% to 80% of the pertinent publications15.
In addition to searching electronic databases, reviewers can draw on expert opinion in the field of interest, hand-search the bibliographies of retrieved articles, and review the posters and abstracts of conference proceedings for both published and unpublished literature. For example, the Orthopaedic Trauma Association provides an online search feature to help narrow a search of its annual meeting abstracts and posters to the topic of interest. Reviewers may also decide to search for, and possibly include, relevant articles written in any language, without restrictions on the date of publication.
Used in combination, these methods can help to ensure that all relevant literature is accounted for, thereby minimizing publication bias. Publication bias is “the tendency on the parts of investigators, reviewers, and editors to submit or accept manuscripts for publication based on the direction or strength of the study findings.”16 Smaller studies with negative or nonsignificant results tend not to be submitted, published, or indexed in MEDLINE or other major databases, or they are published in a language other than English1. These studies are likely to be missed, resulting in a pooled overestimate of the treatment or exposure effect.
The most widely used method of assessing publication bias is the funnel plot, which plots the magnitude of the exposure effect against the sample size of the study. In the absence of bias, the resulting scatterplot resembles a symmetrical inverted funnel: smaller studies (which can be expected to vary more widely in both the positive and negative direction by chance) appear at the base of the funnel, while larger studies (which are given more weight in the pooled estimate) appear at the top center of the funnel. Conversely, if publication bias is present, funnel plots are often skewed and asymmetrical.
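As an illustration of this idea, the following minimal Python sketch plots effect sizes against sample sizes with matplotlib; the study data are invented for demonstration, and in practice the effect estimates and sample sizes would be taken from the included studies.

```python
# Minimal funnel-plot sketch; all study data below are hypothetical.
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical per-study log odds ratios and sample sizes.
log_or = np.array([0.10, 0.45, -0.20, 0.60, 0.05, 0.35, 0.80, -0.05])
n = np.array([850, 120, 600, 60, 1200, 240, 45, 400])

plt.scatter(log_or, n)
# Reference line at the sample-size-weighted mean effect.
plt.axvline(x=np.average(log_or, weights=n), linestyle="--",
            label="weighted mean effect")
plt.xlabel("Log odds ratio (effect size)")
plt.ylabel("Sample size")
plt.title("Funnel plot: asymmetry suggests possible publication bias")
plt.legend()
plt.show()
```

Symmetry is judged visually here; formal asymmetry tests exist but are beyond the scope of this sketch.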
It is advised that reviewers enlist the help of subject-matter experts, research methodologists, and librarians familiar with conducting searches for systematic reviews to ensure that the search protocol is comprehensive but not so expansive as to make completing the review impractical.
The Screening Process and the Process of Assessing Reviewer Agreement
Following an initial broad and well-defined search, the a priori eligibility criteria are applied to narrow the pool of studies for inclusion. Throughout this process, investigator bias must be limited. As decisions regarding the inclusion or exclusion of individual studies, data extraction, and quality assessment often involve some degree of subjectivity, it is useful to have at least two people conduct these steps independently, with disagreements being resolved by consensus or, when necessary, by a third reviewer. Agreement between two reviewers at each stage of the review process (Fig. 1) can be measured with use of the kappa statistic17,18, which compares the observed agreement between observers or measurements with the agreement expected by chance and describes the extent of agreement over and above chance. For variables with more than two categories, such as a quality-assessment score, the intraclass correlation coefficient (which yields values identical to those of the weighted kappa with quadratic weights) can be used to quantify agreement. Substantial agreement between reviewers (κ > 0.65) indicates that the criteria were “clear, objective, and consistently applied.”1 The application of the inclusion and exclusion criteria to titles, abstracts, and full texts should therefore be piloted to ensure that they are understood and applied consistently and accurately by all of the reviewers. The screening process may be documented with a flow-chart diagram similar to that shown in Figure 1.
Fig. 1.

Flow diagram demonstrating the individual steps in the study-selection process relevant to a review of observational studies27. (Reproduced, in modified form, from: Zlowodzki M, Poolman RW, Kerkhoffs GM, Tornetta P 3rd, Bhandari M; International Evidence-Based Orthopedic Surgery Working Group. How to interpret a meta-analysis and judge its value as a guide for clinical practice. Acta Orthop. 2007;78:602. Reprinted with permission of Taylor and Francis Ltd. [http://www.informaworld.com/sort].)
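To make the chance-corrected agreement calculation concrete, the short Python sketch below computes the kappa statistic by hand for two hypothetical reviewers' include/exclude screening decisions; all of the data are invented for illustration.

```python
# Cohen's kappa for two reviewers' screening decisions (1 = include, 0 = exclude).
# Hypothetical decisions on twelve candidate studies.
reviewer_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0]
reviewer_b = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0]

n = len(reviewer_a)
# Observed proportion of agreement.
p_o = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / n
# Agreement expected by chance, from each reviewer's marginal inclusion rate.
p_a1 = sum(reviewer_a) / n
p_b1 = sum(reviewer_b) / n
p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
# Kappa: agreement over and above chance, scaled to the maximum possible.
kappa = (p_o - p_e) / (1 - p_e)
print(f"observed = {p_o:.2f}, chance = {p_e:.2f}, kappa = {kappa:.2f}")
```

For these hypothetical decisions, the observed agreement is 0.83 and kappa is 0.67, which would clear the κ > 0.65 threshold cited above.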
Data Extraction and Assessment of Methodological Quality
The type of data to be extracted from the relevant studies should be pertinent to the review question, be prespecified in the review protocol, and be extracted systematically1. As in the process of article selection, two reviewers should use a structured and piloted data-extraction form to safeguard against extractor bias11. During data extraction, every effort should be made to obtain complete data on the relevant outcomes from all studies. This step, together with asking the primary authors to confirm the accuracy of the extracted data, limits the possibility of carrying any reporting and recording errors in the primary studies over into the review11.
The type of outcome data extracted depends on the unit of measure (e.g., mean, relative risk, odds ratio, or count data) that is applicable to the primary study designs used in the review. For example, when extracting dichotomous data from studies that collected their data retrospectively, it is impossible to estimate the incidence, prevalence, or risk of the development of the outcome of interest. In this case, only odds ratios can be calculated as opposed to the more easily interpreted relative risk (Table I)19. So, for instance, while it is proper to calculate the odds ratios for a meta-analysis of retrospective case-control studies, authors must then be careful not to interpret these estimates as if they were relative risks. Unless the outcome of interest is exceedingly rare, if the odds ratio is interpreted as a relative risk, it will always overstate any effect size19. Given this common problem in the literature, reviewers should calculate and use the relative risk as a summary estimate of dichotomous outcomes whenever appropriate (e.g., for prospective cohort data).
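The distinction is easy to verify numerically. The sketch below uses a hypothetical 2×2 table of cohort-style counts (in an actual case-control study the relative risk could not be computed) to show how the odds ratio overstates the relative risk when the outcome is common.

```python
# Odds ratio vs. relative risk from a hypothetical 2x2 table (cohort-style counts).
a, b = 40, 60   # exposed:   40 with the outcome, 60 without
c, d = 20, 80   # unexposed: 20 with the outcome, 80 without

risk_exposed = a / (a + b)                     # 0.40
risk_unexposed = c / (c + d)                   # 0.20
relative_risk = risk_exposed / risk_unexposed  # 2.00
odds_ratio = (a / b) / (c / d)                 # 2.67

print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")
# With a common outcome (20% to 40% here), the OR (2.67) overstates the
# RR (2.00); the two converge only when the outcome is rare.
```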
Further, because confounding is expected in any observational study, it may be useful to extract and analyze adjusted results. For example, given the review question: Among elderly patients, what is the effect of a delay in surgical treatment for a hip fracture on all-cause mortality?, some might argue that a major confounding factor is that patients undergoing delayed surgery tend to be sicker on admission and are therefore more likely to die than those who undergo immediate surgery8,20,21. Knowing this, the authors of the primary studies may adjust the mortality results for the severity of patient illness or the number of chronic health conditions patients had on admission for their hip fracture, as well as for other known confounding variables, with use of multivariable logistic regression models, in which the log odds of the outcome is modeled as a function of multiple independent variables. These adjusted estimates, if similar enough in their adjustment factors for the same outcome, may be combined in a meta-analysis to provide a less biased estimate of the overall effect from observational data. However, it is important to note that, despite adjustment for potential confounding variables, the results may still be subject to confounding bias because other unknown or unmeasured factors could be missed.
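As a rough illustration of such an adjusted analysis, the following Python sketch fits a multivariable logistic regression with the statsmodels package on simulated data; all variable names (delayed_surgery, illness_severity, n_comorbidities) and values are hypothetical.

```python
# Sketch: adjusting a binary outcome (mortality) for confounders with
# multivariable logistic regression. Data and variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "delayed_surgery": rng.integers(0, 2, n),   # exposure of interest
    "illness_severity": rng.normal(50, 10, n),  # confounder
    "n_comorbidities": rng.poisson(2, n),       # confounder
})
# Simulate mortality that depends on the confounders and the exposure.
logit = (-4 + 0.05 * df["illness_severity"]
         + 0.3 * df["n_comorbidities"] + 0.4 * df["delayed_surgery"])
df["died"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = sm.add_constant(df[["delayed_surgery", "illness_severity", "n_comorbidities"]])
fit = sm.Logit(df["died"], X).fit(disp=False)
# Exponentiating the coefficient gives the adjusted odds ratio for delay.
print(np.exp(fit.params["delayed_surgery"]))
```

Only such adjusted estimates, not the crude ones, would be candidates for pooling in the manner described above.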
The methodological quality of the primary studies must also be extracted. Generally, the features most relevant to the internal validity of an observational study include appropriate selection of participants, appropriate measurement of exposure and outcome variables, and appropriate use of design or analytical methods to control confounding22,23. Unlike randomized controlled trials, for which standard quality-assessment tools exist, observational studies have no standard quality-assessment tool, and many of the tools that are available lack a rationale or are used improperly22. Given these issues, it is advised that primary studies not be excluded or weighted on the basis of arbitrary quality scores11,24. Instead, quality assessment is useful for alerting reviewers and readers to the extent of bias, and the resulting uncertainty, in the final study pool. Comprehensive and reliable tools for assessing quality and susceptibility to bias in observational studies have been summarized elsewhere22,23. The best quality-assessment tool for a particular systematic review will depend on the type of primary study design, the exposure or treatment, and the outcomes pertinent to the review question.
The Meta-Analysis and Evaluation of Heterogeneity
A well-done systematic review provides a summary of all of the available evidence in an attempt to answer a focused research question. If appropriate, this summary may be quantitative, with use of the method of meta-analysis. The advantage of a meta-analysis is that, compared with the results of individual studies, pooled results can increase statistical power and lead to a more precise estimate of treatment effect25. When results are combined, larger studies with narrower confidence intervals are given more weight than smaller studies with less precise estimates of effect. This was the case in a meta-analysis that compared arthroplasty with internal fixation with regard to revision rates after hip fracture26. The pooled estimate is far more precise than the estimate from any individual study, as indicated by the wider confidence intervals around the individual point estimates in the forest plot (Fig. 2). Variability in the estimate of each study can come from the variance both within and between studies. When combining data, two statistical models can be used: (1) the fixed-effects model and (2) the random-effects model. Briefly, the fixed-effects model assumes that the between-study variance is zero and that the true effect of treatment is the same for every study25,27. The random-effects model takes into account both sources of variance and assumes that the effect may vary across studies because of differences between them27. Because of the increased potential for unknown or unmeasured variability among observational studies, the random-effects model28 is the more conservative, and therefore preferable, choice for pooling observational data. Other quantitative techniques for combining data have been summarized elsewhere25.
Fig. 2.
The effect of arthroplasty compared with that of internal fixation on the revision rates associated with displaced fractures of the femoral neck26. (Reprinted from: Bhandari M, Devereaux PJ, Swiontkowski MF, Tornetta P 3rd, Obremskey W, Koval KJ, Nork S, Sprague S, Schemitsch EH, Guyatt GH. Internal fixation compared with arthroplasty for displaced fractures of the femoral neck: a meta-analysis. J Bone Joint Surg Am. 2003;85:1678.)
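For readers who wish to see the two models side by side, the sketch below pools a set of hypothetical log relative risks by inverse-variance weighting, estimating the between-study variance with the DerSimonian-Laird method28; the effect sizes and standard errors are invented for illustration.

```python
# Fixed- and random-effects pooling by inverse-variance weighting,
# with tau^2 estimated by the DerSimonian-Laird method (reference 28).
import numpy as np

y = np.array([0.30, 0.10, 0.55, -0.05, 0.40])   # hypothetical log relative risks
se = np.array([0.12, 0.20, 0.25, 0.15, 0.30])   # hypothetical standard errors
v = se ** 2

# Fixed effects: weights are inverse within-study variances.
w = 1 / v
fixed = np.sum(w * y) / np.sum(w)

# DerSimonian-Laird estimate of the between-study variance tau^2.
k = len(y)
q = np.sum(w * (y - fixed) ** 2)                # Cochran Q
tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random effects: weights incorporate both variance components.
w_star = 1 / (v + tau2)
random_eff = np.sum(w_star * y) / np.sum(w_star)
se_re = np.sqrt(1 / np.sum(w_star))

print(f"fixed = {fixed:.3f}, random = {random_eff:.3f} "
      f"(95% CI {random_eff - 1.96*se_re:.3f} to {random_eff + 1.96*se_re:.3f})")
```

When tau² is greater than zero, the random-effects weights are more nearly equal across studies and the confidence interval widens, which is the conservative behavior described above.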
The actual decision of whether or not to conduct a meta-analysis must be made at three stages in the review process. First, and most importantly, at the protocol-development stage, reviewers need to determine whether the nature of the review question and the eligibility criteria will most likely yield studies so different from one another that a single summary estimate would not sensibly describe them all. If so, reviewers should decide, before starting the review process, to forgo a meta-analysis in favor of a qualitative summary of the evidence. For example, in a review describing all currently used methods for the prevention of osteoporosis in the elderly, including education-based, policy-based, and drug-related interventions, it makes no clinical sense to combine all of the possible interventions into a single estimate, given that each intervention is expected to have a different effect on the prevalence of disease. Second, if it is likely that the results can be pooled, the decision to conduct a meta-analysis should be revisited after data extraction but before any analysis, so as to prevent the decision from being driven by the data. At this point, the final study set has been assembled and reviewers are able to discern whether any unexpected dissimilarities among the outcomes, exposures, and populations would make pooling of the study results no longer reasonable. As long as the reviewer is “determining the best estimate of the same underlying truth that all of the studies being pooled are trying to measure,”1 there should be no hesitation to pool the study results. Lastly, at the data-analysis stage, formal statistical tests, such as the Cochran chi-square test (Cochran Q) and the I2 statistic, can be used to determine whether the extent of unexplained heterogeneity between study estimates warrants a meta-analysis. While the Cochran Q statistic tests for heterogeneity at a predefined threshold of significance, the I2 statistic shows the proportion of variability across studies that can be attributed to heterogeneity rather than to sampling error29. Values of approximately 25%, 50%, and 75% are considered to reflect low, moderate, and high heterogeneity, respectively27; however, there is no definitive cut-off value above which no data-pooling should be performed27. Because these tests are typically underpowered, a more conservative level of significance should be applied (e.g., p < 0.1 for heterogeneity).
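A minimal sketch of these two heterogeneity measures, computed from the same hypothetical study estimates used in the pooling example above, follows; note the conservative p < 0.1 threshold.

```python
# Cochran Q test and I^2 statistic for heterogeneity; study data are hypothetical.
import numpy as np
from scipy.stats import chi2

y = np.array([0.30, 0.10, 0.55, -0.05, 0.40])   # per-study effect estimates
se = np.array([0.12, 0.20, 0.25, 0.15, 0.30])
w = 1 / se**2

pooled = np.sum(w * y) / np.sum(w)
q = np.sum(w * (y - pooled) ** 2)    # Cochran Q
df = len(y) - 1
p_value = chi2.sf(q, df)             # compare against the conservative p < 0.1
i2 = max(0.0, (q - df) / q) * 100    # % of variability beyond sampling error

print(f"Q = {q:.2f}, p = {p_value:.3f}, I^2 = {i2:.0f}%")
```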
Heterogeneity in effect sizes should be expected when pooling observational data because confounding and selection biases often distort the findings of the primary studies. Because of these inherent biases, Egger et al. suggested that the statistical combination of data should not be a prominent component of systematic reviews of observational studies, and they warned that meta-analyses of observational data may produce very precise but equally spurious results30. However, testing for potential sources of heterogeneity may minimize the potential for biased estimates and generate hypotheses for future research. In subgroup analyses, reviewers can test whether grouping the studies according to predefined differences in study design, quality, populations, interventions or exposures, outcomes, or other characteristics reduces heterogeneity. If it does, the characteristic that defined the subgroup is related to the treatment or exposure effect, and the pooled subgroup estimate may represent that subgroup more sensibly1. Another approach is meta-regression analysis, in which standard regression techniques are applied to study-level data, rather than individual patient data, to determine the effect of multiple predictor variables, defined a priori, on the final pooled estimate.
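The sketch below shows one simple way such a meta-regression might be set up: a weighted least-squares regression of hypothetical study effect sizes on a single predefined study-level covariate (mean patient age), with inverse-variance weights. A full random-effects meta-regression would additionally estimate the between-study variance.

```python
# Simplified fixed-effect meta-regression via weighted least squares.
# Effect sizes, standard errors, and the covariate are all hypothetical.
import numpy as np
import statsmodels.api as sm

effect = np.array([0.30, 0.10, 0.55, -0.05, 0.40, 0.25])  # log relative risks
se = np.array([0.12, 0.20, 0.25, 0.15, 0.30, 0.18])
mean_age = np.array([62, 70, 58, 75, 66, 68])              # study-level covariate

X = sm.add_constant(mean_age)
# Each study is weighted by its inverse variance, as in the pooled analysis.
fit = sm.WLS(effect, X, weights=1 / se**2).fit()
print(fit.params)   # slope: estimated change in effect size per year of mean age
```

A slope near zero would suggest that mean age does not explain the between-study differences; a clearly nonzero slope would mark it as a candidate source of heterogeneity.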
Summary
Although some studies have shown that well-done observational studies produce estimates of effect similar to those of randomized controlled trials13,31,32, it remains generally held that poorly done observational studies can systematically overestimate the magnitude of the treatment effect and are therefore less valid than randomized controlled trials. As a result, systematic reviews of this type of evidence must be held to a higher standard of quality. The comprehensive checklists for reporting reviews of observational studies, as proposed by the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group33 and by Oxman et al.34,35, are two important sources of information that may be applied to ensure this standard.
While the quality of the primary studies will always be the major limiting factor in drawing valid conclusions, the quality of the methods used to conduct the systematic review and meta-analysis is also important: high-quality methods help ensure that the summary of the results is as free of bias as possible and prevent the reporting of falsely significant findings of benefit or harm. Reviewers may minimize or prevent the introduction of bias in several ways: by developing explicit but also broad inclusion and exclusion criteria focused on extracting the best available evidence relevant to the review question, by using an expansive search strategy that draws on multiple resources, by demonstrating the reproducibility of the selection and quality-assessment criteria, by performing a quantitative analysis and adjustment for confounding where appropriate, and by exploring possible reasons for differences between the results of the primary studies. Despite these safeguards, however, potential residual confounding of the observational studies might still limit any definitive conclusions. Although the results of systematic reviews of observational studies must therefore be presented and interpreted with caution, they remain an informative basis for future research as well as for clinical decision-making when randomized controlled trial evidence is unavailable. For a review to be clinically relevant and informative, it is essential that a methodologically sound protocol be established a priori, with the goal of producing the most valid and precise estimate of effect or risk possible with observational data.
Footnotes
Disclosure: The authors did not receive any outside funding or grants in support of their research for or preparation of this work. Neither they nor a member of their immediate families received payments or other benefits or a commitment or agreement to provide such benefits from a commercial entity.
References
- 1. Montori VM, Swiontkowski MF, Cook DJ. Methodologic issues in systematic reviews and meta-analyses. Clin Orthop Relat Res. 2003;413:43-54.
- 2. Bhandari M, Morrow F, Kulkarni AV, Tornetta P 3rd. Meta-analyses in orthopaedic surgery. A systematic review of their methodologies. J Bone Joint Surg Am. 2001;83:15-24.
- 3. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials. 1995;16:62-73.
- 4. Kunz R, Oxman AD. The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials. BMJ. 1998;317:1185-90.
- 5. Hulley SB, Cummings SR, Browner WS, Grady DG, Newman TB. Designing clinical research. 3rd ed. Philadelphia: Lippincott Williams and Wilkins; 2007.
- 6. Hartz A, Marsh JL. Methodologic issues in observational studies. Clin Orthop Relat Res. 2003;413:33-42.
- 7. Busse JW, Jacobs CL, Swiontkowski MF, Bosse MJ, Bhandari M; Evidence-Based Orthopaedic Trauma Working Group. Complex limb salvage or early amputation for severe lower-limb injury: a meta-analysis of observational studies. J Orthop Trauma. 2007;21:70-6.
- 8. Shiga T, Wajima Z, Ohe Y. Is operative delay associated with increased mortality of hip fracture patients? Systematic review, meta-analysis, and meta-regression. Can J Anaesth. 2008;55:146-54.
- 9. Vassallo M, Sharma JC, Briggs RS, Allen SC. Characteristics of early fallers on elderly patient rehabilitation wards. Age Ageing. 2003;32:338-42.
- 10. Castillo RC, Bosse MJ, MacKenzie EJ, Patterson BM; LEAP Study Group. Impact of smoking on fracture healing and risk of complications in limb-threatening open tibia fractures. J Orthop Trauma. 2005;19:151-7.
- 11. Felson DT. Bias in meta-analytic research. J Clin Epidemiol. 1992;45:885-92.
- 12. Sauerland S, Seiler CM. Role of systematic reviews and meta-analysis in evidence-based medicine. World J Surg. 2005;29:582-7.
- 13. Shrier I, Boivin JF, Steele RJ, Platt RW, Furlan A, Kakuma R, Brophy J, Rossignol M. Should meta-analyses of interventions include observational studies in addition to randomized controlled trials? A critical examination of underlying principles. Am J Epidemiol. 2007;166:1203-9.
- 14. The Cochrane Controlled Trials Register. The Cochrane Library. Issue 4 ed. Curbridge, United Kingdom: Update Software; 2001.
- 15. Lemeshow AR, Blum RE, Berlin JA, Stoto MA, Colditz GA. Searching one or two databases was insufficient for meta-analysis of observational studies. J Clin Epidemiol. 2005;58:867-73.
- 16. Dickersin K. The existence of publication bias and risk factors for its occurrence. JAMA. 1990;263:1385-9.
- 17. Fleiss JL. Measuring agreement between two judges on the presence or absence of a trait. Biometrics. 1975;31:651-9.
- 18. Donner A, Klar N. The statistical analysis of kappa statistics in multiple samples. J Clin Epidemiol. 1996;49:1053-8.
- 19. Davies HT, Crombie IK, Tavakoli M. When can odds ratios mislead? BMJ. 1998;316:989-91.
- 20. Hamlet WP, Lieberman JR, Freedman EL, Dorey FJ, Fletcher A, Johnson EE. Influence of health status and the timing of surgery on mortality in hip fracture patients. Am J Orthop. 1997;26:621-7.
- 21. Hardin GT. Timing of fracture fixation: a review. Orthop Rev. 1990;19:861-7.
- 22. Deeks JJ, Dinnes J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, Petticrew M, Altman DG; International Stroke Trial Collaborative Group; European Carotid Surgery Trial Collaborative Group. Evaluating non-randomised intervention studies. Health Technol Assess. 2003;7:iii-x, 1-173.
- 23. Sanderson S, Tatt ID, Higgins JP. Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography. Int J Epidemiol. 2007;36:666-76.
- 24. Herbison P, Hay-Smith J, Gillespie WJ. Adjustment of meta-analyses on the basis of quality scores should be abandoned. J Clin Epidemiol. 2006;59:1249-56.
- 25. Lau J, Ioannidis JP, Schmid CH. Quantitative synthesis in systematic reviews. Ann Intern Med. 1997;127:820-6.
- 26. Bhandari M, Devereaux PJ, Swiontkowski MF, Tornetta P 3rd, Obremskey W, Koval KJ, Nork S, Sprague S, Schemitsch EH, Guyatt GH. Internal fixation compared with arthroplasty for displaced fractures of the femoral neck: a meta-analysis. J Bone Joint Surg Am. 2003;85:1673-81.
- 27. Zlowodzki M, Poolman RW, Kerkhoffs GM, Tornetta P 3rd, Bhandari M; International Evidence-Based Orthopedic Surgery Working Group. How to interpret a meta-analysis and judge its value as a guide for clinical practice. Acta Orthop. 2007;78:598-609.
- 28. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177-88.
- 29. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327:557-60.
- 30. Egger M, Schneider M, Davey Smith G. Spurious precision? Meta-analysis of observational studies. BMJ. 1998;316:140-4.
- 31. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342:1887-92.
- 32. Ferriter M, Huband N. Does the non-randomized controlled study have a place in the systematic review? A pilot study. Crim Behav Ment Health. 2005;15:111-20.
- 33. Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, Moher D, Becker BJ, Sipe TA, Thacker SB. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA. 2000;283:2008-12.
- 34. Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol. 1991;44:1271-8.
- 35. Oxman AD, Guyatt GH, Singer J, Goldsmith CH, Hutchison BG, Milner RA, Streiner DL. Agreement among reviewers of review articles. J Clin Epidemiol. 1991;44:91-8.
- 36. Bhandari M, Devereaux PJ, Montori V, Cinà C, Tandan V, Guyatt GH; Evidence-Based Surgery Working Group. Users' guide to the surgical literature: how to use a systematic literature review and meta-analysis. Can J Surg. 2004;47:60-7.

