Increasing availability of large clinical data sets is driving a proliferation of observational epidemiology studies in perioperative care. This wealth of data must be judged both on its inherent quality and the statistical techniques used to analyse the data set. Without some basic understanding of the methods involved, it remains difficult to interpret the results of these perioperative studies. Indeed, some studies' conclusions may be very persuasive, but their limitations may necessitate cautious interpretation. In this editorial, we will briefly describe the statistical technique of propensity scoring, identifying when propensity scoring methods should be used and practical considerations in the development and the application of propensity scoring methods. We will conclude with some comments on the strengths and limitations of propensity scoring methods.
What is propensity scoring?
An inherent limitation of observational studies involving non-random allocation of treatment is a form of selection bias known as ‘confounding by indication’, that is, when underlying patient characteristics associated with a certain treatment (treatment propensity) may influence the outcome under study.1 For example, a thrombocytopenic patient may not be offered neuraxial anaesthesia, or nitrous oxide may be withheld in a patient undergoing prolonged surgery.2 In these examples, the patient characteristics may not pose an absolute contraindication to the intervention, but clinicians may be more cautious in applying it because of the associated risks. Epidemiological studies investigating treatment effects often fail to account for treatment propensity. One of the approaches proposed to deal with confounding by indication is the use of propensity scores. In perioperative research, propensity scoring is being increasingly adopted, shedding light on the safety of nitrous oxide,3 the efficacy and safety of perioperative β-blockade4 or α2 agonist therapy,5 and the effect of epidurals on cancer recurrence6 or perioperative mortality,7 as examples.
Propensity scores are proposed as a statistical method of addressing the bias from confounding by indication. In simple terms, a propensity score comes from a prediction model that estimates the likelihood of treatment based on a specified set of patient characteristics. The aim of propensity scoring is to balance two non-equivalent groups on observed characteristics so that less biased estimates of treatment effects can be obtained. This balancing can be achieved by matching study subjects in comparison groups on propensity scores, by weighting for propensity scores, or by adjusting for propensity scores in the analysis. When it succeeds, the distribution of the observed characteristics is similar for treated and untreated groups of patients.
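In symbols, the description above can be written as follows. The notation is our own assumption for illustration and does not appear elsewhere in this editorial: T is treatment status (1 = treated, 0 = untreated), X is the vector of measured patient characteristics, and e(x) is the propensity score.

```latex
e(x) = \Pr(T = 1 \mid X = x), \qquad 0 < e(x) < 1

T \perp X \mid e(X)
```

The second line states the balancing property: among patients with the same propensity score, treatment status is independent of the measured characteristics X. It says nothing about characteristics that were not included in X.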
In this sense, propensity scoring attempts to simulate randomization of subjects as occurs in randomized controlled trials. Random assignment of subjects to treatment or control groups leads to a similar distribution of covariates in the groups. However, unlike randomization to treatment groups, the balancing achieved by propensity scoring is only on identified confounders rather than all possible confounders. The propensity scoring method is a two-part process, involving, first, the development of the score and, secondly, the use of the score to adjust for baseline differences between treatment groups.
Developing a propensity score
Propensity score development starts with identification of the factors that could influence treatment, including patient characteristics, disease severity, and characteristics of the treatment system.1 However, there are different recommendations regarding the selection of factors for calculating propensity scores.
One approach is to include all potential factors that could influence treatment (non-parsimonious approach).8 Another approach is to start by listing all factors associated with the treatment and then use an automatic selection procedure such as stepwise regression to create a more parsimonious model.9 A third approach recommends selecting variables for the propensity score calculation based not only on their association with the treatment (exposure) but also with the outcome of interest. This would then satisfy the widely accepted definition of ‘confounding by indication’, that is, ‘a type of confounding bias that occurs when a symptom or sign of disease is judged as an indication (or a contraindication) for a given therapy and is therefore associated both with the drug or medical procedure (or its avoidance) and with a higher probability of an outcome’.10
If important factors are not identified, or are omitted, the effectiveness of the propensity scoring method will be diminished.1 Therefore, inclusion of all clinically relevant factors is important in the development of propensity scores, both those indicating or contra-indicating treatment. Hence, clinical input aids in the identification of important factors to include in the propensity score.
Essentially, propensity scores are calculated by entering the factors (covariates), selected using one of the approaches outlined above, into a propensity derivation (regression) model in which treatment status is the dependent variable. A propensity score ranging from 0 to 1 (because the score is a probability) is derived for each subject in the study. The derived score is then the estimated likelihood of being in one treatment group vs the other. It is important to note that many researchers do not report how the factors included in developing the propensity scores for their studies were identified and selected. This simple, yet vital, piece of information allows the reader to assess whether the right factors were identified and goes a long way towards establishing confidence in the quality of a study.
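As a minimal sketch of this derivation step, the Python below fits a logistic regression of treatment status on two covariates and returns one score per subject. Everything in it is an assumption made for illustration: the simulated data, the covariate names (`age`, `low_platelets`), the function names, and the choice of plain gradient ascent. In practice a statistical package would be used, and a logistic model is only one common choice of derivation model.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function; output lies strictly between 0 and 1
    # for the moderate values of z that arise here.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_propensity_model(covariates, treated, lr=0.5, epochs=500):
    """Fit a logistic regression of treatment status on covariates.

    Uses batch gradient ascent on the log-likelihood.
    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(covariates), len(covariates[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, t in zip(covariates, treated):
            p = sigmoid(beta[0] + sum(b * xi for b, xi in zip(beta[1:], x)))
            err = t - p
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(beta, covariates):
    """Estimated probability of treatment for each subject."""
    return [
        sigmoid(beta[0] + sum(b * xi for b, xi in zip(beta[1:], x)))
        for x in covariates
    ]


# Hypothetical simulated cohort (not real data): older and thrombocytopenic
# patients are made less likely to receive the treatment.
random.seed(0)
covariates, treated = [], []
for _ in range(500):
    age = random.gauss(0.0, 1.0)  # standardized age
    low_platelets = 1.0 if random.random() < 0.2 else 0.0
    p_treat = sigmoid(0.5 - 0.8 * age - 1.5 * low_platelets)
    covariates.append([age, low_platelets])
    treated.append(1 if random.random() < p_treat else 0)

beta = fit_propensity_model(covariates, treated)
scores = propensity_scores(beta, covariates)
```

Each entry of `scores` is the estimated likelihood that the corresponding subject received treatment, given only the covariates entered into the model.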
Evaluation of the developed propensity score model is important. Goodness of fit is a measure of how well a propensity score model describes the data. This is important because if a model does not accurately describe the data, then the estimated propensity scores generated from the model may be inaccurate.11 This may result in imbalance between the treatment groups and hence in biased estimates of treatment effects.
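One simple way to look at fit, offered here as an illustrative sketch and not as the method any cited study used, is to compare predicted and observed treatment rates within groups of subjects ordered by score. The function name `calibration_table`, the number of groups, and the simulated scores are all assumptions.

```python
import random


def calibration_table(scores, treated, n_groups=5):
    """Compare predicted and observed treatment rates within score-ordered groups.

    Returns one (mean_predicted, observed_proportion, group_size) tuple per group,
    from the lowest-scoring group to the highest.
    """
    pairs = sorted(zip(scores, treated))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        mean_pred = sum(s for s, _ in chunk) / len(chunk)
        observed = sum(t for _, t in chunk) / len(chunk)
        rows.append((mean_pred, observed, len(chunk)))
    return rows


# Hypothetical scores from a model that is well calibrated by construction:
# each subject is treated with probability equal to their score.
random.seed(1)
scores = [random.uniform(0.05, 0.95) for _ in range(1000)]
treated = [1 if random.random() < s else 0 for s in scores]

rows = calibration_table(scores, treated)
for mean_pred, observed, size in rows:
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={size}")
```

Large gaps between the predicted and observed columns in any group would suggest that the model describes the data poorly in that range of scores.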
Using propensity scores
The essence of developing propensity scores is to use the scores to adjust for baseline differences between treatment groups. After the development and assessment of propensity scores, the next step is to create a balance between the treatment groups under study. Propensity scores can be applied to adjust for baseline differences between treatment groups by various methods:
matching exposed (treated) and unexposed (untreated) individuals who have equal or similar propensity scores;
stratification of individuals based on their propensity scores;
estimating within strata treatment effects by including propensity scores as a covariate in the model or adjusting for the propensity score;
utilizing propensity scoring as a method for weighting observations.1,11
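The first two methods in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the greedy 1:1 nearest-neighbour rule, the caliper of 0.05, the function names, and the eight made-up scores are ours, and other matching and stratification schemes exist.

```python
def greedy_match(scores, treated, caliper=0.05):
    """1:1 nearest-neighbour matching on the propensity score, without replacement.

    Each treated subject is paired with the closest still-unmatched untreated
    subject, provided their scores differ by no more than `caliper`.
    Returns a list of (treated_index, untreated_index) pairs.
    """
    available = [i for i, t in enumerate(treated) if t == 0]
    pairs = []
    for i, t in enumerate(treated):
        if t != 1 or not available:
            continue
        j = min(available, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


def stratify(scores, n_strata=5):
    """Assign each subject to a stratum by rank of propensity score (0 = lowest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata


# Hypothetical scores for three treated and five untreated subjects.
scores = [0.81, 0.78, 0.35, 0.30, 0.80, 0.33, 0.10, 0.55]
treated = [1, 1, 1, 0, 0, 0, 0, 0]

pairs = greedy_match(scores, treated)
strata = stratify(scores, n_strata=2)
print(pairs)
print(strata)
```

In this toy example the treated subject at index 1 finds no untreated subject within the caliper and is left unmatched, whereas stratification keeps all eight subjects.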
Each of the above approaches has its own advantages and disadvantages. Matching on propensity scores results in well-balanced but smaller groups for comparison, while stratification retains a larger sample size but the exposed groups are more heterogeneous within each stratum. Stratifying cases and controls based on propensity score, or including the propensity score as an independent variable in a multivariable model, allows inclusion of cases and controls that would have been lost due to the lack of close matches; the compromise is increased residual bias. Using propensity scores to weight observations yields results similar to those obtained by stratifying on propensity scores or by using propensity scores as an independent variable in the multivariable model. However, when propensity scores are near the extremes, that is, zero or one, unrealistically extreme weights may be generated, thus creating bias.
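To illustrate how extreme weights arise, the sketch below assumes one common weighting scheme, inverse probability of treatment weighting, in which a treated subject is weighted by 1/score and an untreated subject by 1/(1 - score). The editorial does not specify a weighting scheme, so this choice, the function name, and the four example scores are assumptions.

```python
def ipt_weights(scores, treated):
    """Inverse probability of treatment weights.

    Treated subjects get 1 / score; untreated subjects get 1 / (1 - score).
    """
    weights = []
    for e, t in zip(scores, treated):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity score must be strictly between 0 and 1")
        weights.append(1.0 / e if t == 1 else 1.0 / (1.0 - e))
    return weights


# Hypothetical scores: two subjects with mid-range scores and two subjects whose
# treatment status is very unlikely given their score.
scores = [0.50, 0.50, 0.02, 0.98]
treated = [1, 0, 1, 0]

weights = ipt_weights(scores, treated)
print(weights)
```

The two mid-range subjects each receive a weight of 2, whereas the treated subject with a score of 0.02 and the untreated subject with a score of 0.98 each receive a weight of about 50, so a handful of such subjects can dominate the weighted analysis.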
Propensity score vs adjustment for confounders in a multivariable model
Both propensity score and conventional multivariable regression methods have a similar inability to control confounding arising as a result of unmeasured factors. Typically, both approaches also yield associations between exposures and outcomes of similar strength and statistical significance, perhaps with slightly diminished association estimates obtained with propensity scoring methods.12
We emphasize that propensity score methods, like the conventional multivariable regression method, cannot control for unknown confounders or for any covariates that are either not measured or erroneously measured. However, the sensitivity of the propensity score model to unknown confounders can be estimated, thus enabling consideration of the effect of unknown confounders on a study result. Nonetheless, if important covariates are omitted, the propensity score method is known to yield bias in its estimates comparable with that of estimates based solely on a regression model. Critically, the propensity score method demands thorough clinical understanding and knowledge of the covariates that need to be included in a model; failure to apply this could result in an inappropriate model being developed.
Conclusions
Using propensity scores could account for confounding by indication and provide a more statistically efficient solution than conventional multivariable regression techniques that adjust for a number of confounders. However, the development of appropriate propensity scores is dependent on the appropriateness of the covariate selection process and is unlikely to take into account all possible confounders. The use of this statistical technique is increasing in perioperative research of observational data. While these techniques in no way substitute for randomized controlled trials, they are suited for perioperative research where important clinical outcomes, such as stroke and death, are rare but devastating events. While observational data can never prove cause and effect, we must use the tools at our disposal to cautiously advance perioperative care.
Authors' contributions
G.O. wrote the first draft of the manuscript with guidance from P.M. P.M. and R.D.S. provided editorial input and guidance. All authors approved the final paper. R.D.S. submitted the editorial.
Declaration of interest
R.D.S. has acted as a consultant for Air Liquide, Paris, France, concerning the development of medical gases and has received speaker fees from Orion, Turku, Finland, and Hospira, Chicago, USA. P.M. has received an unrestricted educational grant from F. Hoffman La-Roche for research in the area of pandemic influenza and has received travel grants and speaker fees from F. Hoffman La-Roche and its subsidiaries.
Funding
R.D.S. and P.M. are supported by Anaesthesia/AAGBI. R.D.S. is also supported by Action Medical Research and NIH (USA). University College Hospital/University College London receives a proportion of funding from the UK Department of Health's National Institute for Health Research Biomedical Research Centres funding scheme.
References
1. Katz MH. Evaluating Clinical and Public Health Interventions: A Practical Guide to Study Design and Statistics. London: Cambridge University Press; 2010.
2. Sanders RD, Weimann J, Maze M. Biologic effects of nitrous oxide: a mechanistic and toxicologic review. Anesthesiology. 2008;109:707–22. doi: 10.1097/ALN.0b013e3181870a17.
3. Leslie K, Myles P, Devereaux PJ, et al. Nitrous oxide and serious morbidity and mortality in the POISE trial. Anesth Analg. 2013;116:1034–40. doi: 10.1213/ANE.0b013e318270014a.
4. London MJ, Hur K, Schwartz GG, Henderson WG. Association of perioperative beta-blockade with mortality and cardiovascular morbidity following major noncardiac surgery. J Am Med Assoc. 2013;309:1704–13. doi: 10.1001/jama.2013.4135.
5. Ji F, Li Z, Nguyen H, et al. Perioperative dexmedetomidine improves outcomes of cardiac surgery. Circulation. 2013;127:1576–84. doi: 10.1161/CIRCULATIONAHA.112.000936.
6. Gottschalk A, Ford JG, Regelin CC, et al. Association between epidural analgesia and cancer recurrence after colorectal cancer surgery. Anesthesiology. 2010;113:27–34. doi: 10.1097/ALN.0b013e3181de6d0d.
7. Leslie K, Myles P, Devereaux P, et al. Neuraxial block, death and serious cardiovascular morbidity in the POISE trial. Br J Anaesth. 2013;111:382–90. doi: 10.1093/bja/aet120.
8. Glynn RJ, Schneeweiss S, Sturmer T. Indications for propensity scores and review of their use in pharmacoepidemiology. Basic Clin Pharmacol Toxicol. 2006;98:253–9. doi: 10.1111/j.1742-7843.2006.pto_293.x.
9. Patrick AR, Schneeweiss S, Brookhart MA, et al. The implications of propensity score variable selection strategies in pharmacoepidemiology: an empirical illustration. Pharmacoepidemiol Drug Saf. 2011;20:551–9. doi: 10.1002/pds.2098.
10. Porta M. A Dictionary of Epidemiology. New York: Oxford University Press; 2006.
11. Weitzen S, Lapane KL, Toledano AY, Hume AL, Mor V. Principles for modeling propensity scores in medical research: a systematic literature review. Pharmacoepidemiol Drug Saf. 2004;13:841–53. doi: 10.1002/pds.969.
12. Shah BR, Laupacis A, Hux JE, Austin PC. Propensity score methods gave similar results to traditional regression modeling in observational studies: a systematic review. J Clin Epidemiol. 2005;58:550–9. doi: 10.1016/j.jclinepi.2004.10.016.
