2026 Jan 22;32(1):e70350. doi: 10.1111/jep.70350

Variation in Variation Measurement: A Mapping Review of Methods to Study Clinical Variation

Amity E Quinn 1,2,3, Jason E Black 1,3, Derek Chew 1,4,5, Tyler S Williamson 1,3,4, Dean Yergens 6, Peter Faris 1,3, Flora Au 7, Jane M Fletcher 1,3, Simarprit Sidhu 5, Rachelle Drummond 8, Becky Skidmore 9, Braden Manns 1,3,4,7
PMCID: PMC12826410  PMID: 41569922

ABSTRACT

Rationale

Variation in health care delivery exists at many levels (e.g., provider, practice, system) and can often be explained by various factors at each level. Understanding clinical variation presents an opportunity to improve the value of health care by identifying low‐value care (overuse), gaps in high‐value care (underuse), and how they can be improved. Numerous methods exist to describe or quantify clinical variation; however, these are not well identified or applied consistently.

Aim

A mapping review was used to identify and characterize available methods to describe and quantify clinical variation.

Method

We systematically searched health care and health services‐related literature for variation and related terms used in titles and abstracts. Titles and abstracts were screened for inclusion. We then identified graphical and statistical methods used, health care specialty, study setting, and health system performance area (e.g., quality, access, costs) using a keyword analysis.

Results

Of the 16,969 papers screened, we excluded 10,866 that did not measure a care process or health outcome, measure variation at the person-level or higher, or analyze routinely collected data. We included 6,103 full-text studies, which were analyzed using a keyword analysis. Most studies used basic methodological approaches (e.g., regression, crude comparisons, ranges). Fewer than 1,000 studies used multilevel models, a more advanced methodological approach that quantifies the magnitude and source of variation. Multilevel models were not commonly used to study variation in health care quality.

Conclusions

While understanding clinical variation is important for all health systems, the methods used are usually able to identify but not quantify or explain variation. This review advances our knowledge of the scope and application of these methods and can be used to improve the measurement of variation to increase the value and equity of health care.

Keywords: clinical variation, funnel plots, health system performance, mapping review, medical practice variation, methods, multilevel models

1. Introduction

Studying variation in health care delivery—including differences in utilization, practice, quality, costs, and outcomes of health services across populations and geographic regions—has been an important area of inquiry for health systems for over 40 years [1]. Globally, many health systems have developed atlases of variation to identify where there is substantial variation in services that could be addressed to improve quality and value [2, 3, 4]. These atlases focus on geographic variation; however, variation can be due to numerous factors, including differences in patient populations, provider practices, and health care system structures. While some differences in practice may be random, unwarranted variation that is unrelated to patient needs and preferences points to areas where policy or practice changes may reduce variation and, in turn, improve quality or reduce costs.

There are many statistical and graphical methods that can be used to study clinical variation [5]. These range from basic descriptive statistics and visualizations to more sophisticated techniques such as regression modeling, multilevel analysis, and geospatial analysis. The choice of method depends on the specific research question, characteristics of the data, and the desired measure of variation. Techniques such as scatterplots and funnel plots allow researchers to visualize differences in care and identify outliers or trends that might warrant further investigation [6]. These visual tools are particularly valuable in communicating complex information to policymakers and stakeholders who may not be familiar with advanced statistical methods.
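To make the funnel-plot idea concrete, the sketch below computes approximate 95% control limits for a proportion indicator at a given sample size, in the spirit of Spiegelhalter's funnel plots [6]. The hospital names, rates, and target are hypothetical, and the normal-approximation limits are a simplification (exact binomial limits are often preferred for small denominators).

```python
import math

def funnel_limits(target: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate control limits for a proportion indicator at sample size n.

    Uses the normal approximation target +/- z * sqrt(target * (1 - target) / n);
    limits are clipped to the valid range [0, 1].
    """
    se = math.sqrt(target * (1 - target) / n)
    return max(0.0, target - z * se), min(1.0, target + z * se)

# Hypothetical example: flag hospitals whose rate falls outside the 95% funnel.
target_rate = 0.12  # pooled event rate across all hospitals
hospitals = {"A": (0.10, 150), "B": (0.22, 400), "C": (0.13, 2000)}

for name, (rate, n) in hospitals.items():
    lo, hi = funnel_limits(target_rate, n)
    status = "in control" if lo <= rate <= hi else "outlier"
    print(f"Hospital {name}: rate={rate:.2f}, limits=({lo:.3f}, {hi:.3f}) -> {status}")
```

Because the limits narrow as the denominator grows, the same absolute deviation from the target can be unremarkable for a small hospital but flagged for a large one.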

Statistical methods play a crucial role in the study of clinical variation by directly quantifying variation. For instance, the coefficient of variation (i.e., a measure of dispersion around a mean) provides a value describing the amount of variation in a health outcome [7]. Alternatively, advanced statistical techniques, such as multilevel models [8], can describe the level(s) where variation occurs and identify factors that might reasonably explain some variation.
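As a minimal illustration, the coefficient of variation can be computed directly from a set of observed rates. The per-region rates below are hypothetical and serve only to show the calculation.

```python
from statistics import mean, stdev

def coefficient_of_variation(values: list[float]) -> float:
    """Dispersion relative to the mean: CV = sample standard deviation / mean.

    Unitless, so it can be compared across outcomes measured on different scales.
    """
    return stdev(values) / mean(values)

# Hypothetical per-region rates of a procedure (per 1,000 population).
region_rates = [4.2, 5.1, 3.8, 6.0, 4.9]
print(f"CV = {coefficient_of_variation(region_rates):.3f}")  # CV = 0.177
```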

Despite calls to measure variation in health care, the methods currently being applied for this purpose are poorly understood. In this mapping review [9], we aim to understand the range of methods used to study clinical variation and assess potential gaps, including not using methods that enable the measurement of source and magnitude. We build on this work in our subsequent article, where we outline and compare important statistical and graphical methods to measure clinical variation [5].

2. Methods

We conducted a mapping review to identify the methods used to study clinical variation and to identify the contexts in which these methods have been applied. Unlike scoping reviews, which aim to clarify key concepts or identify knowledge gaps, mapping reviews focus on providing a comprehensive overview of the distribution and types of studies within a specific research area [9]. To achieve this, a mapping review describes the number of studies that meet specific criteria, such as addressing a specific topic (e.g., methods to measure clinical variation). Mapping reviews do not critically appraise the included studies, as this is out of scope and often not feasible with the number of studies included.

2.1. Search Strategy

A search strategy was developed by an information specialist (B.S.) using an iterative process in consultation with the review team. The MEDLINE strategy was peer reviewed prior to execution using the PRESS checklist [10]. Using the Ovid platform, we searched Embase and Ovid MEDLINE® (Epub Ahead of Print, In‐Process, In‐Data‐Review & Other Non‐Indexed Citations and Daily version) on July 27, 2022. We used a combination of MeSH (e.g., “Benchmarking”, “Time‐to‐Treatment”, “Unnecessary Procedures”) and a variety of free‐text vocabulary derived from the root words “variation”, “disparity”, and “discrepancy” combined with relevant concepts (e.g., practice, hospital, region) using proximity operators. There were no language restrictions, but due to the very high volume of records retrieved, we removed animal studies, conference abstracts, case reports, general reviews, opinion pieces, and other unwanted publication types. Results were limited to records with abstracts and to a publication date of 2010 or later. We also targeted and removed records pertaining to genetics, biology, ecology, and plant physiology. We downloaded and deduplicated records using EndNote version 9.3.3 (Clarivate Analytics) and uploaded them to Covidence (Veritas Health Innovation Ltd). Specific details regarding the strategies appear in the Appendix.

2.2. Study Selection

The screening team (J.B., A.Q., J.F., S.S., and F.A.) conducted primary screening of titles and abstracts. Based on preliminary searches, a very large number of titles and abstracts was anticipated, such that screening by two reviewers was not feasible. Instead, each title and abstract was screened by one reviewer. Studies were included if they measured a care process or health outcome, measured variation at the person‐level or higher, and analyzed routinely collected data.

2.3. Keyword Analysis

We used the review management software Synthesis [11] to map abstracts according to the following domains: health care specialty (e.g., gynecology, nephrology), study setting (e.g., regional‐, hospital‐ or provider‐level variation), and health system performance area (e.g., quality, access, costs). We first identified keywords within the title and/or abstract for each domain using an iterative process with the study team, wherein keywords were tested and new keywords were generated based on unclassified abstracts. In some cases, keywords were combined using logical operators (i.e., AND, OR, and NOT). We then classified abstracts identified in the first stage using the keyword algorithms; abstracts meeting the criteria for multiple keywords could receive multiple classifications.
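A minimal sketch of this style of keyword classification is shown below. The labels and rules are hypothetical, not the review's actual algorithms; each rule combines terms with AND/OR/NOT logic, and an abstract can receive multiple classifications.

```python
# Hypothetical keyword rules, one per method label. Each rule is a boolean
# combination of terms (AND/OR/NOT), applied to the lower-cased abstract text.
RULES = {
    "multilevel model": lambda t: ("multilevel" in t
                                   or "mixed-effects" in t
                                   or "hierarchical model" in t),
    "funnel plot": lambda t: "funnel plot" in t,
    # NOT clause shown for illustration only: exclude funnel-plot papers.
    "regression": lambda t: "regression" in t and "funnel" not in t,
}

def classify(abstract: str) -> set[str]:
    """Return every method label whose rule matches the abstract.

    Matching is case-insensitive; an abstract may match several rules.
    """
    text = abstract.lower()
    return {label for label, rule in RULES.items() if rule(text)}

print(sorted(classify("We fit a multilevel logistic regression of hospital-level variation.")))
# ['multilevel model', 'regression']
```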

2.4. Assessing Performance of Keyword Analysis

We selected a 10% random sample of the abstracts selected in the first stage. One person on the study team who did not develop the keywords (R.D.) read each title and abstract and manually labelled them using the domains to create a reference set. We compared the automated keyword algorithm classifications for variation methods to the manually labelled reference set to estimate the sensitivity, specificity, positive predictive value (PPV) and negative predictive value of the keyword algorithm.
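The four performance metrics follow directly from a 2×2 confusion table comparing the keyword algorithm's classifications to the manual reference labels. The counts below are illustrative only, not the review's data.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from a 2x2 confusion table
    (automated keyword classification vs. the manually labelled reference set)."""
    return {
        "sensitivity": tp / (tp + fn),  # true uses of the method that were flagged
        "specificity": tn / (tn + fp),  # true non-uses that were not flagged
        "ppv": tp / (tp + fp),          # flagged abstracts that truly used it
        "npv": tn / (tn + fn),          # unflagged abstracts that truly did not
    }

# Illustrative counts: the rule flags 15 abstracts (9 correctly) and
# misses 1 true use of the method among 100 abstracts.
m = classification_metrics(tp=9, fp=6, fn=1, tn=84)
print({k: round(v, 2) for k, v in m.items()})
```

This pattern (high sensitivity and NPV, weaker PPV) mirrors how a broad keyword rule behaves when the method it targets is rare.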

2.5. Data Extraction

The following data were extracted from the included abstracts based on the keyword classifications: year published, health care specialty, study setting and health system performance area. The health system performance area aimed to capture the study content area, including context, topic area, or outcomes.

2.6. Mapping and Synthesizing the Results

Frequencies, bar charts and bubble plots were used to provide a visual and descriptive overview of the number of methods used to study clinical variation.

3. Results

3.1. Identification of Studies

Our initial search identified 16,969 abstracts. During primary screening, we excluded 10,866 abstracts (Figure 1). We assessed the remaining 6,103 titles and/or abstracts using a validated keyword algorithm (see Supporting Information S1: Appendix 2 for the full list of keyword classifications). The number of studies per year increased over time, from fewer than 200 studies in 2010 to over 800 in 2021 (Figure 2).

Figure 1. Identification of included studies.

Figure 2. Number of studies identified that examined clinical variation, by year.

3.2. Keyword Performance Assessment

All keywords were reviewed in detail by J.B. and A.Q. and demonstrated robust face validity. Compared to manual labelling performed independently by R.D., our keyword analysis performed well in identifying most methods (Table 1): Bayesian, funnel plots, and percentile comparisons had near‐perfect identification. Some methods appeared too infrequently to be captured in the random subset, so assessment metrics could not be computed for them. Sensitivities were strong across all methods except range (sensitivity: 0.56) and crude comparisons (i.e., unadjusted comparisons between groups or variables) (sensitivity: 0.36). Specificity was strong across all methods. PPV was weaker in some cases; for example, clustering and temporal analysis had poor PPV (0.20 and 0.31, respectively). Negative predictive value was strong across all methods.

Table 1.

Validation results comparing manual labelling to the automated keyword analysis.

Method	Sensitivity	Specificity	PPV	NPV
Bayesian	1.00	0.99	0.61	1.00
Clustering	0.75	0.98	0.20	1.00
Coefficient of variation	‐	‐	‐	‐
Correlation	0.62	0.96	0.47	0.98
Crude comparisons	0.36	0.72	0.79	0.27
Funnel plot	1.00	1.00	1.00	1.00
Instrumental variable	‐	‐	‐	‐
Machine learning	1.00	1.00	0.50	1.00
Marginal structural model	‐	‐	‐	‐
Multilevel model	0.90	0.93	0.59	0.99
Percentile comparisons	0.91	0.96	0.59	0.99
Range	0.56	0.81	0.37	0.90
Regression	0.85	0.78	0.85	0.78
Spatial analysis	0.63	0.98	0.61	0.98
Systematic component of variation	‐	‐	‐	‐
Temporal analysis	0.73	0.86	0.31	0.97

Abbreviations: NPV, negative predictive value; PPV, positive predictive value.

‐: could not compute.

3.3. Study Designs

Regression was the most frequently used method to study clinical variation (Figure 3). The majority of studies identified (n = 3541) applied regression analysis. However, many studies used descriptive statistics, such as crude comparisons, ranges, or percentile comparisons, without using regression (n = 1268). Fewer than 1000 studies (n = 879) specified that multilevel modeling was used for statistical analysis.

Figure 3. Statistical and graphical methods used to study variation.

Over a third of studies assessing clinical variation focused on utilization of care (n = 2364) (Supporting Information S1: Figure 3.1). Cost and financing was the second most frequently studied health system performance area (n = 1677). Approximately a quarter of studies addressed variation in health outcomes (n = 1295), including mortality and other health outcomes such as patient‐reported outcomes or well‐being. Studying variation in access to care was less common (n = 101).

A total of 2363 abstracts examined regional variation, while 1623 addressed hospital‐level variation and 966 addressed provider‐level variation. Studies addressed a wide range of topics and clinical specialties (Supporting Information S1: 3.2). Studies of public health and preventive care, oncology, drugs (including medications and vaccines), cardiology, and surgery were the most common.

3.4. Analytic Approach by Health System Performance Area

Figure 4 visualizes analytic approaches for the five most frequent health system performance areas. Regression analysis was frequently used across all of these areas, particularly health care utilization and cost. Multilevel modeling was most commonly used to measure health service use and cost and financing, and less commonly to measure quality and mortality.

Figure 4. Number of studies using each analytic approach for the five most frequent health system performance areas.

4. Discussion

This study presents an overview of the methods used to study variation in health care in recent years. Despite decades of research on clinical variation, this is the first study to our knowledge to assess the methods most often used in this area of research. Descriptive analyses were most common, primarily crude comparisons (34.3%) and range (22.2%). Regression analysis was frequently used (58.0%), but more advanced statistical analyses, such as multilevel models (14.0%), which more accurately indicate the sources of variation, were not. Studies most commonly assessed variation in health service use (38.7%) and cost and financing (27.5%), while studies of variation in quality of care were less frequent (17.5%). Many studies addressed public health or medications (largely vaccines) (20.5%). Regardless of the methods used, the number of variation studies increased in each year we observed, suggesting that the field is not saturated and that more studies have likely been published since our search.

The majority of research on health and clinical variation identifies variation but does not measure the source or magnitude of variation [12]. Atlases of variation tend to use descriptive analyses, such as counts, medians, ranges, or percentile ranks. The UK Atlas of Variation, for example, includes maps, charts, and box‐and‐whisker plots as well as text that describes reasons for the variation and options for how to address it [13]. Descriptive analyses provide essential information about areas to target for quality improvement. However, they fail to measure the source and magnitude of variation, which has meaningful implications for how we interpret and respond to observed variation [14]. For example, if we identify variation in a patient outcome measure and assume patient factors are driving the variation, we are likely to create a patient‐level intervention. However, that variation might be driven by practice‐level factors (e.g., structural factors such as availability of equipment, technology or resources) that would be more appropriately addressed by a practice‐level intervention. With the source and magnitude of variation disentangled, we can design more effective strategies to improve health care value, quality, and equity.

Measuring the source and magnitude of variation is possible using multilevel models. Multilevel models enable measurement of (1) the proportion of variation at each level of analysis, (2) which variables are explaining variation and how much variation they are explaining, and (3) the impact of the variation on outcomes compared to other variables in models [8, 15, 16]. Applying these methods to routinely collected health care data has illustrated the impact of physician and practice site variation on cardiac imaging, an area of care impacted by financial incentives and availability of resources [17]. In contrast, the decision to start chronic dialysis—a highly nuanced decision—is driven by patient and physician factors and a shared decision‐making process; it is generally accepted that this is not driven by regional or other factors [18].
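A small sketch of point (1), assuming hypothetical variance components from a fitted two-level (patients within practices) model: the variance partition coefficient, or intraclass correlation, gives the share of total outcome variation attributable to the higher level. The latent-scale version for multilevel logistic models, which fixes the residual variance at π²/3, follows Austin and Merlo [8].

```python
import math

def variance_partition(var_between: float, var_within: float) -> float:
    """Intraclass correlation (variance partition coefficient) for a linear
    two-level model: the share of total variation at the higher level."""
    return var_between / (var_between + var_within)

def icc_logistic(var_between: float) -> float:
    """Latent-scale ICC for a multilevel logistic model, where the level-1
    residual variance is fixed at pi^2 / 3 (Austin & Merlo)."""
    return var_between / (var_between + math.pi ** 2 / 3)

# Hypothetical variance components from a fitted patients-within-practices model.
icc = variance_partition(var_between=0.8, var_within=3.2)
print(f"{icc:.0%} of outcome variation lies between practices")  # 20% ...
```

An ICC near zero would suggest a patient-level intervention; a sizeable ICC points to practice-level drivers, which is exactly the distinction the descriptive methods above cannot make.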

There are limitations of our study, mostly related to the large number of studies identified by our search strategy. It was not feasible to perform duplicate primary screening or full‐text screening to impose the rigour generally associated with systematic and other similar review types. Instead, each screener received training, including detailed instructions for abstract evaluation, to ensure that abstracts were considered consistently across screeners. Nonetheless, some abstracts may have been mistakenly included or excluded. We relied on a keyword search we developed to characterize the studies included in our analysis, which may under‐ or over‐estimate the use of variation methods among our abstracts. However, our manual assessment of the keyword search indicated that the keywords accurately identified most methods, particularly specific, advanced methods such as multilevel models and funnel plots, as demonstrated by our performance assessment metrics. We observed discrepancies in the application of more basic and more advanced methods across important health care research topics that are larger than would reasonably be expected based on our validation metrics; indeed, regression was used far more commonly than multilevel models, which is unlikely to be explained by differences in measurement. Despite the strong face validity of the keywords used to identify other characteristics of included research studies, we may have misclassified some studies when describing where and how these methods are applied (i.e., specialty and health system performance area). However, our results provide the first information on how methods to measure clinical variation have been used in the medical literature. Based on the increasing number of clinical variation studies published each year, we expect many studies were published after our search was completed in July 2022. However, we are not aware of any methodological guidance or related influences that would substantially change the methods used to measure clinical variation. Thus, we feel that our results provide a robust description of methods to measure clinical variation, though it may not fully reflect more recent trends.

This mapping review characterizes the methods used to assess clinical variation. Many studies used simple regression or other analytic techniques that do not account for the hierarchical structure of health care data. Indeed, this review identifies potential gaps in variation research, specifically the limited use of advanced statistical methods to understand where variation is occurring (e.g., at the provider level) and what factors might explain it. Using more advanced statistical methods, such as multilevel models, we could gain a more nuanced understanding of where variation is occurring and what is driving it, allowing health system interventions to be targeted where they will have the greatest impact.

Conflicts of Interest

Dean Yergens is a cofounder and codeveloper of Synthesis Research Inc, which owns the intellectual property for the Synthesis software application. The other authors declare no conflicts of interest.

Supporting information

SUPPLEMENT.

JEP-32-0-s001.docx (2.6MB, docx)

Acknowledgements

We thank Kaitryn Campbell, MLIS, MSc (St. Joseph's Healthcare Hamilton/McMaster University) for peer review of the MEDLINE search strategy.

Quinn A. E., Black J. E., Chew D., et al., “Variation in Variation Measurement: A Mapping Review of Methods to Study Clinical Variation,” Journal of Evaluation in Clinical Practice 32 (2025): 1‐7. 10.1111/jep.70350.

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

References

  • 1. Wennberg J. E., Tracking Medicine: A Researcher's Quest to Understand Health Care (Oxford University Press, 2010). [Google Scholar]
  • 2. The Dartmouth Atlas of Health Care Series. The Dartmouth Institute for Health Policy and Clinical Practice. 1996.
  • 3. Atlas of Variation—Health Atlases. UK Department of Health and Social Care. 2024.
  • 4. Bernal‐Delgado E., García‐Armesto S., and Peiró S., “Atlas of Variations in Medical Practice in Spain: The Spanish National Health Service under Scrutiny,” Health Policy 114 (2014): 15–30. [DOI] [PubMed] [Google Scholar]
  • 5. Black J. E., Chew D. S., Williamson T. S., Manns B. J., and Quinn A. E., “Advancing Methods to Study Clinical Variation in Health Care,” unpublished, Journal of Evaluation in Clinical Practice (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Spiegelhalter D. J., “Funnel Plots for Comparing Institutional Performance,” Statistics in Medicine 24 (2005): 1185–1202. [DOI] [PubMed] [Google Scholar]
  • 7. Bedeian A. G. and Mossholder K. W., “On the Use of the Coefficient of Variation as a Measure of Diversity,” Organizational Research Methods 3 (2000): 285–297. [Google Scholar]
  • 8. Austin P. C. and Merlo J., “Intermediate and Advanced Topics in Multilevel Logistic Regression Analysis,” Statistics in Medicine 36 (2017): 3257–3277. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Campbell F., Tricco A. C., Munn Z., et al., “Mapping Reviews, Scoping Reviews, and Evidence and Gap Maps (EGMS): The Same But Different—The ‘Big Picture’ Review Family,” Systematic Reviews 12 (2023): 45. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. McGowan J., Sampson M., Salzwedel D. M., Cogo E., Foerster V., Lefebvre C., “PRESS Peer Review of Electronic Search Strategies: 2015 Guideline Statement,” Journal of Clinical Epidemiology 75 (2016): 40–46. [DOI] [PubMed] [Google Scholar]
  • 11. Mardinger C., Drover A., Hyndman E., et al., “A Novel Computerized Approach to Scoping Reviews Using Synthesis Software: The First 15 Years of The American College of Surgeons National Surgical Quality Improvement Program,” Canadian Journal of Surgery 66 (2023): E156–E161. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Atsma F., Elwyn G., and Westert G., “Understanding Unwarranted Variation in Clinical Practice: A Focus on Network Effects, Reflective Medicine and Learning Health Systems,” International Journal for Quality in Health Care 32 (2020): 271–274. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Reports—Getting It Right First Time—GIRFT. NHS England. 2021.
  • 14. Mercuri M. and Gafni A., “Examining the Role of the Physician as a Source of Variation: Are Physician‐Related Variations Necessarily Unwarranted?,” Journal of Evaluation in Clinical Practice 24 (2018): 145–151. [DOI] [PubMed] [Google Scholar]
  • 15. Merlo J., Yang M., Chaix B., Lynch J., and Råstam L., “A Brief Conceptual Tutorial on Multilevel Analysis in Social Epidemiology: Investigating Contextual Phenomena in Different Groups of People,” Journal of Epidemiology and Community Health 59 (2005): 729–736. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Leckie G., Browne W. J., Goldstein H., Merlo J., and Austin P. C., “Partitioning Variation in Multilevel Models for Count Data,” Psychological Methods 25 (2020): 787–801. [DOI] [PubMed] [Google Scholar]
  • 17. Quinn A. E., Chew D. S., Faris P., et al., “Physician Variation and the Impact of Payment Model in Cardiac Imaging,” Journal of the American Heart Association 12, no. 24 (2023): e029149. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Sood M. M., Manns B., Dart A., et al., “Variation in the Level of eGFR at Dialysis Initiation Across Dialysis Facilities and Geographic Regions,” Clinical Journal of the American Society of Nephrology 9 (2014): 1747–1756. [DOI] [PMC free article] [PubMed] [Google Scholar]


