Abstract
Objectives:
The study aims to investigate the landscape and trends in the use of Bayesian statistics in surgical papers published in high-impact journals over the past 2 decades, determine the characteristics of these papers, and assess the quality of Bayesian analysis reporting.
Background:
Observational studies and clinical trials have traditionally employed frequentist approaches. The Bayesian framework enables the incorporation of prior evidence and flexible modeling of uncertainty, and it returns a direct probabilistic summary of the estimates of interest that can provide valuable insight. However, its use in high-impact surgical research remains underexplored.
Methods:
Surgical articles from high-impact surgical and medical journals indexed in Web of Science and PubMed were retrieved for the period from January 2000 to August 2024. Data extraction covered bibliometrics and content details. The Reporting of Bayes Used in Clinical Studies scale (ROBUST) was used to assess Bayesian reporting quality.
Results:
A total of 120 articles were analyzed. The use of Bayesian statistics in surgical research has increased over time (compound annual growth rate: 12.3%). General surgery (N = 39, 32.5%) and cardiothoracic surgery (N = 20, 16.7%) were the most represented specialties. The most common study designs were retrospective cohort studies (N = 50, 41.7%), meta-analyses (N = 38, 31.7%), and randomized trials (N = 19, 15.8%). Regression-based methods were the most frequently used (N = 51, 42.5%). Among the 100 articles assessed for reporting quality, the average ROBUST score was 4.1 ± 1.6 out of 7, with 54.0% (N = 54) of studies specifying priors and 29.0% (N = 29) justifying them.
Conclusions:
Bayesian statistics is increasingly incorporated into surgical research, predominantly observational studies and meta-analyses. However, improvements in the quality and standardization of Bayesian reporting are needed to enhance transparency and reproducibility.
Keywords: surgery, Bayesian statistics, bibliometric analysis
INTRODUCTION
Bayesian and frequentist paradigms are fundamentally different in their interpretation of probability and approach to statistical inference. In the frequentist paradigm, probability is defined as the expected frequency of observing an event across hypothetical repeated trials.1 In this framework, parameters are treated as fixed and unknown, and inferences are made solely from the observed data without considering prior information.2 Statistical information in this approach depends on the investigator’s intentions, such as study design and predefined statistical methods (eg, P value < 0.05).1 The primary goal is to calculate the probability of observing the data given the null hypothesis, and the confidence interval is used to quantify uncertainty.1
In contrast, the Bayesian paradigm interprets probability as a degree of belief in a hypothesis, which can be updated as new evidence is gathered.2 This approach combines prior knowledge with current evidence to progressively accumulate information and reduce uncertainty.1 Statistical information in the Bayesian approach is derived solely from the study results, adhering to the likelihood principle.1 The primary goal is to calculate the probability of a hypothesis given the observed data and prior information.1 Instead of confidence intervals, Bayesian inference provides credible intervals; a 95% credible interval, for example, is a range that contains the parameter with 95% probability, given the data and the prior.3
Bayes’ theorem is a probabilistic formula that updates the probability of a hypothesis based on prior knowledge and the observed data, and it is the core of Bayesian reasoning. The basis of Bayes’ theorem was first published in 1763 by Reverend Thomas Bayes, focusing on inverse probability—estimating the likelihood of past events based on observed outcomes.4 The concept was independently described a few decades later by the renowned French mathematician Pierre-Simon Laplace.1 Because of Laplace’s significant contributions to its development and application, it is sometimes referred to as the Bayes–Laplace theorem.1
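In symbols, the theorem can be written as below. The notation is ours rather than the source's: θ denotes the hypothesis or parameter of interest and y the observed data, so that p(θ) is the prior, p(y | θ) the likelihood, and p(θ | y) the posterior.

```latex
\[
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(y)},
\qquad
p(y) \;=\; \int p(y \mid \theta)\, p(\theta)\, d\theta .
\]
```

The denominator p(y) does not depend on θ, so the posterior is proportional to the likelihood multiplied by the prior.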
Bayesian statistics, grounded in Bayes’ theorem, is a broader term referring to the statistical framework and paradigm that uses Bayesian probabilistic reasoning to quantify uncertainty about the unknown quantities of interest (eg, absolute risk difference between treatment groups).5
Bayesian statistical methods encompass practical applications of computation and estimation algorithms and modeling techniques for implementing Bayesian statistics.6 These methods include the application of prior knowledge through a prior distribution (which represents the probability distribution of parameters before observing the data) and observed data through a likelihood function (which represents the probability of observing data given values of the parameters) to update beliefs about parameters or hypotheses.7 In the statistical literature, Bayesian methods that permit a full and exact posterior inference utilizing the full data likelihood and priors are regarded as a full Bayesian approach. This contrasts with approaches that fall under approximate Bayesian inference, such as the integrated nested Laplace approximation, where numerical methods are used to approximate the full posterior distribution.8 Another commonly seen approximate Bayesian method is the Bayesian Information Criterion (BIC), which is often used for model comparison.9
The use of Bayesian statistics has become increasingly popular in medical research in the past few decades.7 Bayesian statistical estimation often requires complex integration over likelihoods and probability distributions that are analytically intractable, but the development of Markov Chain Monte Carlo algorithms such as Metropolis-Hastings and Gibbs sampling provides efficient numerical solutions.10 Bayesian methods, once limited by computational challenges, have become more accessible with the development of off-the-shelf, open-access Markov Chain Monte Carlo software such as JAGS, BUGS, and Stan.11 Intuitive graphical user interface (GUI) tools with minimal need for coding, like JASP and SPSS, have also led to increased adoption.2 Moreover, the availability of numerous textbooks12,13 and educational programs14,15 has further facilitated their growing adoption in recent years.2
Despite the increasing use of Bayesian analyses in medical research, their application to surgical research remains poorly understood. Thus, this study aims to evaluate the use of Bayesian statistics in surgical research published in top journals from January 2000 to August 2024.
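To illustrate how a Markov Chain Monte Carlo algorithm sidesteps the intractable integral, the following is a minimal sketch of a random-walk Metropolis sampler (the symmetric-proposal special case of Metropolis-Hastings). The target, an unnormalized Beta(8, 14) posterior for a proportion, the step size, and all function names are illustrative assumptions and are not taken from any included study.

```python
import math
import random


def log_beta_posterior(theta):
    """Unnormalized log density of a Beta(8, 14) posterior (hypothetical target)."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return 7.0 * math.log(theta) + 13.0 * math.log(1.0 - theta)


def metropolis_hastings(log_post, start, n_samples, step=0.2, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    current, current_lp = start, log_post(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


draws = metropolis_hastings(log_beta_posterior, start=0.5, n_samples=20000)
kept = draws[2000:]  # discard an initial burn-in period
posterior_mean_estimate = sum(kept) / len(kept)
```

Only ratios of the posterior density are needed, so the normalizing constant never has to be computed. Software such as JAGS, BUGS, and Stan automates this kind of sampling with more efficient algorithms.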
METHODS
The bibliometric review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.16 The review was prospectively registered on the Open Science Framework (https://doi.org/10.17605/OSF.IO/KQDNG).
Journal Selection
A total of 18 journals were used in the literature search. The top 4 medical and top 4 surgical journals, along with the top journal from each of the 10 surgical subspecialties, were chosen based on impact factors as derived from Clarivate Analytics Journal Citation Reports. The 10 surgical subspecialties are defined according to the Royal College of Physicians and Surgeons of Canada,17 and the leading journal with the highest impact factor was identified for each surgical subspecialty. The 4 highest-impact medical journals included were the Lancet, New England Journal of Medicine, British Medical Journal, and Journal of the American Medical Association (JAMA). The 4 surgical journals included were JAMA Surgery, International Journal of Surgery, British Journal of Surgery, and Annals of Surgery. The 10 surgical subspecialty journals, ranked by decreasing impact factors, were American Journal of Transplantation, American Journal of Obstetrics and Gynecology, Journal of Neurology, Neurosurgery, and Psychiatry, JAMA Otolaryngology-Head & Neck Surgery, Journal of Urology, European Journal of Vascular and Endovascular Surgery, Journal of Thoracic and Cardiovascular Surgery, Bone & Joints, Plastic and Reconstructive Surgery, and Journal of Pediatric Surgery.
Literature Search
A literature search was performed using PubMed and Web of Science core collection (2000–present) during August 2024. The search was restricted to articles published in 18 high-impact journals, including 4 leading general medical journals, 4 leading general surgical journals, and the top-ranked journals from 10 surgical subspecialties. Search strings combining journal names with terms related to Bayesian methods and surgery were used. A medical librarian was consulted to assist in the development and refinement of the literature search strategy. The full search strategy is available in Supplemental Table S1, https://links.lww.com/AOSO/A517. Titles and abstracts, followed by full texts, were reviewed by 2 independent reviewers (Z.L. and A.I.) against predefined inclusion and exclusion criteria (Supplemental Table S2, https://links.lww.com/AOSO/A517). Discrepancies were resolved by consensus and, if still unresolved, the opinion of a senior author (S.E.F.).
Data Extraction
Bibliometric data were extracted by downloading PubMed and Web of Science metadata and uploading them to the R package “Bibliometrix.”18 The included bibliometric data comprised title, authors, journal, year of publication, citation count, country, and affiliation of authors. Each article was assigned a single country affiliation based on the affiliated country of the corresponding author. In addition, the unique affiliations of all authors associated with each article were identified. To assess whether institutions contributing the most Bayesian surgical studies were specifically inclined toward Bayesian methods or were simply high-output publishers, the total number of articles published by each institution across the included journals from January 2000 to August 2024 was obtained using the Web of Science Core Collection. Manual data extraction was performed by 2 researchers (Z.L. and A.I.), and discrepancies were resolved by consensus. The following variables were collected: study design, surgical discipline, funding, general surgery subspecialty, and the primary use of Bayesian statistics (direct application of Bayes’ theorem, regression-based analysis, graphic models, classification, or predictive modeling). Network meta-analyses were classified as “graphic models” in our study instead of “regression-based,” although they also have a regression component. In addition, the type of prior (informative, noninformative, both, or not specified), the type of informative prior (enthusiastic or skeptical), sensitivity analysis of priors, and software and computation details were recorded. For articles that used Bayesian analysis, it was noted whether a conjugate prior was employed. A prior is conjugate to a given likelihood function when the resulting posterior distribution belongs to the same parametric family as the prior, allowing for direct analytical solutions without the need for numerical methods or simulations.19
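As an illustration of the closed-form updating that a conjugate prior permits, the sketch below uses the standard beta-binomial pair: a Beta(a, b) prior on an event probability combined with binomial data yields a Beta(a + events, b + non-events) posterior. The function names and the counts (7 complications in 20 operations) are hypothetical and are not drawn from any included study.

```python
def beta_binomial_update(prior_a, prior_b, events, trials):
    """Return the Beta posterior parameters after observing binomial data."""
    if not 0 <= events <= trials:
        raise ValueError("events must lie between 0 and trials")
    return prior_a + events, prior_b + (trials - events)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: flat Beta(1, 1) prior, 7 complications in 20 operations.
post_a, post_b = beta_binomial_update(1, 1, events=7, trials=20)
posterior_mean = beta_mean(post_a, post_b)
```

No sampling or numerical integration is involved: the posterior is obtained by adding the observed counts to the prior parameters.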
Moreover, the quality of the reporting of the Bayesian analysis for articles that used Bayesian analysis was assessed by the Reporting of Bayes Used in Clinical Studies scale (ROBUST)20 based on 7 key items: specification of prior, justification of prior, sensitivity analysis of priors, specification of model, analytic technique, central tendency, and variance. The ROBUST scale ranges from 0 to 7, with each of the 7 key aspects contributing 1 point. A score of 7 indicates full compliance with ROBUST recommendations, whereas a score of 0 signifies no compliance. The scope of ROBUST is to assess Bayesian analysis in clinical studies.20 Therefore, Bayesian statistical indices used for model validation, such as the BIC, were not included in the list of articles assessed by ROBUST. Similarly, the Bayesian classification algorithm Naïve Bayes and scenarios where Bayes’ theorem was applied through simple calculations by directly inserting values into the formula without performing Bayesian computations were also excluded.
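The scoring rule described above (1 point per reported item, 0 to 7 in total) can be sketched as follows. The item identifiers are our own shorthand for the 7 ROBUST aspects, and the example study is hypothetical.

```python
# Shorthand labels (ours) for the 7 ROBUST items.
ROBUST_ITEMS = (
    "prior_specified",
    "prior_justified",
    "prior_sensitivity_analysis",
    "model_specified",
    "analytic_technique",
    "central_tendency",
    "variance",
)


def robust_score(reported):
    """Count how many of the 7 ROBUST items a study reports (0 to 7)."""
    unknown = set(reported) - set(ROBUST_ITEMS)
    if unknown:
        raise ValueError(f"unknown ROBUST items: {sorted(unknown)}")
    return sum(1 for item in ROBUST_ITEMS if reported.get(item, False))


# Hypothetical study reporting the model, technique, and summary statistics,
# but nothing about its priors.
example_study = {
    "model_specified": True,
    "analytic_technique": True,
    "central_tendency": True,
    "variance": True,
}
example_score = robust_score(example_study)
```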
Statistical Analysis
Descriptive statistics were employed to summarize the data collected in this study. The normality of data was assessed by the Shapiro-Wilk test and Q-Q plots. Normally distributed data were presented as the mean ± standard deviation (SD), while non-normally distributed data were expressed as the median (25th, 75th percentiles). The publication trend over time was assessed by the locally estimated scatterplot smoothing curve. The annual growth rate was calculated using a compound formula: (ending value/starting value)^(1/n) − 1, where n is the number of years. For keyword analysis, the Web of Science Keyword Plus retrieved from metadata was used to create a word cloud and a co-occurrence network using the “Bibliometrix” package in R.
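The compound growth formula above can be written as a short function. This is a minimal sketch: the function name is ours, and the counts in the example are hypothetical rather than the study's yearly publication numbers.

```python
def compound_annual_growth_rate(starting_value, ending_value, n_years):
    """Return (ending value / starting value) ** (1 / n) - 1."""
    if starting_value <= 0 or n_years <= 0:
        raise ValueError("starting_value and n_years must be positive")
    return (ending_value / starting_value) ** (1.0 / n_years) - 1.0


# Hypothetical counts: 2 articles in the first year, 30 in the last, 24 years apart.
example_rate = compound_annual_growth_rate(2, 30, 24)
```

Because the formula uses only the first and last values, it is sensitive to those 2 years and is undefined when the starting value is 0.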
A heatmap was generated to visualize the distribution of study designs across the top 10 affiliations with the highest number of studies, using a gradient scale to indicate study counts. In addition, a stacked polar plot was created to illustrate the proportion of studies reporting each ROBUST item at different total score levels. The percentage of studies reporting each item at ROBUST score levels 1–7 was calculated. Visualization was performed using the “ggplot2” package in R.
Kendall’s tau correlation was performed to assess the relationships between numerical variables (citation count, year of publication, impact factor, and ROBUST score). The cutoff values for weak, moderate, and strong Kendall’s tau correlation are 0.06, 0.26, and 0.49, respectively.21
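For readers unfamiliar with the statistic, the sketch below computes Kendall's tau from concordant and discordant pairs and applies the cutoffs stated above. It assumes the tau-b variant, which corrects for ties; the source does not state which variant was used. The label "negligible" for values below 0.06 is our own.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b from pairwise comparisons (O(n^2), fine for small n)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from every count
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denom == 0:
        raise ValueError("tau is undefined when a variable is constant")
    return (concordant - discordant) / denom


def correlation_strength(tau):
    """Classify |tau| with the cutoffs 0.06, 0.26, and 0.49."""
    size = abs(tau)
    if size >= 0.49:
        return "strong"
    if size >= 0.26:
        return "moderate"
    if size >= 0.06:
        return "weak"
    return "negligible"
```

For example, `correlation_strength(kendall_tau_b(years, citations))` would label the association between two hypothetical lists `years` and `citations`.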
Additionally, the Kruskal-Wallis test was used to compare citation count and ROBUST score across different categories (continent, journal category, study design, and surgical discipline). To enhance the statistical power of Kruskal-Wallis tests, the number of categories from the original dataset was reduced. Countries were grouped by their respective continents to facilitate regional comparisons. “Retrospective cohort” and “prospective cohort” were grouped into “observational study.” Surgical disciplines were divided into “general surgery” and “nongeneral surgery.” Journals were categorized into 3 groups based on their focus: highest-impact medical journals, nonsubspecialty surgery journals, and subspecialty surgery journals.
All statistical analyses were performed using R version 4.2.2 (RStudio version 2022.12.0).
RESULTS
There were 462 articles retrieved from Web of Science and PubMed, and after de-duplication and screening, 120 articles were included in the final analysis (Fig. 1).
FIGURE 1.
Flow diagram outlining the study selection process, as formatted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Adapted with permission from Covidence.22
Bibliometric Information
The bibliometric information of the included articles is summarized in Table 1. A total of 120 articles from 17 journals were included. The journal JAMA Otolaryngology-Head & Neck Surgery did not have any publication using Bayesian statistics. The 3 journals with the highest number of articles were Annals of Surgery (26), International Journal of Surgery (26), and Journal of Thoracic and Cardiovascular Surgery (11). In total, there were 1311 authors and 4040 references. The median number of authors was 7.0 (5.0, 11.0), and the median number of cited references was 33.0 (22.8, 46.3) for each article. The median impact factor was 8.7 (7.5, 12.5), and the median number of citations was 21.0 (6.5, 68.5). The distribution of impact factor and the number of citations were both right-skewed (average impact factor 19.6 ± 27.9 and average citation number 147.1 ± 461.6). The annual growth rate of publications was 12.3%.
TABLE 1.
Bibliometric Information of Included Articles
| Variable | Value |
|---|---|
| Time range | 2000–2024 |
| Journals (N) | 17 |
| Articles (N) | 120 |
| Total authors (N) | 1311 |
| Median authors (N) | 7.0 (5.0, 11.0) |
| Total references (N) | 4040 |
| Median references | 33.0 (22.8, 46.3) |
| Median impact factor | 8.7 (7.5, 12.5) |
| Average impact factor | 19.6 ± 27.9 |
| Total citation (N) | 16911 |
| Median citation (N) | 21.0 (6.5, 68.5) |
| Average citation (N) | 147.1 ± 461.6 |
| Annual growth rate* (%) | 12.3 |
Average ± standard deviation; Median (25th percentile, 75th percentile).
*Annual growth rate was calculated using the compound formula: (ending value/starting value)^(1/n) − 1, where the starting value is the number of publications in 2000, the ending value is the number of publications in 2024, and n is the number of years.
As shown by the locally estimated scatterplot smoothing curve in Figure 2, the number of surgical papers that use Bayesian statistics has increased over time, with a notable peak of 17 articles in 2020. The number of randomized controlled trials (RCTs) published over time is depicted in Supplemental Figure S1, https://links.lww.com/AOSO/A517, with peaks of 4 articles in 2020 and 2024. In contrast, the overall volume of articles published by the selected journals remained generally constant over time. Articles published in 2018 received a total of 3623 citations, the highest of any publication year (Supplemental Figure S2, https://links.lww.com/AOSO/A517). The mean citation per year since publication per article is shown in Supplemental Figure S3, https://links.lww.com/AOSO/A517.
FIGURE 2.
Trends in the use of Bayesian statistics in surgical research from January 2000 to August 2024. The blue line represents the yearly number of articles, and the orange dotted line shows the LOESS trend curve. The gray dashed line (plotted on the secondary y-axis) depicts the total number of articles published in the selected high-impact surgical and medical journals each year. LOESS, locally estimated scatterplot smoothing.
The characteristics of the top 3 cited articles are summarized in Table 2. All 3 articles were RCTs published in the New England Journal of Medicine. The article with the highest number of citations (3509) was by Nogueira et al in 2018.23 The characteristics of the top 10 cited articles are summarized in Supplemental Table S3, https://links.lww.com/AOSO/A517. The median sample size for all articles was 1746 (545, 7552).
TABLE 2.
Characteristics of Top 3 Cited Articles
| Author, Year of Publication | Title | Study Design | Surgical Discipline | Sample Size | Number of Citations | Journal | Study Outcome(s) Evaluated by Bayesian Statistics |
|---|---|---|---|---|---|---|---|
| Nogueira et al, 201823 | Thrombectomy 6–24 Hours after Stroke with a Mismatch between Deficit and Infarct | Randomized clinical trial | Neurosurgery | 206 | 3509 | NEJM | Difference in mean score for disability on the utility-weighted modified Rankin scale at 90 days and rate of functional independence at 90 days between thrombectomy and control group |
| Popma et al, 201924 | Transcatheter aortic-valve replacement with a self-expanding valve in low-risk patients | Randomized clinical trial | Cardiothoracic surgery | 1468 | 2312 | NEJM | Difference in incidence of a composite of death from any cause or disabling stroke at 24 months between treatment and control group |
| Reardon et al, 201725 | Surgical or transcatheter aortic-valve replacement in intermediate-risk patients | Randomized clinical trial | Cardiothoracic surgery | 1746 | 1847 | NEJM | Difference in incidence of a composite of death from any cause or disabling stroke at 24 months between treatment and control group |
NEJM, New England Journal of Medicine.
Keyword Analysis
The Web of Science Keyword Plus retrieved from metadata was used to create a word cloud (Supplemental Figure S4A, https://links.lww.com/AOSO/A517) and a co-occurrence network (Supplemental Figure S4B, https://links.lww.com/AOSO/A517). Keywords with the highest frequency are “outcomes” (22), “surgery” (16), and “mortality” (16). The co-occurrence network maps the relationships between these keywords, showing how frequently terms appear together within the included studies. This network highlights “surgery” as a central node, strongly linked to critical “outcomes” and “mortality.”
Characteristics of Authors
The characteristics of the authors are summarized in Supplemental Table S4, https://links.lww.com/AOSO/A517. The top 3 countries with the highest number of publications and their corresponding total citations were the United States, with 57 articles and 13,816 citations; China, with 27 articles and 220 citations; and the United Kingdom, with 16 articles and 1945 citations.
The top 5 author affiliations with the highest frequencies were the University of Michigan System (28), the University of Texas System (27), the University of California System (11), Harvard University (11), and Duke University (10). Relative to each institution’s overall publication output in the selected journals from January 2000 to August 2024, the University of Michigan System contributed 28 Bayesian articles out of 3050 total publications (0.9%), the University of Texas System 27 out of 4761 (0.6%), the University of California System 11 out of 6780 (0.2%), Harvard University 11 out of 8601 (0.1%), and Duke University 10 out of 2416 (0.4%). These proportions are comparable to or slightly above the overall Bayesian article rate of 0.1% in the dataset (120 articles out of 112,492 articles). A heat map showing the interaction between the top 10 author affiliations and study design is shown in Supplemental Figure S5, https://links.lww.com/AOSO/A517. The University of Michigan System made the highest contribution to retrospective cohort studies (N = 16). Most of the top 10 affiliations demonstrated considerable contributions to RCTs.
In terms of funding sources, government agencies were the most common funder, accounting for 60 occurrences. This was followed by academic institutions and hospitals, which funded 27 studies. Funding details were not specified for 20 studies, whereas 16 studies were unfunded.
Study Design and Surgical Discipline
As shown in Table 3, the most frequent study design was the retrospective cohort study with 50 (41.7%) articles, followed by meta-analysis with 38 (31.7%) articles, and RCTs with 19 (15.8%) articles.
TABLE 3.
Characteristics of Articles
| Variable | Value (N, %) |
|---|---|
| Surgical discipline (N = 120) | |
| General surgery (N = 39) | 39, 32.5* |
| Colorectal surgery | 13, 33.3† |
| Hepatopancreatobiliary surgery | 6, 15.4 |
| Thyroid surgery | 5, 12.8 |
| Multiple subspecialties | 5, 12.8 |
| Breast surgery | 3, 7.7 |
| Gastrointestinal surgery | 3, 7.7 |
| Bariatric surgery | 2, 5.1 |
| Esophageal surgery | 1, 2.6 |
| Endocrine surgery | 1, 2.6 |
| Cardiothoracic surgery | 20, 16.7 |
| Multiple disciplines | 13, 10.8 |
| Pediatric surgery | 8, 6.7 |
| Transplantation surgery | 8, 6.7 |
| Obstetrics and gynecology | 6, 5.0 |
| Orthopedic surgery | 6, 5.0 |
| Neurosurgery | 5, 4.2 |
| Trauma surgery | 5, 4.2 |
| Urology | 4, 3.3 |
| Vascular surgery | 3, 2.5 |
| Plastic surgery | 2, 1.7 |
| Surgery in general | 1, 0.8 |
| Study design (N = 120) | |
| Retrospective cohort | 50, 41.7 |
| Meta-analysis | 38, 31.7 |
| Randomized controlled trial | 19, 15.8 |
| Prospective cohort | 7, 5.8 |
| Other | 6, 5.0 |
*Percentages for surgical specialties are calculated using a denominator of the total number of articles (N = 120).
†Percentages for general surgical specialties are calculated using a denominator of the total number of articles classified as general surgery (N = 39).
Table 3 also summarizes the frequency of studies in different surgical specialties. The surgical specialty with the highest number of articles was general surgery (N = 39, 32.5%), followed by cardiothoracic surgery (N = 20, 16.7%) and pediatric surgery (N = 8, 6.7%). The general surgery subspecialty with the highest number of articles was colorectal surgery (N = 13, 33.3%).
Usage of Bayesian Approach
Table 4 summarizes the usage of Bayesian statistics in studies. The category “regression-based analysis” had the highest number of studies (N = 51, 42.5%). The second-highest category with 34 studies (28.3%) was “graphic models.” The third-highest category, with 27 studies (22.5%), was “direct application of Bayes’ theorem.” Among the 19 articles reporting RCTs, 15 utilized Bayesian analysis as the primary method, while 4 employed it as a secondary or ancillary analysis.
TABLE 4.
Summary of Usage of Bayesian Statistics in Studies
| Primary Use of Bayesian Statistics | Number of Articles (N, %) |
|---|---|
| *Regression-based analysis | 51, 42.5 |
| †Graphic models | 34, 28.3 |
| ‡Direct application of Bayes’ theorem | 27, 22.5 |
| §Classification | 4, 3.3 |
| Other | 4, 3.3 |
Definitions for categories:
*Regression-based analysis: Bayesian methods that use regression frameworks.
†Graphic models: Probabilistic analysis of variable relationships using graphical structures, eg, Bayesian network meta-analysis.
‡Direct application of Bayes’ theorem: Bayesian methods that use conjugate prior with closed-form Bayesian estimation and methods that directly use Bayes’ theorem.
§Classification: Bayesian methods for data classification, eg, Naive Bayes.
Technical Aspects of Bayesian Approach
Supplemental Table S5, https://links.lww.com/AOSO/A517 summarizes the distribution of priors used in studies. Noninformative priors were used in 33 (27.5%) studies, while informative priors were used in 24 studies (20.0%). The type of prior used was not specified in 58 studies (48.3%). In addition, among articles that used an informative prior, the most frequently employed types were “minimally informative” (N = 12, 41.4%) and “data-driven” (N = 11, 37.9%). Skeptical and enthusiastic priors were each used in 2 studies (6.9%), and 1 study used both (3.4%). For articles that used Bayesian analysis (N = 100), 9 explicitly used conjugate priors, 20 did not, and 71 did not clearly specify whether conjugate priors were used. Supplemental Table S6, https://links.lww.com/AOSO/A517 summarizes the software used by articles. Most articles used R (N = 31, 25.8%), followed by STATA (N = 17, 14.2%) and WinBUGS (N = 11, 9.2%).
ROBUST Assessment
Table 5 summarizes the ROBUST assessment of the included articles. In total, 100 out of 120 articles employed Bayesian analysis and were assessed by ROBUST. The overall average ROBUST score was 4.1 ± 1.6, with 62 (62.0%) studies scoring between 3 and 5. A total of 54 studies (54.0%) specified prior distributions used in their analyses, but only 29 (29.0%) provided justification for these priors. Sensitivity analyses for different priors were reported in 14 studies (14.0%). The statistical model was specified in 76 studies (76.0%). The analytic technique was reported in 49 (49.0%) studies. Measures of central tendency were reported in 98 studies (98.0%), and variance was reported in 90 studies (90.0%). To visualize the patterns of reporting across ROBUST items, an UpSet plot was generated (Supplemental Figure S6, https://links.lww.com/AOSO/A517). This plot illustrates the frequency and intersection of reported elements among the 100 studies assessed. The most frequent reporting pattern, observed in 16 studies, included specification of the model, analytic technique, central tendency, and variance.
TABLE 5.
Summary of ROBUST Assessment of Articles Using Bayesian Analysis
ROBUST, Reporting of Bayes Used in Clinical Studies.
| Key Aspects | Value (N*, %) |
|---|---|
| 1. Specification of prior | 54, 54.0 |
| 2. Justification of prior | 29, 29.0 |
| 3. Sensitivity analysis | 14, 14.0 |
| 4. Statistical model | 76, 76.0 |
| 5. Analytic technique | 49, 49.0 |
| 6. Central tendency | 98, 98.0 |
| 7. Variance | 90, 90.0 |
| ROBUST score | |
| Score range 0–2 | 16, 16.0 |
| Score range 3–5 | 62, 62.0 |
| Score range 6–7 | 22, 22.0 |
| Average ROBUST score | 4.1 ± 1.6 |
*100 out of 120 articles that used Bayesian analysis were assessed by ROBUST.
Pairwise Associations
Supplemental Table S7, https://links.lww.com/AOSO/A517 shows Kendall correlation analyses between citation count, impact factor, year of publication, and ROBUST score. There was a significant strong negative correlation between the year of publication and citation count (τ = −0.5, P < 0.001) and a significant weak positive correlation between the year of publication and impact factor (τ = 0.2, P < 0.01). The ROBUST score showed no correlation with citation count, impact factor, or year of publication.
Supplemental Table S8, https://links.lww.com/AOSO/A517 shows the P values of the Kruskal-Wallis tests between the categorical variables (continent, journal category, study design, and surgical discipline) and the numerical variables (total citations and ROBUST score). For the ROBUST score, there were significant associations with both the journal category (P < 0.01) and the study design (P < 0.001). For total citations, there were significant associations with continent (P < 0.001), journal category (P < 0.01), and study design (P < 0.01). Boxplots for significant results are summarized in Supplemental Figures S7 and S8, https://links.lww.com/AOSO/A517.
DISCUSSION
The results show an increase in surgical studies using Bayesian statistics, with general surgery and cardiothoracic surgery being the most represented specialties. Bayesian frameworks were most commonly used in retrospective cohort studies, followed by meta-analyses and RCTs, with the top 3 cited articles being RCTs. The leading country for Bayesian surgical articles was the United States. Regarding the purpose of using Bayesian statistics, regression-based analysis and graphic models were the most commonly used. Government agencies were the most common source of funding. In terms of the quality of Bayesian reporting, the average ROBUST score was 4.1 out of 7, with only 54.0% of studies specifying the prior and just 29.0% providing justification for it. In addition, pairwise association analyses revealed the potential impact of study design, journal category, and geographic location on Bayesian surgical research.
Bayesian frameworks have emerged as a valuable analytical approach across various medical fields due to their flexibility and ability to handle complex data. In a systematic review of papers in the journal Statistics in Medicine, Ashby et al highlighted that Bayesian frameworks are widely used in clinical research, epidemiology, meta-analyses, molecular genetics, and neurophysiology. The Food and Drug Administration acknowledges that Bayesian analysis offers advantages over frequentist analysis.26 The Bayesian approach allows for the integration of prior knowledge, which is critical in surgical decision-making, incorporating information from RCTs, observational studies, or expert opinions.27 In addition, the Bayesian approach enhances the interpretation of research findings by expressing results in probabilistic terms rather than relying on dichotomization or P values.28 Although the Bayesian approach has been criticized for introducing subjectivity through priors, this can be addressed by testing different priors to confirm the robustness of the results.27 Moreover, the Bayesian approach provides practical benefits like data-driven shrinkage, regularization, and improved estimation efficiency, enhancing its applicability in medical research.7
We found that general surgery and its subspecialties were the most frequent surgical fields in this review. In general surgery, the Bayesian approach is valuable due to the diverse procedures and patient populations, allowing for the integration of uncertainty and prior knowledge into decision-making. The higher frequency of general surgery articles in our study may also be influenced by journal publication patterns, as a systematic review of surgical clinical trials indicated that general surgery is the most frequently represented specialty, accounting for 32.2% of all surgical trials.29
In cardiothoracic surgery, Bayesian models have been used since 1994, when the Society of Thoracic Surgeons introduced them for mortality risk assessment.30 These methods were chosen for their ability to adapt to changing patient populations, flexibly incorporate risk factors, and handle incomplete data, making them ideal for evolving clinical risk models.30 More recently, in a clinical trial by Sabatine et al31 published in 2021, a frequentist time-to-event analysis showed no significant difference in 5-year all-cause mortality between patients receiving percutaneous coronary intervention and those undergoing coronary artery bypass grafting (CABG). However, a Bayesian analysis suggested a probable survival advantage with CABG, showing an 85.7% probability that the absolute risk difference in mortality was greater than 0% favoring CABG. This contrast emphasizes the value of the Bayesian approach in providing additional insights and enhancing the interpretative power of clinical trial data.31
In terms of study design, our results show that Bayesian frameworks were most commonly used in observational retrospective cohort studies. In these studies, patient- and hospital-level variations are common, and Bayesian models allow researchers to account for these differences by modeling multiple levels, such as individual patient characteristics and institutional factors. One practical application is reliability adjustment, as demonstrated by Fry et al.32 This technique manages variability in hospital outcome rates by shrinking observed values toward the overall average based on hospital size and caseload.32 Smaller hospitals, which tend to have less reliable outcome rates due to lower caseloads, experience greater shrinkage towards the average, whereas larger hospitals with more stable data have less shrinkage.32 Minimizing statistical “noise” and providing a more accurate reflection of hospital performance ensures that comparisons between hospitals are not skewed by random variations or small sample sizes.32 As a result, Bayesian models are particularly effective in observational studies, where these types of variability are common and need to be appropriately managed for more reliable analysis and interpretation.
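The shrinkage behavior described above can be sketched with a simplified empirical Bayes formula, in which each hospital's observed rate is pulled toward the overall average in proportion to how unreliable it is. This is an illustration under stated assumptions and not the hierarchical model used by Fry et al: the between-hospital variance is supplied as a known constant, the sampling variance uses a binomial approximation, and all numbers and names are hypothetical.

```python
def reliability_adjusted_rate(observed_rate, caseload, overall_rate, between_var):
    """Shrink an observed hospital rate toward the overall rate.

    reliability = signal / (signal + noise), where the signal is the assumed
    between-hospital variance and the noise is the binomial sampling variance.
    """
    sampling_var = overall_rate * (1 - overall_rate) / caseload
    reliability = between_var / (between_var + sampling_var)
    return reliability * observed_rate + (1 - reliability) * overall_rate

# Hypothetical inputs: 5% overall mortality, between-hospital SD of 2 points.
overall_rate = 0.05
between_var = 0.02 ** 2

# Two hospitals with the same observed rate but very different caseloads.
small_adjusted = reliability_adjusted_rate(0.15, 20, overall_rate, between_var)
large_adjusted = reliability_adjusted_rate(0.15, 2000, overall_rate, between_var)

print(f"small hospital (n=20):   adjusted rate = {small_adjusted:.3f}")
print(f"large hospital (n=2000): adjusted rate = {large_adjusted:.3f}")
```

With identical observed rates, the low-volume hospital is shrunk much closer to the overall average than the high-volume hospital, which mirrors the pattern described in the text.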
The Bayesian approach was also widely applied in meta-analyses, particularly in the form of Bayesian network meta-analysis (NMA). This approach offers a distinct advantage by enabling indirect comparisons between treatments that have not been directly compared. Unlike traditional meta-analyses, which only consider direct evidence, Bayesian mixed treatment comparison models can incorporate both direct and indirect comparisons into a single analytical framework.33 For instance, Evans et al33 used Bayesian NMA to compare the efficacy of multiple antifungal treatments, including fluconazole, liposomal amphotericin B, itraconazole, and micafungin, in patients undergoing liver transplant. Bayesian NMA offers several advantages over frequentist approaches, including comprehensive modeling of uncertainty in heterogeneity estimates, greater flexibility to accommodate complex data structures, improved handling of sparse data, a more intuitive probability-based interpretation, and the ability to incorporate prior knowledge.34–36
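The idea of combining direct and indirect evidence can be made concrete with a small sketch. For simplicity it uses the classical adjusted indirect comparison through a common comparator and inverse-variance pooling, which is a frequentist simplification and not a full Bayesian NMA. The log odds ratios, standard errors, and function names are hypothetical.

```python
import math

def indirect_comparison(d_ab, se_ab, d_ac, se_ac):
    """Indirect estimate of C versus B through common comparator A.

    d_ab and d_ac are log odds ratios of B versus A and C versus A.
    """
    estimate = d_ac - d_ab
    std_error = math.sqrt(se_ab ** 2 + se_ac ** 2)
    return estimate, std_error

def pool_inverse_variance(estimates, std_errors):
    """Combine independent estimates with inverse-variance weights."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Hypothetical evidence: two comparisons against A and one direct C versus B trial.
indirect_est, indirect_se = indirect_comparison(-0.5, 0.20, -0.8, 0.25)
direct_est, direct_se = -0.2, 0.30

mixed_est, mixed_se = pool_inverse_variance(
    [direct_est, indirect_est], [direct_se, indirect_se]
)
print(f"indirect: {indirect_est:.3f} (SE {indirect_se:.3f})")
print(f"mixed:    {mixed_est:.3f} (SE {mixed_se:.3f})")
```

The mixed estimate is more precise than either source alone. A Bayesian NMA extends this logic to whole treatment networks while also propagating uncertainty in the heterogeneity parameters.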
The third most common application of Bayesian statistics was in RCTs, which accounted for the top 3 cited articles. Bayesian approaches are highly valuable in clinical trials because they incorporate prior information and dynamically update treatment success probabilities as new data are collected. This capability supports flexible trial designs, such as adaptive randomization, where patients are allocated to treatment arms with higher probabilities of benefit. For example, in the I-SPY 2 trial for breast cancer treatments, Bayesian adaptive randomization assigns patients to experimental or standard therapies based on biomarker subtypes.5 As more data are collected, the algorithm updates the probability of achieving a complete pathologic response for each subtype, guiding patient allocation to the most promising treatments.5
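The general mechanism of response-adaptive randomization can be sketched as follows. The example uses a simple two-arm beta-binomial model with allocation by posterior sampling (often called Thompson sampling). It is a deliberately simplified illustration and not the I-SPY 2 algorithm, and the response rates, sample size, and function name are hypothetical.

```python
import random

def simulate_adaptive_randomization(true_rates, n_patients, seed=0):
    """Allocate patients by sampling each arm's Beta posterior response rate."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    allocations = [0] * n_arms
    for _ in range(n_patients):
        # One draw per arm from its current posterior, starting from Beta(1, 1).
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        arm = draws.index(max(draws))
        allocations[arm] += 1
        # Simulate the patient's response and update that arm's posterior counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return allocations, successes, failures

# Hypothetical true response rates: 20% for control, 60% for the experimental arm.
allocations, successes, failures = simulate_adaptive_randomization([0.2, 0.6], 300)
print(f"patients per arm: {allocations}")
```

As responses accumulate, the arm with the higher posterior response probability receives a growing share of patients. Real trials typically add safeguards such as a minimum allocation to control and prespecified stopping rules.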
The majority of included articles employed regression-based analyses, such as linear regression, logistic regression, and hierarchical regression for multilevel data. This is consistent with the findings of van de Schoot et al37 in a systematic review of Bayesian articles in psychology and Rietbergen et al38 in medical and epidemiological journals, where regression-based analyses were also the most common. The second most common type of analysis was graphical modeling. Graphical models in Bayesian NMA visually represent complex relationships between multiple treatments and studies or describe variable relationships within Bayesian networks.39
Our analysis using the ROBUST criteria revealed considerable variability in the quality of Bayesian reporting across surgical studies. While Bayesian methods are becoming more prevalent in surgical research, significant gaps in transparency and comprehensive reporting persist. We selected the ROBUST scale over newer checklists, such as the Bayesian Analysis Reporting Guidelines,6 due to its simplicity and ease of use, as the complexity of the Bayesian Analysis Reporting Guidelines can make them difficult to apply consistently in surgical research. Similar findings were reported in a recent bibliometric analysis by Bdair et al40 in 2023, who evaluated the quality of Bayesian reporting in orthopedic studies and found that only 39% of articles specified the prior distribution and 18% provided justification. This underreporting of Bayesian methods extends beyond surgical research and is prevalent in other medical fields as well. Pibouleau et al41 also identified deficiencies in Bayesian reporting for implantable medical devices, with only 24% of articles fully reporting the prior distribution, 18% reporting a sensitivity analysis, and 35% explaining the model.6,41 Similarly, a review of Bayesian mixed treatment comparisons found that only 52.9% of articles reported the prior distribution, and 11.8% conducted a sensitivity analysis.6,42 A review of 70 epidemiological studies revealed that 33 articles did not specify priors, and 66 articles did not conduct a sensitivity analysis.6,43 These findings underscore the need for improved reporting standards and the adoption of checklists to enhance transparency and reproducibility in Bayesian research.
Several notable pairwise associations were identified. The negative correlation between the year of publication and citation count suggests that older studies tend to accumulate more citations, while a weak positive correlation with the impact factor indicates that newer studies are increasingly being published in higher-impact journals, which possibly also reflects the general trend of rising journal impact factors over time. Higher ROBUST scores were seen in RCTs and in publications in the highest-impact medical journals. Highest-impact medical journals, RCTs, and North America were associated with the highest total citation counts. These results suggest that the impact of study design, journal type, and geographic location on Bayesian surgical research should be studied further.
Next Steps
To advance Bayesian research in surgery, researchers can familiarize themselves with Bayesian methods and seek training in their application, starting with simpler models and gradually progressing to more complex analyses. Collaborating with statisticians experienced in Bayesian approaches is critical and can enhance study design and interpretation. Journals, editors, and reviewers should adopt standardized guidelines for reporting Bayesian analyses, such as the ROBUST scale, and ensure that studies adequately specify and justify their priors. Additionally, fostering a peer-review culture that includes statisticians with Bayesian expertise will help maintain rigor and transparency. Encouraging education on Bayesian methods and developing accessible resources for the surgical community will further promote their appropriate use and integration into clinical research.
Limitations
In this study, only the Web of Science and PubMed databases were used, which may have resulted in the omission of some surgical studies that employed Bayesian statistics during the study period. Furthermore, the selection of journals based on impact factors limits the representativeness of our sample, potentially excluding a broader range of papers in surgery that utilize Bayesian statistics. In addition, articles employing Bayesian network meta-analysis were classified as “graphic models” even though they also have a regression component, potentially reducing the representation of articles classified as “regression-based.” Moreover, our bibliometric search may have missed some articles using BIC.
CONCLUSIONS
The use of Bayesian statistics in surgical research has seen a significant rise, reflecting the method’s potential to incorporate prior knowledge and provide a more comprehensive analysis of clinical data. This trend is particularly notable in general and cardiothoracic surgery, where Bayesian frameworks have been increasingly applied in retrospective cohort studies and meta-analyses. However, the quality of reporting, as measured by the ROBUST scale, indicates room for improvement, particularly in the justification and sensitivity analysis of priors. Moving forward, there is a critical need for more standardized guidelines or the observance of current best practices to ensure the rigorous and transparent application of Bayesian statistics in surgical research. Such efforts would enhance the reliability of these analyses and further solidify their role in clinical decision-making.
ACKNOWLEDGMENTS
Z.L.: methodology, data collection, analysis, manuscript drafting, manuscript editing. A.I.: data collection, manuscript editing. D.V.: conceptualization, manuscript editing. K.L.: conceptualization, manuscript editing. S.E.F.: supervision, conceptualization, manuscript editing.
Footnotes
Disclosure: The authors declare that they have nothing to disclose.
This work was supported by the Bernard S. Goldman Chair in Cardiovascular Surgery. Z.L. was supported in part by a Sunnybrook Research Institute Summer Scholarship. D.V. was supported in part by the Canadian Institutes of Health Research (CIHR) Vanier Canada Graduate Scholarship.
Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s Web site (www.annalsofsurgery.com).
REFERENCES
1. Goligher EC, Heath A, Harhay MO. Bayesian statistics for clinical research. Lancet (London, England). 2024;404:1067–1076.
2. Jevremov T, Pajić D. Bayesian method in psychology: a bibliometric analysis. Curr Psychol. 2024;43:8644–8654.
3. van de Schoot R, Kaplan D, Denissen J, et al. A gentle introduction to Bayesian analysis: applications to developmental research. Child Dev. 2014;85:842–860.
4. Bayes T. LII. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F. R. S. communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S. Philos Trans R Soc Lond. 1997;53:370–418.
5. Rugo HS, Olopade OI, DeMichele A, et al. Adaptive randomization of veliparib–carboplatin treatment in breast cancer. N Engl J Med. 2016;375:23–34. Available at: https://www.nejm.org/doi/full/10.1056/NEJMoa1513749. Accessed October 4, 2024.
6. Kruschke JK. Bayesian analysis reporting guidelines. Nat Hum Behav. 2021;5:1282–1291.
7. van de Schoot R, Depaoli S, King R, et al. Bayesian statistics and modelling. Nat Rev Methods Primer. 2021;1:1–26.
8. Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B Stat Methodol. 2009;71:319–392.
9. Bayesian Information Criterion - an overview | ScienceDirect Topics. Available at: https://www.sciencedirect.com/topics/medicine-and-dentistry/bayesian-information-criterion. Accessed December 6, 2024.
10. van Ravenzwaaij D, Cassey P, Brown SD. A simple introduction to Markov Chain Monte-Carlo sampling. Psychon Bull Rev. 2018;25:143–154.
11. Beraha M, Falco D, Guglielmi A. JAGS, NIMBLE, Stan: a detailed comparison among Bayesian MCMC software. 2021. doi:10.48550/arXiv.2107.09357
12. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. 3rd ed. Chapman and Hall/CRC; 2013. doi:10.1201/b16018
13. McElreath R. Statistical Rethinking: A Bayesian Course with Examples in R and STAN. 2nd ed. Chapman and Hall/CRC; 2020. doi:10.1201/9780429029608
14. Liu K. Applied Bayesian Methods in Clinical Epidemiology and Health Care Research. Institute of Health Policy, Management and Evaluation. April 17, 2025. Available at: https://ihpme.utoronto.ca/course/had5314h/. Accessed December 4, 2024.
15. Data Science, Analytics and Engineering (Bayesian Machine Learning), MS - MS | Degree Details | ASU Degree Search. Available at: https://degrees.apps.asu.edu/masters-phd/major/ASU00/ESDSEBMLMS/data-science-analytics-and-engineering-bayesian-machine-learning-ms?init=false&nopassive=true. Accessed December 4, 2024.
16. Page MJ, Moher D, Bossuyt PM, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372:n160.
17. Information by Discipline. Available at: https://www-test.royalcollege.ca/en/standards-and-accreditation/information-by-discipline.html. Accessed September 15, 2024.
18. Aria M, Cuccurullo C. bibliometrix: an R-tool for comprehensive science mapping analysis. J Informetr. 2017;11:959–975.
19. Heuts S, Kawczynski MJ, Sayed A, et al. Bayesian analytical methods in cardiovascular clinical trials: why, when, and how. Can J Cardiol. 2025;41:30–44.
20. Sung L, Hayden J, Greenberg ML, et al. Seven items were identified for inclusion when reporting a Bayesian analysis of a clinical study. J Clin Epidemiol. 2005;58:261–268.
21. Weak or strong? How to interpret a Spearman or Kendall correlation - The DO Loop. Available at: https://blogs.sas.com/content/iml/2023/04/05/interpret-spearman-kendall-corr.html. Accessed October 14, 2024.
22. Covidence systematic review software, Veritas Health Innovation, Melbourne, Australia. Available at www.covidence.org.
23. Nogueira RG, Jadhav AP, Haussen DC, et al. Thrombectomy 6 to 24 Hours after Stroke with a Mismatch between Deficit and Infarct. N Engl J Med. 2018;378:11–21.
24. Popma JJ, Deeb GM, Yakubov SJ, et al. Transcatheter Aortic-Valve Replacement with a Self-Expanding Valve in Low-Risk Patients. N Engl J Med. 2019;380:1706–1715.
25. Reardon MJ, Mieghem NMV, Popma JJ, et al. Surgical or Transcatheter Aortic-Valve Replacement in Intermediate-Risk Patients. N Engl J Med. 2017;376:1321–1331.
26. Center for Devices and Radiological Health. Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. January 19, 2020. Available at: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/guidance-use-bayesian-statistics-medical-device-clinical-trials. Accessed September 6, 2024.
27. Hatton GE, Pedroza C, Kao LS. Bayesian statistics for surgical decision making. Surg Infect. 2021;22:620–625.
28. Tam DY, Friedrich JO, Arora RC, et al. Frequentist or Bayesian: coronary artery bypass grafting offers advantages over percutaneous coronary intervention in left main coronary disease. J Thorac Cardiovasc Surg. 2023;166:136–140.
29. Robinson NB, Fremes S, Hameed I, et al. Characteristics of randomized clinical trials in surgery from 2008 to 2020: a systematic review. JAMA Netw Open. 2021;4:e2114494.
30. Edwards FH, Clark RE, Schwartz M. Coronary artery bypass grafting: the Society of Thoracic Surgeons National Database experience. Ann Thorac Surg. 1994;57:12–19.
31. Sabatine MS, Bergmark BA, Murphy SA, et al. Percutaneous coronary intervention with drug-eluting stents versus coronary artery bypass grafting in left main coronary artery disease: an individual patient data meta-analysis. Lancet (London, England). 2021;398:2247–2257.
32. Fry BT, Smith ME, Thumma JR, et al. Ten-year trends in surgical mortality, complications, and failure to rescue in medicare beneficiaries. Ann Surg. 2020;271:855–861.
33. Evans JDW, Morris PJ, Knight SR. Antifungal prophylaxis in liver transplantation: a systematic review and network meta-analysis. Am J Transplant. 2014;14:2765–2776.
34. Dias S, Caldwell DM. Network meta-analysis explained. Arch Dis Child Fetal Neonatal Ed. 2019;104:F8–F12.
35. Hackenberge BK. Bayesian meta-analysis now – let’s do it. Croat Med J. 2020;61:564–568.
36. Sadeghirad B, Foroutan F, Zoratti MJ, et al. Theory and practice of Bayesian and frequentist frameworks for network meta-analysis. BMJ Evid Based Med. 2023;28:204–209.
37. van de Schoot R, Winter SD, Ryan O, et al. A systematic review of Bayesian articles in psychology: the last 25 years. Psychol Methods. 2017;22:217–239.
38. Rietbergen C. Quantitative Evidence Synthesis with Power Priors. February 12, 2016. Available at: https://dspace.library.uu.nl/handle/1874/329030. Accessed January 15, 2025.
39. Perkins ZB, Yet B, Marsden M, et al. Early identification of trauma-induced coagulopathy: development and validation of a multivariable risk prediction model. Ann Surg. 2021;274:e1119–e1128.
40. Bdair F, Mangala S, Kashir I, et al. The reporting quality and transparency of orthopaedic studies using Bayesian analysis requires improvement: a systematic review. Contemp Clin Trials Commun. 2023;33:101132.
41. Pibouleau L, Chevret S. Bayesian statistical method was underused despite its advantages in the assessment of implantable medical devices. J Clin Epidemiol. 2011;64:270–279.
42. Sobieraj DM, Cappelleri JC, Baker WL, et al. Methods used to conduct and report Bayesian mixed treatment comparisons published in the medical literature: a systematic review. BMJ Open. 2013;3:e003111.
43. Rietbergen C, Debray TPA, Klugkist I, et al. Reporting of Bayesian analysis in epidemiologic research should become more transparent. J Clin Epidemiol. 2017;86:51–58.e2.


