Scientific Reports. 2025 Oct 1;15:34218. doi: 10.1038/s41598-025-15971-0

Comparative analysis of algorithmic approaches in ensemble learning: bagging vs. boosting

Hongke Zhao 1,2, Wenhui Liu 1,2, Yaxian Wang 1,2, Likang Wu 1,2
PMCID: PMC12488980  PMID: 41034276

Abstract

Ensemble learning is widely applied in various real-world settings, with Bagging and Boosting being two core algorithms. Although these techniques have been extensively investigated through experimental comparisons of their performance in various scenarios, few studies have analyzed and quantified their benefits, costs, and complexities to support algorithm-aware decision making. In this study, we develop a theoretical model to compare Bagging and Boosting in terms of performance, computational costs, and ensemble complexity, and validate it through experiments on four datasets (MNIST, CIFAR-10, CIFAR-100, IMDB) with varying data complexity and computational environments. The results show that, for MNIST, as ensemble complexity increases (e.g., from 20 to 200), Bagging’s performance improves from 0.932 to 0.933 before plateauing, while Boosting improves from 0.930 to 0.961 before showing signs of overfitting. At the same ensemble complexity, such as 200 base learners, Boosting requires approximately 14 times more computational time than Bagging, indicating substantially higher computational costs. Similar patterns are observed across the other three datasets, confirming the generality of our findings and revealing consistent trade-offs between performance and computational costs. Taken together, these results confirm the robustness of our theoretical predictions and provide a foundation for practical guidance. Specifically, decision-makers prioritizing cost-efficiency may prefer Bagging, whereas those focusing on maximizing performance might find Boosting more beneficial. For simpler datasets on average-performing devices, Boosting can be effective, whereas Bagging is more suitable for complex datasets on high-performing devices. Overall, this study contributes by integrating analytical modeling with empirical validation across multiple datasets to provide theoretical insights and practical guidance. It systematically compares Bagging and Boosting in terms of performance, computational costs, and ensemble complexity, thereby enabling practitioners to choose the most appropriate method under varying data complexities, performance needs, and resource constraints.

Subject terms: Computer science, Information technology

Introduction

In the rapidly evolving field of data-driven machine learning, ensemble learning has become a key methodology for improving predictive accuracy and model robustness1,2. Its applicability spans diverse fields, including authorship identification3, healthcare4,5, engineering tasks6, and solutions for class imbalance problems7,8. Its effectiveness is further highlighted by its dominance in major competitions such as the KDD Cup9. Among ensemble techniques, Bagging and Boosting are two foundational approaches. Bagging reduces variance and overfitting by training diverse models on bootstrapped subsets of data and aggregating predictions, typically via majority voting10,11. It performs well on high-dimensional datasets but can be computationally intensive and less interpretable. In contrast, Boosting iteratively corrects errors by assigning higher weights to misclassified instances, thereby reducing bias and offering more interpretable models12,13. However, it is prone to overfitting and also computationally demanding due to its sequential nature.
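To ground these two mechanisms, the following toy sketch (our own illustration, not code from the paper) builds both ensembles by hand: bootstrap resampling with majority voting for Bagging, and AdaBoost-style sample reweighting for Boosting.

```python
# Hand-rolled sketch (ours, not the paper's) of the two mechanisms: Bagging
# trains independent trees on bootstrap resamples and majority-votes; Boosting
# (AdaBoost-style) reweights misclassified points before fitting the next tree.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
m = 25  # ensemble complexity: number of base learners

# Bagging: independent learners, aggregate by majority vote.
votes = np.zeros((m, len(y)))
for i in range(m):
    idx = rng.integers(0, len(y), len(y))          # bootstrap resample
    tree = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
    votes[i] = tree.predict(X)
bag_pred = (votes.mean(axis=0) > 0.5).astype(int)  # majority vote

# Boosting: sequential learners; errors receive higher sample weights.
w = np.full(len(y), 1 / len(y))
score = np.zeros(len(y))
for i in range(m):
    tree = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = tree.predict(X)
    err = w[pred != y].sum()
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))   # learner weight
    w *= np.exp(alpha * (pred != y) - alpha * (pred == y))
    w /= w.sum()                                    # renormalize weights
    score += alpha * (2 * pred - 1)                 # weighted vote in {-1, +1}
boost_pred = (score > 0).astype(int)

print("train acc:", (bag_pred == y).mean(), (boost_pred == y).mean())
```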

While prior research has compared the performance of Bagging and Boosting across diverse datasets and applications14–16, there remains a significant gap in studies examining the trade-off between algorithmic effectiveness and implementation cost, especially from a managerial decision-making perspective. Much of the existing literature focuses on enhancing predictive accuracy or refining algorithmic designs15,17,18, or on theoretical insights such as variance reduction and overfitting control19. Some works have also proposed fine-tuning strategies for learners with low variance or bias20. However, limited attention has been given to how these methods perform under real-world resource constraints or within operational decision-making frameworks.

In many real-world applications, such as data science competitions (e.g., KDD Cup, Kaggle), efficiency in time, computational resources, and memory is crucial. Ensemble learning is widely adopted for its strong predictive performance, yet choosing between Bagging and Boosting remains a key decision that requires balancing model accuracy, implementation cost, and ensemble complexity. From a decision-making perspective, this choice is rarely straightforward. Under Herbert A. Simon’s theory of bounded rationality21, algorithm users cannot fully explore all possible alternatives or accurately predict outcomes, often leading to suboptimal decisions. While Bagging generally incurs lower computational costs than Boosting under similar conditions, Boosting typically achieves higher accuracy. Yet, the criteria guiding this selection remain poorly understood in practice. Currently, algorithm selection is often made without systematic guidance, especially given the growing number of available AI models. Traditional experimental approaches are costly and lack generalizability. In contrast, this study proposes a novel approach by comparing the performance and cost of Bagging and Boosting through theoretical modeling and analysis, offering both practical insights and strategic value for algorithm deployment in resource-constrained environments.

Specifically, this study explores the choice between the Bagging and Boosting algorithms from the perspective of algorithmic profit. In particular, we investigate the following research questions: (1) When time and computational resources are limited, should algorithm decision-makers choose Bagging or Boosting? (2) After selecting an algorithm, how should they determine the number of base learners? (3) How do the ensemble complexity, the preferences of the algorithm decision-makers, and the performance of datasets and equipment impact Bagging and Boosting? To answer these questions, we conduct a systematic comparison of Bagging and Boosting through both theoretical modeling and empirical validation. First, we develop a theoretical framework that models the performance, time cost, and computational cost of both algorithms. This enables us to analyze how ensemble complexity affects overall efficiency. Based on this model, we define algorithmic profit, a measure that incorporates decision-maker preferences, and derive the optimal profit and corresponding ensemble complexity for each algorithm, thereby identifying their most suitable application scenarios. Second, we validate our theoretical findings through experiments on publicly available datasets. The results largely align with our analytical conclusions, supporting the validity of the decision rules derived from our model. Furthermore, this consistency demonstrates the feasibility of using theoretical modeling to analyze algorithm utility, an innovative methodological approach in the field of ensemble learning.

Our analysis shows that the choice between Bagging and Boosting should depend on time and computational resource constraints. When these costs are relatively balanced, Boosting is preferred for its higher accuracy. However, when time efficiency is critical and computational resources are limited or costly, Bagging is the better option. Boosting generally requires more base learners than Bagging, especially with complex datasets or low-performance hardware. Ensemble complexity, decision-maker preferences, dataset characteristics, and computing resources all influence the performance and cost trade-offs of both methods. As ensemble complexity increases, so do performance, time cost, and computational demand. Once performance plateaus, Bagging outperforms Boosting on complex datasets, while Boosting performs better on simpler ones. Notably, Boosting’s time cost rises sharply with complexity, whereas Bagging’s remains nearly constant. Computational resource consumption grows quadratically for Boosting but only linearly for Bagging.

Our paper makes three main contributions. Firstly, it compares the Bagging and Boosting algorithms to reveal their theoretical foundations. Previous research has mainly focused on comparing the accuracy of these algorithms with different types of base learners in various scenarios; this study instead compares their performance and costs across different ensemble complexities. Secondly, it provides theoretical guidance for practitioners who need to choose the appropriate ensemble learning algorithm for specific machine-learning problems. Although ensemble learning algorithms are widely used, few works provide detailed theoretical guidance for practitioners from the perspective of balancing algorithm performance and cost. Thirdly, it offers a new research paradigm for comparing and selecting between different algorithms. This study defines the performance and costs of the Bagging and Boosting algorithms, compares the two from the perspective of algorithmic profit through modeling, and validates the theoretical conclusions through experiments on public datasets. This provides a new method for comparing other machine learning algorithms, which can reduce the experimental costs traditionally incurred in algorithm selection. It is also relevant to the structuring and modeling of software development and maintenance operations, and to the value of information in operational decision-making.

Model setting

An introduction to ensemble learning and the Related Work section can be found in the Supplementary Material. In this section, we present the model setting: we describe the problem and state the hypotheses of the model. Figure 1 provides an overview of the modeling framework and validation process, summarizing the key steps from problem formulation to hypothesis testing and result interpretation.

Fig. 1. Overview of the modeling framework and experimental validation process.

Problem description

We explore the practical challenges of decision-making in machine learning, particularly in the context of ensemble learning. Machine learning models are often used to inform decisions by predicting outcomes as a function of the choices made, with the advantage of capturing complex, nonlinear relationships present in real-world problems. While this can lead to improved predictive accuracy, the resulting complexity also increases the difficulty of selecting optimal decisions22. In this setting, decision-making involves optimizing an objective function shaped by model predictions, while also balancing performance, cost, and algorithmic complexity. This creates a multi-dimensional trade-off that decision-makers must carefully navigate.

In the context of ensemble learning, decision-makers face the challenge of balancing performance and cost in order to maximize profits. They must not only consider the predictive accuracy of the algorithm but also its complexity and associated costs. For example, an e-commerce platform improved profitability by adjusting recommendation outputs to prioritize high-margin products, even without gains in predictive accuracy23. Similarly, Booking.com applies uplift modeling to guide promotion decisions based on the estimated net effect, taking into account both potential gains and associated costs24. These cases highlight the practical need to optimize algorithmic profit, the net value generated after accounting for both performance and cost25.

In this paper, we define ensemble complexity as the number of base learners used in techniques such as Bagging and Boosting. We assume that decision-makers are rational and risk-neutral, and that they have access to a training set D, from which multiple subsets are generated using a specific sampling method, each corresponding to a base learner. The number of base learners, m, represents the complexity of the algorithm (hereafter referred to as complexity). The decision-maker aims to maximize algorithmic profit, defined as performance minus cost. This linear form is commonly used in decision analysis for its clarity and interpretability25. While non-linear utility functions could be considered in more complex settings26, we leave them for future research. Importantly, the relationships between complexity, performance, and cost may themselves be non-linear, which we explore in the following.

Relationship between algorithm performance and complexity

We hypothesize that the relationship between Bagging's performance and the complexity $m$ is a concave, monotonically increasing function $P_{\text{bag}}(m)$, reflecting stable but diminishing returns with increased ensemble size. For Boosting, we hypothesize an inverted-U relationship $P_{\text{boost}}(m)$, which increases rapidly at small $m$ ($P'_{\text{boost}}(m) > 0$) and is strictly concave ($P''_{\text{boost}}(m) < 0$), capturing rapid early gains and a performance decline due to overfitting at higher complexity. These assumptions are supported by prior theoretical and empirical findings. Foundational work has shown that Bagging reduces variance via bootstrapped resampling, leading to steady, diminishing performance gains as more base learners are added27. In contrast, Boosting is more sensitive to iteration count, achieving rapid early accuracy gains but experiencing degradation when overfitting occurs28. A comprehensive review confirms that Bagging generally shows monotonically increasing accuracy, while Boosting often follows an inverted-U performance curve as ensemble size grows19. These findings support the functional forms adopted in our model and provide a theoretical basis for analyzing the trade-off between complexity and performance.

Figure 2 illustrates the hypothesized relationships between algorithm performance (P) and ensemble complexity (m) for Bagging and Boosting algorithms.

Fig. 2. Hypothesis of algorithm performance vs. complexity.

Hypothesis 1

As the value of m increases, Bagging shows a relatively slow increase in accuracy, while the accuracy of Boosting increases rapidly but is prone to overfitting.
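To make Hypothesis 1 concrete, the two shapes can be visualized with illustrative functional forms; the logarithmic and inverted-U expressions below are our own assumptions chosen for plotting, since the model requires only the qualitative behavior stated above.

```python
# Illustrative curves for Hypothesis 1. The specific forms (log for Bagging,
# inverted-U quadratic for Boosting) are assumptions for visualization only.
import matplotlib.pyplot as plt
import numpy as np

m = np.arange(1, 201)                          # ensemble complexity
p_bag = 0.85 + 0.015 * np.log(m)               # concave, diminishing returns
p_boost = 0.85 + 0.0018 * m - 0.000006 * m**2  # rapid rise, peak near m=150, then decline

plt.plot(m, p_bag, label="Bagging: slow, stable gains")
plt.plot(m, p_boost, label="Boosting: fast gains, then overfitting")
plt.xlabel("ensemble complexity m")
plt.ylabel("performance P(m)")
plt.legend()
plt.show()
```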

Relationship between algorithm cost and complexity

We assume that the total algorithmic cost consists of two components: time cost and computational resource cost. For Boosting, which operates sequentially, the time cost increases linearly with the number of base learners, modeled as $T_{\text{boost}}(m) = c_1 m$. For Bagging, which supports parallel execution, we assume the time cost remains constant regardless of ensemble size, modeled as $T_{\text{bag}}(m) = c_1$. Regarding computational resources, Boosting involves iterative reweighting, leading to a quadratic cost increase $R_{\text{boost}}(m) = c_2 m^2$, while Bagging, due to its independent learners, incurs only a linear computational cost $R_{\text{bag}}(m) = c_2 m$, where $c_1$ and $c_2$ are positive constants.

These assumptions are grounded in prior theoretical and empirical findings. Research shows that Bagging benefits from parallelization, resulting in stable training time regardless of ensemble size, whereas Boosting’s sequential learning process leads to a linear increase in time cost as the number of base learners grows10. Furthermore, Boosting’s iterative reweighting process introduces nonlinear computational complexity, while Bagging maintains relatively lower and more predictable computational demands19,29. We further validate these assumptions through controlled experiments under various datasets, parameter settings, and model configurations. The results consistently confirm the robustness of our cost assumptions across different environments.

Hypothesis 2a

With the increase of m, Boosting’s time cost increases linearly, while Bagging’s time cost remains relatively constant.

Hypothesis 2b

With an increase in m, Boosting incurs a quadratic increase in computational resource cost, while Bagging exhibits a linear increase.

Model analysis

In this section, we first define the profit of Bagging and derive its optimal profit and optimal complexity. We then do the same for Boosting. Finally, we conduct a parameter sensitivity analysis and compare the optimal profit and optimal complexity of Bagging and Boosting.

Bagging

Consider a scenario where the decision-makers opt for the Bagging algorithm to achieve the goal of maximizing profits. It is assumed that the weight $\lambda \in [0, 1]$ determines how much unit performance affects profits, while $1 - \lambda$ reflects the influence of unit cost on profits. Based on these hypotheses, the optimization problem for decision-makers can be defined as follows:

$$\max_{m}\ \pi_{\text{bag}}(m) = \lambda\, P_{\text{bag}}(m) - (1-\lambda)\,\big[T_{\text{bag}}(m) + R_{\text{bag}}(m)\big] \qquad (1)$$

Based on Hypotheses 1 and 2, the objective can be further developed as:

$$\max_{m}\ \pi_{\text{bag}}(m) = \lambda\, P_{\text{bag}}(m) - (1-\lambda)\,(c_1 + c_2 m) \qquad (2)$$

Proposition 1

In the context of the Bagging algorithm, it can be established that an optimal solution exists; the optimal complexity $m^{*}_{\text{bag}}$ is characterized by the first-order condition

$$\lambda\, P'_{\text{bag}}(m^{*}_{\text{bag}}) = (1-\lambda)\, c_2 \qquad (3)$$

The optimal profit is then:

$$\pi^{*}_{\text{bag}} = \lambda\, P_{\text{bag}}(m^{*}_{\text{bag}}) - (1-\lambda)\,(c_1 + c_2\, m^{*}_{\text{bag}}) \qquad (4)$$

The proof process is as follows. The first derivative of the profit function is $\pi'_{\text{bag}}(m) = \lambda P'_{\text{bag}}(m) - (1-\lambda) c_2$. The second derivative is $\pi''_{\text{bag}}(m) = \lambda P''_{\text{bag}}(m)$. Since $P_{\text{bag}}$ is concave, $\pi''_{\text{bag}}(m) < 0$, so the profit function is concave, indicating a maximum. Setting the first derivative to zero yields the optimal complexity $m^{*}_{\text{bag}}$ of Eq. (3); substituting $m^{*}_{\text{bag}}$ into $\pi_{\text{bag}}(m)$ gives the optimal profit of Eq. (4).
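As an illustrative special case (the concrete functional form is our assumption, not one stated in the paper), take $P_{\text{bag}}(m) = a \ln m$ with $a > 0$. Then

$$\pi_{\text{bag}}(m) = \lambda a \ln m - (1-\lambda)(c_1 + c_2 m), \qquad \pi'_{\text{bag}}(m) = \frac{\lambda a}{m} - (1-\lambda) c_2 = 0 \;\Rightarrow\; m^{*}_{\text{bag}} = \frac{\lambda a}{(1-\lambda)\, c_2},$$

and $\pi''_{\text{bag}}(m) = -\lambda a / m^{2} < 0$ confirms concavity. In this special case the optimal complexity rises with $\lambda$ and falls with $c_2$, previewing Corollaries 1 and 4.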

Boosting

Imagine a situation where the decision-makers choose the Boosting algorithm to maximize profits. As before, $\lambda$ represents the extent to which unit performance affects profits, while $1 - \lambda$ represents the degree of impact of unit cost on profits. At this point, the optimization problem for decision-makers can be formulated as follows:

$$\max_{m}\ \pi_{\text{boost}}(m) = \lambda\, P_{\text{boost}}(m) - (1-\lambda)\,\big[T_{\text{boost}}(m) + R_{\text{boost}}(m)\big] \qquad (5)$$

Substituting Hypotheses 1 and 2, we obtain

$$\max_{m}\ \pi_{\text{boost}}(m) = \lambda\, P_{\text{boost}}(m) - (1-\lambda)\,(c_1 m + c_2 m^2) \qquad (6)$$

Proposition 2

Under the Boosting algorithm, it can be shown that an optimal solution exists; the optimal complexity $m^{*}_{\text{boost}}$ is characterized by the first-order condition

$$\lambda\, P'_{\text{boost}}(m^{*}_{\text{boost}}) = (1-\lambda)\,(c_1 + 2 c_2\, m^{*}_{\text{boost}}) \qquad (7)$$

The optimal profit is:

$$\pi^{*}_{\text{boost}} = \lambda\, P_{\text{boost}}(m^{*}_{\text{boost}}) - (1-\lambda)\,\big(c_1\, m^{*}_{\text{boost}} + c_2\, (m^{*}_{\text{boost}})^{2}\big) \qquad (8)$$

The proof process is as follows. The first derivative of the profit function is $\pi'_{\text{boost}}(m) = \lambda P'_{\text{boost}}(m) - (1-\lambda)(c_1 + 2 c_2 m)$. The second derivative is $\pi''_{\text{boost}}(m) = \lambda P''_{\text{boost}}(m) - 2(1-\lambda) c_2$. Since $P''_{\text{boost}}(m) < 0$, we have $\pi''_{\text{boost}}(m) < 0$, so the profit function is concave. Setting the first derivative to zero yields the optimal complexity $m^{*}_{\text{boost}}$ of Eq. (7); substituting $m^{*}_{\text{boost}}$ into $\pi_{\text{boost}}(m)$ gives the optimal profit of Eq. (8).
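As a matching illustrative special case (again our assumption), take $P_{\text{boost}}(m) = a m - b m^{2}$ with $a, b > 0$. Then

$$\pi_{\text{boost}}(m) = \lambda (a m - b m^{2}) - (1-\lambda)(c_1 m + c_2 m^{2}) \;\Rightarrow\; m^{*}_{\text{boost}} = \frac{\lambda a - (1-\lambda) c_1}{2\,\big(\lambda b + (1-\lambda) c_2\big)},$$

which is positive only when $\lambda a > (1-\lambda) c_1$, i.e., when the time cost coefficient $c_1$ is relatively small; this previews the feasibility condition discussed in Corollary 6.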

Model analysis and comparison

Based on the propositions, we conduct a parameter sensitivity analysis and compare the optimal profit and optimal complexity of Bagging and Boosting. We derive six corollaries and divide them into two categories: those related to the performance-weight parameter $\lambda$, and those related to the cost parameters $c_1$ and $c_2$. The detailed analysis and corollaries are as follows.

Corollaries of $\lambda$

Corollary 1

The impact of $\lambda$ on the optimal profit and optimal complexity under the two algorithms:

  1. Under the Bagging algorithm, the optimal complexity increases as $\lambda$ increases, and the optimal profit increases with an increase in $\lambda$.

  2. Under the Boosting algorithm, the optimal complexity increases as $\lambda$ increases, while the optimal profit initially decreases and then increases with an increase in $\lambda$.

In the Bagging algorithm, increasing the value of the parameter $\lambda$ leads to higher complexity and profit. This is because Bagging uses bootstrapping and aggregation to improve performance, which pays off more as $\lambda$ increases. In Boosting, however, the relationship between $\lambda$ and profit is more complex. Initially, increasing $\lambda$ may cause a drop in profit due to overfitting on challenging instances; but as $\lambda$ continues to grow, Boosting promotes a stronger combination of weak learners, leading to increased profit. This highlights the importance of finding the right balance between performance and cost in ensemble methods, with $\lambda$ playing a crucial role in determining the optimal trade-offs. Figure 3(a),(b) shows the sensitivity analysis of $\lambda$ under the Boosting algorithm.

Fig. 3. Sensitivity analysis under the Boosting algorithm with the key parameter $\lambda$ (a,b) and parameters $c_1$ and $c_2$ (c,d).

Corollary 2

The impact of $\lambda$ on the optimal profit comparison under the two algorithms:

  1. The optimal profit of the Bagging algorithm can take both positive and negative values; when $\lambda$ is relatively small, the profit is negative. The optimal profit of Boosting consistently remains positive.

  2. Regarding algorithm selection, when the value of $\lambda$ is relatively small, Boosting is the superior choice. However, as the value of $\lambda$ increases, Bagging becomes the more advantageous option.

Bagging's optimal profit is negative when $\lambda$ is relatively small, while Boosting's optimal profit consistently remains positive. Remarkably, the negative values of Bagging's optimal profit do not affect the final decision-making process. Consequently, when $\lambda$ is relatively small, Boosting is the more optimal choice; conversely, as $\lambda$ increases in magnitude, Bagging proves to be the superior selection. In conclusion, Boosting is more broadly applicable, as it ensures a positive optimal profit across a larger range of $\lambda$. This finding tells us that Boosting is a more favorable choice particularly in scenarios with relatively small $\lambda$, while Bagging becomes advantageous in situations with relatively large $\lambda$. Figure 4(a–c) illustrates the profit comparison between Bagging and Boosting under variation in $\lambda$.

Fig. 4. Profit comparison (a–c) and complexity comparison (d–f) with the key parameter $\lambda$.

Corollary 3

The impact of $\lambda$ on the optimal complexity comparison under the two algorithms:

  1. When $\lambda$ is relatively large, the optimal complexity of both Bagging and Boosting is positive. When $\lambda$ is relatively small, the optimal complexity is not meaningful.

  2. Regarding algorithm selection, Boosting has a higher optimal complexity when $\lambda$ is relatively small. Conversely, as $\lambda$ increases in magnitude, Bagging has a higher optimal complexity.

This corollary illustrates the nuanced impact of $\lambda$ on the optimal complexity under the Bagging and Boosting algorithms. The parameter $\lambda$ can be interpreted as the performance preference of the decision-makers. When $\lambda$ is relatively small, Boosting tends to exhibit more complex models due to its iterative error-correction mechanism, even when performance is not the primary focus. Conversely, as $\lambda$ increases, Bagging's optimal complexity surpasses that of Boosting, as it aggregates more models to enhance performance. This delineates how $\lambda$ affects the complexity adaptation of these algorithms: Boosting is inherently more complex at lower performance thresholds, and Bagging becomes more complex when there is more emphasis on performance. Figure 4(d–f) shows a comparison of optimal complexity between Bagging and Boosting with varying values of $\lambda$.

Corollaries of $c_1$ and $c_2$

Corollary 4

The impact of $c_1$ and $c_2$ on the optimal profit and optimal complexity under the two algorithms:

  1. Under the Bagging algorithm, as the cost coefficient $c_2$ increases, the optimal complexity decreases. The optimal profit shows a quadratic variation (first decreasing, then increasing) with an increase in $c_2$, while it decreases with an increase in $c_1$.

  2. Under the Boosting algorithm, the optimal complexity and the optimal profit both decrease as the cost coefficients $c_1$ and $c_2$ increase.

This corollary elucidates how the time cost coefficient $c_1$ and the computing resource cost coefficient $c_2$ impact the optimal complexity and optimal profit of the Bagging and Boosting algorithms. In Bagging, a higher value of $c_2$ leads to reduced complexity, balancing resource expenditure. Interestingly, Bagging's optimal profit first declines with rising $c_2$ and then increases, suggesting an adaptive response to cost pressures, whereas an increase in $c_1$ reduces the optimal profit due to longer training durations. In Boosting, both the optimal complexity and the optimal profit diminish with greater $c_1$ and $c_2$, reflecting Boosting's vulnerability to both time and resource costs. This highlights the strategic interplay between cost and ensemble complexity in ensemble learning. Figure 3(c),(d) displays the parameter sensitivity analysis of $c_1$ and $c_2$ under the Boosting algorithm.

Corollary 5

The impact of $c_1$ and $c_2$ on the optimal profit comparison under the two algorithms:

  1. The optimal profit under Bagging may be either positive or negative. The optimal profit under Boosting is always positive.

  2. In the context of algorithm selection, choosing the Boosting algorithm is better when there is a relatively small difference between $c_1$ and $c_2$. The preference for the Bagging algorithm emerges predominantly in scenarios where $c_1$ is relatively small and $c_2$ is significantly high.

When comparing the two algorithms with respect to the parameters $c_1$ and $c_2$, we found some interesting insights. The Bagging algorithm generates variable profits that can be positive or negative depending on the specific conditions or parameter values. The Boosting algorithm, on the other hand, consistently yields positive profits, indicating a more stable and reliable outcome across the varying parameters. We also observed that the Boosting algorithm is generally preferred for its wider applicability and benefits, whereas the Bagging algorithm remains useful in cases where the time cost coefficient $c_1$ is relatively low and $c_2$ is relatively high, highlighting the importance of considering these key parameters in algorithm selection. Figure 5(a–c) shows the optimal profit comparison between the Bagging and Boosting algorithms with varying $c_1$ and $c_2$ values.

Fig. 5. Profit comparison (a–c) and complexity comparison (d–f) with the key parameters $c_1$ and $c_2$.

Corollary 6

The impact of $c_1$ and $c_2$ on the optimal complexity comparison under the two algorithms:

  1. Bagging's optimal complexity is meaningful only when $c_2$ is relatively small; Boosting's optimal complexity is meaningful only when $c_1$ is relatively small.

  2. When $c_2$ is relatively high and $c_1$ is relatively small, the optimal complexity of Boosting is greater than that of Bagging. When $c_2$ is relatively small, the optimal complexity of Bagging is greater than that of Boosting.

Corollary 6 explores how the two critical parameters, namely the time cost coefficient $c_1$ and the computing resource cost coefficient $c_2$, impact the optimal number of base learners in Bagging and Boosting. The optimal complexity, influenced by these parameters, can fluctuate under both algorithms; however, Bagging's optimal complexity is meaningful only when $c_2$ is relatively small, and Boosting's only when $c_1$ is relatively small. Specifically, when $c_2$ is relatively small, Bagging tends to adopt a greater number of base learners irrespective of variations in $c_1$. This is likely due to Bagging's ability to parallelize training, which allows the number of base learners to grow without significantly impacting the overall time cost. In situations where $c_2$ is relatively high and $c_1$ is low, Boosting tends to employ a larger number of base learners. This preference may stem from Boosting's sequential approach to model improvement, which necessitates more judicious use of each learner. Figure 5(d–f) shows the optimal complexity comparison between Bagging and Boosting with variation in $c_1$ and $c_2$.

Experimental validation

We present the validation of hypotheses and then demonstrate the validation of corollaries. In the experiments, Bagging employed bootstrap sampling, while Boosting adopted weighted sampling. The source code for this paper can be found at: https://github.com/252820/Bagging-vs-Boosting.

Hypothesis validation

We validate the hypotheses on MNIST30, CIFAR-10, CIFAR-10031, and IMDB32. MNIST is used as a baseline due to its relatively simple structure, focused on handwritten digit recognition. In contrast, CIFAR-10 and CIFAR-100 contain more complex image data, offering a more challenging environment for evaluating model robustness and adaptability. The IMDB dataset, which consists of textual reviews labeled for sentiment polarity, allows us to assess the generalizability of ensemble methods beyond vision tasks. For both the Bagging and Boosting algorithms, we employ decision trees as base learners, owing to their versatility and interpretability across various learning tasks. The Bagging algorithm is implemented using the BaggingClassifier from the scikit-learn library, a widely adopted tool known for its efficiency and ease of use. For Boosting, we use the AdaBoostClassifier, which is renowned for its ability to enhance the performance of weak learners. To evaluate the ensemble methods, we focus on three key metrics: test set accuracy, training time, and the size of the generated pickle files. These metrics provide insight into model accuracy, computational efficiency, and scalability, essential factors for practical deployment.
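A minimal sketch of this measurement loop follows, under our assumptions about data loading and parameter values; the authoritative configuration is in the Supplementary and the linked repository.

```python
# Sketch of the measurement loop: decision-tree base learners, scikit-learn
# ensembles, and the three metrics (accuracy, training time, pickle size).
# Dataset loading and the depth/complexity grid here are assumptions.
import os
import pickle
import time

from sklearn.datasets import fetch_openml
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

for n in [20, 50, 100, 200]:                           # ensemble complexity m
    for name, Ensemble in [("bagging", BaggingClassifier),
                           ("boosting", AdaBoostClassifier)]:
        model = Ensemble(DecisionTreeClassifier(max_depth=10), n_estimators=n)
        t0 = time.time()
        model.fit(X_tr, y_tr)
        train_time = time.time() - t0                  # time cost (Hypothesis 2a)
        path = f"{name}_{n}.pkl"
        with open(path, "wb") as f:
            pickle.dump(model, f)                      # model file for the size metric
        size = os.path.getsize(path)
        # time * pickle size is the computational-resource proxy (Hypothesis 2b)
        print(name, n, model.score(X_te, y_te), train_time, train_time * size)
```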

We conducted a series of experiments to rigorously evaluate the hypotheses of this study. To ensure robust and generalizable conclusions, we adopted a controlled experimental setup in which one variable was adjusted at a time. This included varying the complexity of the base learners and the random seed settings. These modifications ensured that our findings were not dependent on specific initial conditions or partitioning methods. Moreover, we tested our hypotheses across diverse datasets with different characteristics and levels of complexity, further validating the broad applicability of our results. Detailed descriptions of the experimental setup, including specific parameter configurations such as decision tree depths and random seed values used to ensure reproducibility, are provided in Table 1 of the Supplementary. To assess the statistical significance of performance differences across algorithms and datasets, paired t-tests were conducted on both accuracy metrics and computational costs. The results of these t-tests, reported in Table 2 of the Supplementary for each comparison, show that all p-values were below 0.05, confirming the statistical significance of the observed differences. For greater transparency, we also report the variance and standard deviation of both accuracy and computational time in Table 3 of the Supplementary.
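As a pointer for replication, such a paired comparison of matched runs can be carried out with SciPy; the accuracy arrays below are illustrative placeholders, not the measurements behind Table 2 of the Supplementary.

```python
# Hypothetical illustration of the paired t-tests reported above; the paired
# accuracy arrays are placeholders, not the paper's measurements.
from scipy import stats

acc_bagging  = [0.932, 0.933, 0.933, 0.934, 0.933]   # e.g., matched runs/seeds
acc_boosting = [0.930, 0.945, 0.953, 0.958, 0.961]

t_stat, p_value = stats.ttest_rel(acc_bagging, acc_boosting)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")        # difference significant if p < 0.05
```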

Hypothesis 1

Accuracy on the test set is the primary metric for evaluating Hypothesis 1, reflecting the algorithm’s generalization and predictive performance. It indicates how well the model performs on unseen data and is crucial for assessing overall algorithm effectiveness. In Fig. 6, we present the results for Hypothesis 1. Figure 6(a) shows the experimental outcomes for MNIST across three different depths. Figure 6(b) illustrates the overfitting behavior on MNIST with a decision tree depth of 10. Figure 6(c) displays the results for MNIST under three different random seeds. Figure 6(d–f) show the experimental results for CIFAR-10, CIFAR-100, and IMDB across varying depths, respectively.

Fig. 6. Validation of Hypothesis 1.

As shown in Fig. 6, Bagging's performance increases gradually with more base learners and eventually stabilizes, indicating a relatively slow gain in accuracy. In contrast, Boosting improves rapidly at first but is prone to overfitting, consistent with Hypothesis 1. At the same number of base learners, Boosting outperforms Bagging on MNIST. However, on CIFAR-10 and CIFAR-100, Bagging achieves higher accuracy than Boosting under the same conditions. Both algorithms perform less effectively on these datasets, likely due to their larger size and the limited capacity of the base learners.

Hypothesis 2a

For Hypothesis 2a, we analyze training time, a key consideration in real-world applications, as it reflects the computational efficiency and time investment required for effective model training. This metric provides insight into the algorithm's time cost and processing efficiency. In Fig. 7, we present the results for Hypothesis 2a. Figure 7(a–f) shows the experimental outcomes across the MNIST, CIFAR-10, CIFAR-100, and IMDB datasets under varying depths and random seeds.

Fig. 7. Validation of Hypothesis 2a (a–f) and Hypothesis 2b (g–l).

From the analysis of Fig. 7(a–f), we observe that as the number of base learners increases, Bagging’s training time remains nearly constant, approximating a horizontal line. In contrast, Boosting’s training time increases linearly, supporting our Hypothesis 2a.

Hypothesis 2b

Regarding Hypothesis 2b, we examine the product of training time and model file size (i.e., the size of the generated pickle files). This metric captures the relationship between training efficiency and model storage requirements, which is essential for deploying models in environments with limited computational resources. Figure 7 presents the results for Hypothesis 2b. Figure 7(g–l) show the experimental outcomes on MNIST, CIFAR-10, CIFAR-100, and IMDB under various settings, consistent with the previous subsection.

As shown in Fig. 7(g–l), when the number of base learners increases, Bagging’s computational cost rises linearly, whereas Boosting’s increases quadratically. This observation supports Hypothesis 2b. To assess the robustness of Hypothesis 2a and Hypothesis 2b across various model implementations, base learners, and hardware configurations, we conducted a series of experiments. The results, presented in Fig. 1 (Supplementary), confirm their resilience to algorithmic and environmental variations, with the visual trends in the curves further supporting this robustness.

Corollary validation

In the corollary validation section, we conducted experiments using three computing devices with different CPUs: 6240, 6320, and 2687. The datasets used were MNIST, CIFAR-10, CIFAR-100, and IMDB. The detailed validation is described as follows.

Validation of $\lambda$

Using Eqs. (1) and (5), we calculated the profits associated with different values of $\lambda$ for both the Bagging and Boosting algorithms. In this context, $P$ represents the standardized performance on the test set for various numbers of base learners, while $C$ denotes the combined metric of the standardized training duration and the product of training time and the size of the generated pickle file. To ensure comparability across different datasets and algorithms, the data were standardized using min-max normalization33. We considered values of $\lambda$ ranging from 0 to 1 and determined the maximum profit for each of them. These maximum profits were designated as the optimal profit for a given value of $\lambda$. Subsequently, we identified the optimal complexity associated with these optimal profits.
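A minimal sketch of this sweep follows, assuming placeholder measurement curves and our own choice of averaging the two standardized cost terms (the text states only that they are combined).

```python
# Sketch of the lambda sweep described above. The measured accuracy/time/size
# curves are placeholders, and combining the two standardized cost terms by
# averaging is our assumption; only the sweep logic mirrors the text.
import numpy as np

def minmax(x):
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

m_grid = np.arange(20, 201, 20)                        # ensemble complexities tried
accuracy = np.array([0.930, 0.940, 0.946, 0.950, 0.953,
                     0.956, 0.958, 0.959, 0.960, 0.961])   # placeholder
train_time = np.linspace(5.0, 70.0, 10)                # placeholder seconds
time_x_size = train_time * np.linspace(1.0, 10.0, 10)  # placeholder time * size

P = minmax(accuracy)                                   # standardized performance
C = 0.5 * (minmax(train_time) + minmax(time_x_size))   # combined standardized cost

for lam in np.linspace(0.0, 1.0, 11):
    profit = lam * P - (1.0 - lam) * C                 # empirical form of Eqs. (1)/(5)
    best = int(np.argmax(profit))
    print(f"lambda={lam:.1f}: optimal profit={profit[best]:.3f}, optimal m={m_grid[best]}")
```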

Figure 8 displays the corollary validation results for optimal profit and optimal complexity with respect to $\lambda$. Figure 8(a),(g) shows the average experimental results for the MNIST dataset on device 6240, based on three different random seeds. Similarly, Fig. 8(b),(h) presents the results for CIFAR-10 on device 6320, and Fig. 8(c),(i) displays the results for IMDB on the same device. Figure 8(d),(j) shows MNIST results on device 2687; the CIFAR-10 results on device 2687 are shown in Fig. 8(e),(k). The results for CIFAR-100 on device 2687 are presented in Fig. 2(b),(c) in the Supplementary. Figure 8(f) illustrates the difference in optimal profit between Boosting and Bagging, while Fig. 8(l) shows the corresponding difference in optimal complexity. The numbers in the legend represent the depth of the decision tree; e.g., Boosting_10 denotes the results of the Boosting algorithm with a max depth of 10 in the base learner. Avg3 in the legend represents the average of 3 experimental runs.

Fig. 8. Corollary validation of optimal profit regarding $\lambda$ (a–f) and optimal complexity regarding $\lambda$ (g–l).

As shown in Fig. 8(a–e), both Bagging and Boosting exhibit an increase in optimal profit as $\lambda$ increases. Similarly, Fig. 8(g–k) reveals a rising trend in optimal complexity for both algorithms with higher $\lambda$, which aligns with Corollary 1.

Figure 8(f) reveals that when $\lambda$ is relatively small, the difference is greater than 0, whereas when $\lambda$ is relatively large, the difference is less than 0. This implies that, for a given dataset, computing device, and decision tree depth, the Boosting algorithm is more optimal when $\lambda$ is relatively small; conversely, the Bagging algorithm is more optimal when $\lambda$ is relatively large. This observation aligns with Corollary 2.

Figure 8(l) exhibits a similar trend: when $\lambda$ is smaller, the difference is greater than 0, and when $\lambda$ is larger, the difference is less than 0. This indicates that if the impact of unit performance on profit is relatively low, Boosting's optimal complexity exceeds Bagging's for a given dataset, computing device, and decision tree depth. Conversely, if the unit performance impact is relatively high, Bagging's optimal complexity exceeds Boosting's, which is consistent with Corollary 3.

Validation of $c_1$ and $c_2$

To verify the corollaries concerning $c_1$ and $c_2$, it is necessary to establish device coefficients and data coefficients, both with a value range of 0 to 10. The device coefficient $d$ is determined by the computing device running the algorithm: a slower running speed and poorer overall performance yield a lower device coefficient, while a faster running speed and better overall performance yield a higher one. We utilized three devices, labeled 2687 (Intel Xeon E5-2687W v3), 6240 (Intel Xeon Gold 6240 2.6GHz/18C), and 6320 (Intel Xeon Gold 6320 2.2GHz/26C), and assigned device coefficients of 3 for the 2687 device, 5 for the 6240 device, and 8 for the 6320 device. The data coefficient $s$ is determined by the size of the dataset and the number of features: a larger dataset with more features yields a higher data coefficient, while a smaller dataset with fewer features yields a lower one. In the experiments, we used two datasets and, based on their volume and feature count, defined the data coefficient as 4 for MNIST and 6 for CIFAR-10.

The parameter $c_1$ represents the coefficient of time cost, which is influenced by both the dataset and the computing device; we therefore define $c_1$ as the product of the data coefficient and the device coefficient, obtaining $c_1 = s \cdot d$. The parameter $c_2$ denotes the coefficient of computational resource cost, which is influenced by both the algorithm's runtime and the size of the algorithm's parameter files; we therefore define $c_2$ as the product of the square of the data coefficient and the device coefficient, obtaining $c_2 = s^2 \cdot d$ and encapsulating the dual influence of data and device factors. Table 1 presents the optimal profit and optimal complexity for Bagging and Boosting at different max depths for a fixed $\lambda$, along with the device coefficients, data coefficients, and the resulting $c_1$ and $c_2$.
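The coefficient grid implied by these definitions can be reproduced directly; the rule $c_1 = s \cdot d$, $c_2 = s^2 \cdot d$ and the coefficient values are from the text, while the code itself is ours.

```python
# Reconstructing the cost coefficients used in Table 1 from the stated rules
# c1 = s*d and c2 = s^2*d, with the device and data coefficients given above.
device_coef = {"2687": 3, "6240": 5, "6320": 8}
data_coef = {"MNIST": 4, "CIFAR-10": 6}

for ds, s in data_coef.items():
    for dev, d in device_coef.items():
        print(f"{ds} on {dev}: c1 = {s * d}, c2 = {s**2 * d}")
# MNIST on 2687 -> c1 = 12, c2 = 48, matching the first row of Table 1.
```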

From Table 1, we can see that for the same max depth, the optimal complexity of Bagging tends to decrease as $c_2$ increases, while Bagging's optimal profit shows both increases and decreases as $c_1$ and $c_2$ increase. Both the optimal complexity and the optimal profit of Boosting decrease as $c_1$ and $c_2$ increase. This is in line with Corollary 4.

Table 1. $c_1$ and $c_2$ validation (fixed $\lambda$).

m*_Bag   π*_Bag   s   d   m*_Boost   π*_Boost   c1   c2    Δπ*      Δm*   max depth
40       0.446    4   3   120        0.462      12   48    -0.016   -80   10
120      0.524    6   3   170        0.335      18   108   0.199    -50   10
100      0.502    6   8   140        0.361      48   288   0.141    -40   10
140      0.441    4   5   140        0.476      20   80    -0.036   0     15
100      0.491    6   8   140        0.387      48   288   0.104    -40   15

Here s and d are the data and device coefficients, c1 = s·d, c2 = s²·d, Δπ* = π*_Bag − π*_Boost, and Δm* = m*_Bag − m*_Boost.

When $c_1$ and $c_2$ are not significantly different, the optimal profit difference is negative, indicating that Boosting's optimal profit is greater than Bagging's. However, when $c_1$ is relatively low and $c_2$ is relatively high, the optimal profit difference is positive, indicating that Bagging's optimal profit surpasses Boosting's. This aligns with Corollary 5.

Similarly, when $c_2$ is relatively high and $c_1$ is relatively low, the optimal complexity difference is less than 0, meaning that Boosting's optimal complexity exceeds Bagging's. When $c_2$ is relatively small, the optimal complexity difference is greater than 0, indicating that Bagging's optimal complexity is greater than Boosting's. This aligns with Corollary 6.

Conclusions

The results reveal the following: (1) The optimal profit and complexity of Bagging and Boosting are influenced not only by algorithm preferences, computing devices, and datasets, but also by base learner characteristics, such as decision tree depth and random seed settings. (2) As ensemble complexity increases, both algorithms show increases in performance, runtime, and computational cost. However, once performance stabilizes, Bagging achieves higher accuracy on complex datasets at equal complexity levels, while Boosting performs better on simpler ones. Boosting's runtime increases with complexity, whereas Bagging's remains nearly constant. Moreover, Boosting exhibits a quadratic increase in computational cost, in contrast to Bagging's linear growth. (3) Given a fixed dataset, device, and tree depth, both algorithms show increasing optimal profit with greater per-unit performance impact. When the impact is relatively low, Boosting yields higher optimal profit but greater optimal complexity; when it is relatively high, Bagging performs better with similarly increased complexity. (4) When time and computational costs are comparable, Boosting achieves higher optimal profit. However, under relatively low time cost and relatively high computational cost, Bagging yields higher profit while Boosting shows greater complexity. Conversely, when computational cost is relatively low, Bagging exhibits greater optimal complexity than Boosting.

This study provides theoretical guidance for selecting between Bagging and Boosting in ensemble learning. When faced with a dataset requiring ensemble analysis, decision-makers can first estimate algorithm costs based on their preferences, dataset characteristics, and computing resources. They can then determine the sign and magnitude of the difference in optimal profits between the two algorithms to guide their selection. After choosing an algorithm, decision-makers can also decide on the appropriate number of base learners. Specifically, if cost sensitivity is a priority, Bagging is preferred; if performance is the main concern, Boosting is superior. For relatively simple datasets and average device performance, Boosting is recommended. For complex datasets and high-performance devices, Bagging is more suitable. In general, the number of base learners for Boosting should be set higher than for Bagging, especially when the dataset is complex or the device performance is limited.

This study has several limitations that suggest directions for future research. First, our analysis focuses specifically on Bagging and Boosting within ensemble learning, and the proposed framework has not yet been extended to other algorithm families. Applying this model to a broader range of machine learning or optimization algorithms, especially those commonly used in operations research, could enhance its generalizability. Second, the model assumes rational and risk-neutral decision-makers who aim to maximize algorithmic profit. While this assumption facilitates tractable analysis, it may not fully capture real-world decision behavior, where bounded rationality, heuristics, or risk aversion may influence choices. Finally, although we adopt a linear formulation of profit as performance minus cost for the sake of interpretability, alternative utility structures such as non-linear or risk-sensitive functions may better represent decision-maker preferences and offer promising directions for future work.


Acknowledgements

This study was partially funded by the Natural Science Foundation of Tianjin (No. 24JCQNJC01560), the National Natural Science Foundation of China (72471165, 72101176), and the Emerging Frontiers Cultivation Program of Tianjin University Interdisciplinary Center.

Author contributions

Hongke Zhao conceived the method and experiment(s), Wenhui Liu and Yaxian Wang conducted the experiment(s), Likang Wu analysed the results. All authors reviewed the manuscript.

Data availability

All data generated or analysed during this study are included in the published articles: MNIST30, CIFAR-10 and CIFAR-10031, and IMDB32.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-15971-0.

References

  • 1. Kunapuli, G. Ensemble Methods for Machine Learning (Simon and Schuster, 2023).
  • 2. Altman, N. & Krzywinski, M. Ensemble methods: Bagging and random forests. Nat. Methods 14, 933–935 (2017).
  • 3. Abbasi, A. et al. Authorship identification using ensemble learning. Sci. Rep. 12, 9537 (2022).
  • 4. Wang, J. et al. Comparative performance of multiple ensemble learning models for preoperative prediction of tumor deposits in rectal cancer based on MR imaging. Sci. Rep. 15, 4848 (2025).
  • 5. Rahmatinejad, Z. et al. A comparative study of explainable ensemble learning and logistic regression for predicting in-hospital mortality in the emergency department. Sci. Rep. 14, 3406 (2024).
  • 6. Vlasenko, T. et al. Ensemble learning based sustainable approach to rebuilding metal structures prediction. Sci. Rep. 15, 1210 (2025).
  • 7. Salehi, A. & Khedmati, M. Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification. Sci. Rep. 15, 3460 (2025).
  • 8. Salehi, A. R. & Khedmati, M. A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data. Sci. Rep. 14, 5152 (2024).
  • 9. Zhou, Z.-H. Ensemble Methods: Foundations and Algorithms (CRC Press, 2012).
  • 10. Zhang, T. et al. Bagging-based machine learning algorithms for landslide susceptibility modeling. Nat. Hazards 110, 823–846 (2022).
  • 11. Bühlmann, P. & Yu, B. Analyzing bagging. Ann. Stat. 30, 927–961 (2002).
  • 12. Freund, Y., Schapire, R. E. et al. Experiments with a new boosting algorithm. In ICML, Vol. 96, 148–156 (1996).
  • 13. Guo, J. et al. Boost: A robust ten-fold expansion method on hour-scale. Nat. Commun. 16, 2107 (2025).
  • 14. Odegua, R. An empirical study of ensemble techniques (bagging, boosting and stacking). In Proceedings of the Conference on Deep Learning, IndabaX (2019).
  • 15. Rekha, G., Tyagi, A. K. & Krishna Reddy, V. Solving class imbalance problem using bagging, boosting techniques, with and without using noise filtering method. Int. J. Hybrid Intell. Syst. 15, 67–76 (2019).
  • 16. Colakovic, I. & Karakatič, S. FairBoost: Boosting supervised learning for learning on multiple sensitive features. Knowl.-Based Syst. 280, 110999 (2023).
  • 17. Sun, J., Li, J. & Fujita, H. Multi-class imbalanced enterprise credit evaluation based on asymmetric bagging combined with light gradient boosting machine. Appl. Soft Comput. 130, 109637 (2022).
  • 18. González, S., García, S., Del Ser, J., Rokach, L. & Herrera, F. A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities. Inf. Fusion 64, 205–237 (2020).
  • 19. Ghojogh, B. & Crowley, M. The theory behind overfitting, cross validation, regularization, bagging, and boosting: Tutorial. arXiv preprint arXiv:1905.12787 (2019).
  • 20. Zhao, C., Peng, R. & Wu, D. Bagging and boosting fine-tuning for ensemble learning. IEEE Trans. Artif. Intell. 5, 1728–1742 (2024).
  • 21. Simon, H. A. Rational decision making in business organizations. Am. Econ. Rev. 69, 493–513 (1979).
  • 22. Biggs, M., Hariss, R. & Perakis, G. Constrained optimization of objective functions determined from random forests. Prod. Oper. Manag. 32, 397–415 (2023).
  • 23. Kompan, M., Gaspar, P., Macina, J., Cimerman, M. & Bielikova, M. Exploring customer price preference and product profit role in recommender systems. IEEE Intell. Syst. 37, 89–98 (2021).
  • 24. Teinemaa, I., Albert, J. & Goldenberg, D. Uplift modeling: From causal inference to personalization. In Companion Proceedings of the Web Conference (2021).
  • 25. Bertsimas, D. & Kallus, N. From predictive to prescriptive analytics. Manag. Sci. 66, 1025–1044 (2020).
  • 26. Abbas, A. E. Constructing multiattribute utility functions for decision analysis. In Risk and Optimization in an Uncertain World, 62–98 (2010).
  • 27. Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).
  • 28. Schapire, R. E. & Freund, Y. Boosting: Foundations and algorithms. Kybernetes 42, 164–166 (2013).
  • 29. Dietterich, T. G. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach. Learn. 40, 139–157 (2000).
  • 30. Deng, L. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Process. Mag. 29, 141–142 (2012).
  • 31. Krizhevsky, A. Learning multiple layers of features from tiny images. Master's thesis, University of Toronto (2009).
  • 32. Pandey, A. et al. Sentiment analysis of IMDB movie reviews. In 2024 First International Conference on Software, Systems and Information Technology (SSITCON), 1–6 (IEEE, 2024).
  • 33. Cao-Van, K. et al. Prediction of heart failure using voting ensemble learning models and novel data normalization techniques. Eng. Appl. Artif. Intell. 154, 110888 (2025).
